The effect of segmentation on the performance of machine learning methods on the morphological classification of Friesian Holstein dairy cows

ABSTRACT


INTRODUCTION
Currently, the agricultural sector in every country contributes significantly to the economic progress of other sectors. Milk is one of the most in-demand livestock products for human life, and the quality of its production depends on the health of the producing cow, so livestock management and welfare standards need to be improved, including for farmers [1]. Choosing a mother cow to raise is one of the main problems faced by cattle farmers. Keeping cows without careful selection affects milk production, and various classification methods have been proposed before.
Many studies have examined cow udder disease, which is associated with milk production. Earlier work applied simple regression methods and image processing techniques to features such as chest circumference measured with mobile applications [2], body length, height, shoulder height, udder length, and udder width [3], [4]. Live weight has also been used as a feature for body measurement [5]. However, these livestock classification methods still have many flaws, so the selection of mother cows is not optimal. Conventional classification is problematic because it relies solely on the visual judgment and experience of experts and cattle farmers [6].
ISSN: 2722-3221 Comput Sci Inf Technol, Vol. 4, No. 1, March 2023: 59-68
Many image-based studies use segmentation to extract visual features such as width, length, body posture, and curvature for analyzing and evaluating animal health and behaviour [7], [8]. Image analysis with segmentation is useful for recording the performance and visual appearance of individual cows. However, conventional simultaneous localization and mapping (SLAM) algorithms describe a scene only through geometric information, such as the positions of map points, which are sparse or dense geometric points in space; approaches that incorporate the mask region-based convolutional neural network (mask R-CNN) instead rely on segmentation.
The potential of deep neural networks in feature learning [9], [10] has enabled significant advances in object detection, segmentation, and other computer vision tasks. The Faster R-CNN method for object detection [11], on which the mask R-CNN method of [12] is built, made a significant contribution. Other studies on preprocessing, whose results can later be used to estimate cow weight, performed edge-detection-based segmentation of cow images by combining the Canny algorithm with median blur and sharpening operators. The Canny algorithm is an edge detection algorithm that performs well even on images containing a lot of noise.
Cow selection based on machine learning is expected to provide a solution, making selection tools smart, precise, and simple to use. Machine learning is a popular approach for analyzing both quantitative and qualitative data. It has been applied to predict nitrogen excretion [13], to support individual insemination decisions based on phenotype and genotype [14], to predict the breeding value of Friesian Holstein cows during lactation [15], to characterize prepartum behavior and the calving process in dairy cows [16], to predict the conception outcome of future matings, which helps producers [17], and to identify dairy cows from their tailheads [18].
This research uses machine learning methods and optimizes the segmentation process to improve accuracy at the classification stage. Two segmentation methods are used: edge detection with the Canny algorithm, and mask R-CNN. The goal is to find a suitable model that can assess the quality of a cow from its appearance.

METHOD
This section presents the methods and materials used in this research. The research consists of three stages: preprocessing, modeling, and evaluation. Preprocessing segments the images with the Canny algorithm and mask R-CNN; modeling uses four algorithms, namely support vector machine (SVM), logistic regression, random forest, and artificial neural network; and evaluation measures accuracy, precision, recall, and F1-score. Figure 1 shows the research flow.

Data acquisition
Friesian Holstein dairy cow data were collected in Indonesia from two farms: Cibugary, East Jakarta, at an altitude of 50 m above sea level, and Pak Haji Acep's farm in Kunak, Bogor, at an altitude of 190-330 m above sea level. The subjects were 102 cows, each photographed from two positions, side and rear. Images were taken before milking at a distance of 2-2.5 m with a digital DSLR camera at a resolution of 3456×5184 pixels [19], [20]. Each of the 102 cows has five side views and five rear views, giving a total of 1020 images in three categories: high, medium, and low.
Comput Sci Inf Technol ISSN: 2722-3221 The effect of segmentation on the performance of machine learning methods … (Amril Mutoi Siregar)

Segmentation
First, the quality of the collected morphological images is improved by brightening, sharpening, and blurring, and the cow is separated from the background as a segmentation step that filters the dairy cow pixels. To do this, an image of the scene without the cow ("zero colour capture") is set as the background image and subtracted from the cow image, which reduces noise. As a result, only the cow region, which is absent from the background image, is preserved. Pixels with only small differences are also removed from the cow image, because such differences can be caused by small changes in the camera position. The Canny and mask R-CNN algorithms are used for segmentation [10], [21].
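The background-subtraction step can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: it assumes the background frame and the cow image are already aligned arrays of the same shape, and the function name `subtract_background` and the default `threshold=30` are hypothetical. Pixels whose absolute difference from the background is at or below the threshold are zeroed, which also discards small differences such as those caused by slight camera movement.

```python
import numpy as np


def subtract_background(image, background, threshold=30):
    """Keep only pixels that differ clearly from an empty-scene background frame.

    image, background: uint8 arrays of identical shape, (H, W) or (H, W, 3).
    threshold: hypothetical cut-off; differences <= threshold count as background.
    Returns (foreground image, boolean foreground mask).
    """
    # Use a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(image.astype(np.int16) - background.astype(np.int16))
    if diff.ndim == 3:
        # For colour images, a pixel is foreground if any channel changed enough.
        diff = diff.max(axis=2)
    mask = diff > threshold
    foreground = np.zeros_like(image)
    foreground[mask] = image[mask]
    return foreground, mask
```

For example, against a background of constant value 100, a pixel of 220 is kept, while a pixel of 110 (difference 10) is removed as noise.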

Canny edge detection
Canny edge detection, developed by John F. Canny in 1986, is commonly used to detect edges in digital images. Edge detection is an important step in digital image processing [22] and one of the first steps in pattern recognition and segmentation [23]. In this study, the edges of the dairy cow images were used to extract useful information from the imagery. The steps of the Canny algorithm are shown in Figure 3; the final step is hysteresis thresholding. If a pixel value from the previous step is greater than the upper threshold, the pixel is accepted as an edge. If it is lower than the lower threshold, the pixel is rejected as a non-edge. If it lies between the lower and upper thresholds, the pixel is accepted only if it is connected to a pixel whose value is greater than the upper threshold [24].
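The hysteresis rule can be sketched in NumPy as below. This illustrates only the final Canny step and assumes the gradient magnitudes (after non-maximum suppression) are already available as a 2-D array; the function name `hysteresis` and the 8-connected neighbourhood are assumptions. Weak pixels are accepted when they connect to a strong pixel either directly or through a chain of other accepted weak pixels, which is the usual reading of the rule.

```python
from collections import deque

import numpy as np


def hysteresis(magnitude, low, high):
    """Double-threshold a gradient-magnitude map and link weak edges to strong ones.

    magnitude: 2-D array of edge strengths (e.g. after non-maximum suppression).
    Pixels > high are strong edges; pixels < low are rejected; pixels in
    [low, high] are kept only if 8-connected (possibly via other kept pixels)
    to a strong edge.
    """
    strong = magnitude > high
    weak = (magnitude >= low) & ~strong
    edges = strong.copy()
    rows, cols = magnitude.shape
    queue = deque(zip(*np.nonzero(strong)))  # start the search from every strong pixel
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and weak[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True    # weak pixel touches an accepted edge
                    queue.append((rr, cc))  # and may in turn connect further weak pixels
    return edges
```

In practice, OpenCV's `cv2.Canny(image, threshold1, threshold2)` performs all Canny steps, including this one, with the two arguments acting as the hysteresis thresholds.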

Mask R-CNN
The mask R-CNN model is an instance segmentation method that adds an extra branch to Faster R-CNN; Figure 4 shows the mask R-CNN architecture [12]. Mask R-CNN has two stages: the region proposal network (RPN) stage and the Faster R-CNN stage [25]. In the second stage, mask R-CNN outputs a binary mask for each region of interest, with the class, box, and mask predictions made in parallel. Mask R-CNN differs from Faster R-CNN in that this mask branch adds pixel-level segmentation alongside classification and box prediction. Instance segmentation focuses on segmenting the cow object in the images [10]. An enhanced mask R-CNN is proposed for instance segmentation of dairy cow morphology, followed by classification with machine learning using the SVM, logistic regression, random forest, and artificial neural network algorithms. Figure 4 shows the workflow of the mask R-CNN model.
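Once an instance segmentation model has produced masks, removing the background reduces to keeping the pixels of one predicted instance. The sketch below assumes predictions in the layout returned by torchvision's Mask R-CNN models (a `masks` array of shape N×1×H×W with soft values in [0, 1], plus `labels` and `scores`), already converted to NumPy; the paper does not state which implementation was used. The cow class id 21 follows torchvision's COCO category list and should be checked against the model actually used; `keep_cow_pixels` and its thresholds are hypothetical.

```python
import numpy as np

# Class id of "cow" in torchvision's COCO category list (verify for the model actually used).
COW_LABEL = 21


def keep_cow_pixels(image, masks, labels, scores, label=COW_LABEL,
                    score_thresh=0.5, mask_thresh=0.5):
    """Zero every pixel outside the highest-scoring instance of `label`.

    image:  (H, W, 3) array.
    masks:  (N, 1, H, W) soft instance masks with values in [0, 1].
    labels: (N,) predicted class ids; scores: (N,) confidences.
    Returns the masked image, or None if no instance passes `score_thresh`.
    """
    candidates = [i for i in range(len(labels))
                  if labels[i] == label and scores[i] >= score_thresh]
    if not candidates:
        return None
    best = max(candidates, key=lambda i: scores[i])
    binary = masks[best, 0] > mask_thresh  # soft mask -> binary (H, W) mask
    # Broadcast the (H, W, 1) mask over the colour channels; background becomes 0.
    return np.where(binary[..., None], image, 0).astype(image.dtype)
```

For reference, torchvision's `maskrcnn_resnet50_fpn` in evaluation mode returns one dictionary per input image with `boxes`, `labels`, `scores`, and `masks` tensors; converting each to NumPy on the CPU gives the arrays used here.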

Support vector machine (SVM)
SVM is a popular algorithm because it can work with continuous and categorical data and is well suited to both classification and regression. It was first introduced in the 1960s and developed further in the 1990s. An SVM model represents the different classes in a multidimensional space separated by a hyperplane, which is found iteratively by an optimization procedure that minimizes the error.
The task of an SVM is to divide the dataset into classes by finding the maximum marginal hyperplane (MMH). This is implemented with a kernel, which transforms the input data space into the required form. This use of kernels is called the "kernel trick": the kernel implicitly maps a low-dimensional space into a higher-dimensional one, where classes that are not linearly separable in the original space may become separable, making SVM more accurate, robust, and flexible. This study uses the radial basis function (RBF) kernel, which is widely used in classification and implicitly maps the input space into an infinite-dimensional space.
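The RBF kernel compares two feature vectors x and z as K(x, z) = exp(-γ‖x − z‖²), so identical vectors score 1 and distant ones approach 0. The NumPy sketch below computes the full kernel (Gram) matrix that an SVM would use; the default `gamma=0.5` is an arbitrary illustrative value, since the paper does not report its hyperparameters.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Return the Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2).

    A: (n, d) array, B: (m, d) array; gamma > 0 controls the kernel width.
    """
    # Expand ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b for all pairs at once.
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))
```

With scikit-learn, the same kernel is selected with `sklearn.svm.SVC(kernel="rbf", gamma=..., C=...)`; whether the authors used that library is not stated.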

Random forest classifier
Random forest is a machine learning algorithm used for regression and classification [26]. Many algorithms exist for classification problems, such as decision trees; an ensemble of many decision trees is called a random forest. The random forest algorithm builds decision trees on samples of the data, obtains a prediction from each tree, and selects the final answer by voting. As an ensemble, it generally performs better than a single decision tree because combining the results reduces overfitting. The random forest algorithm has the following steps:
a. Select a random sample from the dataset.
b. Build a decision tree for each sample and obtain the prediction of each tree.
c. Vote on the prediction results.
d. Choose the prediction with the most votes as the final prediction.
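Steps a to d can be sketched in plain Python as below. To stay short, this sketch assumes depth-1 trees (decision stumps) instead of full decision trees, and each tree sees a random feature subset as well as a bootstrap sample; the function names and `n_trees=25` are hypothetical. A library implementation such as scikit-learn's `RandomForestClassifier` grows deeper trees and draws a feature subset at every split.

```python
import math
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Fit a depth-1 tree on the given feature indices by minimising training errors.

    Returns (feature, threshold, left_label, right_label); rows with
    row[feature] <= threshold go left. If no split helps, feature is None
    and the stump always predicts the majority label.
    """
    best_label = majority(y)
    best = (None, None, best_label, best_label)
    best_err = sum(label != best_label for label in y)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if err < best_err:
                best_err, best = err, (f, t, l_lab, r_lab)
    return best


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:
        return l_lab
    return l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(math.sqrt(d)))  # number of features each tree may use
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # step a: bootstrap sample
        feats = rng.sample(range(d), k)             # random feature subset
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        forest.append(fit_stump(Xs, ys, feats))     # step b: one tree per sample
    return forest


def predict_forest(forest, row):
    votes = [predict_stump(s, row) for s in forest]  # step c: every tree votes
    return majority(votes)                           # step d: most votes wins
```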

Logistic regression classifier
The logistic regression algorithm is widely used for classification problems because it is efficient and straightforward [27]. It can be applied directly to pixel values after the preprocessing stage; its output is an estimate of the probability of a class given the input pixel values. Regularization with a parameter lambda (λ) penalizes large weights: the larger λ is, the more heavily the weights are penalized, and if λ is too large, the model underfits. It is therefore essential to choose λ appropriately; chosen well, this technique works very well to reduce overfitting.
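The role of λ can be seen in a small gradient-descent sketch of L2-regularized logistic regression. It assumes the penalty (λ / 2n)‖w‖² added to the mean cross-entropy loss, with the bias left unpenalized; the learning rate, iteration count, and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_l2(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on mean cross-entropy + (lam / (2n)) * ||w||^2.

    X: (n, d) features (e.g. flattened, preprocessed pixel values); y: (n,) labels in {0, 1}.
    Returns weights w and an unpenalized bias b.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                      # predicted probability of class 1
        grad_w = X.T @ (p - y) / n + (lam / n) * w  # data term + L2 penalty term
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(X, w, b):
    return sigmoid(X @ w + b)
```

Fitting the same data with a small and a large λ shows the effect described above: the large-λ weights stay much smaller, which limits overfitting but, taken too far, leaves the model unable to fit the data.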

Artificial neural network
The backpropagation neural network algorithm has two phases. In training, the training data are fed to the input layer and propagated forward layer by layer, and an output pattern is obtained from the output layer. If the output pattern does not match the expected target, the error is calculated and propagated backward from the output layer to the input layer, and the weights are modified during this backpropagation process [28]. The weight changes are computed from the activation function and the learning algorithm. A backpropagation neural network has three or more fully connected layers, with each layer connected to the adjacent layer. The network is trained on training data, given as table records or vectors, by updating the weights until they no longer change, that is, until convergence. The inputs are the number of layers, the number of neurons, the learning rate, the training dataset, and epsilon. The output is a trained neural network that can classify new data or vectors. The steps of the backpropagation algorithm are as follows.
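A compact NumPy sketch of backpropagation for a fully connected network with one hidden layer is given below. It follows the description above (forward pass, output error, backward propagation, weight update, stop at convergence), but the architecture, sigmoid activations, cross-entropy loss, and hyperparameter values are illustrative assumptions rather than the study's configuration; epsilon is interpreted here as a tolerance on the change in loss between epochs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, n_hidden=4, lr=0.5, epsilon=1e-6, max_epochs=10000, seed=0):
    """Train a 1-hidden-layer network with full-batch backpropagation.

    X: (n, d) inputs; y: (n,) binary targets. Training stops when the loss
    changes by less than `epsilon` between epochs or after `max_epochs`.
    Returns the learned parameters and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=1.0, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=1.0, size=(n_hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    losses = []
    for _ in range(max_epochs):
        # Forward pass: input layer -> hidden layer -> output layer.
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        loss = -np.mean(t * np.log(p + 1e-12) + (1 - t) * np.log(1 - p + 1e-12))
        losses.append(loss)
        if len(losses) > 1 and abs(losses[-2] - loss) < epsilon:
            break  # converged: the error no longer changes meaningfully
        # Backward pass: output error, then propagate it to the hidden layer.
        delta_out = (p - t) / n                       # dLoss/d(output pre-activation)
        delta_hid = (delta_out @ W2.T) * h * (1 - h)  # chain rule through the sigmoid
        # Gradient-descent weight updates.
        W2 -= lr * h.T @ delta_out
        b2 -= lr * delta_out.sum(axis=0)
        W1 -= lr * X.T @ delta_hid
        b1 -= lr * delta_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int).ravel()
```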

Evaluation
The study uses confusion matrix terminology to evaluate the machine learning classification models [29]. True positive (TP): the predicted class matches the actual class. False positive (FP): the model predicts the class, but the actual class is different. False negative (FN): the actual class is present, but the model does not predict it. True negative (TN): the class is absent, and the model correctly does not predict it. Using these confusion matrix terms, the following metrics are calculated:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
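Accuracy, precision, recall, and F1-score can be computed directly from predicted and true labels. The plain-Python sketch below assumes that, because the dataset has three classes (high, medium, and low), TP, FP, and FN are counted one-vs-rest for each class and then macro-averaged (an unweighted mean over classes); the paper does not state which averaging was used, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (one-vs-rest per class)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # class c predicted correctly
        fp = sum(t != c and p == c for t, p in pairs)  # c predicted, other class true
        fn = sum(t == c and p != c for t, p in pairs)  # c true, other class predicted
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }
```

Weighted averaging (by class frequency) would give different precision, recall, and F1 values when the three classes are unbalanced.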

RESULTS AND DISCUSSION
Two segmentation methods were tested in combination with four machine learning methods: first, the Canny algorithm with the SVM, logistic regression, random forest, and artificial neural network algorithms; second, mask R-CNN with the same four algorithms. Each combination was tested with two training-to-testing data ratios, 90:10 and 80:20. The evaluation uses accuracy (Acc), precision (Prec), recall (Rec), and F1-score (F1).
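With 1020 images, the two splits give 918 training and 102 testing images at 90:10, and 816 training and 204 testing images at 80:20. The plain-Python sketch below shows a shuffled split; the function name and seed are hypothetical, and the paper does not say whether images were split individually or grouped by cow.

```python
import random


def train_test_split_indices(n_items, test_ratio, seed=0):
    """Shuffle item indices and split them into (train, test) lists.

    test_ratio: fraction held out for testing, e.g. 0.10 for 90:10 or 0.20 for 80:20.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_ratio)
    return indices[n_test:], indices[:n_test]
```

Splitting by image, as here, can place different views of the same cow in both sets; shuffling cow identifiers instead and keeping all ten images of a cow on one side is a stricter variant of the same idea.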
Figure 5 shows the results of the SVM model with Canny and mask R-CNN segmentation: the highest accuracy of 82.52%, precision of 90.32%, recall of 80.49%, and F1-score of 82.44% are achieved with mask R-CNN at the 90:10 test ratio. Figure 6 shows the results of the random forest model with Canny and mask R-CNN segmentation: the highest accuracy of 84.39%, precision of 88.46%, recall of 80.47%, and F1-score of 83.00% are achieved with mask R-CNN at the 80:20 test ratio.
Figure 7 shows the results of the logistic regression model with Canny and mask R-CNN segmentation. The highest accuracy (82.52%), recall (81.42%), and F1-score (82.18%) are achieved with mask R-CNN at the 90:10 ratio, while the highest precision (84.43%) is achieved with the Canny algorithm at the 80:20 ratio. Figure 8 shows the results of the artificial neural network model with Canny and mask R-CNN segmentation: the highest accuracy of 82.52%, precision of 87.00%, recall of 79.00%, and F1-score of 81.62% are achieved with the Canny algorithm at the 90:10 ratio. The best recall, 82.44%, is obtained by the logistic regression algorithm with mask R-CNN segmentation, while the artificial neural network and logistic regression with Canny segmentation reach 79.00% recall. The best F1-score, 82.52%, is obtained by the SVM algorithm with mask R-CNN segmentation, while the artificial neural network with Canny segmentation reaches an F1-score of 81.62%.
Figure 9. Machine learning model results with a test ratio of 90:10
Figure 10 shows the results of the machine learning models (SVM, logistic regression, random forest, and artificial neural network) with a test ratio of 80:20. The best accuracy, 84.35%, is obtained by the random forest algorithm with mask R-CNN segmentation, while logistic regression with Canny segmentation reaches 82.44%. The best precision, 88.46%, is obtained by random forest with mask R-CNN segmentation, while SVM with Canny segmentation reaches 87.11%. The best recall, 80.47%, is obtained by random forest with mask R-CNN segmentation, while logistic regression with Canny segmentation reaches 78.81%. The best F1-score, 83.00%, is obtained by random forest with mask R-CNN segmentation, while random forest with Canny segmentation reaches 75.79%.
This paper investigates improving model accuracy by segmenting dairy cow morphology images before classification with machine learning methods. Figure 2 shows an example of Canny and mask R-CNN segmentation results; the purpose of segmentation is to improve the accuracy of classification with SVM, logistic regression, random forest, and artificial neural network. Segmentation removes noise from the background of the dairy cow images.
Figure 5 shows that, with a training-to-testing ratio of 90:10, the SVM algorithm reaches its highest accuracy of 82.52% using mask R-CNN, outperforming Canny segmentation. Figure 6 shows that, with a ratio of 80:20, the random forest algorithm reaches its highest accuracy of 84.39% using mask R-CNN, outperforming the Canny algorithm.
Figure 7 shows that, with a ratio of 90:10, the logistic regression algorithm reaches its highest accuracy of 82.52% using mask R-CNN, outperforming Canny segmentation. Figure 8 shows that, with a ratio of 90:10, the artificial neural network algorithm reaches its highest accuracy of 82.52% using Canny, outperforming mask R-CNN. The best machine learning method is random forest, with 84.39% accuracy. SVM, random forest, and artificial neural network share the same accuracy of 82.52%, with the artificial neural network using the Canny algorithm.

CONCLUSION
Based on the results of this research with several machine learning algorithms and segmentation methods, the following can be concluded. Mask R-CNN segmentation generally yields higher accuracy, precision, recall, and F1-score than the Canny algorithm. The highest accuracy, 84.39%, was reached by the random forest algorithm with a test ratio of 80:20. Canny segmentation performed best with the artificial neural network algorithm, so Canny appears suitable for the artificial neural network, using a 90:10 ratio on this dataset. Future work can use deep learning methods to improve model accuracy.

Figure 1. Proposed methods

Figure 2. Illustration of Canny and mask R-CNN segmentation results

Figure 5. SVM algorithm testing results
Figure 9 shows the results of the machine learning models (SVM, random forest, logistic regression, and artificial neural network) with a test ratio of 90:10. The best accuracy, 82.52%, is obtained by the SVM, logistic regression, and random forest algorithms with mask R-CNN segmentation, and by the artificial neural network with Canny segmentation. The best precision, 90.32%, is obtained by the SVM algorithm with mask R-CNN segmentation, while the artificial neural network with Canny segmentation reaches 87.00% precision.