GENETICS AND BREEDING

Detection of Bovine Somatotropin Treatment in Dairy Cattle Performance Records

HERMANN H. SWALVE¹
Institute of Animal Breeding and Genetics
University of Göttingen
Göttingen, Germany

ABSTRACT

Effectiveness of cluster analysis in detecting application of bST was examined. Field data were manipulated by adding a specified percentage of the true performance to original test day records to simulate application of bST. The partly manipulated data then were analyzed using cluster analysis. Test day milk production data came from 42,779 cows in Bretagne (northwestern France) that had test days between 1986 and 1989. As criteria in the cluster analysis for differentiation between treated and untreated cows, parameters of the incomplete gamma function were used along with other variables calculated from test day records. The best differentiation was achieved when a persistency parameter, defined as the ratio of second to first trimester production, was used as a variable in the cluster analysis. For the assumed scenario of bST application, more than 80% of all cows were classified correctly under random use of bST. Systematic treatment led to improved results.
(Key words: cluster analysis, bST treatment, lactation curve)

INTRODUCTION

Preferential treatment can be defined in general as a hidden environmental effect that cannot be accounted for in the model of analysis.

Received May 21, 1990. Accepted October 31, 1990. This work was done while the author was on leave at the Station de Génétique Quantitative et Appliquée, Institut National de la Recherche Agronomique, Centre de Recherches de Jouy-en-Josas, 78350 Jouy-en-Josas, France. Funding was provided by the German Research Foundation (Deutsche Forschungsgemeinschaft). 1991 J Dairy Sci 74:1690-1699

Additionally, most often genotypes are not randomly assigned to these hidden environmental groups. The situation is analogous to the case in which an effect known to influence the data to be analyzed is dropped from the model, so that the model is incomplete. Often, computational ease is a reason for doing so. Another reason for working with an incomplete model can be insufficient reporting schemes that cause the data to be incomplete. Examples in a dairy cattle situation include keeping and feeding certain cows differently, using the knowledge of the next test date for a "preparation" of certain cows or the application of substances that promote performance. Recently, the application of bST has been discussed intensely. According to Bauman et al. (2), for example, lactation yield can be increased by up to 40% when bST is injected daily, and current long-term administration procedures (i.e., every 14 d) have been less effective, increasing yield only 10% (3). Thus, daily injection may be favored by farmers willing to adopt this new technology. Because the use of bST most likely will be limited to certain cows, it will be a perfect instrument for preferential treatment. Studies that analyzed the effect of bST on dairy cattle breeding programs most often have applied simulation and have focused on parameters such as accuracy of breeding value evaluation and genetic gain (5, 6, 14, 16). The results of these studies were that the use of bST will have small but significant effects on accuracy of sire evaluation. Cow evaluations will be affected more, especially if cows are treated systematically, i.e., according to expected yield. As pointed out by Burnside (4) and Simianer and Wollny (16), reporting use of bST would facilitate a drastic reduction in the bias on evaluation procedures due to the effect of bST. 
Other possible solutions to overcome the problem would be to restrict breeding activities to small parts of the population, e.g., to contract herds for sire testing schemes or multiple ovulation and embryo transfer schemes (4).


Research has been scarce on detecting preferential treatment, as typified by the application of bST, when its use is not reported. Schulte-Coerne et al. (15) derived discriminant functions for differentiating between treated and untreated cows from a bST trial in an experimental herd. Untreated cows could be detected with relatively high accuracy, whereas detection of treated cows seemed to be more difficult. Also, it was not clear whether the functions obtained from a trial would be valid for field data. A more general recommendation for differentiating between treated and untreated cows proposes comparing lactation curves (8, 11). The objective of the present study was to examine the possibility of comparing lactation curves to distinguish between preferentially treated cows (receiving bST) and other cows. For this purpose, a method intermediate between analyzing real data and simulation was chosen. The method consisted of manipulating field data to mimic preferential treatment. These partly manipulated data then were examined using cluster analysis.

MATERIALS AND METHODS

Data came from cows in Bretagne (northwestern France) that had test days from October 1986 through March 1989. After edits for reasonable dates and production, four data sets were formed. A small test data set (data set 1) consisted of 1562 first lactation cows calving between October 1986 and March 1987. This data set was used to test the method under development with minimum computing time. A main data set (data set 2) comprised 42,779 cows of all lactations in herds with more than 10 cows, tested from September 1987 through March 1989. These time constraints allowed for complete lactations without any cow appearing twice in the data set. A subset of the main data set (data set 3) included only herds with more than 20 cows of all lactations and comprised 17,444 cows. A further subset (data set 4) included only first lactation cows (1977 cows) in herds with more than 10 first parity cows. All lactations were required to include at least 7 test days. The usual interval between two consecutive test days was 35 d; the average number of test days per lactation was 8.6.


Working with lactation curves, or at least with test day records, instead of lactation yields seems reasonable because access to the source of the preferential treatment is more direct. Additionally, much work has been done on modeling lactation curves; for a thorough review, see Masselin et al. (12). Recent research (7, 9) even suggests multiphasic functions or nonparametric lactation curves. Presumably the best known model is that proposed by Wood (18):

$$Y = a t^{b} e^{-ct}$$

where Y is the yield at test day; t is the time, i.e., days in milk (originally measured in weeks by Wood); e is the base of the natural logarithm; and a, b, and c are the three parameters to be estimated. Estimation is easily done if Wood's equation is transformed to a linear model on a logarithmic scale as

$$\ln(Y) = \ln(a) + b \ln(t) - ct$$

Detection of preferential treatment in practice has to be done using field data without knowledge of which cows are likely to be treated differently. The method should work without requiring any input parameters. In contrast to discriminant analysis, in which discriminant functions are sought and prior knowledge of the membership of an observation in a certain group is required, cluster analysis does not require prior information. Cluster analysis is a method of assigning observations (cases) to groups by comparing criteria from each observation with those of other observations. For the problem of differentiating between cows, a single cow is taken as an observation for which different variables are recorded, such as test day milk yield.

Within this broad definition of cluster analysis, different submethods can be identified (10): disjoint clustering (independent groups), hierarchical clustering (some clusters are subclusters of larger ones), overlapping clusters (some observations are members of more than one cluster), and fuzzy clusters (for each cluster and each observation a probability of membership is computed).
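
As an illustration of the estimation step for Wood's curve described above, the log-linear form can be fitted by ordinary least squares. The following Python sketch is only one possible way of doing this and is not taken from the original study; the function names (`solve_linear`, `fit_wood`, `wood`) and the test day values are hypothetical, and the yields are generated without noise from assumed parameters so that the fit can be checked.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_wood(days, yields):
    """Least-squares fit of ln(Y) = ln(a) + b*ln(t) - c*t; returns (a, b, c)."""
    # Regressors per test day: intercept, ln(t), and -t (so the slope is c).
    rows = [[1.0, math.log(t), -float(t)] for t in days]
    z = [math.log(y) for y in yields]
    # Normal equations (X'X) beta = X'z for the three coefficients.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    ln_a, b, c = solve_linear(XtX, Xtz)
    return math.exp(ln_a), b, c

def wood(t, a, b, c):
    """Wood's curve Y = a * t^b * exp(-c*t)."""
    return a * t ** b * math.exp(-c * t)

# Hypothetical lactation: test days 35 d apart, noise-free yields.
days = list(range(10, 300, 35))
yields = [wood(t, 20.0, 0.2, 0.004) for t in days]
a, b, c = fit_wood(days, yields)
```

With real test day records the fit will not be exact, and the residuals between observed and fitted yields are the kind of quantity that can then be used as a clustering criterion.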
Within each of these submethods, various algorithms are available and can be found in textbooks (1, 10). For the present study, the method of disjoint clustering using the k means algorithm was chosen because it was intended to form just two groups, i.e., treated and untreated cows. The method of disjoint clusters combined with the k means algorithm seemed to be highly suitable because it can easily be tailored to the requirement of two disjoint groups.

The k means algorithm is an iterative procedure (10). Initial clusters are assigned; if no prior knowledge is available, clusters are formed at random. This is the initial partition. Means are computed for the initial clusters for all variables and all observations within a cluster. The error of partition is computed as the sum of squares of deviations of observations from their cluster means. The deviations are Euclidean distances and are defined as

$$ED_{ik} = \left( \sum_{j=1}^{n} (A_{ij} - B_{kj})^{2} \right)^{0.5}$$

Here, $ED_{ik}$ is the Euclidean distance between cluster i and observation k, n is the number of variables j, and $A_{ij}$ and $B_{kj}$ are the values of variable j for a cluster mean $A_i$ and an observation $B_k$. When more than one variable is used (n > 1), either the Euclidean distances have to be weighted or all variables have to be prestandardized across all observations to avoid scale effects. For each observation, the change in the error of partition is checked for the case that this observation is moved to another cluster. If the change is negative, i.e., the error decreases, the observation is moved. Cluster means and error of partition are corrected. This check and correction is continued as long as there are any changes. Convergence is guaranteed but may be slow in large data sets. Convergence does not depend on the initialization of the clusters in the initial partition. Any prior knowledge, or at least a good guess, can be used to initialize the partition. A good initialization can reduce the computing time within the first rounds of iteration because fewer observations are moved and less updating is required. The total number of rounds may not necessarily be reduced.

The algorithm applied includes the constraint of resulting in two clusters. As a test, normally distributed data were simulated. The results of the clustering were two clusters of perfectly equal size. However, 99% of the observations were in one cluster when data following an extremely skewed distribution were used. From this, it can be concluded that the method is accurate in detecting deviations from the normal distribution.

Up to now, only data from small-scale bST trials have been available for an analysis of bST-like preferential treatment. For a large-scale investigation, simulation would be the method of choice, as applied in the previously mentioned studies on the effect of bST on breeding programs. However, simulation was considered inappropriate because test day records are known to have high variation among and within cows and because important effects like diseases can only be included in a model with great difficulty. Therefore, manipulation of real data to mimic preferential treatment was chosen.

Manipulation consisted of multiplying some of a cow's performance records, i.e., test day milk yields, by a factor greater than 1.0. The computer program to carry out this manipulation had three input parameters. For the choice of the input parameters controlling the manipulation, it was assumed that not all cows were treated, that various response rates to the bST treatment had to be looked at, and that cows would not be treated immediately from the start of the lactation. The latter assumption is in agreement with bST trials that started on d 84 (2) or on d 60 (3) and with assumptions by Burnside (4) and Colleau (6), with bST-free periods at the beginning of the lactation of 120 and 60 d. Cows to be manipulated were either chosen at random or selected within herd according to true lactation production. After manipulation of the data, a cluster analysis as described was applied, ignoring the knowledge of which cows' records were manipulated. Results were calculated from comparison of the true and assigned status of a cow, treated or untreated.

In choosing the variables (criteria) to be used in the cluster analysis, we used criteria derived from Wood's model, the incomplete gamma function, because this curve has still been found to be highly effective in comparison with other models (13). The criteria are summarized in Table 1. The first criterion represents the parameters of Wood's curve.
Criteria 2 to 5 are based on the assumption that, for a bST-treated cow, the Wood curve should not fit the observed curve well if treatment does not start at the beginning of lactation. Criterion 6 is an attempt to detect the jump in the lactation curve at the start of the treatment. The remaining criteria are measures of the persistency of lactation. Criteria 9 and 10 were derived from work done by Sölkner and Fuchs
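
The overall procedure described in this section (manipulating test day yields, computing a persistency criterion, clustering into two groups, and comparing assigned with true status) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used in the study: all function names are hypothetical, the six cows are generated from Wood's curve with invented parameters, the start day (d 90) and response rate (25%) are arbitrary choices, and the trimesters are assumed to be d 1 to 100 and d 101 to 200. The clustering step uses the common batch form of k means (reassign all observations, then update means) rather than the observation-by-observation transfer procedure described in the text, and it starts from the smallest and largest observation instead of a random partition. With more than one criterion, the variables would need to be standardized first.

```python
import math

def wood(t, a, b, c):
    """Wood's curve, used here only to generate hypothetical test day yields."""
    return a * t ** b * math.exp(-c * t)

def manipulate(days, yields, start_day, response):
    """Mimic bST use: scale yields on or after start_day by (1 + response)."""
    return [y * (1.0 + response) if d >= start_day else y
            for d, y in zip(days, yields)]

def persistency_p21(days, yields):
    """Mean yield in d 101-200 divided by mean yield in d 1-100 (assumed split)."""
    first = [y for d, y in zip(days, yields) if d <= 100]
    second = [y for d, y in zip(days, yields) if 100 < d <= 200]
    return (sum(second) / len(second)) / (sum(first) / len(first))

def distance(p, q):
    """Euclidean distance between two points given as equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def two_means(points, max_rounds=100):
    """Batch 2-means; returns a 0/1 label per point and the two cluster means."""
    means = [min(points), max(points)]  # deterministic start (assumption)
    labels = None
    for _ in range(max_rounds):
        new = [0 if distance(p, means[0]) <= distance(p, means[1]) else 1
               for p in points]
        if new == labels:  # no observation changed cluster: converged
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                means[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

# Hypothetical cows: three untreated and three treated from d 90 onward.
days = list(range(10, 300, 35))
curves = [(18.0, 0.2, 0.0038), (20.0, 0.2, 0.0040), (22.0, 0.2, 0.0042)]
truth, points = [], []
for treated in (False, True):
    for a, b, c in curves:
        y = [wood(t, a, b, c) for t in days]
        if treated:
            y = manipulate(days, y, start_day=90, response=0.25)
        truth.append(treated)
        points.append([persistency_p21(days, y)])

labels, means = two_means(points)
# Assumption: the cluster with the higher mean persistency is the treated one.
high = 0 if means[0][0] > means[1][0] else 1
assigned = [lab == high for lab in labels]
correct = sum(x == t for x, t in zip(assigned, truth))
```

In this noise-free example the two groups are well separated, so every cow is assigned to its true group; with field data, variation among and within cows would be expected to blur the separation.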


TABLE 1. Criteria calculated from test day results of each cow, used as variables in subsequent cluster analyses.

Short name                               Description
1. a, b, c (Wood)                        Estimated parameters of the function $Y = a t^{b} e^{-ct}$
2. $\sum (Y - \hat{Y})^{2}$ tot
3. $\sum (Y - \hat{Y})$ Int
4. MAX ABS $(Y - \hat{Y})$
5. MAX $(Y - \hat{Y})$
6. MAX % $(Y_{j} - Y_{j-1})$
7. P31, P21, Ph
8. Pw
9. SD (test day)
10. Mean/maximum
