USING SIMILARITY MEASURES IN BENTHIC IMPACT ASSESSMENTS

THOMAS HRUBY

Massachusetts Audubon Society, 159 Main Street, Gloucester, MA 01930, U.S.A.

(Received November 20, 1985)

Abstract. Eleven similarity measures were used to assess the impacts of clam digging on the infauna of one intertidal mud-flat. When the similarity matrices were clustered using a divisive polythetic algorithm, two different conclusions were possible from the dendrograms produced. Dendrograms generated by 3 coefficients indicated that digging had an impact on areas dug more than once, while those generated by the 8 others did not. In addition, existing data from an impact analysis of a power plant effluent were re-analyzed using 5 different coefficients. In this case, three of the five coefficients gave results which indicated that the benthic community at the effluent pipe was different from that present at the control. Two coefficients gave the opposite conclusion: the populations were not different. These results indicate that the objectivity of cluster analysis, as it has come to be used in impact studies, is only apparent, not real. Too many subjective choices are made in selecting algorithms and methods of interpretation. Objective criteria need to be developed for the choices made, based on the intrinsic and ecological properties of the quantitative methods. Some criteria that have been developed are discussed, and used to determine which coefficients are best suited for the two data sets analyzed. To reduce the number of subjective choices made, an analysis of variance of the similarity matrix is presented as an alternative to clustering.

1. Introduction

Similarity measures were first developed by terrestrial botanists to compare vegetation patterns in different locations (reviewed by Goodall, 1970; Lambert, 1972). Their use, however, has now spread to many kinds of community studies. Today similarity is one of the structural variables, along with comparisons of diversity and trophic levels, commonly used for comparing biological communities. In analyses of similarity an algorithm, or coefficient, is used to calculate a value for the similarity between two samples based on the taxa present in each. If more than two samples are collected, the similarities between all samples, taken two at a time, are calculated and the results presented in a matrix. With the advent of computers this matrix is now further processed by subjecting it to multivariate analysis, clustering, or other quantitative analyses (for an introduction to such methods see Clifford and Stephenson, 1975; and Sokal and Sneath, 1963). In studies of biological communities, similarity measures have been used to correlate patterns in species distributions with environmental or biological gradients. The method has proved to be very useful in generating hypotheses from complex bodies of biotic and environmental data, which can then be tested by lab or field experiments (Green, 1980). More recently, however, similarity analyses have been recommended for evaluating the environmental impacts of human activities in the marine environment (Boesch, 1977), and their use in this field is spreading.

Environmental Monitoring and Assessment 8 (1987) 163-180. © 1987 by D. Reidel Publishing Company.


A diverse and confusing array of similarity measures, or coefficients, has been developed since the first one was proposed by Jaccard (1902). Thirty-two coefficients and their variants were found in a brief examination of the literature (lists of coefficients can be found in Clifford and Stephenson, 1975; Boesch, 1977; Hajdu, 1981; Wolda, 1981). The intrinsic mathematical properties of these different coefficients are not all the same and the results they generate may be different (Bloom, 1981; Wolda, 1981). The problem of choosing a coefficient whose properties are suited to the data, and that reflects recognizable changes in populations, has been recognized and is beginning to be addressed by investigators using similarity analyses in studies of community patterns (e.g., Field et al., 1982). In environmental impact studies, however, similarity coefficients are more often chosen because specific computer programs happen to be available, rather than because they are appropriate for the data. Few investigators have attempted to justify their choice on the basis of numerical or ecological suitability. An assumption is made, but hardly ever stated in such studies, that the similarity measure used will provide a quantitative assessment of actual changes taking place in the environment.

In impact assessments, the similarity matrices generated from the data are usually interpreted by one of the several methods of clustering, a type of classification analysis (see Clifford and Stephenson, 1975), to determine if the samples collected at an impact site are different from those in a control. This approach, however, introduces three subjective choices in testing the hypotheses: (1) which of the many clustering methods to use (e.g. divisive monothetic, divisive polythetic, agglomerative polythetic, etc.; for definitions of these different mathematical approaches see Clifford and Stephenson, 1975); (2) which measure of intercluster distance to use (nearest neighbor, group average, centroid, etc.); and (3) at what level of the cluster diagram the groups formed will be considered significantly different from each other.

The purpose of the present study is to evaluate the extent to which the subjective choices made in choosing coefficients in a similarity analysis can affect the conclusions in benthic impact assessments. The evaluation resulted from an attempt to assess the impacts of clam digging on the infauna of an intertidal flat in Gloucester, Massachusetts. A similarity analysis using clustering was initially chosen because this method is increasingly being used in impact studies, and because it has been successfully used in benthic studies of pollution (Gray and Pearson, 1982) and of benthic population patterns (Field et al., 1982). Since there seems to be no clear consensus regarding the appropriate coefficient to use, a number of different ones commonly used in benthic studies were tried (Bray-Curtis, Canberra, Euclidian Distance Squared). The initial analysis was attempted directly from the cluster diagrams, but this resulted in conflicting conclusions. The use of additional coefficients made the conclusions more, rather than less, confusing. To determine whether the problem of conflicting conclusions was a result of anomalous data, data from another benthic impact study (Taxon Inc., 1982) were analyzed using the same methods. To improve the objectivity of the impact assessments, criteria also were


sought to identify the coefficients best suited to the analysis, and an alternative method using an analysis of variance on the similarity matrix was tried.

2. Methods

To assess the impacts of clam digging on one intertidal flat, experimental plots were dug by hand on a 35 m² section of bottom dominated by the soft-shell clam, Mya arenaria. The flat is on a tidal inlet in Gloucester, Massachusetts, known as the Mill River. It contained harvestable numbers of M. arenaria, but was previously undisturbed by clammers because leakage from residential waste systems had forced closure of the flats in 1941 (Hruby, 1981). Using the method of local clammers, sediments on the flat were turned up with a clam fork from a depth of 25-30 cm, and then deposited into the hole left from the previous excavation. Separate 1 x 5 m plots were dug at different rates (once, monthly, and weekly beginning on May 29, 1981) to approximate the rates at which flats are dug locally by commercial and recreational clammers. The three 'dug' plots were separated by two 2 x 5 m plots of undisturbed sediment which were used as controls. The experimental area (7 x 5 m) was on a level part of the flat at a height of 0.8 ft above Mean Low Water, as measured by noting the time of submergence relative to low water. The experiment described here, done in 1981, was one of several similar experiments carried out between 1980 and 1983 assessing the impacts of digging on 6 different flats throughout the Annisquam River in Gloucester (manuscript in preparation).

Infaunal organisms were sampled by taking replicate cores from the sediment using a 2-lb. coffee can (area = 0.0128 m², depth = 20 cm). The samples were sieved through a 0.5 mm mesh sieve and the organisms retained were collected and preserved in 5% buffered formalin in sea water. Microscopic identification was to the species level where possible, using the nomenclature of Gosner (1971, 1978).
Five cores, located by throwing a marker, were taken from each of the 5 plots on September 21 and 22, one month after the last experimental disturbance on the plot dug weekly. Two of the 25 samples were damaged during sieving and were omitted from the analysis. A species-area curve plotted from preliminary cores taken near the plots in May, 1981, indicated that 4 samples were adequate to characterize the species number on this flat. The species-area curve (Cain, 1938) was used because of the simplicity of its application. Although there are problems with the use of species-area curves in heterogeneous communities (Kershaw, 1973), the community sampled was considered to be homogeneous. It had a low diversity and the sampling was limited to a small area of 1/4 ha. No initial control samples were taken because the method was destructive and would have created additional disturbances in the plots, especially from trampling.

Comparisons between samples were made using the 11 coefficients listed in Appendix A. Two coefficients use only data on species present (Jaccard and Sorensen), and nine use data on both species present and their abundances. Of the nine using abundance data, four also included logarithmic transformations of these


values. Five organisms that could not be identified to species were listed by genus (all Clymenella sp.) and included in the analysis under the assumption that all were the same species. To test the effects of this assumption on the overall conclusions, the matrices from two coefficients (Bray-Curtis and Percent with transformations) were re-calculated assuming that each unidentified specimen in the genus belonged to a different species.

In addition to the data collected by the author, five coefficients (Bray-Curtis with logarithmic transformation, Percent with and without transformations, Euclidian Distance Squared, and Sorensen) were used to generate similarity matrices for data collected by Taxon Inc. and presented in Appendix 3 of their report (Taxon Inc., 1982). In the Taxon study, five quadrats of 0.01089 m² were sampled at each of three locations in August 1981, near the Pilgrim nuclear power station in Plymouth, MA. One location, called 'Effluent', was near the cooling water discharge pipe; one, called 'Rocky Point', was 0.25 nautical miles away; and the control, called 'Manomet Point', was 2 nautical miles from the discharge pipe.

The two-dimensional similarity matrices generated by each coefficient were reduced to one dimension by cluster analysis. The clustering algorithm used was divisive (progressively splitting samples into smaller and smaller groups), polythetic (based on the mean similarities applied over all samples in a group), and was based on maximizing the difference in mean similarities between the two sub-clusters formed at each level (a group averaging clustering method). Although not commonly used because of long computation times, a divisive polythetic algorithm was chosen because this method had the least chance for mis-classification in hierarchical clustering (Williams, 1971). Linkages were made by group-averages because this method has been satisfactorily used in a variety of ecological studies (Clifford and Stephenson, 1975).
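As an illustration of how a similarity matrix of this kind is assembled, the following minimal Python sketch computes three of the coefficients named above for every pair of samples. The formulas are the commonly cited textbook forms and are not necessarily identical to those in Appendix A. The log(x + 1) transformation, the function names, and the species counts are all assumptions made for the example.

```python
import math
from itertools import combinations

def jaccard(x, y):
    """Shared species divided by species present in either sample."""
    shared = sum(1 for xi, yi in zip(x, y) if xi > 0 and yi > 0)
    either = sum(1 for xi, yi in zip(x, y) if xi > 0 or yi > 0)
    return shared / either if either else 1.0  # convention for two empty samples

def sorensen(x, y):
    """Twice the shared species divided by the sum of both species counts."""
    shared = sum(1 for xi, yi in zip(x, y) if xi > 0 and yi > 0)
    total = sum(1 for v in x if v > 0) + sum(1 for v in y if v > 0)
    return 2 * shared / total if total else 1.0

def bray_curtis(x, y, log_transform=False):
    """Bray-Curtis similarity: 1 - sum|x - y| / sum(x + y)."""
    if log_transform:  # assumed form of the logarithmic transformation
        x = [math.log(v + 1) for v in x]
        y = [math.log(v + 1) for v in y]
    difference = sum(abs(xi - yi) for xi, yi in zip(x, y))
    total = sum(xi + yi for xi, yi in zip(x, y))
    return 1 - difference / total if total else 1.0

def similarity_matrix(samples, coefficient):
    """Symmetric matrix of similarities between all samples, two at a time."""
    n = len(samples)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = coefficient(samples[i], samples[j])
    return matrix

# Hypothetical counts: rows are samples, columns are species.
samples = [[10, 0, 2], [5, 1, 0], [0, 0, 3]]
bc_matrix = similarity_matrix(samples, bray_curtis)
```

Passing a different coefficient function to `similarity_matrix` changes only the values in the matrix, which is the point at which the choice of coefficient enters the analysis.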
The similarity and cluster analyses were done on Apple II+ and DEC Rainbow microcomputers using programs written in BASIC. Copies of the programs are available from the author on request.

In addition to the cluster analysis usually performed on similarity matrices, the matrices themselves were analyzed statistically. In both studies the samples from each plot, or location, were considered as replicates because both treatments and sites were selected on an a priori basis. The matrices were divided into groups of values which contained the similarities between samples within a plot, and those of the similarities between samples from different plots. The values calculated for the between-plot and within-plot similarities were then used in an analysis of variance (ANOVA, Sokal and Rohlf, 1969) to test the hypothesis that the mean of the similarities between samples from two plots and the two respective means of within-plot similarities did not differ statistically. Because similarity values are constrained between 0 and 1, and thus not necessarily normally distributed, the proportions were transformed to Arcsin values before doing the analysis of variance, as recommended by Sokal and Rohlf (1969). If the analysis indicated significant differences, the Student's t Test was used to determine if the significance could be attributed to a lower mean similarity between plots. The


Student's t Test was used rather than one of the commonly used a posteriori tests such as Scheffe's, Duncan's, or Tukey's because the comparisons are planned, or a priori (Sokal and Rohlf, 1969; p. 220). The only comparison in the analysis of variance that is meaningful in the impact assessment is that of the mean similarity between two plots with the mean similarity of the replicates within plots. The hypothesis being tested, which is planned before the analysis, is: the mean between plots = means within plots; the alternate hypothesis is: the mean between plots < means within plots.

[...]

TABLE I
[...] 0.1. If part of the significant difference in the analysis could be attributed to a significantly lower between-treatment similarity (p < 0.025) the symbols are bracketed. (Log) indicates that the abundance values were transformed in the coefficient.
Columns (treatments compared): Control and Dug(x1); Control and Dug(x4); Control and Dug(x14); Dug(x1) and Dug(x4); Dug(x1) and Dug(x14); Dug(x4) and Dug(x14).
Rows (coefficients): Jaccard; Sorensen; Bray-Curtis; Bray-Curtis (log); Percent; Percent (log); Canberra; Canberra (log); Euclidian Distance Squared; Euclidian Distance Squared (log); Modified Morisita.
[Cell entries are not recoverable from the source.]
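The matrix-based test described in the Methods can be sketched in Python as follows. This is a hedged illustration, not the author's BASIC program: the arcsine transformation is assumed to be the arcsine of the square root of the proportion, the function names are hypothetical, and the similarity values are invented. The sketch returns only the F ratio and its degrees of freedom; judging significance would still require an F table or a statistics library.

```python
import math

def split_similarities(matrix, plots):
    """Separate pairwise similarities into within-plot and between-plot values."""
    within, between = {}, []
    n = len(plots)
    for i in range(n):
        for j in range(i + 1, n):
            if plots[i] == plots[j]:
                within.setdefault(plots[i], []).append(matrix[i][j])
            else:
                between.append(matrix[i][j])
    return within, between

def arcsine(p):
    """Angular transformation of a proportion (assumed form: asin(sqrt(p)))."""
    return math.asin(math.sqrt(p))

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical similarity matrix: samples 0-2 from one plot, 3-5 from another.
plots = ["control"] * 3 + ["dug"] * 3
matrix = [
    [1.00, 0.80, 0.75, 0.40, 0.35, 0.45],
    [0.80, 1.00, 0.70, 0.30, 0.50, 0.40],
    [0.75, 0.70, 1.00, 0.45, 0.35, 0.30],
    [0.40, 0.30, 0.45, 1.00, 0.65, 0.60],
    [0.35, 0.50, 0.35, 0.65, 1.00, 0.70],
    [0.45, 0.40, 0.30, 0.60, 0.70, 1.00],
]
within, between = split_similarities(matrix, plots)
groups = [
    [arcsine(v) for v in within["control"]],
    [arcsine(v) for v in within["dug"]],
    [arcsine(v) for v in between],
]
f_value, df_between, df_within = one_way_f(groups)
```

The three groups passed to `one_way_f` correspond to the two sets of within-plot similarities and the one set of between-plot similarities for a single pair of plots.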


on the control plots. The species found in the plots and their abundances are listed in Appendix B. When the similarity values were clustered, each coefficient generated a different diagram, as shown in Figure 1. Only three coefficients (Canberra, Bray-Curtis, Bray-Curtis with logarithmic transformation) generated diagrams where some of the clusters could be identified with specific treatments. The clusters based on the matrices of the eight other coefficients were not as clear. In the case of five coefficients, samples from the most intensively dug plot were clustered with the control samples.

The analyses of variance done on the similarity matrices are summarized in Table I. Significant differences for the ANOVA are bracketed when the between-plot similarity was also significantly lower than the within-plot similarities. Although, as in the cluster analysis, there were differences among the coefficients, the results for all coefficients were consistent in one respect: there was no statistical difference between the samples from the control and those from the plot dug once. Also, all coefficients except the Canberra with a logarithmic transformation indicated that a significant difference existed between control samples and those on the plot dug weekly.

3.2. ANALYSIS OF TAXON INC. DATA

The original analysis done by Taxon Inc. was based on Euclidian Distance as the similarity coefficient, and the clustering was done using an unweighted pair-group agglomerative algorithm (clustering by progressive fusion of samples) based on arithmetic averages. The results presented by Taxon Inc. indicated that the samples from the three locations separated into individual clusters, with those from the Effluent site being the least similar to those at the other two sites. When, however, their data were re-analyzed using 5 additional similarity coefficients and a divisive polythetic algorithm, the corresponding dendrograms were not as clear (Figure 2).
Three coefficients (Bray-Curtis and Percent with transformations, and Sorensen) generated dendrograms very similar to the one presented in the report, but the other two did not. In the latter cases the Effluent site samples were clustered with those from other areas. As in the clam study, the results from the analysis of variance were more consistent than the dendrograms. As shown in Table II, the statistical tests done on the matrices from all five coefficients indicated significant differences between the Effluent site and the other two.

TABLE II
Analyses of variance of the similarity matrices used to analyse data from the Taxon Inc. study. The hypothesis tested and the probabilities represented are the same as in Table I.
Columns (sites compared): Effluent site and Rocky Point site; Effluent site and Manomet Point site; Rocky Point site and Manomet Point site.
Rows (coefficients): Bray-Curtis (log); Percent (log); Percent; Euclidian Distance Squared; Jaccard.
[Cell entries are not reliably recoverable from the source.]

Fig. 2. Cluster diagrams generated by 5 similarity coefficients for the data from the Taxon Inc. study. Similarity values are on the vertical axes, and the samples on the horizontal axes are represented as follows: (○) Effluent Site; (△) Rocky Point Site; (□) Manomet Point Site. Panels: Euclidian Distance Squared; Bray-Curtis (log); Percent; Sorensen; Percent (log).

4. Discussion

In the study of clam digging, the cluster analyses using different similarity coefficients provide two conflicting conclusions. On one hand, it is possible to conclude that digging did cause changes in the infaunal population, based on dendrograms produced using the Bray-Curtis, Bray-Curtis with transformation, and Canberra coefficients. These dendrograms show distinct separations between two sets of samples: those taken from the plots dug 4 and 14 times and those from the control and the plot dug once. In each case, no more than 3 of the 23 samples fall outside their predetermined grouping. The other 8 dendrograms, however, had many samples from the intensively dug plots clustered with samples from the controls at high levels of similarity. There is no evidence from these dendrograms, therefore, to suggest that the infaunal population changed on the flat even after the most intensive digging.

The individual samples collected in the clam digging experiment had very low species numbers and abundances, and this may have influenced the disparate results obtained using the similarity coefficients. For this reason the data from Taxon Inc., which had individual samples with high species numbers and abundances, were analyzed. Different conclusions, however, were again possible. Based on the Bray-Curtis, Percent (both with transformation), and Sorensen coefficients the conclusions would be similar to those in the original report: the Effluent Site was different from the other two sites and was linked to them at lower similarities. Such conclusions, however, could not have been made if the analysis were based on the Percent without transformation and the Euclidian Distance Squared coefficients. In these latter analyses the samples from the Effluent Site were interspersed among several clusters containing samples from the other two sites. Although the Euclidian Distance measure used in the Taxon Inc.
study and the Euclidian Distance Squared used here are mathematically related, the results were different. The differences may have been caused by the different clustering methods used. In the agglomerative method used by Taxon Inc., samples were clustered into groups, with the samples having the highest similarity being grouped first. The divisive method, on the other hand, starts with all the samples in one large cluster and splits apart the two groups of samples which are the most dissimilar. The results using these two methods can be quite different (Clifford and Stephenson, 1975).

In both studies the conclusions are based on the assumption that clusters can be identified with specific treatments and that they reflect real changes in populations. In the clam study, it could be argued that the dendrograms showing a separation


between the control and the plots dug 4 and 14 times were valid because 'large' reductions in species diversity were observed on the experimental plots (see Appendix B). This conclusion, however, is based on two qualitative criteria: the appearance of the dendrograms and the apparent decrease in species diversity. Clustering methods, as they have come to be used in ecological studies, do not test the significance of the groups that are formed, nor relate them to the actual samples.

In the case where cluster analyses are used to describe patterns in populations and sites, and to generate hypotheses for further testing, the problems of choices in coefficients and 'stopping rules' for the dendrograms are not critical. Moreover, using several different coefficients may provide a better understanding of population patterns by suggesting different hypotheses (Green, 1980). In impact studies, however, one is attempting to test hypotheses rather than generate them. The evaluation presented in this report suggests that cluster analyses, as they have come to be used today, are not suitable for impact assessments. The objectivity of cluster analyses is only apparent, not real, since subjective choices are made by investigators at three levels in the analysis: (1) in the choice of a coefficient, (2) in the choice of clustering algorithm (a problem not dealt with in this analysis, but discussed in Clifford and Stephenson, 1975), and (3) in the choice of the level at which clusters are either accepted or rejected.

Although clustering does not seem to be suited for impact assessments, the original similarity analyses on which clustering is based should not be eliminated from the methods available to us. Rather, ways should be found to reduce the subjective choices made in selecting the algorithms, and criteria should be developed for choosing a coefficient on the basis of its intrinsic and ecological properties.
Similarity analyses have proved to be powerful tools in developing our understanding of community ecology because they can extract relationships from very large data sets. There can be a place for similarity analyses in impact assessments if we can assure ourselves that our methods are chosen objectively and will reflect the real ecological changes taking place.

Since clustering introduces unacceptable levels of subjective choice in the similarity analysis, one option is to analyze the similarity matrix directly. In setting up a matrix it is possible to separate differences in species numbers and abundance that can be expected at a site (similarities between replicate samples at one site) from those that can be attributed to differences between sites (similarities between samples from different sites). Such a matrix can then be analyzed using statistical methods to test whether the mean similarity between sites is the same as the mean similarity between replicate samples within a site. By using such statistical methods the problem of subjective choices in the quantitative analyses was reduced but, as can be seen from Tables I and II, the one choice left, that of similarity coefficients, can still affect the conclusions derived from the data.

Before the full potential of similarity analyses in impact assessments can be realized, we need to develop objective criteria for choosing a coefficient based on its intrinsic and ecological properties. Such criteria have in recent years been


increasingly discussed (Blanc, et al., 1976; Huhta, 1979; Bloom, 1981; Wolda, 1981; Janson and Vegelius, 1981; Kohn and Riggs, 1982; Field, et al., 1982), and it is now possible to begin selecting coefficients on this basis. In the two impact assessments discussed, for example, several factors were important. First, coefficients can be selected based on the type of data collected. Abundance data are important in comparing biological communities, and these should be used if available (Clifford and Stephenson, 1975). Thus, the Jaccard and Sorensen coefficients, which do not include abundances, are not the best ones to use. Furthermore, the abundances of the common species varied greatly within replicate samples, and in such cases some transformation of the values is suggested (Noy-Meir, 1973; Clifford and Stephenson, 1975). Transformations also increase the importance of rare species relative to the common ones in the calculation of similarities. Since this is what many ecologists do intuitively when comparing sites, analyses using transformations tend to produce quantitative results which more closely reflect those obtained from many years of general qualitative observations. For the benthic studies considered, the Bray-Curtis, Percent, and Euclidian Distance Squared coefficients with a transformation such as the logarithmic are to be preferred. Two coefficients, the Canberra and Morisita already incorporate a standardization of abundances and should be used without a transformation. Thus, of the 11 coefficients tested, 5 can be eliminated based on the preceding factors. Secondly, coefficients can be chosen on the basis of their intrinsic properties. Some coefficients that have been developed are based on the total number of species found in all samples, and treat the absence of a species from two samples being compared as a point of similarity. 
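The effect of counting joint absences can be illustrated with a small Python sketch. It compares the Jaccard coefficient, which ignores species absent from both samples, with the simple matching coefficient, used here only as a familiar example of a measure that counts negative matches; it is not one of the 11 coefficients tested. The species lists are hypothetical.

```python
def match_counts(x, y):
    """Count species in both samples (a), only x (b), only y (c), neither (d)."""
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d

def jaccard(x, y):
    """Ignores joint absences: a / (a + b + c)."""
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 1.0

def simple_matching(x, y):
    """Counts joint absences as agreement: (a + d) / (a + b + c + d)."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)

# Hypothetical presence/absence records for a pool of 10 species.
sample_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
sample_2 = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]

without_negative_matches = jaccard(sample_1, sample_2)       # 1/3
with_negative_matches = simple_matching(sample_1, sample_2)  # 8/10
```

With only one of three recorded species shared, the measure that counts the seven joint absences still reports a similarity of 0.8, which is why sparse data sets with many zeros are poorly served by such coefficients.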
In the initial selection, coefficients which incorporate such 'negative matches' were not included because this is not considered to be significant in ecological studies (Clifford and Stephenson, 1975; Boesch, 1977). Many of the species are absent from individual samples, resulting in many data entries that are zeros. Thus, measures which include joint absences are not robust enough to be applicable.

Criteria based on other intrinsic properties can be used to eliminate 3 more of the 11 coefficients initially chosen. The Morisita coefficient, as modified by Horn (1966), is a poor choice for the clam study because total species abundances (N) in some samples were low, and the modification to the original equation was based on the assumption that N is a good approximation of (N-1). Morisita's (1956) original equation could not be used because some samples had N = 1, and this resulted in a division by zero (see Appendix A for equations). The Euclidian Distance Squared, with or without transformation, can be rejected because these two coefficients invariably calculate high values for the similarities in the type of ecological studies considered here (Wolda, 1981; see also Figures 1, 2). High values make it difficult to identify patterns between different sites.

Of the 11 coefficients tested, therefore, three coefficients remain to be considered: Bray-Curtis and Percent with transformation, and the Canberra. In benthic studies, the Bray-Curtis coefficient is now favored over the Canberra by many ecologists


(Boesch, 1977; Field, et al., 1982). In addition to being easier to calculate, the choice of the Bray-Curtis seems to be based on an intuitive feeling that it gives better ecological analyses. Bloom (1981) has presented an analysis using a model system which may provide a reason for this preference. He found that the Bray-Curtis coefficient responds linearly to changes in species numbers and abundances (degree of species overlap), whereas the Canberra does not. In the absence of a similar analysis comparing the Percent and Bray-Curtis coefficients, the choice of coefficients for the benthic impact studies has now been narrowed down to these two. The properties by which the choice has been limited are summarized in Table III.

TABLE III
Some properties of the 11 similarity coefficients used that affected their choice for the benthic impact studies. (log) indicates coefficients where abundance data were first transformed to their logarithms. A (+) marks the coefficients having the listed property.

Coefficient                        Includes   Includes    Includes    Similarity   Linear response
                                   negative   species     trans-      values not   to species
                                   matches    abundances  formations  clumped      overlap
Jaccard                            0          0           n.a.        +            ?
Sorensen                           0          0           n.a.        +            ?
Bray-Curtis                        0          +           0           +            +
Bray-Curtis (log)                  0          +           +           +            ?
Percent                            0          +           0           +            ?
Percent (log)                      0          +           +           +            ?
Euclidian Distance Squared         0          +           0           0            0
Euclidian Distance Squared (log)   0          +           +           0            0
Canberra                           0          +           +           +            0
Canberra (log)                     0          +           +           +            0
Modified Morisita                  0          +           +           +            0
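The narrowing of the choice summarized in Table III can be expressed as a simple filter. The Python sketch below transcribes the table entries and applies one reading of the criteria discussed in the text: no negative matches, abundances included, a transformation or standardization included, similarity values not clumped, and no demonstrated non-linear response. The rule set and the function name are assumptions made for illustration; the text applies some criteria with more nuance than a table lookup can capture.

```python
# Entries transcribed from Table III, in column order:
# (negative matches, abundances, transformations, not clumped, linear response)
TABLE_III = {
    "Jaccard": ("0", "0", "n.a.", "+", "?"),
    "Sorensen": ("0", "0", "n.a.", "+", "?"),
    "Bray-Curtis": ("0", "+", "0", "+", "+"),
    "Bray-Curtis (log)": ("0", "+", "+", "+", "?"),
    "Percent": ("0", "+", "0", "+", "?"),
    "Percent (log)": ("0", "+", "+", "+", "?"),
    "Euclidian Distance Squared": ("0", "+", "0", "0", "0"),
    "Euclidian Distance Squared (log)": ("0", "+", "+", "0", "0"),
    "Canberra": ("0", "+", "+", "+", "0"),
    "Canberra (log)": ("0", "+", "+", "+", "0"),
    "Modified Morisita": ("0", "+", "+", "+", "0"),
}

def shortlist(table):
    """Keep coefficients that pass every criterion (one assumed reading of the text)."""
    kept = []
    for name, (negative, abundance, transformed, unclumped, linear) in table.items():
        if negative == "+":      # joint absences counted as similarity
            continue
        if abundance != "+":     # presence/absence only
            continue
        if transformed != "+":   # no transformation or standardization
            continue
        if unclumped != "+":     # similarity values clumped at high levels
            continue
        if linear == "0":        # response to species overlap shown to be non-linear
            continue
        kept.append(name)
    return kept

remaining = shortlist(TABLE_III)
```

Under these assumed rules only the Bray-Curtis and Percent coefficients with logarithmic transformation remain, which agrees with the two coefficients the discussion settles on.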

Rather than making a subjective choice between two seemingly equally appropriate coefficients, it is possible at this point to consider the results obtained from each. Had the conclusions differed, other methods would have had to be used to determine which conclusion better reflected actual impacts. In the two impact assessments considered, however, the conclusions obtained from the Bray-Curtis and Percent coefficients with transformation were the same. In the clam study one turnover of the sediments did not significantly change the infaunal population on the intertidal flat tested, whereas 4 and 14 turnovers did. The analysis of the Taxon Inc. data indicated that all three sites were significantly different from each other.

The question of species identification is another problem rarely discussed in applying classification techniques to environmental data. The assumption made is that all species identifications are exact and unambiguous. This assumption, from practical experience, is often wishful thinking, especially when dealing with soft-bodied benthic organisms. Sometimes, organisms can be accurately identified only


to genus, yet the results are then presented without indicating these ambiguities. To determine whether this was a problem in the data from the clam study, the analyses of variance were re-calculated for the two most suitable coefficients (Bray-Curtis and Percent with transformation), considering each ambiguous organism within a genus as a separate rather than the same species. These re-calculations did not change the conclusions in the analysis of variance at the 0.05 probability level, or at the 0.025 level for the Student's t Test. However, additional taxonomic ambiguities might easily have changed the conclusions, and the effects of incompletely identified species should be one of the factors considered before conclusions are made.

It should be noted that the conclusions developed from this analysis are applicable only to the one flat dug. The purpose of this report is to assess the methods used in impact studies, rather than to look at the general impacts of digging on intertidal infaunal populations. Data from only one of the 6 areas dug were used so that differences attributable to the methods of analysis could be easily separated from differences resulting from biological or environmental factors. Data from the Taxon Inc. study were used as published. It is not the purpose of this report to ascertain whether their experimental design was ecologically reasonable.

As impact studies are increasingly playing a role in decision making and policies regarding the environment, there is a need to be sure that predictions or measurements of impacts are scientifically sound and as objective as possible (Rosenberg et al., 1981). The recent trend, however, has been to use quantitative techniques without questioning the subjective decisions that are involved in their selection. The comparisons presented in this paper indicate that uncritical applications of cluster analysis can lead to contradictory conclusions.
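The sensitivity check for incompletely identified organisms can be sketched in Python as follows. The sketch compares a Bray-Curtis similarity computed with all ambiguous specimens lumped under one genus-level label against the same similarity computed with each ambiguous specimen treated as its own species. The counts are invented, the helper names are hypothetical, and the logarithmic transformation used in the study is omitted to keep the arithmetic easy to follow.

```python
def bray_curtis(x, y):
    """Bray-Curtis similarity for two samples stored as {taxon: count}."""
    taxa = set(x) | set(y)
    difference = sum(abs(x.get(t, 0) - y.get(t, 0)) for t in taxa)
    total = sum(x.values()) + sum(y.values())
    return 1 - difference / total if total else 1.0

def split_ambiguous(samples, label):
    """Give every specimen recorded under `label` its own pseudo-species."""
    result = []
    counter = 0
    for sample in samples:
        new_sample = {t: n for t, n in sample.items() if t != label}
        for _ in range(sample.get(label, 0)):
            counter += 1
            new_sample[f"{label} #{counter}"] = 1
        result.append(new_sample)
    return result

# Hypothetical counts for two cores.
core_a = {"Mya arenaria": 4, "Clymenella sp.": 2}
core_b = {"Mya arenaria": 3, "Clymenella sp.": 1}

lumped = bray_curtis(core_a, core_b)
split_a, split_b = split_ambiguous([core_a, core_b], "Clymenella sp.")
split = bray_curtis(split_a, split_b)
```

In this invented case the similarity falls from 0.8 under the lumped assumption to 0.6 under the split assumption, which shows how taxonomic ambiguity alone could move values in the matrix.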
This does not mean, however, that classification techniques in general should be eliminated from impact studies, since these quantitative approaches give us powerful tools for ecological analyses. Rather, ways should be found to limit the subjective choices made, and to justify them using defined criteria. The similarity analysis of the data from two benthic impact assessments is presented with the hope that it will stimulate discussion, and result in the development of criteria or guidelines that can be applied in using classification techniques. The analysis of variance of similarity matrices described offers one way that the number of subjective choices can be reduced, by eliminating clustering. Our understanding of the properties of similarity coefficients, however, is still poor, and much work needs to be done in analyzing and comparing them, using both model data sets and those collected in the field.

This paper is contribution # 86-1 from the Massachusetts Audubon Society Environmental Science Department.

References

Blanc, F., Chardy, P., Laurec, A., and Reys, J. P.: 1976, 'Choix des Metriques Qualitatives en Analyse d'Inertie. Implications en Ecologie Marine Benthique', Mar. Biol. 35, 49-67.


Bloom, S. A.: 1981, 'Similarity Indices in Community Studies: Potential Pitfalls', Mar. Ecol. Prog. Ser. 5, 125-138.
Boesch, D. F.: 1977, 'Application of Numerical Classification in Ecological Investigations of Water Pollution', Special Scientific Report 77 (EPA-600/3-7703), VA Inst. Mar. Sci.
Bray, J. R. and Curtis, J. T.: 1957, 'An Ordination of the Upland Forest Communities of Southern Wisconsin', Ecol. Monog. 27, 325-349.
Cain, S. A.: 1938, 'The Species Area Curve', Am. Midl. Nat. 19, 573-581.
Clifford, H. T. and Stephenson, W.: 1975, An Introduction to Numerical Classification, New York: Academic Press, 229 pp.
Field, J. G., Clarke, K. R., and Warwick, R. M.: 1982, 'A Practical Strategy for Analysing Multispecies Distribution Patterns', Mar. Ecol. Prog. Ser. 8, 37-52.
Goodall, D. W.: 1970, 'Statistical Plant Ecology', Ann. Rev. Ecol. Syst. 1, 99-124.
Gosner, K. L.: 1971, Guide to Identification of Marine and Estuarine Invertebrates: From Cape Hatteras to the Bay of Fundy, Boston: Houghton Mifflin Company, 693 pp.
Gosner, K. L.: 1978, A Field Guide to the Atlantic Seashore, Boston: Houghton Mifflin Company, 329 pp.
Gray, J. S. and Pearson, T. H.: 1982, 'Objective Selection of Sensitive Species Indicative of Pollution-Induced Change in Benthic Communities. I. Comparative Methodology', Mar. Ecol. Prog. Ser. 9, 111-119.
Green, Roger H.: 1980, 'Multivariate Approaches in Ecology: The Assessment of Ecologic Similarity', Ann. Rev. Ecol. Syst. 11, 1-14.
Hajdu, L. J.: 1981, 'Graphical Comparison of Resemblance Measures in Phytosociology', Vegetatio 48, 47-59.
Horn, H. S.: 1966, 'Measurement of Overlap in Comparative Ecological Studies', Amer. Nat. 100, 419-424.
Hruby, T.: 1981, 'The Shellfish Resource in a Polluted Tidal Inlet', Env. Cons. 8, 127-130.
Huhta, V.: 1979, 'Evaluation of Different Similarity Indices as Measures of Succession in Arthropod Communities of the Forest Floor After Clear-Cutting', Oecologia 41, 11-23.
Jaccard, P.: 1902, 'Lois de Distribution Florale Dans la Zone Alpine', Bull. Soc. Vaudoise Sci. Nat. 38, 69-130.
Janson, S. and Vegelius, J.: 1981, 'Measures of Ecological Association', Oecologia 49, 371-376.
Kershaw, K. A.: 1973, Quantitative and Dynamic Plant Ecology, London: Edward Arnold, 308 pp.
Kohn, A. J. and Riggs, A. C.: 1982, 'Sample Size Dependence in Measures of Proportional Similarity', Mar. Ecol. Prog. Ser. 9, 147-151.
Lambert, J. M.: 1972, 'Theoretical Models for Large-Scale Vegetation Survey', in Jeffers, J. N. R. (ed.), Mathematical Models in Ecology, Oxford: Blackwell Scientific Publications, pp. 87-119.
Lance, G. N. and Williams, W. T.: 1967, 'Mixed-data Classificatory Programs. I. Agglomerative Systems', Aus. Comput. J. 1, 15-20.
Morisita, M.: 1959, 'Measuring of Interspecific Association and Similarity Between Communities', Mem. Fac. Sci. Kyushu U. Ser. E. 3, 65-80.
Noy-Meir, I.: 1973, 'Data Transformations in Ecological Ordination', J. Ecol. 61, 329-341.
Renkonen, O.: 1938, 'Statistisch-ökologische Untersuchungen über die terrestrische Käferwelt der finnischen Bruchmoore', Ann. Zool. Soc. Zool.-Bot. Fenn. Vanamo 6, 1-231.
Rosenberg, David M., Resh, H., Bailing, S. T., Barnby, M. A., Collins, J. N., Durbin, D. V., Flynn, T. S., Hart, D. D., Lambert, G. A., McElravy, E. P., and Wood, J. R.: 1981, 'Recent Trends in Environmental Impact Assessment', Can. J. Fish. Aquat. Sci. 38, 591-624.
Sokal, R. R. and Rohlf, F. J.: 1969, Biometry, San Francisco: W. H. Freeman and Company, 776 pp.
Sokal, R. R. and Sneath, P. H.: 1963, Principles of Numerical Taxonomy, San Francisco: W. H. Freeman and Company, 359 pp.
Sorensen, T.: 1948, 'A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content', Biol. Skrift. 5, 1-34.
Taxon Inc.: 1982, Benthic Studies in the Vicinity of Pilgrim Station, Report No. 19, prepared for the Boston Edison Company. Salem, MA: Taxon Inc., 66 pp.
Williams, W. T.: 1971, 'Principles of Clustering', Ann. Rev. Ecol. Syst. 2, 303-326.
Wolda, H.: 1981, 'Similarity Indices, Sample Size and Diversity', Oecologia 50, 296-302.


Appendix A: Similarity coefficients used in this study

Variables are defined as follows for calculating the similarity between two samples, A and B:

$c$ = number of species common to both samples
$a$ = number of species present in Sample A
$b$ = number of species present in Sample B
$k$ = total number of different species in A, B
$N_1, N_2$ = total number of individuals in samples A, B respectively
$n_{1i}, n_{2i}$ = the number of the i-th species in samples A, B respectively
$P_{1i}$ = the proportion of the i-th species in sample A ($= n_{1i}/N_1$)
$P_{2i}$ = the proportion of the i-th species in sample B ($= n_{2i}/N_2$)

Two binary coefficients, where the number of species absent in both samples need not be known, were selected. This type of coefficient is preferred in ecological analyses over those that include absent species (Clifford and Stephenson, 1975).

(1) Jaccard (1902)

$$\mathrm{SIM} = \frac{c}{a + b - c}$$

(2) Sorensen (1948) (also known as Czekanowski coefficient)

$$\mathrm{SIM} = \frac{2c}{a + b}$$
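As an illustration of coefficients (1) and (2), the following is a minimal Python sketch. It assumes each sample is held as a dictionary mapping species name to count; that data layout, the function names, and the example counts are all hypothetical and are not taken from the study.

```python
def species_present(sample):
    """Return the set of species with at least one individual in the sample."""
    return {species for species, count in sample.items() if count > 0}


def shared_and_totals(sample_a, sample_b):
    """Return (c, a, b): shared species count and species counts of A and B."""
    in_a, in_b = species_present(sample_a), species_present(sample_b)
    a, b = len(in_a), len(in_b)
    if a + b == 0:
        raise ValueError("both samples are empty; similarity is undefined")
    return len(in_a & in_b), a, b


def jaccard(sample_a, sample_b):
    """Coefficient (1): SIM = c / (a + b - c)."""
    c, a, b = shared_and_totals(sample_a, sample_b)
    return c / (a + b - c)


def sorensen(sample_a, sample_b):
    """Coefficient (2): SIM = 2c / (a + b)."""
    c, a, b = shared_and_totals(sample_a, sample_b)
    return 2 * c / (a + b)


# Hypothetical counts for illustration only (not the Appendix B data).
core_a = {"Mya arenaria": 3, "Gemma gemma": 6, "Clymenella torquata": 1}
core_b = {"Mya arenaria": 2, "Gemma gemma": 4,
          "Cyathura polita": 1, "Micrura leidyi": 1}

print(jaccard(core_a, core_b))   # c = 2, a = 3, b = 4, so 2 / (3 + 4 - 2)
print(sorensen(core_a, core_b))  # 2 * 2 / (3 + 4)
```

Both functions ignore abundances entirely, so two cores sharing the same species list score 1 however different their counts are.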

All other coefficients selected take into account the abundances of the species. Some were originally presented as dissimilarity coefficients and these have been changed so all give the resemblance value as a similarity. Summations are all $\sum_{i=1}^{k}$.

(3) Bray-Curtis (Bray and Curtis, 1957)

$$\mathrm{SIM} = \frac{2 \sum \min(n_{1i}, n_{2i})}{\sum (n_{1i} + n_{2i})}$$

(4) Bray-Curtis after logarithmic transformation of the data [log (n + 1)]. Same formula as (3).

(5) Canberra (Lance and Williams, 1967)

$$\mathrm{SIM} = \frac{1}{k} \sum \frac{2 \min(n_{1i}, n_{2i})}{n_{1i} + n_{2i}}$$

When the similarity is written in this way, rather than as (1 - dissimilarity), the problem of zeros in the numerator is avoided and no correction factor (Clifford and Stephenson, 1975, p. 58) is needed.
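Coefficients (3) to (5), and the log (n + 1) variants, can be sketched in Python as below. This is an illustrative sketch only: the dictionary layout, function names, and counts are hypothetical, and k is taken here as the number of species present in at least one of the two samples. The base of the logarithm is not stated in the text; natural logarithms are assumed, which is harmless here because both coefficients are ratios and so are unchanged by a constant rescaling of all values.

```python
import math


def paired_abundances(sample_a, sample_b, log_transform=False):
    """Pair the counts of every species found in either sample."""
    species = sorted(set(sample_a) | set(sample_b))
    pairs = [(sample_a.get(sp, 0), sample_b.get(sp, 0)) for sp in species]
    pairs = [(x, y) for x, y in pairs if x + y > 0]  # drop joint absences
    if not pairs:
        raise ValueError("both samples are empty; similarity is undefined")
    if log_transform:
        pairs = [(math.log(x + 1), math.log(y + 1)) for x, y in pairs]
    return pairs


def bray_curtis(sample_a, sample_b, log_transform=False):
    """Coefficient (3), or (4) when log_transform is True."""
    pairs = paired_abundances(sample_a, sample_b, log_transform)
    return 2 * sum(min(x, y) for x, y in pairs) / sum(x + y for x, y in pairs)


def canberra(sample_a, sample_b, log_transform=False):
    """Coefficient (5) written as a similarity; the log variant is optional."""
    pairs = paired_abundances(sample_a, sample_b, log_transform)
    k = len(pairs)  # species present in at least one sample
    return sum(2 * min(x, y) / (x + y) for x, y in pairs) / k


# Hypothetical counts for illustration only (not the Appendix B data).
core_a = {"Mya arenaria": 3, "Gemma gemma": 6, "Clymenella torquata": 1}
core_b = {"Mya arenaria": 2, "Gemma gemma": 4, "Cyathura polita": 1}

print(bray_curtis(core_a, core_b))                      # 2 * 6 / 17
print(canberra(core_a, core_b))                         # (0.8 + 0.8 + 0 + 0) / 4
print(bray_curtis(core_a, core_b, log_transform=True))  # coefficient (4)
```

In this example the abundant *Gemma gemma* dominates the Bray-Curtis value, whereas Canberra weights each species equally, so the two unshared rare species pull its value down much further.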


(6) Canberra after logarithmic transformation of abundance [log (n + 1)]. Same formula as (5).

(7) Percent similarity (Renkonen, 1938)

$$\mathrm{SIM} = \sum \min(P_{1i}, P_{2i})$$

(8) Percent similarity after logarithmic transformation of abundance [log (n + 1)]. Same formula as (7).

(9) Euclidean Distance Squared (Clifford and Stephenson, 1975, p. 65).

$$\mathrm{SIM} = 1 - \sum (P_{1i} - P_{2i})^2$$

(10) Euclidean Distance Squared after logarithmic transformation of abundance [log (n + 1)]. Same formula as (9).

(11) Modified Morisita (Horn, 1966)

$$\mathrm{SIM} = \frac{2 \sum n_{1i} n_{2i}}{(\lambda_1 + \lambda_2) N_1 N_2}$$

where

$$\lambda_1 = \frac{\sum n_{1i}^2}{N_1^2}, \qquad \lambda_2 = \frac{\sum n_{2i}^2}{N_2^2}$$

The unmodified Morisita, where

$$\lambda = \frac{\sum n^2}{N(N - 1)},$$

could not be used because some samples had only one specimen and this introduced zeros into the denominator of $\lambda$.
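The proportion-based coefficients (7) and (9) and the modified Morisita coefficient (11) can be sketched in the same way. Again this is only an illustration under assumed conventions: samples are hypothetical dictionaries of species counts, the function names are invented, and the untransformed forms are shown. How proportions should be formed for the log-transformed variants (8) and (10) is not spelled out here, so they are left out of the sketch.

```python
def paired_counts(sample_a, sample_b):
    """Pair the counts of every species found in either sample."""
    species = sorted(set(sample_a) | set(sample_b))
    pairs = [(sample_a.get(sp, 0), sample_b.get(sp, 0)) for sp in species]
    n1 = sum(x for x, _ in pairs)  # N1: individuals in sample A
    n2 = sum(y for _, y in pairs)  # N2: individuals in sample B
    if n1 == 0 or n2 == 0:
        raise ValueError("an empty sample has no defined proportions")
    return pairs, n1, n2


def percent_similarity(sample_a, sample_b):
    """Coefficient (7): sum of the smaller proportion for each species."""
    pairs, n1, n2 = paired_counts(sample_a, sample_b)
    return sum(min(x / n1, y / n2) for x, y in pairs)


def euclidean_squared_similarity(sample_a, sample_b):
    """Coefficient (9): 1 minus the summed squared proportion differences."""
    pairs, n1, n2 = paired_counts(sample_a, sample_b)
    return 1 - sum((x / n1 - y / n2) ** 2 for x, y in pairs)


def modified_morisita(sample_a, sample_b):
    """Coefficient (11), with lambda = sum(n^2) / N^2 for each sample."""
    pairs, n1, n2 = paired_counts(sample_a, sample_b)
    lambda_1 = sum(x * x for x, _ in pairs) / n1 ** 2
    lambda_2 = sum(y * y for _, y in pairs) / n2 ** 2
    cross = sum(x * y for x, y in pairs)
    return 2 * cross / ((lambda_1 + lambda_2) * n1 * n2)


# Hypothetical counts for illustration only (not the Appendix B data).
core_a = {"Mya arenaria": 3, "Gemma gemma": 6, "Clymenella torquata": 1}
core_b = {"Mya arenaria": 2, "Gemma gemma": 4, "Cyathura polita": 1}

print(percent_similarity(core_a, core_b))            # 2/7 + 4/7
print(euclidean_squared_similarity(core_a, core_b))  # 1 - 154/4900
print(modified_morisita(core_a, core_b))             # 60 / 62.2
```

Because the modified form divides by $N^2$ rather than $N(N-1)$, a sample containing a single specimen still gives a finite $\lambda$, which is the reason given above for preferring it.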


Appendix B

Species and their abundance in sediment core samples taken from plots on the Mill River clam flat. Blanks indicate no specimens found in the sample. Control samples 1-5 were taken from the undisturbed area between the plots dug once and 4 times, and samples 6-9 from the other undisturbed area between the plots dug 4 and 14 times.

[Table: rows are the species Mya arenaria, Macoma balthica, Gemma gemma, Arabella iricolor, Haploscoloplos robustus, Clymenella torquata, Clymenella sp., Saccoglossus kowalewskii, Cyathura polita, and Micrura leidyi; columns are sample numbers 1-23, grouped as Mill River Control, Dug Once, Dug 4 Times, and Dug 14 Times. Total number of species per plot: Control 10, Dug Once 6, Dug 4 Times 6, Dug 14 Times 3. The per-sample counts are not legible in this copy.]
