Marine Pollution Bulletin 82 (2014) 127–136


Confidence rating for eutrophication assessments

Uwe H. Brockmann ⇑, Dilek H. Topcu

Hamburg University, Institute for Geology, Dept. Biogeochemistry and Marine Chemistry, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany

Article info

Keywords: Representativeness; Confidence rating; Eutrophication assessments; German Bight/North Sea; Total nitrogen

Abstract

Confidence of monitoring data depends on their variability and on the representativeness of sampling in space and time. Whereas variability can be assessed as statistical confidence limits, representative sampling is related to equidistant sampling, considering gradients or changing rates at sampling gaps. The proposed method combines both aspects, giving balanced results for examples of total nitrogen concentrations in the German Bight/North Sea. To assess sampling representativeness, surface areas, vertical profiles and time periods are divided into regular sections, for each of which the representativeness is calculated individually. The sums correspond to the overall representativeness of sampling in the defined area/time period. Effects of non-sampled sections are estimated along parallel rows by reducing their confidence, considering their distances to the next sampled sections and the interrupted gradients/changing rates. Confidence rating of time sections is based on maximum differences of sampling rates at regular time steps and the related means of concentrations.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Changing mixing gradients of nutrients and organic matter, patchy distributions of organisms and phytoplankton blooms, combined with irregular sampling in space and time, cause uncertainties, e.g. in eutrophication assessment results, and require an estimate of the confidence of the applied data. Under the Oslo-Paris Convention Commission (OSPAR), the Helsinki Commission (HELCOM), the Water Framework Directive (WFD) and the Marine Strategy Framework Directive (MSFD), mean concentrations of parameters in specified areas and time periods are assessed. Confidence of assessments depends on the variability of data and their representative sampling in space and time. Both are affected by sampling strategies, steepness of gradients and velocity of changes. Confidence ratings of assessments are especially needed for the definition of reduction targets, e.g. for nutrient discharges, and of subsequent reduction measures. Since trends indicate the direction and speed of changes, and thereby the distance to ecological targets, their confidence is needed as well.

Confidence of monitoring data has been requested by the EC for the WFD (EC, 2003) and has already been included in HELCOM assessments (Andersen et al., 2010), however based on expert judgement only. Statistical approaches for the WFD have been investigated as well (Carstensen, 2007). Reproducible methods for confidence rating of marine data that consider their representativeness are still missing. Mostly, variability was considered for estimation of confidence by applying standard deviations and connected confidence limits, including distribution of data in regular station grids (Chang and Wen, 2003) and evenly spaced parallel transects for tests of random sampling and estimating habitat representativeness (MacLeod, 2010). Semi-variographic methods have been tested (Anttila et al., 2012), as have statistical approaches including empirical elements (Nardelli, 2012). Mixing samples to improve confidence and reduce analytical costs has been proposed (Patil, 2011), but this method is neither transparent nor reproducible, because information on subsamples is irreversibly lost by mixing and identical mixtures cannot be repeated.

In this proposal for confidence rating, representativeness is compared with statistical confidence limits calculated from the variability of data. Since assessments of defined areas focus on mean concentrations during a specific time period (e.g. a couple of years), parallel confidence rating of data in space and time is required. This approach to confidence rating is not restricted to eutrophication assessments, as has been discussed in OSPAR for a preceding draft, but can be applied generally to marine data.

⇑ Corresponding author. Tel.: +49 40 42838 3989; fax: +49 40 412838 4243. E-mail address: [email protected] (U.H. Brockmann). http://dx.doi.org/10.1016/j.marpolbul.2014.03.007 0025-326X/© 2014 Elsevier Ltd. All rights reserved.

2. Methods

Provided that sampling and analyses follow inter-calibrated and certified methods, the proposed calculation of representativeness is based on sampling regularity and on gradients/changes adjacent to non-sampled sections. The confidence rating includes stepwise the following elements:

– representativeness of recent data in space,
– representativeness in time periods,


U.H. Brockmann, D.H. Topcu / Marine Pollution Bulletin 82 (2014) 127–136

– variability in space and time,
– confidence of background data and thresholds,
– distances between recent data and thresholds and their degree of overlapping,
– combination of confidence rating for different parameters.

The confidence rating is performed for individual parameters, since the assessed components are often distributed at different scales, reflecting their different sources/sinks and modifying turnover processes. For the combination of confidence ratings of single parameters, the completeness of parameters, parameter groups and categories is required for a confident overall assessment.

2.1. Representativeness of sampling

Since mainly the differences between recent mean concentrations and thresholds are assessed, their confidences are required. Regular distribution of sampling in the assessed area and time period is required for a high confidence. To assess the evenness of sampling patterns, areas (Fig. 1) and time periods are divided into regular squares/sections of identical size, which can simply be converted to % representativeness of the complete area/time period. Sections should be larger than the diameter/duration of tidal action, considering local residence times, and should last longer than 1 day to avoid diurnal effects. The sections are scored individually, and the sum of individual scores gives the total representativeness for the assessed parameter.

All sampled sections at first get the full percentage of their representativeness, corresponding to their size and contribution to the complete area/period. Confidence rating starts by summing up the number of sampled sections and their contributions (%) to the total area/time period. If all sections are sampled, the representativeness is 100%. Non-sampled sections get a reduced score, depending on the extent of the interrupted gradient/changes.
The representativeness of interrupted gradients/changes is calculated from the difference between the next sampled sections, converted beforehand to % of the overall mean. Since the confidence of non-sampled sections (empty cells) decreases (i) with increasing gradients/changing rates and (ii) with the number of connected empty cells, the confidence of each non-sampled cell is reduced by these factors, likewise standardised as %. The representativeness of one empty section is reduced by the interrupted gradient/change multiplied by the original percentage contribution of the empty section (reflecting its extension) and divided by 100:

$$R = \mathrm{OR} - G \times \mathrm{OR}/100 \qquad (1)$$

R is the representativeness of empty sections [%], OR is original representativeness of section [%], G is the gradient of concentrations between two sampled sections as percentage of overall mean concentration [%]. If a couple of connected sections are not sampled, their individual distances (n = number of empty cells) will be considered for their representativeness by multiplying the gradient/changing rate and the cell size contribution with the distance number (n) of empty sections:

$$R = \mathrm{OR} - G \times n \times \mathrm{OR}/100 \qquad (2)$$
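As a minimal sketch (not the authors' implementation), Eqs. (1) and (2) can be written as one function, since Eq. (1) is the special case n = 1. The function names `empty_section_representativeness` and `distances_to_sampled` are hypothetical. The counting rule used here, distance to the nearest sampled section in the same row, is an assumption that matches the counting examples in the text (two connected empty sections both get n = 1, the middle one of three gets n = 2).

```python
from typing import List


def empty_section_representativeness(orig_rep: float, gradient_pct: float, n: int = 1) -> float:
    """Eq. (2): R = OR - G * n * OR / 100; n = 1 gives Eq. (1). All inputs in %."""
    return orig_rep - gradient_pct * n * orig_rep / 100.0


def distances_to_sampled(sampled: List[bool]) -> List[int]:
    """Assumed counting rule: distance of each section to the nearest sampled
    section in the same row (0 for sampled sections)."""
    positions = [i for i, is_sampled in enumerate(sampled) if is_sampled]
    if not positions:
        raise ValueError("row contains no sampled section")
    return [min(abs(i - p) for p in positions) for i in range(len(sampled))]


# Hypothetical row: six connected empty cells enclosed by two sampled cells.
row = [True] + [False] * 6 + [True]
print(distances_to_sampled(row))

# One enclosed empty cell of a 60-cell grid with a 3.2% interrupted gradient.
cell_share = 100.0 / 60.0
print(round(empty_section_representativeness(cell_share, 3.2), 2))
```

A negative return value means that the section contributes nothing to the summed representativeness.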

To define these distances in an area, the sections/cells are arranged along parallel rows. These rows can be arranged according to coordinates, e.g. in N–S or W–E direction. The distance (n) of every empty section to the next sampled section is simply defined by counting. Two combined empty sections both get n = 1; for three combined empty sections the middle section gets n = 2, etc. In this way the growing uncertainty with increasing numbers of combined empty sections is considered. This means that for empty cells enclosed by sampled cells (n = 1), the representativeness (%) is only reduced by the % of interrupted gradient in relation to the cell size. For connected empty cells the representativeness is reduced with increasing distance, likewise related to cell size. Resulting negative values indicate that the related cells do not contribute to the overall representativeness, which is the sum of the individual cell representativeness values. The calculation of representativeness of surface data includes the following steps:

Fig. 1. Division of the German Bight into regular squares (716.5 km²) and transects, sampled for vertical TN data (red) 2006–2010. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


– division of the assessed area into regular squares (cells), each representing the same % of the total area,
– definition of parallel cell stripes (preferably crossing main gradients),
– identification of non-sampled cells for the assessed parameter,
– calculation of interrupted gradients and standardisation as % of overall mean,
– calculation of individual representativeness of non-sampled cells, considering the distance between sampled cells and the gradients between the next sampled cells,
– summing up total representativeness.

Assuming that surface waters are dominated by horizontal gradients and depth profiles by vertical gradients, horizontal gradients beneath the surface are neglected. Representative sampling of depth profiles can be calculated in a similar way as for the rows of cells at the surface of an area (Eqs. (1) and (2)). The representativeness of every cube (of the same size) is estimated; because the third dimension increases the number of cells, each cube is less representative of the total water mass than a surface square is of the area. For the combination of surface and volume confidences, both should be reported, and finally the worse of the two should contribute to the parameter-specific assessment results.

Since confidences of sampling in space and time are closely connected, representative sampling of both should be balanced. The confidence scoring of time periods can be related to the total assessed area, subareas, salinity regimes, or hot spots with specific stations. The assessed time period may be extended to several years, seasons or events, but should always be supplemented by consideration of subunits for estimating representative sampling of periodic processes. Representative assessments of time periods with multi-year means should be supported by calculation of representative seasonal sampling related to minima and maxima.
Generally, the confidence of sample timing depends on its regular distribution in identical time steps within the time sections (seasons, years etc.) combined in assessed time periods, similar to the cell rows in areas. Confidence scoring can be extended to long-term trends as well (see below). However, the representativeness of sampled time periods for a specified area and parameter depends firstly on an equal distribution of sampling over the time sections involved (as % of total samples) and secondly on the mean concentrations for the most diverging sampling rates. Differences of mean concentrations of all time sections are considered by Eq. (6), reflecting the variability. For the calculation of representative sampling during time periods

– the specific period is divided into regular sections (e.g. years, months etc.),
– the numbers of samplings within these sections are converted to % of the mean sampling rate per year, month etc.,
– the difference (%) of the most diverging sampling rates for time sections included within the time period is calculated,
– the mean concentrations of time sections are converted to % of the overall mean,
– for the most diverging sampled sections (%) the mean concentrations (%) are considered,
– reducing the original confidence of 100% by these factors:

$$R_{ts} = 100 - \mathrm{SR} \times D/100 \qquad (3)$$

Rts is the representativeness of sampled time section [%], SR is maximum difference of annual or monthly sampling rates [%], D is the difference of mean concentrations for most diverging sampled time sections [% of overall mean].
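The steps leading to Eq. (3) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the sample counts are invented, and SR is taken as the difference between the highest and lowest sampling rate expressed as % of the mean rate, which is how the list above is read here. The last line uses the annual values reported in Section 3.3 (SR = 53%, D = 4.1%).

```python
from typing import Sequence


def sampling_rate_spread(counts: Sequence[float]) -> float:
    """SR: highest minus lowest sampling rate, as % of the mean rate."""
    mean_rate = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean_rate * 100.0


def time_section_representativeness(sr_pct: float, d_pct: float) -> float:
    """Eq. (3): Rts = 100 - SR * D / 100, all values in %."""
    return 100.0 - sr_pct * d_pct / 100.0


# Hypothetical annual sample counts (illustration only, not from the paper).
counts = [40, 25, 30, 20, 35]
sr = sampling_rate_spread(counts)
# Assumed 10% difference of mean concentrations between the two extreme years.
print(round(time_section_representativeness(sr, 10.0), 1))

# Annual values reported in Section 3.3 of the paper.
print(round(time_section_representativeness(53.0, 4.1), 1))
```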


For seasonal assessments the regular sampling distribution should correspond to the duration of seasons, for instance for a defined winter of X–II = 5 months, i.e. a ratio of 5:7. This means that 41% of sampling should be performed during winter and 59% during the corresponding growing season; 41% of samples during winter then correspond to 100% confidence, etc. If time periods are related to salinity regimes, avoiding hydrodynamic effects, their mean contribution to the overall confidence can be weighted according to the contribution of the salinity regimes to the assessed area. Shifts of salinities within the salinity regimes during the assessed time periods should be considered if significant mixing diagrams are observed within the assessed area.

The representativeness of sampling rates for time series can be calculated in a similar way to that of regular cells along stripes in areas. For this purpose the assessed time period is divided into about 100 regular sections (e.g. year: 104 half-weeks), which also allows the detection of bloom events and the passage of extended patches. Every sampled section again gets its full confidence (%) according to its contribution to the assessed time period. Since the representativeness (R) of a non-sampled time step decreases (i) with increasing changes of parameters and (ii) with the duration without sampling, the confidence (R) of each non-sampled time step is reduced by these factors, standardised as %. The original representativeness (OR) of a non-sampled section is reduced by the concentration change (C), calculated as % of the total mean, multiplied by the original representativeness of the time step and by the duration (number of non-sampled time sections, n) in relation to the next sampled section, divided by 100:

$$R = \mathrm{OR} - C \times \mathrm{OR} \times n/100 \qquad (4)$$

R is the representativeness of empty sections [%], OR is the original representativeness of the section [%], C is the concentration difference between two sampled sections as percentage of the overall mean [%], and n is the number of non-sampled time sections to the next sampled section. This means that for non-sampled time steps adjacent to sampled steps (n = 1), the representativeness (%) is only reduced by the % of changes in relation to the duration of the regular time steps (contribution as % to the complete time period). For subsequent connected non-sampled time steps the representativeness is reduced with increasing duration, i.e. distance (n) to the next sampled section. Resulting negative values indicate that the related time steps cannot contribute to the overall representativeness.

2.2. Variability of data

Besides their representativeness, the spatiotemporal variability of the data also controls the confidence of their means. The higher the variability, the lower the confidence, since the confidence limit (CL), related to the variability of data, is defined by the standard error, which is the standard deviation (SD) divided by the square root of the number of measurements (n). The confidence of means, controlled by the variability, can be estimated as confidence limits (CL) for a probability of e.g. 95% (Eq. (5)). The factor 1.96 for a probability of 95% is dependent on the number of measurements/size of samples and valid for two-tailed tests only.

$$\mathrm{CL} = 1.96 \times \mathrm{SD}/\sqrt{n} \qquad (5)$$

This range can be converted to % of the mean and thereby compared directly with the representativeness scores (%). However, these calculations are originally restricted to “normally” distributed data. Assuming a “normal” distribution of data, confidence limits can be calculated for means related to areas, vertical profiles and time sections and, after inversion to confidence ranges (100 − CL%), can be compared directly with the representativeness of sampling.
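A short sketch of Eq. (5) and the inversion to a confidence range is given below. The function names are hypothetical, and the factor is fixed at 1.96, which presumes a large sample from a normal distribution; for small n a larger Student t quantile would normally replace it. The example input (SD = 83.7% of the mean, n = 1524) is taken from Table 2 of the paper.

```python
import math


def confidence_limit_pct(sd_pct: float, n: int, z: float = 1.96) -> float:
    """Eq. (5) with SD given as % of the mean: CL% = z * SD% / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return z * sd_pct / math.sqrt(n)


def confidence_range_pct(cl_pct: float) -> float:
    """Inversion used for comparison with representativeness: CR = 100 - CL."""
    return 100.0 - cl_pct


cl = confidence_limit_pct(83.7, 1524)
cr = confidence_range_pct(cl)
print(round(cl, 1), round(cr, 1))
```

Because both CR and the representativeness scores are percentages on the same 0 to 100 scale, they can be set side by side without further scaling.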


Variability of means is controlled by the steepness of gradients in an area or the speed of concentration changes during a time period, but is also affected by sampling processes. Effects of mixing or biogeochemical turnover can be reduced by restricting assessments to salinity regimes and/or specific seasons. This reduces the influence of physical forcing, which is of less concern e.g. for eutrophication assessments, quality of the environment, and anthropogenic effects. If assessments of salinity regimes are performed, their contribution to areas has to be considered for an area-related confidence rating. Additionally, shifts of sampling activities within the salinity regimes may affect the estimated changing rates. Since standard deviations (SD) of annual or monthly mean sampling rates reflect the differences in annual or monthly sampling, they can be applied for confidence rating of time periods as well. If these standard deviations are high, the confidence will be low. Likewise, if the SD of concentrations during the compiled time periods is high, the confidence is reduced:

$$C_{ts} = 100 - \mathrm{SD}_{sr} \times \mathrm{SD}_{c}/100 \qquad (6)$$

Cts [%] is the confidence of the time section based on the variability of data, SDsr [%] is the standard deviation of annual or monthly sampling rates, and SDc [%] is the standard deviation of concentrations for the sampled time period. The results are compared with the representativeness calculations of sampling rates. The final confidence score (%) for evenly distributed resolution of sampling time/periods of one parameter is the lowest seasonal, annual or inter-annual mean of representativeness (representativeness t1 or confidence t2). Variability of time series can be calculated for a couple of years, e.g. by running means considering the same season (month). For time series, R² of square fits (0–1) can be converted to % (0–100) for a joint complete confidence rating of recent data in area and time, connected by means of the assessed area/time period.

Confidence of data on natural background concentrations of nutrients, e.g. for river discharges, can be deduced from the variability of their concentrations related to area-specific freshwater discharges (Topcu et al., 2011). Since there are often significant mixing diagrams and several significant correlations between eutrophication parameters, the confidence of natural background concentrations can be transferred from river nutrients to adjacent salinity regimes and other parameters (Topcu et al., 2009). Confidences of thresholds can be deduced from recent parameter-specific variability (as %), and confidences of reduction targets from the variability of effects such as oxygen depletion. For combined confidence scoring, confidences of background values and thresholds should be merged as means.

The degree of overlapping of SD (%) of recent means and thresholds, which affects the confidence of assessments, can simply be estimated as % of maximum overlapping. Concentrations approaching the thresholds need a higher accuracy for statistical significance because of the increasing overlap of variable data.
On the other hand, confidence may be low without affecting the significance of the assessment if distances between thresholds and recent data are large, as in polluted areas.

For the total confidence rating of assessments, the representativeness and statistical confidence limits (deduced from variability) of single parameters are combined for areas, depth profiles and time periods. The completeness of assessed parameters is considered as % for the confidence score. The worst score (space, time, parameter number) is recommended as the final confidence score. For the combination of parameters into an overall confidence, the worst component may be used, or the most significant effect parameter (e.g. oxygen). The results of the step-wise confidence rating are combined:

(i) representativeness of data in the area, water mass, time period,
(ii) variability (confidence limits) of recent data in space and time, overlapping with thresholds,
(iii) combination of parameters,
(iv) compilation of scales and interpretation.

As an example, TN measurements in the German Bight area between 2006 and 2010 have been chosen. Data sources are ICES, ARGE Elbe Hamburg, BSH Hamburg, FTZ Büsum, LLUR Kiel and NLWKN Wilhelmshaven. About 1500 data were available for the surface, 2400 in total. For the calculation of representativeness, the surface of the German Bight, including the exclusive economic zone, was divided into 60 regular cells of 716.5 km², each representing 1.67% of the whole area (without inshore estuaries) (Fig. 1). Data for vertical profiles have been compiled from 15 cruises (mostly 3/y) during spring, summer and autumn and combined (data: BSH Hamburg). The vertical gradients have been arranged mainly along 9 parallel, nearly equidistant transects (Fig. 1) covering the complete area. For confidence scoring, the vertical profiles in the mostly shallow area have been divided into 129 regular 10 m segments, each representing 0.77% of total profiles. Since the distances between the vertical profiles are not constant, representativeness of volumes has not been calculated.

3. Results

Examples for representativeness and variability of TN in the German Bight area are shown for the surface area, vertical profiles, and time periods.

3.1. Representativeness of surface area sampling

The TN concentrations decreased in the German Bight from the estuaries towards offshore areas by mixing of river plumes and coastal waters (Fig. 2). In this example 47 cells (each representing 1.67% of the area) were sampled for TN between 2006 and 2010, corresponding to 78.5% representativeness (Table 1). Most stations were sampled repeatedly at identical positions. Reduced representativeness due to non-sampled cells has been calculated along N–S and W–E rows (Table 1).
Gradients interrupted by non-sampled cells are indicated for the W–E direction in Fig. 2. The gradients between empty cells have been calculated as % of the total mean concentration of 32 µM (without inner estuaries). Along the W–E rows, 6 empty cells are connected north of 54.8° N (Fig. 2, Table 1). Due to the gradient of about 30 µM between sampled cells, corresponding to 94% of the mean in the assessed area, there is no confidence left for these cells. However, the total representativeness of 87% is similar to that of the N–S approach, which results in 97% because there are only two connected empty cells and the confidences of non-sampled cells were mainly affected by interrupted gradients. Finally, the lower confidence should be considered. The percentages of the original representativeness left to empty cells (Table 1, orig.%) were calculated as additional information for estimating the effects of different gradients and numbers of connected empty cells. Gradients of 1.9% reduce the representativeness only to 98%, those of 84.4% to 17%. For a couple of connected empty cells at different interrupted concentrations the representativeness is completely lost.

3.2. Representativeness of vertical profiles

Along the depth profiles the concentrations were often similar (Fig. 3). In the same way as for the confidence scoring of surface areas, the non-sampled depth segments have been indicated, and interrupted gradients have been calculated and converted to % of the

[Figure 2: two map panels of the German Bight, 53.0–55.8°N and 3–10°E, with cell values and marked empty cells]

Fig. 2. TN gradients in the German Bight area (2006–2010) with indicated interrupted gradients (W–E) by not sampled cells: above µM, below % of mean concentrations.

Table 1. Calculation of sampling representativeness of 13 empty cells (N–S and W–E) in the German Bight area. Locations as in Fig. 2.

N–S direction assessed (locations (E) listed: 3.5°, 5.0°, 5.5°, 6.0°, 6.5°, 7.0°, 7.5°)

n*            Gradient%   Repr.%   Orig.%
1             3.2         1.61     96.4
1             6.0         1.57     94.0
1             5.4         1.58     94.6
1             8.9         1.52     91.0
1             3.5         1.61     96.4
1             10.8        1.49     89.2
1             1.9         1.64     98.2
1             2.5         1.63     97.6
1             20.5        1.33     79.6
1             7.3         1.43     85.6
1             7.3         1.43     85.6
1             13.0        1.45     86.8
1             84.4        0.26     15.6
Not sampled   13          18.58
Sampled       47          78.49
Total         60          97.07

W–E direction assessed

Location (N)  n*    Grad.%   Repr.%   Orig.%
55.6°         1     1.3      1.65     98.8
55.2°         1     5.7      1.57     94.0
55.0°         1     1.9      1.64     98.2
54.8°         1     93.6     0.11     6.6
54.8°         2     93.6     −1.4     0
54.8°         3     93.6     −3.0     0
54.8°         3     93.6     −3.0     0
54.8°         2     93.6     −1.4     0
54.8°         1     93.6     0.11     6.6
54.6°         1     46.6     0.89     53.3
54.4°         1     20.3     1.33     79.6
54.4°         1     20.3     1.33     79.6
54.0°         1     23.5     1.28     76.6
53.8°         1     68.8     0.52     31.1
Not sampled   14             10.43
Sampled       46             76.82
Total         60             87.25

Repr. = representativeness, calculated with Eq. (2); Orig.% = % of original representativeness; n* = number of connected empty cells as distance to next sampled cell.
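The W–E totals of Table 1 can be retraced with a few lines. This is a sketch under stated assumptions, not the authors' code: the cell share is taken as the rounded 1.67% used in the table, negative scores are left out of the sum (following the statement in Section 2.1 that such cells do not contribute), and the variable and function names are hypothetical.

```python
CELL_SHARE = 1.67    # % of total area per cell, rounded as in Table 1
SAMPLED_CELLS = 46   # sampled cells in the W-E assessment

# (n, gradient %) for the 14 W-E entries of Table 1, top to bottom
empty_cells = [
    (1, 1.3), (1, 5.7), (1, 1.9),
    (1, 93.6), (2, 93.6), (3, 93.6), (3, 93.6), (2, 93.6), (1, 93.6),
    (1, 46.6), (1, 20.3), (1, 20.3), (1, 23.5), (1, 68.8),
]


def representativeness(orig_rep: float, gradient_pct: float, n: int) -> float:
    """Eq. (2): R = OR - G * n * OR / 100."""
    return orig_rep - gradient_pct * n * orig_rep / 100.0


scores = [representativeness(CELL_SHARE, g, n) for n, g in empty_cells]
# Negative scores are treated as "no contribution" and skipped.
empty_sum = sum(s for s in scores if s > 0)
total = SAMPLED_CELLS * CELL_SHARE + empty_sum
print(round(empty_sum, 2), round(total, 2))
```

With these inputs the sums agree with the 10.43% and 87.25% listed for the W–E direction.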


overall mean of segments at vertical profiles of 22.9 µM during the growing season. The vertical differences between sampled segments were between 0.5% and 18% along the northern transects (Fig. 1) and between 0.4% and 25% along the southern profiles (Fig. 3). The total representativeness for the main vertical profiles, with 93 sampled and 31 non-sampled segments, is 97%.

3.3. Representativeness of time periods

For the observed time period 2006–2010, most samples for TN analyses (416) were taken in 2006, 36% above the mean annual sampling rate of 305; the minimum was measured in 2009, 17% below the mean. The maximum deviation is thus 53%. The mean TN concentrations differ by 4.1% between 2006 and 2009 (Fig. 5). The monthly sampling rate was 127 (n) on average and was highest in February and lowest in April, differing by 109% (Fig. 4). Mean monthly TN concentrations differed between these months by 23% (Fig. 5). From these data the representativeness was calculated (Eq. (3)) for annual sampling as 100 − 53 × 4.1/100 = 97.8% and for monthly sampling as 100 − 109.4 × 23/100 = 74.8%.

3.4. Variability of data

The variability, as standard deviation for the single squares, increased for TN from 10–20% in offshore areas to more than 50% in near-coastal waters (Fig. 6) and reached in some coastal

Fig. 4. Annual and monthly sampling frequencies for TN between 2006 and 2010 (black: n, red: % of means). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

areas >100%, resulting in statistical confidence limits (Eq. (5)) of moderate 19% (Fig. 7) due to higher sampling rates (>30/square)

Fig. 3. Vertical mean TN gradients in the German Bight area, compiled from 15 cruises during growing seasons (2006–2010) as % of the mean (22.9 µM) with gradients across non-sampled sections (data BSH).


100 − 21.27 × 12.4/100 = 100 − 2.64 = 97.4% annual representativeness; monthly: 100 − 47.14 × 29.0/100 = 100 − 13.7 = 86.3% monthly representativeness.

3.5. Significance of deviation from thresholds

The degree of overlapping of SD (%) of recent means and thresholds can be estimated as % of maximum overlapping standard deviations or selected quantiles. In Fig. 8, TN concentrations in the German Bight are shown in a mixing diagram, approaching thresholds in offshore areas (S > 34.5). The thresholds are 50% above natural background concentrations (Topcu et al., 2011). Variability of recent data as SD overlaps with threshold variability between salinities of 25–28 and above 32 in coastal waters, as is also indicated by the 10% quantiles. The confidence of the TN assessment is reduced within these salinity ranges. These results can be considered for final confidence scoring.

3.6. Compiled confidence rating for TN

The combination of confidence ratings, shown here for one parameter only, has been compiled in Table 2. The confidence limits (CL) for a probability of 95%, calculated from standard deviations (SD) of means (Eq. (5)), ranged between 4% and 25% (Table 2). The confidence limits (%) are converted to confidence ranges (CR = 100 − CL) for direct comparison with the representativeness (RE), which shows similar values. The worst score should be transferred to the next step, the combination of parameter scores, e.g. the low representativeness of 87.3% for the surface area sampling. The average confidence range for the salinity regimes is only about 3% higher than the confidence score for the complete area. The confidence ranges of the time periods are below those of the surface area and the salinity regimes. Based on compiled and modelled background data (Topcu et al., 2011), the significance of natural and nearly natural background data for TN as mmol/m² y, correlated with freshwater discharges in L/m² y, reached >99%.
The variability of mixing in the German Bight was considered by the variability of thresholds which is assumed to correspond to the variability as % of recent TN data of 83.7% SD (Fig. 8, Table 2). The confidence range of the latter value would be transferred to a final scoring.

Fig. 5. Mean annual and monthly TN concentration in the German Bight area 2006–2010 with standard deviations indicated by error bars and as standard deviations as % of total means (red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

in near coastal waters. In offshore squares confidence limits remained below 1%, in most squares of coastal waters

Table 2

Row                      Mean (µM)   n      SD%        CL%    CR%
…                        46.4        1524   83.7       4.2    95.8
…                        32.0        60     95.7       24.8   75.2
…                        85.4        141    59.8       10.1   89.9
…                        67.9        127    54.7       9.7    90.3
…                        54.1        643    50.1       4.0    96.0
…                        20.4        528    65.3       5.7    94.3
… 32                     25.6        875    110.5      7.4    92.6
Vertical profiles a
  42 profiles            8.5–199     4–29   8.2–54.1
  Sum/mean of profiles   26.5        42     31         9.6    90.4
  Sum/mean of segments   22.9        124    103.5      9.3    90.7
Time: years              46.8        5      12.4       11.1   88.9
Months                   47.1        12     29.0       16.8   83.2
Background data                      30     0.81 b            >99
Thresholds                           1524   83.7       4.2    95.8

RE%: 87.3 (surface area); 97.0 (vertical profiles); 97.8/97.4 (years); 74.8/86.3 (months).

n = number of samples, SD = standard deviation, CL = confidence limits for 95% probability = 1.96 × SD/√n, CR = confidence range, RE = representativeness.
a Single values.
b R².
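The recommendation to carry the worst score forward can be illustrated with a few values from Table 2. The sketch below is only one possible reading: the grouping of confidence range (CR) and representativeness (RE) per dimension, the choice of the lower of the two RE values for years and months, and all names are assumptions made for this illustration.

```python
from typing import Dict

# CR and RE in %, taken from Table 2; the grouping per dimension is assumed.
scores: Dict[str, Dict[str, float]] = {
    "surface area": {"CR": 95.8, "RE": 87.3},
    "time: years": {"CR": 88.9, "RE": 97.4},
    "time: months": {"CR": 83.2, "RE": 74.8},
}


def worst_score(components: Dict[str, float]) -> float:
    """Return the lowest score among the components of one dimension."""
    return min(components.values())


per_dimension = {name: worst_score(parts) for name, parts in scores.items()}
final_score = min(per_dimension.values())
print(per_dimension)
print(final_score)
```

For the surface area this yields the 87.3% named in Section 3.6 as the score to be transferred to the next step.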


diverge: representative sampling in space does not guarantee confident data if the sampling time was not balanced during the assessed time periods. Since the proposed confidence scoring of time series follows the same method as for space, time series can be included as well, allowing a combined confidence scoring of areas and time periods enclosed by longer-lasting trends for the area.

Sampling strategies should be related to the morphology and hydrodynamics of the assessed area and to the variability and gradients of the assessed parameters. By seasonal screening, main gradients can be identified with a minimum resolution of 10–20% of the area extension, in both N–S and W–E directions. The same holds for vertical gradients during the growing season with stratified waters or freshwater mixing gradients. It is recommended that the spatial resolution of sampling be increased at gradients surpassing 15% of the overall mean concentrations. Sampling frequency should at least allow annual cycles to be quantified, e.g. by monthly sampling. Annual cycles are also required for evaluating the timing of seasonal assessments.

A basic feature of data representativeness in areas and time periods is regular sampling in space and time. For this reason, areas, depth profiles and time periods are divided into identical sections. Each of these sections should contribute not more than 20% to the total assessed area/time period, even if gradients/changes are small in relation to mean values; otherwise the quality of equidistant sampling is difficult to assess. If steep gradients or fast changes have been observed in prior monitoring, the section sizes should be approximately 1%; otherwise “hot spots” could be missed. On the other hand, sections should not be too small: for areas they should be significantly above tidal cycles, and for time periods longer than diurnal variation, otherwise variability caused by short-term physical forcing would dominate.
Basic equidistant scoring can be supplemented by scorings with higher resolution, adapted to steep gradients and fast changing rates (e.g. within river plumes). If there are several maxima/minima in an area/time section, subareas/time periods may be assessed separately, nested within an overall confidence assessment. However, this approach could be limited by diverging scales for the different parameters.

Connected unsampled sections reduce the overall confidence more than the same number of isolated unsampled sections (in time as well as in space), because uncertainty increases with the distance between sampled sections. For simplification, the number of connected empty cells was estimated within parallel lines, e.g. N–S or E–W, an approach also applied by Chang and Wen (2003). Since combinations of empty cells can differ between directions, both should be scored. By transferring the lowest representativeness to the next step of confidence scoring, diagonal gaps are considered to some degree as well.

Since empty cells within subareas with flat gradients do not affect the overall confidence significantly, in contrast to steep interrupted gradients, standardised gradients (as % of overall area means) are used to reduce the original confidence. The relation to means is more significant than a possible relation to min–max distances. The reduction of the original representativeness of single unsampled sections (Table 1, column "orig.%") reflects the interrupted gradients and the distances to the next sampled section, as illustrated by the different representativeness of the directions analysed in the German Bight area (Fig. 2 and Table 1). Since the northern part of the assessed area was dominated by E–W gradients and the south-eastern part by N–S gradients, both directions were scored, also considering the different accumulations of empty sections, which is reflected by diverging results (Table 1). However, since directions of gradients can diverge for different parameters, combined scoring of the standard E–W and N–S directions is recommended.
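Counting connected empty cells along parallel lines in both directions can be sketched as follows. This is an illustration only: the grid, the function names and the convention that rows run E–W and columns run N–S are assumptions, and the sketch stops at counting; it does not reproduce the gradient-based reduction of Table 1.

```python
import numpy as np

def longest_empty_run(line):
    """Length of the longest run of consecutive unsampled (False) cells in a 1-D line."""
    longest = current = 0
    for sampled in line:
        current = 0 if sampled else current + 1
        longest = max(longest, current)
    return longest

def empty_runs_by_direction(sampled):
    """Longest run of empty cells per E-W line (row) and per N-S line (column).

    Assumption: `sampled` is a 2-D grid of booleans (True = section sampled),
    with rows oriented E-W and columns oriented N-S.
    """
    grid = np.asarray(sampled, dtype=bool)
    return {
        "E-W": [longest_empty_run(row) for row in grid],
        "N-S": [longest_empty_run(col) for col in grid.T],
    }

# Hypothetical 3 x 4 grid: 1 = sampled section, 0 = empty section.
grid = [[1, 0, 0, 1],
        [1, 1, 0, 1],
        [0, 1, 0, 1]]
print(empty_runs_by_direction(grid))
```

The third column is empty throughout, which shows up only in the N–S scoring. This is one reason why scoring a single direction can overestimate representativeness.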

The higher representativeness of vertical gradients in comparison to surface gradients (Table 2) indicates the dominance of horizontal gradients: in the mostly shallow coastal waters vertical gradients are less significant due to tidal mixing (Fig. 3), and the vertical sampling distribution was sufficient for TN. In spite of significant differences in annual and monthly sampling rates (Figs. 4 and 5), confidence of sampling during time periods was sufficient as well, due to moderate concentration differences. Confidence of seasonal sampling timing was relatively high for this example because the variation of TN concentrations, representing both inorganic and organic N, is less modified by seasonal cycling than that of chlorophyll a or inorganic nutrients. For parameters with stronger seasonal modification, the representativeness of monthly sampling would provide an additional input for the confidence rating.

Confidence scoring of time periods can be related to the assessed area, subareas, salinity regimes, or hot spots with specified stations. The representativeness of sampling timing will generally decrease with decreasing or diverging sampling rates during identical time steps and with increasing differences of concentrations during these steps. Since mostly several years are combined for assessments, the concentration differences between the years with maximum and minimum annual sampling rates were considered, which provided results similar to the variability-related approach (Table 2). However, if seasonal minima or maxima are missed by sampling, representativeness will be reduced. Since different parameters often show diverging seasonal variation, seasonal assessments covering several years should be based at least on monthly data or, as recommended by Anttila et al. (2012), on fortnightly sampling.
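The comparison of years with maximum and minimum sampling rates, and the check for months missed by sampling, can be sketched as below. All numbers are invented for illustration, the function names are hypothetical, and standardising the difference by the unweighted mean of the annual means is an assumption, not a rule taken from the paper.

```python
def rate_extreme_difference(samples_per_year, mean_per_year):
    """Compare mean concentrations of the years with the highest and lowest sampling rate.

    Returns (year_max_rate, year_min_rate, difference as % of the mean of annual means).
    Assumption: both dicts share the same keys (years).
    """
    years = sorted(samples_per_year)
    y_max = max(years, key=lambda y: samples_per_year[y])
    y_min = min(years, key=lambda y: samples_per_year[y])
    overall = sum(mean_per_year[y] for y in years) / len(years)
    diff_pct = abs(mean_per_year[y_max] - mean_per_year[y_min]) / overall * 100.0
    return y_max, y_min, diff_pct

def missing_months(sampled_months):
    """Months (1-12) without any sample; missed months may hide seasonal minima or maxima."""
    return sorted(set(range(1, 13)) - set(sampled_months))

# Hypothetical sampling counts and annual TN means (arbitrary units).
counts = {2006: 120, 2007: 40, 2008: 80}
means = {2006: 30.0, 2007: 36.0, 2008: 24.0}
print(rate_extreme_difference(counts, means))
print(missing_months([1, 2, 3, 5, 6, 7, 8, 9, 11]))
```

A large percentage difference between the best and the worst sampled year suggests that uneven sampling rates matter for the assessed mean, whereas a small one indicates that they can be tolerated.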
From mean seasonal cycles, calculated from several years, monthly standard deviations can be applied for testing the representativeness of recent data in relation to minimum and maximum sampling rates (e.g. as % of annual means for comparing different years or parameters), considering the variability as well. Confidence of sampling timing restricted to seasons for specific parameters (e.g. inorganic nutrients during winter, chlorophyll a during the growing season) should be supplemented by considering data from the whole year because, apart from climate change, phytoplankton also grows in shallow areas during winter in the temperate zone (Bennekom and Wetsteijn, 1990; Brockmann and Wegner, 1985), affecting the nutrient regime as well. The confidence of time periods and time series could be combined by scoring the representativeness of time sections included within time series, applying deviations from trends, annual means, seasonal cycling, etc., following the nesting principle.

Variability has to be assessed for confidence scoring independently of representativeness, because the latter considers variability only at unsampled sections, through the steepness of gradients/changing rates. Variability can be reduced in many cases by assessing smaller subareas or shorter time periods or, e.g., by relation to seasons or to salinity regimes, if significant mixing diagrams are given (Carstensen, 2007). Means related to salinity regimes would be more informative than means for areas; however, as a first priority, areas have to be assessed, supplemented by assessments of their dominating salinity regimes. Relations to salinity regimes are recommended for time series, reducing hydrodynamic effects on variability. However, salinity regimes should be analysed for consistency of sampling positions, and results should be corrected if salinity/sampling shifts occurred within the regimes.

Different scales for assessed areas, time periods, and hydrodynamic and ecosystem processes, e.g. effects of tidal cycles on phytoplankton growth (Blauw et al., 2012), which require specifically adapted sampling strategies, can be combined by the proposed confidence rating, since the confidence is expressed as %, allowing combinations of different scales weighted by the assessed areas and time periods.
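Because confidence is expressed as a percentage, scores from subareas or time periods of different size can be combined with weights. The sketch below assumes a plain weighted mean with weights proportional to the size of each assessed unit; the function name and the numbers are hypothetical.

```python
def weighted_confidence(confidences_pct, weights):
    """Weighted mean of confidence scores (in %).

    Assumption: weights are proportional to the size of each assessed unit,
    e.g. the area of a subarea or the length of a time period.
    """
    if len(confidences_pct) != len(weights):
        raise ValueError("one weight per confidence score is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(c * w for c, w in zip(confidences_pct, weights)) / total

# Hypothetical subareas covering 50, 30 and 20% of the assessed area.
print(weighted_confidence([80, 60, 90], [50, 30, 20]))  # prints 76.0
```

With this weighting a small, poorly sampled subarea lowers the combined score only slightly, which is why low scores of single subareas are best reported separately as well.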

A general problem for assessing coastal waters is visualised in Fig. 8: the approach of recent concentrations to background and threshold values, which occurs in outer coastal waters (S > 30) in contaminated areas or near-shore in less polluted areas. This overlapping area will move towards the coast with progressing reduction measures, reflecting that "there will always be situations where the actual ecological status cannot be determined with a given confidence, because the confidence interval is overlapping one or several boundaries between quality classes" (Carstensen, 2007). This example indicates that the definition of reduction targets should preferably be related to waters upstream of diluting mixing zones, i.e. to the freshwater of rivers, because the higher significance of differences between recent and statistically validated background data allows more confident assessments (Topcu et al., 2011).

For assessing specific ecosystem processes, such as eutrophication, parameters may be affected by interfering processes, such as poisoning, dredging or seasonal changes. If such "external" interferences affect a parameter significantly within an area or during the assessed time period, this parameter should be scored as absent for eutrophication assessments.

The worst score for time or space confidence should be transferred to the final confidence score calculation because the overall confidence depends on both. If the confidence is low only for some time steps, subareas, salinity ranges, or single parameters, this should be reported additionally and its relation to the overall confidence should be interpreted for the assessed ecosystem. The proposed method for confidence is very moderate, but the similarity of results to statistically defined confidence ranges confirms this approach (Table 2). This method allows a transparent confidence scoring, considering the main factors: even distribution of samples in space and time, and the influence of gradients/changes on the remaining confidence for unsampled sections. This confidence scoring can also be transferred to other chemical descriptors of the MSFD which affect food chain processes. Confidence rating is not only required for assessments but especially needed for the definition of reduction targets and for simulated effects of reduction measures.
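The transfer of the worst score to the final confidence calculation amounts to taking the minimum of the space and time scores. A minimal sketch, with a hypothetical function name and invented values, that also reports which dimension is limiting:

```python
def final_confidence(space_pct, time_pct):
    """Return the final confidence (the worse of the two scores) and the limiting dimension(s)."""
    final = min(space_pct, time_pct)
    limiting = [name
                for name, score in (("space", space_pct), ("time", time_pct))
                if score == final]
    return final, limiting

# Hypothetical scores: spatial sampling is the limiting factor here.
print(final_confidence(72.0, 85.0))  # prints (72.0, ['space'])
```

Reporting the limiting dimension alongside the score indicates whether spatial coverage or sampling timing would have to be improved first.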

Acknowledgements

We thank the data originators, especially the BSH, for providing TN and salinity data. For technical assistance we thank Monika Schütt, and for significant comments two reviewers. Preparation of this publication was supported by the German Federal Environmental Agency Dessau, in the frame of the project "Ecological assessment of coastal and offshore waters concerning eutrophication (WFD, MSFD, OSPAR, HELCOM)", SN: 370925221.

References

Andersen, J.H., Murray, C., Kaartokallio, H., Axe, P., Molvaer, J., 2010. A simple method for confidence rating of eutrophication status classification. Mar. Pollut. Bull. 60, 919–924.
Anttila, S., Ketola, M., Vakkilainen, K., Kairesalo, T., 2012. Assessing temporal representativeness of water quality monitoring data. J. Environ. Monit. 14, 589–595.
Bennekom, A.J. van, Wetsteijn, F.J., 1990. The winter distribution of nutrients in the southern bight of the North Sea (1961–1978) and in the estuaries of the Scheldt and the Rhine/Meuse. Netherlands J. Sea Res. 25, 75–87.
Blauw, A.N., Beninca, E., Laane, R.W.P.M., Greenwood, N., Huisman, J., 2012. Dancing with the tides: fluctuations of coastal phytoplankton orchestrated by different oscillatory modes of tidal cycle. PLOS ONE 7 (11), 14pp.
Brockmann, U.H., Wegner, G., 1985. Hydrography, nutrient and chlorophyll distribution in the North Sea in February 1984. Arch. Fischereiw. 36, 27–45.
Carstensen, J., 2007. Statistical principles for ecological status classification of Water Framework Directive monitoring data. Mar. Pollut. Bull. 55, 3–15.
Chang, Y.-H., Wen, L.-S., 2003. Sampling representativeness of oceanographic surveys using ship-based instruments. J. Coastal Res. 19, 997–1010.
EC, 2003. Guidance on monitoring for the Water Framework Directive, final version 23.1.2003. Brussels. 164pp.
MacLeod, C.D., 2010. Habitat representativeness score (HRS): a novel concept for objectively assessing the suitability of survey coverage for modelling distribution of marine species. J. Mar. Biol. Ass. UK 90, 1269–1277.
Nardelli, B.B., 2012. A novel approach for the high-resolution interpolation of in situ sea surface salinity. J. Atmos. Oceanic Technol. 29, 867–879.
Patil, G.P., 2011. Composite sampling: a novel method to accomplish observational economy in environmental studies: a monograph introduction. Environ. Ecol. Stat. 18, 385–392.
Topcu, D., Brockmann, U., Claussen, U., 2009. Relationship between eutrophication reference conditions and boundary settings considering OSPAR recommendations and the Water Framework Directive – examples from the German Bight. Hydrobiologia 629, 91–106.
Topcu, D., Behrendt, H., Brockmann, U., Claussen, U., 2011. Natural background concentrations of nutrients in the German Bight area (North Sea). Environ. Monit. Assess. 174, 361–388.
