A DATABASE FOR A MULTI-INSTITUTIONAL ENVIRONMENT MONITORING PROGRAM

MARCELO JUANICO

Center for Environmental and Water Resources Engineering, Technion-Israel Institute of Technology, 32000 Haifa, Israel

(Received March 1988)

Abstract. A several-year program is underway in Israel for monitoring a wastewater reclamation and storage complex for agricultural irrigation. The program covers wastewater treatment and storage, irrigation, aquifers, air pollution, crops, soil, geography and meteorology, and is operated by one Principal Institution and several Participant Institutions. The institutions feed their data to a central database. Multi-institutional databases are fed by research teams whose areas of activity are traditionally unrelated. The main problem of the Database Administrator consists in combining data created by different methods under different basic assumptions, and in making them available to investigators from several disciplines and institutions who are used to different data analysis procedures. The Database Administrator's background should cover both computer and environmental sciences. The selection of flexible multipurpose software is recommended, but there are several decisions on database design that depend not on the software but on the characteristics and requirements of the Monitoring Program. Entering the data from different sources in independent files facilitates input, debugging, administrative control of the contribution of each institution, and changes in units and parameters. This multi-file design also overcomes difficulties due to excessive file size, and better matches the actual usage of the database in a multi-institutional program. Entering the data with their original characteristics and units also better matches the database usage. Ready-to-use routines for file merging and unit conversion, and debugging and documentation needs, are also discussed. The Database-User interactions are the mechanism that maintains an evolving dynamic Database.

Introduction

Owing to the increasingly interdisciplinary character of the environmental sciences, environment monitoring programs which involve several institutions are gradually replacing single-institutional ones. Today it is almost impossible to conceive an environment monitoring program without computer support, and the design of a database for a multi-institutional program is a relatively new topic. Giavelli and Rossi (1984) raised some problems related to this kind of database. Multi-institutional databases are fed by research teams whose areas of activity are traditionally unrelated. Thus, the main problem consists in combining data created by different methods under different basic assumptions, and in making those data available to investigators from several disciplines and institutions who are used to different data analysis procedures. In practice, this means that highly heterogeneous data must be handled and processed together in an efficient way without losing contact with the several theoretical and practical frameworks within which they were generated. The present paper, in describing the design of a database for a multi-institutional environment monitoring program operated in Israel, develops further the points addressed by Giavelli and Rossi and raises several new ones.
Environmental Monitoring and Assessment 12: 181-190, 1989. © 1989 Kluwer Academic Publishers. Printed in the Netherlands.


The Monitored Complex

The Kishon Complex is one of the largest wastewater reclamation and storage systems for agricultural irrigation in Israel. The municipal sewage of Haifa (about 15 million cubic meters per year) is treated by parallel trickling filters and activated sludge. This effluent is chlorinated and pumped through a 28 km long pipeline as the only inflow to a Stabilization Reservoir. The Stabilization Reservoir has a storage capacity of 12 million cubic meters, and is divided into two cells by an embankment which prevents short-circuiting and facilitates operation. Here a significant reduction takes place in the concentration of major effluent constituents such as COD, Suspended Solids, P and N. The outflow from this reservoir is then chlorinated again and discharged to an adjacent Operational Reservoir. The Operational Reservoir has a storage capacity of 8 million cubic meters and is used for storage and operation. It receives waters from several sources, the Haifa effluents not exceeding one third of total intake at any time. From the Operational Reservoir the water is supplied either directly to the irrigation network or to other, smaller peripheral reservoirs. Several settlements of different sizes in the area maintain their own oxidation ponds and small stabilization reservoirs for their local effluents. The main irrigated crop in the area is cotton. There are more than 70 irrigation and fresh water wells and springs in the area, and aquifer resources are exploited to their maximum capacity.

The Monitoring Program

The Monitoring Program covers a period of several years and is intended for medium- and long-term monitoring, not real-time monitoring. The project was initiated with a view to:
(1) Obtaining a general picture of the environmental conditions in the area in the year prior to commissioning of the complex, and monitoring any changes over the following years.
(2) Controlling the operation of the wastewater treatment system and its ability to remove organic matter, pathogens, heavy metals, detergents, etc.
(3) Monitoring possible development of anaerobic conditions and malodours as a result of storing wastewater effluents in large reservoirs.
(4) Monitoring the effect of surface storage of wastewater effluents, and of effluent irrigation, on the aquifer.
(5) Monitoring the effect of effluent irrigation on soils and crops.

The program entails coordinated activity of several Participant Institutions - the Water Supply Company, Ministry of Health, the Haifa Municipality, Soil Extension Service, Agriculture Research Station, etc. - with the Technion-Israel Institute of Technology as Principal Institution. The Participant Institutions conduct field and laboratory work in the areas of their specialization and forward the data to the Technion. A considerable part of these data are not specifically intended for the monitoring program, but are a by-product of other projects that the Participant Institutions conduct in the area.


The Database Administrator

As noted by Giavelli and Rossi (1984), the build-up and maintenance of a complex database requires a Database Administrator. The complexity of the data may preclude researchers with a poor background in programming from making direct use of the database. The Database Administrator and a simple structure of the Database are key factors in making the database accessible to many users. In the case of the Kishon Complex Monitoring Program the Database Administrator is also in charge of the programming work and statistical advice. This may be a common situation where the size of the project does not justify a Database Administrator plus a Programmer plus a Consultant on Statistics. Knowledge of environmental sciences and actual involvement in the monitoring program were also required of the Database Administrator. Several problems of data standardization and error debugging can be resolved only with a full understanding of the meaning of the data.

Considerations for Database Design

(1) Data characteristics

Figure 1 shows the areas typically covered by an inter-disciplinary environment monitoring program. In a multi-institutional situation each subject is covered by one or two institutions which coordinate their work with the Principal Institution but are administratively, financially and technically independent.

Fig. 1. Areas typically covered by an inter-disciplinary environment monitoring program: sewage and wastewater quality; operational data (flows, volumes, water balances); air pollution; aquifer water quality; geography and geology; crop chemistry; soil chemistry; meteorology.

The complexity of data production and handling in a multi-institutional program can be better understood by describing a typical sampling for water quality (Figure 2): Field data (temperature, Dissolved Oxygen, pH, electrical conductivity, turbidity) are written in the field on one form. A sample of water is then taken, and subsamples are sent to three laboratories: Chemistry (32 parameters), Microbiology (3 parameters) and Planktology (algae and main zooplankton groups); each laboratory fills in and forwards a different form with the results of its analyses. Another form, containing operational data at the sampling point, is filled in by an independent team. The same day, air quality (malodours, sulphides) may be checked in the vicinity of the sampling point by still another team, which fills in still another form. Thus, six forms related to the same sampling point and time may be received by the Database Administrator. These forms do not arrive together at the Principal Institution, but generally over a period of one month. Moreover, a parallel sub-sample may be sent to another chemical laboratory for precision control, with one more form as a consequence.

Fig. 2. Data production and handling in a multi-institutional environment monitoring program.

At the design stage the following characteristics of the data defined the initial structure of the Database:
(-) Several categories of data come from different institutions.
(-) Different institutions may work with similar but not always equivalent parameters (e.g., PO₄ expressed as mg l⁻¹ of P, and Total P expressed as mg l⁻¹ of PO₄).


(-) Even the same parameter, coming from different sources, may be written in different units. There may also be actual differences among values due to analytical precision, accuracy or errors. This problem is outside the control of the Monitoring Program when data are a by-product of other programs.
(-) Several analyses made on the same water sample by different laboratories will arrive at different times at the Principal Institution and Database.
(-) Changes in data characteristics can be expected in the course of a long-term project, due to changes in the independent programs supplying part of the data.
(-) New data requirements by the Monitoring Program itself can also be expected, as experience indicates shortcomings in the original project or as unexpected environmental problems arise.

(2) Available software

Differences between commercial and engineering data management systems were noted as early as the fifties. Twenty years ago, these differences opened two new areas of research: definition, standardization, exchange and dissemination of scientific and technological data, addressed by CODATA and other international agencies (Rossmassler and Watson, 1980); and development of FORTRAN-based information processing systems for engineering purposes, such as the Integrated Civil Engineering System (ICES) (Schumacker, 1967). Ten years ago, ICES and similar data management systems still predominated in the area of Civil Engineering, although a strong tendency for conversion from multi-purpose systems to specialized ones was already noticeable (Emkin and Prichard, 1975; Wells and Logcher, 1975). Today, a Database Administrator evaluating appropriate software has several options:
(-) Highly specialized packages derived from ICES and the like, such as those cited by Rango et al. (1983) and James and Unal (1984) for hydraulics/hydrology.
(-) General database management systems based on Data Description Languages like the CODASYL DDL and the IBM DL/I (Martin, 1976; Wiederhold, 1980). These systems are oriented to commercial uses and are too complex and large for application in small and medium size databases.
(-) A relatively new category of software such as Statistical Analysis System (SAS), Statistical Package for Social Sciences (SPSS), STATGRAPH for small databases in personal computers, and others. These packages originated as restricted packages for statistical analysis but soon developed into data handling and analysis systems; they are multi-purpose and highly flexible, and thus optimal for an environment monitoring program like that of the Kishon Complex.

The Kishon Complex database was implemented on an IBM/3081D mainframe. Statistical Analysis System (SAS) was selected as the main software for database management and data processing. This software is used as:
(-) Data descriptor language.
(-) Data management system.


(-) Programming language for data and file handling and processing.
(-) Statistics, graphics and reporting packages.
(-) Database interrogation language.

Database Design

There are several decisions in Database design that depend not on the selected software but on the characteristics and requirements of the Monitoring Program, and on the Database Administrator's logical conception (schema). Two opposite solutions were considered among several possible options (Figure 3): merging all data in a massive standardized file, or entering data from each form in a separate file maintaining their original characteristics. The second solution was adopted, and the seven possible forms of the example above (Figure 2) were entered in seven corresponding files. This set-up implies that processing of data from two or more files must be preceded by pre-processing in the form of merging, and sometimes by a standardization of units as well. Such a design runs counter to the recommendations of conventional database design manuals, but proved very efficient for our multi-institutional monitoring program, as explained below:

(1) Input

The forms are delivered by the Participant Institutions at different times. Separation of the files facilitates input and debugging, checking of one laboratory against another, and administrative control of the contribution of each institution. As each form maintains its independence within the computer, any technical or administrative problem is readily located and resolved without affecting the other data. Besides, a laboratory may change the methods it uses to analyze some parameters. These changes are outside the control of the Monitoring Program when the data are a by-product of other projects. New kinds of data may also be required by the Monitoring Program itself. All these changes are much easier to implement in separate files than in a massive common one.

(2) Merging routines

The problem of multi-file storage is easily overcome with the aid of merging routines. A ready-to-use set of those routines permits quick pre-processing for any cross-file job, so that only the files needed for the specific job are handled. Unit conversion is optional but available.

(3) File size

After one year of work the volume of available data was already large, indicating that after two or three years the size of a massive file containing all the data would lead to several computer problems, even on a big mainframe. As file size grows, data handling and processing require more memory and time, overflowing the default values of the computer system and calling for continuous re-definition of the Virtual Machine. In batch work, jobs would receive lower priority. And on a personal computer, handling and storage of a massive file may become impossible.
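The merging step described under (2) can be illustrated with a minimal sketch. The Kishon database itself used SAS; the Python below is only an illustration, and the file names, variable names, pointer variables (`point`, `date`) and values are all hypothetical. Only the files requested for a job are read, and each variable is prefixed with its file name so that similar parameters from different laboratories are not silently mixed.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def merge_files(files: Dict[str, List[Record]],
                wanted: List[str],
                keys: Tuple[str, str] = ("point", "date")) -> List[Record]:
    """Outer-merge the requested files on the shared pointer variables."""
    merged: Dict[tuple, Record] = {}
    for name in wanted:                       # only the files this job needs
        for rec in files[name]:
            key = (rec[keys[0]], rec[keys[1]])
            # Create the merged row the first time this pointer is seen.
            row = merged.setdefault(key, {keys[0]: key[0], keys[1]: key[1]})
            for var, value in rec.items():
                if var not in keys:
                    row[f"{name}.{var}"] = value   # keep the source file visible
    return [merged[k] for k in sorted(merged)]


# Hypothetical contents of three separate files (one per original form).
files = {
    "field": [
        {"point": "SR1", "date": "1987-03-02", "temp_c": 14.5, "ph": 8.1},
    ],
    "chemistry": [
        {"point": "SR1", "date": "1987-03-02", "cod_mg_l": 120.0},
        {"point": "OR1", "date": "1987-03-02", "cod_mg_l": 85.0},
    ],
    "microbiology": [
        {"point": "SR1", "date": "1987-03-02", "fecal_coli": 900},
    ],
}

# A two-file job: the microbiology file is never touched.
rows = merge_files(files, ["field", "chemistry"])
for row in rows:
    print(row)
```

Observations present in only one of the files are kept with the other variables simply absent, which matches the situation in which forms for the same sample arrive weeks apart.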


Fig. 3. Two options for the design of the Database of a multi-institutional environment monitoring program (panels: Option I and Option II).

(4) Database use

The way the data are analyzed is very important in determining the structure of the database. Our experience shows that one-file jobs (corresponding to one original form) are the most common request received by the Database Administrator and constitute the bulk of computer work. Some jobs require two or three files, and very few jobs more than three. The one-file jobs are usually requested by individual specialists and concern short- and medium-term control. The cross-file jobs usually involve more than one specialist and generally refer to global long-term relationships.

(5) Units

Most jobs are requested in the original units in which the data were created. Individual investigators from different institutions generally request outputs on the data produced by their own institutions and in the original units to which they are accustomed. On the other hand, cross-file jobs may request units which are not the original ones of any of the files involved - e.g., nitrates are determined as mg NO₃ l⁻¹ in some laboratories, and as mg N-NO₃ l⁻¹ in others. Some cross-file jobs requested that nitrates be expressed as meq l⁻¹. Thus, a priori standardization of units would not make for higher efficiency of the database. Instead, ready-to-use routines for unit conversion at the pre-processing step are more useful.
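The nitrate example can be made concrete with a small conversion routine. This is a sketch in Python rather than the SAS routines actually used; the unit labels and function name are hypothetical, and the atomic masses (N = 14.007, O = 15.999 g/mol) are standard rounded values. Because the nitrate ion carries a single charge, 1 mmol l⁻¹ equals 1 meq l⁻¹, so every unit is first converted to meq l⁻¹ and then to the requested unit.

```python
N_MASS = 14.007                   # g/mol, nitrogen
O_MASS = 15.999                   # g/mol, oxygen
NO3_MASS = N_MASS + 3 * O_MASS    # g/mol, nitrate ion (about 62.004)

# Factor that turns one unit of each kind into meq/l of nitrate
# (mmol/l and meq/l coincide because nitrate is monovalent).
TO_MEQ = {
    "mg NO3/l": 1.0 / NO3_MASS,    # laboratories reporting the whole ion
    "mg N-NO3/l": 1.0 / N_MASS,    # laboratories reporting nitrate nitrogen
    "meq/l": 1.0,
}


def convert_nitrate(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a nitrate concentration between the three unit systems."""
    return value * TO_MEQ[from_unit] / TO_MEQ[to_unit]


# 10 mg/l of nitrate nitrogen expressed as the whole ion and as meq/l.
as_ion = convert_nitrate(10.0, "mg N-NO3/l", "mg NO3/l")
as_meq = convert_nitrate(10.0, "mg N-NO3/l", "meq/l")
print(round(as_ion, 2), round(as_meq, 3))
```

Keeping the stored values in their original units and applying such a routine only at the pre-processing step means that a new requested unit needs one more entry in the table, not a re-entry of the data.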

Debugging, Utilities and Documentation

Debugging routines are designed to check for transcription errors, laboratory errors, database consistency, actual changes in parameter values, and alterations of the sampling plan. Wrong values are detected by comparing the received values with expected ones. Database consistency is maintained by checking that no pointers are missing, that no observations are repeated, and that the pointers of observations in different files correspond to one another. Implementation of the sampling plan is controlled by running the plan through the computer and comparing it with the actual data entered in the database.

Routines for data input, debugging, merging, unit conversion and regular output are very important utilities. Having this pre-planned work in ready-to-use form has three advantages: it relieves personnel of routine work, leaving more time for new programming problems; it permits quicker output of the regular work when desired; and it provides regular output for reporting in a standard format.

Documentation is vital in any database. In the case of a multi-institutional program it is even more important, because of the heterogeneity of the data, the different criteria used to create them, and the physical separation of the Database Administrator from the Participant Institutions. The contents of each file have to be clearly described and readily accessible to the operator. Documentation should include not only the names and units of the variables, but all available information on the sampling, measurement and analytical methods employed to create the data, as well as any useful comment on the meaning, use, or possible sources of error of the variables. Routines, utilities, and any variation or change have to be thoroughly documented as well.
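The debugging checks described above can be sketched as follows. This is an illustrative Python version, not the routines used in the Kishon database; the pointer variables (`point`, `date`), the expected ranges and the sample records are hypothetical. One function flags missing pointers, repeated observations and values outside their expected range; a second compares the sampling plan with the observations actually entered.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, object]
Pointer = Tuple[str, str]

# Hypothetical expected ranges used to flag suspicious values.
EXPECTED_RANGES = {"ph": (5.0, 10.0), "temp_c": (0.0, 40.0)}


def check_file(records: List[Record],
               keys: Tuple[str, str] = ("point", "date"),
               ranges: Dict[str, Tuple[float, float]] = EXPECTED_RANGES) -> List[str]:
    """Report missing pointers, repeated observations and out-of-range values."""
    problems: List[str] = []
    seen: Counter = Counter()
    for i, rec in enumerate(records):
        pointer = tuple(rec.get(k) for k in keys)
        if any(p in (None, "") for p in pointer):
            problems.append(f"record {i}: missing pointer {pointer}")
            continue
        seen[pointer] += 1
        for var, (low, high) in ranges.items():
            value = rec.get(var)
            if value is not None and not (low <= value <= high):
                problems.append(
                    f"record {i}: {var}={value} outside expected range {low}-{high}")
    for pointer, count in seen.items():
        if count > 1:
            problems.append(f"pointer {pointer} repeated {count} times")
    return problems


def check_sampling_plan(plan: List[Pointer],
                        records: List[Record],
                        keys: Tuple[str, str] = ("point", "date")):
    """Return (planned but not entered, entered but not planned)."""
    planned = set(plan)
    actual = {tuple(rec[k] for k in keys)
              for rec in records if all(rec.get(k) for k in keys)}
    return sorted(planned - actual), sorted(actual - planned)


records = [
    {"point": "SR1", "date": "1987-03-02", "ph": 8.1, "temp_c": 14.5},
    {"point": "SR1", "date": "1987-03-02", "ph": 8.3, "temp_c": 14.6},   # repeated
    {"point": "OR1", "date": "1987-03-02", "ph": 81.0, "temp_c": 15.0},  # misplaced decimal
    {"point": "", "date": "1987-03-09", "ph": 7.9, "temp_c": 13.0},      # no sampling point
]
plan = [("SR1", "1987-03-02"), ("OR1", "1987-03-02"), ("SR1", "1987-03-09")]

problems = check_file(records)
missing, unplanned = check_sampling_plan(plan, records)
for line in problems:
    print(line)
print("missing:", missing, "unplanned:", unplanned)
```

A value outside the expected range is only flagged, not corrected: as the text notes, it may reflect an actual change in the parameter, and deciding which requires an understanding of what the data mean.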

Database Design Evolution

The importance and success of a Database is measured not by the amount of data that flows in from the Monitoring Program to the Database, but by the amount (and quality) of information that flows out from the Database to the Monitoring Program. Accordingly, the design of the Database requires periodic adjustments to fit the evolving needs of the Monitoring Program. The Database feeds back data to the user in the form of processed information. The user feeds back new processing needs to the Database in the form of further jobs, questions, doubts, claims, etc. These interactions are the mechanism that allows the Database Administrator to define the adjustments necessary to maintain an evolving dynamic Database.

Conclusions

From the Kishon Complex Monitoring Program, the following conclusions can be drawn on database design for multi-institutional monitoring:


(-) The main problem of the Database Administrator consists in combining data created with different methods, under different basic assumptions, to be analyzed by investigators from different institutions and disciplines.
(-) The Database Administrator's professional knowledge should combine both computer and environmental sciences.
(-) Selection of flexible multipurpose software will release the Database Administrator from a lot of formal computer work. Still, there are several decisions on database design which depend not on the software, but on the characteristics and requirements of the Monitoring Program, and on the Database Administrator's logical conception.
(-) Entering data from different sources in separate files facilitates input, debugging, administrative control of the contribution of each institution, changes in parameters and conversion of units. This design also overcomes problems related to excessive file size, and is better suited for the actual use of the database in a multi-institutional program.
(-) Entering the data with their original characteristics (not standardized) is also preferable from the above viewpoint. Original units should be maintained even when two laboratories measure the same parameter in different units.
(-) Ready-to-use routines for file merging and unit conversion are necessary for cross-file jobs.
(-) Debugging routines are useful in checking certain kinds of errors and database consistency, and in warning in cases of actual changes in parameter values.
(-) Utilities for regular jobs and detailed documentation are essential.
(-) The Database-User interactions are the mechanism to maintain an evolving dynamic Database.

Acknowledgement

This paper was written within the framework of the Kishon Complex Monitoring Program, sponsored by Mekorot Water Co. of Israel.

References

Emkin, L. and Prichard, M.: 1975, 'Data Handling in Civil Engineering Systems', in Data Handling Techniques in Civil Engineering, Preprints ASCE National Structural Engineering Convention, New Orleans, 33 pp.
Giavelli, G. and Rossi, O.: 1984, 'Some Problems Related to an Environmental Factual Database: The Case of the Aeolian Project', Intern. J. Environmental Studies 22, 217-224.
James, W. and Unal, A.: 1984, 'Evolving Data Processing Environment for Computational Hydraulic Systems', Can. J. Civil Eng. 11, 187-195.
Martin, J.: 1976, Principles of Data-base Management, Prentice-Hall Inc., New Jersey, 352 pp.
Rango, A., Feldman, A., George, T., and Ragan, R.: 1983, 'Effective Use of LANDSAT Data in Hydraulic Models', Water Res. Bull. 19 (2), 165-174.
Rossmassler, S. and Watson, D. (eds.): 1980, Data Handling for Science and Technology, CODATA and UNESCO, North Holland Publ., 184 pp.


Schumacker, B.: 1967, 'An Introduction to ICES', Massachusetts Institute of Technology, Department of Civil Engineering Research Report R67-47, 19 pp.
Wells, R. and Logcher, R.: 1975, 'Data Management for Integrated Design and Management Systems', in Data Handling Techniques in Civil Engineering, Preprints ASCE National Structural Engineering Convention, New Orleans, 16 pp.
Wiederhold, G.: 1981, Database Design, McGraw-Hill, 658 pp.
