Software Support for Huntington's Disease Research

P. Michael Conneally, Ph.D., Medical and Molecular Genetics, Indiana University
John M. Gersting, Ph.D., Computer and Information Sciences, Indiana University
Jacqueline M. Gray, Medical and Molecular Genetics, Indiana University
Keith Beidleman, Medical and Molecular Genetics, Indiana University
Nancy S. Wexler, Ph.D., Hereditary Disease Foundation
Carol L. Smith, Medical and Molecular Genetics, Indiana University

Huntington's disease (HD) is a hereditary disorder involving the central nervous system. Its effects are devastating to the affected person as well as to his family. The Department of Medical and Molecular Genetics at Indiana University (IU) plays an integral part in Huntington's research by providing computerized repositories of HD family information for researchers and families. The National Huntington's Disease Research Roster, founded in 1979 at IU, and the Huntington's Disease in Venezuela Project database contain information that has proven to be invaluable in the worldwide field of HD research. This paper addresses the types of information stored in each database, the pedigree database program (MEGADATS) used to manage the data, and significant findings that have resulted from access to the data.

Key Words: Huntington's disease, human pedigree, MEGADATS, database

Introduction

Huntington's disease is an autosomal dominant disorder involving the central nervous system. It is pathologically characterized by a loss of cells in the caudate nucleus and putamen, a decrease in the level of neurotransmitters and associated enzymes, and abnormalities in some receptor sites. Outward symptoms of the disease include progressive chorea and dementia, generally typified in late-onset cases, and debilitating rigidity, typified in juvenile cases. HD has been described effectively as "genetically programmed cell death in the human central nervous system."[2] Inherent in the genetically linked transmission of HD is the fact that each offspring of an affected individual has a fifty-percent chance of inheriting the HD gene. Until an offspring of an affected person experiences signs or symptoms of HD and is diagnosed, his status is termed "at risk" for carrying the HD gene. HD causes severe emotional and social distress to all members of an affected family. "Huntington's disease is a family disease. Every member of the family is affected - emotionally, physically, socially - whether patient, at risk, or spouse. And the disease occurs not once, but over and over again in successive generations."[3]

The Department of Medical and Molecular Genetics at Indiana University is involved in two main areas of Huntington's disease (HD) research: the National Huntington's Disease Research Roster and the Huntington's Disease in Venezuela Project. A major factor in the continued success of these projects is the computerization of family pedigree information using the pedigree database program Medical Genetics Acquisition and Data Transfer System (MEGADATS). MEGADATS was developed in the department in 1975 to support applications like the HD project, and since that time has been updated and enhanced periodically. This program incorporates all processes involved in managing family databases, including entering, storing, and plotting each individual family, as well as providing mechanisms by which information about a database as a whole can be analyzed.[1]

0195-4210/91/$5.00 © 1992 AMIA, Inc.

HD Roster

The Huntington's Disease Research Roster was established in September 1979 at the Department of Medical and Molecular Genetics, Indiana University (IU). Its primary purpose is to aid research in HD by acting as a national repository of HD-family pedigree information. This information is made available to researchers as well as to the HD families themselves. Since its inception, the Roster database has grown from 90 families in 1980 to 643 families in 1984, 1,523 families in 1988, and 1,687 families in 1991. Currently, the database contains almost 92,000 individuals, of which over 9,400 (10.2%) are affected and almost 15,000 (16.3%) are living at risk. Information about individuals stored in the database includes name, dates of birth and death, sex, HD status and shading, cause of death, age at onset of symptoms, and general notes. HD shading provides a visual indicator on the pedigree plot to illustrate each person's HD status at a glance: a half-shaded symbol represents an affected, diagnosed individual (HD status is "HDX"), a one-quarter-shaded symbol represents a possibly-affected individual ("PHD") or a possible gene carrier ("PGC"), and a non-shaded symbol represents either an at-risk ("AR") or a no-risk individual (no HD status). To define each individual's HD status more completely on the pedigree plot, the corresponding HD status information is printed in the text block associated with each symbol (see Figure 1). To aid the pedigree entry process of assigning the current HD status to at-risk individuals in each family, facilities have been developed using MEGADATS to automatically calculate and assign the proper HD status and shading for all such individuals based on the affected, possibly-affected, and probable-gene-carrier persons in each pedigree. This automation eliminates the possibility of human error in assigning the proper at-risk status for non-affected individuals.

Figure 1 - Sample HD Pedigree Plot
[Plot not reproduced. The sample pedigree shows numbered symbols for JOHN SMITH, MARY SMITH, JANE, and JACK SMITH, each with a text block giving dates and, where applicable, an HD status of HDX, PHD, or AR.]

Cause of death information is coded using standard cause of death classifications as defined by the International Classification of Diseases, Clinical Modification (ICD-9-CM).[4] This standardization was designed for the classification of mortality information for statistical purposes and provides very specific information about causes of death. Other relevant individual or family information not assigned to a specific data field is stored in a special notes field (e.g., "alcoholic," "schizophrenic," "drug addict"). In addition to pedigree information, data from a questionnaire for affected individuals is stored in a MEGADATS database. Presently, this database contains over 1,900 entries. These data include socio-economic, medical, clinical, social, and psychiatric information about affected individuals. Information contained in the affected questionnaire data files is easily correlated with the associated family history data in the pedigree files by the use of the family and individual numbers assigned by the pedigree input process of MEGADATS. This cross-referencing capability between the two databases enhances the types of queries that can be performed on both data sets. Not only can specific data be extracted from the pedigree information alone (e.g., "extract all living, at-risk individuals with affected mothers") or just from the affected questionnaire data set (e.g., "analyze first symptoms noted by affecteds with age at diagnosis of less than 40"), but more complex inquiries can be made using data from both databases (e.g., "analyze first symptoms noted by affecteds with affected mothers").

Due to the particularly sensitive information contained in the HD Roster, security is a key issue in maintaining the integrity of the database. Confidentiality of the patient data is preserved at all times, and names of individuals are never included in information or statistics released from the roster. If a researcher needing volunteers requests such information, roster personnel first contact potential families with details about the project. The family then completes a provided postage-paid response card, indicating its willingness to participate in the project. Finally, those families that agree to take part in the study are referred to the researcher. To ensure that they are appropriate and ethical, all requests from investigators are screened by a committee of the roster. Upon joining the roster, the contact persons for each family are required to sign an informed consent form which indicates that participation is voluntary and which outlines the aforementioned security procedures. In addition, a separate informed consent is required for each additional project in which the family participates.

Venezuela HD Project

Residing in and around Maracaibo, Venezuela, is the largest known family with Huntington's disease in the world.[5] A team of researchers travels once a year to Venezuela to gather updated pedigree information, collect blood and tissue samples from at-risk individuals, and perform neurological and psychological tests on members of the Venezuelan kindred. IU is responsible for maintaining most of the computerized information in the Venezuelan data set, including the family pedigree, tissue sample and cell line status, and genotyping (paternity) data. MEGADATS is used to store all of this information and to produce, each year prior to the researchers' trip, the Venezuelan pedigree plot (currently over 700 pages in its entirety), specific computer listings of the family information (e.g., all individuals listed in alphabetical order, only living at-risk individuals, locations of individuals), and labels (to be used for blood tubes or exam forms). The flexibility of the MEGADATS program makes it easy to modify the database process as required from year to year.

Recent statistics from the Venezuelan database indicate over 10,800 individuals in the family, 382 (3.5%) of whom are affected and 1,070 (9.9%) of whom are living at risk. To address the problems related to the enormity of the Venezuelan pedigree and the large number of instances of consanguinity in the family, the database is divided into 283 individual family files, all of which are contained in one large database. These files consist of the following: one large family file, the "base" file, with over 7,000 individuals who comprise the bulk of the HD-affected family; another large family file containing a "skeleton" of the entire family pedigree, illustrating only critical HD lineage within the family; and the remaining smaller family files representing all married-in individuals who are not genetically linked to the base family. As a result of this separation of the pedigree, many individuals appear on more than one page in the pedigree plot. To ease the burden of referencing a particular individual who appears more than once in the pedigree (e.g., with his own parents and with his mate's family), each individual is assigned a unique code number. Then, in the pedigree-plotting process, this code number is used to compile a cross-reference list of all the pages on which each individual is located, and that information is printed with each occurrence of the individual on the plot. The cross-reference information for each individual is also included in all of the computer listings and printed labels corresponding to the pedigree. Although MEGADATS is capable of accommodating a pedigree of the magnitude of the Venezuelan family in just one file, this technique of splitting the pedigree into several files and assigning a code to each individual allows for more accuracy in updating and modifying the pedigree. It also makes the hardcopy pedigree and its associated lists easier to use.
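The page cross-reference described above can be illustrated with a short sketch. This is not MEGADATS code (the system is written in FORTRAN); the function names, the sample code numbers, and the `(code, page)` input format are hypothetical and chosen only to show the idea of compiling, for each unique code number, the list of plot pages on which that individual appears.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_cross_reference(placements: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each individual's code number to the sorted pages where it is plotted.

    `placements` is a hypothetical stream of (code_number, page_number) pairs,
    one pair per symbol drawn on the pedigree plot.
    """
    pages_by_code = defaultdict(set)
    for code, page in placements:
        pages_by_code[code].add(page)  # a set ignores repeated placements
    return {code: sorted(pages) for code, pages in pages_by_code.items()}


def other_pages(xref: Dict[int, List[int]], code: int, current_page: int) -> List[int]:
    """Pages to print next to a symbol: every other page showing the same person."""
    return [page for page in xref.get(code, []) if page != current_page]


# Individual 1001 appears with his parents on page 12 and with his mate's family on page 340.
placements = [(1001, 12), (1002, 12), (1001, 340), (1001, 12)]
xref = build_cross_reference(placements)
```

With such a table, the note printed beside individual 1001 on page 12 would list page 340, and the same table could feed the listings and labels.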

Applications Software

The original design objective of MEGADATS was to produce a portable, production-level platform to support human pedigree processing regardless of the size or number of pedigrees involved. The MEGADATS implementation process began in 1975. MEGADATS-2 was completed in the early 1980s.[6] Since that time, MEGADATS-3 has been completed and is the most current version of the database program. Development work on the program continues, driven by special needs for particular studies and technological improvements in software and hardware. MEGADATS utilizes a line-oriented, command-driven user interface. The pedigree entry processes are semi-automated and involve a logical sequence of user prompts which provide a simple data-entry mechanism for even the occasional computer user. At the same time, MEGADATS provides several data-manipulation procedures that make possible very complex data searches and extractions. Security of the database files is maintained by the program through the use of binary-formatted data files and required password logon sequences.

Plotting pedigrees is one of the most important features of MEGADATS and has evolved more than any other part of the system. Originally (circa 1975) the only plotting device available was the line printer. This device was soon replaced by the Tektronix 4662 plotter (single 11" x 15" sheets) and the Calcomp plotter (11" wide rolls, 25' long).[7] MEGADATS has since been enhanced with new device drivers that utilize the industry-standard Hewlett-Packard Graphics Language (HPGL) on Hewlett-Packard (HP) LaserJet Series II printers (with the Pacific Data Products HPGL-emulation cartridge[8] and 1MB additional memory) and HP LaserJet III printers (with 1MB additional memory). An important feature of this enhancement is its compatibility with common desktop publishing. MEGADATS can export the HPGL codes into an image file that can be imported directly into many word processors. This image file can also be converted into a variety of other graphics formats (e.g., TIFF or PostScript) by using several commercially available conversion programs.[9] As a by-product of the HPGL compatibility, a stand-alone, plot-only version of MEGADATS, MEGADATS Plot (MP), has been developed to support plotting of large applications. Pedigree files are generated using the traditional MEGADATS, then are drawn using MP. MP uses a six-point font on the HP LaserJet Series II, the smallest font supported by that printer.

Another important feature of the MEGADATS program is its flexibility in importing and exporting key pedigree information. A standard ASCII file containing information about individuals in a pedigree (i.e., family number, sex, record number, and record numbers of father and mother) can be easily imported into MEGADATS format by using the "LOAD" command. The "OUTPUT" command, on the other hand, allows MEGADATS to export similar information to an ASCII file. This ability of the program allows for easy portability of information between MEGADATS and other types of genetic analysis programs, such as S.A.G.E. (Statistical Analysis for Genetic Epidemiology)[10], LIPED[11], or LINKAGE.[12]

To keep track of relationships within families, MEGADATS uses a seven-part indexed pointer scheme: father, mother, sibling, mate, multiple mate, offspring, and primary. Each of the first six pointer values indicates the record number containing the referenced individual. The primary pointer is used to maintain multiple matings, null offspring, and removed records in the pedigree structure. Specific pedigree-traversal commands in MEGADATS provide mechanisms by which complex pedigrees can be easily traced using these pointer indexes.

One problem in constructing a pedigree is the occurrence of consanguineous matings, that is, matings between blood relatives. To address the problem of locating such matings, often termed "loops," in the HD Roster and the Venezuelan pedigree, a loop-detection algorithm has been developed. The MEGADATS pointer scheme (father, mother, mate, multiple-mate, offspring, and sibling pointers), in conjunction with its "OUTPUT" command, allows for efficient implementation of the traversal queries required by this algorithm.
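As a rough illustration of the seven-part pointer record, the sketch below models one record per individual in Python. It is not the MEGADATS file layout. The field names follow the paper, but the `NULL` sentinel, the reading of `offspring` as "first child" and `sibling` as "next sibling," and the helper `children_of` are assumptions made only for this example; the paper does not specify how the pointers are chained.

```python
from dataclasses import dataclass
from typing import Dict, List

NULL = 0  # assumed sentinel meaning "no record"; the paper does not name one


@dataclass
class PedigreeRecord:
    record: int             # this individual's own record number
    father: int = NULL
    mother: int = NULL
    sibling: int = NULL     # assumption: next sibling in a chain
    mate: int = NULL
    multiple_mate: int = NULL
    offspring: int = NULL   # assumption: first child of this individual
    primary: int = NULL     # bookkeeping pointer; its semantics are not modeled here


def children_of(db: Dict[int, PedigreeRecord], parent: int) -> List[int]:
    """List a parent's children by following offspring, then the sibling chain."""
    children: List[int] = []
    seen = set()  # guards against a malformed, circular sibling chain
    current = db[parent].offspring
    while current != NULL and current not in seen:
        seen.add(current)
        children.append(current)
        current = db[current].sibling
    return children


# A two-generation family: records 1 and 2 are mates, 3 and 4 are their children.
db = {
    1: PedigreeRecord(record=1, mate=2, offspring=3),
    2: PedigreeRecord(record=2, mate=1, offspring=3),
    3: PedigreeRecord(record=3, father=1, mother=2, sibling=4),
    4: PedigreeRecord(record=4, father=1, mother=2),
}
```

Because every pointer is simply a record number, a traversal such as `children_of(db, 1)` is a chain of index lookups, which is the property that pedigree-traversal commands rely on.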

The loop-detection algorithm is a recursive procedure which searches maternal ancestors of each individual for overlap with paternal ancestors. For each person in a family, the program first marks all individuals on the mother's side of the family. Then it traverses the father's side while searching for individuals that have already been marked. In the process, the procedure keeps track of all the individuals on both sides of the family that caused the loop. Finally, once a loop is detected, the mother's side is traversed a second time to locate the individual from the father's side that caused the loop. Ultimately, the algorithm will report all loops found and the extent of the consanguinity of each such mating.
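The mark-and-search idea behind loop detection can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses an iterative traversal rather than recursion, it reports only the shared ancestors rather than every individual along the loop, and the `parents` mapping and function names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical input format: person -> (father, mother); None marks an unknown parent.
Parents = Dict[int, Tuple[Optional[int], Optional[int]]]


def ancestors(parents: Parents, start: Optional[int]) -> Set[int]:
    """Mark `start` and everyone above `start` in the pedigree."""
    marked: Set[int] = set()
    stack = [start]
    while stack:
        person = stack.pop()
        if person is None or person in marked:
            continue
        marked.add(person)
        father, mother = parents.get(person, (None, None))
        stack.extend((father, mother))
    return marked


def find_loops(parents: Parents) -> Dict[int, Set[int]]:
    """For each person, report ancestors reachable through both parents."""
    loops: Dict[int, Set[int]] = {}
    for child, (father, mother) in parents.items():
        if father is None or mother is None:
            continue
        maternal = ancestors(parents, mother)            # mark the mother's side
        shared = maternal & ancestors(parents, father)   # search the father's side
        if shared:
            loops[child] = shared
    return loops


# First-cousin mating: 3 and 4 are siblings, their children 7 and 8 are the parents of 9.
parents: Parents = {
    3: (1, 2),
    4: (1, 2),
    7: (3, 5),
    8: (6, 4),
    9: (7, 8),
}
loops = find_loops(parents)
```

In this example only individual 9 is flagged, with grandparents 1 and 2 as the common ancestors that close the loop. Measuring the extent of the consanguinity, as the paper's algorithm does, would additionally require recording the path from each parent to the shared ancestors.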

Portability and compatibility are important issues in software development and usage. The MEGADATS system is available for VAX/VMS, Sun/UNIX, and PC/MS-DOS systems and is currently in use at several sites worldwide. Fewer than 100 of the 20,000 lines of the FORTRAN source code are host-computer dependent, thus insulating the system from changes in host-computer hardware and software. Efforts for future improvements center around the interface between MEGADATS and its environment. A graphical user interface for pedigree data entry is under development. MEGADATS DBF will have a menu-driven user interface and, since portability across systems is a major concern for users, will run in a dBase-compatible environment.


Conclusions

Computerization of the HD Roster and Venezuelan data sets using MEGADATS has allowed for complex analysis of pedigree information. The highlight of the HD research has been its participation in mapping the HD gene to the short arm of chromosome four. The HD Roster was an essential component in the identification of a marker (G8) for the HD gene, providing pedigrees not only for locating the marker but also for confirming the original data and searching for heterogeneity in the disease.[13] Finding this marker for HD has, in turn, led to the possibility of prenatal and presymptomatic (detection of gene carriers prior to the onset of symptoms) testing of individuals who choose to know their genetic status.[14] Other studies involving the HD data set have included the following: variable age of disease onset, excess of paternal transmission among affected individuals with juvenile onset, causes of death in HD, suicide in HD, and epidemiology of HD.

Many pertinent publications and scientific accomplishments have resulted from the use of data and research families available in the HD repository. It is anticipated that these data will play a major role in the continued search for the gene and possible defect of Huntington's disease itself.

Acknowledgements

The Huntington's Disease Research Roster is supported by a contract from the National Institutes of Health, contract number NO1-NA-0-2385.

The Venezuela Project, Huntington's Disease in Venezuela: Genetic and other Studies, is funded by a subcontract from the Hereditary Disease Foundation, contract number 2 RO1 NS2203 1.

References

1. Gersting, J.M., Conneally, P.M. and Beidelman, K.: Huntington's disease research roster support with a microcomputer data base management system. Proceedings of the Seventh Annual Symposium on Computer Applications in Medical Care. Los Angeles, Computer Society Press, 746-749, 1983.

2. Martin, J.B.: Huntington's disease: genetically programmed cell death in the human central nervous system. Nature 299:205-206, 1982.

3. Report: Commission for the Control of Huntington's Disease and Its Consequences. Volume I - Overview (Maryland: U.S. Dept of Health, Education and Welfare, Public Health Service and National Institutes of Health, 1977), p. xix.

4. The International Classification of Diseases. 9th Revision. Clinical Modification (Washington, D.C.: U.S. Department of Health and Human Services, 1980).

5. Penney, Jr., J.B., Young, A.B., Shoulson, I., Starosta-Rubenstein, S., Snodgrass, S.R., Ramos, J.S., Ramos-Arroyo, M., Gomez, F., Penchzadeh, G., Alvir, J., Esteves, J., DeQuiroz, I., Marsol, N., Moreno, H., Conneally, P.M., Bonilla, E. and Wexler, N.S.: Huntington's disease in Venezuela: 7 years of follow-up on symptomatic and asymptomatic individuals. Movement Disorders Vol. 5, No. 2, 1990, pp 93-99.

6. Kang, K.W., Merritt, A.D., Conneally, P.M. and Gersting, J.M. Jr.: A medical genetic data base management system. Second Annual Conference on Computer Applications in Medical Care 524-529, 1978.

7. Beidelman, K. and Gersting, J.M.: Plotting human pedigrees. Journal of Medical Systems Vol. 9, No. 3, 1985, pp 97-108.

8. Plotter in a Cartridge, Pacific Data Products, San Diego, CA 92121.

9. HiJaak, Inset Systems, Brookfield, CT 06804.

10. Elston, R.C., Bailey-Wilson, J.E., Bonney, G.E., Keats, B.J.B. and Wilson, A.F.: S.A.G.E. - a package of computer programs to perform Statistical Analysis of Genetic Epidemiology. Proceedings of the 7th International Congress of Human Genetics, Berlin, 1986, p 289.

11. Ott, J.: Estimation of the recombination fraction in human pedigrees: efficient computation of the likelihood for human linkage studies. Am. J. Hum. Genet. 26:773-775, 1974.

12. Lathrop, G.M., Lalouel, J-M., Julier, C. and Ott, J.: Strategies for multilocus linkage analysis in humans. Proc. Natl. Acad. Sci. USA 81:3443-3446, 1984.

13. Gusella, J.F., Wexler, N.S., Conneally, P.M., Naylor, S.L., Anderson, M.A., Tanzi, R.E., Watkins, P.C., Ottina, K., Wallace, M.R., Sakaguchi, A.Y., Young, A.B., Shoulson, I., Bonilla, E. and Martin, J.B.: A polymorphic DNA marker genetically linked to Huntington's disease. Nature 306:234-238, 1983.

14. Conneally, P.M., Gusella, J.F. and Wexler, N.S.: Huntington disease: genetics, presymptomatic and prenatal diagnosis. In: Nucleic Acid Probes in Diagnosis of Human Genetic Diseases, A.M. Willey, ed., Alan R. Liss, Inc., New York, pp. 133-152, 1988.
