Comput. Biol. Med. Pergamon Press 1976. Vol. 6, pp. 191-198. Printed in Great Britain.

DIGITAL ARCHIVING OF BIOMEDICAL RECORDINGS FOR OFF-LINE COMPUTER ANALYSIS*

WALTER K. HARRISON and KENNETH M. BAKALAR

Johns Hopkins University School of Medicine, 720 Rutland Avenue, Baltimore, MD 21205, U.S.A.

(Received 27 May 1975; in revised form 20 October 1975)

Abstract: This article describes a new procedure for archiving biomedical recordings on industry standard digital magnetic tape. Familiar computer methodology is employed for data-banking heartbeat patterns in a format which has been optimized for later analysis. Recordings are made through an encoder-controller unit coupled to a four channel FM cassette data recorder. This unit produces FM analog magnetic tape with square wave identification and calibration code preceding each sample of biomedical data. Code bits and data sample are timed out sequentially in a repetitive, standardized format. Digital images of these specially formatted analog tape cassettes are then processed by an edit program on an IBM 370/145. There, the identifiers are decoded and the data sections located. Identification, calibration, and biomedical recordings are then archived on a directory tape which may be efficiently and repeatedly accessed for future computations.

Key Words: Archiving biomedical signals; Archiving analog tape recordings; Data recognition algorithm; Decoding algorithms; Biomedical recordings; Encoder-controller unit; Data encoding; Formatting

1. INTRODUCTION

The procedure described in this article was developed for use in a collaborative research project involving four geographically separate laboratories. At three of the locations, ballistocardiograms (BCG) and carotid pulse recordings (CP) of patients with coronary heart disease are recorded on analog tape cassettes. These biomedical recordings are formatted and identified on the cassettes by means of an encoder unit, which controls the tape recording of a sample of 15-30 heartbeats from each patient together with his identification code. Collaborators then send the cassettes to the authors’ laboratory. Here, in separate steps, they are translated to digital computer tape, edited to an archive tape, and analysed by means of another computer program. An off-line procedure suits the goals of the research project, which require sequential testing of the patient population and detailed statistical analysis of results without investment in expensive capital equipment.

This article is concerned with the methodology of recording the heartbeat patterns and archiving them by means of a specialized computer program. Our procedure employs standard computer methodology. However, we believe that it will be of interest to systems analysts as well as to cardiovascular research workers and others who wish to make computer analyses of similar biomedical signals. Unless these investigators have exceptional minicomputer systems at their disposal, their analytical options can be seriously limited by the machine. This is especially true if inter-heartbeat variability, arrhythmias, and related phenomena are to be studied. Data volume in such cases often dictates the use of high density computer tape and a sizeable core memory, which are seldom available on laboratory computers. We have therefore employed methodology which is compatible with off-line batch processing in central computer facilities.
In our collaborative study we can archive 1000 cases of simultaneous BCG and CP on every computer tape. Each of these studies may contain sequences of heartbeat patterns up to 20 sec in duration. With a minimum digitization rate of 200 samples/sec/channel, more than 8 million characters must be stored along with identifiers and calibration data. While this requirement can be met by elaborate and expensive minicomputer systems, it is easily fulfilled in most institutional centers, such as those equipped with the IBM 370. Primary heartbeat data archived on such a readily machinable medium are ideal for advancing cardiovascular and other research objectives. Many subtle aspects of the higher dynamic function of the heart and arterial system may be explored in great detail for large numbers of cases. Important time-domain measurements of cardiovascular recordings, such as slopes, higher derivatives, integrals, and systolic time intervals, are all easily accessible by means of appropriate computer programs, which can operate in economical batch mode if desired. This capability expands analytical options for cardiovascular data beyond the innovative examples described by Scher et al. [1] in the context of a dedicated laboratory computer system.

* Aided by grants HL 14928 and HL 16907 of the National Institutes of Health.

While this article is concerned with the methodology of archiving, the rationale in support of these efforts should be kept clearly in mind. Our methodology is attractive from a budgetary standpoint because minimal investment in equipment is required. The encoder unit [2] which identifies and formats the analog recordings has cost $1300 in the recent past. The four channel FM cassette instrumentation recorder* sells for less than $2400. These are the only units which must be added to existing standard equipment for cardiovascular recording [3]. Costs of analog-digital conversion and other computer time charges are incurred in proportion to the numbers of individual cases archived and subsequently analysed. These pro rata expenses are more easily justified in research project budgets than sizeable outlays for expensive capital equipment.

2. RECORDING METHODS

A block diagram of the laboratory apparatus for recording biomedical signals is given in Fig. 1. The encoder-controller unit is located in the signal path just before the FM analog tape recorder. It operates in the following way. The operator monitors signals on a direct-writing recorder and/or display oscilloscope to appraise their quality and stability. An identification number for the sample has previously been set on the thumbwheel switches of the encoder. The recording sequence is initiated when the operator presses “start” on the encoder. Under encoder control, the tape transport is activated and an analog image of the BCD identification code is recorded on the cassette. At the conclusion of the identification, the encoder switches the tape recorder inputs to the biomedical signal sources. A sample of these signals of preset duration is then recorded, and the encoder shuts down the tape transport at the end of the desired period.

Figure 2 illustrates the standardized tape format produced by the encoder-controller. Sequences begin with a “start transient” and end with a “stop transient”. In this example, the section containing the identification code is lengthened to allow adequate time for voice labeling of the tape on a separate channel. The seven digit BCD code is preceded by a pair of bits which are used to initiate the computer algorithm for decoding identification numbers. A constant “zero volt” level is recorded before and after the code. This level, and the elevation of the leading pair of code bits relative to it, can be used for calibration in computer analyses of the biomedical data and in the digital tape edit algorithm. This standardized format for analog data is advantageous for subsequent edit steps controlled by computer program. It also helps to organize routine review of recordings by the operator at key points in the progression of data: before, during, and after tape recording.
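Because every record follows the same sequence (zero level, leading pair of bits, seven BCD digits, zero level, data), the format can be written down compactly. The Python sketch below builds a synthetic record of that shape, which is handy for exercising edit-program logic. The BCD weights 8-4-2-1 with the most significant bit first follow the labels of Fig. 4; everything else is an assumption: the article does not specify the square-wave shape of an individual bit, so Manchester (biphase) coding is substituted (a 1 drawn as high then low, a 0 as low then high, with low equal to the zero-volt level), the leading pair is taken to be two 1 bits, and the function names and sample counts are hypothetical.

```python
from typing import List, Sequence


def id_to_bits(id_number: str) -> List[int]:
    """Leading pair of bits, then seven BCD digits weighted 8-4-2-1, MSB first."""
    if len(id_number) != 7 or not id_number.isdigit():
        raise ValueError("identification number must be exactly seven decimal digits")
    bits = [1, 1]  # assumed value of the leading pair that starts the decoder
    for ch in id_number:
        digit = int(ch)
        bits.extend((digit >> shift) & 1 for shift in (3, 2, 1, 0))
    return bits


def render_record(id_number: str, data: Sequence[float], half: int = 10,
                  high: float = 1.0, lead: int = 50, gap: int = 50) -> List[float]:
    """Zero level, ID code, zero level, then the biomedical data sample.

    Each bit spans two half-cells of `half` samples. As a stand-in waveform
    (Manchester coding, not the encoder's documented format) a 1 is drawn as
    (high, zero) and a 0 as (zero, high).
    """
    samples: List[float] = [0.0] * lead            # zero-volt level before the code
    for bit in id_to_bits(id_number):
        first, second = (high, 0.0) if bit else (0.0, high)
        samples += [first] * half + [second] * half
    samples += [0.0] * gap                         # zero-volt level after the code
    samples += list(data)                          # physiological data follow
    return samples
```

For example, `id_to_bits("0403511")` begins `[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, ...]`: the leading pair, then digit 0 as 0000 and digit 4 as 0100.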
Recordings may be reviewed individually just after capture by rewinding the cassette and reproducing each sample on a direct writer (Fig. 1). If imperfections are noted, users may elect to re-record data over the flawed sections at once. This option is valuable if data are obtained from patients who may not be restudied conveniently at a future time. Re-recording is also efficient in that it eliminates archiving of data known to be of no value, and avoids any computer charges which might be incurred in later processing steps.

If a shorter duration of data than the sample programmed by the encoder is required, the operator may press an “override” button at the desired time to terminate sampling. The zero volt level is then recorded until the tape transport is shut down by the encoder after the normal time interval. This method of terminating data sampling preserves the standardized format of the encoder-controller, so that computer edit steps are simplified.

Approximately 40 individual data samples (cases or studies) are recorded on each 30-min cassette. A series of several cassettes may be re-recorded (“dubbed”) onto IRIG-standard FM magnetic tape for analog to digital conversion at multiples of real time rates, which saves processing time. LINC, DEC, or industry standard seven or nine track digital magnetic tape may of course be employed as appropriate for the data to be analysed. Since our approach is oriented towards archiving large quantities of data, 2400 foot reels of industry compatible tape have been used for rapid edit operations on a large computer (IBM 370/145); other systems with appropriate peripherals may be used. An edit program was written for the IBM 370 to pre-process digital images of the data recorded on cassettes and to decode the digitized square wave sequences of their identification codes. Significant advantages result from the directory tape produced by the edit program.

* Instrumentation recorder R-70, TEAC Corporation. B. J. Wolfe Enterprises, 10760 Burbank Boulevard, North Hollywood, CA 91601, U.S.A.

Fig. 1. Arrangement of laboratory apparatus for recording biomedical signals for digital archiving and computer analysis. Encoder-controller unit generates identification code and provides uniform format of data on analog tape cassette.

Fig. 2. Example of analog tape format produced by encoder-controller. These data are converted to digital time series on computer tape for editing and archiving.

3. DIGITAL TAPE EDIT PROGRAM

This phase of archiving operates on the computer tapes produced by analog to digital conversion of the specially formatted recordings illustrated in Fig. 2. The four principal functions of the computer program are diagrammed in Fig. 3. They are:
(1) location of the onset of the identification code (localizer);
(2) analysis of the identification code to yield a seven digit number and an amplitude calibration taken from the pulse amplitude (decoder);
(3) location of the onset of the data section (data onset detector);
(4) transfer of data to digital magnetic tape up to the point where the data section terminates (transfer and termination).
The editor performs these four steps in sequence until the end of the input tape is reached. The identification numbers and data sections are written on a second digital magnetic tape (the “directory tape”) described below. The main functions of the edit program, which is written in FORTRAN, are summarized qualitatively in the following sections.

3.1 Localizer

This algorithm searches for the steepest slope within a series of points sized to cover one complete bit cycle of the ID code. It then calculates the difference between the mean levels of the points just before the slope and just after it. This difference is then compared to a measure of the variability of points (or “noise”) calculated in the vicinity.

Fig. 3. Flow chart of computer program for edit of digital tape and archiving of biomedical data.

Fig. 4. Details of analog recording of identification code pattern. “BCD” denotes binary coded decimal. [Diagram: a leading character followed by the BCD codes for the numerals of the identification number (numerals “3” and “1” are labelled), each digit's four bits weighted 8, 4, 2, 1; markers denote “high” and “low” levels.]

If it is greater than the desired threshold, the epoch for decoding subsequent pulse bits is defined. The absolute magnitude of the code bits is therefore not a major factor in recognizing the beginning of an ID sequence. This removes a constraint on the analog recording procedure which could be troublesome in the field trial environment of the research project. The height of the code bits is archived as a calibration factor for use in subsequent calculations.

3.2 Decoder

Beginning at a point before the decoding epoch, a “scanner” algorithm searches for transitions between the two principal levels of the points representing code bits, high and low (Fig. 4). When a transition from high to low or the reverse is detected, a “transition handler” algorithm is given control. This algorithm classifies the bit or bits which have been scanned since the last transition and reports the results to the “half bit handler”, which assembles the bits into the digits of the identification number. These procedures take advantage of the redundant structure of the square wave code pattern instead of relying on the precise shape, timing, and absolute size of each individual code bit. Deviations from the allowable syntax, as well as very noisy recordings, cause rejection of a candidate sequence and a return to the localizer function. The decoding algorithm has been designed to minimize the risk of reporting an incorrect identification number; in fact, there has been no such malfunction in over 10,000 executions of the algorithm.

3.3 Data recognition algorithm

To locate the beginning and end of a biomedical data sequence, the edit program uses a basic characteristic of data formatted by the encoder. When neither the pulse code nor physiological data are present, all recorded channels have the same source and are identical except for a possible constant bias. The presence or absence of data can therefore be determined by testing the absolute value of the numerical difference between the two digitized channels at the same instant of time. If this difference is above an appropriate threshold, physiological data are assumed to be present on one or both of the channels.

Starting at a point in the time sequence after the end of the identification code (channels electrically grounded), a sample of 100 channel differences is taken. Their mean is a measure of any constant bias which might have been introduced in a tape dubbing procedure. This constant is then subtracted from the channel differences tested subsequently. Every eighth difference is calculated in a forward scan begun after an appropriate period of latency following decoding of the identification. Differences are tallied on two counters, one for values below threshold and one for values above. Counters are reset to zero after every 3 points scanned below threshold. If 15 points above threshold are found, data onset is taken as 40 points before the last point tested, and the transfer of data to the archive tape begins.

The search for termination of data begins after a short nominal “data-time” interval. The scan proceeds in the same manner, testing every eighth point, but termination of data is based on 15 points below threshold. This is justifiable because the stop transient patterns recorded on the channels are nearly identical (Fig. 2), so their difference stays below threshold. Termination of data transfer by means of this algorithm is illustrated in Fig. 5. Accurate definition of data onset and termination simplifies the pattern recognition routines employed in analytical processing of archived biomedical recordings.

Fig. 5. Action of data recognition algorithm in terminating transfer of biomedical data to archive tape. Panel A: A fixed timing algorithm results in archiving the stop transient (Fig. 2). Panel B: The data recognition algorithm archives only biomedical data. Top trace in each panel is CP; bottom trace is BCG.
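The onset and termination rules of the data recognition algorithm translate fairly directly into code. The Python sketch below is one reading of them, not the authors' FORTRAN. Its defaults (every 8th point, 15 counted points, reset after 3, a 40-point back-off, a 100-sample bias estimate) come from the text; the following are assumptions: the bias sample is 100 consecutive points, “reset after every 3 points below threshold” is read as a cumulative count since the last reset, the termination scan mirrors the onset scan with the roles of the counters swapped, and the end of data is placed at the first below-threshold point of the qualifying run, which the article does not specify. The names `find_data_section` and `_scan` are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


def _scan(diff: Callable[[int], float], start: int, stop: int,
          counted: Callable[[float], bool], step: int, reset_after: int,
          hits_needed: int) -> Optional[Tuple[int, int]]:
    """Test every `step`-th point; return (first counted, last tested) or None."""
    hits = misses = 0
    first_hit = start
    for i in range(start, stop, step):
        if counted(diff(i)):
            if hits == 0:
                first_hit = i          # start of the current run of counted points
            hits += 1
            if hits == hits_needed:
                return first_hit, i
        else:
            misses += 1
            if misses == reset_after:  # both counters restart after `reset_after` misses
                hits = misses = 0
    return None


def find_data_section(ch1: Sequence[float], ch2: Sequence[float], quiet_start: int,
                      scan_start: int, threshold: float, min_data_len: int, *,
                      bias_n: int = 100, step: int = 8, reset_after: int = 3,
                      hits_needed: int = 15,
                      onset_backoff: int = 40) -> Optional[Tuple[int, int]]:
    """Return (onset, end) sample indices of the data section, or None if no data."""
    n = min(len(ch1), len(ch2))
    # Constant bias (e.g. from dubbing), estimated while both channels are grounded.
    bias = sum(ch1[j] - ch2[j] for j in range(quiet_start, quiet_start + bias_n)) / bias_n

    def diff(i: int) -> float:
        return abs(ch1[i] - ch2[i] - bias)

    found = _scan(diff, scan_start, n, lambda d: d > threshold,
                  step, reset_after, hits_needed)
    if found is None:
        return None
    onset = max(0, found[1] - onset_backoff)     # back off from the last point tested
    # Termination: same scan with roles swapped, begun after a nominal data time.
    ended = _scan(diff, onset + min_data_len, n, lambda d: d <= threshold,
                  step, reset_after, hits_needed)
    end = ended[0] if ended is not None else n   # assumption: first quiet point of the run
    return onset, end
```

On a clean synthetic record (constant channel bias 0.5, data offset 5.0 in samples 600 to 1399), `find_data_section(ch1, ch2, 0, 100, 1.0, 200)` returns `(676, 1404)`. How closely the onset tracks the true start depends on the interplay of step, hit count and back-off, so those defaults would need checking against the actual digitization rate.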
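Sections 3.1 and 3.2 describe the localizer and decoder only qualitatively. The Python sketch below shows one way the two steps could work together, under explicit assumptions. The bit waveform is taken to be Manchester (biphase) coding (1 = high then low, 0 = low then high, low at the zero-volt level), the leading pair is two 1 bits, digits are BCD weighted 8-4-2-1 with the most significant bit first (as in Fig. 4), and the nominal half-bit length is known. `localize` follows the text: it finds the steepest rise within windows one bit cycle long and accepts it only when the step between mean levels exceeds a multiple of the local scatter. `decode` works from transitions rather than fixed timing: it rounds each run to one or two half-bits and rejects any sequence that breaks the code's syntax. All names and parameters are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def localize(x: Sequence[float], start: int, cycle: int, m: int,
             ratio: float) -> Optional[Tuple[int, float]]:
    """Return (epoch, step height) of the first convincing rising edge, else None."""
    last = len(x) - m - 1
    for w in range(max(start, m - 1), last, cycle):
        # Steepest rise among the points of one bit cycle.
        k = max(range(w, min(w + cycle, last)), key=lambda j: x[j + 1] - x[j])
        before, after = x[k - m + 1:k + 1], x[k + 1:k + 1 + m]
        mb, ma = mean(before), mean(after)
        # Local "noise": mean absolute deviation on both sides of the edge.
        noise = (sum(abs(v - mb) for v in before)
                 + sum(abs(v - ma) for v in after)) / (2 * m)
        if ma - mb > 0 and ma - mb > ratio * noise:
            return k + 1, ma - mb
    return None


def decode(x: Sequence[float], epoch: int, height: float, half: int,
           n_digits: int = 7) -> Optional[str]:
    """Decode the leading pair plus `n_digits` BCD digits; None on any syntax error."""
    n_halves = 2 * (2 + 4 * n_digits)
    level = [v > height / 2 for v in x[epoch:]]     # scanner: high or low
    halves: List[bool] = []
    i = 0
    while len(halves) < n_halves:
        if i >= len(level):
            return None                              # record ended inside the code
        j = i
        while j < len(level) and level[j] == level[i]:
            j += 1                                   # run lasts until the next transition
        units = round((j - i) / half)                # transition handler: run in half-bits
        need = n_halves - len(halves)
        # Runs longer than two half-bits are legal only as a low run at the
        # very end of the code, where it merges into the trailing zero level.
        if units == 0 or (units > 2 and (level[i] or need > 2)):
            return None
        halves += [level[i]] * min(units, need)
        i = j
    bits: List[int] = []
    for first, second in zip(halves[0::2], halves[1::2]):  # half-bit handler
        if first == second:
            return None                              # not a valid biphase cell
        bits.append(1 if first else 0)
    if bits[:2] != [1, 1]:
        return None                                  # leading pair missing
    digits: List[str] = []
    for k in range(2, len(bits), 4):
        d = 8 * bits[k] + 4 * bits[k + 1] + 2 * bits[k + 2] + bits[k + 3]
        if d > 9:
            return None                              # not a decimal digit
        digits.append(str(d))
    return "".join(digits)
```

On a clean synthetic record with a 50-sample zero lead, 10-sample half-bits and a code height of 4.0, `localize(x, 0, 20, 5, 3.0)` returns `(50, 4.0)` and `decode(x, 50, 4.0, 10)` recovers `"0403511"`. Flattening a single bit cell produces a run three half-bits long, and the candidate is rejected rather than misread.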

4. ARCHIVE OR DIRECTORY TAPE

The computer edit program produces a digital tape containing two major subdivisions. In the directory section, the case identification number and data coordinates are written in the order in which the cases were edited; this section is dimensioned for up to 1000 entries. In the data section, elevations of the two channels of heartbeat data are written sequentially, two characters to each channel. Each case occupies an integral number of blocks of characters, sufficient to cover the actual data quantity up to the equivalent of a 20 second maximum. The unused remainder of the blocks allotted to a case is filled with characters which analytical programs use to identify the last data point of that case. A header record containing the ID number and calibration amplitude is located before the heartbeat elevations of each case. Data may be archived at any stage of the remaining capacity of each directory tape.

Computer time charges per case depend on a number of variables, such as the initial and final content of the tape, the quantity of data actually archived, the incidence of archiving failures due to poor recordings, the packing of analog recordings on cassettes, and the usual variety of factors peculiar to operations at the center. We commonly use off-peak periods to minimize costs. Unit charges have shown a strong dependence on the number of cases archived together (batching). From a high of just over $2.00/case for a single submission, unit cost declines towards an asymptote between 30 and 40 cents; a level of ca. 45 cents/case is reached with a batch of 40 cases. At this writing, the program has run in just one computer center, so we have no comparative data on cost for centers equipped with different machines.

Each time the edit program is employed to archive additional recordings on a tape, a printed report like that shown as Table 1 is produced. This report shows the contents of the directory prior to addition of new data, the new ID numbers added, and the complete updated directory at the end of the operation.
In addition to summarizing the current status of each tape, successive reports document the timing of this stage of processing. Reports also list the magnitude of the calibration factor derived from ID code bit height.

Table 1. Sample report of the addition of biomedical data to an archive or directory computer tape. The first group of numbers are identifications of studies on the tape prior to updating. Input parameters to the edit program are printed next. The left hand column below these gives identifications of studies to be added to the tape during this updating (13 of these are already on the tape). The right hand column reports the height of the leading code bit, needed for calibration, which is calculated by the edit program as described in the text. The last section of the table lists the contents of the updated directory tape and the coordinates of the beginning of each section of data archived. The last study, No. 1001621, was not archived because of an error on the digital input tape. Diagnostic output from the edit program has been deleted for simplicity.
[Computer printout dated 11 MAR 1974. Input parameters as printed: ZLEVLI = 6.00, ZLVMX = 9.00, DATIME = 19.00. Leading-bit heights reported for the studies added range from 364.150 to 402.400.]

Each sample of biomedical data on the directory tape can be repeatedly located and selectively analysed by appropriate computer programs. This may of course be done for a single study or for any desired group of studies on the tape. Any number of future sequential analyses of the same time-series data may be made to calculate new information, the need for which often cannot be fully anticipated. These options are of unquestionable value in biomedical data analysis.

The concept of data-base archiving has for years enjoyed wide general acceptance, but it has received limited use for unprocessed biomedical recordings. Too often, investigators have utilized only a tiny fraction of the data from elaborate, costly experiments, content to store the balance indefinitely on rolls of paper chart or on analog magnetic tapes which seldom receive further attention. While this fate is due to many different factors, two important ones can usually be recognized. First, there are the intrinsic limitations imposed by the laborious effort of manual editing and measurement of analog storage media. Second, many potentially useful analytical options, easily mobilized on the computer, are completely out of reach when its services are ruled out a priori. These problems are overcome by computerized archiving of biomedical data as exemplified by the methods described in this article. We believe their advantages will be increasingly enjoyed in future years.
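The two-part layout of the directory tape described in Section 4 (a directory of ID numbers and data coordinates, then header records and padded data blocks) can be modelled in a few lines. The Python sketch below is an in-memory stand-in, not the authors' record format. The 1000-entry limit, the 20 second maximum at 200 samples/sec/channel, the two characters per channel, the header with ID and calibration, and the end-of-data fill characters come from the text. The block size, the use of one whole block for the header, the big-endian two-byte encoding, the 0xFFFF end marker (which assumes elevations never reach that value) and the class and method names are all hypothetical.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

BLOCK = 400            # bytes per data block (hypothetical)
END = 0xFFFF           # two-byte end-of-case marker (assumes elevations < 0xFFFF)
MAX_ENTRIES = 1000     # directory capacity stated in the text
MAX_PAIRS = 20 * 200   # 20 s maximum at 200 samples/sec/channel


@dataclass
class ArchiveTape:
    directory: Dict[str, int] = field(default_factory=dict)  # ID -> first block
    blocks: List[bytes] = field(default_factory=list)

    def add_case(self, id_number: str, calibration: float,
                 pairs: Sequence[Tuple[int, int]]) -> int:
        """Append one case (header block + data blocks); return its first block."""
        if len(self.directory) >= MAX_ENTRIES:
            raise ValueError("directory is full")
        if id_number in self.directory:
            raise ValueError(f"{id_number} is already on the tape")
        if len(pairs) > MAX_PAIRS:
            raise ValueError("more than 20 seconds of data")
        if any(not 0 <= v < END for pair in pairs for v in pair):
            raise ValueError("elevations must fit in two characters below the marker")
        first = len(self.blocks)
        # Header record: ID number and calibration amplitude (bit height).
        self.blocks.append(f"{id_number} {calibration:.3f}".encode("ascii").ljust(BLOCK))
        # Two channels written sequentially, two characters (bytes) per channel.
        payload = b"".join(struct.pack(">HH", a, b) for a, b in pairs)
        payload += struct.pack(">HH", END, END)           # always mark the last point
        payload += struct.pack(">H", END) * (-len(payload) % BLOCK // 2)  # fill block
        self.blocks += [payload[k:k + BLOCK] for k in range(0, len(payload), BLOCK)]
        self.directory[id_number] = first
        return first

    def read_case(self, id_number: str) -> Tuple[float, List[Tuple[int, int]]]:
        """Locate a case through the directory and return (calibration, pairs)."""
        first = self.directory[id_number]
        calibration = float(self.blocks[first].decode("ascii").split()[1])
        pairs: List[Tuple[int, int]] = []
        for block in self.blocks[first + 1:]:
            for a, b in struct.iter_unpack(">HH", block):
                if a == END:                               # end-of-case marker
                    return calibration, pairs
                pairs.append((a, b))
        return calibration, pairs
```

Because the directory maps each ID number to a starting block, an analysis program can fetch one study, or any chosen group of studies, without scanning the whole tape, and new cases can be appended at any stage of remaining capacity.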


5. SUMMARY

The procedures described were developed for use in a multi-center collaborative research project. Samples of patients’ heartbeats are recorded on analog magnetic tape cassettes. These recordings are given a standardized format by means of a specially developed encoder-controller unit. Cassettes are then mailed to the authors’ laboratory for digital conversion and archiving by an edit program running on an IBM 370/145 computer system. The edit program locates and decodes the analog representation of the seven digit identifier of each sample of biomedical data. A calibration factor based on the analog identifiers (“bit height”) is also derived for use in subsequent analytical steps by other computer programs. The edit program then recognizes the onset and termination of the biomedical signals, which may range in duration from 3 to 20 seconds. Identification numbers, calibration factors, and biomedical signals are then written on the archive tape. The archive tape is dimensioned to contain up to 1000 recordings. New data may be added in successive steps or all at once within this size limitation. All data on the tape may be addressed by ID number for retrieval at any stage of capacity. The archive tape is thus a valuable storage medium for biomedical data which are likely to require repeated off-line computer analysis as new information is discovered in the research process.

REFERENCES
1. A. M. Scher, W. W. Ohm, T. H. Kehl and A. C. Young, Computer data collection and editing for hemodynamics studies. Ann. Biomed. Engng 1, 99 (1973).
2. R. C. Wang and W. K. Harrison, Encoder
