Computer Programs in Biomedicine 7 (1977) 233-246 © Elsevier/North-Holland Biomedical Press

A SOFTWARE SYSTEM TO RECORD AND ANALYZE DIGITIZED CELL IMAGES

E. BENGTSSON, Institute of Physics, Uppsala University
O. ERIKSSON and J. HOLMQUIST, Dept. of Computer Science, Uppsala University
B. STENKVIST, Dept. of Clinical Cytology, University Hospital, Uppsala, Sweden

This paper describes basic software for digitization and processing of microscopic cell images used at the Department of Clinical Cytology at Uppsala University Hospital. Data collection is handled by a family of programs running on a PDP-8 minicomputer connected to a Leitz Orthoplan microscope with two image scanners: a diode-array scanner and a moving-stage photometer. The digitized image data are converted by conversion programs to IBM-compatible format. The data structures for image processing and statistical evaluation on the IBM system are also described. Finally, some experiences from the use of the software in cytology automation are discussed.

Keywords: Cytology automation; Data collection; Image processing; Pattern recognition; Scanning photometry

1. Introduction

During the past two decades there has been growing interest in research aiming at quantitative and automatic methods in cytology. One reason is the possibility of achieving a more accurate and reproducible diagnosis based on single cells as well as on cell populations [1]. Another reason is the ambition to automate mass screening procedures in order to decrease the human workload [2].

One of the approaches taken in this research is the use of computers to analyze high-resolution digitized microscopic images of cells. Various numerical features, or parameters, can be obtained from cells in this way, such as size and integrated density of the nucleus, shape of both nucleus and cytoplasm, chromatin coarseness of the nucleus (inhomogeneously distributed DNA), etc. Single cells as well as cell populations have been characterized with high accuracy and reproducibility using these parameters or multivariate probability distributions of the parameters [3-5].

The ideal situation when working with the analysis of digitized cell images is in general to have a powerful computer, equipped with a graphics terminal capable of displaying grey-level images, connected on line to the microscope. Interactive software can then be developed and used, for instance to explore the discriminatory power of various cell features or to test new automatic image processing algorithms. Since the actual microscopic object can remain available in the microscope, it is easy to check the various steps in the analysis procedure and to relate the result to the real cell.

The analysis of digital images is, however, computationally a very heavy task. In order to obtain reliable statistics in the evaluation of the features, a fairly large number (hundreds) of cells has to be analysed. The digital images should also be saved as reference material until the experiment is finished, so that new feature extraction algorithms can be applied without all the cells having to be rescanned. This requires a considerable amount of mass storage since each cell occupies several thousand words. Unfortunately, computers with great computational power and large mass storage facilities tend to be expensive.

One way of solving this problem is to use a small minicomputer for data acquisition and for the interactive analysis of a limited number of cells, while a large computer at a computer centre is used for the batch processing of a larger number of cells in order to obtain reliable statistics. This is the approach that has been taken at the Department of Clinical Cytology of the University Hospital in Uppsala. Using a PDP-8/F minicomputer connected to a Leitz microscope equipped with two image scanners, we have developed two different program systems. The first version of our interactive cell analysis system has been described in an earlier paper [6]. A simpler image recording system has also been described earlier [7]. This paper describes a family of programs for recording digitized cell images, and the data organization used for subsequent analysis in batch mode.

2. System design

When digitized images are computer processed in batch mode it is difficult to provide means for human intervention in the process to control the result and to supply additional information. On the other hand, when the images are being recorded on the minicomputer they have to be shown to the operator on a graphical display so that he can check that they include what he intended and have an acceptable quality. This is therefore a suitable time for interaction with the image data. Information such as threshold levels, coordinates of interesting objects, borderlines between objects etc. can easily be put in, checked and saved with the image data.

The amount and type of data that need to be supplied vary with each particular investigation. In some cases, for instance, the cells are large and well separated so that only one is recorded on each image, making the input of subimage coordinates unnecessary, while in other cases the division of a large image, containing several clustered cells, into subimages is essential. Another factor that may need to be varied is the wavelength of the monochromatic light used for the scanning. As we have shown in another paper [8], it is sometimes suitable to record the cell at two different wavelengths to obtain optimal results.

All these different requirements make the flexibility of an image and data recording system an important factor. This can of course be solved by reprogramming the recording system each time. However, changes in the recording program usually also necessitate changes in the subsequent conversion and analysis programs. If the recording program is made flexible through the inclusion of many options in the program flow, it tends to pester the user with a lot of questions to which the answer is always the same. In addition, this approach easily leads to very large programs.

We have solved this dilemma by having not a single recording program but a whole family of recording programs, the individual members of which can be generated when needed. This generation is done by a special interactive program generator which is run once before each image acquisition series. In this way the complexity in selecting hardware and software options is moved away from the actual recording program. The generated program will thus be tailored for one specific task. This procedure is similar to the system generation which is used in most computer operating systems to tailor a system for a specific installation and application.

Another design objective was to make the recording program as simple to use and as 'fool-proof' as possible. Since a data acquisition system of this kind is intended to be used by, e.g., cytotechnicians rather than programmers, this is a very desirable feature. The approach taken to achieve this was to make a program that 'knows' what data it needs and what actions need to be taken. The operator is told at every step what to do next and is asked questions about various parameters.

When the images have been recorded on DEC-tapes, or the image files copied onto DEC-tapes from some other file structured device on the PDP-8, they have to be transferred to files that can be used on a large scale computer, in our case an IBM 370/155. This is accomplished in two steps. First the DEC-tapes are dumped, one tape per file, onto 9-track standard tapes, using a DEC-10 computer; then the resulting files are unpacked to a more convenient format by a program running on the IBM computer. This conversion program of course has to be capable of handling data from all different versions of the recording program that can be generated.

The program structure for image analysis on the IBM system is highly dependent on how the data are organized, both the digitized images and the data extracted from the images. One requirement on this organization is that it should be easy to select a limited number of images for development of algorithms and later apply these to a larger material in order to get statistically significant results. Furthermore, since computerized image processing is a highly experimental discipline, the data should be organized in such a way that deleting, adding and replacing extracted data is easy. In addition, it is desirable that the data organization supports statistical evaluation, which implies that the definition of various groups of data, and access to the data of these groups, should be easy. To fulfill these requirements a direct access file structure has been used for both the digitized images and the extracted data, where the images and the data are accessed through a unique identity (IMAGE ID) consisting of a character string.
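The access pattern described above can be illustrated with a minimal sketch in Python (not the FORTRAN IV used on the IBM system). It assumes that in-memory dictionaries stand in for the direct access files; the class name `CellStore`, its methods, the feature name and the IMAGE ID values are all hypothetical and serve only to show access through an IMAGE ID, replacement of extracted data, and the definition of groups for statistical evaluation.

```python
from statistics import mean


class CellStore:
    """Toy stand-in for the direct access files: everything is keyed by IMAGE ID."""

    def __init__(self):
        self.images = {}    # IMAGE ID -> 2-D list of grey levels
        self.features = {}  # IMAGE ID -> {feature name: value}
        self.groups = {}    # group name -> set of IMAGE IDs

    def put_image(self, image_id, pixels):
        self.images[image_id] = pixels
        self.features.setdefault(image_id, {})

    def set_feature(self, image_id, name, value):
        # Adding and replacing extracted data are the same operation.
        self.features[image_id][name] = value

    def delete_feature(self, image_id, name):
        self.features[image_id].pop(name, None)

    def define_group(self, group, image_ids):
        self.groups[group] = set(image_ids)

    def group_values(self, group, name):
        """Values of one feature for all images in a group that have it."""
        return [self.features[i][name]
                for i in sorted(self.groups[group])
                if name in self.features[i]]


# Hypothetical usage: two images, one extracted feature, one group.
store = CellStore()
store.put_image("A001", [[0, 3], [5, 2]])
store.put_image("A002", [[1, 1], [4, 4]])
store.set_feature("A001", "nuclear_area", 41.5)
store.set_feature("A002", "nuclear_area", 38.5)
store.define_group("development_set", ["A001", "A002"])
print(mean(store.group_values("development_set", "nuclear_area")))
```

Keying both images and extracted data by the same string means that a small development set and a larger evaluation set differ only in the list of IMAGE IDs that defines the group.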

3. Hardware

The family of recording programs is based on a PDP-8/F minicomputer equipped with DEC-tapes and two image scanners. It uses a full 32 k words of core memory, even though this could be reduced to 16 k or 24 k at the price of more program and data swapping. A Tektronix 4010 graphical display terminal is used for the operator interaction. The DEC-tapes which are needed for the data transfer to the large scale computer can also be used as the system unit for OS/8. However, a somewhat faster execution is obtained if an RK05 disk or a floppy disk is used. We have been running the program from an RK05 disk. All programming for the PDP-8 computer was done in FORTRAN II and the SABR assembly language, under the OS/8 operating system.

The first scanner that can be used is a Leitz MPV II microphotometer with a mechanical moving stage. Two stepping motors are used to move the microscope slide under the microscope in two orthogonal directions in steps of 0.5 micron. At each position the light transmission of a central spot of a size of approx. 0.7 micron is measured with a photomultiplier. Another set of stepping motors with a step size of 10 micron makes it possible to move the slide over greater distances to position the cells. The scanner is interfaced to the PDP-8 via the DR8-EA digital and analog I/O units. This is a standard system commercially available from the manufacturers.

The second scanner in the system is a diode-array, image plane scanner. It has been developed at the Physics IV department of the Royal Institute of Technology in Stockholm and given the name OSIRIS (Our Second Image Reading Instrument System). In this system the microscopic image is projected onto a linear array of 256 photo-diodes through a set of prisms. One of the prisms is mounted in an electromechanical swing so that the image can be moved across the array. Thus the image can be scanned electronically in one direction via the array and mechanically in the other direction via the prisms. Since the scanning is in the image plane it is easy to vary the magnification and to obtain high positional accuracy. A more detailed description of this system can be found elsewhere [9-11].

The DEC-tape dumping is done on the DEC-10 computer at the Stockholm Universities Data Centre (QZ), which is equipped both with DEC-tape drives and standard 9-track tape units. Any computer which can read DEC-tapes and write standard tapes could be used for this. However, if a computer other than a DEC-10 were used it would be necessary to make sure that the output format is the same as that produced by the DEC-10; otherwise the conversion program would have to be changed.

The conversion program and the analysis programs run on the IBM 370/155 computer at Uppsala University Computing Centre (UDAC) under the operating system OS/360. The programming languages used are FORTRAN IV and OS/360 Assembly language.
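As a rough illustration of the data volumes implied by the 0.5 micron step size of the moving-stage scanner described in this section, the following Python sketch counts the measurement positions needed to cover a rectangular scan area. The function name `scan_points` and the 50 x 50 micron field are assumptions chosen for illustration; the paper does not state the default image size.

```python
import math

STEP_UM = 0.5  # step size of the moving stage in micron, from the text


def scan_points(width_um, height_um, step_um=STEP_UM):
    """Number of measurement positions needed to cover a rectangular area."""
    cols = math.ceil(width_um / step_um)
    rows = math.ceil(height_um / step_um)
    return cols, rows, cols * rows


# Hypothetical 50 x 50 micron field around one cell.
cols, rows, total = scan_points(50.0, 50.0)
print(cols, rows, total)
```

If one measurement is stored per memory word (again an assumption), a field of this size would take on the order of ten thousand words, which is in line with the statement in the Introduction that each cell occupies several thousand words.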

4. The family of recording programs

4.1. Program generator

In order to obtain good flexibility without having a bulky and complicated program, we use a program generator to generate a tailored version of the recording program for each investigation, as described under System design. This generator is a set of programs executed in essentially three steps.

The system program OS/8 BATCH [12] is the basis of the generation process. BATCH is a batch monitor that uses an ASCII file instead of the operator to control the computer. This provides the possibility of creating such files by a FORTRAN program and later inserting them into the BATCH command stream. Thus BATCH can create its own command files and execute them via the SUBMIT command. Each phase in the generation consists of an execution of such a batch file.

In phase one, which is the interactive part of the generation process, the various options are selected through the question and answer method. Using this information the variables in the COMMON area of the recording program are initialized and a BATCH file for phase two is created. The initialized COMMON area is written onto disk, formatted as a PAL-8 assembly file containing only constants. This file is assembled before phase one ends by submitting the newly created BATCH file. This fairly complicated procedure is necessary because FORTRAN II has no means of initializing COMMON or data areas before the execution of the program starts.

Phase two loads two or three modules of the recording program according to the specifications from phase one. It also searches the loader map for the entry point of the first module, and a BATCH file for the third phase is created.

In the third and last phase the program from phase two and the initialized COMMON area from phase one are combined using the absolute loader. For this the entry point which was obtained in phase two has to be supplied. Finally some scratch files needed by the recording program are created and all temporary files are deleted. At this point the requested program is ready for use.

The entire generation procedure requires approx. 3 min when run with a disk as the system device (and 45 min with DEC-tapes). The total number of subroutines in the system is 85.

4.2. Recording program structure

Due to the limited amount of core memory on the PDP-8, the recording program is built using an overlay technique. Depending on which version of the program has been generated, it consists of two or three overlay modules. These modules chain to each other in a circular chain. Figure 1 shows a block diagram of this.

The first module is used to build a header block for the picture and to set up the scanner. The header block is a descriptive block associated with the picture, containing general administrative information such as date, specimen identifications, type of cell(s), photo number, IMAGE ID etc. This information is entered by the operator as answers to questions from the computer.
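The header block and the circular chaining of overlay modules can be pictured with a small Python sketch. It is an illustration only, not the FORTRAN II/SABR implementation: the field names follow the items listed in the text, while the class name `HeaderBlock`, the function `chain_order`, the module labels and all example values are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass
class HeaderBlock:
    """Administrative data stored with each picture (items named in the text)."""
    date: str
    specimen_id: str
    cell_type: str
    photo_number: int
    image_id: str


def chain_order(modules, steps):
    """Order in which overlay modules run when they chain to each other in a circle."""
    return list(islice(cycle(modules), steps))


# Hypothetical example values.
header = HeaderBlock("1977-01-01", "S-17", "epithelial", 4, "A001")
print(header.image_id)
print(chain_order(["module 1", "module 2", "module 3"], 5))
```

Because the chain is circular, the last module hands control back to the first, so each new picture starts again with the entry of header information.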

Generally the questions are very short, since the operator soon remembers what they mean anyway. However, if he is unable to understand a question he can press the question mark (?) and (in most cases) get a more complete restatement of the question. In addition to making the program easy to use, this technique makes it very unlikely that any necessary steps in the recording procedure will be forgotten and thus makes the recorded data more reliable. However, sometimes mistakes are made and discovered later while working with the same cell. To deal with such situations a number of 'panic-buttons' are incorporated through the use of control characters. These transfer control back to the beginning of the present section or start all over again with the image, disregarding any previous input.

The remainder of the first module is scanner-dependent. In the Leitz scanner version of the first module the joystick is enabled to move the scanning table in order to position the cell(s) properly in the field of view. The next step is to determine the extent of the area to be scanned. The computer tells the operator the default image size, which may be accepted or changed. The scanning table is moved so that the area to be scanned is outlined. If the area is too big or too small this procedure can be repeated until an acceptable table setting is obtained. Next the sensitivity of the scanner is adjusted, and a photo is taken before the actual scanning starts. A detailed description of the implementation is available in [13].

In the OSIRIS scanner version of the first module the joystick is also used to position the cell to be recorded. The size of the scanned area is fixed for this scanner, as opposed to the Leitz scanner. Instead, parts of no interest in the image are deleted using the graphical terminal. The sensitivity setting of this scanner is done automatically: the system itself finds a suitable integration time for the diode array. However, two or three times each day the OSIRIS scanner needs to be recalibrated, which includes generation of tables for shading correction and tables for individual correction of each diode in the array. This requires scanning of two clean image fields, which takes about 6 min. Details about the implementation and an evaluation of the OSIRIS scanner can be found in [14].

The second module performs the actual scanning, and one version exists for each of the two scanners. In the Leitz scanner version the specimen is moved in a


[Fig. 1. Block diagram of the recording program. Recoverable labels: 1st module (start; image header information; set up the scanner); 2nd module; image; wavelength; module.]
