Downloaded from www.ajronline.org by 211.37.14.44 on 10/04/15 from IP address 211.37.14.44. Copyright ARRS. For personal use only; all rights reserved

Computer Page

Computerized Literature Reference System: Use of an Optical Scanner and Optical Character Recognition Software

Steven V. Lossef1 and Lawrence H. Schwartz2

A computerized reference system for radiology journal articles was developed by using an IBM-compatible personal computer with a hand-held optical scanner and optical character recognition software. This allows direct entry of scanned text from printed material into word processing or data-base files. Additionally, line diagrams and photographs of radiographs can be incorporated into these files. A text search and retrieval software program enables rapid searching for keywords in scanned documents. The hand scanner and software programs are commercially available, relatively inexpensive, and easily used. This permits construction of a personalized radiology literature file of readily accessible text and images requiring minimal typing or keystroke entry.

Received March 19, 1990; accepted after revision May 3, 1990.
1 Department of Radiology, Georgetown University Hospital, 3800 Reservoir Rd., NW., Washington, DC 20007-2197. Address reprint requests to S. V. Lossef.
2 Department of Radiology, The New York Hospital/Cornell University Medical Center, 525 E. 68th St., New York, NY 10021.
AJR 155:617-619, September 1990 0361-803X/90/1553-0617 © American Roentgen Ray Society

Materials and Methods

Hardware consisted of an IBM-compatible 80386SX personal computer with 1 megabyte of random access memory (RAM), a 1.2-megabyte floppy disk drive, 40 megabytes of hard disk drive, a serial mouse, a Video Graphics Array (VGA) card, and an 800 x 600 VGA monochrome monitor. A GeniScan GS-4500 hand-held optical scanner (KYE International Corp., Chino, CA) with resolution of 100-400 dots per inch (dpi), a 32-gray-shade scale, and a 4.13-in. (105 mm) maximum scanning width was used. An interface card is inserted into an 8-bit expansion slot on the motherboard of the computer. The scanner fits in the palm of the hand and has three small rollers that allow it to be smoothly drawn across the text or image at a speed of 1-2 cm/sec. A contrast control dial is present on the scanner device. Excellent character recognition was achieved with the 300-dpi setting. A 12-in. (30.5 cm) square sheet of 1/8-in. (3.2 mm) transparent plastic is useful to flatten the document to be scanned, and a straight edge fastened along one side ensures a straight scanning path (Fig. 1). With a minimal amount of practice with scanning rates and contrast settings, accurate scanned images can be obtained (Fig. 2).

Scanned text is entered into an optical character recognition software program, in this case, the CAT Reader Software Program (Computer Aided Technology, Inc., Dallas, TX). The program is menu-driven and can easily be trained to recognize all of the characters of a given typeface or font.

The ScanEdit II program, which is included with the GeniScan GS-4500 hand scanner, permits scanning of line drawings and gray-scale images. These may be directly incorporated as graphics into word processor files. Scanned line drawings are of excellent quality. The quality of gray-scale images may be improved by using the CAT Image Enhancer program (Computer Aided Technology, Inc., Dallas, TX). Storage of a scanned image uses approximately 6-32 kilobytes of memory. Simple text files, of course, use much less memory (1-7 kilobytes).

WordPerfect 5.0 (WordPerfect Corp., Orem, UT) was used as a word processing software program. It is compatible with the CAT optical character recognition system, ScanEdit, and CAT Image Enhance programs. Macro programs can be written to simplify retrieval of scanned images and text generated by the optical character recognition program. An extremely useful macro program was devised that automatically removes unwanted hyphenations and spaces; it has been included in the appendix of this article.
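The trainable character recognition step described in Materials and Methods can be pictured as template matching. The following is a minimal sketch in Python under that assumption; the article does not describe the algorithm used by the CAT Reader program, and the 3 x 5 bitmaps, the `TEMPLATES` table, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical "trained" templates for one typeface: 3x5 bitmaps in which
# '#' is a black pixel and '.' is a white pixel. Real scans at 300 dpi
# would use far larger bitmaps; these are illustrative only.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def pixel_difference(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the character whose template differs least from the glyph."""
    return min(templates, key=lambda ch: pixel_difference(glyph, templates[ch]))


def to_ascii_codes(glyphs, templates=TEMPLATES):
    """Map each scanned glyph to the ASCII code of its best-matching character."""
    return [ord(recognize(g, templates)) for g in glyphs]
```

In this sketch, "training" amounts to filling the template table with one bitmap per character of the typeface being scanned. A glyph with a missing or extra pixel is still assigned to the nearest template, which is one way a recognition program might tolerate small variations in contrast.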

Fig. 1.-Hand-held scanner being drawn across an abstract of a published article.

Fig. 2.-Example of a typical converted page of text with a scanned image.

GOfer 2.0 (Microlytics, Inc., Pittsford, NY) is a text-retrieval software program used to rapidly search for and highlight selected keywords in stored data files on hard disk. Boolean AND, OR, and NOT functions facilitate rapid and precise literature searches.

Currently many manufacturers produce similar scanners and software programs that are compatible with various computer models and configurations. The list prices of the GeniScan GS-4500 hand scanner, CAT Reader OCR program, CAT Image Enhancer program, and GOfer are $279, $295, $99, and $79, respectively, but retail discounts of 25-35% are commonly offered. Flatbed or desktop scanners also are available.
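The Boolean AND, OR, and NOT keyword searches mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea and makes no claim about how GOfer is implemented; the function names, the `all_of`/`any_of`/`none_of` parameters, and the in-memory dictionary of files are assumptions made for the example.

```python
import re


def words(text):
    """Return the set of lowercase words (letters and digits) in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean keyword test: every all_of word (AND), at least one
    any_of word if any are given (OR), and no none_of word (NOT)."""
    found = words(text)
    if any(k.lower() not in found for k in all_of):
        return False
    if any_of and not any(k.lower() in found for k in any_of):
        return False
    if any(k.lower() in found for k in none_of):
        return False
    return True


def search(files, **query):
    """Return the sorted names of stored files whose text satisfies the query.

    `files` maps a file name to its text (a stand-in for files on disk)."""
    return sorted(name for name, text in files.items() if matches(text, **query))
```

For example, `search(files, all_of=["guide", "wire"], none_of=["catheter"])` would list the stored files that mention both "guide" and "wire" but not "catheter". Matching here is on whole words only; phrase searching and highlighting are left out of the sketch.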

Discussion

Many radiologists keep files of pertinent medical journal articles for reference. Information in large collections of clipped articles often is difficult to access and may be buried in multiple folders bearing broad categories of articles. Keywords submitted by authors form the basis of most literature searches, but these are not necessarily specific enough to allow rapid retrieval of a particular fact or phrase in an article. A computerized literature storage and retrieval system that uses a relatively inexpensive hand scanner and readily available software programs was therefore developed by using a personal computer.

Several steps must be completed in order to create this literature system. First, the article must be optically scanned. Entry of articles by optical scanning alone would save a page of text as a graphics image, pixel by pixel. However, this would require inordinate amounts of memory, would not permit editing of text, and would not facilitate searching for keywords. By converting scanned images of printed text to word processor files, data storage requirements are significantly diminished, and text editing and search are made possible. Optical character recognition performs this conversion.

Optical character recognition software is used because it enables rapid entry of articles with a minimum amount of typing. Each scanned letter, number, and character may be thought of as a map of tiny black and white pixels. The optical character recognition program can recognize each characteristic grouping of pixels and assign an appropriate ASCII (American Standard Code for Information Interchange) code, which can then be directly entered into a word processing program for editing and storage. Images also can be scanned and directly incorporated into computerized files. Finally, a text search and retrieval program facilitates rapid and complete searches for phrases in documents stored on the hard disk.

The second feature of this system is the ease with which line drawings and photographic images can be incorporated into files. The quality of scanned line drawings has been excellent. The quality of scanned photographs resembles that of a photocopy of the original, but information content usually is preserved.

A convenient system of annotating text files is assignment of a unique file code to each scanned journal article consisting of the initials of the journal, the volume number followed by a hyphen, and the number of the first page of the article. For example, the article Radiology 1988;167:864-865 would be assigned the WordPerfect file code R167-864 so that it could be recognized and retrieved.

Although the system we describe uses a word processor program for editing and storage of files, free-form data-base programs, such as AskSam 4.2 (AskSam Systems, Perry, FL), could store and retrieve text and graphics. The text search program GOfer has proved to be a thorough and indispensable tool for retrieving articles containing keywords or phrases from word processor files.

The personalized nature of this system allows scanning and storage of relevant portions of the results, discussion, and materials and methods sections of articles, in addition to abstracts, as they are perused, for future ready retrieval. This makes it different from on-line bibliographic files such as MEDLINE, in which only abstracts are available for on-line viewing [1]. The MEDLINE system is complementary, however, because our system is limited to only those articles personally stored, whereas MEDLINE is a complete index containing over 4 million references.

Computerized reference management systems have been described before [2-4], but these have been limited to manual typographic entry of specific items such as title, author, volume, year, page, and keyword(s) into data-base programs, such as dBase. Such files are limited in both content and accessible keywords, and images cannot be directly entered.

In conclusion, we present a relatively inexpensive system for entry and retrieval of printed articles requiring minimal typing. This has potential use for teaching and academic research.
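The file-code convention described in the Discussion (journal initials, volume number, hyphen, first page) can be generated mechanically from a citation. The sketch below assumes citations written as "Journal Year;Volume:FirstPage-LastPage" and takes the first letter of each word of the journal name as its initials; the function name `file_code` and both of these conventions are assumptions for illustration.

```python
import re


def file_code(citation):
    """Build a file code such as 'R167-864' from 'Radiology 1988;167:864-865'."""
    m = re.fullmatch(r"\s*(.+?)\s+(\d{4});(\d+):(\d+)(?:-\d+)?\s*", citation)
    if m is None:
        raise ValueError(f"unrecognized citation format: {citation!r}")
    journal, _year, volume, first_page = m.groups()
    # Assumption: the journal's "initials" are the first letter of each word.
    initials = "".join(word[0].upper() for word in journal.split())
    return f"{initials}{volume}-{first_page}"
```

Calling `file_code("Radiology 1988;167:864-865")` gives the code used as the example in the text, R167-864. Journals that share initials would need a longer prefix to keep the codes unique.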


Appendix

A useful macro program was written for WordPerfect 5.0 to retrieve a text file (t.txt) automatically from the optical character recognition program into WordPerfect 5.0, search for and remove unnecessary hyphens and spaces, and activate the spelling correction program. We include it because [...] It can be modified [...] programs.
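The cleanup that the macro performs on recognized text can be approximated as follows. This is a Python sketch, not the WordPerfect macro itself, which is not reproduced here; the function name `clean_ocr_text` and the three specific rules are assumptions about what "unnecessary hyphens and spaces" covers.

```python
import re


def clean_ocr_text(text):
    """Remove line-break hyphenation and stray spaces from recognized text."""
    # Join words split by a hyphen at a line break: "soft-\nware" -> "software".
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop spaces that precede common punctuation marks.
    text = re.sub(r"[ \t]+([,.;:])", r"\1", text)
    return text
```

Hyphens inside a line, as in "text-retrieval", are left alone. A compound that happens to break at its own hyphen (for example "hand-" at the end of a line followed by "held") would be joined into one word, which is one reason a spelling check after the cleanup step is useful.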
