Downloaded from www.ajronline.org by 211.37.14.44 on 10/04/15 from IP address 211.37.14.44. Copyright ARRS. For personal use only; all rights reserved
Computer Page

Computerized Literature Reference System: Use of an Optical Scanner and Optical Character Recognition Software

Steven V. Lossef¹ and Lawrence H. Schwartz²

A computerized reference system for radiology journal articles was developed by using an IBM-compatible personal computer with a hand-held optical scanner and optical character recognition software. This allows direct entry of scanned text from printed material into word processing or data-base files. Additionally, line diagrams and photographs of radiographs can be incorporated into these files. A text search and retrieval software program enables rapid searching for keywords in scanned documents. The hand scanner and software programs are commercially available, relatively inexpensive, and easily used. This permits construction of a personalized radiology literature file of readily accessible text and images requiring minimal typing or keystroke entry.

Materials and Methods

Hardware consisted of an IBM-compatible 80386SX personal computer with 1 megabyte of random access memory (RAM), a 1.2-megabyte floppy disk drive, 40 megabytes of hard disk drive, a serial mouse, a Video Graphics Array (VGA) card, and an 800 x 600 VGA monochrome monitor. A GeniScan GS-4500 hand-held optical scanner (KYE International Corp., Chino, CA) with resolution of 100-400 dots per inch (dpi), a 32-gray-shade scale, and a 4.13-in. (105 mm) maximum scanning width was used. An interface card is inserted into an 8-bit expansion slot on the motherboard of the computer. The scanner fits in the palm of the hand and has three small rollers that allow it to be smoothly drawn across the text or image at a speed of 1-2 cm/sec. A contrast control dial is present on the scanner device. Excellent character recognition was achieved with the 300-dpi setting. A 12-in. (30.5 cm) square sheet of 1/8-in. (3.2 mm) transparent plastic is useful to flatten the document to be scanned, and a straight edge fastened along one side of the sheet ensures a straight scanning path (Fig. 1). With a minimal amount of practice with scanning rates and contrast settings, accurate scanned images can be obtained (Fig. 2).

Scanned text is entered into an optical character recognition software program, in this case, the CAT Reader Software Program (Computer Aided Technology, Inc., Dallas, TX). The program is menu-driven and can easily be trained to recognize all of the characters of a given typeface or font. The ScanEdit II program, which is included with the GeniScan GS-4500 hand scanner, permits scanning of line drawings and gray-scale images. These may be directly incorporated as graphics into word processor files. Scanned line drawings are of excellent quality. The quality of gray-scale images may be improved by using the CAT Image Enhancer program (Computer Aided Technology, Inc., Dallas, TX). Storage of a scanned image uses approximately 6-32 kilobytes of memory. Simple text files, of course, use much less memory (1-7 kilobytes).

WordPerfect 5.0 (WordPerfect Corp., Orem, UT) was used as a word processing software program. It is compatible with the CAT optical character recognition system, ScanEdit, and CAT Image Enhancer programs. Macro programs can be written to simplify retrieval of scanned images and text generated by the optical character recognition program. An extremely useful macro program was devised that automatically removes unwanted hyphenations and spaces; it has been included in the appendix of this article.

Received March 19, 1990; accepted after revision May 3, 1990.
¹ Department of Radiology, Georgetown University Hospital, 3800 Reservoir Rd., NW., Washington, DC 20007-2197. Address reprint requests to S. V. Lossef.
² Department of Radiology, The New York Hospital/Cornell University Medical Center, 525 E. 68th St., New York, NY 10021.
AJR 155:617-619, September 1990 0361-803X/90/1553-0617 © American Roentgen Ray Society
Fig. 1.-Hand-held scanner being drawn across an abstract of a published article.

Fig. 2.-Example of a typical converted page of text with a scanned image. (The converted text shown is the abstract of "Curved Guide Wire for Percutaneous Pulmonary Angiography," Radiology 167:864-865.)
GOfer 2.0 (Microlytics, Inc., Pittsford, NY) is a text-retrieval software program used to rapidly search for and highlight selected keywords in stored data files on hard disk. Boolean AND, OR, and NOT functions facilitate rapid and precise literature searches. Currently many manufacturers produce similar scanners and software programs that are compatible with various computer models and configurations. The list prices of the GeniScan GS-4500 hand scanner, CAT Reader OCR program, CAT Image Enhancer program, and GOfer are $279, $295, $99, and $79, respectively, but retail discounts of 25-35% are commonly offered. Flatbed or desktop scanners also are available.
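The Boolean AND, OR, and NOT keyword search described above can be illustrated with a short Python sketch. This is not GOfer's implementation, which the article does not describe; the function names, the in-memory `documents` dictionary, and the sample texts are assumptions made only for illustration.

```python
import re

def tokenize(text):
    """Lower-case a document and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """Apply AND (all_of), OR (any_of), and NOT (none_of) keyword lists to one text."""
    words = tokenize(text)
    if any(k.lower() not in words for k in all_of):
        return False  # an AND keyword is missing
    if any_of and not any(k.lower() in words for k in any_of):
        return False  # none of the OR keywords is present
    if any(k.lower() in words for k in none_of):
        return False  # a NOT keyword is present
    return True

def search(documents, **query):
    """documents maps a file code to its text; return the matching file codes, sorted."""
    return sorted(name for name, text in documents.items() if matches(text, **query))

# Hypothetical stored files, keyed by illustrative file codes.
documents = {
    "R167-864": "A curved guide wire was developed for pulmonary angiography.",
    "AJR155-617": "A hand scanner and optical character recognition software.",
}
print(search(documents, all_of=["guide", "wire"], none_of=["scanner"]))
```

A production tool would scan files on disk and highlight the hits in context; the sketch shows only how the three Boolean operators combine.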
Discussion

Many radiologists keep files of pertinent medical journal articles for reference. Information in large collections of clipped articles often is difficult to access and may be buried in multiple folders bearing broad categories of articles. Keywords submitted by authors form the basis of most literature searches, but these are not necessarily specific enough to allow rapid retrieval of a particular fact or phrase in an article. A computerized literature storage and retrieval system that uses a relatively inexpensive hand scanner and readily available software programs was therefore developed by using a personal computer.
Several steps must be completed in order to create this literature system. First, the article must be optically scanned. Entry of articles by optical scanning alone would save a page of text as a graphics image, pixel by pixel. However, this would require inordinate amounts of memory, would not permit editing of text, and would not facilitate searching for keywords. By converting scanned images of printed text to word processor files, data storage requirements are significantly diminished, and text editing and searching are made possible. Optical character recognition performs this conversion. Optical character recognition software is used because it enables rapid entry of articles with a minimum amount of typing. Each scanned letter, number, and character may be thought of as a map of tiny black and white pixels. The optical character recognition program can recognize each characteristic grouping of pixels and assign an appropriate ASCII (American Standard Code for Information Interchange) code, which can then be directly entered into a word processing program for editing and storage. Images also can be scanned and directly incorporated into computerized files. Finally, a text search and retrieval program facilitates rapid and complete searches for phrases in documents stored on the hard disk.

The second feature of this system is the ease with which line drawings and photographic images can be incorporated into files. The quality of scanned line drawings has been excellent. The quality of scanned photographs resembles that of a photocopy of the original, but information content usually is preserved.

A convenient system of annotating text files is assignment of a unique file code to each scanned journal article consisting of the initials of the journal, the volume number followed by a hyphen, and the number of the first page of the article. For
example, the article Radiology 1988;167:864-865 would be assigned the WordPerfect file code R167-864 so that it could be recognized and retrieved. Although the system we describe uses a word processor program for editing and storage of files, free-form data-base programs, such as AskSam 4.2 (AskSam Systems, Perry, FL), could store and retrieve text and graphics. The text search program GOfer has proved to be a thorough and indispensable tool for retrieving articles containing keywords or phrases from word processor files.

The personalized nature of this system allows scanning and storage of relevant portions of the results, discussion, and materials and methods sections of articles, in addition to the abstracts, as they are perused, for future ready retrieval. This makes it different from on-line bibliographic files such as MEDLINE, in which only abstracts are available for on-line viewing [1]. The MEDLINE system is complementary, however, because our system is limited to only those articles personally stored, whereas MEDLINE is a complete index containing over 4 million references. Computerized reference management systems have been described before [2-4], but these have been limited to manual typographic entry of specific items such as title, author, volume, year, page, and keyword(s) into data-base programs, such as dBase. Such files are limited in both content and accessible keywords, and images cannot be directly entered.

In conclusion, we present a relatively inexpensive system for entry and retrieval of printed articles requiring minimal typing. This has potential use for teaching and academic research.
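The file-code convention described in the Discussion (journal initials, volume number, hyphen, first page) can be sketched in Python. The function name `file_code` and the assumption that citations follow the "year;volume:first-last" pattern are illustrative and not part of the system described.

```python
import re

def file_code(journal_initials, citation):
    """Build a file code such as 'R167-864' from a citation like '1988;167:864-865'."""
    # Assumed citation layout: year;volume:firstpage-lastpage
    match = re.search(r";\s*(\d+)\s*:\s*(\d+)", citation)
    if match is None:
        raise ValueError(f"cannot find volume and first page in {citation!r}")
    volume, first_page = match.groups()
    return f"{journal_initials}{volume}-{first_page}"

print(file_code("R", "1988;167:864-865"))
```

Because the code is derived entirely from the citation, the same article always maps to the same file name, which is what makes a stored article recognizable from its reference alone.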
Appendix

A useful macro program was written for WordPerfect 5.0 to retrieve a text file (t.txt) automatically from the optical character recognition program into WordPerfect 5.0, search for and remove unnecessary hyphens and spaces, and activate the spelling correction program. We include it because [...]. It can be modified [...] programs.
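The cleanup performed by the macro (removing unnecessary hyphens and spaces from the recognized text) can be approximated in Python. This is a minimal sketch and not a translation of the WordPerfect macro, whose listing is not reproduced here; the function name `clean_ocr_text` and the specific rules are assumptions.

```python
import re

def clean_ocr_text(text):
    """Join words hyphenated across line breaks and remove redundant spaces."""
    # "recog-\nnition" -> "recognition": hyphen at a line end followed by a lowercase letter
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(?=[a-z])", r"\1", text)
    # Collapse runs of spaces or tabs into a single space
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Remove spaces left in front of punctuation
    text = re.sub(r" +([,.;:])", r"\1", text)
    return text.strip()

sample = "Optical character recog-\nnition  software ."
print(clean_ocr_text(sample))
```

The first rule also joins genuine compounds that happen to break at a line end (for example, "hand-" followed by "held"), so a spelling check afterward, as the macro does, remains a sensible final step. In practice the function would be applied to the contents of the file written by the recognition program (t.txt in the article).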