Journal of Biopharmaceutical Statistics

ISSN: 1054-3406 (Print) 1520-5711 (Online) Journal homepage: http://www.tandfonline.com/loi/lbps20

Qcis: a sas-based relational database/query system for the current index to statistics Charles W. Kish Jr. & Kathryn S. Dawson To cite this article: Charles W. Kish Jr. & Kathryn S. Dawson (1992) Qcis: a sas-based relational database/query system for the current index to statistics, Journal of Biopharmaceutical Statistics, 2:2, 257-263, DOI: 10.1080/10543409208835043 To link to this article: http://dx.doi.org/10.1080/10543409208835043

Published online: 29 Mar 2007.

Submit your article to this journal

Article views: 7

View related articles

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=lbps20 Download by: [NUS National University of Singapore]

Date: 06 November 2015, At: 16:00

Downloaded by [NUS National University of Singapore] at 16:00 06 November 2015

Journal of Biopharmaceutical Statistics, 2(2), 257-263 (1992)

QCIS: A SAS-BASED RELATIONAL DATABASE/ QUERY SYSTEM FOR THE CURRENT INDEX TO STA TISTICS Charles W . Kish, Jr., and Kathryn S. Dawson Department of Biostatistics Medical College of Virginia Virginia Commonwealth University Richmond, Virginia 23298-0032

Keywords. CIS computer database; Search software; Portability; Menu-driven system; SAS base language

Abstract A portable, menu-driven software system for the Current Index to Statistics (CIS) culmulative database is described. The SAS-based system converts the CIS datafiles into a relational SAS database and provides windows for users to easily define their queries. Query execution may occur in batch or interactive mode.

Introduction Statisticians working in biopharmaceutical areas often need to search the literature to identify and apply methodology to a wide range of problems. The Current Index to Statistics (CIS) is commonly used as a resource. The capability of doing author, subject, and journal searches of the CIS is enhanced by the release of the computerized culmulative database. The 1991 edition has 98,634 entries covering 1978-1990. The records are contained in 26 IBM-

Downloaded by [NUS National University of Singapore] at 16:00 06 November 2015

258

Kish and Dawsol

PC DOS flat files on 13 high-density microdiskettes. Each record is made ul of six fields [location, title, author(s), keywords and phrases, searchable au thor information, and additional searchable author information]. With the ex ception of the location field, all fields are of variable length. Two additiona files for journal names are also included. No software is provided to managt or query the datafiles. Initial efforts have been made by individual institutions to develop soft, ware for the CIS database. Trumbo (1) described efforts that ranged frorr utilizing word processors on PCs to writing specialized programs in the UNI3 environment on mainframe or minicomputer networks. The former approact is widely available to many CIS users, but is user intensive and relativelj slow. The latter programs are probably fast but appear to require access tc UNIX system utilities and knowledge of specialized commands. This article describes QCIS (Query Current Index to Statistics), a relational database/query software system written in the base language of the Statistical Analysis System (SAS) (2). QCIS converts 27 files into a relational database and enables users to conduct author, title, keyword (subject), and journal queries in a menu-driven environment.

QCIS Design Goals and Implementation via SAS QCIS search software is designed to have: Portability across a wide variety of machines and operating systems. Portability is needed to make search software widely available to CIS users, especially those who do not have the time and/or resources to develop their own search software. Thoroughness and accuracy of search algorithms. The algorithms should be totally thorough and accurate in searching for strings that define a query and should be able to handle unavoidable inconsistencies in the database (e.g., run-on words). Ease-of-use via a menu-driven user interface. All users, regardless of their computer expertise, should be able to easily query the CIS database without the need to refer to a manual. Flexible query capacity. QCIS provides for author, keyword (subject), and journal queries, as well as queries on titles. The flexibility of these queries is enhanced by allowing:

1. compound search strings joined by logical operators AND, OR, 2. queries to be subsetted by time intervals. SAS is already used by many statisticians in a wide variety of computing environments. The application development capacities of SAS allow for search

Downloaded by [NUS National University of Singapore] at 16:00 06 November 2015

QCIS

259

algorithms to be specifically designed to reflect the needs of the users and the current condition of the datafiles. This custom design ensures the thoroughness, accuracy, and flexibility of the search algorithms. These capacities also allow development of a menu-driven system through various windowing facilities. The menu system is specifically designed for searching the CIS data bas^. The user is logically directed through a network of screens. Help windows and examples are shown on request. SAS's data management capabilities provide efficient handling of large, multiple datafiles. Disk space is conserved by compressing each SAS dataset. The primary disadvantage of using SAS for this application is the limitation of character strings to be no greater than 200 characters. This characteristic of SAS does affect the structural design of the system and the ultimate query speed. The variable-length fields in the CIS datafiles (e.g., title field) are partitioned into a variable number of subfields of fixed length. The subfields are grouped into multiple SAS files using a relational database approach. The relational design has a simple logical structure and facilitates efficient handling of the SAS files.

QCIS Description The QCIS system is made up of two sets of SAS programs: a set that builds the database from the CIS files and a set of programs that query the database. Queries may be executed in either batch or interactive environments. Once the CIS files have been read onto the user's computer system, the build program restructures the information in the CIS files into permanent SAS datasets, which are logically viewed as relational tables. A dataset is also made of records that could not be converted; i.e., their format did not match the defined six-field record format (two invalid records were found in the 1991 edition). Once the relational database is built, the user invokes the system by entering "QCIS." The functional flow of the batch version of the query system is shown in Figure 1. Users may rename output files and/or submit queries for execution only after the queries are validated as correct. After a query is submitted, control is passed back to the main menu for further query definition and submission or exiting the system. At the first menu (Fig. 2), the user can opt to define the query or to obtain general information about the system. If the selection is "d," the query definition menu is displayed (Fig. 3). The example shows a keyword query definition for all entries in the years 1978-1990 that contain the strings "residual" and "outlier" in either the keyword or title fields. This example is a compound query defined by

Kish and Dawson

i-' Downloaded by [NUS National University of Singapore] at 16:00 06 November 2015

MAIN MENU

QUERY

LISTING

SUBMIT

Figure 1. QCIS batch system functional overview.

two search strings and a logical operator. A simple query can be constructed by entering only the first string and leaving the query operator and the second string blank. All queries require a query command (K, A, T, or J) and a time interval. The default time interval is the entire time range of the database. Once the user completes the query definition, a validation check of the query's command, time interval, and logic is made. For example, a nonblank

WELCOME TO QCIS: A SAS BASED RELATIONAL DATABASWQUERY SYSTEM FOR THE CURRENT INDEX TO STATISTICS COPYRIGHT VCU, 1991

SELECT A TASK BY ENTERING THE rmST U T E k D D DEFINEQUnrV E EXAMPLES OF QUERIES H HELP FOR QUERY I INFORMATION ABOUT yCIS

Q QUm

PRESS ENTER (OR RETURN)TO START THE TASK

-

Figure 2. Main menu.

QCIS

II

FUR A SIMPLE QUERY, ENIERTHE QUERY COMMAND, QUERY STRLNGI, AND QUERY Nll3RVAL (ACCEPTABLEVALUES ARE INDICATED IN PAREIVEfESES).

FOR A CIMPOUND QUERY, *LIO ENTER THE QUERY OPERATOR AND QUERY STlUNG2.

Downloaded by [NUS National University of Singapore] at 16:00 06 November 2015

REENIER 'IHE YEARS IF YOUR SEARCH IS ONLY ON A SUBSET OF THE DATABASE.

----------------------------------------.-------------

QUERY COMMAND (A,TX OR J):

B

QUERY STRING (NON-BL4NK WlTH LENGTH < 5 I ) SlWNG 1:QUERY OPERATOR (AND/OR):eClq

I

Figure 3.

STRING 2: OUTLIER

Query definition screen.

logical operator requires the second query string to be nonblank. The query definition is echoed to the user along with the results of the validation checks. If an error is found, the user must redefine the query. If a valid query has been defined, the menu shown in Figure 4 is displayed. The user may redefine the query or submit it for execution. Multiple queries may be defined and submitted in a single session. The system generates a default output file name (based on date and time) for each query. The resultant output file is a standard ASCII print file. The default file name may be renamed before the query is submitted for execution. In the current example, the user has renamed the file to RESOUT. Query execution occurs when the user terminates the session through the quit option. Figure 5 shows a portion of the generated output for the query defined in Figure 3. Thirty-five entries in the CIS database satisfied the example search. The interactive execution version also allows users to view a "shortened" version of the query output in a window before they define permanent output files or exit from the system. On multiprocessor machines, the batch version may be preferable because it frees up the user's processor while the searches are conducted. The example compound query shown in figures required a CPU time of 1 min 57 sec. A second, simple, query for all entries between 1987 and 1990 with an author name of "Carter" required 56 sec of CPU time. Thirty-three entries were found. Both queries were defined in a single session. The total user time required to invoke the system, define both queries, designate sep-

Kish and Dawson

?HE CURRENT OUERY -ON

IS V

U

Downloaded by [NUS National University of Singapore] at 16:00 06 November 2015

QUERY COMMAND IS: K QUERY STRING 1 IS: RESIDUAL QUERY OFERATOR IS: AND QUERY STRING2 IS: O W QUERY INIFRVAL 1S:l978 1990 THE QUERY PRWC FILENAMEIS: RESOUT SELECT A TASK BY ENTERING 'IHEFIRST E l l W k S SUBMIT QUERY FOR BATCH PROCESSING FILE R RENAME QUERY D DEFINEQUERY E EXAMRES OFQUERIES H HELP FOR QUERY 1 INFORMATION ABOUT QCIS Q Qum PRESS ENIER (OR RETURN)TO START THE TASK

Figure 4 . Query submit screen

PUBLICATIONSFROM A 0 KEYWORD SEARCH OF RESIDUAL AND OUTllER FROM 1978 - 1990 35 EiWWE.9 FOUND

I Year: 78 Reference: AMERSTAT32 17- 22 Title: THE HAT MATTUX IN REGRESSION AND ANOVA (CORR. V32 P146) Authors: DAVID C. HOAGLlN, ROY E. WELSCH Keyword: ANALYSIS OF VARIANCE. PROJECTION, OUTUERS. STUDENTIED RESIDUAL. LEAST SQUARES COMPUTATIONS 2 Year: 78 Reference: APPLSTAT27 11 1- 1 19 Title: OUTLIERS IN FACTORIAL M P E R I M E m S Authors: J. A. JOHN Keyword: CONFOUNDING, FRACTIONAL REPLKATION. MISSING DATA. ANALYSISOF VARIANCE, RESIDUALS 3 Year: 78 Reference: TECHNMCSZO 69- 78 Title: ON TESTING FOR TWO OUTLIERS OR ONE OUlZlER IN TWO-WAY TABLES /

Figure 5 . QCIS hardcopy output

QCIS

263

arate output filenames, and submit the queries was 65 sec. These runs were made on a VAX Model 6000-440 running SAS System, Release 6.06.

Downloaded by [NUS National University of Singapore] at 16:00 06 November 2015

Availability Users who have purchased the CIS database and who want to have a copy of QCIS should send a copy of their contract signature page, a 3.541, disk, and a self-addressed, stamped diskette mailer to the authors. Suggestions from users are welcome.

Trademarks SAS is a registered trademark of SAS Institute Inc., Cary, NC. UNIX is a registered trademark of AT&T Bell Laboratories.

Acknowledgments We thank Luther Atkinson and Peggy Alexander of the Health Sciences Computing Services at the Medical College of Virginia Commonwealth University, for their helpful assistance. We also express our appreciation to Dr. Walter H. Carter, Jr., for his enthusiastic support.

References Trumbo B: Microcomputers make current index searches fast and easy. Amstat News, No. 175, 1991. 2. SAS Institute: [email protected] Language: Reference, Version 6 First Edition. SAS Institute Inc., Cary, NC, 1990, 1042 pp. 1.

query system for the Current Index to Statistics.

A portable, menu-driven software system for the Current Index to Statistics (CIS) cumulative database is described. The SAS-based system converts the ...
446KB Sizes 0 Downloads 0 Views