NIH Public Access Author Manuscript Pharm Rev. Author manuscript; available in PMC 2014 October 27.

NIH-PA Author Manuscript

Published in final edited form as: Pharm Rev. 2009 May 8; 7(3): .

Extracting Relevant Information from FDA Drug Files to Create a Structurally Diverse Drug Database Using KnowItAll®

Malcolm J. D’Souza* and Fumie Koyoshi

Department of Chemistry, Wesley College, 120 N. State Street, Dover, Delaware 19901-3875, USA

Malcolm J. D’Souza: [email protected]

Abstract

Each Food and Drug Administration (FDA) consumer drug information file contains a large amount of useful chemical, pharmaceutical, and pharmacological data. These files profile approved drugs by chemical structure, solubility, absorption, distribution, metabolism, elimination, and toxicity (ADME/Tox), and by possible adverse reactions. Using these data in the classroom is a new approach to connecting theory, technology, and reality. The KnowItAll® Informatics System, available through Bio-Rad Laboratories, Philadelphia, PA, offers fully integrated software and/or database desktop solutions. It is a chemical informatics platform that holds a large collection of in silico ADME/Tox predictors and is used to record experimental data. This project had three goals: (1) to extract relevant information for 75 drugs from their freely available FDA drug files (limited to orally administered drugs and pro-drugs having a chemical structure); (2) to build a database in which the extracted FDA information is indexed for search and analysis; and (3) to ensure that, on completion, the undergraduates involved in such a project are capable of harvesting useful chemical, pharmaceutical, and pharmacological information, are adept with computational chemistry software tools, and have gained an enhanced vocabulary and new insights into organic chemistry, molecular biology, and physiology.

Keywords

FDA consumer drug database©; KnowItAll®; ADME/Tox; (quantitative) structure activity relationship (Q)SAR; predictor tools; chemical informatics

* Corresponding Author Tel: 302-736-2528; Fax: 302-736-2301.

Introduction

FDA consumer drug information files [1] provide open access to clinical data, including data from negative trials and unpublished results. These portable document format (pdf) files profile drugs by chemical structure, solubility, metabolism pathways, absorption, distribution, elimination, carcinogenesis, mutagenesis, impairment of fertility, possible adverse reactions, and other useful pharmacokinetic and toxicological data. Additionally, the searchable drug information database Drugs@FDA [2] serves as the most comprehensive resource for product-specific information on drugs approved in the USA, and can be navigated with ease using online text queries. However, these documents are very wordy and are not consumer friendly. Furthermore, the often complex drug structures reported in the pdf drug files cannot be exported into another application. The FDA website also includes findings from documents submitted voluntarily to the FDA Safety Information and Adverse Reporting program (MedWatch) [3] by drug and biologic manufacturers, consumers, distributors, packers, and sponsors, and from investigators with studies [3, 4] under Investigational New Drug (IND) applications. Another very recent and exhaustive resource serving the life science community is the University of Alberta’s DrugBank database [5]. All of this browsable, high-quality data can be mined, analyzed, and interpreted to develop statistical models of predictive quantitative and qualitative structure activity relationships (QSARs, SARs) [6–8]. One goal is therefore to provide science majors with a process that overcomes the initial fear they may experience when attempting to extract useful information from this mass of files.

Bio-Rad’s KnowItAll® Informatics System Desktop Solutions [9] offers fully integrated software and/or database desktop solutions for multiple aspects of research, including in silico ADME/Tox profiling, spectroscopy, cheminformatics, and medicinal chemistry [10]. The company offers several KnowItAll® “editions” that combine the appropriate set of software tools for specific user groups and needs [9–12]. In this article we detail steps that combine information freely accessed through the web with our KnowItAll® Cheminformatics Edition [9] and our KnowItAll® ADME/Tox Edition [9], both purchased through Bio-Rad Laboratories. The project teaches students within an established Directed Research Program methods to access, extract, document, and manage relevant data from FDA drug files in order to create a searchable consumer drug database.

Discussion

To create a mindset that would entail both qualitative and quantitative analyses, we undertook our project in two stages: (1) extract information for 75 consumer drugs from the FDA Drug Information Websites (limited to orally administered drugs and pro-drugs having a chemical structure) [1–4]; and (2) build a pharmaceutical database using our ADME/Tox Edition and our Cheminformatics Edition of the KnowItAll® System [9].

Here, data were extracted from the freely available FDA consumer drug information files [1–4] for the following 75 randomly chosen, unrelated consumer drugs: Iressa®, Levitra®, Strattera®, Abilify®, Inspra®, Hepsera®, Namenda®, Alinia®, Emtriva®, Emend®, Tindamax®, Sensipar®, Pletal®, Viagra®, Zavesca®, Orfadin®, Zyvox®, Uroxatral®, Zelnorm®, Avodart®, Frova®, Provigil®, Aromasin®, Detrol®, Thalomid®, Atacand®, Zonegran®, Micardis®, Maxalt®, Xeloda®, Arava®, Gleevec®, Cialis®, Reyataz®, 5-FU form of Xeloda®, Candesartan (active metabolite of Atacand®), Hectorol®, Singulair®, Spectracef®, Cefditoren (active metabolite of Spectracef®), Sanctura®, Colazal®, active metabolite of Arava®, Zetia®, Hepsera® M1, Spiriva®, Starlix®, Tasmar®, Xifaxan®, Crestor®, Nolvadex®, Detrol® M1, Mobic®, Sustiva®, Aciphex®, Ziagen®, Protonix®, Celebrex®, Agenerase®, Trileptal®, Oxcarbazepine MHD (active metabolite of Trileptal®), Exelon®, Temodar®, Keppra®, Tequin®, Avandia®, Actos®, Avelox®, Tamiflu®, Oseltamivir Carboxylate (active metabolite of Tamiflu®), Lopinavir®, Ritonavir®, Benicar®, Rapamune®, and Ketek®.

The three-dimensional chemical structure for each of the 75 drugs was drawn using the DrawIt™ drawing application available in the KnowItAll® Cheminformatics Edition [9]. For prodrugs, the chemical structures of both the prodrug and its corresponding active form were drawn and handled separately (for example, Detrol® and Detrol® M1). KnowItAll® Cheminformatics solutions [9] include tools to draw, modify, store, search, name, and retrieve chemical structures. Notably, the structure drawing and reporting tools are based on the well-respected ChemWindow technology. They are designed for chemists: they recognize stereochemistry and E/Z isomers, and they contain chemical recognition features such as hot keys, a chemical syntax checker, and tools to calculate mass and formula. A recent comparison [13] of freely available chemical structure drawing and reporting software gave the applications from the KnowItAll® Academic Edition a very high rating for quality, flexibility, and ease of use.

Figure 1 shows the chemical structure of the drug Tamiflu® drawn using the DrawIt™ application in the KnowItAll® Cheminformatics Edition. This structure is documented in the Tamiflu® FDA drug profile [1], where it is reported with the non-systematic International Union of Pure and Applied Chemistry (IUPAC) name (3R,4R,5S)-4-acetylamino-5-amino-3-(1-ethylpropoxy)-1-cyclohexene-1-carboxylic acid, ethyl ester, phosphate (1:1). An advantage of the KnowItAll® Cheminformatics Edition is that it is bundled with the IUPAC NameIt™ application, which can generate a compound’s systematic IUPAC name from its structure. In this case (as shown in Figure 2), the IUPAC NameIt™ application reported ethyl (5S,3R,4R)-4-(acetylamino)-5-amino-3-(ethylpropoxy) cyclohex-1-enecarboxylate, phosphoric acid, for Tamiflu®. A systematic name helps ensure accuracy in recording, storing, and retrieving chemical information for a drug, since much of the text information about a drug in the literature is associated with its systematic name. The 3D ViewIt™ application, also bundled in the KnowItAll® Cheminformatics Edition, can convert the 2D DrawIt™ image of Tamiflu® into realistic 3D drawings, as shown in Figure 3. The ability to work with such high-resolution images can serve as an inexpensive and powerful method to study structure-based drug design.

Since FDA drug structures cannot be imported directly from the drug file into KnowItAll®, one has to redraw the often complex drug structure with the correct stereochemistry and geometry. An example is the structure of the immunosuppressant Rapamune®. A slight error in the drawing (such as not connecting the highlighted double bond shown in Figure 4) can have a significant impact on any future mining and screening process [14]. Such an error can be avoided in the KnowItAll® Cheminformatics Edition by using the ‘Check Chemistry’ tool, which highlights possible connectivity errors, and the ‘Calculate Mass & Composition’ tool, which provides the structure’s molecular weight (MWt), molecular formula, and chemical composition; these can then be compared for accuracy with the values reported for these parameters in the drug’s FDA files [1–4] (Figures 4 & 5). Storing accurate structures within a database has been shown to be crucial, especially when querying the database by structural criteria or substructure search [14, 15].
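
The mass and formula comparison described above can also be illustrated outside KnowItAll®. The following is a minimal Python sketch, not the KnowItAll® ‘Calculate Mass & Composition’ tool itself; the function names and the rounded standard atomic weights are assumptions made for illustration. It recomputes the molecular weights quoted for the two Rapamune® drawings in Figures 4 and 5 and reports how the two formulas differ.

```python
import re

# Standard atomic weights (g/mol), rounded to three decimals. Only the
# elements needed for the Rapamune example are listed (an assumption of
# this sketch; a real tool would cover the full periodic table).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Turn a simple formula such as 'C51H79NO13' into {'C': 51, 'H': 79, ...}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum the atomic weights of all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[s] * n for s, n in parse_formula(formula).items())


def formula_difference(drawn, reference):
    """Return the per-element surplus (+) or deficit (-) of the drawn formula."""
    d, r = parse_formula(drawn), parse_formula(reference)
    return {s: d.get(s, 0) - r.get(s, 0)
            for s in sorted(set(d) | set(r))
            if d.get(s, 0) != r.get(s, 0)}


if __name__ == "__main__":
    # Formulas taken from the captions of Figures 4 and 5.
    print(round(molecular_weight("C52H83NO13"), 2))
    print(round(molecular_weight("C51H79NO13"), 2))
    print(formula_difference("C52H83NO13", "C51H79NO13"))
```

A mismatch in either the weight or the element counts signals that the drawing should be rechecked before the structure is stored.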

KnowItAll® also logically integrates all applications in a single interface (Figure 6) so the user can easily transfer information from application to application without opening another program [9]. For example, one can draw a structure, send it to a module for NMR prediction, and then add that structure to a user database. Tools are also available to correlate whether or not a structure matches a spectrum, and the SearchIt™ application [9] allows structures and/or spectra to be imported and searched against reference databases, as well as against user created databases.

The company’s website also reports that the KnowItAll® in silico ADME/Tox solutions can be used to assess a potential drug’s ADME/Tox profile with over 30 predictive models and tools for model building and validation [9]. These applications in the KnowItAll® ADME/Tox Edition allow researchers to build predictive SAR models of biological properties using databases of compounds with known property values and molecular indices calculated from their chemical structures [9, 16–20]. We purchased fourteen pharmaceutical and pharmacological parameters, which were packaged in our KnowItAll® ADME/Tox Edition. These parameters are common to traditional (Q)SAR modeling, and each parameter value was extracted (when reported) from each drug profile. The fourteen pharmaceutical and pharmacological properties were Oncogenicity, Teratogenicity, Mutagenicity, Human Intestinal Absorption, Plasma Protein Binding, Water Solubility, Volume of Distribution, Elimination Half Time, Rate of Absorption, Blood Brain Barrier, NeuroToxicity, pKa, Bioavailability, and log P. Where experimental values were not available in a drug profile, the missing properties were calculated with the predictor tools [9] available within the KnowItAll® platform.
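
As a rough illustration of the bookkeeping this implies, the sketch below lists the fourteen properties and finds which ones a drug record still lacks, that is, the ones that would need a predicted value. It is only a sketch: the dictionary layout and the function name are hypothetical and do not reflect the KnowItAll® database format. The single value in the example record is the normalized Volume of Distribution that the text quotes for Micardis®.

```python
# The fourteen properties named in the text.
PROPERTIES = (
    "Oncogenicity", "Teratogenicity", "Mutagenicity",
    "Human Intestinal Absorption", "Plasma Protein Binding",
    "Water Solubility", "Volume of Distribution", "Elimination Half Time",
    "Rate of Absorption", "Blood Brain Barrier", "NeuroToxicity",
    "pKa", "Bioavailability", "log P",
)


def missing_properties(record):
    """Return the properties with no experimental value in this record.

    `record` is a hypothetical dict mapping property name -> extracted value;
    absent keys and None both count as "not reported".
    """
    return [name for name in PROPERTIES if record.get(name) is None]


if __name__ == "__main__":
    # Only one value is filled in here; it is the figure quoted in the text.
    micardis = {"Volume of Distribution": "7.14 L/kg"}
    for name in missing_properties(micardis):
        print("needs a predicted value:", name)
```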

Since the published FDA drug profiles [1–4] for these structurally diverse consumer drugs come from different pharmaceutical companies, there are major differences in how the set of properties is reported in each drug profile. Therefore, for some of the drugs, the information reported in the pharmaceutical and pharmacological sections of their FDA files had to be normalized before it could be used. An example is the pharmacokinetic parameter Volume of Distribution. For Micardis® (which is a combination of telmisartan, an orally active angiotensin II antagonist acting on the AT1 receptor subtype, and hydrochlorothiazide, a diuretic), the FDA file reports: “The Volume of Distribution for telmisartan is approximately 500 liters indicating additional tissue binding.” In our KnowItAll® database, the Volume of Distribution for Micardis® is normalized to 7.14 L/kg. This was done by dividing the reported volume by 70 kg, which is the average weight of the human subjects in the majority of the documented FDA file data. Sometimes, wide ranges of data had to be entered into KnowItAll®. For example, the Rapamune® FDA label file reports: “The mean volume of distribution (Vss/F) of sirolimus (the international nonproprietary name for Rapamune®) is 12 ± 8 L/kg.” Hence, its Volume of Distribution is documented in the KnowItAll® database as 4 ~ 20 L/kg. Sometimes data interpretations had to be made, as in the case of the pharmaceutical formulation parameter Water Solubility, where the following approximations were made: sparingly or slightly soluble => 0.001 – 0.01 mg/ml; very, freely, or highly soluble => 1000 mg/ml. For example, the Crestor® FDA label file reads: “Rosuvastatin calcium (active ingredient) is a white amorphous powder that is sparingly soluble in water and methanol, and slightly soluble in ethanol.” Hence, the KnowItAll® database Water Solubility value for Crestor® is documented as 0.001 mg/ml.

On evaluating each of the chosen 75 FDA consumer drug profiles, we redrew the three-dimensional chemical structure of each drug using the DrawIt™ drawing application available in the KnowItAll® Cheminformatics Edition; harvested information about each drug’s trade (patent) name, classification, and associated chemical data; documented the available information for all fourteen pharmaceutical and pharmacological parameters; and then, with the database building capability of Bio-Rad’s KnowItAll® platform, created an FDA Consumer Drug Database©, as shown in Figure 7. This database is now available [21] through Bio-Rad Laboratories, and it should help increase the accuracy of predictions by contributing to the variation of available models [22]. Going forward, there is the potential to use this database [21] to extract SAR patterns (example shown below) across these 75 structurally diverse and unrelated consumer drugs. Results from such a task will be presented in the near future [23].
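
The three kinds of normalization described for Volume of Distribution and Water Solubility are simple enough to write down. The sketch below reproduces the worked values from the text under stated assumptions: the function and constant names are illustrative, the 70 kg body weight and the solubility ranges are the approximations stated in the text, and the exact descriptor spellings in the lookup table are a guess at how label wording would be matched.

```python
# Average subject weight used in the text to convert litres to L/kg.
ASSUMED_BODY_WEIGHT_KG = 70.0


def volume_of_distribution_per_kg(volume_liters,
                                  body_weight_kg=ASSUMED_BODY_WEIGHT_KG):
    """Convert an absolute volume of distribution (L) to L/kg."""
    return volume_liters / body_weight_kg


def mean_sd_to_range(mean, sd):
    """Express a 'mean ± sd' value as a (low, high) range."""
    return (mean - sd, mean + sd)


# Qualitative solubility wording mapped to the approximate ranges (mg/ml)
# given in the text. The text records the lower bound for Crestor.
SOLUBILITY_RANGES_MG_PER_ML = {
    "sparingly soluble": (0.001, 0.01),
    "slightly soluble": (0.001, 0.01),
    "very soluble": (1000.0, 1000.0),
    "freely soluble": (1000.0, 1000.0),
    "highly soluble": (1000.0, 1000.0),
}


def solubility_range(descriptor):
    """Look up the approximate range for a label descriptor (case-insensitive)."""
    return SOLUBILITY_RANGES_MG_PER_ML[descriptor.strip().lower()]


if __name__ == "__main__":
    print(round(volume_of_distribution_per_kg(500), 2))  # Micardis: 500 L
    print(mean_sd_to_range(12, 8))                       # Rapamune: 12 ± 8 L/kg
    print(solubility_range("sparingly soluble"))         # Crestor wording
```

Keeping the conversions in one place makes it easier to apply the same conventions to every drug profile and to revisit them if a different reference body weight is preferred.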

To investigate structure activity relationships (SAR) of sulfur-containing functional groups (sulfide, sulfamoyl, sulfonyl, and sulfone), each sulfur-containing functional group is first drawn using the DrawIt™ tool in KnowItAll®; the SearchIt™ tool is then used to search the FDA Consumer Drug Database© for all hits for that functional group among the 75 drugs. However, the SearchIt™ results report some drugs several times (under more than one of the four sulfur-containing functional groups), not because they contain more than one functional group, but simply because a common structural motif, such as S=O, is shared between the four groups. Hence, a visual sorting needs to be done. On completion, ChemSilico predictions [9] can be made for each of the 75 drugs using the KnowItAll® predictor ProfileIt™, as represented in Figure 8, and the required biological property data can be extracted from these predictions.
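
The visual sorting step can be narrowed down by first listing which drugs were returned under more than one functional group. The sketch below assumes the hit lists have been copied out of the search results by hand into a dictionary; it does not call SearchIt™, and the drug names in the example are placeholders rather than actual search results.

```python
from collections import defaultdict


def groups_per_drug(hits_by_group):
    """Invert {functional group: [drugs]} into {drug: {functional groups}}."""
    groups = defaultdict(set)
    for group, drugs in hits_by_group.items():
        for drug in drugs:
            groups[drug].add(group)
    return dict(groups)


def drugs_needing_review(hits_by_group):
    """Drugs reported under more than one group, to be inspected visually."""
    return sorted(drug
                  for drug, groups in groups_per_drug(hits_by_group).items()
                  if len(groups) > 1)


if __name__ == "__main__":
    # Placeholder hit lists, for illustration only.
    example_hits = {
        "sulfide": ["drug A"],
        "sulfamoyl": ["drug B", "drug C"],
        "sulfonyl": ["drug B", "drug C", "drug D"],
        "sulfone": ["drug D"],
    }
    print(drugs_needing_review(example_hits))
```

Only the drugs on this short list then need to be checked by eye to decide which functional group each one actually contains.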

Conclusions

FDA Drug Information data files [1–4] contain valuable chemical, pharmaceutical, and pharmacological data. However, these data sometimes need normalization, as they are reported inconsistently or expressed in wide ranges. Working with experimental data on multiple attributes of a compound simultaneously should also encourage students on such a project to make the connections between chemistry, biology, and physiology. The KnowItAll® platform’s ease of operation and its analysis, interpretation, and reporting tools, coupled with the ability to build searchable databases seamlessly integrated within a single user interface, make it well suited for cheminformatics applications at research institutions. This system offers an ideal environment for researchers to analyze and compare experimental results across a diverse collection of drugs, including the potential to study structure activity relationship (SAR) patterns.

Acknowledgments

This research was supported by grant number 2 P20 RR016472-08 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). This IDeA Network of Biomedical Research Excellence (INBRE) grant to the state of Delaware was obtained under the leadership of the Delaware Biotechnology Institute, University of Delaware, and the authors sincerely appreciate their efforts.

References and Notes

1. Food and Drug Administration (FDA) profiles. Retrieved from http://www.fda.gov/cder/drug/default.htm
2. Drugs@FDA. Retrieved from http://www.accessdata.fda.gov/Scripts/cder/DrugsatFDA/
3. FDA Safety Information and Adverse Reporting program (MedWatch). Retrieved from http://www.fda.gov/medwatch/
4. NIH Molecular Libraries-Small Molecule Repository. Retrieved from http://mlsmr.glpg.com/MLSMR_HomePage/submitcompounds.html
5. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: A Comprehensive Resource for In Silico Drug Discovery and Exploration. Nucleic Acids Research. 2006; 34(Database Issue):D668–D672.
6. Kruhlak NL, Contrera JF, Benz RD, Matthews EJ. Progress in QSAR Toxicity Screening of Pharmaceutical Impurities and Other FDA Regulated Products. Advanced Drug Delivery Reviews. 2007; 59(1):43–55. [PubMed: 17229485]
7. Cronin MTD, Jaworska JS, Walker JD, Comber MHI, Watts CD, Worth AP. Use of QSARs in International Decision-Making Frameworks to Predict Health Effects of Chemical Substances. Environmental Health Perspectives. 2003; 111(10):1391–1401. [PubMed: 12896862]
8. Walker JD, Jaworska J, Comber MHI, Schultz TW, Dearden JC. Guidelines for Developing and Using Quantitative Structure Activity Relationships. Environmental Toxicology and Chemistry. 2003; 22(8):1653–1665. [PubMed: 12924568]
9. KnowItAll® Informatics System Desktop Solutions. Retrieved from http://www.knowitall.com/
10. D’Souza MJ. KnowItAll® - Software Reviews. Chemistry World. 2005; 2(9):70–71.
11. KnowItAll® U System. Retrieved from http://www.knowitallu.com/
12. D’Souza MJ. KnowItAll® U System - Software Reviews. Chemistry World. 2007; 4(11):70–72.
13. Anand V, Gera M, Kumar V, Karwasara P, Kataria M, Kukkar V. Comparative Evaluation of Freely Available Chemical Structure Drawing Software. Pharmaceutical Rev. 2008; 6(2).
14. Richard AM, Swirsky Gold L, Nicklaus MC. Chemical Structure Indexing of Toxicity Data on the Internet: Moving Toward a Flat World. Current Opinion in Drug Discovery & Development. 2006; 9(3):314–325. [PubMed: 16729727]
15. Baumgras JL, Rogers AE. Chemical Structures at the Desktop: Integrating Drawing Tools with online Registry Files. Journal of the American Society for Information Science. 1999; 46(8):623–631.
16. D’Souza ML, Abshear T, Banik GM, Nedwed K, Peng C. A Model Validation and Consensus Building Environment. SAR and QSAR in Environmental Research. 2006; 17(3):311–321. [PubMed: 16815770]
17. Banik GM. In Silico ADME/Tox Prediction: The More, the Merrier. Current Drug Discovery. 2004; p. 31–34.
18. Dearden J, Worth A. In Silico Prediction of Physicochemical Properties. JRC Scientific and Technical Reports, EUR 23051 EN. 2007; 1–68.
19. Bidault Y. A Flexible Approach for Optimizing In Silico ADME/Tox Characterization of Lead Candidates. Expert Opinion on Drug Metabolism & Toxicology. 2006; 2(1):157–168.
20. Dearden JC. In Silico Predictions of ADMET Properties: How Far Have We Come? Expert Opinion on Drug Metabolism & Toxicology. 2007; 3(5):635–639.
21. D’Souza MJ. FDA Consumer Drug database – 2007. HaveItAll - ADME/Tox Experimental Databases Datasheet. Bio-Rad Laboratories; 2008. Bulletin # INF-96199.
22. Wess G. How to Escape the Bottleneck of Medicinal Chemistry. Drug Discovery Today. 2002; 7(10):533–535. [PubMed: 12047844]
23. D’Souza MJ, Koyoshi F, Everett LM. Structure Activity Relationship (SAR) Patterns Observed Within a Series of Unrelated Common Consumer Drugs. 2009 International Conference on Bioinformatics, Computational Biology, Genomics, and Chemoinformatics (BCBGC-09); Orlando, FL, USA. 2009.

Biographies

Malcolm J. D'Souza, Ph.D., Professor of Chemistry, Wesley College, Dover, DE

Fumie Koyoshi was a Wesley College Biology major who completed this project during an INBRE supported Undergraduate Research Assistantship in the Directed Research Program in Chemistry at Wesley College. On graduation, she joined the University of Pennsylvania Hospital, School of Medical Technology, Philadelphia, PA.

Figure 1.

Chemical structure of Tamiflu® obtained using the DrawIt™ application in the KnowItAll® Cheminformatics Edition

Figure 2.

IUPAC name of Tamiflu® obtained using the IUPAC NameIt™ application in the KnowItAll® Cheminformatics Edition

Figure 3.

3D structure of Tamiflu® obtained using the 3D ViewIt™ application in the KnowItAll® Cheminformatics Edition

Figure 4.

Incorrect drawing of Rapamune® MWt: 930.23; Formula: C52H83NO13

Figure 5.

Correct drawing of Rapamune® MWt: 914.19; Formula: C51H79NO13

Figure 6.

Single interface applications [9] in the KnowItAll® Cheminformatics Edition

Figure 7.

FDA Consumer Drug database© created using the KnowItAll® ADME/Tox Edition

Figure 8.

ProfileIt™ prediction results for Tindamax® (a synthetic antiprotozoal and antibacterial agent).
