Vol. 2, No. 3 2005

Drug Discovery Today: Technologies

Editors-in-Chief: Kelvin Lam – Pfizer, Inc., USA; Henk Timmerman – Vrije Universiteit, The Netherlands

Knowledge management

The Semantic Web and Knowledge Grids

Carole Goble*, Robert Stevens, Sean Bechhofer

School of Computer Science, The University of Manchester, Oxford Road, Manchester, UK M13 9PL

Section Editor: Manuel Peitsch – Novartis, Basel, Switzerland

The Semantic Web and the Knowledge Grid are recently proposed technological solutions to distributed knowledge management. Early experimental applications from the Life Science community indicate that the approaches have promise and suggest that this community could be an appropriate nursery for grounding, developing and hardening the current, rather immature, machinery needed to deliver on the technological visions, which thus far have been dominated by technological curiosity rather than application-led practicality and relevance. Further necessary developments in theory, infrastructure, tools, and content management should and could be steered opportunistically by the needs and applications of Life Science.

Introduction: what is the Semantic Web?

The Web has served life scientists well. Many data sets and tools are published and accessed using web protocols and web browsers. Sharing data repositories and tool libraries is straightforward. Widespread collaboration is possible by publishing a simple web page. Thus, the Web enables individual scientists to answer simple ‘low volume’ questions over large but relatively simple data sets without needing a profound knowledge of computer science. However, standard web technology is now straining to meet the needs of biologists. A Web-based distributed information infrastructure is a place where a person performs complex tasks and computers present and fetch web pages. People have to manually search the Web for content or just know where to go; interpret and process page content by reading it and interacting with web pages; infer crosslinks between information in web pages or other sites; integrate content from multiple resources and consolidate the heterogeneous information while preserving the understanding of its context. Well-known specialist applications like SRS and Entrez are designed to overcome these difficulties but they are not general solutions, and the interpretation of the information is still buried in the application code. Service providers still publish resources assuming that a person will be ‘point-clicking’ at a browser and reading the text; without a programmatic interface, automatic processing is difficult and fragile.

The Semantic Web could enable automated processing and effective reuse of information on the Web that would support intelligent searching and improved interlinking [1,2]. The Semantic Web is an extension of the Web in which information is given well-defined meaning by being associated with metadata described in common terms [3]. Ontologies represent the vocabulary terms, and how they inter-relate, for the concepts shared by a community. This requires that all kinds of web content be marked up with metadata that encodes its meaning in a way that is machine-interpretable, so that it can be processed by agents, search engines and applications to automate the content discovery and integration tasks that people currently do manually. The meaning of information is embedded in the Web, not the applications, so it can be unambiguously and reliably shared across many applications.

*Corresponding author: C. Goble ([email protected]) URL: http://www.cs.man.ac.uk/carole/ 1740-6749/$ © 2005 Elsevier Ltd. All rights reserved.

DOI: 10.1016/j.ddtec.2005.08.005


Thus, the Web could evolve from documents published for people to read to knowledge published for computer applications to process. We can think of it as a way of representing data on the Web or as a globally linked knowledge base. In practice there will be many high-quality Semantic Webs linked together through regular Web links and search mechanisms or low-quality metadata. Semantic Webs are costly and difficult to produce and maintain, so only community ‘weblets’ or corporate intranets that gain from the significant added value will bother with the high quality metadata needed.

Semantic Webs for Life Science communities and organisations involved in drug discovery are obvious candidates. Why? They are knowledge driven, fragmented, and have valuable knowledge assets whose contents need to be combined and used by many applications. The content is diverse, being structured (databases, electronic lab books), semistructured (papers, Excel™ sheets) and unstructured (PowerPoint™ documents, Web blogs, images). Its scale necessitates that the processing be done automatically. There are many suppliers and consumers of knowledge and a loose coupling between suppliers and consumers – information is used in unanticipated ways by knowledge workers unknown to those who deposited it. People naturally form communities of practice, and there is a culture of sharing and knowledge curation. The explosion of data coupled with the need to innovate means the problem is mainstream and urgent.

Knowledge Grids

The Semantic Web aims to facilitate machine support for distributed knowledge management through the provision of a semantic infrastructure. The Grid aims to support secure and flexible coordinated resource sharing through the provision of a middleware platform for advanced distributed computing [4]. Grid machinery aims to allow the collection of all kinds of resources – computing, storage, data sets, digital libraries, scientific instruments, people, among others – to easily form Virtual Organisations that cross organisational boundaries and work together to solve a problem. Computational-file based Grids are the most mature, harnessing available compute power to support compute-intensive analysis applications. Whereas Computational Grids present the illusion of a single virtual computer to an application, Data Grids present a single virtual data store that is really distributed and multilocated. Portals provide a way for application developers to submit their compute job or query. On top of these ‘plumbing’ Grids, Application Grids aim to present the illusion that applications work together when they do not.

Grid Computing is supported by the Global Grid Forum (http://www.ggf.org) and the Enterprise Grid Alliance (http://www.gridalliance.org/) through a series of standards, core services and applications. Standardisation routes include OASIS and IETF. The Grid has a strong industry influence and major applications pull chiefly from global, large-scale scientific collaborations. The chief emphasis to date has been on practical deployment of data and building compute grids. Over the past four years the Open Grid Services Architecture (OGSA) Grid initiative has revised Grid computing to adopt the Service Oriented Architecture paradigm using Web Services, a distributed computing platform from the Web community that is heavily supported by industry. This revision is still in a state of flux, having had one false start in the OGSI specification (now the WSRF specification). OGSA services are still at the prototype stage. However, OGSA does bring the Web and Grid communities together on a common platform.

Knowledge Grids are much less established as a concept, and the term is still controversial [5]. Cannataro and Talia [6] define the term as an environment for the design and execution of geographically distributed high-performance knowledge discovery applications, which is pretty much the purpose of all Grids. Knowledge Grids use knowledge-based methodologies, including knowledge engineering tools, discovery and analysis techniques such as data mining and machine learning, intelligent software agents, mathematical modelling, simulation or planning. Goble and De Roure [7] distinguish a Knowledge Grid from a Semantic Grid by suggesting that Knowledge Grids apply to knowledge associated with Grid domain applications and resources and Semantic Grids to knowledge associated with Grid middleware and entities. Either way, Knowledge Grids are intended to provide intelligent guidance for decision makers, through an agreed knowledge representation and the provision of homogeneous access over heterogeneous sources of information, metadata and ontologies. We interpret a Knowledge Grid as a knowledge base to support distributed knowledge discovery applications.

From this standpoint, a Semantic Web is a Knowledge Grid with the emphasis on distributed knowledge representation and integration, and a Knowledge Grid is a platform for distributed knowledge processing over a Semantic Web. In combination they seek to support automation by exposing the implicit and tacit knowledge used by humans and providing an appropriate programming infrastructure.

Key Semantic Web machinery

The Semantic Web has the backing of the World Wide Web Consortium (W3C) standards organisation (World Wide Web Consortium Semantic Web Activity, http://www.w3.org/2001/sw/) and has spawned extensive research, development and standards activity supported by industry and academia. A range of technologies and machinery is needed to deliver the vision, and these are currently in various states of maturity, from established standards and commercial products to things improbable. There are four key ingredients necessary to deliver a Semantic Web: (a) a universal model of metadata, knowledge and assertions over the current Web, which forms a web of knowledge layered over conventional Web (or Grid) resources; (b) semantic content; (c) tools to build and maintain the model and capture the content; and (d) knowledge-driven applications that use the semantic infrastructure. To date the computing community has concentrated on (a) and (c). We can legitimately say the Semantic Web is technology pushed rather than application pulled. Standardisation activities are focused on (a); commercial activities by specialist and major vendors (notably Oracle, IBM and HP) are focused on (c) and (d). The Semantic Web model has five main components (Fig. 1), usually presented as a layered stack. XML is a carrying syntax for all the languages of the model and for all the languages of the Web.

Identification

The Universal Resource Identifier (URI) ensures a unique identity for each Semantic Web entity. The Life Science Identifier (LSID) [8] protocol is a domain-specific protocol that introduces a standardised way of naming data resources, backed by an OMG standard. LSIDs have been successfully adopted [9,10] (BioDash http://www.w3.org/2005/04/swls/BioDash/Demo/), but teething problems include poor support for versions and some confusion over what data can change without changing identity. Semantic Web purists claim that the LSID is unnecessary although it seems not to have developed applications for life scientists.
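To make the naming scheme concrete, the following minimal sketch splits an identifier laid out as urn:lsid:authority:namespace:object with an optional trailing revision, which is the general shape of an LSID. The function name parse_lsid, the LSID tuple type and the example identifier are assumptions made for illustration only; they are not taken from any LSID toolkit.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # who issued the name, usually a DNS name
    namespace: str           # a collection chosen by the authority
    object_id: str           # the item inside that collection
    revision: Optional[str]  # optional version marker


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, used only to exercise the parser.
example = parse_lsid("urn:lsid:example.org:proteins:P49841:2")
```

The optional revision field is where versioning would be expressed, which is the area the text flags as a teething problem.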

Figure 1. The layered architecture of the Semantic Web.

Metadata annotation

Resources are annotated by metadata that asserts facts about and between them and their content in a common, flexible data model. The Resource Description Framework (RDF; http://www.w3.org/RDF/) describes objects and relations between them in a self-describing data model. RDF is a key to integrated and federated data storage, making this the ‘integration’ layer of the Semantic Web infrastructure. We have webs of metadata (Fig. 1) as well as webs of pages. The RDF model is based on ‘subject-predicate-object’ statements (‘triples’), assembled into graphs that assert facts about and between resources (Fig. 2(1 and 2)). Facts are commonly held separately from the resource, in triple stores (http://simile.mit.edu/reports/stores/). Oracle Corp is incorporating RDF support into their regular relational databases (http://www.oracle.com/technology/tech/semantic_technologies/). RDF supports statements about statements, crucial for provenance (‘this fact is asserted by EMBL-EBI’), timestamping (‘this fact is asserted on 31-05-2005’) and meta-statements (‘this fact is untested’).

Although the RDF standard is established, the technologies are still relatively immature. There is a confusing range of RDF query languages, although the SPARQL RDF query language is being standardised (http://www.w3.org/TR/rdf-sparql-query/). Performance over medium-large data sets is disappointing. There is poor support for grouping statements (‘named graphs’) [11]. Representing RDF within an HTML page is an issue (http://www.cs.vu.nl/guus/public/carroll-rdfhtml.pdf). The RDF syntax is designed for machines and should never be revealed to humans; the same is true for XML, of course, but RDF is particularly distracting.
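The subject-predicate-object model and the idea of statements about statements can be illustrated with a small sketch in plain Python. It is a simplification under stated assumptions: a nested tuple stands in for RDF's own reification vocabulary, and the example.org namespace and the property names (associatedWith, assertedBy, assertedOn, status) are invented for the illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: object   # a URI string, or another Triple for meta-statements
    predicate: str
    obj: object


EX = "http://example.org/"  # hypothetical namespace for this sketch

# Two plain facts about a resource (the GSK3beta example of Fig. 2).
fact = Triple(EX + "GSK3beta", EX + "associatedWith", EX + "DiabetesType2")
kind = Triple(EX + "GSK3beta", "rdf:type", EX + "ProteinKinase")

# Statements about a statement: the subject is itself a triple.
meta = [
    Triple(fact, EX + "assertedBy", "EMBL-EBI"),    # provenance
    Triple(fact, EX + "assertedOn", "31-05-2005"),  # timestamp
    Triple(fact, EX + "status", "untested"),        # meta-statement
]

graph = {fact, kind, *meta}


def about(graph, subject):
    """Return every (predicate, object) pair asserted about `subject`."""
    return {(t.predicate, t.obj) for t in graph if t.subject == subject}


provenance = about(graph, fact)
```

Here about(graph, fact) collects the provenance, timestamp and status of one assertion, while about(graph, EX + "GSK3beta") collects the facts about the protein itself.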


Knowledge

A shared interpretation of what the metadata means requires ontologies for describing controlled vocabularies and background knowledge. Ontologies – consensual, shared models in an executable form of concepts, relations and their constraints tied to a scaffold of taxonomies [12] – are common in Life Sciences (http://www.sofg.org and http://obo.sourceforge.net/). We use ontology terms to assert facts about resources, like web pages, databases, services, a protein, a compound, a gene, a person, among others (Fig. 2(3 and 4)). Topic Maps (http://www.topicmaps.org/) have been proposed as ‘indexing’ mechanisms for Web content. However, the two standardised languages for exchanging and representing ontologies are RDF Schema (RDFS) [13], for simple taxonomies, and OWL [14], a family of Web Ontology Languages extending RDFS (see [15] in this issue). The Semantic Web Rule Language (SWRL) [16] adds rules to OWL knowledge bases, encoding constraints and supporting further deduction of knowledge. This is undergoing standardisation with only prototype implementations and no commercial support. Specialist vendors such as Ontoprise (http://www.ontoprise.com) and Cerebra (http://cerebra.com) provide OWL tool suites.

Figure 2. The Semantic Bus. Suppliers (resources) expose their contents as RDF, allowing Consumers (applications) to make use of the data. Ontologies provide the glue that ties the data together and the knowledge that helps us to interpret the data. (1) RDF provides a common data model, exposing the contents of resources, instruments and data sources in a uniform fashion – for example glycogen synthase kinase 3 beta (GSK3beta) is a protein kinase, and GSK3beta is associated with diabetes type 2. (2) Graph merging based on identities from URIs provides links at the syntactic level to link resources about GSK3beta and diabetes. (3) Ontologies provide the consensus and shared knowledge that ties data together semantically – that a protein kinase phosphorylates proteins, which is an enzyme catalysis, which is a kind of control interaction or that chemical entities reference compounds and drug targets reference genes and their products. (4) Ontologies and rules provide an inference apparatus for generating implied knowledge, to infer the role that GSK3beta might play in the Insulin Signal Transduction pathway. (5) The Semantic Bus is the collective knowledge base that could be physically or virtually gathered together and that the knowledge tools and inference engines process.


Inference and reasoning

Metadata, ontologies and rules combine to make a distributed knowledge base. The added value is to use the OWL and SWRL computational reasoning capabilities to infer new unasserted facts on or between resources and classify resources based on their descriptions. The design of OWL has been strongly influenced by its reasoning procedures. However, concerns include the benefits, performance and scalability of reasoning mechanisms and whether they can cope with the incomplete or inconsistent knowledge found in reality. One practice is to use reasoning ‘offline’, fix the results and use these in real-time applications [17]. A few reasoning engines are available commercially (Cerebra Server™, RacerPro™ http://www.racer-systems.com/) to be embedded in applications and middleware.
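The idea of deriving unasserted facts, and of doing so ‘offline’ before an application uses the results, can be sketched with two taxonomy rules applied until nothing new appears. This is a deliberately tiny stand-in and far weaker than OWL or SWRL reasoning; the class names and the toy taxonomy are assumptions made for the example.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Forward-chain two taxonomy rules until no new fact is derived.

    Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
    Rule 2: x type A,       A subClassOf B  =>  x type B
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p not in (TYPE, SUBCLASS):
                continue
            for s2, p2, o2 in facts:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        if new <= facts:      # fixpoint reached
            return facts
        facts |= new


# Toy taxonomy, invented for illustration.
asserted = {
    ("GSK3beta", TYPE, "ProteinKinase"),
    ("ProteinKinase", SUBCLASS, "Kinase"),
    ("Kinase", SUBCLASS, "Enzyme"),
}

closure = infer(asserted)       # computed once, 'offline'
inferred = closure - asserted   # facts nobody stated explicitly
```

An application can then answer questions such as "is GSK3beta an enzyme?" with a simple set lookup against closure, without invoking a reasoner at query time.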

Trust, proof, policy and context

The separation of assertions from the resource, and the ability to assert facts about facts, is intended to support a tangled accumulative web of third party assertions over resources by those other than the resource creators or owners. The Semantic Web is envisioned as a ‘democracy’ where everyone can annotate a resource or an annotation. The Semantic Web is changeable, inconsistent and will contain many dubious, outdated or conflicting statements. Metadata on the metadata ties assertions (inferred or stated) to a context such as their origin or creation date. This is important for intellectual property, provenance tracing, accountability and security as well as untangling contradictions or weighting support for an assertion. Thus, in addition to webs of metadata we gain webs of meta-metadata (Fig. 1). These are the least well-developed layers of the technology stack. Very little infrastructure exists, the issues are poorly understood and research is patchy and immature.

Semantic Web content

Without content, the Semantic Web has no semantics. Acquiring the ontologies, rules and metadata is the prime bottleneck to adoption. The Life Sciences have made great efforts to develop standard domain ontologies for annotating data sets (e.g. The Gene Ontology) and indexing documents (e.g. UMLS). Ontologies have also been developed for describing services [17]. However, a Semantic Web for Life Sciences also needs ontologies for publications, resources, experiments, hypotheses, analyses and workflows, among others, as well as people and organisations. Concerns arise over the complexity of developing and maintaining ontologies. Knowledge engineering tools like Protégé-OWL [18] still tend to be oriented to knowledge engineers not subject specialists, and there is a paucity of shared best practices. Machine learning and language processing to automatically generate ontologies [19] is in its infancy.


Annotation of resources with metadata splits into (a) high quality manual annotation using tools like OntoMat Annotizer [20] and (b) automatic (or semiautomatic) annotation using text mining [21], language processing techniques [22] or other forms of processing for other types of data. Text mining, UMLS, MeSH and a culture of manual curation make these approaches feasible. An alternative is to generate metadata in RDF directly from the content generating services, like content management systems, data mining tools or by service providers [23]. Semantic content generated by service providers, publishers and instrument suppliers needs to become universal and ubiquitous to have an impact. UniProt (http://www.isb-sib.ch/ejain/rdf/) exports results in RDF, as do some publishers like Nature [24], but these are exceptions.

Fig. 1 shows that there will be webs of knowledge in addition to webs of metadata and web pages. Multiple, possibly overlapping, ontologies need aligning, merging and linking, as do multiple metadata (even on the same resource) that can potentially conflict, multiple rules and multiple assertions of trust and context. Just how this will work in practice, the mechanisms needed to cope and the tools and services to assist applications are still research subjects. The current reality is either simple versions of the Semantic Web, or these problems being passed on to the applications.

Semantic Web applications

Knowledge-driven applications consume the semantic infrastructure. Applications have been mainly confined to corporate intranets for managing ring-fenced knowledge assets – semantic islands in the sea of the Web. Current applications divide into those that emphasise a Web of Semantics and those that just emphasise the processing of knowledge. We take the knowledge-oriented applications first.

Ontology development

The inferencing capabilities of OWL have been shown to aid the building of large and sophisticated ontologies such as The Gene Ontology [25] and BioPAX (http://www.biopax.org/). The concept classification is derived and inconsistencies are automatically identified. The standardisation of a language significantly helps the exchange of ontologies. However, there are some problems with the expressivity of OWL for Life Science, Chemical and Clinical ontologies.

Advanced metadata modelling

The self-describing nature of RDF and OWL models enables flexible descriptions for data collections, suiting those whose schemas might evolve and change or whose data types are hard to fix, like knowledge bases of scientific hypotheses, provenance records of in silico experiments [26,27] or publication collections.

Now consider the applications where metadata is associated with distributed resources.


Intelligent searching and document discovery

Semantically enabled search engines, like TAP [28], could support ‘concept-based’ searches over content annotations, exploiting the structure of the ontology by automatically narrowing or broadening search terms. The ontology hierarchy classifies the contents of metadata collections. Given an ontology definition for an oncogene that includes references to organism, function, locus and associated diseases, a web page about function that is annotated with that definition can be inferred to link to a paper on locus, despite the paper not mentioning it [1]. Semantically powered catalogues and service registries can find data sets and tools based on ontological descriptions. Semantic Portals describe the properties, relationships and classifications of the various information items using an ontology, for example, PlanetOnto (http://kmi.open.ac.uk/projects/kmi-planet/) [29] and the SEmantic portAL [30].
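A concept-based search that narrows or broadens a query term by walking the ontology hierarchy might look like the sketch below. The three-term hierarchy, the page names and the function names are all hypothetical, and the sketch does not describe how TAP or any other engine is actually implemented.

```python
# Hypothetical toy hierarchy: child term -> parent term.
PARENT = {
    "protein kinase": "kinase",
    "kinase": "enzyme",
    "phosphatase": "enzyme",
}

# Hypothetical annotations: resource -> ontology terms attached to it.
ANNOTATIONS = {
    "page-1": {"protein kinase"},
    "page-2": {"phosphatase"},
    "page-3": {"enzyme"},
}


def ancestors(term):
    """Terms above `term`, nearest first."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def narrower(term):
    """All terms that sit anywhere below `term` in the hierarchy."""
    return {t for t in PARENT if term in ancestors(t)}


def search(term, broaden=False):
    """Match resources annotated with `term` or any narrower term.

    With broaden=True the immediate parent term is also accepted.
    """
    terms = {term} | narrower(term)
    if broaden:
        terms |= set(ancestors(term)[:1])
    return sorted(r for r, tags in ANNOTATIONS.items() if tags & terms)


kinase_hits = search("kinase")
broad_hits = search("kinase", broaden=True)
```

A query for "kinase" finds the page annotated only with "protein kinase" even though the literal query term never appears in its annotations, which is the behaviour a plain keyword match would miss.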

Social and knowledge networking

A key knowledge commodity is who knows what and making connections with similar minded communities. The popular Friend of a Friend (FOAF) project (http://www.foaf-project.org/) creates a Web of machine-readable homepages describing people, the links between them and the things they create and do. In addition to providing simple directory services, FOAF can be used to provide assistance to new entrants in a community and locate people with similar interests. It is a simple RDF vocabulary; the power is the links. The big plus comes when the data is aggregated and can then be explored and crosslinked. SciFOAF builds a FOAF community mined from the analysis of authors and publications over PubMed (http://www.urbigene.com/foaf/).

Advanced content syndication and publication

Syndication is the process by which a web site is able to share information, such as articles, with other web sites. Scientific publishers like the Institute of Physics (http://syndication.iop.org/) and the International Union of Crystallography (http://journals.iucr.org/services/rss.html) publish RSS feeds in RDF using standard RSS, Dublin Core and PRISM (http://www.prismstandard.org/) RDF vocabularies. UniProt has an experimental publication of results in RDF. As with FOAF, the fun starts when data is aggregated and crosslinked.

Integration, aggregation and crosslinking

Results in RDF graphs yielded from multiple heterogeneous sources, say, UniProt, InterPro and PubMed, are manipulated and combined using shared ontological concepts and corresponding URIs as bridges (Figs 2 and 3). The semantic definitions serve as ‘smart glue’ specifying how objects relate to each other, managing these inferred or explicitly proposed relations outside any one repository and aggregating genomic, proteomic, cellular, physiological, and chemical data, even when these data are kept in different databases with different schemas, and bridging the structured and unstructured free texts. Shared ontologies and the common RDF data model help overcome the syntactic and semantic heterogeneity of the data, forming a global integration ‘Semantic bus’ that knowledge discovery applications build upon. For example, YeastHub [31] converts the outputs of a variety of databases into RDF and combines them in a warehouse built over a native RDF data store. BioDASH (http://www.w3.org/2005/04/swls/BioDash/Demo/) (Fig. 3) is an experimental Drug Development Dashboard that uses RDF and OWL to associate disease, compounds, drug progression stages, molecular biology and pathway knowledge for a team of users. Correspondences are not necessarily obvious to detect, requiring specific rules. BioDASH uses ‘Semantic Lenses’ to filter and aggregate RDF particularly in bio-orientation – for example, a pathway lens.

Application interoperability

It is not just web pages that can be annotated; middleware like the Web and Grid services can also be annotated to make them accessible to applications [32]. Techniques for planning, composing, editing, analysing and reasoning about these descriptions are being investigated and deployed to resolve semantic interoperability between services within scalable, open environments. Semantic annotations on applications and data sets enable them to be intelligently discovered, compared and combined into workflows [33,34] to run complex applications, integrate data and mine knowledge (DiscoveryNet http://www.discovery-on-the.net/) [9].
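The merging described under ‘Integration, aggregation and crosslinking’ rests on a simple property: when two sources use the same identifier for the same thing, taking the union of their triples links them automatically. The sketch below shows this with three invented sources; the identifiers, property names and source contents are assumptions for illustration and do not reproduce any real export format.

```python
from collections import deque

# Hypothetical shared identifiers used by more than one source.
GSK3B = "urn:lsid:example.org:proteins:GSK3beta"
T2D = "urn:lsid:example.org:diseases:DiabetesType2"

# Three independent sources, each exposing its content as triples.
protein_source = {(GSK3B, "rdf:type", "ex:ProteinKinase")}
disease_source = {(GSK3B, "ex:associatedWith", T2D)}
literature_source = {(T2D, "ex:discussedIn", "ex:paper-42")}


def merge(*graphs):
    """Union of triple sets; equal identifiers become shared nodes."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def reachable(graph, start):
    """Every node reachable from `start` by following subject -> object."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for subject, _, obj in graph:
            if subject == node and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen - {start}


merged = merge(protein_source, disease_source, literature_source)
linked = reachable(merged, GSK3B)
```

No single source connects the protein to the paper, but in the merged graph the paper is reachable from the protein through the shared disease identifier. If the sources had used different names for the same disease, the link would be lost, which is why shared identifiers and ontologies matter for integration.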


Knowledge Grids and Knowledge mining

The purpose of all this technology is to create a knowledge substrate for knowledge mining. Examples of Grid-based Problem Solving Environments that integrate ontology and workflow approaches for bioinformatics applications on the Grid include Proteus [35], myGrid [9] and DiscoveryNet (http://www.discovery-on-the.net/).

Concluding remarks The Semantic Web, and its less well-defined cousin the Knowledge Grid, promises a paradigm shift for the delivery of knowledge just as the Web changed how we publish and find information. Certainly there is a tremendous amount of activity in the development of standards, machinery and tools for the semantic infrastructure and research into its theoretical underpinnings. The technologies being developed have already demonstrated their value for ontology development and advanced metadata modelling for general knowledge-based applications. The first W3C Semantic Web for Life Science Workshop in 2004 attracted over 100 participants with representation from all the major pharmaceutical and drug discovery players. Within a closed enterprise pharmaceutical environment, such as KSpace in Novartis, or

Vol. 2, No. 3 2005

Drug Discovery Today: Technologies | Knowledge management

Figure 3. The BioDASH Drug Development Dashboard prototype. BioDASH associates disease, compounds, drug progression stages, molecular biology, and pathway knowledge for a team of users. It is based on the concept of a therapeutic topic model. Information sources are exposed as RDF – common vocabularies then tie these resources together. Applications can then make use of knowledge services and inference to support user tasks. The figure shows an investigation into the therapeutic value of glycogen synthase kinase 3 beta (GSK3beta), a regulatory enzyme associated with multiple diseases, including diabetes type 2. The BioDASH demonstration is built on Haystack [10], which is an extensible Semantic Web Browser developed by MIT. Data is aggregated and browsed from OMIM and Uniprot by using LSIDs resolved into RDF and automatically linked with a BioPAX pathway, also in RDF, using common LSIDs. The pathway view combines the protein and interacting compounds views. The BioPAX model is represented in OWL. Current work uses reasoning to consistency check nutrient-related analyses for essential compounds, missing essential compounds, and reactions, both fired and unfired.

bounded communities, like the Alzheimer’s Forum we are beginning to see Semantic Webs as they were envisioned. For a Semantic Web to flourish, the communities it would serve needs to be knowledge driven, globally distributed and able and willing to create and maintain the semantic content. It is this latter point that is crucial. As Gardner [15] suggests in this issue, the drug discovery related communities are embracing ontologies. The Life Science world has the desire for collaboration, a culture of annotation, and act as service providers that might be persuaded to generate RDF or at least annotated XML. A Semantic Web is expensive to set up and maintain and, thus, is only probable to work for communities where the added value is worthwhile and an ‘open source data’ philosophy prevails. However, beware of the hype. Most of the technologies are yet to be established and many key components – trust, security, context – are missing. Reusing data in unexpected and uncontrolled ways might not be desirable. There are sceptics [36] who argue that XML is enough, and for tightly

defined and static problems they could be right. The best examples of Semantic Web applications have been when the emphasis has been on the content and connectivity rather than elaborate reasoning. Simple ontologies and simple integration approaches, like FOAF and RSS, have had an impact. Elaborate reasoning has chiefly been used in knowledge applications or for building large ontologies, rather than integrating metadata, despite its centre stage position in the W3C standards activity. Although Semantic Web advocates like to distance themselves from A.I. the Semantic Web is still sold with far-fetched A.I. scenarios [3]. In reality it is the Web aspect – the integration layer and a means to simply link heterogeneous semistructured data – that will win out rather than the rich semantics, although the research activity, and hype, has primarily been the other way about [37]. The important lesson is that a Semantic Web needs Semantics, however simple and however scruffy. Up to now there has been more emphasis on technology push and not enough on application and service provider www.drugdiscoverytoday.com

231

Drug Discovery Today: Technologies | Knowledge management

References

Links  The Semantic Web portal http://www.semanticweb.org/  Semantic Web for Life Sciences http://www.biopathways.org/ semweb/  W3C Semantic Web Activity http://www.w3.org/2001/sw/  Semantic Grid portal http://www.semanticgrid.org  Global Grid Forum http://www.ggf.org  W3C Semantic Web for Life Science Workshop http:// www.w3.org/2004/07/swls-ws.html  Standards and Ontologies for Functional Genomics http:// www.sofg.org  Open Biological Ontologies portal http://obo.sourceforge.net/  RDF Resource Description Framework http://www.w3.org/RDF/  RDF Schema http://www.w3.org/TR/rdf-schema  OWL Web Ontology Language http://www.w3.org/2004/OWL/  LSID Life Science Identifier http://lsid.sourceforge.net/  Friend of a Friend, FOAF http://www.foaf-project.org/

pull. While waiting for the ‘killer application’, the machinery for the platform has been built in a partial vacuum. Some of this infrastructure is cumbersome and hard for application developers to engage with – the layering of OWL over RDF lacks elegance and the number of layers and components can make debugging tricky. Service providers need migration tools and application developers need client libraries and simple APIs, and both need appropriate tooling. In an ideal world, users should never see the Semantic Web at all. The Web needed to be incubated by a highly motivated community with an application and a generous spirit – in this case, High Energy Physics. The Semantic Web needs the nursery of another community that would benefit hugely from its capabilities. It is not e-commerce. It should be drug discovery.

Related articles
Hendler, J. (2003) Science and the Semantic Web. Science 299, 520–521
Berners-Lee, T. et al. (2001) The Semantic Web. Scientific American, May
Neumann, E. (2005) A Life Science Semantic Web: are we there yet? Sci. STKE 2005, 10 May

Outstanding issues
• Making it semantic: content acquisition and content management.
• Making it Web: tackling distribution, heterogeneity and inconsistency of metadata, ontologies and rules.
• Making it understandable: lowering the barriers of entry for developers and simplifying the complexity of the components and language layers.
• Making it safe: security, trust, proof and policy management.
• Making it accessible: providing the tooling and machinery for applications and content providers.
• Making it work: focus on performance and scalability, not just language expressivity.


1 Hendler, J. (2003) Science and the Semantic Web. Science 299, 520–521
2 Neumann, E. (2005) A Life Science Semantic Web: are we there yet? Sci. STKE 283, pe22
3 Berners-Lee, T. et al. (2001) The Semantic Web. Scientific American, May
4 Foster, I. and Kesselman, C. (1999) The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann
5 Goble, C.A. et al. (2000) Enhancing services and applications with knowledge and semantics. In The Grid: Blueprint for a New Computing Infrastructure (2nd edn) (Foster, I. and Kesselman, C., eds), Morgan Kaufmann
6 Cannataro, M. and Talia, D. (2004) Semantic and Knowledge Grids: building the next-generation grid. IEEE Intell. Syst. (ISSI-0095-1203) – Special Issue on E-Science 19, 56–63
7 Goble, C. and De Roure, D. (2004) The Semantic Grid: building bridges and busting myths. 16th European Conference on Artificial Intelligence (ECAI 2004), including Prestigious Applications of Intelligent Systems (PAIS), 22–27 August 2004, Valencia, Spain, IOS Press (ISBN 1-58603-452-9)
8 Clark, T. et al. (2004) Globally distributed object identification for biological knowledgebases. Brief. Bioinform. 5.1, 59–70 (http://lsid.sourceforge.net/)
9 Stevens, R. et al. (2004) Exploring Williams–Beuren Syndrome using myGrid. Bioinformatics 20, i303–i310. Proceedings of 12th Intelligent Systems in Molecular Biology (ISMB), 31st July–4th August, Glasgow, UK (myGrid http://www.mygrid.org.uk)
10 Quan, D. et al. (2003) Haystack: a platform for authoring end user semantic web applications. Proceedings of the 2nd Intl. Semantic Web Conference ISWC 2003, Sanibel, FL
11 Carroll, J.J. et al. (2004) Named graphs, provenance and trust. Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan. pp. 613–622
12 Stevens, R.D. (2000) Ontology-based knowledge representation for bioinformatics. Brief. Bioinform. 1, 398–414
13 RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation, 10th February 2004 (http://www.w3.org/TR/rdf-schema)
14 Horrocks, I. et al. (2003) From SHIQ and RDF to OWL: the making of a web ontology language. J. Web Semant. Sci. Serv. Agents World Wide Web 1, 7–26
15 Gardner, S.P. Ontologies. Drug Discov. Today Technol. (in press)
16 Horrocks, I. et al. (2005) OWL rules: a proposal and prototype implementation. J. Web Semant. 3, 23–40
17 Lord, P. et al. (2004) Applying semantic web services to bioinformatics: experiences gained, lessons learnt. Proceedings of the 3rd International Semantic Web Conference – ISWC 2004, 9–11 November, Hiroshima, Japan (Springer LNCS 3298)
18 Knublauch, H. et al. (2004) The Protégé-OWL Plugin: an open development environment for semantic web applications. Third International Semantic Web Conference – ISWC 2004, November, Hiroshima, Japan (http://protege.stanford.edu/plugins/owl/)
19 Sabou, M. et al. (2005) Learning domain ontologies for web service descriptions: an experiment in bioinformatics. Proceedings of the 14th International Conference on World Wide Web (WWW2005), May, Japan
20 Handschuh, S. and Staab, S. (2002) Authoring and annotation of web pages in CREAM. Proceedings of the 11th International World Wide Web Conference (WWW2002), 7–11 May, Honolulu, Hawaii, USA
21 Ghanem, M. et al. (2005) A grid infrastructure for mixed bioinformatics data and text mining. Proceedings of the 3rd ACS/IEEE International Conference on Computer Systems and Applications, January, Cairo, Egypt, IEEE Computer Society
22 Ciravegna, F. et al. (2004) Learning to harvest information for the Semantic Web. Proceedings of the 1st European Semantic Web Symposium, 10–12 May, Heraklion, Greece
23 Volz, R. et al. (2003) Unveiling the hidden bride: deep annotation for mapping and migrating legacy data to the Semantic Web. J. Web Semant. Sci. Serv. Agents World Wide Web 1, 187–206
24 Lund, B. (2004) Science Publishing and the Semantic Web: RSS, RDF and Urchin, Position Paper W3C Semantic Web for Life Science Workshop (http://www.w3.org/2004/07/swls-ws.html)


25 Wroe, C.J. et al. (2003) A methodology to migrate the gene ontology to a description logic environment using DAML+OIL. Proceedings of the 8th Pacific Symposium on Biocomputing (PSB), January, Hawaii
26 Zhao, J. et al. (2004) Using Semantic Web technologies for representing e-science provenance. Proceedings of the 3rd International Semantic Web Conference ISWC2004, 9–11 November, Hiroshima, Japan (Springer LNCS 3298)
27 Frey, J. et al. (2004) Less is More: Lightweight Ontologies and User Interfaces for Smart Labs. The UK e-Science All Hands Meeting 2004
28 Guha, R. and McCool, R. (2003) TAP: a semantic web test-bed. J. Web Semant. Sci. Serv. Agents World Wide Web 1, 81–87
29 Domingue, J. and Motta, E. (2005) Planet-Onto: from news publishing to integrated knowledge management support. IEEE Expert Syst. Special Issue on Knowledge Management and Distribution over the Internet 15, 26–32
30 Maedche, A. et al. (2001) Semantic portal – the SEAL approach. In Creating the Semantic Web (Fensel, D. et al. eds), MIT Press
31 Cheung, K.H. et al. (2005) YeastHub: a semantic web use case for integrating data in the life sciences domain. Bioinformatics 21 (Suppl. 1), i85–i96


32 Nixon, L. and Paslaru, E. State of the Art of Current Semantic Web Services Initiatives, Deliverable 2.4.ID1 Knowledge Web Network of Excellence (http://knowledgeweb.semanticweb.org)
33 Lord, P. et al. (2005) Feta: a light-weight architecture for user oriented semantic service discovery. In European Semantic Web Conference, Lecture Notes in Computer Science (Vol. 3532) (Gómez-Pérez, A. and Euzenat, J., eds), Springer-Verlag
34 Potter, S. and Aitken, S. (2005) A semantic service environment: a case study in bioinformatics. In European Semantic Web Conference, Lecture Notes in Computer Science (Vol. 3532) (Gómez-Pérez, A. and Euzenat, J., eds), Springer-Verlag
35 Cannataro, M. et al. (2004) Proteus, a grid based problem solving environment for bioinformatics: architecture and experiments. IEEE Comput. Intell. Bull. 3, 7–18 (ISSN 1727-5997)
36 Nee, E. (2005) Web Future is Not Semantic, Or Overly Orderly. CIO Insight, 5 May (http://www.cioinsight.com/article2/0,1540,1815338,00.asp)
37 McBride, B. (2004) Four Steps Towards the Widespread Adoption of the Semantic Web. Proceedings of the 1st International Semantic Web Conference ISWC2002, 9–12 June, Sardinia, Italy (Springer LNCS 2342)
