
Review

A Sydney proteome story☆

Keith L. Williams a,⁎, Andrew A. Gooley b, Marc R. Wilkins c, Nicolle H. Packer d

a BKG Group, 2B Mosman St, Mosman, NSW 2088, Australia
b TRAJAN Scientific & Medical, 7 Argent Place, Ringwood, Victoria 3134, Australia
c Systems Biology Initiative, School of Biotechnology & Biomolecular Sciences, UNSW, Sydney, NSW 2052, Australia
d Biomolecular Frontiers Research Centre, Faculty of Science, Macquarie University, Sydney, NSW 2109, Australia

ARTICLE INFO

Keywords:
Proteome history
Mass spectrometry
Automation
2-D gels

ABSTRACT

This is the story of the experience of a multidisciplinary group at Macquarie University in Sydney as we participated in, and impacted upon, the major currents that washed through protein science as the field of proteomics emerged and the large scale analysis of proteins became possible. This is not a history of the field. Instead, we have tried to encapsulate the stimulating personal ride we had in transiting from conventional academe, to a Major National Research Facility, to the formation of the proteomics company Proteome Systems Ltd. There were lots of blind alleys and wrong directions, but we also got some things right, and our efforts, along with those of many other groups around the world, did change the face of protein science. While the transformation is by no means yet complete, protein science is very different from the field of the 1990s. This article is part of a Special Issue entitled: 20 years of Proteomics.

© 2014 Elsevier B.V. All rights reserved.

☆ This article is part of a Special Issue entitled: 20 years of Proteomics.
⁎ Corresponding author.
E-mail addresses: [email protected] (K.L. Williams), [email protected] (A.A. Gooley), [email protected] (M.R. Wilkins), [email protected] (N.H. Packer).

http://dx.doi.org/10.1016/j.jprot.2014.04.006
1874-3919/© 2014 Elsevier B.V. All rights reserved.

Contents

1. Introduction
   1.1. Protein science in the 1980s and 1990s
   1.2. Coining the term “proteome”
   1.3. Context of slow protein science and expansive genomics
   1.4. What does proteomics encompass?
   1.5. Learning how to make progress in proteomics
   1.6. Bringing 2-D gel (array) technology together with instrumentation
   1.7. Amino acid analysis and pattern matching
2. APAF (Australian Proteome Analysis Facility)
   2.1. Sample preparation
   2.2. Protein identification
   2.3. Protein identification: the rise of mass spectrometry
   2.4. Post-translational modification analysis, glycosylation
   2.5. Engineering
   2.6. Bioinformatics
3. Going corporate
   3.1. Partnership with Shimadzu
   3.2. The Xcise
   3.3. The ChIP (Chemical Inkjet Printer)
   3.4. Post-translational modifications
   3.5. Bioinformatics: partnership with IBM
   3.6. The people
4. Where are we now?
Transparency document
References

1. Introduction

1.1. Protein science in the 1980s and 1990s

During the early 1980s, monoclonal antibodies were becoming the most powerful tool available, not only for characterising developmentally regulated proteins but also for purifying them at levels that enabled protein primary structure analysis. Marianne Krefft (Max Planck Institut für Biochemie in München, in Keith Williams' lab) was using monoclonal antibodies to study a model developmental system, and she identified a cell surface glycoprotein that would define our course into the field of protein science.

In 1984 Williams returned to Macquarie University in Sydney to establish the Biotech programme and introduce a more molecular approach to biology in an academic discipline with a strong emphasis on evolution and ecology. It was not a great time to be a protein scientist, as the genomics revolution was in full flight, with a concomitant mass migration of technologically oriented biologists into DNA-based experimental programmes. However, in the same way that DNA science was being transformed, we were exhilarated by having new tools to probe hitherto intractable problems, such as complex cell surface glycoproteins.

With monoclonal antibodies as probes for both the peptide backbone and a glycosylated domain, we were able to make considerable progress in characterising a Prespore Specific Antigen (PsA) of Dictyostelium discoideum, a key marker for one class of cells in a small multicellular structure. This enabled studies of the emergence of the prespore cells and of how the pattern of two classes of cells was formed (based on flow cytometry and tissue staining studies). At the biochemical level, affinity columns gave us sufficient material to extract chemical amounts of the protein. This led to limited protein sequence information and to matching of the gene through studies in Jeff Williams' lab at University College London, UK. As we understood more, we realised that this cell surface protein was inserted into the membrane by a glycolipid anchor. Characterisation of the protein and its glycoforms helped us reorient our programmes towards what became proteomics; ultimately, the glycolipid anchor was determined by Paul Haynes working with Mike Ferguson's group in Dundee, Scotland, and the 3-D structure of the glycoprotein was determined by Bridget Mabbutt and Paul Curmi at UNSW in Sydney. In all, nine PhD students worked on some aspect of this one protein, and this at a time when genomics researchers were growing confident about working on multiple genes. We realised that protein science needed to change, but the way forward wasn't clear.

We had established MUCAB (Macquarie University Centre for Analytical Biotechnology) with a major focus on protein and glycoprotein chemistry, and we used the standard approaches of the time to characterise proteins that had been purified through laborious techniques. N-terminal Edman sequencing was a key first step in the protein chemistry armoury. Andrew Gooley had spent time in Helmut Meyer's group in Bochum, and this upskilled the group dramatically. Meyer was studying serine and threonine phosphorylation at the time and, serendipitously, PsA was glycosylated on a threonine-rich region of the protein. A key finding by Andrew was that there was an O-glycosylated repeat spacer domain that was polymorphic in different isolates (with 3, 4 or 5 copies of a glycosylated PTVT repeat).

By the early 1990s we were running technology courses in MUCAB, and a wildly enthusiastic New Zealander, Ben Herbert, kept coming across the Tasman. Ben, who worked on wool, was highly skilled in sample preparation and gel technologies, which we recognised as critical roadblocks at that time. Ben was passionate about 2-D gels. What attracted our attention was the ability to array hundreds of proteins in two dimensions so that they were chemically pure. More exciting was Ben's push to increase sample loadings so that chemical amounts of proteins were arrayed; by electroblotting to PVDF, hundreds of highly purified, archived proteins then became available for characterisation. At that time it was hard to go the next step and actually characterise proteins from PVDF blots of 2-D gels, but it certainly changed the way we were looking at protein science. This was reminiscent of genomics, where large numbers of genes could be studied.

Concurrent with the preparative 2-D gel developments in the mid-1990s, Pappin, Cottrell, Henzel and others defined the simple and elegant idea of using mass spectrometry to identify a protein from a subset of the peptides in its tryptic digest. MALDI-TOF MS was by then sufficiently developed to measure such peptide masses accurately, and the measured masses could be screened against those computed from an in silico digest of the proteins predicted from a DNA sequence: if enough masses matched a gene's predicted peptides, the protein was identified (a sketch of the idea follows below). In fact, tryptic digestion in situ on a PVDF blot was straightforward, so protein identification against DNA databases became a real possibility. The DNA world became inextricably linked to progress in protein science, and both technologies were racing: DNA through the human genome project, and protein science through parallel approaches to protein purification and characterisation.
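The peptide mass fingerprinting idea can be made concrete with a short sketch. This is a minimal illustration under our own assumptions, not the original Pappin or Henzel algorithm: the cleavage rule, mass tolerance and match-counting score are all simplifications, and the database entries are hypothetical.

```python
# Peptide mass fingerprinting, minimally sketched.
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.01056  # one water molecule is added to each released peptide


def tryptic_digest(sequence):
    """In silico trypsin digest: cleave after K or R, but not before P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in 'KR' and (i + 1 == len(sequence) or sequence[i + 1] != 'P'):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a peptide in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint_score(observed_masses, sequence, tolerance=0.2):
    """Count how many observed MALDI peptide masses match the in silico
    digest of a candidate sequence, within the given tolerance (Da)."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tolerance for t in theoretical)
               for m in observed_masses)


# Hypothetical usage: rank candidate sequences by matched peptide count.
# database = {"gene_product_1": "MKWVTFISLLLLFSSAYSRGVFRR", ...}
# best = max(database, key=lambda k: fingerprint_score(masses, database[k]))
```

Tools of the era refined this basic scheme, for example by tolerating missed cleavages and by weighting matches so that rarer peptide masses counted for more.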

1.2. Coining the term “proteome”

A number of protein groups around the world understood how slow and laborious progress on characterising single proteins was, compared to what was going on in the adjacent genomics labs, and so began to think more industrially about protein science. Our discussions with corporate groups made us realise that protein science lived in the shadow of genomics: it was often seen as ‘old fashioned’ and probably redundant, because in the future biology would be done through gene definition by genomics and the rest would be handled by “in silico” biology, i.e. the rest was a computational problem. Our work on protein modifications, pioneered by Nicki Packer, made us very clear that there was a lot to learn between a gene sequence and a computer.

Because our work with technology companies (see below) had given us a sense of the need for marketing, we sought a defining term to shorthand what we did and to help reposition the field of protein chemistry. Marc Wilkins, a PhD student at the time, came up with the term “proteome”, which had a nice ring to it and also acknowledged the link to the “genome”. We started using the term in the lab in 1993. The first public use of the term was at the 1994 Siena meeting, whose 20th anniversary this issue celebrates. The term was not adopted immediately, and some more genomics-focused researchers used the term “functional genomics” instead. Talking about proteins as “functional genomes” didn't really work for us, so we stayed with the “proteome”. We thought it was important to emphasise the distinction between information (genomics) and function, which involves proteins (proteomics).

1.3. Context of slow protein science and expansive genomics

As indicated above, while we were slowly completing the characterisation of a single cell surface glycoprotein (other protein labs each had their own specific target protein), the world was experiencing the genomics revolution. Genes and the whole genomes of small organisms (Mycoplasma sp.) were being sequenced, and it was becoming clear that sequencing the human genome was feasible and would happen in a relatively short time frame. Suddenly the “parts manual” for biology was in sight, which was extraordinary given that we had grown up knowing less than 10% of the informational content of most organisms. By early 2001 the first draft of the human genome was completed, and in 2014 the much awaited $1000 human genome was announced by Illumina with the release of their HiSeq X Ten system [1]. So genomic sequencing is now available for personalised medicine at ridiculously low cost (the first human genome sequence cost more than $1 billion). All kinds of gene defects are now being detected, with consequences for new modes of treatment and the illumination of new drug targets. This makes it all the more urgent to find better ways to study proteins, as these are the targets and often the drugs. This was, and still is, a pull for the evolution of proteomics.

Our view about the importance of protein science was confirmed when it became apparent that the readout from a single gene could lead to a huge multiplicity of actual proteins. The adhesion protein N-CAM is a good example: the gene encodes a transmembrane protein, but in some cases the transmembrane peptide domain does not occur and is replaced by a glycolipid anchor. More complicating still, some forms of N-CAM are in fact extracellular soluble proteins, with no membrane anchor at all. Added complexity is seen in the huge diversity of glycoforms of the same protein in different tissues. And N-CAM is no special case; there is great diversity in the actual proteins produced, along with their post-translational modifications. Biology is much more complex than simple gene readout, even though understanding the regulation of gene readout is itself a huge challenge.

To make progress with this complexity, where a single glycoprotein of interest might occur in many different isoforms, each of which could be separated on a 2-D gel, we began to learn about innovations in sample preparation and instrument development. Almost all of this work happened in partnership with leading biotech instrument companies. We were pushed to do this to help fund the group, but we also found the interaction with companies exhilarating. We are grateful to have worked with many supportive corporate individuals, both at the local Australian level and at international headquarters, and, through their support, with major instrument companies who had the capacity to implement and commercialise innovations and make them widely available.

1.4. What does proteomics encompass?

Twenty years ago the focus of proteomics was largely protein discovery and identification, although, as we show here, some groups (including us) were clear that understanding function would require more, especially knowledge of protein modifications, location and even broader context, such as the nature of interacting proteins. As will become clear in this personal historical perspective, the primary initial challenge was to develop technologies and approaches that allowed the analysis and identification of multiple proteins. Sample preparation was hugely important – and still is – as proteins need to be solubilised to be moved around. Mass spectrometry became the primary analytical tool, and automation became critical for arraying and manipulating samples.

1.5. Learning how to make progress in proteomics

The problem with conventional protein science is that it involves many steps: you start with a complex mixture of proteins, and purifying the target protein comes with losses at every step that often end in frustration. Although affinity-based purification with monoclonal antibodies made the process easier, one was still confronted with a multistep process with substantial losses at each step. Hence rare proteins were essentially impossible to work with, and difficult proteins (e.g. membrane proteins) were often recalcitrant because they didn't stay soluble. You also needed a high quality monoclonal antibody to get the process started.

We, along with others around the world who had decided to stay with protein science, knew that a different approach was needed. Rather than separating and purifying individual proteins, a significant group of people interested in the area began to focus on arraying the proteins, and hence purifying large numbers of proteins by separating them spatially. There was also a major focus on solubilisation technologies to ensure that the interesting proteins could be handled. This set us on the path of having pure proteins, but in microgram to nanogram amounts.

Proteins are worked on because they are functional molecules, the workhorses of the cell. While narrowly focused experts in sample preparation, chemistry, mass spectrometry and robotics were needed to make progress, ultimately proteins are studied to understand function, be it pattern formation in development, cancer and other diseases, microbial disease, taxonomy, evolution, or even agricultural and environmental problems. Every aspect of biology has unanswered protein-based questions. Our group was deeply embedded in biology, as we understood that the best chemistry and engineering were wasted on cells or tissues that were not well prepared and in a relevant biological state. The core of the group worked on a simple eukaryote, D. discoideum, as this gave us a means of obtaining quality material in subtly changed states (e.g. as part of a developmental cycle). Being confident about the material allowed us to rigorously check our analytical techniques and instrument development. As we became confident about our techniques we took on increasingly challenging biological projects. Jenny Harry ran our first industry project, with the Chiron Corporation.

1.6. Bringing 2-D gel (array) technology together with instrumentation

A visit to Sydney by Denis Hochstrasser in 1993 crystallised a way forward that we (together with Denis' group in Geneva) proceeded to implement. Denis' team were world leaders in organising and manipulating the large amounts of protein information becoming available from both genomics and protein work. They were also at the forefront of 2-D gel technology and were beginning to popularise the adoption of IPG (immobilised pH gradient) gels to separate significant quantities of proteins in the first dimension. We had an instrumentation group and partnerships with major US companies on protein sequencing and amino acid analysis. With the arrival of Ben Herbert from New Zealand, our sample preparation and 2-D gel skills expanded dramatically. Ben also spent a significant sabbatical in Pier Giorgio Righetti's laboratory in Verona, one of the leading electrophoresis research laboratories at the time. The sample preparation area also involved a close partnership with Bio-Rad. And we appreciated, through close interaction with John Redmond's carbohydrate chemistry group, the importance of post-translational modifications. At the time, apart from the characterisation of N-glycans, little was known of other forms of protein glycosylation, or indeed of the many other modifications that have since become clear.

1.7. Amino acid analysis and pattern matching

One of our earliest technology development projects was the development of a new amino acid analyser in partnership with the Australian technology company GBC Scientific Equipment. This project involved the development of Fmoc chemistry for amino acid analysis with John Redmond's group in chemistry. Paul Haynes did his PhD on this project, and GBC commercialised the instrument (the AminoMate). In this project we first learned about the requirements of scientific instrument development. The proteomics application that arose from this project was an attempt to identify proteins from 2-D gel spots based on their amino acid composition; a sketch of the idea is given below. Marc Wilkins did this work in partnership with the Geneva group of Denis Hochstrasser. The technique produced some interesting results and could clearly be scaled for high sample throughput, but it became clear that, with the concomitant improvement in the resolution of mass spectrometers, peptide mass fingerprinting and MS/MS would become the techniques of choice. Nevertheless, amino acid analysis remains a significant quality control measure and quantitative technique, which various industry groups still contract from APAF 18 years later!
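The composition-matching idea can be illustrated as follows. This is a hedged sketch of the general principle, not the actual tool developed with the Geneva group: the distance measure and the database entries are our own assumptions.

```python
# Identifying a 2-D gel spot from its amino acid composition, minimally sketched.
from collections import Counter


def composition(sequence):
    """Fractional amino acid composition of a protein sequence."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def distance(measured, predicted):
    """Sum of absolute differences over amino acids seen in either profile."""
    return sum(abs(measured.get(aa, 0.0) - predicted.get(aa, 0.0))
               for aa in set(measured) | set(predicted))


def rank_by_composition(measured, database):
    """Rank database sequences by closeness to a measured composition."""
    return sorted(database,
                  key=lambda name: distance(measured, composition(database[name])))


# Hypothetical usage:
# db = {"protein_A": "MKWVTFISLL...", "protein_B": "MDGEDVQALV..."}
# top_candidates = rank_by_composition(measured_composition, db)[:10]
```

Composition alone rarely gives a unique answer, which is why it was usually combined with other attributes such as pI, mass or a short sequence tag.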

2. APAF (Australian Proteome Analysis Facility)

In 1995 there was an election looming. The Australian telescope group was lobbying the Government for support in building a major telescope in Chile as part of an international consortium. Our interpretation of Prime Minister Paul Keating's view was along the lines of: “How can I send tens of millions to Chile in an election year? Surely there are big science projects in Australia that deserve Government support? That is the $64 million question!” So the Major National Research Facilities (MNRF) programme was set up.

We knew that proteomics was a “big” science field, notwithstanding that there was little consciousness of this internationally in 1995; indeed, the term was still barely used. Nevertheless, we developed a bid in the MNRF programme for a national facility, APAF (the Australian Proteome Analysis Facility). It was an interesting introduction to high-level government politics, with many twists and turns, but we established our case and became one of 7 MNRF programmes, the others being AGRF (Australian Genome Research Facility), ASRP (Australian Synchrotron Research Program), ATNF (Australian Telescope National Facility), ANSIR (Australian National Seismic Imaging Research), APFRF (Australian Plasma Fusion Research Facility) and ARA (Airborne Research Australia).

So soon after coining the term “proteome”, we had the opportunity to formalise it through the MNRF programme, which paralleled a similar proteomics initiative in Denmark. Interestingly, each national funding programme had a different structural emphasis: the Australian programme funded a building and equipment, whereas the Danish funding was largely for personnel. We joked that perhaps the two programmes might be consolidated somewhere between Scandinavia and Australia!

APAF was established at the end of 1995 with a $7 million MNRF grant (including $1 million of funding from Macquarie University). We had a great team, but our staff were precariously funded. APAF not only brought together a talented team of scientists; a range of industry partners also became an essential part of the team. These partnerships included: an amino acid analysis platform with GBC Scientific Equipment; the development of an Edman sequencing platform for identifying sites of glycosylation with Beckman Instruments (now Beckman Coulter, a Danaher company); N- and C-terminal sequencing with Hewlett-Packard (later Agilent Technologies); electrophoresis hardware and consumables with Bio-Rad Laboratories; a gel-spot cutting robot with Advanced Rapid Robotic Manufacturing (ARRM), ultimately commercialised by Bio-Rad; and various mass spectrometry applications with PerSeptive Biosystems and Micromass (now a division of Waters Corporation).

By acquiring a small protein service company and relocating its demountable laboratories to Macquarie University, APAF commenced operations in early 1996. At the same time, plans were completed for APAF to occupy the 4th floor of a new Life Sciences/Chemistry building then under construction. The layout of APAF represented the proteomics workflow. Entry to the facility was through a sample preparation area, followed by the array technology instrumentation (primarily a 2-D gel facility developed with Bio-Rad). There followed a robotics area for sample handling: spot excision (the ARRM/Bio-Rad platform), liquid handling (Packard MultiPROBE) and PVDF treatment (developed on a modified ink-jet platform with MicroFab Technologies). This led to the analytical instruments area, which housed traditional instruments such as protein sequencers (Beckman, Hewlett-Packard, Applied Biosystems), amino acid analysis (GBC) and a burgeoning mass spectrometry facility (PerSeptive and Micromass). Glycosylation analysis was performed off PVDF blots using Dionex chromatography and GC–MS. Finally, it was envisaged (but not implemented initially) that bioinformatics would provide the core integration needed to synthesise the many different treatments to which an individual sample was subjected.

Giving proteomics a physical reality helped many people understand the scale of the changes that protein science was undergoing. It also gave individual specialist scientists a sense of how their activities fitted into the whole process. It had other consequences, such as putting pressure on key individuals: a common joke was that when Andrew Gooley went on holiday, most of our instrumentation ended up unhappy!

Eighteen years later, APAF remains a world class facility, and probably the only remaining government funded large scale proteomics facility, with an integrated state of the art capacity servicing national and international needs. It has enjoyed continuous Australian Government funding, has 25 staff, and is headed by Mark Molloy, who lived through the early years as a PhD student. Mark's experience at Pfizer has helped consolidate APAF's strong commercial (~40%) as well as academic (~60%) support. Today APAF is a state of the art facility with mass spectrometry at its core. Major projects involve biosimilars, cytokines, large scale protein identification and quantitation, and post-translational modifications (phosphorylation, glycosylation, N-terminal cleavage, etc.). APAF is popular with several major pharmaceutical companies because of the quality of its services and its stability.

2.1. Sample preparation

The major challenge at the start of a proteomics workflow was (and still is) keeping the proteins, which are enormously chemically diverse, soluble, so that they not only can be studied but are also representative of the in vivo state (with co- and post-translational modifications). A major difference between genomics and proteomics is that the chemistry of DNA is simple while the chemistry of proteins is a vast canvas. Proteins vary hugely in size, solubility and modification. It is not just that DNA has 4 bases and proteins have 20 amino acids. DNA does undergo some modification (e.g. methylation), but the scale of protein modification is very complex and remains poorly understood. While phosphorylation has captured the interest of many researchers, most cell surface and secreted proteins are glycosylated. Many have lipid attachments. Many amino acids can be modified in the protein chain (Asn, Ser, Thr, His, Lys, Pro, Tyr etc.). This presents daunting technical challenges, especially for “interesting” proteins (e.g. those on the cell surface). Many proteins are poorly soluble under traditional aqueous protein extraction conditions.

Proteomics meant that we needed to address this, and Ben Herbert led a team whose goal was to array as complete a proteome as possible, and to do this preparatively, so that chemically significant amounts of material were available for protein and carbohydrate identification. While our focus was on arraying the proteome using 2-D gels, the solubilisation techniques were more broadly relevant. Our progress here often came in response to the needs of the biologists in the group. Over the years we have worked with many difficult materials (e.g. sputum, which is viscous and challenging to solubilise, and wool, which of course is insoluble) and intractable proteins (cell surface, very large). We also needed enough material to attempt protein characterisation, so our work produced chemically tractable amounts of proteins, rather than just images on an analytical gel.

2.2. Protein identification

In 1987 Paul Matsudaira at MIT demonstrated N-terminal sequencing of proteins directly from PVDF membranes. At that time it was not possible to load preparative amounts of protein on 2-D gels, although various laboratories were working on sample loading. Ben Herbert's work on protein solubilisation, and on keeping proteins soluble throughout the whole process of 2-D gel separation, meant that we could start to see how much protein we could recover in individual 2-D gel spots. By 1995 Brad Walsh, working with human cells, and Angelika Görg, working with wheat proteins, had achieved sufficient material for Andrew Gooley to obtain N-terminal sequence information by Edman sequencing off PVDF membranes.

Although Edman sequencing was a laborious process, it is interesting to see how we pushed this technology in search of higher throughput protein identification. By limiting the N-terminal sequence to a fixed 5 amino acids, a protein ‘sequence tag’ could be generated, and the PVDF spot could then be subjected to amino acid analysis. The tag and composition data were combined in the TagIdent tool, developed in collaboration with the Geneva ExPASy team, which proved to be an accurate means of protein identification (a sketch of the matching logic follows below). The Edman sequencer could be “detuned” to read just the 5 N-terminal amino acids, which decreased analysis time, and to further increase throughput we built a rotating cartridge carousel allowing the analysis of multiple samples overnight. This is very primitive by today's standards, but it shows the small steps that were made and, above all, how we were thinking about getting beyond a one-protein-at-a-time philosophy. N-terminal Edman sequencing is still offered by APAF for the characterisation of therapeutic proteins, even though there is no longer a commercially available instrument.
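The following sketch illustrates the kind of two-stage matching just described: filter candidates by the 5-residue N-terminal tag, then rank the survivors by how well their predicted amino acid composition fits the measured one. It is an illustration under our own assumptions, not the ExPASy TagIdent implementation, and the database entries are hypothetical; the real tool also drew on pI and mass estimates from the gel.

```python
# Sequence tag plus composition identification, in the spirit of TagIdent.
from collections import Counter


def composition(sequence):
    """Fractional amino acid composition of a sequence."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def composition_distance(a, b):
    """Sum of absolute differences over amino acids seen in either profile."""
    return sum(abs(a.get(aa, 0.0) - b.get(aa, 0.0)) for aa in set(a) | set(b))


def identify(tag, measured_composition, database):
    """Keep entries whose N-terminus matches the 5-residue tag, then rank
    the survivors by composition distance (best match first)."""
    candidates = [name for name, seq in database.items() if seq.startswith(tag)]
    return sorted(candidates,
                  key=lambda name: composition_distance(
                      measured_composition, composition(database[name])))


# Hypothetical usage:
# db = {"candidate_1": "AKVLSTTE...", "candidate_2": "MDGEDVQA..."}
# ranked = identify("AKVLS", measured_composition, db)
```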

2.3. Protein identification: the rise of mass spectrometry

We were a conventional protein lab in 1994, and mass spectrometry, while promising, could not yet solve the problems we were studying. However, John Redmond's group in Chemistry at Macquarie University had worked with GC–MS on characterising the monosaccharides of bacterial lipopolysaccharide (LPS). At this point in our evolution, Nicki Packer, who became a core member of our proteomics team, pointed out that we needed to study not only the carbohydrates but also the glycopeptides, using MALDI or LC–MS, as the molecules were too large for GC–MS. From this time on, we were conscious that change was happening, and we constantly looked out for ways in which our technology developments could be integrated with mass spectrometry. With APAF funding we could afford to start integrating both MALDI-TOF MS and LC–MS/MS into our proteomics workflow.

2.4. Post-translational modification analysis, glycosylation

In the early days of proteomics, little attention was paid to the existence of protein modifications. The technology for modification analysis was poor, and many in the field had a genomics focus (i.e. they simply described the protein based on peptide mass fingerprinting and its predicted intact mass, rather than the protein mass observed by techniques such as 1- or 2-D PAGE). The protein that was the focus of the biology in our group forced us to pay attention to glycosylation, as we had evidence that the PsA protein was modified with O-GlcNAc. This finding happened in an environment with deep knowledge of bacterial LPS, so John Redmond and Nicki Packer were not scared of a simple glycan modification, even if it was on a protein. Running an Edman sequencing facility in a carbohydrate lab had its benefits: consulting with Nicki Packer and John Redmond, Andrew Gooley realised that if the extraction solvent was made more polar, the glyco-amino acids should be recovered and identified by solid phase Edman sequencing. Using this technique, we were able to identify the sites of O-glycosylation on the PsA glycoprotein, and that helped us to understand a tightly O-glycosylated PTVT repeat domain near the membrane attachment of the protein.

Studying O-linked glycans also became an interest of our group when we were drawn into studies of heavily O-glycosylated mucins. There are no enzymes that cleave these glycans from the protein chain, and Edman sequencing remains a good way to carry out site-specific glycosylation analysis of densely O-glycosylated regions in mucins. With Edman protein sequencers no longer commercially available, this is a rare example of progress leading to a loss of capacity (in this case, the capacity to study glycoforms on proteins). MS is powerful, but site characterisation on heavily glycosylated proteins such as mucins is still a major challenge.

2.5. Engineering

As we learned to array proteins preparatively, and could do this for thousands of proteins, we then wished to analyse them. The problems of scale, however, became immediately apparent. Humans are not good at simple, repetitive tasks, and it became necessary to develop spot cutting technology to excise spots from gels and place them in multiwell plates for processing. We were very focused on standardisation, and after much discussion we decided to use the 96 well plate format for all of our developments (including 2-D gel size). This was controversial at the time, as others chose to move to much larger gel formats, which in our view were too technically difficult and not amenable to automation. Instead we built a bigger array by assembling many gels, each of which interrogated only a section of the proteome.

2-D gel technology invariably used first charge and then size as the fractionation modes. We (and others) realised that by doing a first, liquid phase separation according to isoelectric point, and then developing narrow range IPG strips, we could magnify the first dimension separation dramatically. The development of the narrow range IPG strips, and the building of an integrated electrophoresis power supply, became major projects with Bio-Rad, and these later became widely available when they were commercialised.

The spot cutting project at Macquarie University started with a small Adelaide-based engineering firm (Advanced Rapid Robotic Manufacturing). This was a creative partnership which quickly led to the development of a 96-well format spot cutter, commercialised by Bio-Rad (Fig. 1).

Fig. 1 – The Bio-Rad spot cutter, developed in APAF in partnership with ARRM.

2.6. Bioinformatics

A major outcome of even these primitive proteomics endeavours was the massive amount of data produced, by a whole library of instruments and in many different forms. We were strongly influenced by, and collaborated with, Denis Hochstrasser and his team, who had been working for a number of years on bringing some order to the information collected on proteins. This work was central to organising the data on particular proteins once collected. We chose to play at the sample and instrumentation end of this informatics story, as there seemed to be a gap there, and it complemented the Hochstrasser group's work. We immediately realised that all the instruments used in laboratories collected data, which tended to become imprisoned on the instrument. It wasn't until we set up Proteome Systems (below) that we were able to seriously address the issue of collecting and integrating information as a sample was processed.

3. Going corporate

The lack of staff funding was a major issue for the team in Sydney, as everyone was on very soft money except Williams. This became critical as we became heavily involved in new technology development, an area in demand internationally, especially within the companies specialising in it. The capital base of the university system could not support the ambitious programmes we were engaged in, so, 3 years after setting up APAF, the core technologists left to establish Proteome Systems Ltd. nearby in Sydney. The alternative at the time was for the team to disperse internationally, and all of us found the excitement of proteomics worth the risk of getting involved in a startup. It was tough on both APAF, which lost its key founders, and the new company, of course for different reasons. APAF had become prominent internationally, and so support was found to provide more stable staffing opportunities to build on the initial foundations.

For the founders of Proteome Systems in 1999, the challenge was a standard one for a biotech startup: who would pay the bills, and what was the business going to do? By this time a significant part of the group was involved in technological developments spanning sample preparation, gel technology, robotics and various mass spectrometry applications. Our exit from APAF and the setup of Proteome Systems was initially funded primarily by Dow AgroSciences, who were intrigued by the possibilities that proteomics offered. We suggested that several joint programmes might be established to test areas of protein science in which it had previously been too hard to make progress. One such programme involved very large proteins, where genomics had not given helpful answers. This project was coordinated by Jenny Harry, but the whole company was highly focused on its progress, as this paid the bills. As with all of our programmes, major progress was made by working very closely with our partner's staff — in this case Dow AgroSciences'. By innovating with gel technologies and coupling this with mass spectrometry, we characterised proteins far bigger than a conventional protein chemistry approach would have allowed. This provided interesting insights into a complicated area of agricultural technology development.

Subsequently we decided to tackle medical problems that had proved intractable, on the basis that if the proteomics technology was truly enabling, we should be able to break new ground. We focused on diseases of the lung (with its intractable mucoid material) and the brain (with its high lipid content). In particular, projects were developed to seek early biomarker predictors of exacerbations in cystic fibrosis; this project was conducted with the Cystic Fibrosis Foundation of America, as exacerbation is a major crisis event for cystic fibrosis sufferers. We also developed a substantial programme on a simple and rapid field-based diagnostic test for tuberculosis (TB), based on the identification of TB proteins in sputum or blood — a definitive indicator of the active state of the disease. Both of these projects led to substantial technology innovation in sample preparation and in high resolution screening of sputum and blood for key biomarkers. They also led to an engineering programme to design a simple point-of-care protein-based diagnostic test (DiagnostIQ, Fig. 2). The plan was for such a test to integrate with our discovery programmes when it was time to commercialise proteomic discoveries. Sample preparation and good informatics were at the core of this handheld device, which produced a quantitative outcome within 3 min from effectively unprocessed samples, with no need for a laboratory.

Fig. 2 – The DiagnostIQ rapid flow-through ELISA, developed at Proteome Systems. It was commercialised to provide quantitative in-field diagnostic analysis of amylase levels in wheat as a quality measure, and was at the centre of developments to provide a point-of-care diagnostic for tuberculosis.


At the sharp end of characterisation, we became involved in the analysis of drugs in sport, and Nicki Packer had an intriguing project with Nestlé on human and cow milk glycoforms. It was an exhilarating time, being able to start to get answers to long standing problems in biology.

Exiting the university led to the loss of some critical partnerships, notably an extremely productive one with Bio-Rad. However, we developed new partnerships with Sigma-Aldrich and Millipore to get Proteome Systems' sample preparation technologies to market, and Ben Herbert drove our innovations in new electrophoresis equipment, the ElectrophoretIQ series. This led to the commercialisation of a multi-compartment electrolyser (MCE) for preparative fractionation of complex protein mixtures into discrete pI fractions, which were then run on 96 well plate format 2-D gels with narrow range IPG strips. Gel production was industrialised by integrating the Boston-based Genomic Solutions team into our Proteome Systems facility in Woburn, MA. Unfortunately, although we had advanced plans for an automated 2-D gel to simplify the workflow, this was never reduced to practice.

The philosophy behind all of our instruments was to avoid information being trapped on the device that collected it; instead, data monitoring and control were to be connected to our informatics platform BioinformatIQ, ultimately forming part of an integrated ProteomIQ platform (which included technology from various manufacturers; see Fig. 3).

3.1. Partnership with Shimadzu

With the experience of developing a spot cutter with ARRM that was commercialised by Bio-Rad, we embarked on a major robotics infrastructure development programme in partnership with the AusIndustry R&D START programme and the Japanese instrument manufacturer Shimadzu. We also strengthened our own engineering capacity by acquiring a Melbourne-based design and engineering group (Niche Innovation). This injected high-level professional design and engineering capacity through such outstanding staff as Bill Hunter, Chau Nguyen, Matt Durack and Gerard Rummery. Our reason for doing this was that, having real capacity in protein arraying through the 2-D gel technology and increasing capacity in mass spectrometry, robotic sample processing had become a major weakness. We prototyped two instruments (the Xcise and the ChIP).

Fig. 3 – An early version of Proteome Systems' ProteomIQ integrated proteomics platform.

3.2. The Xcise

The Xcise was a fully integrated system with embedded image analysis for converting the protein array into X,Y coordinates, a gel punch connected to a syringe to facilitate aspiration and dispensing of the gel spot, and an 8-probe liquid handling capacity. The device also enabled the spotting of processed samples onto a 96 well format metal target for analysis by MALDI (Bruker Scout 384 and Shimadzu Axima). This was the first fully integrated gel spot cutter and MALDI target spotter on the market, and its launch made quite an impact for a very different reason — the Proteome Systems deep blue colour palette came out bright purple on the beta-unit destined to be showcased in Shimadzu's hospitality suite at ASMS in 2001. It certainly brightened up Shimadzu's conservative, classical instrument display. The Xcise was modestly successful, although we suffered the consequences of embedding inflexible firmware and of limiting the workflow to MALDI target spotting. The Xcise is still used today in APAF for gel spot cutting.



3.3. The ChIP (Chemical Inkjet Printer)

The ChIP was our attempt to enable solid phase processing of arrayed proteins. We worked with Texas-based MicroFab Technologies, a leading piezoelectric inkjet printing group. The key idea was Andrew Gooley's; as with many great ideas originated by Australians, it came about on a trans-Pacific flight. He reasoned that when we stained gels, we visualised them by printing an image of the stained gel on paper. His idea was to develop a chemical printer in which picolitre quantities of reagents could be printed onto proteins arrayed on membranes (e.g. 2-D gels blotted to PVDF). The beauty of this technology was that it allowed the archiving of samples, as only a small section of a 1 mm spot was needed for any analysis (e.g. peptide mass fingerprinting, antibody analysis, and analysis of glycoforms). The device printed in a 96 well plate format, and the membrane could be transferred to a MALDI instrument with a 96 well format for precise analysis. This instrument won a prestigious R&D 100 Award as one of the best 100 technologies of 2004 (Fig. 4). History has shown that LC–MS/MS approaches have become the method of choice for proteomics, so the ChIP did not fulfil its promise, although it has now become one of the main methods for printing on tissue sections, undertaking the complex chemistry required for MALDI imaging. As with many cutting edge technologies, the best application is sometimes different from what the device was designed to do.

We benefited hugely from working with Shimadzu, learning a great deal about building sophisticated analytical equipment with an extraordinarily talented team of Japanese engineers and a visionary general manager in Hattori-san, who went on to become President and Chairman of Shimadzu Corporation. We manufactured the commercial Xcise instrument in Sydney, while Shimadzu manufactured the commercialised ChIP1000 in Kyoto. Partnering with Shimadzu also led to the first installation of a significant MALDI-TOF MS infrastructure with the Axima CFR series of mass spectrometers — 7 instruments were installed in our Sydney facility. High throughput gel spot processing and spotting onto MALDI targets (with the Xcise robotic platform) became the standard workflow in Proteome Systems, while we developed a more targeted glycopeptide characterisation strategy with Thermo through the installation of the popular LCQ ion trap.

3.4. Post-translational modifications

Studying modifications adds a layer of complexity and difficulty to an already complicated field, which no doubt explains a lack of enthusiasm for studying post-translational modifications. Sometimes a small technological advance can have a significant impact, and this happened with the discovery of a simple way to desalt sugars. In porous graphitised carbon, Nicki Packer and John Redmond found the desalting equivalent of reversed-phase chromatography in proteomics: an LC separation matrix for rapid and effective separation of structural glycan isomers. Niclas Karlsson developed this on the LCQ platform, and it has become one of the few approaches to detailed structural glycan analysis. In fact, to date, isobaric carbohydrate structures can still only be separated convincingly by this chromatography.

Of course we live in a world where raw experimental data is of limited value unless it is connected to an informatics system, so we also got involved in this process. First, BOLD was developed as a database of protein O-glycans, and it was then updated to GlycoSuite by Catherine Cooper with Elisabeth Gasteiger in Geneva. GlycoSuite was the gold standard of curated structural glycan databases, and it has now morphed into a glyco-knowledge database called UniCarbKB. Recently Nicki Packer and Frédérique Lisacek (SIB, Geneva) have renewed activities in this area. This is still a tough area, but there are signs that mass spectrometer manufacturers are becoming interested in informatics approaches to identifying glycan modifications on proteins.

Fig. 4 – The Chemical Inkjet Printer (ChIP); left, benchtop instrument; right, detail of the print module.

3.5. Bioinformatics: partnership with IBM

When Proteome Systems started, there were only a few tools for protein identification from mass spectrometry data, and these were mostly designed to support small scale experiments. There was excellent software for image analysis of 2-D gels, and in many respects this was the most sophisticated software for proteomics data collection at the time. Marc Wilkins' group envisaged and commenced building, in connection with a new suite of instruments, a means by which thousands of proteins could be purified from complex samples using 2-D gels, processed robotically from the gels or from membranes blotted from those gels, and analysed on a large scale using mass spectrometry. The challenge then became how to manage this process and the underlying data so as to generate simple answers to the questions “which proteins are present in my sample?” and “in what amount?” (the kind of provenance chain this requires is sketched below).

As we began to develop software solutions to these questions, IBM began to show an interest in the life sciences. They saw the scale of what we were planning and saw a good fit with their products, expertise and market reach. So in 2001 we established a global strategic alliance with IBM (being one of the first life sciences companies to do so). With IBM's support, Marc Wilkins and Warren McDonald built a web-based software platform that integrated all the instrumentation (from gel separation devices, image analysis, robotic spot picking and processing through to mass spectrometry and protein identification) to generate an information management system that supported the gel-based proteomics workflow. Named BioinformatIQ, to support our ProteomIQ technology platform, it ran on large computers. Indeed, a version of BioinformatIQ was installed on one of IBM's flagship computer offerings at the time, the p690, at the centre of operations of the Charles River Proteomic Services facility in Worcester, MA, which also ran Proteome Systems' ProteomIQ platform.

As with any laboratory information management system, our software was big and complicated, and best used to manage projects where the same (complicated) workflow was to be repeated many times. The challenge was to make it sufficiently flexible that any and all proteomic experiments could also be supported. The challenge of information management in proteomics remains, and in many respects has grown with the dramatic increase in the use of LC–MS/MS and LC/LC–MS/MS. Clear paradigms for protein identification are now established, but the best manner in which to represent the outcomes of very large proteomic experiments remains elusive.
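The scale of the data-management problem is easiest to see in the provenance chain such a platform has to maintain. The sketch below is a hypothetical illustration of that general idea, not the actual BioinformatIQ schema; every class and field name is our own invention.

```python
# A hypothetical provenance chain for a gel-based proteomics workflow.
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    description: str


@dataclass
class GelSpot:
    spot_id: str
    sample: Sample
    x_mm: float  # spot coordinates reported by image analysis
    y_mm: float


@dataclass
class Spectrum:
    spectrum_id: str
    spot: GelSpot
    peaks: list  # observed peptide masses in daltons


@dataclass
class Identification:
    protein: str
    score: float
    spectrum: Spectrum


def provenance(ident):
    """Trace an identification back to the sample it came from."""
    spot = ident.spectrum.spot
    return (f"{ident.protein} <- spectrum {ident.spectrum.spectrum_id}"
            f" <- spot {spot.spot_id} at ({spot.x_mm}, {spot.y_mm})"
            f" <- sample {spot.sample.sample_id}")
```

Keeping such back-references intact across every instrument is what lets a system answer “which proteins are present, and in what amount?” for thousands of spots at once.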

3.6. The people

One thing that proteomics forced was collaboration. This was core to our team, and it ran from basic biology all the way through protein and sugar chemistry, with a large dose of engineering thrown in at a later stage. We were good at intellectualising a problem because we had many different viewpoints, and our team could also put together a breadboard solution that major instrument groups were then able to commercialise. With 150 staff at Proteome Systems in 2005, a large pool of scientists and engineers was trained in proteomics. When conditions became challenging shortly thereafter, those people dispersed; they now populate many university groups and high-tech companies around the world.

4. Where are we now?

This issue reflects on 20 years of proteomics research. Are we there yet? The short answer is that, despite huge advances in the technology and scale of proteomics, we are still just off the starting line. There is no need for despair about this: when one reflects on the genomics revolution, it becomes clear how long it takes, even after the key technical discoveries have been made, to achieve concrete outcomes. Biology is complex. It is true that vanishingly small amounts of protein can now be studied, almost at the single cell level, and that the identification of all the proteins of a simple proteome is almost a reality, but the complete characterisation of proteomes is still a distant goal. It must be remembered that each cell has its own proteome, and there are many different cell types in the body. So discovering first which proteins are found where (not only at the tissue level but also in their subcellular distribution) is a major task, and this must then be extended to which form of each protein exists where, and to who is talking to whom… Such complexity is still daunting, but so was sequencing a genome 20 years ago.

While the focus of our group was on parallel processing and on developing the tools to make that possible, others were at the same time transforming the structural analysis of proteins through new technologies and microcrystallisation for 3-D structure determination. There are many aspects of proteomics, and there has been huge progress over the past 20 years. The sensitivity of MS instrumentation is now exquisite (a few fmol of protein can be confidently studied), but sample preparation remains perhaps the major challenge (“garbage in, garbage out”, or, put more sensitively, how to prepare the sample while preserving the in vivo nature of the target analytes). And of course we are still very far from teasing out the many different forms of the same gene product that are present in our samples.

Basic proteomics and genomics research has underpinned a revolution in medicine, with protein-based drugs under development comprising approximately 70% of the pipeline [2]. Small molecule drug development is also dramatically enabled by structural knowledge of protein targets. While remarkable discoveries have been made concerning the role of RNA in gene regulation and protein production, most drugs still target proteins. This has always been the rationale for the field of proteomics.

Have no fear, proteomics researcher: there is still a big mountain to climb. Looking back at where we've come from can induce a sense of exhaustion (what a lot of work), wonder (how far we've come) or scale (we are but grains of sand in the big picture!). Looking at a recent paper in Nature Communications [3], one realises that we are getting beyond the mundane and into the sublime. Researchers are now making music with proteins and, as a result, working

Transparency document

The Transparency document associated with this article can be found in the online version.


References

[1] Hayden EC. Is the $1,000 genome for real? Nat News 2014. http://dx.doi.org/10.1038/nature.2014.14530.
[2] Waltz E. It's official: biologics are pharma's darlings. Nat Biotechnol 2014;32:117.
[3] Acbas G, Niessen KA, Snell H, Markelz AG. Optical measurements of long-range protein vibrations. Nat Commun 2014;5:3076.
