REVIEW

The Use of ‘Omics Technology to Rationally Improve Industrial Mammalian Cell Line Performance

Amanda M. Lewis, Nicholas R. Abu-Absi, Michael C. Borys, Zheng Jian Li

Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb Company, 35 South Street, Hopkinton 01748, Massachusetts; e-mail: [email protected]

ABSTRACT: Biologics represent an increasingly important class of therapeutics, with 7 of the 10 top-selling drugs from 2013 being in this class. Furthermore, health authority approval of biologics in the immuno-oncology space is expected to transform treatment of patients with debilitating and deadly diseases. The growing importance of biologics in the healthcare field has also resulted in the recent approvals of several biosimilars. These recent developments, combined with pressure to provide treatments at lower costs to payers, are resulting in an increasing need for the industry to quickly and efficiently develop high-yielding, robust processes for the manufacture of biologics with the ability to control quality attributes within narrow distributions. Achieving this level of manufacturing efficiency and the ability to design processes capable of regulating growth, death and other cellular pathways through manipulation of media, feeding strategies, and other process parameters will undoubtedly be facilitated through systems biology tools generated in academic and public research communities. Here we discuss the intersection of systems biology, ‘Omics technologies, and mammalian bioprocess sciences. Specifically, we address how these methods, in conjunction with traditional monitoring techniques, represent a unique opportunity to better characterize and understand host cell culture state, and to shift from an empirical to a rational approach to process development and optimization of bioreactor cultivation processes. We summarize the following six key areas: (i) research applied to parental, nonrecombinant cell lines; (ii) systems level datasets generated with recombinant cell lines; (iii) datasets linking phenotypic traits to relevant biomarkers; (iv) data depositories and bioinformatics tools; (v) in silico model development; and (vi) examples where these approaches have been used to rationally improve cellular processes. We critically assess relevant and state-of-the-art research being conducted in academic, government and industrial laboratories. Furthermore, we apply our expertise in bioprocessing to define a potential model for integration of these systems biology approaches into biologics development.

Biotechnol. Bioeng. 2016;113: 26–38. © 2015 Wiley Periodicals, Inc.

Conflicts of interest: None. Correspondence to: A. M. Lewis Received 9 December 2014; Revision received 25 March 2015; Accepted 1 June 2015 Accepted manuscript online 9 June 2015; Article first published online 27 October 2015 in Wiley Online Library (http://onlinelibrary.wiley.com/doi/10.1002/bit.25673/abstract). DOI 10.1002/bit.25673


Biotechnology and Bioengineering, Vol. 113, No. 1, January, 2016

KEYWORDS: bioprocess; CHO; proteomics; transcriptomics; metabolomics; rational optimization

Introduction

Mammalian cells, and in particular Chinese hamster ovary (CHO) cells (Kim et al., 2012), are the primary hosts used for the production of biopharmaceutical proteins including antibodies, hormones and cytokines (Datta et al., 2013; Dietmair et al., 2012b; Kildegaard et al., 2013; Wuest et al., 2012). The first CHO recombinant protein, tissue plasminogen activator (tPA), was approved in 1987 (Jayapal, 2007; Kaufman et al., 1985). Since then, over 140 recombinant products have been brought to market, and volumetric productivities have risen from 0.05 g/L up to 10 g/L of protein (Datta et al., 2013; Hacker, 2008; Jayapal, 2007). These improvements can be largely attributed to cloning and genetic engineering techniques, clone selection, and bioprocess development optimization (Hacker, 2008). Despite these successes, data-driven, rational approaches for the development of cell culture processes are not consistently applied. Thus, process development efforts rely heavily on labor-intensive and time-consuming empirical optimization (Jayapal, 2007). While approaches like multi-factorial design have streamlined some aspects of process optimization, any empirical approach must be applied for every cell line, ultimately requiring significant time and resources (Hacker, 2008). Furthermore, while improvements can be obtained empirically, there is little fundamental understanding of how or why specific conditions result in improvements (Courtes et al., 2013; Selvarasu et al., 2012), which can lead to unpredictable behavior when processes are scaled or transferred to other facilities (Jayapal, 2007). Future progress will require a transition away from a purely empirical approach, and efforts toward rational modification through knowledge of cell biology (Schaub et al., 2012; Schaub et al., 2010; Wuest et al., 2012; Young, 2013). ‘Omics technology is one such approach to capture intracellular characteristics, and, therefore, has the ability to enhance and transform industrial cell line process development (Jayapal, 2007).

The term “’Omics” refers to quantification and characterization of various biological molecules present at a particular time and condition. Four primary ‘Omics areas have emerged including genomics, transcriptomics, proteomics, and metabolomics, which respectively examine genes, messenger RNA (mRNA), proteins, and metabolites present in a particular cellular environment. Each set of biological molecules and their levels can be analyzed as a molecular signature for the cell and condition of interest. Compiling this information is a key step in understanding cellular physiology and using a systems-level approach for rationally improving cellular performance. Despite being relatively new, ‘Omics technologies have been rapidly implemented in biotechnology, leading to transformations in the fields of cancer biology (Vucic et al., 2012), biofuels production (de Jong et al., 2012), plant–microbe interactions (Salvioli and Bonfante, 2013), and in silico modeling (Arakawa and Tomita, 2013). Progress with industrial mammalian systems, particularly CHO cells, has been slower. This is in part due to challenges presented by the complexity (larger genomes, compartmentalization, protein modifications) of mammalian cells, and limited federal grant funding. Nonetheless, the use of ‘Omics technologies continues to gain prevalence amongst academic and industrial mammalian research, resulting in maturation of the quality and quantity of data available (Jayapal, 2007). In the past five years, over 1,000 scientific articles have been published involving CHO and ‘Omics according to the PubMed database. With only 173 studies published prior to 2009, this is a noticeable increase that demonstrates the increasing popularity of ‘Omics technologies to study mammalian cellular states (Dietmair et al., 2012a; Kantardjieff et al., 2010; Pilobello et al., 2007). 
The application of ‘Omics techniques in mammalian technology has been explored in several review articles (Datta et al., 2013; Dietmair et al., 2012b; Kildegaard et al., 2013; Wuest et al., 2012), which provide clear definitions and examples of genomics, transcriptomics, proteomics, and metabolomics in CHO and other mammalian cell systems. Furthermore, they highlight advantages and technical limitations of the current technologies. While some of
this research advances our understanding of the field, many studies examine extremely specialized cellular systems, making much of the information inapplicable to industrial bioprocessing (Ma et al., 2009; Mohmad-Saberi et al., 2013). Furthermore, there is little emphasis on how to use our current capabilities of data collection and data analysis to rationally engineer and improve cellular systems (Wuest et al., 2012). The use of ‘Omics represents a unique opportunity to develop improved methods to characterize and understand cell culture state, and rationally optimize bioreactor cultivation processes (Chrysanthopoulos et al., 2010; Wuest et al., 2012). Furthermore, the information gained from ‘Omics studies can help to prioritize genetic targets that may enable beneficial phenotypes, coupled with media formulation and bioreactor design approaches (Clarke et al., 2011a). This review seeks to address how these goals can be accomplished by examining six key areas of study. First, we discuss ‘Omics research applied to parental, non-recombinant cell lines. Next, we discuss ‘Omics datasets generated using recombinant cell lines under industrially relevant conditions, and efforts to link those datasets to phenotypic traits. We then summarize data depositories and bioinformatic tools that facilitate data analysis, followed by efforts to develop systems-level in silico models for CHO cells. Finally, we look at examples where ‘Omics data have been used to rationally improve cellular processes. Collectively, we define a potential framework through which ‘Omics technologies and systems biology approaches can be used to advance bioprocess development, which is illustrated in Figure 1.

Figure 1. Improved workflow for ‘Omics technologies in bioprocessing. The primary goal for ‘Omics technologies in bioprocessing is to achieve full control of the cell line and process. This is achieved by (i) defining the system of study, (ii) selecting appropriate methods, (iii) collecting and analyzing the data generated by those methods, (iv) interpreting the data in the context of the system, and (v) implementing rational improvements to the cell or process. This workflow can be iterated using the new system if desired. Data interpretation is currently the workflow bottleneck.

Establishing An ‘Omics Reference State Using Parental, Non-Recombinant Cell Lines

The majority of industrial bioprocesses use CHO as the host organism, although some processes have been developed using mouse (NS0) and human (PerC6 and HEK293) hosts. The cell lines used in industrial applications are genetically modified to achieve desired phenotypic attributes, including auxotrophy and secretion of desired non-natural proteins. Undesired modifications, including chromosomal rearrangements, genetic duplications, and epigenetic changes, can arise in response to environmental stresses such as selection pressure, growth conditions and media adaptation (Wuest et al., 2012). As a result, each industrial cell line has a unique genomic and epigenetic profile. Given the variety of modifications that can occur, comparisons between different recombinant cell lines are complex. Comparing to the less modified, parental cell lines is a preferred approach, as it is typically less restricted and can serve as a wild-type baseline or reference state to which many modified cell lines can simultaneously be compared. For these reasons, parental CHO cell lines including Chinese hamster, DG44, CHO-S, and CHO-K1 are being increasingly examined in large-scale ‘Omics studies, with the majority of the published work being conducted in academic groups. Relevant studies are summarized in Table I.

A major foundational accomplishment in this area came with the publishing of the CHO-K1 genome sequence in 2011 (Xu et al., 2011). Public availability of the CHO-K1 genome has enabled more accurate annotation of CHO ‘Omics. Previously, genetic inferences were made from other published mammalian genomes, including mouse and human. Since 2011, the Chinese hamster (Cricetulus griseus) genome, as well as six CHO cell lines derived from CHO-K1, DG44, and CHO-S lineages, have been sequenced and published (Lewis et al., 2013). Given the common lineage of all recombinant CHO cell lines, the Chinese hamster genome serves as a universal
reference sequence, and eliminates issues related to chromosomal instability in CHO (Lewis et al., 2013). Additional work was done to identify expressed gene profiles for parental cell lines. This information, coupled with reference genomes and reliable gene annotation, can be used to understand cellular response to system perturbations and enables high throughput methodologies such as microarray profiling, which was previously a major limitation (Rupp et al., 2014). Recently, pyrosequencing technology was used to assemble cDNA libraries for CHO-K1 and CHO-DUXXB11 cell lines (Becker et al., 2011). More than 29,000 transcripts were assembled and 13,187 transcripts were functionally annotated. This study represented the first publicly available CHO transcriptome data deposited in the National Center for Biotechnology Information database. Particular focus was placed on reconstruction of central sugar metabolism and N-glycosylation pathways. Recent follow-up work combined previous CHO transcriptome data and additional datasets generated using Roche and Illumina next-generation sequencing platforms (Rupp et al., 2014). The resulting augmented transcriptome identified 5,636 additional genes and extended the total CHO transcriptome to 28,596 genes. A separate study looked at dynamic mRNA and micro (mi)RNA profiling of CHO-K1 suspension cultures (Bort et al., 2012). This work examined expressed transcripts over the time course of a batch culture, including lag, exponential and stationary phases. The study identified over 1,400 mRNAs and 100 miRNAs differentially regulated over time. As the primary biological role of miRNAs is regulation of mRNA targets, analysis tools were used to identify miRNA-mRNA networks, including programmed cell death. This work serves as a foundation for understanding growth phase-dependent gene regulation in CHO, and demonstrates the importance of time course as a factor in future experiments.

Table I. Key references for establishing a CHO ‘Omics reference state.

| Reference | Technique(s) | Institution(s) | Summary |
| --- | --- | --- | --- |
| Xu et al. (2011) | Genomics | BGI-Shenzhen, GT Life Sciences, Peking University, University of Delaware, Technical University of Denmark, Stanford University, Johns Hopkins University, University of Copenhagen | First publicly available draft sequence of the CHO-K1 genome. |
| Lewis et al. (2013) | Genomics | CHOmics, BGI-Shenzhen, BGI Europe, Cytogen Research and Development, Brandeis University, GT Life Sciences, Johns Hopkins University, Technical University of Denmark, University of Copenhagen, King Abdulaziz University | First publicly available draft sequence of the Chinese hamster. Six draft genomes of CHO cell lines derived from CHO-K1, DG44, and CHO-S lineages. |
| Becker et al. (2011), Rupp et al. (2014) | Transcriptomics | Bielefeld University, Universität für Bodenkultur Wien, Austrian Center of Industrial Biotechnology, Justus-Liebig-University | Publicly available CHO cell cDNA libraries. Special emphasis on central sugar metabolism and N-glycosylation. |
| Bort et al. (2012) | Transcriptomics | University of Natural Resources and Applied Life Sciences, Austrian Center of Industrial Biotechnology | Examined expression of mRNA and miRNA over batch culture time course, including lag, exponential and stationary phases. |
| Baycin-Hizal et al. (2012) | Proteomics | Johns Hopkins University, Vanderbilt University, University of California San Diego, Technical University of Denmark | First publicly available CHO proteome, identified more than 6,000 expressed proteins. |
| Slade et al. (2012) | Proteomics | Life Technologies | Identified 352 secreted proteins from CHO-S and DG44 cell lines. |
| Lim et al. (2013) | Proteomics | Bioprocessing Technology Institute, National University of Singapore | Identified secreted proteins in CHO-K1 fed-batch process. |
| Levy et al. (2014), Valente et al. (2014a, b) | Proteomics | The University of Delaware | Quantification and characterization of CHO HCP. |
| North et al. (2010) | Proteomics | Imperial College of London, Albert Einstein College of Medicine | Characterized glycosylation patterns of expressed proteins in nine lectin-resistant CHO cell lines. |
| Tep et al. (2012) | Proteomics | Biogen Idec, Northeastern University | Developed a MALDI-TOF MS method to quantify glycomic changes in CHO, applied to bioreactor campaign. |

Parental, non-recombinant CHO cell lines have been characterized using global ‘Omics technologies. These datasets can serve as a reference state for future work. Recent studies are summarized here.

The identification of expressed proteins, or the proteome, is a logical extension to transcriptome research. A recent study using CHO-K1 identified more than 6,000 expressed proteins, including secreted and glycosylated proteins (Baycin-Hizal et al., 2012). This represents the largest CHO proteome study to date, and an eightfold increase in the number of identified proteins. These results were correlated with previously established transcriptome data to better understand mRNA turnover and relative stability levels. This was the first proteomic study to use the CHO genome exclusively for annotation, and represents to date our most complete picture of the CHO proteome. An important subset of the proteome is secreted proteins, including proteins involved in regulating cell-to-cell and cell-to-extracellular matrix interactions. A recent study carried out by Life Technologies sought to identify secreted proteins for two common CHO cell lines: DG44 and CHO-S (Slade et al., 2012). Between the two cell lines, 352 secreted proteins were identified. A separate study by Singapore’s Bioprocessing Technology Institute (BTI) used shotgun proteomics to identify secreted proteins in a CHO-K1 fed-batch process (Lim et al., 2013). Using tandem mass spectrometry, researchers identified 290 secreted proteins and 4 autocrine growth factors.
Conditioned media containing secreted proteins was found to be beneficial to growth, and through supplementation with autocrine growth factors, researchers were able to identify a supplemented chemically defined medium that performed as well as conditioned media. Although the primary application of this work was increased cloning efficiency, it demonstrates the applicability of proteomics in media optimization. The majority of secreted host cell proteins (HCPs) in a bioprocess are impurities and must be separated from the desired product. Removal of these HCPs, which is critical to patient safety, can be difficult. Work at the University of Delaware has sought to characterize CHO HCPs using proteomics. They developed optimized techniques for HCP recovery from culture broth, and validated these using 2DE proteomics and shotgun proteomics (Valente et al., 2014b). They have also evaluated mAb-HCP
interactions, and demonstrated that these interactions directly impact impurity clearance (Levy et al., 2014). Finally, the same group studied a cell culture process for more than 500 days, and quantified HCPs at various time points. In doing so, they were able to link HCP profile to cell age (Valente et al., 2014a), which is especially relevant to continuous bioprocessing. As proteomic techniques have become more sophisticated, the need to characterize proteins based on glycosylation patterns, or glycomics, has become an area of focus. A recent study identified glycomics profiles of the N-linked and O-GalNAc glycans for commonly used CHO glycosylation mutants (North et al., 2010). This work clearly establishes the link between observed glycan structural changes and corresponding genomic modifications. Another study from the Barnett Institute sought to develop methods to quantify glycomic changes in CHO cells using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) (Tep et al., 2012). The approach involves the removal of the expressed biotherapeutic protein, followed by simultaneous analysis of neutral and sialylated glycans. This preliminary work demonstrates that glycomic changes are occurring during a bioreactor campaign, and the newly developed method can quantify those changes. To date, significant work at the genomic, transcriptomic, and proteomic levels has been conducted using parental CHO cell lines. These studies, largely carried out by academic institutions, are publicly available and provide critical reference points to which genetically modified and adapted CHO cell lines can be compared. Limited work, however, has been done to establish a reference CHO metabolome, which represents an important area for future work. These reference datasets will continue to evolve in coming years, as more work is done with the Chinese hamster, and technological capabilities and assay sensitivity continue to improve.
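As a minimal sketch of how a reference state might be used in practice, the Python snippet below compares a hypothetical recombinant expression profile against a hypothetical parental profile and reports log2 fold changes. The gene names, expression values, pseudocount, and twofold threshold are illustrative assumptions, not values from the studies cited above. A real analysis would also involve normalization and replicate-based statistics.

```python
import math


def log2_fold_changes(reference, sample, pseudocount=1.0):
    """Return log2(sample / reference) for genes measured in both profiles.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detected signal.
    """
    shared_genes = sorted(set(reference) & set(sample))
    return {
        gene: math.log2(
            (sample[gene] + pseudocount) / (reference[gene] + pseudocount)
        )
        for gene in shared_genes
    }


def flag_changed(fold_changes, threshold=1.0):
    """Keep genes whose absolute log2 fold change meets the threshold."""
    return {
        gene: value
        for gene, value in fold_changes.items()
        if abs(value) >= threshold
    }


# Hypothetical normalized expression values (arbitrary units).
parental = {"gene_a": 99.0, "gene_b": 49.0, "gene_c": 7.0}
producer = {"gene_a": 399.0, "gene_b": 49.0, "gene_c": 1.0}

changes = log2_fold_changes(parental, producer)
changed = flag_changed(changes)

for gene, value in changed.items():
    print(f"{gene}: log2 fold change vs. parental = {value:+.2f}")
```

Because every modified cell line is compared against the same parental baseline, the resulting fold changes for different recombinant lines can be placed side by side, which is the practical benefit of a shared reference state.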

‘Omics Datasets for Industrially Relevant Recombinant Cell Lines

The CHO cell lines used in industry have undergone significant genetic modifications and adaptations compared to parental cell lines, resulting in altered genetic expression and cellular metabolism. Here we highlight studies that use ‘Omics to generate datasets for recombinant cell lines under industrially relevant conditions. This information is summarized in Table II.

Table II. Key references for industrially relevant CHO ‘Omics datasets.

| Reference | Technique(s) | Institution(s) | Summary |
| --- | --- | --- | --- |
| Courtes et al. (2013) | Transcriptomics | Bioprocessing Technology Institute, National University of Singapore, AbbVie, Institute of Medical Biology | First measurement of CHO translation efficiency in mAb-producing cell line. |
| Schaub et al. (2012) | Metabolic Flux Analysis (MFA), Transcriptomics | Boehringer Ingelheim | Analysis of CHO IgG process with titer increase from 5 g/L to 8 g/L. Demonstrates use of multivariate data analysis and systems biology tools. |
| Aranibar et al. (2011) | Metabolomics | Bristol-Myers Squibb | Used NMR to compare metabolite profiles for a CHO process at both production and laboratory scale. |
| Chrysanthopoulos et al. (2010) | Metabolomics | Foundation for Research and Technology-Hellas, Bayer | Metabolomics used to profile lot-to-lot variability in a BHK perfusion process operated at laboratory and manufacturing scale. |

Some industrial CHO cell lines have been characterized using global ‘Omics technologies. Recent studies are summarized here.


A recent study from BTI sought to bridge the gap between expressed genes and expressed proteins in an IgG producer (Courtes et al., 2013). Using techniques including polysome profiling and microarray technology, CHO transcripts with varying levels of ribosome loading were identified and ranked by translation efficiency and biological function. This approach identified genes important to cell growth and productivity, but translationally inefficient, making them potential targets for cell engineering. Follow-up work should be done to validate these targets and determine their direct impact on growth and productivity phenotypes. This methodology was previously used to identify and modify high value targets in microbial engineering.

A study by Bristol-Myers Squibb used NMR to examine the metabolome of a protein-expressing CHO cell line grown at manufacturing (5,000 L) and lab (7 L) scale (Aranibar et al., 2011). Although media and process parameters were the same, viability and productivity were higher at the lab scale. Intra- and extracellular profiles were established for 30 metabolites, showing a higher reliance on glycolysis in the manufacturing scale system compared to lab scale. Reduced galactose consumption was also linked with lower viability. Additionally, metabolite profiles of lab scale reactors run with and without dextran sulfate, a chemical additive used to increase culture viability, were examined. Changes to metabolites including uracil, niacinamide, amino acids, acetate, fumarate, and lactate were detected in culture broth when dextran sulfate was present.

Work by Bayer illustrates how metabolomics can be used to profile lot-to-lot variability in cell culture processes. Four vials from the same industrial, baby hamster kidney (BHK) cell bank were cultivated in both laboratory and manufacturing-scale perfusion reactors for between 113 and 155 days (Chrysanthopoulos et al., 2010).
While minimal variability was observed using traditional cell culture metrics, metabolite profiling revealed differences between samples based on cell age, reactor size, and vial source. Although Bayer does not discuss outcomes directly, this work demonstrates the quality of information that can be obtained using a metabolomics approach, and suggests that ‘Omics techniques could improve bioprocessing workflows including cell bank validation.
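The ranking idea behind the polysome profiling study discussed in this section can be illustrated with a short sketch. It assumes, for illustration only, that translation efficiency is approximated by the ratio of polysome-associated signal to total transcript signal. Courtes et al. (2013) may define and normalize this quantity differently. The gene names, signal values, gene set, and cutoff below are all hypothetical.

```python
def translation_efficiency(polysome_signal, total_signal):
    """Approximate efficiency as polysome-bound signal / total signal."""
    return {
        gene: polysome_signal[gene] / total_signal[gene]
        for gene in total_signal
        if gene in polysome_signal and total_signal[gene] > 0
    }


def inefficient_targets(efficiency, genes_of_interest, max_efficiency):
    """Rank genes of interest with low efficiency, least efficient first."""
    hits = [
        (gene, value)
        for gene, value in efficiency.items()
        if gene in genes_of_interest and value <= max_efficiency
    ]
    return sorted(hits, key=lambda item: item[1])


# Hypothetical microarray-style signals (arbitrary units).
total = {"gene_x": 200.0, "gene_y": 100.0, "gene_z": 10.0}
polysome = {"gene_x": 150.0, "gene_y": 20.0, "gene_z": 1.0}

# Hypothetical set of genes annotated as relevant to growth or productivity.
growth_or_productivity_genes = {"gene_x", "gene_y"}

efficiency = translation_efficiency(polysome, total)
candidates = inefficient_targets(
    efficiency, growth_or_productivity_genes, max_efficiency=0.5
)

for gene, value in candidates:
    print(f"{gene}: translation efficiency = {value:.2f}")
```

In this toy example only `gene_y` is both in the gene set of interest and poorly loaded onto polysomes, so it would be the one candidate carried forward for the kind of validation work the text calls for.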

Linking ‘Omics Datasets and Cell Line Phenotypes Through Biomarker Identification

At present, bioprocess development typically relies on a handful of global process performance metrics such as cell density, culture viability, metabolite concentrations, protein titer, and product quality attributes to make process decisions. These metrics focus on the extracellular environment and do not capture cellular readouts such as gene or protein expression levels. Incorporating ‘Omics data into bioprocesses would enable evaluation of both intracellular and extracellular metrics, and provide a more detailed picture of cell physiology. While the technology can support this approach, a key challenge is extracting biological meaning from global ‘Omics profiles, and understanding the link with process performance. This is an active area of research in which both academic and industrial groups are looking to utilize ‘Omics technologies to directly examine cell lines and conditions, and link them with desired phenotypes including specific productivity, titer, metabolite profiles and product quality. Recent references are summarized in Table III. Ultimately, this will result in identification of relevant biomarkers that can be screened and monitored through clone selection and process development to more rapidly identify cell lines and bioprocesses with desirable phenotypic attributes.

One desirable trait for recombinant cell lines is high specific productivity. Recent work conducted at BTI compared metabolite profiles of eight mAb producers with production rates from 2 to greater than 40 picograms per cell per day (Chong et al., 2012). The goal was to determine whether low productivity cell lines exhibit a different metabolic profile than high productivity counterparts. This work revealed several key metabolites linked to energy generation, regulation of cellular redox potential, and precursors for glycosylation that were elevated in the high-producing cell lines.
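For readers less familiar with the units, the sketch below shows one common way a specific productivity in picograms per cell per day can be estimated from routine culture measurements: the change in titer divided by the integral of viable cell density (IVCD). This is a generic, averaged definition used here as an assumption. It is not taken from Chong et al. (2012), and the time points, cell densities, and titers are hypothetical.

```python
def integral_viable_cell_density(days, vcd_cells_per_ml):
    """Trapezoidal integral of viable cell density, in cell-days per mL."""
    if len(days) != len(vcd_cells_per_ml) or len(days) < 2:
        raise ValueError("need at least two matched time points")
    return sum(
        (days[i + 1] - days[i])
        * (vcd_cells_per_ml[i] + vcd_cells_per_ml[i + 1])
        / 2.0
        for i in range(len(days) - 1)
    )


def specific_productivity(days, vcd_cells_per_ml, titer_mg_per_l):
    """Average specific productivity in pg per cell per day."""
    if len(titer_mg_per_l) != len(days):
        raise ValueError("titer and time series must have the same length")
    ivcd = integral_viable_cell_density(days, vcd_cells_per_ml)
    if ivcd <= 0:
        raise ValueError("integral of viable cell density must be positive")
    # Unit conversion: 1 mg/L = 1e9 pg per 1e3 mL = 1e6 pg/mL.
    delta_titer_pg_per_ml = (titer_mg_per_l[-1] - titer_mg_per_l[0]) * 1e6
    return delta_titer_pg_per_ml / ivcd


# Hypothetical 10-day culture held at a constant 5e6 viable cells/mL.
days = [0.0, 5.0, 10.0]
vcd = [5e6, 5e6, 5e6]
titer = [0.0, 500.0, 1000.0]  # mg/L

qp = specific_productivity(days, vcd, titer)
print(f"specific productivity: {qp:.1f} pg/cell/day")
```

With these made-up numbers the culture accumulates 1 g/L over an IVCD of 5e7 cell-days/mL, which places it in the middle of the 2 to 40 pg/cell/day range quoted above.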
Although this is a promising result that encompasses the metabolome of eight cell lines, additional follow-up work would be needed to assess if findings can be generalized, and demonstrate how this information can be incorporated into bioprocess workflow.

Table III. Key references for CHO ‘Omics datasets for biomarker identification.

| Reference | Technique(s) | Institution(s) | Summary |
| --- | --- | --- | --- |
| Chong et al. (2012) | Metabolomics | Bioprocessing Technology Institute, National University of Singapore | Comparison of metabolite profiles for eight mAb-producing CHO cell lines with production ranging from 2 pg/cell/day to 40 pg/cell/day. |
| Luo et al. (2012), Pascoe et al. (2007) | Proteomics, Metabolomics | Genentech | ‘Omics analysis of two mAb-producing cell lines known to produce different lactate profiles; varying levels of Cu introduced and linked to lactate phenotype. |
| Carlage et al. (2009) | Proteomics | Biogen Idec | Proteomics comparison between low and high producer cell lines. |
| Dorai et al. (2013) | Proteomics | Janssen Research, Northeastern University | Proteomics comparison between low and high producer cell lines. |
| Clarke et al. (2011a) | Transcriptomics | Dublin City University, Pfizer | Microarray profiling study with 295 CHO production cell lines representing a wide range of conditions. |
| Doolan et al. (2013) | Transcriptomics | Dublin City University, Pfizer | Microarray profiling of 10 mAb-producing cell lines with varying growth rates, all derived from a single population. |
| Kang et al. (2014) | Transcriptomics, Proteomics | Amgen, Karolinska Institute, SciLifeLab | Analysis of 17 mAb-producing cell lines grown with same media and process conditions. Effort to identify “universal” biomarkers. |

Global ‘Omics technologies have been used to link industrial CHO cell lines with cellular phenotypes. A primary goal of this work is relevant biomarker identification. Recent studies are summarized here.

Another desirable trait for recombinant cell lines is lactate consumption, which helps control pH late in production, improving cell viability and culture performance. Research sponsored by Genentech examined two mAb-producing CHO cell lines (Pascoe et al., 2007). Under fed-batch conditions, one cell line switched to lactate consumption and continued growing whereas the other did not consume lactate and showed more rapid cell death. Proteomic analysis using 2D gel electrophoresis and mass spectrometry identified differentially expressed proteins between the two cell lines, including enolase, BiP, and antioxidant enzymes. A separate report by Genentech compared two CHO cell lines known to produce different lactate profiles with varying copper (Cu) levels (Luo et al., 2012). Specifically, they sought to identify a biological link between Cu levels and lactate consumption. Although this link was not clearly established, unique metabolite profiles were observed for each lactate profile, which showed clear differences resulting from Cu levels and cell line choice. Biogen Idec compared low and high producer CHO cell lines (Carlage et al., 2009), revealing 32 differentially expressed proteins and potential markers for high productivity in CHO. Again, BiP was upregulated in the high producer cell line, as well as markers for protein metabolism and cell cycle regulation (Carlage et al., 2009). Janssen Research, in partnership with the Barnett Institute, also used a proteomics approach to analyze high and low productivity CHO-GS cell lines (Dorai et al., 2013). Their comparison revealed 12 proteins modulated between the high and low productivity conditions across multiple days in culture. These proteins have biological functions related to cytoskeleton rearrangement, protein synthesis, cell metabolism and cell growth. Follow-up work is needed to validate these proteins as potential productivity biomarkers. A large-scale microarray study of 295 CHO production cell lines was recently conducted by Dublin City University in partnership with Pfizer (Clarke et al., 2011a).
The CHO samples were phenotypically diverse, representing bioreactor and shake flask growth, DUX and K1 cell lines, seeding densities between 0.15 × 10^6 and 4.33 × 10^6 cells/mL, log, stationary and death phase, culture temperatures between 29.5 °C and 37 °C, and a range of recombinant protein products including monoclonal antibodies, fusion proteins, growth factors, and coagulation factors. Using a co-expression network analysis approach, correlated mRNA expression patterns across a broad range of bioprocess conditions were determined for CHO. This led to the identification of five distinct gene clusters, which were then mapped back to phenotypic attributes including titer and productivity. These gene clusters can be further probed to identify biomarkers. Pfizer supported recent transcriptomics work examining 10 mAb-producing clones with varying growth rates (Doolan et al., 2013). Ten cell lines, derived from a single population, exhibited low variation in specific productivity, but growth rates ranging from 0.011 h⁻¹ to 0.044 h⁻¹. A high growth rate is a desired phenotype in commercial cell lines, likely arising from a multi-gene effect. Because of parental homogeneity, this was an ideal system for identifying potential biomarkers relating to growth rate, which is the first step in implementing systems control over this attribute. Upregulation of biological processes including DNA replication, mitosis, cell cycle, translation and RNA processing, and downregulation of cell proliferation and cell homeostasis processes were associated with the fast growth phenotype. The correlation between
genes including ALDH7A1 and CBX5 and specific growth rate agreed with previous findings (Clarke et al., 2011a). Amgen recently used proteomics and transcriptomics to analyze 17 different cell lines grown using the same media and process conditions (Kang et al., 2014). By using different cell lines and taking samples prior to production phase, the focus was to identify “universal” markers that can be detected regardless of external factors. Researchers were particularly focused on genetic targets correlated with productivity, but also considered cell size and growth rate. They identified a small number of transcripts and proteins positively and negatively correlated with these attributes. Key genes included HTT, HDAC1, VEGFA, and PDGF. These ‘Omics studies demonstrate the interest and value in establishing biomarkers for desired phenotypes in industrial CHO bioprocesses. Transcriptome and proteome profiles linking phenotypes including productivity, growth rate, cell size, and lactate consumption have been generated. These studies have identified potential biomarkers for each of the phenotypes examined. Again, little work to date has been published relating metabolome profiles to different phenotypes, and this represents a valuable area for expansion. Additionally, the majority of the potential biomarkers identified through this work have not been verified. Follow-up work is crucial to linking ‘Omics dataset generation to a practical outcome. It is possible that biomarker verification is being carried out, but the findings are not being disclosed for intellectual property reasons. Nonetheless, these ‘Omics studies continue to be a valuable area of research with a high potential for direct applications in bioprocess development. ‘Omics Data Analysis Tools Many of the studies discussed thus far examine cell lines and conditions utilizing multiple ‘Omics techniques. 
A major advantage of this large-scale, systems level approach is the unbiased evaluation of study conditions and high resolution datasets. As sequencing, array, and multiplexing technology improves, the cost and time to generate large-scale datasets are decreasing, making incorporation of this technology into the bioprocessing platform more attractive, affordable and practical. These large datasets can be a disadvantage if they become cumbersome and inefficient to analyze, especially as the number of cell lines, conditions and replicates increases. As a result, developing efficient and robust data analysis methods and tools is becoming increasingly important (Casci, 2012; Jayapal, 2007; Kotera et al., 2012; Schaub et al., 2012). In order to make efficient use of ‘Omics studies and rapidly implement changes to a bioprocess, ‘Omics datasets must be accurately annotated, publicly deposited, and easily searchable. This presents a unique challenge that is largely being tackled through bioinformatics techniques, which are summarized in Table IV. This work is an essential part of ‘Omics-driven bioprocessing, and enables rapid and resourceful use of both new and old datasets. Databases and depositories play a critical role in the organization of ‘Omics data and serve as a central holding place for primary information (Clarke et al., 2012). For bioprocess industries, the Chinese Hamster Genome database (chogenome.org) is quickly emerging as the primary resource for this type of information. In

Lewis et al.: The Use of ‘Omics Technology Biotechnology and Bioengineering


Table IV. Key references for ‘Omics data analysis tools.

| Reference | Resource name | Institution(s) | Summary | Website |
| --- | --- | --- | --- | --- |
| Clarke et al. (2012) | CHO gene coexpression database | Dublin City University, Pfizer | Gene–gene coexpression analysis tool for CHO cell lines. | http://www.cgcdb.org/ |
| Hammond et al. (2011, 2012), Xu et al. (2011) | Chinese Hamster Genome Database | BGI-Shenzhen, GT Life Sciences, Peking University, University of Delaware, Technical University of Denmark, Stanford University, Johns Hopkins University, University of Copenhagen | Online database to collect, curate, and distribute genome-scale data for the Chinese hamster. | http://chogenome.org/; http://www.chogenome.org/proteome.php |
| Kim et al. (2010) | Array2KEGG | Hanyang University | Bioinformatic tool for visualizing co-expressed genes using pathway maps. | http://www.koreagene.co.kr/cgi-bin/service/service1.pl |
| Garcia-Alcalde et al. (2011) | Paintomics | Centro de Investigaciones Príncipe Felipe | Bioinformatic tool for integrated visual analysis of transcriptomic and metabolomics data. | http://www.paintomics.org/ |
| LaMarche et al. (2013) | MultiAlign | Pacific Northwest National Laboratory | Software tool for analysis of LC-MS proteomic and metabolomic data. | http://omics.pnl.gov/software/multialign |
| Xia et al. (2013) | INMEX | University of British Columbia, University of Alberta, Wellcome Trust Sanger Institute | Bioinformatic tool for integrated visual analysis of transcriptomic and metabolomic data. | http://www.inmex.ca |
| Taverner et al. (2012) | DanteR | Pacific Northwest National Laboratory, University of Tasmania, Center for Proteomics, Texas A&M University | Graphical R package for proteomic data processing and analysis. | http://omics.pnl.gov/software/danter |
| Lohse et al. (2012) | RobiNA | Max-Planck-Institute of Molecular Plant Physiology, RWTH Aachen University, Institute of Bio- and Geosciences | Graphical interface for RNA-Seq differential gene expression analysis. | http://mapman.gabipd.org/web/guest/robin |
| Zhu et al. (2012) | N/A | Sage Bionetworks, University of Washington, Merck Research Laboratories, University of California at Berkeley, Mount Sinai School of Medicine | Used Bayesian network reconstruction algorithm to integrate six datasets in yeast, generating a predicted network. | N/A |
| Hackl et al. (2012) | N/A | University of Natural Resources and Life Sciences, Bielefeld University, Austrian Center of Industrial Biotechnology | Computational methods used to successfully identify the genomic loci of 415 CHO miRNA. | N/A |
| Clarke et al. (2011b) | N/A | Dublin City University, Pfizer | A model was developed linking a cell line’s transcriptome and productivity. | N/A |
| Ghorbaniaghdam et al. (2014) | N/A | Ecole Polytechnique de Montreal | In silico kinetic–metabolic CHO cell model. | N/A |

Global ‘Omics technologies generate large datasets. Bioinformatic tools are needed to efficiently handle multiple, large datasets and extract biological meaning. Relevant data analysis tools are summarized here.

response to the recent sequencing of the CHO-K1 genome (Hammond et al., 2011; Xu et al., 2011), this database was developed to curate and distribute genome-scale data for public use (Hammond et al., 2012). Query functions allow a user to search the genome by accession number, gene name, GO term, or symbol. There is also a section of the website devoted to the CHO proteome (http://www.chogenome.org/proteome.php). This serves as a common, public resource for all CHO genome-scale tools, and future plans include the incorporation of transcriptomic and metabolomic data for the species, as well as the addition of the Chinese hamster reference genome. Separately, work is being done to expand and improve the reference databases available for CHO proteomic identification (Meleady et al., 2012). Many approaches to facilitate the annotation, quantification, and extraction of meaningful information from raw data are being developed. These approaches can be generally categorized as meta-analysis tools that help reduce study bias, increase statistical power, and improve biological understanding (Xia et al., 2013). Tools exist for both single and multiple data types. Metabolomic software including MathDAMP, MetAlign, MZMine, and XCMS enables rapid


Biotechnology and Bioengineering, Vol. 113, No. 1, January, 2016

analysis of large datasets generally produced by untargeted metabolomics (Patti et al., 2012). DanteR is a freeware graphical R package developed to facilitate mass spectrometry proteomic data analyses including normalization, imputation, and visualization (Taverner et al., 2012). RobiNA is an integrated software package for quality checking, filtering, and differential gene expression analysis of transcriptome data generated using RNA-Seq methods (Lohse et al., 2012). Although many of these tools have not yet been applied to CHO-related projects, the methodologies are generic and not limited to a single organism. Successfully integrating multiple data types from a single experiment is also being addressed by bioinformatics approaches. One such tool is a network model that simultaneously integrates up to six different data types (Zhu et al., 2012). Although the proof-of-concept study was demonstrated using yeast data, it could easily be applied to mammalian cell types. MultiAlign is a software tool that analyzes liquid chromatography-mass spectrometry (LC-MS) proteomic and metabolomic datasets (LaMarche et al., 2013). Web-based tools including Paintomics (Garcia-Alcalde et al., 2011) and INMEX (Xia et al., 2013) integrate and visualize transcriptomic

and metabolomic datasets. These free bioinformatic tools are compatible with the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway maps. Another web-based tool is Array2KEGG, which allows users to analyze pathway information from multiple co-expressed genes (Kim et al., 2010). KEGG also has built-in tools that facilitate ‘Omics analyses including clustering and visualization of transcriptome data (Kotera et al., 2012). Computational tools can also be utilized for predictive purposes. For example, one research group used a partial least squares regression model to capture the relationship between a 287-gene transcription dataset and cell line productivity (Clarke et al., 2011b). The model incorporated 80 cell lines with productivities ranging from 0.81 to 50.4 pg/cell/day. The model was able to predict productivities within an error of 3.11 pg/cell/day. Separately, a dynamic kinetic–metabolic in silico model for CHO cells including 35 reactions and 46 variables was developed (Ghorbaniaghdam et al., 2014). The model was calibrated using experimental data obtained from a parent cell line, as well as high and low producer mAb secretors derived from the parent. The model was able to accurately simulate variables including viable cell density, amino acid levels, co-factor ratios, lactate levels, and protein titer. Collectively, these computational systems and tools provide the foundation for efficient data analysis, annotation and biomarker identification. Useful databases have been developed for CHO genome and transcriptome data. Although databases do exist for proteome and metabolome data, this is an area that would benefit from expansion and improvement, resulting in a higher quality and quantity of molecular identification. These databases enable the application of generic meta-analysis tools to a specific organism of interest, such as the Chinese hamster.
As these bioinformatic technologies are improved, so will our abilities to process ever increasing amounts of raw data and draw the meaningful conclusions that enable rational bioprocess development.
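The predictive use of expression data described above (Clarke et al., 2011b) can be illustrated with a minimal sketch. This is not the published model: it is a one-component partial least squares (PLS1) fit written in plain Python on invented toy numbers, and the names `fit_pls1`, `predict_pls1`, and `rmse` are hypothetical. It only shows the mechanics of projecting many gene-expression columns onto a single latent score and regressing productivity on that score.

```python
import math


def fit_pls1(X, y):
    """Fit a one-component PLS1 model (pure Python, illustrative only).

    X: list of samples; each sample is a list of expression values, one per gene.
    y: list of specific productivities, one per sample.
    """
    n_samples, n_genes = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n_samples for j in range(n_genes)]
    y_mean = sum(y) / n_samples
    # Center predictors and response.
    Xc = [[row[j] - x_means[j] for j in range(n_genes)] for row in X]
    yc = [value - y_mean for value in y]
    # Weight vector: direction in gene space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n_samples)) for j in range(n_genes)]
    norm = math.sqrt(sum(value * value for value in w))
    if norm == 0.0:
        raise ValueError("expression data are uncorrelated with productivity")
    w = [value / norm for value in w]
    # Latent score of each sample: projection of its centered profile onto w.
    t = [sum(Xc[i][j] * w[j] for j in range(n_genes)) for i in range(n_samples)]
    # Regress centered productivity on the single latent score.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_means": x_means, "y_mean": y_mean, "w": w, "q": q}


def predict_pls1(model, sample):
    """Predict productivity for one expression profile."""
    score = sum((value - mean) * weight
                for value, mean, weight in zip(sample, model["x_means"], model["w"]))
    return model["y_mean"] + model["q"] * score


def rmse(model, X, y):
    """Root-mean-square prediction error over a set of samples."""
    errors = [predict_pls1(model, row) - target for row, target in zip(X, y)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Invented toy data: 5 cell lines x 3 genes, built from one hidden factor so
# that a single component is enough. Real datasets are far larger and noisier.
hidden = [-2.0, -1.0, 0.0, 1.0, 2.0]
X_toy = [[5.0 + s, 8.0 + 2.0 * s, 6.0 - s] for s in hidden]
y_toy = [20.0 + 3.0 * s for s in hidden]  # pg/cell/day, invented

model = fit_pls1(X_toy, y_toy)
training_error = rmse(model, X_toy, y_toy)
```

Because the toy data are constructed from a single factor, the training error here is essentially zero and says nothing about real performance. With real data, the same `rmse` helper would be applied to cell lines held out of the fit, and more than one component would usually be needed.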

Systems Level CHO Models

Computational, or in silico, models of cellular organisms can be useful systems-level tools to explore how genetic and process modifications affect cell physiology. Accurate models can give insight into how changes to process inputs impact outputs such as growth rate and productivity. This in turn can guide experiment design and ultimately reduce the number of experiments needed for process optimization. In order to provide utility, models must incorporate large reaction datasets generated through careful experiments. Implementation of prokaryotic models has been especially successful, but similar models have been slower to emerge for eukaryotic hosts. Nonetheless, augmenting our understanding of intracellular pathways and fluxes enables better experimental design for bioprocess development. Recent studies in this area are summarized in Table V. A critical step in building accurate in silico CHO models is identifying the cell’s metabolic reactions. Using metabolic flux analysis (MFA) and isotope labeling, intracellular pathway fluxes can be accurately measured (Young, 2013), and the metabolic fate of common CHO carbon sources determined. Recently, 13C-labeled glucose and glutamine were used with CHO-K1 cells in both exponential and early stationary phase to generate flux maps (Ahn and Antoniewicz, 2013) of glycolysis, TCA cycle, pentose phosphate pathways, glutamine metabolism, and fatty acid biosynthesis. A non-stationary method was also employed with CHO-K1 using 13C-labeled glucose to determine intracellular fluxes based on extracellular metabolite measurements only (Nicolae et al., 2014). Similar work was carried out with a mAb producer grown in fed-batch to mimic industrial conditions (Templeton et al., 2013). 13C-labeled glucose and glutamine were spiked into cultures at four distinct phases. Researchers found that peak antibody production corresponds with a highly oxidative metabolic state, whereas cells predominantly utilized glycolysis during peak cell growth periods. Metabolic mapping of CHO-GS cells was carried out using 13C-labeled pyruvate to elucidate the role of asparagine and serine as key nitrogen sources (Duarte et al., 2014). A lack of asparagine was associated with growth arrest, high pyruvate uptake levels, and reduced TCA cycle flux. Cells were still able to grow without serine supplementation, although this resulted in half maximal viable cell density.
This information can be used to refine computational models and evaluate the impact of bioprocess parameters on cell growth and metabolism. Recent work by the Bioprocessing Technology Institute in Singapore demonstrates progress toward an accurate in silico CHO cell model (Selvarasu et al., 2010, 2012). Metabolomic data for fed-batch IgG-producing CHO cultures were used to expand previous models to a network containing 1,540 reactions and representing 1,302 metabolites in both cytosol and mitochondria (Selvarasu et al., 2012). This model can be further improved by the inclusion of metabolites sequestered to other cellular compartments, including the Golgi and endoplasmic reticulum (Selvarasu et al., 2012). Although it represents a modest starting point, this model can be expanded to include additional reactions through quantitative flux

Table V. Key references for systems level CHO models.

| Reference | Technique(s) | Institution(s) | Summary |
| --- | --- | --- | --- |
| Nicolae et al. (2014) | 13C MFA | Saarland University, University of Hamburg | Central carbon metabolism flux mapped for CHO-K1 batch culture. |
| Templeton et al. (2013) | 13C MFA | Vanderbilt University, Amgen | Characterized metabolism of mAb-producing CHO cell during four phases of fed-batch culture. |
| Duarte et al. (2014) | 13C-pyruvate mapping | IBET, NOVA | Metabolic mapping of a CHO-GS cell line with and without serine and asparagine limitations. |
| Selvarasu et al. (2010, 2012) | N/A | Bioprocessing Technology Institute, National University of Singapore | In silico CHO cell model assembled from fed-batch IgG-producing CHO cell metabolomics data. |

Successfully linking multi-‘Omics datasets requires a systems level approach. Recent studies measuring CHO cell reaction networks and developing in silico models are summarized here.


techniques, such as stable isotope-assisted metabolomics (Mueller and Heinzle, 2013). Additional work is needed both to expand this model to include more cellular compartments and to validate the accuracy of the model through experimental work. The development of computational CHO models could provide significant advantages to bioprocess optimization by guiding experimental selection and reducing experiment number. These models require accurate reaction data, which have been significantly augmented in recent years using MFA and stable isotope labeling. Although much of this work has been carried out using non-recombinant cell lines, some metabolomics data have been obtained for mAb producers. Nonetheless, this area would greatly benefit from additional studies, and in particular MFA studies covering eukaryotic-specific metabolic pathways, such as glycosylation.
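The role of measured reaction data in such models can be illustrated with a steady-state mass balance, the basic constraint behind stoichiometric models and MFA. The sketch below is a deliberately tiny, hypothetical network with one balanced metabolite (pyruvate) and three lumped reactions. The stoichiometry is simplified, the flux values are invented, and the function names are illustrative rather than taken from any of the cited studies.

```python
# Toy network with a single balanced intracellular metabolite (pyruvate).
# Lumped, simplified reactions (assumptions made for illustration only):
#   v_glc: glucose uptake,      1 glucose  -> 2 pyruvate
#   v_lac: lactate secretion,   1 pyruvate -> 1 lactate
#   v_tca: entry into the TCA,  1 pyruvate -> oxidized in the TCA cycle
STOICH = {
    "pyruvate": {"v_glc": 2.0, "v_lac": -1.0, "v_tca": -1.0},
}


def net_production(stoich, fluxes):
    """Return the net production rate (row of S times v) for each metabolite."""
    return {
        metabolite: sum(coef * fluxes[rxn] for rxn, coef in row.items())
        for metabolite, row in stoich.items()
    }


def solve_tca_flux(v_glc, v_lac):
    """Infer the unmeasured TCA entry flux from two measured exchange fluxes.

    At steady state pyruvate does not accumulate, so
    2 * v_glc - v_lac - v_tca = 0.
    """
    return 2.0 * v_glc - v_lac


# Invented exchange fluxes in arbitrary but consistent units.
measured = {"v_glc": 1.0, "v_lac": 1.5}
fluxes = dict(measured, v_tca=solve_tca_flux(**measured))
balance = net_production(STOICH, fluxes)
oxidative_fraction = fluxes["v_tca"] / (2.0 * fluxes["v_glc"])
```

In a genome-scale network the same balance is written for every intracellular metabolite at once, which is why the unmeasured fluxes generally cannot be solved by hand and isotope labeling data are used to constrain them further.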

Rational Bioprocess Improvement Using ‘Omics Data

Previous sections discuss the collection and analysis of ‘Omics datasets for industrially relevant cell lines and conditions. Once this foundation is firmly established, the information can be used to rationally modify industrial bioprocesses resulting in improved characteristics such as growth, titer, productivity and quality attributes. This last step is a critical component to validate bioprocess improvement, although few examples exist to date. The examples that we do have, which are summarized in Table VI, fall into two categories of improvement: through media formulation or through genetic modifications. Process improvement through media formulation can be rapidly and directly implemented in a bioprocess environment. There are a few published examples where ‘Omics datasets have guided media modifications with a demonstrated positive impact on the process. For example, Boehringer Ingelheim compared two fed-batch processes using the same IgG-producing cell line (Schaub et al., 2010). The processes were designated as “high titer” and “low titer.” Using microarray analysis, they determined gene expression was significantly different under the two conditions and varied over the fed-batch time course. In particular, gene expression of lipid metabolism pathways was differentially upregulated in the high titer process. The fed-batch process was repeated using a modified basal media with an increased lipid concentration. This change to media formulation resulted in a 20% increase in titer from 3.18 g/L to 3.83 g/L of protein. Metabolomics is often the preferred tool for rational media design, as depletion of key nutrients can be measured directly. Metabolite profiling of a CHO-GS IgG4-producer was conducted during a batch campaign in chemically defined media (Sellick et al., 2011). Both intra- and extracellular measurements were made daily using a novel sample preparation method and gas chromatography-mass spectrometry (GC-MS). Researchers identified several media components depleted by early stationary phase including aspartate, asparagine, glutamate, and pyruvate. This information was used to design a simple feed strategy that increased cell biomass by 35% and antibody titer by 75%. At Eli Lilly, NMR was used to rapidly measure and monitor intra- and extracellular CHO cell metabolites (Bradley et al., 2010). The approach was applied to optimization of CHO cell lines grown in proprietary medium with similar growth profiles, but different productivities. Researchers were able to identify differences in histidine concentrations between conditions, determining that histidine depletion led to low titers. This approach can be adapted for scale-up experiments where culture consistency should be monitored. Biogen Idec utilized LC-MS to characterize four media batches used in a cell culture process (Zang et al., 2011). One batch resulted in low cell viability and growth. Using metabolomics, elevated levels of oxidation and degradation products and

Table VI. Examples of rational bioprocess improvement using ‘Omics data.

| Reference | Technique(s) | Institution(s) | Summary |
| --- | --- | --- | --- |
| Schaub et al. (2010) | Transcriptomics | Boehringer Ingelheim | Microarray analysis to compare high and low-producing fed-batch processes. Lipid metabolism found to be up-regulated in high producer. Lipid supplementation of media resulted in 20% increase in titer. |
| Sellick et al. (2011) | Metabolomics | The University of Manchester, Netherlands Institute for Systems Biology | Daily measurements of intra- and extracellular metabolites made during batch bioreactor campaign. Supplementation of depleted nutrients in media led to increased cell biomass and titer. |
| Bradley et al. (2010) | Metabolomics | Eli Lilly | Used NMR methods to efficiently monitor metabolite profiles in a CHO process. Linked histidine depletion with low titer. |
| Zang et al. (2011) | Metabolomics | Biogen Idec | LC/MS analysis used to characterize a media batch resulting in poor process performance. Degradation products and reduced riboflavin in media batch identified as root causes. |
| Doolan et al. (2010) | Transcriptomics, Proteomics | Dublin City University, Pfizer, Biogen Idec | Multi-omics approach identified 21 genetic targets associated with growth rate in mAb-secreting cell lines. Validated 5 targets using siRNA knockdown. |
| Druz et al. (2011) | Transcriptomics | National Institutes of Health, Johns Hopkins University | Gene expression of CHO cells in fresh and glucose-depleted media identified 70 differentially expressed miRNAs, including mmu-miR-466h. Demonstrated increased cell viability following transfection with anti-miRNA. |
| Strotbek et al. (2013) | N/A | University of Stuttgart, Boehringer Ingelheim | Expressed the human microRNA library in a CHO-IgG cell line and used a functional screen to identify miRNAs that impacted productivity. Identified 15 miRNAs that decreased and 9 that increased productivity, which was validated via stable transfection. |

Global ‘Omics datasets can be used to identify bioprocess targets and improve process performance through modification. Recent studies demonstrating this approach are summarized here.


decreased riboflavin were identified in the poorly performing media batch. Researchers were able to reproduce this effect using fresh media, demonstrating that metabolomics can be utilized to both improve and better understand bioprocesses. Data from ‘Omics experiments can often suggest promising genetic targets for improved process characteristics. Although such changes are more time consuming to implement, recent examples indicate that substantial improvements to bioprocesses, including growth, titer, and productivity, can be achieved through this approach. Research conducted with Pfizer compared four mAb-secreting cell lines (Doolan et al., 2010), with two considered fast growing, and two slow growing in a batch model. Transcriptome and proteome profiling identified differentially expressed genes correlated with growth rate. Collectively, 21 genetic targets with differential expression at both the mRNA and protein levels were identified. In order to validate findings, researchers selected five targets for siRNA knockdown, including valosin-containing protein (VCP). siRNA knockdown of VCP resulted in more than 40% reduced cell viability by day 3, and transient over-expression resulted in 1.2–2.1-fold improved cell growth by day 5. Transcriptome profiling was used to compare suspension-adapted CHO cells grown in fresh and glucose-depleted chemically defined media (Druz et al., 2011). Glucose-depleted media resulted in the onset of apoptosis, reduced viability and elevated caspase-3/7 activity. Microarray technology identified 70 miRNAs differentially expressed between the two conditions with mmu-miR-466h being highly upregulated in the nutrient-depleted media condition. Transfecting cells with anti-miR-466h led to approximately 15% higher cell viability, and decreased caspase-3/7 activity. Work by the University of Stuttgart used stable miRNA expression to improve productivity of an IgG-expressing CHO cell line (Strotbek et al., 2013).
Researchers first expressed the entire human mimic miRNA library using transient transfection in the CHO-IgG line. Next, using a genome-wide functional screen, they identified 15 miRNAs which decreased antibody production, and 9 miRNAs which increased antibody production. These hits, including hsa-miR-557, 662, and 1287, were validated using stable transfection and demonstrated to improve mAb titer by as much as 1.3-fold after 7 days in culture. These examples demonstrate how ‘Omics data can identify critical genetic targets, and modifications to those targets can result in improved cellular phenotypes. Currently, there are only a handful of studies demonstrating rational bioprocess improvement using ‘Omics technology. Nonetheless, cases continue to emerge and demonstrate that a rational approach has both time and resource saving benefits. As previously discussed, there are examples where ‘Omics datasets have been generated linking CHO cell lines with phenotypic characteristics. These datasets are an excellent source for potential biomarkers and genetic targets, and follow-up studies should be conducted. Additionally, the field would benefit from expanding to include bioprocess parameters, in addition to media formulation and genetic modifications. For example, cellular response to parameters including temperature, pH, gas composition, gas flow rate, and feed strategy could be studied. In doing so, it could be possible to identify biological fingerprints indicating ideal process parameters. With exponentially more information at our fingertips and

sophisticated tools for analysis available, we expect to see rapid growth in this area.
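The improvements reported in this section are quoted either as percent increases or as fold changes, which are easy to conflate when comparing studies. As a quick arithmetic check using only numbers given above (the helper names are illustrative, not from any cited work):

```python
def percent_increase(before, after):
    """Percent change from a baseline value to a new value."""
    return (after - before) / before * 100.0


def fold_to_percent(fold):
    """Convert a fold change (e.g. 1.3-fold) to the equivalent percent increase."""
    return (fold - 1.0) * 100.0


def percent_to_fold(percent):
    """Convert a percent increase (e.g. 75%) to the equivalent fold change."""
    return 1.0 + percent / 100.0


# Lipid-supplemented medium: titer rose from 3.18 g/L to 3.83 g/L (Schaub et al., 2010).
lipid_titer_gain = percent_increase(3.18, 3.83)

# Stable miRNA expression: up to 1.3-fold titer (Strotbek et al., 2013).
mirna_titer_gain = fold_to_percent(1.3)

# Feed strategy: 75% higher antibody titer (Sellick et al., 2011).
feed_titer_fold = percent_to_fold(75.0)
```

The first value works out to roughly 20.4%, consistent with the 20% increase quoted for the lipid-supplemented process.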

Future Directions of ‘Omics Technologies in Bioprocess Sciences

‘Omics technologies have been widely incorporated into bioprocess development to date. Although the majority of the bioprocess industry still relies on empirical development techniques, it is clear that this is a rapidly evolving landscape. Many of the studies referenced here were sponsored or conducted by major biopharmaceutical and biotechnology companies, including Bristol-Myers Squibb, Genentech, Pfizer, Life Technologies, Bayer, Boehringer Ingelheim, and Biogen Idec. This is indicative of company resources being allocated toward ‘Omics technologies. Furthermore, we see the continual emergence of improved and new platforms for sequencing, microarray, proteome and metabolite work. This is likely driven in part by a need within the bioprocess industry for more efficient and cost-effective data capture. Pressure on the industry to reduce costs and speed product to market is likely to continue, and will drive a transition from lengthy, empirical experimental designs toward high-throughput, rational designs.

A Framework for Integrating ‘Omics Technologies, Systems Biology, and Bioprocessing

‘Omics technologies and systems biology discussed here are powerful tools that can be used to advance bioprocessing science. In order to do so, these approaches must be incorporated into the bioprocessing workflow. Based on the current state of the field, we propose an improved workflow defined in Figure 1. The first step is defining a system of study, including the cell line, process conditions and controlled variables of interest. Selecting a system that is too complex can make data analysis difficult. This can be avoided by using a well-studied system and appropriate controls. Next, appropriate methods to study the system are selected. Many of these methods, including transcriptomics, proteomics and metabolomics, have been discussed in great detail. Data are then collected and analyzed.
Analysis is largely aided by databases and bioinformatic tools that have been specifically designed to handle ‘Omics datasets. Separately, the data are interpreted in the context of what is known about the system. This interpretation is greatly facilitated by what is already known about the system itself, or related systems including parental cell lines, other industrial cell lines, and processes that may be similar. However, this step is the primary bottleneck in the workflow. Based on this interpretation, rational improvements can be applied to the system or process. After modification, the process can be repeated on the new system if desired. Improvements to the system are driven by the larger goal of achieving full control of the cell line and process. Each of the sections discussed in this review supports advancement of our ability to interpret ‘Omics data in a meaningful way. First, research with parental cell lines provides a common reference state for all data analysis and multiple platforms. Second, industrially relevant ‘Omics datasets can help to distinguish between causation and correlation when interpreting hundreds of changes in cell expression. Next, studies that use ‘Omics to probe specific cellular phenotypes can be mined to identify biomarkers


and targets linked to those phenotypes. In addition, ‘Omics data tools enable rapid and statistically meaningful analysis across multiple, large datasets. Systems level CHO models improve our accuracy and understanding of cellular pathways, and once sufficiently mature can facilitate in silico experiments including metabolic flux balance analysis. Finally, rational bioprocess improvements illustrate cases where ‘Omics technologies and data interpretation have resulted in successful identification of relevant targets. As an industry, we need to get faster and more successful at identifying those correct targets that enable control of a cell or process phenotype. When ‘Omics technologies were first developed, utility in a bioprocess environment was largely limited by the availability of robust methods. However, that is no longer the case today, as the technology has advanced and continues to advance rapidly, resulting in precise and reliable methods for measuring cellular profiles (Datta et al., 2013; Dietmair et al., 2012b; Kildegaard et al., 2013; Wuest et al., 2012). We feel the primary limitation now facing bioprocessing is no longer experimental methods but the ability to meaningfully interpret the data we generate. This limitation is evident when reviewing the publicly available research. Many studies have successfully used ‘Omics technologies to identify genes, proteins, and metabolites significantly altered under conditions of interest. However, very few studies have demonstrated successful identification and modification of such targets leading to an improved cell process. Ultimately, these improvements are the justification for ‘Omics resources in bioprocess development.

Limitations in ‘Omics Technologies and Bioprocessing

While we expect the incorporation of ‘Omics technologies in bioprocess development to continue, there are some critical limitations that must be addressed before technology is widely adopted.
While CHO is the most popular industrial platform for therapeutic protein production, genomic resources are limited for this species (Jayapal, 2007), causing many to use related mammalian tools (rat, mouse, and human) as a substitute (Druz et al., 2011). Although this can be a useful approximation, the most robust and reliable approach will require CHO-specific tools and datasets. This work is currently being addressed in both the public and private domains; however, additional support from federal funding sources would be transformative. Additionally, there is a need for better computational tools to facilitate the large volumes of data now being generated. This is especially the case for proteome and metabolite data, as many features are not present in databases, requiring additional de novo characterization (Patti et al., 2012). Furthermore, there is a need for automated metabolite-to-feature mapping. Currently, one must manually inspect retention times of MS/MS data and compare this to database entries to identify the correct metabolite. While each of the ‘Omics technologies has developed individually, much less effort has been placed on simultaneous advancement of these techniques, and connecting information contained in a cell’s genome, transcriptome, proteome, and metabolome. Understanding these molecular connections requires a systems biology approach and would be greatly facilitated by stoichiometric models. Such models have been successfully


developed for many other cell types. While a rudimentary model exists for CHO (Selvarasu et al., 2010, 2012), the number of enzymatic reactions it covers is minimal and should be expanded. Furthermore, in silico cellular models would enable the use of sophisticated software tools that have already been developed and tested for many microbial organisms (Zamboni et al., 2008). While many studies to date have focused on one or two ‘Omics technologies, future studies would ideally utilize three to four technologies under the same conditions. This would result in a better understanding of how process conditions impact cellular states at all levels.

Another area in need of greater understanding is separating cause and effect in response to cellular perturbations. Development of a bioprocess typically follows a series of controlled changes, including media formulation, temperature, pH ranges, and gassing, which alter cellular phenotype. These controlled changes to the process result in complex shifts in the ‘Omics profiles. Our ability to measure and quantify these shifts is established; however, we lack the ability to separate the changes causing the phenotypic shift from noise. This challenge is illustrated in a recent study by Genentech looking at a shift from lactate production to consumption. The transcriptome profile showed significant changes in gene expression, but was unable to explain the change in lactate metabolism (Yuk et al., 2014). This is particularly challenging because many phenotypes, including metabolite consumption, growth rates, and productivity, are controlled by many cellular regulators and are not the result of a single expression change. One approach to this problem may be to improve our understanding of signal-to-noise in ‘Omics datasets. This could be achieved by developing robust datasets using parental cell lines and baseline process conditions.
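One way to make the signal-to-noise idea above concrete is sketched below. It assumes that replicate baseline cultures (for example, a parental cell line under standard process conditions) define the noise of each feature, and it scores each feature in a perturbed culture by how many baseline standard deviations it has moved. The feature names, the values, and the threshold of 3 are hypothetical placeholders chosen for illustration; they are not data or methods from any study cited here.

```python
from statistics import mean, stdev


def baseline_z_scores(baseline, perturbed):
    """Score each feature by its shift relative to baseline replicate noise.

    baseline:  dict mapping feature name -> list of replicate measurements
    perturbed: dict mapping feature name -> single measurement after a change
    """
    scores = {}
    for feature, replicates in baseline.items():
        if feature not in perturbed or len(replicates) < 2:
            continue  # at least two replicates are needed to estimate noise
        noise = stdev(replicates)
        if noise == 0:
            continue  # identical replicates give no usable noise estimate
        scores[feature] = (perturbed[feature] - mean(replicates)) / noise
    return scores


def flag_features(scores, threshold=3.0):
    """Return features whose shift exceeds the (assumed) noise threshold."""
    return sorted(f for f, z in scores.items() if abs(z) >= threshold)


# Hypothetical log2 expression values, for illustration only.
baseline = {
    "gene_A": [10.0, 10.2, 9.8],
    "gene_B": [5.0, 7.0, 6.0],
    "gene_C": [8.0, 8.1, 7.9],
}
perturbed = {"gene_A": 11.0, "gene_B": 7.0, "gene_C": 8.05}

scores = baseline_z_scores(baseline, perturbed)
flagged = flag_features(scores)
print(flagged)
```

In this toy example gene_B changes by as much as gene_A in absolute terms, but only gene_A stands out once baseline variability is taken into account. A real analysis would need many more replicates and a correction for multiple testing.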
Furthermore, our understanding of cause and effect will be improved by a systems-level understanding of how eukaryotic cell parts (genes, proteins, and metabolites) interact and regulate each other.

Finally, more work needs to be done to identify meaningful links between ‘Omics datasets and desired phenotypic traits. Large datasets can be used to identify universal biomarkers for traits including growth rate, specific productivity, and lactate consumption (Kang et al., 2014). Validation of these biomarkers is a critical and often neglected step. Once validated, these biomarkers can be routinely monitored throughout clone selection and process optimization. Alternatively, biomarkers may be cell line dependent, in which case high-throughput ‘Omics strategies need to be developed in order to incorporate this technology into the rapid timelines associated with industrial bioprocesses.

The landscape of protein therapeutics is constantly evolving, with staggering growth and improvement since the release of tPA in 1987. The pharmaceutical industry is shifting toward more protein-based therapies to tackle increasingly complex diseases. With this shift come more biological molecules in the pipeline, resulting in pressure for fast and robust process development. While empirical approaches to process development have been highly successful in the past, they are both time- and resource-intensive. Incorporating intracellular, biophysical characterization through ‘Omics technologies into process development could enable a more rational and efficient methodology. Gains made over the last 5 years show clear progress toward this goal.

Author Contributions
A. Lewis and N. Abu-Absi conceived idea. A. Lewis wrote the manuscript. All authors contributed to editing and finalizing manuscript.

References
Ahn WS, Antoniewicz MR. 2013. Parallel labeling experiments with [1,2-13C]glucose and [U-13C]glutamine provide new insights into CHO cell metabolism. Metab Eng 15:34–47.
Arakawa K, Tomita M. 2013. Merging multiple omics datasets in silico: Statistical analyses and data interpretation. Methods Mol Biol 985:459–470.
Aranibar N, Borys M, Mackin N, Ly V, Abu-Absi N, Abu-Absi S, Niemitz M, Schilling B, Li Z, Brock B, Russell R, Tymiak II, Reily A. 2011. NMR-based metabolomics of mammalian cell and tissue cultures. J Biomol NMR 49:195–206.
Baycin-Hizal D, Tabb DL, Chaerkady R, Chen L, Lewis NE, Nagarajan H, Sarkaria V, Kumar A, Wolozny D, Colao J, Jacobson E, Tian Y, O’Meally RN, Krag SS, Cole RN, Palsson BO, Zhang H, Betenbaugh M. 2012. Proteomic analysis of Chinese hamster ovary cells. J Proteome Res 11:5265–5276.
Becker J, Hackl M, Rupp O, Jakobi T, Schneider J, Szczepanowski R, Bekel T, Borth N, Goesmann A, Grillari J, Kaltschmidt C, Noll T, Pühler A, Tauch A, Brinkrolf K. 2011. Unraveling the Chinese hamster ovary cell line transcriptome by next-generation sequencing. J Biotechnol 156:227–235.
Bort JAH, Hackl M, Höflmayer H, Jadhav V, Harreither E, Kumar N, Ernst W, Grillari J, Borth N. 2012. Dynamic mRNA and miRNA profiling of CHO-K1 suspension cell cultures. Biotechnol J 7:500–515.
Bradley SA, Ouyang A, Purdie J, Smitka TA, Wang T, Kaerner A. 2010. Fermentanomics: Monitoring mammalian cell cultures with NMR spectroscopy. J Am Chem Soc 132:9531–9533.
Carlage T, Hincapie M, Zang L, Lyubarskaya Y, Madden H, Mhatre R, Hancock WS. 2009. Proteomic profiling of a high-producing Chinese hamster ovary cell culture. Anal Chem 81:7357–7362.
Casci T. 2012. Bioinformatics: Next-generation omics. Nat Rev Genet 13:378–379.
Chong WP, Thng SH, Hiu AP, Lee DY, Chan EC, Ho YS. 2012.
LC-MS-based metabolic characterization of high monoclonal antibody-producing Chinese hamster ovary cells. Biotechnol Bioeng 109:3103–3111.
Chrysanthopoulos PK, Goudar CT, Klapa MI. 2010. Metabolomics for high-resolution monitoring of the cellular physiological state in cell culture engineering. Metab Eng 12:212–222.
Clarke C, Doolan P, Barron N, Meleady P, O’Sullivan F, Gammell P, Melville M, Leonard M, Clynes M. 2011a. Large scale microarray profiling and coexpression network analysis of CHO cells identifies transcriptional modules associated with growth and productivity. J Biotechnol 155:350–359.
Clarke C, Doolan P, Barron N, Meleady P, O’Sullivan F, Gammell P, Melville M, Leonard M, Clynes M. 2011b. Predicting cell-specific productivity from CHO gene expression. J Biotechnol 151:159–165.
Clarke C, Doolan P, Barron N, Meleady P, Madden SF, DiNino D, Leonard M, Clynes M. 2012. CGCDB: A web-based resource for the investigation of gene coexpression in CHO cell culture. Biotechnol Bioeng 109:1368–1370.
Courtes FC, Lin J, Lim HL, Ng SW, Wong NS, Koh G, Vardy L, Yap MG, Loo B, Lee DY. 2013. Translatome analysis of CHO cells to identify key growth genes. J Biotechnol 167:215–224.
Datta P, Linhardt RJ, Sharfstein ST. 2013. An ’omics approach towards CHO cell engineering. Biotechnol Bioeng 110:1255–1271.
de Jong B, Siewers V, Nielsen J. 2012. Systems biology of yeast: Enabling technology for development of cell factories for production of advanced biofuels. Curr Opin Biotechnol 23:624–630.
Dietmair S, Hodson MP, Quek LE, Timmins NE, Chrysanthopoulos P, Jacob SS, Gray P, Nielsen LK. 2012a. Metabolite profiling of CHO cells with different growth characteristics. Biotechnol Bioeng 109:1404–1414.
Dietmair S, Nielsen LK, Timmins NE. 2012b. Mammalian cells as biopharmaceutical production hosts in the age of omics. Biotechnol J 7:75–89.
Doolan P, Meleady P, Barron N, Henry M, Gallagher R, Gammell P, Melville M, Sinacore M, McCarthy K, Leonard M, Charlebois T, Clynes M. 2010.
Microarray and proteomics expression profiling identifies several candidates, including the valosin-containing protein (VCP), involved in regulating high cellular growth rate in production CHO cell lines. Biotechnol Bioeng 106:42–56.
Doolan P, Clarke C, Kinsella P, Breen L, Meleady P, Leonard M, Zhang L, Clynes M, Aherne ST, Barron N. 2013. Transcriptomic analysis of clonal growth rate variation during CHO cell line development. J Biotechnol 166:105–113.
Dorai H, Liu S, Yao X, Wang Y, Tekindemir U, Lewis MJ, Wu SL, Hancock W. 2013. Proteomic analysis of bioreactor cultures of an antibody expressing CHOGS cell line that promotes high productivity. J Proteomics Bioinform 6:99–108.
Druz A, Chu C, Majors B, Santuary R, Betenbaugh M, Shiloach J. 2011. A novel microRNA mmu-miR-466h affects apoptosis regulation in mammalian cells. Biotechnol Bioeng 108:1651–1661.
Duarte TM, Carinhas N, Barreiro LC, Carrondo MJT, Alves PM, Teixeira AP. 2014. Metabolic responses of CHO cells to limitation of key amino acids. Biotechnol Bioeng 13:378–379.
Garcia-Alcalde F, Garcia-Lopez F, Dopazo J, Conesa A. 2011. Paintomics: A web based tool for the joint visualization of transcriptomics and metabolomics data. Bioinformatics 27:137–139.
Ghorbaniaghdam A, Chen J, Henry O, Jolicoeur M. 2014. Analyzing clonal variation of monoclonal antibody-producing CHO cell lines using an in silico metabolomic platform. PLoS ONE 9(8):e104725.
Hacker DL. 2008. Recombinant protein production yields from mammalian cells: Past, present, and future. BioPharm Int.
Hammond S, Swanberg JC, Kaplarevic M, Lee KH. 2011. Genomic sequencing and analysis of a Chinese hamster ovary cell line using Illumina sequencing technology. BMC Genomics 12:67.
Hammond S, Kaplarevic M, Borth N, Betenbaugh MJ, Lee KH. 2012. Chinese hamster genome database: An online resource for the CHO community at www.CHOgenome.org. Biotechnol Bioeng 109:1353–1356.
Jayapal KP. 2007. Recombinant protein therapeutics from CHO cells—20 years and counting. Chem Eng Progress 103:40–47.
Kang S, Ren D, Xiao G, Daris K, Buck L, Enyenihi AA, Zubarev R, Bondarenko PV, Deshpande R. 2014. Cell line profiling to improve monoclonal antibody production. Biotechnol Bioeng 111:748–760.
Kantardjieff A, Jacob NM, Yee JC, Epstein E, Kok YJ, Philp R, Betenbaugh M, Hu WS. 2010. Transcriptome and proteome analysis of Chinese hamster ovary cells under low temperature and butyrate treatment. J Biotechnol 145:143–159.
Kaufman RJ, Wasley LC, Spiliotes AJ, Gossels SD, Latt SA, Larsen GR, Kay RM. 1985. Coamplification and coexpression of human tissue-type plasminogen activator and murine dihydrofolate reductase sequences in Chinese hamster ovary cells. Mol Cell Biol 5:1750–1759.
Kildegaard HF, Baycin-Hizal D, Lewis NE, Betenbaugh MJ. 2013. The emerging CHO systems biology era: Harnessing the ’omics revolution for biotechnology. Curr Opin Biotechnol 24(6):1102–1107.
Kim J-S, Kim S-J, Park H-W, Youn J-P, An Y, Cho H, Hwang S. 2010. Array2KEGG: Web-based tool of KEGG pathway analysis for gene expression profile. BioChip J 4:134–140.
Kim JY, Kim YG, Lee GM. 2012. CHO cells in biotechnology for production of recombinant proteins: Current state and further potential. Appl Microbiol Biotechnol 93:917–930.
Kotera M, Hirakawa M, Tokimatsu T, Goto S, Kanehisa M. 2012. The KEGG databases and tools facilitating omics analysis: Latest developments involving human diseases and pharmaceuticals. Methods Mol Biol 802:19–39.
LaMarche BL, Crowell KL, Jaitly N, Petyuk VA, Shah AR, Polpitiya AD, Sandoval JD, Kiebel GR, Monroe ME, Callister SJ, Metz TO, Anderson GA, Smith RD. 2013. MultiAlign: A multiple LC-MS analysis tool for targeted omics analysis. BMC Bioinformatics 14:49.
Levy NE, Valente KN, Choe LH, Lee KH, Lenhoff AM. 2014. Identification and characterization of host cell protein product-associated impurities in monoclonal antibody bioprocessing. Biotechnol Bioeng 111:904–912.
Lewis NE, Liu X, Li Y, Nagarajan H, Yerganian G, O’Brien E, Bordbar A, Roth AM, Rosenbloom J, Bian C, Xie M, Chen W, Li N, Baycin-Hizal D, Latif H, Forster J, Betenbaugh MJ, Famili I, Xu X, Wang J, Palsson BO. 2013. Genomic landscapes of Chinese hamster ovary cell lines as revealed by the Cricetulus griseus draft genome. Nat Biotech 31:759–765.
Lim UM, Yap MGS, Lim YP, Goh L-T, Ng SK. 2013. Identification of autocrine growth factors secreted by CHO cells for applications in single-cell cloning media. J Proteome Res 12:3496–3510.

Lohse M, Bolger AM, Nagel A, Fernie AR, Lunn JE, Stitt M, Usadel B. 2012. RobiNA: A user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res 40:W622–W627.
Luo J, Vijayasankaran N, Autsen J, Santuray R, Hudson T, Amanullah A, Li F. 2012. Comparative metabolite analysis to understand lactate metabolism shift in Chinese hamster ovary cell culture process. Biotechnol Bioeng 109:146–156.
Ma N, Ellet J, Okediadi C, Hermes P, McCormick E, Casnocha S. 2009. A single nutrient feed supports both chemically defined NS0 and CHO fed-batch processes: Improved productivity and lactate metabolism. Biotechnol Prog 25:1353–1363.
Meleady P, Hoffrogge R, Henry M, Rupp O, Bort JH, Clarke C, Brinkrolf K, Kelly S, Müller B, Doolan P, Hackl M, Beckmann TF, Noll T, Grillari J, Barron N, Pühler A, Clynes M, Borth N. 2012. Utilization and evaluation of CHO-specific sequence databases for mass spectrometry based proteomics. Biotechnol Bioeng 109:1386–1394.
Mohmad-Saberi SE, Hashim YZ, Mel M, Amid A, Ahmad-Raus R, Packeer-Mohamed V. 2013. Metabolomics profiling of extracellular metabolites in CHO-K1 cells cultured in different types of growth media. Cytotechnology 65:577–586.
Mueller D, Heinzle E. 2013. Stable isotope-assisted metabolomics to detect metabolic flux changes in mammalian cell cultures. Curr Opin Biotechnol 24:54–59.
Nicolae A, Wahrheit J, Bahnemann J, Zeng AP, Heinzle E. 2014. Non-stationary 13C metabolic flux analysis of Chinese hamster ovary cells in batch culture using extracellular labeling highlights metabolic reversibility and compartmentation. BMC Systems Biol 8:50. doi:10.1186/1752-0509-8-50
North SJ, Huang HH, Sundaram S, Jang-Lee J, Etienne AT, Trollope A, Chalabi S, Dell A, Stanley P, Haslam SM. 2010. Glycomics profiling of Chinese hamster ovary cell glycosylation mutants reveals N-glycans of a novel size and complexity. J Biol Chem 285:5759–5775.
Pascoe DE, Arnott D, Papoutsakis ET, Miller WM, Andersen DC. 2007.
Proteome analysis of antibody-producing CHO cell lines with different metabolic profiles. Biotechnol Bioeng 98:391–410.
Patti GJ, Yanes O, Siuzdak G. 2012. Innovation: Metabolomics: The apogee of the omics trilogy. Nat Rev Mol Cell Biol 13:263–269.
Pilobello KT, Slawek DE, Mahal LK. 2007. A ratiometric lectin microarray approach to analysis of the dynamic mammalian glycome. Proc Natl Acad Sci USA 104:11534–11539.
Rupp O, Becker J, Brinkrolf K, Timmermann C, Borth N, Pühler A, Noll T, Goesmann A. 2014. Construction of a public CHO cell line transcript database using versatile bioinformatics analysis pipelines. PLoS ONE 9:e85568.
Salvioli A, Bonfante P. 2013. Systems biology and “omics” tools: A cooperation for next-generation mycorrhizal studies. Plant Sci 203-204:107–114.
Schaub J, Clemens C, Schorn P, Hildebrandt T, Rust W, Mennerich D, Kaufmann H, Schulz TW. 2010. CHO gene expression profiling in biopharmaceutical process analysis and design. Biotechnol Bioeng 105:431–438.
Schaub J, Clemens C, Kaufmann H, Schulz TW. 2012. Advancing biopharmaceutical process development by system-level data analysis and integration of omics data. Adv Biochem Eng Biotechnol 127:133–163.
Sellick CA, Croxford AS, Maqsood AR, Stephens G, Westerhoff HV, Goodacre R, Dickson AJ. 2011. Metabolite profiling of recombinant CHO cells: Designing tailored feeding regimes that enhance recombinant antibody production. Biotechnol Bioeng 108:3025–3031.
Selvarasu S, Karimi IA, Ghim GH, Lee DY. 2010. Genome-scale modeling and in silico analysis of mouse cell metabolic network. Mol Biosyst 6:152–161.

Selvarasu S, Ho YS, Chong WP, Wong NS, Yusufi FN, Lee YY, Yap MG, Lee DY. 2012. Combined in silico modeling and metabolomics analysis to characterize fed-batch CHO cell culture. Biotechnol Bioeng 109:1415–1429.
Slade PG, Hajivandi M, Bartel CM, Gorfien SF. 2012. Identifying the CHO secretome using mucin-type O-linked glycosylation and click-chemistry. J Proteome Res 11:6175–6186.
Strotbek M, Florin L, Koenitzer J, Tolstrup A, Kaufmann H, Hausser A, Olayioye MA. 2013. Stable microRNA expression enhances therapeutic antibody productivity of Chinese hamster ovary cells. Metab Eng 20:157–166.
Taverner T, Karpievitch YV, Polpitiya AD, Brown JN, Dabney AR, Anderson GA, Smith RD. 2012. DanteR: An extensible R-based tool for quantitative analysis of -omics data. Bioinformatics 28:2404–2406.
Templeton N, Dean J, Reddy P, Young JD. 2013. Peak antibody production is associated with increased oxidative metabolism in an industrially relevant fed-batch CHO cell culture. Biotechnol Bioeng 110:2013–2024.
Tep S, Hincapie M, Hancock WS. 2012. The characterization and quantitation of glycomic changes in CHO cells during a bioreactor campaign. Biotechnol Bioeng 109:3007–3017.
Valente KN, Lenhoff AM, Lee KH. 2014a. Expression of difficult-to-remove host cell protein impurities during extended Chinese hamster ovary cell culture and their impact on continuous bioprocessing. Biotechnol Bioeng 112(6):1232–1242.
Valente KN, Schaefer AK, Kempton HR, Lenhoff AM, Lee KH. 2014b. Recovery of Chinese hamster ovary host cell proteins for proteomic analysis. Biotechnol J 9:87–99.
Vucic EA, Thu KL, Robison K, Rybaczyk LA, Chari R, Alvarez CE, Lam WL. 2012. Translating cancer ’omics’ to improved outcomes. Genome Res 22:188–195.
Wuest DM, Harcum SW, Lee KH. 2012. Genomics in mammalian cell culture bioprocessing. Biotechnol Adv 30:629–638.
Xia J, Fjell CD, Mayer ML, Pena OM, Wishart DS, Hancock RE. 2013. INMEX—a web-based tool for integrative meta-analysis of expression data.
Nucleic Acids Res 41:W63–W70.
Xu X, Nagarajan H, Lewis NE, Pan S, Cai Z, Liu X, Chen W, Xie M, Wang W, Hammond S, Andersen MR, Neff N, Passarelli B, Koh W, Fan HC, Wang J, Gui Y, Lee KH, Betenbaugh MJ, Quake SR, Famili I, Palsson BO. 2011. The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell line. Nat Biotechnol 29:735–741.
Young JD. 2013. Metabolic flux rewiring in mammalian cell cultures. Curr Opin Biotechnol 24:1108–1115.
Yuk IH, Zhang JD, Ebeling M, Berrera M, Gomez N, Werz S, Meiringer C, Shao Z, Swanberg JC, Lee KH, Luo J, Szperalski B. 2014. Effects of copper on CHO cells: Insights from gene expression analyses. Biotechnol Prog 30:429–442.
Zamboni N, Kummel A, Heinemann M. 2008. AnNET: A tool for network-embedded thermodynamic analysis of quantitative metabolome data. BMC Bioinformatics 9:199.
Zang L, Frenkel R, Simeone J, Lanan M, Byers M, Lyubarskaya Y. 2011. Metabolomics profiling of cell culture media leading to the identification of riboflavin photosensitized degradation of tryptophan causing slow growth in cell culture. Anal Chem 83:5422–5430.
Zhu J, Sova P, Xu Q, Dombek KM, Xu EY, Vu H, Tu Z, Brem RB, Bumgarner RE, Schadt EE. 2012. Stitching together multiple data dimensions reveals interacting metabolomic and transcriptomic networks that modulate cell regulation. PLoS Biol 10:e1001301.
