TRANSACTIONS OF THE AMERICAN CLINICAL AND CLIMATOLOGICAL ASSOCIATION, VOL. 125, 2014

“REVERSE GENOMICS” AND HUMAN ENDOGENOUS RETROVIRUSES

DAVID M. MARKOVITZ, MD
ANN ARBOR, MICHIGAN

ABSTRACT

Over millions of years, actively replicating retroviruses entered the human genome and through time became a stable and substantial part of the inherited genetic material. A remarkable 8% of the human genome is accounted for by endogenous retroviruses, whose biological importance has not yet been elucidated. In studying the RNA of these endogenous retroviruses in the blood of living human subjects with HIV infection, we have discovered a whole new family of these viruses that had been hidden in the centromeres of specific human chromosomes. These retroviruses have specific sequences that can elucidate their chromosome of origin. As centromeres represent the most substantial remaining frontier of human genomics, these viral sequences can provide a “bar-code” that can be used to study the role of centromeres in biology and in disease. This work also highlights the efficacy of using “reverse genomics” to understand and annotate the human genome.

HUMAN ENDOGENOUS RETROVIRUSES

The 98% of the human genome that does not encode cellular proteins has sometimes been called “junk DNA.” However, this point of view has been increasingly challenged, particularly with the recent understanding that much of this DNA encodes regulatory elements (1). Intriguingly, fully 8% of the human genome is actually made up of old retroviruses, which are referred to as human endogenous retroviruses (HERVs). HERVs came about when actively replicating retroviruses entered into the human genome over the course of millions of years and became a stable part of our inherited genetic material (2, 3). These viruses subsequently acquired multiple mutations, leading to the widely held assumption that they are no longer competent to replicate.

Correspondence and reprint requests: David M. Markovitz, MD, Department of Internal Medicine, Division of Infectious Diseases, University of Michigan Medical Center, 5220 MSRB III, 1150 West Medical Center Drive, Ann Arbor, MI 48109-0640, Tel: 734-647-1786, Fax: 734-764-0101, E-mail: [email protected].
Potential Conflicts of Interest: None disclosed.


However, in studying living patients rather than the standard cell lines, our research group has recently discovered surprising evidence suggesting that, in certain patients with HIV infection or cancer, HERVs might still be capable of replication (or at least passage) in modern humans (4, 5). In this work, we have examined HERV-K (HML-2), an endogenous retrovirus that is a relatively recent entrant into the human genome and is highly transcribed. Although investigations of the replication competency of HERV-K HML-2 are not the subject of the present paper, it is these studies in human patients that led to the findings described below.

DISCOVERY OF A NEW FAMILY OF HERVs

Recent studies in our laboratory have revealed that the RNA from HERV-K HML-2 viruses found in the blood of patients with HIV or lymphoma shows evidence of recombination (4, 5). This, along with other factors detailed elsewhere (4, 5), prompted us to look for fully replication-competent HERV-K in the blood of human patients with HIV. In doing so, we made the striking discovery of a new family of HERV-K HML-2 viruses, termed K111 (6). Based on the sequence of this virus, we can clearly state that it is not replication competent.

When we first identified K111 in the blood of patients with HIV, it was unclear where the virus came from, as it was not present in the annotated version of the human genome. However, one copy of a K111-like virus was found in the arm of chromosome 7 in the newly available chimpanzee genome sequence. Using this information, we were able to show that K111 is totally lacking in more distantly related primates, such as tamarins, marmosets, African green monkeys, rhesus monkeys, crab-eating macaques, olive baboons, yellow baboons, orangutans, and gorillas. K111 is present in one copy in chimpanzees, in approximately five copies in our ancient cousins the Neanderthals and Denisovans, and likely in hundreds of copies in modern human beings (Figure 1).
Most interestingly, K111 RNA sequences are found essentially only in the blood of HIV patients, although K111 viral DNA sequences are found in the genomes of almost all human beings studied in our laboratory. We discovered that at least one of the reasons why K111 is present at such high titers in the blood of patients with HIV is that the Tat protein of HIV, which is known to activate the transcription of cellular genes as well as of HIV itself, activates the expression of K111 (6). This stimulation likely has several causes. First, we have recently shown that Tat can activate HERV-K via a mechanism involving upstream transcription factors (7). Second, K111 is located in centromeres, which are areas of very compact chromatin, and we showed that Tat is able to open up the chromatin over K111, thus allowing transcription of the virus to proceed (6).

FIG. 1. Phylogeny of New World monkeys, Old World monkeys, and hominoids (humans and apes). Estimated times of divergence are shown (MYA, million years ago).

K111 AS A CENTROMERIC VIRUS

In contrast to the K111-like virus in chimps, the sequences surrounding human K111 indicated that it is found in centromeres. This was confirmed using chromatin immunoprecipitation assays. Further confirmation of the existence of K111 in the human genome was obtained through deep-sequencing efforts. Intriguingly, K111 is found in 15 different human centromeres. Further, K111 appears to have spread both within centromeres and from centromere to centromere in a process resembling homologous recombination. Each centromere contains a virus with a specific sequence, and some centromeres contain many K111s whereas others contain only one (6). Thus, an entire family of HERVs that was not found in the annotated version of the human genome had spread from centromere to centromere.

BIOLOGICAL/CLINICAL IMPLICATIONS OF K111

At this time, the direct biological effects of K111 are unknown. However, several interesting possibilities exist that are currently under exploration in our laboratory. First, although K111 is not a replication-competent virus, and most of its proteins have been inactivated through mutagenesis, K111 does encode one potentially important protein, Np9. Np9 is an accessory protein of HERV-Ks that is a putative oncogene (8, 9). Therefore, expression of Np9 encoded by the numerous copies of K111 may have importance in the development of cancer. Indeed, HERV expression has been linked to a number of malignancies, as well as to autoimmunity (10–14). In the case of K111, and thus Np9, this link might be predicted to be strongest in malignancies like lymphoma that are found much more frequently in patients who have HIV infection (5). In addition, as K111 has expanded as evolution has proceeded, the possibility exists that K111, and Np9 in particular, may play a role in the evolutionary development of primates. Specifically, as Np9 stimulates both the Notch and Wnt signaling pathways (15, 16), we suggest that Np9 encoded by K111 might play a role in brain development. Finally, K111 might encode interfering RNAs that could have an effect on neighboring genes. In particular, as K111 is activated by HIV expression (6), it will be of importance to understand whether K111 can affect HIV replication or pathogenesis.

“REVERSE GENOMICS”: A NEW MECHANISM FOR GENE DISCOVERY?

Although other groups have investigated the possibility of gene discovery through RNA analysis in cell lines, it is particularly intriguing that we were able to discover a whole new family of endogenous retroviruses that were not present in the annotated version of the human genome through the study of blood from human patients (6). This study therefore suggests that the blood of both patients and normal subjects may be a very fruitful place to look for expression of genes or other genomic elements that are not present in the annotated human genome.
This approach is particularly intriguing for the largely unmapped areas of the genome such as centromeres, but could also yield information from other segments of the genome. However, further studies will be necessary to understand whether “reverse genomics” only works in isolated cases, such as the one presented here, or whether it can serve as a broad mechanism for discovery.

K111: A “BAR-CODE” FOR HUMAN CENTROMERES?

The centromere is the largest remaining frontier of human genomics (17). It has traditionally been very hard to annotate the DNA sequences of the centromere, as the highly repetitive nature of the DNA found there has hindered the ability to align the sequences in the appropriate order. As we have now discovered that K111s are distinctive within given centromeres, these sequences and the surrounding sequences may provide a “bar-coding” mechanism that might assist in ultimately obtaining sufficient information to sequence and annotate the human centromeres. In addition, this type of centromeric bar-coding can be used to study the role of the dynamic centromere, which is crucial to the partitioning of chromosomes (18, 19), in health and disease. Indeed, it has previously been shown that in certain hematologic malignancies there is loss of centromeric material (20). Ongoing experiments in our group are further exploring the behavior and evolution of centromeres in cancer.

ACKNOWLEDGMENTS

The author would like to acknowledge the many collaborators who have made this project possible. In particular, he would like to thank Drs Mark Kaplan, Rafael Contreras-Galindo, Marta Gonzalez-Hernandez, Derek Dube, Scott Gitlin, Fan Meng, and Gil Omenn. Financial support came primarily from a transformative R01 from the Office of the Director of the National Institutes of Health (CA144043). The author would also like to thank Shannon Kenney, Mike Cohen, and Larry Boxer for their support. The author would like to further acknowledge Lee Sabath, Joseph Pagano, and Gary Nabel, a group of terrific and inspiring mentors. Finally, the author would like to thank JMM for his pursuit of veracity, LH for being an academic role model, BLM for ego strength, and RHM for so much.

REFERENCES

1. Lindblad-Toh K, Garber M, Zuk O, et al. Nature 2011;478:476–82. doi: 10.1038/nature10530.
2. Jern P, Coffin JM. Effects of retroviruses on host genome function. Annu Rev Genet 2008;42:709–32. doi: 10.1146/annurev.genet.42.110807.091501.
3. Nelson PN, Carnegie PR, Martin J, et al. Demystified. Human endogenous retroviruses. Mol Pathol 2003;56:11–8.
4. Contreras-Galindo R, Kaplan MH, Leissner P, et al. Human endogenous retrovirus-K (HML-2) elements in the plasma of people with lymphoma and breast cancer. J Virol 2008;82(19):9329–36. PMCID: PMC2546968.
5. Contreras-Galindo R, Kaplan MH, Contreras-Galindo AC, et al. Characterization of human endogenous retroviral elements in the blood of HIV-1-infected individuals. J Virol 2012;86(1):262–76. PMCID: PMC3255917.
6. Contreras-Galindo R, Kaplan MH, He S, et al. HIV infection reveals wide-spread expansion of novel centromeric human endogenous retroviruses. Genome Res 2013;23(9):1505–13. PMCID: PMC3759726.
7. Gonzalez-Hernandez MJ, Swanson MD, Contreras-Galindo R, et al. Expression of human endogenous retrovirus type-K (HML-2) is activated by the Tat protein of HIV-1. J Virol 2012;86(15):7790–805. PMCID: PMC3421662.
8. Armbruester V, Sauter M, Krautkraemer E, et al. A novel gene from the human endogenous retrovirus K expressed in transformed cells. Clin Cancer Res 2002;8(6):1800–7.


9. Magin C, Lower R, Lower J. cORF and RcRE, the Rev/Rex and RRE/RxRE homologues of the human endogenous retrovirus family HTDV/HERV-K. J Virol 1999;73(11):9496–507. PMCID: PMC112984.
10. Bannert N, Kurth R. Retroelements and the human genome: new perspectives on an old relation. Proc Natl Acad Sci U S A 2004;101(suppl 2):14572–9.
11. Lower R, Lower J, Kurth R. The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc Natl Acad Sci U S A 1996;93(11):5177–84.
12. Dolei A. Endogenous retroviruses and human disease. Expert Rev Clin Immunol 2006;2(1):149–67.
13. Wang-Johanning F, Frost AR, Jian B, Epp L, Lu DW, Johanning GL. Quantitation of HERV-K env gene expression and splicing in human breast cancer. Oncogene 2003;22(10):1528–35.
14. Wang-Johanning F, Li M, Esteva F, et al. Human endogenous retrovirus type K antibodies and mRNA as serum biomarkers of early-stage breast cancer. Int J Cancer 2013;134(3):587–95. doi: 10.1002/ijc.28389. Epub 2013 Sep 13.
15. Chen T, Meng Z, Gan Y, et al. The viral oncogene Np9 acts as a critical molecular switch for co-activating beta-catenin, ERK, Akt and Notch1 and promoting the growth of human leukemia stem/progenitor cells. Leukemia 2013;27(7):1469–78.
16. Armbruester V, Sauter M, Roemer K, et al. Np9 protein of human endogenous retrovirus K interacts with ligand of numb protein X. J Virol 2004;78(19):10310–9. PMCID: PMC516385.
17. Hayden KE. Human centromere genomics: now it’s personal. Chromosome Res 2012;20(5):621–33. doi: 10.1007/s10577-012-9295-y.
18. Kalitsis P, Choo KH. The evolutionary life cycle of the resilient centromere. Chromosoma 2012;121(4):327–40.
19. Malik HS. Mimulus finds centromeres in the driver’s seat. Trends Ecol Evol 2005;20(4):151–4.
20. Mackinnon RN, Campbell LI. The role of dicentric chromosome formation and secondary centromere deletion in the evolution of myeloid malignancy. Genet Res Int 2011;2011:643628. PMCID: PMC3335544.

DISCUSSION

Kenney, Madison: That was a great talk. A few years ago, a paper came out in Science saying that the most highly transcribed transcript in cancer was a gamma satellite repeat that’s normally suppressed by heterochromatin. I am wondering how this relates to that finding, and if you think this increased transcription that you are seeing in these retroviruses in cancer is due to their generalized hypomethylation. What’s your explanation for that?

Markovitz, Ann Arbor: So Shannon, several comments on that. I don’t know, specifically, if this is the same as the satellite information. I think you asked me that before, and I should’ve looked into this after you asked me the first time. But it certainly would make a lot of sense though, because obviously, a lot of the satellite DNA comes from the centromeres. And I didn’t mention, but K111 is in probably a few telomeres too, so that’s a very good thought. And there is a very pronounced increase in HERV-K transcripts, especially the HERV-Ks, the youngest, in cancers. And we’ve looked at that and shown that, along with some other groups. The question has been, is it causative, and I’m not sure. I think actually the best disease causation comes in sort of a funny angle, in that what people have shown, and convincingly, is that HERV-K antigens are really good targets for cancer and probably HIV immunotherapy. But the actual causative effect on cancer is a little unclear. There are also a couple of oncogenes in HERV-K, so there is a possibility.

Hochberg, Baltimore: Do anti-centromere antibodies recognize HERV-K?

Markovitz, Ann Arbor: I’m sorry. I wasn’t even able to show it, but in our published papers we’ve shown when we’ve wanted to see whether K111 was really in the centromere. The original data all came from sequences, because the surrounding area was clearly from centromeres, so-called CERs, but we wanted to clarify that it really was in centromeres. Interestingly enough, among the 10,000 questions asked by reviewers, that wasn’t one of them. But you have, and that’s good. We actually did chromatin immunoprecipitation assays using CENPA and CENPB to clarify that it’s really in the centromere, and it is. I didn’t have time to show that but very good question.

Howley, Boston: Your involvement of Tat in the activation suggests the potential role for the bromodomain protein Brd4. I’m wondering if you’ve queried that, and looked at the effect of the BET domain inhibitors on its activation?

Markovitz, Ann Arbor: We have not. That’s a great question, and perhaps I can talk to you more about doing that.

Cohen, Chapel Hill: Thanks, David, for your talk. I am curious about kind of a different aspect of this that’s kind of speculative and that is, could you say a bit more about the archeology of endogenous retroviruses and also the risk that HIV will become an endogenous retrovirus, which has been something we’ve talked about endlessly?

Markovitz, Ann Arbor: So certainly HIV would have the opportunity to become an endogenous retrovirus. As you know obviously, there are things that guard against that, or try to, like APOBEC, which actually is active against HERVs but obviously didn’t work all the time. So certainly HIV could do that. I’m sorry, your second question? Oh, the history of . . . . Oh yes, that’s right. I didn’t mention. These things have come into our germ line over millions and millions of years, with HERV-K as the youngster in our genome, meaning 5 million to 200,000 years. So they are very ancient, and they probably are important. And there is data to suggest, for example, that some of the HERVs are involved. One of the HERVs is involved in syncytiotrophoblast formation too. Did that answer your question?
