ARTICLE Received 9 Jun 2016 | Accepted 28 Dec 2016 | Published 13 Feb 2017

DOI: 10.1038/ncomms14421

OPEN

Recurrently deregulated lncRNAs in hepatocellular carcinoma Yang Yang1,*, Lei Chen2,3,*,**, Jin Gu4,*, Hanshuo Zhang5,*, Jiapei Yuan1, Qiuyu Lian4, Guishuai Lv2, Siqi Wang1, Yang Wu1, Yu-Cheng T. Yang1, Dongfang Wang4, Yang Liu1, Jing Tang6, Guijuan Luo2, Yang Li1, Long Hu1, Xinbao Sun1, Dong Wang1, Mingzhou Guo7, Qiaoran Xi1, Jianzhong Xi5, Hongyang Wang2,3,**, Michael Q. Zhang1,4,8,** & Zhi John Lu1,**

Hepatocellular carcinoma (HCC) cells often invade the portal venous system and subsequently develop into portal vein tumour thrombosis (PVTT). Long noncoding RNAs (lncRNAs) have been associated with HCC, but a comprehensive analysis of their specific association with HCC metastasis has not been conducted. Here, by analysing 60 clinical samples’ RNA-seq data from 20 HCC patients, we have identified and characterized 8,603 candidate lncRNAs. The expression patterns of 917 recurrently deregulated lncRNAs are correlated with clinical data in a TCGA cohort and published liver cancer data. Matched array data from the 60 samples show that copy number variations (CNVs) and alterations in DNA methylation contribute to the observed recurrent deregulation of 235 lncRNAs. Many recurrently deregulated lncRNAs are enriched in co-expressed clusters of genes related to cell adhesion, immune response and metabolic processes. Candidate lncRNAs related to metastasis, such as HAND2-AS1, were further validated using RNAi-based loss-of-function assays. Thus, we provide a valuable resource of functional lncRNAs and biomarkers associated with HCC tumorigenesis and metastasis.

1 MOE

Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China. 2 International Co-operation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Institute, Second Military Medical University, Shanghai 200438, China. 3 National Center for Liver Cancer, Shanghai 201805, China. 4 Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China. 5 Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, China. 6 Department of Neurosurgery, Wuhan General Hospital of Guangzhou Command, Wuhan Hubei 430070, China. 7 Department of Gastroenterology & Hepatology, Chinese PLA General Hospital, #28 Fuxing Road, Beijing 100853, China. 8 Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, 800 West Campbell Road, RL11 Richardson, Texas 75080-3021, USA. * These authors contributed equally to this work. ** These authors jointly supervised this work. Correspondence and requests for materials should be addressed to H.W. (email: [email protected]) or to M.Q.Z. (email: [email protected]) or to Z.J.L. (email: [email protected]). NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

1

ARTICLE

H

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

epatocellular carcinoma (HCC) is one of the most common and aggressive human malignancies1. The dismal clinical outcome of HCC is largely due to the high incidence of intrahepatic and extrahepatic metastasis in HCC patients2. HCC cells are highly likely to develop into portal vein tumour thrombosis (PVTT), which is the main route for intra-hepatic metastasis of HCC (ref. 3). Therefore, PVTT is closely associated with poor prognosis for HCC patients4. Several long noncoding RNAs (lncRNAs), including H19 (ref. 5), HOTAIR (ref. 6) and HULC (ref. 7), are directly involved in tumorigenesis and metastasis of various types of cancer. Recent studies have also revealed the pro-metastasis mechanisms through which some lncRNAs contribute to the activation of epithelial-to-mesenchymal transition networks, including activation of the WNT (ref. 8) and TGF-b signalling pathways9. Although several studies have assessed the contributions of individual lncRNAs to the development of HCC, the functions and mechanisms of only a few lncRNAs in HCC tumorigenesis and metastasis are understood in detail10,11. Moreover, efforts at systematic identification and characterization of candidate lncRNAs involved in HCC, especially those involved in HCC metastasis, remain at an early stage. A recent study based on The Cancer Genome Atlas (TCGA, http://cancergenome.nih.gov/) revealed the existence of more than 50,000 lncRNAs (designated MiTranscriptome lncRNAs) in the human transcriptome (generated from various tumours, normal tissues and cell lines)12, of which more than 80% were not reported in previous studies or found in databases (that is, GENCODE (ref. 13) and Refseq (ref. 14)). This study demonstrates the genomic diversity and expression specificity of lncRNAs, while suggesting that more lncRNAs will be discovered as additional tumour and cell types (for example, metastatic samples) are sequenced. Remarkably, studies suggest that approximately 88% of singlenucleotide polymorphisms (SNPs) in the human genome are within noncoding regions15, suggesting that many noncoding RNAs and DNA regulatory elements (for example, promoters and enhancers) have functional roles. Indeed, some lncRNAs play important roles in diverse cellular process, such as cell differentiation16, cell death and tumorigenesis17. In addition, lncRNAs can be used as biomarkers for cancer diagnosis, prognosis and classification because they have cell-type specificity better than that of most protein-coding genes and relatively stable local secondary structures, facilitating their detection in body fluids18–20. For instance, lncRNA PVT1 has been used as a diagnostic and prognostic biomarker for HCC (ref. 11). Here, 60 matched samples (primary tumour, PVTT and adjacent normal tissue) from 20 Chinese HCC patients were subjected to total RNA-seq (rRNA depleted), followed by integrative analysis at the genomic, transcriptomic and epigenomic levels, with the goal of identifying and characterizing deregulated lncRNAs in HCC patients. Approximately 76% of the lncRNAs identified in the samples were not annotated by the MiTranscriptome12 or GENCODE transcriptome13 databases. Next, approximately 1,000 lncRNAs that were recurrently deregulated in primary tumours and/or PVTTs were identified. Their expression levels were correlated with TCGA clinical data and additional published liver cancer data21. We also showed that one of the recurrently deregulated lncRNAs was suitable as a prognosis and metastasis biomarker in HCC patients. Furthermore, copy number variations (CNVs) and DNA methylation alterations were shown to be responsible for the aberrant expression patterns of 147 and 93 recurrently deregulated lncRNAs, respectively. Finally, a coding-noncoding co-expression network was used to predict candidate lncRNAs 2

related to metastasis, after which the predictions were validated experimentally. Results Identification of candidate lncRNAs in HCC clinical samples. To systematically identify lncRNAs related to HCC tumorigenesis and metastasis, approximately 9.6 billion reads for 60 samples from 20 HCC patients were sequenced using total RNA-seq (rRNA depleted) (Supplementary Data 1). Three matched samples were collected from each patient: primary tumour, adjacent normal tissue and PVTT. We first found that 95.1% of 13,870 lncRNAs (including 23,898 transcripts) annotated by GENCODE (V19) (ref. 13) were detected (FPKM40.5 for single-exon lncRNAs and FPKM40 for multi-exon lncRNAs) in our samples, indicating that our sequencing depth was good. Next, using these GENCODE lncRNAs as a reference annotation, we assembled 8,603 candidate lncRNAs (including 10,196 transcripts) (Supplementary Data 2) with a bioinformatics pipeline (Fig. 1a and Supplementary Table 1): (1) assembling RNA transcripts from RNA-seq reads; (2) filtering potential noise based on genomic location, length and expression level; (3) removing transcripts with coding potential, which was calculated by two computational tools, CPC (ref. 22) and COME (ref. 23) (see details in Methods). These candidate lncRNAs were designated as newly assembled lncRNAs. Only a small number of the newly assembled lncRNAs were reported in other studies. For example, 76% of them were not found in the MiTranscriptome database12, which was mainly derived from TCGA data (Fig. 1b). The newly assembled lncRNAs were also compared with two other representative annotation databases: a high quality set, RefSeq (ref. 14), and a comprehensive set, NONCODE (over 50,000 lncRNA transcripts included)24. Only 2% and 10% of the newly assembled lncRNAs were found in the RefSeq (Release 72) and NONCODE (V4) databases, respectively (Supplementary Fig. 1). These results indicate the depth of our sequencing data and expression specificity of the lncRNAs identified in our samples. We showed that the number of lncRNAs increased when the number of sequenced samples was increased (Supplementary Fig. 2), while the detection ability for protein-coding genes and GENCODE lncRNAs was saturated at approximately 10 and 20 out of 60 samples, respectively. Characterization of the candidate lncRNAs. We characterized genomic location, expression abundance, transcript length, conservation and SNP enrichment for the newly assembled lncRNAs (Fig. 1c–f, Supplementary Data 3 and 4). We first focused on the genomic locations of newly assembled lncRNAs (Supplementary Fig. 3A). The majority (74%) of the lncRNAs were located in intergenic regions; 16% were antisense to proteincoding genes, whereas 3% were located in the introns of proteincoding genes. Moreover, 1.39% and 24.94% of the lncRNAs overlapped with pseudogenes and transposable elements, respectively, whereas 2.24% and 5.06% of the lncRNAs contained local domains conserved with canonical ncRNAs (for example, rRNA, tRNA and snRNA) at the sequence and structure levels, respectively. Similarly, the GENCODE lncRNAs also overlapped with or included these elements (Supplementary Fig. 3b,c). Next, we characterized the basic features of the newly assembled lncRNAs and compared them with protein-coding genes and GENCODE lncRNAs. Because they had fewer putative exons, we found that the newly assembled lncRNA transcripts were shorter than protein-coding genes, but longer than GENCODE lncRNAs (Fig. 1c, Supplementary Fig. 4). This result

NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

ARTICLE

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

a

b Merged transcriptome

Tumour Normal

Mitranscriptome lncRNAs

Annotate by reference annotation PVTT

Newly assembled lncRNAs

Unannotated • Transcript length >200 • With strand information RNA

7,100

Multi-exonic

PE strand-specific total RNA-seq (rRNA removed)

62,200

Mono-exonic • Distance to neighbour annotated genes > 2 kb • FPKM > 0.5 in at least 1 samples 1,503

Alignment to hg19 with Tophat

1,415

• Coding potential prediction

• Assign gene type

Noncoding Ab initio transcript assembly with Cufflinks

Protein-coding genes (20345)

GencodeV19 annotated IncRNAs (13870)

Newly assembled IncRNAs (8603) GWAS SNP

d

0.75 0.50 0.25 0.00

5,000

4,000

3,000

2,000

1,000

0

0.0000

1.4

0.75 0.50

Exon Intron

0.00 –2

0

2

4

log10(max(FPKM))

0.0

0.4

Protein-coding gene

1.0

Ge

Newly assembled lncRNA

0.8

phastCons score

Length (nt) Gencode lncRNA

1.2

0.25

od eI nc RN wly A as Inc sem RN bl Pro e A d tei n-c od ing Wh ge ole ne tra ns cri pto me

0.0005

GWAS enrichment

Ne

0.0010

Conservation 1.00

Cumulative distribution

Cumulative distribution

0.0015 Density

Expression 1.00

0.0020

Random SNP

f

Odds ratio

Transcript length

e

nc

c

Figure 1 | Identification of candidate lncRNAs in 60 HCC samples. (a) Overview of the comprehensive experimental and computational scheme for the systematic identification of lncRNAs in samples from HCC patients. (b) Venn diagram showing the overlap between newly assembled lncRNAs and MiTranscriptome lncRNAs. Characterization of GENCODE lncRNAs, newly assembled lncRNAs and protein-coding genes: (c) transcript length distribution; (d) cumulative distribution curve of maximum gene expression level (RPKM); (e) conservation of exons and introns; (f) enrichment of GWAS SNPs (circle) over randomly selected SNPs (triangle).

indicates that the high sequencing depth of our analysis (Supplementary Data 1) enabled us to assemble long transcripts close to their full length, even though they were expressed at low levels (Fig. 1d, Supplementary Data 4 and 5). Notably, the newly assembled lncRNAs were less evolutionarily conserved in comparison with protein-coding genes and GENCODE lncRNAs, while exonic regions were more conserved in comparison with intronic regions (Fig. 1e). A previous study reported that almost 90% of SNPs are located in non-coding regions15. To investigate the relationship between lncRNAs and diseases, we capitalized on the GWAS SNP Catalog from NHGRI (ref. 25). We intersected the lncRNAs with GWAS SNPs from the NHGRI catalogue and randomly selected SNPs from the dbSNP (ref. 26). Interestingly, we found that GWAS SNPs were significantly enriched in the newly assembled lncRNAs in comparison with a random set (Fig. 1f). These data suggested that our newly assembled lncRNAs were likely to be functionally associated with human diseases. To further validate the activity of the newly assembled lncRNAs, we used published ChIP-seq data for the HepG2 cell line27 to depict activity markers around transcription start sites

(TSSs). Different epigenetic signatures (H3K4me3, H3K27ac, Pol II binding, DNase I hypersensitivity) indicated active transcription of the newly assembled lncRNAs in liver cancer cell lines (Supplementary Fig. 5). Peaks of these markers were found at the TSSs of lncRNAs, suggesting that the promoters of these transcripts are actively regulated in HepG2 cells. Recurrently deregulated lncRNAs in tumours and PVTTs. We used three statistical methods, GFOLD (ref. 28) followed by counting recurrences in multiple patients (Supplementary Figs 6–7), DESeq2 (ref. 29) and Wilcoxon signed-rank test to define lncRNAs (including both GENCODE lncRNAs and newly assembled lncRNAs) that were differentially expressed recurrently (Fig. 2a,b, Supplementary Data 6) (see details in Methods). For the comparison of primary tumours and adjacent normal tissues, we found that the results of DESeq2 (ref. 29) and Wilcoxon signed-rank test were more similar to each other than to the results of GFOLD (Fig. 2a), because the former two methods both treated the patients as replicates, whereas GFOLD

NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

3

ARTICLE

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

a

Downregulated GFOLD+ Recurrence DESeq2 48

156

309

525 15

525

Metastasis

Upregulated GFOLD+ Recurrence DESeq2

3

42

Downregulated GFOLD+ Recurrence DESeq2 1

58

2724

323 0 1619

0

Upregulated GFOLD+ Recurrence DESeq2 0

54

0

0

0 0

0

0

0

133

235

0

0

Wilcoxon

Wilcoxon

Wilcoxon

Wilcoxon

–1 0 1

Patient

–1 0 1

20 15 10 5 0

Downregulation Upregulation

D L 2R INC AS P1 0 1 R 1- 10 P1 7 1 1- 31F 8 16 5 6D .2 19 N .1 C ES H AS T IF C 1A 15 -A S TE 2 X4 XL O M 1 XL C_ EG O 05 3 XL C_ 53 O 01 55 C 4 _0 51 30 5 22 0

No. of patients H

N D C RR R N SN DE H LU G6 C AT PV 1 ZF T1 BC AS TR 1 N T 1 XL C ER O C C XL C_ AT O 05 1 XL C_ 65 O 05 73 C 0 _0 37 66 0 98 1

FE

Recurrent #

AN

f

20 15 10 5 0

20 10 0 L D C H B R J P K M E G N A O S F T Q

Normalized fold change (PVTT/tumour)

d

Recurrent #

Normalized fold change (tumour/normal)

20 10 0 A B C D E F G H I J K L M N O P Q R S T

No. of patients

Patient

c

e

b

Tumorigenesis

Figure 2 | Recurrently deregulated lncRNAs in primary tumours and PVTTs. Identification of recurrently deregulated lncRNAs: recurrently downregulated and upregulated lncRNAs that were predicted by three statistical methods to be associated with (a) tumorigenesis and (b) metastasis. Fold-change of expression in each individual patient for (c) recurrently deregulated tumorigenesis-associated lncRNAs; (d) recurrently deregulated metastasis-associated lncRNAs (tumour versus PVTT). Patient I was not included in the analyses related to metastasis because the PVTT sample of patient I was contaminated. Stacked bar charts showing examples of recurrently deregulated lncRNAs, including tumorigenesis-associated (e) and metastasisassociated (f) lncRNAs. The number on the y axis is the number of patients with differential expression of each lncRNA.

assessed differential expression based on individual patients. Finally, we identified 525 and 323 lncRNAs as recurrently downregulated or upregulated, respectively, in primary tumours by overlapping the predictions of all three methods (Fig. 2a,c). The identified lncRNAs were considered to represent a group of recurrently deregulated lncRNAs potentially associated with tumorigenesis. These lncRNAs and their P values, q-values and expression fold-changes are listed in Supplementary Data 6. For the comparison of PVTTs and primary tumours, we found that DESeq2 and Wilcoxon signed-rank test identified very few differentially expressed lncRNAs (Fig. 2b), because of the relatively high heterogeneity of PVTTs (Fig. 2d). Only one lncRNA (HAND2-AS1) was identified as a downregulated candidate in PVTTs. Paired primary tumours and PVTTs 4

from individual patients (average correlation coefficient: 0.99) were more similar than PVTTs from different patients (average correlation coefficient: 0.76). Therefore, we used GFOLD to evaluate differential expression by treating patients individually, followed by counting recurrences in multiple patients (see Methods), revealing 107 lncRNAs that were defined as recurrently deregulated lncRNAs potentially associated with metastasis (Fig. 2d, Supplementary Data 6). Notably, of the 107 metastasis-associated candidates, 38 were also recurrently deregulated in primary tumours in comparison with adjacent normal tissues (see details in Discussion). Examples of tumorigenesis-associated lncRNAs are shown in Fig. 2e. Some of the tumorigenesis-associated lncRNAs identified in this study have been reported by previous studies. For instance,

NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

ARTICLE

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

b

Tumorigenesis

TCGA LIHC

HCC

0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

ES = 0.67 P < 0.001

Normal

HCC

Downregulated

ES = –0.80 P < 0.001

Normal

ES = –0.75 P < 0.001

Normal

c

Upregulated ES = 0.65 P = 0.006

HCC

HCC

0.0 –0.1 –0.2 –0.3 –0.4 –0.5 –0.6 –0.7

0.6 0.5 0.4 0.3 0.2 0.1 0.0

HCC

Enrichment score (ES)

Enrichment score (ES)

P < 0.001

0.0 –0.1 –0.2 –0.3 –0.4 –0.5 –0.6 –0.7 –0.8

Enrichment score (ES)

ES = 0.88

Normal Enrichment score (ES)

Enrichment score (ES)

Upregulated

Enrichment score (ES)

Zhang et al.

Enrichment score (ES)

Downregulated 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

Metastasis

0.0

0.5 0.4

ES = 0.48

0.3

P = 0.005

0.2 0.1 0.0

Non-invasion

Invasion

ES = –0.43

–0.1

P = 0.068

–0.2 –0.3 –0.4

PVTT

HCC Enrichment score (ES)

a

PVTT

0.1 ES = –0.36

0.0

P = 0.237 –0.1 –0.2 –0.3

Non-invasion

Invasion

e

d RP11-166D19.1

***

***

1.0

Age (>=55 years) 1.00

Gender (male)

High expression

AFP (>40 ng ml–1)

0.6

Expression level

Survival probability

0.8

Serum albumin (6 s) Cirrhosis

0.2

Vascular invasion

Log rank P-value = 0.0037

0.0

0.75

0.50

0.25

Well differentiated 0.00

60

Moderately differentiated

Time (months)

S1

17

11

Low

51

12

3

2

Poorly differentiated

S2

S3

S1 vs. S2 P-value = 3.5×10–4 6 8 10

45

4

100

2

High

0.6 0.8 1

40

0.4

Number at risk

20

0.2

0

S2 vs. S3 P-value = 3.5×10–6

Odds ratio (low/high)

Figure 3 | Association of recurrently deregulated lncRNAs with TCGA clinical data and other published data. (a) Gene set enrichment analysis (GSEA) of recurrently deregulated tumorigenesis-associated lncRNAs based on TCGA LIHC data and published liver cancer data. lncRNAs were rank-ordered by differential expression between adjacent normal tissue and primary tumour samples. (b) GSEA of recurrently deregulated metastasis-associated lncRNAs. lncRNAs were rank-ordered by differential expression between primary tumours with and without vascular invasion in the TCGA LIHC data, as well as by differential expression between primary tumours and PVTTs in published liver cancer data. (c) Kaplan-Meier analysis of overall survival in the TCGA LIHC cohort. Subjects were stratified according to the expression of lncRNA RP11-166D19.1. The P value for Kaplan-Meier analysis was determined using log-rank test. (d) Multivariate analysis using additional clinical information. Forest plot depicting correlations between the indicated clinical criteria and the expression level of RP11-166D19.1. (e) Expression levels (TCGA data) of RP11-166D19.1 in three HCC subclasses (S1, S2 and S3). ***P valueo0.001, Wilcoxon rank-sum test.

PVT1 promotes cell proliferation, cell cycle progression and development of stem cell-like properties in HCC (ref. 11). In addition, several lncRNAs (for example, TEX41, XLOC_014515 and XLOC_030220) were recurrently upregulated in PVTT samples, whereas others (for example, HAND2-AS1, RP11-731F5.2 and XLOC_055355) were recurrently downregulated in PVTTs (Fig. 2f). For example, HIF1A-AS2, a lncRNA antisense to hypoxia-inducible factor 1-alpha, is overexpressed in gastric cancer cells and involved in gastric cancer development30. Notably, we identified some newly assembled lncRNA candidates potentially related to metastasis, including XLOC_014515 and XLOC_030220 (Supplementary Data 6). Association of deregulated lncRNAs with public clinical data. Based on Gene Set Enrichment Analysis (GSEA) (see Methods) of recurrently deregulated lncRNAs, we found that the expression

levels of the tumorigenesis- and metastasis-associated lncRNA sets were consistent with their expression levels in another published data set from 11 matched normal tissue samples, primary tumours and PVTTs (ref. 21) (Fig. 3a,b). In addition, we explored the expression levels of recurrently deregulated lncRNAs in a TCGA liver hepatocellular carcinoma (LIHC) cohort (see Methods). Tumorigenesis-associated lncRNAs were also aberrantly expressed between normal tissues and primary tumours. Because the TCGA LIHC cohort had no PVTT or metastatic tumour samples, we compared expression levels of metastasis-associated lncRNAs between primary tumours with and without invasion, revealing that deregulation of metastasis-associated lncRNAs was in accord with the clinical status of the TCGA patients (Fig. 3a,b). The consistent expression levels of recurrently deregulated lncRNAs in our samples and the TCGA LIHC cohort suggest that the identified lncRNAs could potentially serve as biomarkers. As an example, we explored a metastasis-associated lncRNA,

NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

5

ARTICLE

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

RP11-166D19.1 (ENSEMBL ID: ENSG00000255248, an isoform of MIR100HG), which was also significantly downregulated in 4 of 11 PVTTs in comparison with matched primary liver tumours in a previous study21. We found that RP11-166D19.1 could be used to clearly classify the patients in the TCGA cohort into two subclasses with different survival rates. RP11-166D19.1 expression was significantly associated with the overall survival time of HCC patients’ (log rank P value ¼ 0.0037) (Fig. 3c). Moreover, a multivariate analysis based on additional clinical information showed that the HCCs of the low RP11-166D19.1 expression subclass were globally more severe than those of the high-expression subclass: the low expression subclass was significantly more likely to have AFPZ40 ng ml  1, had a high risk of vascular invasion, and had a clear tendency to have serum albumino3.5 g dl  1 and prothrombin time46 s. The patients in the high-expression subclass were mostly well-differentiated (Fig. 3d). Three HCC subclasses (S1, S2 and S3) were identified in a previous HCC study31,32. Here, we clustered 20 primary tumours from our study, as well as 157 TCGA LIHC tumours and 11 HCC tumours from a previous study21, into three subclasses (Supplementary Fig. 9) based on the expression patterns of 619 signature genes31. We also identified lncRNAs that were significantly deregulated in each subclass (Supplementary Fig. 10, Supplementary Data 7). Interestingly, the putative metastasis biomarker RP11-166D19.1 was found to be significantly downregulated in subclass S2 in comparison with its expression level in subclasses S1 and S3 (Wilcoxon rank-sum test, P value ¼ 3.5  10  4 and P value ¼ 3.5  10  6, respectively) (Fig. 3e). DNA methylation alternations and CNVs of lncRNAs. We categorized the recurrently deregulated lncRNAs based on their correlations with CNVs and/or DNA methylation alterations assayed in our matched samples (Fig. 4a). We listed all recurrently deregulated lncRNAs (Supplementary Data 8) with expression patterns correlated with CNV data (that is, upregulated lncRNAs were found to be located in a DNA amplification region) and/or DNA methylation data (that is, downregulated lncRNAs had strong DNA methylation signals at their promoter regions). Several lncRNAs were recurrently upregulated in some patients and recurrently downregulated in other patients; such lncRNAs were designated as bimorphic lncRNAs (Fig. 4a). The CNVs of recurrently deregulated lncRNAs in HCC cells were analysed using CytoscanHD arrays. Based on a GISTIC analysis33, 4 significantly amplified genome regions and 70 significantly deleted genome regions were revealed in our samples (Fig. 4b). To characterize candidate CNV-driven lncRNAs, we mapped recurrently deregulated lncRNAs to amplified and deleted genome regions. In total, 147 recurrently deregulated lncRNAs were identified in deleted regions, whereas none were found in amplified regions (Fig. 4a, Supplementary Data 8). For example, FENDRR, which was reported to be a prognostic biomarker in gastric cancer34, had a pattern of decreased expression driven by copy number deletion (Fig. 4b). Based on DNA methylation microarrays of 60 matched samples, we analysed DNA methylation patterns to identify recurrently deregulated lncRNAs that were affected by alterations in DNA methylation. We applied several separate filtering criteria (see Methods) to define recurrently deregulated lncRNAs driven by alterations in DNA methylation. In total, 93 (10.1%) recurrently deregulated lncRNAs had significant correlations between DNA methylation and expression levels (Fig. 4a,c, Supplementary Data 8), suggesting that their expression levels in tumour and/or PVTT samples were probably regulated by 6

DNA methylation. As an example, we showed that the expression level of a recurrently deregulated lncRNA, HAND2-AS1, was inversely correlated with the DNA methylation level at its promoter region (R2 ¼ 0.16694) (Fig. 4d). The promoter region of HAND2-AS1 was hypermethylated in primary tumours and PVTT samples. Inference of lncRNA function using a co-expression network. To predict the potential functional and regulatory mechanisms of lncRNAs with respect to the molecular aetiology of HCC, we constructed a co-expression network35 of protein-coding genes and lncRNAs (see Methods). The resulting co-expression network consisted of 7,367 protein-coding genes, 6,377 GENCODE lncRNAs and 5,612 newly assembled lncRNAs. There were 1,286 recurrently deregulated lncRNAs in the network. All protein-coding genes and lncRNAs were grouped into 43 clusters, each of which had at least 100 highly interconnected genes (Supplementary Data 9). In addition to interaction edges within a cluster, there were also 337,609 edges between nodes in different clusters, which could indicate their functional relatedness or regulatory relationships (Fig. 5a, Supplementary Fig. 11). Among the 43 clusters, we found four clusters containing protein-coding genes with interesting functions: clusters 4, 9, 18 and 25 (Fig. 5b). The recurrently deregulated lncRNAs are highly enriched in these gene clusters (Fisher’s exact test, all P valueso0.01). For example, Gene Ontology and KEGG pathway enrichment analyses suggest that the protein-coding genes in cluster 4 are mostly associated with metabolic processes in the liver, such as organic acid metabolism and degradation of fatty acids. The protein-coding genes in cluster 9 mainly function in cell cycle processes such as DNA repair, DNA replication and DNA metabolism, which influence cell migration36. Cluster 18 is enriched with immune response genes involved in the T-cell and B-cell receptor signalling pathways, consistent with reports that immune and inflammatory responses play decisive roles in tumour development by influencing the processes of invasion and migration37. Another intriguing example is cluster 25 (Fig. 5c), which includes protein-coding genes enriched in functional terms related to metastasis, such as cell adhesion and the TGF-b signalling pathway, which have been shown to play essential roles in diverse processes, including cell proliferation, differentiation, motility and adhesion38. Furthermore, many HCC-related driver genes39 are also found in cluster 25. For example, the FLT3 receptor plays a role during the late stages of liver regeneration and is involved in GPCR signalling pathways40. FLT3 was co-expressed with other driver genes in the sub-network, including WDFY4 and FAT4. Moreover, three recurrently deregulated lncRNAs (HAND2-AS1, AC096579.7 and FENDRR) were strongly correlated with FLT3 at the expression level, suggesting that these lncRNAs have functions related to that of FLT3. Another interesting gene in cluster 25, FAT4, encodes a cadherin (a calcium-dependent cell adhesion protein) that serves as a tumour-suppressor gene41. FAT4 was closely associated with some recurrently deregulated lncRNAs, including HAND2-AS1, FENDRR and WDFY3-AS2, all of which were differentially expressed during cell migration. Cell adhesion was one of the most significantly enriched processes during tumour metastasis. These co-expression relationships provide functional evidence demonstrating that adhesion-related lncRNAs likely have roles in tumour metastasis. Furthermore, driver gene ZFPM2 was also highly involved in the sub-network; it was significantly correlated with seven driver genes and five recurrently deregulated lncRNAs. Some migration-related

NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

ARTICLE

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

Tumorigenesis

Metastasis

Upregulated

55

0

55

Downregulated

174

147

32

Upregulated

1



1

Downregulated

7



7

Bimorphic All associated

2



2

235

147

93

G score 0.1 0.15 0.4

CNV

G score 0.4

Total

DNA methylation

0.68

Recurrently deregulated lncRNAs

0.8

b

0.8

a

1

Amplification

Deletion 2

3

HAND2-AS1 4

5 6 7

c

8

Tumorigenesis Upregulated

9

Downregulated

10

HAND2-AS1

12

17

Metastasis Downregulated 20

15

15

10 5

MEG3 H19

10 5

DNA methylation: HAND2-AS1 10 8 HAND2-AS1 MEG3

10–30

d 0.8

2

R = 0.16694 Beta

20 Recurrence

Recurrence

Upregulated

q - value

q - value

–1.00 –0.75 –0.50 –0.25 0.00 pcc

FPKM

–1.00 –0.75 –0.50 –0.25 0.00 pcc

10

0

0

0.25

5

10–7 10–10

18 19 20 21 22

0.25

5

FENDRR

16

10

–1

10

14 15

10–2

LUCAT1

13

15

10–3

Recurrence

Recurrence

CRNDE

15

RP11-166D19

11

10–4

20

10–2

20

6 4

0.6 0.4

2 0

0

0.2

0

–1.00 –0.75 –0.50 –0.25 0.00

–1.00 –0.75 –0.50 –0.25 0.00

pcc

pcc

0

0.25

0.5

0.75

1

Normal Tumour PVTT

Beta value

Figure 4 | Regulatory mechanisms for recurrently deregulated lncRNAs. (a) Summary of the regulatory mechanisms of recurrently deregulated lncRNAs. The numbers of lncRNAs associated with CNV and/or DNA methylation data are listed. Bimorphic lncRNAs were those that were recurrently upregulated in some patients and recurrently downregulated in other patients. The total number of lncRNAs is not equal to the sum of each type because of overlapping sub-types. (b) Chromosomal view of amplification and deletion peaks between primary tumours and normal tissue. The G-scores (top) and FDR q-values (bottom) of peaks were calculated using GISTIC2.0. The G-score considered the amplitude of the aberration and its frequency of occurrence across all samples. The q-value was calculated for the observed gain/loss at each locus using randomly permuted events as a control. Examples of recurrently deregulated lncRNAs located in the peaks (only found in deletions) are labelled. (c) Scatterplots showing recurrently deregulated lncRNAs (colour labelled) that were putatively affected by alterations in DNA methylation. Recurrently deregulated lncRNAs driven by DNA methylation had expression levels that were inversely correlated with DNA methylation levels at their promoter regions (PCC, Pearson correlation coefficientso  0.3, x axis). (d) Example of a recurrently deregulated lncRNA driven by DNA methylation; the HAND2-AS1 expression level (FPKM) was inversely correlated with its promoter methylation level (beta value). Boxplot showing that the beta values of primary tumour samples were significantly higher than those of normal tissue samples, but slightly lower than those of PVTT samples.

recurrently deregulated lncRNAs (HAND2-AS1, AC096579.7 and FENDRR) had expression patterns similar to that of PTPRB, which plays an important role in blood vessel remodelling and angiogenesis42, indicating that these lncRNAs could have related functions. Functional assay for metastasis-related lncRNA candidates. Transwell migration assays were used to assess whether putative candidate lncRNAs might function in the progression of HCC. We first selected ten lncRNA candidates that were associated with cell adhesion based on the co-expression network described above (Supplementary Table 2), in which four lncRNAs (WDFY3-AS2, HAND2-AS1, RP11-166D19.1 and XLOC_055355)

were also significantly co-expressed with genes in the TGF-b signalling pathway. Note that seven candidate lncRNAs (shown in Fig. 6a) were recurrently deregulated lncRNAs in PVTT samples and designated as metastasis-associated lncRNAs in the sections above. Three siRNAs were synthesized for each candidate lncRNA and mixed as a pool (Supplementary Table 3). Three liver cancer cell lines, HepG2, SMMC-7721 and HCCLM9, were used to conduct loss-of-function RNAi assays (Fig. 6a). Remarkably, knockdown of seven of the ten candidate lncRNAs significantly affected cell migration in at least one cell line; knockdown of three lncRNAs produced accordant alterations in cell migration in at least two cell lines by suppressing or promoting cell migration (Fig. 6a, Supplementary Table 3). RNAi knockdown efficiency was confirmed using

NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

7

ARTICLE

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

a

b

4

7 9

1

40 18 39 36 35 25

27 26

Protein-coding gene

GENCODE lncRNA

Newly assembled lncRNA

Recurrently deregulated lncRNA

Driver gene

c WBSCR17

PTPRN2

LINC01018 WDFY4 ITGAD JAK3

HAND2-AS1 FLT3 AC096579.7

CNRIP1 PTPRB

Cell cycle DNA metabolic process DNA conformation change Alcoholism DNA replication p53 signalling pathway Carboxylic acid metabolic process Organic acid metabolic process Organic acid catabolic process Fatty acid degradation Chemical carcinogenesis Retinol metabolism

RP11-166D19.1

DMGDH ZFPM2

4

Cell adhesion Extracellular matrix organization Cell adhesion molecules (CAMs) TGF-β signalling pathway

21 22

9

Immune system process Lymphocyte activation Regulation of immune system process T-cell receptor signalling pathway NF-kappa B signalling pathway B-cell receptor signalling pathway

17

19

18

Cluster ID

11

25

2

JAK2

FENDRR

–log10 (P-value)

WDFY3-AS2

GO enrichment

1

5

10

15

20

25

30

KAZN FAT4

DSCAM

Pathway enrichment

Recurrently deregulated lncRNA (tumorigenesis) Recurrently deregulated lncRNA (tumorigenesis & metastasis) Driver gene

Experimentally validated

Figure 5 | Inference of potential functions for recurrently deregulated lncRNAs using a co-expression network. (a) Network representation of 18 selected inter-connected clusters in the coding-non-coding co-expression network. (b) GO and KEGG pathway enrichment for four selected clusters (4, 9, 18 and 25). Heatmap showing enrichment scores (  log10(P value)) for GO terms and KEGG pathways in four selected clusters. The most significantly enriched GO terms and KEGG pathways are displayed. (c) Sub-network showing important genes/lncRNAs in cluster 25. The subnetwork depicts the relationships among four lncRNAs and liver cancer-related driver genes.

qRT-PCR (Fig. 6b, Supplementary Fig. 12, Supplementary Table 4). The results from three representative transwell assays are shown on the right side of the figure (Fig. 6c,d). In order to document the reproducibility of the results, we repeated the RNAi knockdown and transwell migration assays using three siRNAs separately in all three cell lines (Supplementary Fig. 13A–B and Supplementary Table 5); the results of these experiments were consistent with those of the mixed siRNA experiments. Moreover, to avoid mistaking differences in cell proliferation for differences in cell migration, we performed CCK8 cell proliferation assays, revealing that knockdown of these lncRNAs hardly affected cell proliferation (Supplementary Fig. 13C, Supplementary Table 6). For some lncRNA knockdown experiments, changes in migration ability were consistent with the deregulation patterns of HCC patients. For example, RP11-166D19.1 was recurrently downregulated in PVTT samples from four patients. The loss-offunction assay showed that knockdown of RP11-166D19.1 enhanced the migration ability of HCC cells. However, some other lncRNAs, such as HAND2-AS1, demonstrated an inconsistent trend between deregulation patterns in HCC patients and experimentally validated functions in cancer cell lines; silencing of HAND2-AS1 suppressed cell migration, although it 8

was downregulated in 8 of 20 patients’ PVTT samples. It has been reported that nearly half of HCC cell lines do not resemble primary tumours43, so the intrinsic differences between cancer cells lines and clinical samples might explain the discrepancies between the samples’ gene expression patterns and experimentally validated functions in cell lines. Overall, the high validation rate of the candidate lncRNAs showed that the co-expression network, based on previous knowledge of signalling pathways and supplemented by recurrent aberrant expression patterns in matched clinical samples, identified candidate lncRNAs that potentially play functional roles in the sophisticated regulation of cancer development and progression. Discussion Based on an analysis of genomic, epigenomic and transcriptomic data of HCC primary tumours and PVTTs, this study reports several findings. First, based on high-throughput sequencing technology and bioinformatics analysis of 60 matched samples, including primary tumours, PVTTs and adjacent normal tissue, we discovered and characterized an expanded landscape of lncRNAs. Using PVTTs from Chinese HCC patients and deep sequencing data enabled us to detect many candidate lncRNA

NATURE COMMUNICATIONS | 8:14421 | DOI: 10.1038/ncomms14421 | www.nature.com/naturecommunications

ARTICLE

NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14421

a

b

HepG2

SMMC-7 HCCLM9 721

*

c

lncRNAs

HepG2

SMMC7721

HCCLM9

***

WDFY3-AS2

0.52

0.48

0.20

**

RP1-153P14.5

0.67

0.66

0.20

AC096579.7

0.70

0.93

0.99

HAND2-AS1

0.72

0.98

0.68

LINC01018

0.20

0.81

0.53

RP11-731F5.2

0.72

0.97

0.99

*

RP11-166D19.1

0.61

0.60

0.88

**

XLOC_014515

0.41

0.83

0.52

**

XLOC_015969

0.05

0.80

0.63

**

XLOC_055355

0.85

0.99

0.53

**

**

*

RP11-166D19.1 (HepG2)

HAND2-AS1 (SMMC-7721)

XLOC_015969 (HCCLM9)

1.4

0

1.6

RP11-166D19.1

100

HAND2-AS1

0

lncRNA siRNA

1.2

***P-value

Recurrently deregulated lncRNAs in hepatocellular carcinoma.

Hepatocellular carcinoma (HCC) cells often invade the portal venous system and subsequently develop into portal vein tumour thrombosis (PVTT). Long no...
2MB Sizes 1 Downloads 15 Views