Nanopore Full-length transcriptome overview

What's new in full-length transcriptome sequencing?

Traditional second-generation (short-read) transcriptome sequencing is largely limited to analyzing regulatory mechanisms at the gene level to find key genes related to traits. However, a gene can produce multiple transcripts at a given time or under a given condition, and the richness and complexity of these transcripts are a direct cause of protein diversity, which eventually leads to a variety of phenotypes. Full-length transcriptome sequencing on a third-generation platform does not require random fragmentation of mRNA, so a transcript can be sequenced from the 5' end to the 3' UTR in a single read. It can therefore resolve complex transcription in organisms and reveal the real structure of transcribed sequences, such as alternative splicing, alternative polyadenylation (APA) and fusion genes.

Product advantages

Cost-effective, high throughput, accurate quantification at the transcript level, and a low rate of multiple alignment (multi-mapped reads)

No fragmentation of sequences is needed, so gene structure, alternative splicing, fusion genes and other structural features can be identified accurately

Long sequencing reads, with no GC preference or base bias


Applications

Quantitative analysis

Identification of differentially expressed genes (DEGs) and differentially expressed transcripts (DETs), functional annotation analysis, and in-depth exploration of the regulatory mechanisms of functional genes and key pathways

Gene structure identification

Alternative splicing, non-coding RNA, gene families, evolutionary relationships

Genome annotation improvement

Novel genes, gene structures of new alternatively spliced isoforms


Practical data display

Data quality

The sequencing yield is around 2 Gb to 20 Gb, the N50 length is around 1,500 bp, the mean read length is about 1 to 2 kb, and the mean Q score is Q10 or above.

| Species | Sample number | Reads number | Base number (bp) | N50 (bp) | Average length (bp) | Max length (bp) | Data quality (mean Q score) |
|---|---|---|---|---|---|---|---|
| Fruit A | 6 | 42,979,789 | 49,617,831,241 | 1,300 | 1,154 | 15,076 | Q11 |
| Fruit B | 6 | 19,571,990 | 25,390,549,545 | 1,502 | 1,297 | 15,082 | Q10 |
| Plant A | 30 | 93,944,045 | 110,848,053,132 | 1,337 | 1,179 | 19,507 | Q10 |
| Plant B | 15 | 44,244,844 | 52,032,652,883 | 1,331 | 1,176 | 25,911 | Q11 |
| Plant C | 18 | 45,994,374 | 52,309,928,186 | 1,270 | 1,137 | 10,662 | Q10 |
| Plant D | 15 | 45,879,629 | 53,506,402,721 | 1,334 | 1,166 | 26,920 | Q10 |
| Plant E | 12 | 90,318,442 | 90,706,268,174 | 1,064 | 1,004 | 26,092 | Q10 |
| Crop A | 10 | 80,525,164 | 73,467,910,938 | 957 | 912 | 13,807 | Q11 |
| Crop B | 14 | 186,584,485 | 179,690,872,324 | 1,022 | 963 | 35,167 | Q10 |
| Crop C | 3 | 10,152,254 | 9,155,820,053 | 963 | 901 | 10,877 | Q11 |
| Flower A | 18 | 116,192,367 | 133,000,682,020 | 1,281 | 1,144 | 13,901 | Q11 |
| Fungi A | 12 | 61,210,673 | 69,529,988,582 | 1,348 | 1,135 | 33,704 | Q10 |
| Insect A | 6 | 53,463,044 | 58,206,229,677 | 1,321 | 1,088 | 31,707 | Q10 |
| Insect B | 16 | 58,736,505 | 50,047,053,412 | 914 | 852 | 19,166 | Q10 |
| Insect C | 6 | 21,628,473 | 22,693,316,737 | 1,142 | 1,049 | 20,020 | Q11 |
| Insect D | 21 | 69,249,084 | 60,237,451,959 | 917 | 869 | 21,620 | Q10 |
| Fish A | 12 | 34,762,990 | 38,069,664,114 | 1,345 | 1,095 | 88,860 | Q10 |
| Fish B | 6 | 34,756,817 | 43,805,602,957 | 1,502 | 1,260 | 14,249 | Q10 |
| Aquatic Animal | 18 | 193,466,111 | 227,023,309,557 | 1,342 | 1,173 | 46,742 | Q10 |
| Aquatic Plant | 1 | 29,081,500 | 31,673,918,600 | 1,202 | 1,089 | 13,691 | Q10 |

Real project data from Nanopore full-length transcriptome sequencing
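The N50 and average-length columns in the table above can be recomputed from raw read lengths. The following is a minimal Python sketch, not the pipeline that produced the table; the function names `n50` and `mean_length` and the toy read lengths are illustrative assumptions. The last lines cross-check the Fruit A row using only the reads number and base number reported above.

```python
def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def mean_length(base_number, reads_number):
    """Average read length in bp (integer part)."""
    return base_number // reads_number


# Toy read lengths (hypothetical, for illustration only).
toy_reads = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_reads)

# Cross-check of the Fruit A row: base number / reads number.
fruit_a_mean = mean_length(49_617_831_241, 42_979_789)
print(toy_n50, fruit_a_mean)  # prints: 8 1154
```

The value 1154 matches the average length listed for Fruit A, which suggests that the column is simply base number divided by reads number.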

Quantitative analysis

Data saturation: compared with NGS, Nanopore sequencing needs less data to cover the same number of transcripts.
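A saturation curve of this kind is usually built by subsampling reads and counting how many distinct transcripts are still detected. The sketch below illustrates the idea under stated assumptions: each read has already been assigned to one transcript ID, and the function name `saturation_curve`, the transcript IDs and the read counts are all hypothetical.

```python
import random


def saturation_curve(read_to_transcript, fractions, seed=0):
    """Count distinct transcripts detected in nested subsamples of reads.

    read_to_transcript: list with one transcript ID per aligned read.
    fractions: increasing fractions of the reads to sample, e.g. 0.1 to 1.0.
    """
    rng = random.Random(seed)
    shuffled = list(read_to_transcript)
    rng.shuffle(shuffled)  # shuffle once so that the subsamples are nested
    curve = []
    for fraction in fractions:
        n_reads = int(len(shuffled) * fraction)
        detected = len(set(shuffled[:n_reads]))
        curve.append((n_reads, detected))
    return curve


# Hypothetical assignments: a few abundant and many rare transcripts.
reads = ["tx_high"] * 500 + ["tx_mid"] * 50 + [f"tx_rare_{i}" for i in range(20)]
curve = saturation_curve(reads, [0.1, 0.25, 0.5, 0.75, 1.0])
for n_reads, detected in curve:
    print(n_reads, detected)
```

The curve flattens once additional reads stop revealing new transcripts; a platform that reaches that plateau with fewer bases needs less data for the same coverage.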



Accurate quantification, low GC bias and a low rate of multiple alignment; differentially expressed genes (DEGs) and differentially expressed transcripts (DETs) can be analyzed in one run


2 Gb of Nanopore data and 6 Gb of Illumina data detect almost the same number of genes, and the differentially expressed genes shared between the two platforms show the same direction of regulation (up or down)

Background: a plant with 27,628 genes and 48,332 transcripts

| Data amount | Common DEGs | Consensus up-regulated genes | Consensus down-regulated genes |
|---|---|---|---|
| 2 Gb | 628 | 174 | 454 |
| 3 Gb | 690 | 189 | 501 |
| 4 Gb | 710 | 208 | 502 |
| 6 Gb | 787 | 214 | 573 |

DEG identification on ONT and Illumina platforms with the same amount of data
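The "common DEGs" and "consensus" columns can be thought of as a set intersection followed by a check on the direction of change. The sketch below shows one way to compute them, assuming each platform's DEG result is available as a mapping from gene ID to log2 fold change; the function name `consensus_degs`, the gene IDs and the fold changes are hypothetical.

```python
def consensus_degs(ont, illumina):
    """Compare two DEG result sets.

    Each argument maps gene ID -> log2 fold change for the genes called
    differentially expressed on that platform.
    Returns (common, consensus_up, consensus_down) as lists of gene IDs.
    """
    common = sorted(set(ont) & set(illumina))
    up = [g for g in common if ont[g] > 0 and illumina[g] > 0]
    down = [g for g in common if ont[g] < 0 and illumina[g] < 0]
    return common, up, down


# Hypothetical DEG calls; gene IDs and fold changes are made up.
ont_degs = {"geneA": 2.1, "geneB": -1.4, "geneC": 1.2, "geneD": -3.0}
illumina_degs = {"geneA": 1.8, "geneB": -1.1, "geneD": -2.5, "geneE": 0.9}

common, up, down = consensus_degs(ont_degs, illumina_degs)
print(len(common), len(up), len(down))  # prints: 3 1 2
```

In the table above, the up-regulated and down-regulated counts sum to the number of common DEGs in every row (for example 174 + 454 = 628), which is what the code would return when no shared DEG changes direction between platforms.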

Qualitative analysis

Number and types of transcripts identified

| Species | Full-length rate | Mapping rate | Known transcripts | New transcripts | Known genes | New transcripts for known genes | New genes | Transcripts for new genes |
|---|---|---|---|---|---|---|---|---|
| Fruit A | 79.14% | 99.42% | 10,381 | 51,555 | 21,209 | 44,831 | 3,602 | 6,724 |
| Fruit B | 70.56% | 96.63% | 8,966 | 16,036 | 9,680 | 14,655 | 867 | 1,381 |
| Plant A | 75.58% | 99.32% | 1,643 | 47,563 | 14,637 | 42,282 | 2,727 | 5,281 |
| Plant B | 80.65% | 99.78% | 2,421 | 36,454 | 11,965 | 34,315 | 1,337 | 2,139 |
| Plant C | 78.74% | 99.33% | 7,263 | 24,261 | 10,997 | 21,024 | 1,711 | 3,237 |
| Plant D | 81.16% | 96.19% | 2,663 | 53,852 | 20,048 | 44,644 | 5,146 | 9,208 |
| Plant E | 79.59% | 99.58% | 8,794 | 61,213 | 23,678 | 49,674 | 6,433 | 11,539 |
| Crop A | 76.75% | 95.83% | 12,290 | 31,910 | 12,886 | 23,422 | 3,554 | 8,488 |
| Crop B | 81.26% | 99.39% | 23,495 | 41,476 | 16,915 | 31,604 | 4,858 | 9,872 |
| Crop C | 80.49% | 99.08% | 10,322 | 17,865 | 13,642 | 15,158 | 1,667 | 2,707 |
| Flower A | 83.35% | 99.14% | 15,930 | 65,789 | 21,278 | 60,908 | 3,085 | 4,881 |
| Fungi A | 80.70% | 99.78% | 2,134 | 42,282 | 8,475 | 31,556 | 5,122 | 10,726 |
| Insect A | 77.38% | 99.63% | 4,934 | 29,405 | 6,444 | 25,993 | 1,780 | 3,412 |
| Insect B | 84.30% | 98.35% | 1,257 | 17,607 | 6,042 | 14,120 | 2,527 | 3,487 |
| Insect C | 82.95% | 98.95% | 1,396 | 13,321 | 4,573 | 10,619 | 1,882 | 2,702 |
| Insect D | 83.61% | 92.25% | 6,350 | 18,506 | 8,292 | 15,173 | 2,326 | 3,333 |
| Fish A | 79.25% | 99.07% | 1,564 | 39,707 | 16,915 | 35,318 | 2,902 | 4,389 |
| Fish B | 79.73% | 98.30% | 6,756 | 56,244 | 14,266 | 48,977 | 4,472 | 7,267 |
| Aquatic Animal | 78.71% | 98.83% | 15,361 | 59,260 | 12,581 | 50,271 | 5,454 | 8,989 |
| Aquatic Plant | 81.28% | 99.40% | 1,488 | 61,196 | 18,014 | 50,564 | 5,417 | 10,632 |

Full-length transcript identification in different species
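The columns of the table above appear to be related: the number of new transcripts equals the new transcripts for known genes plus the transcripts for new genes. This relationship is inferred from the numbers rather than stated in the source, and the small Python sketch below checks it for three rows copied from the table. The class name `TranscriptRow` and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TranscriptRow:
    species: str
    new_transcripts: int
    new_for_known_genes: int
    for_new_genes: int

    def is_consistent(self):
        """New transcripts should split into known-gene and new-gene parts."""
        return self.new_transcripts == self.new_for_known_genes + self.for_new_genes


# Three rows copied from the table above.
rows = [
    TranscriptRow("Fruit A", 51_555, 44_831, 6_724),
    TranscriptRow("Fruit B", 16_036, 14_655, 1_381),
    TranscriptRow("Plant A", 47_563, 42_282, 5_281),
]
for row in rows:
    print(row.species, row.is_consistent())
```

A check like this is a cheap way to catch transcription errors when such summary tables are copied between reports.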

Gene structure identification


Comparison of transcriptome data between the Nanopore and PacBio sequencing platforms

Background: a plant with 27,628 genes and 48,332 transcripts

| Sequencing platform | Data amount | Identified full-length sequences | Redundant-removed transcripts | Known full-length transcripts | Novel full-length transcripts | Identified genes | Identified known genes | Identified novel genes |
|---|---|---|---|---|---|---|---|---|
| PacBio | 27.71 Gb | 411,155 | 15,263 | 13,184 | 2,439 | 11,285 | 11,119 | 166 |
| Nanopore | 2.34 Gb | 1,811,431 | 15,515 | 13,417 | 2,098 | 11,513 | 11,278 | 235 |

Comparison of full-length transcripts on ONT and PB platforms

Cases

Analysis of classic cases

Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D. (Nucleic Acids Research, 2018, IF=11.561)

1 Nanopore sequencing was used to generate the full-length transcriptome of Saccharomyces cerevisiae.

2 About 509 Mb (59X) of data were obtained from yeast grown on glucose, and about 623 Mb (72X) from yeast grown on ethanol. The MinION sequencing depth was 64X and the Illumina sequencing depth was 118X. Mean coverage was very similar between the two platforms.

3 Differential expression and functional enrichment analysis showed that the genes up-regulated in glucose culture were enriched in terms related to transcription and translation, which is consistent with the faster growth observed in glucose culture. In ethanol culture, the up-regulated genes were mainly enriched in the TCA cycle, the glyoxylate pathway and mitochondrial electron transport.
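The depth figures in item 2 can be sanity-checked with the usual definition of sequencing depth, that is, sequenced bases divided by the size of the reference. The source does not state the reference size, so the sketch below works backwards and derives the size implied by each yield and depth pair; the function name `implied_reference_size_mb` is illustrative.

```python
def implied_reference_size_mb(yield_mb, depth):
    """Reference size (Mb) implied by depth = sequenced bases / reference size."""
    return yield_mb / depth


# Yield and depth pairs quoted in item 2.
glucose = implied_reference_size_mb(509, 59)
ethanol = implied_reference_size_mb(623, 72)
print(round(glucose, 2), round(ethanol, 2))  # prints: 8.63 8.65
```

Both conditions imply nearly the same reference size, so the two quoted yield and depth pairs are consistent with each other.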

Figure 3. Summary of the direct RNA sequencing data. (A) The histogram shows the read-length distribution of high-quality reads obtained from yeast cells grown in ethanol (magenta) and glucose (cyan), together with the distribution of expected transcript lengths derived from the ORF annotation. (B) Bar plots of the detected highly expressed transcripts, presented as average normalized counts with standard error over four biological replicates for each growth condition. Transcripts that are constitutively expressed, highly expressed in ethanol growth and highly expressed in glucose growth are shown in the left, middle and right boxes, respectively. (C) The bubble scatter plots show the relationship between the fraction of full-length transcripts detected by direct RNA sequencing, the transcript length and the level of transcript expression. The violin-boxplots on the right show the overall distribution of the fraction of detected full-length transcripts.

BMKCloud support

BMKCloud, developed by Biomarker Co. Ltd., is an open cloud platform for big biological data analysis. It has the largest professional user and developer communities in China. It provides users with comprehensive bioinformatics services, including an analysis platform, computing resources, public data, training in data analysis, a social platform, and guidance on integrating and using public data.

Flow chart for Genome-guided Full-length Transcriptome Analysis Platform (GFTAP)


Easy to use

* Online graphical operation, with no need to know Linux or a programming language

* Tasks can be submitted in 1 minute, from anywhere with Internet access

* A single workflow for all integrated analyses

* Multi-group analyses can be run in a single submission

* Reference genomes are updated weekly and personalized references are supported


Fast and efficient

* 6 samples can be analyzed in 24 hours (gold account)

* 18 main analysis items are integrated in a single workflow

* Both NGS and Nanopore fastq data are supported

* DEG, PCA, WGCNA and other analysis items are integrated in a single workflow

* Both private and public data are supported


Friendly to personalized demands

* Personalized parameters are supported at task submission

* The majority (90%+) of parameters remain adjustable after reports have been generated

* The platform is updated continuously, and results can be refreshed after the app has been updated


References

1) Bayega A, Oikonomopoulos S, Zorbas E, et al. Transcriptome landscape of the developing olive fruit fly embryo delineated by Oxford Nanopore long-read RNA-Seq[J]. bioRxiv, 2018.

2) Benetta E D, Antoshechkin I, Yang T, et al. Genome Elimination Mediated by Gene Expression from a Selfish Chromosome[J]. bioRxiv, 2019.

3) Byrne A, Supple M A, Volden R, et al. Depletion of Hemoglobin Transcripts and Long-Read Sequencing Improves the Transcriptome Annotation of the Polar Bear (Ursus maritimus)[J]. Frontiers in Genetics, 2019.

4) Chuang T, Chen Y, Chen C, et al. Integrative transcriptome sequencing reveals extensive alternative trans-splicing and cis-backsplicing in human cells[J]. Nucleic Acids Research, 2018, 46(7): 3671-3691.

5) Cruzgarcia L, Obrien G, Sipos B, et al. Generation of a Transcriptional Radiation Exposure Signature in Human Blood Using Long-Read Nanopore Sequencing[J]. Radiation Research, 2019, 193(2).

6) Fleming M B, Patterson E L, Reeves P A, et al. Exploring the fate of mRNA in aging seeds: protection, destruction, or slow decay?[J]. Journal of Experimental Botany, 2018, 69(18): 4309-4321.

7) Garalde D R, Snell E A, Jachimowicz D, et al. Highly parallel direct RNA sequencing on an array of nanopores[J]. Nature Methods, 2018, 15(3): 201-206.

8) Grunberger F, Knuppel R, Juttner M, et al. Nanopore-based native RNA sequencing provides insights into prokaryotic transcription, operon structures, rRNA maturation and modifications[J]. bioRxiv, 2019.

9) Gupta I, Collier P G, Haase B, et al. Single-cell isoform RNA sequencing (ScISOr-Seq) across thousands of cells reveals isoforms of cerebellar cell types[J]. bioRxiv, 2018.

10) Hardwick S A, Bassett S D, Kaczorowski D C, et al. Targeted, High-Resolution RNA Sequencing of Non-coding Genomic Regions Associated With Neuropsychiatric Functions[J]. Frontiers in Genetics, 2019.

11) Lea W A, Parnell S C, Wallace D P, et al. Human-Specific Abnormal Alternative Splicing of Wild-Type PKD1 Induces Premature Termination of Polycystin-1[J]. Journal of The American Society of Nephrology, 2018, 29(10): 2482-2492.

12) Li R, Ren X, Ding Q, et al. Direct full-length RNA sequencing reveals unexpected transcriptome complexity during Caenorhabditis elegans development[J]. Genome Research, 2020, 30(2): 287-298.

13) Ono H, Yoshida M. Direct RNA sequencing approach to compare non-model mitochondrial transcriptomes: an application to a cephalopod host and its mesozoan parasite[J]. Methods, 2020.

14) Panda K, Slotkin R K. Long-Read cDNA Sequencing Enables a 'Gene-Like' Transcript Annotation of Arabidopsis Transposable Elements[J]. bioRxiv, 2020.

15) Parker M T, Knop K, Sherwood A, et al. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification[J]. eLife, 2020: 1-35.

16) Jenjaroenpun P, Wongsurawat T, Pereira R, et al. Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D[J]. Nucleic Acids Research, 2018, 46(7): e38.

17) Roach N P, Sadowski N, Alessi A F, et al. The full-length transcriptome of C. elegans using direct RNA sequencing[J]. bioRxiv, 2019.

18) Sessegolo C, Cruaud C, Silva C D, et al. Transcriptome profiling of mouse samples using nanopore sequencing of cDNA and RNA molecules[J]. Scientific Reports, 2019, 9(1).

19) Tang A D, Soulette C M, Van Baren M J, et al. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns[J]. bioRxiv, 2018.

20) Workman R E, Tang A D, Tang P S, et al. Nanopore native RNA sequencing of a human poly(A) transcriptome[J]. bioRxiv, 2018.