Animal/Plant De novo Sequencing

Service Overview

De novo sequencing refers to sequencing a novel genome for which no reference sequence is available for alignment. It is typically accomplished through sequencing, read assembly, genome annotation, functional annotation, and advanced bioinformatic analyses such as comparative genomics. Since 2015, third-generation sequencing (TGS) technologies, including PacBio and Nanopore, have been increasingly applied to de novo sequencing of animal and plant genomes, owing to their remarkably long reads. They have shown superior performance in de novo genome sequencing of species studied in agriculture, forestry, fisheries, medicine, marine biology, and other fields.
Figure 1. Summary of read length, accuracy and genome continuity of different technical platforms
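To make the read-assembly step concrete, the following toy greedy overlap assembler reconstructs a sequence from reads without any reference: it repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a teaching sketch that assumes error-free reads from a single strand. Real long-read assemblers are graph-based and must also handle sequencing errors, reverse complements, and repeats, none of which this sketch does. The function names `overlap` and `greedy_assemble` are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    max_olap = min(len(a), len(b))
    for k in range(max_olap, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by largest overlap until no overlap >= min_len remains."""
    # Drop reads contained in a different, longer read; they add no new sequence.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))  # remove exact duplicates, keep order
    while True:
        best = (0, None, None)
        # Try every ordered pair of distinct contigs and keep the largest overlap.
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best[0]:
                best = (k, a, b)
        k, a, b = best
        if k == 0:
            return contigs  # no further merges possible
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[k:])  # join, writing the shared overlap only once
```

For example, `greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"])` returns `["ATTAGACCTGCCGGAA"]`. Longer reads help precisely because each read spans more of the genome, so fewer ambiguous merges are needed.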

Third-Generation Sequencing Technology

PacBio sequencing
On the PacBio sequencing platform, sequences are read during synthesis of the complementary strand of the DNA template. The target double-stranded DNA is ligated to hairpin adapters at both ends, forming a circular single-stranded template called a SMRTbell. Each SMRTbell is bound by a DNA polymerase immobilized at the bottom of a zero-mode waveguide (ZMW), where the fluorescent signals emitted during nucleotide incorporation are collected. These fluorescent signals are converted into base calls, which form the sequence of the template. In addition, PacBio's circular consensus sequencing (CCS) mode generates highly accurate HiFi reads with an average length of 15 kb, which has greatly improved the accuracy of genome assembly.
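The accuracy gain from CCS comes from reading the same circular SMRTbell insert many times, so that random errors in individual passes are outvoted. A minimal sketch of that idea is a per-position majority vote, assuming the passes (subreads) are already aligned to each other and have equal length. Real CCS performs the alignment itself and uses model-based polishing; the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Per-position majority vote across equal-length, pre-aligned passes.

    Assumes a gap-free alignment; ties go to the base first seen in that column.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")
    consensus = []
    for column in zip(*passes):
        # most_common(1) gives the most frequent base observed in this column
        base, _ = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)
```

With passes `["ACGTAC", "ACCTAC", "ACGTTC", "ACGTAC"]`, the single-pass errors at positions 3 and 5 are outvoted and the consensus is `"ACGTAC"`.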
Figure 2. Principle of PacBio Sequencing
Nanopore sequencing
Nanopore sequencing differs from other sequencing platforms in that nucleotides are read directly, without a DNA synthesis step. As a single-stranded DNA molecule passes through a nano-sized protein pore (nanopore), different nucleotides produce characteristic changes in the ionic current, which are recorded and decoded into a sequence of bases. The ONT platform itself imposes no apparent technical limit on read length, so ultra-long reads (ULRs) can be generated for high-quality genome assembly.
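As a deliberately simplified illustration of turning a current trace into bases, the sketch below first splits the raw signal into events (segments of roughly constant current) and then assigns each event to the base with the nearest current level. It assumes one base per event and one characteristic level per base. Both are simplifications: in real nanopores the current depends on several bases occupying the pore at once, and ONT basecallers use neural networks. The levels in `LEVELS_PA`, the `jump` threshold, and all function names are made-up placeholders.

```python
# Hypothetical mean current levels (pA) per base; placeholder values, not real pore data.
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def segment_events(signal: list[float], jump: float = 7.0) -> list[float]:
    """Split a raw current trace wherever consecutive samples differ by more than `jump`;
    return the mean current of each resulting event.

    Note: a run of identical bases produces no jump, so it collapses into a single
    event here (a known weakness of this toy).
    """
    if not signal:
        return []
    events, current = [], [signal[0]]
    for prev, sample in zip(signal, signal[1:]):
        if abs(sample - prev) > jump:
            events.append(sum(current) / len(current))  # close the current event
            current = []
        current.append(sample)
    events.append(sum(current) / len(current))
    return events


def call_bases(event_means: list[float]) -> str:
    """Assign each event to the base whose hypothetical level is nearest."""
    return "".join(min(LEVELS_PA, key=lambda b: abs(LEVELS_PA[b] - m)) for m in event_means)
```

For the trace `[80.5, 79.8, 81.0, 110.2, 109.5, 124.0, 126.1, 94.0, 96.2]`, segmentation yields four events and `call_bases` returns `"AGTC"`.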
Figure 3. Principle of Nanopore Sequencing

Bioinformatic Analysis

Genome Assembly

Genome assembly and assembly evaluation

Gene prediction and annotation: protein-coding gene prediction, repetitive sequence annotation and transposon classification, non-coding RNA annotation, pseudogene annotation

Hi-C based genome assembly: efficient data evaluation, contig clustering, contig ordering and orientation, post-assembly evaluation

Biological Analysis

Comparative genome analysis: gene family clustering, phylogenetic tree construction, gene family expansion and contraction analysis, species divergence time estimation, LTR retrotransposon insertion dating, whole-genome duplication event analysis, selective pressure analysis

Specific biological queries: customized analysis based on specific research goals
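Two of the comparative analyses listed above reduce to short formulas. LTR retrotransposon insertion dating commonly estimates the time since insertion from the divergence K between the element's two LTRs as T = K/(2r), where r is the neutral substitution rate per site per year; the same relation applied to synonymous distances (Ks) between orthologs is often used for divergence-time and WGD dating. Selective pressure is usually summarized as ω = dN/dS. The sketch below assumes the sequences have already been aligned and the sites and differences counted, applies the Jukes-Cantor (JC69) correction, and uses r = 1.3e-8 purely as a placeholder; the appropriate rate is lineage-specific, and all function names are hypothetical.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ltr_insertion_time(mismatches: int, aligned_length: int, rate: float = 1.3e-8) -> float:
    """Estimated years since insertion: T = K / (2r), K = JC69 distance between the two LTRs.

    `rate` (substitutions per site per year) is an assumed placeholder value.
    """
    k = jc69_distance(mismatches / aligned_length)
    return k / (2 * rate)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """omega = dN / dS from pre-counted sites and differences (final step of a
    Nei-Gojobori-style method); omega < 1 suggests purifying selection."""
    dn = jc69_distance(nonsyn_diffs / nonsyn_sites)
    ds = jc69_distance(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero; omega is undefined")
    return dn / ds
```

For example, two LTRs differing at 10 of 1,000 aligned sites give K ≈ 0.0101 and, under the assumed rate, T ≈ 3.9 × 10^5 years.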

Results Demo

Genome collinearity analysis

Project Workflow

Sample delivery
Library construction and sequencing
Data analysis
Final report
After-sale technical support

Service Advantages

Biomarker Technologies is one of the leading providers of genome assembly services, with more than 10 years of experience in genome sequencing. We are equipped with the latest third-generation sequencing platforms, including all versions of PacBio and Nanopore platforms, and have extensive genome assembly experience on these platforms. We have broad experience with complex and polyploid genomes. Combining TGS genome sequencing with our outstanding Hi-C based genome assembly service yields assemblies of much higher quality than TGS data alone. Biomarker Technologies aims to provide the best and most cutting-edge genomic services. We hold more than 30 experimental patents and 150 bioinformatic software copyrights.

Service Performance

PacBio platforms

Species | Genome size | Contig N50
Animal | 1.2 Gb | 8.1 Mb
Animal | 1.06 Gb | 6.15 Mb
Animal | 384 Mb | 5.79 Mb
Animal | 1.05 Gb | 5.22 Mb
Animal | 1.06 Gb | 4.17 Mb
Animal | 695.77 Mb | 3.49 Mb
Animal | 1.2 Gb | 3.2 Mb
Animal | 559.30 Mb | 3.09 Mb
Plant | 429.31 Mb | 19.8 Mb
Plant | 393.35 Mb | 14.80 Mb
Plant | 277.06 Mb | 8.75 Mb
Plant | 396.00 Mb | 7.04 Mb
Plant | 530.78 Mb | 6.9 Mb
Plant | 393.9 Mb | 2.93 Mb
Plant | 382.9 Mb | 2.78 Mb
Plant | 823 Mb | 2.76 Mb
Plant | 2.22 Gb | 2.3 Mb

Nanopore platforms

Species | Genome size | Contig N50
Animal | 1.15 Gb | 13.17 Mb
Animal | 475 Mb | 22.46 Mb
Animal | 998 Mb | 11.04 Mb
Animal | 710 Mb | 9.67 Mb
Animal | 510 Mb | 9.56 Mb
Animal | 988 Mb | 9.22 Mb
Animal | 249 Mb | 7.64 Mb
Animal | 338 Mb | 6.79 Mb
Plant | 2.5 Gb | 11.16 Mb
Plant | 2.3 Gb | 17.97 Mb
Plant | 715 Mb | 7.08 Mb
Plant | 806 Mb | 12.23 Mb
Plant | 1.5 Gb | 11.58 Mb
Plant | 368 Mb | 10.75 Mb
Plant | 640 Mb | 14.51 Mb
Plant | 800 Mb | 27.64 Mb
Plant | 780 Mb | 25.29 Mb
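Contig N50, the continuity metric reported in the tables above, is the length of the contig at which the cumulative length, summed from the longest contig to the shortest, first reaches half of the total assembly length. A minimal Python sketch of that definition (the function name `n50` is our own):

```python
def n50(lengths: list[int]) -> int:
    """Return the contig length at which the cumulative length, summed from
    longest to shortest, first reaches half of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least half of the assembly
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For contig lengths 2 through 10 (total 54), the cumulative sum 10 + 9 + 8 = 27 first reaches half the total, so `n50` returns 8. Because it is weighted by length, N50 rises sharply when long reads join many short contigs into a few long ones.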

Successful cases

Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits

Journal: Nature Genetics
IF: 27.125
Published: May 2018

Background

Cotton is one of the world’s most important commercial crops and a valuable resource for studying plant polyploidization. The ancestors of Gossypium arboreum and Gossypium herbaceum provided the A subgenome of modern cultivated allotetraploid cotton. In this study, the G. arboreum genome assembly was upgraded by combining PacBio sequencing with Hi-C based genome assembly. A total of 243 G. arboreum and G. herbaceum accessions were resequenced to study population structure and genetic differentiation, and several genes with potential functions in yield were identified.

Results

1. G. arboreum genome assembly
A total of 145.54 Gb of sequencing data was generated on the PacBio platform and assembled into a 1.71 Gb G. arboreum genome with a contig N50 of 1.1 Mb. The longest contig in the assembly was 12.37 Mb. Using Hi-C data, 1,573 Mb of sequence was anchored to chromosomes. Compared with the previously published genome, the new assembly showed significantly fewer incongruities outside the expected diagonal of the Hi-C interaction heatmap.
Hi-C interaction heatmap of two G. arboreum genome versions
2. Population genetic analysis of diploid cotton

A total of 230 G. arboreum and 13 G. herbaceum accessions were resequenced. The data were used for read mapping (against the updated genome), phylogenetic tree construction, population structure analysis, principal component analysis (PCA), linkage disequilibrium (LD) analysis, etc. G. herbaceum and G. arboreum formed two independent clades after diverging from G. raimondii. The G. arboreum clade could be divided into South China (SC), Yangtze River (YZR), and Yellow River (YER) groups with strong geographical distribution patterns. The two species were independently domesticated from different wild progenitors.
Diploid cotton population evolution and population structure analysis
3. G. arboreum GWAS analysis

GWAS was performed on 11 important traits measured in varied environments, and 98 significant association signals were identified. Among them, a non-synonymous substitution at the GaKASIII locus was found to affect the content of palmitic acid (C16:0) and palmitoleic acid (C16:1) in seeds. Activation of GaGSTE9 expression was found to be related to Fusarium wilt (FOV) resistance.
GWAS and QTL analysis on G. arboreum genomes
Conclusion

With the help of PacBio and Hi-C technologies, an updated G. arboreum genome was assembled, improving the contig N50 from 72 kb to 1.1 Mb. Based on the new genome and population genetic studies, G. arboreum and G. herbaceum were found to be equally diverged from Gossypium raimondii. Independent analysis suggested that Chinese G. arboreum originated in South China and was subsequently introduced to the Yangtze and Yellow River regions. GWAS and QTL analyses identified significantly associated genes related to important traits, such as fatty acid content and FOV resistance.