Pan-genome Sequencing

Differential analysis within species based on unique gene sequences

Service Overview


A pan-genome is the complete set of genes found across all strains of a species. Unlike a single genome used as the representative of a species, a pan-genome presents the genetic information of the species more comprehensively, which supports research on sub-species and variants with highly distinct features. Accumulating evidence shows that differences in unique fragments between sub-species are more likely to be related to key features than differences in common fragments. Pan-genome sequencing enables analysis of both the core genome and the accessory genome, as well as within-species differential analysis based on unique sequences.
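The core/accessory/unique distinction described above can be illustrated with a minimal sketch. It assumes gene families have already been clustered and that a presence table (gene family to the set of strains carrying it) is available; the function name, family IDs and strain names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def classify_gene_families(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each gene family as 'core', 'accessory' or 'unique'.

    presence maps a gene family ID to the set of strains that carry it.
    Families with no recorded strain are skipped.
    """
    labels = {}
    for family, strains in presence.items():
        count = len(strains)
        if count == 0:
            continue  # nothing to classify
        if count == n_strains:
            labels[family] = "core"        # present in every strain
        elif count == 1:
            labels[family] = "unique"      # present in exactly one strain
        else:
            labels[family] = "accessory"   # present in some, but not all, strains
    return labels


# Hypothetical three-strain example
presence = {
    "fam1": {"strainA", "strainB", "strainC"},
    "fam2": {"strainA", "strainB"},
    "fam3": {"strainC"},
}
labels = classify_gene_families(presence, n_strains=3)
print(labels)
```

Unique families are reported separately here because the text singles out unique sequences; in many analyses they are counted as part of the accessory genome.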

Pan-genome Assembly


There are three main methods for assembling a pan-genome: map-to-pan, iterative assembly and de novo assembly. De novo assembly is the most typical method used in pan-genome and reference genome construction. Each individual is assembled de novo and annotated separately. The assembled genomes are then mapped against the reference genome to identify unmatched regions or genes, which constitute the accessory genome of that individual.
Figure. Pan-genome construction methods (Golicz et al., 2016)

Bioinformatic Analysis

Pan-genome research

Genome assembly: Pan-genome construction, multi-software assembly, evaluation of the assembled genome
Gene prediction and annotation: Gene coding region prediction, repetitive region annotation and transposon classification, lncRNA annotation, pseudogene annotation
Hi-C based genome assembly: Valid interaction evaluation, contig clustering, ordering and orienting, evaluation of contig anchoring

Biological analysis

Basic analysis: Core genome and accessory genome analysis; SV, PAV and CNV in the accessory genome; gene family clustering; phylogenetic tree construction
Advanced analysis: Population genetics (GWAS, QTL mapping)
Database construction

Results demo


Project Workflow

Sample delivery
Library construction and sequencing
Data analysis
Final report
After-sales technical support

Pan-genome sequencing with Biomarker Technologies

Leading provider of genomic sequencing services
We provide over 40 comprehensive high-throughput sequencing and bioinformatic services and have contributed to thousands of high-impact publications.
Highly-skilled technical team.
Extensive experience in genome construction across a thousand species has been accumulated over the past 11 years.
Pan-genome sequencing supported by cutting-edge platforms.
Biomarker Technologies is equipped with the latest third-generation sequencing platforms, including all types of PacBio platforms and Nanopore platforms.
Joint analysis with Hi-C sequencing
Joint analysis with Hi-C data enables more accurate identification of structural variations. Biomarker Technologies provides high-quality Hi-C services with a high valid data ratio and a high anchoring rate.
Advanced analysis in QTLs mapping
Biomarker Technologies is an expert in QTL mapping.
Innovative R&D group
We own over 30 self-developed patents and 150 software copyrights.

Case Study

Title: Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense

Journal: Nature Genetics
Impact Factor: 27.125
Published: Dec. 2018

Background:

Cotton represents the largest source of natural textile fibers in the world. Over 90% of annual fiber production comes from allotetraploid cotton (G. hirsutum and G. barbadense). G. hirsutum is cultivated all over the world because of its high yield and G. barbadense is prized for its superior fiber quality. To cultivate G. hirsutum that produces longer, finer and stronger fibers, one approach is to introduce the superior fiber traits from G. barbadense into G. hirsutum. A genomics-enabled breeding strategy requires a detailed and robust understanding of genomic organization.

Results:

1. G. hirsutum and G. barbadense genome assembly
The allotetraploid genomes of G. hirsutum and G. barbadense were assembled by jointly analyzing third-generation sequencing data (PacBio RSII), optical maps (BioNano Genomics Irys) and Hi-C data. Genome assembly and annotation are summarized in the following table. The contig N50 of G. hirsutum is 1.89 Mb and that of G. barbadense is 2.15 Mb. The Hi-C based contig anchoring ratios are 98.94% and 97.68%, respectively.
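The contig statistic quoted above is a length in Mb, which corresponds to the conventional N50 metric: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly. L50 conventionally denotes the number of contigs needed to reach that point. A minimal sketch of both, using made-up contig lengths rather than the cotton data:

```python
from typing import Iterable, Tuple


def contig_n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    N50: length of the contig at which the running total, taken from the
         longest contig downward, first reaches half of the assembly size.
    L50: number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for index, length in enumerate(ordered, start=1):
        running += length
        if running * 2 >= total:
            return length, index
    raise AssertionError("unreachable for non-empty input")


# Illustrative lengths only (arbitrary units)
example_lengths = [100, 80, 60, 40, 20]
n50, l50 = contig_n50_l50(example_lengths)
print(n50, l50)
```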

2. Chromosome features of G. hirsutum and G. barbadense genomes
Sequence and structural variations, including SNPs, indels, SVs, PAVs and others, were analyzed between the G. hirsutum and G. barbadense genomes (Fig. 1). The predicted SNPs and indels are expected to have large functional effects in G. hirsutum and G. barbadense. Moreover, 4,093 genes were identified as being under positive selection (Ka/Ks > 1). Hi-C based chromosome-level structural variation analysis revealed an inversion 170.2 Mb in length between G. hirsutum and G. barbadense.
Figure 1. Chromosomal features of G. hirsutum and G. barbadense genomes with integration of genetics and epigenetics data
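The Ka/Ks > 1 criterion mentioned above can be sketched as a simple filter. This assumes that per-gene Ka (non-synonymous) and Ks (synonymous) substitution rates have already been estimated by a separate tool; the function name and gene IDs are hypothetical, and the values are illustrative, not taken from the study.

```python
from typing import Dict, List, Tuple


def genes_under_positive_selection(
    rates: Dict[str, Tuple[float, float]], threshold: float = 1.0
) -> List[str]:
    """Return gene IDs whose Ka/Ks ratio exceeds the threshold.

    rates maps a gene ID to (Ka, Ks). Genes with Ks <= 0 are skipped
    because the ratio is undefined for them.
    """
    selected = []
    for gene, (ka, ks) in rates.items():
        if ks <= 0:
            continue  # ratio undefined
        if ka / ks > threshold:
            selected.append(gene)
    return sorted(selected)


# Hypothetical rates for three genes
example_rates = {
    "geneA": (0.30, 0.10),  # Ka/Ks = 3.0
    "geneB": (0.05, 0.20),  # Ka/Ks = 0.25
    "geneC": (0.10, 0.0),   # undefined, skipped
}
selected_genes = genes_under_positive_selection(example_rates)
print(selected_genes)
```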

3. Introgression line population construction and QTL mapping
To introduce favorable variants that control important agronomic traits, such as fiber quality, from G. barbadense into G. hirsutum, an introgression line population was constructed. A total of 168 introgression lines, determined by molecular markers, were sequenced, and 466 introgression segments covering 26 chromosomes were identified (Fig. 2). The genetic variant underlying the fuzz-less mutant in G. hirsutum was found to co-localize with the quantitative trait locus (QTL) in G. barbadense. Characterization of this introgression segment, together with natural fiber mutants, will facilitate a comparative analysis of the mechanism of fuzz fiber initiation between G. barbadense and G. hirsutum.
Figure 2. Introgression line population construction

Conclusion:

By assembly of reference genomes for two cultivated cotton accessions, we have been able to identify extensive variations. These variations should be integrated with those from genome analyses of other accessions to fully exploit genome divergence between the two species in the future. We explored beneficial genome sequences underlying superior fiber quality between two representative accessions of species by constructing introgression lines that can be used for the cultivation of desirable traits in intensive cotton breeding.