Genome Survey Sequence
Next-generation Sequencing
Service Overview
Genome survey, as the name indicates, is a rapid characterization of a genome in which a small-fragment library of limited size is sequenced at low depth. With the help of K-mer analysis, a genome survey provides information such as genome size, heterozygosity, and repeat content, all of which are crucial for deciding the sequencing strategy for whole-genome de novo sequencing.
Principle of Genome Survey
Genome survey is achieved by analyzing K-mer frequency. A K-mer is a subsequence of length K (e.g. a 17-mer contains 17 bases). The occurrences of each K-mer are counted across all reads, which yields a curve showing how many K-mers are observed at each depth. Normally, the main peak in the curve marks the depth at which most K-mers are sequenced, and this depth can be taken as the sequencing depth of the genome. Therefore, estimated genome size = total number of K-mers / depth at the main peak.
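The counting step and the genome size formula can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the function names and the `min_depth` cutoff (used here to ignore very low-depth K-mers, which are assumed to be mostly sequencing errors) are illustrative assumptions, and a real survey would use a dedicated K-mer counter on both strands.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map each depth to the number of distinct K-mers seen at that depth."""
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Apply: genome size = total number of K-mers / depth at the main peak.

    min_depth is an assumed cutoff that drops low-depth K-mers,
    which are treated here as likely sequencing errors.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    peak_depth = max(usable, key=usable.get)             # depth with most K-mers
    total_kmers = sum(d * n for d, n in usable.items())  # K-mer occurrences
    return total_kmers / peak_depth, peak_depth
```

For example, `depth_histogram(count_kmers(reads, 17))` gives the histogram for a list of read strings, and `estimate_genome_size` then returns the estimated size together with the main peak depth.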
Because of heterozygous loci and repetitive sequences, the K-mer frequency curve does not always follow a Poisson distribution perfectly. Other peaks may appear around the main peak. A peak at half of the main depth indicates heterozygosity, and peaks at multiples of the main depth indicate repetitive sequences.
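The peak interpretation described above can be sketched as a small classifier over a depth histogram (a mapping from depth to number of K-mers). This is a simplified sketch under stated assumptions: the function names and the `tolerance` parameter are hypothetical, the tallest peak is assumed to be the main peak (which may not hold for highly heterozygous genomes), and real histograms are noisy enough that smoothing would normally be needed before peak detection.

```python
def find_peaks(histogram, min_depth=2):
    """Return depths that are strict local maxima of the histogram."""
    peaks = []
    for d in range(min_depth, max(histogram) + 1):
        here = histogram.get(d, 0)
        if here > histogram.get(d - 1, 0) and here > histogram.get(d + 1, 0):
            peaks.append(d)
    return peaks


def classify_peaks(histogram, tolerance=0.15, min_depth=2):
    """Label each peak by its depth relative to the main (tallest) peak.

    tolerance is an assumed allowance on the depth ratio, not a standard value.
    """
    peaks = find_peaks(histogram, min_depth)
    main = max(peaks, key=lambda d: histogram[d])
    labels = {}
    for d in peaks:
        ratio = d / main
        if d == main:
            labels[d] = "main"
        elif abs(ratio - 0.5) <= tolerance:
            labels[d] = "heterozygous"   # about half of the main depth
        elif ratio >= 2 - tolerance and abs(ratio - round(ratio)) <= tolerance:
            labels[d] = "repeat"         # about an integer multiple of the main depth
        else:
            labels[d] = "unclassified"
    return labels
```

With a toy histogram that has peaks at depths 10, 20 and 40, where 20 is the tallest, the peak at 10 is labelled as heterozygous and the peak at 40 as repeat.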
Initial characterization of genome by genome survey
- Repetitive ratio
- GC content
Application of Genome survey
Providing a basic characterization of a genome and estimating the difficulty of genome assembly.
Guiding the design of library construction and the sequencing strategy for large-scale de novo genome sequencing.
Revealing genomic differences between related species.
Genome Survey with Biomarker Technologies
Clear K-mer frequency statistics. Accurate estimation of genome size, heterozygosity, repetitive ratio, etc.
Over 1,000 genome surveys completed. Accumulated experience with over 300 species, covering forest, marine organisms, animals, plants, etc.
Contributed to many high-impact genome publications.
1. What is de novo genome sequencing?
Answer: De novo genome sequencing refers to sequencing a novel genome without a reference. It enables the construction of genomes for novel species and the updating of existing reference genomes. The whole process includes DNA library construction, sequencing, read assembly, and annotation with bioinformatic tools.
2. What are the advantages of a TGS-based genome over an NGS-based genome?
Answer: Third-generation sequencing (TGS) is characterized by long reads with an average length of 10-15 kb, whereas NGS produces paired-end reads of 125-250 bp (PE125 to PE250). Assembling NGS reads can therefore be problematic, especially in repetitive sequences and heterozygous regions. Because long reads can span these complicated regions, TGS data largely improves the quality of genome assembly.
3. TGS is also known for its higher error rate. Is it still suitable for genome sequencing?
Answer: The known error rate refers to errors in base calling, which can be corrected by increasing sequencing depth. Data with 30x coverage can achieve above 99.99% single-base accuracy. Therefore, TGS data is completely suitable for genome assembly.
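A rough way to see why depth corrects base-calling errors is a majority-vote model, sketched below. This is an illustrative assumption, not the error-correction method used in practice: errors are assumed to be random and independent between reads, the function name is hypothetical, and the 15% per-read error rate in the usage note is an illustrative figure that does not come from this page. The bound is deliberately pessimistic, since it treats the consensus as wrong whenever at least half of the reads are wrong.

```python
from math import comb


def consensus_error_bound(depth, per_read_error):
    """Upper bound on the majority-vote error at one position.

    Assumes each read errs independently with probability per_read_error
    and counts the consensus as wrong if at least half of the reads are wrong.
    """
    threshold = (depth + 1) // 2  # smallest count that is at least half of depth
    p, q = per_read_error, 1.0 - per_read_error
    return sum(
        comb(depth, k) * p**k * q**(depth - k)
        for k in range(threshold, depth + 1)
    )
```

Under these assumptions, `consensus_error_bound(30, 0.15)` falls below 0.0001, while `consensus_error_bound(10, 0.15)` is far larger, which illustrates how added depth drives the consensus error down. Systematic (non-random) errors are not captured by this model.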
4. How to choose samples for genome sequencing?
Answer: Samples for genome sequencing should be taken from the same organism that was used for the genome survey. For plants, bud cultures or fresh leaves without contamination are recommended. For animals, whole blood and viscera are recommended.