Genome Survey Sequence

Next-generation Sequencing

Service Overview

Genome survey, as the name indicates, is a rapid characterization of a genome in which a small-fragment library of limited size is sequenced at low depth. With the help of K-mer analysis, a genome survey provides information such as genome size, heterozygosity, and repeat content, which is crucial for determining the sequencing strategy for whole-genome de novo sequencing.

Principle of Genome Survey

Genome survey is achieved by analyzing K-mer frequency. A K-mer is a subsequence of length K (e.g., a 17-mer contains 17 bases). The occurrences of each K-mer are counted across all reads, which yields a curve of the number of K-mers against K-mer depth. Normally, the main peak of the curve marks the depth at which most K-mers are sequenced, which can be taken as the sequencing depth of the genome. Therefore, estimated genome size = total number of K-mers / depth at the main peak. Because of heterozygous loci and repetitive sequences, the K-mer frequency curve does not always follow a Poisson distribution perfectly, and other peaks may appear around the main peak. A peak at half of the main depth indicates heterozygosity, and peaks at multiples of the main depth indicate repetitive sequences.
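The estimate described above can be illustrated with a minimal Python sketch. It is not the pipeline used for this service: the function names and the min_depth cutoff (used here to set aside very low-depth K-mers, which are often sequencing errors) are assumptions made for illustration, and the sketch ignores reverse-complement strands.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count how many times each K-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map each depth to the number of distinct K-mers seen at that depth."""
    return Counter(counts.values())


def main_peak_depth(spectrum, min_depth=2):
    """Depth with the most distinct K-mers, ignoring depths below min_depth."""
    candidates = {d: n for d, n in spectrum.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no K-mers at or above min_depth")
    return max(candidates, key=candidates.get)


def estimate_genome_size(spectrum, min_depth=2):
    """Estimated size = total number of K-mers / depth at the main peak."""
    peak = main_peak_depth(spectrum, min_depth)
    total_kmers = sum(d * n for d, n in spectrum.items() if d >= min_depth)
    return total_kmers / peak


# Hypothetical spectrum: 500 likely-error K-mers at depth 1, a main peak at
# depth 20, and a small repeat peak at twice the main depth.
example = Counter({1: 500, 20: 1000, 40: 50})
print(estimate_genome_size(example))  # (20*1000 + 40*50) / 20 = 1100.0
```

On real data, the spectrum would come from kmer_spectrum(count_kmers(reads)), and the peak would usually be located on a smoothed curve rather than by a plain maximum.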

Bioinformatic Analysis

Results Content

Initial characterization of genome by genome survey

  • Size
  • Heterozygosity
  • Repetitive ratio
  • GC content

Applications of Genome Survey

Providing a basic characterization of a genome and estimating the difficulty of genome assembly.

Guiding design of library construction and sequencing strategy of large-scale de novo genome sequencing.

Revealing differences in genome between related species.

Genome Survey with Biomarker Technologies


1. What is de novo genome sequencing?
Answer: De novo genome sequencing refers to sequencing a novel genome without a reference. It enables construction of genomes for novel species and updating of existing reference genomes. The whole process includes DNA library construction, sequencing, read assembly, and annotation with bioinformatic tools.
2. What are the advantages of a TGS-based genome over an NGS-based genome?
Answer: Third-generation sequencing (TGS) is characterized by long reads with an average length of 10-15 kb, whereas NGS produces paired-end reads of 125-250 bp (PE125 to PE250). Assembling NGS reads can therefore be problematic, especially in repetitive sequences and heterozygous regions. Because long reads can span these complicated regions, TGS data largely improves the quality of genome assembly.
3. TGS is also known for its higher error rate. Is it still suitable for genome sequencing?
Answer: The known error rate refers to errors in base calling, which can be corrected by increasing sequencing depth. Data with 30x coverage can achieve above 99.99% single-base accuracy. Therefore, TGS data is fully suitable for genome assembly.
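The effect of depth on accuracy can be illustrated with a simplified majority-vote model in Python. This is only a sketch under stated assumptions, not the correction method used by any particular assembler: base-calling errors are assumed to be independent between reads, all erroneous reads are pessimistically assumed to agree on the same wrong base, ties count as wrong calls, and the 15% per-read error rate is an illustrative value rather than a figure from this page.

```python
from math import comb


def consensus_accuracy(depth, per_read_error):
    """Probability that a majority vote over `depth` reads calls the correct base.

    Assumes independent errors; a call is counted as wrong when half or
    more of the reads covering the position carry an error.
    """
    p_wrong = sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(depth + 1)
        if 2 * k >= depth  # half or more of the reads are wrong
    )
    return 1.0 - p_wrong


# Illustrative per-read error rate (assumption, not a platform specification).
for depth in (1, 5, 10, 30):
    print(depth, consensus_accuracy(depth, 0.15))
```

Under these assumptions, accuracy at a position rises quickly with depth, and at 30x the modeled single-base accuracy exceeds 99.99%.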
4. How to choose samples for genome sequencing?
Answer: Samples for genome sequencing should be taken from the same organism used for the genome survey. For plants, contamination-free bud cultures or fresh leaves are recommended. For animals, whole blood and viscera are recommended.