FANSe3 Features

Extreme speed

FANSe3 is optimized for the dual ring bus architecture of Intel Xeon E5-V2/V3/V4 CPUs and the mesh architecture of Intel Xeon Scalable / AMD Epyc CPUs. It is designed for many-core environments and achieves very high parallelization efficiency. As a result, hard-disk I/O is usually the bottleneck, especially in applications other than WGS.

Mapping speed: 1 hour for 30x human whole-genome sequencing (WGS), 1 minute for 50x human whole-exome sequencing (WES), and 1-2 seconds for human transcriptome sequencing. For details, see the paper on "transcriptome sequencing in seconds" (Liu et al., Nucleic Acids Research 2017).
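To put the WGS figure in perspective, the implied throughput can be estimated with simple arithmetic. The sketch below assumes a human genome of roughly 3.1 billion bases; that size and the function name are assumptions made for illustration and do not come from the FANSe3 documentation.

```python
# Rough throughput implied by "30x human WGS in 1 hour".
# ASSUMPTION: haploid human genome of ~3.1e9 bases (not stated in the text).
HUMAN_GENOME_BASES = 3.1e9


def bases_per_second(coverage: float, target_bases: float, seconds: float) -> float:
    """Return the number of sequenced bases mapped per second."""
    return coverage * target_bases / seconds


wgs_rate = bases_per_second(30, HUMAN_GENOME_BASES, 3600)
print(f"30x WGS in 1 h: about {wgs_rate / 1e6:.1f} million bases per second")
```

This counts sequenced bases only; it says nothing about read length, disk I/O, or the hardware used.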

FANSe3 inherits the extremely high accuracy and robustness of FANSe2 and is 30-1200x faster than FANSe2 in most cases. FANSe3 also includes many other features, e.g. efficient indel support, unique mapping, efficient support for unmasked reference genomes, and direct read trimming.

 

Accuracy and experimental verifiability

FANSe3 inherits the extreme accuracy of its predecessors. It is the only mapping algorithm with a mathematical estimate of its accuracy (see the FANSe1 and FANSe2 papers for the mathematical derivation). In most cases, the miss rate of FANSe3 is robustly as low as 1e-6. In most clinical cases (e.g. SNV detection), the miss rate is mathematically zero (for the boundary conditions, see the FANSe2 paper).

Below is a brief list of publications demonstrating the extremely high accuracy and experimental verifiability of the FANSe series of algorithms in various applications, in which they clearly outperformed other algorithms:

  • DNA sequencing: SNV calling and genome correction. 1994 positions verified, with no false positives and no false negatives. (Wu et al., J Proteome Res 2014)
  • mRNA expression: FANSe2 100% verified vs. Bowtie2 0% verified. (Xiao et al., PLoS ONE 2014)
  • Differential gene expression: FANSe3 all verified. (Li et al., Scientific Reports 2017)
  • Splice junction detection: FANSe2splice 80-100% verified vs. TopHat2/Mapsplice2/HISAT2/STAR 0-40% verified. (Mai et al., Scientific Reports 2017)
  • Owing to this experimental verifiability, FANSe2 has been used extensively in the transcriptome and translatome analyses of the Chromosome-centric Human Proteome Project to provide translational evidence.

     

    Versatility

    FANSe3 accepts multiple input formats, such as FASTQ, FQC (compressed FASTQ, a proprietary format of Chi-Biotech), FASTA, and plain text with one read per line. FANSe3 works best with reads 50-500 nt long, but it also accepts shorter (at least 14 nt) and longer reads. It can map reads from all sequencers, including Illumina, Ion Torrent, BGISEQ/MGISEQ, Helicos, PacBio, etc.
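To illustrate what accepting several text formats involves, here is a minimal sketch that guesses the format from the first non-empty line and drops reads shorter than 14 nt. It is not FANSe3's parser: the function names are hypothetical, FASTQ records are assumed to be unwrapped (exactly four lines each), the whole input is read into memory, and the proprietary FQC format is not handled.

```python
import io
from typing import Iterator, List, TextIO

MIN_READ_LENGTH = 14  # shortest read length the text says is accepted


def iter_reads(handle: TextIO) -> Iterator[str]:
    """Yield read sequences from FASTQ, FASTA, or one-read-per-line text.

    The format is guessed from the first non-empty line:
    '@' means FASTQ, '>' means FASTA, anything else means one read per line.
    """
    lines = [line.strip() for line in handle if line.strip()]
    if not lines:
        return
    if lines[0].startswith("@"):
        # FASTQ: the sequence is the second line of every 4-line record.
        yield from lines[1::4]
    elif lines[0].startswith(">"):
        # FASTA: join sequence lines until the next header line.
        seq: List[str] = []
        for line in lines[1:]:
            if line.startswith(">"):
                yield "".join(seq)
                seq = []
            else:
                seq.append(line)
        yield "".join(seq)
    else:
        yield from lines


def usable_reads(handle: TextIO) -> List[str]:
    """Return only the reads that meet the minimum length."""
    return [read for read in iter_reads(handle) if len(read) >= MIN_READ_LENGTH]


fastq = io.StringIO("@r1\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n@r2\nACGT\n+\nIIII\n")
print(usable_reads(fastq))  # the 4-nt read r2 is filtered out
```

A production parser would stream the input instead of loading it and would validate record structure; the sketch only shows the dispatch-by-first-character idea.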

    FANSe3 doesn't use any hardware-acceleration features such as SSE2/3 or AVX, and it does not depend on specialized hardware such as GPUs or FPGAs. Nevertheless, FANSe3 is already faster than any GPU/FPGA implementation of BWA-based systems, and much more accurate (see "Accuracy and experimental verifiability"). This makes it deployable on virtually any system, which is a great advantage, especially in national supercomputing centers. For example, we collaborate with the TianHe-2 supercomputer (Guangzhou, China), which could theoretically analyze WES data for the entire Chinese population of 1.3 billion within half a month.
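The scale of this population-wide scenario can be checked with a short calculation. The sketch below combines the 1.3 billion figure with the 1 minute per 50x WES quoted on this page and asks how many mappings would have to run concurrently. Taking "half a month" as 15 days is an assumption, the function name is hypothetical, and only mapping time is counted (no I/O, scheduling, or downstream analysis).

```python
import math

POPULATION = 1_300_000_000   # exomes to analyze, from the text
MINUTES_PER_EXOME = 1        # 50x WES mapping time quoted on this page
DEADLINE_DAYS = 15           # ASSUMPTION: "half a month" taken as 15 days


def concurrent_jobs_needed(samples: int, minutes_per_sample: float, deadline_days: float) -> int:
    """Return how many mappings must run in parallel to meet the deadline."""
    available_minutes = deadline_days * 24 * 60
    return math.ceil(samples * minutes_per_sample / available_minutes)


jobs = concurrent_jobs_needed(POPULATION, MINUTES_PER_EXOME, DEADLINE_DAYS)
print(f"Concurrent mappings needed: {jobs}")
```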

     

    Error tolerance

    Like FANSe2, FANSe3 fully supports error rates of up to 12%. To support higher error rates (e.g. for genome correction), an iterative correction strategy can be used to tolerate up to ~25% deviation from the reference sequence (Wu et al., J Proteome Res 2014).
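As a concrete reading of the 12% figure, the sketch below checks whether a read stays within a given error rate relative to an equally long reference window. It is a simplified illustration, not the FANSe3 algorithm: only substitutions are counted (indels are ignored), and the function names are hypothetical.

```python
def mismatch_rate(read: str, ref_window: str) -> float:
    """Fraction of positions where the read differs from the reference window."""
    if not read or len(read) != len(ref_window):
        raise ValueError("read and reference window must be non-empty and equal in length")
    mismatches = sum(a != b for a, b in zip(read, ref_window))
    return mismatches / len(read)


def within_tolerance(read: str, ref_window: str, max_error: float = 0.12) -> bool:
    """True if the read's mismatch rate does not exceed max_error."""
    return mismatch_rate(read, ref_window) <= max_error


reference = "A" * 50
read_ok = "C" * 6 + "A" * 44       # 6 of 50 positions differ (12%)
read_too_far = "C" * 7 + "A" * 43  # 7 of 50 positions differ (14%)
print(within_tolerance(read_ok, reference), within_tolerance(read_too_far, reference))
```

For a 50-nt read, a 12% limit corresponds to at most 6 substitutions; for a 100-nt read, at most 12.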

    FANSe3 maintains high accuracy and high error tolerance simultaneously, which is unique among mapping algorithms. This is a decisive advantage for very complex systems, such as the de novo meta-transcriptome analysis of the symbiotic roots of Eichhornia crassipes (Luo et al., Scientific Reports 2015).

    Another unique feature of FANSe3 is its built-in compensation for sequencer errors. Owing to the flowcell design and signal readout mechanisms of some sequencers (e.g. Illumina, BGISEQ/MGISEQ), certain parts of reads are of considerably lower quality. FANSe3 can automatically compensate for these artifacts and use the other parts of the reads to map them. This remarkably "decontaminates" the mapping results and facilitates accurate SNV detection. This feature is available only in the commercial version.

     

    RNA-seq optimization

    The full (commercial) version of FANSe3 integrates a unique RNA-seq optimization. The program automatically recognizes an RNA-seq application and activates the transcription mode. In this mode, the algorithm is specifically tuned to achieve up to a 20x speed-up. In addition, when an annotation is provided, FANSe3 performs RNA quantification directly without exporting bulky mapping results, which is sufficient for >95% of RNA-seq applications.
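The idea of quantifying directly, without keeping alignments, can be sketched as a streaming count per gene followed by normalization. The example below uses RPKM as the unit; the choice of RPKM, the function name, and the input layout are assumptions for illustration and do not describe FANSe3's actual output.

```python
from collections import Counter
from typing import Dict, Iterable


def quantify_rpkm(read_gene_hits: Iterable[str], gene_lengths: Dict[str, int]) -> Dict[str, float]:
    """Return RPKM per gene from a stream of per-read gene assignments.

    read_gene_hits: one gene ID per mapped read (alignments are never stored)
    gene_lengths:   gene ID -> transcript length in nt, taken from the annotation
    """
    counts = Counter(read_gene_hits)  # only per-gene counts are kept in memory
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        return {}
    # RPKM = reads * 1e9 / (gene length in nt * total mapped reads)
    return {
        gene: count * 1e9 / (gene_lengths[gene] * total_mapped)
        for gene, count in counts.items()
    }


hits = ["G1", "G1", "G1", "G2"]          # hypothetical gene assignments
lengths = {"G1": 1000, "G2": 2000}       # hypothetical annotation
print(quantify_rpkm(hits, lengths))
```

Because only a counter per gene is retained, memory use scales with the number of genes rather than the number of reads, which is the point of skipping the bulky mapping output.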