FANSe2

What is FANSe2?

FANSe2 is a mapping algorithm that can map a billion reads within hours with extremely high and robust accuracy. FANSe2 retains the near-perfect and robust accuracy of its predecessor FANSe while making substantial strategic and technical improvements. These improvements increase the running speed by more than 10x, which makes FANSe2 suitable for large reference genome sequences.

Cite FANSe2:

FANSe2: A Robust and Cost-Efficient Alignment Tool for Quantitative Next-Generation Sequencing Applications.
Chuan-Le Xiao, Zhi-Biao Mai, Xin-Lei Lian, Jia-Yong Zhong, Jing-jie Jin, Qing-Yu He *, Gong Zhang *
PLoS One. 2014 Apr 17;9(4):e94250
DOI: 10.1371/journal.pone.0094250

What is the advantage of FANSe2?

Accuracy
FANSe2 achieves the same stable and extremely high sensitivity as FANSe: it loses only a fraction of 10^-6 of the mappable reads, which is an advantage especially when mapping fragments generated by RNA-seq. It is also, to date, the only such algorithm with a theoretical proof of its extremely high accuracy, which demonstrates its robustness. In all of our tested cases, FANSe2 mapped more reads than any other short-read mapping algorithm; for example, FANSe2 in 14-seed mode maps more reads than Bowtie2.
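To put the 10^-6 loss rate into perspective, the short calculation below converts it into an expected number of missed reads for a few run sizes. The function name is illustrative only; the only input taken from the text is the 10^-6 rate itself.

```python
# Loss rate quoted in the text: fraction of mappable reads left unmapped.
LOSS_RATE = 1e-6


def expected_missed(mappable_reads: int, loss_rate: float = LOSS_RATE) -> float:
    """Return the expected number of mappable reads that go unmapped."""
    return mappable_reads * loss_rate


for n in (10_000_000, 100_000_000, 1_000_000_000):
    print(f"{n:>13,} mappable reads -> ~{expected_missed(n):,.0f} missed")
```

Even for a run of one billion mappable reads, this loss rate corresponds to about a thousand missed reads.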

Experimental verifiability
FANSe2 results have been validated experimentally, whereas other mapping algorithms generate many false positives and false negatives. This has been demonstrated for both mRNA identification and mutation identification. Please refer to the following publications:

FANSe2: A Robust and Cost-Efficient Alignment Tool for Quantitative Next-Generation Sequencing Applications. PLoS One. 2014 Apr 17;9(4):e94250
Iterative Genome Correction Largely Improves Proteomic Analysis of Nonmodel Organisms. J Proteome Res. 2014 May 19.

Because its results are experimentally verifiable, FANSe2 has been used extensively in the transcriptome and translatome analyses of the Chromosome-centric Human Proteome Project (C-HPP) to provide translational evidence:

Resolving chromosome-centric human proteome with translating mRNA analysis: a strategic demonstration. J Proteome Res. 2014 Jan 3;13(1):50-9.
Systematic analyses of the transcriptome, translatome, and proteome provide a global view and potential strategy for the C-HPP. J Proteome Res. 2014 Jan 3;13(1):38-49.
Systematic analysis of missing proteins provides clues to help define all of the protein-coding genes on human chromosome 1. J Proteome Res. 2014 Jan 3;13(1):114-25.
Chromosome-8-coded proteome of Chinese Chromosome Proteome Data set (CCPD) 2.0 with partial immunohistochemical verifications. J Proteome Res. 2014 Jan 3;13(1):126-36.
Omics evidence: single nucleotide variants transmissions on chromosome 20 in liver cancer cell lines. J Proteome Res. 2014 Jan 3;13(1):200-11.

Sensitivity to indels
FANSe2 is fully sensitive to indels (insertions and deletions) thanks to its accelerated Smith-Waterman (SW) refinement. SHRiMP and Bowtie2 require the CPU to support the SSE2/SSE3 instructions. In contrast, the accelerated SW refinement in FANSe2 is hardware-independent.
Compared with Bowtie2 in “very-sensitive mode”, FANSe2 in 14-seed mode with indel detection enabled typically maps at least 3-5% more reads in roughly the same time. This holds when mapping 75-nt reads to large genomes.
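The section does not describe how FANSe2 accelerates its SW refinement. The sketch below is therefore the plain textbook Smith-Waterman local alignment with a linear gap penalty, included only to show why an SW step can place a read that carries an indel. The scoring values (match +2, mismatch -1, gap -2) and the function name are arbitrary assumptions, not FANSe2 parameters.

```python
def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Textbook Smith-Waterman (linear gaps). Returns (score, aligned_read, aligned_ref)."""
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at read[i-1], ref[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # read base against a gap (insertion in the read)
            left = H[i][j - 1] + gap  # reference base against a gap (deletion in the read)
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the local alignment score drops to zero.
    a_read, a_ref = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            a_read.append(read[i - 1]); a_ref.append(ref[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            a_read.append(read[i - 1]); a_ref.append("-"); i -= 1
        else:
            a_read.append("-"); a_ref.append(ref[j - 1]); j -= 1
    return best, "".join(reversed(a_read)), "".join(reversed(a_ref))


# A read that lacks one "G" relative to the reference (a 1-nt deletion).
score, aln_read, aln_ref = smith_waterman("ACGTACTTAGC", "GGACGTACGTTAGCAA")
print(score)
print(aln_read)
print(aln_ref)
```

An ungapped, mismatch-only comparison would score this read poorly after the deletion point. The gapped alignment instead recovers all 11 read bases with a single gap.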

Speed
FANSe2 can map a billion reads per hour with extremely high and robust accuracy on a modern hexa-core computer. At a similar speed, it typically maps 15-30% more reads than Bowtie2 in “very-fast mode”, especially for short reads (50 nt or shorter). With 14-seed and indel detection turned off, FANSe2 is typically 4 times faster than Bowtie2 in “very-sensitive mode” while mapping a very similar number of reads.
FANSe2 is designed to parallelize the mapping process, unleashing the computational power of modern multi-core CPUs. FANSe2 can also perform distributed computing by spreading the work across multiple computers. Importantly, no cluster is needed, so there is no need for expensive 4-way workstations or servers. You can complete the same job by combining the spare computing power of all office and lab computers during the night, lunch breaks or seminars. There is also no need to switch between Windows and Linux, since FANSe2 runs best under Windows.
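The text does not say how FANSe2 divides work between cores or computers. The sketch below shows only the general data-parallel idea. It assumes that each read can be mapped independently and that every worker holds the full reference. Under those assumptions, the reads are split into near-equal chunks, each chunk is mapped separately, and the results are concatenated in the original order. `split_reads`, `map_chunk` and `map_in_parallel` are hypothetical names, and `map_chunk` is a toy exact-match stand-in for a real mapper.

```python
from concurrent.futures import ThreadPoolExecutor


def split_reads(reads, n_parts):
    """Split reads into n_parts contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(reads), n_parts)
    chunks, start = [], 0
    for k in range(n_parts):
        end = start + q + (1 if k < r else 0)
        chunks.append(reads[start:end])
        start = end
    return chunks


def map_chunk(chunk, reference):
    """Hypothetical placeholder mapper: first exact-match position of each read, or -1."""
    return [(read, reference.find(read)) for read in chunk]


def map_in_parallel(reads, reference, n_workers=4):
    """Map chunks concurrently and return (read, position) pairs in input order."""
    chunks = split_reads(reads, n_workers)
    # Threads keep this sketch self-contained; CPU-bound Python work would instead
    # use separate processes or, as in the distributed case, separate machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_chunk = list(pool.map(map_chunk, chunks, [reference] * len(chunks)))
    return [hit for chunk_hits in per_chunk for hit in chunk_hits]


reference = "ACGTACGTTAGC"
reads = ["ACGT", "TTAG", "GGGG", "CGTT", "AGC"]
print(map_in_parallel(reads, reference, n_workers=2))
```

Because each chunk only needs its own reads plus the shared reference, the same split could be handed to different computers instead of different threads, with the per-chunk outputs merged afterwards.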

Error allowance in the reads
FANSe2's error-allowance settings are completely flexible: reads with any number of errors can be detected. This is a major advantage over algorithms such as SOAP2 (max. 2 mismatches), Bowtie (max. 3 mismatches) and Bowtie2 (whose error allowance cannot be set explicitly).
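The brute-force sketch below illustrates what a freely settable mismatch allowance means. It reports every ungapped placement of a read whose mismatch count does not exceed a user-chosen `max_mismatches`. It is not FANSe2's indexing or search strategy, which this section does not describe. The function name and parameter are hypothetical.

```python
def hamming_hits(read: str, reference: str, max_mismatches: int):
    """Return (position, mismatches) for every ungapped placement within the allowance."""
    hits = []
    n = len(read)
    for pos in range(len(reference) - n + 1):
        mm = 0
        for a, b in zip(read, reference[pos:pos + n]):
            if a != b:
                mm += 1
                if mm > max_mismatches:
                    break  # this placement already exceeds the allowance; stop counting
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


reference = "ACGTACGTTAGC"
for k in (0, 1, 3):
    print(k, hamming_hits("ACGA", reference, k))
```

Raising the allowance from 1 to 3 admits more placements. This is the trade-off a fixed cap (for example 2 or 3 mismatches) would otherwise decide for the user.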

Versatility
FANSe2 places no restrictions on genome size. Reads must be longer than 14 nt. As the minimum seed length decreases, FANSe2 achieves higher accuracy at the cost of more computation time, so users can balance accuracy against speed by choosing appropriate parameters. FANSe2 supports masked genomes and unspecified nucleotides (“N”s) in the reference sequence. It offers an optional memory-reduction mode that cuts memory consumption to half or less, which suits computers with limited memory. FANSe2 can also perform strand-specific mapping for applications such as mRNA or miRNA sequencing.
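This section does not describe FANSe2's actual seeding scheme. The sketch below uses the generic pigeonhole argument for non-overlapping seeds to show why a shorter minimum seed length can raise accuracy at the cost of time. If a read is cut into n disjoint seeds, any placement with at most n - 1 mismatches leaves at least one seed exactly matching. Shorter seeds therefore tolerate more mismatches, but they also match more often by chance, so more candidate positions must be verified. All function names here are hypothetical.

```python
def split_into_seeds(read: str, seed_len: int) -> list[str]:
    """Cut a read into non-overlapping seeds of seed_len; a short remainder is dropped."""
    return [read[i:i + seed_len] for i in range(0, len(read) - seed_len + 1, seed_len)]


def guaranteed_mismatches(read_len: int, seed_len: int) -> int:
    """Pigeonhole bound: n disjoint seeds guarantee an exact seed for up to n - 1 mismatches.
    (Mismatches only; an indel can disturb more than one seed.)"""
    return read_len // seed_len - 1


def candidate_positions(read: str, reference: str, seed_len: int) -> list[int]:
    """Read start positions implied by exact seed hits; each still needs verification."""
    n_ref, n_read = len(reference), len(read)
    candidates = set()
    for k, seed in enumerate(split_into_seeds(read, seed_len)):
        offset = k * seed_len  # where this seed sits inside the read
        hit = reference.find(seed)
        while hit != -1:
            start = hit - offset  # implied start of the whole read on the reference
            if 0 <= start <= n_ref - n_read:  # keep only placements that fit
                candidates.add(start)
            hit = reference.find(seed, hit + 1)
    return sorted(candidates)


for seed_len in (14, 10):
    print(seed_len, guaranteed_mismatches(75, seed_len))

# One mismatch (A -> T at read index 4) relative to reference position 0.
print(candidate_positions("ACGTTCGTTA", "ACGTACGTTAGC", 5))
```

For a 75-nt read, 14-nt seeds give 5 disjoint seeds (up to 4 mismatches covered), while 10-nt seeds give 7 (up to 6 covered). In a large genome, however, the shorter seeds would also produce many more candidate positions to check.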