Please note ...

FANSe3 is a commercial development project of Chi-Biotech Co. Ltd.

The public (free) version of FANSe3 provided here is for trial use only and offers basic, limited features. For the full-featured version (for both academic and commercial use), please contact Chi-Biotech to obtain a licence.

Feature comparison of free/commercial versions

Feature | Free version | Commercial version
Parallel CPU cores | Limited to 2 | Unlimited (>256)
Unique mapping | Supported | Supported
Fast indel detection | Yes | Yes
Masked genome | Supported | Supported
Export all multi-mapped locations | Up to 200 | Up to 200
Max read length | 1000 | Unlimited
Unidirectional mapping | No | Yes, forward or reverse, for strand-specific applications
Sequencer artifact compensation | No | Yes, optimized for Illumina and BGISEQ/MGISEQ flowcells, also compatible with Ion Torrent
Formats supported | FASTQ only | FASTQ, FASTA, FQC, one-line nucleotide, etc. (use FQC format for the best performance)
Disk I/O saving | No | Yes
Batch mode | No | Yes, indexes the reference once and maps multiple datasets sequentially, saving indexing time
Performance optimization for RNA-seq | No | Yes, up to 20x faster than the free version
Trim reads while mapping | No | Yes
Direct quantification for RNA-seq | No | Yes, no need to use other programs to obtain read counts and RPKM values

 

Command-line usage

FANSe3 -R<ref.fa> -D<reads.fq> [-O<out.fanse3>] [-E3] [-S14] [-C2] [-H1] [--indel] [--unique] [--mask]

Options
Option | Optional? | Explanation
-R | compulsory | Reference sequence file (FASTA format).
    Supports UNC names (like \\server1\myfolder\abc.fa).
    Supports Chinese characters.
    Sequence names in the FASTA file may contain spaces and special characters; there is no restriction on sequence names.
-D | compulsory | FASTQ dataset file.
    Supports UNC names (like \\server1\myfolder\ionS5-1.fq).
    Supports Chinese characters.
-O | optional | Output file name. Generated automatically when omitted.
    Supports UNC names (like \\server1\myfolder\ionS5-1.fanse3).
    Supports Chinese characters.
-E | optional | Error allowance. Default=3.
    Mismatches and indels are all counted as errors.
    Can be set as an integer or a percentage.
    Integer: like -E5, sets a fixed number of errors allowed in the alignment. Preferred when the read length is fixed, e.g. Illumina and BGISEQ/MGISEQ sequencers.
    Percentage: like -E5%, sets the error allowance as a percentage of the read length. Useful when the read length is variable, e.g. Ion Torrent and Helicos sequencers, or short fragments left after adapter trimming.
-S | optional | Seed length. Default=14. Can be set to any integer from 6 to 14.
    A larger seed length is faster but may lose more reads when the error rate exceeds 6%. For high-error-rate scenarios, refer to the FANSe2 paper to choose a seed length with a suitable estimated accuracy.
-C | optional | Number of parallel CPU cores. Default=2. For the free version, max=2.
-H | optional | Batch size: the number of reads (in millions) loaded per batch. -H2 means 2 million reads per batch; -H0.5 means 0.5 million reads per batch.
--indel | optional | Turns on fast indel detection. Equivalent to "-I1" in FANSe2.
--unique | optional | Unique mapping. When this toggle is present, uniquely mapped reads are stored in the .fanse3 file and multi-mapped reads are stored in a separate -multimap.fanse3 file.
--mask | optional | Masked genome. When this toggle is present, lower-case letters in the reference sequences are not considered. Equivalent to "-M1" in FANSe2.

 

Quick examples:

FANSe3 -Rref.fa -Dillumina.fq --unique

FANSe3 -Rref.fa -Diontorrent.fq -E5% --indel --unique
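
For scripted runs, the command lines above can be assembled programmatically. The following is a minimal Python sketch, not part of FANSe3 itself: the helper name `build_fanse3_command` and its parameter names are hypothetical, and it only uses the options documented above, attaching each value directly to its flag as in the quick examples. It assumes that a percentage for -E is written as a whole number followed by `%`.

```python
import re
from typing import List, Optional, Union


def build_fanse3_command(
    ref: str,
    reads: str,
    out: Optional[str] = None,
    errors: Union[int, str, None] = None,
    seed: Optional[int] = None,
    cores: Optional[int] = None,
    batch_millions: Optional[float] = None,
    indel: bool = False,
    unique: bool = False,
    mask: bool = False,
    executable: str = "FANSe3",  # assumption: the program is on PATH under this name
) -> List[str]:
    """Return an argument list for FANSe3 (hypothetical helper)."""
    argv = [executable, f"-R{ref}", f"-D{reads}"]
    if out is not None:
        argv.append(f"-O{out}")
    if errors is not None:
        # -E takes an integer (5) or a percentage of the read length ("5%").
        # Assumption: only whole-number percentages are accepted.
        text = str(errors)
        if not re.fullmatch(r"\d+%?", text):
            raise ValueError(f"-E must look like 5 or 5%, got {text!r}")
        argv.append(f"-E{text}")
    if seed is not None:
        # The documented range for -S is 6 to 14.
        if not 6 <= seed <= 14:
            raise ValueError("-S must be an integer from 6 to 14")
        argv.append(f"-S{seed}")
    if cores is not None:
        if cores < 1:
            raise ValueError("-C must be at least 1")
        argv.append(f"-C{cores}")
    if batch_millions is not None:
        if batch_millions <= 0:
            raise ValueError("-H must be positive")
        # The 'g' format prints 2.0 as "2" and 0.5 as "0.5".
        argv.append(f"-H{batch_millions:g}")
    if indel:
        argv.append("--indel")
    if unique:
        argv.append("--unique")
    if mask:
        argv.append("--mask")
    return argv


if __name__ == "__main__":
    cmd = build_fanse3_command(
        "ref.fa", "iontorrent.fq", errors="5%", indel=True, unique=True
    )
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, provided the executable is installed and reachable under the assumed name.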

 

Result file format

There will be four result files:

  • *.fanse3: the uniquely mapped reads
  • *-multimap.fanse3: the multi-mapped reads
  • *.unmapped: the unmapped reads and the reads of highly repetitive sequences
  • *.log: log file for the parameters and the speed

Format description of the .fanse3 files

    The .fanse3 files are very similar to the .fanse2 files, with some additional information.

    Example of a uniquely mapped read:
    42628 AGCAAGGACTAACCCCTATACC .................x....
    F NM_001190470 1 142 1

    Line 1:
    42628 = read name (exactly as in the FASTQ file; it may be any string)
    AGCAAGGACTAACCCCTATACC = read nucleotide sequence
    .................x.... = alignment. "x"=mismatch; "-"=deletion; nucleotide=insertion.
    Line 2:
    F = direction (F = forward, R = reverse)
    NM_001190470 = mapped reference name
    1 = error count
    142 = position (0-based)
    1 = number of mapping locations (1 = uniquely mapped)
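
The alignment string described above can be tallied symbol by symbol. The sketch below is illustrative only: the function name `summarize_alignment` is hypothetical, and it assumes that "." marks a matching base and that inserted nucleotides are written as the letters A, C, G, T or N (in either case).

```python
from typing import Dict


def summarize_alignment(alignment: str) -> Dict[str, int]:
    """Count the symbol classes in one alignment string (hypothetical helper)."""
    counts = {"match": 0, "mismatch": 0, "deletion": 0, "insertion": 0}
    for symbol in alignment:
        if symbol == ".":
            counts["match"] += 1        # assumption: "." is a matching base
        elif symbol == "x":
            counts["mismatch"] += 1     # "x" = mismatch
        elif symbol == "-":
            counts["deletion"] += 1     # "-" = deletion
        elif symbol.upper() in "ACGTN":
            counts["insertion"] += 1    # a nucleotide letter = insertion
        else:
            raise ValueError(f"unexpected alignment symbol: {symbol!r}")
    return counts


if __name__ == "__main__":
    # Alignment string copied from the uniquely mapped example above.
    print(summarize_alignment(".................x...."))
```

For the example read, the tally reports a single mismatch and no indels, which agrees with the error count of 1 on line 2 of the record.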

    Example of a multi-mapped read:
    369061 AGCTGGTACAGAAAGCCAAATTCGCTG ....................x......,....................x......
    F,F NM_003404,NM_139323 1 405,310 2

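    For multi-mapped reads, the directions, reference names, and positions are given as multiple values separated by commas (","). As the example shows, the alignment strings on line 1 are comma-separated in the same way.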
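
The two-line record layout described above can be read with a short parser. This is a minimal Python sketch under stated assumptions, not an official reader: the names `Hit`, `Fanse3Record` and `parse_fanse3` are hypothetical, fields are assumed to be tab-separated in the file (falling back to whitespace, as the examples appear here), and reference names are assumed to contain no commas.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Hit:
    direction: str   # "F" = forward, "R" = reverse
    reference: str   # mapped reference name
    position: int    # 0-based position
    alignment: str   # alignment string for this location


@dataclass
class Fanse3Record:
    read_name: str
    sequence: str
    error_count: int
    location_count: int  # 1 = uniquely mapped
    hits: List[Hit]


def _fields(line: str) -> List[str]:
    # Assumption: tab-separated fields; otherwise split on any whitespace.
    line = line.strip("\r\n")
    return line.split("\t") if "\t" in line else line.split()


def parse_fanse3(lines: Iterable[str]) -> Iterator[Fanse3Record]:
    """Yield one record per pair of non-empty lines."""
    first = None
    for line in lines:
        if not line.strip():
            continue
        if first is None:
            first = line
            continue
        name, sequence, alignments = _fields(first)
        directions, references, errors, positions, n_locations = _fields(line)
        columns = [
            directions.split(","),
            references.split(","),
            positions.split(","),
            alignments.split(","),
        ]
        if len({len(column) for column in columns}) != 1:
            raise ValueError(f"inconsistent comma-separated fields for read {name!r}")
        hits = [Hit(d, r, int(p), a) for d, r, p, a in zip(*columns)]
        yield Fanse3Record(name, sequence, int(errors), int(n_locations), hits)
        first = None
    if first is not None:
        raise ValueError("file ends with a first line that has no second line")


if __name__ == "__main__":
    example = [
        "369061 AGCTGGTACAGAAAGCCAAATTCGCTG ....................x......,....................x......",
        "F,F NM_003404,NM_139323 1 405,310 2",
    ]
    for record in parse_fanse3(example):
        print(record.read_name, [(h.reference, h.position) for h in record.hits])
```

Because the parser consumes any iterable of lines, an open file handle can be passed directly, so large result files are read line by line rather than loaded into memory at once.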