Manual
REAL first preprocesses the reference sequence to build an index, constructed according to the length of the short reads, using word-level parallelism and radix sort. Rather than hashing the short reads, it converts the seed part of each read into a unique integer value using a 2-bits-per-base encoding of the DNA alphabet. It then applies a filtering strategy, binary search, and word-level operations to map the reads to the reference sequence.
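The seed encoding and sorted-index lookup described above can be illustrated with a minimal Python sketch. The A/C/G/T to 0/1/2/3 code assignment, the function names, and the use of Python's built-in sort in place of a radix sort are all assumptions for illustration, not REAL's actual implementation. Only exact seed matches are shown; the error-tolerant filtering strategy is not reproduced. With 2 bits per base, a seed of the default length 32 fills exactly one 64-bit word.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical 2-bit code assignment; REAL's actual mapping may differ.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_seed(seed: str) -> int:
    """Pack a DNA seed into one integer, 2 bits per base (first base is most significant)."""
    value = 0
    for base in seed:
        value = (value << 2) | CODE[base]
    return value


def build_index(reference: str, seed_len: int) -> List[Tuple[int, int]]:
    """Encode every seed_len-mer of the reference and sort the entries by code.

    sorted order here comes from list.sort(), standing in for a radix sort.
    Windows containing characters outside ACGT (e.g. N) are skipped.
    """
    entries = []
    for pos in range(len(reference) - seed_len + 1):
        window = reference[pos:pos + seed_len]
        if all(b in CODE for b in window):
            entries.append((encode_seed(window), pos))
    entries.sort()
    return entries


def lookup(index: List[Tuple[int, int]], seed: str) -> List[int]:
    """Binary-search the sorted index; return 0-based positions exactly matching the seed."""
    key = encode_seed(seed)
    i = bisect_left(index, (key, -1))  # first entry with this code
    hits = []
    while i < len(index) and index[i][0] == key:
        hits.append(index[i][1])
        i += 1
    return hits
```

For example, `lookup(build_index("ACGTACGTTT", 4), "ACGT")` returns `[0, 4]`. Because each seed becomes a single integer, comparisons during the binary search are single word-level operations rather than character-by-character string comparisons.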
Usage
SYNOPSIS
real [options]

Options:
-t <textfilename>
-p <patternfilename>
-o <outputfilename>
-s <maximum number of errors in seed, default=2>
-e <total maximum number of errors, default=5>
-l <length of seed, default=32>
-u <search for unique match, default=1>
-f <fraction of physical memory to use, default=0.75>
-q <use quality scores, default=1>
-Q <offset for quality scores, default=autodetect>
-R <rewrite pattern file, default=1>
-T <number of matching threads, default=2>
-similarity <sequence similarity, default=0.995>
-trans <transitions fraction of mutations, default=0.71>
-gc <composition bias, default=0.41>
-gcmut_bias <mutability bias of GC, default=2>
-filter_level <filtering level for equal hits 0-4, default=2>
-g <search for gapped matches, default=0>

Note: gapped matching is not complete in the current version (0.0.31).
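When REAL is driven from a pipeline script, a small helper can assemble the command line and reject misspelled flags. This is a hypothetical Python sketch: the function name `build_real_command` and the file names are placeholders, it assumes the `real` binary is on the `PATH`, and it assumes (as is conventional in string matching) that the "text" file (`-t`) is the reference and the "pattern" file (`-p`) holds the reads. Only flags listed in the synopsis are accepted.

```python
import shlex
from typing import Dict, List, Optional

# Optional flags taken from the synopsis; -t, -p and -o are passed explicitly.
KNOWN_FLAGS = {
    "-s", "-e", "-l", "-u", "-f", "-q", "-Q", "-R", "-T",
    "-similarity", "-trans", "-gc", "-gcmut_bias", "-filter_level", "-g",
}


def build_real_command(reference: str, reads: str, output: str,
                       extra: Optional[Dict[str, object]] = None) -> List[str]:
    """Return an argv list for `real`; unknown flags raise ValueError."""
    argv = ["real", "-t", reference, "-p", reads, "-o", output]
    for flag, value in (extra or {}).items():  # dicts keep insertion order
        if flag not in KNOWN_FLAGS:
            raise ValueError(f"unknown REAL option: {flag}")
        argv += [flag, str(value)]
    return argv
```

For example, `shlex.join(build_real_command("genome.fa", "reads.fq", "hits.txt", {"-T": 4, "-e": 3}))` gives `real -t genome.fa -p reads.fq -o hits.txt -T 4 -e 3`. The list can be passed straight to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with unusual file names.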
Output
Each line of the output represents one hit, with the following tab-separated fields:
1. id of the read
2. full sequence of the read (converted to its complement if the read maps to the reverse strand of the reference)
3. quality string of the read, for FASTQ input (adjusted in the same way as the sequence for reverse-strand hits)
4. number of equally best hits
5. a/b to distinguish which file the read belongs to in paired-end alignment
6. length of the read
7. +/- to distinguish whether the hit is on the direct (+) or reverse (-) strand of the reference
8. id of the reference sequence
9. location of the hit on the reference, counted from 1
10. number of mismatches in the hit
11. indel information for gapped hits:
"+n Offset": n-bp insertion in the read. Example: "+1 15" means a 1-bp insertion in the read, starting after position 15 on the reference.
"-n Offset": n-bp deletion in the read. Example: "-2 16" means a 2-bp deletion in the read, starting after position 16 on the reference.
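The field list above can be turned into a record type for downstream processing. The following Python sketch is hypothetical: the `RealHit` class and `parse_hit` function are illustrative names, and it assumes that field 11 is absent or empty for ungapped hits. Because it is not certain whether the indel length and offset share one tab-separated column ("+1 15") or occupy two, the sketch joins any trailing columns with a space so that both layouts parse.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RealHit:
    read_id: str
    sequence: str
    quality: str
    n_best_hits: int
    mate: str                         # "a" or "b" (paired-end file)
    read_length: int
    strand: str                       # "+" direct, "-" reverse
    reference_id: str
    position: int                     # 1-based location on the reference
    mismatches: int
    indel: Optional[Tuple[int, int]]  # (signed length, offset), None if ungapped


def parse_indel(field: str) -> Optional[Tuple[int, int]]:
    """Parse '+1 15' -> (1, 15) or '-2 16' -> (-2, 16); empty text -> None."""
    field = field.strip()
    if not field:
        return None
    length, offset = field.split()
    return int(length), int(offset)  # int() accepts a leading '+' or '-'


def parse_hit(line: str) -> RealHit:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(cols)}")
    # Join any trailing columns so "+1 15" and "+1<TAB>15" both parse.
    indel = parse_indel(" ".join(cols[10:])) if len(cols) > 10 else None
    return RealHit(cols[0], cols[1], cols[2], int(cols[3]), cols[4],
                   int(cols[5]), cols[6], cols[7], int(cols[8]),
                   int(cols[9]), indel)
```

Storing the indel as a signed length keeps insertions (positive) and deletions (negative) in one field, mirroring the "+n"/"-n" notation of the output.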
REAL output can be converted directly to SAM format with the format converter script provided by the Short Oligonucleotide Analysis Package (SOAP).
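To show how the REAL fields correspond to SAM columns, here is a minimal Python sketch for ungapped hits only. It is not the SOAP converter and does not reproduce its behavior. The function name `real_line_to_sam` is hypothetical. The sketch assumes that, for reverse-strand hits, the sequence and quality columns are already in reference orientation (as the field descriptions suggest), and it leaves paired-end flags (0x1, 0x40, 0x80) unset. Gapped hits are rejected rather than guessed at.

```python
def real_line_to_sam(line: str) -> str:
    """Convert one ungapped REAL hit into a tab-separated SAM alignment line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) > 10 and " ".join(cols[10:]).strip():
        raise ValueError("gapped hits are not handled by this sketch")
    read_id, seq, qual = cols[0], cols[1], cols[2]
    read_len, strand, ref_id = int(cols[5]), cols[6], cols[7]
    pos, mismatches = int(cols[8]), int(cols[9])
    flag = 16 if strand == "-" else 0  # 0x10: SEQ is reverse-complemented
    mapq = 255                         # 255 = mapping quality not available
    fields = [
        read_id, str(flag), ref_id, str(pos), str(mapq),
        f"{read_len}M",                # ungapped: every base is an M (match or mismatch)
        "*", "0", "0",                 # RNEXT, PNEXT, TLEN: no mate information
        seq, qual or "*",              # "*" when no quality string (e.g. FASTA input)
        f"NM:i:{mismatches}",          # edit distance equals mismatches when ungapped
    ]
    return "\t".join(fields)
```

Both formats report 1-based reference positions, so the location in field 9 can be copied into SAM's POS column without adjustment.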