REAL

REad ALigner for next-generation sequencing reads

Manual


REAL first preprocesses the reference sequence to create an index, based on the length of the short reads, using word-level parallelism and radix sort. It does not hash the short reads; instead, it converts the seed part of each read to a unique arithmetic value, using a 2-bits-per-base encoding of the DNA alphabet, and then applies a filtering strategy, binary search, and word-level operations to map the reads to the reference sequence.
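The seed encoding and index lookup described above can be illustrated with a minimal Python sketch. It is not REAL's implementation. The base-to-code assignment (A=0, C=1, G=2, T=3), the function names encode, build_index and lookup, and the use of Python's built-in sort in place of radix sort are all assumptions made for illustration. The filtering step and the word-level mismatch checks are omitted, and bases other than A, C, G, T are not handled.

```python
from bisect import bisect_left

# Assumed 2-bit code per base; the actual assignment used by REAL is not stated.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(seed):
    """Pack a DNA string into a single integer, 2 bits per base."""
    value = 0
    for base in seed:
        value = (value << 2) | CODE[base]
    return value

def build_index(reference, k):
    """Return a sorted list of (encoded k-mer, 0-based position) pairs."""
    pairs = [(encode(reference[i:i + k]), i)
             for i in range(len(reference) - k + 1)]
    pairs.sort()  # the built-in sort stands in for radix sort here
    return pairs

def lookup(index, seed):
    """Binary-search the index; return all 0-based positions matching the seed."""
    key = encode(seed)
    i = bisect_left(index, (key, -1))  # first entry whose value is >= key
    hits = []
    while i < len(index) and index[i][0] == key:
        hits.append(index[i][1])
        i += 1
    return hits

index = build_index("ACGTACGT", 4)
print(encode("ACGT"))         # 27, i.e. binary 00 01 10 11
print(lookup(index, "ACGT"))  # [0, 4]
```

Because equal-length seeds map to distinct integers, an exact seed match reduces to an integer comparison, which is what makes a sorted array with binary search usable instead of a hash table.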

Usage


Output

Each line of the output represents a hit, which is tab-separated as follows:
  1. id of the read
  2. full sequence of the read (converted to the complementary sequence if mapped on the reverse strand of the reference)
  3. quality of the read for a FASTQ file (converted in the same way as the sequence)
  4. number of equally best hits
  5. a/b to distinguish which file the read belongs to in paired-end alignment
  6. length of the read
  7. +/- to distinguish whether the hit is on the direct (+) or reverse (-) strand of the reference
  8. id of the reference sequence
  9. location of the hit on the reference, counted from 1
  10. number of mismatches in the hit
  11. "+n Offset": n-bp insertion in the read. Example: "+1 15" means a 1-bp insertion in the read, starting after location 15 on the reference
      "-n Offset": n-bp deletion in the read. Example: "-2 16" means a 2-bp deletion in the read, starting after location 16 on the reference
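
The column layout above can be read with a short Python sketch such as the following. The names Hit and parse_hit are hypothetical and not part of REAL. The sketch assumes that the first ten columns are always present and that column 11 is missing or empty for hits without an insertion or deletion; neither assumption is confirmed by this manual.

```python
from typing import NamedTuple, Optional, Tuple

class Hit(NamedTuple):
    read_id: str
    sequence: str
    quality: str
    num_best_hits: int
    mate: str          # "a" or "b"
    length: int
    strand: str        # "+" or "-"
    ref_id: str
    position: int      # counted from 1
    mismatches: int
    indel: Optional[Tuple[int, int]]  # (signed size, offset), or None

def parse_hit(line):
    """Split one tab-separated output line into a Hit record."""
    f = line.rstrip("\n").split("\t")
    indel = None
    # Assumption: column 11 is absent or blank when the hit has no indel.
    if len(f) > 10 and f[10].strip():
        size, offset = f[10].split()
        indel = (int(size), int(offset))  # int() accepts "+1" and "-2"
    return Hit(f[0], f[1], f[2], int(f[3]), f[4], int(f[5]),
               f[6], f[7], int(f[8]), int(f[9]), indel)

# Made-up example line, not real REAL output.
example = "read1\tACGT\tIIII\t1\ta\t4\t+\tchr1\t100\t0\t+1 15"
hit = parse_hit(example)
print(hit.position, hit.indel)  # 100 (1, 15)
```

A positive first value in indel denotes an insertion in the read and a negative one a deletion, following the sign convention of column 11.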



One may directly use the format converter script provided by the Short Oligonucleotide Analysis Package (SOAP) to convert the REAL output into SAM format.