Benchmark
We benchmarked REAL (version 0.0.31 linux-x86_64) against SOAP2 (version 2.21 linux-x86_64) by aligning:
- 24,543,488 simulated 70 bp-long single-end reads, derived from the Drosophila melanogaster chromosome 3L by introducing a random number of mismatches (from 0 to 4) into each read, against the same reference sequence. We measured the performance of each programme using 1 matching thread (see Table 1).
- 3,619,970 35 bp-long single-end real mouse reads against the whole mouse genome. We measured the performance of each programme using 4 matching threads (see Table 2).
- 29,671,358 100 bp-long single-end real Homo sapiens reads against the whole Homo sapiens genome. We measured the performance of each programme using 8 matching threads (see Table 3).
The experiments were conducted on a desktop PC with 16 GB of main memory, using from 1 to 8 AMD 6136 processors at 2.4 GHz, and running the Ubuntu GNU/Linux operating system.
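As an illustration of how a simulated data set like the first one could be produced, the following is a minimal sketch only. The exact simulation procedure used for the benchmark is not described here, so the uniform choice of start position, the uniform choice of the number of mismatches, and the names `simulate_read` and `BASES` are all assumptions made for this example.

```python
import random

BASES = "ACGT"

def simulate_read(reference, read_len=70, max_mismatches=4, rng=random):
    """Cut one read from `reference` and substitute 0..max_mismatches bases.

    Assumption: start position and mismatch count are drawn uniformly.
    Returns the true start position, the number of mismatches, and the read.
    """
    start = rng.randrange(len(reference) - read_len + 1)
    read = list(reference[start:start + read_len])
    n_mismatches = rng.randint(0, max_mismatches)
    # Distinct positions, so every substitution contributes exactly one mismatch.
    for pos in rng.sample(range(read_len), n_mismatches):
        read[pos] = rng.choice([b for b in BASES if b != read[pos]])
    return start, n_mismatches, "".join(read)

# Toy usage on a random 1,000 bp reference (not the real chromosome 3L).
rng = random.Random(0)
reference = "".join(rng.choice(BASES) for _ in range(1000))
start, n_mismatches, read = simulate_read(reference, rng=rng)
```

Keeping the true start position alongside each read is what later makes it possible to check whether an aligner placed the read correctly.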
Results
Table 1. Performance and accuracy of aligning 24,543,488 70 bp-long single-end simulated reads to the Drosophila melanogaster chromosome 3L using 1 matching thread.
Programme | Time for indexing | Time for mapping | # Aligned reads | # Correctly aligned reads | Accuracy |
SOAP2 | 00m26s | 10m16s | 21,158,794 | 21,155,270 | 99.983% |
REAL | 00m00s | 04m43s | 21,166,647 | 21,163,254 | 99.984% |
Both programmes were run with a 48 bp-long seed, with at most 4 mismatches in the whole read, with 1 matching thread, and reported the best hits only. For REAL these options are `-e 4 -T 1 -q 0 -l 48'. For SOAP2 these options are `-l 48 -M 4 -v 4 -p 1 -r 0'. Because the reads were simulated, we knew the exact location in the reference sequence from which each read was derived; we therefore also measured the accuracy of each programme by counting the number of correctly aligned reads.
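The accuracy figures in Table 1 are consistent with accuracy being the number of correctly aligned reads divided by the number of aligned reads. The short sketch below reproduces them from the table's counts; the function name `accuracy` and the dictionary layout are ours, chosen for illustration.

```python
def accuracy(correct, aligned):
    """Percentage of aligned reads that were placed at their true location."""
    return 100.0 * correct / aligned

# (aligned reads, correctly aligned reads), copied from Table 1.
table1 = {
    "SOAP2": (21_158_794, 21_155_270),
    "REAL": (21_166_647, 21_163_254),
}

for name, (aligned, correct) in table1.items():
    print(f"{name}: {accuracy(correct, aligned):.3f}%")
# SOAP2: 99.983%
# REAL: 99.984%
```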
Table 2. Performance of aligning 3,619,970 35 bp-long single-end real reads against the whole mouse genome using 4 matching threads.
Programme | Time for indexing | Time for mapping | # Aligned reads |
SOAP2 | 49m42s | 02m23s | 1,889,410 |
REAL | 00m00s | 37m01s | 1,812,750 |
Both programmes were run with a 32 bp-long seed, with at most 3 mismatches in the whole read, with 4 matching threads, and reported the best hits only. For REAL these options are `-e 3 -T 4 -q 0 -l 32'. For SOAP2 these options are `-l 32 -M 4 -v 3 -p 4 -r 0'.
Table 3. Performance of aligning 29,671,358 100 bp-long single-end real reads against the whole Homo sapiens genome using 8 matching threads.
Programme | Time for indexing | Time for mapping | # Aligned reads |
SOAP2 | 53m03s | 18m35s | 24,907,582 |
REAL | 00m00s | 70m18s | 24,825,141 |
Both programmes were run with a 52 bp-long seed, with at most 4 mismatches in the whole read, with 8 matching threads, and reported the best hits only. For REAL these options are `-e 4 -T 8 -q 0 -l 52'. For SOAP2 these options are `-l 52 -M 4 -v 4 -p 8 -r 0'.
Note that REAL works without a precomputed, stored index; hence, its time for indexing is reported as 00m00s and is always included in its time for mapping.
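Since the two programmes split their work between indexing and mapping differently, one way to compare them on a single run is to add the two columns. The sketch below does this for the timings in Tables 1 to 3. The helper `to_seconds` and the data layout are illustrative, and the comparison assumes the SOAP2 index is built once per run; if an index is reused across many runs, its cost is spread over those runs.

```python
import re

def to_seconds(stamp):
    """Convert a duration written as 'MMmSSs' (e.g. '10m16s') to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    return int(match.group(1)) * 60 + int(match.group(2))

# (time for indexing, time for mapping), copied from Tables 1 to 3.
timings = {
    "Table 1": {"SOAP2": ("00m26s", "10m16s"), "REAL": ("00m00s", "04m43s")},
    "Table 2": {"SOAP2": ("49m42s", "2m23s"), "REAL": ("00m00s", "37m01s")},
    "Table 3": {"SOAP2": ("53m03s", "18m35s"), "REAL": ("00m00s", "70m18s")},
}

totals = {
    table: {prog: to_seconds(idx) + to_seconds(mapping)
            for prog, (idx, mapping) in progs.items()}
    for table, progs in timings.items()
}

for table, progs in totals.items():
    print(table, progs)
# Table 1 {'SOAP2': 642, 'REAL': 283}
# Table 2 {'SOAP2': 3125, 'REAL': 2221}
# Table 3 {'SOAP2': 4298, 'REAL': 4218}
```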