AUGUSTUS Forum

Posted: **Thu Nov 19, 2015 7:44 pm**

Originally posted in the old forum by John on 23.07.2012 - 19:46
I have a question about the Augustus protocol for gene calling with intron hints generated from a BAM alignment of RNA-seq data, as outlined on your wiki (http://bioinf.uni-greifswald.de/bioinf/ ... seq.Tophat). How is data handled where a significant number of reads land in introns rather than exons? I have very deep RNA-seq coverage, and by eye it is easy to pick out the exons from the introns, but every intron position has a few bases aligning. In one example, the introns near the poly-A tail of the reads (I did poly-A purification) had 10% of the exon coverage, and those at the other end of the read had about 1% of the exon coverage. Will Augustus interpret this as a single giant exon at the 10% coverage end if I use bam2hints to generate intron hints? The raw coverage at the 10% end was about 200X in the intron and 2000X in the exon. If I simply down-sample the data, the rarer genes will end up with no reads aligning. There are regions with very light coverage even in this dataset.
What would you recommend? Or has this issue already been thought of and addressed?

Posted: **Thu Nov 19, 2015 7:45 pm**

by Mario on 23.07.2012 - 19:49
The short answer is that it has been taken care of.
Here is the long answer.
Yes, retained introns are quite common. My observation is that some introns can have a significant fraction of the coverage of the neighboring exons, even if these introns are long, while others have low coverage. It seems to be very inhomogeneous.
Here is how augustus deals with it. Every hint (intron or exonpart in this case) is evaluated with respect to how much support it has from other hints and how much it conflicts with other hints. Hints with a high ratio of conflicting to supporting evidence are downvalued, or even removed when the ratio exceeds a threshold, which has the default value

--/Constant/max_contra_supp_ratio=9

With this default, exonpart hints are ignored when they have a multiplicity of 1/10 or less of the multiplicity of the intron hint that overlaps or contains them.
You can see the effect that this modification (and other modifications) has on the bonus in the hints section at the beginning of the augustus output. Hints that were deleted are no longer reported. The last column may contain, e.g.,

"1e4;1;3:6"

to indicate a bonus of 1e4, a malus for that type of hint of 1, 3 other supporting hints, and 6 contradicting hints.
An exonpart hint with multiplicity 4 in an intron of multiplicity 6 would count as having 3 other supporting hints and 6 contradicting hints.
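To make the layout of that last column concrete, here is a minimal Python sketch that splits such a string into its four values and computes the contra/support ratio. The field layout (bonus;malus;supporting:contradicting) is assumed from the single example above, and the names `HintSupport`, `parse_hint_annotation` and `contra_supp_ratio` are hypothetical helpers, not part of AUGUSTUS.

```python
from typing import NamedTuple


class HintSupport(NamedTuple):
    """Values read from the last hint column (layout assumed from the example)."""
    bonus: float
    malus: float
    supporting: int
    contradicting: int


def parse_hint_annotation(text: str) -> HintSupport:
    """Parse a string shaped like "1e4;1;3:6" into its four values."""
    # Drop surrounding whitespace and the optional double quotes.
    fields = text.strip().strip('"').split(";")
    if len(fields) != 3:
        raise ValueError(f"expected 'bonus;malus;supp:contra', got {text!r}")
    bonus, malus, counts = fields
    supporting, contradicting = counts.split(":")
    return HintSupport(float(bonus), float(malus),
                       int(supporting), int(contradicting))


def contra_supp_ratio(hint: HintSupport) -> float:
    """Ratio of contradicting to supporting hints.

    Assumption: a hint with no support but some conflict gets an infinite
    ratio; how AUGUSTUS itself treats that case is not stated in this thread.
    """
    if hint.supporting == 0:
        return float("inf") if hint.contradicting > 0 else 0.0
    return hint.contradicting / hint.supporting


example = parse_hint_annotation('"1e4;1;3:6"')
print(example, contra_supp_ratio(example))
```

For the exonpart example above (3 supporting, 6 contradicting) the ratio is 6/3 = 2, which is below the default threshold of 9, so that hint would not be removed by this rule.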
If you see more coverage in retained introns, you can lower the threshold, e.g.

--/Constant/max_contra_supp_ratio=4

to ignore coverage of up to 20% of the intron multiplicity.
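The two figures quoted in this thread (threshold 9 for 1/10, threshold 4 for 20%) are both consistent with an ignored fraction of 1/(threshold + 1). The Python sketch below uses that rule of thumb to convert in both directions. It is an inference from these two data points only, not taken from the AUGUSTUS source, and both function names are hypothetical.

```python
def ignored_fraction(max_contra_supp_ratio: float) -> float:
    """Approximate intron coverage fraction that gets ignored.

    Rule of thumb inferred from the thread: 9 -> 1/10 and 4 -> 1/5.
    """
    if max_contra_supp_ratio < 0:
        raise ValueError("threshold must be non-negative")
    return 1.0 / (max_contra_supp_ratio + 1.0)


def threshold_for_fraction(fraction: float) -> float:
    """Inverse of ignored_fraction: threshold that ignores up to `fraction`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction - 1.0


for threshold in (9, 4):
    print(threshold, ignored_fraction(threshold))
```

Under this assumption, a dataset where retained introns reach about 20% of the relevant multiplicity corresponds to `threshold_for_fraction(0.2)`, i.e. roughly the value 4 used above.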

RNA-Seq coverage in intron
