RNA-Seq coverage in introns
Posted: Thu Nov 19, 2015 7:44 pm
Originally posted in the old forum by John on 23.07.2012 - 19:46
I have a question about the Augustus protocol for gene calling with intron hints generated from a BAM alignment of RNA-seq data, as outlined on your wiki (http://bioinf.uni-greifswald.de/bioinf/ ... seq.Tophat). How are data handled when a significant number of reads land in introns rather than exons? I have very deep RNA-seq coverage, and by eye it is easy to tell the exons from the introns, but every intronic position has at least a few reads aligned to it. In one example, the introns near the poly-A end of the reads (I did poly-A purification) had about 10% of the exon coverage, and those at the other end had about 1%. If I use bam2hints to generate intron hints, will Augustus interpret the 10% end as a single giant exon? The raw coverage at that end was about 200X in the intron versus 2000X in the exon. If I simply down-sample the data, the rarer genes may end up with no aligned reads at all, and some regions have very light coverage even in this dataset.
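To make the ratios above concrete, here is a minimal sketch (not part of the Augustus tooling) that expresses intronic coverage as a fraction of exonic coverage. The function name `coverage_ratio` and the input values are hypothetical; it assumes per-base depths have already been extracted for one intron and an adjacent exon, for example from `samtools depth` output.

```python
from statistics import mean


def coverage_ratio(intron_depths, exon_depths):
    """Return mean intronic depth as a fraction of mean exonic depth.

    Both arguments are hypothetical lists of per-base read depths,
    e.g. the third column of `samtools depth` for each interval.
    """
    exon_mean = mean(exon_depths)
    if exon_mean == 0:
        raise ValueError("exon interval has no coverage")
    return mean(intron_depths) / exon_mean


# Hypothetical depths mirroring the example: 200X intron vs 2000X exon.
intron = [200] * 50
exon = [2000] * 50
print(f"{coverage_ratio(intron, exon):.0%}")  # prints 10%
```

The same calculation at the other end of the reads (about 20X against 2000X) gives the roughly 1% figure.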
What would you recommend? Or has this issue already been thought of and addressed?