Originally posted in the old forum by Chris on 18.07.2013 - 17:29
Dear forum members,
I am currently using Augustus version 2.5.5. I trained the algorithm with transcriptome data. Unfortunately, Augustus creates protein predictions which
are obviously fusions of several genes. Does anybody know how this
can be circumvented? Or is there a parameter available which limits the
maximum length of a protein prediction in terms of amino acids or base pairs?
I am very grateful for any help,
Chris
wrongly predicted protein fusions
Re: wrongly predicted protein fusions
by Mario on 29.07.2013 - 16:05
There are several possible "solutions" (mitigations):
1) When you use hints from RNA-seq, you can edit this line in your extrinsic.cfg (e.g. starting from extrinsic.M.RM.E.W.cfg as a template):
Code:
intron 1 .34 M 1 1e+100 ...
Decrease the number in the third column, the malus (here 0.34), to get fewer predicted introns that are not supported by hints.
2) Incentivise the prediction of a larger number of genes.
Assuming that you have a UTR model for your species and that you run it with --UTR=on and the default for --genemodel, you can try playing with the parameters in
config/model/trans_shadow_partial_utr.pbl
(make a copy first)
Transition probabilities
only non-zero probabilities
Code:
[Transition]
----- Igenic region -----
intergenic region
0 0 .9999
5' UTR single exon
0 24 .00003
5' UTR initial exon
0 25 .00002
reverse 3' UTR single exon
0 65 .00004
reverse 3' UTR terminal exon
0 70 .00001
Increasing the transition probabilities out of the intergenic region (state 0) creates a tendency to predict more genes, hopefully mainly in the described cases of erroneously joined genes.
I suggest you try to increase the four numbers in the third column of the rows starting with
Code:
0 24
0 25
0 65
0 70
by a factor of 10 (subject to trial and error), e.g. change .00003 into 0.0003, and so on.
3) If you don't already have a UTR model for your species and don't already use hints from RNA-seq, then do both.
The exonpart hints from the RNA-seq coverage in the UTRs can make the difference between a false positive gene-joining intron and two true positive UTRs instead.
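For suggestion 1, the malus edit can be scripted so that several trial values are easy to compare. The following is only a minimal Python sketch and not part of AUGUSTUS: the function name set_intron_malus and the trial value 0.1 are illustrative assumptions, and it assumes the row layout quoted above (feature name, bonus, malus, then the per-source columns).

```python
import re

# Matches the "intron" row only: the whitespace required after the keyword
# keeps rows such as "intronpart" from matching, and comment lines that
# start with "#" do not match either.
INTRON_ROW = re.compile(r"^(\s*intron\s+\S+\s+)(\S+)(.*)$")


def set_intron_malus(cfg_text, new_malus):
    """Return cfg_text with the third column (malus) of the intron row replaced."""
    lines = []
    hits = 0
    for line in cfg_text.splitlines():
        match = INTRON_ROW.match(line)
        if match:
            # group(1) is "intron <bonus> ", group(3) is everything after the malus.
            line = f"{match.group(1)}{new_malus}{match.group(3)}"
            hits += 1
        lines.append(line)
    if hits != 1:
        raise ValueError(f"expected exactly one intron row, found {hits}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Inline stand-in for a copied extrinsic.cfg; 0.1 is an arbitrary trial value.
    example = "intron 1 .34 M 1 1e+100\n"
    print(set_intron_malus(example, "0.1"), end="")
```

To apply it to a real file, read your copy of extrinsic.cfg into a string, pass it through the function and write the result to a new file, so that the template itself stays untouched.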
We are currently working on using the information from paired RNA-seq not only in filtering alignments but also for delineating genes. Hang on...
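For suggestion 2, the factor-of-10 change to the four rows can also be applied by a small script. This is a hedged Python sketch, not an AUGUSTUS tool: the name scale_transitions is hypothetical, and it assumes that each transition row consists of a from-state, a to-state and a probability, as in the excerpt above. The 0 0 row is left unchanged here; the thread does not say whether it needs to be adjusted as well.

```python
import re
from decimal import Decimal

# (from_state, to_state) pairs named in the post: transitions out of state 0.
TARGETS = frozenset({("0", "24"), ("0", "25"), ("0", "65"), ("0", "70")})

# A transition row: from-state, to-state, probability, optional trailing text.
ROW = re.compile(r"^(\s*)(\d+)(\s+)(\d+)(\s+)(\S+)(.*)$")


def scale_transitions(pbl_text, factor="10", targets=TARGETS):
    """Return pbl_text with the probability of each targeted row multiplied by factor."""
    factor = Decimal(str(factor))
    lines = []
    scaled = 0
    for line in pbl_text.splitlines():
        m = ROW.match(line)
        if m and (m.group(2), m.group(4)) in targets:
            # Decimal arithmetic keeps the result exact instead of relying on
            # binary floats; "f" formatting avoids exponent notation.
            prob = format((Decimal(m.group(6)) * factor).normalize(), "f")
            line = "".join(m.group(1, 2, 3, 4, 5)) + prob + m.group(7)
            scaled += 1
        lines.append(line)
    if scaled != len(targets):
        raise ValueError(f"expected {len(targets)} targeted rows, found {scaled}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Inline stand-in for a copy of trans_shadow_partial_utr.pbl.
    example = (
        "[Transition]\n"
        "0 0 .9999\n"
        "0 24 .00003\n"
        "0 25 .00002\n"
        "0 65 .00004\n"
        "0 70 .00001\n"
    )
    print(scale_transitions(example), end="")
```

The factor is a parameter because the value of 10 is subject to trial and error; run the function on a copy of the file and keep the original for comparison.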
Re: wrongly predicted protein fusions
by Karina H on 20.03.2014 - 07:45
dataset for standardization
Would there be a gold-standard dataset on which any of these strategies can be tested and the parameters optimized until the expected (pre-determined) results are obtained?
Re: wrongly predicted protein fusions
by AksR on 31.03.2014 - 19:59
What to change if NOT using RNA-Seq data as hints
In solution #1, you suggested altering the malus values in the extrinsic.cfg file.
Is there a similar strategy if and when I am NOT using RNA-Seq data as hints?
Thank you!