wrongly predicted protein fusions

Post by **katharina** » Thu Nov 19, 2015 7:56 pm

Originally posted in the old forum by Chris on 18.07.2013 - 17:29
Dear forum members,
I am currently using AUGUSTUS version 2.5.5. I trained it with transcriptome data. Unfortunately, AUGUSTUS creates protein predictions which
are obviously fusions of several genes. Does anyone know how this can be
circumvented? Or is there a parameter that limits the
maximum length of a predicted protein in amino acids or base pairs?
I am very grateful for any help,
Chris

Post by **katharina** » Thu Nov 19, 2015 7:57 pm

by Mario on 29.07.2013 - 16:05

There are several possible "solutions" (mitigations):

1) When you use hints from RNA-seq, you can edit this line in your extrinsic.cfg (e.g. starting from extrinsic.M.RM.E.W.cfg as a template):

   intron        1       .34  M    1  1e+100 ...

Decrease the number in the third column (the malus, here .34) to get fewer predicted unsupported introns.
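If you prefer to script this edit, here is a minimal Python sketch. It assumes, as in the line quoted above, that the feature name is the first whitespace-separated field and the malus the third; `set_malus` is a hypothetical helper, not part of AUGUSTUS. Work on a copy of the config file.

```python
import re


def set_malus(cfg_text: str, feature: str, new_malus: float) -> str:
    """Return cfg_text with the malus (third column) of `feature` replaced.

    Only lines whose first field is exactly `feature` are changed, so e.g.
    "intronpart" is untouched when feature == "intron", and comment lines
    starting with '#' never match. Original spacing is preserved.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(feature) + r"[ \t]+\S+[ \t]+)(\S+)",
        re.MULTILINE,
    )
    # Keep group 1 (feature, bonus, spacing) and swap in the new malus.
    return pattern.sub(lambda m: m.group(1) + f"{new_malus:g}", cfg_text)


if __name__ == "__main__":
    line = "intron        1       .34  M    1  1e+100\n"
    print(set_malus(line, "intron", 0.1), end="")
```

To use it, read your copied extrinsic file as text, pass it through `set_malus(text, "intron", 0.1)`, and write the result back; 0.1 is only an illustrative value, to be tuned by trial and error.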

2) Incentivise the prediction of a larger number of genes.
Assuming that you have a UTR model for your species and that you run AUGUSTUS with --UTR=on and the default setting of --genemodel, you can try adjusting the parameters in
config/model/trans_shadow_partial_utr.pbl
(make a copy first). The file lists the transition probabilities, only the non-zero ones:

   [Transition]
   # ----- Igenic region -----
   # intergenic region
   0 0 .9999
   # 5' UTR single exon
   0 24 .00003
   # 5' UTR initial exon
   0 25 .00002
   # reverse 3' UTR single exon
   0 65 .00004
   # reverse 3' UTR terminal exon
   0 70 .00001

Increasing the transition probabilities out of the intergenic region (state 0) will make AUGUSTUS tend to predict more genes, hopefully mainly in the cases of erroneously joined genes described above.
I suggest you try increasing the four numbers in the third column of the rows starting with 0 24, 0 25, 0 65 and 0 70 by a factor of 10 (subject to trial and error), e.g. change .00003 into .0003, and so on.
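A minimal Python sketch of point 2, assuming the rows are whitespace-separated `from to probability` triples inside a `[Transition]` section, as in the excerpt above. `scale_intergenic_exits` and `expected_intergenic_length` are hypothetical helpers, not part of AUGUSTUS. Two further assumptions are mine, not from the post: the optional rebalancing lowers the 0 0 self-transition so the row still sums to 1 (the thread does not say whether AUGUSTUS renormalizes rows itself), and the length estimate treats state 0 as an ordinary HMM state with a geometric length distribution, whose mean is 1/(1 - p) for self-transition probability p.

```python
import re

# Transitions out of the intergenic state 0 that the post suggests scaling
# (targets 24, 25, 65, 70: UTR exon states, per the labels in the file).
EXIT_TARGETS = (24, 25, 65, 70)

# A data row: "<from> <to> <probability>", capturing the spacing too.
ROW = re.compile(r"^(\s*)(\d+)(\s+)(\d+)(\s+)(\S+)(.*)$")


def _with_prob(m, prob):
    """Rebuild a matched row with a new probability, preserving whitespace."""
    return f"{m.group(1)}{m.group(2)}{m.group(3)}{m.group(4)}{m.group(5)}{prob:.6g}{m.group(7)}"


def scale_intergenic_exits(pbl_text, factor=10.0, targets=EXIT_TARGETS, rebalance=True):
    """Multiply the 0 -> target probabilities in [Transition] by `factor`.

    If `rebalance` is True, the 0 -> 0 self-transition is set to
    1 - (sum of all other transitions out of state 0). This rebalancing
    is an assumption of this sketch, not part of the forum advice.
    """
    lines = pbl_text.splitlines()
    in_transition = False
    self_loop = None  # (line index, match) of the "0 0" row, if present
    exit_sum = 0.0    # sum of 0 -> j probabilities for j != 0, after scaling
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # Section header: only rows inside [Transition] are touched.
            in_transition = stripped == "[Transition]"
            continue
        m = ROW.match(line) if in_transition else None
        if m is None or int(m.group(2)) != 0:
            continue
        to, prob = int(m.group(4)), float(m.group(6))
        if to == 0:
            self_loop = (i, m)
            continue
        if to in targets:
            prob *= factor
            lines[i] = _with_prob(m, prob)
        exit_sum += prob
    if rebalance and self_loop is not None:
        i, m = self_loop
        lines[i] = _with_prob(m, 1.0 - exit_sum)
    text = "\n".join(lines)
    return text + "\n" if pbl_text.endswith("\n") else text


def expected_intergenic_length(p_stay):
    """Mean run length (bases) of a state with self-loop probability p_stay,
    assuming a geometric length distribution: 1 / (1 - p_stay)."""
    return 1.0 / (1.0 - p_stay)


if __name__ == "__main__":
    excerpt = "[Transition]\n0 0 .9999\n0 24 .00003\n0 25 .00002\n0 65 .00004\n0 70 .00001\n"
    print(scale_intergenic_exits(excerpt))
    print(round(expected_intergenic_length(0.9999)), round(expected_intergenic_length(0.999)))
```

Under the geometric assumption, the mean intergenic run drops from about 10,000 bases (self-transition .9999) to about 1,000 bases (.999) after scaling, which is one way to see why more, shorter genes get predicted. The factor 10 is the post's starting point for trial and error, not a tuned value.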

3) If you are not already doing both, train a UTR model for your species and use hints from RNA-seq.
The exonpart hints from the RNA-seq coverage in the UTRs can make the difference between a false-positive, gene-joining intron and two true-positive UTRs.

We are currently working on using the information from paired-end RNA-seq not only for filtering alignments but also for delineating genes. Hang on...

Post by **katharina** » Thu Nov 19, 2015 7:57 pm

by Karina H on 20.03.2014 - 07:45
Dataset for standardization
Is there a gold-standard dataset on which any of these strategies could be tested and the parameters optimized until expected or predetermined results are obtained?
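The thread does not name such a dataset. If you do have a trusted reference annotation for some region, one crude check for this particular problem is to count predicted genes whose span overlaps two or more reference genes on the same strand, and compare that count before and after a parameter change. A minimal Python sketch: the `Gene` record and `find_fusions` are hypothetical, the gene coordinates are assumed to be already extracted from your annotation files, and gene-span overlap is a simplification (a reference gene lying inside an intron of a correctly predicted gene would also be flagged).

```python
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record with 1-based inclusive coordinates."""
    gene_id: str
    seqid: str
    strand: str
    start: int
    end: int


def overlaps(a: Gene, b: Gene) -> bool:
    """True if both genes are on the same sequence and strand and their spans intersect."""
    return (a.seqid == b.seqid and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def find_fusions(predicted: List[Gene], reference: List[Gene]) -> Dict[str, List[str]]:
    """Map each predicted gene overlapping >= 2 reference genes to their ids.

    Simple quadratic scan; fine for a test region, not a whole genome.
    """
    fusions = {}
    for p in predicted:
        hits = sorted(r.gene_id for r in reference if overlaps(p, r))
        if len(hits) >= 2:
            fusions[p.gene_id] = hits
    return fusions


if __name__ == "__main__":
    ref = [Gene("r1", "chr1", "+", 100, 500), Gene("r2", "chr1", "+", 800, 1200)]
    pred = [Gene("g1", "chr1", "+", 150, 1100)]
    print(find_fusions(pred, ref))
```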

Post by **katharina** » Thu Nov 19, 2015 7:57 pm

by AksR on 31.03.2014 - 19:59
What to change if NOT using RNA-Seq data as hints
In solution #1 you suggested altering the malus values in the extrinsic.cfg file.
Is there a similar strategy when I am NOT using RNA-Seq data as hints?
Thank you!

AUGUSTUS Forum
