rna-seq

Post by **katharina** » Thu Nov 19, 2015 2:24 pm

Originally posted by Assaf in the old forum on 01.05.2013 - 16:40

Hi all,

I see that RNA-Seq-supported predictions in AUGUSTUS become quite accurate when using intron hints from splice-junction data (similar to what is described in http://augustus.gobics.de/binaries/readme.rnaseq.html). The wig hints did not contribute at all in my case.

Still, I see that the program sometimes tends to extend the CDS and exons beyond the regions supported by evidence, which leads to the concatenation of adjacent genes that are not linked by any intron evidence. I tried lower malus values, but this only made the problem worse (I expected the opposite). Do you have an idea how to solve this problem?

Best,
Assaf

Post by **katharina** » Thu Nov 19, 2015 2:24 pm

Originally posted by Mario in the old forum on 21.05.2013 - 16:55

Yes, lower intron malus values should lead to fewer predicted unsupported introns.
I am giving you an excerpt of an extrinsic file below that I recently used.

Code:

        ass        1   1 0.05  M    1  1e+100  RM  1     1    E 1    1    W 1    1
        dss        1   1 0.01  M    1  1e+100  RM  1     1    E 1    1    W 1    1
   exonpart        1      .99  M    1  1e+100  RM  1     1    E 1    1    W 1    1.003
       exon        1        1  M    1  1e+100  RM  1     1    E 1    1    W 1    1
 intronpart        1        1  M    1  1e+100  RM  1     1    E 1    1    W 1    1
     intron        1       .2  M    1  1e+100  RM  1     1    E 1   50    W 1    1
    CDSpart        1    1 .99  M    1  1e+100  RM  1     1    E 1    1    W 1    1
        CDS        1        1  M    1  1e+100  RM  1     1    E 1    1    W 1    1
    UTRpart        1   1 .985  M    1  1e+100  RM  1     1    E 1    1    W 1    1
        UTR        1        1  M    1  1e+100  RM  1     1    E 1    1    W 1    1
     irpart        1        1  M    1  1e+100  RM  1     1    E 1    1    W 1    1
nonexonpart        1        1  M    1  1e+100  RM  1     1.15 E 1    1    W 1    1

The intron malus (here 0.2) should remove some of the unsupported introns. Reducing it further, e.g. to 1e-10, should remove more introns that have no hint support. This obviously has limits, as true introns then also vanish from the predictions.
More importantly, the local splice site mali (the third number in rows ass and dss, here 0.05 and 0.01) help more specifically in this respect. They apply only to candidate splice sites that do not have any hints supporting them (e.g. from intron hints) while the exon that they flank does have hint support. Therefore, this malus does not apply to unexpressed genes when using RNA-Seq-based hints.
Assuming that you have a UTR model for your species, I suspect that the exonpart hints from the wiggle file could help as well in those cases where there are hints in a false positive intron that actually contains two UTRs. You may want to try increasing the exonpart bonus (here 1.003). When varying parameters, do not be shy about increasing or decreasing them a lot.
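
For trying out several malus values, a small helper can save some hand editing. The following is only a sketch in Python, not part of AUGUSTUS. It assumes that each row is whitespace-separated and reads: feature name, bonus, malus, an optional local malus, and then the per-source entries starting with a source key such as M. The function names parse_row and set_malus are made up for this example.

```python
def parse_row(line):
    """Split one extrinsic-file row into its leading numeric fields.

    Assumed layout: feature, bonus, malus, optional local malus,
    then the per-source entries (first non-numeric token onward).
    """
    tokens = line.split()
    feature = tokens[0]
    numbers = []
    for tok in tokens[1:]:
        try:
            numbers.append(float(tok))
        except ValueError:
            break  # reached the first source key, e.g. "M"
    return {
        "feature": feature,
        "bonus": numbers[0],
        "malus": numbers[1],
        "local_malus": numbers[2] if len(numbers) > 2 else None,
        "sources": tokens[1 + len(numbers):],
    }


def set_malus(line, new_malus):
    """Return the row with its malus (second number) replaced.

    Column alignment is not preserved; this assumes that only
    whitespace separation matters to the reader of the file.
    """
    tokens = line.split()
    tokens[2] = format(new_malus, "g")
    return " ".join(tokens)


intron_row = "intron  1  .2  M 1 1e+100  RM 1 1  E 1 50  W 1 1"
ass_row = "ass  1  1 0.05  M 1 1e+100  RM 1 1  E 1 1  W 1 1"

# One variant of the intron row per malus value to test.
variants = [set_malus(intron_row, m) for m in (0.2, 1e-3, 1e-10)]
```

Each variant row would go into its own copy of the extrinsic file, so that the predictions from the different settings can be compared side by side.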

Post by **katharina** » Thu Nov 19, 2015 2:25 pm

Originally posted by Jammie in the old forum on 11.10.2013 - 04:44

Hi all,
When I ran etraining with this command
==
./etraining --species=Flower sequence.gb
==
I got the error "Segmentation fault".
sequence.gb comes from GenBank and is 72M in size.
When I use a smaller file, sequence_small.gb (3.2M),
I do not have this problem.
How can I fix this error if I want to use the first .gb file (sequence.gb)?
Thanks a lot,
Jammie

Post by **katharina** » Thu Nov 19, 2015 2:25 pm

Originally posted by katharina in the old forum on 13.10.2013 - 18:14

You could try to split the larger GenBank file into smaller files in order to find the location of the problem (it's likely caused by an incorrectly formatted entry). Once you have identified it, remove the error-causing entry from the original file and run etraining again.
Katharina
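
One possible way to do the splitting is sketched below in Python. It relies on the fact that each record in a GenBank flat file ends with a line containing only "//". The function names, the chunk size and the output file names are just assumptions for illustration.

```python
def iter_records(lines):
    """Yield one GenBank record at a time; records end with a '//' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip() == "//":
            yield "".join(record)
            record = []
    # A trailing record without '//' is itself a candidate for the bad entry.
    if any(part.strip() for part in record):
        yield "".join(record)


def chunked(records, size):
    """Group records into lists of at most `size` records."""
    chunk = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def split_genbank(path, size, prefix):
    """Write `path` out as numbered files holding `size` records each."""
    with open(path) as handle:
        for i, chunk in enumerate(chunked(iter_records(handle), size)):
            with open(f"{prefix}{i:04d}.gb", "w") as out:
                out.writelines(chunk)


# Example call (hypothetical chunk size and output prefix):
# split_genbank("sequence.gb", 500, "chunk_")
```

Running etraining on each chunk in turn, e.g. ./etraining --species=Flower chunk_0000.gb, should show which chunk triggers the segmentation fault. Splitting that chunk again with a smaller size narrows it down to a single entry.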
