very short introns

Posted: Wed Nov 18, 2015 6:34 pm
by katharina
Originally posted by Olivier in the old forum on 11.04.2012 - 13:13

Hello,
I am trying to use augustus to annotate our genome [...]
Apparently you already have trained augustus with Tetrahymena, which is the closest model organism.
I have ESTs and have generated a hints file:

Code:

scaffold51_5    EST51    exonpart    127840    127889    116.6    +    .    grp=ESTPt5100000000379;pri=4;src=E
scaffold51_5    EST51    intron      127890    127915    116.6    +    .    grp=ESTPt5100000000379;pri=4;src=E
scaffold51_5    EST51    exon        127916    128173    116.6    +    .    grp=ESTPt5100000000379;pri=4;src=E
scaffold51_5    EST51    intron      128174    128199    116.6    +    .    grp=ESTPt5100000000379;pri=4;src=E
scaffold51_5    EST51    exon        128200    128535    116.6    +    .    grp=ESTPt5100000000379;pri=4;src=E
scaffold51_5    EST51    intron      128536    128564    116.6    +    .    grp=ESTPt5100000000379;pri=4;src=E
scaffold51_5    EST51    exonpart    128565    128736    116.6    +    .    grp=ESTPt5100000000379;pri=4;src=E
scaffold51_122  EST51    exonpart    109994    110207    117.3    -    .    grp=ESTPt5100000027354;pri=4;src=E
When I run this command:

Code:

augustus --species=[...] $REF --UTR=off --strand=both --genemodel=complete --AUGUSTUS_CONFIG_PATH=$AUGUSTUS_CONF --alternatives-from-evidence=false  --hintsfile=hints.gff 
I get errors like this:
Error: intron hint is too short.

Code:

scaffold51_2 EST51 intron 11430 11457 230.7 + . grp=ESTPt5100000046600;pri=4;src=E "100436;0.34;0:0"
Delete group HintGroup ESTPt5100000046600, 10920-11643, mult= 1, priority= 4 3 features
Error: intron hint is too short.
Introns are really short in [...]: between 15 and 100 nucleotides. So these lengths are normal.
Could you help me?

Re: very short introns

Posted: Wed Nov 18, 2015 6:34 pm
by katharina
Originally posted by Mario in the old forum on 11.04.2012 - 16:18

You can set the minimum intron length like this:

Code:

augustus --species=tetrahymena --min_intron_len=15 example.fa
The default is 39 bp, which is a safe minimum for most species. It makes sense not to decrease this below the actual minimum size of real introns in your species, because the setting also serves as a filter for intron hints that may come from alignment gaps that do not correspond to introns.
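To pick a sensible value, it can help to look at how long the intron hints actually are. The following is only a rough sketch, not part of augustus: the function name `intron_hint_lengths` is made up for illustration, and it assumes GFF-style lines with 1-based, inclusive start and end coordinates in columns 4 and 5. The example lines are taken from the hints posted above.

```python
def intron_hint_lengths(gff_lines):
    """Return the length of every intron hint found in the given GFF lines.

    Assumes 1-based, inclusive coordinates, so length = end - start + 1.
    Splitting on any whitespace handles both tab- and space-separated input,
    as long as the first five columns contain no spaces themselves.
    """
    lengths = []
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 5 or fields[2] != "intron":
            continue  # only intron hints are of interest here
        start, end = int(fields[3]), int(fields[4])
        lengths.append(end - start + 1)
    return lengths


# Hint lines copied from the first post in this thread.
example_hints = [
    "scaffold51_5 EST51 exonpart 127840 127889 116.6 + . grp=ESTPt5100000000379;pri=4;src=E",
    "scaffold51_5 EST51 intron 127890 127915 116.6 + . grp=ESTPt5100000000379;pri=4;src=E",
    "scaffold51_5 EST51 exon 127916 128173 116.6 + . grp=ESTPt5100000000379;pri=4;src=E",
    "scaffold51_5 EST51 intron 128174 128199 116.6 + . grp=ESTPt5100000000379;pri=4;src=E",
    "scaffold51_5 EST51 intron 128536 128564 116.6 + . grp=ESTPt5100000000379;pri=4;src=E",
]

example_lengths = intron_hint_lengths(example_hints)
print("intron hint lengths:", example_lengths)
print("shortest intron hint:", min(example_lengths))
```

On a real file the same function could be fed an open file handle, e.g. `intron_hint_lengths(open("hints.gff"))`. A very small minimum may still come from alignment gaps rather than real introns, so the shortest hint is only an upper bound on a reasonable `--min_intron_len`, not a recommendation.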

Re: very short introns

Posted: Wed Nov 18, 2015 6:35 pm
by katharina
Originally posted by Olivier in the old forum on 16.04.2012 - 13:49

Is there a way to force augustus to predict only introns with length < 100 nt (something like --max_intron_len=100)?
Thanks again
Best
Olivier

Re: very short introns

Posted: Wed Nov 18, 2015 6:35 pm
by katharina
Originally posted by Mario in the old forum on 16.04.2012 - 13:50

Yes. In the text file
tetrahymena_intron_probs.pbl
(or whatever it is called for your species), edit the section after
[LENGTH]
which gives the intron length distribution. You can fill in a desired length distribution, e.g. with all zeros from the 102nd line on (that is, for all lengths above 100). Note that the probabilities in this section are multiplied by 1000 so that their decimal representation does not have so many leading zeros. In theory, the numbers in the first 101 lines (length 0 to length 100) should then add up to 1000, but augustus will not complain and will still work if they don't.
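The arithmetic behind such an edit can be sketched as follows. This is only an illustration under assumptions: the function name `truncate_length_distribution` is hypothetical, and it works on a plain list of numbers (entry i being the probability of length i, times 1000) rather than reading or writing the .pbl file, since the exact layout of that file is not described here. It zeroes every length above the cutoff and rescales the rest so that they sum to 1000 again.

```python
def truncate_length_distribution(probs_x1000, max_len=100):
    """Zero out all lengths above max_len and rescale the rest to sum to 1000.

    probs_x1000[i] is assumed to be P(intron length == i) * 1000,
    starting at length 0.
    """
    kept = list(probs_x1000[: max_len + 1])  # lengths 0..max_len
    total = sum(kept)
    if total <= 0:
        raise ValueError("no probability mass at or below max_len")
    scale = 1000.0 / total
    rescaled = [p * scale for p in kept]
    # Everything beyond max_len becomes zero, keeping the original list length.
    zeros = [0.0] * max(0, len(probs_x1000) - len(kept))
    return rescaled + zeros


# Toy distribution over lengths 0..5 (made-up numbers), cut off above length 3.
toy = [0, 0, 200, 300, 250, 250]
toy_truncated = truncate_length_distribution(toy, max_len=3)
print(toy_truncated)
```

The rescaling step is optional, given that augustus still works when the values do not add up to 1000; it just keeps the remaining entries a proper distribution.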