Augustus excludes exons

Discussions about predicting genes with AUGUSTUS. Not covered here: WebAUGUSTUS and BRAKER1

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Augustus excludes exons

Post by katharina »

Originally posted in the old forum by ebioman on 03.07.2014 - 16:25

Hello, I have a rather strange observation.
I trained AUGUSTUS for a new species and was, in my opinion, rather successful:
75% at the exon level and 45% at the gene level for both sensitivity and specificity. The strange thing, though, is that when I run the prediction, all genes come out without any exon features annotated.
For example (I removed some columns for readability):

Code:

gene 27294 28002 0.46 - . 
transcript 27294 28002 0.46 - . 
stop_codon 27294 27296 . - 0 
intron 27631 27814 1 - . 
CDS 27294 27630 0.96 - 1 
CDS 27815 28002 0.48 - 0 
start_codon 28000 28002 . - 0
When I import the annotation into IGV, it displays exons automatically, which is why I
did not notice this immediately.
Any clue what might cause this effect?
My command line was:

Code:

augustus --species=X Contig10.fasta
I also tried other options, and the output is similar.

Re: Augustus excludes exons

Post by katharina »

by ebioman on 03.07.2014 - 16:32
Missing information
I forgot to add that if I use an established plant model such as Arabidopsis, the same command does give me exon features (although the annotation itself is wrong).
I am using version 3.0.1.

Re: Augustus excludes exons

Post by katharina »

by ebioman on 04.07.2014 - 09:34
Solved
Just to follow up.
I think I have understood the core of the problem. If a species model contains information about UTRs (as the Arabidopsis model does), AUGUSTUS can predict exons, and these are then annotated as such.
If the UTR model is missing (as in my case), each CDS is logically equal to an exon. The annotation, however, only reports CDS features, not exon features.
Please correct me if I am wrong!
Therefore, replacing CDS with exon should be correct.
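A minimal sketch of this workaround in Python, assuming a species model without a UTR model (so every CDS equals an exon) and full 9-column, tab-separated GTF output like the example further down in this thread. Rather than replacing the CDS lines, it adds an exon line next to each one, because tools that read the frame column still need the CDS features. The function name is hypothetical; this is not part of AUGUSTUS.

```python
def add_exons_from_cds(lines):
    """Yield every input GTF line, plus one 'exon' line after each 'CDS' line.

    Assumes no UTR model was used, so each CDS spans exactly one exon.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        yield line
        if not line or line.startswith("#"):
            continue  # blank lines and comments are passed through unchanged
        fields = line.split("\t")
        if len(fields) == 9 and fields[2] == "CDS":
            exon = list(fields)
            exon[2] = "exon"  # same coordinates, score, strand and attributes
            exon[7] = "."     # the frame column is only meaningful for CDS features
            yield "\t".join(exon)
```

For example (file name hypothetical): `with open("augustus.gtf") as fh: print("\n".join(add_exons_from_cds(fh)))`.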

Re: Augustus excludes exons

Post by katharina »

by katharina on 04.07.2014 - 11:33
Yes, you are right: if you predict genes without a UTR model, the CDS features are equal to the exons.
In the output file, exons are in that case labeled as "initial", "internal", and "terminal". Example:

Code:

scaffold36      AUGUSTUS        gene    59      1324    0.13    +       .       g5
scaffold36      AUGUSTUS        transcript      59      1324    0.13    +       .       g5.t1
scaffold36      AUGUSTUS        start_codon     59      61      .       +       0       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        initial 59      449     0.21    +       0       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        internal        736     935     0.75    +       2       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        terminal        1091    1324    0.77    +       0       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        intron  450     735     0.62    +       .       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        intron  936     1090    0.77    +       .       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        CDS     59      449     0.21    +       0       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        CDS     736     935     0.75    +       2       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        CDS     1091    1324    0.77    +       0       transcript_id "g5.t1"; gene_id "g5";
scaffold36      AUGUSTUS        stop_codon      1322    1324    .       +       0       transcript_id "g5.t1"; gene_id "g5";
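To check this on your own output, one could compare, per transcript, the coordinates of the initial/internal/terminal features with those of the CDS features. The Python sketch below does that. It assumes tab-separated 9-column GTF (the forum display shows spaces), and it also accepts "single" as a label for single-exon genes, which is an assumption not shown in this thread. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Exon labels from the example above; "single" (single-exon genes) is an assumption.
EXON_LABELS = {"initial", "internal", "terminal", "single"}
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_match_cds(lines):
    """Return {transcript_id: bool}: True if the exon-labelled features of a
    transcript have exactly the same coordinates as its CDS features."""
    exons = defaultdict(set)
    cds = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9:
            continue
        match = TX_ID.search(fields[8])
        if match is None:
            continue  # gene and transcript lines carry no transcript_id attribute
        tx = match.group(1)
        span = (int(fields[3]), int(fields[4]))
        if fields[2] in EXON_LABELS:
            exons[tx].add(span)
        elif fields[2] == "CDS":
            cds[tx].add(span)
    return {tx: exons[tx] == cds[tx] for tx in set(exons) | set(cds)}
```

If every transcript maps to True, the CDS coordinates can be used directly as exon coordinates.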
Katharina