interpretation of AUGUSTUS results

Discussions about WebAUGUSTUS, the Web Service of AUGUSTUS.

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

interpretation of AUGUSTUS results

Post by katharina »

Originally posted in the old forum by katharina on 04.09.2012 - 10:24
A user of the AUGUSTUS training web service asked the following questions:
I just got the AUGUSTUS results, but have a question about the results.
In the .gff file, each CDS was shown in one row. What do the numbers in each row mean? Which number shows the accuracy, and is there a threshold in the accuracy index for confident inferences?

Re: interpretation of AUGUSTUS results

Post by katharina »

by Mario on 04.09.2012 - 15:09
The 6th column contains a score. In AUGUSTUS, when sampling is turned on (e.g. --sample=100, which is the default for many species), this score is the fraction of samples in which this coding exon occurred. For example, in

chr1 AUGUSTUS CDS 6661 7008 0.54 - 1 transcript_id "g5.t1"; gene_id "g5";
the sampled gene structures contained a coding exon from 6661 to 7008 in 54 out of 100 samples. According to the "belief" of the model, this exon is correct with a probability of 54%.
Correct here means that both boundaries need to be exact. Exons overlapping this range would not be counted if they are not exact at both boundaries.
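To make the exact-boundary rule concrete, here is a small sketch of how such a fraction could be computed. It is not the AUGUSTUS implementation; the function name exon_posteriors and the toy samples are made up for illustration only.

```python
from collections import Counter

def exon_posteriors(samples):
    """Map each (start, end) exon to the fraction of samples containing it.

    Exons are compared as exact (start, end) pairs, so an exon that merely
    overlaps another one is counted as a different exon.
    """
    counts = Counter()
    for exons in samples:
        counts.update(set(exons))  # count each exon at most once per sample
    n = len(samples)
    return {exon: c / n for exon, c in counts.items()}

# Hypothetical toy data: 4 sampled gene structures, each a list of exons.
samples = [
    [(6661, 7008)],
    [(6661, 7008)],
    [(6661, 7011)],  # overlaps 6661-7008, but the end differs
    [],              # no exon predicted in this sample
]
post = exon_posteriors(samples)
print(post[(6661, 7008)])  # 0.5
```

The overlapping exon 6661-7011 gets its own score and does not add to the score of 6661-7008.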
If you want to prioritize, e.g. because you need primers for an experiment, you could stick with regions of the gene in which the exons have a high probability, say 0.8 or more. The right threshold of course depends on the cost of false positives.
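A filter along these lines could look like the following sketch. It assumes tab-separated GFF lines with the score in the 6th column; the function name high_confidence_cds is made up, the 0.8 cutoff is only the example value from above, and the second input line is invented for illustration.

```python
def high_confidence_cds(gff_lines, min_score=0.8):
    """Return (seqname, start, end, strand, score) for CDS rows with score >= min_score."""
    kept = []
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # keep only complete CDS rows
        try:
            score = float(fields[5])  # 6th column
        except ValueError:
            continue  # score missing, e.g. "."
        if score >= min_score:
            kept.append((fields[0], int(fields[3]), int(fields[4]), fields[6], score))
    return kept

example = [
    'chr1\tAUGUSTUS\tCDS\t6661\t7008\t0.54\t-\t1\ttranscript_id "g5.t1"; gene_id "g5";',
    # Hypothetical second exon, not from real output:
    'chr1\tAUGUSTUS\tCDS\t7100\t7250\t0.93\t-\t0\ttranscript_id "g5.t1"; gene_id "g5";',
]
result = high_confidence_cds(example)
print(result)
```

With the default cutoff, only the second row is kept; lowering min_score to 0.5 would keep both.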
This paper has a table that contains numbers on how well these probabilities in the 6th column correlate with the actual probability that the exon is correct:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538822/
For a definition of the other columns, you can search for GFF (General Feature Format) or GTF.