measuring accuracy

Discussions about predicting genes with AUGUSTUS. Not covered here: WebAUGUSTUS and BRAKER1

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

measuring accuracy

Post by katharina »

Originally posted by Julie in the old forum on 15.06.2012 - 04:43

Hi all,
I am using augustus with different pipelines to annotate a recently assembled plant genome, and now I would like to compare them.
Is there a simple way to calculate some statistics about the annotation? I would like to know how many features (CDS, exons, complete or partial genes) were annotated in the gff file after running augustus.
Best

Re: measuring accuracy

Post by katharina »

Originally posted by Mario in the old forum on 15.06.2012 - 11:08

Code:

grep -Pc "\tCDS\t" aug.gff
grep -Pc "\tgene\t" aug.gff
grep -Pc "\ttranscript\t" aug.gff
grep -Pc "\tstart_codon\t" aug.gff
grep -Pc "\tstop_codon\t" aug.gff
These commands count, respectively, the number of coding exons, genes, transcripts, and genes/gene fragments that are complete at the 5'-end or at the 3'-end.
A gene has a complete coding region if both start_codon and stop_codon are predicted.
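
The five counts above can also be collected in one pass over the file. This is only a sketch: it assumes aug.gff is tab-separated with the feature type in column 3 and that comment lines start with '#'. The function name count_features is made up for illustration and is not part of AUGUSTUS.

```shell
# Hypothetical helper: tally every feature type (column 3) in one pass.
# Assumes tab-separated GFF/GTF; comment lines starting with "#" are skipped.
# Usage: count_features aug.gff
count_features() {
    awk -F'\t' '
        # Skip comments and any line that has no feature column.
        $0 !~ /^#/ && NF >= 3 { n[$3]++ }
        # Print one "feature<TAB>count" line per feature type seen.
        END { for (f in n) print f "\t" n[f] }
    ' "$1" | LC_ALL=C sort
}
```

The output has one line per feature type (CDS, gene, transcript, start_codon, stop_codon and whatever else the file contains), so outputs from different pipelines can be compared side by side.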

Code:

cat aug.gff | grep codon | perl -pe 's/.*transcript_id "([^"]+)".*/$1/' > codons.txt
This gets you a file with the transcript ids, in which the transcripts of complete genes are listed twice, as they have two codon lines (one start_codon and one stop_codon).
Then you can run, e.g.,

Code:

cat codons.txt | perl -ne '$s{$_}++; print if ($s{$_}==2)' | wc -l
to get the number of complete transcripts, i.e. transcripts with both a start_codon and a stop_codon line.
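
The two steps can also be combined without the intermediate codons.txt file. This is a sketch under assumptions: the file is tab-separated and the ninth column carries a GTF-style attribute of the form transcript_id "g1.t1"; as in the perl pattern above. The function name count_complete_transcripts is hypothetical. Unlike counting lines, it records start and stop codons separately, so a transcript is counted only if both kinds are present, even if one kind should appear on more than one line.

```shell
# Hypothetical helper: count transcripts that have both a start_codon
# and a stop_codon line. Assumes GTF-style attributes in column 9.
# Usage: count_complete_transcripts aug.gff
count_complete_transcripts() {
    awk -F'\t' '
        $3 == "start_codon" || $3 == "stop_codon" {
            # Extract the id from: transcript_id "g1.t1";
            if (match($9, /transcript_id "[^"]+"/)) {
                # Skip the 15 characters up to and including the opening quote.
                id = substr($9, RSTART + 15, RLENGTH - 16)
                if ($3 == "start_codon") has_start[id] = 1
                else                     has_stop[id]  = 1
            }
        }
        END {
            n = 0
            for (id in has_start) if (id in has_stop) n++
            print n
        }
    ' "$1"
}
```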

Re: measuring accuracy

Post by katharina »

Originally posted by Julie in the old forum on 18.06.2012 - 03:03

Many thanks