gff file for training, accepted features?
Posted: Fri Nov 20, 2015 1:18 pm
Originally posted in the old forum by Viola Manning on 15.01.2013 - 21:39
Hello Augustus folks,
I would like to train Augustus to annotate multiple genomes of an organism
that already is well annotated. To train Augustus, I see that I can use a
gene structure file in gff format, which I have, but the features in column
three are different than what the training tutorial says is allowed.
For example, the training tutorial says that the features may be CDS, 5'-UTR
or 3'-UTR, but my gff3 file contains additional lines, and the 5' and 3' UTRs
are written out as five_prime_UTR and three_prime_UTR. Here is an example:
supercont1.8 FINAL_CALLGENES_1 gene 39095 40196 . + . ID
supercont1.8 FINAL_CALLGENES_1 mRNA 39095 40196 . + . ID
supercont1.8 FINAL_CALLGENES_1 start_codon 39095 39097 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 exon 39095 39160 . + . ID
supercont1.8 FINAL_CALLGENES_1 exon 39260 39379 . + . ID
supercont1.8 FINAL_CALLGENES_1 exon 39937 40014 . + . ID
supercont1.8 FINAL_CALLGENES_1 exon 40077 40196 . + . ID
supercont1.8 FINAL_CALLGENES_1 CDS 39095 39160 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 CDS 39260 39379 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 CDS 39937 40014 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 CDS 40077 40196 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 stop_codon 40194 40196 . + 0 ID
Should I modify my gff file to have only these three feature types and change
the UTRs to the abbreviations, or will the training server accommodate the gff3
file formatted as above as well?
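In case the conversion turns out to be necessary, here is a minimal sketch of what I have in mind, in Python. It does not settle whether the training server needs it. It assumes a tab-separated, 9-column gff3 file, and it assumes the target names are exactly CDS, 5'-UTR and 3'-UTR as listed in the tutorial. The function names and file names are hypothetical, made up for illustration.

```python
# Feature names in my gff3 file mapped to the names listed in the training
# tutorial. This mapping is an assumption based on the tutorial text above;
# it has not been checked against what the training server accepts.
FEATURE_MAP = {
    "CDS": "CDS",
    "five_prime_UTR": "5'-UTR",
    "three_prime_UTR": "3'-UTR",
}


def convert_gff_lines(lines):
    """Yield only CDS/UTR records, with the UTR feature names abbreviated."""
    for line in lines:
        record = line.rstrip("\n")
        # Skip blank lines and comment/directive lines such as "##gff-version 3".
        if not record or record.startswith("#"):
            continue
        cols = record.split("\t")
        # A GFF record has 9 tab-separated columns; skip anything else.
        if len(cols) != 9:
            continue
        new_name = FEATURE_MAP.get(cols[2])
        if new_name is None:
            # gene, mRNA, exon, start_codon, stop_codon, ... are dropped.
            continue
        cols[2] = new_name
        yield "\t".join(cols)


def convert_gff_file(in_path, out_path):
    """Read a gff3 file and write the filtered, renamed records to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in convert_gff_lines(src):
            dst.write(record + "\n")
```

It would be called as, for example, convert_gff_file("genes.gff3", "genes_for_training.gff"), where both file names are placeholders. Records are kept in their original order and only column three is changed, so the coordinates, strand, phase and attributes stay as they are in the input.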