Hello Augustus folks,
I would like to train Augustus to annotate multiple genomes of an organism
that is already well annotated. To train Augustus, I see that I can use a
gene structure file in GFF format, which I have, but the feature types in column
three differ from those the training tutorial says are allowed.
For example, the training tutorial says that the features may be CDS, 5'-UTR
or 3'-UTR, but my GFF3 files contain additional feature types (gene, mRNA, exon,
start_codon, stop_codon), and the UTRs are written out as five_prime_UTR and
three_prime_UTR. Here is an example:
supercont1.8 FINAL_CALLGENES_1 gene 39095 40196 . + . ID
supercont1.8 FINAL_CALLGENES_1 mRNA 39095 40196 . + . ID
supercont1.8 FINAL_CALLGENES_1 start_codon 39095 39097 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 exon 39095 39160 . + . ID
supercont1.8 FINAL_CALLGENES_1 exon 39260 39379 . + . ID
supercont1.8 FINAL_CALLGENES_1 exon 39937 40014 . + . ID
supercont1.8 FINAL_CALLGENES_1 exon 40077 40196 . + . ID
supercont1.8 FINAL_CALLGENES_1 CDS 39095 39160 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 CDS 39260 39379 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 CDS 39937 40014 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 CDS 40077 40196 . + 0 ID
supercont1.8 FINAL_CALLGENES_1 stop_codon 40194 40196 . + 0 ID
Do I need to rename the UTRs to the abbreviations (5'-UTR and 3'-UTR), or will
the training server also accept a GFF3 file formatted as above?
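In case renaming does turn out to be necessary, here is a minimal sketch of the conversion I have in mind. It assumes the file is tab-separated as GFF3 requires, that the tutorial's names "5'-UTR" and "3'-UTR" are the only change needed, and it drops every feature type other than CDS and the two UTR types. The attribute column (column nine) is passed through unchanged; the script name in the usage comment is hypothetical.

```python
import sys
from typing import Iterable, Iterator, List

# Assumed mapping from GFF3 (Sequence Ontology) feature types to the names
# listed in the training tutorial; any type not listed here is dropped.
FEATURE_RENAMES = {
    "CDS": "CDS",
    "five_prime_UTR": "5'-UTR",
    "three_prime_UTR": "3'-UTR",
}


def convert_gff3(lines: Iterable[str]) -> Iterator[str]:
    """Yield only CDS/UTR lines, with the UTR feature types renamed."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and directives such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a 9-column, tab-separated GFF line
        new_type = FEATURE_RENAMES.get(fields[2])
        if new_type is None:
            continue  # drops gene, mRNA, exon, start_codon, stop_codon, ...
        fields[2] = new_type
        yield "\t".join(fields)


def main(argv: List[str]) -> None:
    # Hypothetical usage: python convert_gff3.py input.gff3 > training.gff
    with open(argv[1]) as handle:
        for out_line in convert_gff3(handle):
            print(out_line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Applied to the example above, this would keep only the four CDS lines, since that gene has no UTR lines.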