Hello
I used the CEGMA output in order to train Augustus.
1. I took the GFF and converted it to Augustus style (I did this with awk, since I did not have your script).
2. I used your script to produce a GenBank-format file:
./gff2gbSmallDNA.pl cegma_annotation.corrected2.gff genome 1000 cegma_mapped_V2.gb
3. I then wanted to use the GenBank file for training (it contained 270 entries).
4. After a random split, I used 200 entries for training:
etraining --species=tmp cegma_mapped_V2.gb.train
etraining stopped with this error:
One CDS exon begins before the previous CDS exon ends.474 >= 302
GBProcessor::getGeneList(): Intron has negative length.
Encountered error after reading 0 annotations.
etraining: ERROR
No genbank sequences found.
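To look for the kind of overlap the message describes, the converted GFF could be checked with a small script along these lines. This is only a sketch under some assumptions: the file is tab-separated with nine columns, CDS features are labelled "CDS" in column 3, and the full text of column 9 identifies the gene (the real attribute format may differ). The function name find_cds_conflicts is made up for this example. It reports true coordinate overlaps only, so it may not reproduce exactly what etraining tests.

```python
from collections import defaultdict


def find_cds_conflicts(gff_lines):
    """Return (group, previous_cds, current_cds) tuples for every CDS that
    starts at or before the end of the previous CDS in the same group."""
    cds_by_group = defaultdict(list)
    for line in gff_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        # Assumption: sequence name, strand and the raw attribute column
        # together identify one gene.
        key = (cols[0], cols[6], cols[8])
        cds_by_group[key].append((int(cols[3]), int(cols[4])))

    conflicts = []
    for key, spans in cds_by_group.items():
        spans.sort()  # order by start coordinate, then end coordinate
        for prev, cur in zip(spans, spans[1:]):
            if cur[0] <= prev[1]:
                conflicts.append((key, prev, cur))
    return conflicts


# Possible usage with the file from step 2:
# with open("cegma_annotation.corrected2.gff") as handle:
#     for group, prev, cur in find_cds_conflicts(handle):
#         print(group[0], group[2], prev, cur)
```

If this prints nothing, the CDS coordinates themselves do not overlap, and the order of the features in the file or the strand handling in the awk conversion would be the next thing to look at.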
Thanks