Hello,
I'm trying to train AUGUSTUS for a new organism (one that has not been trained before) using the AUGUSTUS auto-training script. I have been running it like this:
$ autoAugTrain.pl --useexisting --species=Euglena_gracilis --trainingset=exonerate_gene_models.gff --genome=genome.fasta >& augustus_training.log &
The log then contains errors like these:
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 44 annotations.
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 63 annotations.
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 67 annotations.
Followed by these:
ExonModel::processInternalExon: in-frame stop codon
ExonModel::processInternalExon: in-frame stop codon
ExonModel::processInternalExon: in-frame stop codon
ExonModel::processInternalExon: in-frame stop codon
ExonModel::processInternalExon: in-frame stop codon
ExonModel::processInternalExon: in-frame stop codon
I then tried running the following steps manually:
1. Convert the .gff file to a .gb (GenBank) file:
$ gff2gbSmallDNA.pl exonerate_gene_models.gff genome.fasta 1000 genes.raw.gb
2. Run etraining on the raw gene set and capture its error messages:
$ etraining --species=generic --stopCodonExcludedFromCDS=true genes.raw.gb 2> train.err
3. Extract the sequence names of the problematic genes and filter them out:
$ cat train.err | perl -pe 's/.*in sequence (\S+): .*/$1/' > badgenes.lst
$ filterGenes.pl badgenes.lst genes.raw.gb > genes.gb
4. Count the genes before and after filtering:
$ grep -c "LOCUS" genes.raw.gb genes.gb
This gives:
> genes.raw.gb: 37502
> genes.gb: 1
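One detail of the filtering step is worth checking before reading too much into these counts: `perl -pe` prints every line of train.err, so lines that do not match the pattern end up in badgenes.lst unchanged, and a sequence that triggers several messages is listed more than once. The following is only a sketch of a stricter extraction, assuming the "in sequence <name>: <message>" layout that etraining writes to train.err; `extract_bad_genes` is a hypothetical helper name, not part of AUGUSTUS.

```shell
# Sketch only: list each sequence name that etraining complained about, once.
# Assumes every relevant line contains "in sequence <name>: <message>".
extract_bad_genes() {
    # $1: path to the etraining stderr log (train.err in this post)
    # -n plus the trailing "p" prints only the lines where the substitution matched.
    sed -nE 's/^.*in sequence ([^ ]+): .*$/\1/p' "$1" | sort -u
}

# Possible usage (not run here):
#   extract_bad_genes train.err > badgenes.lst
#   wc -l < badgenes.lst    # number of distinct sequences flagged
```

Comparing the number of distinct flagged sequences with the 37502 LOCUS entries in genes.raw.gb would show whether nearly every gene model really is being rejected, or whether the list itself is the problem.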
So almost every gene has been filtered out. When I open the train.err file, this is what I see:
> gene ID=gene_24240 transcr. 1 in sequence scaffold_3_1-4377: coding length not a multiple of 3. Skipping...
> gene ID=gene_11160 transcr. 1 in sequence scaffold_10_1-1066: Single exon gene does not begin with start codon but with cgg
> gene ID=gene_20354 transcr. 1 in sequence scaffold_13_6205-17145: Initial exon does not begin with start codon but with tct
> gene ID=gene_20354 transcr. 1 in sequence scaffold_13_6205-17145: in-frame stop codon
> gene ID=gene_3619 transcr. 1 in sequence scaffold_17_364-1717: Single exon gene does not begin with start codon but with atc
> gene ID=gene_53340 transcr. 1 in sequence scaffold_27_2113-10791: Initial exon does not begin with start codon but with gtt
> gene ID=gene_53340 transcr. 1 in sequence scaffold_27_2113-10791: Terminal exon doesn't end in stop codon. Variable stopCodonExcludedFromCDS set right?
> gene ID=gene_21924 transcr. 1 in sequence scaffold_31_1345-3899: coding length not a multiple of 3. Skipping...
> gene ID=gene_7817 transcr. 1 in sequence scaffold_39_1-458: coding length not a multiple of 3. Skipping...
...
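Since train.err is long, it may help to see which kinds of message dominate rather than reading it line by line. This is a rough sketch, assuming the same "in sequence <name>: <message>" layout as in the excerpt above; `summarize_train_errors` is a hypothetical helper name, and the `NNN` placeholder is only there to group the "does not begin with start codon" messages regardless of the codon reported.

```shell
# Sketch only: count etraining messages by type, most frequent first.
summarize_train_errors() {
    # $1: path to the etraining stderr log (train.err in this post)
    grep 'in sequence' "$1" \
        | sed -E -e 's/^.*in sequence [^ ]+: //' \
                 -e 's/(start codon but with) [a-z]+$/\1 NNN/' \
        | sort | uniq -c | sort -rn
}

# Possible usage (not run here):
#   summarize_train_errors train.err
```

If one message type accounts for most of the lines, that points to a systematic property of the exonerate gene models (or of the etraining settings) rather than to thousands of unrelated annotation errors.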
Please let me know how I can correct this. I would appreciate any advice on how to address it.
Regards,
ThankGod