Augustus 3.0 etraining input format error

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

Post by katharina »

Originally posted in the old forum by Malte Petersen on 13.01.2014 - 16:57
I am using the new Augustus version 3.0. It compiled fine and runs well, but when I try to retrain it, etraining complains that the input file is not in GenBank format:

Code:

./bin/etraining --species=leptopilina /tmp/sequence.gb

./bin/etraining: ERROR
        Input file not in genbank format.
However, this is a sequence file downloaded from GenBank [1], so I doubt that it is misformatted. Could this be a bug in Augustus? Or am I doing something wrong?
Thanks for your help!
[1] http://www.ncbi.nlm.nih.gov/nucleotide? ... ds=1293613
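A quick way to rule out a damaged download before suspecting the tool is to check the file's basic structure. The sketch below is only a rough check and is not part of AUGUSTUS: the function name looks_like_genbank is hypothetical, and it tests only for the LOCUS header line and the "//" record terminator that GenBank flat files use, not for valid annotation.

```shell
#!/bin/sh
# Hypothetical helper (not part of AUGUSTUS): rough structural check of an
# uncompressed GenBank flat file. It only looks for the LOCUS header and the
# "//" record terminator; it does not validate the annotation itself.
looks_like_genbank() {
    file=$1
    # The first non-empty line must start with the LOCUS keyword.
    first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    case $first in
        LOCUS*) ;;
        *) return 1 ;;
    esac
    # At least one record terminator line ("//") must be present.
    grep -q '^//[[:space:]]*$' "$file"
}

# Usage:
#   looks_like_genbank /tmp/sequence.gb && echo "structure looks like GenBank"
```

If this check passes and etraining still rejects the file, the file itself is probably not the cause.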

Re: Augustus 3.0 etraining input format error

Post by katharina »

by Mario on 21.01.2014 - 10:51
I am onto this. The problem is likely a side effect of the new support in 3.0 for gzipped input files (both GenBank and FASTA). Another user has reported the same error with augustus itself. However, we could not reproduce it on our machines: once it compiles, it runs fine here.
Malte, can you please check whether you also get the problem with the following command?
augustus --species=human examples/HS08198.fa
Have you installed the zlib and Boost iostreams libraries on your system? They are now required with 3.0.
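One way to see whether a compiled binary actually picked up these libraries is to inspect its dynamic dependencies. The sketch below assumes a Linux system with ldd, and assumes the libraries show up under the usual names libz.so* and libboost_iostreams.so*; the function name filter_compression_libs is hypothetical. A statically linked build would show nothing here, so an empty result is not proof that the libraries are missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of AUGUSTUS): keep only the lines of ldd
# output that mention zlib or Boost.Iostreams. Reads ldd output on stdin.
# Assumes the usual shared-library names; other platforms may differ.
filter_compression_libs() {
    grep -E 'libz\.so|libboost_iostreams'
}

# Usage (the binary path is an example):
#   ldd ./bin/augustus | filter_compression_libs
```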

Re: Augustus 3.0 etraining input format error

Post by katharina »

by Malte Petersen on 21.01.2014 - 11:08
Thanks for your reply. I am not getting any error when using

Code:

augustus --species=human examples/HS08198.fa
zlib and boost-iostreams are both installed (otherwise I could not have compiled the package).
I should note that the leptopilina species is a custom config that was generated using the new_species.pl script. That went without any problems, and I can use it on examples/HS08198.fa as well.
I also tried the example GenBank file examples/hsackI10.gb and got the same error.

Re: Augustus 3.0 etraining input format error

Post by katharina »

by Mario on 21.01.2014 - 12:15
I found the problem. It was in the function that determines the file type. All of these combinations now work:

Code:

cd augustus/trunks/examples
gzip -c hsackI10.gb > hsackI10.gb.gz
gzip -c hsackI10.fa > hsackI10.fa.gz
etraining --species=generic hsackI10.gb
etraining --species=generic hsackI10.gb.gz
augustus --species=human hsackI10.gb
augustus --species=human hsackI10.gb.gz
augustus --species=human hsackI10.fa
augustus --species=human hsackI10.fa.gz
The updated version with that fix is online:
http://bioinf.uni-greifswald.de/augustu ... 0.1.tar.gz
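To illustrate what file type detection has to cope with once gzipped input is allowed, here is a minimal shell sketch. It is not the AUGUSTUS implementation, and the function name guess_seq_format is hypothetical. It relies on two facts about the formats: gzip streams begin with the bytes 0x1f 0x8b, and the first non-empty line of the (decompressed) content starts with ">" for FASTA or with LOCUS for GenBank.

```shell
#!/bin/sh
# Illustrative only: NOT the AUGUSTUS implementation. Guess whether a file is
# FASTA or GenBank, looking inside gzip-compressed files when necessary.
guess_seq_format() {
    file=$1
    # gzip streams start with the magic bytes 0x1f 0x8b.
    magic=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        # Decompress to stdout and take the first non-empty line.
        first=$(gzip -dc "$file" 2>/dev/null | grep -v '^[[:space:]]*$' | head -n 1)
    else
        first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    fi
    case $first in
        '>'*)   echo fasta ;;
        LOCUS*) echo genbank ;;
        *)      echo unknown ;;
    esac
}

# Usage, mirroring the combinations listed above:
#   for f in hsackI10.gb hsackI10.gb.gz hsackI10.fa hsackI10.fa.gz; do
#       printf '%s: %s\n' "$f" "$(guess_seq_format "$f")"
#   done
```

The point of checking the compression first is that the format markers are only visible after decompression; deciding the format from the raw bytes of a .gz file would misclassify it.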