Formatting the GFF file

Discussions about WebAUGUSTUS, the Web Service of AUGUSTUS.

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Post by katharina »

Originally posted in the old forum by Clement on 16.10.2014 - 19:19
Hi all,
I'm using Augustus for gene prediction in a new species. Both the genome and the hint file are home-made.
When running autoAugTrain.pl, I get the following error message:
ERROR: training.gb is empty. Possible reasons:
a) features in a provided training gene structure gff file were not compliant with the autoAug.pl pipeline (for instructions read at e.g.
http://bioinf.uni-greifswald.de/webaugu ... #structure)
The link seems to be invalid now, so I'm hoping to get help in this forum. This is what my GFF file looks like:
pilon_round_18_contig_2558 exonerate CDS 345798 345888 . - . ID=gene_1
pilon_round_18_contig_2558 exonerate CDS 344999 345193 . - . ID=gene_1
pilon_round_18_contig_3684 exonerate CDS 684414 685064 . - . ID=gene_2
pilon_round_18_contig_3684 exonerate CDS 683996 684190 . - . ID=gene_2
Do you see any obvious problem with the formatting?
Best,
Clement

by katharina on 17.10.2014 - 07:58
You can find a description of the accepted format at http://bioinf.uni-greifswald.de/webaugu ... ile_format
Have a look at the last column of your file.
Katharina

by Clement on 17.10.2014 - 12:26
Thanks for the link.
I think I've adopted exactly the same formatting now (see below), but I still get the same error message.
Any idea what else could be the problem?
Many thanks!
Clement
pilon_round_18_contig_2558 exonerate CDS 345798 345888 1 - . transcript_id "1"
pilon_round_18_contig_2558 exonerate CDS 344999 345193 1 - . transcript_id "1"
684414 685064 1 - . transcript_id "2"
pilon_round_18_contig_3684 exonerate CDS 683996 684190 1 - . transcript_id "2"

by Clement on 17.10.2014 - 12:28
The third line of the file is formatted just like the other three.
Sorry about the bad copy/paste.

by katharina on 17.10.2014 - 13:01
That's weird. I submitted a job with your file format (I corrected line 3, adapted the sequence names to those in one of my FASTA files, and made sure the contigs are long enough):

NT_039169.8 exonerate CDS 345798 345888 1 - . transcript_id "1" 
NT_039169.8 exonerate CDS 344999 345193 1 - . transcript_id "1" 
NT_039169.8 exonerate CDS 684414 685064 1 - . transcript_id "2" 
NT_039169.8 exonerate CDS 683996 684190 1 - . transcript_id "2"
And it works fine. I mean, I cannot train AUGUSTUS with two genes, and obviously I used a wrong sequence template, but the file format works.
Katharina
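
For readers who need to make the same change to the last column, here is a minimal Python sketch that rewrites `ID=gene_N` as `transcript_id "N"`, the form confirmed to work above. The function name `to_transcript_id` is hypothetical, and the sketch assumes the attribute column holds nothing but `ID=gene_N`. It leaves the score column untouched, since the thread does not say whether changing it from `.` to `1` matters.

```python
import re


def to_transcript_id(line: str) -> str:
    """Rewrite a trailing 'ID=gene_N' attribute as 'transcript_id "N"'.

    Assumption: the last column contains only 'ID=gene_N', as in the
    lines posted in this thread. All other columns are left unchanged.
    """
    return re.sub(r'ID=gene_(\w+)\s*$', r'transcript_id "\1"', line.rstrip("\n"))


# Two of the lines from the first post, written with tab separators.
lines = [
    "pilon_round_18_contig_2558\texonerate\tCDS\t345798\t345888\t.\t-\t.\tID=gene_1",
    "pilon_round_18_contig_3684\texonerate\tCDS\t684414\t685064\t.\t-\t.\tID=gene_2",
]

converted = [to_transcript_id(line) for line in lines]
for line in converted:
    print(line)
```

Lines whose last column does not end in `ID=gene_N` pass through unchanged, so the function can be run twice over the same file without further effect.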

by Clement on 17.10.2014 - 13:08
Hey,
I think I just found the issue. It was really silly of me: after I ran the script that simplifies the headers of the genome file, the first column of the hint file no longer matched the FASTA headers of the genome file!
Anyway, it seems to be running now. Sorry about that, and thanks for your help.
Best,
Clement
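
A mismatch like this can be caught before training with a small check. The Python sketch below is one possible approach and is not part of AUGUSTUS. It compares the first column of each GFF line with the FASTA sequence names and, following Katharina's remark about contig length, also flags end coordinates that lie beyond the end of the sequence. The names `fasta_lengths` and `check_gff` and the simplified header `contig_2558` are hypothetical. It assumes that the sequence name is the first word of each `>` header and that the first eight GFF columns contain no spaces.

```python
import io


def fasta_lengths(handle):
    """Map each sequence name (first word of the '>' header) to its length."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_gff(handle, lengths):
    """Return a list of human-readable problems found in the GFF lines."""
    problems = []
    for number, line in enumerate(handle, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 8)  # at most 9 columns; attributes stay together
        if len(fields) < 5 or not fields[4].isdigit():
            problems.append(f"line {number}: cannot read seqname/start/end columns")
            continue
        seqname, end = fields[0], int(fields[4])
        if seqname not in lengths:
            problems.append(f"line {number}: '{seqname}' is not a FASTA header")
        elif end > lengths[seqname]:
            problems.append(
                f"line {number}: end {end} exceeds length {lengths[seqname]} of '{seqname}'"
            )
    return problems


# Toy data: the genome headers were simplified, but the second GFF line
# still uses the old, longer name.
genome = io.StringIO(">contig_2558\nACGTACGTAC\n>contig_3684\nACGT\n")
gff = io.StringIO(
    'contig_2558\texonerate\tCDS\t2\t8\t1\t-\t.\ttranscript_id "1"\n'
    'pilon_round_18_contig_3684\texonerate\tCDS\t1\t3\t1\t-\t.\ttranscript_id "2"\n'
)

lengths = fasta_lengths(genome)
problems = check_gff(gff, lengths)
for problem in problems:
    print(problem)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the genome FASTA file and the GFF file.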