Problem converting trainingSetComplete.gff to trainingSetComplete.gb in autoAug.pl

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

Post by katharina »

Originally posted in the old forum by willj on 19.08.2013 - 09:04
Hi guys,
I am trying to run autoAug.pl and have run into a problem:
2 Running "grep -f pasa.complete.lst ../pasa/trainingSetCandidates.gff3 >trainingSetComplete.temp.gff" ... Finished!
2 Running "cat trainingSetComplete.temp.gff | perl -pe 's/\t\S*(asmbl_\d+).*/\t$1/' | sort -n -k 4 | sort -s -k 9 | sort -s -k 1,1 > trainingSetComplete.gff" ... Finished!
1 Average gene length in the training set is 1.00
2 The length of flanking DNA is set as 1000 accordingly.
3 Found script /home/will/devel/bioinfo/augustus.2.7/config/../scripts/gff2gbSmallDNA.pl.
3 perl /home/will/devel/bioinfo/augustus.2.7/config/../scripts/gff2gbSmallDNA.pl trainingSetComplete.gff /work/will/bin/augustus/autoAug/seq/genome_clean.fa 1000 trainingSetComplete.gb 1>gff2gbSmallDNA.stdout 2>gff2gbSmallDNA.stderr
1 The training set trainingSetComplete.gb contains 0 entries
2 Now trying to find out whether the CDS in the training set contain or exclude the stop codon.
3 cd /home/will/devel/bioinfo/augustus.2.7/config/species/generic
3 cat generic_parameters.cfg | perl -pe 's/(stopCodonExcludedFromCDS )(s+) /true /' > temp_1
3 mv temp_1 generic_parameters.cfg
3 Set value of "stopCodonExcludedFromCDS" in generic_parameters.cfg to "true"
3 cd /work/will/Puccinia_striiformis/bin/augustus/autoAug/trainingSet/training
3 Running "etraining --species=generic trainingSetComplete.gb 1>train.out 2>train.err" ...
failed to execute:
The problem seems to occur when it tries to run gff2gbSmallDNA.pl, because it says "1 The training set trainingSetComplete.gb contains 0 entries".
But the stderr and stdout for this step are empty:
$ cat gff2gbSmallDNA.stderr
$
$ cat gff2gbSmallDNA.stdout
$
So I don't know what is going wrong. Any thoughts?
Thanks very much
Will

by katharina on 21.08.2013 - 15:37
What PASA version are you using? The current autoAug.pl pipeline is not compatible with the latest PASA version. An average gene length of 1 is suspicious, so that's where I'd start looking for the problem.
Katharina
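To follow up on the suspicious average gene length, here is a minimal Python sketch for checking what is actually in trainingSetComplete.gff. It assumes a 9-column, tab-separated GFF in which column 9 identifies the gene (for example the asmbl_N value produced by the perl one-liner in the log). It reports the genomic span per gene and the average. How autoAug.pl itself computes the average is not shown in this thread, so treat this as an independent sanity check, not a reimplementation. The function names and the example rows are hypothetical.

```python
def transcript_spans(gff_lines):
    """Map (sequence name, column 9) to the genomic span in bp of its features."""
    bounds = {}
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a 9-column, tab-separated record
        try:
            start, end = int(cols[3]), int(cols[4])
        except ValueError:
            continue  # start/end are not integers
        key = (cols[0], cols[8].strip())
        lo, hi = bounds.get(key, (start, end))
        bounds[key] = (min(lo, start), max(hi, end))
    return {key: hi - lo + 1 for key, (lo, hi) in bounds.items()}


def average_span(gff_lines):
    """Average span over all groups, or 0.0 if no usable record was found."""
    spans = transcript_spans(gff_lines)
    return sum(spans.values()) / len(spans) if spans else 0.0


# Hypothetical rows, only to show the expected shape of the input.
example = [
    "chr1\tsrc\tCDS\t100\t199\t.\t+\t0\tasmbl_1",
    "chr1\tsrc\tCDS\t300\t399\t.\t+\t0\tasmbl_1",
    "chr1\tsrc\tCDS\t1000\t1099\t.\t+\t0\tasmbl_2",
]
print(transcript_spans(example))
print(average_span(example))
```

To run it on the real file, pass an open file handle, e.g. `average_span(open("trainingSetComplete.gff"))`. If the result is 0.0 or close to 1, the records are probably not in the shape assumed above.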

by willj on 22.08.2013 - 01:22
I am using the Stable Release PASA2-r20130605.
Which version do you guys use?
Is there some way to modify the code of gff2gbSmallDNA.pl to get it to run with the latest version of PASA?
Is autoAug.pl going to be updated any time soon? I can put this analysis on hold for a month or two if it is going to be soon.
Thanks

by Katharina on 22.08.2013 - 11:02
We are still using PASA version Jan-09-2011.
We are about to leave for a one-month vacation, and the teaching period starts immediately after we return, so I am not sure how soon we will be able to update the autoAug.pl pipeline.
This is what the input file for gff2gbSmallDNA.pl should look like (tab-separated GFF format; the example is from Scipio; the content of the second column is irrelevant):

chr2R	Scipio	CDS	900562	900621	1.000	+	0	transcript_id "392"
chr2R	Scipio	CDS	904518	904880	1.000	+	0	transcript_id "392"
chr2R	Scipio	CDS	904940	905131	1.000	+	0	transcript_id "392"
chr2R	Scipio	CDS	905195	905263	1.000	+	0	transcript_id "392"
chr2R	Scipio	CDS	3595076	3596041	1.000	+	0	transcript_id "2517"
Katharina
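For comparing your own file against the example above, a small Python sketch like the following may help. It only checks the general shape shown in the example: nine tab-separated columns, integer start and end with start not greater than end, a valid strand, and a non-empty ninth column. The exact parsing rules of gff2gbSmallDNA.pl are not described in this thread, so passing this check does not guarantee the script will accept the file. The function name is hypothetical.

```python
def check_gff_line(line):
    """Return a list of problems found in one GFF line (empty list = looks fine)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        # Typical cause: columns separated by spaces instead of tabs.
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    problems = []
    try:
        start, end = int(cols[3]), int(cols[4])
    except ValueError:
        problems.append("start/end are not integers")
    else:
        if start > end:
            problems.append("start is greater than end")
    if cols[6] not in ("+", "-", "."):
        problems.append(f"unexpected strand {cols[6]!r}")
    if not cols[8].strip():
        problems.append("column 9 is empty")
    return problems


# First line of the Scipio example, with explicit tab separators.
good = 'chr2R\tScipio\tCDS\t900562\t900621\t1.000\t+\t0\ttranscript_id "392"'
print(check_gff_line(good))
# The same line with spaces instead of tabs is reported as malformed.
print(check_gff_line(good.replace("\t", " ")))
```

Running `check_gff_line` over every line of trainingSetComplete.gff and printing only the non-empty results should show quickly whether the file deviates from the expected layout.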