
AutoAugustus starting PASA at pipeline 21

Posted: Fri Nov 20, 2015 1:02 pm
by katharina
Originally posted in the old forum by Vickie on 31.01.2013 - 15:45

Hello!
I have a problem using autoAug.pl with PASA. Because of a bug in PASA, I need to skip step 20 and start PASA at step 21. To do that I changed line 426 of autoAug.pl to ($perlCmdString="perl $PASAHOME/scripts/Launch_PASA_pipeline.pl -s 21 -c alignAssembly.config -C -R -g genome.fasta "), where -s 21 means starting the pipeline at index 21. But this doesn't work, because autoAug.pl drops the existing database in MySQL, and that causes an error. How should I change the script so that PASA starts at the needed pipeline index without the existing MySQL database being dropped?
Best,
Vickie

Re: AutoAugustus starting PASA at pipeline 21

Posted: Fri Nov 20, 2015 1:02 pm
by katharina
by mario on 05.02.2013 - 14:11
I can't tell you. In this case I suggest you run PASA by itself, locally, and then use the resulting set of full-length ORF gene structures as a GFF training set in a run of autoAugTrain.

Re: AutoAugustus starting PASA at pipeline 21

Posted: Fri Nov 20, 2015 1:02 pm
by katharina
by Vickie on 05.02.2013 - 15:06
I already found a solution: I simply commented out lines 422-424, where autoAug.pl drops the existing database. As far as I can see, it works: the DB exists and there are no errors from PASA. I started PASA from pipeline index 14 because I am using different servers; it has run for about 3 h and is now at index 15. I expect it to break at pipeline index 20 because of the PASA bug, and then I will restart it from index 21. I'll write here if any error appears or if I successfully get my results.
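
The workaround described above (commenting out a line range in a copy of the script) could be sketched in shell roughly as follows. This is only an illustration: the range 422-424 is the one reported in this post and will likely differ between autoAug.pl versions, and the helper name comment_lines and the output file name autoAug.nodrop.pl are made up here. It is worth printing the lines first to confirm they really are the database-drop statements.

```shell
# Hypothetical helper: print FILE with lines START..END commented out.
# Usage: comment_lines FILE START END > COPY
comment_lines() {
    file=$1; start=$2; end=$3
    # Prefix every line in the range with '#'; all other lines pass through unchanged.
    sed "${start},${end}s/^/#/" "$file"
}

# Example (line numbers are version-specific, so inspect them first):
#   sed -n '422,424p' autoAug.pl
#   comment_lines autoAug.pl 422 424 > autoAug.nodrop.pl
```

Writing to a separate copy keeps the original script untouched, so the change is easy to undo.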

Re: AutoAugustus starting PASA at pipeline 21

Posted: Fri Nov 20, 2015 1:02 pm
by katharina
by Vickie on 07.02.2013 - 11:00
Next error
I ran into another error:
grep complete ../pasa/trainingSetCandidates.fasta | perl -pe 's/>(\S+).*/$1\$/' 1> pasa.complete.lst
grep: ../pasa/trainingSetCandidates.fasta: No such file or directory
PASA has not constructed any complete training gene. Training aborted because of insufficient data.
And indeed there is no trainingSetCandidates.fasta file. When I looked into the pasa_asmbls_to_training_set.dbi script, I saw that it does not generate any .fasta file at all; instead it generates .cds, .pep, .gff3, .gff3.inx, .top_500_longest, .cds.scores and .cds.scores.selected files. The .cds, .top_500_longest and .pep files are in FASTA format: the .pep file contains peptide sequences, the other two contain DNA sequences. So the question is: which one should I rename to .fasta, the .cds or the .top_500_longest file?
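
Before renaming anything, it may help to check how many entries in a candidate file the failing step would actually pick up. The sketch below reproduces the grep | perl step from the error message using sed, with an explicit check that the input file exists. The helper name list_complete_ids is made up for illustration, and unlike the original command it only looks at header lines (those starting with ">"). It does not answer which file is the right one; it only shows what each candidate would yield.

```shell
# Hypothetical helper: print the IDs of FASTA records whose header contains
# "complete", each followed by a literal "$", like the grep | perl step above.
list_complete_ids() {
    fasta=$1
    # Fail early with a clear message instead of letting grep report a missing file.
    if [ ! -s "$fasta" ]; then
        echo "missing or empty: $fasta" >&2
        return 1
    fi
    # Keep header lines mentioning "complete", then reduce ">ID description" to "ID$".
    grep '^>.*complete' "$fasta" | sed -E 's/^>([^[:space:]]+).*/\1$/'
}
```

Calling it on each candidate file and piping the result to wc -l gives the number of complete genes that would end up in pasa.complete.lst.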