Error with running autoAug.pl

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

Error with running autoAug.pl

Post by katharina »

Originally posted in the old forum by Yanpeng, Li on 15.11.2012 - 09:42
Hi all,
I am using AUGUSTUS for the first time and ran into some problems:
muzi@muzi-laptop[examples] perl autoAug.pl --genome=genome.fa --species=HS04 --cdna=cdna.fa -v -v --pasa
2 checking for installed programs ... ok.
3 mkdir /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/trainingSet
3 mkdir seq
3 mkdir hints
3 mkdir cdna
3 mkdir gbrowse
3 mkdir pasa
3 mkdir training
2 All necessary directories have been created unter /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/trainingSet.
3 cd /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/seq
3 ln -s /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/genome.fa genome.fa
3 Cleaning genome file from DOS whitespaces/linebreaks...
3 perl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/config//../scripts/cleanDOSfasta.pl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/genome.fa > /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/seq/genome_clean.fa
1 ####### Step 0: Creating training set with genes using PASA #######
3 cd /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/trainingSet/pasa
3 ln -fs /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/cdna.fa transcripts.fasta
2 Running perl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/seqclean/seqclean/seqclean transcripts.fasta 1>seqclean.stdout 2>seqclean.stderr ... Finished!
3 cp /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/pasa_conf/pasa.alignAssembly.Template.txt alignAssembly.config
3 Setting appropriate values in alignAssembly.config
3 rm alignAssembly.config; mv temp alignAssembly.config; chmod a+x alignAssembly.config
3 Adjusted alignAssembly.config
3 ln -fs /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/seq/genome_clean.fa genome.fasta
3 Reading MySQL variables from /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/pasa_conf/
2 Executing the Alignment Assembly: perl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/scripts/Launch_PASA_pipeline.pl -c alignAssembly.config -C -R -g /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/seq/genome_clean.fa -t transcripts.fasta.clean -T -u transcripts.fasta 1>Launch_PASA_pipeline.stdout 2>Launch_PASA_pipeline.stderr ...2 A test output...
Failed to execute, possible reasons could be:
1. There is already a database named "PASAHS04" in your mysql host.
2. The software "slclust" is not installed correctly, try to install it again (see the details in the PASA documentation).
Inspect /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/examples/autoAug/trainingSet/pasa/Launch_PASA_pipeline.stderr for PASA error messages.
I deleted the database named "PASAHS04" on my MySQL host and reinstalled slclust, but the problem still occurs.
Can anyone help me with this?
Thanks,
Yanpeng,Li

by katharina on 16.11.2012 - 15:34
Hi,
there is a third reason that can trigger this error message: non-unique FASTA headers in either the genome or the EST file. This can also happen when headers that are actually unique are too long or contain special characters and spaces, so that they are truncated at some point, and the resulting truncated headers are no longer unique.
Katharina
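
A quick way to check for the header problem described above is sketched below. It is only a minimal sketch: it assumes that a header is cut at the first whitespace, and `max_len` is a hypothetical cut-off added for illustration (the thread does not say where PASA or seqclean actually truncate headers). The function names are made up for this example.

```python
from collections import Counter


def header_id(line, max_len=None):
    """Return the part of a FASTA header line up to the first whitespace.

    max_len is a hypothetical cut-off that mimics a tool truncating long IDs.
    """
    fields = line[1:].split()
    ident = fields[0] if fields else ""
    return ident if max_len is None else ident[:max_len]


def duplicate_ids(lines, max_len=None):
    """List IDs that occur more than once among the header lines."""
    counts = Counter(header_id(l, max_len) for l in lines if l.startswith(">"))
    return sorted(i for i, n in counts.items() if n > 1)


# Made-up example records, not taken from the files discussed in this thread.
example = [
    ">scaffold_01 first copy",
    "ACGT",
    ">scaffold_02",
    "GGCC",
    ">scaffold_01 second copy",
    "TTAA",
]

print(duplicate_ids(example))             # ['scaffold_01']
print(duplicate_ids(example, max_len=8))  # ['scaffold']
```

For a real file, an open file handle can be passed instead of the list, for example `duplicate_ids(open("genome.fa"))`. An empty result means that no duplicate IDs were found under these assumptions.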

by Yanpeng Li on 19.11.2012 - 11:58
Hi Katharina,
The FASTA headers of the genome file are >HS04636 and >HS08198. I don't think the headers are the problem, because the genome file comes from the examples directory of AUGUSTUS; you can find it in the augustus.2.6.1 directory.
When I run autoAug.pl with the genome file PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta (the FASTA headers are scaffold_01, 02 ... 12) and the cDNA file Pleos_EstClusters_20081001000000_cluster_consensus.fasta (the FASTA headers are >2_1_CCFC_CCFF_CCFG_CCFH_EXTA, >28_1_CCFC_CCFF_CCFG_CCFH_EXTA ...), the same problem occurs:
perl autoAug.pl --genome=PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta --species=Pleurotus ostreatus --cdna=Pleos_EstClusters_20081001000000_cluster_consensus.fasta -v -v --pasa
2 checking for installed programs ... ok.
3 mkdir /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/trainingSet
3 mkdir seq
3 mkdir hints
3 mkdir cdna
3 mkdir gbrowse
3 mkdir pasa
3 mkdir training
2 All necessary directories have been created unter /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/trainingSet.
3 cd /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/seq
3 ln -s /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta genome.fa
3 Cleaning genome file from DOS whitespaces/linebreaks...
3 perl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/config//../scripts/cleanDOSfasta.pl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta > /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/seq/genome_clean.fa
1 ####### Step 0: Creating training set with genes using PASA #######
3 cd /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/trainingSet/pasa
3 ln -fs /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/Pleos_EstClusters_20081001000000_cluster_consensus.fasta transcripts.fasta
2 Running perl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/seqclean/seqclean/seqclean transcripts.fasta 1>seqclean.stdout 2>seqclean.stderr ... Finished!
3 cp /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/pasa_conf/pasa.alignAssembly.Template.txt alignAssembly.config
3 Setting appropriate values in alignAssembly.config
3 rm alignAssembly.config; mv temp alignAssembly.config; chmod a+x alignAssembly.config
3 Adjusted alignAssembly.config
3 ln -fs /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/seq/genome_clean.fa genome.fasta
3 Reading MySQL variables from /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/pasa_conf/
2 Executing the Alignment Assembly: perl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/scripts/Launch_PASA_pipeline.pl -c alignAssembly.config -C -R -g /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/seq/genome_clean.fa -t transcripts.fasta.clean -T -u transcripts.fasta 1>Launch_PASA_pipeline.stdout 2>Launch_PASA_pipeline.stderr ...2 A test output...
Failed to execute, possible reasons could be:
1. There is already a database named "PASAPleurotus" in your mysql host.
2. The software "slclust" is not installed correctly, try to install it again (see the details in the PASA documentation).
Inspect /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/trainingSet/pasa/Launch_PASA_pipeline.stderr for PASA error messages.
Are there any other possible reasons?
Thanks,
best
Yanpeng,li

by katharina on 19.11.2012 - 15:32
It is hard to say what other reasons there might be. Please submit your files to our training web service at http://bioinf.uni-greifswald.de/webaugustus. If the job runs smoothly there, something is wrong with your local installation. If an error occurs, I'll try to find out what the issue is with this particular dataset.

by Yanpeng Li on 20.11.2012 - 04:54
Hi Katharina,
I submitted the genome file PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta and the cDNA file Pleos_EstClusters_20081001000000_cluster_consensus.fasta to the web service. The message and the results are:
1. Hello!
An error occured while running the AUGUSTUS training job trainaxKmRR93.
Please check the log-files carefully before proceeding to work with the produced results.
You find the results of your job at http://bioinf.uni-greifswald.de/webaugu ... a9c5e400e1.
The administrator of the AUGUSTUS web server has been informed and will get back to you as soon as the problem is solved.
Best regards,
the AUGUSTUS web server team
2. The content of AutoAug.err is:
Number of UTR training examples is smaller than 50. Abort UTR training.
The file with UTR parameters for trainaxKmRR93does not seem to exist. This likely means that the UTR model has not beeen trained yet for trainaxKmRR93.
/usr/local/augustus/trunks/config/../src/augustus: ERROR
UtrModel::readProbabilities: Couldn't open file /usr/local/augustus/trunks/config/species/trainaxKmRR93/trainaxKmRR93_utr_probs.pbl
[The three lines above are repeated 10 more times in AutoAug.err, 11 times in total.]
Job aug1 not properly finished.
Job aug2 not properly finished.
Job aug3 not properly finished.
Job aug4 not properly finished.
Job aug5 not properly finished.
Job aug6 not properly finished.
Job aug7 not properly finished.
Job aug8 not properly finished.
Job aug9 not properly finished.
Job aug10 not properly finished.
Job aug11 not properly finished.
11 augustus job(s) not properly finished. at /usr/local/augustus/trunks/scripts/autoAugPred.pl line 365.
failed to execute
3. The results include some files:
Log-file: AutoAug.log
Error-file: AutoAug.err
Species parameter archive: parameters.tar.gz
Training genes: training.gb.gz
Ab initio predictions: ab_initio.tar.gz
predictions with hints: hints_pred.tar.gz
Do you think AUGUSTUS ran smoothly and completely? Are these results correct?
Thanks,
best
Yanpeng,Li

by katharina on 20.11.2012 - 15:51
This result tells you that AUGUSTUS has been trained, but that UTR parameters could not be trained because the data yielded too few UTR training examples (fewer than 50, according to AutoAug.err). This happens quite often. Technically, the parameter set should still work for predicting genes without UTRs with AUGUSTUS.
On a set of 69 genes that were assembled from your submitted data but that were not used for optimizing the parameters, the accuracy of the obtained parameter set is as follows:
nucleotide level sensitivity: 0.796
nucleotide level specificity: 0.432
exon level sensitivity: 0.643
exon level specificity: 0.338
gene level sensitivity: 0.362
gene level specificity: 0.219
Katharina
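
For readers unfamiliar with these measures, the sketch below shows how such values are computed, assuming the usual gene-finding convention: sensitivity is the fraction of annotated features that were predicted correctly, and specificity is the fraction of predicted features that are correct. The counts in the example are hypothetical; they were chosen only because they reproduce the gene level values above for a test set of 69 genes, and they are not reported anywhere in this thread.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of annotated features that were predicted correctly."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def specificity(true_pos, false_pos):
    """Fraction of predicted features that are correct (gene-finding usage)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Hypothetical counts: 69 annotated genes, 25 of them predicted exactly,
# and 114 predicted genes in total (so 89 predictions are wrong).
annotated, predicted, correct = 69, 114, 25

print(round(sensitivity(correct, annotated - correct), 3))  # 0.362
print(round(specificity(correct, predicted - correct), 3))  # 0.219
```

Under this convention, every predicted gene that does not match an annotated gene lowers the specificity, even if the sensitivity stays the same.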

by Yanpeng Li on 26.11.2012 - 18:10
Hi Katharina,
When I run perl autoAug.pl, the problems that I described in my first post occur. I think they are caused by this step:
Executing the Alignment Assembly: perl /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/scripts/Launch_PASA_pipeline.pl -c alignAssembly.config -C -R -g /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts/autoAug/seq/genome_clean.fa -t transcripts.fasta.clean -T -u transcripts.fasta 1>Launch_PASA_pipeline.stdout 2>Launch_PASA_pipeline.stderr ...2 A test output...
Launch_PASA_pipeline.pl does not run. At the point where Launch_PASA_pipeline.pl is started, the file genome_clean.fa in /autoAug/seq has been created. I think /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus-training/PASA/PASA_r2012-06-25/scripts/Launch_PASA_pipeline.pl does not run correctly because of genome_clean.fa. In the corresponding step of the AUGUSTUS web training, Launch_PASA_pipeline.pl ran fine.
First, I ran PASA like this:
perl Launch_PASA_pipeline.pl -c alignAssembly.config -C -R -g /home/muzi/biology/bioinformatics/soft/gene-prediction/augustus/augustus.2.6.1/scripts1/autoAug/seq/genome_clean.fa -t transcripts.fasta.clean -T -u transcripts.fasta

Launch_PASA_pipeline.pl ran smoothly. (Because PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta and Pleos_EstClusters_20081001000000_cluster_consensus.fasta are already clean, I renamed them to genome_clean.fa and transcripts.fasta.clean.)
Second, I extracted the ORFs from the PASA assemblies using this command:
perl pasa_asmbls_to_training_set.dbi -M "sample_mydb_pasa23:localhost" -p "root:#muzi1979" -g PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta -C
This step creates the file trainingSetCandidates.gff3.
Third, I ran autoAug.pl to train AUGUSTUS:
perl autoAug.pl --genome=PleosPC15_2_Assembly_scaffolds_repeatmasked.fasta -t trainingSetCandidates.gff3 --species=Pleurotus ostreatus --cdna=Pleos_EstClusters_20081001000000_cluster_consensus.fasta -v --singleCPU --useexisting
AUGUSTUS ran smoothly. 10635 genes were predicted in the file augustus.aa in the directory autoAug/autoAug-Pleurotus/predictions, whereas in the AUGUSTUS web training 10846 genes were predicted in the file augustus.aa in the directory ab_initio.
My questions are:
1. Do you think these gene prediction steps are correct?
2. How do I evaluate this result? Is this gene prediction correct?
3. You gave me the following accuracy values:
nucleotide level sensitivity: 0.796
nucleotide level specificity: 0.432
exon level sensitivity: 0.643
exon level specificity: 0.338
gene level sensitivity: 0.362
gene level specificity: 0.219
In the directory autoAug/autoAug-Pleurotus/autoAugTrain/training/tmp_opt_Pleurotus there is a file named predictions.txt. The accuracy reported there is:
nucleotide level sensitivity: 0.778
nucleotide level specificity: 0.142
exon level sensitivity: 0.615
exon level specificity: 0.116
gene level sensitivity: 0.243
gene level specificity: 0.075
How does this evaluation compare to yours?
4. How can I change the nucleotide level, exon level and gene level sensitivity and specificity values that you gave me?
Thanks,
best
Yanpeng, Li

Post by katharina »

Hi Li,
As I said before, it's hard to tell why autoAug.pl does not run smoothly on your system. Our web service executes the latest release of autoAug.pl, which should be exactly the version you are using, so it is either a system-dependent problem or a configuration problem. I still suspect an installation/configuration problem, particularly since you are apparently able to run the individual steps manually.
What you're doing looks correct to me, yes.
You can evaluate gene prediction accuracy of augustus very easily on a genbank file with test genes. (Simply take part of the "training genes", leave them out from training and evaluate on them, later.) The call is
augustus --species=yourSpecies test.gb
At the bottom of the output, you'll find accuracy results.
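
The hold-out idea described above can be sketched as follows. This is only an illustration, assuming a GenBank flat file in which every record ends with a line containing just "//"; the function names and the tiny example records are hypothetical.

```python
import random


def split_genbank_records(text):
    """Split GenBank flat-file text into records terminated by a '//' line.

    Any trailing text without a closing '//' line is ignored.
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    return records


def hold_out(records, n_test, seed=0):
    """Randomly set aside n_test records; return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Three made-up, heavily abbreviated records for demonstration only.
text = "LOCUS gene_a\n//\nLOCUS gene_b\n//\nLOCUS gene_c\n//\n"
records = split_genbank_records(text)
train, test = hold_out(records, n_test=1)

print(len(train), len(test))  # 2 1
```

The two lists can then be written to separate files: one for training and one (test.gb in the call above) for evaluation.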
Looking at the new accuracy values that you posted, it seems that, for whatever reason, the previous parameter set was better: the specificity values were higher on all levels, with no big change in sensitivity. However, you probably evaluated on a different set of genes, so it's hard to tell.
In your results, the specificity is very low. This is usually the case when neighboring genes were included in the flanking region. The easiest thing to try is setting the flanking region in the training gene structure genbank file shorter. This also answers question 4.
Best,
Katharina

by Yanpeng Li on 02.12.2012 - 16:24
Hi Katharina,
I don't understand this sentence "The easiest thing to try is setting the flanking region in the training gene structure genbank file shorter".
Do you mean that I should edit some parameters in the file named Pleurotus_parameters.cfg, or in some other file?
Please tell me the detailed steps!
Thanks,
best
Yanpeng, Li

by katharina on 03.12.2012 - 10:10
After generating a gff file with training gene structures, the following command is used to generate a training genbank file for augustus:
gff2gbSmallDNA.pl training-genes.gff genome.fa 1000 training-genes.gb
Here, "1000" is the length of the flanking region in the genbank file. That's the parameter you need to shorten. (I don't know what value you previously ran it with; it might not have been exactly 1000.)
Best,
Katharina
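
To illustrate why a shorter flanking region can help, the sketch below computes the sequence window around a training gene for a given flank length and lists the neighboring genes that fall into it. It illustrates the concept only and is not the logic of gff2gbSmallDNA.pl; the function names and coordinates are hypothetical.

```python
def flanked_window(gene_start, gene_end, flank, seq_len):
    """Return the 1-based window covering the gene plus flank on both sides."""
    return max(1, gene_start - flank), min(seq_len, gene_end + flank)


def neighbors_in_window(gene, others, flank, seq_len):
    """Return the genes from `others` that overlap the flanked window of `gene`."""
    lo, hi = flanked_window(gene[0], gene[1], flank, seq_len)
    return [g for g in others if g[0] <= hi and g[1] >= lo]


# Made-up coordinates: one training gene and one neighbor on a 10 kb sequence.
gene = (5000, 6000)
neighbor = (6500, 7200)

print(neighbors_in_window(gene, [neighbor], flank=1000, seq_len=10000))  # [(6500, 7200)]
print(neighbors_in_window(gene, [neighbor], flank=300, seq_len=10000))   # []
```

With the longer flank, part of the neighboring gene lies inside the window. If that neighbor is not annotated in the training record, a prediction there counts as a false positive, which lowers the specificity.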

by Yanpeng, Li on 04.12.2012 - 04:13
Thank you very much!
Best,
Yanpeng, Li