UTR training error?

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

xvazquezc
Posts: 2
Joined: Tue Oct 03, 2017 11:31 am

UTR training error?

Post by xvazquezc »

Hi,
I have been using WebAUGUSTUS for a while to annotate fungal genomes together with MAKER, and I usually get no UTR predictions, even when using RNA-seq data.
For convenience, I started to recreate the workflow on our cluster (based on the WebAUGUSTUS log file), and after running the 6th and last step I found that:
1. the bothutr.lst file contains nothing but empty lines, exactly as many as there are lines in tr.lst;
2. according to tr.lst, I have >1000 genes with both 5'- and 3'-UTRs, but they are not passed on to the next steps.

Because bothutr.lst is effectively empty, no genes are passed to gff2gbSmallDNA.pl, so the GenBank (GB) training file is not created...

I'm not completely sure whether this is a bug or whether I'm missing something in my reasoning, but if it is a bug, it is somewhere between lines 544-567 of the autoAugTrain.pl script.
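To check a count like the one in point 2 independently of how tr.lst is ordered, one option is to count how often each ID occurs. This is a minimal Perl sketch, not code from autoAugTrain.pl. It assumes (without having checked the real file format) that tr.lst has the gene ID in its first tab-separated column and one line per UTR; the function name `genes_with_both_utrs` and the sample IDs are hypothetical.

```perl
use strict;
use warnings;

# Return the IDs that occur on at least two lines of tr.lst, i.e. genes
# that (assuming one line per UTR) have both a 5'- and a 3'-UTR.
# Unlike a consecutive-line comparison, this does not need sorted input.
sub genes_with_both_utrs {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        chomp(my $copy = $line);
        next if $copy eq '';                 # skip blank lines
        my ($id) = split /\t/, $copy, -1;    # first tab-separated field (assumed ID)
        $seen{$id}++;
    }
    my @both = grep { $seen{$_} >= 2 } keys %seen;
    return sort @both;
}

# Usage with real data would look like:
#   open(my $tr, '<', 'tr.lst') or die "Can not open tr.lst!\n";
#   my @both = genes_with_both_utrs(<$tr>);
my @both = genes_with_both_utrs("g1\t5utr\n", "g2\t3utr\n", "g1\t3utr\n");
print scalar(@both), " gene(s) with both UTRs: @both\n";
```

Comparing this count with the number of non-empty lines in bothutr.lst gives a quick sanity check of the "both UTRs" step.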

Thank you in advance,
Xabi
xvazquezc
Posts: 2
Joined: Tue Oct 03, 2017 11:31 am

Re: UTR training error?

Post by xvazquezc »

Update: I replaced the lines of code in autoAugTrain.pl that search for the genes with both UTRs and write them to bothutr.lst, and it seems to do the job. The original code:

Code:

        open(TR, "tr.lst") or die ("Can not open tr.lst!\n");
        open(BOTH, "> bothutr.lst");
        my $g;
        while(<TR>){
            split;
            print BOTH "$_[0]\n" if ($g eq $_[0]);
            $g=$_[0];
        }
        close(TR);
        close(BOTH);
was replaced with this:

Code:

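        # Genes with both UTRs appear on two consecutive lines of tr.lst
        # (one line per UTR, assuming the file is sorted by ID).
        open(my $tr, '<', "tr.lst") or die("Can not open tr.lst!\n");
        open(my $both, '>', "bothutr.lst") or die("Can not open bothutr.lst!\n");
        my $prev;
        while (my $line = <$tr>) {
            chomp $line;
            # First tab-separated field is the gene/transcript ID
            my ($fld1) = split(/\t/, $line, -1);
            next unless defined $fld1;    # skip blank lines
            if (defined $prev && $fld1 eq $prev) {
                print $both "$prev\n";
            }
            $prev = $fld1;
        }
        close($tr);
        close($both);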
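The symptom from the first post (bothutr.lst containing only empty lines, one per line of tr.lst) is consistent with how recent Perl treats the bare `split;` in the old snippet: since Perl 5.12, `split` in void context no longer writes its fields into `@_`, so `$_[0]` stays undef and `undef eq undef` is always true. A small self-contained Perl sketch comparing both snippets on made-up tr.lst lines (the line format is an assumption, not taken from the real file):

```perl
use strict;
use warnings;

# Made-up tr.lst lines; assumed format: ID in the first
# tab-separated column, one line per UTR.
my @tr_lines = ("g1\t5utr\n", "g1\t3utr\n", "g2\t3utr\n");

# Old snippet: bare 'split;' in void context. On Perl >= 5.12 it no
# longer fills @_, so $_[0] stays undef and the comparison is always true.
my @old_out;
{
    no warnings;    # silence "Useless use of split" and undef warnings
    my $g;
    for (@tr_lines) {
        split;
        push @old_out, "$_[0]\n" if ($g eq $_[0]);
        $g = $_[0];
    }
}

# Fixed snippet: take the first tab-separated field explicitly.
my @new_out;
my $prev;
for my $line (@tr_lines) {
    chomp(my $copy = $line);
    my ($fld1) = split(/\t/, $copy, -1);
    push @new_out, "$prev\n" if defined $prev && $fld1 eq $prev;
    $prev = $fld1;
}

my $empty = grep { $_ eq "\n" } @old_out;
print "old snippet: ", scalar(@old_out), " line(s), $empty of them empty\n";
print "new snippet: @new_out";
```

The old logic writes one empty line per input line, while the fixed logic writes only the ID that occurs twice in a row.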
However, once autoAugTrain.pl runs optimize_augustus.pl, I get the following error, which does not seem to affect the final output (I included a few lines before the error):

Code:

3 Found script /srv/scratch/z3382651/anaconda2/envs/myconda3/bin/optimize_augustus.pl.
1 Now optimizing meta parameters of AUGUSTUS for the UTR model. This will likely run for a long time..
1 Running "perl /srv/scratch/z3382651/anaconda2/envs/myconda3/bin/optimize_augustus.pl --cpus=4 --rounds=1 --species=pugra_adv_mpi_v1 --trainOnlyUtr=1 --onlytrain=onlytrain.gb  --metapars=/srv/scratch/z3382651/anaconda2/envs/myconda3/config/species/pugra_adv_mpi_v1/pugra_adv_mpi_v1_metapars.utr.cfg train.gb --UTR=on > optimize.utr.out"...Sampling error in intron model. state=14 base=11070

augustus: ERROR
	Tried to sample from empty list.

PS: the line numbers I gave earlier might be shifted a little, because I had previously modified autoAugTrain.pl so that it could pass a custom number of CPUs to optimize_augustus.pl.
katharina
Site Admin
Posts: 530
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Re: UTR training error?

Post by katharina »

Thanks a lot for spotting and fixing this issue, and for sharing the fix! I have modified our codebase accordingly, and WebAUGUSTUS is updated. The next source code release will also contain the fix.

The sampling error is unrelated. We are aware of it, but it has not been fixed yet.

Katharina
ThomasYang
Posts: 1
Joined: Thu Oct 26, 2017 4:08 am

Re: UTR training error?

Post by ThomasYang »

Hi all,
I get the same error here.
My bothutr.lst file also consists of nothing but empty lines (96 of them),
and my tr.lst file contains 96 lines of UTR information.

Here is part of the log of my autoAug.pl run:

================================================
1 ####### Finished step 5. All files are stored in /home/tyang/braker1_babbler/autoAug_2/autoAug/autoAugPred_hints #######

1 ####### Step 6: Training AUGUSTUS with UTR #######

2 perl /opt/augustus-3.3/scripts/autoAugTrain.pl -g=/home/tyang/braker1_babbler/autoAug_2/autoAug/seq/genome_clean.fa -s=babbler_autoAug --utr -e=/home/tyang/braker1_babbler/autoAug_2/autoAug/cdna/cdna.f.psl --aug=/home/tyang/braker1_babbler/autoAug_2/autoAug/autoAugPred_hints/predictions/augustus.gff -w=/home/tyang/braker1_babbler/autoAug_2/autoAug -v -v -v --opt=1 --useexisting


1 ####### Now constructing a training set for Untranslated Regions (UTRs).... ########
3 File check OK.
3 Now extracting all stop and start codons from genemodels.gtf to stops.and.starts.gff
3 File stops.and.starts.gff has been created.
3 Found script /opt/augustus-3.3/config/../scripts/makeUtrTrainingSet.pl.
2 Running command: perl /opt/augustus-3.3/config/../scripts/makeUtrTrainingSet.pl stops.and.starts.gff /home/tyang/braker1_babbler/autoAug_2/autoAug/seq/genome_clean.fa /home/tyang/braker1_babbler/autoAug_2/autoAug/cdna/cdna.f.psl utr ...241 hints were filtered because of gene overlap.
1191 hints would be compatible if the hints with gene-overlap wouldn't be filtered.
Finished!
3 Running "cat utr.gff | perl /opt/augustus-3.3/config/../scripts/utrgff2gbrowse.pl"... Finished! Made file utr.train.gbrowse
3 Moved utr.train.gbrowse to /home/tyang/braker1_babbler/autoAug_2/autoAug/autoAugTrain/gbrowse
3 The subset of genes, where we have both UTRs is bothutr.lst.
3 Found script /opt/augustus-3.3/config/../scripts/gff2gbSmallDNA.pl.
3 Running "perl /opt/augustus-3.3/config/../scripts/gff2gbSmallDNA.pl genes.gtf ../../../seq/genome.fa 4000 bothutr.test.gb --good=bothutr.lst 1>gff2gbSmallDNA.stdout 2>gff2gbSmallDNA.stderr" ...ERROR: Number of UTR training examples is smaller than 50. Abort UTR training. If this is the only error message, the AUGUSTUS parameters for your species were optimized ok, but you are lacking UTR parameters. Do not attempt to predict genes with UTRs for this species using the current parameter set!
Finished!
failed to execute:
================================================
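A minimal Perl sketch for telling the two situations apart before rerunning anything: an all-blank bothutr.lst points to the bug described earlier in this thread, while a file with real IDs but fewer than 50 of them will presumably still hit the "smaller than 50" abort. The threshold is copied from the error message above; the function name `check_bothutr` is hypothetical, and the sketch assumes each ID yields at most one UTR training example.

```perl
use strict;
use warnings;

# Minimum copied from the autoAugTrain.pl error message
# ("Number of UTR training examples is smaller than 50").
my $MIN_UTR_EXAMPLES = 50;

# Classify the lines of bothutr.lst:
#   'empty-ids' : lines exist but none contains an ID (the split bug)
#   'too-few'   : fewer IDs than the minimum (assumed 1 ID = at most 1 example)
#   'ok'        : enough IDs to possibly pass the check
sub check_bothutr {
    my @lines = @_;
    my @ids = grep { /\S/ } @lines;
    return 'empty-ids' if @lines && !@ids;
    return @ids < $MIN_UTR_EXAMPLES ? 'too-few' : 'ok';
}

# Usage with real data would look like:
#   open(my $fh, '<', 'bothutr.lst') or die "Can not open bothutr.lst!\n";
#   my $status = check_bothutr(<$fh>);
# Here, 96 blank lines mirror the bothutr.lst described above.
my $status = check_bothutr(("\n") x 96);
print "$status\n";
```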

Is there a solution for this issue?

Do I need to rerun autoAug.pl with the modification that xvazquezc proposed?

Thanks!!!

Thomas
katharina
Site Admin
Posts: 530
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

Re: UTR training error?

Post by katharina »

Yes. We haven't made a new AUGUSTUS release yet; so far, the fix is only active on WebAUGUSTUS.

If you send me an e-mail, I will send you the modified code. I am not sure how long it will take until we make a new AUGUSTUS release.

webaugustus@uni-greifswald.de