runtime of optimize_augustus.pl

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Post by katharina

Originally posted in the old forum by NN on 10.09.2015 - 13:23
Is there a way to estimate how much longer optimize_augustus.pl will keep running?
What is the major determinant of its duration?

Re: runtime of optimize_augustus.pl

Post by katharina

by mario on 10.09.2015 - 13:24
When users report that it takes too many days, the usual cause is that more genes than necessary are used for the evaluation. By default, it uses all genes in the input GenBank file to evaluate intermediate parameter combinations. Two hundred genes or so are usually enough. I randomly choose such a subset and put it in one file, say validate.gb, and still use the rest, say rest.gb, for training, which is quick.
To monitor progress, I start it so that it writes the output to a file, e.g.
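A minimal sketch of the random split described above, in case it helps. It assumes that every record in the GenBank file ends with a line containing only "//". The function names, the seed, and the file name genes.gb are hypothetical and only serve the illustration.

```python
import random


def split_genbank(text, n_validate, seed=0):
    """Split GenBank-format text into (validate, rest) lists of records."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":  # "//" terminates a GenBank record
            records.append("".join(current))
            current = []
    if "".join(current).strip():  # keep a trailing record without "//"
        records.append("".join(current))

    # Pick n_validate records at random; everything else goes to training.
    rng = random.Random(seed)
    n = min(n_validate, len(records))
    chosen = set(rng.sample(range(len(records)), n))
    validate = [r for i, r in enumerate(records) if i in chosen]
    rest = [r for i, r in enumerate(records) if i not in chosen]
    return validate, rest


def split_file(in_path, validate_path, rest_path, n_validate=200):
    """Read in_path and write the two subsets to separate files."""
    with open(in_path) as handle:
        validate, rest = split_genbank(handle.read(), n_validate)
    with open(validate_path, "w") as handle:
        handle.writelines(validate)
    with open(rest_path, "w") as handle:
        handle.writelines(rest)
```

For example, split_file("genes.gb", "validate.gb", "rest.gb") would write about 200 randomly chosen genes to validate.gb and the remainder to rest.gb.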

optimize_augustus.pl --species=rice validate.gb --onlytrain=rest.gb --cpus=4 > optimize.out
While it is running, I can check the progress with
grep "improving" optimize.out
which prints lines such as:

improving parameter /Constant/dss_end curently set to 4 
improving parameter /Constant/dss_start curently set to 3 
improving parameter /Constant/ass_start curently set to 2 
improving parameter /Constant/ass_end curently set to 2
It cycles by default at most 5 times through the list of parameters in the config file. It may finish earlier when there are no more changes in a round.
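The grep check can be turned into a rough progress estimate. The sketch below is not part of AUGUSTUS. It assumes that one "improving parameter" line is written per parameter per round, which is a guess based on the output shown above, and that you know the number of parameters in your config file (n_params, a hypothetical argument). Because the run may stop early, the fraction it reports is only a worst-case lower bound.

```python
import re

# Matches lines like:
#   improving parameter /Constant/dss_end curently set to 4
# The pattern accepts both "curently" (as in the log above) and "currently".
PATTERN = re.compile(r"improving parameter (\S+) curr?ently set to (\S+)")


def summarize_progress(log_text, n_params=None, max_rounds=5):
    """Summarize the 'improving parameter' lines found in optimize.out."""
    hits = PATTERN.findall(log_text)
    summary = {"lines": len(hits), "last": hits[-1] if hits else None}
    if n_params and hits:
        # Assumption: one line per parameter per round.
        summary["round"] = (len(hits) - 1) // n_params + 1
        # Parameters finished so far, relative to the worst case of
        # max_rounds full rounds; the run may end earlier than that.
        summary["worst_case_fraction"] = (len(hits) - 1) / (n_params * max_rounds)
    return summary
```

Calling summarize_progress(open("optimize.out").read(), n_params=...) with your own parameter count gives the current round and a lower bound on the fraction completed.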
The running time is linear in the cumulative sequence length of the file validate.gb in the example above and scales inversely with the number of threads, here specified as 4.
The additional time required by a large rest.gb file is marginal, as these genes are not used for the time-consuming predictions but only for straightforward parameter estimation.
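Since the cumulative sequence length of validate.gb is the main determinant, it can be useful to measure it before starting. The sketch below assumes that each record starts with a LOCUS line of the form "LOCUS name length bp ...", as is usual in GenBank files. The function name is hypothetical.

```python
import re

# Captures the sequence length from lines such as:
#   LOCUS       HS04636   9453 bp  DNA
LOCUS_LENGTH = re.compile(r"^LOCUS\s+\S+\s+(\d+)\s+bp", re.MULTILINE)


def cumulative_length(genbank_text):
    """Return (number of records, total sequence length in bp)."""
    lengths = [int(n) for n in LOCUS_LENGTH.findall(genbank_text)]
    return len(lengths), sum(lengths)
```

Comparing the total for validate.gb with that of the full gene set gives a rough idea of the speed-up to expect from evaluating on the subset only.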
Some metaparameters, such as /IntronModel/d, also influence the runtime of augustus, but I would not worry about that.