
running optimize_AUGUSTUS.pl with multiple cpus on SGE

Posted: Fri Nov 20, 2015 12:25 pm
by katharina
Originally posted in the old forum by Anand Rao on 15.03.2013 - 18:24

Hi Mario, Katharina et al,

I am trying to run optimize_augustus.pl with 64 CPUs on our Sun Grid Engine (SGE) and so far have been unable to use more than 1 CPU, even after installing the Perl module Parallel::ForkManager on our SGE.

My shell script for qsub looks like so:

Code:

#!/bin/bash
### Change to the current working directory:
#$ -cwd
### Job name:
#$ -N optmzAUG
#$ -S /bin/bash

module load gcc augustus

perl /share/apps/augustus-2.6/scripts/optimize_augustus.pl --species=Medicago_truncatula --cpus=16 --rounds=3 /home/aksrao/AUGUSTUS/Mtr_training/Mt3.5v5_bothUTRs.gb.train --UTR=on --metapars=/share/apps/augustus-2.6/config/species/Medicago_truncatula/Medicago_truncatula_metapars.utr.cfg --trainOnlyUtr=1

There is NO error message; however, qstat shows that only one CPU is being used!
So we changed a few things and tried OpenMPI. We added
module load openmpi
to the script, put mpirun in front of the optimize_augustus.pl call,
and then submitted with:

Code:

qsub -pe mpi 64 optimizeAUGUSTUS.sh

It ran with 64 CPUs, but I killed it because it was creating core files:
core.25865: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'etraining'

How should I modify my shell script, qsub submission syntax, or run parameters so that I can benefit from multi-threading? Otherwise it's way too time-consuming!
Thanks in advance for your help.
Sincerely,
Anand

Re: running optimize_AUGUSTUS.pl with multiple cpus on SGE

by katharina on 16.03.2013 - 13:50
We cannot help you with the SGE problem. We usually execute optimize_augustus.pl on machines without a grid.
optimize_augustus.pl uses a Perl module for parallelization (Parallel::ForkManager). You might find a solution to your problem by finding out how ForkManager can be used on an SGE.
We never ran optimize_augustus.pl with more than 8 CPUs.
Are you aware of the option to specify different gene sets for training and validation? If you use a smaller validation gene set, the runtime will decrease.
Typical runtimes for optimize_augustus.pl in eukaryotic genome projects range from a couple of days up to two weeks (using a single CPU). The parameters that are subject to optimization are listed in your species parameter.cfg and metaparameter.cfg files. If you redirect the output of optimize_augustus.pl to a file, you can check after a short time how many parameters have been optimized. From that time, you can estimate the total training time for all parameters and all rounds.
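The extrapolation described above can be sketched as simple shell arithmetic. All numbers in the example call are made-up placeholders, not measurements, and the function name estimate_minutes is hypothetical. The parameter count has to be read from your own configuration files, and the result is only a rough linear estimate.

```bash
#!/bin/bash
# Rough runtime extrapolation for optimize_augustus.pl.
# Arguments (all supplied by you, nothing is read from AUGUSTUS itself):
#   $1 elapsed minutes so far
#   $2 parameters already optimized according to the log
#   $3 total number of parameters subject to optimization
#   $4 value passed to --rounds
estimate_minutes() {
    local elapsed=$1 done_params=$2 total_params=$3 rounds=$4
    # Minutes per parameter (integer division), scaled to all parameters and rounds.
    local per_param=$(( elapsed / done_params ))
    echo $(( per_param * total_params * rounds ))
}

# Hypothetical example: 90 minutes elapsed, 3 of 28 parameters done, 3 rounds.
total=$(estimate_minutes 90 3 28 3)
echo "estimated total: $(( total / 60 )) hours"
```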
Katharina
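
One possible direction for the SGE question, offered as an untested sketch rather than a confirmed fix: Parallel::ForkManager forks its worker processes on the machine where the script runs, so the job needs all of its slots on a single host. mpirun, by contrast, starts independent copies of the whole script, which may be related to the core files reported above. The parallel environment name "smp" below is an assumption (these names are site-specific, and qconf -spl lists the ones configured on your grid), and train.gb is a placeholder for the real training file. NSLOTS is the variable SGE sets to the number of granted slots.

```bash
#!/bin/bash
#$ -cwd
#$ -N optmzAUG
#$ -S /bin/bash
# Assumption: a parallel environment named "smp" that places all slots on one host.
#$ -pe smp 8

# NSLOTS is set by SGE inside the job; fall back to 1 when run outside the grid.
cpus="${NSLOTS:-1}"

# Build the command as an array so it can be inspected before running.
# train.gb is a placeholder for the real training file.
cmd=(optimize_augustus.pl --species=Medicago_truncatula --cpus="$cpus" --rounds=3 train.gb)

# Print the command only; remove "echo" to actually run it.
echo "${cmd[@]}"
```

Submitted with a plain qsub call, this keeps --cpus in step with the number of slots the scheduler actually granted.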

Re: running optimize_AUGUSTUS.pl with multiple cpus on SGE

by Andre on 07.10.2013 - 14:54
Hi,
I have a related question. It seems that my optimize_augustus.pl always runs into an infinite loop. It is running with 8 CPUs. The line "found improvement:" has already occurred 18 times in the shell output.
my command:

Code:

optimize_augustus.pl --rounds=5 --species=arabidopsisRetr --cpus=20 .nr.good.gb --UTR=on &> out.txt

I am using AUGUSTUS 2.7.
How do I count the rounds?
Many thanks
Andre

Re: running optimize_AUGUSTUS.pl with multiple cpus on SGE

by Katharina on 16.07.2014 - 14:52
You can look at the out.txt file. You will see that the adaptation of each single parameter is repeated at certain intervals in the file; each repetition corresponds to a new round.
Katharina
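
A small sketch of how the log could be inspected from the shell. The only string taken from this thread is "found improvement:", so this counts improvements, not rounds; the line numbers merely help to spot the repeating pattern described above. The file name out.txt matches the redirect in Andre's command, and the function name count_improvements is hypothetical.

```bash
#!/bin/bash
# Count and locate "found improvement:" lines in an optimize_augustus.pl log.
log="${1:-out.txt}"

count_improvements() {
    # -c prints the number of matching lines; -F treats the pattern as a literal string.
    grep -cF "found improvement:" "$1"
}

if [ -f "$log" ]; then
    echo "improvements so far: $(count_improvements "$log")"
    # -n prefixes each match with its line number in the log.
    grep -nF "found improvement:" "$log"
fi
```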