running optimize_AUGUSTUS.pl with multiple cpus on SGE

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Post by katharina »

Originally posted in the old forum by Anand Rao on 15.03.2013 - 18:24

Hi Mario, Katharina et al,

I am trying to run optimize_AUGUSTUS.pl with 64 CPUs on our Sun Grid Engine (SGE) cluster, but so far I have been unable to use more than one CPU, even after installing the Perl module Parallel::ForkManager on the cluster.

My shell script for qsub looks like so:

#!/bin/bash
### Change to the current working directory:
#$ -cwd
### Job name:
#$ -N optmzAUG
#$ -S /bin/bash

module load gcc augustus

perl /share/apps/augustus-2.6/scripts/optimize_augustus.pl --species=Medicago_truncatula --cpus=16 --rounds=3 /home/aksrao/AUGUSTUS/Mtr_training/Mt3.5v5_bothUTRs.gb.train --UTR=on --metapars=/share/apps/augustus-2.6/config/species/Medicago_truncatula/Medicago_truncatula_metapars.utr.cfg --trainOnlyUtr=1
There is no error message, but qstat shows that the job uses just one CPU.
We then tried OpenMPI. We added
module load openmpi
to the script, put mpirun in front of the optimize_augustus.pl call,
and submitted with:

qsub -pe mpi 64 optimizeAUGUSTUS.sh
It ran with 64 CPUs, but I killed it because it was creating core files:
core.25865: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV),
SVR4-style, from 'etraining'

How should I modify my shell script, qsub submission syntax, or run parameters so that I can benefit from parallel execution? Otherwise the run is far too time-consuming.
Thanks in advance for your help.
Sincerely,
Anand

Re: running optimize_AUGUSTUS.pl with multiple cpus on SGE

Post by katharina »

by katharina on 16.03.2013 - 13:50
We cannot help you with the SGE problem. We usually execute optimize_augustus.pl on machines without a grid.
optimize_augustus.pl uses a Perl module (Parallel::ForkManager) for parallelization. You might find a solution to your problem by finding out how ForkManager can be used on SGE.
We never ran optimize_augustus.pl with more than 8 CPUs.
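A possible starting point, offered only as a sketch: ForkManager forks its workers on the host where the Perl script runs, so the job needs all of its slots on a single node, and mpirun only starts multiple copies of the whole script. SGE grants slots through a parallel environment and exports the granted number as NSLOTS. The environment name "smp" below is an assumption (names are site-specific), and the value 8 is simply the largest CPU count mentioned above; the paths are the ones from the original post.

```shell
#!/bin/bash
#$ -cwd
#$ -N optmzAUG
#$ -S /bin/bash
# ASSUMPTION: "smp" is a placeholder name for a parallel environment that
# grants all slots on one host; `qconf -spl` lists the names defined on a cluster.
#$ -pe smp 8

# SGE exports NSLOTS inside a job; default to 1 when run outside the grid.
granted_cpus() {
    echo "${NSLOTS:-1}"
}

script=/share/apps/augustus-2.6/scripts/optimize_augustus.pl
train=/home/aksrao/AUGUSTUS/Mtr_training/Mt3.5v5_bothUTRs.gb.train
metapars=/share/apps/augustus-2.6/config/species/Medicago_truncatula/Medicago_truncatula_metapars.utr.cfg

# Only start the run where the script from the original post actually exists.
if [ -f "$script" ]; then
    module load gcc augustus
    perl "$script" --species=Medicago_truncatula --cpus="$(granted_cpus)" \
        --rounds=3 "$train" --UTR=on --metapars="$metapars" --trainOnlyUtr=1
else
    echo "would run with --cpus=$(granted_cpus)"
fi
```

With the request embedded in the script, it would be submitted with a plain qsub call and without mpirun. Whether a given environment keeps all slots on one host can be checked with `qconf -sp` followed by its name (look for allocation_rule $pe_slots).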
Are you aware of the option to specify different gene sets for training and validation? If you use a smaller validation gene set, the runtime will decrease.
Typical runtimes for optimize_augustus.pl in eukaryotic genome projects range from a couple of days up to two weeks (using a single CPU). The parameters that are subject to optimization are listed in your species parameter.cfg and metaparameter.cfg files. If you redirect the output of optimize_augustus.pl to a file, you can check after a short time how many parameters have been optimized. From that elapsed time, you can estimate the total training time for all parameters and all rounds.
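The estimate described above is a simple extrapolation, which can be sketched as follows. It assumes that every parameter takes roughly the same time in every round, which may not hold in practice. The function name and the example numbers are made up for illustration.

```shell
# estimate_total_hours ELAPSED_HOURS PARAMS_DONE PARAMS_TOTAL ROUNDS
# Extrapolates: (time per optimized parameter) * parameters * rounds.
estimate_total_hours() {
    awk -v e="$1" -v d="$2" -v t="$3" -v r="$4" \
        'BEGIN { printf "%.1f\n", e / d * t * r }'
}

# Hypothetical numbers: 3 of 20 parameters done after 6 hours, 3 rounds requested.
estimate_total_hours 6 3 20 3   # prints 120.0
```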
Katharina

Re: running optimize_AUGUSTUS.pl with multiple cpus on SGE

Post by katharina »

by Andre on 07.10.2013 - 14:54
Hi,
I have a related question. It seems that my optimize_augustus.pl always runs into an infinite loop. It is running with 8 CPUs. The line "found improvement:" has already occurred 18 times in the shell output.
my command:

optimize_augustus.pl --rounds=5 --species=arabidopsisRetr --cpus=20 .nr.good.gb --UTR=on &> out.txt
I am using AUGUSTUS version 2.7.
How do I count the rounds?
Many thanks
Andre

Re: running optimize_AUGUSTUS.pl with multiple cpus on SGE

Post by katharina »

by Katharina on 16.07.2014 - 14:52
You can look at the out.txt file. You will see that the adaptation of each single parameter is repeated at certain intervals in the file, so counting how often the same parameter has come up tells you which round the run is in.
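As a rough way to follow progress in that file, here is a sketch that counts the only output string quoted in this thread, "found improvement:". It does not count rounds directly, since the exact layout of the output is not shown here; the function name is made up.

```shell
# Count the lines in an optimize_augustus.pl log that report an improvement.
# -F treats the pattern as a fixed string, -c prints the number of matching lines.
count_improvements() {
    grep -cF -- 'found improvement:' "$1"
}
```

For the run above, this would be called as `count_improvements out.txt`.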
Katharina