Parallel Comparative Genome Prediction?

Discussions about predicting genes with AUGUSTUS. Not covered here: WebAUGUSTUS and BRAKER1

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Parallel Comparative Genome Prediction?

Post by katharina »

Originally posted in the old forum by jzucker@oriongenomics.com on 09.02.2015 - 22:08
Hi there,
I would like to run augustus in comparative gene prediction mode. I have
succeeded in recompiling augustus to use cgp, have generated a MAF and
phylogenetic tree, and hints for all the genomes. When I ran Augustus in non-
cgp mode, I split up the genome and hints into chunks, and ran them in parallel
on a multi-core machine. However, it is not at all clear to me how I should
split up the genomes and hints files for comparative gene prediction mode. I
have 1TB of memory, 80 cores, and the 5 genomes I have aligned are less than
50Mb each.
Please advise.
Sincerely,
Jeremy

Re: Parallel Comparative Genome Prediction?

Post by katharina »

by jzuckeroriongenomicscom on 11.02.2015 - 19:36
Parallel Comparative Gene Prediction?
In partial answer to my own question: it appears that I can split the MAF into chunks based on the reference genome. If I then split the non-reference sequences into the same chunks (possibly redundantly), the genes on each chunk can perhaps be predicted independently.
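
As an illustration of splitting a MAF "based on the reference genome", here is a minimal Python sketch that reads alignment blocks and groups them by the reference sequence they belong to. It assumes the first "s" line of each block is the reference and that the reference is on the + strand. The function names and the toy MAF text are made up for this example and are not part of AUGUSTUS.

```python
from collections import defaultdict


def read_maf_blocks(lines):
    """Yield alignment blocks (lists of lines); each block starts at an 'a' line."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "a" or line.startswith("a "):
            if block:
                yield block
            block = [line]
        elif block and line.strip() and not line.startswith("#"):
            block.append(line)
    if block:
        yield block


def reference_interval(block):
    """Return (src, start, end) of the first 's' line, assumed to be the reference.

    MAF 's' lines are: s src start size strand srcSize text (start is 0-based).
    """
    for line in block:
        fields = line.split()
        if fields and fields[0] == "s":
            src, start, size = fields[1], int(fields[2]), int(fields[3])
            return src, start, start + size
    raise ValueError("alignment block has no 's' line")


def group_by_reference_sequence(lines):
    """Map each reference sequence name to the list of blocks aligned to it."""
    groups = defaultdict(list)
    for block in read_maf_blocks(lines):
        src, _, _ = reference_interval(block)
        groups[src].append(block)
    return dict(groups)


# Toy input; species and sequence names are hypothetical.
example = """##maf version=1
a score=10
s ref.chr1 0 5 + 100 ACGTA
s spB.scaf7 20 5 + 80 ACGTT

a score=12
s ref.chr2 40 4 + 90 GGCC
s spB.scaf2 3 4 - 70 GGCT
""".splitlines()

groups = group_by_reference_sequence(example)
for name, blocks in groups.items():
    print(name, len(blocks), reference_interval(blocks[0]))
```

Because each block carries its own non-reference rows, writing out the blocks of one group already brings along the matching pieces of the other genomes' alignments.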

Re: Parallel Comparative Genome Prediction?

Post by katharina »

by stefaniekoenigymailcom on 13.02.2015 - 14:27
Parallel Comparative Gene Prediction
Hi Jeremy,
Please be aware that the comparative gene prediction mode is still under development and currently requires help to get it running and to adjust the parameters to a specific data set.
Rather than reading the genomes into memory, the general approach is to load all genomes and all hints into a MySQL or SQLite database. There should be a manual ("README-cgp.txt") in the release that explains how to do this. As you already suggested, for parallelization you simply split the MAF alignment with respect to the reference genome into overlapping chunks of, for example, 2.5 megabases each, and then run Augustus on each chunk individually. Gene predictions at the chunk boundaries can be problematic, which is why we run a script afterwards to merge the gene sets from the different runs.
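
To make the chunking idea concrete, here is a rough Python sketch of how overlapping chunk coordinates on one reference sequence could be computed, and how to find the chunks an alignment block falls into. The 2.5 Mb chunk size is the example value from above. The 500 kb overlap and the function names are assumptions chosen only for illustration, not values or tools from the AUGUSTUS release.

```python
def overlapping_chunks(seq_len, chunk_size=2_500_000, overlap=500_000):
    """Return half-open (start, end) intervals that cover [0, seq_len).

    Consecutive intervals share `overlap` bases (an assumed value).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if seq_len <= 0:
        return []
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, seq_len)
        chunks.append((start, end))
        if end >= seq_len:
            break
        start += step
    return chunks


def chunks_for_interval(chunks, start, end):
    """Indices of all chunks that overlap the half-open interval [start, end)."""
    return [i for i, (s, e) in enumerate(chunks) if start < e and end > s]


# A hypothetical 6 Mb reference scaffold.
chunks = overlapping_chunks(6_000_000)
for i, (s, e) in enumerate(chunks):
    print(f"chunk {i}: {s}-{e}")

# A block at 2.1-2.2 Mb lies in the overlap and is assigned to both neighbours.
print(chunks_for_interval(chunks, 2_100_000, 2_200_000))
```

Blocks inside an overlap region end up in two chunks, so genes there can be predicted twice. That redundancy is what the merging step afterwards has to resolve.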
What genomes do you want to annotate?
Best,
Stefanie
sujaikumar
Posts: 1
Joined: Thu May 26, 2016 10:48 am

Re: Parallel Comparative Genome Prediction?

Post by sujaikumar »

Dear Stefanie / Augustus CGP team / Jeremy
"currently requires help to get it running and to adjust the parameters to a specific data set"
I haven't been able to find any documentation for splitting a very large MAF file (2.7 GB) into separate partitions and running augustus-cgp on each in parallel. We are willing to try and do that but if any of you have any suggestions or scripts that you use already, that would be very helpful.

Thanks in advance,

- Sujai