Training augustus with busco or?

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

Morko
Posts: 1
Joined: Thu Mar 22, 2018 4:35 pm


Post by Morko »

Hi,
I am pretty new to AUGUSTUS and the whole training process. I have managed to get gene predictions for my assembled Cryptosporidium FASTA, but some of the predicted genes have multiple introns and don't make much sense. I feel like the dataset used for the prediction is just wrong.
I ran BUSCO with the --fast option, without any special changes to the config, and got the "/augustus_output/retraining_parameters" folder. I immediately used it as a new AUGUSTUS species so that I could run "augustus [parameters] --species=SPECIES queryfilename", where SPECIES is the name of my newly created folder in augustus/config/species, which contains the files generated by BUSCO (/augustus_output/retraining_parameters).
I haven't run any optimization scripts at all.
Is this a bad approach? What is the best way to train AUGUSTUS and then get meaningful gene predictions when all I have is MIRA output from Illumina FASTQ reads and the species isn't listed in the AUGUSTUS manual?
Thanks much.
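
For reference, the manual step described in this post (turning BUSCO's retraining_parameters folder into an AUGUSTUS species) can be sketched in Python. This is only a minimal sketch, not part of BUSCO or AUGUSTUS: the function name install_busco_species and the example paths are hypothetical. It assumes the AUGUSTUS convention that a species folder NAME contains a file NAME_parameters.cfg, so the species name is derived from that file instead of being chosen freely.

```python
import os
import shutil
from pathlib import Path

SUFFIX = "_parameters.cfg"


def install_busco_species(retraining_dir, augustus_config_path):
    """Copy a BUSCO retraining_parameters folder into <config>/species/.

    Hypothetical helper: returns the species name to pass to
    `augustus --species=...`.
    """
    retraining_dir = Path(retraining_dir)
    cfg_files = sorted(retraining_dir.glob("*" + SUFFIX))
    if len(cfg_files) != 1:
        raise ValueError(
            f"expected exactly one *{SUFFIX} file, found {len(cfg_files)}"
        )
    # The folder name must match the prefix of the parameter files.
    species = cfg_files[0].name[: -len(SUFFIX)]
    target = Path(augustus_config_path) / "species" / species
    if target.exists():
        raise FileExistsError(f"{target} already exists, not overwriting")
    shutil.copytree(retraining_dir, target)
    return species


if __name__ == "__main__":
    # Placeholder paths; AUGUSTUS_CONFIG_PATH must point at augustus/config.
    name = install_busco_species(
        "run_NAME/augustus_output/retraining_parameters",
        os.environ["AUGUSTUS_CONFIG_PATH"],
    )
    print(f"augustus --species={name} assembly.fasta")
```

Copying the files this way only installs the parameters as BUSCO wrote them; it does not run any optimization.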
katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Re: Training augustus with busco or?

Post by katharina »

I think what BUSCO does is take an existing AUGUSTUS species (custom-defined or chosen automatically based on the selected database), predict genes with PPX, retrain AUGUSTUS on those predictions, and then run another prediction iteration with PPX. The success of such an approach depends on how well the initial AUGUSTUS species fits the target genome. Say you take some non-melanogaster fruit fly genome and the initial parameter set is "fly" (from D. melanogaster): that is going to work very well. If the initial parameter set is too far from the target, it does not work well. [In BUSCO, don't skip optimize_augustus.pl; it is the time-consuming step, but it might help.]

I don't know what your starting parameter set was (as far as I know, BUSCO uses different ones depending on the selected database, and I don't know which database you used). But have you tried running it with toxoplasma as the starting species?
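
A rerun with toxoplasma as the starting species could look roughly like the sketch below. It only builds the command line and does not run anything. It assumes the BUSCO v3 script name run_BUSCO.py with its -sp (starting AUGUSTUS species) and --long (AUGUSTUS optimization) options; the function name busco_command, the file names, the run name and the lineage path are placeholders. Newer BUSCO releases use a busco executable and spell the species option --augustus_species, so check --help for the installed version.

```python
import shlex


def busco_command(assembly, run_name, lineage_dir,
                  start_species="toxoplasma", optimize=True):
    """Build a BUSCO v3-style genome-mode command (hypothetical helper)."""
    cmd = [
        "run_BUSCO.py",
        "-i", assembly,        # assembled genome in FASTA format
        "-o", run_name,        # label for the output folder
        "-l", lineage_dir,     # BUSCO lineage dataset
        "-m", "genome",        # genome mode, which uses AUGUSTUS
        "-sp", start_species,  # starting AUGUSTUS parameter set
    ]
    if optimize:
        # Slow, but enables the AUGUSTUS optimization step.
        cmd.append("--long")
    return cmd


if __name__ == "__main__":
    # Placeholder arguments; adjust to the actual files and lineage.
    print(shlex.join(
        busco_command("assembly.fasta", "crypto_toxo", "path/to/lineage")
    ))
```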

Otherwise, you can also train AUGUSTUS outside of BUSCO using BRAKER (you need access to the unassembled Illumina reads for that).

Or you can try WebAUGUSTUS with the assembled RNA-Seq data as the cDNA file.

Katharina