new genome using augustus

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

new genome using augustus

Post by katharina »

Originally posted by ipcc in the old forum on 18.07.2013 - 03:24

Hi,
I am new to this software. I am trying to use AUGUSTUS to annotate a new genome sequence without training parameters.
The problem is that the software asks me to train on the genome sequence plus at least one of the following files: a cDNA file, a training gene structure file, or a protein file, none of which is available. How should I train for this genome?
Another question is whether I can use part of the genome sequence instead of the whole one, since my data set is large. In other words, can parameters trained on a partial sequence represent the whole genome?
Thanks

Re: new genome using augustus

Post by katharina »

Originally posted by katharina in the old forum on 18.07.2013 - 15:53

It is not possible to train AUGUSTUS without a reliable set of training gene structures.
If you lack experimental data for your target species, you might want to give CEGMA a try: http://bioinf.uni-greifswald.de/bioinf/ ... MATraining
I wouldn't worry about the genome size in your case, because you'll have to run the training on your own PC anyway. The actual files used for training AUGUSTUS are in GenBank format and contain only the protein-coding gene structures with a small flanking region. If you get a very large number of complete training gene structures (e.g. several thousand), you might want to randomly select, say, 2000 training genes from the GenBank file. Otherwise, work with all the data you have, if your computational resources allow it.
Katharina
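
The random selection mentioned above could be sketched roughly as follows. This is a minimal sketch, not part of AUGUSTUS: it assumes the training file is a GenBank flat file in which every record ends with a line containing only `//`, and the function names `split_genbank_records` and `sample_training_genes` are made up for illustration.

```python
import random


def split_genbank_records(text):
    """Split GenBank flat-file text into records; each record ends with a '//' line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    # Keep an unterminated final record only if it has real content.
    rest = "".join(current)
    if rest.strip():
        records.append(rest)
    # Make sure every record ends with a newline so records can be re-joined safely.
    return [r if r.endswith("\n") else r + "\n" for r in records]


def sample_training_genes(text, n=2000, seed=0):
    """Return up to n randomly chosen records, joined back into GenBank text."""
    records = split_genbank_records(text)
    if len(records) <= n:
        return "".join(records)  # fewer than n genes: keep everything
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return "".join(rng.sample(records, n))
```

To use it, one would read the GenBank training file into a string, pass it to `sample_training_genes`, and write the result to a new file that is then used for training. The default of 2000 simply mirrors the number suggested in the post.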

Re: new genome using augustus

Post by katharina »

Originally posted by ipcc in the old forum on 23.07.2013 - 17:09

Dear Katharina,
How about new_species.pl? I found it in the scripts/ folder. Is it supposed to be a training script for new species?

Re: new genome using augustus

Post by katharina »

Originally posted by katharina in the old forum on 24.07.2013 - 09:17

It is a script that will create a folder with "untrained" config files for training a new species. If you use autoAug.pl, new_species.pl is automatically invoked. If you want to train a species "manually" (i.e. not using a pipeline), you should call that script directly, before starting actual training (training won't work without existing parameter files).
Katharina
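
The order of the "manual" route described above can be written down as a small sketch. It only builds the command lines and runs nothing. It assumes that `new_species.pl` takes the species name as `--species=...`, as `autoAug.pl` does in this thread, and that training itself is done with the `etraining` program; please check both against your AUGUSTUS version. The helper names `manual_training_commands` and `as_shell_lines` are made up for illustration.

```python
import shlex


def manual_training_commands(species, training_gb):
    """Return the two commands, in order, as argv lists (nothing is executed)."""
    return [
        # 1. Create the folder of "untrained" parameter files for the species.
        ["new_species.pl", f"--species={species}"],
        # 2. Train on the GenBank file of training genes (assumed program: etraining).
        ["etraining", f"--species={species}", training_gb],
    ]


def as_shell_lines(commands):
    """Render argv lists as shell-quoted command lines for copy and paste."""
    return [shlex.join(cmd) for cmd in commands]
```

Calling `as_shell_lines(manual_training_commands("NewSpecies", "genes.gb"))` gives two lines to run one after the other; `genes.gb` is a placeholder file name.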

Re: new genome using augustus

Post by katharina »

Originally posted by ipcc in the old forum on 26.07.2013 - 23:48

Dear Katharina,
I now have the .gff file from CEGMA, so my command will be:

autoAug.pl --species=NewSpecies --genome=genome.fa --trainingset=augustus.gff 
Is this correct?
I don't have any cDNA or protein sequences for this species. Is that OK?

Re: new genome using augustus

Post by katharina »

Originally posted by katharina in the old forum on 29.07.2013 - 10:18

I think that could work. I haven't tried it myself, though.
Katharina

Re: new genome using augustus

Post by katharina »

Originally posted by ipcc in the old forum on 05.08.2013 - 00:48

Hi,
I got output like this:
1 ####### Finished step 1 at Fri Aug 2 15:42:31 2013. All files are stored in /diag/home/zjrouc/annotation/scalek58_augustus//autoAug/autoAugTrain #######
1 ####### Step 2: Preparing scripts for AUGUSTUS without hints and UTR #######
2 perl /diag/home/zjrouc/software/augustus/augustus.2.5.5/scripts/autoAugPred.pl -g=/diag/home/zjrouc/annotation/scalek58_augustus/scalek58_trim200.fasta --species=pleco -w=/diag/home/zjrouc/annotation/scalek58_augustus//autoAug -v -v --useexisting
3 All necessary directories have been created under /diag/home/zjrouc/annotation/scalek58_augustus/autoAug/autoAugPred_abinitio
2 Using existing split genome FASTA files.
2 The shell scripts for cluster have been prepared under /diag/home/zjrouc/annotation/scalek58_augustus/autoAug/autoAugPred_abinitio/shells
1 Predictions already there. Reusing them.
3 cd /diag/home/zjrouc/annotation/scalek58_augustus/autoAug/autoAugPred_abinitio/shells
3 cat aug1.out >> aug.out
3 cat aug2.out >> aug.out
3 cat aug3.out >> aug.out
3 cat aug4.out >> aug.out
3 cat aug5.out >> aug.out
3 cat aug6.out >> aug.out
3 cat aug7.out >> aug.out
3 cat aug8.out >> aug.out
3 cat aug9.out >> aug.out
3 cat aug10.out >> aug.out
3 cat aug11.out >> aug.out
3 cat aug12.out >> aug.out
3 cat aug13.out >> aug.out
3 cat aug14.out >> aug.out
3 cat aug15.out >> aug.out
3 cat aug16.out >> aug.out
3 cat aug17.out >> aug.out
3 cat aug18.out >> aug.out
3 cat aug19.out >> aug.out
3 cat aug20.out >> aug.out
3 cat aug.out | /diag/home/zjrouc/software/augustus/augustus.2.5.5/config/../scripts/join_aug_pred.pl > augustus.gff
3 perl /diag/home/zjrouc/software/augustus/augustus.2.5.5/config/../scripts/getAnnoFasta.pl augustus.gff
3 mv augustus.aa ../predictions
3 mv augustus.gtf ../predictions/
3 mv augustus.gff ../predictions/
3 /diag/home/zjrouc/annotation/scalek58_augustus/autoAug/autoAugPred_abinitio/shells/../predictions
3 cat augustus.gtf | /diag/home/zjrouc/software/augustus/augustus.2.5.5/config/../scripts/augustus2gbrowse.pl | perl -pe 's/AUGUSTUS /AUG-ABINIT /' > augustus.abinitio.gbrowse
3 mv augustus.abinitio.gbrowse ../gbrowse
2 Done with "autoAugPred.pl"
1 ####### Finished step 2. The scripts are stored in /diag/home/zjrouc/annotation/scalek58_augustus//autoAug/autoAugPred_abinitio/shells #######
When above jobs are finished, continue by running the command
autoAug.pl --species=pleco --genome=/diag/home/zjrouc/annotation/scalek58_augustus/scalek58_trim200.fasta --useexisting --hints=/diag/home/zjrouc/annotation/scalek58_augustus//autoAug/hints/hints.E.gff -v -v --index=1

Actually, I already have augustus.aa, which contains the predicted amino acid sequences. Do I still need to run the command in the last line?

Re: new genome using augustus

Post by katharina »

Originally posted by Mario in the old forum on 05.08.2013 - 08:44

The predictions you obtained are ab initio. Go on if you want the ones using evidence (hints).
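
Before deciding whether to continue, it can help to see how many genes the ab initio run predicted. The following is a minimal sketch, assuming augustus.gff is tab-separated GFF with the feature type in the third column and comment lines starting with `#`; the function name `count_features` is made up for illustration.

```python
def count_features(gff_text, feature="gene"):
    """Count GFF lines whose third column equals the given feature type."""
    count = 0
    for line in gff_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 3 and cols[2] == feature:
            count += 1
    return count
```

Reading the contents of augustus.gff into a string and passing it to `count_features` gives the number of predicted genes; passing `feature="transcript"` counts transcripts instead.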

Re: new genome using augustus

Post by katharina »

Originally posted by ipcc in the old forum on 06.08.2013 - 02:53

Thank you, Mario.