improve existing AUGUSTUS parameters using RNA-Seq

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Post by katharina »

The current, new answer to this topic is: Use BRAKER1!

by Martin on 21.03.2012 - 14:23
I am using Augustus for gene predictions on Botrytis cinerea. I am wondering what training set was used to predict gene models using the species=botrytis_cinerea settings? I am involved in a Botrytis cinerea resequencing and annotation project for which lots of new data (RNA-Seq, Illumina) have been generated. I hope to improve the current settings by retraining Augustus. Would it be possible to send me a copy of the (GenBank) file that was used to train Augustus? It would help me select new or different gene models that I can add to the current training set.

Re: improve existing AUGUSTUS parameters using RNA-Seq

Post by katharina »

by Mario on 21.03.2012 - 14:31
Those training sets came from Jason Stajich, and I believe they were based on the CEGMA pipeline or some other protein homology method. Unfortunately, I can't find them anymore -- it was quite a long time ago.
As you now have RNA-Seq, you can probably get better results by starting from scratch.
You could de novo assemble the RNA-Seq reads (e.g. with SOAPdenovo or ABySS) and then submit the resulting cDNA FASTA file, treating the contigs as if they were (longer) ESTs, to this web server
http://bioinf.uni-greifswald.de/webaugustus
to retrain Augustus. You could then run both the original and the new parameter set on the same sample of genes and compare the predictions side by side to see whether the new set is an improvement. Alternatively, you can also use the training web server above if you have trusted protein sequences of Botrytis cinerea for training.
Please let us know if your newly trained parameters are better than the old ones and you want us to include them in the Augustus distribution.
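
For the side-by-side comparison, one possible way to put numbers on it is an exon-level check against a set of genes you trust. The following is only a minimal sketch and not part of Augustus: the function names (cds_set, exon_accuracy) and the toy coordinates are made up for illustration, and it assumes that both predictions and the trusted genes are available as tab-separated GFF/GTF lines with CDS features. "Specificity" is used here in the gene-prediction sense, i.e. the fraction of predicted exons that are correct.

```python
def cds_set(gff_lines):
    """Collect CDS features as (seqname, start, end, strand) tuples."""
    feats = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 8 and cols[2] == "CDS":
            feats.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return feats


def exon_accuracy(predicted, reference):
    """Return (sensitivity, specificity) for exactly matching exons."""
    tp = len(predicted & reference)
    sens = tp / len(reference) if reference else 0.0
    spec = tp / len(predicted) if predicted else 0.0
    return sens, spec


def gff(seq, start, end, strand):
    """Build one hypothetical CDS line for the toy example below."""
    return "\t".join([seq, "toy", "CDS", str(start), str(end), ".", strand, "0", "."])


# Toy data only: coordinates are invented, not real Botrytis cinerea genes.
reference = cds_set([gff("chr1", 100, 200, "+"), gff("chr1", 300, 400, "+"),
                     gff("chr1", 900, 1000, "-"), gff("chr1", 1200, 1300, "-")])
old_pred = cds_set([gff("chr1", 100, 200, "+"), gff("chr1", 310, 400, "+"),
                    gff("chr1", 900, 1000, "-")])
new_pred = cds_set([gff("chr1", 100, 200, "+"), gff("chr1", 300, 400, "+"),
                    gff("chr1", 900, 1000, "-"), gff("chr1", 1200, 1350, "-")])

old_sens, old_spec = exon_accuracy(old_pred, reference)
new_sens, new_spec = exon_accuracy(new_pred, reference)
print(f"old: sensitivity={old_sens:.2f} specificity={old_spec:.2f}")
print(f"new: sensitivity={new_sens:.2f} specificity={new_spec:.2f}")
```

The trusted genes used for such a check should not be the same ones that were used for training, otherwise the new parameters will look better than they really are.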

Re: improve existing AUGUSTUS parameters using RNA-Seq

Post by katharina »

by katharina on 22.03.2012 - 16:19
In case the RNA-Seq assembly does not produce contigs of sufficient length, you could also use the following approach:
1) map the un-assembled RNA-Seq data against the existing transcript set
2) filter for transcripts that are fully covered by RNA-Seq data without gaps and insertions
As I see it, this approach has two issues to be considered:
a) you will not see whether the transcripts selected this way are too short
b) you select for highly expressed genes, so there is a certain bias.
However, I have used this approach in several genome projects (not based on a previously released gene set but on the basis of a "preliminary gene set" that I created from other sources of evidence).
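
Step 2 of the approach above could be sketched roughly as follows. This is an illustration under assumptions, not the script used in those projects: it assumes the reads were mapped to the transcript sequences and that a per-base depth table (tab-separated transcript ID, position, depth, for example as written by `samtools depth -a`) is available. The function name fully_covered, the min_depth parameter and the toy data are hypothetical. The sketch only checks coverage; excluding alignments with insertions or deletions would need an additional filter on the alignments themselves.

```python
from collections import defaultdict


def fully_covered(depth_lines, lengths, min_depth=1):
    """Return sorted IDs of transcripts covered at every base.

    depth_lines: iterable of "transcript<TAB>position<TAB>depth" lines,
                 each position listed at most once
    lengths:     dict mapping transcript ID to its length in bases
    """
    covered = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        name, _pos, depth = line.split("\t")[:3]
        if int(depth) >= min_depth:
            covered[name] += 1
    # A transcript qualifies only if the number of covered bases equals its length.
    return sorted(t for t, n in lengths.items() if covered[t] == n)


# Toy data only: t1 is covered end to end, t2 has a gap, t3 has no reads.
demo = ["t1\t1\t5", "t1\t2\t3", "t1\t3\t1",
        "t2\t1\t4", "t2\t2\t0", "t2\t3\t2"]
lengths = {"t1": 3, "t2": 3, "t3": 2}
selected = fully_covered(demo, lengths)
print(selected)
```

Raising min_depth makes the selection stricter, but it also strengthens the bias towards highly expressed genes mentioned in b).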