Train Augustus from RNA-Seq and genome alone

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Post by katharina

Originally posted in the old forum by Jerome on 04.05.2012 - 10:55
Is there a way to train AUGUSTUS directly from RNA-Seq data?

Re: Train Augustus from RNA-Seq and genome alone

Post by katharina

by Mario on 04.05.2012 - 11:05
Yes, but not yet a completely satisfying one.
You can assemble your RNA-Seq reads de novo with any assembler that works on transcriptome reads.
Then you can supply the resulting FASTA file as cdna.fa to the autoAugTrain.pl pipeline, or upload it to the training web server:
http://bioinf.uni-greifswald.de/webaugustus
Internally, this currently uses PASA to align the assembled mRNA fragments to the genome and to look for complete ORFs in them. The result is a set of CDS gene structures.
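For illustration only, here is a minimal Python sketch of what "looking for complete ORFs" in an assembled transcript can mean. It is not PASA's actual code: it scans only the three forward reading frames, and both the function name and the 100-codon minimum length are assumptions chosen for the example.

```python
# Hypothetical sketch, NOT PASA's implementation: forward strand only,
# arbitrary minimum ORF length.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq: str, min_codons: int = 100):
    """Return the (start, end) of the longest ORF that begins with ATG and
    ends with a stop codon, as 0-based half-open transcript coordinates
    (stop codon included), or None if no ORF has at least min_codons codons."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons and (
                        best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ATG after this stop
    return best
```

For example, longest_complete_orf("CCATGAAATTTTAAGG", min_codons=1) returns (2, 14), the span of ATGAAATTTTAA.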
However, from this we currently train only the coding regions of a gene. In my opinion, a training set of UTRs cannot be reliably determined from assembled RNA-Seq, as the assembler will likely not find the exact ends of the UTRs.
We are currently working on a method for creating a training set of UTRs from a prediction of the coding regions and the alignments of the raw RNA-Seq reads to the genome. This will probably not be finished before the summer or fall.
Mario

Re: Train Augustus from RNA-Seq and genome alone

Post by katharina

by katharina on 07.05.2012 - 14:33
I have observed several times that RNA-Seq data has a degradation bias towards one end of the transcripts, which resulted in assemblies of incomplete transcripts. If such transcripts, a large proportion of which lack the same end, are fed into PASA to create training genes, the resulting training gene set is of rather low quality, and so are the parameters trained on it.
I therefore recommend that you check your RNA-Seq data carefully before relying on AUGUSTUS parameters that were trained on a training gene set made from assembled RNA-Seq transcripts.
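One rough way to check for such an end bias is to look at how often the longest ORF of each assembled transcript lacks its start or its stop codon. The Python sketch below assumes the transcripts are already in sense orientation (forward strand only) and uses hypothetical category names: complete, 5prime_partial (no start codon, the ORF runs off the 5' end), 3prime_partial (no stop codon) and internal (neither). If one partial class clearly outnumbers the other, one transcript end is probably missing systematically. It is only a heuristic: an internal ATG in a 5'-truncated transcript will be counted as a start codon.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_longest_orf(seq: str) -> str:
    """Classify the longest forward-strand ORF of a transcript as 'complete',
    '5prime_partial', '3prime_partial' or 'internal' ('no_orf' if none)."""
    seq = seq.upper()
    best_len, best_class = 0, "no_orf"
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        seg_start = 0  # index of the first codon of the current stop-free segment
        for idx in range(len(codons) + 1):
            at_3p_end = idx == len(codons)
            if not at_3p_end and codons[idx] not in STOP_CODONS:
                continue
            segment = codons[seg_start:idx]
            if "ATG" in segment:
                # ORF starts at the first in-frame ATG of the segment
                has_start, orf_len = True, len(segment) - segment.index("ATG")
            elif seg_start == 0 and segment:
                # No ATG and no upstream stop: ORF may extend past the 5' end
                has_start, orf_len = False, len(segment)
            else:
                has_start, orf_len = False, 0  # not an ORF
            has_stop = not at_3p_end
            if orf_len > best_len:
                best_len = orf_len
                if has_start and has_stop:
                    best_class = "complete"
                elif has_stop:
                    best_class = "5prime_partial"
                elif has_start:
                    best_class = "3prime_partial"
                else:
                    best_class = "internal"
            seg_start = idx + 1
    return best_class


def summarize(transcripts: dict) -> Counter:
    """Count ORF classes over a {transcript_id: sequence} mapping."""
    return Counter(classify_longest_orf(s) for s in transcripts.values())
```

For example, summarize({"t1": "ATGAAATAA", "t2": "AAAAAATAA", "t3": "AAAAAATAA"}) counts two 5prime_partial transcripts and one complete one.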
Katharina

Re: Train Augustus from RNA-Seq and genome alone

Post by katharina

by katharina on 29.10.2012 - 11:02
If you use some software to assemble RNA-Seq data into transcripts, you still need to identify the ORFs within the assembled transcripts!
I have recently seen users submit GFF files of assembled transcripts as training gene structure files to our AUGUSTUS training web service. This does not work! There are two options that do work, though:
1) submit the assembled FASTA files to our web service as EST/cDNA files
2) identify the ORFs within the assembled transcripts and supply AUGUSTUS with a GTF file containing the genomic coordinates of the CDS features of those transcripts.
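Option 2 requires knowing where each transcript's exons lie on the genome, for example from a spliced alignment of the transcript. The Python sketch below assumes that exon structure is already available and maps an ORF given in transcript coordinates (0-based, half-open) to genomic CDS pieces, then writes them as GTF CDS lines with the frame column filled in. The function names, the source value "assembly" and the gene/transcript IDs are hypothetical. Whether the stop codon should lie inside the CDS span is left to the caller; check what the AUGUSTUS training scripts expect.

```python
def cds_to_genome(exons, strand, cds_start, cds_end):
    """Map a CDS from transcript to genome coordinates (hypothetical helper).

    exons: (start, end) genome intervals, 1-based inclusive, ordered 5'->3'
           along the transcript (descending positions on the '-' strand).
    cds_start, cds_end: 0-based half-open transcript coordinates.
    Returns 1-based inclusive genome intervals in transcript order.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    pieces = []
    t_offset = 0  # transcript coordinate of the first base of the current exon
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start + 1
        lo = max(cds_start, t_offset)
        hi = min(cds_end, t_offset + ex_len)
        if lo < hi:  # the CDS overlaps this exon
            first = lo - t_offset     # offsets into the exon, counted
            last = hi - t_offset - 1  # in transcript direction
            if strand == "+":
                pieces.append((ex_start + first, ex_start + last))
            else:
                pieces.append((ex_end - last, ex_end - first))
        t_offset += ex_len
    return pieces


def cds_gtf_lines(seqname, strand, pieces, gene_id, transcript_id, source="assembly"):
    """Format CDS pieces (in transcript order) as tab-separated GTF lines."""
    lines = []
    emitted = 0  # number of CDS bases in earlier pieces
    for start, end in pieces:
        frame = (3 - emitted % 3) % 3  # bases to skip to reach the next codon start
        attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        lines.append("\t".join([seqname, source, "CDS", str(start), str(end),
                                ".", strand, str(frame), attributes]))
        emitted += end - start + 1
    return lines
```

For example, cds_to_genome([(100, 109), (200, 219)], "+", 2, 14) returns [(102, 109), (200, 203)]; the second GTF line then gets frame 1, because the first piece holds 8 CDS bases.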

Re: Train Augustus from RNA-Seq and genome alone

Post by katharina

by katharina on 20.01.2015 - 15:34
We now recommend BRAKER1 for training AUGUSTUS from unassembled RNA-Seq reads: http://bioinf.uni-greifswald.de/bioinf/ ... index.html