
training augustus

Posted: Fri Nov 20, 2015 1:23 pm
by katharina
Originally posted in the old forum by Anna on 29.10.2012 - 11:13
Could I use hints from RNA-seq, hints from transcriptome contigs, and hints from a closely related species to train Augustus?
If not, could I skip the PASA assembly step and use assembled transcriptome contigs for training?
Greetings, Anna

Re: training augustus

Posted: Fri Nov 20, 2015 1:23 pm
by katharina
by katharina on 29.10.2012 - 12:30
There is a fundamental difference between the formats of hints and of training gene structures:
Hints files for AUGUSTUS contain one of a number of features in column 3 (e.g. CDSpart, intron, exonpart, exon, CDS, ...) and specific tags in the last column (e.g. grp=, mult=, pri=, ...).
Training gene structure files must contain the feature CDS in the third column and a grouping tag in the last column. An example of a valid gene structure file in GFF format is given at http://bioinf.uni-greifswald.de/webaugu ... #structure
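To make the distinction concrete, here is a minimal Python sketch that sorts GFF lines into "hint-like" and "training-like" using only the criteria named above. It is not part of AUGUSTUS; the function name, the assumption of nine tab-separated columns, and the two sample lines are made up for illustration, and the tag list only covers the three tags mentioned in this post.

```python
# Tags mentioned in this post as typical for the last column of a hints file.
# The post's list ends with "...", so this is not exhaustive.
HINT_TAGS = ("grp=", "mult=", "pri=")


def classify_gff_line(line):
    """Return 'hint-like', 'training-like' or 'other' for one GFF line.

    Assumption: standard GFF with 9 tab-separated columns, feature in
    column 3 and attributes/tags in the last column.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return "other"
    feature, last = cols[2], cols[-1]
    # Hint lines carry hint-specific tags in the last column.
    if any(tag in last for tag in HINT_TAGS):
        return "hint-like"
    # Training lines need a CDS feature plus some grouping tag.
    if feature == "CDS" and last.strip():
        return "training-like"
    return "other"


# Hypothetical example lines (not taken from real data).
hint_line = "chr1\tb2h\tintron\t1000\t2000\t0\t+\t.\tgrp=read1;pri=4"
training_line = 'chr1\tmy_pipeline\tCDS\t1000\t1200\t.\t+\t0\ttranscript_id "g1.t1"'

print(classify_gff_line(hint_line))
print(classify_gff_line(training_line))
```

A check like this only tells you which kind of file you are looking at; it does not validate the file against the format description linked above.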
Transcriptome assemblers usually solve the problem of assembling transcripts, but not the problem of finding the long ORF within each transcript. To generate training gene structure files in GFF format that contain the CDS information, however, you need to find that long ORF. If you have solved that problem, and if you have mapped the local transcript ORF coordinates back to genome coordinates with valid intron-exon boundaries, you can feed the assembled and identified gene structures directly into AUGUSTUS for training. If not, PASA is one of the tools that can find the ORFs in the assembled transcripts for you.
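As a rough illustration of the two steps described above (finding the long ORF, then mapping it back to the genome), here is a simplified Python sketch. It is not how PASA works and not an AUGUSTUS tool. The function names are hypothetical, and it assumes a forward-strand transcript, an ATG-to-stop ORF definition, and exon coordinates that you already have from aligning the contig to the genome. It does not check splice sites, so the validity of the intron-exon boundaries still has to be verified separately.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG..stop ORF on the forward strand of a transcript.

    Returns (start, end) in 0-based, end-exclusive transcript coordinates,
    stop codon included, or None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


def orf_to_genome_cds(orf, exons):
    """Map a transcript ORF onto genome coordinates.

    orf:   (start, end) as returned by longest_orf
    exons: list of (genome_start, genome_end), 1-based inclusive,
           forward strand, in transcript order (assumed to be known
           from the contig-to-genome alignment)
    Returns a list of CDS segments, 1-based inclusive.
    """
    orf_start, orf_end = orf
    cds = []
    offset = 0  # transcript position of the first base of the current exon
    for ex_start, ex_end in exons:
        ex_len = ex_end - ex_start + 1
        lo = max(orf_start, offset)
        hi = min(orf_end, offset + ex_len)
        if lo < hi:  # the ORF overlaps this exon
            cds.append((ex_start + lo - offset, ex_start + hi - offset - 1))
        offset += ex_len
    return cds


# Toy transcript of 16 bases spread over two made-up exons.
transcript = "CCATGAAATTTTAGGG"
exons = [(101, 106), (201, 210)]

orf = longest_orf(transcript)
cds_segments = orf_to_genome_cds(orf, exons)
print(orf, cds_segments)
```

Each resulting segment would become one CDS line of the training file, with all segments of a transcript sharing the same grouping tag. Whether the stop codon belongs inside the CDS is something to check against the format description linked above.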