training augustus

Discussions about WebAUGUSTUS, the Web Service of AUGUSTUS.

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

training augustus

Post by katharina »

Originally posted in the old forum by Anna on 29.10.2012 - 11:13
Could I use hints from RNA-seq, hints from transcriptome contigs, and hints from a closely related species to train AUGUSTUS?
If not, could I skip the PASA assembly step and use assembled transcriptome contigs for training?
Greetings, Anna

Re: training augustus

Post by katharina »

by katharina on 29.10.2012 - 12:30
There is a fundamental difference between the formats of hints and training gene structures:
Hints files for AUGUSTUS contain a number of different features in column 3 (e.g. CDSpart, intron, exonpart, exon, CDS, ...) and specific tags in the last column (e.g. grp=, mult=, pri=, ...).
Training gene structure files must contain the feature CDS in the third column and a grouping tag in the last column. An example of a valid gene structure file in GFF format is given at http://bioinf.uni-greifswald.de/webaugu ... #structure
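To make the difference concrete, here is a minimal Python sketch that tells the two kinds of lines apart. It only checks the points mentioned above (feature in column 3, tags in the last column). The function names and both example lines are made up for illustration, and the `transcript_id "..."` grouping tag is an assumption about what the grouping tag looks like; check the linked example for the exact form.

```python
# Tags named in the post as typical for the last column of a hints file.
HINT_TAGS = ("grp=", "mult=", "pri=")


def parse_gff_line(line):
    """Split one tab-separated GFF line into the columns we care about."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    return {
        "seqname": cols[0],
        "feature": cols[2],      # column 3
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": cols[8],   # last column
    }


def is_hint(record):
    """True if the last column carries at least one hint-style tag."""
    return any(tag in record["attributes"] for tag in HINT_TAGS)


def usable_for_training(record):
    """True if the line is a CDS feature with a non-empty last column."""
    return record["feature"] == "CDS" and record["attributes"].strip() != ""


# Invented example lines, not taken from any real data set.
hint_line = "chr1\tb2h\tintron\t1000\t2000\t0\t+\t.\tmult=5;pri=4"
training_line = 'chr1\tPASA\tCDS\t1000\t1200\t.\t+\t0\ttranscript_id "g1.t1"'

for text in (hint_line, training_line):
    rec = parse_gff_line(text)
    print(rec["feature"], "hint:", is_hint(rec),
          "training CDS:", usable_for_training(rec))
```

This is only a rough filter. It does not check that the CDS lines of one gene actually form a consistent gene structure.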
Transcriptome assemblers usually solve the problem of assembling transcripts, but not the problem of finding a long ORF in those transcripts. To generate training gene structure files in GFF format that contain the CDS information, however, you need to find that long ORF. If you have solved that problem, and if you have mapped the local transcript ORF coordinates back to genome coordinates with valid intron-exon boundaries, you can feed the assembled and identified gene structures directly into AUGUSTUS for training. If not, PASA is one of the tools that can find ORFs in the assembled transcripts for you.
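For illustration, the two steps described above (find a long ORF, then map it back to the genome) could be sketched in Python roughly as follows. This is a simplified sketch and not what PASA does internally. It assumes a forward-strand transcript, an ORF that runs from ATG to the first in-frame stop codon (stop included), and exon coordinates that are already known from aligning the contig to the genome. All names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based, end-exclusive transcript positions.
    Returns None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def orf_to_genome(orf, exons):
    """Map a transcript ORF onto genome coordinates.

    exons: list of (start, end) genome intervals, 1-based inclusive,
    sorted in transcript order, forward strand only.
    Returns one (start, end) CDS interval per exon overlapped by the ORF.
    """
    orf_start, orf_end = orf
    cds = []
    offset = 0  # transcript position of the first base of the current exon
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        lo = max(orf_start, offset)
        hi = min(orf_end, offset + length)
        if lo < hi:
            cds.append((g_start + lo - offset, g_start + hi - offset - 1))
        offset += length
    return cds


# Toy transcript of 19 bases spread over two invented exons.
transcript = "CCATGAAACCCGGGTAGTT"
exons = [(101, 110), (201, 209)]

orf = longest_orf(transcript)
print(orf)                        # (2, 17)
print(orf_to_genome(orf, exons))  # [(103, 110), (201, 207)]
```

Note that this sketch does not check that the introns between the exons have valid splice sites, so the "valid intron-exon boundaries" requirement still has to be ensured by the alignment step. Reverse-strand transcripts and incomplete ORFs are ignored here as well.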