augustus training and prediction from scratch

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!



Post by katharina »

Originally posted by Cecilia in the old forum on 23.05.2012 - 21:59

I've got genome.fa, some EST sequences and RNA-Seq data for a
species. How do I train AUGUSTUS and annotate the genome using the AUGUSTUS pipeline?

My plan is:
* De novo assemble the RNA-Seq reads to get a transcripts.fa file
* cat transcripts.fa ESTs.fa > evidence.fa
* autoAug.pl --species=NewSpecies --genome=genome.fa --trainingset=evidence.fa --pasa --cdna=evidence.fa --maxIntronLen=1000

Is there anything wrong with my plan? Should I generate hints from the RNA-Seq data and ESTs before running autoAug.pl?

Post by katharina »

Originally posted by katharina in the old forum on 23.05.2012 - 23:07

You shouldn't use the --trainingset flag if you use cDNA data. --trainingset is only for protein sequences and for GenBank or GFF training gene files.

The value of maxIntronLen can be estimated by inspecting the spliced alignments of your ESTs and/or RNA-Seq data to the genome (considering only those alignments with standard splice sites).
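
As a rough illustration of that inspection, here is a minimal Python sketch that collects intron-like gaps from a BLAT PSL file and reports a high percentile as a starting point for maxIntronLen. Everything in it is an assumption for illustration, not part of AUGUSTUS or autoAug.pl: the function names, the 30 bp minimum gap, the "no gap on the query side" rule and the demo rows are all made up. It also does not check for standard splice sites, which would need the genome sequence, so the result should be treated as an upper-end guess to be refined by hand.

```python
import io

# 0-based column positions in a PSL row (blockSizes, qStarts, tStarts).
BLOCK_SIZES, Q_STARTS, T_STARTS = 18, 19, 20


def _int_list(field):
    """Parse a PSL comma-terminated list such as '100,150,50,'."""
    return [int(x) for x in field.split(",") if x]


def intron_like_gaps(handle, min_intron=30, max_query_gap=0):
    """Yield genomic gap lengths between consecutive alignment blocks.

    A gap counts as intron-like if it is at least `min_intron` bp long on
    the genome (target) side and at most `max_query_gap` bp long on the
    transcript (query) side. Both thresholds are illustrative assumptions.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip PSL header lines, blank lines and malformed rows
        sizes = _int_list(fields[BLOCK_SIZES])
        q_starts = _int_list(fields[Q_STARTS])
        t_starts = _int_list(fields[T_STARTS])
        for i in range(len(sizes) - 1):
            t_gap = t_starts[i + 1] - (t_starts[i] + sizes[i])
            q_gap = q_starts[i + 1] - (q_starts[i] + sizes[i])
            if t_gap >= min_intron and q_gap <= max_query_gap:
                yield t_gap


def percentile(values, fraction):
    """Return the value at the given fraction (0..1) of the sorted list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no intron-like gaps found")
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


# Two made-up PSL rows: est1 has two intron-like gaps, est2 only a 5 bp gap.
_DEMO_ROWS = [
    ["300", "0", "0", "0", "0", "0", "2", "750", "+", "est1", "300", "0",
     "300", "chr1", "100000", "1000", "2050", "3",
     "100,150,50,", "0,100,250,", "1000,1600,2000,"],
    ["200", "0", "0", "0", "0", "0", "1", "5", "+", "est2", "200", "0",
     "200", "chr1", "100000", "5000", "5205", "2",
     "80,120,", "0,80,", "5000,5085,"],
]
DEMO_PSL = "\n".join("\t".join(row) for row in _DEMO_ROWS) + "\n"

if __name__ == "__main__":
    gaps = list(intron_like_gaps(io.StringIO(DEMO_PSL)))
    print("intron-like gaps:", gaps)
    print("99th percentile:", percentile(gaps, 0.99))
```

On real data you would open your own PSL file instead of the demo string and look at the whole length distribution, not only a single percentile, before settling on a value.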

Some months ago I used a combination of assembled RNA-Seq reads and ESTs for an insect species, also with the autoAug pipeline, so in principle your plan should work.

Post by katharina »

Originally posted by Cecilia in the old forum on 24.05.2012 - 02:36

Thanks Katharina. So my command will become:

* autoAug.pl --species=NewSpecies --genome=genome.fa --pasa --cdna=evidence.fa

I guess the PASA pipeline will handle the EST/transcript alignments to the genome, so is it safe for me to skip the BLAT/hints generation step?

Post by katharina »

Originally posted by katharina in the old forum on 29.05.2012 - 10:16

PASA will produce training gene structures.

Hints are regions in the genome for which you have extrinsic evidence for coding regions. The autoAug pipeline produces such hints from BLAT alignments, not from the PASA assemblies. That means BLAT and PASA do different things at different points in the pipeline. You should not manually skip the BLAT step.