augustus utr training error

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Originally posted in the old forum by yunz on 05.04.2013 - 17:19
Hi, everyone,
I am trying to train AUGUSTUS for my genome. I am using version 2.7beta with a repeat-masked genome (240 Mb), a cDNA data set (~12000 transcripts) and CEGMA output (446 genes) as the training gene set. I ran autoAug.pl. The first steps worked well, but in the 6th step, "Training AUGUSTUS with UTR", the command:
perl /home/zhang/software/augustus.2.7/config/../scripts/makeUtrTrainingSet.pl stops.and.starts.gff /home/zhang/software/augustus.2.7/scripts/autoAug/seq/genome_clean.fa /home/zhang/software/augustus.2.7/scripts/autoAug/cdna/cdna.psl utr
printed the following log:
404 hints were filtered because of gene overlap.
17383 hints would be compatible if the hints with gene-overlap wouldn't be filtered.
Finished!
However, when I checked utr.gb and utr.gff, both files were empty, so the next steps could not run correctly. It cannot be true that no transcript has a UTR: I ran tblastn with the AUGUSTUS-predicted proteins against the transcripts and found that many transcripts have UTRs, i.e. they do not begin with a start codon ("ATG", "TTG" or "GTG").
Has anyone faced this problem? Any suggestions on how to fix it are appreciated.
Best Wishes,
yunz

Re: augustus utr training error

by katharina on 06.04.2013 - 12:09
The script makeUtrTrainingSet.pl looks for coordinates in the genome that are upstream of the start codon (i.e. outside the protein-coding region) or downstream of the stop codon (also outside the protein-coding region), and where MOST EST alignments end. If you used full-length cDNAs, makeUtrTrainingSet.pl is probably not a good script for finding the UTR ends, because in that case you will have only one (or in rare cases a few) cDNA sequences covering the same gene.
It is possible to add UTRs to predicted genes, e.g. using the MAKER pipeline.
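To illustrate why a single full-length cDNA per gene gives this kind of approach nothing to work with, here is a minimal Python sketch of the "where most alignments end" idea. It is not the code of makeUtrTrainingSet.pl: the function name most_supported_end, the min_support threshold and the tie-breaking rule are assumptions made up for this example, and coordinates are plain integers on the forward strand.

```python
from collections import Counter


def most_supported_end(alignment_ends, cds_boundary, side, min_support=2):
    """Return the coordinate outside the CDS where most alignments end.

    Hypothetical helper for illustration only; not part of AUGUSTUS.
    alignment_ends: genomic end coordinates of EST/cDNA alignments for one gene
    cds_boundary:   start-codon position (side="upstream") or
                    stop-codon position (side="downstream")
    min_support:    assumed threshold; fewer agreeing alignments yield None
    """
    if side == "downstream":
        candidates = [p for p in alignment_ends if p > cds_boundary]
    elif side == "upstream":
        candidates = [p for p in alignment_ends if p < cds_boundary]
    else:
        raise ValueError("side must be 'upstream' or 'downstream'")
    if not candidates:
        return None

    counts = Counter(candidates)
    # Highest count wins; ties go to the position closest to the CDS boundary.
    position, support = max(
        counts.items(),
        key=lambda kv: (kv[1], -abs(kv[0] - cds_boundary)),
    )
    return position if support >= min_support else None


# Several ESTs covering one gene: three of them end at 1500.
print(most_supported_end([1500, 1500, 1498, 1500, 1320], 1200, "downstream"))  # 1500

# One full-length cDNA per gene: no position is supported more than once.
print(most_supported_end([1500], 1200, "downstream"))  # None
```

With many ESTs per gene, one end position stands out. With one full-length cDNA per gene, every candidate has a support of 1, so a majority-based rule reports no UTR end.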
Katharina