question on exonpart hints

Discussions about predicting genes with AUGUSTUS. Not covered here: WebAUGUSTUS and BRAKER1

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald


Post by katharina

Originally posted in the old forum by Jason on 22.01.2013 - 03:30
Hi again,
Thanks first for the quick response, as always.
My question concerns the hint type "exonpart":
"part of an exon in the biological sense. The bonus applies only
to exons that contain the interval from the hint. Just
overlapping means no bonus at all. The malus applies to every
base of an exon. Therefore the malus for an exon is exponential
in the length of an exon: malus=exonpartmalus^length.
Therefore the malus should be close to 1, e.g. 0.99."
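A quick calculation shows why the quoted documentation recommends a value close to 1. Because the malus is applied per base, the factor for a whole exon shrinks exponentially with its length. The function name below is hypothetical; it only evaluates the quoted formula malus=exonpartmalus^length and does not reproduce how AUGUSTUS combines it with other scores.

```python
# Evaluates the quoted exonpart malus formula: the malus is applied once per
# base, so an exon of `length` bases gets exonpartmalus ** length overall.
def exon_malus(exonpart_malus: float, length: int) -> float:
    """Multiplicative malus for an exon of `length` bases (hypothetical helper)."""
    if not 0.0 < exonpart_malus <= 1.0:
        raise ValueError("exonpart_malus should be in (0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return exonpart_malus ** length


if __name__ == "__main__":
    # Compare a value close to 1 with a much smaller one.
    for malus in (0.99, 0.9):
        for length in (10, 100, 1000):
            print(f"exonpartmalus={malus}, length={length}: "
                  f"malus={exon_malus(malus, length):.3g}")
```

With 0.99, a 100-base exon gets a factor of about 0.37; with 0.9, the same exon would get about 2.7e-5, which would penalize such exons almost prohibitively.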
So suppose I have a manually curated protein from species A that I mapped against the related species B with Scipio. It turns out that in species B the whole protein (11 exons) is contained in a single ORF (no stop codons!), because the two species have very different exon-intron structures. My understanding is that if I use the 11 exon alignments as ep (exonpart) hints, this will increase AUGUSTUS's chance of predicting a single-exon gene, and that only if I use the 11 exon alignments as exon (or even CDS) hints might AUGUSTUS predict a multi-exon gene within that one ORF. Is that right?
I just want to clarify this; it turns out that Scipio alignments can make pretty good hints once the spurious alignments are filtered out.
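To try both alternatives, the same filtered alignment intervals can be written out once per feature type. The sketch below is hypothetical: the function name, the source label "scipio", the score value 0 and the attribute src=P are assumptions modeled on the tab-separated GFF-style hints format described in the AUGUSTUS documentation, and should be checked against your extrinsic configuration before use.

```python
from typing import Iterable, List, Tuple

# Aligned exon intervals on one target sequence, 1-based and inclusive.
Interval = Tuple[int, int]


def hints_lines(seqname: str, intervals: Iterable[Interval], feature: str,
                strand: str = "+", source: str = "scipio",
                src_tag: str = "P") -> List[str]:
    """Format intervals as tab-separated hint lines (hypothetical helper).

    The feature type (e.g. "CDSpart" or "CDS") is what changes how each
    interval is interpreted; src_tag is assumed to select the hint source
    class in the extrinsic config.
    """
    lines = []
    for start, end in intervals:
        if start > end:
            raise ValueError(f"invalid interval {start}-{end}")
        fields = [seqname, source, feature, str(start), str(end),
                  "0", strand, ".", f"src={src_tag}"]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Made-up coordinates for two of the aligned exons on a hypothetical contig.
    aligned = [(1200, 1350), (1500, 1620)]
    for feature in ("CDSpart", "CDS"):
        print("\n".join(hints_lines("contigB", aligned, feature)))
```

Only the third column differs between the two outputs, so the choice of feature type is the whole difference between the two strategies in the question.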
Best Wishes,
Jason

Re: question on exonpart hints

Post by katharina

by mario on 05.02.2013 - 11:15
I am not quite sure I understand your setting, but maybe this answers your question. Suppose that in your target species B you specify CDSpart hints (more appropriate than exonpart, since you already know the regions are coding) like this:


  -----CDSpart----               ------CDSpart-------
  a              b               c                  d
Then in AUGUSTUS a candidate coding region that runs all the way from a to d is rewarded the same as a gene structure with two exons that contain the two CDSpart hints. Very often, in particular in species with long introns, a gene structure with the one big CDS is not even possible (stop codons, frame shifts), and there may also be good splice-site patterns just after b and before c.
If you actually know that the CDS goes exactly from a to b, then use CDS hints instead of CDSpart hints.
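The difference between the two hint types can be illustrated with a deliberately simplified model that just counts supported hints. This is an assumption for illustration only: AUGUSTUS actually multiplies bonus and malus factors taken from its extrinsic configuration, and the coordinates and function names below are hypothetical. CDSpart support only requires that an exon contains the hint interval, while CDS support requires matching boundaries.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates


def contains(exon: Interval, hint: Interval) -> bool:
    """CDSpart-style support: the exon must contain the whole hint interval."""
    return exon[0] <= hint[0] and hint[1] <= exon[1]


def supported_cdspart(exons: List[Interval], hints: List[Interval]) -> int:
    """Number of CDSpart hints contained in some exon (overlap alone does not count)."""
    return sum(any(contains(e, h) for e in exons) for h in hints)


def supported_cds(exons: List[Interval], hints: List[Interval]) -> int:
    """Number of CDS hints whose boundaries match an exon exactly."""
    return sum(h in exons for h in hints)


# Hypothetical coordinates standing in for a, b, c, d in the diagram.
a, b, c, d = 100, 250, 400, 600
hints = [(a, b), (c, d)]
single_cds = [(a, d)]         # one big coding region from a to d
two_exons = [(a, b), (c, d)]  # two exons with an intron between b and c

if __name__ == "__main__":
    print(supported_cdspart(single_cds, hints), supported_cdspart(two_exons, hints))  # 2 2
    print(supported_cds(single_cds, hints), supported_cds(two_exons, hints))          # 0 2
```

Read as CDSpart hints, both structures are supported equally; read as CDS hints, only the two-exon structure with the exact boundaries is.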
If you only know the exon boundaries approximately, but you know that the region between b and c is approximately an intron, then you can use an intronpart hint from b+delta to c-delta, where delta is large enough to allow for incorrect alignment ends and small structural differences between the species. In that case you may also want to look at transMap2hints.pl for comparison; it was made for the case where approximate gene structures are given by a genome-to-genome alignment.
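The intronpart interval from b+delta to c-delta can be computed as below. The function name and the example delta of 20 are hypothetical; choosing delta is a judgment call that depends on how reliable the alignment ends are. This sketch only computes coordinates and does not reproduce what transMap2hints.pl does.

```python
from typing import Optional, Tuple


def intronpart_interval(b: int, c: int, delta: int) -> Optional[Tuple[int, int]]:
    """Shrink the approximate intron between b and c by `delta` on both sides.

    Returns None if nothing is left, i.e. the gap between the two alignments
    is too short to give a safe intronpart hint with this margin.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    start, end = b + delta, c - delta
    return (start, end) if start <= end else None


if __name__ == "__main__":
    # Hypothetical alignment ends b=250 and c=400 with a margin of 20 bases.
    print(intronpart_interval(250, 400, 20))
    # A gap this short leaves no safe interval with the same margin.
    print(intronpart_interval(250, 280, 20))
```

A larger delta makes the hint more robust to sloppy alignment ends but gives up support near the splice sites, where the exact boundaries are left for AUGUSTUS to decide.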
Hope that helps. Peace, Mario