hint groups never obeyed

Discussions about predicting genes with AUGUSTUS. Not covered here: WebAUGUSTUS and BRAKER1

Moderator: bioinf

User avatar
katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

hint groups never obeyed

Post by katharina »

Originally posted in the old forum by EJ Blom on 10.10.2014 - 14:00

Dear Augustus developers,
I noticed something strange. I am performing a de novo gene prediction analysis on a plant genome. Since another genome was already published in the public domain, I mapped the predicted proteins from that analysis onto my own genome using exonerate. Next, I created hints from the exonerate output. The resulting file looks like this:

Scaffold0000001 xnt2h CDSpart 3836352 3836419 . + . src=P;grp=prot86786_cds;pri=4
Scaffold0000001 xnt2h intron 3836435 3842304 . + . src=P;grp=prot86786_cds;pri=4

For the config file, I used: extrinsic.MP.cfg
Although I have many 100% mappings (according to exonerate), my hints never get incorporated into my prediction.
First, all matching hint groups get thrown out:

Delete group HintGroup prot64455_cds, 56065-6653123, mult= 1, priority= 4 75 features
Delete group HintGroup prot125425_cds, 57671-6319842, mult= 1, priority= 4 6 features
Delete group HintGroup prot125424_cds, 57671-6319842, mult= 1, priority= 4 6 features

which is reflected by later messages:
incompatible hint groups: 1
P: 1 (prot106446_cds)

and

hint groups fully obeyed: 0
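
To see which groups are affected, the "Delete group" messages quoted above can be summarized with a small script. This is only a sketch: the regular expression is modeled on the three log lines in this post, and other AUGUSTUS versions may format the message differently. The function name `parse_deleted_groups` is made up for this example.

```python
import re

# Pattern modeled on the log lines quoted in this thread (assumption:
# the message format is the same for every deleted group).
DELETE_RE = re.compile(
    r"Delete group HintGroup\s+(?P<name>\S+),\s*(?P<start>\d+)-(?P<end>\d+),"
    r"\s*mult=\s*(?P<mult>\d+),\s*priority=\s*(?P<pri>-?\d+)"
    r"\s+(?P<n>\d+)\s+features"
)

def parse_deleted_groups(log_text):
    """Return one dict per 'Delete group' message found in log_text."""
    groups = []
    for m in DELETE_RE.finditer(log_text):
        start, end = int(m["start"]), int(m["end"])
        groups.append({
            "name": m["name"],
            "start": start,
            "end": end,
            "span": end - start + 1,   # length of the region the group covers
            "features": int(m["n"]),   # number of hints (GFF lines) in the group
        })
    return groups

if __name__ == "__main__":
    log = (
        "Delete group HintGroup prot64455_cds, 56065-6653123, mult= 1, priority= 4 75 features\n"
        "Delete group HintGroup prot125425_cds, 57671-6319842, mult= 1, priority= 4 6 features\n"
    )
    # Largest span first, so unusually wide groups stand out.
    for g in sorted(parse_deleted_groups(log), key=lambda g: -g["span"]):
        print(f'{g["name"]}\tspan={g["span"]}\tfeatures={g["features"]}')
```

Sorting by span or feature count makes it easier to pick a small group for a manual check.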

This is how I run Augustus

$AUGUSTUS --species=plant --protein=on --codingseq=on --progress=true --gff3=on --alternatives-from-evidence=false --hintsfile=$HINTS --alternatives-from-sampling=false --extrinsicCfgFile=$EXCFFILE $GENOME_PART > $PREDICTED_GENES;

Perhaps the option --alternatives-from-evidence=false disables the integration of hints? I've used this option in order to obtain only the longest predicted gene.
Thanks in advance,
Best
EJ Blom

Re: hint groups never obeyed

Post by katharina »

by Mario on 16.10.2014 - 15:29

A hint is deleted if it is impossible to obey the hint with a standard gene structure: no in-frame stops, usually GT and AG splice sites and a few other things.

For a group to be deleted it is sufficient that one hint from the group is impossible to obey.
As you get no valid hint groups at all, and assuming you have many of them, I suspect that there is a systematic error, such as coordinates being off by 1 position or the reading frame being wrong.
Can you please check all hints of a single hint group: Are the suggested splice sites standard ones?
If yes, please post "proof", e.g. a section from a Genbank file showing the genome sequence at the hint region so one can manually check everything.
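
The manual check asked for above can be partly automated. The following is a minimal sketch, assuming the contig sequence is already loaded as a string and the hints use 1-based, inclusive GFF coordinates. It only tests for the canonical GT...AG pair; which other splice site pairs AUGUSTUS accepts is not covered here. The function names are made up for this example.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (unknown symbols become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(b, "N") for b in reversed(seq.upper()))

def intron_splice_sites(seq, start, end, strand):
    """Return the (donor, acceptor) dinucleotides of an intron hint.

    seq is the sequence of the whole contig; start and end are the
    1-based, inclusive GFF coordinates of the intron.
    """
    intron = seq[start - 1:end].upper()
    if strand == "-":
        intron = revcomp(intron)
    return intron[:2], intron[-2:]

def check_intron_hints(seq, gff_lines, contig):
    """Yield (group, start, end, donor, acceptor, is_standard) per intron hint."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split(None, 8)  # accepts tab- or space-separated columns
        if len(f) < 9 or f[0] != contig or f[2] != "intron":
            continue
        start, end, strand = int(f[3]), int(f[4]), f[6]
        attrs = dict(kv.split("=", 1) for kv in f[8].strip().split(";") if "=" in kv)
        donor, acceptor = intron_splice_sites(seq, start, end, strand)
        is_standard = (donor, acceptor) == ("GT", "AG")
        yield attrs.get("grp", "?"), start, end, donor, acceptor, is_standard

if __name__ == "__main__":
    toy = "AAAGTCCCCAGTTT"  # toy contig with an intron at positions 4-11
    hints = [
        "c1 xnt2h intron 4 11 . + . src=P;grp=ok;pri=4",
        "c1 xnt2h intron 5 12 . + . src=P;grp=shifted;pri=4",  # off by one
    ]
    for row in check_intron_hints(toy, hints, "c1"):
        print(row)
```

If nearly every intron hint fails in the same way, for example with the GT and AG each shifted by one base, that points to the kind of systematic coordinate error described above.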
One of your hint groups is very large, with 75 features (= GFF lines).

Have a look at this section of the extrinsic.cfg file:
individual_liability: Only unsatisfiable hints are disregarded. By default this flag is not set
and the whole hint group is disregarded when one hint in it is unsatisfiable.
[SOURCE-PARAMETERS]
T individual_liability

You may want to set this option for your exonerate hints source. Or, if exonerate hints are sometimes not satisfiable, you may want to make the hints fuzzy by using the CDSpart and intronpart versions instead, with a smaller interval.
Mario

--alternatives-from-evidence=false is not a problem, but it may still be better to turn it off while searching for a bug or an error.
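
The suggestion above to make the hints fuzzy could be sketched as a small conversion step on the hints file. This is an illustration, not part of AUGUSTUS: the function name `make_fuzzy` and the `margin` of 10 bases are assumptions, and the bonus configured for the resulting hint types in the extrinsic configuration file still decides how much they influence the prediction.

```python
# Exact hint types and their "part" counterparts.
PART_OF = {"intron": "intronpart", "CDS": "CDSpart"}

def make_fuzzy(line, margin=10):
    """Turn an exact 'intron' or 'CDS' hint into its 'part' version.

    margin is a hypothetical number of bases trimmed from each end.
    Hints that are too short to trim keep their interval, and lines
    with any other feature type are returned unchanged.
    """
    line = line.rstrip("\n")
    f = line.split(None, 8)  # accepts tab- or space-separated columns
    if len(f) < 9 or f[2] not in PART_OF:
        return line
    start, end = int(f[3]), int(f[4])
    if end - start + 1 > 2 * margin:
        start, end = start + margin, end - margin
    f[2], f[3], f[4] = PART_OF[f[2]], str(start), str(end)
    return "\t".join(f)

if __name__ == "__main__":
    hint = "Scaffold0000999 xnt2h intron 565346 570617 . + . src=P;grp=prot_102120_cds;pri=2"
    print(make_fuzzy(hint))
```

A shorter interval only asks that the region lie inside an intron or coding region, so a boundary that is off by a few bases no longer makes the hint unsatisfiable.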

Re: hint groups never obeyed

Post by katharina »

by EJ Blom on 20.10.2014 - 11:17

Hi Mario,
Thanks for the extensive reply. Regarding the suggested splice sites: should I manually check the protein alignment and verify that there are standard splice sites for the given alignment?
I'll also tweak the extrinsic.cfg file.
In the meantime, I did the following. I first did a de novo prediction for a certain DNA region. Next, I took one predicted CDS and intron and created a hints file with those exact numbers:

Scaffold0000999 xnt2h CDSpart 565167 565345 . + . src=P;grp=prot_102120_cds;pri=2
Scaffold0000999 xnt2h intron 565346 570617 . + . src=P;grp=prot_102120_cds;pri=2

I would expect that this "hint" is incorporated perfectly, since it matches the de novo prediction.
However, I am puzzled by the output:

Evidence for and against this transcript:
% of transcript supported by hints (any source): 9.52
CDS exons: 1/10
P: 1
CDS introns: 1/9
P: 1
5'UTR exons and introns: 0/1
3'UTR exons and introns: 0/1
hint groups fully obeyed: 0
incompatible hint groups: 1
P: 1 (prot_102120_cds)
end gene g1

On the one hand, it looks like hints are incorporated into an exon (1 out of 10) and an intron (1 out of 9).
However, at the end the message says: incompatible hint groups: 1
In addition, there is no "Delete group HintGroup..." message at the beginning.
The question is, are the hint groups now integrated or are they not?

Re: hint groups never obeyed

Post by katharina »

by mario on 21.10.2014 - 15:09
From the information you give I can see that at least one hint group was used. And the predicted gene structure is not obeying at least one hint group.
If the "Delete group..." message is not appearing, then the groups are not deleted (anymore). That does not necessarily mean that they are obeyed, however. This is a question of the bonus parameters in the extrinsic.cfg files and, of course, of what gene structures are plausible.
I still think you should look at a minimal example and check manually if and why some hint groups are not satisfiable. If it is some systematic error it may decrease accuracy unnecessarily.

Re: hint groups never obeyed

Post by katharina »

by EJ Blom on 23.10.2014 - 09:40
Actually what I performed in my previous post is quite a minimal example. Like I said, I took one exon and one intron from the Augustus run without hints and I introduced the intron/exon data using the exact coordinates as extrinsic evidence for a second run.
And it does indeed look as if both hints (the intron and the exon) are incorporated. But why, then, does this message appear:
incompatible hint groups: 1
P: 1 (prot_102120_cds)
I don't understand that last message since I used perfect hints.