Reducing average length of gene structure predictions
I am trying to use AUGUSTUS to predict gene structures (both complete and partial) in intergenic regions of Arabidopsis thaliana.
Since this is a well-annotated species, I don't expect very long predictions, because long genes should already have been discovered.
I am using two approaches to gene prediction:
a. external hints generated by another user with other software, and then fed into AUGUSTUS
b. a protein profile, i.e. augustus --ppx
In either case, my predicted genes are 10K-30K nt long.
In contrast, "protein genes" according to TAIR (the Arabidopsis resource) are typically around 1K-3K nt long.
So my intergenic gene structure predictions are artifactually long and need to be made shorter.
Which parameters should I adjust so that my predicted gene structures are reduced to biologically meaningful lengths? And which dataset can I use to benchmark my parameter optimization attempts?
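To make the length comparison concrete, here is a minimal sketch of how predicted gene lengths could be summarised before and after changing parameters. It assumes the predictions are in tab-separated GFF with one "gene" feature line per predicted gene; the function names and the example lines are made up for illustration and are not part of AUGUSTUS.

```python
import statistics


def gene_lengths(gff_lines):
    """Return the length in nt of every 'gene' feature in GFF-formatted lines."""
    lengths = []
    for line in gff_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns: seqname, source, feature, start, end, ...
        if len(fields) < 5 or fields[2] != "gene":
            continue
        start, end = int(fields[3]), int(fields[4])
        lengths.append(end - start + 1)  # GFF coordinates are 1-based, inclusive
    return lengths


def summarize(lengths, max_expected=3000):
    """Summarise lengths; max_expected is an assumed cutoff (upper end of the 1K-3K range)."""
    too_long = sum(1 for n in lengths if n > max_expected)
    return {
        "n": len(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "fraction_too_long": too_long / len(lengths),
    }


# Made-up example lines, not real predictions.
example = [
    "# hypothetical prediction output\n",
    "chr1\tAUGUSTUS\tgene\t1001\t13000\t0.5\t+\t.\tg1\n",
    "chr1\tAUGUSTUS\ttranscript\t1001\t13000\t0.5\t+\t.\tg1.t1\n",
    "chr1\tAUGUSTUS\tgene\t20001\t22500\t0.9\t-\t.\tg2\n",
]
lengths = gene_lengths(example)
summary = summarize(lengths)
print(summary)
```

Running the same summary on each parameter setting would show whether the median and the fraction of over-long predictions actually move toward the 1K-3K range.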
I see from other posts that such issues are not unique to me, so I am hoping there are existing fixes for my problem. I have questions related to those posts:
1. http://bioinf.uni-greifswald.de/bioinf/ ... einFusions
http://bioinf.uni-greifswald.de/bioinf/ ... rOfIntrons
Can the malus value be reduced for extrinsic hints that were NOT generated from RNA-Seq data?
My external hints were generated with Selenoprofiles2 by an upstream user.
2. http://bioinf.uni-greifswald.de/bioinf/ ... ortIntrons
Just as the minimum intron size can be fixed, can a maximum intron size also be set? Mario's reply seems to indicate that it can be specified in the SPECIES_intron_probs.pbl file, correct?
If max-length=100 and the sum of the probabilities for lengths 1 - 100 is not zero, will that lead to any strange predictions, even though Mario says AUGUSTUS will not complain?
3. http://bioinf.uni-greifswald.de/bioinf/ ... VsPpxModes
I am not sure whether using external hints vs. protein-profile-based searching makes a difference. It does not seem to in my case.
4. http://bioinf.uni-greifswald.de/bioinf/ ... rinsticCfg
Is there a link that explains which lines in extrinsic.cfg to alter, and how, for hints that were not generated from RNA-Seq data?