Originally posted in the old forum by Sammy on 01.07.2013 - 22:08
How can we estimate the accuracy of gene prediction when we use the parameters of species A to predict genes in species B?
Estimation of prediction accuracy
Moderator: bioinf
Re: Estimation of prediction accuracy
by katharina on 09.07.2013 - 15:23
First of all, in order to measure gene prediction accuracy in your target species, you need a reliable set of genes of your target species. You can construct such a gene set e.g. by using ESTs2genome or protein2genome alignments.
Then predict genes in the sequences of the reliable gene set using the parameter set of your choice. Count how many nucleotides/exons/genes were correctly predicted as coding (TP), how many were wrongly predicted as coding (FP), and how many were wrongly not predicted as coding, i.e. missed (FN).
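As a rough illustration of the counting step at exon level, here is a minimal Python sketch. It assumes each exon is represented as a (seqname, start, end, strand) tuple and that an exon only counts as correct if both boundaries match exactly. The function name and the toy coordinates are made up for the example. The two ratios follow the convention commonly used in gene prediction, sensitivity = TP/(TP+FN) and specificity = TP/(TP+FP); they are not part of the post above.

```python
def exon_level_accuracy(reference_exons, predicted_exons):
    """Compare two collections of exons given as (seqname, start, end, strand).

    An exon is a true positive only if it matches a reference exon exactly.
    """
    reference = set(reference_exons)
    predicted = set(predicted_exons)

    tp = len(reference & predicted)   # predicted and in the reference set
    fp = len(predicted - reference)   # predicted but not in the reference set
    fn = len(reference - predicted)   # in the reference set but not predicted

    # Guard against empty inputs to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Toy data (hypothetical coordinates, for illustration only).
reference = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 600, "+")}
predicted = {("chr1", 100, 200, "+"), ("chr1", 300, 410, "+"),
             ("chr1", 500, 600, "+"), ("chr1", 700, 800, "+")}

result = exon_level_accuracy(reference, predicted)
print(result)
```

In the toy data, the exon predicted as 300-410 misses the reference boundary at 400, so it counts once as FP (the wrong prediction) and once as FN (the missed reference exon). Nucleotide-level and gene-level counts work the same way, only with a different unit of comparison.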
Katharina
Re: Estimation of prediction accuracy
by Sammy on 10.07.2013 - 01:34
Thanks. Are you referring to the http://emboss.bioinformatics.nl/cgi-bin ... est2genome program?
Re: Estimation of prediction accuracy
by katharina on 10.07.2013 - 14:11
I have no experience with the EMBOSS est2genome program. We usually use PASA, but est2genome might work as well. The main point is that you need to obtain full-length gene structures in GTF format - using whatever tool.
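Once the gene structures are in GTF format, the coding exons can be pulled out with a few lines of Python for comparison against predictions. The sketch below relies only on the standard GTF layout of nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes). The function name and the example records are hypothetical and not the output of any particular tool.

```python
def cds_exons_from_gtf(lines):
    """Collect CDS features from GTF lines as (seqname, start, end, strand) tuples."""
    exons = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        seqname, _source, feature, start, end, _score, strand = fields[:7]
        if feature == "CDS":
            exons.add((seqname, int(start), int(end), strand))
    return exons


# Hypothetical example records, for illustration only.
example_gtf = [
    "# example gene structure",
    'chr1\texample\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
    'chr1\texample\texon\t90\t200\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";',
    'chr1\texample\tCDS\t300\t400\t.\t+\t1\tgene_id "g1"; transcript_id "g1.t1";',
]

coding_exons = cds_exons_from_gtf(example_gtf)
print(sorted(coding_exons))
```

To read from a file instead of a list, pass an open file handle, e.g. `cds_exons_from_gtf(open("reference.gtf"))`, where the file name is a placeholder.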
Katharina