Training based on an existing profile
Posted: Thu Mar 08, 2018 1:50 pm
Hello,
I would like to train augustus on a new mammalian species. I have created a new profile and trained it on a large, high-confidence set of genes that I have curated, but the results are not much better than those I get with the human profile.
Since the human profile is of high quality, I thought I could simply re-train it with my set of high-confidence genes instead of starting from scratch with a new profile. I did this by copying the directory /config/species/human/ and replacing the word "human" with the name of my species everywhere inside that folder. I then used the high-confidence gene set to optimize the parameters and meta-parameters. The results are slightly better: sensitivity is now very high, but specificity is actually much lower than before.
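For reference, the copy-and-rename step described above can be sketched in Python as follows. This is a minimal sketch, assuming the profile is a single directory of text files whose names and contents mention the species name (e.g. files like `human_parameters.cfg`); the function name `copy_species_profile` is hypothetical and not part of augustus.

```python
import shutil
from pathlib import Path


def copy_species_profile(species_dir: Path, old: str, new: str) -> Path:
    """Hypothetical helper: copy species_dir/old to species_dir/new, then
    replace every occurrence of `old` with `new` in file contents and names."""
    src = species_dir / old
    dst = species_dir / new
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    shutil.copytree(src, dst)

    # Replace the old species name inside every text file.
    for path in sorted(dst.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except UnicodeDecodeError:
            continue  # skip files that are not plain text
        path.write_text(text.replace(old, new))

    # Rename files/dirs containing the old name, deepest paths first,
    # so renaming a parent never invalidates a child path still to visit.
    for path in sorted(dst.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if old in path.name:
            path.rename(path.with_name(path.name.replace(old, new)))
    return dst
```

Note that this is a plain substring replacement, exactly like the "brute" approach in the post: any other word containing the old name would be altered too.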
My question is: is there a better way to re-train an existing profile on a new species' gene set than the "brute" way I used? If not, do you know why I am getting so many false positives and how I could prevent them?
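To make the trade-off concrete: in gene-prediction evaluation, "specificity" is commonly defined as TP / (TP + FP), the fraction of predicted features that match the reference, rather than the classical TN-based definition. Under that assumption, extra false positives lower specificity without affecting sensitivity. The sketch below uses made-up counts purely for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference features that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tp: int, fp: int) -> float:
    """Gene-prediction 'specificity': fraction of predicted features
    that match the reference, TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


# Illustrative counts only: 1000 reference exons, 950 found, 600 spurious.
print(sensitivity(tp=950, fn=50))   # high sensitivity
print(specificity(tp=950, fp=600))  # specificity dragged down by false positives
```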
Thank you very much in advance!!