
How to take data input from databases for training of beta vulgaris genome

Posted: Fri Nov 20, 2015 12:56 pm
by katharina
Originally posted in the old forum by SC_LU on 05.05.2014 - 22:55
Hi,
I want to use AUGUSTUS for Beta vulgaris training, but I am a little confused about which data to use and where to get it.
1. From the Genome tab, Beta vulgaris has 9 chromosomes (http://www.ncbi.nlm.nih.gov/genome/?term=Beta+vulgaris). So should I download each chromosome separately, combine them into a single Beta vulgaris genome.fasta file, and then submit that as the whole genome?
2. I am unsure about the EST data for this species. When I search for EST data in the NCBI repository (http://www.ncbi.nlm.nih.gov/nucest/?term=Beta+vulgaris), the results include some bogus hits as well. So taking ESTs this way is confusing, and I am not sure that all the ESTs belong to the same genome.
3. I have the same kind of confusion for the protein data.
Please explain how to obtain the data (genome, EST and protein) for Beta vulgaris training.

Re: How to take data input from databases for training of beta vulgaris genome

Posted: Fri Nov 20, 2015 12:56 pm
by katharina
by katharina on 07.05.2014 - 11:26
1) Yes, you should concatenate all chromosome files. While doing so, also check the FASTA headers: they should be short and unique (have a look at http://bioinf.uni-greifswald.de/webaugu ... on_problem ).
2) On that NCBI page, on the right, you will find a list called "Top Organisms". Click on your target species (in your particular case, you might have to do that twice, in two separate download steps, because Beta vulgaris also has a subspecies listed). This should remove ESTs that do not belong to your species from the list. Again, pay attention to the FASTA headers after download; you need to modify them!
3) The same approach as in 2) should work for proteins.
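
The concatenation and header clean-up described in 1) and 2) can be sketched in Python as follows. This is only a rough illustration, not an official AUGUSTUS tool: the function names (short_header, merge_fasta) are hypothetical, and it assumes that "short and unique" can be achieved by keeping the first whitespace-delimited word of each header, replacing unusual characters with underscores, and appending a counter to duplicates. Check the exact header requirements on the linked page before relying on it.

```python
import re
from pathlib import Path
from typing import Iterable, List, Set


def short_header(header: str) -> str:
    """Reduce a FASTA header to its first word, using only safe characters.

    Assumption: letters, digits, '_', '.' and '-' are acceptable; everything
    else (for example the '|' in NCBI headers) is replaced by '_'.
    """
    words = header.split()
    token = words[0] if words else "seq"
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", token).strip("_")
    return cleaned or "seq"


def merge_fasta(inputs: Iterable[Path], output: Path) -> List[str]:
    """Concatenate FASTA files into one file with short, unique headers.

    Returns the list of header names that were written, in order.
    """
    used: Set[str] = set()
    names: List[str] = []
    with output.open("w") as out:
        for path in inputs:
            with path.open() as handle:
                for line in handle:
                    if line.startswith(">"):
                        base = short_header(line[1:].strip())
                        name = base
                        suffix = 1
                        # Append _2, _3, ... until the name is unused.
                        while name in used:
                            suffix += 1
                            name = f"{base}_{suffix}"
                        used.add(name)
                        names.append(name)
                        out.write(f">{name}\n")
                    elif line.strip():
                        # Sequence line: copy it, ensuring a trailing newline
                        # so the next file's header starts on its own line.
                        out.write(line if line.endswith("\n") else line + "\n")
    return names
```

For the genome, one might call merge_fasta(sorted(Path(".").glob("chr*.fasta")), Path("genome.fasta")), where the chr*.fasta file names are placeholders for the nine downloaded chromosome files. The same function can be applied to a single downloaded EST or protein file just to shorten its headers.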