confused by the protocol of AUGUSTUS

Discussions about training AUGUSTUS from various sources of evidence. Not discussed here: BRAKER1 and WebAUGUSTUS!

Moderator: bioinf

User avatar
katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

confused by the protocol of AUGUSTUS

Post by katharina »

by Lu Fang on 26.06.2012 - 04:58
Hi,
I am currently annotating a new plant genome, and I chose AUGUSTUS to do this work.
Until now I have used the Arabidopsis parameters (--species=arabidopsis) to predict genes in this plant genome. But I now have plenty of raw reads from mRNA-seq.
Now I want to train the parameters for the new plant genome myself. When I read the AUGUSTUS protocol, I had some problems with the steps of gene prediction.
Can you tell me the process step by step?
And how do I compile a training set? What format does the training set need? GenBank only? I only have the raw reads from mRNA-seq. How do I transform the fastq format to GenBank format?
Do "training AUGUSTUS" and the hints file have any relationship with each other?
Hope for your help!

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by katharina on 26.06.2012 - 11:11
Hi Lu,
I guess it will work fine for you to simply use the Arabidopsis parameters and integrate the RNAseq data of your target species.
From the content of your post, I assume that it will be rather difficult for you to retrain AUGUSTUS yourself. In general, you can use the autoAug.pl pipeline at http://bioinf.uni-greifswald.de/augustu ... s/scripts/ - but that really requires prior knowledge about file formats and file contents, and some understanding of the code in case you encounter problems.
We do not have a script to convert fastq to genbank format because the genbank file is supposed to contain full-length transcripts with flanking genomic regions, and the fastq reads are only fragments of transcripts, with no flanking genomic regions.
You could assemble the fastq reads into preliminary transcripts and submit those transcripts (fasta format) as "ESTs" to our AUGUSTUS training webserver application at http://bioinf.uni-greifswald.de/webaugu ... ing/create . This may produce training genes of acceptable quality, and it may improve the parameters for your target genome. However, you will need to check the resulting gene predictions visually, e.g. in a genome browser, to decide whether you should use the newly trained parameters instead of Arabidopsis. We won't be able to give you support on that. Please also read this post in case you are considering retraining AUGUSTUS on the basis of RNAseq data and genome sequence: http://bioinf.uni-greifswald.de/bioinf/ ... enomeAlone
The training gene file and the hints file are independent files. Training gene files contain training gene structures (ideally full-length transcripts) with flanking genomic regions. The hints file may contain extrinsic evidence from any source, and it contains "segments" rather than full-length gene structures. Many segments together may support a full-length gene, but the segments are independent of each other and may originate from different sources.
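To make the "independent segments from different sources" point concrete, here is a minimal Python sketch that reads a few hint lines and groups them by source. It assumes the usual 9-column GFF layout of AUGUSTUS hints, with src= and grp= keys in the last column. The scaffold name, coordinates, group names and the helper names parse_hint and hints_by_source are all made up for illustration.

```python
from collections import defaultdict

# Made-up hint lines (tab-separated, 9 GFF columns). All names and
# coordinates are invented; only the column layout is the point here.
EXAMPLE_HINTS = """\
scaffold1\tb2h\texonpart\t1000\t1200\t0\t+\t.\tgrp=read_a;pri=4;src=E
scaffold1\tb2h\tintron\t1201\t1500\t0\t+\t.\tgrp=read_a;pri=4;src=E
scaffold1\tprot\tCDSpart\t1510\t1700\t0\t+\t.\tgrp=prot_b;pri=4;src=P
"""


def parse_hint(line):
    """Split one hint line into a small dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # The last column holds key=value pairs separated by semicolons.
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
    return {
        "seq": cols[0],
        "feature": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "group": attrs.get("grp") or attrs.get("group"),
        "source": attrs.get("src") or attrs.get("source"),
    }


def hints_by_source(text):
    """Group hint segments by their source key, skipping blanks and comments."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hint = parse_hint(line)
        grouped[hint["source"]].append(hint)
    return dict(grouped)


if __name__ == "__main__":
    for source, hints in hints_by_source(EXAMPLE_HINTS).items():
        print(source, [(h["feature"], h["start"], h["end"]) for h in hints])
```

Each line is a single segment of evidence. None of them describes a complete gene, which is why such a file cannot replace a training gene file.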
Good luck with your genome annotation!
Katharina

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by Lu Fang on 27.06.2012 - 06:42
Hi Katharina,
Thank you for your reply!
I read the directions of autoAug.pl.
Whether I should training parameter of my species($PATH/augustus/config/species) before using the parameter --species of program autoAug.pl.
Is the parameter "--useexisting" compatible with "--species"? That is to say, if I have the meta parameter of my species, can I use the --species=test and --useexisting?
Does autoAug.pl generate the hints automatically?
I have summarized the AUGUSTUS workflow. Can you check it for me?
First, we can use these data (cDNA file, genome file, protein file and gene structure file) for the parameter training (to generate the folder $PATH/augustus/config/species/test).
Second, the cDNA file can be used to generate the hints file.
Finally, the augustus command is as follows:
augustus --species=test --gff3=on --hintsfile=hints.gff genome.fasta > outfile
Does the autoAug.pl contain the all the steps above?
Best wishes!
Hope for your help!

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by Lu Fang on 27.06.2012 - 09:31
filter redundancy
Hi Katharina,
I found a website that says "The gene should be non-redundant" (http://bioinf.uni-greifswald.de/augustu ... .html#meta).
Is there a program that can filter out redundant data?

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by katharina on 27.06.2012 - 13:31
Hi Lu,
I don't understand whether "Whether I should training parameter of my species($PATH/augustus/config/species) before using the parameter --species of program autoAug.pl" is a question or a statement.
In general, you need some species parameters to run AUGUSTUS. You can use Arabidopsis for plants, as I said before, or you can retrain AUGUSTUS for your species. But you definitely need to specify a "--species=...".
--useexisting in autoAug.pl: use and change the present config and parameter files if they exist for 'species'. That means, for example, that if you run autoAug.pl for retraining and you specify an existing species like Arabidopsis, the original Arabidopsis parameters will be overwritten! You might not want to do this, unless you have already produced your own preliminary parameter set that you want to improve. --useexisting will also continue an interrupted autoAug.pl run (e.g. if you had to resolve some error and want to continue).
Answering your question: "if I have the meta parameter of my species, can I use the --species=test and --useexisting?" - yes, you can. But I am not sure whether that's really what you want to do. The autoAug.pl pipeline is meant to run training and gene prediction in "one run". So if you don't want to retrain, you should run AUGUSTUS directly, without autoAug.pl.
Yes, you can use a cDNA file to assemble training gene structures, e.g. via autoAug.pl using PASA; and yes, you can use the same cDNA file later, to create hints for AUGUSTUS. But in your case, you might not want to use the hints from the cDNA file, but from the raw RNAseq data.
The autoAug.pl pipeline will create hints from the input cDNA file automatically, yes. The autoAug.pl pipeline does not support the creation of hints from RNAseq data, though!
Your augustus command looks good, but you should consider the other command options, e.g. the UTR flag, codingseq, and so on. It depends on whether you have UTR parameters, and on what you want to do with the output.
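As a rough illustration of how these options combine, the following Python sketch assembles an augustus command line from a few switches. The function name build_augustus_command and its arguments are hypothetical. The flags are written in the --UTR=on and --codingseq=on form, and --UTR=on is only sensible if UTR parameters exist for the chosen species.

```python
import shlex


def build_augustus_command(genome, species, hintsfile=None,
                           utr=False, codingseq=False, gff3=True):
    """Return the augustus call as an argument list (hypothetical helper)."""
    cmd = ["augustus", f"--species={species}"]
    if gff3:
        cmd.append("--gff3=on")
    if utr:
        # Only useful when UTR parameters were trained for this species.
        cmd.append("--UTR=on")
    if codingseq:
        # Asks AUGUSTUS to also print the coding sequences.
        cmd.append("--codingseq=on")
    if hintsfile:
        cmd.append(f"--hintsfile={hintsfile}")
    # The genome file is the positional argument and goes last.
    cmd.append(genome)
    return cmd


if __name__ == "__main__":
    cmd = build_augustus_command("genome.fasta", "test", hintsfile="hints.gff")
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

With the default switches this reproduces the command from your post. Output redirection (> outfile) is left to the caller.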
"Does the autoAug.pl contain the all the steps above?"
If you e.g. call autoAug.pl with a cDNA and a genome file, it will:
a) assemble training genes from the genome sequence and the cDNA file with PASA
b) train AUGUSTUS
c) create ab initio gene predictions
d) create hints from the cDNA file
e) predict genes with cDNA hints
f) if possible, assemble a UTR training set, train UTR parameters, and run predictions with UTR parameters.
Concerning the redundancy, you can e.g. BLAST your genes all vs. all and filter the output. You'll have to define some sequence identity threshold. The current AUGUSTUS release does not contain a script for that purpose because it's a rather general task.
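A minimal Python sketch of such a filter is shown below. It assumes the all-vs-all BLAST result is in the 12-column tabular format (query id, subject id, percent identity, ...), and it keeps genes greedily in input order. The 80 % threshold, the gene names and the function names are placeholders, not recommendations. A real filter would probably also look at alignment length.

```python
import csv
import io

# Made-up all-vs-all hits in BLAST tabular layout: qseqid, sseqid, pident,
# length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
EXAMPLE_BLAST = (
    "g1\tg1\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600\n"
    "g1\tg2\t92.0\t290\t23\t0\t1\t290\t5\t294\t1e-150\t480\n"
    "g2\tg1\t92.0\t290\t23\t0\t5\t294\t1\t290\t1e-150\t480\n"
    "g1\tg3\t45.0\t120\t66\t2\t10\t129\t20\t139\t1e-10\t60\n"
)


def redundant_pairs(blast_tab, min_identity=80.0):
    """Collect unordered gene pairs whose hit reaches the identity threshold."""
    pairs = set()
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject, pident = row[0], row[1], float(row[2])
        # Self hits are always 100 % identical and carry no information.
        if query != subject and pident >= min_identity:
            pairs.add(frozenset((query, subject)))
    return pairs


def select_non_redundant(gene_ids, pairs):
    """Greedy selection: keep a gene unless it matches an already kept one."""
    kept = []
    for gene in gene_ids:
        if all(frozenset((gene, other)) not in pairs for other in kept):
            kept.append(gene)
    return kept


if __name__ == "__main__":
    pairs = redundant_pairs(io.StringIO(EXAMPLE_BLAST), min_identity=80.0)
    print(select_non_redundant(["g1", "g2", "g3"], pairs))
```

Because the selection is greedy, the order of gene_ids decides which member of a redundant pair survives, so it can help to sort the genes by quality first.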

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by Lu Fang on 02.07.2012 - 04:20
Thank you
Hi Katharina,
Thank you very much for your help!
The program has been running well so far.
Best regards!

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by Lu Fang on 02.07.2012 - 04:58
On this website (http://bioinf.uni-greifswald.de/webaugu ... torial.gsp), there is a schematic diagram (figure 1) that displays the AUGUSTUS workflow. The input data in this figure are a cDNA file, a genome file, a protein file and a gene structure file. I have the cDNA file and the genome file now. How can I get the protein file and the gene structure file without gene prediction?
Hope for your help!

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by katharina on 02.07.2012 - 10:57
Hi Lu,
the text below the diagram on that website explains that different input file combinations are possible.
Only cDNA and genome file is only one option out of many.

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by Lu Fang on 09.07.2012 - 04:14
Hi Katharina,
Thank you for your reply. But I don't fully understand this sentence: "Only cDNA and genome file is only one option out of many."
Could you explain it in more detail for me?
Thanks!

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by Lu Fang on 09.07.2012 - 09:16
Hi Katharina,
I tried to use the program "autoAug.pl" to run training and prediction. The command was as follows:
autoAug.pl --genome=/share_bio/panfs_bio/bioinformatics/fangl/Rhizophoraceae/R_scaffold_masker/hongshu.scafSeq.masked --species=Rhizophoraceae_hs --cdna=/share_bio/panfs_bio/bioinformatics/fangl/Rhizophoraceae/Augustus/dataset/trainingSetCandidates.cds --pasa -v -v
The species "Rhizophoraceae_hs" does not exist in the folder "PATH/augustus/config/species".
The error is "Rhizophoraceae_hs already exists in the mysql".
Is there anything wrong with my command?

Re: confused by the protocol of AUGUSTUS

Post by katharina »

by Mario on 23.07.2012 - 17:32
PASA, which is called by autoAug.pl, uses a MySQL database. In this case, the database probably already exists from a previous, possibly interrupted, run of autoAug.pl. There are two solutions:
1) delete that database from your mysql server:
mysql -u root
drop database Rhizophoraceae_hs;
2) run autoAug.pl with parameter --useexisting
Actually, there is, of course, a third solution, using a different name, but that does not make much sense. I recommend option 2).