Directly to Contents
|AUGUSTUS Training Tutorial|
This website explains step-by-step how to use the AUGUSTUS training web server application to train AUGUSTUS parameters for you individual species of interest. You find a similar tutorial on how to predict genes with pre-trained AUGUSTUS parameters here (click).
Functionalities of the AUGUSTUS training web server application are (with a single run):
Before submitting a training job for your species of interest, please check whether parameters have already been trained and have been made publicly available for your species at our species overview table
1 - Job Submission in general
1.1 - Finding the training submission form
1.2 - Filling in general job data
1.2.1 - E-mail address
184.108.40.206 - What your e-mail address is used for
1.2.2 - Species name
220.127.116.11 - The species name format
18.104.22.168 - Purpose of entering a species name
1.2.3 - Genome file
22.214.171.124 - Genome file format
126.96.36.199 - Genome file upload options
188.8.131.52 - What the genome file is used for
1.3 - Optional obligatory fields
1.3.1 - cDNA file
184.108.40.206 - cDNA file format
220.127.116.11 - cDNA file upload options
18.104.22.168 - What cDNA files are used for
1.3.2 - Protein file
22.214.171.124 - Protein file format
126.96.36.199 - Protein file upload options
188.8.131.52 - What the protein file is used for
1.3.3 - Training gene structure file
184.108.40.206 - Training gene structure file format
220.127.116.11.1 - Training gene structure file in genbank format
18.104.22.168.2 - Training gene structure file in gff format
22.214.171.124 - What training gene structure files are used for
1.4 - Verfification that you are a human
1.5 - The submitt button
1.6 - Example data files
2 - What happens after submission
2.1 - Submission duplication
2.2 - Errors during training
3 - Training Results
3.1 - The AutoAug.log file
3.2 - The AutoAug.err file
3.3 - The parameters.tar.gz archive
3.4 - The training.gb.gz file
3.5 - Gene prediction archives
The pipeline invoked by submitting a job to the AUGUSTUS training web server application is complex. The following image gives an overview of processes that will possibly be invoked - depending on the input file combination:
The sharp-edge fields are input values of the submission form, the round edged fields are background processes/applied software tools/file formats/result files.
The input fields of the AUGUSTUS training web server application form are: E-mail, Species name, Genome file, cDNA file, Protein file and Training gene structure file. The actual processes invoked by job submission depend on the combination of input files.
In the following, you find detailed instructions for submitting an AUGUSTUS training job.
You find the AUGUSTUS training submission form by clicking on the following link in the left side navigation bar:
This section describes fields that should be filled in for every job submission, i.e. fields that are obligatory (except for the email adress, which is optional but strongly recommended).
At first, we recommend that you enter a valid e-mail address:
It is possible to run AUGUSTUS without giving an e-mail adress but here are some reasons why we recommend supplying an e-mail adress:
We use your e-mail address for the following purposes:
We do not use your e-mail address to send you any spam, i.e. about web service updates. We do not share your e-mail addresses with any third parties.
The species name must be a string of at most 30 characters length. It is not allowed to contain spaces.
The species name is the name of the species for whose genome you want to train AUGUSTUS. The species name is an obligatory parameter. Considering that AUGUSTUS training is such a time consuming process, our objective is to know the names of species for which AUGUSTUS was trained in order to make the trained parameters available to the public so that others who are interested in the same species as you do not have to rerun the training process. (You will be asked for permission before your parameter set is shared.)
However, if you do not want to reveal the true species name, you may use any other string shorter than 30 characters as a species name.
The genome file is an obligatory file for training AUGUSTUS.
>Chr.1 CCTCCTCCTGTTTTTCCCTCAATACAACCTCATTGGATTATTCAATTCAC CATCCTGCCCTTGTTCCTTCCATTATACAGCTGTCTTTGCCCTCTCCTTC TCTCGCTGGACTGTTCACCAACTCTCAGCCCGCGATCCCAATTTCCAGAC AACCCATCTTATCAGCTTGGCCACGGCCTCGACCCGAACAGACCGGCGTC CAGCGAGAAGAGCGTCGCCTCGACGCCTCTGCTTGACCGCACCTTGATGC TCAAGACTTATCGCGATGCCAAGAAGCGTCTCATCATGTTCGACTACGA >Chr.2 CGAAACGGGCACCTATACAACGATTGAAACCATTATTCAAGCTCAGCAAG CGTCTATGCTAGCGGTTATTGCGAGCACTTCAGCGGTTGCTACTACGACT ACTACTTGATAAATGAAACGGCTATAAAAGAGGCTGGGGCAAAAGTATGT TAGTTGAAGGGTGACCTGAACGATGAATCGGTCGAATTTTTTATTGGCAG AGGGAAGGTAGGTTTACTCAATTTAGTTACTTCTAGCCGTTGATTGGAGG AGCGCAAGCGACGAGGAGGCTCATCGGCCGCCCGCGGAAAGCGTAGTCT TACACGGAAATCAACGGCGGTGTCATAAGCGAG >Chr.3 .....
The maximal number of scaffolds allowed in a genome fileis 250000. If your file contains more scaffolds, please remove all short scaffolds. For training AUGUSTUS, short scaffolds are worthless because no complete training genes can be generated from them. In terms of prediction, it is possible to predict genes in short scaffolds. However, those genes will in most cases be incomplete and probably unreliable.
Besides plain fasta format, our server accepts gzipped-fasta format for genome file upload. You find more information about gzip at the gzip homepage. Gzipped files have the file ending *.gz.
The AUGUSTUS training web server application offers two possiblities for transferring the genome file to the server: Upload a file and specify a web link to file.
You cannot do both at the same time! You must either select a file on your harddrive or give a web link!
The genome file can be used for two purposes:
This section describes a number of fields from which at least one must be specified for training AUGUSTUS.
This feature is only for personal, academic, and non-profit use as this is required by the BLAT license.
The cDNA file is a multiple fasta DNA file that contains e.g. ESTs or full-length cDNA sequences. Allowed sequence characters: A a T t G g C c H h X x R r Y y W w S s M m K k B b V v D d N n U u. Empty lines are not allowed and will be removed from the submitted file by the webserver application. An example for correct cDNA file format is given at 126.96.36.199 - Genome file format.
It is currently possible to submitt assembled RNA-seq transcripts instead of or mixed with ESTs as a cDNA/EST file. However, you should be aware that the success of creating training genes with assembled RNA-seq data depends very much on the assembly quality, and that RNA-seq files are often much bigger than EST or cDNA files, which increases runtime of a training job. In order to keep runtime of your training job as low as possible, you should remove all assembled RNA-seq transcripts from your file that do not map to the submitted genome sequence. (In principle, this holds true for EST and cDNA files, too, but there, the problem is not as pronounced due to a smaller number of sequences.)
It is currently not allowed to upload RNA-seq raw sequences. (We filter for the average length of cDNA fasta entries and may reject the entire training job in case the sequences are on average too short, i.e. shorter than 400 bp.)
Besides plain fasta format, our server accepts gzipped-fasta format for cDNA file upload. You find more information about gzip at the gzip homepage. Gzipped files have the file ending *.gz. The maximal supported file size is 1 GB.
There are two options for cDNA file upload: upload from your local harddrive, or upload from a public http or ftp server. Please see 188.8.131.52 - Genome file upload options for a more detailed description of upload options.
The cDNA file can be used for two purposes by the AUGUSTUS training web server application:
The protein file is a multiple fasta file that contains protein sequences, e.g. from a closely related species. Allowed sequence characters: A a R r N n D d C c E e Q q G g H h I i L l K k M m F f P p S s T t W w Y y V v B b Z z J j X x. Empty lines are not allowed but will simply be removed from the file by the webserver application.
Correct file format example:
>protein1 maaaafgqlnleepppiwgsrsvdcfekleqigegtygqvymakeiktgeivalkkirmd neregfpitaireikilkklhhenvihlkeivtspgrdrddqgkpdnnkykggiymvfey mdhdltgladrpglrftvpqikcymkqlltglhychvnqvlhrdikgsnllidnegnlkl adfglarsyshdhtgnltnrvitlwyrppelllgatkygp >protein2 neregfpitaireikilkklhhenvihlkeivtspgrdrddqgkpdnnkykggiymvfey mdhdltgladrpglrftvpqikcymkqlltglhychvnqv >protein3 ...
There are two options for protein file upload: upload from your local harddrive, or upload from a public http or ftp server. Please see 184.108.40.206 - Genome file upload options for a more detailed description of upload options.
The protein file is used for generating training gene structures by mapping the protein sequences against the supplied genome sequence. You may e.g. upload a file with protein sequences from a closely related species in order to obtain training genes and AUGUSTUS parameters for the new sspecies whose genome you uploaded.
Training gene structure files can be submitted in two different formats: Genbank format or gff format.
Gene structures in genbank format must contain the coding sequence parts and flanking regions. Flanking regions are important because AUGUSTUS is supposed to differentiate between genes and intergenic regions. The length of flanking regions depends on the length of genes in the target genome. In our pipeline, flanking regions are set to the average gene length (exceptionally applying the extreme limits between 1000 and 10000 nt). It is very important to make sure that the flanking regions do not contain any other protein coding gene parts, i.e. we recommend to trim flanking regions in a way that will exclude other CDS parts.
Correct file format example (condensed view, the three dots represent further lines of sequences):
LOCUS Chr.1_1-159458 159458 bp DNA FEATURES Location/Qualifiers source 1..159458 CDS complement(join(2421..2655,3858..4005,4080..4235,5569..5857 ,10316..10534,155240..155458)) /gene="1474336" BASE COUNT 49195 a 29117 c 28985 g 49950 t 2211 n ORIGIN 1 aaaatacatc acaatacatt taattcactt tccatcatcg agattaacga aaattattta 61 aaatatcgaa gatgaaaata tcctcaagat gatactgaac ggctaagaaa aatacatcac 121 acaactttaa ttcattttcc atcatcgaga ttaacgaaaa gaaaaaattt taactcccta ... 159301 atacgccacc aggtatttcg cctgattgtt cctcgaatat cttctctctc tctatatata 159361 tatatattac ttggcacgat aatcgtcgaa tcgttattta taaattgctt catctatcgc 159421 gatatttttg caacaactct cgcttttctc tttccatt // LOCUS Chr.1_313992-323129 9138 bp DNA FEATURES Location/Qualifiers source 1..9138 CDS join(4001..4048,4989..5138) /gene="194551" BASE COUNT 2829 a 1502 c 1750 g 2948 t 109 n ORIGIN 1 ttttccttct ttcttttttt tttatttaca ttaatgagaa ttttcgcaaa tatttcatcg 61 ctgccatcct tttttttcct cgacgtcaat cacgcgacac atttgttaga gaaatggatt 121 ttaatcttga aaaaagaaaa atacaaatgc caacgcattt caaatccttt cctattatta ... 9001 tcaacgaaac aaataattgc ttcacaaaat atcgcacgta acaacaatat agacttcaat 9061 attcaacaat tcttttcctt tatacacaaa gatacacaaa atataaaagt tttaatactt 9121 caacttcaac gaaacagg //
If you want to train UTRs, you have to additionally incorporate mRNA information in your genbank file.
Correct file format example (including UTR training):
LOCUS scf7180001240730_g20 526 bp DNA FEATURES Location/Qualifiers source 1..526 mRNA 99..125 CDS 99..99 BASE COUNT 164 a 99 c 68 g 195 t ORIGIN 1 gtgacggagc ccaaggacga gcccgtgccc tcagagccca cgtccgacgt gaggcccgcg 61 ccagcgcccc tcccgccgcc cgtcgcagcc actgcttaga ctttactaat ataaacattg 121 aaaatatttt gtgttttatt tccaatcatt gaattataat cctattataa tataactaac 181 attcgtaatt ttacaaaata actatgcaaa ttattttgta ttttcgtttt aaattatact 241 tttcatataa atttctacaa atcttattca agaccataag tatccgctcg ctctacttcg 301 ggcatttcct ttatttatat cttatttgac ttattttgat tatttaggct tatgttttcg 361 atactattga aaacagaaaa taatttcata taattaataa tatattttca attaatatat 421 ttaacaaata tttgtatagt tcaagcggac aaatccgttc ccatagtatt tatataaatt 481 ttaatttaga gtaataacag tttgctgtat tgttgtagtc aaatac // LOCUS scf7180001240751_g30 876 bp DNA FEATURES Location/Qualifiers source 1..876 mRNA complement(401..777) CDS complement(777..777) BASE COUNT 300 a 136 c 116 g 324 t ORIGIN 1 aatgtaggaa aatgaaatat ttatttaaat tgttattatc acttcttcgc tctagtgtct 61 tggcaaagcg cggcgttgag ttcagcctct cacacgcaat gcctccagaa ttcggcgaaa 121 tgtgggggac agagtgtatt aacactaagt tccctcagcc acgactggtg aaattatata 181 ttcagtttgt atactattac tcatgcaaac acttcatcat actttcactc aatcagtaaa 241 gcataatatt ttatttaata ttgtttatca atactatttc cttgttgtta aatattattt 301 tatttattat attaaattaa aatgtcaaaa ttaaaagtag gtgatgattt attactatct 361 tttctatcca agaaaaaaaa gacacactga aacaattgta atttttgtta tgtttttatt 421 acttaatatt attataaaaa tttgtaaata cgaaataaaa tagatagacg taataatatt 481 tatttgttag ttaataataa taatgataat tacgaaagat acaagaaata tgcataaatg 541 agtgttatat tatgtatttt atgagaatat aaatataaaa actgtcattg attatatttt 601 ctaaatactt tcattttatg gcttgctggc ttttcaattt ccttatgttt cagcttttca 661 ctcaatagag cgaaaccttc atcgacatgt aagccaatag aacaattaca aactaacttt 721 attacatcag tcttttcatt tctttaagct tcaggcaaat atcatctaaa tgcctttcaa 781 ctcgctacta acatcgcgtc gttatataaa tcagtgtata cggaattaaa cctgtcatgt 841 ctcttgcaag acgtgtctgc tgttgtcacg cacaca //
Training gene structure in gff format must comply with the fasta entry names of the genome file.
In general, gff format must contain the following columns (The columns are separated by tabulators):
Correct file format example (without UTR):
Chr.1 mySource CDS 1767 1846 1.000 - 0 transcript_id "1597_1" Chr.1 mySource CDS 1666 1709 1.000 - 1 transcript_id "1597_1" Chr.1 mySource CDS 1486 1605 1.000 - 2 transcript_id "1597_1" Chr.1 mySource CDS 1367 1427 1.000 - 2 transcript_id "1597_1" Chr.1 mySource CDS 1266 1319 1.000 - 1 transcript_id "1597_1" Chr.1 mySource CDS 1145 1181 1.000 - 1 transcript_id "1597_1" Chr.1 mySource CDS 847 1047 1.000 - 0 transcript_id "1597_1" Chr.2 mySource CDS 9471 9532 1.000 + 0 transcript_id "1399_2" Chr.2 mySource CDS 9591 9832 1.000 + 1 transcript_id "1399_2" Chr.2 mySource CDS 9885 10307 1.000 + 2 transcript_id "1399_2" Chr.2 mySource CDS 10358 10507 1.000 + 2 transcript_id "1399_2" Chr.2 mySource CDS 10564 10643 1.000 + 2 transcript_id "1399_2"
Correct file format example (with UTR):
Chr.1 mySource 5'-UTR 277153 277220 45 + . transcript_id "g22472.t1"; gene_id "g22472"; Chr.1 mySource CDS 277221 277238 1 + 0 transcript_id "g22472.t1"; gene_id "g22472"; Chr.1 mySource CDS 278100 278213 1 + 0 transcript_id "g22472.t1"; gene_id "g22472"; Chr.1 mySource CDS 278977 279169 1 + 0 transcript_id "g22472.t1"; gene_id "g22472"; Chr.1 mySource CDS 279630 279648 0.94 + 2 transcript_id "g22472.t1"; gene_id "g22472"; Chr.1 mySource CDS 279734 279768 0.94 + 1 transcript_id "g22472.t1"; gene_id "g22472"; Chr.1 mySource CDS 280307 280344 1 + 2 transcript_id "g22472.t1"; gene_id "g22472"; Chr.1 mySource 3'-UTR 280345 280405 78 + . transcript_id "g22472.t1"; gene_id "g22472";
While the cDNA and protein file upload options fulfill the purpose of training gene generation in context with a genome sequence file, the gene structure file upload form provides a flexible interface to users who generated training gene structures externally instead of using our webserver application for this purpose. Supplied training gene structures are directly used for optimizing species-specific AUGUSTUS parameters.
If the training gene structure file is in gff-format, sequence names in the gff file have to match fasta headers in the genome file. If the training gene structure file is in genbank format, no direct dependency between training gene strcuture file and genome file exists, i.e. you could submitt training genes that were generated from a different chromosome than the genome file that you submitt.
Trying to avoid abuse of our web server application through bots, we implemented a captcha. The captcha is an image that contains a string. You have to type the string from the image into the field next to the image.
After filling out the appropriate fields in the submission form, you have to click on the button that says "Start Training" at the bottom of the page. It might take a while until you are redirected to the status page of your job. The reason is that we are checking various file formats prior job acceptance, and that the transfer of files from your local harddrive to our server might take a while. Please be patient and wait until you are redirected to the status page! Do not click the button more than once (it won't do any harm but it also doesn't speed up anything).
In the following, we provide some correctly formatted, compatible example data files:
http://bioinf.uni-greifswald.de/trainaugustus/examples/chr1to3.fa - This file may be used as a Genome file. It contains the first three chromosomes of Mus musculus from GenBank (modified headers).
http://bioinf.uni-greifswald.de/trainaugustus/examples/estsChr1to3.fa - This file may be used as a cDNA file. It contains EST sequences of Mus musculus from GenBank (modified headers).
http://bioinf.uni-greifswald.de/trainaugustus/examples/chr1to6.fa - This file may be used as a Genome file. It contains the first six chromosomes of Mus musculus from GenBank (modified headers).
http://bioinf.uni-greifswald.de/trainaugustus/examples/rattusProteinsChr1to6.fa - This file may be used as a Protein file. It contains proteins of Rattus norvegicus from GenBank (modified headers) that map to chromosomes 1 to 6.
http://bioinf.uni-greifswald.de/trainaugustus/examples/chr1to3.fa - This file may be used as a Genome file. It contains the first three chromosomes of Mus musculus from GenBank (modified headers).
http://bioinf.uni-greifswald.de/trainaugustus/examples/traingenes.gb - This file contains 221 training genes for Mus musculus in GenBank format (including flanking regions).
You can insert some of these sample data sets by pressing the "Fill in Sample Data" button:
After you click the "Start Training" button, the web server application first validates whether the combination of your input fields is generally correct. If you submitted unsupported input combinations, you will be redirected to the training submission form and an error message will be displayed at the top of the page.
If all fields were filled in correctly, the application is actually initiated. You will receive an e-mail that confirms your job submission and that contains a link to the job status page (if you supplied an e-mail adress), and you will be redirected to the job status page.
In the beginning, the status page will display that your job has been submitted. This means, the web server application is currently uploading your files and validating file formats. After a while, the status will change to waiting for execution. This means that all file formats have been confirmed and an actually AUGUSTUS training job has been submitted to our grid engine, but the job is still pending. Depending on waiting queue length, this status may persist for a while. Please contact us in case you job is pending for more than one month. Later, the job status will change to computing. This means the job is currently computing. When the page displays finished, all computations have been finished and a website with your job's results has been generated.
You will be notified by email when your job has finished - if you supplied an email adress.
Since training AUGUSTUS is a very resource consuming process, we try to avoid data duplication. In case you or somebody else tries to submitt exactly the same input file combination more than once, the duplicated job will be stopped and the submitter of the redundant job will receive information where to find the previous job status page.
You should automatically receive an e-mail in case an error occurs during the AUGUSTUS training process. The admin of this server is also notified by e-mail about errors. We will get in touch with you, again, after we figured out what caused the error. (If you did not supply an email adress, we will probably ignore the error because we cannot communicate with you about the input files.)
After job computations have finished, you will receive an e-mail (if you supplied an email adress). The job status web page may at this point in time look similar to this:
This page will should contain the files AutoAug.log and AutoAug.err. If no other files are displayed, you will find the reason for this behaviour in one of those files (e.g. if the creation of a sufficient number of training genes was not possible using the data that you supplied).
If training was successful, at least the files parameters.tar.gz and ab_initio.tar.gz should be displayed.
If no training gene structure file was supplied, training.gb.gz should be linked.
If a cDNA file was submitted, hints_pred.tar.gz should be shown.
To save files on your local system, click the file link with the right mouse button and select "Save Link As..." (or similar).
Unpack *.tar.gz archives locally on your system by typing tar -xzvf *.tar.gz into your shell. (You find more information about the software tar at the GNU tar website.)
To view a real training output example, click here!
This file contains all logging messages of the AUGUSTUS training and prediction processed invoked by the AUGUSTUS training web server application. It is a plain text file, i.e. you should be able to open it with any text editor or even your web browser. In your own interest, you should check the AutoAug.log file for the number of training genes that were generated from your input files (except if you submitted a training gene structure file), for the number UTR training genes, and for gene prediction accuracy.
If your job ran successfully, the log file will contain a line that says 1 training set training.gb.train contains x sequences and y genes.. If the number of x is low, you should probably refrain from further using the produced parameter set. From our experience, you need at least a couple of hundreds of training genes in order to obtain good parameter sets. If UTR training was possible, the log file will contain similar information for the number of UTR training examples. Here, you also want to see a number that is at least a couple of hundreds - otherwise do not use the produced UTR parameters for gene prediction.
The log file also contains a line that says 1 The accuracy after optimizing without CRF-etraining is z. This accuracy value was obtained from a test gene set that was not used for training computed with the following formula:
3*nucleotide_sensitivity + 2*nucleotide_specificity + 4*exon_sensitivity + 3*exon_specificity + 2*gene_sensitivity + 1*gene_specificity)/15
Commonly observed values at this position range from 40 to 60 percent. If you obtain a very low value, this gives a strong indication that the obtained parameter set is not very useful for predicting genes accurately.
This file contains all error messages of the AUGUSTUS training and prediction processed invoked by the AUGUSTUS training web server application. It is a plain text file, i.e. you should be able to open it with any text editor or even your web browser. In your own interest, you should check the AutoAug.err file. If this file is not empty, something did go wrong and you probably shouldn't trust the results blindly - in case any were produced at all.
One frequently occuring error in the AutoAug.err file is the following:
The file with UTR parameters for train****** does not seem to exist. This likely means that the UTR model has not beeen trained yet for train******.
This error message tells you that no UTR parameters were trained for your species. If no other error messages are contained above the first UTR error message, the general results of your job are ok, you simply did not get UTR parameters and thus no predictions with UTR.
Further error messages (that e.g. lead to null results) are explained on our Help page.
This file contains the parameters that were trained for your species of interest based on the submitted input files. The AUGUSTUS parameter archive always contains the following files for your species (the term species will be replaced by the species name that you entered when submitting a training job):
If you want to use the parameters for predicting genes with a local AUGUSTUS installation, place the species parameter folder into your $AUGUSTUS_CONFIG_PATH species folder. Be aware that in general, our web server application does not optimize the CRF parameters. Therefore, do not run AUGUSTUS with CRF-flag with these parameters. UTR parameters have only been optimized for your species if this website contains an archive utr_pred.tar.gz. It does not make sense to locally predict genes with UTR if the UTR parameters were not optimized for your species.
If you want to run further predictions for your species of interest in other genomic sequences or with other hints on our web server application, you do not need to save the parameter archive locally for re-upload. Just remember the ID of your job and specify it on the prediction interface. You find your job's ID in the headline of this web site at "Training results for job ...".
This file contains generated training genes that were used to optimize AUGUSTUS parameters in genbank format. The file is compressed with gzip. To unpack it, type gunzip training.gb.gz in your shell. You find more information about gzip at gzip.org.
The files ab_inito.tar.gz and pred_hints.tar.gz are gene prediction archives. Their content depends on the input file combination.
You find an example for AUGUSTUS prediction gff format at the AUGUSTUS prediction tutorial.
 Haas, B.J., Delcher, A.L., Mount, S.M., Wortman, J.R., Smith Jr, R.K., Jr., Hannick, L.I., Maiti, R., Ronning, C.M., Rusch, D.B., Town, C.D. et al. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res, 31, 5654-5666.
 Kent, W.J. (2002) BLAT—The BLAST-Like Alignment Tool. Genome Res, 12, 656-664.
 Keller, O., Odronitz, F., Stanke, M., Kollmar, M., Waack, S. (2008) Scipio: Using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species. BMC Bioinformatics 9, 278.
 Stanke, M., Diekhans, M., Baertsch, R., Haussler, D. (2008) Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, doi: 10.1093/bioinformatics/btn013