This website contains short instructions and some frequently asked questions concerning
Why do I not get any results?
Why is the server busy?
What is the species name?
Why should I give my e-mail address?
File upload versus web link
Instructions for fasta headers
Which files must or can I submit for training AUGUSTUS?
Which files are required for predicting genes in a new genome?
Training gene structure file
What is the project identifier?
What does my job status mean?
UTR prediction: yes or no?
Allowed gene structure
What about that data duplication?
Why is the prediction accuracy in the genome of my species not as good as I expected?
What about data privacy and security?
Gene prediction results
I am not from academia/not non-profit. What can I do?
Why do I see a running dog after pressing the submission button?
One frequently occurring error in the AutoAug.err file is the following:
The file with UTR parameters for train****** does not seem to exist. This likely means that the UTR model has not beeen trained yet for train******.
This error message tells you that no UTR parameters were trained for your species. If no other error messages are contained above the first UTR error message, the general results of your job are ok, you simply did not get UTR parameters and thus no predictions with UTR.
Illegal division by zero at /usr/local/augustus/trunks/scripts/autoAugTrain.pl line 241.
failed to execute: No such file or directory
This error occurs when no training gene structures were generated or available. This may be caused by one of the following circumstances:
Training AUGUSTUS is a very resource- and time-consuming process. We use a grid engine queuing system with a limited number of waiting slots. If we estimate that the time from job submission to computation start might be very long, our web server might display a message that our server is busy. The submission of new jobs is then disabled (prediction and training submission will both be disabled). Please wait one or two weeks before you try a new submission. If the problem persists longer than a month, or if your job is urgent, please contact firstname.lastname@example.org.
The species name is the name of the species for whose genome you want to train AUGUSTUS. The species name is an obligatory parameter. Considering that AUGUSTUS training is such a time-consuming process, our objective is to know the names of species for which AUGUSTUS was trained in order to make the trained parameters available to the public, so that others who are interested in the same species as you do not have to rerun the training process. (We will only explicitly publish your parameter set with the next AUGUSTUS release after confirming via e-mail that you agree to this.)
However, if you do not want to reveal the true species name, you may use any other string shorter than 30 characters as a species name.
The species name is not allowed to contain spaces!
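The two rules above can be checked locally before submitting. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes that the limit of fewer than 30 characters applies to every species name, not only to substitute names.

```python
def is_valid_species_name(name: str) -> bool:
    """Return True if `name` is non-empty, shorter than 30 characters
    and contains no whitespace (assumed reading of the rules above)."""
    if not name or len(name) >= 30:
        return False
    return not any(ch.isspace() for ch in name)


print(is_valid_species_name("Arabidopsis_thaliana"))  # underscore instead of a space
print(is_valid_species_name("Arabidopsis thaliana"))  # contains a space
```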
Unlike many other bioinformatics web services, the AUGUSTUS web server application is not an implementation of a fail-safe procedure. Particularly the assembly of a training gene set from extrinsic data (ESTs and protein sequences) and a genome sequence may not always work perfectly. Our pipeline may issue warnings or errors, and sometimes, we need to get some feedback from you via e-mail in order to figure out what is the problem with your particular input data set.
In addition, training and running AUGUSTUS are rather time-consuming processes that may take up to several weeks (depending on the input data). It may be more convenient to receive an e-mail notification when your job has finished than to check the status page over and over again.
Therefore, we strongly recommend that you enter an e-mail address.
If supplied, we use your e-mail address for the following purposes:
We do not use your e-mail address to send you any spam, e.g. about web service updates. We do not share your e-mail address with any third parties.
Job submission without giving an e-mail address is possible but discouraged.
The AUGUSTUS training and prediction web server application offers in some cases two possibilities for transferring files to the server: upload a file or specify a web link to a file.
You cannot do both at the same time! For each file type (e.g. the genome file), you must either select a file on your hard drive or give a web link!
We observed that most problems with generating training genes for training AUGUSTUS are caused by fasta headers in the sequence files. Some of the tools in our pipeline will truncate fasta headers if they are too long or contain spaces, or contain special characters. This definitely leads to a lot of warning messages in the AutoAug.err file, and it may also lead to non-unique fasta entry names, which will lead to a crash of the pipeline. We therefore strongly recommend that you adhere to the following rules for fasta headers when using our web services:
In the following we give some header examples that will not cause problems:
The following kinds of headers will cause at least warning messages but probably also a pipeline crash:
>contig1 length=1000 Arabidopsis thaliana
>Drosophila melanogaster scaffold 10000
If you have a fasta file with unsuitable headers and you do not know how to modify them automatically, you may use the Perl script simplifyFastaHeaders.pl. After saving it on your local Unix system, first check whether the location of Perl in the first line of the script is correct for your system (#!/usr/bin/perl). If Perl is installed in another location, you need to modify that line! Then, execute the script with the following parameters:
perl simplifyFastaHeaders.pl in.fa nameStem out.fa header.map
Why is the simplification of fasta headers not a built in function of the web service? The reason is that we think you should be able to recognize the predictions later on! Gene predictions will be made available in gff format, which contains the sequence name in the first column. Therefore, you should modify the fasta headers yourself, before submitting data to the web service!
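For readers who prefer to see the idea in code, here is a minimal Python sketch of this kind of header simplification. It is not a reimplementation of simplifyFastaHeaders.pl: the naming scheme (name stem plus a running number) and the returned mapping are assumptions made for illustration, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def simplify_headers(lines: Iterable[str], name_stem: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Replace every fasta header by '>' + name_stem + running number.

    Returns the rewritten lines and a list of (new_name, original_header)
    pairs, so that predictions can be traced back to the original names.
    """
    out: List[str] = []
    mapping: List[Tuple[str, str]] = []
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            new_name = f"{name_stem}{count}"
            mapping.append((new_name, line[1:]))
            out.append(">" + new_name)
        elif line.strip():  # drop empty lines, keep sequence lines unchanged
            out.append(line)
    return out, mapping


example = [
    ">contig1 length=1000 Arabidopsis thaliana",
    "ACGTACGT",
    ">Drosophila melanogaster scaffold 10000",
    "GGCCGGCC",
]
new_lines, header_map = simplify_headers(example, "seq")
for new_name, old_header in header_map:
    print(new_name, old_header, sep="\t")
```

Keeping the mapping is the important part: it is what lets you recognize the sequence names in the first column of the gff output later on.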
You need to specify
Please consider that training AUGUSTUS is a time- and resource-consuming process. For optimal results, you should specify as much information as possible for a single training run instead of starting the AUGUSTUS training multiple times with different file combinations! If you have a lot of EST data, we recommend that you submit ESTs instead of protein sequences, since ESTs will likely allow the generation of a UTR training set.
For predicting genes in a new genome with already trained parameters, you need to specify
The genome file is an obligatory file for training AUGUSTUS and for making predictions with pre-trained parameters in a new genome. It must contain the genome sequence in (multiple) fasta format. Every header begins with a >. The sequence must be DNA. Allowed sequence characters: A a T t G g C c H h X x R r Y y W w S s M m K k B b V v D d N n. (Internally, AUGUSTUS will interpret everything that is not A a T t C c G g as an N!) Empty lines are not allowed. If they occur, they will automatically be removed by the webserver applications.
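The character rules above can be expressed as a short local check. This Python sketch is an illustration only; the function names are hypothetical and the code is not the validation that the web server itself performs.

```python
# Allowed genome sequence characters as listed above.
GENOME_CHARS = set("AaTtGgCcHhXxRrYyWwSsMmKkBbVvDdNn")


def check_genome_fasta(lines):
    """Return (line_number, problem) tuples for sequence lines that
    contain characters outside the allowed alphabet."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(">"):
            continue  # empty lines are removed by the server; headers are not sequence
        bad = sorted(set(line) - GENOME_CHARS)
        if bad:
            problems.append((number, "illegal characters: " + "".join(bad)))
    return problems


def as_augustus_sees_it(sequence):
    """Map every character other than A, C, G, T (either case) to N."""
    return "".join(ch if ch in "ACGTacgt" else "N" for ch in sequence)


print(check_genome_fasta([">Chr.1", "ACGTNNRY", "ACGU"]))
print(as_augustus_sees_it("ACRYgt"))
```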
Headers must be unique within a file! We recommend that you use short fasta headers. Headers like
>gi|382483733|gb|GZ667513.1|GW667513 SSH_BP_47 Some species Wicked root cDNA library Some species cDNA clone SSH_BP_47 similar to Putative NADH-cytochrome B5 reductase, mRNA sequence
>Chr.1
CCTCCTCCTGTTTTTCCCTCAATACAACCTCATTGGATTATTCAATTCAC
CATCCTGCCCTTGTTCCTTCCATTATACAGCTGTCTTTGCCCTCTCCTTC
TCTCGCTGGACTGTTCACCAACTCTCAGCCCGCGATCCCAATTTCCAGAC
AACCCATCTTATCAGCTTGGCCACGGCCTCGACCCGAACAGACCGGCGTC
CAGCGAGAAGAGCGTCGCCTCGACGCCTCTGCTTGACCGCACCTTGATGC
TCAAGACTTATCGCGATGCCAAGAAGCGTCTCATCATGTTCGACTACGA
>Chr.2
CGAAACGGGCACCTATACAACGATTGAAACCATTATTCAAGCTCAGCAAG
CGTCTATGCTAGCGGTTATTGCGAGCACTTCAGCGGTTGCTACTACGACT
ACTACTTGATAAATGAAACGGCTATAAAAGAGGCTGGGGCAAAAGTATGT
TAGTTGAAGGGTGACCTGAACGATGAATCGGTCGAATTTTTTATTGGCAG
AGGGAAGGTAGGTTTACTCAATTTAGTTACTTCTAGCCGTTGATTGGAGG
AGCGCAAGCGACGAGGAGGCTCATCGGCCGCCCGCGGAAAGCGTAGTCT
TACACGGAAATCAACGGCGGTGTCATAAGCGAG
>Chr.3
.....
The maximal number of scaffolds allowed in a genome file is 250000. If your file contains more scaffolds, please remove all short scaffolds. For training AUGUSTUS short scaffolds are worthless because no complete training genes can be generated from them. In terms of prediction, it is possible to predict genes in short scaffolds. However, those genes will in most cases be incomplete and probably unreliable.
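One way to get below the scaffold limit is to keep only the longest scaffolds. The Python sketch below illustrates this; the function names are hypothetical, and "drop the shortest scaffolds first" is merely one reasonable reading of the advice above.

```python
MAX_SCAFFOLDS = 250000  # limit stated above


def read_fasta(lines):
    """Yield (header, sequence) pairs from the lines of a fasta file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_longest(records, limit=MAX_SCAFFOLDS):
    """Keep at most `limit` records, dropping the shortest ones first
    and preserving the original order of the remaining records."""
    records = list(records)
    if len(records) <= limit:
        return records
    ranked = sorted(range(len(records)), key=lambda i: len(records[i][1]), reverse=True)
    keep = set(ranked[:limit])
    return [rec for i, rec in enumerate(records) if i in keep]


demo = list(read_fasta([">a", "ACGT", "AC", ">b", "A", ">c", "ACG"]))
print(keep_longest(demo, limit=2))
```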
The cDNA file is a multiple fasta DNA file that contains e.g. ESTs or full-length cDNA sequences. Allowed sequence characters: A a T t G g C c H h X x R r Y y W w S s M m K k B b V v D d N n U u. Empty lines are not allowed and will be removed from the submitted file by the webserver application. See Genome file for a format example. Upload of a cDNA file to our web server application will invoke the software BLAT, which is on our webserver application only available for academic, personal and non-profit use.
The protein file is a multiple fasta file that contains protein sequences as supporting evidence for genes. Allowed sequence characters: A a R r N n D d C c E e Q q G g H h I i L l K k M m F f P p S s T t W w Y y V v B b Z z J j X x. Empty lines are not allowed but will simply be removed from the file by the webserver application.
Correct file format example:
>protein1
maaaafgqlnleepppiwgsrsvdcfekleqigegtygqvymakeiktgeivalkkirmd
neregfpitaireikilkklhhenvihlkeivtspgrdrddqgkpdnnkykggiymvfey
mdhdltgladrpglrftvpqikcymkqlltglhychvnqvlhrdikgsnllidnegnlkl
adfglarsyshdhtgnltnrvitlwyrppelllgatkygp
>protein2
neregfpitaireikilkklhhenvihlkeivtspgrdrddqgkpdnnkykggiymvfey
mdhdltgladrpglrftvpqikcymkqlltglhychvnqv
>protein3
...
Submitting a protein file to our AUGUSTUS training web server application will invoke Scipio, which uses BLAT. Therefore, protein file upload is only available for academic, personal and non-profit use on our web server application.
You can submit your own, externally created training gene structures to the AUGUSTUS training web server application. Regardless of the format, gene structure files are not allowed to contain java metacharacters like "*" or "?".
Training gene structure files can be submitted in two different formats: Genbank format or gff format.
Gene structures in genbank format must contain the coding sequence parts and flanking regions. Flanking regions are important because AUGUSTUS is supposed to differentiate between genes and intergenic regions. The length of flanking regions depends on the length of genes in the target genome. In our pipeline, flanking regions are set to the average gene length (exceptionally applying the extreme limits between 1000 and 10000 nt). It is very important to make sure that the flanking regions do not contain any other protein coding gene parts, i.e. we recommend to trim flanking regions in a way that will exclude other CDS parts.
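The flanking region rule described above (average gene length, limited to the range from 1000 to 10000 nt) amounts to a clamp. The following Python sketch illustrates the arithmetic; the function name and the use of integer division are assumptions, not taken from the pipeline.

```python
def flanking_length(gene_lengths, lower=1000, upper=10000):
    """Average gene length, clamped to the interval [lower, upper]."""
    if not gene_lengths:
        raise ValueError("at least one gene length is required")
    average = sum(gene_lengths) // len(gene_lengths)  # integer division is an assumption
    return max(lower, min(upper, average))


print(flanking_length([500, 700]))      # average 600 is raised to the lower limit
print(flanking_length([3000, 5000]))    # average 4000 lies within the limits
print(flanking_length([20000, 40000]))  # average 30000 is cut to the upper limit
```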
It is important for our pipeline that the LOCUS names within a submitted training gene structure file are unique, i.e. you should not use the same LOCUS name more than one time!
Correct file format example (condensed view, the three dots represent further lines of sequences):
LOCUS       Chr.1_1-159458   159458 bp  DNA
FEATURES             Location/Qualifiers
     source          1..159458
     CDS             complement(join(2421..2655,3858..4005,4080..4235,5569..5857
                     ,10316..10534,155240..155458))
                     /gene="1474336"
BASE COUNT     49195 a 29117 c 28985 g 49950 t 2211 n
ORIGIN
        1 aaaatacatc acaatacatt taattcactt tccatcatcg agattaacga aaattattta
       61 aaatatcgaa gatgaaaata tcctcaagat gatactgaac ggctaagaaa aatacatcac
      121 acaactttaa ttcattttcc atcatcgaga ttaacgaaaa gaaaaaattt taactcccta
...
   159301 atacgccacc aggtatttcg cctgattgtt cctcgaatat cttctctctc tctatatata
   159361 tatatattac ttggcacgat aatcgtcgaa tcgttattta taaattgctt catctatcgc
   159421 gatatttttg caacaactct cgcttttctc tttccatt
//
LOCUS       Chr.1_313992-323129   9138 bp  DNA
FEATURES             Location/Qualifiers
     source          1..9138
     CDS             join(4001..4048,4989..5138)
                     /gene="194551"
BASE COUNT     2829 a 1502 c 1750 g 2948 t 109 n
ORIGIN
        1 ttttccttct ttcttttttt tttatttaca ttaatgagaa ttttcgcaaa tatttcatcg
       61 ctgccatcct tttttttcct cgacgtcaat cacgcgacac atttgttaga gaaatggatt
      121 ttaatcttga aaaaagaaaa atacaaatgc caacgcattt caaatccttt cctattatta
...
     9001 tcaacgaaac aaataattgc ttcacaaaat atcgcacgta acaacaatat agacttcaat
     9061 attcaacaat tcttttcctt tatacacaaa gatacacaaa atataaaagt tttaatactt
     9121 caacttcaac gaaacagg
//
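Since LOCUS names must be unique, it can be worth checking a genbank file for duplicates before submission. The Python sketch below is an illustration with a hypothetical function name; it assumes that the LOCUS name is the second whitespace-separated field of each LOCUS line, as in the example above.

```python
from collections import Counter


def duplicate_locus_names(lines):
    """Return the LOCUS names that occur more than once, sorted."""
    names = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "LOCUS":
            names.append(fields[1])
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


records = [
    "LOCUS       Chr.1_1-159458   159458 bp  DNA",
    "//",
    "LOCUS       Chr.1_313992-323129   9138 bp  DNA",
    "//",
]
print(duplicate_locus_names(records))  # an empty list means all names are unique
```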
If you want to train UTRs, you have to additionally incorporate mRNA information in your genbank file.
Correct file format example (including UTR training):
LOCUS       scf7180001240730_g20   526 bp  DNA
FEATURES             Location/Qualifiers
     source          1..526
     mRNA            99..125
     CDS             99..99
BASE COUNT     164 a 99 c 68 g 195 t
ORIGIN
        1 gtgacggagc ccaaggacga gcccgtgccc tcagagccca cgtccgacgt gaggcccgcg
       61 ccagcgcccc tcccgccgcc cgtcgcagcc actgcttaga ctttactaat ataaacattg
      121 aaaatatttt gtgttttatt tccaatcatt gaattataat cctattataa tataactaac
      181 attcgtaatt ttacaaaata actatgcaaa ttattttgta ttttcgtttt aaattatact
      241 tttcatataa atttctacaa atcttattca agaccataag tatccgctcg ctctacttcg
      301 ggcatttcct ttatttatat cttatttgac ttattttgat tatttaggct tatgttttcg
      361 atactattga aaacagaaaa taatttcata taattaataa tatattttca attaatatat
      421 ttaacaaata tttgtatagt tcaagcggac aaatccgttc ccatagtatt tatataaatt
      481 ttaatttaga gtaataacag tttgctgtat tgttgtagtc aaatac
//
LOCUS       scf7180001240751_g30   876 bp  DNA
FEATURES             Location/Qualifiers
     source          1..876
     mRNA            complement(401..777)
     CDS             complement(777..777)
BASE COUNT     300 a 136 c 116 g 324 t
ORIGIN
        1 aatgtaggaa aatgaaatat ttatttaaat tgttattatc acttcttcgc tctagtgtct
       61 tggcaaagcg cggcgttgag ttcagcctct cacacgcaat gcctccagaa ttcggcgaaa
      121 tgtgggggac agagtgtatt aacactaagt tccctcagcc acgactggtg aaattatata
      181 ttcagtttgt atactattac tcatgcaaac acttcatcat actttcactc aatcagtaaa
      241 gcataatatt ttatttaata ttgtttatca atactatttc cttgttgtta aatattattt
      301 tatttattat attaaattaa aatgtcaaaa ttaaaagtag gtgatgattt attactatct
      361 tttctatcca agaaaaaaaa gacacactga aacaattgta atttttgtta tgtttttatt
      421 acttaatatt attataaaaa tttgtaaata cgaaataaaa tagatagacg taataatatt
      481 tatttgttag ttaataataa taatgataat tacgaaagat acaagaaata tgcataaatg
      541 agtgttatat tatgtatttt atgagaatat aaatataaaa actgtcattg attatatttt
      601 ctaaatactt tcattttatg gcttgctggc ttttcaattt ccttatgttt cagcttttca
      661 ctcaatagag cgaaaccttc atcgacatgt aagccaatag aacaattaca aactaacttt
      721 attacatcag tcttttcatt tctttaagct tcaggcaaat atcatctaaa tgcctttcaa
      781 ctcgctacta acatcgcgtc gttatataaa tcagtgtata cggaattaaa cctgtcatgt
      841 ctcttgcaag acgtgtctgc tgttgtcacg cacaca
//
Training gene structure in gff format must comply with the fasta entry names of the genome file.
In general, gff format must contain the following columns (The columns are separated by tabulators):
Correct file format example (without UTR):
Chr.1 mySource CDS 1767 1846 1.000 - 0 transcript_id "1597_1"
Chr.1 mySource CDS 1666 1709 1.000 - 1 transcript_id "1597_1"
Chr.1 mySource CDS 1486 1605 1.000 - 2 transcript_id "1597_1"
Chr.1 mySource CDS 1367 1427 1.000 - 2 transcript_id "1597_1"
Chr.1 mySource CDS 1266 1319 1.000 - 1 transcript_id "1597_1"
Chr.1 mySource CDS 1145 1181 1.000 - 1 transcript_id "1597_1"
Chr.1 mySource CDS 847 1047 1.000 - 0 transcript_id "1597_1"
Chr.2 mySource CDS 9471 9532 1.000 + 0 transcript_id "1399_2"
Chr.2 mySource CDS 9591 9832 1.000 + 1 transcript_id "1399_2"
Chr.2 mySource CDS 9885 10307 1.000 + 2 transcript_id "1399_2"
Chr.2 mySource CDS 10358 10507 1.000 + 2 transcript_id "1399_2"
Chr.2 mySource CDS 10564 10643 1.000 + 2 transcript_id "1399_2"
Correct file format example (with UTR):
Chr.1 mySource 5'-UTR 277153 277220 45 + . transcript_id "g22472.t1"; gene_id "g22472";
Chr.1 mySource CDS 277221 277238 1 + 0 transcript_id "g22472.t1"; gene_id "g22472";
Chr.1 mySource CDS 278100 278213 1 + 0 transcript_id "g22472.t1"; gene_id "g22472";
Chr.1 mySource CDS 278977 279169 1 + 0 transcript_id "g22472.t1"; gene_id "g22472";
Chr.1 mySource CDS 279630 279648 0.94 + 2 transcript_id "g22472.t1"; gene_id "g22472";
Chr.1 mySource CDS 279734 279768 0.94 + 1 transcript_id "g22472.t1"; gene_id "g22472";
Chr.1 mySource CDS 280307 280344 1 + 2 transcript_id "g22472.t1"; gene_id "g22472";
Chr.1 mySource 3'-UTR 280345 280405 78 + . transcript_id "g22472.t1"; gene_id "g22472";
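Because the sequence names in the gff file must match the fasta entry names of the genome file, a quick consistency check can save a failed job. The Python sketch below is an illustration with hypothetical names; it assumes nine tab-separated columns, as seen in the examples above, and a transcript_id "..." attribute in the last column.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def group_by_transcript(gff_lines, genome_names):
    """Group (feature, start, end) per transcript_id and collect
    sequence names that do not occur in `genome_names`."""
    transcripts = defaultdict(list)
    unknown = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        seqname, _source, feature, start, end = cols[:5]
        if seqname not in genome_names:
            unknown.add(seqname)
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            transcripts[match.group(1)].append((feature, int(start), int(end)))
    return dict(transcripts), sorted(unknown)


gff = [
    'Chr.1\tmySource\tCDS\t1767\t1846\t1.000\t-\t0\ttranscript_id "1597_1"',
    'Chr.2\tmySource\tCDS\t9471\t9532\t1.000\t+\t0\ttranscript_id "1399_2"',
]
by_transcript, missing = group_by_transcript(gff, {"Chr.1", "Chr.2"})
print(by_transcript)
print(missing)  # an empty list means every sequence name exists in the genome file
```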
For the gene prediction web server application, it is possible to submit an externally created file that contains extrinsic evidence for gene structures in gff format.
In general, gff format must contain the following columns (The columns are separated by tabulators):
Correct format example:
HS04636 anchor exonpart 500 506 0 - . source=M
HS04636 anchor exon 966 1017 0 + 0 source=M
HS04636 anchor start 966 968 0 + 0 source=M
HS04636 anchor dss 2199 2199 0 + . source=M
HS04636 anchor stop 7631 7633 0 + 0 source=M
HS04636 anchor intronpart 7631 7633 0 + 0 source=M
A *.tar.gz archive with a folder containing the following files is required for predicting genes in a new genome with pre-trained parameters:
where species is replaced by the name of the species you trained AUGUSTUS for (e.g. carrot would result in carrot/carrot_parameters.cfg). The additional species before the slash means that all those files must reside in a directory that is called species (or in our example: carrot) before you tar and gzip it. If you simply tar and gzip the folder that contains parameters of an AUGUSTUS training run, everything should work fine.
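The directory layout described above can be verified locally before uploading. The following Python sketch uses the standard tarfile module; the function name is hypothetical, and only the species/species_parameters.cfg file named in the text is checked, not the complete list of required files.

```python
import tarfile


def check_parameter_archive(path, species):
    """Return a list of problems found in a *.tar.gz parameter archive."""
    problems = []
    with tarfile.open(path, "r:gz") as archive:
        # Normalize names such as "./carrot/..." to "carrot/...".
        names = [n[2:] if n.startswith("./") else n for n in archive.getnames()]
    names = [n for n in names if n and n != "."]
    outside = [n for n in names if n != species and not n.startswith(species + "/")]
    if outside:
        problems.append(f"entries outside directory {species}/: {outside}")
    expected = f"{species}/{species}_parameters.cfg"
    if expected not in names:
        problems.append(f"missing {expected}")
    return problems
```

For an archive created from a directory called carrot, check_parameter_archive("parameters.tar.gz", "carrot") is expected to return an empty list; the archive file name here is only a placeholder.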
If you trained AUGUSTUS on this webserver, you may, instead of uploading a parameter archive, simply specify the project identifier of this training run. You find the project identifier for example in the subject line of your training confirmation e-mail, where it says Your AUGUSTUS training job project_id. Project identifiers typically consist of the letters pred or train, followed by a random string of 8 characters, resulting in for example train345kljD4.
In the beginning, the status page will display that your job has been submitted. This means, the web server application is currently uploading your files and validating file formats. After a while, the status will change to waiting for execution. This means that all file formats have been confirmed and an AUGUSTUS training job has been submitted to our grid engine, but the job is still pending. Depending on waiting queue length, this status may persist for a while. Please contact us in case your job is pending for more than one month. Later, the job status will change to computing. This means the job is currently computing. When the page displays finished, all computations have been finished and a website with your job's results has been generated.
You will receive an e-mail with the link to the results of your job when computations are finished if you specified an e-mail address.
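The shape of a project identifier can be captured with a regular expression. In this Python sketch, the character set of the random part (letters and digits) is an assumption derived from the single example train345kljD4, and the function name is hypothetical.

```python
import re

# "pred" or "train" followed by 8 letters or digits (assumed character set).
PROJECT_ID = re.compile(r"(pred|train)[A-Za-z0-9]{8}")


def job_type(project_id):
    """Return 'prediction' or 'training' for a well-formed identifier, else None."""
    match = PROJECT_ID.fullmatch(project_id)
    if match is None:
        return None
    return "prediction" if match.group(1) == "pred" else "training"


print(job_type("train345kljD4"))
```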
Predicting UTRs takes significantly more time. However, in addition to reporting UTRs, AUGUSTUS is usually also a little more accurate on the coding regions when ESTs are given as extrinsic evidence.
UTR prediction is only possible if UTR parameter files exist for your species. Even if UTR parameter files exist for a species, you should make sure that they are species specific, i.e. have actually been optimized for your target species. It is a waste of time to predict UTRs with general (template) parameters.
UTR prediction is only supported in combination with the following two gene structure constraints:
UTR prediction is not possible in combination with the gene structure constraints:
If no UTR parameter files exist for your species but you enable UTR prediction in the form, the web server application will overrule the choice to predict UTRs by simply not predicting any UTRs.
Predict any number of (possibly partial) genes: This option is set by default. AUGUSTUS may predict no gene at all, one or more genes. The genes at the boundaries of the input sequence may be partial. Partial here means that not all of the exons of a gene are contained in the input sequence, but it is assumed that the sequence starts or ends in a non-coding region.
Predict only complete genes: AUGUSTUS assumes that the input sequence does not start or end within a gene. Zero or more complete genes are predicted.
Predict only complete genes - at least one: As the previous option. But AUGUSTUS predicts at least one gene (if possible).
Predict exactly one complete gene: AUGUSTUS assumes that the sequence contains exactly one complete gene. Note: This feature does not work properly in combination with alternative transcripts.
Ignore conflicts with other strand: By default AUGUSTUS assumes that no genes - even on opposite strands - overlap. Indeed, this usually is the case but sometimes an intron contains a gene on the opposite strand. In this case, or when AUGUSTUS makes a false prediction on the one strand because it falsely thinks there is a conflicting gene on the other strand, AUGUSTUS should be run with this option set. It then predicts the genes on each strand separately and independently. This may lead to more false positive predictions, though.
We are trying to avoid data duplication. If you submitted some data that was already submitted before, by you or by somebody else, we will display a link to the previously submitted job.
Gene prediction accuracy of AUGUSTUS in the genome of a certain species depends on the quality of training genes that were used for optimizing species specific parameters. The pipeline behind our AUGUSTUS training web server application offers a fully automated way of generating training genes, but it does not replace manual quality checks on the training genes that are often needed for improving the training gene set quality.
In order to improve accuracy, you could manually inspect the generated training genes and select a trustworthy subset and try retraining AUGUSTUS with this subset. It also helps to compare the training gene set to other sources of evidence that are not supported by our web server application, e.g. RNA-seq data.
The results of your job submission (i.e. in case of the training web server application that means log files, trained parameters, training genes, ab initio gene predictions and gene prediction with hints; or in case of the prediction web server application the augustus prediction archive) are publicly available. The link to your job status contains a long, pseudo-random string (uuid), and one needs to guess the string in order to get access to the results - but this is not particularly secure!
Other users who submit exactly the same input files as have been submitted before, will be redirected to the results page of the previously submitted job. They do not need to guess the link.
Files that you upload to our server, e.g. sequence files, are not directly made available to anyone. However, if you chose to upload a file via http/ftp link, the link to your file is displayed on the job status page.
We are interested in redistributing high quality parameter sets for novel species with the AUGUSTUS release. We will not do so without your explicit permission.
Our server logs e-mail addresses, IP addresses and all job submission details. We store this data for a limited time in order to be able to trace back errors or e.g. contact you about a permission to publish parameter sets. By submitting a job, you agree that we log this data.
Please contact email@example.com if your particular job requires a more secure environment, e.g. as part of a collaboration.
After job computations have finished, you will receive an e-mail (if you supplied an e-mail address). The job status web page may at this point in time look similar to this:
This page should contain the file augustus.tar.gz. Please make a "right click" on the link and select "Save As" (or similar) to save the file on your local hard drive.
augustus.tar.gz is a gene prediction archive and its content depends on the input file combination. You can unpack the archive by typing tar -xzvf *.tar.gz into your shell. (You find more information about the software tar at the GNU tar website.)
# This output was generated with AUGUSTUS (version 2.6).
# AUGUSTUS is a gene prediction tool for eukaryotes written by Mario Stanke (firstname.lastname@example.org)
# and Oliver Keller (email@example.com).
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# reading in the file /var/tmp/augustus/AUG-1855139717/hints.gff ...
# Setting 1group1gene for E.
# Sources of extrinsic information: M E
# Have extrinsic information about 1 sequences (in the specified range).
# Initialising the parameters ...
# human version. Use default transition matrix.
# Looks like /var/tmp/augustus/AUG-1855139717/input.fa is in fasta format.
# We have hints for 1 sequence and for 1 of the sequences in the input set.
#
# ----- prediction on sequence number 1 (length = 6483, name = HSACKI10) -----
#
# Delete group HintGroup , 5803-5803, mult= 1, priority= -1 1 features
# Forced unstranded hint group to the only possible strand for 3 groups.
# Deleted 1 groups because some hint was not satisfiable.
# Constraints/Hints:
HSACKI10 anchor start 182 184 0 + . src=M
HSACKI10 anchor stop 3058 3060 0 + . src=M
HSACKI10 anchor dss 4211 4211 0 + . src=M
HSACKI10 b2h ep 1701 2075 0 . . grp=154723761;pri=4;src=E
HSACKI10 b2h ep 1716 2300 0 + . grp=13907559;pri=4;src=E
HSACKI10 b2h ep 1908 2300 0 + . grp=154736078;pri=4;src=E
HSACKI10 b2h ep 3592 3593 0 + . grp=13907559;pri=4;src=E
HSACKI10 b2h ep 3836 3940 0 + . grp=154736078;pri=4;src=E
HSACKI10 b2h ep 5326 5499 0 + . grp=27937842;pri=4;src=E
HSACKI10 b2h ep 5805 6157 0 + . grp=27937842;pri=4;src=E
HSACKI10 b2h exon 3142 3224 0 + . grp=13907559;pri=4;src=E
HSACKI10 b2h exon 3142 3224 0 + . grp=154736078;pri=4;src=E
HSACKI10 b2h exon 3592 3748 0 + . grp=154736078;pri=4;src=E
HSACKI10 anchor intronpart 5000 5100 0 + . src=M
HSACKI10 b2h intron 2301 3141 0 + . grp=13907559;pri=4;src=E
HSACKI10 b2h intron 2301 3141 0 + . grp=154736078;pri=4;src=E
HSACKI10 b2h intron 3225 3591 0 + . grp=13907559;pri=4;src=E
HSACKI10 b2h intron 3225 3591 0 + . grp=154736078;pri=4;src=E
HSACKI10 b2h intron 3749 3835 0 + . grp=154736078;pri=4;src=E
HSACKI10 b2h intron 5500 5804 0 + . grp=27937842;pri=4;src=E
HSACKI10 anchor CDS 6194 6316 0 - 0 src=M
HSACKI10 anchor CDSpart 5900 6000 0 + . src=M
# Predicted genes for sequence number 1 on both strands
# start gene g1
HSACKI10 AUGUSTUS gene 182 3060 0.63 + . g1
HSACKI10 AUGUSTUS transcript 182 3060 0.63 + . g1.t1
HSACKI10 AUGUSTUS start_codon 182 184 . + 0 transcript_id "g1.t1"; gene_id "g1";
HSACKI10 AUGUSTUS initial 182 225 1 + 0 transcript_id "g1.t1"; gene_id "g1";
HSACKI10 AUGUSTUS internal 1691 2300 0.86 + 1 transcript_id "g1.t1"; gene_id "g1";
HSACKI10 AUGUSTUS terminal 3049 3060 0.74 + 0 transcript_id "g1.t1"; gene_id "g1";
HSACKI10 AUGUSTUS CDS 182 225 1 + 0 transcript_id "g1.t1"; gene_id "g1";
HSACKI10 AUGUSTUS CDS 1691 2300 0.86 + 1 transcript_id "g1.t1"; gene_id "g1";
HSACKI10 AUGUSTUS CDS 3049 3060 0.74 + 0 transcript_id "g1.t1"; gene_id "g1";
HSACKI10 AUGUSTUS stop_codon 3058 3060 . + 0 transcript_id "g1.t1"; gene_id "g1";
# coding sequence = [atgatgaaaccctgtctctaccaaaaagacaaaaaattagccagctcaagcaagcactactcttcctcccgcagtggag
# gaggaggaggaggaggaggatgtggaggaggaggaggagtgtcatccctaagaatttctagcagcaaaggctcccttggtggaggatttagctcaggg
# gggttcagtggtggctcttttagccgtgggagctctggtgggggatgctttgggggctcatcaggtggctatggaggattaggaggttttggtggagg
# tagctttcatggaagctatggaagtagcagctttggtgggagttatggaggcagctttggagggggcaatttcggaggtggcagctttggtgggggca
# gctttggtggaggcggctttggtggaggcggctttggaggaggctttggtggtggatttggaggagatggtggccttctctctggaaatgaaaaagta
# accatgcagaatctgaatgaccgcctggcttcctacttggacaaagttcgggctctggaagaatcaaactatgagctggaaggcaaaatcaaggagtg
# gtatgaaaagcatggcaactcacatcagggggagcctcgtgactacagcaaatactacaaaaccatcgatgaccttaaaaatcagagaacaacataa]
# protein sequence = [MMKPCLYQKDKKLASSSKHYSSSRSGGGGGGGGCGGGGGVSSLRISSSKGSLGGGFSSGGFSGGSFSRGSSGGGCFGG
# SSGGYGGLGGFGGGSFHGSYGSSSFGGSYGGSFGGGNFGGGSFGGGSFGGGGFGGGGFGGGFGGGFGGDGGLLSGNEKVTMQNLNDRLASYLDKVRAL
# EESNYELEGKIKEWYEKHGNSHQGEPRDYSKYYKTIDDLKNQRTT]
# Evidence for and against this transcript:
# % of transcript supported by hints (any source): 20
# CDS exons: 1/3
# E: 1
# CDS introns: 0/2
# 5'UTR exons and introns: 0/0
# 3'UTR exons and introns: 0/0
# hint groups fully obeyed: 0
# incompatible hint groups: 5
# E: 3 (gi|154723761,gi|13907559,gi|154736078)
# M: 2
# end gene g1
###
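To work with such an output file programmatically, it is usually enough to skip the comment lines and keep the feature lines whose second column is AUGUSTUS. The Python sketch below collects the CDS coordinates per transcript; the function name is hypothetical, and splitting on arbitrary whitespace is a simplification that suits the example above.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def cds_by_transcript(output_lines):
    """Collect (start, end) of every CDS feature with source AUGUSTUS,
    keyed by transcript_id. Comment and hint lines are ignored."""
    result = {}
    for line in output_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split(None, 8)  # first 8 columns, rest is the attribute column
        if len(cols) < 9 or cols[1] != "AUGUSTUS" or cols[2] != "CDS":
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            result.setdefault(match.group(1), []).append((int(cols[3]), int(cols[4])))
    return result


sample = [
    "# start gene g1",
    "HSACKI10 AUGUSTUS gene 182 3060 0.63 + . g1",
    'HSACKI10 AUGUSTUS CDS 182 225 1 + 0 transcript_id "g1.t1"; gene_id "g1";',
    'HSACKI10 AUGUSTUS CDS 1691 2300 0.86 + 1 transcript_id "g1.t1"; gene_id "g1";',
    'HSACKI10 AUGUSTUS CDS 3049 3060 0.74 + 0 transcript_id "g1.t1"; gene_id "g1";',
]
print(cds_by_transcript(sample))
```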
Click here to view a real AUGUSTUS prediction web service output!
It is important that you check the results of an AUGUSTUS gene prediction run. Do not trust predictions blindly! Prediction accuracy depends on the input sequence quality, on hints quality and on whether a given parameter set fits to the species of the supplied genomic sequence.
Users who are not from academia or a non-profit organisation, and who are not using our web application for personal purposes only, have the following options:
As Loriot said (freely translated): Life without a dog is possible, but pointless. ... the animation is simply displayed to make the waiting time during job submission more pleasant ;-)