Message "No source specified" when running augustus

Discussions about predicting genes with AUGUSTUS. Not covered here: WebAUGUSTUS and BRAKER1

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald

Post by katharina »

Originally posted in the old forum by Su on 28.08.2015 - 19:14
Hi,
I prepared exon hints with Cufflinks from my RNA-Seq data (I converted the Cufflinks output .gtf file to a .gff file with gtf2gff.pl from the scripts directory).
Then I merged these exon hints with the intron hints and repeat hints and ran AUGUSTUS with this command:

Code:

augustus --species=Myspecies --exonnames=on --codingseq=on --protein=on --extrinsicCfgFile=extrinsic.M.RM.E.W.cfg --alternatives-from-evidence=true --hintsfile=merged_hint.gff --allow_hinted_splicesites=atac genome.fa > output
After a short time, I got this message:

Code:

No source specified (e.g. by source=M in the last column)
Error in hint line: scaffoldXX_covXXX Cufflinks exon ddddddd ddddddd . . . gene_id "XXXX_dddddd"; transcript_id "XXXX_dddddd"; exon_number "d"; oId "CUFF.ddddd.d"; tss_id "TSSddddd";
…
Could not read strand.
Maybe you used spaces instead of tabulators?
I think this message means that the exon hints were not used for gene prediction.
What's wrong?

by Katharina on 31.08.2015 - 08:58
Can you please post an excerpt of your hints file?

by Su on 31.08.2015 - 13:10
This is an excerpt of the exon hints file:

Code:

scaffold10015_cov179	Cufflinks	exon	17	49	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.1.1"; tss_id "TSS1"; 
scaffold10015_cov179	Cufflinks	exon	982	1400	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; oId "CUFF.1.1"; tss_id "TSS1"; 
scaffold10015_cov179	Cufflinks	exon	775	1400	.	+	.	gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.2.1"; tss_id "TSS2"; 
scaffold10017_cov124	Cufflinks	exon	117	258	.	-	.	gene_id "XLOC_000002"; transcript_id "TCONS_00000003"; exon_number "1"; oId "CUFF.3.1"; tss_id "TSS3";

by Katharina on 31.08.2015 - 15:12
OK, the problem is that your file is not in "hints format" but in GTF format.
You need to reformat the last column so that it contains only the source information: src=E (delete everything else).
You can find further information on the hints format for AUGUSTUS at http://bioinf.uni-greifswald.de/webaugustus/help#hints (it does not matter whether you write "src=E" or "source=E"). You might also want to look at the extrinsic.cfg file that you use for running AUGUSTUS: check that the source actually exists in your extrinsic.cfg, and that the features that you have (I only see exon) actually have a score.
If results with one of the standard extrinsic.cfg files are not satisfactory, you might have to play with the scores a bit.
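As a rough way to do the check described above, the following sketch compares the source keys used in a hints file with the keys listed in an extrinsic configuration file. It assumes that the configuration file lists its sources as whitespace-separated keys under a [SOURCES] header and marks comments with "#"; the function names (read_sources, hint_sources, missing_sources) are made up for this example and are not part of AUGUSTUS.

```python
import re

# Matches src=E or source=E anywhere in the semicolon-separated last column.
SRC_PATTERN = re.compile(r"(?:^|;)\s*(?:src|source)=([^;\s]+)")


def read_sources(cfg_lines):
    """Return the source keys found under [SOURCES] (assumed layout)."""
    sources = []
    in_sources = False
    for raw in cfg_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            # A new section header starts; only [SOURCES] is of interest.
            in_sources = (line == "[SOURCES]")
            continue
        if in_sources:
            sources.extend(line.split())
    return sources


def hint_sources(hint_lines):
    """Return the set of source keys used in the last column of a hints file."""
    found = set()
    for line in hint_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # malformed lines are ignored in this sketch
        match = SRC_PATTERN.search(fields[8])
        if match:
            found.add(match.group(1))
    return found


def missing_sources(cfg_lines, hint_lines):
    """Return hint sources that are not declared in the configuration file."""
    return sorted(hint_sources(hint_lines) - set(read_sources(cfg_lines)))
```

Calling missing_sources(open("extrinsic.M.RM.E.W.cfg"), open("merged_hint.gff")) would then return an empty list if every source used in the hints is declared. This only checks the source keys; it does not look at the scores assigned to each feature.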
Since Cufflinks produces "groups of exons that belong to one transcript", you may additionally consider adding exon group identifiers to the last column, e.g.:

Code:

src=E;grp=gene1
(gene1 is in this case the identifier of a particular group of exons that belong to one transcript).
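The reformatting described above could be sketched in Python roughly as follows. This is only an illustration, not a script shipped with AUGUSTUS: the function names gtf_line_to_hint and convert are invented here, the transcript_id is assumed to be a suitable group identifier, and non-exon lines are skipped on the assumption that only exon features should become hints.

```python
import re


def gtf_line_to_hint(line, source="E"):
    """Convert one Cufflinks GTF line into a hint line.

    The first eight tab-separated columns are kept unchanged; the attribute
    column is replaced by src=<source>;grp=<transcript_id>.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    attributes = "src=" + source
    match = re.search(r'transcript_id "([^"]+)"', fields[8])
    if match:
        # Use the transcript identifier to group exons of one transcript.
        attributes += ";grp=" + match.group(1)
    return "\t".join(fields[:8] + [attributes])


def convert(gtf_lines, source="E"):
    """Yield hint lines for exon features; skip comments and blank lines."""
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] != "exon":
            continue  # assumption: only exon features are wanted as hints
        yield gtf_line_to_hint(line, source)
```

For example, "\n".join(convert(open("transcripts.gtf"))) would give the contents of a hints file, where transcripts.gtf stands for whatever GTF file Cufflinks produced. A line that is separated by spaces instead of tabs raises a ValueError rather than being passed on silently.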

by Su on 04.09.2015 - 06:31
Thank you, Katharina.
I tried writing a Perl script that converts the Cufflinks GTF file into a GFF hints file for AUGUSTUS.
This is the output gff file:

Code:

scaffold10015_cov179	Cufflinks	exon	17	49	.	+	.	src=E;grp=TCONS_00000001 
scaffold10015_cov179	Cufflinks	exon	982	1400	.	+	.	src=E;grp=TCONS_00000001 
scaffold10015_cov179	Cufflinks	exon	775	1400	.	+	.	src=E;grp=TCONS_00000002 
scaffold10017_cov124	Cufflinks	exon	117	258	.	-	.	src=E;grp=TCONS_00000003
Will this work properly as a hints file?

by Katharina on 07.09.2015 - 09:02
I hope it will. It looks ok to me.
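To check a hints file for the two problems from the error message (missing source, columns not separated by tabs) before starting a full run, a small sketch like the one below might help. The function name check_hint_line is hypothetical, the set of accepted strand values is an assumption, and the checks are not a complete validation of the hints format.

```python
import re

# Assumption: "+", "-" and "." are the strand values expected in column 7.
VALID_STRANDS = {"+", "-", "."}


def check_hint_line(line):
    """Return a list of problems found in one hint line (empty if none)."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Typical cause: spaces were used instead of tabs.
        problems.append("expected 9 tab-separated columns, found %d" % len(fields))
        return problems
    if not (fields[3].isdigit() and fields[4].isdigit()):
        problems.append("start and end must be integers")
    elif int(fields[3]) > int(fields[4]):
        problems.append("start is greater than end")
    if fields[6] not in VALID_STRANDS:
        problems.append("unexpected strand %r" % fields[6])
    if not re.search(r"(?:^|;)\s*(?:src|source)=[^;\s]+", fields[8]):
        problems.append("no src=... or source=... in the last column")
    return problems
```

Running check_hint_line over every line of the merged hints file and printing the non-empty results would point to the offending lines. The original GTF lines from Cufflinks would be reported as lacking a source, while the reformatted lines with src=E;grp=... pass these checks.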