how to correctly use RNA-seq data
Posted: Fri Nov 20, 2015 12:59 pm
Originally posted in the old forum by mark on 26.03.2013 - 16:56
I have Illumina reads in _1.fastq and _2.fastq files, but the identifiers do not end in /1 and /2. Note also that only one line per record carries the identifier (the line starting with @); the identifier has been removed from the lines starting with +, probably because it is redundant.
How should I reformat these files so that I can use them to generate hints for AUGUSTUS, as described at the link below?
Code:
$ bzcat f_1.fastq.bz2 | head -n 8
@HWI-ST486:365:C16E0ACXX:3:1101:1382:1950 1:N:0:CCGTCC
NGAAATCATCACCGAAGAAGTCACCAAGTCTGACTTGAAACAATTGGTTGG
+
1=DDDDDDFFDDIIDIIIIB9CFEIDEEEIIIIIIIIIIIIIIIEIEIII
@HWI-ST486:365:C16E0ACXX:3:1101:1451:1958 1:N:0:CCGTCC
NTTGATTTTAAATCAGCCGTAGTTACATGTCTGGTCGAATCTTCGGTACAT
+
1=DDDFFHHHHHJJIJJJJJJJJJIJJJIJIJIIJJIJJJIJHFGGGIII
$ bzcat f_2.fastq.bz2 | head -n 8
@HWI-ST486:365:C16E0ACXX:3:1101:1382:1950 2:N:0:CCGTCC
GGCTTCTTCAATACCTTAACCTTGCGGATGTAGACATCGTGCAATGGGTAG
+
@CCFFDFFHHHHHDHGIJGIJGGIGIJDGHADG@FHGIIJIIIHIJGIFFB
@HWI-ST486:365:C16E0ACXX:3:1101:1451:1958 2:N:0:CCGTCC
GAACGTTTGCAGTATACCCGTGATTGCATTTGCTTGGATTTTTGTCCTGAA
+
@@CFFFFFHHHHHIJIJJJIHJGIJIIGIJJJJJJJIHJIJJJJJJIIJIG
http://bioinf.uni-greifswald.de/bioinf/ ... seq.Tophat
Thanks in advance, Mark
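One possible way to get the /1 and /2 naming asked about above is to rewrite only the header line of each record, replacing everything after the first space (the "1:N:0:CCGTCC" part) with the mate number. This is a minimal sketch, not an official AUGUSTUS or TopHat tool: add_mate_suffix is a hypothetical helper name, and it assumes strict four-line FASTQ records with no wrapped sequence lines. It picks headers by line position rather than by a leading @, because quality strings can also start with @ (as in the f_2 reads shown above).

```shell
#!/bin/sh
# Sketch: turn "@<id> <mate>:N:0:<index>" headers into "@<id>/<mate>".
# Assumption: every record is exactly 4 lines (header, sequence, "+", quality).
# add_mate_suffix is a hypothetical name, not part of AUGUSTUS or TopHat.
add_mate_suffix() {
    # $1 = mate number to append (1 or 2); FASTQ is read from stdin.
    awk -v mate="$1" '
        NR % 4 == 1 {            # first line of each 4-line record
            sub(/ .*$/, "")      # drop everything from the first space on
            print $0 "/" mate    # append /1 or /2 to the identifier
            next
        }
        { print }                # sequence, "+", and quality lines unchanged
    '
}

# Possible usage (the output file names are placeholders):
#   bzcat f_1.fastq.bz2 | add_mate_suffix 1 > f_1.renamed.fastq
#   bzcat f_2.fastq.bz2 | add_mate_suffix 2 > f_2.renamed.fastq
```

With the first f_2 record above, the header would become @HWI-ST486:365:C16E0ACXX:3:1101:1382:1950/2 while the other three lines stay as they are. Whether the downstream steps need anything beyond this renaming is something to check against the linked protocol.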