filterBam segmentation fault error
Posted: Thu Jan 05, 2023 4:13 pm
Hello,
First of all, thank you for the development of your tool!
I am currently trying to produce the structural annotation of a specific contig. At the moment I am generating hints from two Illumina RNA-seq libraries. First, I changed the read names so that they end in "-1" and "-2", using a sed command. Then I created an interleaved FASTQ file from R1 and R2.
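A minimal sketch of such a renaming and interleaving step is shown below. It is not the exact set of commands used here: it uses awk and paste rather than sed, and the file names (R1.fastq, R2.fastq, interleaved.fastq) and the two-record toy data are placeholders for illustration only.

```bash
# Hypothetical two-record example files (names and contents are placeholders)
printf '@r1 x:1\nACGT\n+\nIIII\n@r2 x:2\nTTGG\n+\nIIII\n' > R1.fastq
printf '@r1 x:1\nCCAA\n+\nIIII\n@r2 x:2\nGGTT\n+\nIIII\n' > R2.fastq

# Append -1 / -2 to every header line (line 1 of each 4-line record)
awk 'NR % 4 == 1 { print $0 "-1"; next } { print }' R1.fastq > R1.renamed.fastq
awk 'NR % 4 == 1 { print $0 "-2"; next } { print }' R2.fastq > R2.renamed.fastq

# Interleave: fold each 4-line record onto one tab-separated line,
# put mate 1 and mate 2 side by side, then unfold back to one field per line
paste - - - - < R1.renamed.fastq > R1.tab
paste - - - - < R2.renamed.fastq > R2.tab
paste R1.tab R2.tab | tr '\t' '\n' > interleaved.fastq
```

In this sketch the suffix is appended to the very end of the header line, so it lands after the space-separated description field, as in the head of the FASTQ file.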
Here is the head of my FASTQ file:
@SRR1693840.1 EXTGAIIX_101210:5:11105-1
TNCATCCAGGCGGTCCATTCGATCCATTAGGTCTAGCAAATGATCCCGACCAAGCTGCAATCCTAAAAGTGAAGGAAATTAAGAATGGAAGACTTGCTATGTTTGCCAT
+
B!B@BEFFFDGDGGGDHEHBDGADGGGGGGGEGGGGGGDDBGGGGFBHDHDFFBCD>8GGB8D<BBBB+GBDAADG@3DDD8?E@ABBD3B?;'?:::::?C8A8C####
@SRR1693840.1 EXTGAIIX_101210:5:11105-2
CTTCCATTAAATAAAGCAAAATGCCTTCAGTGAGATGTATTGTACATTCAAGATTGGATTAACAAAAAAGCATGAAAATGATCTGATTCATCACAGAGTTGGAGCCT
+
HDIIDGIFFGIDIFI>EGGGGG>G@>DE:EDGGE8GG@GDGGBG8GF=BFBGGGEG@EGECGGG>>DDDAD>GG8EG<DD8E8D-DBD88G@<GBDG<<>A########
@SRR1693840.2 EXTGAIIX_101210:5:11101-1
CNAAGCAACTATAGCCACTTTATTTTATTAAAAAAGCAAACATATATGATATGCTTTTTCTACATTATCATTACATATTGACTCTAATATCTGCTTCACAC
I then ran the alignment with the following commands:
```bash
conda activate tophat
tophat -o $outputDIR -p 4 $index_DIR/dgl_COI_bwt_index $RNAseq_leaves_r1,$RNAseq_leaves_r2 $RNAseq_roots_r1,$RNAseq_roots_r2
RNAseq_leaves_interleave=$RNAseq_files_DIR/SSRR1693840_Pisum_sativum_cv_Cameor_Leaves_paired_end_library_stage_B_Low-nitrate_Hydroponics_readNameChanged_interleave.fastq.gz
tophat -o $outputDIR1/ -p 16 $index_DIR/dgl_COI_bwt_index $RNAseq_leaves_interleave
# convert junction bed file from the first run:
junction_bedFile_PATH=$outputDIR1/junctions.bed
cat $junction_bedFile_PATH | python2 /NetScratch/cpichot/.conda/envs/tophat/bin/bed_to_juncs > $outputDIR1/junctions_parsed.bed
RNAseq_roots_interleave=$RNAseq_files_DIR/SRR1664818_Pisum_sativum_cv_Cameor_Root_system_paired_end_library_stage_A_High-nitrate_Hydroponics_readNameChanged_interleave.fastq.gz
tophat -j $outputDIR1/junctions_parsed.bed -o $outputDIR2/ -p 16 $index_DIR/dgl_COI_bwt_index $RNAseq_roots_interleave
```
Finally, I tried to filter the BAM files with the following commands:
```bash
# Filtering RNA-seq alignment:
conda activate samtools
align_results_bam_leaves_PATH=$outputDIR1/accepted_hits.bam
align_results_bam_root_PATH=$outputDIR2/accepted_hits.bam
samtools sort -n $align_results_bam_leaves_PATH > $outputDIR1/accepted_hits.s
samtools sort -n $align_results_bam_root_PATH > $outputDIR2/accepted_hits.s
# filter alignments with filterBam (this step does not work at the moment):
filterBam --uniq --paired --in $outputDIR1/accepted_hits.s --out $outputDIR1/accepted_hits.sf.bam
filterBam --uniq --paired --in $outputDIR2/accepted_hits.s --out $outputDIR2/accepted_hits.sf.bam
```
However, when I run the filtering step, filterBam fails with a segmentation fault.
Do you have any suggestions? Did I prepare the FASTQ file incorrectly?
Thanks in advance for your help.
Best regards,
Clement