Using many tophat2 bam files

Discussions about predicting genes with AUGUSTUS. Not covered here: WebAUGUSTUS and BRAKER1

Moderator: bioinf

katharina
Site Admin
Posts: 531
Joined: Wed Nov 18, 2015 6:14 pm
Location: Greifswald
Contact:

Post by katharina »

Originally posted in the old forum by Dews on 26.09.2014 - 17:20
Hi All,
I am trying to generate intron hints for a large number of bam files. I thought it would be best to merge the bam files first, then sort. However, the merged file is too large to sort, so I cannot proceed that way. I saw the suggestion to generate intron hints for each bam file and then combine the hints files.
Is it as simple as concatenating the many intron hints files into a single file before the first run of AUGUSTUS?
Thanks for the help/advice.

by katharina on 27.09.2014 - 13:26
It is better to merge the bam files before generating hints. However, if that is impossible, an easy option is to generate the hints from the separate bam files and concatenate the hints files. The disadvantage of doing so is that hints for the same feature may occur several times, and they will be treated as separate hints (whereas they would be a single hint with the mult=X tag in the last column if you had merged the bam files first). In practice, joining the hints files works well.
Katharina
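
To see how often the same hint shows up after concatenation, a small check along these lines may help. It is only a sketch: it assumes tab-separated, GFF-style hint lines, and the sequence name, source, coordinates and attributes in the sample lines are made up for illustration.

```python
from collections import Counter

def repeated_hints(hint_lines):
    """Return {hint line: count} for hints that occur more than once."""
    counts = Counter(
        line.rstrip("\n")
        for line in hint_lines
        if line.strip() and not line.startswith("#")  # skip blanks and comments
    )
    return {hint: n for hint, n in counts.items() if n > 1}

# Illustrative hint lines, as if concatenated from two per-bam hints files
concatenated = [
    "chr1\tb2h\tintron\t100\t200\t0\t+\t.\tpri=4;src=E\n",
    "chr1\tb2h\tintron\t300\t400\t0\t-\t.\tpri=4;src=E\n",
    "chr1\tb2h\tintron\t100\t200\t0\t+\t.\tpri=4;src=E\n",
]
for hint, n in repeated_hints(concatenated).items():
    print(n, hint, sep="\t")
```

Each hint reported here would be treated as several separate hints rather than as one hint with a mult=X tag.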

by Dews on 27.09.2014 - 15:00
Thanks Katharina! Is it reasonable for me to write a Perl script that adds the mult=X tags for identical hints from different files?
Another option I considered was to sort and filter each bam file and then attempt to combine all the bam files, then sort and filter again. I am afraid I'm missing something that would make this a bad idea.

by katharina on 27.09.2014 - 15:50
It shouldn't be a problem to filter the bam files separately, as long as, for each read pair, the matches of both reads are in the same bam file.
I think it's feasible to combine the hints with a Perl script, yes. For intron hints, that should be easy. For exonpart hints, it is a more complicated story. The result will still not be the same as if you had converted the big bam file to a wig. (There is a smoothing parameter involved.)
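
For the intron case, the combining step could look roughly like the sketch below (written in Python rather than Perl). It rests on assumptions: hint lines are tab-separated with nine GFF-style columns, the attributes in the last column are separated by `;`, a hint without a mult tag counts as mult=1, and the sample values are invented. Exonpart and other hint types are skipped on purpose.

```python
import re

MULT_RE = re.compile(r"(?:^|;)\s*mult=(\d+)")

def merge_intron_hints(lines):
    """Collapse identical intron hints and sum their multiplicities."""
    merged = {}  # key -> [first eight columns seen, summed multiplicity]
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "intron":
            continue  # this sketch drops everything except intron hints
        match = MULT_RE.search(cols[8])
        mult = int(match.group(1)) if match else 1  # assumed: no tag means 1
        # Keep the other attributes (e.g. pri, src) as part of the identity
        rest = ";".join(
            a.strip() for a in cols[8].split(";")
            if a.strip() and not a.strip().startswith("mult=")
        )
        key = (cols[0], cols[3], cols[4], cols[6], rest)
        if key in merged:
            merged[key][1] += mult
        else:
            merged[key] = [cols[:8], mult]
    out = []
    for (_, _, _, _, rest), (cols, mult) in merged.items():
        tag = f"mult={mult}" if mult > 1 else ""
        attrs = ";".join(filter(None, [tag, rest]))
        out.append("\t".join(cols + [attrs]))
    return out

# Illustrative input: the same intron from two hints files, plus a unique one
example = [
    "chr1\tb2h\tintron\t100\t200\t0\t+\t.\tpri=4;src=E\n",
    "chr1\tb2h\tintron\t100\t200\t0\t+\t.\tmult=3;pri=4;src=E\n",
    "chr1\tb2h\tintron\t300\t400\t0\t-\t.\tpri=4;src=E\n",
]
for hint in merge_intron_hints(example):
    print(hint)
```

Two hints count as identical here when sequence, start, end, strand and the remaining attributes all match; whether that is the right notion of identity for a given hints file is worth checking against the actual data.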

by Dews on 27.09.2014 - 20:32
Thank you once again. I am only creating intron hints files right now, so I think I've successfully combined the first iteration of hints files. I'm now running the first AUGUSTUS run.
I am afraid I've missed something now. Where do we convert the big bam file to a wig file?

by katharina on 28.09.2014 - 14:48
You only need a wig file for creating exonpart hints. So don't worry about it, since you are only doing intron hints.