BRAKER, Galba, & TSEBRA Book Chapter Data (2025)
We provide example files for running BRAKER and Galba with data from Drosophila melanogaster:
- genome.fa.gz (42 MB) for all pipelines [1]
- proteins.fa.gz (32 MB) for Galba, BRAKER2, & BRAKER3 [2]
- Arthropoda.fa.gz (1.2 GB) for BRAKER2 & BRAKER3 [1]
- isoseq.bam (49 MB) for BRAKER3 with Iso-Seq [3]
- hints.gff.gz (647 KB) for BRAKER1 [1]
- rnaseq.bam (1.9 GB) for BRAKER1 & BRAKER3 [1]
- file1_1.fastq.gz, file1_2.fastq.gz (1.5 GB, 1.6 GB) for BRAKER3 [1]
After downloading, unpack all files ending in .gz with the gunzip tool before running BRAKER or Galba.
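Where gunzip is not available, or the download is scripted, the unpacking step could be sketched in Python with the standard library alone. This is a minimal sketch, not part of BRAKER or Galba: the helper name gunzip_all is hypothetical, and it assumes all downloaded files sit in one directory. It mirrors gunzip's default behaviour of removing each compressed file once it has been unpacked.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(directory="."):
    """Decompress every *.gz file in `directory` (hypothetical helper).

    Returns the list of paths of the decompressed files.
    """
    unpacked = []
    for gz_path in sorted(Path(directory).glob("*.gz")):
        # Drop only the final ".gz" suffix, e.g. genome.fa.gz -> genome.fa
        out_path = gz_path.with_suffix("")
        # Copy in chunks so that large files (such as the 1.2 GB
        # Arthropoda.fa.gz) are never held in memory all at once.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Like gunzip without -k, remove the compressed file afterwards.
        gz_path.unlink()
        unpacked.append(out_path)
    return unpacked
```

Calling gunzip_all(".") in the download directory would leave genome.fa, proteins.fa, and so on in place of the .gz files. Files that are not gzip-compressed, such as rnaseq.bam and isoseq.bam, are not touched.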
Data Sources
The files hosted here are described in the following sources:
[1] Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M., & Borodovsky, M. (2021). BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics, 3(1), lqaa108.
[2] Brůna, T., Li, H., Guhlin, J., Honsel, D., Herbold, S., Stanke, M., ... & Hoff, K. J. (2023). Galba: genome annotation with miniprot and AUGUSTUS. BMC Bioinformatics, 24(1), 327. (For hints.gff.gz, rnaseq.bam, reads_1.fq.gz, and reads_2.fq.gz, we chose the single library SRR19416937 to speed up practice runs.)
[3] Križanović, K., Echchiki, A., Roux, J., & Šikić, M. (2018). Evaluation of tools for long read RNA-seq splice-aware alignment. Bioinformatics, 34(5), 748-754. (We prepared dataset 6, the error-corrected PacBio CCS reads from the older chemistry, with minimap2 and samtools.)