I still spend all of my time developing BBTools. However, due to internal politics, the JGI website has changed to not feature any JGI-developed bioinformatics software (BBTools, MetaHipmer, etc). Feel free to send an email to JGI's director (nmouncey@lbl.gov) if you think it is helpful for JGI's users to have bioinformatics software such as BBTools on its website, because helping users is supposed to be our goal.
Found the problem in the parser; it will be fixed in 39.20.
OK! I'll look into it and get back to you.
That's odd; looks to me like it should be working. What version are you using?
39.15 is out now.
reformat.sh producing output with 4 reads having the same id
Hmmm... looks like the issue here is that the intermediate fastq file is not really interleaved. Sam/bam files are not interleaved in the first place and should never be treated as such; the order of records is arbitrary and reformatting them as fastq (with samtools) doesn't change that. If you force Reformat to process a noninterleaved file as interleaved, strange things will happen; in this case, the records for that pair are nonadjacent and that causes the header replication, as per the sam specification...
Issues when reading IDs with UMIs
All fixed; this will be released in BBTools 39.15, along with the new flags "umi" and "umisubs", so that you can require that reads be classified as duplicates only if their UMIs match.
Thanks for this report... JGI doesn't use UMIs so I haven't seen them before in Illumina reads. I've reproduced the error and am modifying my header parsers to support UMIs, so that will work correctly in the next release.
When I wrote that, PacBio did not have paired reads. They have a new sequencing machine now for short reads that I think does produce pairs but I have not seen any data for it so I'm not sure of the header structure.
The problem here is that the read headers differ in two places. Normally, Illumina uses one of these two formats: "@stuff/1" and "@stuff/2", or "@stuff 1:morestuff" and "@stuff 2:morestuff". Of these, the /1 and /2 format is obsolete for Illumina as far as I know, though Complete Genomics / BGI are adopting it. My effort to determine pairing is based on observation of Illumina data, since there is no formal fastq specification regarding pair naming conventions, and they usually put the read identifier in the "optional description"....
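As a rough illustration of the two naming conventions described above, here is a minimal Python sketch that splits a header into a base name and a pair number. The function names are hypothetical and this is not BBTools' actual parser; it only handles the two formats mentioned in the post.

```python
def split_pair_header(header):
    """Return (base_name, pair_number); pair_number is None if not recognized.

    Handles only the two conventions from the post:
      "@stuff/1"            (older style)
      "@stuff 1:morestuff"  (newer style, pair number in the description)
    """
    name = header[1:] if header.startswith("@") else header
    first, _, rest = name.partition(" ")
    # Newer style: the description after the first space starts with "1:" or "2:".
    if rest[:2] in ("1:", "2:"):
        return first, int(rest[0])
    # Older style: the name itself ends with "/1" or "/2".
    if first.endswith(("/1", "/2")):
        return first[:-2], int(first[-1])
    return first, None


def is_pair(header1, header2):
    """True if the two headers look like read 1 and read 2 of the same pair."""
    base1, num1 = split_pair_header(header1)
    base2, num2 = split_pair_header(header2)
    return base1 == base2 and (num1, num2) == (1, 2)
```

Headers that differ anywhere besides the pair number (for example, in the base name as well) will not be recognized as a pair by this sketch, which mirrors the kind of failure described in the post.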
Good suggestion; I'm opening a new process for bgzip and piping the input. Shouldn't be too hard to catch the error code.
I was able to replicate this behavior and it is fixed for the next release (39.05, probably this week). Sometimes I don't notice this kind of issue because I always keep paired reads interleaved in a single file.
Hi Pierre, Sorry about that, those two and a couple other versions of CrisprFinder made it into the release accidentally. Only CrisprFinder.java should be there. I'll delete them for the next release. Thanks for notifying me! -Brian
Have you retried this recently? There was a problem with the configuration of the servers for a few weeks which should have been resolved on ~Jan 6th, but I'm not 100% sure it's fixed for everyone. Also, can you tell me which version of Java and BBTools you are using?
I'm not sure why I wrote the last line like that. I changed it to: return ((rka1.kmerMinusStrand==rkb1.kmerMinusStrand) && (rka2.kmerMinusStrand==rkb2.kmerMinusStrand)); ...which fixes the problem. That will be released soon in 38.71. Sorry for the delay!
I have fixed this in the current release (38.70), using jnylander's suggestion. Thanks!
I added a "seed" flag for the next release (38.71), which should make the program deterministic. The default seed is -1, and negative seeds will still produce nondeterministic output. Positive seeds will set the random number generator into a specific state and produce deterministic output as long as the same seed is used.
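The seed behavior described above can be sketched in Python as follows. This is only an illustration of the convention (negative means unseeded, otherwise fixed state); the function names are hypothetical, the real tool is written in Java, and treating a seed of 0 as deterministic is an assumption of this sketch.

```python
import random


def make_rng(seed=-1):
    """Return a random number generator following the convention in the post.

    Negative seed: generator is seeded from system entropy (nondeterministic).
    Otherwise: generator starts in a fixed state (deterministic).
    Treating 0 as deterministic is an assumption of this sketch.
    """
    if seed < 0:
        return random.Random()
    return random.Random(seed)


def sample_read_indices(total_reads, sample_size, seed=-1):
    """Pick sample_size distinct read indices out of total_reads."""
    rng = make_rng(seed)
    return sorted(rng.sample(range(total_reads), sample_size))
```

Calling `sample_read_indices(1000, 10, seed=7)` twice returns the same indices both times, while leaving the seed at -1 will generally give a different subset on each call.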
Hi - I'm really sorry about that, but BBMap does not support Wheat as it has a chromosome longer than 500Mbp, the current limit. It's the only organism I'm aware of that has this issue. I'll try to clarify the error message. You could break the chromosome at the centromere, but you're probably better off using a different aligner.
java.util.zip.ZipException: too many length or distance symbols
Hi Nick, The problem here is running multiple bbmap indexing jobs simultaneously in the same directory. If you want to use the same index for multiple processes, then index once (e.g. "bbmap.sh ref=contigs.fasta"), wait for it to finish, and then run subsequent mapping jobs for that reference. You can run as many mapping processes as you want simultaneously. Or, if you have multiple references, index them in different directories, or use the "nodisk" flag so that no index is written to disk. If you...
java.lang.ArrayIndexOutOfBoundsException and hang
BBMap v38.38: problem with parameter "phist" in bbduk
Hi Martin, I originally developed polymer-tracking to deal with a certain kind of NovaSeq error mode (analyzing and trimming poly-C from dark cycles) and it looks like I did not adequately test phist outside of that context. The phist report is fine with the -da flag; I've disabled those unnecessary assertions, so the next release (38.39, later today) will run fine with assertions enabled. Thanks for your report!
38.37 is now released, with this fixed.
Thanks for the report, Sidney! Indeed, the qin flag got broken (it was being reset by autodetection). I've fixed it now and the fixed version will be released later today in v38.37. The problem in this case (apart from the broken qin flag) is that the quality scores go out of the expected range of 0-41, causing them to be autodetected as ASCII-64. So for that particular file you will still need to use the flag "qin=33" or "ignorebadquality".
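To show why out-of-range scores can mislead offset autodetection, here is a minimal Python sketch of one possible heuristic. It is an assumption for illustration, not the logic BBTools uses; the function names and the cutoff of 59 (the lowest character any ASCII-64 variant can produce) are choices made for this sketch.

```python
def guess_quality_offset(quality_strings, max_expected_q=41):
    """Guess whether quality strings use ASCII-33 or ASCII-64 encoding."""
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters to inspect")
    # Characters below 59 cannot occur in any ASCII-64 variant.
    if min(codes) < 59:
        return 33
    # Under ASCII-33 these would exceed the expected maximum score,
    # so the heuristic assumes ASCII-64 instead.
    if max(codes) > 33 + max_expected_q:
        return 64
    return 33


def decode_qualities(quality, qin=None):
    """Decode one quality string; an explicit qin overrides autodetection."""
    offset = qin if qin is not None else guess_quality_offset([quality])
    return [ord(ch) - offset for ch in quality]
```

With this heuristic, a string such as "JKLM" is guessed as ASCII-64 because it would decode to scores above 41 under ASCII-33. Passing `qin=33` explicitly skips the guess, which is the role the "qin=33" flag plays in the post.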
The new version is available now.
Problem running taxtree
The problem is that NCBI added a new taxonomic level, "subcohort", which they did not previously use (this happens occasionally). I just put in support for "subcohort" in the code and BBTools 38.36 can handle the latest taxonomy data fine; I'll release it in a little while.
Hi Stephane, I didn't notice at first, but looks like you misspelled "threads" :) Normally I use "t=24" so I don't risk misspelling.
Hi Stephane, This is not surprising; the reads are clumped by read 1, and read 2 is just along for the ride, getting put out in the same order as read 1. As such, when you have 2 files, file 1 will compress much better when you have variable insert size. I had not personally noticed this since I work with interleaved files. You do not want to use "unpair" and "repair" unless you are doing error-correction. If you ARE doing error-correction, then yes, add those flags. Honestly, I'm not entirely sure...
Bug in call variants with relation to sam files created from minimap2
Hi Kim, CallVariants requires either cigar strings in the modern format (with = and X for match and mismatch, instead of M) or else the reads must have MD tags. I have not used minimap but it may have a way to either produce cigar strings with = and X, or MD tags. I don't see it in the documentation, though. I plan to change CallVariants in the near future so that MD tags are not required. -Brian
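The requirement above can be checked on a sam record before variant calling. The following Python sketch is an illustration only (hypothetical function names, not CallVariants' code): it tests whether a cigar string uses = and X rather than M, and falls back to looking for an MD tag among the optional fields.

```python
import re

# One cigar operation: a length followed by an operator from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar):
    """Parse a cigar string into a list of (length, operator) tuples."""
    if cigar == "*":
        return []
    ops = _CIGAR_OP.findall(cigar)
    # Reject strings containing anything the pattern did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError("malformed cigar string: " + cigar)
    return [(int(n), op) for n, op in ops]


def has_explicit_matches(cigar):
    """True if the cigar uses = and/or X and never the ambiguous M."""
    ops = {op for _, op in cigar_ops(cigar)}
    return "M" not in ops and bool(ops & {"=", "X"})


def mismatches_recoverable(cigar, optional_fields):
    """True if mismatches can be found from the cigar alone or from an MD tag."""
    has_md = any(field.startswith("MD:Z:") for field in optional_fields)
    return has_explicit_matches(cigar) or has_md
```

A record with cigar "100M" and no MD tag fails this check, which corresponds to the situation described in the post.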
Hi Tunzi, It looks like the problem is possibly that you are running out of memory....
Hello Niranjan, You can use Dedupe like this: dedupe.sh in1=read1.fq in2=read2.fq...
Hi Damon, BBMap is a global aligner and will penalize reads for going off the end...
"filterbyname" by default requires an exact name match. In fastq format, this line:...
Hi! Apologies for not noticing this earlier; I mainly watch SeqAnswers. Would it...
You can produce PacBio reads like this: bbmap.sh ref=reference.fasta randomreads.sh...
Synthetic reads will have names like this: @0_chr1_0_102714249_102714348_993561_chromosome_17...
Hi Davide, The current intent of the kmask flag is to only mask regions matching...
Sorry for not responding sooner, I didn't notice this! These quality thresholds are...
Explanation of low minq and maxq values (high error rates)
See this link: http://en.wikipedia.org/wiki/Phred_quality_score The formula is: Q=-10*log10(P)...
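For reference, the formula Q=-10*log10(P) and its inverse can be written as a small Python sketch (the function names are hypothetical):

```python
import math


def error_prob_to_phred(p):
    """Convert an error probability P (0 < P <= 1) to a Phred quality score Q."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def phred_to_error_prob(q):
    """Convert a Phred quality score Q back to an error probability P."""
    return 10 ** (-q / 10)
```

For example, an error probability of 0.001 corresponds to Q30, and Q20 corresponds to an error probability of 0.01, so low quality values mean high error rates.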