brian-jgi Activity

Activity for Brian Bushnell

7 months ago
Brian Bushnell posted a comment on ticket #74

I still spend all of my time developing BBTools. However, due to internal politics, the JGI website has changed to not feature any JGI-developed bioinformatics software (BBTools, MetaHipmer, etc). Feel free to send an email to JGI's director (nmouncey@lbl.gov) if you think it is helpful for JGI's users to have bioinformatics software such as BBTools on its website, because helping users is supposed to be our goal.
8 months ago
Brian Bushnell posted a comment on ticket #75

Found the problem in the parser; it will be fixed in 39.20.
8 months ago
Brian Bushnell posted a comment on ticket #75

OK! I'll look into it and get back to you.
8 months ago
Brian Bushnell posted a comment on ticket #75

That's odd; looks to me like it should be working. What version are you using?
10 months ago
Brian Bushnell posted a comment on ticket #71

39.15 is out now.
10 months ago
Brian Bushnell modified ticket #70

reformat.sh producing output with 4 reads having the same id
10 months ago
Brian Bushnell posted a comment on ticket #70

Hmmm... looks like the issue here is that the intermediate fastq file is not really interleaved. Sam/bam files are not interleaved in the first place and should never be treated as such; the order of records is arbitrary and reformatting them as fastq (with samtools) doesn't change that. If you force Reformat to process a noninterleaved file as interleaved, strange things will happen; in this case, the records for that pair are nonadjacent and that causes the header replication, as per the sam specification...
10 months ago
Brian Bushnell modified ticket #71

Issues when reading IDs with UMIs
10 months ago
Brian Bushnell posted a comment on ticket #71

All fixed; will be released in BBTools 39.15, along with the new flags "umi" and "umisubs", so that you can require reads to be classified as duplicates only if their UMIs match.
10 months ago
Brian Bushnell posted a comment on ticket #71

Thanks for this report... JGI doesn't use UMIs so I haven't seen them before in Illumina reads. I've duplicated the error and am modifying my header parsers to support UMIs, so that will work correctly in the next release.
1 year ago
Brian Bushnell posted a comment on ticket #69

When I wrote that, PacBio did not have paired reads. They have a new sequencing machine now for short reads that I think does produce pairs but I have not seen any data for it so I'm not sure of the header structure.
1 year ago
Brian Bushnell modified a comment on ticket #69

The problem here is that the read headers differ in two places. Normally, Illumina uses one of these two formats: @stuff/1 @stuff/2 or @stuff 1:morestuff @stuff 2:morestuff Of these, the /1 and /2 is obsolete for Illumina as far as I know, though Complete Genomics / BGI are adopting it. My effort to determine pairing is based on observation of Illumina data since there is no formal fastq specification regarding pair naming conventions, and they usually put the read identifier in the "optional description"....
1 year ago
Brian Bushnell posted a comment on ticket #69

The problem here is that the reads differ in two places. Normally, Illumina uses one of these two formats: @stuff/1 @stuff/2 or @stuff 1:morestuff @stuff 2:morestuff Of these, the /1 and /2 is obsolete for Illumina as far as I know, though Complete Genomics / BGI are adopting it. My effort to determine pairing is based on observation of Illumina data since there is no formal fastq specification regarding pair naming conventions, and they usually put the read identifier in the "optional description"....
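The two naming conventions described above can be sketched as a small pair check. This is only an illustration in Python, not the header parser BBTools actually uses; the function names `parse_header` and `is_pair` are hypothetical.

```python
import re

# "@stuff/1" and "@stuff/2": read number after a trailing slash.
_SLASH = re.compile(r"^(?P<name>\S+)/(?P<read>[12])$")
# "@stuff 1:morestuff" and "@stuff 2:morestuff": read number starts the description.
_SPACE = re.compile(r"^(?P<name>\S+)\s+(?P<read>[12]):(?P<rest>.*)$")


def parse_header(header):
    """Return (name, read_number) for a recognized header, else None."""
    h = header[1:] if header.startswith("@") else header
    h = h.strip()
    for pattern in (_SLASH, _SPACE):
        m = pattern.match(h)
        if m:
            return m.group("name"), int(m.group("read"))
    return None


def is_pair(header1, header2):
    """True if the headers share a name and are read 1 followed by read 2."""
    a, b = parse_header(header1), parse_header(header2)
    return (a is not None and b is not None
            and a[0] == b[0] and a[1] == 1 and b[1] == 2)
```

Under this sketch, `@stuff/1` with `@stuff/2` and `@stuff 1:N:0:1` with `@stuff 2:N:0:1` both count as pairs, while headers that mix the two conventions (and so differ in two places) do not.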
2 years ago
Brian Bushnell posted a comment on ticket #64

Good suggestion; I'm opening a new process for bgzip and piping the input. Shouldn't be too hard to catch the error code.
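The general idea of running a decompressor as a child process, reading its piped output, and checking the exit code might look like the sketch below. It is written in Python for illustration only and is not the BBTools implementation; the function name `read_through_decompressor` is hypothetical.

```python
import subprocess


def read_through_decompressor(command):
    """Run `command`, return its stdout bytes, and raise on a nonzero exit code."""
    proc = subprocess.Popen(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()  # reads both pipes, then waits for exit
    if proc.returncode != 0:
        message = err.decode(errors="replace").strip()
        raise RuntimeError(
            f"{command[0]} exited with code {proc.returncode}: {message}")
    return out
```

A call such as `read_through_decompressor(["bgzip", "-dc", "reads.fq.gz"])` would then fail loudly on a corrupt file instead of silently returning truncated data; the file name here is just a placeholder.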
2 years ago
Brian Bushnell posted a comment on ticket #63

I was able to replicate this behavior and it is fixed for the next release (39.05, probably this week). Sometimes I don't notice this kind of issue because I always keep paired reads interleaved in a single file.
2 years ago
Brian Bushnell posted a comment on ticket #60

Hi Pierre, Sorry about that, those two and a couple other versions of CrisprFinder made it into the release accidentally. Only CrisprFinder.java should be there. I'll delete them for the next release. Thanks for notifying me! -Brian
4 years ago
Brian Bushnell posted a comment on ticket #51

Have you retried this recently? There was a problem with the configuration of the servers for a few weeks which should have been resolved on ~Jan 6th, but I'm not 100% sure it's fixed for everyone. Also, can you tell me which version of Java and BBTools you are using?
6 years ago
Brian Bushnell posted a comment on ticket #18

I'm not sure why I wrote the last line like that. I changed it to: return ((rka1.kmerMinusStrand==rkb1.kmerMinusStrand) && (rka2.kmerMinusStrand==rkb2.kmerMinusStrand)); ...which fixes the problem. That will be released soon in 38.71. Sorry for the delay!
6 years ago
Brian Bushnell posted a comment on ticket #23

I have fixed this in the current release (38.70), using jnylander's suggestion. Thanks!
6 years ago
Brian Bushnell posted a comment on ticket #24

I added a "seed" flag for the next release (38.71), which should make the program deterministic. The default seed is -1, and negative seeds will still produce nondeterministic output. Positive seeds will set the random number generator into a specific state and produce deterministic output as long as the same seed is used.
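The seed convention described above can be sketched as follows. This is a Python illustration of the idea, not the BBTools code; `make_rng` is a hypothetical name, and treating a seed of 0 as deterministic is an assumption, since the comment only mentions negative and positive seeds.

```python
import random


def make_rng(seed=-1):
    """Return a random number generator following the described convention."""
    if seed < 0:
        # Negative seed (the default): seeded from system entropy,
        # so output differs from run to run.
        return random.Random()
    # Non-negative seed: fixed starting state, reproducible output.
    return random.Random(seed)
```

Two generators built with the same non-negative seed produce the same sequence, which is what makes repeated runs comparable.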
6 years ago
Brian Bushnell posted a comment on ticket #25

Hi - I'm really sorry about that, but BBMap does not support Wheat as it has a chromosome longer than 500Mbp, the current limit. It's the only organism I'm aware of that has this issue. I'll try to clarify the error message. You could break the chromosome at the centromere, but you're probably better off using a different aligner.
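To check a reference for this before mapping, a pre-scan along these lines could help. It is a Python sketch, not part of BBTools; `oversized_sequences` is a hypothetical name, and the default limit of exactly 500,000,000 bases is an assumption based on the "500Mbp" figure above.

```python
def oversized_sequences(fasta_lines, limit=500_000_000):
    """Return {sequence_name: length} for FASTA sequences longer than `limit`."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return {n: length for n, length in lengths.items() if length > limit}
```

Passing an open file handle (for example `oversized_sequences(open("ref.fa"))`, where `ref.fa` is a placeholder) lists any sequences that would need to be split.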
7 years ago
Brian Bushnell modified ticket #13

java.util.zip.ZipException: too many length or distance symbols
7 years ago
Brian Bushnell posted a comment on ticket #13

Hi Nick, The problem here is running multiple bbmap indexing jobs simultaneously in the same directory. If you want to use the same index for multiple processes, then index once (e.g. "bbmap.sh ref=contigs.fasta"), wait for it to finish, and then run subsequent mapping jobs for that reference. You can run as many mapping processes as you want simultaneously. Or, if you have multiple references, index them in different directories, or use the "nodisk" flag so that no index is written to disk. If you...
7 years ago
Brian Bushnell modified ticket #15

java.lang.ArrayIndexOutOfBoundsException and hang
7 years ago
Brian Bushnell modified ticket #16

BBMap v38.38: problem with parameter "phist" in bbduk
7 years ago
Brian Bushnell posted a comment on ticket #16

Hi Martin, I originally developed polymer-tracking to deal with a certain kind of NovaSeq error mode (analyzing and trimming poly-C from dark cycles) and it looks like I did not adequately test phist outside of that context. The phist report is fine with the -da flag; I've disabled those unnecessary assertions, so the next release (38.39, later today) will run fine with assertions enabled. Thanks for your report!
7 years ago
Brian Bushnell posted a comment on ticket #15

38.37 is now released, with this fixed.
7 years ago
Brian Bushnell posted a comment on ticket #15

Thanks for the report, Sidney! Indeed, the qin flag got broken (it was being reset by autodetection). I've fixed it now and the fixed version will be released later today in v38.37. The problem in this case (apart from the broken qin flag) is that the quality scores go out of the expected range of 0-41, causing them to be autodetected as ASCII-64. So for that particular file you will still need to use the flag "qin=33" or "ignorebadquality".
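A rough sketch of how this kind of misdetection can arise is shown below. It is a simplified heuristic in Python, not the autodetection logic BBTools uses; `guess_quality_offset` is a hypothetical name, and the sketch assumes quality scores are never negative.

```python
def guess_quality_offset(quality_lines, max_expected_q=41):
    """Guess 33 or 64 from raw FASTQ quality strings; None if ambiguous."""
    codes = [ord(c) for line in quality_lines for c in line]
    if not codes:
        return None
    lo, hi = min(codes), max(codes)
    if lo < 64:
        # Characters below '@' would be negative scores under ASCII-64.
        return 33
    if hi > 33 + max_expected_q:
        # Too high for ASCII-33 if scores are expected to stay within 0-41.
        return 64
    return None
```

With this heuristic, an ASCII-33 file whose scores run from Q31 to Q45 (for example the quality string `@@NN`) is guessed as ASCII-64, which is why an explicit "qin=33" is needed for such a file.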
7 years ago
Brian Bushnell posted a comment on ticket #14

The new version is available now.
7 years ago
Brian Bushnell modified ticket #14

Problem running taxtree
7 years ago
Brian Bushnell posted a comment on ticket #14

The problem is that NCBI added a new taxonomic level, "subcohort", which they did not previously use (this happens occasionally). I just put in support for "subcohort" in the code and BBTools 38.36 can handle the latest taxonomy data fine; I'll release it in a little while.
7 years ago
Brian Bushnell posted a comment on ticket #10

Hi Stephane, I didn't notice at first, but looks like you misspelled "threads" :) Normally I use "t=24" so I don't risk misspelling.
7 years ago
Brian Bushnell posted a comment on ticket #11

Hi Stephane, This is not surprising; the reads are clumped by read 1, and read 2 is just along for the ride, getting put out in the same order as read 1. As such, when you have 2 files, file 1 will compress much better when you have variable insert size. I had not personally noticed this since I work with interleaved files. You do not want to use "unpair" and "repair" unless you are doing error-correction. If you ARE doing error-correction, then yes, add those flags. Honestly, I'm not entirely sure...
8 years ago
Brian Bushnell modified ticket #6

Bug in call variants with relation to sam files created from minimap2
8 years ago
Brian Bushnell posted a comment on ticket #6

Hi Kim, CallVariants requires either cigar strings in the modern format (with = and X for match and mismatch, instead of M) or else the reads must have MD tags. I have not used minimap but it may have a way to either produce cigar strings with = and X, or MD tags. I don't see it in the documentation, though. I plan to change CallVariants in the near future so that MD tags are not required. -Brian
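A quick way to check whether a SAM record meets that requirement is sketched below in Python. It is a pre-check for illustration, not CallVariants' own logic; `has_match_mismatch_detail` is a hypothetical name. It relies only on the SAM layout, where CIGAR is the sixth tab-separated field and optional tags start at the twelfth.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def has_match_mismatch_detail(sam_line):
    """True if the record uses =/X instead of M in its CIGAR, or has an MD tag."""
    fields = sam_line.rstrip("\n").split("\t")
    ops = {op for _, op in cigar_ops(fields[5])}
    extended_cigar = bool(ops) and "M" not in ops
    has_md_tag = any(f.startswith("MD:Z:") for f in fields[11:])
    return extended_cigar or has_md_tag
```

Records for which this returns False carry neither the =/X CIGAR operations nor an MD tag, so match and mismatch positions cannot be recovered from the record alone.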
9 years ago
Brian Bushnell posted a comment on ticket #4

Hi Tunzi, It looks like the problem is possibly that you are running out of memory....
9 years ago
Brian Bushnell posted a comment on discussion General Discussion

Hello Niranjan, You can use Dedupe like this: dedupe.sh in1=read1.fq in2=read2.fq...
9 years ago
Brian Bushnell posted a comment on discussion General Discussion

Hi Damon, BBMap is a global aligner and will penalize reads for going off the end...
1 decade ago
Brian Bushnell posted a comment on discussion General Discussion

"filterbyname" by default requires an exact name match. In fastq format, this line:...
1 decade ago
Brian Bushnell posted a comment on discussion General Discussion

Hi! Apologies for not noticing this earlier; I mainly watch SeqAnswers. Would it...
1 decade ago
Brian Bushnell posted a comment on discussion General Discussion

You can produce PacBio reads like this: bbmap.sh ref=reference.fasta randomreads.sh...
1 decade ago
Brian Bushnell posted a comment on discussion General Discussion

Synthetic reads will have names like this: @0_chr1_0_102714249_102714348_993561_chromosome_17...
1 decade ago
Brian Bushnell posted a comment on ticket #2

Hi Davide, The current intent of the kmask flag is to only mask regions matching...
1 decade ago
Brian Bushnell posted a comment on discussion General Discussion

Sorry for not responding sooner, I didn't notice this! These quality thresholds are...
1 decade ago
Brian Bushnell modified ticket #1

Explanation of low minq and maxq values (high error rates)
1 decade ago
Brian Bushnell posted a comment on ticket #1

See this link: http://en.wikipedia.org/wiki/Phred_quality_score The formula is: Q=-10*log10(P)...
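As a worked illustration of the formula Q=-10*log10(P), here is a small Python sketch with its inverse. The function names are hypothetical and chosen only for this example.

```python
import math


def phred_from_error_prob(p):
    """Phred quality score for an error probability P: Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P must be in the range (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_phred(q):
    """Inverse of the formula: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)
```

For example, an error probability of 0.001 (one error per thousand bases) corresponds to Q30, and Q20 corresponds to an error probability of 0.01.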
