
Activity for Brian Bushnell

  • Brian Bushnell posted a comment on ticket #74

    I still spend all of my time developing BBTools. However, due to internal politics, the JGI website has changed to not feature any JGI-developed bioinformatics software (BBTools, MetaHipmer, etc). Feel free to send an email to JGI's director (nmouncey@lbl.gov) if you think it is helpful for JGI's users to have bioinformatics software such as BBTools on its website, because helping users is supposed to be our goal.

  • Brian Bushnell posted a comment on ticket #75

    Found the problem in the parser; it will be fixed in 39.20.

  • Brian Bushnell posted a comment on ticket #75

    OK! I'll look into it and get back to you.

  • Brian Bushnell posted a comment on ticket #75

    That's odd; looks to me like it should be working. What version are you using?

  • Brian Bushnell posted a comment on ticket #71

    39.15 is out now.

  • Brian Bushnell modified ticket #70

    reformat.sh producing output with 4 reads having the same id

  • Brian Bushnell posted a comment on ticket #70

    Hmmm... looks like the issue here is that the intermediate fastq file is not really interleaved. Sam/bam files are not interleaved in the first place and should never be treated as such; the order of records is arbitrary and reformatting them as fastq (with samtools) doesn't change that. If you force Reformat to process a noninterleaved file as interleaved, strange things will happen; in this case, the records for that pair are nonadjacent and that causes the header replication, as per the sam specification...

  • Brian Bushnell modified ticket #71

    Issues when reading IDs with UMIs

  • Brian Bushnell posted a comment on ticket #71

    All fixed; will be released in BBTools 39.15, along with the new flags "umi" and "umisubs", so that you can require that reads be classified as duplicates only if their UMIs match.

  • Brian Bushnell posted a comment on ticket #71

    Thanks for this report... JGI doesn't use UMIs so I haven't seen them before in Illumina reads. I've duplicated the error and am modifying my header parsers to support UMIs, so that will work correctly in the next release.

  • Brian Bushnell posted a comment on ticket #69

    When I wrote that, PacBio did not have paired reads. They have a new sequencing machine now for short reads that I think does produce pairs but I have not seen any data for it so I'm not sure of the header structure.

  • Brian Bushnell modified a comment on ticket #69

    The problem here is that the read headers differ in two places. Normally, Illumina uses one of these two formats: "@stuff/1" and "@stuff/2", or "@stuff 1:morestuff" and "@stuff 2:morestuff". Of these, the /1 and /2 form is obsolete for Illumina as far as I know, though Complete Genomics / BGI are adopting it. My effort to determine pairing is based on observation of Illumina data since there is no formal fastq specification regarding pair naming conventions, and they usually put the read identifier in the "optional description"....

  • Brian Bushnell posted a comment on ticket #69

    The problem here is that the reads differ in two places. Normally, Illumina uses one of these two formats: "@stuff/1" and "@stuff/2", or "@stuff 1:morestuff" and "@stuff 2:morestuff". Of these, the /1 and /2 form is obsolete for Illumina as far as I know, though Complete Genomics / BGI are adopting it. My effort to determine pairing is based on observation of Illumina data since there is no formal fastq specification regarding pair naming conventions, and they usually put the read identifier in the "optional description"....
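
The two header conventions described in the comment above can be illustrated with a small Python sketch. This is not BBTools' actual parser: the function names `split_pair_header` and `is_pair` and the regular expressions are hypothetical, and the sketch assumes the pair number appears either as a trailing /1 or /2, or as the first character of the description after a single space.

```python
import re

# Hypothetical patterns for the two conventions mentioned in the comment:
#   "@stuff/1"            -> trailing /1 or /2 on the name
#   "@stuff 1:morestuff"  -> pair number at the start of the description
_SLASH = re.compile(r"^(?P<base>\S+)/(?P<num>[12])$")
_SPACE = re.compile(r"^(?P<base>\S+) (?P<num>[12]):")

def split_pair_header(header):
    """Return (base_name, pair_number); pair_number is None if not recognized."""
    name = header[1:] if header.startswith("@") else header
    for pattern in (_SLASH, _SPACE):
        m = pattern.match(name)
        if m:
            return m.group("base"), int(m.group("num"))
    return name, None

def is_pair(header1, header2):
    """True if the two headers share a base name and are numbered 1 and 2."""
    base1, num1 = split_pair_header(header1)
    base2, num2 = split_pair_header(header2)
    return base1 == base2 and num1 == 1 and num2 == 2
```

Headers that differ anywhere other than the pair number produce different base names under this scheme, so `is_pair` returns False for them.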

  • Brian Bushnell posted a comment on ticket #64

    Good suggestion; I'm opening a new process for bgzip and piping the input. Shouldn't be too hard to catch the error code.

  • Brian Bushnell posted a comment on ticket #63

    I was able to replicate this behavior and it is fixed for the next release (39.05, probably this week). Sometimes I don't notice this kind of issue because I always keep paired reads interleaved in a single file.

  • Brian Bushnell posted a comment on ticket #60

    Hi Pierre, Sorry about that, those two and a couple other versions of CrisprFinder made it into the release accidentally. Only CrisprFinder.java should be there. I'll delete them for the next release. Thanks for notifying me! -Brian

  • Brian Bushnell posted a comment on ticket #51

    Have you retried this recently? There was a problem with the configuration of the servers for a few weeks which should have been resolved on ~Jan 6th, but I'm not 100% sure it's fixed for everyone. Also, can you tell me which version of Java and BBTools you are using?

  • Brian Bushnell posted a comment on ticket #18

    I'm not sure why I wrote the last line like that. I changed it to: return ((rka1.kmerMinusStrand==rkb1.kmerMinusStrand) && (rka2.kmerMinusStrand==rkb2.kmerMinusStrand)); ...which fixes the problem. That will be released soon in 38.71. Sorry for the delay!

  • Brian Bushnell posted a comment on ticket #23

    I have fixed this in the current release (38.70), using jnylander's suggestion. Thanks!

  • Brian Bushnell posted a comment on ticket #24

    I added a "seed" flag for the next release (38.71), which should make the program deterministic. The default seed is -1, and negative seeds will still produce nondeterministic output. Positive seeds will set the random number generator into a specific state and produce deterministic output as long as the same seed is used.
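
The seed convention described above (negative means nondeterministic, otherwise reproducible) can be sketched in Python as follows. This only illustrates the convention and is not the BBTools implementation; `make_rng` is a hypothetical name, and treating a seed of 0 as deterministic is an assumption, since the comment only mentions negative and positive seeds.

```python
import random

def make_rng(seed=-1):
    """Return a random number generator following the described seed convention.

    Negative seed (the default, -1): seeded from system entropy, so output
    differs between runs. Non-negative seed: fixed state, so the same seed
    yields the same sequence. Treating 0 as deterministic is an assumption.
    """
    if seed < 0:
        return random.Random()      # nondeterministic
    return random.Random(seed)      # deterministic for a given seed

# Two generators built from the same non-negative seed produce the same draws.
a = make_rng(42)
b = make_rng(42)
same = [a.random() for _ in range(5)] == [b.random() for _ in range(5)]
```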

  • Brian Bushnell posted a comment on ticket #25

    Hi - I'm really sorry about that, but BBMap does not support Wheat as it has a chromosome longer than 500Mbp, the current limit. It's the only organism I'm aware of that has this issue. I'll try to clarify the error message. You could break the chromosome at the centromere, but you're probably better off using a different aligner.

  • Brian Bushnell modified ticket #13

    java.util.zip.ZipException: too many length or distance symbols

  • Brian Bushnell posted a comment on ticket #13

    Hi Nick, The problem here is running multiple bbmap indexing jobs simultaneously in the same directory. If you want to use the same index for multiple processes, then index once (e.g. "bbmap.sh ref=contigs.fasta"), wait for it to finish, and then run subsequent mapping jobs for that reference. You can run as many mapping processes as you want simultaneously. Or, if you have multiple references, index them in different directories, or use the "nodisk" flag so that no index is written to disk. If you...

  • Brian Bushnell modified ticket #15

    java.lang.ArrayIndexOutOfBoundsException and hang

  • Brian Bushnell modified ticket #16

    BBMap v38.38: problem with parameter "phist" in bbduk

  • Brian Bushnell posted a comment on ticket #16

    Hi Martin, I originally developed polymer-tracking to deal with a certain kind of NovaSeq error mode (analyzing and trimming poly-C from dark cycles) and it looks like I did not adequately test phist outside of that context. The phist report is fine with the -da flag; I've disabled those unnecessary assertions, so the next release (38.39, later today) will run fine with assertions enabled. Thanks for your report!

  • Brian Bushnell posted a comment on ticket #15

    38.37 is now released, with this fixed.

  • Brian Bushnell posted a comment on ticket #15

    Thanks for the report, Sidney! Indeed, the qin flag got broken (it was being reset by autodetection). I've fixed it now and the fixed version will be released later today in v38.37. The problem in this case (apart from the broken qin flag) is that the quality scores go out of the expected range of 0-41, causing them to be autodetected as ASCII-64. So for that particular file you will still need to use the flag "qin=33" or "ignorebadquality".
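
To see why out-of-range quality scores can be mistaken for ASCII-64, here is a simplified Python sketch of offset guessing. It is an illustrative heuristic, not BBTools' autodetection logic: the function names, the rule order, and the choice to fall back to 33 in ambiguous cases are all assumptions, and Solexa-style negative scores are ignored.

```python
def guess_offset(quality_lines, max_q=41):
    """Guess the ASCII offset (33 or 64) of FASTQ quality strings.

    Simplified heuristic (an assumption, not the BBTools rule):
      - any character below '@' (ASCII 64) cannot be ASCII-64, so return 33;
      - otherwise, any character above 33 + max_q would be an out-of-range
        score under ASCII-33, so return 64;
      - otherwise the data is ambiguous and 33 is returned as a default.
    """
    codes = [ord(c) for line in quality_lines for c in line]
    if not codes:
        raise ValueError("no quality characters to inspect")
    if min(codes) < 64:
        return 33
    if max(codes) > 33 + max_q:
        return 64
    return 33

def decode(quality, offset):
    """Convert a quality string to numeric scores using the given offset."""
    return [ord(c) - offset for c in quality]
```

Under this heuristic, an ASCII-33 file whose scores are all high and exceed 41 (for example the string "JJJN", which is Q41 to Q45 at offset 33) is guessed as ASCII-64, which is the kind of case where an explicit setting such as "qin=33" is needed.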

  • Brian Bushnell posted a comment on ticket #14

    The new version is available now.

  • Brian Bushnell modified ticket #14

    Problem running taxtree

  • Brian Bushnell posted a comment on ticket #14

    The problem is that NCBI added a new taxonomic level, "subcohort", which they did not previously use (this happens occasionally). I just put in support for "subcohort" in the code and BBTools 38.36 can handle the latest taxonomy data fine; I'll release it in a little while.

  • Brian Bushnell posted a comment on ticket #10

    Hi Stephane, I didn't notice at first, but looks like you misspelled "threads" :) Normally I use "t=24" so I don't risk misspelling.

  • Brian Bushnell posted a comment on ticket #11

    Hi Stephane, This is not surprising; the reads are clumped by read 1, and read 2 is just along for the ride, getting put out in the same order as read 1. As such, when you have 2 files, file 1 will compress much better when you have variable insert size. I had not personally noticed this since I work with interleaved files. You do not want to use "unpair" and "repair" unless you are doing error-correction. If you ARE doing error-correction, then yes, add those flags. Honestly, I'm not entirely sure...

  • Brian Bushnell modified ticket #6

    Bug in call variants with relation to sam files created from minimap2

  • Brian Bushnell posted a comment on ticket #6

    Hi Kim, CallVariants requires either cigar strings in the modern format (with = and X for match and mismatch, instead of M) or else the reads must have MD tags. I have not used minimap but it may have a way to either produce cigar strings with = and X, or MD tags. I don't see it in the documentation, though. I plan to change CallVariants in the near future so that MD tags are not required. -Brian
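
The requirement above can be checked on a SAM record before variant calling. The following Python sketch is a hypothetical pre-check, not part of CallVariants: the function names are made up, and it assumes standard tab-separated SAM fields with the CIGAR in column 6 and optional tags starting at column 12.

```python
import re

# CIGAR operations defined by the SAM specification.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_ops(cigar):
    """Parse a CIGAR string into a list of (length, operation) tuples."""
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]

def has_match_detail(sam_line):
    """True if the record distinguishes matches from mismatches.

    That is the case when the CIGAR contains no ambiguous 'M' operations
    (it uses '=' and 'X' instead) or when an MD tag is present.
    A CIGAR of '*' (no alignment) has no 'M' and also returns True.
    """
    fields = sam_line.rstrip("\n").split("\t")
    ops = {op for _, op in cigar_ops(fields[5])}
    has_md = any(f.startswith("MD:Z:") for f in fields[11:])
    return "M" not in ops or has_md

# Hypothetical records used only to exercise the check.
core = ["r1", "0", "chr1", "100", "60", "10M", "*", "0", "0",
        "ACGTACGTAC", "IIIIIIIIII"]
old_style = "\t".join(core)
with_md = "\t".join(core + ["NM:i:0", "MD:Z:10"])
modern = "\t".join(core[:5] + ["5=1X4="] + core[6:])
```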

  • Brian Bushnell posted a comment on ticket #4

    Hi Tunzi, It looks like the problem is possibly that you are running out of memory....

  • Brian Bushnell posted a comment on discussion General Discussion

    Hello Niranjan, You can use Dedupe like this: dedupe.sh in1=read1.fq in2=read2.fq...

  • Brian Bushnell posted a comment on discussion General Discussion

    Hi Damon, BBMap is a global aligner and will penalize reads for going off the end...

  • Brian Bushnell posted a comment on discussion General Discussion

    "filterbyname" by default requires an exact name match. In fastq format, this line:...

  • Brian Bushnell posted a comment on discussion General Discussion

    Hi! Apologies for not noticing this earlier; I mainly watch SeqAnswers. Would it...

  • Brian Bushnell posted a comment on discussion General Discussion

    You can produce PacBio reads like this: bbmap.sh ref=reference.fasta randomreads.sh...

  • Brian Bushnell posted a comment on discussion General Discussion

    Synthetic reads will have names like this: @0_chr1_0_102714249_102714348_993561_chromosome_17...

  • Brian Bushnell posted a comment on ticket #2

    Hi Davide, The current intent of the kmask flag is to only mask regions matching...

  • Brian Bushnell posted a comment on discussion General Discussion

    Sorry for not responding sooner, I didn't notice this! These quality thresholds are...

  • Brian Bushnell modified ticket #1

    Explanation of low minq and maxq values (high error rates)

  • Brian Bushnell posted a comment on ticket #1

    See this link: http://en.wikipedia.org/wiki/Phred_quality_score The formula is: Q=-10*log10(P)...
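
As a worked illustration of the formula quoted above, this Python sketch converts between an error probability P and a Phred score Q. The function names are hypothetical; the second function is simply the algebraic inverse, P = 10^(-Q/10).

```python
import math

def phred_from_error(p):
    """Q = -10 * log10(P), where P is the probability that a base call is wrong."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)

def error_from_phred(q):
    """Inverse of the formula above: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)
```

For example, an error probability of 0.001 corresponds to Q30, and Q20 corresponds to an error probability of 0.01, so low quality values imply high error rates.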
