

In-band deduplication for Btrfs

Posted Mar 14, 2016 15:29 UTC (Mon) by mathstuf (subscriber, #69389)
In reply to: In-band deduplication for Btrfs by nix
Parent article: In-band deduplication for Btrfs

I thought I read somewhere that someone had an object of 40 zeros and had some collision locally. Or was that just a potentiality?



In-band deduplication for Btrfs

Posted Mar 15, 2016 17:08 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

I've only heard of it as a possibility. Nobody has ever mentioned encountering a real collision, and frankly I'm not worried about one turning up in the foreseeable future.

In-band deduplication for Btrfs

Posted Aug 4, 2016 15:24 UTC (Thu) by JoeyUnknown (guest, #110181) [Link]

It's unlikely, but it still shouldn't be that difficult to read the data and compare the two blocks. Those who don't want the performance hit could turn that off, which would remove a dimension from the data structure (btree[hashkey]->bucket->blocks becomes btree[hashkey]->block).
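To make the two index layouts concrete, here is a minimal Python sketch. It is not how Btrfs implements deduplication: the `DedupStore` class, its `verify` flag, and the `hash_fn` parameter are hypothetical names invented for illustration, and a plain dict stands in for the btree. With `verify=True` each hash key maps to a bucket of blocks and candidates are compared byte for byte; with `verify=False` each key maps to a single block and the hash is trusted.

```python
import hashlib


class DedupStore:
    """Toy in-memory dedup index (hypothetical; a dict stands in for the btree)."""

    def __init__(self, verify=True, hash_fn=None):
        self.verify = verify
        # Default to SHA-256; a weaker hash_fn can be injected to force collisions.
        self.hash_fn = hash_fn or (lambda data: hashlib.sha256(data).digest())
        self.blocks = []   # "physical" block storage, indexed by block id
        self.index = {}    # hashkey -> list of block ids (verify) or one block id

    def _store(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1

    def write(self, data):
        """Store data, returning the id of the block that now backs it."""
        key = self.hash_fn(data)
        if self.verify:
            # index[hashkey] -> bucket -> blocks: compare bytes before sharing.
            bucket = self.index.setdefault(key, [])
            for block_id in bucket:
                if self.blocks[block_id] == data:
                    return block_id
            block_id = self._store(data)
            bucket.append(block_id)
            return block_id
        # index[hashkey] -> block: trust the hash, never read the old block back.
        if key not in self.index:
            self.index[key] = self._store(data)
        return self.index[key]

    def read(self, block_id):
        return self.blocks[block_id]


def weak_hash(data):
    """Deliberately bad hash (first byte only) so collisions are easy to trigger."""
    return data[:1]
```

Writing `b"abc"` and then `b"axy"` with `weak_hash` shows the difference: the verifying store hands back two distinct blocks, while the hash-only store maps the second write onto the first block, so a later read silently returns `b"abc"` in place of `b"axy"`. The verifying path pays for that safety with an extra read and compare on every hash hit.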

It should be a performance option. I can think of plenty of cases where a hash alone is fine for me. In some cases, however, I don't really want to play dice with my data. Beyond that, while the probability of a collision is low right now, things can happen in the future that might change that.

In some cases, depending on the scenario, I would rather have a system that performs worse than risk a bizarre hidden integrity failure that can make a heck of a mess. If there ever was a hash collision, chances are it wouldn't be detected. The data would just have to be rebuilt and repaired or something. It's just one less vector to worry about when it comes to big data where integrity is sensitive.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds