Add Collect Struct for building snapshots
We need a facility to collect the keys of key/value pairs and organize them into a snapshot. The first pass collects all the keys; the keys are sorted and duplicates are eliminated. Then all transactions are collected and an index table is built.
Collect is a struct that creates a snapshot from which any key/value pair can be extracted by key. Format of the snapshot:
8 bytes -- offset to index table
N transactions {
4 bytes -- length of TX M
M bytes -- the transaction
}
N Index Table Entries {
32 bytes -- hash of transaction
8 bytes -- offset in snapshot to transaction entry in transactions
}
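The layout above can be sketched in Go as follows. This is only an illustration, not the Collect implementation: `buildSnapshot` and `lookup` are hypothetical helpers, and the sketch assumes big-endian integers, index entries sorted by hash, and an index table that runs to the end of the snapshot. None of these details are stated in the format description.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

const (
	hashLen  = 32          // bytes of hash per index entry
	entryLen = hashLen + 8 // hash plus 8-byte offset
)

// buildSnapshot (hypothetical) lays out transactions followed by the index table.
// Assumption: hashes are already sorted and hashes[i] belongs to txs[i].
func buildSnapshot(hashes [][hashLen]byte, txs [][]byte) []byte {
	var buf bytes.Buffer
	buf.Write(make([]byte, 8)) // placeholder for the offset to the index table
	offsets := make([]uint64, len(txs))
	for i, tx := range txs {
		offsets[i] = uint64(buf.Len())
		var l [4]byte
		binary.BigEndian.PutUint32(l[:], uint32(len(tx))) // 4 bytes -- length of TX
		buf.Write(l[:])
		buf.Write(tx) // M bytes -- the transaction
	}
	indexOff := uint64(buf.Len())
	for i, h := range hashes {
		buf.Write(h[:]) // 32 bytes -- hash of transaction
		var o [8]byte
		binary.BigEndian.PutUint64(o[:], offsets[i]) // 8 bytes -- offset of entry
		buf.Write(o[:])
	}
	out := buf.Bytes()
	binary.BigEndian.PutUint64(out[:8], indexOff)
	return out
}

// lookup (hypothetical) binary-searches the index table for hash and
// returns the transaction bytes it points to.
func lookup(snap []byte, hash [hashLen]byte) ([]byte, bool) {
	if len(snap) < 8 {
		return nil, false
	}
	indexOff := binary.BigEndian.Uint64(snap[:8])
	if indexOff > uint64(len(snap)) {
		return nil, false
	}
	index := snap[indexOff:]
	n := len(index) / entryLen
	i := sort.Search(n, func(i int) bool {
		return bytes.Compare(index[i*entryLen:i*entryLen+hashLen], hash[:]) >= 0
	})
	if i == n || !bytes.Equal(index[i*entryLen:i*entryLen+hashLen], hash[:]) {
		return nil, false
	}
	off := binary.BigEndian.Uint64(index[i*entryLen+hashLen : (i+1)*entryLen])
	if off+4 > uint64(len(snap)) {
		return nil, false
	}
	l := uint64(binary.BigEndian.Uint32(snap[off : off+4]))
	if off+4+l > uint64(len(snap)) {
		return nil, false
	}
	return snap[off+4 : off+4+l], true
}

func main() {
	hashes := [][hashLen]byte{{1}, {2}} // already sorted
	txs := [][]byte{[]byte("tx-one"), []byte("tx-two")}
	snap := buildSnapshot(hashes, txs)
	tx, ok := lookup(snap, hashes[1])
	fmt.Println(string(tx), ok)
}
```

With a sorted index of fixed-size 40-byte entries, a lookup needs only a binary search over the index table plus one read of the transaction entry.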
To begin collection
*func NewCollect(outputName string, build bool) (collect Collect, err error)
- Provide an output name, such as 'Snapshot_101'
- To create a snapshot, specify build = true
- If simply accessing an existing snapshot, build = false
- Note that specifying an existing snapshot with build = true will delete the snapshot
Building a snapshot creates the output file, two temporary directories, and a number of temporary files:
- The output file
- Temporary directory one is the main one, and is named outputName + ".tmp"
- Temporary directory two is for hashes, and is under temporary directory one: outputName + ".tmp/hashes"
- 256 temporary files under directory one for sorting
- The temporary file for hash sorting and deduplicating
- 256 temporary files under directory two for collecting hashes
To end collection building
*func (c Collect) Close() (err error) -- Closes all open files and removes all temporary files and directories
To collect hashes
- *func (c Collect) WriteHash(hash []byte) (err error) -- Takes each hash as encountered and writes the hash into one of 256 buckets based on the first byte of the hash
- *func (c Collect) BuildHashFile() (err error) -- Sorts each bucket file and appends it in order into a temporary hashes file. As this is done, duplicates are eliminated
- *func (c Collect) GetHash() []byte -- After the BuildHashFile() call, each hash can be accessed in order by successive calls to GetHash(). nil is returned at EOF