
Add Collect Struct for building snapshots

We need a facility to collect the keys of key/value pairs and organize them into a snapshot. The first pass collects all the keys; the keys are then sorted and duplicates eliminated. After that, all transactions are collected and an index table is built.

Collect provides a struct that creates a snapshot. Any key value pair can be extracted easily by key. Format of the snapshot:

  8 bytes  -- offset to index table
  N transactions {
     4 bytes -- length of TX M
     M bytes -- the transaction
  }
  N Index Table Entries {
     32 bytes -- hash of transaction
      8 bytes -- offset in snapshot to transaction entry in transactions
  }

To begin collection

*NewCollect(outputName string, build bool) (collect Collect, err error)

  • Provide an output name, such as 'Snapshot_101'
  • To create a snapshot, specify build = true
  • If simply accessing an existing snapshot, build = false
  • Note that specifying an existing snapshot with build = true will delete the snapshot

Building a snapshot creates the output file, two temporary directories, and a number of temporary files:

  • The output file
  • Temporary directory one is the main one, and is named outputName + ".tmp"
  • Temporary directory two is for hashes, and sits under temporary directory one: outputName + ".tmp/hashes"
  • 256 temporary files under directory one for sorting
  • The temporary file for hash sorting and deduplicating
  • 256 temporary files under directory two for collecting hashes

To end collection building

*func (c Collect) Close() (err error) -- Closes all open files and removes all temporary files and directories

To collect hashes

  1. *func (c Collect) WriteHash(hash []byte) (err error) -- Takes each hash as encountered and writes the hash into one of 256 buckets based on the first byte of the hash
  2. *func (c Collect) BuildHashFile() (err error) -- Sorts each bucket file and appends it in order into a temporary hashes file. As this is done, duplicates are eliminated
  3. *func (c Collect) GetHash() []byte -- After the BuildHashFile() call, each hash can be accessed in order by successive calls to GetHash(). nil is returned at EOF