Sets and The Universal Type
Safely solving more problems with the same language tools
7 July 2015
Kevin Gillette
Kevin Gillette
map[T]struct{}
or map[T]bool
package main
import "fmt"
func main() {
set := make(map[string]bool) set["a"] = true set["b"] = true // iterate over set elements (unordered) for elem := range set { fmt.Println(elem) } fmt.Println("c in set?", set["c"])
}
A set is a collection of unique elements.
Union: elements in at least one of A or B Intersection: elements in both A and B Symmetric Difference: elements in either A or B but not both Difference: elements in A but not B
Except for Set Difference, all of these operations are associative and commutative.
Associativity: ((a + b) + c) == (a + (b + c)) Commutativity: (a + b) == (b + a)
package main
import (
"fmt"
"sort"
)
type StringSet map[string]bool // Intersect returns a copy containing the intersection of both sets. func (s StringSet) Intersect(t StringSet) StringSet { m := make(StringSet) for k := range s { if t[k] { m[k] = true } } return m } func main() { x := StringSet{"a": true, "b": true} y := StringSet{"b": true, "c": true} z := x.Intersect(y) fmt.Printf("%v & %v == %v\n", x, y, z) }
func (s StringSet) String() string {
lst := make(sort.StringSlice, 0, len(s))
for k, ok := range s {
if ok {
lst = append(lst, k)
}
}
sort.Sort(lst)
return fmt.Sprint(lst)
}
package main
import (
"fmt"
"sort"
)
type StringSet map[string]bool
// Intersect updates s in-place to contain the intersection of both sets. func (s StringSet) IntersectInPlace(t StringSet) { for k := range s { if !t[k] { delete(s, k) } } } func (s StringSet) Copy() StringSet { m := make(StringSet, len(s)) for k := range s { m[k] = true } return m } func main() { x := StringSet{"a": true, "b": true} // try with {"c": false} y := StringSet{"b": true, "c": true} z := x.Copy() z.IntersectInPlace(y) fmt.Printf("%v & %v == %v\n", x, y, z) }
func (s StringSet) String() string {
lst := make(sort.StringSlice, 0, len(s))
for k, ok := range s {
if ok {
lst = append(lst, k)
}
}
sort.Sort(lst)
return fmt.Sprint(lst)
}
Element type is theoretically inconsequential, but can practical consequences:
map[T]bool
requires discipline in use (three possible element states)map[T]struct{}
is simpler, more compact in memory, but needs more syntaxWe can choose to give the user convenience or control.
We can provide both at the expense of code/API volume.
We can alternatively provide both at the expense of API complexity:
func (s StringSet) Intersect(t StringSet, inplace bool) StringSet
Writing ad-hoc algorithms may be error prone.
It's very easy to under-test code.
Test coverage reports would not have helped find the previous issue.
There's no known technique for writing type-safe, polymorphic map-based set algorithms within the current Go language.
They may be impossible.
8We can settle on one or a small number of well-tested concrete implementations.
type Set map[string]struct{}
High overhead if your data isn't already in string-form.
API and binary bloat if you want to support all the builtin types, and...
Sets of structs, pointers, channels, and other non-builtin types can be very useful.
9
We can provide a reusable library based on map[interface{}]struct{}
We can use various reflect package techniques to avoid the explicit type assertions.
We can use code-generation.
map-based sets in Go are not competitive with other languages' "native" sets.
Go maps have excellent element-access performance, but the undefined iteration order greatly restricts the quality of the algorithms that can be used for the other set operations.
A custom ordered-map implementation could eliminate some performance barriers, but would likely have the same tradeoffs as regular maps.
Can we avoid these compromises and performance issues altogether?
To answer this question, we can look a Go stdlib package that faced a similar problem, and devised a simple, yet elegant solution.
12A polymorphic, type-safe, single algorithm implementation. But how?
type Interface interface { Len() int Less(i, j int) bool Swap(i, j int) }
golang.org/pkg/sort/#Interface
13Sorting is necessarily based around the concept of order:
int
typeint
represents an index; it is a proxy for the real value (brilliant!)type IntSlice []int func (p IntSlice) Len() int { return len(p) } func (p IntSlice) Less(i, j int) bool { return p[i] < p[j] } func (p IntSlice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
An ordering can be defined on sets too:
sort.Interface
implementation is implicitly a set implementation.sort.Sort
and set.Uniq
sanitize arbitrary inputs. None, either, or both may be used depending on how "prepared" the inputs already are.set.Union
and friends implement the set operations.func SymDiff(data sort.Interface, pivot int) (size int)
pivot
delimits the two concatenated input sets.
package main
import (
"fmt"
"sort"
"github.com/xtgo/set"
)
func main() {
a, b := sort.IntSlice{2, 3, 4}, sort.IntSlice{1, 2, 4, 5} data := append(a, b...) size := set.SymDiff(data, len(a)) c := data[:size] fmt.Println(c)
}
Set semantics are merely slice semantics.
17type Op func(data sort.Interface, pivot int) (size int) func Apply(op Op, data sort.Interface, pivots []int) (size int)
Concurrent, adaptive application of op
to all of the input sets.
op
must be mathematically associative; doc examples show set.Diff
approach.
package main
import (
"fmt"
"sort"
"github.com/xtgo/set"
)
func main() {
a := sort.IntSlice{2, 3, 5, 7, 11, 13} // primes b := sort.IntSlice{0, 1, 2, 3, 5, 8, 13} // Fibonacci numbers c := sort.IntSlice{1, 2, 5, 14, 42} // Catalan numbers data := append(append(a, b...), c...) pivots := set.Pivots(len(a), len(b), len(c)) size := set.Apply(set.Inter, data, pivots) data = data[:size] fmt.Println(data)
}
BenchmarkInter32 199 ns/op 1 allocs/op BenchmarkMapInter32 5530 ns/op 6 allocs/op BenchmarkInter64K 214941 ns/op 1 allocs/op BenchmarkMapInter64K 10522886 ns/op 370 allocs/op
BenchmarkMap*
correspond to straightforward map[int]struct{}
algorithmsBenchmarkInter*
correspond to algorithms based around sort.Interface
package main
import (
"fmt"
"math/rand"
"sort"
)
func Shuffle(data sort.Interface) { sort.Sort(proxy{data, rand.Perm(data.Len())}) } type proxy struct { data sort.Interface sort.IntSlice } func (s proxy) Swap(i, j int) { s.data.Swap(i, j); s.IntSlice.Swap(i, j) } func main() { data := sort.StringSlice{"zero", "one", "two", "three", "four"} Shuffle(data) fmt.Println(data) }
interface{}
could have been pivot
and size
intsMany data-structures with a definable order could have sort-style algorithms.
[]int
is a good proxy data-structure that can be used to form more complex algorithms on top of "concrete method" interfaces.
ex: merge sort and specialized sorts API-compatible with sort.Sort