[go: up one dir, main page]

Sets and The Universal Type

Safely solving more problems with the same language tools

7 July 2015

Kevin Gillette

Sets in Go

package main

import "fmt"

func main() {
    set := make(map[string]bool)
    set["a"] = true
    set["b"] = true

    // iterate over set elements (unordered)
    for elem := range set {
        fmt.Println(elem)
    }

    fmt.Println("c in set?", set["c"])
}
2

Sets (minimal theory)

A set is a collection of unique elements.

Union:                 elements in at least one of A or B
Intersection:          elements in both A and B
Symmetric Difference:  elements in either A or B but not both
Difference:            elements in A but not B

Except for Set Difference, all of these operations are associative and commutative.

Associativity: ((a + b) + c) == (a + (b + c))
Commutativity:       (a + b) == (b + a)
3

Convenient Implementation

package main

import (
	"fmt"
	"sort"
)

type StringSet map[string]bool

// Intersect returns a copy containing the intersection of both sets.
func (s StringSet) Intersect(t StringSet) StringSet {
    m := make(StringSet)
    for k := range s {
        if t[k] {
            m[k] = true
        }
    }
    return m
}

func main() {
    x := StringSet{"a": true, "b": true}
    y := StringSet{"b": true, "c": true}
    z := x.Intersect(y)
    fmt.Printf("%v & %v == %v\n", x, y, z)
}

func (s StringSet) String() string {
	lst := make(sort.StringSlice, 0, len(s))
	for k, ok := range s {
		if ok {
			lst = append(lst, k)
		}
	}
	sort.Sort(lst)
	return fmt.Sprint(lst)
}
4

Efficient/Flexible Implementation

package main

import (
	"fmt"
	"sort"
)

type StringSet map[string]bool

// Intersect updates s in-place to contain the intersection of both sets.
func (s StringSet) IntersectInPlace(t StringSet) {
    for k := range s {
        if !t[k] {
            delete(s, k)
        }
    }
}

func (s StringSet) Copy() StringSet {
    m := make(StringSet, len(s))
    for k := range s {
        m[k] = true
    }
    return m
}

func main() {
    x := StringSet{"a": true, "b": true} // try with {"c": false}
    y := StringSet{"b": true, "c": true}
    z := x.Copy()
    z.IntersectInPlace(y)
    fmt.Printf("%v & %v == %v\n", x, y, z)
}

func (s StringSet) String() string {
	lst := make(sort.StringSlice, 0, len(s))
	for k, ok := range s {
		if ok {
			lst = append(lst, k)
		}
	}
	sort.Sort(lst)
	return fmt.Sprint(lst)
}
5

Implementation Compromises (1/3)

Element type is theoretically inconsequential, but can practical consequences:

6

Implementation Compromises (2/3)

We can choose to give the user convenience or control.

We can provide both at the expense of code/API volume.

We can alternatively provide both at the expense of API complexity:

func (s StringSet) Intersect(t StringSet, inplace bool) StringSet
7

Implementation Compromises (3/3)

Writing ad-hoc algorithms may be error prone.

It's very easy to under-test code.

Test coverage reports would not have helped find the previous issue.

There's no known technique for writing type-safe, polymorphic map-based set algorithms within the current Go language.

They may be impossible.

8

Solutions? (1/3)

We can settle on one or a small number of well-tested concrete implementations.

type Set map[string]struct{}

High overhead if your data isn't already in string-form.

API and binary bloat if you want to support all the builtin types, and...

Sets of structs, pointers, channels, and other non-builtin types can be very useful.

9

Solutions? (2/3)

We can provide a reusable library based on map[interface{}]struct{}

We can use various reflect package techniques to avoid the explicit type assertions.

10

Solutions? (3/3)

We can use code-generation.

11

Performance?

map-based sets in Go are not competitive with other languages' "native" sets.

Go maps have excellent element-access performance, but the undefined iteration order greatly restricts the quality of the algorithms that can be used for the other set operations.

A custom ordered-map implementation could eliminate some performance barriers, but would likely have the same tradeoffs as regular maps.

Can we avoid these compromises and performance issues altogether?

To answer this question, we can look a Go stdlib package that faced a similar problem, and devised a simple, yet elegant solution.

12

Sorting in Go

A polymorphic, type-safe, single algorithm implementation. But how?

type Interface interface {
    Len() int
    Less(i, j int) bool
    Swap(i, j int)
}
13

The Universal Type (1/2)

Sorting is necessarily based around the concept of order:

type IntSlice []int

func (p IntSlice) Len() int           { return len(p) }
func (p IntSlice) Less(i, j int) bool { return p[i] < p[j] }
func (p IntSlice) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }
14

The Universal Type (2/2)

An ordering can be defined on sets too:

15

Sets in Go: a different approach (1/4)

16

Sets in Go: a different approach (2/4)

func SymDiff(data sort.Interface, pivot int) (size int)

pivot delimits the two concatenated input sets.

package main

import (
	"fmt"
	"sort"

	"github.com/xtgo/set"
)

func main() {
    a, b := sort.IntSlice{2, 3, 4}, sort.IntSlice{1, 2, 4, 5}
    data := append(a, b...)

    size := set.SymDiff(data, len(a))
    c := data[:size]
    fmt.Println(c)
}

Set semantics are merely slice semantics.

17

Sets in Go: a different approach (3/4)

18

Sets in Go: a different approach (4/4)

type Op func(data sort.Interface, pivot int) (size int)

func Apply(op Op, data sort.Interface, pivots []int) (size int)

Concurrent, adaptive application of op to all of the input sets.

op must be mathematically associative; doc examples show set.Diff approach.

package main

import (
	"fmt"
	"sort"

	"github.com/xtgo/set"
)

func main() {
    a := sort.IntSlice{2, 3, 5, 7, 11, 13}   // primes
    b := sort.IntSlice{0, 1, 2, 3, 5, 8, 13} // Fibonacci numbers
    c := sort.IntSlice{1, 2, 5, 14, 42}      // Catalan numbers
    data := append(append(a, b...), c...)
    pivots := set.Pivots(len(a), len(b), len(c))

    size := set.Apply(set.Inter, data, pivots)
    data = data[:size]
    fmt.Println(data)
}
19

Good in theory, but in practice? (1/3)

20

Good in theory, but in practice? (2/3)

21

Good in theory, but in practice? (3/3)

BenchmarkInter32            199 ns/op     1 allocs/op
BenchmarkMapInter32        5530 ns/op     6 allocs/op
BenchmarkInter64K        214941 ns/op     1 allocs/op
BenchmarkMapInter64K   10522886 ns/op   370 allocs/op
22

Additional Possibilities: shuffle algorithm

package main

import (
	"fmt"
	"math/rand"
	"sort"
)

func Shuffle(data sort.Interface) { sort.Sort(proxy{data, rand.Perm(data.Len())}) }

type proxy struct {
    data sort.Interface
    sort.IntSlice
}

func (s proxy) Swap(i, j int) { s.data.Swap(i, j); s.IntSlice.Swap(i, j) }

func main() {
    data := sort.StringSlice{"zero", "one", "two", "three", "four"}
    Shuffle(data)
    fmt.Println(data)
}
23

Additional Possibilities: container/heap

24

Additional Possibilities

Many data-structures with a definable order could have sort-style algorithms.

[]int is a good proxy data-structure that can be used to form more complex algorithms on top of "concrete method" interfaces.

ex: merge sort and specialized sorts API-compatible with sort.Sort

25

Thanks (in chronological order)

26

Thank you

Use the left and right arrow keys or click the left and right edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)