1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
// This module provides a relatively simple thread-safe pool of reusable
// objects. For the most part, it's implemented by a stack represented by a
// Mutex<Vec<T>>. It has one small trick: because unlocking a mutex is somewhat
// costly, in the case where a pool is accessed by the first thread that tried
// to get a value, we bypass the mutex. Here are some benchmarks showing the
// difference.
//
// 1) misc::anchored_literal_long_non_match 21 (18571 MB/s)
// 2) misc::anchored_literal_long_non_match 107 (3644 MB/s)
// 3) misc::anchored_literal_long_non_match 45 (8666 MB/s)
// 4) misc::anchored_literal_long_non_match 19 (20526 MB/s)
//
// (1) represents our baseline: the master branch at the time of writing when
// using the 'thread_local' crate to implement the pool below.
//
// (2) represents a naive pool implemented completely via Mutex<Vec<T>>. There
// is no special trick for bypassing the mutex.
//
// (3) is the same as (2), except it uses Mutex<Vec<Box<T>>>. It is twice as
// fast because a Box<T> is much smaller than the T we use with a Pool in this
// crate. So pushing and popping a Box<T> from a Vec is quite a bit faster
// than for T.
//
// (4) is the same as (3), but with the trick for bypassing the mutex in the
// case of the first-to-get thread.
//
// Why move off of thread_local? Even though (4) is a hair faster than (1)
// above, this was not the main goal. The main goal was to move off of
// thread_local and find a way to *simply* re-capture some of its speed for
// regex's specific case. So again, why move off of it? The *primary* reason is
// because of memory leaks. See https://github.com/rust-lang/regex/issues/362
// for example. (Why do I want it to be simple? Well, I suppose what I mean is,
// "use as much safe code as possible to minimize risk and be as sure as I can
// be that it is correct.")
//
// My guess is that the thread_local design is probably not appropriate for
// regex since its memory usage scales to the number of active threads that
// have used a regex, where as the pool below scales to the number of threads
// that simultaneously use a regex. While neither case permits contraction,
// since we own the pool data structure below, we can add contraction if a
// clear use case pops up in the wild. More pressingly though, it seems that
// there are at least some use case patterns where one might have many threads
// sitting around that might have used a regex at one point. While thread_local
// does try to reuse space previously used by a thread that has since stopped,
// its maximal memory usage still scales with the total number of active
// threads. In contrast, the pool below scales with the total number of threads
// *simultaneously* using the pool. The hope is that this uses less memory
// overall. And if it doesn't, we can hopefully tune it somehow.
//
// It seems that these sort of conditions happen frequently
// in FFI inside of other more "managed" languages. This was
// mentioned in the issue linked above, and also mentioned here:
// https://github.com/BurntSushi/rure-go/issues/3. And in particular, users
// confirm that disabling the use of thread_local resolves the leak.
//
// There were other weaker reasons for moving off of thread_local as well.
// Namely, at the time, I was looking to reduce dependencies. And for something
// like regex, maintenance can be simpler when we own the full dependency tree.
use ;
use ;
use Mutex;
/// An atomic counter used to allocate thread IDs.
static COUNTER: AtomicUsize = new;
thread_local!;
/// The type of the function used to create values in a pool when the pool is
/// empty and the caller requests one.
type CreateFn<T> =
;
/// A simple thread safe pool for reusing values.
///
/// Getting a value out comes with a guard. When that guard is dropped, the
/// value is automatically put back in the pool.
///
/// A Pool<T> impls Sync when T is Send (even if it's not Sync). This means
/// that T can use interior mutability. This is possible because a pool is
/// guaranteed to provide a value to exactly one thread at any time.
///
/// Currently, a pool never contracts in size. Its size is proportional to the
/// number of simultaneous uses.
// SAFETY: Since we want to use a Pool from multiple threads simultaneously
// behind an Arc, we need for it to be Sync. In cases where T is sync, Pool<T>
// would be Sync. However, since we use a Pool to store mutable scratch space,
// we wind up using a T that has interior mutability and is thus itself not
// Sync. So what we *really* want is for our Pool<T> to by Sync even when T is
// not Sync (but is at least Send).
//
// The only non-sync aspect of a Pool is its 'owner_val' field, which is used
// to implement faster access to a pool value in the common case of a pool
// being accessed in the same thread in which it was created. The 'stack' field
// is also shared, but a Mutex<T> where T: Send is already Sync. So we only
// need to worry about 'owner_val'.
//
// The key is to guarantee that 'owner_val' can only ever be accessed from one
// thread. In our implementation below, we guarantee this by only returning the
// 'owner_val' when the ID of the current thread matches the ID of the thread
// that created the Pool. Since this can only ever be one thread, it follows
// that only one thread can access 'owner_val' at any point in time. Thus, it
// is safe to declare that Pool<T> is Sync when T is Send.
//
// NOTE: It would also be possible to make the owning thread be the *first*
// thread that tries to get a value out of a Pool. However, the current
// implementation is a little simpler and it's not clear if making the first
// thread (rather than the creating thread) is meaningfully better.
//
// If there is a way to achieve our performance goals using safe code, then
// I would very much welcome a patch. As it stands, the implementation below
// tries to balance safety with performance. The case where a Regex is used
// from multiple threads simultaneously will suffer a bit since getting a cache
// will require unlocking a mutex.
unsafe
/// A guard that is returned when a caller requests a value from the pool.
///
/// The purpose of the guard is to use RAII to automatically put the value back
/// in the pool once it's dropped.