Date: Mon, 23 Sep 2013 18:19:47 -0700
Subject: Re: [RFC GIT PULL] softirq: Consolidation and stack overrun fix
From: Linus Torvalds

On Mon, Sep 23, 2013 at 5:10 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> BTW, that boils down to a choice between using r13 as either a TLS for
> current or current_thread_info, or as a per-cpu pointer, which one is
> the most performance critical ?

I think you can tune most of the architecture setup to best suit your needs.
For example, on x86, we don't have much choice: the per-cpu accessors are going to be faster than the alternatives, and there are patches afoot to tune the preempt and rcu-readside counters to use the percpu area (and then save/restore things at task switch time). But having the counters natively in the thread_info struct is fine too and is what we do now.
Generally, we've put the performance-critical stuff into "current_thread_info" as opposed to "current", so it's likely that if the choice is between those two, then you might want to pick %r13 pointing to the thread-info rather than the "struct task_struct" (ie things like low-level thread flags). But which is better probably depends on load, and again, some of it you can tweak by just making per-architecture structure choices and making the macros point at one or the other.
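To make the "macros point at one or the other" idea concrete, here is a minimal userspace sketch, not actual kernel code. The structures are heavily simplified stand-ins, and `fake_r13`, `R13_HOLDS_THREAD_INFO`, `switch_to()` and `current_pid()` are hypothetical names used only for illustration; a plain variable plays the role of the dedicated register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct task_struct;

struct thread_info {
    struct task_struct *task;   /* back-pointer to the owning task */
    unsigned long flags;        /* low-level thread flags */
    int preempt_count;          /* hot counter kept next to the flags */
};

struct task_struct {
    struct thread_info *ti;
    int pid;
};

/* Stand-in for the dedicated register (%r13 in the discussion above). */
static void *fake_r13;

/* Per-architecture choice: set to 0 to make the register hold "current". */
#define R13_HOLDS_THREAD_INFO 1

#if R13_HOLDS_THREAD_INFO
#define current_thread_info() ((struct thread_info *)fake_r13)
#define current               (current_thread_info()->task)
#else
#define current               ((struct task_struct *)fake_r13)
#define current_thread_info() (current->ti)
#endif

/* Called at task switch time: reload the register for the incoming task. */
static void switch_to(struct task_struct *next)
{
#if R13_HOLDS_THREAD_INFO
    fake_r13 = next->ti;
#else
    fake_r13 = next;
#endif
}

/* Generic code only ever uses the macros, never the register directly. */
static int current_pid(void)
{
    assert(fake_r13 != NULL);
    return current->pid;
}
```

Whichever structure the register points at is reached directly, and the other one costs one extra load. Flipping the hypothetical `R13_HOLDS_THREAD_INFO` switch changes that trade-off without touching any code that uses the macros.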
There are a few things that really depend on per-cpu areas, but I don't think it's a huge performance issue if you have to indirect off memory to get that. Most of the performance issues with per-cpu stuff are about avoiding cachelines ping-ponging back and forth, not so much about fast direct access. Of course, if some load really uses a *lot* of percpu accesses, you get both.
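The cacheline point can be illustrated with a small sketch of a per-cpu counter. This is a simplified userspace illustration, not the kernel's percpu implementation: `NR_CPUS`, `CACHELINE_SIZE`, `hits_inc()` and `hits_sum()` are hypothetical, the 64-byte line size is an assumption, and the CPU number is passed in explicitly instead of being derived from a per-cpu base.

```c
#include <assert.h>
#include <stdalign.h>

#define NR_CPUS        4    /* hypothetical CPU count */
#define CACHELINE_SIZE 64   /* assumed cache line size in bytes */

/* Each CPU gets its own slot, padded out to a full cache line. */
struct percpu_slot {
    alignas(CACHELINE_SIZE) unsigned long count;
};

static struct percpu_slot hits[NR_CPUS];

/* Fast path: a CPU only ever writes to its own cache line. */
static void hits_inc(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    hits[cpu].count++;
}

/* Slow path: the occasional reader walks all the slots. */
static unsigned long hits_sum(void)
{
    unsigned long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += hits[cpu].count;
    return sum;
}
```

The gain in this sketch comes from each writer owning its own line, so no line bounces between CPUs on the update path. How the slot address is computed (directly from a register or via one extra memory indirection) is a secondary cost by comparison.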
The advantage of having %r13 point to thread data (which is "stable" as far as the compiler is concerned) as opposed to having it be a per-cpu pointer (which can change randomly due to task switching) is that from a correctness standpoint I really do think that either thread-info or current is *much* easier to handle than using it for the per-cpu base pointer.
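The stability argument can be sketched in a small single-threaded simulation. Everything here is hypothetical and simplified: `migrate_to()` stands in for a task being preempted and rescheduled on another CPU, and the two helper functions check whether a pointer cached before that event still refers to the right data afterwards.

```c
#include <assert.h>

#define NR_CPUS 2                     /* hypothetical CPU count */

static int percpu_data[NR_CPUS];      /* one slot per CPU */
static int running_cpu;               /* CPU the simulated thread is on */

struct thread_info { int preempt_count; };
static struct thread_info my_thread;  /* belongs to the thread, not the CPU */

static int *percpu_base(void) { return &percpu_data[running_cpu]; }
static struct thread_info *current_thread_info(void) { return &my_thread; }

/* Simulates being preempted and rescheduled on another CPU. */
static void migrate_to(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    running_cpu = cpu;
}

/* Returns 1 if a per-cpu pointer cached before migration is still valid. */
static int percpu_pointer_survives_migration(void)
{
    int *cached = percpu_base();

    migrate_to(1 - running_cpu);
    return cached == percpu_base();
}

/* Returns 1 if a thread pointer cached before migration is still valid. */
static int thread_pointer_survives_migration(void)
{
    struct thread_info *cached = current_thread_info();

    migrate_to(1 - running_cpu);
    return cached == current_thread_info();
}
```

In this sketch the cached per-cpu pointer goes stale as soon as the thread moves, while the thread pointer stays valid. A compiler that keeps a register-derived value across such a point is therefore safe for thread data but not for a per-cpu base.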
Linus