

Running Android on a mainline graphics stack

By Jonathan Corbet
September 12, 2017

The Android system may be based on the Linux kernel, but its developers have famously gone their own way for many other parts of the system. That includes the graphics subsystem, which avoids user-space components like X or Wayland and has special (often binary-only) kernel drivers as well. But that picture may be about to change. As Robert Foss described in his Open Source Summit North America presentation, running Android on the mainline graphics subsystem is becoming possible and brings a number of potential benefits.

He started the talk by addressing the question of why one might want to use mainline graphics with Android. The core of the answer was simple enough: we use open-source software because it's better, and running mainline graphics takes us toward a fully open system. With mainline graphics, there are no proprietary blobs to deal with. That, in turn, makes it easy to run current versions of the kernel and higher-level graphics software like Mesa.

Getting the security fixes found in current kernels is worth a lot in its own right, but up-to-date kernels also bring new features, lots of bug fixes, better performance, and reduced power usage. The performance and power-consumption figures for most hardware tend to improve for years after its initial release as developers find ways to further optimize the software. Running a fully free system increases the possibilities for long-term support. Many devices have a ten-year (or longer) life span; if they are running free software, they can be supported by anybody. That is, Foss said, one of the main reasons why the GPU vendors tend not to open-source their drivers. Using mainline graphics also makes it possible to support multiple vendors with a single stack, and to switch vendors at will.

At the bottom of the Android graphics stack is the kernel, of course; but the layer above that tends to be a proprietary vendor driver. That driver, like most GPU drivers, has a substantial user-space component. Android's display manager is SurfaceFlinger; it takes graphical objects from the various apps and composes them onto the screen. The interface between SurfaceFlinger and the driver is called HWC2; it is implemented by the user-space component of the vendor driver, which also provides the implementations of common interfaces like OpenGL and Vulkan.

The HWC2 interface is also responsible for composing objects into the final display and implementing the abstractions describing those objects. When possible, it will offload work from the GPU to a hardware-based compositor. In the end, he said, GPUs are not particularly good at composing, so offloading that work can speed it up and save power. HWC2 is found in ChromeOS as well as in Android.
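
In rough terms, the exchange between SurfaceFlinger and HWC2 is a per-frame handshake: SurfaceFlinger offers a list of layers, the HWC2 implementation marks the ones it can put on hardware planes, and whatever is left falls back to GPU composition. Here is a minimal sketch of that decision in C, with invented names (the real interface lives in Android's hwcomposer2.h):

    enum composition { COMPOSE_DEVICE, COMPOSE_CLIENT };

    struct layer {
        const void *buffer;     /* graphical object from an app */
        enum composition type;  /* who will compose this layer */
    };

    /* Hypothetical sketch: assign each of one frame's layers either
     * to the hardware compositor or to the GPU fallback path. */
    static void validate_display(struct layer *layers, int n,
                                 int hw_planes)
    {
        for (int i = 0; i < n; i++) {
            if (i < hw_planes)
                layers[i].type = COMPOSE_DEVICE; /* hardware plane */
            else
                layers[i].type = COMPOSE_CLIENT; /* GPU fallback */
        }
    }

Layers marked for client composition are rendered by the GPU into a single buffer, which is then handed back to the hardware compositor as one more device layer.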

To create an open-source stack, one clearly has to replace the proprietary vendor drivers. That means providing a driver for the GPU itself and an implementation of the HWC2 API. The latter can be found in the drm_hwc (or drm_hwcomposer) project, which was originally written at Google but which has since escaped into the wider community. It is sometimes used on Android systems now, Foss said, especially in embedded settings. The manufacturers of embedded devices are finding that their long-term support needs are well met with open-source drivers.

So a free Android stack is built around drm_hwc. It also includes components like Mesa and libdrm, and it's all based on the kernel's direct rendering manager (DRM) layer. Finally, there is a component called gbm_gralloc, which handles memory allocations and associates properties (which color format is in use, for example) with video buffers.
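
Underneath the gralloc API, gbm_gralloc does its allocations through Mesa's GBM library. As a flavor of what that looks like, here is a minimal sketch using the GBM calls themselves (error handling omitted; the device path varies by system):

    #include <fcntl.h>
    #include <stdint.h>
    #include <gbm.h>

    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR);
        struct gbm_device *gbm = gbm_create_device(fd);

        /* Ask for a buffer the display engine can scan out and
         * the GPU can render to, with an explicit color format. */
        struct gbm_bo *bo = gbm_bo_create(gbm, 1920, 1080,
                                          GBM_FORMAT_XRGB8888,
                                          GBM_BO_USE_SCANOUT |
                                          GBM_BO_USE_RENDERING);

        /* The stride and handle are the properties that get
         * attached to the buffer and passed down to the kernel. */
        uint32_t stride = gbm_bo_get_stride(bo);
        uint32_t handle = gbm_bo_get_handle(bo).u32;
        (void)stride; (void)handle;

        gbm_bo_destroy(bo);
        gbm_device_destroy(gbm);
        return 0;
    }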

So what is the status of this work? There are a couple of important kernel components that were prerequisites for this support; one of them is buffer synchronization, which has recently been merged. This feature allows multiple drivers to collaborate around shared buffers; it was inspired by a similar feature in the Android kernel. Some GPU drivers now have support for synchronization. The other important piece was the atomic display API; it is the only display API that supports the new synchronization mechanism. Most drivers support this API at this point, which is good, since HWC2 requires it.
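
For the curious, the way those two pieces fit together at the libdrm level looks roughly like this: the compositor builds an atomic request, attaches a fence file descriptor that the GPU will signal when rendering is done, and can ask for another fence that signals when the frame actually reaches the screen. A sketch, assuming the various object and property IDs have already been discovered through the usual DRM queries:

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int flip(int fd, uint32_t plane_id, uint32_t crtc_id,
                    uint32_t fb_id, uint32_t prop_fb_id,
                    uint32_t prop_in_fence, uint32_t prop_out_fence,
                    int render_fence_fd)
    {
        int out_fence_fd = -1;
        drmModeAtomicReq *req = drmModeAtomicAlloc();

        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);
        /* "IN_FENCE_FD": don't scan out before the GPU is done. */
        drmModeAtomicAddProperty(req, plane_id, prop_in_fence,
                                 render_fence_fd);
        /* "OUT_FENCE_PTR": the kernel writes back a fence that
         * signals when the new frame is on the screen. */
        drmModeAtomicAddProperty(req, crtc_id, prop_out_fence,
                                 (uint64_t)(uintptr_t)&out_fence_fd);

        int ret = drmModeAtomicCommit(fd, req,
                                      DRM_MODE_ATOMIC_NONBLOCK, NULL);
        drmModeAtomicFree(req);
        return ret ? ret : out_fence_fd;
    }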

There are a few systems where all of this works now. The i.MX6 processor with the Vivante gc3000 GPU has complete open-source support; versions with older GPUs are not yet supported at the same level. There is support for the DragonBoard 410c with the Adreno GPU. The MinnowBoard Turbot has an Intel HD GPU which has "excellent open-source software support". Finally, the HiKey 960 is a new high-end platform; it's not supported yet but that support is "in the works".

Foss concluded by saying that support for Android on the mainline graphics stack is now a reality for a growing number of platforms. The platforms he named are development boards and such, though, so your editor took the opportunity to ask if there was any prospect for handsets with mainline graphics support in the future. Foss answered that there are "rumors" that Google likes this work and is keeping an eye on it. Time will tell whether those rumors turn into mainstream Android devices that can run current mainline kernels with blob-free graphics support.

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting your editor's travel to the Open Source Summit.]

Index entries for this article
Kernel: Android
Conference: Open Source Summit North America/2017



Running Android on a mainline graphics stack

Posted Sep 13, 2017 3:09 UTC (Wed) by Tara_Li (guest, #26706) [Link] (10 responses)

> In the end, he said, GPUs are not particularly good at composing,
> so offloading that work can speed it up and save power.

I find this idea somewhat interesting - how much more ooomph is needed to do the composing, and why don't GPUs have that bit built in? After all, if you're going to need a separate unit to do GPU work, then another to do composing of everything the GPU generates, you're going to need another bus with hella bandwidth to get A to B.

And are there going to be more stages that get offloaded from the GPU? I know that, for a time, there were separate "physics engines" you could buy to offload some of *that* from the CPU/GPU - collision detection, flight of debris, etc...

Running Android on a mainline graphics stack

Posted Sep 13, 2017 5:43 UTC (Wed) by linusw (subscriber, #40300) [Link] (2 responses)

There are special hardware engines for composing, also called 2D accelerators.
On the ST-Ericsson ill-fated U8500 we had a hardware block called "B2R2" which reads "blit, blend, rotate and rescale", which is what compositors need. I vaguely recall that the TI OMAP had something similar. (Maybe someone can fill in?)
Whether there is a mainline kernel-to-userspace abstraction for these engines is another question. I think at the time it was made into a custom character device and used directly from what is now HWC2.

Running Android on a mainline graphics stack

Posted Sep 13, 2017 6:26 UTC (Wed) by zyga (subscriber, #81533) [Link]

My kernel knowledge is very inadequate to comment on potential APIs, but looking at various chipsets designed for STBs, there were very power-efficient hardware video mixers that did color conversion, scaling, and alpha blending (with limits). The idea was that a very slow CPU could offload video decoding (e.g. from the antenna or an IP feed) to one block, render some simple UI onto a buffer, and then blend those all together for free, every frame, perfectly. The video buffer was in YUV and the UI was in 565 RGB. There were also some specialized layers for subtitles and some other niche applications. Essentially each layer had some limited set of ways in which it could be used, with restrictions on buffer format, blending order, etc.

The hardware I used to deal with ~15 years ago could handle one video and one bitmap layer. Later on we got more and more features: two video layers (one full-featured, with better de-interlacing and scaling, and one limited, for picture-in-picture) and additional layers for arbitrary graphics, for some nicer blending possibilities. All of this was on hardware that could not do any OpenGL.

Unfortunately none of that had sane drivers. At the time each vendor provided their own libraries to configure and use the video stack. Nowadays the problem is less visible because we get those speedy CPUs and even integrated graphics has a lot to offer, but I suspect that, if available and used correctly, this hardware could save some power in idle-desktop / watching-video use cases.

OMAP DSS

Posted Sep 13, 2017 20:24 UTC (Wed) by rvfh (guest, #31018) [Link]

OMAP had a display subsystem (DSS) with several overlays (4 IIRC), at least one of which could be made "secure" (inaccessible to Linux, only to the TEE, for DRM). It stored pixels in a specially crafted way that enabled cheap (free?) rotation.

I am no expert so if you know better feel free to correct me!

Running Android on a mainline graphics stack

Posted Sep 13, 2017 7:55 UTC (Wed) by daniels (subscriber, #16193) [Link]

Oh, GPUs have more than enough oomph. But running all that oomph takes power. So display controllers often have overlay planes built in, which lets you do composition far more efficiently. Particularly important for video, where you can see very real increases in runtime by avoiding the GPU entirely.
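
(For a concrete sense of what an overlay plane is at the kernel interface: with the older non-atomic libdrm call, handing a buffer straight to the display controller looks roughly like the sketch below; the plane/CRTC/framebuffer IDs are assumed to come from drmModeGetPlaneResources() and friends.)

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Show framebuffer fb_id full-screen on one overlay plane.
     * Source coordinates are 16.16 fixed point, hence << 16. */
    int show_plane(int fd, uint32_t plane_id, uint32_t crtc_id,
                   uint32_t fb_id, uint32_t w, uint32_t h)
    {
        return drmModeSetPlane(fd, plane_id, crtc_id, fb_id, 0,
                               0, 0, w, h,              /* dest */
                               0, 0, w << 16, h << 16); /* src  */
    }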

Running Android on a mainline graphics stack

Posted Sep 13, 2017 12:03 UTC (Wed) by excors (subscriber, #95769) [Link] (4 responses)

One major issue is that normal GPU rendering is optimised for writing an entire large framebuffer to DRAM, often in some complicated tiled pattern and with a non-raster storage format, to make best use of the GPU's many caches and its parallelism.

For display composition you don't want to write to memory at all - ideally you'd 'render' each pixel in raster order just as it's about to be sent out of the HDMI port (or equivalent), and then you save all the latency and power cost of writing to DRAM in the GPU and then reading it back in the display controller.
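
(For scale: a 1920x1080 XRGB frame is 1920 x 1080 x 4 bytes ≈ 8.3 MB, so at 60 fps the GPU writes roughly 500 MB/s into DRAM and the display controller reads the same 500 MB/s back out - about 1 GB/s of memory traffic that raster-order composition avoids entirely, before even counting the reads of the input layers that both approaches must do.)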

Usually a phone isn't doing much 3D GPU stuff, it's just displaying a few static images (status bar, app UI) and perhaps a decoded video, and the "rendering" is just some colour conversion and scaling and alpha-blending, so it's easy to do in raster order.

(In practice you'd probably render a few lines at once and store them in on-chip memory until they're sent out to the display, to tolerate some jitter in the rendering speed, but that's only a few KBs of memory so it's fast and cheap. You still get timing problems if e.g. you try to alpha-blend too many planes at once and the compositor fills the line buffer more slowly than the display consumes it, in which case you probably have to fall back to expensive OpenGL composition to avoid display glitches, and you need clever drivers to decide exactly when and how to fall back.)

As far as I'm aware, all modern mobile SoCs (except maybe the absolute cheapest terrible ones) have special hardware to do that, though they all do it with significantly different feature sets and are completely unrelated at the kernel level; the only standardisation is that they all implement the Android HWC HAL.

> if you're going to need a separate unit to do GPU work, then another to do composing of everything the GPU generates, you're going to need another bus with hella bandwidth to get A to B.

I think A and B are the same place. Mobile SoCs don't have dedicated VRAM like in discrete desktop GPUs - OpenGL will render to a framebuffer in the shared system DRAM, alongside all the other static images and decoded videos etc, and the compositing hardware will read all those layers straight from DRAM as it needs them.

> And are there going to be more stages that get offloaded from the GPU

Plenty have already - some chips used their GPU for video encoding, camera image processing, etc., and tend to move it into dedicated hardware eventually (to save power and improve performance). Vendors who don't have dedicated hardware for some feature argue strongly that their GPU is great and efficient and there's no need for dedicated hardware, and then a couple of years later their new chip moves that feature into dedicated hardware and they say how great it is now. CV algorithms and neural nets seem likely to be the next features to follow that pattern.

Running Android on a mainline graphics stack

Posted Sep 13, 2017 21:33 UTC (Wed) by roc (subscriber, #30627) [Link] (3 responses)

FirefoxOS ran on really terrible hardware and all of those chipsets had dedicated 2D compositing.

Running Android on a mainline graphics stack

Posted Sep 13, 2017 22:23 UTC (Wed) by excors (subscriber, #95769) [Link] (2 responses)

Your standards for "really terrible" might be excessively high :-). I could be misremembering, but I think some low-cost Broadcom chips (maybe BCM21664 or similar?), used in e.g. some Samsung phones a few years ago, had VideoCore 4 but with all the fun bits stripped out to minimise cost. That included removing the VPU and any clever display hardware, so compositing had to be done entirely in OpenGL. Admittedly that's a fairly old chip; I don't know if any more recent ones have similar limitations.

(Proper non-stripped-down VC4, like in Raspberry Pi, does compositing with HVS <https://dri.freedesktop.org/docs/drm/gpu/vc4.html>. The mainline vc4 driver uses that to implement DRM atomic mode setting.)

Running Android on a mainline graphics stack

Posted Sep 14, 2017 4:33 UTC (Thu) by roc (subscriber, #30627) [Link]

I think the lowest-end FirefoxOS phones were Qualcomm MSM7225A Snapdragon S1. The very lowest-end had 128MB of RAM, so it really was terrible :-).

Running Android on a mainline graphics stack

Posted Sep 15, 2017 0:36 UTC (Fri) by anholt (subscriber, #52292) [Link]

I've wished before that I had some 21664 hardware (with a reasonable debug environment) that I could port the vc4 stack to. It's a shame to not cover it.

Running Android on a mainline graphics stack

Posted Sep 13, 2017 13:03 UTC (Wed) by mjthayer (guest, #39183) [Link]

>> In the end, he said, GPUs are not particularly good at composing,
>> so offloading that work can speed it up and save power.

> I find this idea somewhat interesting - how much more ooomph is needed to do the
> composing, and why don't GPUs have that bit built in? After all, if you're going to need
> a separate unit to do GPU work, then another to do composing of everything the GPU
> generates, you're going to need another bus with hella bandwidth to get A to B.

I thought that embedded GPUs tended to use main RAM directly rather than dedicated video memory. If that is what you meant.

Fame

Posted Sep 13, 2017 4:07 UTC (Wed) by bojan (subscriber, #14302) [Link]

> The Android system may be based on the Linux kernel, but its developers have famously gone their own way for many other parts of the system.

Surely, you meant infamously there Jon. :-)

Running Android on a mainline graphics stack

Posted Sep 13, 2017 22:01 UTC (Wed) by bero (subscriber, #89787) [Link] (1 responses)

We've seen it unofficially running on handsets -- at last year's Linaro Connect, we demoed a Nexus 7 running a fully free graphics stack.
Given that the Nexus 5, 6, 5X, and 6P, as well as the Pixels, also use Adreno GPUs, it should be doable for those devices too (and yes, experiments are underway).

Running Android on a mainline graphics stack

Posted Sep 14, 2017 17:57 UTC (Thu) by rahvin (guest, #16953) [Link]

The trick here is to convince Google that it's in their interest to shift Android back to the mainline stack rather than their in-house monster. Given Google's propensity to reinvent the wheel for everything they do, I don't put high hopes on them recognizing the benefit of offloading this code and directly contributing back the changes needed for Android.

Hell, they'd probably replace the kernel with something in-house if they could. Oh wait, they are already moving in that direction.

Running Android on a mainline graphics stack

Posted Sep 15, 2017 16:56 UTC (Fri) by markjanes (guest, #58426) [Link]

Android-IA is an effort to implement Android on a mainline kernel and Mesa. It has its own open implementation of hwcomposer. My understanding is that it currently runs well on Intel hardware.
https://01.org/android-IA


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds