GStreamer: state of the union
The annual GStreamer conference took place October 21-22 in Prague, (unofficially) co-located with the Embedded Linux Conference Europe. GStreamer is a library for connecting media elements, such as sources, encoders and decoders, filters, streaming endpoints, and output sinks of all sorts, into a fully customizable pipeline. Its features include cross-platform support, a large set of plugins, support for modern streaming and codec formats, and hardware acceleration. Kicking off this year's conference was Tim-Philipp Müller with his report on the last 12 months of development and what we can look forward to next.
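As a minimal illustration of the pipeline model (a sketch assuming GStreamer 1.x and its command-line tools are installed), a source, a converter, and a sink can be linked directly from the shell:

```shell
# Generate a test pattern, convert the video format as needed, and
# display it in a window using whatever video sink the platform provides.
gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink
```

Each `!` links one element's source pad to the next element's sink pad, forming exactly the kind of element graph described above.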
The core team has been sticking to a roughly six-month release schedule, adjusted somewhat to fit other projects' timelines. The project is aiming to land the next release, 1.14, far enough ahead of the Ubuntu 18.04 long-term support version that developers will have a relatively recent version to base their work on for the next few years.
Project components
There is a system of categorization (worth a read for the amusing descriptions) for plugins based on the film The Good, the Bad and the Ugly. These plugins comprise most of the useful functionality of GStreamer. "gst-plugins-good" is made up of plugins with solid documentation, tests, and well-written code that should be used as examples for writing new plugins. "gst-plugins-ugly" is similar in quality to "good" but may pose distribution issues because of patents or licenses. "gst-plugins-bad" holds all the rest: plugins of varying quality that are perhaps poorly documented, not recommended for production use, or unsuitable as a basis for new plugins.
There is an ongoing mission in the GStreamer project to consolidate the platform by trying to promote plugins and pieces of code from the "bad" repository into "good" by fixing whatever may be at issue. There is now an effort to clearly document why a plugin remains in the "bad" category so that contributors know what needs to be fixed and maintainers can remember why a plugin was considered unfit and re-assess it at a future date.
Patents on the MP3 and AC-3 audio codecs have expired. The
mpg123 decoder and the lame MP3 encoder have been
moved to "good", though the GPL liba52-based
a52dec decoder for AC-3 must remain in "ugly". GStreamer
itself is released under the LGPL, so GPL plugins that could pose problems
for distributors wind up in "ugly".
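Which plugin provides a given element, and under what license, can be checked from the element's metadata; for example (a sketch assuming an installation that includes the lame plugin), gst-inspect-1.0 reports the containing plugin file and its license:

```shell
# Show the plugin details for the lamemp3enc element, including the
# "Filename" and "License" fields; after the patent expiry the mpg123
# and lame plugins ship in gst-plugins-good.
gst-inspect-1.0 lamemp3enc | grep -iE 'filename|license'
```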
Performance and multiprocess improvements
Ongoing efforts to improve parallelism have paid off; the operations
of video scaling and conversion are now multithreaded, and an upcoming
ipcpipeline element can be used to split pipelines across multiple
processes. Multiprocessing can be
used to isolate potentially dangerous demuxers, parsers, and decoders. This
is a concern for anyone parsing user input in the form of media
files, which is a longstanding source of application vulnerabilities.
There have been fruitful efforts to use DMABuf as an abstracted zero-copy
mechanism for passing buffers between sources, elements, and
sinks. Memory-allocation queries for the tee pipeline element are now
aggregated to enable zero-copy.
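To see where aggregated allocation queries matter, consider a tee fanning one stream out to two branches (a sketch using standard element names):

```shell
# Split a single test stream into two branches; with aggregated
# allocation queries, tee can negotiate buffers that both downstream
# branches are able to consume without intermediate copies.
gst-launch-1.0 videotestsrc ! tee name=t \
    t. ! queue ! autovideosink \
    t. ! queue ! fakesink
```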
High-speed playback of the DASH
HTTP adaptive streaming format, used by Chromecast among other devices, is
being enhanced. Playing a media file faster than the normal rate, such
as listening to a podcast at 2x speed, normally consumes more bandwidth
than regular playback. New support has landed to reduce those bandwidth
requirements for DASH.
Hardware-acceleration support continues to improve in the form of
better integration with the video acceleration
API (VA API). Encoders are now ranked so that hardware-accelerated
implementations can be preferred automatically. Support for
libva 2.0 (VA API 1.0) has been enhanced. All of the static-analysis
issues found by the
Coverity tool
were fixed. There is a new low-latency mode for H.264 decoding. Constant bit rate
for VP9, variable bit rate for H.265, and a "quality" parameter for
encoding are all now supported.
Other new features
There is now comprehensive support in the upstream gst-plugins-bad for using
Intel's Media SDK, which is offered for recent Intel chip platforms such as
Apollo Lake, Haswell, Broadwell, and Skylake. This enables hardware
acceleration for encoding and decoding common video formats, video
post-processing, and rendering. The goal is to make it easy for developers
to "use MSDK in their GStreamer-based applications with minimal knowledge
of the MSDK API".
Work
to allow x264enc (which uses the GPL-licensed libx264 encoder)
to be used with multiple bit depths was described as "very hard" because
the library's bit depth must be specified at compile time. Now multiple
versions of the library can be compiled and then selected at run time.
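With run-time selection in place, the bit depth is chosen through caps negotiation. A sketch (assuming a GStreamer build whose libx264 includes the 10-bit variant) would force a 10-bit format upstream of the encoder:

```shell
# Request 10-bit 4:2:0 video from the source; x264enc can then select
# the matching 10-bit build of libx264 at run time (this only works
# when that variant of the library was compiled in).
gst-launch-1.0 videotestsrc ! video/x-raw,format=I420_10LE ! \
    x264enc ! fakesink
```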
Timed Text Markup Language support has been added. This is part of the SMPTE closed-captioning standard for online video, and has potential to be a general intermediary representation for text subtitles and captions.
rtpbin has been
enhanced to accept and demultiplex bundled RTP streams for the
purpose of constructing a WebRTC pipeline. This greatly simplifies the
process of doing W3C-standards-based live video streaming and conferencing
in a web browser using GStreamer; bundling will soon be mandated by the
WebRTC specification. More on new WebRTC features below.
splitmuxsink has been rewritten
to be more deterministic about splitting output streams based on time or
size. Typical uses of this element involve segmenting recordings, such
as surveillance video, inside older container formats (e.g. classic
MP4) that cannot be split at arbitrary locations without properly
finalizing the file by writing out headers and tables and beginning a new
file.
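A typical invocation (a sketch using the element's standard properties; max-size-time is in nanoseconds) records a stream into a sequence of properly finalized MP4 files:

```shell
# Encode a test stream and split it into 10-second MP4 fragments, each
# with complete headers and tables; -e sends EOS on Ctrl-C so that the
# final file is finalized correctly rather than truncated.
gst-launch-1.0 -e videotestsrc ! x264enc ! h264parse ! \
    splitmuxsink location=segment%05d.mp4 max-size-time=10000000000
```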
In the upcoming release, the hlssink2 element will take
elementary streams as input and output chunks for HTTP Live Streaming (HLS),
making use of the splitmuxsink element internally.
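Based on that design, an hlssink2 pipeline would look something like the following sketch (the property names shown follow the existing hlssink element and may differ in the final release):

```shell
# Feed an H.264 elementary stream to hlssink2, which internally uses
# splitmuxsink to write MPEG-TS segments plus an m3u8 playlist.
gst-launch-1.0 -e videotestsrc ! x264enc ! h264parse ! \
    hlssink2 location=segment%05d.ts playlist-location=playlist.m3u8 \
    target-duration=5
```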
In addition, GstShark was demonstrated at the conference, which enables rich pipeline tracing and graphing capabilities. It is particularly useful for pinpointing causes of poor frame rates or high latency.
A casual mention was made of the fact that GStreamer now has the first implementation of the Real Time Streaming Protocol (RTSP) version 2.0, both for client and server. RTSP is in wide use for controlling live media streams from devices such as IP cameras.
There is interest in using the systems programming language Rust for GStreamer development to improve memory and thread safety and to use modern language features. In another talk, Sebastian Dröge described the current state of the GStreamer Rust bindings. They have been in development for quite some time and many people are actively developing with them. The bindings provide a mostly complete set of functionality for both application and plugin development, and no longer require the use of "unsafe" sections by users of the bindings. They are mostly auto-generated now via GObject introspection and have a native Rust feel while retaining the same API usage patterns familiar to anyone used to working with GStreamer.
Future work
Debugging improvements slated for the coming 1.14 release include debug-log ring-buffer storage, which is useful for keeping recent logs for long-running tasks or in disk-constrained environments, Müller said. A "new more reliable" leak tracer that is more thread-safe and supports snapshots and dumping a list of live objects is also planned. The leak tracer is currently very Unix-specific because it relies on Unix signals, so work is needed to come up with a suitable mechanism for Windows as well.
Plans of varying concreteness were mentioned for future work and improvements:
- Adaptive streaming with multiple bit rates could be improved for DASH and HLS.
- The internal stream-handling API could be implemented in more demuxers, along with better handling of stream deactivation.
- More native sink elements for output on Windows, iOS, and Android are needed along with native UI toolkits.
- Windows code needs to be updated to use newer APIs and legacy support for Windows XP should be dropped.
- Android, iOS, and macOS "need some love" to catch up with the latest versions.
- Support could be added for the ONVIF surveillance-camera interoperability standard, including features such as the audio back channel and special modes.
- Integration with the popular OpenCV computer-vision library could be improved.
- Support for virtual-reality formats could be added.
The conference kickoff concluded with an open question about how the project can better interact with users and contributors. The existing workflow of attaching patch files to Bugzilla entries may feel cumbersome to some contributors (such as the author) compared to modern pull-request workflows. There is a desire to move to an open-source solution such as GitLab, which would provide pull requests and help track sets of changes that span multiple repositories. GitHub was explicitly ruled out because of its proprietary nature, which is at odds with the project's free-software values. While there is already a mailing list and an excellent IRC channel, a "proper" forum may be coming soon as well, giving people a place to have discussions and post multimedia.
GStreamer and WebRTC
A hot topic right now, both at the recent IETF meeting and at the GStreamer conference, was WebRTC support; Müller mentioned that he was getting asked 30 times a day "how do I stream to my web browser?" WebRTC is a draft standard being developed by the W3C and IETF to enable live video and audio streaming in a web browser, something that until recently was only achievable in practice with Flash, or in limited server-side use with HLS, for cross-platform and browser compatibility. WebRTC makes peer-to-peer videoconferencing in a web browser possible, although it has other use cases, such as simplifying the streaming of live video to or from a web browser, and even telephony, as in several existing WebRTC-to-SIP gateways.
Development has been active and ongoing for rich WebRTC support in GStreamer. Matthew Waters came to Prague all the way from Australia to talk about building a selective-forwarding server that can support multi-party conferencing with GStreamer. WebRTC is a peer-to-peer protocol, so a multi-party conference without a server in the middle handling media streams quickly becomes prohibitively expensive, because each peer must stream to every other peer in a mesh-style network. Another possible design for multi-party WebRTC is a central mixing server, also called an MCU (Multipoint Control Unit), which moves most of the cost to the provider instead. A middle ground, where the server only forwards the media streams to the other peers (a Selective Forwarding Unit), is a good compromise for sharing the computational and bandwidth costs between the user and the provider.
Waters achieved this by creating a new element,
webrtcbin, that provides the necessary network transports
for a WebRTC session, such as DTLS and SRTP (encrypted datagram
formats for streaming media) and trickle ICE for network traversal and
shorter call-setup times, along with the trusty rtpbin element for RTP,
all wrapped up in an API similar to the W3C JavaScript PeerConnection API.
While it can be complicated to write a server that handles the many moving parts required to do WebRTC well, GStreamer makes it eminently practical to construct fully customized client and server applications with this relatively new protocol. As is frequently the case, the GStreamer project is not only at the forefront of emerging media technologies; its talented and dedicated community is also quick to provide examples and demonstrations of how to use new features in non-trivial applications, making it all look easy.
[Videos of this year's talks can be found online.]
| Index entries for this article | |
|---|---|
| GuestArticles | Spiegelmock, Mischa |
| Conference | GStreamer Conference/2017 |