Video compression is one of the most important problems in digital media and a persistent pain point for digital editors around the world. Video accounts for almost 80% of Internet traffic, and video consumption on mobile devices doubles every 18 months. Yet the existing codecs use no machine learning at all.
To address this, a group of enthusiasts got together in 2015 and founded WaveOne Inc., a startup that builds video compression solutions with deep learning, using the latest advances in machine learning (ML) and artificial intelligence (AI) “to create custom-tailored, context-dependent solutions”. They have since been building a new video compression algorithm, learned end-to-end for the low-latency mode, in keeping with their tagline, “context-adaptive compression of digital media”.
What does the WaveOne algorithm bring to the table?
Better compression means higher-quality content, less buffering, and lower bandwidth expenses. The WaveOne team appears to be on a good path to achieving these results: they already report that their approach outperforms all existing video codecs across nearly the entire bitrate range, which would make it the first ML-based method to do so.
In addition, the team says that “on standard-definition videos, relative to our algorithm, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger. On high-definition 1080p videos, H.265 and VP9 typically produce codes up to 20% larger, and H.264 up to 35% larger.” Moreover, the WaveOne approach does not suffer from blocking artifacts and pixelation and produces better-looking videos.
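Note that the quoted percentages describe how much larger the competing codecs' output is, which is not the same as how much smaller WaveOne's output is. The conversion is simple arithmetic, sketched below. The function name `savings_from_overhead` is made up for this illustration, and the sketch assumes that "X% larger" is measured relative to WaveOne's code size, as the quote states.

```python
def savings_from_overhead(overhead_pct: float) -> float:
    """Convert 'baseline is overhead_pct% larger' into '% smaller than the baseline'."""
    if overhead_pct <= -100:
        raise ValueError("overhead must be greater than -100%")
    ratio = 1 + overhead_pct / 100   # baseline size divided by reference size
    return (1 - 1 / ratio) * 100     # how much smaller the reference is than the baseline


if __name__ == "__main__":
    # Upper-bound figures taken from the WaveOne quote above.
    quoted = [
        ("SD, H.265 / H.264 / VP9", 60),
        ("HD 1080p, H.264", 35),
        ("HD 1080p, H.265 / VP9", 20),
    ]
    for label, overhead in quoted:
        saving = savings_from_overhead(overhead)
        print(f"{label}: up to {overhead}% larger -> up to {saving:.1f}% smaller")
```

Under that reading, "up to 60% larger" corresponds to WaveOne output that is up to 37.5% smaller than the competing codec's, not 60% smaller.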
WaveOne vs. traditional codecs
According to WaveOne, their approach offers several advantages compared to traditional codecs:
- Prediction beyond translation – deep learning methods are better at predicting spatiotemporal patterns, such as a person walking or an animal turning its head, that cannot be easily described as a simple movement of pixels;
- Powerful motion vector representation – bandwidth is distributed so that the more important areas get sophisticated motion boundaries and precise motion vectors, whereas traditional codecs partition the frame into a hierarchy of blocks and specify the same motion vector for all pixels in a block, quantized to a particular precision;
- Propagation of a learned state – the algorithm propagates arbitrary information that it learns to be important, rather than relying, as traditional codecs do, on the previous frame and motion vectors as “prior knowledge” to help encode the next frame;
- Joint compression of motion and residual – WaveOne compresses both through a single information bottleneck, which gives its network fine control over the tradeoff between them for each frame;
- Multi-flow representation – the network can describe motion with multiple flows, instead of the single layer of motion vectors used by traditional codecs;
- Spatial rate control – which allows a single architecture to achieve the same performance that one can get by tuning separate architectures for each different bitrate.
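The contrast in the list above between one motion vector per block and finer-grained motion can be illustrated with a toy example. This is a minimal sketch in plain Python and not WaveOne's method: all function names are hypothetical, an exhaustive search over a tiny window stands in for a real encoder's motion estimation, and a hand-written per-pixel motion field stands in for whatever a learned model would predict.

```python
from itertools import product


def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def warp(prev, flow):
    """Predict a frame: each pixel is fetched from prev at (y + dy, x + dx), clamped to the frame."""
    h, w = len(prev), len(prev[0])
    return [
        [prev[clamp(y + flow[y][x][0], 0, h - 1)][clamp(x + flow[y][x][1], 0, w - 1)]
         for x in range(w)]
        for y in range(h)
    ]


def uniform_flow(vector, h, w):
    """One motion vector shared by every pixel, as within a single block."""
    return [[vector] * w for _ in range(h)]


def sad(a, b):
    """Sum of absolute differences between two frames (the residual left to encode)."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def best_block_vector(prev, cur, search=1):
    """Exhaustively pick the single vector that minimises SAD over the whole block."""
    h, w = len(prev), len(prev[0])
    candidates = product(range(-search, search + 1), repeat=2)
    return min((sad(warp(prev, uniform_flow(v, h, w)), cur), v) for v in candidates)


def demo():
    prev = [
        [10, 20, 30, 40],
        [50, 60, 70, 80],
        [15, 25, 35, 45],
        [55, 65, 75, 85],
    ]
    # Ground-truth motion: the top half moves right by one pixel (each pixel is
    # fetched from x - 1), while the bottom half is static.
    true_flow = [[(0, -1)] * 4, [(0, -1)] * 4, [(0, 0)] * 4, [(0, 0)] * 4]
    cur = warp(prev, true_flow)

    dense_error = sad(warp(prev, true_flow), cur)             # per-pixel motion
    block_error, block_vector = best_block_vector(prev, cur)  # one vector for the 4x4 block
    return dense_error, block_error, block_vector


if __name__ == "__main__":
    dense_error, block_error, block_vector = demo()
    print("per-pixel flow residual:", dense_error)
    print("single block vector", block_vector, "residual:", block_error)
```

In this toy frame the per-pixel field reproduces the current frame exactly, while the best single vector leaves a residual because the two halves of the block move differently. A traditional codec would respond by splitting the block further; the approach described above instead aims to spend motion detail where it matters most.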
WaveOne is a pioneer of learned, ML-based video compression and is likely to open this largely untapped field to other interested players. Their solution so far targets the low-latency mode only, and they still have a lot of work to do on running in real time and on outperforming existing codecs across the entire rate-distortion range.
However, from what we have seen, the team seems capable of achieving great results and of tackling other problems as well. For more details, you can check their papers on arXiv.org and, of course, visit their website.
Photo credit: The feature image is by Pawel Szvmanski. The example photos, graphics, and graphs are owned by WaveOne.