When my company, Vonage Holdings Corp. (NYSE: VG), made the strategic decision to offer mobile video services a few years ago, the first step from a technology perspective was to evaluate the codec options and make a recommendation: H.264 or VP8.
These were the very early days of webRTC, and no one had yet implemented VP8 on a mobile device -- certainly not on an iPhone -- so the choice was quite difficult to make.
In the scenario we were exploring, a point-to-point video conversation between two smartphones, both codecs were comparable in terms of quality and resiliency under adverse network conditions. H.264 was a bit better in terms of CPU usage and battery consumption (due to native support), and was an established, tried-and-tested, widely deployed technology. VP8 was relatively unknown, but Google (Nasdaq: GOOG) believed in it enough to include it in webRTC. Our developers liked its open-source framework, which gave them complete freedom to innovate and troubleshoot. And it was free.
Decisions, decisions. We ended up choosing VP8, and looking back now, that was the right path to take.
One reason the results of our comparison between VP8 and H.264 were so close was that we had no need for a major feature of H.264, namely scalability. Scalability gives the codec the ability to detect the constraints of each endpoint in a call, based on the power and resources of the device and the limitations of network connectivity, and to serve a stream optimized for each user’s conditions. This technology is available via an annex to the H.264 codec called SVC, Scalable Video Coding, as opposed to the standard flavor, AVC, Advanced Video Coding.
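As a rough illustration of the idea, the sketch below picks, for each participant in a call, the richest layer of a layered stream that fits both the device and the network. It is a simplified picture, not how any particular SVC implementation works: the layer names, resolutions, bitrates, and the `pick_layer` helper are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    height: int  # vertical resolution in pixels
    kbps: int    # cumulative bitrate needed to decode up to this layer


@dataclass(frozen=True)
class Endpoint:
    name: str
    max_height: int     # largest resolution the device can handle
    downlink_kbps: int  # bandwidth currently available to the device


# Hypothetical layers, ordered from the base layer upward.
LAYERS = [
    Layer("base", 180, 150),
    Layer("mid", 360, 500),
    Layer("high", 720, 1500),
]


def pick_layer(endpoint: Endpoint, layers=LAYERS) -> Layer:
    """Return the highest layer that fits the endpoint's device and network."""
    usable = [
        layer
        for layer in layers
        if layer.height <= endpoint.max_height
        and layer.kbps <= endpoint.downlink_kbps
    ]
    # Every participant gets at least the base layer.
    return usable[-1] if usable else layers[0]


if __name__ == "__main__":
    for ep in (
        Endpoint("phone_on_3g", 360, 300),
        Endpoint("tablet_on_wifi", 720, 800),
        Endpoint("laptop_on_fiber", 1080, 5000),
    ):
        print(ep.name, "->", pick_layer(ep).name)
```

The point of the sketch is that the sender encodes once, and each receiver is served only the layers it can use, rather than the stream being re-encoded separately for every participant.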
Among other things, such as error resiliency, SVC provides crucial advantages in a multi-user context such as video conferencing. Virtually all of the current video conferencing solutions use H.264 SVC, including, ironically, VP8’s patron, Google, in its Hangouts offering.
The titans have chosen sides in the codec war: Microsoft Corp. (Nasdaq: MSFT) and Apple Inc. (Nasdaq: AAPL) championing H.264, and Google pushing VP8. But all this is just the undercard leading up to the main event. The real battle is set for the near future with the evolution of video codecs into the next generation: In one corner H.265, and in the other, VP9. Both will offer improved compression; more efficient CPU and power consumption; more robust resiliency and error correction; and overall higher quality and a better user experience. VP9 will be part of webRTC, and will therefore be ubiquitous, deployed in billions of browsers within a few years.
In addition, Google has recently announced a partnership with the scalability guru, Vidyo Inc., to ensure that webRTC and VP9 offer the same SVC capabilities as H.265 in terms of video conferencing and multi-user applications. And VP9 will remain free. I should point out that taking full advantage of scalability in a real-world, multi-user context requires considerable back-end server work. WebRTC is strictly a client technology, and there are and will be many companies with excellent business models that provide value-added paid services in the middle.
So if the future levels the playing field from the technology and feature perspectives, why would anyone choose to pay for licensing H.265 instead of using VP9 for free? Although Cisco has recently announced that they will be open-sourcing the binaries to H.264, it remains to be seen whether this will satisfy developers.
In any event, it is doubtful that H.265 will be free, at least for the foreseeable future. One answer to the question of why anyone would pay has to do with an issue that we wrestled with when considering VP8 vs. H.264. R&D will offer its technology recommendations, and finance will provide spreadsheets with scenarios and projections, but ultimately, one department holds veto power, namely legal. VP8 and presumably VP9 are clouded by infringement uncertainty, and stories are whispered around the boardroom table. The unstoppable force of webRTC’s ubiquitous and free technology meets the immovable object of legal indemnity.
But can’t we all live in harmony? What about transcoding between H.265 and VP9, thus allowing interoperability between those who choose one path or another?
I posed this question to former Vidyo SVP Marty Hollander, who answered that in theory, transcoding is possible, but that it would not work in practice. When you calculate the time required for buffering and encoding a video signal on one side and decoding it on the other and add to that the network delay (we still have not found a way to move packets faster than the speed of light), you reach a value close to a quarter of a second. That is still reasonable for a real-time conversation offering users an experience comparable to face to face. As long as the lag time remains short, video users will be able to interact, to interrupt, to respond, and to read gestures and expressions as they would if the person were in front of them. Add the buffering and computational requirements of even the most efficient transcoding algorithms and you push the delay beyond the threshold for normal conversation.
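The arithmetic behind that argument can be sketched as a simple delay budget. Every number below is an illustrative assumption, not a measurement: the per-stage figures were chosen only so that the direct path sums to roughly a quarter of a second, and the transcoding costs and the conversational threshold are hypothetical.

```python
# All figures are illustrative assumptions, in milliseconds.
BASELINE_MS = {
    "capture_and_buffering": 60,
    "encode": 40,
    "network": 100,
    "decode_and_render": 50,
}

# Hypothetical extra stages added by a transcoder in the middle.
TRANSCODE_MS = {
    "transcoder_buffering": 50,
    "transcoder_decode_and_reencode": 60,
}

# Hypothetical limit beyond which conversation stops feeling natural.
CONVERSATION_THRESHOLD_MS = 300


def one_way_delay_ms(*stage_groups):
    """Sum the delay of every stage in every group passed in."""
    return sum(sum(group.values()) for group in stage_groups)


def feels_conversational(delay_ms, threshold_ms=CONVERSATION_THRESHOLD_MS):
    """True if the delay stays at or under the assumed threshold."""
    return delay_ms <= threshold_ms


direct = one_way_delay_ms(BASELINE_MS)
transcoded = one_way_delay_ms(BASELINE_MS, TRANSCODE_MS)

if __name__ == "__main__":
    print("direct:", direct, "ms, ok:", feels_conversational(direct))
    print("transcoded:", transcoded, "ms, ok:", feels_conversational(transcoded))
```

With these assumed values the direct path fits inside the budget and the transcoded path does not. Different numbers would shift the margins, but the transcoding stages are always added on top of a budget that is already nearly spent.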
Once people get used to a video experience that feels natural (i.e. when both endpoints use the same codec), they will not accept anything less.
So the battle lines are drawn, and over the coming months the combatants will take their sides on the field. Will there be one winner, or will the communication world remain bifurcated through the next round of video technology? Ultimately, if video remains divided, users will be the real losers, since the dream of unified communication across services will be sacrificed.
This is what happens, as the Joker said to Batman, when an unstoppable force meets an immovable object.
— Baruch Sterman, PhD, VP Technology Research