How Comcast Used AI to Make Its X1 Voice Remote Bilingual

Comcast's cloud-powered X1 voice remote started off as an English-only platform, and later added Spanish to the mix to handle searches for movies and TV shows or simple channel changes.

But rather than placing English and Spanish queries in separate silos, Comcast Corp. (Nasdaq: CMCSA, CMCSK) went a step further, making its voice system "bilingual" and able to understand both languages at the same time.

Comcast realized early on that such an undertaking would not scale by using manual processes -- at last check, the operator has activated about 23 million X1 voice remotes, and handled some 3.5 billion voice commands through the first half of 2018. Comcast estimates that it also processes more than 6 million Spanish-language voice commands each month. (See Comcast X1 Voice Remote Wins Emmy and Comcast Turns X1 Remote Into Phone Finder.)

Comcast estimates that X1 processes more than 6 million Spanish-language voice commands each month. "Peliculas gratis" and "free movies" are the top generic commands, respectively, in Spanish and English.

The operator quickly learned that keeping X1 voice remote queries in separate English and Spanish buckets wouldn't do, as the platform would instead need to decipher words from both languages at the same time and likewise understand the underlying context of those queries.

"As we started to test our algorithm with Spanish-speaking customers, we started to learn that they typically speak English and Spanish in an interchangeable way," said Jeanine Heck, Comcast's vice president of AI product.

To tackle that challenge, Comcast developed an AI system and paired it with machine learning techniques to establish the foundation of that bilingual capability. It continues to build on that foundation with enhancements and by analyzing data in ways that aim to improve the accuracy of the results generated by the X1 voice remote platform.

While some words might sound the same in English and Spanish, they'll often have different meanings. With that in mind, the X1 voice remote system also had to go a layer deeper to figure out whether the speaker intended the English or the Spanish meaning of such words.

To accommodate that, Comcast passes each voice command through both an English and a Spanish learning algorithm, then uses machine learning techniques to combine the results so that it can confidently reconcile a bilingual query and produce an accurate result.
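The idea of running one command through two language-specific passes and then combining the results can be illustrated with a minimal sketch. This is not Comcast's implementation: the tiny vocabularies, the `interpret` function and the coverage-based confidence score are all assumptions made for illustration, standing in for trained models.

```python
# Hypothetical toy vocabularies; a production system would use trained
# language models rather than word lists.
ENGLISH_WORDS = {"free", "movies", "watch", "show", "me", "channel"}
SPANISH_WORDS = {"peliculas", "gratis", "ver", "canal", "quiero"}


def coverage(tokens, vocabulary):
    """Fraction of tokens found in a vocabulary (a stand-in for model confidence)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


def interpret(query):
    """Score a query with both language passes and merge the results."""
    tokens = query.lower().split()
    tags = []
    for token in tokens:
        if token in ENGLISH_WORDS:
            tags.append((token, "en"))
        elif token in SPANISH_WORDS:
            tags.append((token, "es"))
        else:
            tags.append((token, "unknown"))
    en_score = coverage(tokens, ENGLISH_WORDS)
    es_score = coverage(tokens, SPANISH_WORDS)
    return {
        "tokens": tags,
        "en": en_score,
        "es": es_score,
        # A query is treated as bilingual when both passes recognize something.
        "mixed": en_score > 0 and es_score > 0,
    }


result = interpret("quiero ver free movies")
print(result["tokens"], result["mixed"])
```

In this sketch a mixed query such as "quiero ver free movies" is tagged word by word instead of being forced into a single language, which is the behavior the article describes at a much larger scale.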

The approach "differentiates our experience from a lot of other multilingual devices or platforms," Heck said, noting that other systems typically require the user to make a holistic changeover to a different language rather than stitching the languages together on one common search and navigation system.


While automating that process as much as possible is critical, Comcast has been improving and streamlining the bilingual voice system with human-aided "supervised learning," which Heck notes is how most machine learning is done today. In that area, a team of real people analyzes voice commands received on the multilingual voice remote experience and labels them. Were the results or intent correct? Was the correct action taken? If not, the machine learning system is supplied with updated labeled data to help it improve its accuracy.
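The review-and-retrain loop described here can be sketched in a few lines. The sketch below is an assumption-laden illustration, not Comcast's pipeline: `IntentModel` is a hypothetical word-count classifier, and `review_and_retrain` is a made-up helper that adds human-corrected labels to the training set whenever the model's prediction disagrees with the reviewer.

```python
from collections import Counter, defaultdict


class IntentModel:
    """Hypothetical word-count classifier standing in for a real model."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # intent -> word frequencies

    def train(self, labeled):
        """Rebuild the model from (command, intent) pairs."""
        self.word_counts = defaultdict(Counter)
        for text, intent in labeled:
            self.word_counts[intent].update(text.lower().split())

    def predict(self, text):
        """Return the best-scoring intent, or None if no word is recognized."""
        tokens = text.lower().split()
        scores = {
            intent: sum(counts[t] for t in tokens)
            for intent, counts in self.word_counts.items()
        }
        if not scores:
            return None
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None


def review_and_retrain(model, training_data, reviewed):
    """Add human labels the model got wrong to the training set, then retrain."""
    corrections = [(cmd, label) for cmd, label in reviewed
                   if model.predict(cmd) != label]
    training_data.extend(corrections)
    model.train(training_data)
    return corrections


training_data = [("free movies", "search"), ("channel five", "tune")]
model = IntentModel()
model.train(training_data)

# One review cycle: human reviewers supply the correct intent per command.
reviewed = [("free movies", "search"), ("canal cinco", "tune")]
fixed = review_and_retrain(model, training_data, reviewed)
print(fixed, model.predict("canal cinco"))
```

Each call to `review_and_retrain` corresponds to one labeling cycle: only the commands the model misjudged are fed back in, and the retrained model then handles them correctly.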

"We do that on a weekly basis," said Heck, who looks after all the AI that Comcast uses in its customer-facing products. "We have dedicated people who are labeling and training the algorithm to continually improve."

While the goal is 100% accuracy, the supervised learning angle has helped to improve accuracy by more than 10% in just a year.

"You can really make double-digit improvements to your accuracy if you really focus your efforts with data labeling," Heck explained. "We have made huge strides with our bilingual voice accuracy … because of this approach."

Prior to using labeled data and supervised learning, Comcast (at least for its initial English version) was using a more traditional, manual pattern-based approach that sought out clues on user intent, such as a search for a specific channel or a movie or TV show title.
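For contrast, a manual pattern-based approach of the kind described here might look like the following sketch. The rules, intent names and the `match_intent` function are hypothetical examples, not Comcast's actual patterns.

```python
import re

# Hand-written rules (hypothetical): each pairs an intent with a pattern
# that captures the channel number or title the user asked for.
RULES = [
    ("tune_channel", re.compile(r"^(?:channel|go to channel)\s+(?P<value>\d+)$")),
    ("search_title", re.compile(r"^(?:watch|find|search for)\s+(?P<value>.+)$")),
]


def match_intent(command):
    """Return (intent, value) for the first matching rule, else None."""
    text = command.lower().strip()
    for intent, pattern in RULES:
        match = pattern.match(text)
        if match:
            return intent, match.group("value")
    return None


print(match_intent("Channel 5"))
print(match_intent("peliculas gratis"))
```

Every new phrasing, and every new language, requires someone to write another rule, which is why this style of system is hard to scale compared with models trained on labeled data.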

Comcast's team started to employ a deep learning-based voice engine in 2014, creating a "real step-function of improvements in our accuracy," Heck said. "It was amazing to see the power of deep learning."

As for next steps, Comcast, at least on the TV side, is also looking into unsupervised learning that does not rely as heavily on human-labeled data but searches for clues from the aggregated data.

By way of example, Heck said that could enable Comcast to get a better understanding of viewing habits and to tailor results to the time at which a customer issues a voice command.

— Jeff Baumgartner, Senior Editor, Light Reading
