Voice recognition was once a concept that the public usually associated with media such as “Star Trek” and “Minority Report”. It was simply a cool, futuristic idea.
Fast forward to today, and products such as Amazon’s Alexa and Google Assistant have turned those faraway science fiction ideas into usable items that may be sitting in your living room. Phrases associated with these devices have seeped into modern culture; just ask anyone named “Alexa” how many jokes are thrown their way. The launch of Apple’s Siri service in 2011 captured the public’s imagination with its slick visuals and an intonation and inflection that were close to human. However, early public technologies were slow, error-prone and often quite inefficient. These problems stemmed from the many technical challenges that recognizing human speech presents.
Human speech is remarkable. It allows us to share a huge amount of information in a succinct and accurate manner. However, it is not completely consistent, it does not obey most of the rules that we have for writing, and the way it shifts through dialects, accents and slang poses a real challenge. Many early attempts could not handle so many variations in speech patterns, and other systems struggled to process continuous speech.
The breakthrough came with the use of neural networks and the Hidden Markov Model. Their technical details differ, but broadly they serve the same purpose: they use the information that is known to the system to figure out what is still hidden. This allows programs to search through increasingly large vocabularies and to match them to what is actually being said. It is these techniques that have made possible the accurate voice recognition we see in modern devices.
These voice recognition services have also become an important commercial battleground. The devices are now useful for everyday tasks such as providing weather reports or noting down a shopping list. Consumers are becoming increasingly familiar with them and are willing to use more of them within their homes. 
Greater familiarity with such devices also leads consumers to make purchases through them (62% of Google Home users reported using their device to make a purchase), which has turned them into an important sales channel. In summary, these services were once simply gimmicks in the commercial space, something akin to bonus features rather than a main selling point. That has certainly changed. Fancier technologies such as machine learning and flying cars may one day steal the headlines, but as Google’s head of search, Ben Gomes, states: “Speech recognition and the understanding of language is core to the future of search and information”. We are eager to follow its development and track its progress over the next few years.
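For readers curious about the Hidden Markov Model idea mentioned earlier, using what the system can observe to infer what is still hidden, here is a minimal sketch in Python. It is not how any commercial assistant is implemented. The two hidden "phoneme" states, the two observable sound labels and every probability are made-up assumptions chosen purely for illustration, and the function name `viterbi` is our own. The technique shown is the standard Viterbi algorithm, which finds the most likely sequence of hidden states behind a sequence of observations.

```python
# Toy Hidden Markov Model. All names and numbers are illustrative assumptions.
STATES = ["s", "iy"]  # hidden phonemes (roughly the sounds in "see")
START_P = {"s": 0.5, "iy": 0.5}  # chance of starting in each state
TRANS_P = {  # chance of moving from one hidden state to the next
    "s": {"s": 0.6, "iy": 0.4},
    "iy": {"s": 0.3, "iy": 0.7},
}
EMIT_P = {  # chance that a hidden state produces each observable sound
    "s": {"hiss": 0.9, "tone": 0.1},
    "iy": {"hiss": 0.2, "tone": 0.8},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable sequence of hidden states."""
    if not observations:
        return []

    # best[t][s]: probability of the likeliest path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the previous state on that likeliest path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that makes arriving in s most likely.
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = (
                best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            )
            back[t][s] = prev

    # Start from the likeliest final state and walk the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

With these toy numbers, `viterbi(["hiss", "hiss", "tone"], STATES, START_P, TRANS_P, EMIT_P)` returns `["s", "s", "iy"]`: two hissing sounds followed by a tone are best explained by an "s" that lasts two steps and then gives way to "iy". Real systems work with far larger models and typically use log probabilities to avoid numerical underflow, but the principle of inferring the hidden sequence from the observed one is the same.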