This is a big one. If you are not familiar with automatic speech recognition (ASR), here is a quote from Ben Gomes, Head of Search at Google:
Speech recognition and the understanding of language is core to the future of search and information, but there are lots of hard problems such as understanding how a reference works, understanding what ‘he’, ‘she’ or ‘it’ refers to in a sentence. It’s not at all a trivial problem to solve in language and that’s just one of the millions of problems to solve in language.
Mike Cohen, Manager of Speech Technologies at Google, explains the process as follows (he is talking about English, but the general principles are the same for any language):
So the lexical models are built by stringing together acoustic models, the language model is built by stringing together word models, and it all gets compiled into one enormous representation of spoken English, let’s say, and that becomes the model that gets learned from data, and that recognizes or searches when some acoustics come in and it needs to find out what’s my best guess at what just got said.
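The pipeline Cohen describes — acoustic models scoring sounds, a lexicon mapping words to phoneme sequences, and a language model scoring word likelihood, all combined into one search — can be illustrated with a deliberately tiny sketch. Everything here (the lexicon, the probabilities, the one-frame-per-phoneme alignment) is invented for illustration and bears no relation to a production recognizer, which searches over a vastly larger compiled graph:

```python
import math

# Toy lexicon: each word maps to a phoneme sequence (illustrative values only)
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"]}

# Toy unigram language model: log P(word) (assumed values)
LM = {"cat": math.log(0.7), "cap": math.log(0.3)}

# Toy acoustic scores: log P(phoneme | audio frame), one frame per phoneme
FRAMES = [
    {"k": math.log(0.90), "ae": math.log(0.05), "t": math.log(0.03), "p": math.log(0.02)},
    {"k": math.log(0.05), "ae": math.log(0.90), "t": math.log(0.03), "p": math.log(0.02)},
    {"k": math.log(0.05), "ae": math.log(0.05), "t": math.log(0.50), "p": math.log(0.40)},
]

def recognize(frames, lexicon, lm):
    """Return the word whose combined acoustic + language-model score is best."""
    best_word, best_score = None, -math.inf
    for word, phonemes in lexicon.items():
        if len(phonemes) != len(frames):
            continue  # toy alignment: exactly one frame per phoneme
        acoustic = sum(frame[p] for frame, p in zip(frames, phonemes))
        score = acoustic + lm[word]  # combine the two knowledge sources
        if score > best_score:
            best_word, best_score = word, score
    return best_word

print(recognize(FRAMES, LEXICON, LM))  # -> cat
```

Even in this toy, the key idea survives: the acoustics alone slightly favor "cat" over "cap" in the last frame, and the language model's prior for "cat" reinforces that choice, so the decoder's "best guess at what just got said" is the joint winner of both scores.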
Speech recognition behaves well if we put a few restrictions on the conditions under which it is used. We cannot yet claim that our ASR engine works in all conditions and domains, but we think you will be pleased with its recognition power if you talk to the engine as if you were chatting with a close friend.
This is the demo version of the Myanmar Automatic Speech Recognition, so we do not guarantee 100% accuracy in the results.