{"id":36,"date":"2023-03-31T09:24:48","date_gmt":"2023-03-31T09:24:48","guid":{"rendered":"https:\/\/www.intellisofttechnologies.com\/blog\/?p=36"},"modified":"2023-05-08T13:02:17","modified_gmt":"2023-05-08T13:02:17","slug":"speech-recognition-and-its-dilemmas","status":"publish","type":"post","link":"https:\/\/www.intellisofttechnologies.com\/blog\/2023\/03\/31\/speech-recognition-and-its-dilemmas\/","title":{"rendered":"Speech Recognition And Its Dilemmas!"},"content":{"rendered":"\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<h2 class=\"wp-block-heading\">Contents<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"#one\">What is speech recognition?<\/a><\/li>\n\n\n\n<li><a href=\"#two\">Speech recognition algorithms and computation techniques<\/a><\/li>\n\n\n\n<li><a href=\"#three\">What makes speech recognition hard?<\/a><\/li>\n\n\n\n<li><a href=\"#four\">4 major speech recognition challenges<\/a><\/li>\n<\/ul>\n<\/div><\/div>\n\n\n\n<h1 class=\"wp-block-heading has-vivid-red-color has-text-color\"><strong>Can language be mastered with algorithms?<\/strong><\/h1>\n\n\n\n<p class=\"has-black-color has-text-color\">Ah, the stark irony in this sentence is a highly debatable topic, making the answer to this question less significant.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Linguistics marrying mathematics is a complex affair, not just because of the different dynamics of these fields but due to their core foundational essence. Think back to school days, when some of us excelled in languages but struggled with numbers or vice versa. Not many find a perfect balance between these two fields (or abilities).<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">But artificial intelligence (AI) aims to achieve that balance. 
It strives to make human-machine interactions smoother by identifying and decoding spoken language and converting it into text.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Recent advances backed by AI and Machine Learning in integrating grammar, structure, accents, syntax, dialects, and adaptability have improved the pace and efficacy of <em>human-computer interaction (HCI)<\/em>. They have revolutionised the overall modern communication experience.<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/speech.jpg\" alt=\"\" class=\"wp-image-106\" width=\"512\" height=\"493\" title=\"Speech recognition image by Intellisoft Technologies\" srcset=\"https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/speech.jpg 1024w, https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/speech-300x289.jpg 300w, https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/speech-768x739.jpg 768w, https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/speech-600x577.jpg 600w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading has-vivid-red-color has-text-color has-large-font-size\" id=\"one\"><strong>The Whats and Whys of Speech Recognition<\/strong><\/h2>\n\n\n\n<p class=\"has-black-color has-text-color\">Automatic Speech Recognition (ASR), or computer speech recognition, is a machine\u2019s ability to convert human speech into a written format, hence the name <em>Speech-to-Text. 
<\/em>However, it is quite often confused with voice recognition.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Voice recognition technology primarily focuses on identifying <em>individual user voices<\/em> using biometric technology.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Think of speech recognition as the initial trigger that enables voice technology to perform smoothly. We owe the quick, fun, and adaptive responses of Alexa, Cortana, or Siri (our beloved voice assistants!) to ASR technology. Had it not been for speech recognition and its advancements, our speech would still be just audio recordings to computers, even today.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Now, let\u2019s take a glance at how Speech Recognition functions: <em>analysing the audio, breaking it into smaller parts, converting it into a machine-friendly (readable) structure, and finally using algorithms to interpret it and produce the most apt text representation.<\/em><\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">This technology is assessed on its speed and its accuracy, the latter usually measured as the word error rate (WER): the share of words that are substituted, dropped, or inserted relative to a reference transcript. Factors such as accent, volume, pronunciation, background noise, and industry-specific jargon directly affect the WER.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-vivid-red-color has-text-color has-medium-font-size\" id=\"two\"><strong>A few speech recognition algorithms and computation techniques:<\/strong><\/h2>\n\n\n\n<ul class=\"has-black-color has-text-color wp-block-list\">\n<li>Natural Language Processing (NLP)<\/li>\n\n\n\n<li>Hidden Markov Models (HMM)<\/li>\n\n\n\n<li>Neural networks<\/li>\n\n\n\n<li>N-grams<\/li>\n\n\n\n<li>Speaker Diarization (SD)<\/li>\n<\/ul>\n\n\n\n<p class=\"has-vivid-red-color has-text-color has-medium-font-size\"><em>ASR has become a highly innovative and speculative field generating metadata across sources. 
<\/em>Gartner predicted that 25% of employee interactions with applications would be mainly via voice by 2023.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-medium-font-size\">A few main reasons behind its growing popularity are:<\/p>\n\n\n\n<ul class=\"has-black-color has-text-color wp-block-list\">\n<li>Its high speed<\/li>\n\n\n\n<li>The predictive outcomes (or analytics) it can deliver<\/li>\n\n\n\n<li>Its role in accelerating automation<\/li>\n\n\n\n<li>Its ability to cater exceptionally well to the rapidly growing \u201cremote world\u201d<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading has-vivid-red-color has-text-color has-large-font-size\" id=\"three\"><strong>What makes speech recognition challenging?<\/strong><\/h3>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Why is speech recognition hard?<\/strong><\/h3>\n\n\n\n<p class=\"has-black-color has-text-color\">Human language is arbitrary, and its peculiarities and complexities make it very challenging for a machine to analyse speech and produce an error-free transcription. 
Further, abbreviations, acronyms, phrases, syntax, dialects, accents, context, semantics, and pragmatics all pose dilemmas that limit ASR\u2019s efficacy, efficiency, and accuracy.<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-full is-resized\"><img data-dominant-color=\"e5d5e6\" data-has-transparency=\"false\" style=\"--dominant-color: #e5d5e6;\" decoding=\"async\" src=\"https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/04\/Wavy_Tech-20_Single-05-1-300x288-1.webp\" alt=\"\" class=\"not-transparent wp-image-141\" width=\"300\" height=\"288\"\/><figcaption class=\"wp-element-caption\"><a rel=\"noreferrer noopener\" href=\"https:\/\/www.freepik.com\/free-vector\/artificial-intelligence-abstract-concept-illustration_12291056.htm\" target=\"_blank\">Image by vectorjuice<\/a> <\/figcaption><\/figure>\n<\/div>\n\n\n<p class=\"has-vivid-red-color has-text-color\" id=\"four\"><strong>The biggest speech recognition challenges<\/strong>:<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>1) Imprecision and Misinterpretations:<\/strong> Context is key!<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">To master this, the machine would have to learn and, most importantly, understand the difference between hearing and listening. While communicating, we take into account the speaker\u2019s expressions, body language, tone, and pitch, and then determine the meaning (as well as the sentiments behind it).<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">But for a machine, this is a tough spot, since it lacks contextual experience (and sentiments) and runs solely on algorithms.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>2) Background Noise: <\/strong>It hinders accuracy big time<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Loud surroundings and background noise make speech recognition unreliable and unfit for large public spaces. 
The technology still struggles to mitigate and filter background noise in order to isolate the human voice. Additional external devices (like headsets) can help in this scenario, but that is just too much extra baggage. Another aid here is acoustic training, but it has its limitations too.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>3) Language Base: <\/strong>The more, the merrier!<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">The current gap in language coverage poses a barrier to adoption. The large number of varying accents and dialects is among the major factors impacting accuracy. That\u2019s why we need not only more languages in the arena but also more accents and dialects. Including them gives the machine more exposure, experience, and learning opportunities.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>4) Data Security and Privacy: <\/strong>Cost and implementation<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">For a machine to learn and train, massive data input is required. The current approach of obtaining data via paid research or studies is very restrictive. It yields only a fraction of the total voice data generated in this digital age. Accessing, using, and managing the collected data raises questions about data security and individual user privacy.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">This conflict of interest narrows the availability of the data inputs required for AI, making data accessibility even harder.<\/p>\n\n\n\n<p class=\"has-vivid-red-color has-text-color has-medium-font-size\"><strong>Wrap-Up<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Speech recognition technology now comes bundled with almost any modern digital experience (that\u2019s how embedded it has become). 
This evolving technology revolves around adaptability, paving the way for more unique use cases.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Humanity\u2019s quest to streamline human-machine interaction has come a long way. Sure, it ain\u2019t perfect at the moment (nothing is!). But who knows what the future holds?<\/p>\n\n\n\n<p class=\"has-vivid-red-color has-text-color\"><em>Share your thoughts in the comments below!<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Can language be mastered with algorithms?<\/p>\n<p>Ah, the stark irony in this sentence is a highly debatable topic, making the answer to this question less significant.<\/p>\n<p>Linguistics marrying mathematics is a complex affair, not just because of the different dynamics of these fields but due to their core foundational essence. Think back to school days, when some of us excelled in languages but struggled with numbers or vice versa. 
Not many find a perfect balance between these two fields (or abilities).<\/p>\n","protected":false},"author":1,"featured_media":147,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[6,7,9],"class_list":["post-36","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-ai","tag-machine-learning","tag-speech-recognition"],"aioseo_notices":[],"featured_media_urls":{"thumbnail":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468-150x150.webp",150,150,true],"medium":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468-300x285.webp",300,285,true],"medium_large":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468.webp",480,456,false],"large":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468.webp",480,456,false],"1536x1536":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468.webp",480,456,false],"2048x2048":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468.webp",480,456,false],"blogus-slider-full":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468.webp",480,456,false],"blogus-featured":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468.webp",480,456,false],"blogus-medium":["https:\/\/www.intellisofttechn
ologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468-480x380.webp",480,380,true],"portfolio_item-thumbnail":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468-480x400.webp",480,400,true],"portfolio_item-masonry":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-600x570.webp",600,570,true],"portfolio_item-thumbnail_cinema":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468-480x335.webp",480,335,true],"portfolio_item-thumbnail_portrait":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-600x665.webp",600,665,true],"portfolio_item-thumbnail_square":["https:\/\/www.intellisofttechnologies.com\/blog\/wp-content\/uploads\/2023\/03\/Screenshot-2023-03-29-180649-e1680096081591-1-e1683550920468.webp",480,456,false]},"_links":{"self":[{"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/posts\/36","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/comments?post=36"}],"version-history":[{"count":1,"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/posts\/36\/revisions"}],"predecessor-version":[{"id":143,"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/posts\/36\/revisions\/143"}],"wp:featuredmedia":[{"embeddable":true,"href":"h
ttps:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/media\/147"}],"wp:attachment":[{"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/media?parent=36"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/categories?post=36"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.intellisofttechnologies.com\/blog\/wp-json\/wp\/v2\/tags?post=36"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}