WO2013167934A1 - Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages - Google Patents

Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Info

Publication number
WO2013167934A1
WO2013167934A1 (PCT/IB2012/052258)
Authority
WO
WIPO (PCT)
Prior art keywords
nlal
name
user
directory
list
Prior art date
Application number
PCT/IB2012/052258
Other languages
French (fr)
Inventor
Ioannis KAMATAKIS
Original Assignee
Mls Multimedia S.A.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mls Multimedia S.A. filed Critical Mls Multimedia S.A.
Priority to PCT/IB2012/052258 priority Critical patent/WO2013167934A1/en
Publication of WO2013167934A1 publication Critical patent/WO2013167934A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/12 - Use of codes for handling textual entities
    • G06F40/126 - Character encoding
    • G06F40/129 - Handling non-Latin characters, e.g. kana-to-kanji conversion
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/26 - Devices for calling a subscriber
    • H04M1/27 - Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 - Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition

Definitions

  • the present invention relates to human-machine interfaces and, particularly, to an improved method for speech recognition, a voice user interface and a corresponding communication device, which enables its users to communicate with an individual, who is selected from a directory list constructed with a mixed-alphabet of a "non-Latin-alphabet language" (NLAL, i.e., with a non "ISO basic Latin” alphabet [02], such as Greek, Russian, Vietnamese, Arabic, Chinese, Hebrew, etc.).
  • NLAL non-Latin-alphabet language
  • Communication devices such as e.g. mobile telephones and computers in general, have undergone a multitude of improvements in their functionality, capabilities and user interfaces, during the last few years, in order to facilitate efficient and user-friendly communication services for their users.
  • Today, such communication devices are used to call or send SMS or send a fax to another telephone user, or to send an e-mail to another e-mail recipient.
  • Finding the desired recipient of one's message often involves a selection process through a directory list, or lookup catalog, or phonebook.
  • Such lists may be quite long, sometimes containing several hundred or thousand entries, so that, browsing through them may be a time- consuming and tiresome or annoying process for the user.
  • Vocal user interfaces typically consist of an input device, such as a microphone, an Automatic Speech Recognition (ASR) unit, which decodes the sound signals and transforms them into an intermediate data format, and appropriate software to process the ASR output using some match-and-select methodology.
  • ASR Automatic Speech Recognition
  • directory lists are constructed with various styles, alphabets, phonetics, semantics, etc., matching a speaker's vocal input against such a directory list often results in poor performance and recognition accuracy.
  • NLAL non-Latin-alphabet languages
  • languages with non-"ISO basic Latin” alphabets [02].
  • NLALs can be expressed in writing with English transliteration, i.e., with "ISO basic Latin" alphabet characters (e.g. Greek-English or Russian-English alphabets).
  • the country-specific alphabet may also be used in combination with the Latin alphabet.
  • Greek (or Russian) language in which, words may be written with Greek (Cyrillic, respectively) alphabet characters and others with Latin alphabet characters, etc.
  • a directory list constructed by a user in some NLAL contains certain grammatical characteristics not existing in the English language.
  • Greek words and names have accentuation and gender-sensitive endings, e.g. two spellings of "Kefalas" that differ only in accent placement denote different names, and Papadopoulos (male) is different from Papadopoulou (female).
  • Greek words and names may also be spelled with phonetic, grammatical or optical similarity with respect to the Greek character set, e.g.: the name "Xenofon" may be written by a Greek user as "Ksenofon" (phonetic), or "Xenophon" (grammatical), or "3enofwn" (optical).
  • the present invention provides an effective solution to the aforementioned problem for NLALs, using an intelligent algorithm which achieves improved performance, higher recognition rates and better selection accuracy, compared to other similar solutions.
  • Speech recognition systems are computer systems that map spoken utterances to strings of words [01]. To achieve this transformation of recorded audio input to written representation, several smaller processes are applied sequentially: The first step in the processing sequence is to transform the recorded audio input into frequency spectrums by means of a Fast Fourier Transformation. The spectrums are passed on to a Hidden Markov Model, which is a statistical model determining the most probable phoneme sequence. This phoneme sequence is forwarded on to a language model. There are two different models whose usage mainly depends on the task the speech recognition is applied for. One of the language models is the statistical n-gram approach, often used in open-domain speaker-dependent dictation tasks. The second language model consists of rule-based grammars.
  • Speech recognition technology is used more and more for telephone applications like travel booking and information, financial account information, customer service call routing, and directory assistance. Such applications can achieve remarkably high accuracy by using constrained grammar recognition. Research and development in speech recognition technology has continued to grow as the cost for implementing such voice-activated systems has dropped and the usefulness and efficacy of these systems has improved.
  • speech recognition has enabled the automation of certain applications that are not automatable using push-button Interactive Voice Response (IVR) systems [04], like directory assistance and systems that allow callers to "dial" by speaking names listed in an electronic phone book.
  • IVR Interactive Voice Response
  • command words such as "dial”, “phone book”, “emergency”, “reject” or “accept” in a manner which is similar to the customary speed dialing.
  • the communication application associated with these command words can be used directly in the corresponding manner by a user, without the user personally having to train the system for this purpose beforehand using this set of words.
  • this factory-predetermined vocabulary of command words and possibly also names may be for the user, it does not replace a user-specific adaptation, e.g. the insertion of new commands.
  • a speaker-dependent speech recognition system is optimized in relation to the speaker concerned, since it must be trained in the voice of the user before its first use. This is known as "say-in” or training and it is used to create a feature vector sequence from at least one feature vector.
  • Romanization [05] or latinization is the representation of a written word or spoken speech with the Roman (Latin) script, or a system for doing so, where the original word or (non-Latin alphabet) language (NLAL) uses a different writing system.
  • Methods of Romanization include transliteration [06], for representing written text, and transcription [07], for representing the spoken word. The latter can be subdivided into phonemic transcription [08], which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription [09], which records speech sounds with precision.
  • Each language Romanization has its own set of rules for pronunciation of the Romanized words.
  • Greeklish is a combination of the words Greek and English (also known as Grenglish, Latinoellinika, or ASCII Greek), and refers to the Greek language written using the Latin alphabet.
  • This type of Romanization mainly captures informal, ad-hoc practices of writing Greek text in environments, where the use of the Greek alphabet is technically impossible or cumbersome, especially in electronic media.
  • the present invention proposes a method to intelligently deal with these ad-hoc practices of users of mobile phones and other communication devices.
  • KR20010079272 (A), Yu Seung Hyuk [ KR] - "System for executing dialing through name speech recognition and managing telephone directory using wire telephone and mobile phone in remote speech recognition server", 28/6/2001, 22/8/2001.
  • VoCon® 3200 is Nuance's speaker-independent, continuous speech recognition engine, supporting recognition of natural, conversational input in over 30 languages, large vocabularies, dynamic content, such as music titles, noise-robust front-end, etc.
  • This technology is an ASR technology, without the overlay layer for intelligent directory entry selection specially targeted to NLALs.
  • Vlingo is a virtual assistant that turns spoken words into action by combining voice-to-text technology, natural language processing, and Vlingo's Intent Engine to understand the user's intent and take the appropriate action.
  • the user speaks to his/her phone and connects with people, businesses and various activities.
  • Siri is an intelligent personal assistant for the iPhone (4S), that helps the user to get things done just by asking. It allows the user to use his/her voice to send messages, schedule meetings, place phone calls, etc. Siri understands natural speech, and asks the user questions if it needs more information to complete a task. It communicates with Apple's data centers to perform its functionality and return a response. Siri understands and can speak English, French, German, and Japanese and it is expected to also support additional languages, including Chinese, Korean, Italian, and Spanish.
  • Skyvi is an intelligent personal assistant for the Android-based systems, similar to iPhone's Siri, featuring voice-texting, finding and calling places, get directions, calling contacts, location reminder beacons, Facebook / Twitter, question asking with voice, etc.
  • Cyberon Voice Commander is a speech dialog system that provides natural human interface for users to communicate seamlessly with mobile devices. Through Voice Commander, users can make phone calls, look up contact info, launch program or check for calendars. It features speaker-independent voice recognition technology, voice control of name/digit dialing, supports several worldwide languages including English, German, French, Italian, Spanish, Portuguese, Brazil Portuguese, Russian, Vietnamese, Polish, Cantonese, etc., (http://www.cyberon.com.tw/pro-solSL.php).
  • Cyberon Voice Dialer provides speaker-independent speech recognition and text-to-speech technology to work on all phone platforms and accomplish features, such as voice dialing, contact look-up, and shortcuts launch.
  • the current invention operates on a mobile telephone or, more generally, on a computing device with processing capabilities, storage capacity, network connectivity and ability to run communication application software, such as for phone calling, email sending, SMS- sending, fax-sending, etc. It is assumed that the user of such a system already has or progressively builds a list of directory entries, each of which contains at least the name of a person with whom the user wishes to communicate and possibly one or more telephone numbers, an e-mail address, etc. In this context, we consider directories constructed in some non-Latin alphabet language (NLAL), possibly using the Latin and the language- specific alphabets interchangeably.
  • NLAL non-Latin alphabet language
  • a user chooses the desired name entry, by manually selecting it via a user interface consisting of a search form or a drop-down list, combined with a touch screen, a keyboard or some buttons.
  • the present invention replaces this selection process by a voice interface, combined with proper, intelligent, NLAL-specific processing of the directory list.
  • a typical Automatic Speech Recognition (ASR) engine [101], preferably optimized for a specific target-NLAL (e.g., Greek, Russian, Vietnamese, Arabic, Chinese, Hebrew, etc.), is installed in the target device.
  • ASR Automatic Speech Recognition
  • the user directory list may consist of name entries in different styles and formats which are very user-dependent.
  • IDSI intermediate database of search items
  • each directory entry comprises: language detection [201], normalization [202], inverting Romanization [203] (e.g., greeklish to Greek, volapuk to Cyrillic, etc.), accentuation checks [204], special character conversion [205], finding synonyms and hypocoristics [206], phonetic transcription [207] word-splitting and detection of first name and surname [208] .
  • a proper software module is installed on the device, enabling it to also accept voice input as an alternative to touch, or mouse, or keyboard input, for directory entry selection.
  • a software application running solely on the device implements the invention by performing the necessary intelligent processing of the user's voice commands in order to effectively select a name to communicate with and / or a command to execute.
  • the procedure depicted in the flowchart of FIG. 3 is executed. According to this procedure, the device user may issue a voice command such as "dial George on mobile” or "send SMS to Nick Papadopoulos" (in Greek or other NLAL).
  • the system searches through the directory list and selects one or more entries that match the voice name or command as much as possible.
  • This selection process is “speaker independent”, i.e., without requiring the user to personally train the system for this purpose, beforehand. It is also "ASR agnostic”, i.e., it may use any ASR engine, as long as it is adapted and optimized for the specific NLAL (FIG. 1). Moreover, as our method can handle each NLAL (as well as each non-NLAL) separately, as an individual case, apparently it can handle all languages together, thus being language-independent.
  • a directory entry may be recognized with standard or ad hoc Romanization rules, or even with combined use of Latin (Romanized) and non-Latin alphabets.
  • a name may be recognized in various forms (e.g.: Dimitri, Dhmhtrh, Demetres, etc.), by identifying phonetic equivalence classes between letters, digraphs and phonemes in the Greek language (e.g. {I, H, Y}, {E and AI}, etc.)
  • a name may be recognized in various forms (e.g.: mom, mother, ma, metera, etc.) [in Greek], by defining equivalence classes based on the semantics of various names or titles in the Greek language.
  • Spelling errors in the directory entries e.g.: "Menlaos” instead of “Menelaos” (missing letter “e”), etc., can be overcome or corrected using database lookups.
  • Male and female gender names can be distinguished between one another, e.g.: "Maria Papadopoulos” should be “Maria Papadopoulou” (female gender). Subsequently, a search for "Papadopoulos” would match both "George Papadopoulos” and "Maria Papadopoulou”.
  • a spoken name can be partially matched within some directory entry, e.g.: "Dimitri" in "Papadopoulos Dimitrios".
  • Each directory entry may include the search term (e.g. a person's first name) either as a first or last component (sub-string), e.g. searching for "Maria” would match both "Maria Papadopoulou” and "Papadopoulou Maria”.
  • the system may execute complex statements, consisting of a command ("dial”, “call”, “sms”, etc.), a name and an optional phone type (e.g.: “mobile”, “home”, “office”, “fax”, etc.), e.g.: “dial, Dimitri Papadopoulos, office” [said in Greek] (FIG.4(b)).
  • the aforementioned method is implemented using rule-based grammar, database lookup and catalog lookup techniques. Given the user's spoken word or phrase, each directory entry is evaluated against it and it is given a score (value) between 0% and 100%. All directory entries with scores above a predefined (fixed) threshold (e.g. 60%) are considered as valid matches. If no directory entry is matched with a score above the threshold, then an error voice message is issued, prompting the user to repeat the name or command. If more than one directory entries are matched with scores above the threshold, then they may subsequently be presented to the user via a GUI [105], as a list for further manual (or by voice) selection (FIG. 4(a)). The list may be sorted as either "most-recently-used-first", or “most-frequently-used-first”, or “highest-matching-score-first”, or alphabetically, etc.
  • the selection process is complete either when the user selects a single directory entry, or cancels the entire process.
  • the software may ask for confirmation of the user's final selection, with a predefined voice message or by speaking out (with voice synthesis) the user's selection.
  • the aforementioned procedure is depicted in FIG. 3.
  • FIGURE 1 depicts the main hardware and software components comprising the human-machine interface for the name selection from a directory list.
  • FIGURE 2 shows functional aspects of the implemented methodology and in particular the one-time pre-processing of each directory (catalog) entry.
  • FIGURE 3 is a flowchart of the procedure of vocal name selection by a speaker from a directory list.
  • FIGURE 4 shows two simulated screenshots of a typical mobile telephone, executing the directory list selection: a) prompting the speaker to select among the 3 best matches [left], b) prompting the speaker to select the telephone type to use [right].

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to methods and corresponding processing and communication devices implementing them for intelligent, offline, speaker-independent, vocal name-selection specifically targeted to directory lists constructed in non-"ISO basic Latin" alphabet languages (NLAL). Initially, a pre-existing directory list of names residing in the user's communication device is pre-processed and transformed (based on rules from the NLAL grammar, spelling and vocabulary) to an "intermediate database of search items" (IDSI). This database is updated every time a new directory entry is inserted or an existing one is edited. The vocal name selection is carried out as follows: the user speaks out a name (or a command following by a name) near the microphone of the communication device. The user's vocal input is processed by an Automatic Speech Recognition (ASR) engine, optimized for the specific NLAL. The ASR's output is then compared against the IDSI and a list of best matches is presented (via a GUI) sorted to the user, who finally completes the selection by hand or by voice. The rules applied to the pre-processing of the directory list are particular to the NLAL and relate to: composing the directory list with characters from NLAL and/or Latin alphabets; identifying a name among different forms of NLAL names; considering accentuation, phonetic or optical similarity (with respect to the NLAL alphabet), semantic equivalence grouping, partial matching, phonetic equivalence classes between letters and NLAL phonemes, special characters and common spelling errors.

Description

METHODS AND SYSTEM IMPLEMENTING INTELLIGENT VOCAL NAME-SELECTION FROM DIRECTORY LISTS COMPOSED IN NON-LATIN ALPHABET LANGUAGES
FIELD:
The present invention relates to human-machine interfaces and, particularly, to an improved method for speech recognition, a voice user interface and a corresponding communication device, which enables its users to communicate with an individual selected from a directory list constructed with the mixed alphabet of a "non-Latin-alphabet language" (NLAL, i.e., with a non-"ISO basic Latin" alphabet [02], such as Greek, Russian, Turkish, Arabic, Chinese, Hebrew, etc.).
BACKGROUND:
The statements in this section merely provide background information related to the present disclosure and may not constitute a complete prior art reference.
A. The Problem:
Communication devices, such as e.g. mobile telephones and computers in general, have undergone a multitude of improvements in their functionality, capabilities and user interfaces during the last few years, in order to facilitate efficient and user-friendly communication services for their users. Today, such communication devices are used to call or send SMS or send a fax to another telephone user, or to send an e-mail to another e-mail recipient. Finding the desired recipient of one's message often involves a selection process through a directory list, or lookup catalog, or phonebook. Such lists may be quite long, sometimes containing several hundred or thousand entries, so that browsing through them may be a time-consuming and tiresome or annoying process for the user. To ease this process, a variety of selection methods have been proposed and implemented at times, most of them involving scrollable lists and hierarchical alphabetic shortcuts, all requiring user input by means of either a touch-screen, a keyboard or special buttons. To facilitate this list selection for hands-free operation (functionality especially useful e.g. for drivers, blind or handicapped people), vocal user-interfaces have also been proposed, requiring only that the user be able to hear and speak in order to select something from a list of items.
Vocal user interfaces (VUI) typically consist of an input device, such as a microphone, an Automatic Speech Recognition (ASR) unit, which decodes the sound signals and transforms them into an intermediate data format, and appropriate software to process the ASR output using some match-and-select methodology. As speech recognition is user and language dependent, the performance and degree of accuracy of various VUIs may sometimes be unsatisfactory. Moreover, as various directory lists are constructed with various styles, alphabets, phonetics, semantics, etc., matching a speaker's vocal input against such a directory list often results in poor performance and recognition accuracy.
To improve the performance, speed and recognition accuracy of this matching and selection process of an entry from a directory list, it is necessary to enable both a high performance ASR and an intelligent algorithm that recognizes a plethora of equivalent variations of the user input, corresponding to the entries of the directory list.
Although the above problems have been adequately addressed for the English language, this is not the case for most "non-Latin-alphabet languages" (NLAL), i.e., languages with non-"ISO basic Latin" alphabets [02]. NLALs can be expressed in writing with English transliteration, i.e., with "ISO basic Latin" alphabet characters (e.g. Greek-English or Russian-English alphabets). The country-specific alphabet may also be used in combination with the Latin alphabet. Consider, for example, the Greek (or Russian) language, in which some words may be written with Greek (Cyrillic, respectively) alphabet characters and others with Latin alphabet characters, etc. Also, it often happens that a directory list constructed by a user in some NLAL contains certain grammatical characteristics not existing in the English language. For example, Greek words and names have accentuation and gender-sensitive endings, e.g. two spellings of "Kefalas" that differ only in accent placement denote different names, and Papadopoulos (male) is different from Papadopoulou (female). Greek words and names may also be spelled with phonetic, grammatical or optical similarity with respect to the Greek character set, e.g.: the name "Xenofon" may be written by a Greek user as "Ksenofon" (phonetic), or "Xenophon" (grammatical), or "3enofwn" (optical).
The combination of all these name variations, met particularly in NLAL directory lists, creates a much more complex problem space, which makes its solution more difficult and challenging, in order to achieve satisfactory and intelligent voice recognition and directory entry selection results. All methods and solutions proposed until today have been limited in scope, requiring the user's vocal input to match the directory listing as much as possible, without variations and often failing to successfully recognize names, if they appeared in slightly different style and format in the directory list.
The present invention provides an effective solution to the aforementioned problem for NLALs, using an intelligent algorithm which achieves improved performance, higher recognition rates and better selection accuracy, compared to other similar solutions.
B. Related State-of-the-Art: Speech Recognition
Speech recognition systems are computer systems that map spoken utterances to strings of words [01]. To achieve this transformation of recorded audio input to written representation, several smaller processes are applied sequentially: The first step in the processing sequence is to transform the recorded audio input into frequency spectrums by means of a Fast Fourier Transformation. The spectrums are passed on to a Hidden Markov Model, which is a statistical model determining the most probable phoneme sequence. This phoneme sequence is forwarded on to a language model. There are two different models whose usage mainly depends on the task the speech recognition is applied for. One of the language models is the statistical n-gram approach, often used in open-domain speaker-dependent dictation tasks. The second language model consists of rule-based grammars. Their core features are hand-written rules which define the acceptable utterances for a system exactly [01]. The present invention focuses on grammars as well as on alphabets and vocabularies, as they form the better suited language model in closed-domain speaker-independent dialogue systems.
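As a concrete illustration of such a rule-based (constrained) grammar in a closed-domain, speaker-independent setting, the minimal Python sketch below accepts utterances of the form "<command> <name> [on <phone type>]". The command and phone-type inventories, the regular expression and the function name parse_utterance are illustrative assumptions, not a grammar taken from this patent.

```python
import re

# A minimal sketch (not the patent's implementation) of a rule-based grammar
# for a closed-domain command task: "<command> <name> [on <phone type>]".
# The command and phone-type inventories are illustrative assumptions.
COMMANDS = {"dial", "call", "sms"}
PHONE_TYPES = {"mobile", "home", "office", "fax"}

# Hand-written rule: a command word, an optional "to", a free-form name,
# and an optional trailing "on <phone type>".
RULE = re.compile(
    r"^(?P<cmd>\w+)\s+(?:to\s+)?(?P<name>.+?)(?:\s+on\s+(?P<type>\w+))?$",
    re.IGNORECASE,
)

def parse_utterance(text: str):
    """Return (command, name, phone_type) if the utterance fits the grammar, else None."""
    m = RULE.match(text.strip())
    if not m:
        return None
    cmd = m.group("cmd").lower()
    ptype = (m.group("type") or "").lower() or None
    if cmd not in COMMANDS or (ptype and ptype not in PHONE_TYPES):
        return None  # utterance rejected by the constrained grammar
    return cmd, m.group("name").strip(), ptype

# parse_utterance("dial Nick Papadopoulos on mobile")
# -> ("dial", "Nick Papadopoulos", "mobile")
```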
Speech recognition technology [03] is used more and more for telephone applications like travel booking and information, financial account information, customer service call routing, and directory assistance. Such applications can achieve remarkably high accuracy by using constrained grammar recognition. Research and development in speech recognition technology has continued to grow as the cost for implementing such voice-activated systems has dropped and the usefulness and efficacy of these systems has improved.
Furthermore, speech recognition has enabled the automation of certain applications that are not automatable using push-button Interactive Voice Response (IVR) systems [04], like directory assistance and systems that allow callers to "dial" by speaking names listed in an electronic phone book.
Many communication devices today offer speaker-independent speech control. In the context of speech control, the user enters command words such as "dial", "phone book", "emergency", "reject" or "accept" in a manner which is similar to the customary speed dialing. The communication application associated with these command words can be used directly in the corresponding manner by a user, without the user personally having to train the system for this purpose beforehand using this set of words.
However convenient this factory-predetermined vocabulary of command words and possibly also names may be for the user, it does not replace a user-specific adaptation, e.g. the insertion of new commands. This applies particularly in the case of name selection, i.e. a special speech control, wherein specific numbers are dialed when the name is spoken. Therefore devices of greater complexity offer a speaker-dependent speech control in addition to a speaker-independent speech control.
A speaker-dependent speech recognition system is optimized in relation to the speaker concerned, since it must be trained in the voice of the user before its first use. This is known as "say-in" or training and it is used to create a feature vector sequence from at least one feature vector.
Romanization in NLALs
In linguistics, Romanization [05] or latinization is the representation of a written word or spoken speech with the Roman (Latin) script, or a system for doing so, where the original word or (non-Latin alphabet) language (NLAL) uses a different writing system. Methods of Romanization include transliteration [06], for representing written text, and transcription [07], for representing the spoken word. The latter can be subdivided into phonemic transcription [08], which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription [09], which records speech sounds with precision. Each language Romanization has its own set of rules for pronunciation of the Romanized words.
One example of Romanization is Greeklish [10]. Greeklish is a combination of the words Greek and English (also known as Grenglish, Latinoellinika, or ASCII Greek), and refers to the Greek language written using the Latin alphabet. This type of Romanization mainly captures informal, ad-hoc practices of writing Greek text in environments where the use of the Greek alphabet is technically impossible or cumbersome, especially in electronic media. The present invention proposes a method to intelligently deal with these ad-hoc practices of users of mobile phones and other communication devices.
A long list of global Romanization standards can be found in [05] for several languages, including Greek, Russian, Turkish, Arabic, Persian, Chinese, Hebrew, Indic, Japanese, Korean, Thai, Vietnamese, Bulgarian, Ukrainian, etc. However, in practice, people tend to apply Romanization in ad-hoc, rather than in standard ways, which is indicative of the complexity of the problem this invention provides a solution for, among other linguistic characteristics of NLALs.
References
[01] Hanne Marie Kosinowski, "Modular Grammars for Speech Recognition in Ontology-Based Dialogue Systems", Bachelor Thesis, Faculty of Computational Linguistics, Saarland University, Aug. 2010.
[02] http://en.wikipedia.org/wiki/ISO_basic_Latin_alphabet, Wikipedia, "ISO basic Latin alphabet".
[03] http://en.wikipedia.org/wiki/Speech_recognition, Wikipedia, "Speech recognition".
[04] http://en.wikipedia.org/wiki/Interactive_voice_response, Wikipedia, "Interactive voice response".
[05] http://en.wikipedia.org/wiki/Romanization, Wikipedia, "Romanization".
[06] http://en.wikipedia.org/wiki/Transliteration, Wikipedia, "Transliteration".
[07] http://en.wikipedia.org/wiki/Transcription_(linguistics), Wikipedia, "Transcription (linguistics)".
[08] http://en.wikipedia.org/wiki/Phonemic_orthography, Wikipedia, "Phonemic orthography".
[09] http://en.wikipedia.org/wiki/Phonetic_transcription, Wikipedia, "Phonetic transcription".
[10] http://en.wikipedia.org/wiki/Greeklish, Wikipedia, "Greeklish".
C. Selected relevant Inventions
The following inventions were found to be related to our present invention, in terms of the method and systems described herein:
1. WO9926232 (A1), Naumburger Volkmar [DE] - "Device and methods for speaker-independent spoken name selection for telecommunications terminals", 19/11/1997, 27/5/1999.
2. KR20080107376 (A), Ruwisch Dietmar [DE] - "Communication device having speaker independent speech recognition", 14/2/2006, 3/8/2007.
3. US7475017 (B2), Ju Yun-Cheng [US] - "Method and apparatus to improve name confirmation in voice-dialing systems", 27/7/2004, 6/1/2009.
4. CN2626149 (Y), Wu Zhenli [CN], Gong Liheng [CN] - "Speech recognition controlled dialing telephone set", 26/6/2003, 14/7/2004.
5. KR20010079272 (A), Yu Seung Hyuk [KR] - "System for executing dialing through name speech recognition and managing telephone directory using wire telephone and mobile phone in remote speech recognition server", 28/6/2001, 22/8/2001.
6. US6963633 (B1), Diede William F [US], Bechtel Kay L [US] - "Voice dialing using text names", 7/2/2000, 8/11/2005.
7. CN2415556 (Y), Chen Xiuzhi [CN] - "Voice identifying, inquiring and dialing telephone directory", 13/3/2000, 17/1/2001.
8. US6260012 (B1), Park Joung-Kyou [KR] - "Mobile phone having speaker dependent voice recognition method and apparatus", 27/2/1998, 10/7/2001.
All of the aforementioned inventions are about voice recognition, voice-dialing or vocal name-selection, but they differ from our invention in one or more of the following aspects: they do not effectively handle directories composed in NLAL, and/or they are not offline methods (i.e. they require a remote server communication connection), and/or they are speaker-dependent methods.
D. Selected relevant Technologies and Products
There is a long list of products and related technologies for voice, hands-free, digital-assistant applications, such as: Iris, vLingo, Siri, Skyvi, Speaktoit Assistant, Andy, Sonalight Text by Voice, Jeannie, VoCon, AIVC, TiKL, EVA Intern, Gosms, Dropbox, Voice Search, Voice Actions, Voice Commander, etc. Five of the most representative, relevant and well-known ones are presented below, along with their differences compared to our invention.
1. Nuance's VoCon® 3200
http://www.nuance.com/for-business/by-product/automotive-products-services/vocon3200/index.htm
VoCon® 3200 is Nuance's speaker-independent, continuous speech recognition engine, supporting recognition of natural, conversational input in over 30 languages, large vocabularies, dynamic content, such as music titles, noise-robust front-end, etc.
Differences: This technology is an ASR technology, without the overlay layer for intelligent directory entry selection specially targeted to NLALs.
2. vLingo Virtual Assistant
http://www.vlingo.com/
Vlingo is a virtual assistant that turns spoken words into action by combining voice-to-text technology, natural language processing, and
Vlingo's Intent Engine to understand the user's intent and take the appropriate action. The user speaks to his/her phone and connects with people, businesses and various activities.
http://www.vlingo.com/content/screenshots
http://blog.vlingo.com/vlingo-language-beta/ (for new foreign languages, no Greek or other NLALs yet)
Differences: Vlingo does not provide intelligent directory-entry selection specially targeted to NLALs.
3. Siri (iPhone)
http://www.apple.com/iphone/features/siri-faq.html
Siri is an intelligent personal assistant for the iPhone (4S), that helps the user to get things done just by asking. It allows the user to use his/her voice to send messages, schedule meetings, place phone calls, etc. Siri understands natural speech, and asks the user questions if it needs more information to complete a task. It communicates with Apple's data centers to perform its functionality and return a response. Siri understands and can speak English, French, German, and Japanese and it is expected to also support additional languages, including Chinese, Korean, Italian, and Spanish.
Differences: Siri does not support intelligent directory-entry selection specially targeted to NLALs. It also does not work offline (without connection to a remote server).
4. Skyvi (Android)
http://www.skyviapp.com/
Skyvi is an intelligent personal assistant for Android-based systems, similar to iPhone's Siri, featuring voice-texting, finding and calling places, getting directions, calling contacts, location reminder beacons, Facebook / Twitter, question asking with voice, etc.
Differences: Skyvi does not yet support intelligent directory-entry selection specially targeted to NLALs. It also does not work offline (without connection to a remote server).
5. Voice Commander / Voice Dialer (by Cyberon Corp.)
http://www.cyberon.com.tw/en_index.php
Cyberon Voice Commander is a speech dialog system that provides a natural human interface for users to communicate seamlessly with mobile devices. Through Voice Commander, users can make phone calls, look up contact info, launch programs or check calendars. It features speaker-independent voice recognition technology, voice control of name/digit dialing, and supports several worldwide languages including English, German, French, Italian, Spanish, Portuguese, Brazilian Portuguese, Russian, Turkish, Polish, Cantonese, etc. (http://www.cyberon.com.tw/pro-solSL.php).
http://www.cyberon.com.tw/flash_demo.php
http://www.cyberon.com.tw/order_Product_con.php?N0100=15
Cyberon Voice Dialer provides speaker-independent speech recognition and text-to-speech technology to work on all phone platforms and accomplish features, such as voice dialing, contact look-up, and shortcuts launch.
Differences: Voice Commander does not support intelligent directory-entry selection specially targeted to NLALs, with improved performance characteristics.
DETAILED DESCRIPTION
The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.
The current invention operates on a mobile telephone or, more generally, on a computing device with processing capabilities, storage capacity, network connectivity and ability to run communication application software, such as for phone calling, email sending, SMS-sending, fax-sending, etc. It is assumed that the user of such a system already has or progressively builds a list of directory entries, each of which contains at least the name of a person with whom the user wishes to communicate and possibly one or more telephone numbers, an e-mail address, etc. In this context, we consider directories constructed in some non-Latin alphabet language (NLAL), possibly using the Latin and the language-specific alphabets interchangeably.
Typically, a user (e.g. a telephone caller) chooses the desired name entry by manually selecting it via a user interface consisting of a search form or a drop-down list, combined with a touch screen, a keyboard or some buttons. The present invention replaces this selection process by a voice interface, combined with proper, intelligent, NLAL-specific processing of the directory list. Initially, a typical Automatic Speech Recognition (ASR) engine [101], preferably optimized for a specific target-NLAL (e.g., Greek, Russian, Turkish, Arabic, Chinese, Hebrew, etc.), is installed in the target device. The user directory list may consist of name entries in different styles and formats which are very user-dependent. For example, they can be entered with surname-first or surname-last, with Greek or Latin or mixed alphabet, with or without accent ('), with different name endings, with a short (nick) name or a full name (e.g. Nikos and Nikolaos), or with equivalent title names (e.g. father and dad), etc. As a first step, such a directory list is pre-processed by means of proper software and an "intermediate database of search items" (IDSI) [102] is generated, consisting of a number of alternatives per directory entry, with which this specific entry can be voiced by the user. The pre-processing of each directory entry comprises: language detection [201], normalization [202], inverting Romanization [203] (e.g., greeklish to Greek, volapuk to Cyrillic, etc.), accentuation checks [204], special character conversion [205], finding synonyms and hypocoristics [206], phonetic transcription [207], word-splitting and detection of first name and surname [208].
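A minimal sketch of this per-entry pre-processing is shown below, assuming a toy greeklish-to-Greek letter mapping and a toy hypocoristics table; the names GREEKLISH_TO_GREEK, HYPOCORISTICS and preprocess_entry, and the mappings themselves, are illustrative assumptions rather than the patent's actual rule set.

```python
import unicodedata

# A toy sketch of the per-entry pre-processing that builds the "intermediate
# database of search items" (IDSI). The mapping tables are illustrative
# assumptions, not the patent's actual rule set.
GREEKLISH_TO_GREEK = {
    "th": "θ", "ks": "ξ", "ps": "ψ", "ch": "χ", "ou": "ου",
    "a": "α", "b": "β", "v": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν", "o": "ο",
    "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "y": "υ", "f": "φ",
    "w": "ω", "x": "ξ", "3": "ξ",
}
HYPOCORISTICS = {"νικολαοσ": ["νικοσ"], "δημητριοσ": ["δημητρησ", "μιμησ"]}

def strip_accents(text: str) -> str:
    """Accentuation check: make comparison accent-insensitive."""
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if unicodedata.category(c) != "Mn")

def deromanize(latin: str) -> str:
    """Invert an ad-hoc greeklish spelling, longest digraph first."""
    out, i = [], 0
    while i < len(latin):
        for length in (2, 1):
            chunk = latin[i:i + length].lower()
            if chunk in GREEKLISH_TO_GREEK:
                out.append(GREEKLISH_TO_GREEK[chunk])
                i += length
                break
        else:
            out.append(latin[i])  # keep unknown characters unchanged
            i += 1
    return "".join(out)

def preprocess_entry(entry: str) -> set[str]:
    """Produce the set of alternative search items for one directory entry."""
    alternatives = set()
    for word in entry.split():                                   # word-splitting
        is_greek = any("α" <= c <= "ω" for c in word.lower())    # crude language detection
        greek = word.lower() if is_greek else deromanize(word)   # inverse Romanization
        base = strip_accents(greek).replace("ς", "σ")            # accent / final-sigma folding
        alternatives.add(base)
        alternatives.update(HYPOCORISTICS.get(base, []))         # synonyms / hypocoristics
    return alternatives

# preprocess_entry("Nikolaos Papadopoulos")
# -> {"νικολαοσ", "νικοσ", "παπαδοπουλοσ"}
```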
After the pre-processing and transformation of the initial directory list, a proper software module is installed on the device, enabling it to also accept voice input as an alternative to touch, or mouse, or keyboard input, for directory entry selection. A software application running solely on the device implements the invention by performing the necessary intelligent processing of the user's voice commands, in order to effectively select a name to communicate with and/or a command to execute. When the system (device + software) is operational, the procedure depicted in the flowchart of FIG. 3 is executed. According to this procedure, the device user may issue a voice command such as "dial George on mobile" or "send SMS to Nick Papadopoulos" (in Greek or other NLAL). Then, the system searches through the directory list and selects one or more entries that match the voiced name or command as closely as possible. This selection process is "speaker independent", i.e., it does not require the user to personally train the system for this purpose beforehand. It is also "ASR agnostic", i.e., it may use any ASR engine, as long as it is adapted and optimized for the specific NLAL (FIG. 1). Moreover, as our method can handle each NLAL (as well as each non-NLAL) separately, as an individual case, it can evidently handle all languages together, thus being language-independent.
Hereafter, without loss of generality, the Greek language is used as an example, to simplify the presentation. In general, our methodology applies to any NLAL exhibiting analogous characteristics. Evidently, it can also be applied to all Latin-alphabet-based languages, as special (simpler) cases that do not exhibit these characteristics.
Names and commands spoken by a user in Greek are recognized based on the following rules (an illustrative sketch follows the list):
1. A directory entry may be recognized with standard or ad hoc Romanization rules, or even with combined use of Latin (Romanized) and non-Latin alphabets.
2. A name may be recognized in various forms (e.g.: Dimitri, Dhmhtrh, Demetres, etc.), by identifying phonetic equivalence classes between letters, digraphs and phonemes in the Greek language (e.g. {I, H, Y}, {E and AI}, etc.)
3. A name may be recognized in various forms (e.g.: mom, mother, mama, metera, etc.) [in Greek], by defining equivalence classes based on the semantics of various names or titles in the Greek language.
4. The lack of accentuation of the directory entries can be ignored or corrected based on a dictionary of common proper names and grammar rules.
5. Special characters, such as {*, &, $, ...}, can be ignored, or they can be replaced with equivalent notation, e.g. "&" with {"and" or "ke" [Greek]}.
6. Spelling errors in the directory entries, e.g.: "Menlaos" instead of "Menelaos" (missing letter "e"), etc., can be overcome or corrected using database lookups.
7. Male and female gender names can be distinguished from one another, e.g.: "Maria Papadopoulos" should be "Maria Papadopoulou" (female gender). Subsequently, a search for "Papadopoulos" would match both "George Papadopoulos" and "Maria Papadopoulou".
8. A spoken name can be partially matched within some directory entry, e.g.: "Dimitri" in "Papadopoulos Dimitrios".
9. Each directory entry may include the search term (e.g. a person's first name) either as a first or last component (sub-string), e.g. searching for "Maria" would match both "Maria Papadopoulou" and "Papadopoulou Maria".
10. The system may execute complex statements, consisting of a command ("dial", "call", "sms", etc.), a name and an optional phone type (e.g.: "mobile", "home", "office", "fax", etc.), e.g.: "dial, Dimitri Papadopoulos, office" [said in Greek] (FIG.4(b)).
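The sketch below illustrates rule 2 in a simplified form: letters and Latin digraphs belonging to the same Greek phonetic equivalence class are collapsed into one canonical symbol, so that spelling variants of the same name compare equal. The class table and the function name phonetic_key are illustrative assumptions; fuller coverage (endings, gender, accents) would require the additional rules listed above.

```python
# A simplified sketch of rule 2: collapse letters and digraphs (here in their
# Latin-transliterated form) that sound alike in Greek into one phonetic key,
# so that variants such as "Dimitri" and "Dhmhtrh" compare equal.
# The equivalence classes below are a partial, illustrative assumption.
EQUIVALENCE_CLASSES = [
    ({"i", "h", "y", "ei", "oi"}, "i"),   # Ι, Η, Υ, ΕΙ, ΟΙ all sound /i/
    ({"e", "ai"}, "e"),                   # Ε and ΑΙ sound /e/
    ({"o", "w"}, "o"),                    # Ο and Ω sound /o/
    ({"b", "mp"}, "b"),
    ({"f", "ph"}, "f"),
    ({"ks", "x", "3"}, "ks"),
]

def phonetic_key(name: str) -> str:
    """Map a (Latin-written) Greek name variant to a canonical phonetic form."""
    s = name.lower()
    key, i = [], 0
    while i < len(s):
        for length in (2, 1):                      # longest digraph first
            chunk = s[i:i + length]
            canon = next((c for members, c in EQUIVALENCE_CLASSES
                          if chunk in members), None)
            if canon is not None:
                key.append(canon)
                i += length
                break
        else:
            key.append(s[i])
            i += 1
    return "".join(key)

# phonetic_key("Dimitri") == phonetic_key("Dhmhtrh") == "dimitri"
```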
The aforementioned method is implemented using rule-based grammar, database lookup and catalog lookup techniques. Given the user's spoken word or phrase, each directory entry is evaluated against it and is given a score (value) between 0% and 100%. All directory entries with scores above a predefined (fixed) threshold (e.g. 60%) are considered as valid matches. If no directory entry is matched with a score above the threshold, then an error voice message is issued, prompting the user to repeat the name or command. If more than one directory entry is matched with a score above the threshold, then they may subsequently be presented to the user via a GUI [105], as a list for further manual (or by-voice) selection (FIG. 4(a)). The list may be sorted as either "most-recently-used-first", or "most-frequently-used-first", or "highest-matching-score-first", or alphabetically, etc.
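A minimal sketch of this match-and-select step is given below, assuming each directory entry has already been expanded into a set of IDSI alternatives (as in the pre-processing sketch earlier). Python's difflib string similarity is used only as a stand-in for the patent's rule-based scoring, and the 60% threshold mirrors the example in the text.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.60  # fixed, predefined threshold (60%), as in the description

def score(asr_output: str, alternatives: set[str]) -> float:
    """Best similarity (0..1) between the ASR output and any IDSI alternative
    of one directory entry. SequenceMatcher is only a stand-in for the
    patent's rule-based scoring."""
    return max((SequenceMatcher(None, asr_output, alt).ratio()
                for alt in alternatives), default=0.0)

def best_matches(asr_output: str, idsi: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return directory entries scoring at or above the threshold,
    sorted highest-matching-score-first."""
    scored = [(entry, score(asr_output.lower(), alts)) for entry, alts in idsi.items()]
    hits = [(entry, s) for entry, s in scored if s >= THRESHOLD]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Example with a toy IDSI (an empty result would trigger the error voice prompt):
# idsi = {"Papadopoulos Dimitrios": {"dimitris", "papadopoulos", "dimitri"},
#         "Papadopoulou Maria": {"maria", "papadopoulou"}}
# best_matches("dimitri", idsi) -> [("Papadopoulos Dimitrios", 1.0)]
```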
The selection process is complete either when the user selects a single directory entry, or cancels the entire process. The software may ask for confirmation of the user's final selection, with a predefined voice message or by speaking out (with voice synthesis) the user's selection. The aforementioned procedure is depicted in FIG. 3.
DESCRIPTION OF DRAWINGS
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
FIGURE 1 depicts the main hardware and software components comprising the human-machine interface for the name selection from a directory list.
FIGURE 2 shows functional aspects of the implemented methodology and in particular the one-time pre-processing of each directory (catalog) entry.
FIGURE 3 is a flowchart of the procedure of vocal name selection by a speaker from a directory list.
FIGURE 4 shows two simulated screenshots of a typical mobile telephone, executing the directory list selection: a) prompting the speaker to select among the 3 best matches [left]; b) prompting the speaker to select the telephone type to use [right].
ACRONYMS
ASR Automatic Speech Recognition
GUI Graphical User Interface
IDSI Intermediate Database of Search Items
NLAL Non-Latin Alphabet Language
RBPP Rule-Based Pre-Processing
SMS Short Message Service
VUI Vocal User Interface

Claims

CLAIMS
The invention claimed is:
1. A method for intelligent, speaker-independent, language-independent speech recognition for the selection of an entry from a directory list (e.g., from a phonebook or from a name catalog) constructed in a non-"ISO basic Latin" alphabet language (NLAL), comprising:
• "rule-based pre-processing" (RBPP) of a pre-existing user directory list, i.e., a transformation (based on the NLAL's particular grammar, spelling and vocabulary) that produces an "intermediate database of search items" (IDSI), consisting of vectors of equivalent terms, with each vector corresponding to multiple directory list entries and each directory list entry potentially corresponding to multiple vectors;
• installing on a mobile phone, or other computing device with storage, processing and communication capabilities, a software module that enables the device to accept voice input;
• receiving the speaker's voice and processing it via a generic Automatic Speech Recognition (ASR) engine, which is optimized for the corresponding NLAL;
• matching the ASR's output against the said IDSI vectors;
• assigning a score to each said match for every directory list entry;
• selecting the "best matches", i.e. the said matches with a score greater than or equal to a fixed, predefined threshold value;
• presenting the user with a sorted list of "best matches" (if any) to select from;
• enabling the user to finally select a single directory list entry out of the list of "best matches", either manually or by voice;
• applying the said RBPP to any subsequent modification of the directory list performed by the user (i.e., addition of new directory entries, or editing of existing ones), thus resulting in consequent updates of the said IDSI;
• and completely offline operation, i.e. without the need for connection and data transfer with a remote or external server;
wherein the said RBPP uses rules:
a) to identify NLAL words or names written with characters from the NLAL's alphabet, or from the Latin alphabet (with transliteration), or from a combination of the two;
b) to identify (and to potentially correct) different forms of NLAL names based on their gender (e.g., Papadopoulos, Papadopoulou);
c) to identify (and to potentially correct) a name among different forms of NLAL names, with or without accentuation, or with phonetic similarity (e.g., "Mihalis", "Michalis"), or with optical (with respect to the NLAL alphabet) similarity (e.g., "Mihalis", "Mixalis") (also referred to as "phonetic and phonemic transcription"), or with a range of most common spelling errors according to the NLAL grammar rules;
d) to group (and consider as equivalent) different names, titles, or words, with similar semantics, such as synonyms and hypocoristics, e.g. {"mom", "mama", "meetera"} or {"Nikos Papadopoulos", "boss", "manager"}.
2. A method, as claimed in claim 1, wherein the name or word spoken by the user may fully or partially match some directory list entry or some vector term of the said IDSI, and may appear within such an entry or vector term in any particular order.
3. A method, as claimed in claim 2, wherein the said list of "best matches" is sorted in order of: "most-recently-used-first", or "most-frequently-used-first", or "highest-matching-score-first", or alphabetically.
4. A method, as claimed in claim 3, wherein the said selection process may refer and apply to telephone calling, or e-mail sending, or SMS sending, or fax sending.
5. A method, as claimed in claim 4, wherein the said system may execute complex statements, spoken by the user, consisting of a command prefix (e.g., "dial", "call", "sms", etc.), a name (as destination) and an optional destination/target type (e.g.: "mobile", "home", "office", "fax", etc.). E.g.: "dial, Dimitri Papadopoulos, on mobile".
6. A method, as claimed in claim 5, wherein the entire selection process may also be performed via an online connection with an external or remote server, where the required data processing is performed.
7. An electronic device with data processing, data storage, communication, and voice recognition capabilities, with installed system software and application software, altogether implementing the methods, as claimed in claims 1-6.
PCT/IB2012/052258 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages WO2013167934A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2012/052258 WO2013167934A1 (en) 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2012/052258 WO2013167934A1 (en) 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Publications (1)

Publication Number Publication Date
WO2013167934A1 true WO2013167934A1 (en) 2013-11-14

Family

ID=46210316

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2012/052258 WO2013167934A1 (en) 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Country Status (1)

Country Link
WO (1) WO2013167934A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999026232A1 (en) 1997-11-19 1999-05-27 Deutsche Telekom Ag Device and methods for speaker-independent spoken name selection for telecommunications terminals
CN2415556Y (en) 2000-03-13 2001-01-17 陈修志 Voice identifying, inquiring and dialing telephone directory
US6260012B1 (en) 1998-02-27 2001-07-10 Samsung Electronics Co., Ltd Mobile phone having speaker dependent voice recognition method and apparatus
US6272464B1 (en) * 2000-03-27 2001-08-07 Lucent Technologies Inc. Method and apparatus for assembling a prediction list of name pronunciation variations for use during speech recognition
KR20010079272A (en) 2001-06-28 2001-08-22 유승혁 Utterance Speech Recognition Dialing and a Phone-book Management System for the Remote Speech Recognition Server is used to Telephone and Mobile-Phone
CN2626149Y (en) 2003-06-26 2004-07-14 深圳市捷通语音技术开发有限公司 Speech recognition controlled dialing telephone set
US6963633B1 (en) 2000-02-07 2005-11-08 Verizon Services Corp. Voice dialing using text names
US20070255567A1 (en) * 2006-04-27 2007-11-01 At&T Corp. System and method for generating a pronunciation dictionary
KR20080107376A (en) 2006-02-14 2008-12-10 인텔렉츄얼 벤처스 펀드 21 엘엘씨 Communication device having speaker independent speech recognition
US7475017B2 (en) 2004-07-27 2009-01-06 Microsoft Corporation Method and apparatus to improve name confirmation in voice-dialing systems

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999026232A1 (en) 1997-11-19 1999-05-27 Deutsche Telekom Ag Device and methods for speaker-independent spoken name selection for telecommunications terminals
US6260012B1 (en) 1998-02-27 2001-07-10 Samsung Electronics Co., Ltd Mobile phone having speaker dependent voice recognition method and apparatus
US6963633B1 (en) 2000-02-07 2005-11-08 Verizon Services Corp. Voice dialing using text names
CN2415556Y (en) 2000-03-13 2001-01-17 陈修志 Voice identifying, inquiring and dialing telephone directory
US6272464B1 (en) * 2000-03-27 2001-08-07 Lucent Technologies Inc. Method and apparatus for assembling a prediction list of name pronunciation variations for use during speech recognition
KR20010079272A (en) 2001-06-28 2001-08-22 유승혁 Utterance Speech Recognition Dialing and a Phone-book Management System for the Remote Speech Recognition Server is used to Telephone and Mobile-Phone
CN2626149Y (en) 2003-06-26 2004-07-14 深圳市捷通语音技术开发有限公司 Speech recognition controlled dialing telephone set
US7475017B2 (en) 2004-07-27 2009-01-06 Microsoft Corporation Method and apparatus to improve name confirmation in voice-dialing systems
KR20080107376A (en) 2006-02-14 2008-12-10 인텔렉츄얼 벤처스 펀드 21 엘엘씨 Communication device having speaker independent speech recognition
US20070255567A1 (en) * 2006-04-27 2007-11-01 At&T Corp. System and method for generating a pronunciation dictionary

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HANNE MARIE KOSINOWSKI: "Bachelor Thesis", August 2010, FACULTY OF COMPUTATIONAL LINGUISTICS, article "Modular Grammars for Speech Recognition in Ontology-Based Dialogue Systems"
JUHA ISO-SIPILÄ: "Design and Implementation of a Speaker-Independent Voice Dialing System: A Multi-lingual Approach", 18 April 2008 (2008-04-18), XP055057102, Retrieved from the Internet <URL:http://URN.fi/URN:NBN:fi:tty-200902201010> [retrieved on 20130319] *

Similar Documents

Publication Publication Date Title
KR102596446B1 (en) Modality learning on mobile devices
CN107039038B (en) Learning personalized entity pronunciation
CN106796788B (en) Improving automatic speech recognition based on user feedback
US8290775B2 (en) Pronunciation correction of text-to-speech systems between different spoken languages
US7974843B2 (en) Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer
US9640175B2 (en) Pronunciation learning from user correction
CN110110319B (en) Word level correction of speech input
US8380505B2 (en) System for recognizing speech for searching a database
EP3032532A1 (en) Disambiguating heteronyms in speech synthesis
US8423351B2 (en) Speech correction for typed input
US20060293889A1 (en) Error correction for speech recognition systems
US11093110B1 (en) Messaging feedback mechanism
JP2015153108A (en) Voice conversion support device, voice conversion support method, and program
CN111540353B (en) Semantic understanding method, device, equipment and storage medium
KR20090019198A (en) Method and apparatus for automatically completed text input using speech recognition
KR20200125735A (en) Multi-party conversation recording/output method using speech recognition technology and device therefor
CN107632982B (en) Method and device for voice-controlled foreign language translation equipment
US20150310853A1 (en) Systems and methods for speech artifact compensation in speech recognition systems
US20170337922A1 (en) System and methods for modifying user pronunciation to achieve better recognition results
Maskeliunas et al. Voice-based human-machine interaction modeling for automated information services
EP3241123B1 (en) Voice recognition-based dialing
US7430503B1 (en) Method of combining corpora to achieve consistency in phonetic labeling
US20170337923A1 (en) System and methods for creating robust voice-based user interface
WO2013167934A1 (en) Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages
Sharma et al. Exploration of speech enabled system for English

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12726199

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12726199

Country of ref document: EP

Kind code of ref document: A1