WO2013167934A1 - Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages - Google Patents

Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Info

Publication number
WO2013167934A1
WO2013167934A1 (PCT/IB2012/052258)
Authority
WO
WIPO (PCT)
Prior art keywords
nlal
name
user
directory
list
Prior art date
Application number
PCT/IB2012/052258
Other languages
French (fr)
Inventor
Ioannis KAMATAKIS
Original Assignee
Mls Multimedia S.A.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mls Multimedia S.A. filed Critical Mls Multimedia S.A.
Priority to PCT/IB2012/052258 priority Critical patent/WO2013167934A1/en
Publication of WO2013167934A1 publication Critical patent/WO2013167934A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/12 - Use of codes for handling textual entities
    • G06F40/126 - Character encoding
    • G06F40/129 - Handling non-Latin characters, e.g. kana-to-kanji conversion
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M1/00 - Substation equipment, e.g. for use by subscribers
    • H04M1/26 - Devices for calling a subscriber
    • H04M1/27 - Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 - Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition

Definitions

  • the present invention relates to human-machine interfaces and, particularly, to an improved method for speech recognition, a voice user interface and a corresponding communication device, which enables its users to communicate with an individual, who is selected from a directory list constructed with a mixed-alphabet of a "non-Latin-alphabet language" (NLAL, i.e., with a non "ISO basic Latin” alphabet [02], such as Greek, Russian, Vietnamese, Arabic, Chinese, Hebrew, etc.).
  • NLAL non-Latin-alphabet language
  • Communication devices such as e.g. mobile telephones and computers in general, have undergone a multitude of improvements in their functionality, capabilities and user interfaces, during the last few years, in order to facilitate efficient and user-friendly communication services for their users.
  • Today, such communication devices are used to call or send SMS or send a fax to another telephone user, or to send an e-mail to another e-mail recipient.
  • Finding the desired recipient of one's message often involves a selection process through a directory list, or lookup catalog, or phonebook.
  • Such lists may be quite long, sometimes containing several hundred or thousand entries, so that, browsing through them may be a time- consuming and tiresome or annoying process for the user.
  • Vocal user interfaces typically consist of an input device, such as a microphone, an Automatic Speech Recognition (ASR) unit, which decodes the sound signals and transforms them into an intermediate data format, and appropriate software to process the ASR output using some match-and-select methodology.
  • ASR Automatic Speech Recognition
  • directory lists are constructed with various styles, alphabets, phonetics, semantics, etc., matching a speaker's vocal input against such a directory list often results in poor performance and recognition accuracy.
  • NLAL non-Latin-alphabet languages
  • languages with non-"ISO basic Latin” alphabets [02].
  • NLALs can be expressed in writing with English transliteration, i.e., with "ISO basic Latin" alphabet characters (e.g. Greek-English or Russian-English alphabets).
  • the country-specific alphabet may also be used in combination with the Latin alphabet.
  • Greek (or Russian) language in which, words may be written with Greek (Cyrillic, respectively) alphabet characters and others with Latin alphabet characters, etc.
  • a directory list constructed by a user in some NLAL contains certain grammatical characteristics not existing in the English language.
  • Greek words and names have accentuation and gender-sensitive endings, e.g. two spellings of "Kefalas" that differ only in accent placement denote different names, and Papadopoulos (male) is different from Papadopoulou (female).
  • Greek words and names may also be spelled with phonetic, grammatical or optical similarity with respect to the Greek character set, e.g.: the name "Xenofon" may be written by a Greek user as "Ksenofon" (phonetic), or "Xenophon" (grammatical), or "3enofwn" (optical).
  • the present invention provides an effective solution to the aforementioned problem for NLALs, using an intelligent algorithm which achieves improved performance, higher recognition rates and better selection accuracy, compared to other similar solutions.
  • Speech recognition systems are computer systems that map spoken utterances to strings of words [01]. To achieve this transformation of recorded audio input to written representation, several smaller processes are applied sequentially: The first step in the processing sequence is to transform the recorded audio input into frequency spectrums by means of a Fast Fourier Transformation. The spectrums are passed on to a Hidden Markov Model, which is a statistical model determining the most probable phoneme sequence. This phoneme sequence is forwarded on to a language model. There are two different models whose usage mainly depends on the task the speech recognition is applied for. One of the language models is the statistical n-gram approach, often used in open-domain speaker-dependent dictation tasks. The second language model consists of rule-based grammars.
  • Speech recognition technology is used more and more for telephone applications like travel booking and information, financial account information, customer service call routing, and directory assistance. Such applications can achieve remarkably high accuracy by using constrained grammar recognition. Research and development in speech recognition technology has continued to grow as the cost for implementing such voice-activated systems has dropped and the usefulness and efficacy of these systems has improved.
  • speech recognition has enabled the automation of certain applications that are not automatable using push-button Interactive Voice Response (IVR) systems [04], like directory assistance and systems that allow callers to "dial" by speaking names listed in an electronic phone book.
  • IVR Interactive Voice Response
  • command words such as "dial”, “phone book”, “emergency”, “reject” or “accept” in a manner which is similar to the customary speed dialing.
  • the communication application associated with these command words can be used directly in the corresponding manner by a user, without the user personally having to train the system for this purpose beforehand using this set of words.
  • this factory-predetermined vocabulary of command words and possibly also names may be for the user, it does not replace a user-specific adaptation, e.g. the insertion of new commands.
  • a speaker-dependent speech recognition system is optimized in relation to the speaker concerned, since it must be trained in the voice of the user before its first use. This is known as "say-in” or training and it is used to create a feature vector sequence from at least one feature vector.
  • Romanization [05] or latinization is the representation of a written word or spoken speech with the Roman (Latin) script, or a system for doing so, where the original word or (non-Latin alphabet) language (NLAL) uses a different writing system.
  • Methods of Romanization include transliteration [06], for representing written text, and transcription [07], for representing the spoken word. The latter can be subdivided into phonemic transcription [08], which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription [09], which records speech sounds with precision.
  • Each language Romanization has its own set of rules for pronunciation of the Romanized words.
  • Greeklish is a combination of the words Greek and English (also known as Grenglish, Latinoellinika, or ASCII Greek), and refers to the Greek language written using the Latin alphabet.
  • This type of Romanization mainly captures informal, ad-hoc practices of writing Greek text in environments, where the use of the Greek alphabet is technically impossible or cumbersome, especially in electronic media.
  • the present invention proposes a method to intelligently deal with these ad-hoc practices of users of mobile phones and other communication devices.
  • KR20010079272 (A), Yu Seung Hyuk [ KR] - "System for executing dialing through name speech recognition and managing telephone directory using wire telephone and mobile phone in remote speech recognition server", 28/6/2001, 22/8/2001.
  • VoCon® 3200 is Nuance's speaker-independent, continuous speech recognition engine, supporting recognition of natural, conversational input in over 30 languages, large vocabularies, dynamic content, such as music titles, noise-robust front-end, etc.
  • This technology is an ASR technology, without the overlay layer for intelligent directory entry selection specially targeted to NLALs.
  • Vlingo is a virtual assistant that turns spoken words into action by combining voice-to-text technology, natural language processing, and Vlingo's Intent Engine to understand the user's intent and take the appropriate action.
  • the user speaks to his/her phone and connects with people, businesses and various activities.
  • Siri is an intelligent personal assistant for the iPhone (4S), that helps the user to get things done just by asking. It allows the user to use his/her voice to send messages, schedule meetings, place phone calls, etc. Siri understands natural speech, and asks the user questions if it needs more information to complete a task. It communicates with Apple's data centers to perform its functionality and return a response. Siri understands and can speak English, French, German, and Japanese and it is expected to also support additional languages, including Chinese, Korean, Italian, and Spanish.
  • Skyvi is an intelligent personal assistant for the Android-based systems, similar to iPhone's Siri, featuring voice-texting, finding and calling places, get directions, calling contacts, location reminder beacons, Facebook / Twitter, question asking with voice, etc.
  • Cyberon Voice Commander is a speech dialog system that provides natural human interface for users to communicate seamlessly with mobile devices. Through Voice Commander, users can make phone calls, look up contact info, launch program or check for calendars. It features speaker-independent voice recognition technology, voice control of name/digit dialing, supports several worldwide languages including English, German, French, Italian, Spanish, Portuguese, Brazil Portuguese, Russian, Vietnamese, Polish, Cantonese, etc., (http://www.cyberon.com.tw/pro-solSL.php).
  • Cyberon Voice Dialer provides speaker-independent speech recognition and text-to-speech technology to work on all phone platforms and accomplish features, such as voice dialing, contact look-up, and shortcuts launch.
  • the current invention operates on a mobile telephone or, more generally, on a computing device with processing capabilities, storage capacity, network connectivity and ability to run communication application software, such as for phone calling, email sending, SMS- sending, fax-sending, etc. It is assumed that the user of such a system already has or progressively builds a list of directory entries, each of which contains at least the name of a person with whom the user wishes to communicate and possibly one or more telephone numbers, an e-mail address, etc. In this context, we consider directories constructed in some non-Latin alphabet language (NLAL), possibly using the Latin and the language- specific alphabets interchangeably.
  • NLAL non-Latin alphabet language
  • a user chooses the desired name entry, by manually selecting it via a user interface consisting of a search form or a drop-down list, combined with a touch screen, a keyboard or some buttons.
  • the present invention replaces this selection process by a voice interface, combined with proper, intelligent, NLAL-specific processing of the directory list.
  • a typical Automatic Speech Recognition (ASR) engine [101], preferably optimized for a specific target-NLAL (e.g., Greek, Russian, Vietnamese, Arabic, Chinese, Hebrew, etc.), is installed in the target device.
  • ASR Automatic Speech Recognition
  • the user directory list may consist of name entries in different styles and formats which are very user-dependent.
  • IDSI intermediate database of search items
  • each directory entry comprises: language detection [201], normalization [202], inverting Romanization [203] (e.g., greeklish to Greek, volapuk to Cyrillic, etc.), accentuation checks [204], special character conversion [205], finding synonyms and hypocoristics [206], phonetic transcription [207] word-splitting and detection of first name and surname [208] .
  • a proper software module is installed on the device, enabling it to also accept voice input as an alternative to touch, or mouse, or keyboard input, for directory entry selection.
  • a software application running solely on the device implements the invention by performing the necessary intelligent processing of the user's voice commands in order to effectively select a name to communicate with and / or a command to execute.
  • the procedure depicted in the flowchart of FIG. 3 is executed. According to this procedure, the device user may issue a voice command such as "dial George on mobile” or "send SMS to Nick Papadopoulos" (in Greek or other NLAL).
  • the system searches through the directory list and selects one or more entries that match the voice name or command as much as possible.
  • This selection process is “speaker independent”, i.e., without requiring the user to personally train the system for this purpose, beforehand. It is also "ASR agnostic”, i.e., it may use any ASR engine, as long as it is adapted and optimized for the specific NLAL (FIG. 1). Moreover, as our method can handle each NLAL (as well as each non-NLAL) separately, as an individual case, apparently it can handle all languages together, thus being language-independent.
  • a directory entry may be recognized with standard or ad hoc Romanization rules, or even with combined use of Latin (Romanized) and non-Latin alphabets.
  • a name may be recognized in various forms (e.g.: Dimitri, Dhmhtrh, Demetres, etc.), by identifying phonetic equivalence classes between letters, digraphs and phonemes in the Greek language (e.g. {I, H, Y}, {E and AI}, etc.)
  • a name may be recognized in various forms (e.g.: mom, mother, ma, metera, etc.) [in Greek], by defining equivalence classes based on the semantics of various names or titles in the Greek language.
  • Spelling errors in the directory entries e.g.: "Menlaos” instead of “Menelaos” (missing letter “e”), etc., can be overcome or corrected using database lookups.
  • Male and female gender names can be distinguished between one another, e.g.: "Maria Papadopoulos” should be “Maria Papadopoulou” (female gender). Subsequently, a search for "Papadopoulos” would match both "George Papadopoulos” and "Maria Papadopoulou”.
  • a spoken name can be partially matched within some directory entry, e.g.: "Dimitri" in "Papadopoulos Dimitrios".
  • Each directory entry may include the search term (e.g. a person's first name) either as a first or last component (sub-string), e.g. searching for "Maria” would match both "Maria Papadopoulou” and "Papadopoulou Maria”.
  • the system may execute complex statements, consisting of a command ("dial”, “call”, “sms”, etc.), a name and an optional phone type (e.g.: “mobile”, “home”, “office”, “fax”, etc.), e.g.: “dial, Dimitri Papadopoulos, office” [said in Greek] (FIG.4(b)).
  • the aforementioned method is implemented using rule-based grammar, database lookup and catalog lookup techniques. Given the user's spoken word or phrase, each directory entry is evaluated against it and it is given a score (value) between 0% and 100%. All directory entries with scores above a predefined (fixed) threshold (e.g. 60%) are considered as valid matches. If no directory entry is matched with a score above the threshold, then an error voice message is issued, prompting the user to repeat the name or command. If more than one directory entries are matched with scores above the threshold, then they may subsequently be presented to the user via a GUI [105], as a list for further manual (or by voice) selection (FIG. 4(a)). The list may be sorted as either "most-recently-used-first", or “most-frequently-used-first”, or “highest-matching-score-first”, or alphabetically, etc.
  • the selection process is complete either when the user selects a single directory entry, or cancels the entire process.
  • the software may ask for confirmation of the user's final selection, with a predefined voice message or by speaking out (with voice synthesis) the user's selection.
  • the aforementioned procedure is depicted in FIG. 3.
  • FIGURE 1 depicts the main hardware and software components comprising the human-machine interface for the name selection from a directory list.
  • FIGURE 2 shows functional aspects of the implemented methodology and in particular the one-time pre-processing of each directory (catalog) entry.
  • FIGURE 3 is a flowchart of the procedure of vocal name selection by a speaker from a directory list.
  • FIGURE 4 shows two simulated screenshots of a typical mobile telephone, executing the directory list selection: a) prompting the speaker to select among the 3 best matches [left], b) prompting the speaker to select the telephone type to use [right].

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to methods and corresponding processing and communication devices implementing them for intelligent, offline, speaker-independent, vocal name-selection specifically targeted to directory lists constructed in non-"ISO basic Latin" alphabet languages (NLAL). Initially, a pre-existing directory list of names residing in the user's communication device is pre-processed and transformed (based on rules from the NLAL grammar, spelling and vocabulary) to an "intermediate database of search items" (IDSI). This database is updated every time a new directory entry is inserted or an existing one is edited. The vocal name selection is carried out as follows: the user speaks out a name (or a command following by a name) near the microphone of the communication device. The user's vocal input is processed by an Automatic Speech Recognition (ASR) engine, optimized for the specific NLAL. The ASR's output is then compared against the IDSI and a list of best matches is presented (via a GUI) sorted to the user, who finally completes the selection by hand or by voice. The rules applied to the pre-processing of the directory list are particular to the NLAL and relate to: composing the directory list with characters from NLAL and/or Latin alphabets; identifying a name among different forms of NLAL names; considering accentuation, phonetic or optical similarity (with respect to the NLAL alphabet), semantic equivalence grouping, partial matching, phonetic equivalence classes between letters and NLAL phonemes, special characters and common spelling errors.

Description

METHODS AND SYSTEM IMPLEMENTING INTELLIGENT VOCAL NAME-SELECTION FROM DIRECTORY LISTS COMPOSED IN NON-LATIN ALPHABET LANGUAGES
FIELD:
The present invention relates to human-machine interfaces and, particularly, to an improved method for speech recognition, a voice user interface and a corresponding communication device, which enables its users to communicate with an individual selected from a directory list constructed with the mixed alphabet of a "non-Latin-alphabet language" (NLAL, i.e., with a non-"ISO basic Latin" alphabet [02], such as Greek, Russian, Turkish, Arabic, Chinese, Hebrew, etc.).
BACKGROUND:
The statements in this section merely provide background information related to the present disclosure and may not constitute a complete prior art reference.
A. The Problem:
Communication devices, such as e.g. mobile telephones and computers in general, have undergone a multitude of improvements in their functionality, capabilities and user interfaces during the last few years, in order to facilitate efficient and user-friendly communication services for their users. Today, such communication devices are used to call or send SMS or send a fax to another telephone user, or to send an e-mail to another e-mail recipient. Finding the desired recipient of one's message often involves a selection process through a directory list, or lookup catalog, or phonebook. Such lists may be quite long, sometimes containing several hundred or thousand entries, so that browsing through them may be a time-consuming and tiresome or annoying process for the user. To ease this process, a variety of selection methods have been proposed and implemented at times, most of them involving scrollable lists and hierarchical alphabetic shortcuts, all requiring user input by means of either a touch-screen, a keyboard or special buttons. To facilitate this list selection for hands-free operation (functionality especially useful e.g. for drivers, blind or handicapped people), vocal user-interfaces have also been proposed, requiring only that the user be able to hear and speak in order to select something from a list of items.
Vocal user interfaces (VUI) typically consist of an input device, such as a microphone, an Automatic Speech Recognition (ASR) unit, which decodes the sound signals and transforms them into an intermediate data format, and appropriate software to process the ASR output using some match-and-select methodology. As speech recognition is user and language dependent, the performance and degree of accuracy of various VUIs may sometimes be unsatisfactory. Moreover, as various directory lists are constructed with various styles, alphabets, phonetics, semantics, etc., matching a speaker's vocal input against such a directory list often results in poor performance and recognition accuracy.
To improve the performance, speed and recognition accuracy of this matching and selection process of an entry from a directory list, it is necessary to enable both a high performance ASR and an intelligent algorithm that recognizes a plethora of equivalent variations of the user input, corresponding to the entries of the directory list.
Although the above problems have been adequately addressed for the English language, this is not the case for most "non-Latin-alphabet languages" (NLAL), i.e., languages with non-"ISO basic Latin" alphabets [02]. NLALs can be expressed in writing with English transliteration, i.e., with "ISO basic Latin" alphabet characters (e.g. Greek-English or Russian-English alphabets). The country-specific alphabet may also be used in combination with the Latin alphabet. Consider, for example, the Greek (or Russian) language, in which some words may be written with Greek (Cyrillic, respectively) alphabet characters and others with Latin alphabet characters, etc. Also, it often happens that a directory list constructed by a user in some NLAL contains certain grammatical characteristics not existing in the English language. For example, Greek words and names have accentuation and gender-sensitive endings, e.g. two spellings of "Kefalas" that differ only in accent placement denote different names, and Papadopoulos (male) is different from Papadopoulou (female). Greek words and names may also be spelled with phonetic, grammatical or optical similarity with respect to the Greek character set, e.g.: the name "Xenofon" may be written by a Greek user as "Ksenofon" (phonetic), or "Xenophon" (grammatical), or "3enofwn" (optical).
The combination of all these name variations, met particularly in NLAL directory lists, creates a much more complex problem space, which makes its solution more difficult and challenging, in order to achieve satisfactory and intelligent voice recognition and directory entry selection results. All methods and solutions proposed until today have been limited in scope, requiring the user's vocal input to match the directory listing as much as possible, without variations and often failing to successfully recognize names, if they appeared in slightly different style and format in the directory list.
The present invention provides an effective solution to the aforementioned problem for NLALs, using an intelligent algorithm which achieves improved performance, higher recognition rates and better selection accuracy, compared to other similar solutions.
B. Related State-of-the-Art: Speech Recognition
Speech recognition systems are computer systems that map spoken utterances to strings of words [01]. To achieve this transformation of recorded audio input to written representation, several smaller processes are applied sequentially: The first step in the processing sequence is to transform the recorded audio input into frequency spectrums by means of a Fast Fourier Transformation. The spectrums are passed on to a Hidden Markov Model, which is a statistical model determining the most probable phoneme sequence. This phoneme sequence is forwarded on to a language model. There are two different models whose usage mainly depends on the task the speech recognition is applied for. One of the language models is the statistical n-gram approach, often used in open-domain speaker-dependent dictation tasks. The second language model consists of rule-based grammars. Their core features are hand-written rules which define the acceptable utterances for a system exactly [01]. The present invention focuses on grammars as well as on alphabets and vocabularies, as they form the better suited language model in closed-domain speaker-independent dialogue systems.
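As a concrete illustration of such a rule-based (constrained) grammar in a closed-domain, speaker-independent setting, the minimal Python sketch below accepts utterances of the form "<command> <name> [on <phone type>]". The command and phone-type inventories, the regular expression and the function name parse_utterance are illustrative assumptions, not a grammar taken from this patent.

```python
import re

# A minimal sketch (not the patent's implementation) of a rule-based grammar
# for a closed-domain command task: "<command> <name> [on <phone type>]".
# The command and phone-type inventories are illustrative assumptions.
COMMANDS = {"dial", "call", "sms"}
PHONE_TYPES = {"mobile", "home", "office", "fax"}

# Hand-written rule: a command word, an optional "to", a free-form name,
# and an optional trailing "on <phone type>".
RULE = re.compile(
    r"^(?P<cmd>\w+)\s+(?:to\s+)?(?P<name>.+?)(?:\s+on\s+(?P<type>\w+))?$",
    re.IGNORECASE,
)

def parse_utterance(text: str):
    """Return (command, name, phone_type) if the utterance fits the grammar, else None."""
    m = RULE.match(text.strip())
    if not m:
        return None
    cmd = m.group("cmd").lower()
    ptype = (m.group("type") or "").lower() or None
    if cmd not in COMMANDS or (ptype and ptype not in PHONE_TYPES):
        return None  # utterance rejected by the constrained grammar
    return cmd, m.group("name").strip(), ptype

# parse_utterance("dial Nick Papadopoulos on mobile")
# -> ("dial", "Nick Papadopoulos", "mobile")
```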
Speech recognition technology [03] is used more and more for telephone applications like travel booking and information, financial account information, customer service call routing, and directory assistance. Such applications can achieve remarkably high accuracy by using constrained grammar recognition. Research and development in speech recognition technology has continued to grow as the cost for implementing such voice-activated systems has dropped and the usefulness and efficacy of these systems has improved.
Furthermore, speech recognition has enabled the automation of certain applications that are not automatable using push-button Interactive Voice Response (IVR) systems [04], like directory assistance and systems that allow callers to "dial" by speaking names listed in an electronic phone book.
Many communication devices today offer speaker-independent speech control. In the context of speech control, the user enters command words such as "dial", "phone book", "emergency", "reject" or "accept" in a manner which is similar to the customary speed dialing. The communication application associated with these command words can be used directly in the corresponding manner by a user, without the user personally having to train the system for this purpose beforehand using this set of words.
However convenient this factory-predetermined vocabulary of command words and possibly also names may be for the user, it does not replace a user-specific adaptation, e.g. the insertion of new commands. This applies particularly in the case of name selection, i.e. a special speech control, wherein specific numbers are dialed when the name is spoken. Therefore devices of greater complexity offer a speaker-dependent speech control in addition to a speaker-independent speech control.
A speaker-dependent speech recognition system is optimized in relation to the speaker concerned, since it must be trained in the voice of the user before its first use. This is known as "say-in" or training and it is used to create a feature vector sequence from at least one feature vector.
Romanization in NLALs
In linguistics, Romanization [05] or latinization is the representation of a written word or spoken speech with the Roman (Latin) script, or a system for doing so, where the original word or (non-Latin alphabet) language (NLAL) uses a different writing system. Methods of Romanization include transliteration [06], for representing written text, and transcription [07], for representing the spoken word. The latter can be subdivided into phonemic transcription [08], which records the phonemes or units of semantic meaning in speech, and more strict phonetic transcription [09], which records speech sounds with precision. Each language Romanization has its own set of rules for pronunciation of the Romanized words.
One example of Romanization is Greeklish [10]. Greeklish is a combination of the words Greek and English (also known as Grenglish, Latinoellinika, or ASCII Greek), and refers to the Greek language written using the Latin alphabet. This type of Romanization mainly captures informal, ad-hoc practices of writing Greek text in environments where the use of the Greek alphabet is technically impossible or cumbersome, especially in electronic media. The present invention proposes a method to intelligently deal with these ad-hoc practices of users of mobile phones and other communication devices.
A long list of global Romanization standards can be found in [05] for several languages, including Greek, Russian, Turkish, Arabic, Persian, Chinese, Hebrew, Indic, Japanese, Korean, Thai, Vietnamese, Bulgarian, Ukrainian, etc. However, in practice, people tend to apply Romanization in ad-hoc, rather than in standard ways, which is indicative of the complexity of the problem this invention provides a solution for, among other linguistic characteristics of NLALs.
References
[01] Hanne Marie Kosinowski, "Modular Grammars for Speech Recognition in Ontology-Based Dialogue Systems", Bachelor Thesis, Faculty of Computational Linguistics, Saarland University, Aug. 2010.
[02] http://en.wikipedia.org/wiki/ISO_basic_Latin_alphabet, Wikipedia, "ISO basic Latin alphabet".
[03] http://en.wikipedia.org/wiki/Speech_recognition, Wikipedia, "Speech recognition".
[04] http://en.wikipedia.org/wiki/Interactive_voice_response, Wikipedia, "Interactive voice response".
[05] http://en.wikipedia.org/wiki/Romanization, Wikipedia, "Romanization".
[06] http://en.wikipedia.org/wiki/Transliteration, Wikipedia, "Transliteration".
[07] http://en.wikipedia.org/wiki/Transcription_(linguistics), Wikipedia, "Transcription (linguistics)".
[08] http://en.wikipedia.org/wiki/Phonemic_orthography, Wikipedia, "Phonemic orthography".
[09] http://en.wikipedia.org/wiki/Phonetic_transcription, Wikipedia, "Phonetic transcription".
[10] http://en.wikipedia.org/wiki/Greeklish, Wikipedia, "Greeklish".
C. Selected relevant Inventions
The following inventions were found to be related to our present invention, in terms of the method and systems described herein:
1. WO9926232 (A1), Naumburger Volkmar [DE] - "Device and methods for speaker-independent spoken name selection for telecommunications terminals", 19/11/1997, 27/5/1999.
2. KR20080107376 (A), Ruwisch Dietmar [DE] - "Communication device having speaker independent speech recognition", 14/2/2006, 3/8/2007.
3. US7475017 (B2), Ju Yun-Cheng [US] - "Method and apparatus to improve name confirmation in voice-dialing systems", 27/7/2004, 6/1/2009.
4. CN2626149 (Y), Wu Zhenli [CN], Gong Liheng [CN] - "Speech recognition controlled dialing telephone set", 26/6/2003, 14/7/2004.
5. KR20010079272 (A), Yu Seung Hyuk [KR] - "System for executing dialing through name speech recognition and managing telephone directory using wire telephone and mobile phone in remote speech recognition server", 28/6/2001, 22/8/2001.
6. US6963633 (B1), Diede William F [US], Bechtel Kay L [US] - "Voice dialing using text names", 7/2/2000, 8/11/2005.
7. CN2415556 (Y), Chen Xiuzhi [CN] - "Voice identifying, inquiring and dialing telephone directory", 13/3/2000, 17/1/2001.
8. US6260012 (B1), Park Joung-Kyou [KR] - "Mobile phone having speaker dependent voice recognition method and apparatus", 27/2/1998, 10/7/2001.
All of the aforementioned inventions are about voice recognition, voice-dialing or vocal name-selection, but they differ from our invention in one or more of the following aspects: they do not effectively handle directories composed in NLAL, and/or they are not offline methods (i.e. they require a remote server communication connection), and/or they are speaker-dependent methods.
D. Selected relevant Technologies and Products
There is a long list of products and related technologies for voice, hands-free, digital-assistant applications, such as: Iris, vLingo, Siri, Skyvi, Speaktoit Assistant, Andy, Sonalight Text by Voice, Jeannie, VoCon, AIVC, TiKL, EVA Intern, Gosms, Dropbox, Voice Search, Voice Actions, Voice Commander, etc. Five of the most representative, relevant and well-known ones are presented below, along with their differences compared to our invention.
1. Nuance's VoCon® 3200
http://www.nuance.com/for-business/by-product/automotive-products-services/vocon3200/index.htm
VoCon® 3200 is Nuance's speaker-independent, continuous speech recognition engine, supporting recognition of natural, conversational input in over 30 languages, large vocabularies, dynamic content, such as music titles, noise-robust front-end, etc.
Differences: This technology is an ASR technology, without the overlay layer for intelligent directory entry selection specially targeted to NLALs.
2. vLingo Virtual Assistant
http://www.vlingo.com/
Vlingo is a virtual assistant that turns spoken words into action by combining voice-to-text technology, natural language processing, and
Vlingo's Intent Engine to understand the user's intent and take the appropriate action. The user speaks to his/her phone and connects with people, businesses and various activities.
http://www.vlingo.com/content/screenshots
http://blog.vlingo.com/vlingo-language-beta/ (for new foreign languages, no Greek or other NLALs yet)
Differences: Vlingo does not provide intelligent directory-entry selection specially targeted to NLALs.
3. Siri (iPhone)
http://www.apple.com/iphone/features/siri-faq.html
Siri is an intelligent personal assistant for the iPhone (4S), that helps the user to get things done just by asking. It allows the user to use his/her voice to send messages, schedule meetings, place phone calls, etc. Siri understands natural speech, and asks the user questions if it needs more information to complete a task. It communicates with Apple's data centers to perform its functionality and return a response. Siri understands and can speak English, French, German, and Japanese and it is expected to also support additional languages, including Chinese, Korean, Italian, and Spanish.
Differences: Siri does not support intelligent directory-entry selection specially targeted to NLALs. It also does not work offline (without connection to a remote server).
4. Skyvi (Android)
http://www.skyviapp.com/
Skyvi is an intelligent personal assistant for Android-based systems, similar to iPhone's Siri, featuring voice-texting, finding and calling places, getting directions, calling contacts, location reminder beacons, Facebook / Twitter, question asking with voice, etc.
Differences: Skyvi does not yet support intelligent directory-entry selection specially targeted to NLALs. It also does not work offline (without connection to a remote server).
5. Voice Commander / Voice Dialer (by Cyberon Corp.)
http://www.cyberon.com.tw/en_index.php
Cyberon Voice Commander is a speech dialog system that provides a natural human interface for users to communicate seamlessly with mobile devices. Through Voice Commander, users can make phone calls, look up contact info, launch programs or check calendars. It features speaker-independent voice recognition technology, voice control of name/digit dialing, and supports several worldwide languages including English, German, French, Italian, Spanish, Portuguese, Brazilian Portuguese, Russian, Turkish, Polish, Cantonese, etc. (http://www.cyberon.com.tw/pro-solSL.php).
http://www.cyberon.com.tw/flash_demo.php
http://www.cyberon.com.tw/order_Product_con.php?N0100=15
Cyberon Voice Dialer provides speaker-independent speech recognition and text-to-speech technology to work on all phone platforms and accomplish features, such as voice dialing, contact look-up, and shortcuts launch.
Differences: Voice Commander does not support intelligent directory-entry selection specially targeted to NLALs, with improved performance characteristics.
DETAILED DESCRIPTION
The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.
The current invention operates on a mobile telephone or, more generally, on a computing device with processing capabilities, storage capacity, network connectivity and ability to run communication application software, such as for phone calling, email sending, SMS-sending, fax-sending, etc. It is assumed that the user of such a system already has or progressively builds a list of directory entries, each of which contains at least the name of a person with whom the user wishes to communicate and possibly one or more telephone numbers, an e-mail address, etc. In this context, we consider directories constructed in some non-Latin alphabet language (NLAL), possibly using the Latin and the language-specific alphabets interchangeably.
Typically, a user (e.g. a telephone caller) chooses the desired name entry by manually selecting it via a user interface consisting of a search form or a drop-down list, combined with a touch screen, a keyboard or some buttons. The present invention replaces this selection process by a voice interface, combined with proper, intelligent, NLAL-specific processing of the directory list. Initially, a typical Automatic Speech Recognition (ASR) engine [101], preferably optimized for a specific target-NLAL (e.g., Greek, Russian, Turkish, Arabic, Chinese, Hebrew, etc.), is installed in the target device. The user directory list may consist of name entries in different styles and formats which are very user-dependent. For example, they can be entered with surname-first or surname-last, with Greek or Latin or mixed alphabet, with or without accent ('), with different name endings, with a short (nick) name or a full name (e.g. Nikos and Nikolaos), or with equivalent title names (e.g. father and dad), etc. As a first step, such a directory list is pre-processed by means of proper software and an "intermediate database of search items" (IDSI) [102] is generated, consisting of a number of alternatives per directory entry, with which this specific entry can be voiced by the user. The pre-processing of each directory entry comprises: language detection [201], normalization [202], inverting Romanization [203] (e.g., greeklish to Greek, volapuk to Cyrillic, etc.), accentuation checks [204], special character conversion [205], finding synonyms and hypocoristics [206], phonetic transcription [207], word-splitting and detection of first name and surname [208].
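A minimal sketch of this per-entry pre-processing is shown below, assuming a toy greeklish-to-Greek letter mapping and a toy hypocoristics table; the names GREEKLISH_TO_GREEK, HYPOCORISTICS and preprocess_entry, and the mappings themselves, are illustrative assumptions rather than the patent's actual rule set.

```python
import unicodedata

# A toy sketch of the per-entry pre-processing that builds the "intermediate
# database of search items" (IDSI). The mapping tables are illustrative
# assumptions, not the patent's actual rule set.
GREEKLISH_TO_GREEK = {
    "th": "θ", "ks": "ξ", "ps": "ψ", "ch": "χ", "ou": "ου",
    "a": "α", "b": "β", "v": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν", "o": "ο",
    "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "y": "υ", "f": "φ",
    "w": "ω", "x": "ξ", "3": "ξ",
}
HYPOCORISTICS = {"νικολαοσ": ["νικοσ"], "δημητριοσ": ["δημητρησ", "μιμησ"]}

def strip_accents(text: str) -> str:
    """Accentuation check: make comparison accent-insensitive."""
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if unicodedata.category(c) != "Mn")

def deromanize(latin: str) -> str:
    """Invert an ad-hoc greeklish spelling, longest digraph first."""
    out, i = [], 0
    while i < len(latin):
        for length in (2, 1):
            chunk = latin[i:i + length].lower()
            if chunk in GREEKLISH_TO_GREEK:
                out.append(GREEKLISH_TO_GREEK[chunk])
                i += length
                break
        else:
            out.append(latin[i])  # keep unknown characters unchanged
            i += 1
    return "".join(out)

def preprocess_entry(entry: str) -> set[str]:
    """Produce the set of alternative search items for one directory entry."""
    alternatives = set()
    for word in entry.split():                                   # word-splitting
        is_greek = any("α" <= c <= "ω" for c in word.lower())    # crude language detection
        greek = word.lower() if is_greek else deromanize(word)   # inverse Romanization
        base = strip_accents(greek).replace("ς", "σ")            # accent / final-sigma folding
        alternatives.add(base)
        alternatives.update(HYPOCORISTICS.get(base, []))         # synonyms / hypocoristics
    return alternatives

# preprocess_entry("Nikolaos Papadopoulos")
# -> {"νικολαοσ", "νικοσ", "παπαδοπουλοσ"}
```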
After the pre-processing and transformation of the initial directory list, a proper software module is installed on the device, enabling it to also accept voice input as an alternative to touch, or mouse, or keyboard input, for directory entry selection. A software application running solely on the device implements the invention by performing the necessary intelligent processing of the user's voice commands, in order to effectively select a name to communicate with and/or a command to execute. When the system (device + software) is operational, the procedure depicted in the flowchart of FIG. 3 is executed. According to this procedure, the device user may issue a voice command such as "dial George on mobile" or "send SMS to Nick Papadopoulos" (in Greek or other NLAL). Then, the system searches through the directory list and selects one or more entries that match the voiced name or command as closely as possible. This selection process is "speaker independent", i.e., it does not require the user to personally train the system for this purpose beforehand. It is also "ASR agnostic", i.e., it may use any ASR engine, as long as it is adapted and optimized for the specific NLAL (FIG. 1). Moreover, as our method can handle each NLAL (as well as each non-NLAL) separately, as an individual case, it can evidently handle all languages together, thus being language-independent.
Hereafter, without loss of generality, the Greek language is used as an example, to simplify the presentation. In general, our methodology applies to any NLAL exhibiting analogous characteristics. Evidently, it can also be applied to all Latin-alphabet-based languages, as special (simpler) cases that do not exhibit these characteristics.
Names and commands spoken by a user in Greek are recognized based on the following rules (an illustrative sketch follows the list):
1. A directory entry may be recognized with standard or ad hoc Romanization rules, or even with combined use of Latin (Romanized) and non-Latin alphabets.
2. A name may be recognized in various forms (e.g.: Dimitri, Dhmhtrh, Demetres, etc.), by identifying phonetic equivalence classes between letters, digraphs and phonemes in the Greek language (e.g. {I, H, Y}, {E and AI}, etc.)
3. A name may be recognized in various forms (e.g.: mom, mother, mama, metera, etc.) [in Greek], by defining equivalence classes based on the semantics of various names or titles in the Greek language.
4. The lack of accentuation of the directory entries can be ignored or corrected based on a dictionary of common proper names and grammar rules.
5. Special characters, such as {*, &, $, ...}, can be ignored, or they can be replaced with equivalent notation, e.g. "&" with {"and" or "ke" [Greek]}.
6. Spelling errors in the directory entries, e.g.: "Menlaos" instead of "Menelaos" (missing letter "e"), etc., can be overcome or corrected using database lookups.
7. Male and female gender names can be distinguished from one another, e.g.: "Maria Papadopoulos" should be "Maria Papadopoulou" (female gender). Subsequently, a search for "Papadopoulos" would match both "George Papadopoulos" and "Maria Papadopoulou".
8. A spoken name can be partially matched within some directory entry, e.g.: "Dimitri" in "Papadopoulos Dimitrios".
9. Each directory entry may include the search term (e.g. a person's first name) either as a first or last component (sub-string), e.g. searching for "Maria" would match both "Maria Papadopoulou" and "Papadopoulou Maria".
10. The system may execute complex statements, consisting of a command ("dial", "call", "sms", etc.), a name and an optional phone type (e.g.: "mobile", "home", "office", "fax", etc.), e.g.: "dial, Dimitri Papadopoulos, office" [said in Greek] (FIG.4(b)).
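The sketch below illustrates rule 2 in a simplified form: letters and Latin digraphs belonging to the same Greek phonetic equivalence class are collapsed into one canonical symbol, so that spelling variants of the same name compare equal. The class table and the function name phonetic_key are illustrative assumptions; fuller coverage (endings, gender, accents) would require the additional rules listed above.

```python
# A simplified sketch of rule 2: collapse letters and digraphs (here in their
# Latin-transliterated form) that sound alike in Greek into one phonetic key,
# so that variants such as "Dimitri" and "Dhmhtrh" compare equal.
# The equivalence classes below are a partial, illustrative assumption.
EQUIVALENCE_CLASSES = [
    ({"i", "h", "y", "ei", "oi"}, "i"),   # Ι, Η, Υ, ΕΙ, ΟΙ all sound /i/
    ({"e", "ai"}, "e"),                   # Ε and ΑΙ sound /e/
    ({"o", "w"}, "o"),                    # Ο and Ω sound /o/
    ({"b", "mp"}, "b"),
    ({"f", "ph"}, "f"),
    ({"ks", "x", "3"}, "ks"),
]

def phonetic_key(name: str) -> str:
    """Map a (Latin-written) Greek name variant to a canonical phonetic form."""
    s = name.lower()
    key, i = [], 0
    while i < len(s):
        for length in (2, 1):                      # longest digraph first
            chunk = s[i:i + length]
            canon = next((c for members, c in EQUIVALENCE_CLASSES
                          if chunk in members), None)
            if canon is not None:
                key.append(canon)
                i += length
                break
        else:
            key.append(s[i])
            i += 1
    return "".join(key)

# phonetic_key("Dimitri") == phonetic_key("Dhmhtrh") == "dimitri"
```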
The aforementioned method is implemented using rule-based grammar, database lookup and catalog lookup techniques. Given the user's spoken word or phrase, each directory entry is evaluated against it and is given a score (value) between 0% and 100%. All directory entries with scores above a predefined (fixed) threshold (e.g. 60%) are considered as valid matches. If no directory entry is matched with a score above the threshold, then an error voice message is issued, prompting the user to repeat the name or command. If more than one directory entry is matched with a score above the threshold, then they may subsequently be presented to the user via a GUI [105], as a list for further manual (or by-voice) selection (FIG. 4(a)). The list may be sorted as either "most-recently-used-first", or "most-frequently-used-first", or "highest-matching-score-first", or alphabetically, etc.
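A minimal sketch of this match-and-select step is given below, assuming each directory entry has already been expanded into a set of IDSI alternatives (as in the pre-processing sketch earlier). Python's difflib string similarity is used only as a stand-in for the patent's rule-based scoring, and the 60% threshold mirrors the example in the text.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.60  # fixed, predefined threshold (60%), as in the description

def score(asr_output: str, alternatives: set[str]) -> float:
    """Best similarity (0..1) between the ASR output and any IDSI alternative
    of one directory entry. SequenceMatcher is only a stand-in for the
    patent's rule-based scoring."""
    return max((SequenceMatcher(None, asr_output, alt).ratio()
                for alt in alternatives), default=0.0)

def best_matches(asr_output: str, idsi: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return directory entries scoring at or above the threshold,
    sorted highest-matching-score-first."""
    scored = [(entry, score(asr_output.lower(), alts)) for entry, alts in idsi.items()]
    hits = [(entry, s) for entry, s in scored if s >= THRESHOLD]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Example with a toy IDSI (an empty result would trigger the error voice prompt):
# idsi = {"Papadopoulos Dimitrios": {"dimitris", "papadopoulos", "dimitri"},
#         "Papadopoulou Maria": {"maria", "papadopoulou"}}
# best_matches("dimitri", idsi) -> [("Papadopoulos Dimitrios", 1.0)]
```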
The selection process is complete either when the user selects a single directory entry, or cancels the entire process. The software may ask for confirmation of the user's final selection, with a predefined voice message or by speaking out (with voice synthesis) the user's selection. The aforementioned procedure is depicted in FIG. 3.
DESCRIPTION OF DRAWINGS
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
FIGURE 1 depicts the main hardware and software components comprising the human-machine interface for the name selection from a directory list.
FIGURE 2 shows functional aspects of the implemented methodology and in particular the one-time pre-processing of each directory (catalog) entry.
FIGURE 3 is a flowchart of the procedure of vocal name selection by a speaker from a directory list.
FIGURE 4 shows two simulated screenshots of a typical mobile telephone, executing the directory list selection: a) prompting the speaker to select among the 3 best matches [left]; b) prompting the speaker to select the telephone type to use [right].
ACRONYMS
ASR Automatic Speech Recognition
GUI Graphical User Interface
IDSI Intermediate Database of Search Items
NLAL Non-Latin Alphabet Language
RBPP Rule-Based Pre-Processing
SMS Short Message Service
VUI Vocal User Interface

Claims

CLAIMS
The invention claimed is:
1. A method for intelligent, speaker-independent, language-independent speech recognition for the selection of an entry from a directory list (e.g., from a phonebook or from a name catalog) constructed in a non-"ISO basic Latin" alphabet language (NLAL), comprising:
• "rule-based pre-processing" (RBPP) of a pre-existing user directory list, i.e., a transformation (based on the NLAL's particular grammar, spelling and vocabulary) that produces an "intermediate database of search items" (IDSI), consisting of vectors of equivalent terms, with each vector corresponding to multiple directory list entries and each directory list entry potentially corresponding to multiple vectors;
• installing on a mobile phone, or other computing device with storage, processing and communication capabilities, a software module that enables the device to accept voice input;
• receiving the speaker's voice and processing it via a generic Automatic Speech Recognition (ASR) engine, which is optimized for the corresponding NLAL;
• matching the ASR's output against the said IDSI vectors;
• assigning a score to each said match for every directory list entry;
• selecting the "best matches", i.e. the said matches with a score greater than or equal to a fixed, predefined threshold value;
• presenting the user with a sorted list of "best matches" (if any) to select from;
• enabling the user to finally select a single directory list entry out of the list of "best matches", either manually or by voice;
• applying the said RBPP to any subsequent modification of the directory list performed by the user (i.e., addition of new directory entries, or editing of existing ones), thus resulting in consequent updates of the said IDSI;
• and completely offline operation, i.e. without the need for connection and data transfer with a remote or external server;
wherein the said RBPP uses rules:
a) to identify NLAL words or names written with characters from the NLAL's alphabet, or from the Latin alphabet (with transliteration), or from a combination of the two;
b) to identify (and to potentially correct) different forms of NLAL names based on their gender (e.g., Papadopoulos, Papadopoulou);
c) to identify (and to potentially correct) a name among different forms of NLAL names, with or without accentuation, or with phonetic similarity (e.g., "Mihalis", "Michalis"), or with optical (with respect to the NLAL alphabet) similarity (e.g., "Mihalis", "Mixalis") (also referred to as "phonetic and phonemic transcription"), or with a range of most common spelling errors according to the NLAL grammar rules;
d) to group (and consider as equivalent) different names, titles, or words, with similar semantics, such as synonyms and hypocoristics, e.g. {"mom", "mama", "meetera"} or {"Nikos Papadopoulos", "boss", "manager"}.
2. A method, as claimed in claim 1, wherein the name or word spoken by the user may fully or partially match some directory list entry or some vector term of the said IDSI, and may appear within such an entry or vector term in any particular order.
3. A method, as claimed in claim 2, wherein the said list of "best matches" is sorted in order of: "most-recently-used-first", or "most-frequently-used-first", or "highest-matching-score-first", or alphabetically.
4. A method, as claimed in claim 3, wherein the said selection process may refer and apply to telephone calling, or e-mail sending, or SMS sending, or fax sending.
5. A method, as claimed in claim 4, wherein the said system may execute complex statements, spoken by the user, consisting of a command prefix (e.g., "dial", "call", "sms", etc.), a name (as destination) and an optional destination/target type (e.g.: "mobile", "home", "office", "fax", etc.). E.g.: "dial, Dimitri Papadopoulos, on mobile".
6. A method, as claimed in claim 5, wherein the entire selection process may also be performed via an online connection with an external or remote server, where the required data processing is performed.
7. An electronic device with data processing, data storage, communication, and voice recognition capabilities, with installed system software and application software, altogether implementing the methods, as claimed in claims 1-6.
PCT/IB2012/052258 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages WO2013167934A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2012/052258 WO2013167934A1 (en) 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2012/052258 WO2013167934A1 (en) 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Publications (1)

Publication Number Publication Date
WO2013167934A1 true WO2013167934A1 (en) 2013-11-14

Family

ID=46210316

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2012/052258 WO2013167934A1 (en) 2012-05-07 2012-05-07 Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages

Country Status (1)

Country Link
WO (1) WO2013167934A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999026232A1 (en) 1997-11-19 1999-05-27 Deutsche Telekom Ag Device and methods for speaker-independent spoken name selection for telecommunications terminals
CN2415556Y (en) 2000-03-13 2001-01-17 陈修志 Voice identifying, inquiring and dialing telephone directory
US6260012B1 (en) 1998-02-27 2001-07-10 Samsung Electronics Co., Ltd Mobile phone having speaker dependent voice recognition method and apparatus
US6272464B1 (en) * 2000-03-27 2001-08-07 Lucent Technologies Inc. Method and apparatus for assembling a prediction list of name pronunciation variations for use during speech recognition
KR20010079272A (en) 2001-06-28 2001-08-22 유승혁 Utterance Speech Recognition Dialing and a Phone-book Management System for the Remote Speech Recognition Server is used to Telephone and Mobile-Phone
CN2626149Y (en) 2003-06-26 2004-07-14 深圳市捷通语音技术开发有限公司 Speech recognition controlled dialing telephone set
US6963633B1 (en) 2000-02-07 2005-11-08 Verizon Services Corp. Voice dialing using text names
US20070255567A1 (en) * 2006-04-27 2007-11-01 At&T Corp. System and method for generating a pronunciation dictionary
KR20080107376A (en) 2006-02-14 2008-12-10 인텔렉츄얼 벤처스 펀드 21 엘엘씨 Communication device having speaker independent speech recognition
US7475017B2 (en) 2004-07-27 2009-01-06 Microsoft Corporation Method and apparatus to improve name confirmation in voice-dialing systems

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999026232A1 (en) 1997-11-19 1999-05-27 Deutsche Telekom Ag Device and methods for speaker-independent spoken name selection for telecommunications terminals
US6260012B1 (en) 1998-02-27 2001-07-10 Samsung Electronics Co., Ltd Mobile phone having speaker dependent voice recognition method and apparatus
US6963633B1 (en) 2000-02-07 2005-11-08 Verizon Services Corp. Voice dialing using text names
CN2415556Y (en) 2000-03-13 2001-01-17 陈修志 Voice identifying, inquiring and dialing telephone directory
US6272464B1 (en) * 2000-03-27 2001-08-07 Lucent Technologies Inc. Method and apparatus for assembling a prediction list of name pronunciation variations for use during speech recognition
KR20010079272A (en) 2001-06-28 2001-08-22 유승혁 Utterance Speech Recognition Dialing and a Phone-book Management System for the Remote Speech Recognition Server is used to Telephone and Mobile-Phone
CN2626149Y (en) 2003-06-26 2004-07-14 深圳市捷通语音技术开发有限公司 Speech recognition controlled dialing telephone set
US7475017B2 (en) 2004-07-27 2009-01-06 Microsoft Corporation Method and apparatus to improve name confirmation in voice-dialing systems
KR20080107376A (en) 2006-02-14 2008-12-10 인텔렉츄얼 벤처스 펀드 21 엘엘씨 Communication device having speaker independent speech recognition
US20070255567A1 (en) * 2006-04-27 2007-11-01 At&T Corp. System and method for generating a pronunciation dictionary

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HANNE MARIE KOSINOWSKI: "Bachelor Thesis", August 2010, FACULTY OF COMPUTATIONAL LINGUISTICS, article "Modular Grammars for Speech Recognition in Ontology-Based Dialogue Systems"
JUHA ISO-SIPILÄ: "Design and Implementation of a Speaker-Independent Voice Dialing System: A Multi-lingual Approach", 18 April 2008 (2008-04-18), XP055057102, Retrieved from the Internet <URL:http://URN.fi/URN:NBN:fi:tty-200902201010> [retrieved on 20130319] *

Similar Documents

Publication Publication Date Title
KR102596446B1 (en) Modality learning on mobile devices
CN107039038B (en) Learning personalized entity pronunciation
CN106796788B (en) Improving automatic speech recognition based on user feedback
US8290775B2 (en) Pronunciation correction of text-to-speech systems between different spoken languages
US7974843B2 (en) Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer
US9640175B2 (en) Pronunciation learning from user correction
CN110110319B (en) Word level correction of speech input
US8380505B2 (en) System for recognizing speech for searching a database
EP3032532A1 (en) Disambiguating heteronyms in speech synthesis
US8423351B2 (en) Speech correction for typed input
US20060293889A1 (en) Error correction for speech recognition systems
US11093110B1 (en) Messaging feedback mechanism
JP2015153108A (en) Voice conversion support device, voice conversion support method, and program
CN111540353B (en) Semantic understanding method, device, equipment and storage medium
KR20090019198A (en) Method and apparatus for automatically completed text input using speech recognition
KR20200125735A (en) Multi-party conversation recording/output method using speech recognition technology and device therefor
CN107632982B (en) Method and device for voice-controlled foreign language translation equipment
US20150310853A1 (en) Systems and methods for speech artifact compensation in speech recognition systems
US20170337922A1 (en) System and methods for modifying user pronunciation to achieve better recognition results
Maskeliunas et al. Voice-based human-machine interaction modeling for automated information services
EP3241123B1 (en) Voice recognition-based dialing
US7430503B1 (en) Method of combining corpora to achieve consistency in phonetic labeling
US20170337923A1 (en) System and methods for creating robust voice-based user interface
WO2013167934A1 (en) Methods and system implementing intelligent vocal name-selection from directory lists composed in non-latin alphabet languages
Sharma et al. Exploration of speech enabled system for English

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12726199

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12726199

Country of ref document: EP

Kind code of ref document: A1