WO2009007131A1 - Speech control of computing devices - Google Patents

Speech control of computing devices

Info

Publication number
WO2009007131A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
words
word
input
context mapping
Application number
PCT/EP2008/005691
Other languages
French (fr)
Inventor
Ezechias Emmanuel
Original Assignee
Vandinburg Gmbh
Priority claimed from US11/843,982 (published as US20090018830A1)
Application filed by Vandinburg Gmbh
Publication of WO2009007131A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding

Definitions

  • the invention relates to techniques for controlling computing devices via speech and is applicable to different computing devices such as mobile phones, notebooks and other mobile devices as well as personal computers, gaming consoles, computer-controlled machinery and other stationary devices.
  • Controlling computing devices via speech provides a human user or operator with a fast and easy way of interacting with the device; for example, the time-consuming input of commands via keypad or keyboard can be omitted and the hands are free for other purposes such as moving a mouse or control lever or performing manual activities like carrying the device, carrying goods, etc. Therefore, speech control may conveniently be applied for such different operations as controlling mobile phones, gaming consoles or household appliances, but also for controlling machines in an industrial environment.
  • today's speech control systems require that the user inputs a command via speech which he or she would otherwise enter by typing or by clicking on an appropriate button.
  • the input speech signal is then provided to a speech recognition component which recognizes the spoken command.
  • the recognized command is output in a machine-readable form to the device which is to be controlled.
  • a typical speech control device may store some pre-determined speech samples representing, for example, a set of commands.
  • a recorded input speech signal is then compared to the stored speech samples.
  • a probability calculation block may determine, based on matching the input speech signal to the stored speech samples, a probability value for each of the stored samples, the value indicating the probability that the respective sample corresponds to the input speech signal. The sample with the largest probability value will then be selected.
  • Each stored speech sample may have an executable program code associated therewith, which represents the respective command in a form that is executable by the computing device. The program code will then be provided to a processor of the computing device in order to perform the recognized command.
  • Speech recognition is notoriously prone to errors. In some cases, the speech recognition system is not able to recognize a command at all. Then the user has to decide whether to repeat the speech input or to manually input the command. Often, a speech recognition system does not recognize the correct command, such that the user has to cancel the wrongly recognized command before repeating the input attempt.
  • a method of controlling a computing device via speech comprises the following steps: Transforming speech input into a text string comprising one or more input words; comparing each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words; identifying, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and preparing an execution of the identified function.
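  • To make the method aspect concrete, the following minimal sketch (in Python; the table contents and all names are invented for illustration, as the patent does not prescribe any implementation language) walks through the four claimed steps: transforming speech into a text string, comparing input words against a context mapping table, identifying the associated function, and preparing its execution.

    # Hypothetical context mapping table: function ID -> associated context mapping words.
    CONTEXT_MAPPING_TABLE = {
        1: {"scan", "file"},
        2: {"scan", "drive"},
        3: {"scan", "network", "computer"},
    }

    def identify_function(input_words):
        """Return the ID of the function whose context mapping words
        match the most input words, or None if nothing matches."""
        scores = {}
        for word in input_words:
            for function_id, cm_words in CONTEXT_MAPPING_TABLE.items():
                if word.lower() in cm_words:
                    scores[function_id] = scores.get(function_id, 0) + 1
        return max(scores, key=scores.get) if scores else None

    def prepare_execution(function_id):
        # Stand-in for writing a function call into an instruction space.
        print(f"would prepare execution of function {function_id}")

    def control(text_string):
        words = text_string.split()              # output of the speech-to-text step
        function_id = identify_function(words)   # comparing and identifying steps
        if function_id is not None:
            prepare_execution(function_id)       # preparation step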
  • the computing device may in principle be any hardware device which is adapted to perform at least one instruction.
  • a 'computing device' as understood herein may be any programmable device, for example a personal computer, notebook, phone, or control device for machinery in an industrial environment, but also in other areas such as private housing; e.g., the computing device may be a coffee machine.
  • a computing device may be a general purpose device, such as a personal computer, or may be an embedded system, e.g. using a microprocessor or microcontroller within an Application-Specific Integrated Circuit (ASIC) or Field-Programmable Gate Array (FPGA).
  • the term 'computing device' is intended to include essentially any device which is controllable, e.g., via a hardware and/or software interface such as an Application Programming Interface (API).
  • machine-readable instructions in the form of, e.g., an executable code which may be generated by a compiler, assembler or similar tool in any programming language, macro language, interpreter language, etc.
  • the executable code may be in binary form or any other machine-readable form.
  • the computing facility may be represented, e.g., in hardware, firmware, software or a combination thereof.
  • the computing device may comprise a microprocessor for controlling other parts of the device such as, e.g., a display, an actuator, a signal generator, a remote device, etc.
  • the function(s) for controlling the computing device may include some or all commands for the operating system of the computing device or for an application executed on the computing device, but may further include functions which are not directly accessible via a user interface but require an input on an expert level such as via a system console or command window.
  • the functions may express functionality in a syntax specific for an operating system, an application, a programming language, a macro language, etc.
  • a context mapping word may represent the entire function or one or more aspects of the functionality of the function the context mapping word is associated with.
  • the context mapping word may represent the aspect in textual form.
  • a context mapping word may be directly associated with a function or may additionally or alternatively be indirectly associated with a function; for example, the context mapping word may be associated with a function parameter.
  • Multiple context mapping words associated with a particular function may be provided in order to enable that the function may be identified from within different contexts.
  • the context mapping words associated with a function may represent different names (alias names) of the function the context mapping words are associated with, or may represent technical and non-technical names, identifications or descriptions of the function or aspects of it.
  • the context mapping words may represent the function or one or more aspects of it in different pronunciations (e.g., male and female pronunciation), dialects, or human languages.
  • the associations of context mapping words and functions may be represented in the context mapping table in different ways.
  • for example, all controllable functions or function parameters may be listed in a function column (or row) of the table, and the associated context mapping words may be arranged in the row (column) corresponding to the position of the function in the function column.
  • one and the same context mapping word appears multiple times in the context mapping table in case it is associated with multiple functions.
  • alternatively, each context mapping word may be represented only once in the context mapping table, but the correspondingly associated function appears multiple times.
  • in a further alternative, each context mapping word and each function is represented exactly once in the context mapping table and the associations between them are represented via links, pointers or other structures known in the field of database technologies.
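  • As an illustration of these representation alternatives (a sketch only; the patent leaves the concrete data layout open and these literals are invented), the same associations can be held function-per-row, word-per-row, or fully normalized with link records:

    # Variant 1: one row per function; a context mapping word may repeat across rows.
    by_function = {1: ["scan", "file"], 2: ["scan", "drive"]}

    # Variant 2: one row per context mapping word; a function ID may repeat.
    by_word = {"scan": [1, 2], "file": [1], "drive": [2]}

    # Variant 3: each word and each function stored exactly once; associations
    # kept as link records, as in a normalized database schema.
    words = {10: "scan", 11: "file", 12: "drive"}   # word ID -> word
    functions = {1: "ScanFile", 2: "ScanDrive"}     # function ID -> name
    links = [(10, 1), (11, 1), (10, 2), (12, 2)]    # (word ID, function ID)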
  • the identified function may be executed immediately after the identification (or after the entire input text string has been parsed). Alternatively or in addition, the identified function may also be executed at a later time.
  • the function in the context mapping table has executable program code associated with it.
  • the step of preparing the execution of the identified function may then comprise providing an executable program code representing the identified function on the computing device.
  • the step of preparing the execution of the identified function comprises providing a text string representing a call of the identified function. The string may be provided immediately or at a later time to an interpreter, compiler etc. in order to generate executable code.
  • the step of identifying the function comprises, in case an input word matches a context mapping word associated with multiple functions, identifying one function of the multiple functions which is associated with multiple matching context mapping words. This function may then be used as the identified function.
  • the step of comparing each one of the one or more input words with context mapping words may comprise the step of buffering an input word in a context buffer in case the input word matches a context mapping word that is associated with two or more functions.
  • the step of buffering the input word may further comprise buffering the input word in the context buffer including, for each of the two or more functions or function parameters associated with the input word, an indication of the function or function parameter.
  • the step of identifying the function may then comprise comparing indications of functions or function parameters of two or more input words buffered in the context buffer and identifying corresponding indications.
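  • A minimal sketch of this buffering-and-comparison logic follows (hypothetical Python; an 'indication' of a function is represented here simply by its ID). Ambiguous input words are buffered together with all candidate IDs, and the function is identified by intersecting the candidate sets:

    def buffer_and_identify(input_words, table):
        """table maps a context mapping word to the set of IDs of the
        functions or function parameters it is associated with."""
        context_buffer = []  # entries: (input word, candidate function IDs)
        for word in input_words:
            candidates = table.get(word)
            if candidates and len(candidates) > 1:
                context_buffer.append((word, candidates))
        if not context_buffer:
            return None
        # Identify corresponding indications: IDs common to all buffered words.
        common = set.intersection(*(ids for _, ids in context_buffer))
        return next(iter(common), None)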
  • One variant of the method aspect may comprise the further step of comparing an input word with function names in a function name mapping table, in which each of the function names represents one of the functions for controlling the computing device.
  • the method in this variant may comprise the further step of identifying, in case the input word matches with at least a part of a function name, the function associated with the at least partly matching function name.
  • the function name mapping table may further comprise function parameters for comparing the function parameters with input words.
  • Entries corresponding to the same function or function parameter in the context mapping table and the function name mapping table may be linked with each other.
  • a linked entry in the function name mapping table may be associated with executable program code representing at least a part of a function.
  • the method comprises the further steps of comparing input words with irrelevant words in an irrelevant words mapping table; and, in case an input word matches with an irrelevant word, excluding the input word from identifying the function.
  • the irrelevant words mapping table may comprise, for example, textual representations of spoken words such as 'the', 'a', 'please', etc.
  • the method comprises the preparatory steps of establishing a main words table comprising multiple word strings, each word string representing one or more potential input words in a string format and each word string being associated with a unique number; and establishing the context mapping table, wherein the context mapping words are represented by the unique numbers associated with the word string corresponding to the context mapping word in the main words table.
  • the step of comparing input words with context mapping words comprises the steps of representing each input word by its number as specified in the main words table; and comparing the number representations of input words and the number representations of context mapping words with each other in order to determine matches of input words to context mapping words.
  • the step of identifying the function comprises the steps of identifying for each matching context mapping word one or more functions associated therewith; determining for each function an attraction value indicating how often the function has been identified; and identifying the function with the highest attraction value.
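  • The attraction-value variant can be sketched as a simple vote count ('attraction value' is the patent's term; the data structures and names are assumed):

    from collections import Counter

    def identify_by_attraction(matching_words, table):
        """For every matching context mapping word, credit each associated
        function; the function with the highest attraction value wins."""
        attraction = Counter()
        for word in matching_words:
            for function_id in table.get(word, ()):
                attraction[function_id] += 1  # counts how often the function was identified
        if not attraction:
            return None
        function_id, _ = attraction.most_common(1)[0]
        return function_id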
  • the method comprises the preparatory step of establishing one or more auxiliary words tables, each auxiliary words table comprising associations of a primary word with one or more secondary words.
  • the step of comparing input words with context mapping words then comprises determining, based on the auxiliary words tables, if an input word matches with a secondary word, and, in case of a match, selectively replacing the input word with the primary word associated with the matching secondary word.
  • the secondary words may comprise at least one of synonyms, antonyms, word type representations, definitions of the respectively associated primary words, and phonetic representations of the secondary words.
  • This implementation may comprise the preparatory step of generating or updating at least one of the main words table and the auxiliary words tables based on an external dictionary.
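  • A sketch of this normalization step based on auxiliary words tables (assumed structures; here a synonyms table mapping secondary words to their primary 'root' word):

    # Hypothetical synonyms table: secondary word -> primary ('root') word.
    SYNONYMS = {"begin": "start", "launch": "start", "terminate": "stop"}

    def normalize(input_words, auxiliary_tables=(SYNONYMS,)):
        """Selectively replace an input word with the primary word
        associated with a matching secondary word."""
        normalized = []
        for word in input_words:
            for table in auxiliary_tables:
                if word in table:
                    word = table[word]  # replace secondary word with its primary word
                    break
            normalized.append(word)
        return normalized

    # normalize(["launch", "scan"]) returns ["start", "scan"]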
  • the step of transforming the speech input into the text string is performed in a speech recognition device and the steps of comparing input words of the text string with context mapping words and identifying the function associated with a matching context mapping word are performed in a control device.
  • the method may then comprise the further step of establishing a data transmission connection between the remote speech recognition device and the control device for transmitting data comprising the text string.
  • a method of controlling a computing device via speech is proposed, wherein the method is performed in a control device and in a speech input device remotely arranged from the control device.
  • the method comprises the steps of transforming, in the speech input device, speech input into speech data representing the speech input; establishing a data transmission connection for transmitting the speech data between the remotely arranged speech input device and the control device; and converting, in the control device, the speech data into one or more control commands for controlling the computing device.
  • the statement that the control device and the speech input device are remotely arranged from each other does not necessarily imply that these devices are arranged spatially or geographically remote from each other.
  • both devices may be located in the same building or room, but are assumed to be remotely arranged in case the data transmission connection is a connection configured for transmitting data between separate devices.
  • the data transmission connection may run over a local area network (LAN), wide area network (WAN), and/or a mobile network.
  • for example, in case a mobile phone is used as speech input device and the speech input is transmitted using VoIP over a mobile network towards a notebook having installed a speech recognition/control application, the mobile phone and the notebook are assumed to be remotely arranged from each other even if they are physically located near each other.
  • a computer program product comprises program code portions for performing the steps of any one of the method aspects described herein when the computer program product is executed on one or more computing devices.
  • the computer program product may be stored on a computer readable recording medium, such as a permanent or rewritable memory within or associated with a computing device or a removable CD-ROM or DVD. Additionally or alternatively, the computer program product may be provided for download to a computing device, for example via a data network such as the Internet or a communication line such as a telephone line or wireless link.
  • A fourth aspect is directed to a control device for controlling a computing device via speech.
  • the control device comprises a speech recognition component adapted to transform speech input into a text string comprising one or more input words; a matching component adapted to compare each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words; an identification component adapted to identify, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and a preparation component adapted to prepare an execution of the identified function.
  • the control device may be implemented on the computing device, which may be a mobile device such as a notebook, mobile phone, handheld, wearable computing devices such as head-up display devices, etc., or a stationary device such as a personal computer, household appliance, machinery, etc.
  • A fifth aspect is directed to a control device for controlling a computing device via speech which comprises a data interface adapted to establish a data transmission connection between a remote speech input device and the control device for receiving data comprising a text string representing speech input from the remote speech input device, wherein the text string comprises one or more input words; a matching component adapted to compare each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words; an identification component adapted to identify, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and a preparation component adapted to prepare an execution of the identified function.
  • According to a sixth aspect, a system for controlling a computing device via speech comprises a control device and a speech input device.
  • the speech input device is adapted to transform speech input into speech data representing the speech input.
  • the control device is adapted to convert the speech data into one or more control commands for controlling the computing device.
  • Each of the speech input device and the control device comprises a data interface adapted to establish a data transmission connection for transmitting the speech data between the remotely arranged speech input device and the control device.
  • a seventh aspect is related to a speech input device, wherein the speech input device is adapted for inputting and transforming speech input into speech data representing the speech input and the speech input device comprises a data transmission interface.
  • According to the seventh aspect, use of the speech input device is proposed for establishing, via the data transmission interface, a data transmission connection for transmitting the speech data to a remote computing device, wherein the computing device transforms the speech data into control functions for controlling the computing device.
  • An eighth aspect is related to a computing device including a speech recognition component for transforming speech input into control functions for controlling the computing device and a data reception interface for establishing a data reception connection. According to the eighth aspect, use of the computing device is proposed for receiving, via the data reception interface, speech data from a remote speech input device and for transforming the received speech data into control functions for controlling the computing device.
  • Fig. 1 schematically illustrates an embodiment of a control device for controlling a computing device via speech;
  • Fig. 2 illustrates an embodiment of a context mapping component of the control device of Fig. 1;
  • Fig. 3 illustrates an embodiment of a context mapping table for use with the context mapping component of Fig. 2;
  • Fig. 4 illustrates an embodiment of a function name mapping table for use with the context mapping component of Fig. 2;
  • Fig. 5 illustrates an example of a text string representing a speech input;
  • Fig. 6 illustrates a content of a context buffer used by the context mapping component of Fig. 2 when parsing the text string of Fig. 5;
  • Figs. 7A-7C illustrate contents of an instruction space used by the context mapping component of Fig. 2;
  • Fig. 8 schematically illustrates an embodiment of a control system for controlling a computing device via speech;
  • Fig. 9 illustrates a first embodiment of a method of controlling a computing device via speech;
  • Fig. 10 illustrates an embodiment of a context mapping procedure which may be performed within the framework of the method of Fig. 9;
  • Fig. 11 illustrates a second embodiment of a method of controlling a computing device via speech;
  • Fig. 12 illustrates a further embodiment of a context mapping stage; and
  • Figs. 13a)-c) illustrate a further example of the processing of an exemplary input sentence by the context mapping stage of Fig. 12.
  • This may include, for example, network-based and/or client-server based scenarios, in which at least one of a speech recognition component, a context mapping component and, e.g., an instruction space for providing an identified function is accessible via a server in a Local Area Network (LAN) or Wide Area Network (WAN).
  • Fig. 1 schematically illustrates an embodiment of a control device 100 for controlling a computing device 102 via speech.
  • the computing device 102 may be a personal computer or similar device including an operating system (OS) 104 and an application (APP) 106.
  • the computing device 102 may or may not be connected with other devices (not shown).
  • the control device 100 includes a built-in speech input device comprising a microphone 108 and an Analogue-to-Digital (A/D) converter 110 which digitizes an analogue electric signal from the microphone 108 representing a speech input by a human user.
  • the A/D converter 110 provides the digital speech signal 112 to a speech recognition (SR) component 114, which operates to transform the speech signal 112 into a text string 116 representing the speech input in textual form.
  • the text string 116 comprises a sequence of input words.
  • the text string 116 is provided to a context mapping component 118, which converts the text string 116 into one or more control functions 120 for controlling the computing device 102.
  • the control functions 120 may comprise, e.g., one or more control commands with or without control parameters.
  • the context mapping component 118 operates by accessing one or more databases; only one database is exemplarily illustrated in Fig. 1, which stores a context mapping table (CMT) 122. The operation of the context mapping component 118 will be described in detail further below.
  • the control function or functions 120 resulting from the operation of the context mapping component 118 are stored in an instruction space 124.
  • the operating system 104 or the application 106, or both, of the computing device 102 may access the instruction space 124 in order to execute the instructions stored therein, i.e. the control functions which possibly include one or more function parameters.
  • the functions 120 stored in the instruction space 124 may for example be represented in textual form as function calls, e.g., conforming to the syntax of at least one of the operating system 104 and the application(s) 106.
  • a specific software-API may be defined, to which the functions (instructions) 120 conform.
  • the instruction space 124 may also store the control functions 120 in the form of a source code (one or more programs), which has to be transformed into an executable code by a compiler, assembler, etc. before execution.
  • the control functions may be represented in the form of one or more executable program codes, which do not require any compilation, interpretation or similar steps before execution.
  • the control device 100 and the computing device 102 may be implemented on a common hardware.
  • the control device 100 may be implemented in the form of software on a hardware of the computing device 102 running the operating system 104 and one or more applications 106.
  • the control device 100 is implemented at least in part on a separate hardware.
  • software components of the control device 100 may be implemented on a removable storage device such as a USB stick.
  • the control device is adapted to store the control functions 120 on a removable storage, for example a removable storage disk or stick. The removable storage may then be provided to the computing device 102 in order that the computing device 102 may load the stored control functions into the instruction space 124, which in this scenario belongs to the computing device 102.
  • the control device 100 may send the control functions 120 via a wireless or hardwired connection to the computing device 102.
  • Fig. 2 illustrates in more detail functional building blocks of the context mapping component 118 in Fig. 1.
  • the context mapping component 118 comprises a matching component 202, an identification component 204 and a number of databases, namely the database storing the context mapping table 122 and further databases for storing a context buffer 206, an irrelevant words mapping table 208 and a function name mapping table 210.
  • Both components 202 and 204 may provide control functions and/or parameters thereof to the instruction space 124 (cf. Fig. 1).
  • the context mapping component 118 may receive a text string 116 from the speech recognition component 114.
  • the text string may comprise one or more input words 212 (Fig. 2).
  • the matching component 202 is, amongst others, adapted to compare each one of the one or more input words 212 with context mapping words stored in the context mapping table 122.
  • the example context mapping table 122 is depicted in more detail in Fig. 3.
  • the table 122 in Fig. 3 comprises in column 302 function identification numbers (IDs), wherein each function ID references exactly one function which may be performed to control the computing device 102 in Fig. 1. Consequently, each row of the table 122 corresponding to an entry of a function ID in column 302 is assigned to a particular function.
  • further columns of the table comprise the context mapping words (CMW_0, CMW_1, ...).
  • the number of context mapping words associated with a function may be from 1 to a maximum number, which may be given for any particular implementation. For example, the maximum number may be 255.
  • the function ID "1" in row 306 of table 122 may refer to a function "ScanFile", which may be performed on the computing device 102 in order to scan all files on the computer fur the purpose of, e.g., finding a particular file. Between 1 and the maximum number of context mapping words may be associated with the function ScanFile. In the simple example table 122, only two context mapping words are associated with this function, namely as CMWJ) the word "scan" and as CMW_1 the word "file”.
  • the function ID "2" may refer to a function Scan- Drive to scan the drives available to the computing device 102; as context mapping words CMWJ) and CMW_1, the words “scan” and “drive” are associated with this function.
  • the function ID "3" may refer to a function "ScanIPaddress”, which may be provided in the computing device 102 to scan a network in order to determine if a particular computer is connected therewith.
  • the context mapping words CMW_0, CMW_1 and CMW_2 associated with this function are the words "scan", "network" and "computer".
  • a context mapping table may also define associations of context mapping words with function parameters.
  • a corresponding example is depicted in Fig. 3 with row 312 of table 122.
  • the human name "Bob" as context mapping word is associated with ID "15".
  • the ID may be assigned, e.g., to the IP address of the computer of a human user named Bob.
  • various context mapping words are defined which a human user may use to express that a device such as a computer is turned or switched on or off.
  • the parameter ID 134 may thus refer to a function parameter "ON” and the parameter ID 135 may refer to a function parameter "OFF”.
  • the context mapping table 122 in Fig. 3 is structured such that a function (or its ID) is represented in the table only once. Then, a context mapping word relevant for multiple functions may occur several times in the table.
  • the context mapping word "scan" is associated with three functions in table 122, namely the functions referenced with IDs 1, 2, and 3 in lines 306, 308 and 310.
  • Other embodiments of context mapping tables may be based on a different structure. For example, each context mapping word may be represented only once in the table. Then, the functions (or their IDs) would appear multiple times in the table. With such a structure, the CMW "scan" would appear only once, and would be arranged such that the associations with the function IDs 1, 2 and 3 are indicated. The function ID "1" would appear twice in the table, namely to indicate the associations of the CMWs "scan" and "file" with this function. Other mechanisms of representing associations of context mapping words with control functions may also be deployed.
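  • The example table 122 of Fig. 3, including the parameter rows 312 and 314, could be modelled as follows (a sketch; only entries actually named in the description are shown, and the context mapping words for the ON/OFF parameters are partly assumed):

    # Function rows (IDs 1-3) and parameter rows (IDs 15, 134, 135) from Fig. 3.
    CONTEXT_MAPPING_TABLE = {
        1:   {"scan", "file"},                 # ScanFile, row 306
        2:   {"scan", "drive"},                # ScanDrive, row 308
        3:   {"scan", "network", "computer"},  # ScanIPaddress, row 310
        15:  {"bob", "bob's"},                 # parameter: Bob's computer, row 312
        134: {"on", "running"},                # parameter ON, rows 314 (words assumed)
        135: {"off", "down"},                  # parameter OFF, rows 314 (words assumed)
    }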
  • the matching component 202 of the context mapping component 118 may also be adapted to employ the irrelevant words mapping table 208 when parsing the input words 212.
  • This table 208 may comprise, in textual form, words which are assumed to be irrelevant for determining control functions. For example, articles such as "the” and words primarily required for grammatical or syntactical reasons in human language sentences such as "for", "if” etc. may be represented as irrelevant words in the irrelevant words mapping table 208.
  • in case an input word matches with an irrelevant word, the matching component 202 may discard the input word from further processing, such that the word is excluded from identifying the function.
  • the matching component 202 may further be adapted to employ the function name mapping table 210 when parsing the input words 212.
  • Fig. 4 illustrates an example embodiment 400 of the function name mapping table 210.
  • the table 400 comprises a function ID column 402 similar to column 302 in context mapping table 122 in Fig. 3.
  • a further column 404 comprises, for each of the function IDs in column 402, the associated function name in textual form.
  • the function ID "1" is associated with the function name "ScanFile", which may represent the file scanning func- tionality already described above.
  • the function name mapping table 400 thus represents the mapping of function IDs to functions as used (amongst others) in the context mapping table 122 in Fig. 3.
  • the matching component 202 and the identification component 204 may thus access the function name mapping table 400 also for resolving function IDs into function names before putting a function call to the instruction space 124.
  • the table 400 also allows resolving parameter IDs. For example, the ID "15" is assigned to the IP address 127.0.0.7, which in the example implementation discussed here may be the IP address of the computer of the human user Bob in a network the computing device 102 is connected with (compare with table 122 in Fig. 3, row 312). Further, the parameter IDs 134 and 135 are resolved to function parameters "ON" and "OFF", respectively (see lines 314 in Fig. 3).
  • the textual representation of a function in column 404 may be such that it can be used as at least a part of a call for this function.
  • the column 404 may include the textual representation "ScanFile" because the operating system 104 of computing device 102 in Fig. 1 is adapted to handle a function call such as "ScanFile([parameter 1]; [parameter 2])". Brackets "(", ")" and separators ";" may be added to the function call in later steps, as will be described below.
  • a textual representation such as "Scan-File” or "Scan File” could not be used as a valid function call in this example, and such representations may therefore not be included in the function name mapping table.
  • the function name mapping table may also provide access to an executable program code for executing a function. This is also illustrated in Fig. 4, wherein a function ID "273" is associated with a pointer "*ls", which may point to an executable code for listing the content of a directory.
  • the executable program code may be provided to at least one of the control device 100 and the computing device 102, e.g., in the form of one or more program libraries.
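  • A sketch of the function name mapping table 400 and of resolving IDs into callable names (the entries follow Fig. 4 where given; the stand-in for the executable code behind the pointer "*ls" is invented):

    def ls_directory():
        # Stand-in for the executable code referenced by the pointer "*ls".
        print("listing directory contents")

    # Function/parameter ID -> textual call fragment or executable code.
    FUNCTION_NAME_TABLE = {
        1:   "ScanFile",       # usable in a call such as ScanFile([parameter 1]; [parameter 2])
        3:   "ScanIPaddress",
        15:  "127.0.0.7",      # parameter: IP address of Bob's computer
        134: "ON",
        135: "OFF",
        273: ls_directory,     # ID associated with executable code instead of text
    }

    def resolve(entry_id):
        """Return the textual representation for an ID, or the name of its code."""
        entry = FUNCTION_NAME_TABLE[entry_id]
        return entry if isinstance(entry, str) else entry.__name__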
  • the matching component 202 processes each of the input words 212 in the text string 116. In case a present input word is found in the irrelevant words mapping table 208, the input word is discarded. In case a present input word matches with a context mapping word in context mapping table 122, the matching component 202 buffers the input word in the context buffer 206. In case the input word directly matches with a function call in the function name mapping table 210, the matching component 202 may immediately prepare an execution of the corresponding function by, e.g., providing the textual representation of the function call specified in column 404 of table 400 or an executable program code or a link thereto to the instruction space 124.
  • the matching component 202 may immediately place a function or a function parameter in the instruction space 124 in case an input word matches unambiguously with a function or a function parameter name given in the function name mapping table 210.
  • for example, the human user may speak an IP address such as that referenced with ID "15" in the example function name mapping table 400 in Fig. 4.
  • the matching component 202 may instantly provide this parameter to the instruction space 124.
  • an input word may also match unambiguously with a function or function parameter in the context mapping table 122. This may be the case if a present input word matches with a context mapping word which is associated with only one function or function parameter (other functions or function parameters the context mapping word is associated with may be ruled out for other reasons). In this case also, the matching component 202 may instantly provide the function or function parameter to the instruction space 124.
  • After the matching component 202 has finished parsing the available input words 212, it provides a trigger signal to the identification component 204.
  • the identification component 204 works to resolve any ambiguity which may occur due to the fact that in the context mapping table a context mapping word may be associated with multiple control functions, i.e. one or more input words cannot be matched unambiguously to one or more functions or function parameters.
  • the identification component 204 accesses the context mapping words which have been buffered in the context buffer 206.
  • the component 204 identifies a function by determining buffered context mapping words associated with the same function.
  • in Fig. 5, a textual representation 502 of an example sentence is given which a user may speak.
  • Line 504 in Fig. 5 indicates results of the processing of each of the input words of sentence 502 in the matching component 202 of Fig. 2.
  • the words "please”, “the”, “for”, “if”, "it”, “is” have been identified as irrelevant (indicated as "irr.” in line 504) words, e.g. because these words are represented as irrelevant words in the irrelevant words mapping table 208. These words will not be considered in the further processing.
  • the input word "scan” of sentence 502 is represented as a context mapping word multiple times in the example context mapping table 122, in which "scan” is associated with the function IDs 1, 2 and 3 (reference numbers 306, 308, 310).
  • the further input words "network” and "computer” of sentence 502 are also context mapping words associated with function IDs in table 122, namely with ID "3" (the words found by the matching component 202 to be included in the context mapping table 122 are marked “context” in line 504 in Fig. 5).
  • the content of the context buffer 206 after the matching component 202 has parsed the entire input text string 502 is schematically illustrated in Fig. 6. All the context mapping words (or input words) "scan", "network", "computer" have been buffered in the context buffer 206 (column 602).
  • when the matching component 202 buffers an input word in the context buffer 206, it also stores the function ID(s) the corresponding context mapping word is associated with, as indications of the function(s). This is depicted in column 604 in Fig. 6.
  • the context mapping word "scan” is associated with the functions referenced by function IDs 1, 2 and 3 in the context mapping table 122 (see Fig. 3).
  • "network” and “computer” are each associated with function ID 3.
  • the input word "Bob's" is associated with function ID (parameter ID) 15.
  • the matching component 202 finds the word "on" in the function name mapping table 210 (this is marked "name" in line 504 in Fig. 5). Function names or parameter names found in the function name mapping table may immediately be put into the instruction space 124. This instruction space will be discussed next.
  • Fig. 7A schematically illustrates the status of the instruction space 124 (cf. Figs. 1 and 2) after the matching component 202 has completed parsing the text string 502.
  • the instruction space 124 is prepared to receive, for one or more functions ("function_1", "function_2", etc. in column 702) and function parameters for these functions ("fparm_1.1", "fparm_1.2" for function_1, etc.), values which may occupy the storage places indicated as column 704 in Fig. 7 (empty storage places are illustrated as "void" places).
  • the instruction space 124 may not explicitly contain indications such as "function_1" and "fparm_1.1"; these indications are used in the figures mainly for illustrative purposes.
  • the instruction space may be structured in any way which allows representing the type of a stored data item. For example, an identified function call may be stored in a particular storage place in the instruction space reserved for this purpose, while function parameters may be stored in a separate storage place.
  • the matching component 202 has only unambiguously detected the function parameter "ON" from the function name mapping table 210 (see Fig. 4). All the other matching input words have matched with context mapping words in the context mapping table 122, which is why they have been placed in the context buffer 206. Note that in a different embodiment, which is based on storing only those context mapping words in the context buffer which are associated with multiple functions or function parameters, also the parameter "Bob's" would have been replaced with the IP address defined for this parameter (Fig. 4, function ID 15) and put into the instruction space, as this parameter can unambiguously be determined.
  • the identification component 204 analyzes the function IDs stored in the context buffer 206 (Fig. 6). The analysis may, e.g., comprise comparing the function IDs stored for the different context mapping words (column 604) and/or determining function IDs common to several context mapping words. For the simple example illustrated in Fig. 6, the identification component 204 detects that the function ID "3" is common to the context mapping words "scan", "network" and "computer". The component 204 may conclude that the function referenced with ID "3" is the intended function, e.g. on the basis of the determination that the ID "3" occurs multiple times in column 604 in Fig. 6.
  • the identification component 204 determines from the function name mapping table 210 the function referenced by ID "3", namely the function "ScanIPaddress". The component 204 puts the identified function call in the instruction space 124.
  • Fig. 7B illustrates the status of the instruction space 124 after the identification component 204 has entirely parsed the context buffer 206 of Fig. 6.
  • the function "ScanIPaddress" has been identified.
  • the identification component 204 has further replaced the parameter "Bob's" by the IP address 127.0.0.7 and has put this parameter into the instruction space. Storage place provided for further functions or function parameters has not been used.
  • a context mapping table comprises a large number of functions (function IDs) and function parameters, many of them probably associated with a large number of context mapping words.
  • a context mapping table may comprise several hundred functions with several thousand function parameters and may allow up to 256 context mapping words per function/parameter.
  • the function name mapping table, if present, then comprises a correspondingly large number of functions and function parameters.
  • the identification component 204 or another component of the control device 100 or computing device 102 eventually prepares execution of the identified function. As illustrated in Fig. 7C, this may comprise putting the function call in textual form in the instruction space 124. It is to be noted that default parameters may be used in case not all parameters required for a particular function call can be identified from the input text string.
  • the function call may instantly or at a later time be executed by the computing device 102.
  • the context mapping component 118 may provide a trigger signal (not shown in Fig. 1) to the operating system 104 of computing device 102. In response to the trigger, the operating system 104 may access the instruction space 124, extract the function call illustrated in Fig. 7C, and may then perform the function.
  • While in Fig. 1 the control device 100 comprises a built-in speech input device with a microphone 108 and A/D converter 110, a speech input device may as well be remotely arranged from the control device. This is exemplarily illustrated in Fig. 8, in which a system 800 for controlling a computing device 802 via speech is depicted.
  • the system 800 comprises a separate speech input device 804 which may be connected via a data transport network 806 with a control device 808.
  • the speech input device 804 comprises a microphone 810 and an A/D converter 812, which outputs a digital speech signal 814 much as the A/D converter 110 in Fig. 1.
  • the transport network 806 may for example be an IP, ISDN and/or ATM network.
  • the data transmission connection 818 may for example be a Voice-over-IP (VoIP), ISDN, or a Voice-over-ATM (VoATM) connection, or any other hardwired or wireless connection.
  • the connection 818 may run entirely or in part(s) over a mobile network such as a GSM or UMTS network.
  • the control device 808 comprises an interface 820 which is adapted to extract the speech signal 814' from the data received via the transport connection 818.
  • the interfaces 816 and 820 may each comprise an IP socket, an ISDN card, etc.
  • the interface 820 forwards the speech data 814' to a speech recognition component 822, which may or may not operate similarly to the speech recognition component 114 in Fig. 1.
  • the further processing may comprise a context mapping as has been described hereinbefore. In the embodiment illustrated in Fig. 8, no context mapping is performed but the speech recognition component 822 operates to provide recognized words directly as control commands 824 to operating system 826 and/or an application 828 of the computing device 802.
  • the speech input device 804 of Fig. 8 may be a mobile phone
  • the data transmission connection 818 may comprise a VoIP connection
  • the control device 808 may be installed as a software application on a notebook exemplarily representing the computing device 802.
  • Skype may be used for the VoIP connection
  • the control device application may make use of a speech recognition feature such as that provided with Windows Vista (Skype and Windows Vista are trademarks of Skype Limited and Microsoft Corp., respectively).
  • a speech recognition component such as the component 114 or 822 of Fig. 1 and Fig. 8, respectively, may be remotely arranged from a context mapping component such as the component 118 in Fig. 1.
  • a text string comprising one or more input words is transmitted via a data transmission connection from the speech recognition component towards the context mapping component.
  • the considerations discussed above with respect to the embodiment 800 in Fig. 8 may be applied accordingly, except that for the transmission of a text string no VoIP, VoATM or similar speech data transmission mechanism is required.
  • the speech recognition described as part of the techniques proposed herein may be based on any kind of speech recognition algorithm capable of converting a speech signal to a sequence of words and implemented in the form of hardware, firmware, software or a combination thereof.
  • 'Voice recognition' as known to the skilled person is, in its precise meaning, directed to identifying a person who is speaking, but is often used interchangeably when 'speech recognition' is meant. In any case, the term 'speech recognition' as used herein may or may not include 'voice recognition'.
  • the respective speech recognition component such as component 114 or 822 illustrated in Figs. 1 and 8, respectively, may be implemented together with other components on a common hardware or on a separate or dedicated hardware unit which is connectable wirelessly or by wire to other components.
  • a mobile phone or smart phone adapted for speech recognition may be used, which can be connected via USB, Bluetooth, etc. with a computing device, on which, e.g., a context mapping component such as component 118 of Fig. 1 is implemented.
  • Fig. 9 is a flow diagram illustrating steps of an embodiment of a method 900 of controlling a computing device via speech.
  • the method 900 may be performed using, e.g., the control device 100 of Fig. 1.
  • the method starts in step 902 with accepting a speech input, which may be provided from a speech input device such as microphone 108 and A/D converter 110 in Fig. 1.
  • in step 904, the speech input is transformed into a text string comprising one or more input words.
  • This step may for example be performed in a speech recognition component such as the component 114 in Fig. 1.
  • in step 906, each one of the one or more input words is compared with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words.
  • a context mapping table is illustrated in Fig. 3.
  • the step 906 is performed by the matching component 202.
  • in step 908, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word is identified. It is to be noted that in the example configuration of Figs. 1 and 2 the step 908 of identifying the intended function may be performed in the identification component 204, but also in the matching component 202. While the identification component 204 is adapted to resolve ambiguities by appropriately operating on the context buffer 206, the matching component 202 may identify a function in the function name mapping table 210.
  • in step 910, the execution of the identified function is prepared, for example by providing a call of the function or an executable program code in an instruction space such as the storage component 124 depicted in Figs. 1 and 2.
  • in step 912, the method 900 stops and waits for further speech input.
  • Fig. 10 is a flow diagram illustrating an embodiment of a context mapping procedure 1000.
  • the procedure 1000 is a possible realization of at least a part of the steps 906 and 908 of Fig. 9.
  • procedure 1000 parses all input words of a text string such as text string 116 in Fig. 1.
  • in step 1002, it is determined if an input word is present. If this is the case, the procedure goes on to step 1004, wherein it is tested if the present input word is an irrelevant word, which may be determined by comparing the present word with irrelevant words stored in an irrelevant words mapping table such as table 208 illustrated in Fig. 2. In case it is determined that the present input word is an irrelevant word, in step 1006 the present word is discarded and the procedure goes back to step 1002. In case the present input word is not an irrelevant word, for example because it does not match with any word in the irrelevant words mapping table, the procedure goes on to step 1008. In this step it is tested whether the present input word matches with a context mapping word in a context mapping table such as table 122 in Figs. 1 and 2.
  • a present input word may only be buffered in the context buffer in case the matching context mapping word is associated with at least two functions or function parameters (not shown in Fig. 10).
  • the procedure goes on to step 1012 with testing if the present input word matches with a function name (or function parameter name), which may be determined by comparing the input word with the function names in a function name mapping table such as table 210 in Figs. 2 and 4.
  • in case the present input word matches with a function name or function parameter name, the procedure goes on to step 1014 by putting the function name or function parameter name into an instruction space such as space 124 in Figs. 1 and 2.
  • further context mapping related conditions such as the conditions 1004, 1008 and 1012 and/or an error handling 1016 may be performed.
  • the error handling 1016 may comprise putting the present input word into an irrelevant words mapping table to enable an early classification of this input word as an irrelevant word in the future.
  • the error handling 1016 may additionally or alternatively comprise to output information to a human user and/or to ask the user for an appropriate action. Further error handling steps may be performed throughout the procedure 1000, however, only the error handling 1016 is shown in Fig. 10 for illustrative purposes.
  • in case no further input word is present, the procedure goes on from step 1002 to step 1018 by testing whether the context buffer is non-empty.
  • in case the context buffer is non-empty, in step 1020 one or more functions and/or function parameters are identified based on the buffered words. For example, a comparison of the function IDs of the buffered context mapping words may be used in this respect, as has been described further above.
  • the identified function(s) and parameter(s) are put into the instruction space in step 1022 and the procedure stops by returning to step 910 of Fig. 9.
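  • The complete parsing loop of procedure 1000 can be summarized as follows (a sketch under the data-structure assumptions of the earlier examples; step numbers refer to Fig. 10):

    def context_mapping_procedure(input_words, irrelevant, cmt, names):
        """cmt maps a context mapping word to a set of function IDs;
        names maps a function or parameter name to its call text."""
        instruction_space, context_buffer = [], []
        for word in input_words:                 # step 1002: an input word is present
            if word in irrelevant:               # step 1004: irrelevant word?
                continue                         # step 1006: discard it
            if word in cmt:                      # step 1008: context mapping word?
                context_buffer.append((word, cmt[word]))  # buffer word with its IDs
            elif word in names:                  # step 1012: function (parameter) name?
                instruction_space.append(names[word])     # step 1014
            else:
                handle_error(word)               # step 1016
        if context_buffer:                       # step 1018: buffer non-empty?
            # step 1020: identify function IDs common to the buffered words
            common = set.intersection(*(ids for _, ids in context_buffer))
            instruction_space.extend(sorted(common))      # step 1022
        return instruction_space

    def handle_error(word):
        print(f"unrecognized input word: {word!r}")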
  • Fig. 11 is a flow diagram illustrating steps of a further embodiment of a method 1100 of controlling a computing device via speech.
  • the method 1100 may be performed in a control device and in a speech input device, wherein the speech input device is remotely arranged from the control device.
  • the method 1100 may be performed using the devices 804 and 808 of Fig. 8.
  • the method is triggered in step 1102 in that a speech input is received and accepted at the speech input device.
  • the method goes on in step 1104 by transforming, in the speech input device, the speech input into speech data representing the speech input.
  • the step 1104 may be performed using a microphone such as microphone 810 and an A/D converter such as converter 812 in Fig. 8.
  • in step 1106, a data transmission connection is established for transmitting the speech data between the remotely arranged speech input device and the control device.
  • a data transmission connection such as connection 818 in Fig. 8 between interfaces 816 and 820 of the speech input device 804 and the control device 808 may be established.
  • the speech data may then be transmitted from the speech input device via the remote connection to the control device.
  • in step 1108, the speech data is converted in the control device into one or more control commands for controlling the computing device.
  • in one implementation, the conversion step 1108 comprises speech recognition and context mapping as described hereinbefore with regard to the functionality of the components 114 and 118 of Fig. 1.
  • in another implementation, only a speech recognition as implemented in the speech recognition component 114 in Fig. 1 is performed without any context mapping. In this case, the user may only speak commands he or she would otherwise enter by typing or by clicking on an appropriate button.
  • the context mapping component 118 uses the context mapping table 122, the function name mapping table 210 and the irrelevant words mapping table 208. In other embodiments, other tables, more tables, or fewer tables may be employed. For example, in an embodiment depicted in Fig. 12, a context mapping component 118' is schematically illustrated, which has access to a main words table 1202, multiple auxiliary words tables 1204 and 1206, a CMT table 122', and a context target (function name) mapping table 210'.
  • the context mapping component 118', the CMT table 122' and the function name mapping table 210' may operate similar to the context mapping component 118, the CMT table 122 and the function name mapping table 210, respectively, as has been described hereinbefore with reference to Figs. 1 - 7, and a repetition of these functionalities is therefore omitted.
  • the main words table 1202 is a main primary dictionary which contains a plurality of word strings, each word string comprising a single word or several words or phrases consisting of several words, possibly in different languages.
  • Each record in the main words table 1202 has the following structure:
  • a word string which represents the human-readable word(s); a unique number or numerical value associated with that word string; a language model indicating which language the word comes from; and the actual language type identifying the specific sub-language the word belongs to.
  • the main words table 1202 may be supplemented by a Language Model Table and a Language Type Table (not shown in Fig. 12).
  • the Language Model Table is a table that identifies the general categorical languages considered by the system. An entry in this table consists of a Language Model Name and its unique identification number. An example of an entry is: [Language Model Name] English, [Unique Identification Number] 0
  • the Language Type Table is the more specific identification of the language model.
  • An entry consists of the Language Type Name and of its unique numerical number. Two example entries read as follows: [LanguageTypeName] English_us, [LanguageTypeUniqueNumber] 0; [LanguageTypeName] English_uk, [LanguageTypeUniqueNumber] 1
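  • Under the stated record structure, the main words table and the two language tables could look as follows (a sketch; the word entries and numbers beyond the quoted examples are invented):

    # Language Model Table: model name -> unique identification number.
    LANGUAGE_MODELS = {"English": 0}

    # Language Type Table: sub-language name -> unique number.
    LANGUAGE_TYPES = {"English_us": 0, "English_uk": 1}

    # Main words table: (word string, unique number, language model, language type).
    MAIN_WORDS = [
        ("computer", 17, 0, 0),
        ("folder",   42, 0, 0),
        ("open",     43, 0, 0),
    ]

    # Fast lookup from a word string to its unique number.
    WORD_TO_NUMBER = {word: number for word, number, _, _ in MAIN_WORDS}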
  • the auxiliary words table 1204 is a synonyms table comprising a list of word strings including both primary words and secondary words, i.e. for each secondary word, its primary or 'root' word is identified.
  • the synonyms table 1204 has the following structure: Primary Word Unique Numerical Number and Secondary Word Unique Numerical Number
  • the auxiliary words table 1206 is an antonyms table which comprises a list of words in the form of primary and secondary words, i.e. each (secondary) word's direct opposite root word is identified.
  • the antonyms table has the following structure: Primary Word Unique Numerical Number and Secondary Word Unique Numerical Number
  • further auxiliary words tables may be provided (not illustrated in Fig. 12).
  • a Parts Of Speech Table may comprise a list of words (both primary and secondary words), which associates a word with a part of speech or phrase it belongs to in a typical given sentence.
  • the Parts Of Speech Table may have a structure similar to the tables above, where the numbers are the unique numbers from the main words table, in which each word is associated with a unique number.
  • a Definitions Table may be provided which may comprise a part or all of the words of the main words table with at least one of their proper descriptions and definitions.
  • the Definitions Table may have the following structure: [Primary Word Unique Numerical Number] and [Definition String].
  • a further auxiliary words table may be a Phonetic Table, which may be a table containing a sound buffer file for each word within the dictionary or main words table, so that a phonetic version of some or all of the words is available.
  • the Phonetic Table may have the following structure: Primary Word Unique Numerical Number and Word Phonetic Sound Data Buffer
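  • in code, the auxiliary tables may be pictured as simple mappings over the unique numbers of the main words table; the following sketch uses invented numbers and contents purely for illustration:

    # Synonyms table: secondary word number -> primary ('root') word number.
    SYNONYMS = {25: 87}                 # e.g. "car" (25) -> "automobile" (87)

    # Antonyms table: (secondary) word number -> direct opposite root number.
    ANTONYMS = {201: 202}               # e.g. "cold" -> "hot" (invented ids)

    # Parts Of Speech Table: word number -> part of speech in a sentence.
    PARTS_OF_SPEECH = {87: "noun", 7: "verb"}

    # Definitions Table: primary word number -> definition string.
    DEFINITIONS = {87: "a road vehicle powered by an engine"}

    # Phonetic Table: primary word number -> phonetic sound data buffer.
    PHONETICS = {87: b"<sound buffer bytes>"}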
  • any of the main words table and the auxiliary words tables may be generated and/or updated based on an external dictionary, e.g. by accessing the external dictionary via the Internet.
  • the Context Mapping Table 122 comprises associations of word strings with function IDs (see Fig. 3).
  • the Context Mapping Table 122' may comprise the unique numbers associated for the word strings (as defined in the main words table 1202) instead of the word strings.
  • the Context Mapping Table 122' comprises all of the context that is in relation to all the actions that can be performed, and also the identifiers of all data.
  • the Context Mapping Table 122' may have the following structure: Primary Word Unique ID, Context Number, and SearchForPrimaryWord (a flag variable telling the word attraction mechanism described further below whether or not to search for the primary word equivalent of the current word).
  • the function name mapping table 210 or Context Target Table 210' defines the actions or meanings for the context mapping table.
  • a particular function ID is associated with a particular executable code (and/or data).
  • the structure of the Context Target Table 210' is as follows: Context Number (or function ID), and Context Target Native Code (for example, the code of an action context).
  • An example of an action context entry in the Context Target Table 210' is: [Context Number] 0, CreateFolder(void); [Context Target Native Code]
  • An example of a data context entry in the Context Target Table 210' is: [Context Number] 45, OpenFolder("C:/VacationPhotosFolder"); [Context Target Native Code]
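  • expressed in code, the two tables of this embodiment might be sketched as follows, with Python functions standing in for the native code targets (the folder name in create_folder is an invented placeholder; the context numbers 0 and 45 are those of the examples above):

    import os

    # Context Mapping Table 122': word unique number ->
    # (context numbers, SearchForPrimaryWord flag).
    CONTEXT_MAPPING = {
        7:  ([0, 45], True),    # "open"
        12: ([0, 45], True),    # "folder"
    }

    # Context Target Table 210': context number -> native code target.
    def create_folder():                          # action context, number 0
        os.makedirs("NewFolder", exist_ok=True)

    def open_vacation_photos():                   # data context, number 45
        print('OpenFolder("C:/VacationPhotosFolder")')

    CONTEXT_TARGETS = {0: create_folder, 45: open_vacation_photos}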
  • with reference to Figs. 13a) - 13c), an example of the processing of a sequence of input word strings in the context mapping component 118' of Fig. 12 is described. Assume the user speaks the phrase "Computer, I would like you to open up the folder that contains all of my vacation photos".
  • this natural language phrase is converted into a text string in a speech recognition component such as component 114 in Fig. 1.
  • the text string is separated into the individual words (or word strings) "Computer", "I", "would", "like", "you", "to", "open", "up", "the", "folder", "that", "contains", "all", "of", "my", "vacation", "photos".
  • each word (string) is replaced with its unique number or numerical value in a replacement component 1208.
  • a sequence of unique numbers is obtained, as depicted in Fig. 13a).
  • the internal representation of the strings by the unique numbers enables faster processing and thus a faster execution of the actions requested by the speaking user.
  • the next step is an optional step which is performed depending on whether the programmer has set the configuration value of [SearchForPrimaryWord] for some or all of the words of the input string.
  • any secondary word will be replaced, based on the auxiliary tables, with its associated primary word (this processing is not performed on the word strings, but on the associated unique numbers for each of the primary and secondary words as defined in the main words table).
  • an input word "car” may be replaced in the sequence of words by the word "automobile”. More precisely, the unique number 25 associated with "car” in the main words table 1202 would be replaced with the number 87 associated with "automobile”.
  • in the example of Fig. 13a) we assume, for the sake of illustration, that all words are primary words, i.e. no replacement takes place; a sketch of this optional replacement step is nevertheless given below.
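  • a minimal sketch of this replacement step, operating purely on the unique numbers (function and variable names are illustrative assumptions):

    def replace_with_primary(numbers, synonyms, search_flags):
        """Replace each secondary word number with its primary ('root')
        number when the SearchForPrimaryWord flag is set for that word."""
        return [synonyms.get(n, n) if search_flags.get(n, False) else n
                for n in numbers]

    # With SYNONYMS = {25: 87} ("car" -> "automobile"):
    # replace_with_primary([7, 25], {25: 87}, {25: True}) -> [7, 87]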
  • next, the so-called word attraction mechanism is performed: for each unique number in the input sequence, the Context Mapping Table 122' is queried for the context numbers (function IDs) associated with that number.
  • the number of matches is counted, i.e. an Attraction Value is defined which indicates how many times a given context number occurs within the query results of all patterns associated with the particular queried number entity.
  • for the query results illustrated in Fig. 13b), the resulting attraction values are illustrated in Fig. 13c).
  • the list illustrated in Fig. 13c) is analyzed to determine the function with the highest attraction, i.e. the highest attraction value.
  • only one context number (function ID) has attracted more than one match, namely the function represented by the unique number 45, which attracted four matches.
  • the function corresponding to the context number with the highest attraction is identified from the context target table 210' (function name mapping table).
  • the table 210' may comprise the following entry: [Context Number] 45, OpenFolder("C:/VacationPhotosFolder"); [Context Target Native Code]
  • the word attraction mechanism further increases the reliability of the context mapping, as in most practical cases a particular function (context number) attracts most of the matches; thus a clear identification of the wanted function can be achieved.
  • the action defined in the entry identified in the context target table is executed, i.e. the function OpenFolder("C:/VacationPhotosFolder") is called, which may be taken directly from the context target (code) table 210'.
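  • pulling the above together, the word attraction mechanism can be pictured as in the following sketch; this is one illustrative reading of the mechanism, reusing the assumed structures from the earlier sketches, not an authoritative implementation:

    from collections import Counter

    def word_attraction(numbers, context_mapping, context_targets):
        """Count for every context number how many input word numbers
        attract it (its Attraction Value) and execute the target of the
        context number with the highest attraction."""
        attraction = Counter()
        for n in numbers:
            contexts, _flag = context_mapping.get(n, ([], False))
            attraction.update(contexts)    # one match per occurrence
        if not attraction:
            return None                    # nothing matched
        best_context, _count = attraction.most_common(1)[0]
        context_targets[best_context]()    # e.g. calls open_vacation_photos
        return best_context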
  • the context-mapping related techniques proposed herein allow the user to describe a command or function within various contexts, i.e. they propose to introduce redundancy into the speech recognition/control process. The user is not required to speak exactly the same command he or she would otherwise type, but may describe the intended command or function in his own words, in different languages, or in any other context.
  • the deployed speech control device or system needs to be appropriately configured, e.g. by providing the relevant context mapping words in the context mapping table. In this way the proposed techniques allow for more reliable speech control.
  • the context-related descriptions or circumscriptions of the user may of course also be related to more than only one function or command.
  • a spoken request "Please search for Search_item" may be transformed and converted into a function or functions searching for accordingly named files and occurrences of 'Search_item' in files present locally on the computing device, but may further be converted and transformed into a function searching a local network and/or the web for 'Search_item'.
  • the same function may also be performed multiple times, for example when transforming and converting the sentence "Please scan the network for my friend's computers, if they are on", in which "friend's" may be transformed into a list of IP addresses to be used in consecutive network searches. Therefore, the proposed techniques are also more powerful than speech recognition techniques providing only a one-to-one mapping of spoken commands to machine commands.
  • the proposed speech control devices and systems are more user-friendly, as they may not require the user to know machine-specific or application-specific commands.
  • An appropriately configured device or system is able to identify functions or commands described by users not familiar with technical terms. For this reason, the speech input is also simplified for the user; the user may just describe in his or her own terms what he or she wants the computing device to do. This at the same time accelerates speech control, as a user allowed to talk in his or her own terms may produce fewer errors, which reduces wrong inputs.
  • control devices and systems may be developed in any programming language and make use of storage resources in the usual ways.
  • Control devices and systems intended for larger function sets may be based on existing database technologies.
  • the techniques are applicable for implementation on single computing devices such as mobile phones or personal computers as well as for implementation in a network-based client-server architecture.
  • the techniques proposed herein also provide an increased flexibility for speech control. This is due to the fact that any device providing a speech input and speech data transmission facility, such as a mobile phone, but also many notebooks or conventional hardwired telephones, may be used as speech input device, while the speech recognition and optional context mapping steps may be performed either near to the computing device to be controlled or at still another place, for example at a respective node (e.g., server) in a network.

Abstract

The invention relates to techniques of controlling a computing device via speech. A method realization of the proposed techniques comprises the steps of transforming speech input into a text string comprising one or more input words; performing a context-related mapping of the input words to one or more functions for controlling the computing device; and preparing an execution of the identified function. Another realization is related to a remote speech control of computing devices.

Description

Speech Control of Computing Devices
Technical Field
The invention relates to techniques for controlling computing devices via speech and is applicable to different computing devices such as mobile phones, notebooks and other mobile devices as well as personal computers, gaming consoles, computer- controlled machinery and other stationary devices.
Background
Controlling computing devices via speech provides for a human user or operator a fast and easy way of interacting with the device; for example, the time-consuming input of commands via keypad or keyboard can be omitted and the hands are free for other purposes such as moving a mouse or control lever or performing manual activities like carrying the device, carrying goods, etc. Therefore, speech control may conveniently be applied for such different operations as controlling mobile phones, gaming consoles or household appliances, but also for controlling machines in an industrial environment.
In principle, today's speech control systems require that the user inputs a command via speech which he or she would otherwise enter by typing or by clicking on an appropriate button. The input speech signal is then provided to a speech recognition component which recognizes the spoken command. The recognized command is output in a machine-readable form to the device which is to be controlled.
In some more detail, a typical speech control device may store some pre-determined speech samples representing, for example, a set of commands. A recorded input speech signal is then compared to the stored speech samples. As an example, a probability calculation block may determine, based on matching the input speech signal to the stored speech samples, a probability value for each of the stored samples, the value indicating the probability that the respective sample corresponds to the input speech signal. The sample with the largest probability value will then be selected. Each stored speech sample may have an executable program code associated therewith, which represents the respective command in a form that is executable by the computing device. The program code will then be provided to a processor of the computing device in order to perform the recognized command.
Speech recognition is notoriously prone to errors. In some cases, the speech recognition system is not able to recognize a command at all. Then the user has to decide whether to repeat the speech input or to manually input the command. Often, a speech recognition system does not recognize the correct command, such that the user has to cancel the wrongly recognized command before repeating the input attempt.
In order to achieve a high identification rate, the user must be familiar with all the commands and should speak in a particular way to facilitate speech recognition. Many speech recognition systems require a training phase. Elaborated algorithms for representing speech and matching speech samples with each other have been developed in order to allow a determination of the correct command with a confidence level sufficient for a practical deployment. Such developments have led to ever more complex systems requiring a considerable amount of processing resources. For a long time, the performance of speech recognition in personal computers and mobile phones has essentially been limited by the processing power available in these computing devices.
Summary
There is a need for a technique of controlling a computing device via speech which is easy to use for the user and enables a determination of the correct commands with high confidence while avoiding the use of excessive processing resources.
In order to meet with this need, as a first aspect a method of controlling a computing device via speech is proposed. The method comprises the following steps: Transforming speech input into a text string comprising one or more input words; comparing each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words; identifying, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and preparing an execution of the identified function.
The computing device may in principle be any hardware device which is adapted to perform at least one instruction. Thus, a 'computing device' as understood herein may be any programmable device, for example a personal computer, notebook, phone, or control device for machinery in an industrial area, but also other areas such as private housing; e.g. the computing device may be a coffee machine. A computing device may be a general purpose device, such as a personal computer, or may be an embedded system, e.g. using a microprocessor or microcontroller within an Application-Specific Integrated Circuit (ASIC) or Field-Programmable Gate Array (FPGA). The term 'computing device' is intended to include essentially any device which is controllable, e.g. via a hardware and/or software interface such as an Application Programming Interface (API), and via one or more machine-readable instructions in the form of, e.g., an executable code which may be generated by a compiler, assembler or similar tool in any programming language, macro language, interpreter language, etc. The executable code may be in binary form or any other machine-readable form. The computing facility may be represented, e.g., in hardware, firmware, software or a combination thereof. For example, the computing device may comprise a microprocessor for controlling other parts of the device such as, e.g., a display, an actuator, a signal generator, a remote device, etc. The function(s) for controlling the computing device may include some or all commands for the operating system of the computing device or for an application executed on the computing device, but may further include functions which are not directly accessible via a user interface but require an input on an expert level such as via a system console or command window. The functions may express functionality in a syntax specific for an operating system, an application, a programming language, a macro language, etc.
A context mapping word may represent the entire function or one or more aspects of the functionality of the function the context mapping word is associated with. The context mapping word may represent the aspect in textual form. A context mapping word may be directly associated with a function or may additionally or alternatively be indirectly associated with a function; for example, the context mapping word may be associated with a function parameter. Multiple context mapping words associated with a particular function may be provided in order to enable that the function may be identified from within different contexts. For instance, the context mapping words associated with a function may represent different names (alias names) of the function the context mapping words are associated with, or may represent technical and non-technical names, identifications or descriptions of the function or aspects of it. As a further example, the context mapping words may represent the function or one or more aspects of it in different pronunciations (e.g., male and female pronunciation), dialects, or human languages.
The associations of context mapping words and functions (and possibly function parameters) may be represented in the context mapping table in different ways. In one implementation, all controllable functions (or function parameters) may be arranged in one function column (row) of the table. For each function, the associated context mapping words may be arranged in a row (column) corresponding to the position of the function in the function column. In this implementation, one and the same context mapping word appears multiple times in the context mapping table in case it is associated with multiple functions. In another implementation, each context word may be represented only one time in the context mapping table, but the correspondingly associated function appears multiple times. In still other implementations, each context mapping word and each function is represented exactly one time in the context mapping table and the associations between them are represented via links, pointers or other structures known in the field of database technologies.
The identified function may be executed immediately after the identification (or after the entire input text string has been parsed). Alternatively or in addition, the identified function may also be executed at a later time. In one implementation of the method aspect, the function in the context mapping table has executable program code associated with it. The step of preparing the execution of the identified function may then comprise providing an executable program code representing the identified function on the computing device. In other implementations, the step of preparing the execution of the identified function comprises providing a text string representing a call of the identified function. The string may be provided immediately or at a later time to an interpreter, compiler etc. in order to generate executable code.
In one realization, the step of identifying the function comprises, in case an input word matches a context mapping word associated with multiple functions, identifying one function of the multiple functions which is associated with multiple matching context mapping words. This function may then be used as the identified function. The step of comparing each one of the one or more input words with context mapping words may comprise the step of buffering an input word in a context buffer in case the input word matches a context mapping word that is associated with two or more functions. In one implementation, the step of buffering the input word may further comprise buffering the input word in the context buffer including, for each of the two or more functions or function parameters associated with the input word, an indication of the function or function parameter. The step of identifying the function may then comprise comparing indications of functions or function parameters of two or more input words buffered in the context buffer and identifying corresponding indications.
One variant of the method aspect may comprise the further step of comparing an input word with function names in a function name mapping table, in which each of the function names represents one of the functions for controlling the computing device. The method in this variant may comprise the further step of identifying, in case the input word matches with at least a part of a function name, the function associated with the at least partly matching function name. The function name mapping table may further comprise function parameters for comparing the function parameters with input words.
Entries corresponding to the same function or function parameter in the context mapping table and the function name mapping table may be linked with each other. A linked entry in the function name mapping table may be associated with executable program code representing at least a part of a function.
According to one implementation, the method comprises the further steps of comparing input words with irrelevant words in an irrelevant words mapping table; and, in case an input word matches with an irrelevant word, excluding the input word from identifying the function. The irrelevant words mapping table may comprise, for example, textual representations of spoken words such as 'the', 'a', 'please', etc.
In one variant, the method comprises the preparatory steps of establishing a main words table comprising multiple word strings, each word string representing one or more potential input words in a string format and each word string being associated with a unique number; and establishing the context mapping table, wherein the context mapping words are represented by the unique numbers associated with the word string corresponding to the context mapping word in the main words table. In this variant, the step of comparing input words with context mapping words comprises the steps of representing each input word by its number as specified in the main words table; and comparing the number representations of input words and the number representations of context mapping words with each other in order to determine matches of input words to context mapping words.
In another variant, which may be combined with the above variant or any other mode of the method outlined here, the step of identifying the function comprises the steps of identifying for each matching context mapping word one or more functions associated therewith; determining for each function an attraction value indicating how often the function has been identified; and identifying the function with the highest attraction value.
According to one implementation, the method comprises the preparatory step of establishing one or more auxiliary words tables, each auxiliary words table comprising associations of a primary word with one or more secondary words. The step of comparing input words with context mapping words then comprises determining, based on the auxiliary words tables, if an input word matches with a secondary word, and, in case of a match, selectively replacing the input word with the primary word associated with the matching secondary word. The secondary words may comprise at least one of synonyms, antonyms, word type representations, definitions of the respectively associated primary words, and phonetic representations of the secondary words. This implementation may comprise the preparatory step of generating or updating at least one of the main words table and the auxiliary words tables based on an external dictionary.
In one realization of the method, the step of transforming the speech input into the text string is performed in a speech recognition device and the steps of comparing input words of the text string with context mapping words and identifying the function associated with a matching context mapping word are performed in a control device. The method may then comprise the further step of establishing a data transmission connection between the remote speech recognition device and the control device for transmitting data comprising the text string.
According to a second aspect, a method of controlling a computing device via speech is proposed, wherein the method is performed in a control device and in a speech input device remotely arranged from the control device. The method comprises the steps of transforming, in the speech input device, speech input into speech data representing the speech input; establishing a data transmission connection for transmitting the speech data between the remotely arranged speech input device and the control device; and converting, in the control device, the speech data into one or more control commands for controlling the computing device.
That the control device and the speech input device are remotely arranged from each other does not necessarily include that these devices are arranged spatially or geographically remote from each other. For example, both devices may be located in the same building or room, but are assumed to be remotely arranged in case the data transmission connection is a connection configured for transmitting data between separate devices. For example, the data transmission connection may run over a local area network (LAN), wide area network (WAN), and/or a mobile network. For example, in case a mobile phone is used as speech input device and the speech input is transmitted using VoIP over a mobile network towards a notebook having installed a speech recognition/control application, the mobile phone and the notebook are assumed to be remotely arranged to each other even if they are physically located nearby to each other.
According to a third aspect, a computer program product is proposed. The computer program product comprises program code portions for performing the steps of any one of the method aspects described herein when the computer program product is executed on one or more computing devices. The computer program product may be stored on a computer readable recording medium, such as a permanent or re-writeable memory within or associated with a computing device or a removable CD-ROM or DVD. Additionally or alternatively, the computer program product may be provided for download to a computing device, for example via a data network such as the Internet or a communication line such as a telephone line or wireless link.
According to a fourth aspect, a control device for controlling a computing device via speech is proposed. The control device comprises a speech recognition component adapted to transform speech input into a text string comprising one or more input words; a matching component adapted to compare each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words; an identification component adapted to identify, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and a preparation component adapted to prepare an execution of the identified function. The control device may be implemented on the computing device, which may be a mobile device such as a notebook, mobile phone, handheld, wearable computing devices such as head-up display devices, etc., or a stationary device such as a personal computer, household appliance, machinery, etc.
According to a fifth aspect, a control device for controlling a computing device via speech is proposed, which comprises a data interface adapted to establish a data transmission connection between a remote speech input device and the control device for receiving data comprising a text string representing speech input from the remote speech input device, wherein the text string comprises one or more input words; a matching component adapted to compare each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words; an identification component adapted to identify, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and a preparation component adapted to prepare an execution of the identified function.
According to a sixth aspect, a system for controlling a computing device via speech is proposed. The system comprises a control device and a speech input device. The speech input device is adapted to transform speech input into speech data representing the speech input. The control device is adapted to convert the speech data into one or more control commands for controlling the computing device. Each of the speech input device and the control device comprises a data interface adapted to establish a data transmission connection for transmitting the speech data between the remotely arranged speech input device and the control device.
A seventh aspect is related to a speech input device, wherein the speech input device is adapted for inputting and transforming speech input into speech data representing the speech input and the speech input device comprises a data transmission interface. According to the seventh aspect, use of the speech input device is proposed for establishing, via the data transmission interface, a data transmission connection for transmitting the speech data to a remote computing device, wherein the computing device transforms the speech data into control functions for controlling the computing device.
An eighth aspect is related to a computing device including a speech recognition component for transforming speech input into control functions for controlling the computing device and a data reception interface for establishing a data reception connection. According to the eighth aspect, use of the computing device is proposed for receiving, via the data reception interface, speech data from a remote speech input device and for transforming the received speech data into control functions for controlling the computing device.
Brief Description of the Drawings
In the following, the invention will further be described with reference to exemplary embodiments illustrated in the figures, in which:
Fig. 1 schematically illustrates an embodiment of a control device for controlling a computing device via speech;
Fig. 2 illustrates an embodiment of a context mapping component of the control device of Fig. 1;
Fig. 3 illustrates an embodiment of a context mapping table for use with the context mapping component of Fig. 2;
Fig. 4 illustrates an embodiment of a function name mapping table for use with the context mapping component of Fig. 2;
Fig. 5 illustrates an example of a text string representing a speech input;
Fig. 6 illustrates a content of a context buffer used by the context mapping component of Fig. 2 when parsing the text string of Fig. 5;
Figs. 7A-7C illustrate contents of an instruction space used by the context mapping component of Fig. 2;
Fig. 8 schematically illustrates an embodiment of a control system for controlling a computing device via speech;
Fig. 9 illustrates a first embodiment of a method of controlling a computing device via speech;
Fig. 10 illustrates an embodiment of a context mapping procedure which may be performed within the framework of the method of Fig. 9;
Fig. 11 illustrates a second embodiment of a method of controlling a computing device via speech;
Fig. 12 illustrates a further embodiment of a context mapping stage; and
Figs. 13a)-c) illustrate a further example of the processing of an exemplary input sentence by the context mapping stage of Fig. 12.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as specific implementations of control devices and computing devices, in order to provide a thorough understanding of the current invention. It will be apparent to one skilled in the art that the current invention may be practiced in other embodiments that depart from these specific details. For example, the skilled artisan will appreciate that the current invention may be practiced using wireless connections between different devices and/or components instead of the hardwired connections discussed below to illustrate the present invention. The invention may be practiced in very different environments. This may include, for example, network-based and/or client-server based scenarios, in which at least one of a speech recognition component, a context mapping component and, e.g., an instruction space for providing an identified function is accessible via a server in a Local Area Network (LAN) or Wide Area Network (WAN).
Those skilled in the art will further appreciate that functions explained herein below may be implemented using individual hardware circuitry, using software functioning in conjunction with a programmed microprocessor or a general purpose computer, using an application specific integrated circuit (ASIC) and/or using one or more digital signal processors (DSPs). It will also be appreciated that when the current invention is described as a method, it may also be embodied in a computer processor and a memory coupled to a processor, wherein the memory is encoded with one or more programs that perform the methods disclosed herein when executed by the processor.
Fig. 1 schematically illustrates an embodiment of a control device 100 for controlling a computing device 102 via speech. The computing device 102 may be a personal computer or similar device including an operating system (OS) 104 and an application (APP) 106. The computing device 102 may or may not be connected with other devices (not shown).
The control device 100 includes a built-in speech input device comprising a microphone 108 and an Analogue-to-Digital (A/D) converter 110 which digitizes an analogue electric signal from the microphone 108 representing a speech input by a human user. The A/D converter 110 provides the digital speech signal 112 to a speech recognition (SR) component 114. The SR component 114 operates to transform the speech signal 112 into a text string 116 which represents the speech input in a textual form. The text string 116 comprises a sequence of input words.
The text string 116 is provided to a context mapping component 118, which converts the text string 116 into one or more control functions 120 for controlling the computing device 102. The control functions 120 may comprise, e.g., one or more control commands with or without control parameters. The context mapping component 118 operates by accessing one or more databases; only one database is exemplarily illustrated in Fig. 1, which stores a context mapping table (CMT) 122. The operation of the context mapping component 118 will be described in detail further below.
The control function or functions 120 resulting from the operation of the context mapping component 118 are stored in an instruction space 124. During or after the process of transforming and converting a speech input into the functions 120, either the operating system 104 or the application 106, or both, of the computing device 102 may access the instruction space 124 in order to execute the instructions stored therein, i.e. the control functions which possibly include one or more function parameters. The functions 120 stored in the instruction space 124 may for example be represented in textual form as function calls, e.g., conforming to the syntax of at least one of the operating system 104 and the application(s) 106. For example, for the application 106 a specific software-API may be defined, to which the functions (instructions) 120 conform. As another example, the instruction space 124 may also store the control functions 120 in the form of a source code (one or more programs), which has to be transformed into an executable code by a compiler, assembler, etc. before execution. As still another example, the control functions may be represented in the form of one or more executable program codes, which do not require any compilation, interpretation or similar steps before execution.
The control device 100 and the computing device 102 may be implemented on a common hardware. For example, the control device 100 may be implemented in the form of software on a hardware of the computing device 102 running the operating system 104 and one or more applications 106. In other implementations, the control device 100 is implemented at least in part on a separate hardware. For example, software components of the control device 100 may be implemented on a removable storage device such as a USB stick. In another example, the control device is adapted to store the control functions 120 on a removable storage, for example a removable storage disk or stick. The removable storage may then be provided to the computing device 102 in order that the computing device 102 may load the stored control functions into the instruction space 124, which in this scenario belongs to the computing device 102. In still another example, the control device 100 may send the control functions 120 via a wireless or hardwired connection to the computing device 102.
Fig. 2 illustrates in more detail functional building blocks of the context mapping component 118 in Fig. 1. Like reference numerals are used for like components in Figs. 1 and 2. The context mapping component 118 comprises a matching component 202, an identification component 204 and a number of databases, namely the database storing the context mapping table 122 and further databases for storing a context buffer 206, an irrelevant words mapping table 208 and a function name mapping table 210. Both components 202 and 204 may provide control functions and/or parameters thereof to the instruction space 124 (cf. Fig. 1).
As shown in Fig. 1, the context mapping component 118 may receive a text string 116 from the speech recognition component 114. The text string may comprise one or more input words 212 (Fig. 2). The matching component 202 is, amongst others, adapted to compare each one of the one or more input words 212 with context mapping words stored in the context mapping table 122. The example context mapping table 122 is in more detail depicted in Fig. 3. The table 122 in Fig. 3 comprises in column 302 function identification numbers (IDs), wherein each function ID references one and exactly one function which may be performed to control the computing device 102 in Fig. 1. Consequently, each row of the table 122 corresponding to an entry of a function ID in column 302 is assigned to a particular function. Further columns 304 of table 122 are provided for context mapping words (CMW_0, CMW_1, ...). The number of context mapping words associated with a function may be from 1 to a maximum number, which may be given for any particular implementation. For example, the maximum number may be 255.
As an example, the function ID "1" in row 306 of table 122 may refer to a function "ScanFile", which may be performed on the computing device 102 in order to scan all files on the computer for the purpose of, e.g., finding a particular file. Between 1 and the maximum number of context mapping words may be associated with the function ScanFile. In the simple example table 122, only two context mapping words are associated with this function, namely as CMW_0 the word "scan" and as CMW_1 the word "file". Similarly, in row 308, the function ID "2" may refer to a function ScanDrive to scan the drives available to the computing device 102; as context mapping words CMW_0 and CMW_1, the words "scan" and "drive" are associated with this function. In row 310, the function ID "3" may refer to a function "ScanIPaddress", which may be provided in the computing device 102 to scan a network in order to determine if a particular computer is connected therewith. The context mapping words CMW_0, CMW_1 and CMW_2 associated with this function are the words "scan", "network" and "computer".
Besides defining associations of context mapping words with functions, a context mapping table may also define associations of context mapping words with function parameters. A corresponding example is depicted in Fig. 3 with row 312 of table 122. The human name "Bob" as context mapping word is associated with ID "15". The ID may be assigned, e.g., to the IP address of the computer of a human user named Bob. As a further example, in rows 314 various context mapping words are defined which a human user may use to express that a device such as a computer is turned or switched on or off. The parameter ID 134 may thus refer to a function parameter "ON" and the parameter ID 135 may refer to a function parameter "OFF".
The context mapping table 122 in Fig. 3 is structured such that a function (or its ID) is represented in the table only once. Then, a context mapping word relevant for multiple functions may occur several times in the table. For example, the context mapping word "scan" is associated with three functions in table 122, namely the functions referenced with IDs 1, 2, and 3 in lines 306, 308 and 310. Other embodiments of context mapping tables may be based on a different structure. For example, each context mapping word may be represented only once in the table. Then, the functions (or their IDs) would appear multiple times in the table. With such a structure, the CMW "scan" would appear only once, and would be arranged such that the associations with the function IDs 1, 2 and 3 are indicated. The function ID "1" would appear two times in the table, namely to indicate the associations of the CMWs "scan" and "file" with this function. Other mechanisms of representing associations of context mapping words with control functions may also be deployed.
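As a rough illustration of the first structure (one row per function, with the associated context mapping words alongside), the example table of Fig. 3 might be pictured as follows; the words listed for the parameter IDs 134 and 135 are abbreviated and the helper name is an invented assumption for this sketch:

    # Context mapping table in the style of Fig. 3:
    # function/parameter ID -> associated context mapping words.
    CONTEXT_MAPPING_TABLE = {
        1:   ["scan", "file"],                  # ScanFile
        2:   ["scan", "drive"],                 # ScanDrive
        3:   ["scan", "network", "computer"],   # ScanIPaddress
        15:  ["Bob"],                           # parameter: Bob's IP address
        134: ["on"],                            # parameter "ON" (abbreviated)
        135: ["off"],                           # parameter "OFF" (abbreviated)
    }

    def ids_for_word(word):
        """Inverse lookup: all IDs a context mapping word is associated
        with, e.g. ids_for_word("scan") -> [1, 2, 3]."""
        return [fid for fid, words in CONTEXT_MAPPING_TABLE.items()
                if word in words]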
Referring back to Fig. 2, the matching component 202 of the context mapping component 118 may also be adapted to employ the irrelevant words mapping table 208 when parsing the input words 212. This table 208 may comprise, in textual form, words which are assumed to be irrelevant for determining control functions. For example, articles such as "the" and words primarily required for grammatical or syntactical reasons in human language sentences such as "for", "if" etc. may be represented as irrelevant words in the irrelevant words mapping table 208. In case an input word matches with an irrelevant word, the matching component 202 may discard the input word from further processing, such that the word is excluded from identifying the function.
The matching component 202 may further be adapted to employ the function name mapping table 210 when parsing the input words 212. Fig. 4 illustrates an example embodiment 400 of the function name mapping table 210. The table 400 comprises a function ID column 402 similar to column 302 in context mapping table 122 in Fig. 3. A further column 404 comprises, for each of the function IDs in column 402, the associated function name in textual form. For example, the function ID "1" is associated with the function name "ScanFile", which may represent the file scanning functionality already described above.
The function name mapping table 400 thus represents the mapping of function IDs to functions as used (amongst others) in the context mapping table 122 in Fig. 3. The matching component 202 and the identification component 204 may thus access the function name mapping table 400 also for resolving function IDs into function names before putting a function call to the instruction space 124. The table 400 also allows resolving parameter IDs. For example, the ID "15" is assigned to the IP address 127.0.0.7, which in the example implementation discussed here may be the IP address of the computer of the human user Bob in a network the computing device 102 is connected with (compare with table 122 in Fig. 3, row 312). Further, the parameter IDs 134 and 135 are resolved to function parameters "ON" and "OFF", respectively (see lines 314 in Fig. 3).
The textual representation of a function in column 404 may be such that it can be used as at least a part of a call for this function. For example, the column 404 may include the textual representation "ScanFile" because the operating system 104 of computing device 102 in Fig. 1 is adapted to handle a function call such as "ScanFile([parameter 1]; [parameter 2])". Brackets "(", ")" and separators ";" may be added to the function call in later steps, as will be described below. A textual representation such as "Scan-File" or "Scan File" could not be used as a valid function call in this example, and such representations may therefore not be included in the function name mapping table.
Alternatively or in addition to representing functions in the form of function names (function calls), the function name mapping table may also provide access to an executable program code for executing a function. This is also illustrated in Fig. 4, wherein a function ID "273" is associated with a pointer "*ls", which may point to an executable code for listing the content of a directory. The executable program code may be provided to at least one of the control device 100 and the computing device 102, e.g., in the form of one or more program libraries.
Referring to Fig. 2 again, the matching component 202 processes each of the input words 212 in the text string 116. In case a present input word is found in the irrelevant words mapping table 208, the input word is discarded. In case a present input word matches with a context mapping word in context mapping table 122, the matching component 202 buffers the input word in the context buffer 206. In case the input word directly matches with a function call in the function name mapping table 210, the matching component 202 may immediately prepare an execution of the corresponding function by, e.g., providing the textual representation of the function call specified in column 404 of table 400 or an executable program code or a link thereto to the instruction space 124. It is to be noted that the matching component 202 may immediately place a function or a function parameter in the instruction space 124 in case an input word matches unambiguously with a function or a function parameter name given in the function name mapping table 210. As an example, consider the human user speaks an IP address such as that referenced with ID "15" in the example function name mapping table 400 in Fig. 4. Upon detecting that the human user has directly input this function parameter, the matching component 202 may instantly provide this parameter to the instruction space 124.
Further, an input word may also match unambiguously with a function or function parameter in the context mapping table 122. This may be the case if a present input word matches with a context mapping word which is associated with only one function or function parameter (other functions or function parameters the context mapping word is associated with may be ruled out for other reasons). In this case also, the matching component 202 may instantly provide the function or function parameter to the instruction space 124.
After the matching component 202 has finished parsing the available input words 212, it provides a trigger signal to the identification component 204. The identification component 204 works to resolve any ambiguity which may occur due to the fact that in the context mapping table a context mapping word may be associated with multiple control functions, i.e. one or more input words cannot be matched unambiguously to one or more functions or function parameters. For this purpose the identification component 204 accesses the context mapping words which have been buffered in the context buffer 206. The component 204 identifies a function by determining buffered context mapping words associated with the same function.
To further illustrate the operation of the context mapping component 118 of Fig. 2, in Fig. 5 a textual representation 502 of an example sentence is given which a user may speak. Line 504 in Fig. 5 indicates results of the processing of each of the input words of sentence 502 in the matching component 202 of Fig. 2. In this processing the words "please", "the", "for", "if", "it", "is" have been identified as irrelevant words (indicated as "irr." in line 504), e.g. because these words are represented as irrelevant words in the irrelevant words mapping table 208. These words will not be considered in the further processing. The input word "scan" of sentence 502 is represented as a context mapping word multiple times in the example context mapping table 122, in which "scan" is associated with the function IDs 1, 2 and 3 (reference numbers 306, 308, 310). The further input words "network" and "computer" of sentence 502 are also context mapping words associated with function IDs in table 122, namely with ID "3" (the words found by the matching component 202 to be included in the context mapping table 122 are marked "context" in line 504 in Fig. 5). The content of the context buffer 206 after the matching component 202 has parsed the entire input text string 502 is schematically illustrated in Fig. 6. All the context mapping words (or input words) "scan", "network", "computer" have been buffered in the context buffer 206 (column 602).
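One possible reading of this parsing step in code, reusing the table sketch above (the irrelevant word set and all names are illustrative assumptions; details such as case handling and function name matching are omitted):

    IRRELEVANT_WORDS = {"please", "the", "for", "if", "it", "is"}

    def parse(input_words, table, irrelevant):
        """Discard irrelevant words and buffer each remaining word that
        matches a context mapping word, together with the function IDs
        the word is associated with (the content of Fig. 6)."""
        context_buffer = {}
        for word in input_words:
            if word in irrelevant:
                continue                  # excluded from identification
            ids = [fid for fid, words in table.items() if word in words]
            if ids:
                context_buffer[word] = ids
        return context_buffer

    # parse(["please", "scan", "the", "network", "computer"],
    #       CONTEXT_MAPPING_TABLE, IRRELEVANT_WORDS)
    # -> {"scan": [1, 2, 3], "network": [3], "computer": [3]}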
It is to be noted that in the example discussed here all input words are buffered in the context buffer 206 in case they match with any context mapping word. In other embodiments, an input word is only buffered in the context buffer if it matches with a context mapping word associated with two or more functions. In such embodiments, from the input text string 502 only the word "scan" would be buffered in the context buffer. The ambiguity of which one of the functions hidden behind the function IDs 1, 2 or 3 is intended will then be resolved in a way which is different from the way described hereinafter.
When the matching component 202 buffers an input word in the context buffer 206, it also stores the function ID(s), the corresponding context mapping word is associated with, as indications of the function(s). This is depicted in column 604 in Fig. 6. For example, the context mapping word "scan" is associated with the functions referenced by function IDs 1, 2 and 3 in the context mapping table 122 (see Fig. 3). "network" and "computer" are each associated with function ID 3. The input word "Bob's" is associated with function ID (parameter ID) 15.
When parsing the input words 502, the matching component 202 finds the word "on" in the function name mapping table 210 (this is marked "name" in line 504 in Fig. 5). Function names or parameter names found in the function name mapping table may immediately be put into the instruction space 124. This instruction space will be discussed next.
Fig. 7A schematically illustrates the status of the instruction space 124 (Fig. 1) after the matching component 202 has completed parsing the text string 502. The instruction space 124 is prepared to receive, for one or more functions ("function_1", "function_2", etc. in column 702) and function parameters for these functions ("fparm_1.1", "fparm_1.2" for function_1, etc.), values which may occupy the storage places indicated as column 704 in Fig. 7A (empty storage places are illustrated as "void" places). The instruction space 124 may not explicitly contain indications such as "function_1" and "fparm_1.1"; these indications are used in the figures mainly for illustrative purposes. The instruction space may be structured in any way which allows representing the type of a stored data item. For example, an identified function call may be stored in a particular storage place in the instruction space reserved for this purpose, while function parameters may be stored in a separate storage place.
At the end of parsing, the matching component 202 has only unambiguously detected the function parameter "ON" from the function name mapping table 210 (see Fig. 4). All the other matching input words have matched with context mapping words in the context mapping table 122, which is why they have been placed in the context buffer 206. Note that in a different embodiment, which is based on storing only those context mapping words in the context buffer which are associated with multiple functions or function parameters, also the parameter "Bob's" would have been replaced with the IP address defined for this parameter (Fig. 4, function ID 15) and put into the instruction space, as this parameter can unambiguously be determined.
In order to resolve the ambiguity represented in the fact that the context mapping word "scan" is associated with multiple functions, the identification component 204 analyzes the function IDs stored in the context buffer 206 (Fig. 6). The analysis may, e.g., comprise comparing the function IDs stored for the different context mapping words (column 604) and/or determining function IDs common to several context mapping words. For the simple example illustrated in Fig. 6, the identification component 204 detects that the function ID "3" is common to the context mapping words "scan", "network" and "computer". The component 204 may conclude that the function referenced with ID "3" is the intended function, e.g. on the basis of the determination that the ID "3" occurs multiple times in column 604 in Fig. 6, and/or that the ID "3" is the only ID the context mapping words "network" and "computer" are associated with. The identification component 204 determines from the function name mapping table 210 the function referenced by ID "3", namely the function "ScanIPaddress". The component 204 puts the identified function call in the instruction space 124.
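The analysis performed by the identification component may be pictured by the following sketch, under the simplifying assumption that counting the function IDs common to the buffered words suffices:

    from collections import Counter

    def identify_function(context_buffer):
        """Count how often each function ID occurs over all buffered
        words and return the ID common to the most of them."""
        counts = Counter(fid for ids in context_buffer.values()
                         for fid in ids)
        best_id, _ = counts.most_common(1)[0]
        return best_id

    # identify_function({"scan": [1, 2, 3], "network": [3], "computer": [3]})
    # -> 3, i.e. the function ScanIPaddress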
Fig. 7B illustrates the status of the instruction space 124 after the identification component 204 has entirely parsed the context buffer 206 of Fig. 6. The function "ScanIPaddress" has been identified. The identification component 204 has further replaced the parameter "Bob's" by the IP address 127.0.0.7 and has put this parameter into the instruction space. Storage place provided for further functions or function parameters has not been used.
While in the simple example illustrated here only one function with two parameters is identified, in principle any number of functions and function parameters can be identified from an input text string. In practical embodiments, a context mapping table comprises a large number of functions (function IDs) and function parameters, many of them probably associated with a large number of context mapping words. For example, a context mapping table may comprise several hundred functions with several thousand function parameters and may allow up to 256 context mapping words per function/parameter. The function name mapping table, if present, then comprises a correspondingly large number of functions and function parameters.
While it is shown here that the functions are referenced with function IDs in the context mapping table, of course the functions and their parameters may also be directly referenced in the context mapping table. Instead of putting a function call in textual form in the instruction space, also a program code may be provided there, for example in textual form for later compilation or in executable form.
The identification component 204 or another component of the control device 100 or computing device 102 eventually prepares execution of the identified function. As illustrated in Fig. 7C, this may comprise putting the function call in textual form in the instruction space 124. It is to be noted that default parameters may be used in case not all parameters required for a particular function call can be identified from the input text string. The function call may instantly or at a later time be executed by the computing device 102. For example, the context mapping component 118 may provide a trigger signal (not shown in Fig. 1) to the operating system 104 of computing device 102. In response to the trigger, the operating system 104 may access the instruction space 124, extract the function call illustrated in Fig. 7C, and may then perform the function. While in Fig. 1 it has been illustrated that the control device 100 comprises a built-in speech input device with a microphone 108 and A/D converter 110, a speech input device may as well be remotely arranged from the control device. This is exemplarily illustrated in Fig. 8, in which a system 800 for controlling a computing device 802 via speech is depicted.
While Fig. 1 illustrates that the control device 100 comprises a built-in speech input device with a microphone 108 and A/D converter 110, a speech input device may as well be arranged remotely from the control device. This is exemplarily illustrated in Fig. 8, in which a system 800 for controlling a computing device 802 via speech is depicted.

The system 800 comprises a separate speech input device 804 which may be connected via a data transport network 806 with a control device 808. The speech input device 804 comprises a microphone 810 and an A/D converter 812, which outputs a digital speech signal 814 much as the A/D converter 110 in Fig. 1. The speech input device 804, which may be, e.g., a mobile phone, notebook or other mobile or stationary device, comprises a data interface 816 which is adapted to establish a data transmission connection 818 via the network 806 towards the control device 808 in order to transmit the speech data 814 from the speech input device 804 to the control device 808. The transport network 806 may for example be an IP, ISDN and/or ATM network. Therefore, the data transmission connection 818 may for example be a Voice-over-IP (VoIP), ISDN, or Voice-over-ATM (VoATM) connection, or any other hardwired or wireless connection. For example, the connection 818 may run entirely or in part(s) over a mobile network such as a GSM or UMTS network.
The control device 808 comprises an interface 820 which is adapted to extract the speech signal 814' from the data received via the transport connection 818. For instance, the interfaces 816 and 820 may each comprise an IP socket, an ISDN card, etc. The interface 820 forwards the speech data 814' to a speech recognition component 822, which may operate similarly to the speech recognition component 114 in Fig. 1. The further processing may comprise a context mapping as has been described hereinbefore. In the embodiment illustrated in Fig. 8, however, no context mapping is performed; the speech recognition component 822 operates to provide recognized words directly as control commands 824 to the operating system 826 and/or an application 828 of the computing device 802.
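The transport of the digitized speech signal can be sketched under the simplifying assumption of a plain TCP connection instead of the VoIP, ISDN or VoATM connections named above; host, port and function names are hypothetical:

    import socket

    def send_speech_data(samples: bytes, host: str, port: int) -> None:
        """Speech input device side: push the digital speech signal to the
        remote control device over the data transmission connection."""
        with socket.create_connection((host, port)) as conn:
            conn.sendall(samples)

    def receive_speech_data(port: int) -> bytes:
        """Control device side: accept one connection and collect the
        speech data for the speech recognition component."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind(("", port))
            srv.listen(1)
            conn, _addr = srv.accept()
            with conn:
                chunks = []
                while chunk := conn.recv(4096):
                    chunks.append(chunk)
                return b"".join(chunks)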
As a concrete example, the speech input device 804 of Fig. 8 may be a mobile phone, the data transmission connection 818 may comprise a VoIP connection, and the control device 808 may be installed as a software application on a notebook exemplarily representing the computing device 802. For example, Skype may be used for the VoIP connection, and the control device application may make use of a speech recognition feature such as that provided with Windows Vista (Skype and Windows Vista are trademarks of Skype Limited and Microsoft Corp., respectively).
In still other embodiments, a speech recognition component such as the component 114 or 822 of Fig. 1 and Fig. 8, respectively, may be arranged remotely from a context mapping component such as the component 118 in Fig. 1. In these embodiments, a text string comprising one or more input words is transmitted via a data transmission connection from the speech recognition component towards the context mapping component. The considerations discussed above with respect to the embodiment 800 in Fig. 8 may be applied accordingly, except that for the transmission of a text string no VoIP, VoATM or similar speech data transmission mechanism is required.
As a general remark, the speech recognition described as part of the techniques proposed herein may be based on any kind of speech recognition algorithm capable of converting a speech signal to a sequence of words and implemented in the form of hardware, firmware, software or a combination thereof. The term 'voice recognition' as known to the skilled person is - in its precise meaning - directed to identifying a person who is speaking, but is often used interchangeably when 'speech recognition' is meant. In any case, the term 'speech recognition' as used herein may or may not include 'voice recognition'.
Regarding the speech recognition algorithm, the respective speech recognition component, such as component 114 or 822 illustrated in Figs. 1 and 8, respectively, may be implemented together with other components on common hardware or on a separate or dedicated hardware unit which is connectable wirelessly or by wire to other components. For example, a mobile phone or smart phone adapted for speech recognition may be used, which can be connected via USB, Bluetooth, etc. with a computing device on which, e.g., a context mapping component such as component 118 of Fig. 1 is implemented.
Fig. 9 is a flow diagram illustrating steps of an embodiment of a method 900 of controlling a computing device via speech. The method 900 may be performed using, e.g., the control device 100 of Fig. 1.
The method starts in step 902 with accepting a speech input, which may be provided from a speech input device such as microphone 108 and A/D converter 110 in Fig. 1. In step 904, the speech input is transformed into a text string comprising one or more input words. This step may for example be performed in a speech recognition component such as the component 114 in Fig. 1. In step 906, each one of the one or more input words is compared with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words. An example of a context mapping table is illustrated in Fig. 3. In the example control device illustrated in Figs. 1 and 2, the step 906 is performed by the matching component 202.
In step 908, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word is identified. It is to be noted that in the example configuration of Figs. 1 and 2 the step 908 of identifying the intended function may be performed in the identification component 204, but also in the matching component 202. While the identification component 204 is adapted to resolve ambiguities by appropriately operating on the context buffer 206, the matching component 202 may identify a function in the function name mapping table 210.
In step 910, the execution of the identified function is prepared, for example by providing a call of the function or an executable program code in an instruction space such as the storage component 124 depicted in Figs. 1 and 2. In step 912, the method 900 stops and waits for further speech input.
Fig. 10 is a flow diagram illustrating an embodiment of a context mapping procedure 1000. The procedure 1000 is a possible realization of at least a part of the steps 906 and 908 of Fig. 9. Essentially, procedure 1000 parses all input words of a text string such as text string 116 in Fig. 1.
In step 1002, it is determined if an input word is present. If this is the case, the procedure goes on to step 1004 wherein it is tested if the present input word is an irrelevant word, which may be determined by comparing the present word with irrelevant words stored in an irrelevant words mapping table such as table 208 illustrated in Fig. 2. In case it is determined that the present input word is an irrelevant word, in step 1006 the present word is discarded and the procedure goes back to step 1002. In case the present input word is not an irrelevant word, for example because it does not match with any word in the irrelevant words mapping table, the procedure goes on to step 1008. In this step it is tested whether the present input word matches with a context mapping word in a context mapping table such as table 122 in Figs. 1 and 2. In case it is found that the present word matches with a context mapping word, it is buffered in step 1010 in a context buffer such as buffer 206 in Fig. 2. In a particular implementation of procedure 1000, a present input word may only be buffered in the context buffer in case the matching context mapping word is associated with at least two functions or function parameters (not shown in Fig. 10).
In case the present input word does not match with a context mapping word, the procedure goes on to step 1012 with testing if the present input word matches with a function name (or function parameter name), which may be determined by comparing the input word with the function names in a function name mapping table such as table 210 in Figs. 2 and 4. In case the present word matches with a function name or function parameter name, the procedure goes on to step 1014 by putting the function name or function parameter name into an instruction space such as space 124 in Figs. 1 and 2. In case the present input word is not a function name or function parameter name, further context mapping related conditions (not shown), similar to the conditions 1004, 1008 and 1012, may be evaluated and/or an error handling 1016 may be performed. For example, the error handling 1016 may comprise putting the present input word into an irrelevant words mapping table to enable an early classification of this input word as an irrelevant word in the future. The error handling 1016 may additionally or alternatively comprise outputting information to a human user and/or asking the user for an appropriate action. Further error handling steps may be performed throughout the procedure 1000; however, only the error handling 1016 is shown in Fig. 10 for illustrative purposes.
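The word-classification part of procedure 1000 (steps 1002-1016) can be summarized in a short Python sketch; the table representations and contents are assumptions chosen for brevity:

    def context_map(input_words, irrelevant_words, context_table, function_names):
        """Classify each input word as irrelevant, a context mapping word,
        a function name, or an error case (steps 1002-1016 of Fig. 10)."""
        context_buffer = {}     # word -> candidate function IDs (step 1010)
        instruction_space = []  # function names found directly (step 1014)
        for word in input_words:                    # step 1002
            if word in irrelevant_words:            # step 1004
                continue                            # step 1006: discard
            if word in context_table:               # step 1008
                context_buffer[word] = context_table[word]   # step 1010
            elif word in function_names:            # step 1012
                instruction_space.append(word)      # step 1014
            else:
                # step 1016: error handling, here remembering the word
                # for early classification as irrelevant in the future
                irrelevant_words.add(word)
        return context_buffer, instruction_space

    irrelevant = {"the", "please"}
    cmt = {"scan": [1, 2, 3], "network": [3]}   # hypothetical excerpt
    buf, space = context_map(["please", "scan", "the", "network"],
                             irrelevant, cmt, {"ScanIPaddress"})
    print(buf)  # {'scan': [1, 2, 3], 'network': [3]}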
In case the entire input text string has been parsed, the procedure goes on from step 1002 to step 1018 by testing whether the context buffer is non-empty. In case the buffer is non-empty, one or more functions and/or function parameters are identified based on the buffered words. For example, a comparison of the function IDs of the buffered context mapping words may be used in this respect, as has been described further above. After one or more functions/function parameters have been identified in the context buffer in step 1020, the identified function(s) and parameter(s) are put into the instruction space in step 1022 and the procedure stops by returning to step 910 of Fig. 9. It is noted that other embodiments of a context mapping procedure may depart from procedure 1000, for example, by evaluating the context mapping related conditions 1004, 1008, 1012 in a different order.

Fig. 11 is a flow diagram illustrating steps of a further embodiment of a method 1100 of controlling a computing device via speech. The method 1100 may be performed in a control device and in a speech input device, wherein the speech input device is remotely arranged from the control device. For example, the method 1100 may be performed using the devices 804 and 808 of Fig. 8.
The method is triggered in step 1102 in that a speech input is received and accepted at the speech input device. The method goes on in step 1104 by transforming, in the speech input device, the speech input into speech data representing the speech input. For example, the step 1104 may be performed by a microphone such as microphone 810 and an A/D converter such as converter 812 in Fig. 8. In step 1106, a data transmission connection is established for transmitting the speech data between the remotely arranged speech input device and the control device. For example, a data transmission connection such as connection 818 in Fig. 8 between interfaces 816 and 820 of the speech input device 804 and the control device 808 may be established. The speech data may then be transmitted from the speech input device via the remote connection to the control device.
In step 1108, the speech data is converted in the control device into one or more control commands for controlling the computing device. In one implementation, the conversion step 1108 comprises speech recognition and context mapping as described hereinbefore with regard to the functionality of the components 114 and 118 of Fig. 1. In other embodiments, only a speech recognition as implemented in the speech recognition component 114 in Fig. 1 is performed, without any context mapping. In this case, the user may only speak commands he or she would otherwise enter by typing or by clicking on an appropriate button.
In the embodiment described with reference to Figs. 1 and 2, the context mapping component 118 uses the context mapping table 122, the function name mapping table 210 and the irrelevant words mapping table 208. In other embodiments, other tables, more tables, or fewer tables may be employed. For example, in an embodiment depicted in Fig. 12, a context mapping component 118' is schematically illustrated, which has access to a main words table 1202, multiple auxiliary words tables 1204 and 1206, a CMT table 122', and a context target (function name) mapping table 210'. The context mapping component 118', the CMT table 122' and the function name mapping table 210' may operate similarly to the context mapping component 118, the CMT table 122 and the function name mapping table 210, respectively, as has been described hereinbefore with reference to Figs. 1 - 7, and a repetition of these functionalities is therefore omitted.
The main words table 1202 is a main primary dictionary which contains a plurality of word strings, each word string comprising a single word or several words or phrases consisting of several words, possibly in different languages. Each record in the main words table 1202 has the following structure:
Each record comprises: a word string representing the human-readable word(s); a unique number or numerical value associated with that word string; a language model indicating the language the word comes from; and a language type identifying the specific sub-language the word belongs to. The following is an example of two records in the main words table 1202:
[String Word Name] car, [Primary Word Unique Numerical Number] 25, [Language Model] 0 (English), [Language Type] 0 (English_us)
[String Word Name] pub, [Primary Word Unique Numerical Number] 98, [Language Model] 0 (English), [Language Type] 1 (English_uk)
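A record of this shape could be represented, for instance, as follows; the field names are paraphrases of the labels above rather than names taken from the embodiment:

    from dataclasses import dataclass

    @dataclass
    class MainWordsRecord:
        """One record of the main words table 1202 (field names assumed)."""
        word: str            # human-readable word string
        unique_number: int   # primary word unique numerical number
        language_model: int  # e.g. 0 = English
        language_type: int   # e.g. 0 = English_us, 1 = English_uk

    main_words = [
        MainWordsRecord("car", 25, 0, 0),
        MainWordsRecord("pub", 98, 0, 1),
    ]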
The main words table 1202 may be supplemented by a Language Model Table and a Language Type Table (not shown in Fig. 12). The Language Model Table is a table that identifies the general categorical languages considered by the system. An entry in this table consists of a Language Model Name and its unique identification number. An example of an entry is:
[Language Model Name] English, [Unique Identification Number] 0
The Language Type Table is the more specific identification of the language model. An entry consists of the Language Type Name and its unique numerical number. Two example entries read as follows:
[LanguageTypeName] English_us, [LanguageTypeUniqueNumber] 0
[LanguageTypeName] English_uk, [LanguageTypeUniqueNumber] 1
The auxiliary words table 1204 is a synonyms table comprising a list of word strings including both primary words and secondary words, i.e. for each secondary word, its primary or 'root' word is identified. The synonyms table 1204 has the following structure: Primary Word Unique Numerical Number and Secondary Word Unique Numerical Number
The following is an example record of the synonyms table 1204:
[Primary Word Unique Numerical Number] 25 (car), [Secondary Word Unique Numerical Number] 87 (automobile)
The auxiliary words table 1206 is an antonyms table which comprises a list of words in the form of primary and secondary words, i.e. each secondary word's direct opposite root word is identified. The antonyms table has the following structure: Primary Word Unique Numerical Number and Secondary Word Unique Numerical Number
The following is an example record of the antonyms table 1206:
[Primary Word Unique Numerical Number] 65 (hot), [Secondary Word Unique Numerical Number] 189 (cold)
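Since both auxiliary tables pair a primary word number with a secondary word number, their use can be sketched as follows; the dictionary representation is an assumption:

    # Secondary word number -> primary word number, using the example
    # records above: 87 (automobile) -> 25 (car), 189 (cold) -> 65 (hot).
    synonyms = {87: 25}
    antonyms = {189: 65}

    def to_primary(word_number: int, table: dict) -> int:
        """Replace a secondary word's unique number with that of its
        primary ('root') word, if the table contains it."""
        return table.get(word_number, word_number)

    assert to_primary(87, synonyms) == 25   # "automobile" -> "car"
    assert to_primary(189, antonyms) == 65  # "cold" -> opposite root "hot"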
Further auxiliary words tables may be provided (not illustrated in Fig. 12). For example, a Parts Of Speech Table may comprise a list of words (both primary and secondary words), which associates a word with the part of speech or phrase it belongs to in a typical given sentence. The Parts Of Speech Table may have the following structure:
Primary Word Unique Numerical Number and Part Of Speech Unique Numerical Number
The following is an example record of the Parts Of Speech Table:
[Primary Word Unique Numerical Number] 25 (car), [Part Of Speech Unique Numerical Number] 2 (noun)
The numbers are the unique numbers from the main words table, in which each word is associated with a unique number.
As a further auxiliary words table, a Definitions Table may be provided which may comprise a part or all of the words of the main words table together with at least one of their proper descriptions and definitions. The Definitions Table may have the following structure: [Primary Word Unique Numerical Number] and [Definition String]. The following is an example record of the Definitions Table:
[Primary Word Unique Numerical Number] 25 (car), [Definition String] a vehicle running on rails, as a streetcar or railroad car.
Another auxiliary words table may be a Phonetic Table, which may contain a sound buffer file for each word within the dictionary or main words table, so that a phonetic version of some or all of the words is available. The Phonetic Table may have the following structure: Primary Word Unique Numerical Number and Word Phonetic Sound Data Buffer
The following is an example record of the Phonetic Table:
[Primary Word Unique Numerical Number] 25 (car), [Word Phonetic Sound Data Buffer] (non-readable binary data)
Any of the main words table and the auxiliary words tables may be generated and/or updated based on an external dictionary, e.g. by accessing the external dictionary via the Internet.
While the Context Mapping Table 122 comprises associations of word strings with function IDs (see Fig. 3), the Context Mapping Table 122' may comprise, instead of the word strings, the unique numbers associated with the word strings (as defined in the main words table 1202). Thus, the Context Mapping Table 122' comprises all of the context that is in relation to all the actions that can be done, and also the identifiers to all data. The Context Mapping Table 122' may have the following structure: Primary Word Unique ID, Context Number, and SearchForPrimaryWord (a flag variable telling a word attraction mechanism described further below whether or not to search for the primary word equivalent to the current word)
Two examples of action context entries in the Context Mapping Table 122' are:
[Primary Word Unique ID] 42 (create), [Context Number] 0, [SearchForPrimaryWord] 1 (true)
[Primary Word Unique ID] 61 (folder), [Context Number] 0, [SearchForPrimaryWord] 1 (true)
Three examples of data context entries are:
[Primary Word Unique ID] 35 (vacation), [Context Number] 45
[Primary Word Unique ID] 37 (photos), [Context Number] 45
[Primary Word Unique ID] 61 (folder), [Context Number] 45
The function name mapping table 210, or Context Target Table 210', defines the actions or meanings for the context mapping table. For example, in the Context Target Table 210' a particular function ID is associated with a particular executable code (and/or data). The structure of the Context Target Table 210' is as follows: Context Number (or function ID) and Context Target Native Code (for example, for an action context)
An example of an action context entry in the Context Target Table 210' is:
[Context Number] 0, [Context Target Native Code] CreateFolder(void);
An example of a data context entry in the Context Target Table 210' is:
[Context Number] 45, [Context Target Native Code] OpenFolder("C:/VacationPhotosFolder");
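As an illustrative sketch, the Context Target Table 210' can be thought of as a dispatch table from context numbers to native code; here Python callables stand in for the Context Target Native Code, and the function bodies are assumptions:

    def create_folder():
        # stands in for the native code CreateFolder(void);
        print("creating folder")

    def open_vacation_photos():
        # stands in for OpenFolder("C:/VacationPhotosFolder");
        print("opening C:/VacationPhotosFolder")

    context_targets = {
        0: create_folder,          # action context entry
        45: open_vacation_photos,  # data context entry
    }

    context_targets[45]()  # execute the code stored for context number 45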
With reference to Figs. 13a - 13c, an example of the processing of a sequence of input word strings in the context mapping component 118' of Fig. 12 is described.
It is assumed that the user says the following sentence: "Computer I would like you to open up the folder that contains all of my vacation photos."
This natural language phrase is converted into a text string in a speech recognition component such as component 114 in Fig. 1. In a first step, the text string is separated into the individual words (or word strings) "Computer", "I", "would", "like", "you", "to", "open", "up", "the", "folder", "that", "contains", "all", "of", "my", "vacation", "photos".
In a next step, based on the associations defined in the main words table 1202, each word (string) is replaced with its unique number or numerical value in a replacement component 1208. As a result, a sequence of unique numbers is obtained as depicted in Fig. 13a. The internal representation of the strings by the unique numbers enables a faster processing and thus a faster execution of the actions required by the speaking user.

The next step is an optional step which is performed depending on whether the programmer has set the configuration value of [SearchForPrimaryWord] for some or all of the words of the input string. In this case any secondary word will be replaced, based on the auxiliary tables, with its associated primary word (this processing is not performed on the word strings, but on the associated unique numbers for each of the primary and secondary words as defined in the main words table). Referring to the example entry given above for the synonyms table 1204, an input word "automobile" may for example be replaced in the sequence of words by its primary word "car"; more precisely, the unique number 87 associated with "automobile" would be replaced with the number 25 associated with "car". In the example of Fig. 13a we assume, for the sake of illustration, that all words are primary words, i.e. no replacement takes place.
In the next step, based on the context mapping table 122', for each unique number one or more matching contexts are identified, i.e. for each matching context mapping word (more precisely, the unique number representing this word) one or more functions associated therewith are identified. For the example sentence discussed here the result is illustrated in Fig. 13b.
In the next step, the so-called word attraction mechanism is performed. For each context number (function ID), the number of matches is counted, i.e. an Attraction Value is determined which indicates how many times a given context number occurs among the contexts matched for all input word numbers. For the example of Fig. 13b, the result is illustrated in Fig. 13c. The list illustrated in Fig. 13c is analyzed to determine the function with the highest attraction, i.e. the highest attraction value. In this simple example, only one context number (function ID) has attracted more than one match, namely context number 45, which attracted four matches.
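The counting can be sketched as follows; the table excerpt is hypothetical (only "vacation" = 35, "photos" = 37 and "folder" = 61 are given above, while the unique number 50 for "open" is invented for illustration):

    from collections import Counter

    def word_attraction(word_numbers, context_table):
        """Count, per context number, how many input word numbers map to
        it, and return the context number with the highest attraction value."""
        attraction = Counter()
        for number in word_numbers:
            attraction.update(context_table.get(number, []))
        return attraction.most_common(1)[0][0]

    # word number -> associated context numbers (hypothetical excerpt):
    table = {35: [45], 37: [45], 61: [0, 45], 50: [45]}
    print(word_attraction([50, 61, 35, 37], table))  # -> 45 (four matches)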
In the next step, the function corresponding to the context number with the highest attraction is identified from the context target table 210' (function name mapping table). The table 210' may comprise the following entry:
[Context Number] 45, [Context Target Native Code] OpenFolder("C:/VacationPhotosFolder");
The word attraction mechanism further increases the reliability of the context mapping, as in most practical cases a particular function (context number) attracts most of the matches; thus a clear identification of the wanted function can be achieved.
In a final step, the action defined in the entry identified in the context target table is executed, i.e. the function OpenFolder("C:/VacationPhotosFolder") is called, which may be taken directly from the context target (code) table 210'.

Instead of only providing a one-to-one mapping of spoken commands to machine-readable commands, the context-mapping related techniques proposed herein allow the user to describe a command or function within various contexts, i.e. they introduce redundancy into the speech recognition/control process. The user is not required to speak exactly the same command he or she would otherwise type, but may describe the intended command or function in his or her own words, in different languages, or in any other context. The deployed speech control device or system needs to be appropriately configured, e.g. by providing the relevant context mapping words in the context mapping table. In this way the proposed techniques provide a more reliable speech control.
The context-related descriptions or circumscriptions of the user may of course also be related to more than only one function or command. For example, a spoken request "Please search for Search_item" may be transformed and converted into a function or functions searching for accordingly named files and occurrences of 'Search_item' in files present locally on the computing device, but may further be converted and transformed into a function searching a local network and/or the web for 'Search_item'. Further, the same function may also be performed multiple times, for example when transforming and converting the sentence "Please scan the network for my friend's computers, if they are on", in which "friend's" may be transformed into a list of IP addresses to be used in consecutive network searches. Therefore, the proposed techniques are also more powerful than speech recognition techniques providing only a one-to-one mapping of spoken commands to machine commands.
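A sketch of this one-to-many expansion follows; all function names and the IP list are illustrative assumptions rather than parts of the described embodiment:

    def expand_search(search_item: str) -> list:
        """One spoken request fans out into several prepared function calls."""
        return [
            f'SearchLocalFiles("{search_item}")',
            f'SearchLocalNetwork("{search_item}")',
            f'SearchWeb("{search_item}")',
        ]

    # The same function prepared once per resolved parameter, e.g. one
    # network scan per IP address resolved for "friend's":
    friend_ips = ["127.0.0.7", "127.0.0.9"]  # hypothetical resolution
    scans = [f'ScanIPaddress("{ip}")' for ip in friend_ips]
    print(expand_search("Search_item") + scans)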
The proposed speech control devices and systems are more user-friendly, as they may not require the user to know machine-specific or application-specific commands. An appropriately configured device or system is able to identify functions or commands described by users not familiar with technical terms. For this reason, the speech input is also simplified for the user; the user may just describe in his or her own terms what he or she wants the computing device to do. This at the same time accelerates speech control, as a user allowed to talk in his or her own terms may produce fewer errors, which reduces wrong inputs.
The techniques proposed herein do not use excessive resources. Smaller control devices and systems may be developed in any programming language and make use of storage resources in the usual ways. Control devices and systems intended for larger function sets may be based on existing database technologies. The techniques are applicable for implementation on single computing devices such as mobile phones or personal computers as well as for implementation in a network-based client-server architecture.
The techniques proposed herein also provide an increased flexibility for speech control. This is due to the fact that any device providing a speech input and speech data transmission facility, such as a mobile phone, but also many notebooks or conventional hardwired telephones, may be used as speech input device, while the speech recognition and optional context mapping steps may be performed either near to the computing device to be controlled or at still another place, for example at a respective node (e.g., server) in a network.
While the current invention has been described in relation to its preferred embodiments, it is to be understood that this disclosure is for illustrative purposes only. Accordingly, it is intended that the invention be limited only by the scope of the claims appended hereto.

Claims
1. A method of controlling a computing device via speech, comprising the following steps:
- transforming speech input into a text string comprising one or more input words;
- comparing each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words;
- identifying, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and
- preparing an execution of the identified function.
2. The method according to claim 1, wherein a context mapping word represents in textual form an aspect of the functionality of the function the context mapping word is associated with.
3. The method according to claim 1 or 2, wherein multiple context mapping words associated with a function represent alias names of the function the context mapping words are associated with.
4. The method according to any of the preceding claims, wherein context mapping words represent a function or one or more aspects of it in different human languages.
5. The method according to any of the preceding claims, wherein a context mapping word is associated with a function parameter.
6. The method according to any of the preceding claims, wherein the step of preparing the execution of the identified function comprises at least one of providing a text string representing a call of the identified function and providing an executable program code representing the identified function on the computing device.
7. The method according to any of the preceding claims, wherein the step of identifying the function comprises, in case an input word matches a context mapping word associated with multiple functions, identifying one function of the multiple functions which is associated with multiple matching context mapping words.
8. The method according to claim 7, wherein the step of comparing each one of the one or more input words with context mapping words comprises the step of buffering an input word in a context buffer in case the input word matches a context mapping word that is associated with two or more functions.
9. The method according to claim 8, wherein the step of buffering the input word comprises buffering the input word in the context buffer including, for each of the two or more functions or function parameters associated with the input word, an indication of the function or function parameter.
10. The method according to claim 9, wherein the step of identifying the function comprises comparing indications of functions or function parameters of two or more input words buffered in the context buffer and identifying corresponding indications.
11. The method according to any of the preceding claims, comprising the further step of comparing an input word with function names in a function name mapping table, in which each of the function names represents one of the functions for controlling the computing device.
12. The method according to claim 11, comprising the further step of identifying, in case the input word matches with at least a part of a function name, the function associated with the at least partly matching function name.
13. The method according to claim 11, wherein the function name mapping table further comprises function parameters for comparing the function parameters with input words.
14. The method according to any one of claims 11 to 13, wherein entries corresponding to the same function or function parameter in the context mapping table and the function name mapping table are linked with each other.
15. The method according to claim 14, wherein a linked entry in the function name mapping table is associated with executable program code representing at least a part of a function.
16. The method according to any of the preceding claims, comprising the further steps of
- comparing input words with irrelevant words in an irrelevant words mapping table; and
- in case an input word matches with an irrelevant word, excluding the input word from identifying the function.
17. The method according to any one of the preceding claims, further comprising the preparatory steps of
- establishing a main words table comprising multiple word strings, each word string representing one or more potential input words in a string format and each word string being associated with a unique number;
- establishing the context mapping table, wherein the context mapping words are represented by the unique numbers associated with the word string corresponding to the context mapping word in the main words table;
and wherein the step of comparing input words with context mapping words comprises the steps of
- representing each input word by its number as specified in the main words table; and
- comparing the number representations of input words and the number representations of context mapping words with each other in order to determine matches of input words to context mapping words.
18. The method according to any one of the preceding claims, wherein the step of identifying the function comprises the steps of
- identifying for each matching context mapping word one or more functions associated therewith;
- determining for each function an attraction value indicating how often the function has been identified; and
- identifying the function with the highest attraction value.
19. The method according to any one of the preceding claims, comprising the preparatory step of
- establishing one or more auxiliary words tables, each auxiliary words table comprising associations of a primary word with one or more secondary words;
and wherein the step of comparing input words with context mapping words comprises
- determining, based on the auxiliary words tables, if an input word matches with a secondary word, and
- in case of a match, selectively replacing the input word with the primary word associated with the matching secondary word.
20. The method according to claim 19, wherein the secondary words comprise at least one of synonyms, antonyms, word type representations, definitions of the respectively associated primary words, and phonetic representations of the secondary words.
21. The method according to claim 19, comprising the preparatory step of generating or updating at least one of the main words table and the auxiliary words tables based on an external dictionary.
22. A method of controlling a computing device via speech, wherein the method is performed in a control device and in a speech input device remotely arranged from the control device, the method comprising the steps of
- transforming, in the speech input device, speech input into speech data representing the speech input;
- establishing a data transmission connection for transmitting the speech data between the remotely arranged speech input device and the control device; and
- converting, in the control device, the speech data into one or more control commands for controlling the computing device.
23. A computer program product comprising program code portions for performing the steps of any one of the preceding claims when the computer program product is executed on one or more computing devices.
24. The computer program product of claim 23, stored on a computer readable recording medium.
25. A control device for controlling a computing device via speech, comprising:
- a speech recognition component adapted to transform speech input into a text string comprising one or more input words;
- a matching component adapted to compare each one of the one or more input words with context mapping words in a context mapping table, in which at least one context mapping word is associated with at least one function for controlling the computing device and at least one of the at least one function is associated with multiple context mapping words;
- an identification component adapted to identify, in case at least one of the one or more input words matches with one of the context mapping words, the function associated with the matching context mapping word; and
- a preparation component adapted to prepare an execution of the identified function.
26. The control device according to claim 25, the control device being implemented on the mobile or stationary computing device.
27. A system for controlling a computing device via speech, wherein the system comprises a control device and a speech input device; and
- the speech input device is adapted to transform speech input into speech data representing the speech input;
- the control device is adapted to convert the speech data into one or more control commands for controlling the computing device; and
- each of the speech input device and the control device comprises a data interface adapted to establish a data transmission connection for transmitting the speech data between the remotely arranged speech input device and the control device.
PCT/EP2008/005691 2007-07-11 2008-07-11 Speech control of computing devices WO2009007131A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP07013604 2007-07-11
EP07013604.9 2007-07-11
US11/843,982 2007-08-23
US11/843,982 US20090018830A1 (en) 2007-07-11 2007-08-23 Speech control of computing devices

Publications (1)

Publication Number Publication Date
WO2009007131A1 true WO2009007131A1 (en) 2009-01-15

Family

ID=39712317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/005691 WO2009007131A1 (en) 2007-07-11 2008-07-11 Speech control of computing devices

Country Status (1)

Country Link
WO (1) WO2009007131A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5652897A (en) * 1993-05-24 1997-07-29 Unisys Corporation Robust language processor for segmenting and parsing-language containing multiple instructions
WO2000004533A1 (en) * 1998-07-14 2000-01-27 Intel Corporation Automatic speech recognition
WO2000026901A2 (en) * 1998-11-05 2000-05-11 Dragon Systems, Inc. Performing spoken recorded actions
US20030187653A1 (en) * 2001-03-27 2003-10-02 Atsushi Okubo Action teaching apparatus and action teaching method for robot system, and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7584544B2 (en) 2006-11-20 2009-09-08 Technological Resources Pty, Ltd. Gravity gradiometer
US11311570B2 (en) 2014-08-11 2022-04-26 Perora Gmbh Method of inducing satiety
US11504330B2 (en) 2014-08-11 2022-11-22 Perora Gmbh Formulation comprising particles containing a water-swellable or water-soluble polymeric component and a lipid component
US11234935B2 (en) 2015-07-07 2022-02-01 Perora Gmbh Method of inducing satiety
ES2600145A1 (en) * 2015-08-06 2017-02-07 Proyectos Y Soluciones Tecnológicas Avanzadas, S.L.P. Instant messaging system (Machine-translation by Google Translate, not legally binding)
WO2017021579A1 (en) * 2015-08-06 2017-02-09 Proyectos Y Soluciones Tecnologicas Avanzadas, S.L.P. Instant messaging system

Similar Documents

Publication Publication Date Title
US20090018830A1 (en) Speech control of computing devices
KR102117574B1 (en) Dialog system with self-learning natural language understanding
CN107924483B (en) Generation and application of generic hypothesis ranking model
KR102417045B1 (en) Method and system for robust tagging of named entities
AU2015210460B2 (en) Speech recognition repair using contextual information
US5425128A (en) Automatic management system for speech recognition processes
EP3477635B1 (en) System and method for natural language processing
US20150279366A1 (en) Voice driven operating system for interfacing with electronic devices: system, method, and architecture
US8494862B2 (en) Method for triggering at least one first and second background application via a universal language dialog system
EP3241214A1 (en) Generation of language understanding systems and methods
US8509396B2 (en) Automatic creation of complex conversational natural language call routing system for call centers
GB2437436A (en) Voice recognition device and method, and program
EP2887229A2 (en) Communication support apparatus, communication support method and computer program product
JP5703491B2 (en) Language model / speech recognition dictionary creation device and information processing device using language model / speech recognition dictionary created thereby
CN110968245B (en) Operation method for controlling office software through voice
WO2009007131A1 (en) Speech control of computing devices
WO2016008128A1 (en) Speech recognition using foreign word grammar
JP2006053906A (en) Efficient multi-modal method for providing input to computing device
US11227116B2 (en) Translation device, translation method, and program
US20070016420A1 (en) Dictionary lookup for mobile devices using spelling recognition
CN101995963B (en) Vocabulary self-adaption Chinese input method
JP4042360B2 (en) Automatic interpretation system, method and program
JP5208795B2 (en) Interpreting device, method, and program
JP5139499B2 (en) Extraction apparatus, extraction method, program, and information processing apparatus for distributing program
JP7237356B2 (en) CAD control support system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08784730

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15-04-2010)

122 Ep: pct application non-entry in european phase

Ref document number: 08784730

Country of ref document: EP

Kind code of ref document: A1