EP2849054A1 - Apparatus and method for selecting a control object through voice recognition

Apparatus and method for selecting a control object through voice recognition

Info

Publication number
EP2849054A1
EP2849054A1 (application EP20140160944 / EP14160944A)
Authority
EP
European Patent Office
Prior art keywords
identification information
control object
information
selecting
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP20140160944
Other languages
German (de)
English (en)
Inventor
Jongwon Shin
Semi Kim
Kanglae Jung
Jeongin Doh
Jehseon Youn
Kyeongsun Kim
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Diotek Co Ltd
Original Assignee
Diotek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Diotek Co Ltd filed Critical Diotek Co Ltd
Publication of EP2849054A1
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command

Definitions

  • the present invention relates to an apparatus and a method for selecting a control object through voice recognition, and more particularly, to an apparatus and a method for selecting a control object through voice recognition by using first identification information based on display information about a control object.
  • a typical user interface depends on a physical input through an input device such as a keyboard, a mouse, or a touch screen.
  • As a user interface capable of improving accessibility to the electronic device, there is a voice recognition technique that controls the electronic device by analyzing the voice of a user.
  • In order to control an electronic device through voice recognition, a control command to be matched to the voice of the user needs to be stored in the electronic device in advance.
  • When the control command is stored in a basic setting of the electronic device, a basic control of the electronic device, for example, the volume control or the brightness control of the electronic device, can be performed through voice recognition.
  • In contrast, in order to control an individual application through voice recognition, a control command to be matched to the voice of the user needs to be stored in each individual application.
  • the present invention may provide an apparatus and a method capable of controlling an electronic device through voice recognition even when a user uses an application that does not store a control command in advance.
  • the present invention may also provide an apparatus and a method capable of selecting multi-lingual control objects through voice recognition without distinction of a language used by a user.
  • the apparatus for selecting a control object through voice recognition may include one or more processing devices, in which the one or more processing devices may be configured to obtain input information on the basis of a voice of a user, to match the input information to at least one first identification information obtained based on a control object and second identification information corresponding to the first identification information, to obtain matched identification information matched to the input information within the first identification information and the second identification information, and to select a control object corresponding to the matched identification information.
  • the second identification information may include synonym identification information which may be a synonym of the first identification information.
  • the second identification information may include at least one of translation identification information in which the first identification information may be translated in a reference language and phonetic identification information in which the first identification information may be phonetically represented as the reference language.
  • the second identification information may include pronunciation string identification information which may be a pronunciation string of the first identification information.
  • the one or more processing devices may display the second identification information.
  • the first identification information may be obtained based on display information about the control object.
  • the first identification information may be obtained based on application screen information.
  • the first identification information may be obtained through optical character recognition (OCR).
  • the first identification information may correspond to a symbol obtained based on the control object.
  • the input information may include voice pattern information obtained by analyzing a feature of the voice of the user, and the matching of the input information to the identification information may include matching of the identification information to the voice pattern information.
  • the input information may include text information recognized from the voice of the user through voice recognition, and the matching of the input information to the identification information may include matching of the identification information to the text information.
  • the method for selecting a control object through voice recognition may include obtaining input information on the basis of a voice of a user; matching the input information to at least one first identification information obtained based on a control object and second identification information corresponding to the first identification information; obtaining matched identification information matched to the input information within the first identification information and the second identification information; and selecting a control object corresponding to the matched identification information.
  • the second identification information may include synonym identification information which may be a synonym of the first identification information.
  • the second identification information may include at least one of translation identification information in which the first identification information may be translated in a reference language and phonetic identification information in which the first identification information may be phonetically represented as the reference language.
  • the second identification information may include pronunciation string identification information which may be a pronunciation string of the first identification information.
  • the method may further include displaying the second identification information.
  • the command sets may cause the computing apparatus to obtain input information on the basis of a voice of a user, to match the input information to at least one first identification information obtained based on a control object and second identification information corresponding to the first identification information, to obtain matched identification information matched to the input information within the first identification information and the second identification information, and to select a control object corresponding to the matched identification information.
  • According to the control object selecting apparatus, the electronic device can be controlled through voice recognition even when control commands are not previously stored in an application, so that accessibility of the user to the electronic device can be improved.
  • multi-lingual control objects can be selected through voice recognition without distinction of a language used by a user, so that it is possible to improve convenience of the user.
  • Although the terms first, second, and the like are used in order to describe various components, the components are not limited by these terms. The terms are used only to discriminate one component from another component. Therefore, a first component mentioned below may be a second component within the technical spirit of the present invention.
  • Respective features of various exemplary embodiments of the present invention can be partially or totally joined or combined with each other, and, as will be sufficiently appreciated by those skilled in the art, various interworkings and drivings can be technologically achieved; the respective exemplary embodiments may be executed independently of each other or together in an associated relationship.
  • When any one element in the present specification 'transmits' data or a signal to another element, it means that the element may transmit the data or signal to the other element directly, or may transmit the data or signal to the other element through at least one further element.
  • Voice recognition basically means that an electronic device analyzes a voice of a user and recognizes the analyzed content as text. Specifically, when a waveform of the voice of the user is input to the electronic device, voice pattern information can be obtained by analyzing a voice waveform by referring to an acoustic model. Further, text having the highest matching probability in first identification information and second identification information can be recognized by comparing the obtained voice pattern information with the first identification information and the second identification information.
  • a control object in the present specification means an interface such as a button that is displayed on a screen of an apparatus for selecting a control object to receive an input of the user, and when the input of the user is applied to the displayed control object, the control object may perform a control operation that is previously determined by the apparatus for selecting a control object.
  • the control object may include an interface, such as a button, a check box and a text input field, that can be selected by the user through a click or a tap, but is not limited thereto.
  • the control object may be all interfaces that can be selected through an input device such as a mouse or a touch screen.
  • Input information in the present specification means information obtained through a part of the voice recognition or the whole voice recognition on the basis of the voice of the user.
  • the input information may be voice pattern information obtained by analyzing a feature of a voice waveform of the user.
  • The voice pattern information may include voice feature coefficients extracted from the voice of the user for each short time frame so as to express acoustic features.
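  • As an illustration of the point above, the following is a minimal sketch of obtaining voice pattern information as short-time feature coefficients. The patent does not name a specific feature or library; MFCCs and the librosa library are assumptions made here for illustration only.

```python
# Sketch: voice pattern information as short-time feature coefficients.
# MFCCs and librosa are illustrative assumptions, not the patent's method.
import librosa

def voice_pattern_info(wav_path: str, n_coeff: int = 13):
    """Return a (frames x n_coeff) matrix of short-time voice feature coefficients."""
    waveform, sample_rate = librosa.load(wav_path, sr=16000)  # mono, 16 kHz
    # One feature vector per short-time analysis window, expressing acoustic features.
    mfcc = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=n_coeff)
    return mfcc.T  # rows = short-time frames, columns = coefficients
```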
  • the first identification information in the present specification means text that is automatically obtained based on the control object through the apparatus for selecting a control object, and the second identification information means text obtained so as to correspond to the first identification information.
  • the second identification information may include 'synonym identification information' which is a synonym of the first identification information, 'translation identification information' in which the first identification information is translated in a reference language, 'phonetic identification information' in which the first identification information is phonetically represented as the reference language, and 'pronunciation string identification information' which is a pronunciation string of the first identification information.
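  • The relationship between a piece of first identification information and the four kinds of second identification information can be pictured with a small data structure, as in the sketch below. The values shown are hypothetical stand-ins for entries that would come from the synonym, dictionary, phonogram, and pronunciation string DBs described later in this specification.

```python
# Sketch: one piece of first identification information with the four kinds
# of second identification information named in the text. All values are
# hypothetical examples, not data from the patent's DBs.
from dataclasses import dataclass, field

@dataclass
class SecondIdentificationInfo:
    synonyms: list = field(default_factory=list)        # synonym identification information
    translations: list = field(default_factory=list)    # translation identification information
    phonetic: list = field(default_factory=list)        # phonetic identification information
    pronunciations: list = field(default_factory=list)  # pronunciation string identification information

second_id = {
    "route": SecondIdentificationInfo(
        synonyms=["railroad", "path"],        # mirrors the 'route' example below
        translations=["noseon"],              # hypothetical reference-language form
        phonetic=["rowt", "root"],            # hypothetical phonetic renderings
        pronunciations=["/ruːt/", "/raʊt/"],  # IPA pronunciation strings
    )
}
```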
  • the first identification information may be obtained based on display information about the control object, application screen information, text information about the control object, or description information about the control object, and the relevant descriptions will be presented below with reference to FIG. 3 .
  • the display information about the control object in the present specification means information used to display a certain control object.
  • For example, information about an image or icon of the control object, and the size or position of the control object, may be the display information.
  • the control object may be displayed on the screen of the apparatus for selecting a control object on the basis of values of items constituting the display information or paths to reach the values.
  • the application screen information in the present specification means information used to display a certain screen in the application run in the apparatus for selecting a control object.
  • The text information about the control object in the present specification means a character string indicating the control object, and the character string may be displayed together with the control object.
  • the description information about the control object in the present specification means information written by a developer to describe the control object.
  • The first identification information may correspond to a symbol obtained based on the control object, and the symbol and the first identification information may be in one-to-one, one-to-many, many-to-one, or many-to-many correspondence.
  • the first identification information corresponding to the symbol will be described below with reference to FIGS. 9 and 10 .
  • the symbol in the present specification means a figure, a sign, or an image that can be interpreted as a certain meaning without including text.
  • the symbol of the control object may generally imply a function performed by the control object in the application.
  • For example, the symbol '▶' may generally mean that a sound or an image is played, and the symbol '+' or '-' may mean that an item is added or removed.
  • the symbol may be obtained based on the display information about the control object or the application screen information.
  • FIG. 1 illustrates a block diagram of an apparatus for selecting a control object according to an exemplary embodiment of the present invention.
  • An apparatus for selecting a control object (hereinafter, also referred to as a "control object selecting apparatus") 100 according to the exemplary embodiment of the present invention includes a processor 120, a memory controller 122, and a memory 124, and may further include an interface 110, a microphone 140, a speaker 142, and a display 130.
  • the control object selecting apparatus 100 is a computing apparatus capable of selecting a control object through voice recognition, and includes one or more processing devices.
  • The control object selecting apparatus may be a device such as a computer having an audio input function, a notebook PC, a smart phone, a tablet PC, a navigation device, a PDA (Personal Digital Assistant), a PMP (Portable Media Player), an MP3 player, or an electronic dictionary, or may be a server capable of being connected to such devices or a distributed computing system including a plurality of computers.
  • The one or more processing devices may include at least one processor 120 and the memory 124, and a plurality of processors 120 may share the memory 124.
  • the processing devices are configured to obtain input information on the basis of a voice of a user, to match the input information to at least one first identification information obtained based on a control object and second identification information corresponding to the first identification information, to obtain matched identification information matched to the input information within the first identification information and the second identification information, and to select a control object corresponding to the matched identification information.
  • When the 'matched identification information' having the highest matching probability within the first identification information is recognized, a control object corresponding to the 'matched identification information' is selected. Accordingly, even though a control command matched to the voice of the user is not stored in advance, the control object can be selected by the control object selecting apparatus.
  • However, when the control object selecting apparatus 100 uses only the first identification information in order to select the control object, a control object intended by the user may not be selected due to the influence of various factors such as the linguistic habits of the user or the language environment to which the user belongs.
  • For this reason, the control object selecting apparatus 100 uses the second identification information corresponding to the first identification information, as well as the first identification information, so as to take account of such factors.
  • identification information having the highest matching probability within the first identification information and the second identification information can be recognized, and a control object corresponding to the recognized identification information can be selected.
  • a time of obtaining the second identification information or whether to store the second identification information may be implemented in various manners. For example, when the first identification information is obtained based on the control object, the control object selecting apparatus 100 may immediately obtain the second identification information corresponding to the obtained first identification information, store the obtained second identification information, and then use the stored second identification information together with the first identification information.
  • Alternatively, the control object selecting apparatus 100 may obtain the second identification information corresponding to the first identification information only when it is needed. That is, the control object selecting apparatus 100 may obtain the second identification information corresponding to the first identification information as necessary and use the obtained second identification information.
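  • A sketch of the first strategy, in which the second identification information is derived once per piece of first identification information and then reused, might look as follows. Here, derive_second_ids is a hypothetical lookup against the DBs mentioned above, and the cache stands in for storage in the memory.

```python
# Sketch: obtain-and-store strategy for second identification information.
# derive_second_ids is a hypothetical query against the synonym/dictionary/
# phonogram/pronunciation-string DBs; the cache plays the role of the memory.
from functools import lru_cache

@lru_cache(maxsize=None)
def second_ids_for(first_id: str) -> tuple:
    # Computed once per piece of first identification information,
    # then served from the cache on every later matching pass.
    return tuple(derive_second_ids(first_id))  # hypothetical DB lookup
```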
  • the memory 124 stores a program or a command set, and the memory 124 may include a RAM (Random Access Memory), a ROM (Read-Only Memory), a magnetic disk device, an optical disk device, and a flash memory.
  • the memory 124 may store a language model DB that provides the voice pattern information and the text corresponding to the voice pattern information, or may store a DB that provides the second identification information corresponding to the first identification information.
  • The DBs may also be disposed outside the apparatus and connected to the control object selecting apparatus via a network.
  • the memory controller 122 controls the access of units such as the processor 120 and the interface 110 to the memory 124.
  • the processor 120 performs operations for executing the program or the command set stored in the memory 124.
  • the interface 110 connects an input device such as the microphone 140 or the speaker 142 of the control object selecting apparatus 100 to the processor 120 and the memory 124.
  • the microphone 140 receives a voice signal, converts the received voice signal into an electric signal, and provides the converted electric signal to the interface 110.
  • the speaker 142 converts the electric signal provided from the interface 110 into a voice signal and outputs the converted voice signal.
  • the display 130 displays visual graphic information to a user, and the display 130 may include a touch screen display that detects a touch input.
  • The control object selecting apparatus 100 selects a control object through voice recognition by using a program (hereinafter, referred to as a "control object selecting engine") that is stored in the memory 124 and is executed by the processor 120.
  • The control object selecting engine is executed on a platform or in the background of the control object selecting apparatus 100 to obtain information about the control object from an application, and causes the control object selecting apparatus 100 to select the control object through voice recognition by using the first identification information obtained based on the information about the control object and the second identification information corresponding to the first identification information.
  • FIG. 2 is a flowchart of a method for selecting a control object according to an exemplary embodiment of the present invention. For the sake of convenience in description, the description will be made with reference to FIG. 3 .
  • FIG. 3 illustrates first identification information obtained in the control object selecting apparatus according to the exemplary embodiment of the present invention and second identification information corresponding to the first identification information.
  • the control object selecting apparatus obtains input information on the basis of the voice of the user (S100).
  • the input information is voice pattern information obtained by analyzing a feature of the voice of the user, but is not limited thereto.
  • the input information may be all information that can be obtained through a part of the voice recognition or the whole voice recognition on the basis of the voice of the user.
  • the control object selecting apparatus matches the input information to at least one first identification information obtained based on the control object and second identification information corresponding to the first identification information (S110).
  • When a subway application 150 is running on the control object selecting apparatus 100, a 'route button' 152, a 'schedule button' 154, a 'route search button' 156, and an 'update button' 158 correspond to control objects.
  • the first identification information may be obtained based on the display information about the control object.
  • The display information 252, 254, 256 and 258 of the information 200 about the control objects may include a 'width' item, a 'height' item, a 'left' item and a 'top' item, which are the items 252A, 254A, 256A and 258A for determining the sizes and positions of the control objects, and the values of 'img' items 252B, 254B, 256B and 258B that provide links to the images of the control objects.
  • The aforementioned items 252A, 254A, 256A, 258A, 252B, 254B, 256B and 258B are arbitrarily defined for the sake of convenience in description, and the kinds, number and names of the items of the display information 252, 254, 256 and 258 about the control objects may be variously modified.
  • The values of the 'img' items 252B, 254B, 256B and 258B that provide the links to the images of the control objects 152, 154, 156 and 158 may be character strings representing the image file paths ('x.jpg,' 'y.jpg,' 'z.jpg,' and 'u.jpg') of the control objects 152, 154, 156 and 158, or the images themselves.
  • Widths and heights of the images of the control objects 152, 154, 156 and 158 are determined by the values of the 'width' item and the 'height' item among the items 252A, 254A, 256A and 258A for determining the sizes and positions of the control objects, and display positions of the control objects 152, 154, 156 and 158 are determined by the values of the 'left' item and the 'top' item. In this way, areas where the control objects 152, 154, 156 and 158 are displayed can be determined.
  • the 'route button' 152 may be displayed as an image by the 'x.jpg' of the 'img' item 252B.
  • the 'x.jpg' is merely an example, and the control object may be displayed as an image by various types of files.
  • The image 'x.jpg' includes text capable of being identified as 'route,' and when optical character recognition (OCR) is performed on the image 'x.jpg,' the text 'route' included in the image is recognized.
  • The recognized text 'route' corresponds to first identification information. That is, the first identification information obtained based on the 'route button' 152 corresponds to 'route.'
  • First identification information obtained based on the 'schedule button' 154 corresponds to 'schedule.'
  • First identification information obtained based on the 'route search button' 156 corresponds to 'route search.'
  • First identification information obtained based on the 'update button' 158 corresponds to 'update.'
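  • A hedged sketch of this OCR step follows, using pytesseract as one possible OCR engine; the patent only says that OCR is performed on the image, so the choice of engine is an assumption.

```python
# Sketch: obtain first identification information from a control object's
# image via OCR. pytesseract is an illustrative choice of OCR engine.
from PIL import Image
import pytesseract

def first_id_from_image(image_path: str) -> str:
    """Recognize the text embedded in a control object's image."""
    return pytesseract.image_to_string(Image.open(image_path)).strip()

# e.g. first_id_from_image("x.jpg") would yield "route" for the 'route button' 152
```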
  • The second identification information is text obtained so as to correspond to the first identification information, and may be synonym identification information which is a synonym of the first identification information, as illustrated in FIG. 3. That is, the second identification information corresponding to the first identification information 'route' may be synonym identification information which is a synonym of the first identification information, such as 'railroad' or 'path.' Further, the second identification information corresponding to the first identification information 'update' in English may be synonym identification information which is a synonym of the first identification information, such as 'renew' or 'revise.' Meanwhile, when the first identification information includes a plurality of words, the second identification information may be obtained for each word.
  • the synonym identification information may be provided to the control object selecting apparatus through a synonym DB that stores synonyms of words.
  • the synonym DB may be included in the control object selecting apparatus, or may provide synonym identification information to the control object selecting apparatus by being connected to the control object selecting apparatus via a network.
  • The synonym identification information may include synonyms within a language different from that of the first identification information, in addition to synonyms within the same language as the first identification information; synonyms within a different language may mean that the synonym identification information is, in effect, translated into a reference language.
  • the second identification information may be the synonym identification information as described above, or the second identification information may be translation identification information in which the first identification information is translated in the reference language, phonetic identification information in which the first identification information is phonetically represented as the reference language, and pronunciation string identification information which is a pronunciation string of the first identification information.
  • Various types of second identification information will be described below with reference to FIGS. 4 and 5 .
  • The obtained voice pattern information is compared with the first identification information and the second identification information through the matching of the first identification information and the second identification information to the input information, that is, the matching of the identification information to the voice pattern information; the identification information having the same pattern as, or the most similar pattern to, the voice pattern information within the first identification information and the second identification information is determined.
  • the voice pattern information may be matched to the first identification information and the second identification information.
  • the first identification information and the second identification information may be matched to the voice pattern information through static matching, cosine similarity comparison, or elastic matching.
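  • As a toy illustration of the cosine similarity option, the input voice pattern and a reference pattern for each piece of identification information can be reduced to fixed-length vectors and compared; the candidate scoring highest above a threshold becomes the matched identification information. The reduction to mean vectors and the threshold value are assumptions for illustration, not the patent's prescribed procedure.

```python
# Sketch: cosine-similarity matching of voice pattern information to
# identification information. Mean-pooling of frames and the 0.8 threshold
# are illustrative assumptions.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_identification(voice_pattern, references, threshold=0.8):
    """Return the matched identification information, or None if none matches (step S120)."""
    input_vec = voice_pattern.mean(axis=0)              # collapse short-time frames
    best_id, best_sim = None, threshold
    for identification, ref_pattern in references.items():
        sim = cosine(input_vec, ref_pattern.mean(axis=0))
        if sim > best_sim:
            best_id, best_sim = identification, sim
    return best_id   # e.g. 'route search', or its second identification information
```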
  • the control object selecting apparatus determines whether or not matched identification information matched to the input information exists as a matching result of the first identification information and the second identification information to the input information (S120).
  • The identification information having the same pattern as, or the most similar pattern to, the obtained voice pattern information within the first identification information and the second identification information is determined as the matched identification information.
  • When no matched identification information exists, the control object selecting apparatus may wait until input information is obtained again, or may request the user to speak again.
  • the control object selecting apparatus obtains the matched identification information (S130).
  • For example, within the first identification information 'route,' 'schedule,' 'route search,' and 'update' and the second identification information corresponding to the first identification information, the second identification information 'path finding' corresponding to the first identification information 'route search' may correspond to the matched identification information.
  • control object selecting apparatus selects a control object corresponding to the matched identification information (S150).
  • In this case, the control object selecting apparatus 100 selects the 'route search button' 156.
  • the selecting of the control object may be performed through an input event or a selection event.
  • the event means an occurrence or an action that can be detected from the program, and examples of the event may include an input event for processing an input, an output event for processing an output, and a selection event for selecting a certain object.
  • the input event may be generated when an input such as a click, a touch or a key stroke is applied through an input device such as a mouse, a touchpad, a touch screen or a keyboard, or may be generated by processing an input as being virtually applied even though an actual input is not applied through the aforementioned input device.
  • the selection event may be generated to select a certain control object, and the certain control object may be selected when the aforementioned input event, for example, a double click event or a tap event, occurs for the certain control object.
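  • The selection step can thus be pictured as synthesizing an input event at the centre of the control object's display area, computed from the 'left,' 'top,' 'width' and 'height' items of the display information described above. In the sketch below, dispatch_tap is a hypothetical platform call, not an actual API.

```python
# Sketch: select the matched control object by firing a virtual tap at the
# centre of its display area. dispatch_tap is hypothetical.
def select_control_object(display_info: dict) -> None:
    """Generate an input event on the control object as if the user tapped it."""
    center_x = display_info["left"] + display_info["width"] // 2
    center_y = display_info["top"] + display_info["height"] // 2
    dispatch_tap(center_x, center_y)  # hypothetical: injects a tap/click input event

# e.g. select_control_object({"left": 40, "top": 300, "width": 160, "height": 48})
```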
  • According to the control object selecting apparatus, the electronic device can be controlled through voice recognition even when control commands are not previously stored in an application, so that accessibility of the user to the electronic device can be improved.
  • the first identification information may be obtained in various manners.
  • the first identification information may be obtained based on text information about the control object.
  • The information 200 about the control objects may include text information 242, 244, 246 and 248 about the control objects.
  • When text is included in an image of the control object, the text is recognized through optical character recognition, so that the first identification information can be obtained.
  • In contrast, when text information about the control object exists, the first identification information can be obtained immediately from the text information as text.
  • a part of the text information about the control object may be obtained as the first identification information.
  • When the text information includes a plurality of words, each word may be obtained as individual first identification information corresponding to the control object.
  • the first identification information may be obtained based on description information about the control object.
  • The description information is information in which a developer writes a description of the control object.
  • In general, the description information includes a larger quantity of text than the text information. When the entire description is obtained as the first identification information, the matching accuracy or the matching speed of the identification information to the input information may be decreased.
  • Accordingly, when the description information about the control object includes a plurality of words, only a part of the description information may be obtained as the first identification information. Furthermore, each part of the description information may be obtained as individual first identification information corresponding to the control object.
  • the first identification information may be obtained based on application screen information.
  • The control object selecting apparatus may determine the control object displayed in a second area corresponding to the first area within the application screen where the text is displayed, and may allow the text in the first area to correspond to the determined control object.
  • the second area corresponding to the first area where the text is displayed may be an area including at least a part of a block where the text is displayed, an area closest to the block where the text is displayed, or an area such as an upper end or a lower end of the block where the text is displayed.
  • the second area corresponding to the first area is not limited to the aforementioned areas, and may be determined in various manners.
  • In order to determine the area where the control object is displayed, the display information about the control object may be referred to.
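  • One simple way to realize this correspondence is to pick, for each recognized text block, the control object whose display area lies nearest to it, as in the sketch below. The centre-to-centre distance used here is just one of the possible notions of "corresponding" area listed above.

```python
# Sketch: tie recognized on-screen text (first area) to the control object
# whose display area (second area) is nearest. Rectangles use the 'left',
# 'top', 'width' and 'height' items of the display information.
def center(rect: dict):
    return (rect["left"] + rect["width"] / 2, rect["top"] + rect["height"] / 2)

def nearest_control_object(text_rect: dict, control_rects: dict) -> str:
    """Return the id of the control object whose area is closest to the text block."""
    tx, ty = center(text_rect)

    def sq_dist(obj_id: str) -> float:
        cx, cy = center(control_rects[obj_id])
        return (cx - tx) ** 2 + (cy - ty) ** 2

    return min(control_rects, key=sq_dist)
```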
  • As described above, the first identification information may be obtained in various manners. Only one piece of first identification information need not exist for each control object, and a plurality of first identification information may correspond to one control object.
  • the first identification information may be obtained by the control object selecting engine, but is not limited thereto.
  • the first identification information may be obtained by an application being run.
  • FIG. 4 illustrates the first identification information obtained in the control object selecting apparatus according to the exemplary embodiment of the present invention and second identification information corresponding to the first identification information.
  • the second identification information may be translation identification information in which the first identification information is translated in a reference language.
  • the reference language is set to English, for example.
  • The second identification information corresponding to the first identification information may be translation identification information in which the first identification information is translated in English, such as 'route' or 'line.'
  • the reference language may be set based on locale information such as positional information of the control object selecting apparatus, a language set by the user or regional information.
  • the reference language may be relatively determined depending on the first identification information. For example, when the first identification information is in Korean, the first identification information is translated in English, and when the first identification information is in English, the first identification information is translated in Korean.
  • The second identification information corresponding to the first identification information may be translation identification information in which the first identification information 'update' is translated in Korean, such as ' (update).'
  • the translation identification information may be provided to the control object selecting apparatus through a dictionary DB that stores translated words of words.
  • the dictionary DB may include a word bank and a phrase bank, but may include only the word bank in order to provide translation identification information of the first identification information, that is, translated words of words.
  • the dictionary DB may be included in the control object selecting apparatus, or may provide the translation identification information to the control object selecting apparatus by being connected to the control object selecting apparatus via a network.
  • the second identification information may be phonetic identification information in which the first identification information is phonetically represented as the reference language.
  • the reference language is set to Korean, for example.
  • The second identification information corresponding to the first identification information 'update' may be phonetic identification information in which the first identification information is phonetically represented in Korean, such as ' (update)' or ' (update).'
  • the reference language may be set based on locale information such as positional information of the control object selecting apparatus, a language set by the user or regional information.
  • the reference language may be relatively determined depending on the first identification information. For example, when the first identification information is in Korean, the first identification information is phonetically represented in English, and when the first identification information is in English, the first identification information is phonetically represented in Korean.
  • the second identification information corresponding to the first identification information may be phonetic identification information in which the first identification information is phonetically represented in English, such as 'noseon,' 'noson,' or 'nosun.'
  • the phonetic identification information may be provided through a phonogram DB that stores phonetically represented words, or may be provided to the control object selecting apparatus by processing the first identification information through a phonetic algorithm.
  • the phonogram DB may be included in the control object selecting apparatus, or may provide the phonetic identification information to the control object selecting apparatus by being connected to the control object selecting apparatus via a network.
  • The phonetic algorithm may be used independently, or may be used in an auxiliary manner when the phonetic identification information does not exist in the phonogram DB.
  • For example, the phonetic algorithm may be an algorithm in which alphabet letters are pronounced as they are.
  • In this case, the phonetic identification information in which the first identification information 'ABC' is phonetically represented in Korean corresponds to ' (ABC).'
  • the phonetic algorithm may be an algorithm in which a character corresponding to a pronunciation string is obtained from pronunciation string identification information to be described in FIG. 5 .
  • FIG. 5 illustrates the first identification information obtained in the control object selecting apparatus according to the exemplary embodiment of the present invention and second identification information corresponding to the first identification information.
  • the second identification information may be pronunciation string identification information which is a pronunciation string of the first identification information.
  • the pronunciation string identification information may be obtained by referring to a phonetic sign of the first identification information, and the phonetic sign may correspond to an international phonetic alphabet (IPA).
  • The second identification information may be pronunciation string identification information of the first identification information according to the international phonetic alphabet; since the pronunciation string identification information is in accordance with the international phonetic alphabet, second identification information represented only as a pronunciation string of the first identification information may be obtained.
  • the control object can be selected through the voice recognition regardless of a language corresponding to the voice of the user.
  • Further, characters corresponding to the pronunciation string in the reference language may be obtained from the pronunciation string identification information, and the obtained characters correspond to the phonetic identification information in FIG. 5.
  • the pronunciation string identification information may be provided to the control object selecting apparatus through a pronunciation string DB that stores pronunciation strings of words.
  • the pronunciation string DB may be included in the control object selecting apparatus or may provide the pronunciation string identification information to the control object selecting apparatus by being connected to the control object selecting apparatus via a network.
  • Meanwhile, the second identification information may be arbitrarily designated by the user.
  • the second identification information may be identification information in which the synonym identification information of the first identification information is translated in the reference language or identification information in which the first identification information is translated in a first language and is then translated in the reference language.
  • the second identification information obtained by processing the first identification information through one or more processes will be described below with reference to FIGS. 6 and 7 .
  • FIG. 6 illustrates first identification information obtained in the control object selecting apparatus according to the exemplary embodiment of the present invention and second identification information corresponding to the first identification information.
  • For example, first identification information such as ' (the origin of Republic of Korea)' can be obtained based on the control object 161.
  • The synonym identification information, which consists of synonyms of the first identification information, corresponds to ' (history of Joseon Dynasty),' ' (origin of Republic of Korea),' and ' (history of Republic of Korea),' as illustrated in FIG. 6.
  • The second identification information may correspond to ' (origin of Joseon Dynasty),' in which the first identification information is translated in Korean, as well as ' (history of Joseon Dynasty),' ' (origin of Republic of Korea),' and ' (history of Republic of Korea),' in which the synonym identification information of the first identification information is translated in Korean.
  • FIG. 7 illustrates first identification information obtained in the apparatus for selecting a control object according to the exemplary embodiment of the present invention and second identification information corresponding to the first identification information.
  • the second identification information may include translation identification information in which the first identification information is translated in a first reference language or translation identification information in which the translation identification information is translated in a second reference language again.
  • Translation identification information such as 'origin of Joseon Dynasty (Republic of Korea),' 'genesis of Joseon Dynasty (Republic of Korea),' and 'history of Joseon Dynasty (Republic of Korea),' in which the first identification information is translated in the first reference language, for example, English, can be obtained.
  • Further, translation identification information such as 'origin of Joseon Dynasty (Korea, Republic of Korea),' 'genesis of Joseon Dynasty (Korea, Republic of Korea),' and 'history of Joseon Dynasty (Korea, Republic of Korea),' in which the translation identification information is translated again in the second reference language, for example, Korean, can be obtained.
  • FIG. 8 illustrates a screen on which the second identification information obtained in FIG. 4 is displayed.
  • control object selecting apparatus 100 may display the second identification information corresponding to the control objects 152, 154, 156 and 158.
  • The second identification information ('route,' 'schedule,' 'route search,' and 'update') may be displayed adjacent to the corresponding control objects 152, 154, 156 and 158, or may be displayed in the areas where the text corresponding to the first identification information ('route,' 'schedule,' 'route search,' and 'update' in FIG. 4) or the symbols are positioned.
  • the second identification information may be displayed together with the text recognized as the first identification information.
  • the user can know words that can be recognized by the control object selecting apparatus 100 by checking the second identification information displayed on the control object selecting apparatus 100.
  • Meanwhile, the control object selecting apparatus may output the matched identification information, or the first identification information and the second identification information about the control object, as voice.
  • By outputting the first identification information and the second identification information as voice, a guideline on the words that can be recognized by the control object selecting apparatus can be provided to the user; by outputting the matched identification information as voice, the user can conveniently select the control object without seeing the screen of the control object selecting apparatus.
  • FIG. 9 illustrates first identification information corresponding to a symbol according to an exemplary embodiment of the present invention and second identification information corresponding to the first identification information.
  • the first identification information may correspond to the symbol obtained based on the control object.
  • As illustrated in FIG. 9, the control objects correspond to a 'backward button' 172, a 'forward button' 174, and a 'play button' 176.
  • The control object selecting apparatus 100 may obtain the symbols ('◀◀,' '▶▶,' and '▶') on the basis of the control objects 172, 174 and 176, and obtain the corresponding first identification information ('backward,' 'forward,' and 'play').
  • The symbol can be obtained based on the display information about the control object, in the same way that the first identification information is obtained based on the display information about the control object.
  • For example, the 'backward button' 172 may be displayed as an image by 'bwd.jpg' of an 'img' item 272B. Further, when image pattern matching or optical character recognition (OCR) is performed on 'bwd.jpg,' the symbol '◀◀' can be obtained. Similarly, when image pattern matching or OCR is performed on 'play.jpg' and 'fwd.jpg,' the symbols '▶' and '▶▶' can be obtained.
  • Image pattern matching is a manner in which features are extracted from a target image such as 'bwd.jpg,' 'play.jpg,' or 'fwd.jpg,' and an image having the same pattern or a similar pattern is then found in a comparison group that is previously set, or that is generated through a heuristic manner or a posterior description by the user.
  • The image pattern matching may be performed using template matching, a neural network, or a hidden Markov model (HMM), but is not limited thereto.
  • the image pattern matching may be performed by various methods.
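  • For instance, template matching, one of the options named above, can be sketched with OpenCV as follows; the template images of known symbols form the previously set comparison group, and cv2.matchTemplate is just one concrete realization of the idea.

```python
# Sketch: obtain a symbol from a control object's image by template matching.
# The template paths and the 0.9 acceptance threshold are illustrative.
import cv2

def detect_symbol(button_image_path: str, templates: dict, threshold: float = 0.9):
    """Return the symbol whose template best matches the button image, or None."""
    button = cv2.imread(button_image_path, cv2.IMREAD_GRAYSCALE)
    best_symbol, best_score = None, threshold
    for symbol, template_path in templates.items():
        template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
        result = cv2.matchTemplate(button, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, _ = cv2.minMaxLoc(result)  # score = best normalized match
        if score > best_score:
            best_symbol, best_score = symbol, score
    return best_symbol

# e.g. detect_symbol("bwd.jpg", {"◀◀": "tpl_back.png", "▶": "tpl_play.png", "▶▶": "tpl_fwd.png"})
```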
  • the symbol may be obtained by the control object selecting engine and stored in the memory, but is not limited thereto.
  • The symbol may also be obtained by an application being run and stored in the memory.
  • the symbol obtained based on the control object corresponds to the first identification information.
  • the first identification information corresponding to the symbol will be explained below with reference to FIG. 10 .
  • FIG. 10 illustrates examples of a symbol and first identification information corresponding to the symbol.
  • The symbols '◀◀,' '▶▶' and '▶' 372, 374 and 376 can be obtained as the symbols of the 'backward button' 172 (see FIG. 9), the 'forward button' 174 (see FIG. 9) and the 'play button' 176 (see FIG. 9).
  • the obtained symbols correspond to the first identification information.
  • From the symbol '◀◀' 372, the first identification information 'backward' 472 can be obtained;
  • from the symbol '▶▶' 374, the first identification information 'forward' 474 can be obtained;
  • and from the symbol '▶' 376, the first identification information 'play' 476 can be obtained.
  • The second identification information corresponding to the obtained first identification information 472, 474 and 476, for example, the translation identification information of the first identification information, can be obtained.
  • For example, translation identification information such as 'backward,' 'play' and 'forward,' into which the first identification information ' (backward),' ' (play)' and ' (forward)' are translated in English, can be obtained.
  • the second identification information may be the synonym identification information, phonetic identification information and pronunciation string identification information of the first identification information in addition to the translation identification information, as illustrated in FIGS. 3 to 7 .
  • the symbol 300 illustrated in FIG. 10 or the identification information 400 corresponding to the symbol are merely examples, and the kinds and number of the symbols and the identification information corresponding to the symbol may be variously implemented.
  • One symbol need not correspond to only one piece of identification information; since the meanings of symbols may be different depending on applications, one symbol may correspond to a plurality of identification information having different meanings from each other.
  • the plurality of identification information may be prioritized, and the matched identification information may be determined depending on a priority.
  • one symbol may correspond to the first identification information having different meanings depending on applications.
  • For example, the symbol '▶' 376 may correspond to the first identification information 'play' in a media player application, whereas the symbol '▶' 376 may correspond to the first identification information 'forward' in a web browser or an electronic book application.
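  • A sketch of such an application-dependent, prioritized mapping from a symbol to first identification information follows; the application names and the 'next page' entry are hypothetical illustrations.

```python
# Sketch: one symbol resolving to different, prioritized first identification
# information depending on the running application. Contents are illustrative.
SYMBOL_MEANINGS = {
    "▶": {
        "media_player": [(1, "play")],
        "web_browser":  [(1, "forward")],
        "ebook_reader": [(1, "forward"), (2, "next page")],  # (priority, meaning)
    }
}

def first_ids_for_symbol(symbol: str, app: str):
    """Return the identification information for `symbol` in priority order."""
    entries = SYMBOL_MEANINGS.get(symbol, {}).get(app, [])
    return [identification for _, identification in sorted(entries)]
```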
  • the symbol may be obtained based on the application screen information.
  • When the control object is displayed on the application screen and optical character recognition is performed on the application screen, information that can be recognized as text or a character sign within the application screen can be obtained.
  • the first identification information corresponding to the text may be determined by the same method as the method of determining the control object corresponding to the symbol.
  • Meanwhile, the input information may be the text itself recognized from the voice of the user by further comparing the voice pattern information obtained from the voice of the user with a language model DB.
  • the language model DB may be included in the control object selecting apparatus, or may be connected to the control object selecting apparatus via a network.
  • In this case, the matching of the input information to the first identification information may be performed by comparing the recognized text with the first identification information itself.
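  • In that text-based case, the matching reduces to a string comparison between the recognized text and each piece of first and second identification information, for example after simple normalization; a minimal sketch follows, in which the lowercasing and stripping are assumptions.

```python
# Sketch: match recognized text against the identification information by
# normalized string comparison. The normalization choice is an assumption.
def match_text(recognized: str, identifications):
    norm = recognized.strip().lower()
    for identification in identifications:
        if norm == identification.strip().lower():
            return identification  # the matched identification information
    return None
```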
  • the apparatus for selecting a control object through voice recognition may include one or more processing devices, in which the one or more processing devices may be configured to obtain input information on the basis of a voice of a user, to match the input information to at least one first identification information obtained based on a control object and second identification information corresponding to the first identification information, to obtain matched identification information matched to the input information within the first identification information and the second identification information, and to select a control object corresponding to the matched identification information.
  • Combinations of each block of the accompanying block diagram and each step of the flow chart can be implemented by algorithms or computer program instructions comprised of firmware, software, or hardware. Since these algorithms or computer program instructions can be installed in a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed through a processor of a computer or other programmable data processing equipment generate means for implementing the functions described in each block of the block diagram or each step of the flow chart.
  • The algorithms or computer program instructions can be stored in a computer-usable or computer-readable memory capable of directing a computer or other programmable data processing equipment to implement functions in a specific scheme.
  • The instructions stored in the computer-usable or computer-readable memory can therefore produce an article involving an instruction means that executes the functions described in each block of the block diagram or each step of the flow chart.
  • Since the computer program instructions can also be installed in a computer or other programmable data processing equipment, a series of operation steps can be carried out in the computer or other programmable data processing equipment to create a process executed by the computer, such that the instructions operating the computer or other programmable data processing equipment can provide steps for implementing the functions described in each block of the block diagram and each step of the flow chart.
  • each block or each step may indicate a part of a module, a segment, or a code including one or more executable instructions for implementing specific logical function(s).
  • In some alternative implementations, the functions described in the blocks or steps may occur out of order. For example, two blocks or steps illustrated in succession may be executed substantially concurrently, or the blocks or steps may be executed in reverse order according to the corresponding functions.
  • the steps of a method or algorithm described in connection with the embodiments disclosed in the present specification may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • the software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, register, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Otherwise, the storage medium may be integrated with the processor.
  • the processor and the storage medium may reside in an application-specific integrated circuit (ASIC).
  • the ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside as discrete components in a user terminal.
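The list above summarizes the claimed matching flow in prose. The following Python sketch is an illustrative reconstruction of that flow, not the patented implementation: the ControlObject class, the aliases field standing in for the second identification information (e.g., an alternative name corresponding to a control object's display text), and the normalize() helper are all assumptions made for demonstration.

```python
# Illustrative sketch of the matching flow described above (an assumed
# reconstruction, not the patented implementation).
from dataclasses import dataclass, field

@dataclass
class ControlObject:
    # First identification information, obtained on the basis of the
    # control object itself (e.g., its display text).
    name: str
    # Second identification information corresponding to the first
    # (assumed here to be alternative names for the same control).
    aliases: list = field(default_factory=list)

def normalize(text: str) -> str:
    """Lowercase and strip whitespace so spoken input can match text."""
    return "".join(text.lower().split())

def select_control_object(input_info: str, controls: list):
    """Match input information obtained from the user's voice against the
    first and second identification information, and return the control
    object corresponding to the matched identification information."""
    target = normalize(input_info)
    for control in controls:
        for ident in [control.name, *control.aliases]:
            if normalize(ident) == target:
                return control  # matched identification information found
    return None  # no identification information matches the input

if __name__ == "__main__":
    controls = [
        ControlObject("Play", aliases=["start playback"]),
        ControlObject("Next", aliases=["skip", "next track"]),
    ]
    # Input information as produced by a voice recognition front end.
    selected = select_control_object("next track", controls)
    print(selected.name if selected else "no match")  # -> Next
```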

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)
EP20140160944 2013-09-12 2014-03-20 Apparatus and method for selecting a control object through voice recognition Withdrawn EP2849054A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR20130109992A KR101474854B1 (ko) 2013-09-12 2013-09-12 Apparatus and method for selecting a control object through voice recognition

Publications (1)

Publication Number Publication Date
EP2849054A1 true EP2849054A1 (fr) 2015-03-18

Family

ID=50342222

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20140160944 Withdrawn EP2849054A1 (fr) 2013-09-12 2014-03-20 Appareil et procédé permettant de sélectionner un objet de commande par reconnaissance vocale

Country Status (5)

Country Link
US (1) US20150073801A1 (fr)
EP (1) EP2849054A1 (fr)
KR (1) KR101474854B1 (fr)
CN (1) CN104464720A (fr)
TW (1) TW201510774A (fr)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWD161258S (zh) * 2013-01-03 2014-06-21 Acer Inc. Graphical user interface for a display screen
AU349920S (en) * 2013-01-05 2013-07-29 Samsung Electronics Co Ltd Display screen for an electronic device
AU350153S (en) * 2013-01-09 2013-08-13 Samsung Electronics Co Ltd Display screen for an electronic device
AU350159S (en) * 2013-01-09 2013-08-13 Samsung Electronics Co Ltd Display screen for an electronic device
USD758439S1 (en) * 2013-02-23 2016-06-07 Samsung Electronics Co., Ltd. Display screen or portion thereof with icon
USD744532S1 (en) * 2013-02-23 2015-12-01 Samsung Electronics Co., Ltd. Display screen or portion thereof with icon
USD757775S1 (en) * 2014-01-15 2016-05-31 Yahoo Japan Corporation Portable electronic terminal with graphical user interface
USD757074S1 (en) * 2014-01-15 2016-05-24 Yahoo Japan Corporation Portable electronic terminal with graphical user interface
USD759078S1 (en) * 2014-01-15 2016-06-14 Yahoo Japan Corporation Portable electronic terminal with graphical user interface
USD757774S1 (en) * 2014-01-15 2016-05-31 Yahoo Japan Corporation Portable electronic terminal with graphical user interface
CN104821168B (zh) * 2015-04-30 2017-03-29 Beijing BOE Multimedia Technology Co., Ltd. Voice recognition method and device
KR102664318B1 (ko) * 2016-11-30 2024-05-09 Nexon Korea Corp. Voice-based control apparatus and method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6952155B2 (en) * 1999-07-23 2005-10-04 Himmelstein Richard B Voice-controlled security system with proximity detector
US20010051942A1 (en) * 2000-06-12 2001-12-13 Paul Toth Information retrieval user interface method
US7539483B2 (en) * 2001-05-02 2009-05-26 Qualcomm Incorporated System and method for entering alphanumeric characters in a wireless communication device
US20030036909A1 (en) * 2001-08-17 2003-02-20 Yoshinaga Kato Methods and devices for operating the multi-function peripherals
JP2005086365A (ja) * 2003-09-05 2005-03-31 Sony Corp Call device, conference device, and imaging condition adjustment method
DE102005030963B4 (de) * 2005-06-30 2007-07-19 Daimlerchrysler Ag Method and device for confirming and/or correcting a speech input supplied to a speech recognition system
US20090112572A1 (en) * 2007-10-30 2009-04-30 Karl Ola Thorn System and method for input of text to an application operating on a device
CN104347075A (zh) * 2013-08-02 2015-02-11 Diotek Co., Ltd. Apparatus and method for selecting a control object through voice recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030156130A1 (en) * 2002-02-15 2003-08-21 Frankie James Voice-controlled user interfaces
US20080195958A1 (en) * 2007-02-09 2008-08-14 Detiege Patrick J Visual recognition of user interface objects on computer
EP2615607A2 (fr) * 2012-01-11 2013-07-17 Samsung Electronics Co., Ltd Method and apparatus for executing a user function using voice recognition

Also Published As

Publication number Publication date
TW201510774A (zh) 2015-03-16
US20150073801A1 (en) 2015-03-12
CN104464720A (zh) 2015-03-25
KR101474854B1 (ko) 2014-12-19

Similar Documents

Publication Publication Date Title
EP2849054A1 (fr) Apparatus and method for selecting a control object through voice recognition
US10395654B2 (en) Text normalization based on a data-driven learning network
US20150039318A1 (en) Apparatus and method for selecting control object through voice recognition
US10592601B2 (en) Multilingual word prediction
US10127220B2 (en) Language identification from short strings
TWI437449B (zh) Multi-modal input method and input method editor system
US9117445B2 (en) System and method for audibly presenting selected text
US20170357640A1 (en) Multilingual word prediction
US20170263248A1 (en) Dictation that allows editing
JP4829901B2 (ja) Method and apparatus for resolving uncertain manually entered text input using voice input
JP3962763B2 (ja) Dialogue support device
AU2005229636B2 (en) Generic spelling mnemonics
CN111462740A (zh) Voice command matching for voice-assisted application prototype testing for languages with non-phonetic alphabets
KR101474856B1 (ko) Apparatus and method for generating an event through voice recognition
JP2006048628A (ja) Multimodal input method
EP3152754B1 (fr) Modification of visual content to facilitate improved speech recognition
JP6983118B2 (ja) Dialogue system control method, dialogue system, and program
EP2835734A1 (fr) Apparatus and method for selecting a control object through voice recognition
Zhao et al. Voice and touch based error-tolerant multimodal text editing and correction for smartphones
US11899904B2 (en) Text input system with correction facility
KR20170009486A (ko) Method for constructing a database for chunk-based language learning, and electronic device performing the same
JP5008248B2 (ja) Display processing device, display processing method, display processing program, and recording medium
KR101645420B1 (ko) Touch screen device enabling touch-based input of Old Hangul and touch-based Old Hangul input method for the touch screen device
JP2012014517A (ja) Handwritten character recognition method and system
KR20200050609A (ko) Virtual touch input device based on voice commands

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20140320

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20150508