WO2015027789A1 - Language control method, device and terminal - Google Patents

Language control method, device and terminal Download PDF

Info

Publication number
WO2015027789A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
application
attribute information
instruction
action
Prior art date
Application number
PCT/CN2014/083505
Other languages
French (fr)
Chinese (zh)
Inventor
樊艳梅
蒋洪睿
Original Assignee
华为终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司
Publication of WO2015027789A1

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a voice control method, apparatus, and terminal.
  • GUI Graphical User Interface
  • the GUI refers to a user interface in which computer operations are presented graphically.
  • the graphical result of a running application is presented on the screen, and its intent is conveyed to the user through visual display, including the text, colors, components, and region divisions in the graphics. The user can see at a glance which operations the graphical interface supports, and performs the corresponding operations by inputting touch gestures on the screen.
  • with voice control, an operation originally performed by gesture input can instead be completed by inputting a voice command.
  • the voice assistant set on the existing smart terminal can perform voice operations on various applications that are provided in the terminal, such as telephone, short message, search, schedule, alarm clock and the like.
  • the voice assistant presets the command that each application can receive.
  • when the user opens an application, the user enters the dialogue scene of that application and inputs voice instructions through the dialogue to complete the desired operation.
  • however, the voice assistant framework in the existing smart terminal serves only the terminal's built-in applications; after various third-party applications are downloaded to the terminal, the voice assistant in the terminal cannot perform voice operations on them.
  • it can be seen that the existing voice assistant framework has a limited degree of openness and can hardly meet the user's need to use voice interaction with any newly installed application, resulting in a poor user experience.

Summary of the invention
  • in a first aspect, a voice control method includes: receiving a user's voice instruction for a first application; matching the voice instruction with a voice user interface (UI) resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application; and performing, on the first application, an operation corresponding to the action instruction.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component, and the action attribute information of the component is the operation performed after the component is triggered;
  • the context attribute information of the component is the running state in which the voice instruction of the component takes effect, and the running state includes a global state, an application state, or a page state.
  • in one implementation, the method further includes obtaining a current first running state of the terminal; the matching then includes: recognizing, by a voice engine, first text content corresponding to the voice instruction; matching the first running state and the first text content with the voice UI resource to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content; and obtaining first action attribute information corresponding to the first context attribute information and the first voice attribute information, and using the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction; or, alternatively: recognizing, by the voice engine, the first text content corresponding to the voice instruction; matching the first text content with the voice UI resource to obtain the first voice attribute information and the first context attribute information corresponding to the first text content; and, when the first running state is consistent with the first context attribute information, obtaining the first action attribute information corresponding to the first voice attribute information and using the corresponding operation as the action instruction.
  • receiving the user's voice instruction for the first application includes: receiving a voice instruction from the user to open the first application; or receiving a voice instruction from the user for a further operation on the first application on a page displayed after the first application is opened.
  • before receiving the user's voice instruction for the first application, the method may further include: when an application-opening voice command from the user corresponds to at least two applications, outputting options for the at least two applications;
  • receiving the user's voice instruction for the first application then includes: receiving the user's voice instruction for the first application selected from the at least two applications according to the options; or
  • receiving the user's application-opening voice command for the first application, where the first application is the application with the highest preset priority among the at least two applications corresponding to the voice command.
  • in a second aspect, a voice control apparatus includes: a receiving unit, configured to receive a user's voice instruction for a first application; a matching unit, configured to match the voice instruction received by the receiving unit with the voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application; and an executing unit, configured to perform, on the first application, an operation corresponding to the action instruction obtained by the matching unit.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is the operation performed after the component is triggered; the context attribute information of the component is the running state in which the voice instruction of the component takes effect, and the running state includes a global state, an application state, or a page state.
  • the apparatus may further include an obtaining unit that obtains the current first running state of the terminal after the receiving unit receives the voice instruction. the matching unit may include: a first instruction identifying subunit, configured to recognize, by a voice engine, the first text content corresponding to the voice instruction; a first information matching subunit, configured to match the first running state obtained by the obtaining unit and the first text content recognized by the first instruction identifying subunit with the voice UI resource, to obtain the first context attribute information and the first voice attribute information corresponding to the first running state and the first text content; and a first instruction obtaining subunit, configured to obtain the first action attribute information corresponding to the first context attribute information and the first voice attribute information obtained by the first information matching subunit, and to use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction;
  • alternatively, the matching unit may include a second instruction identifying subunit, configured to recognize, by the voice engine, the first text content corresponding to the voice instruction, with the text matched first and the running state checked afterwards.
  • the receiving unit is specifically configured to receive a voice instruction from the user to open the first application, or to receive a voice instruction from the user for a further operation on the first application on the page displayed after the first application is opened.
  • the apparatus may further include an output unit, configured to output options for at least two applications when the user's application-opening voice instruction corresponds to the at least two applications; the receiving unit is then configured to receive the user's voice instruction for the first application selected according to the options, or to receive the application-opening voice command, where the first application is the application with the highest preset priority among the at least two applications corresponding to the voice command.
  • in a third aspect, a terminal includes a microphone, a memory, and a processor, where the memory is used to store a voice engine;
  • the microphone is configured to receive the user's voice instructions;
  • the processor is configured to: after the microphone receives a voice instruction for a first application, match the voice instruction with the voice user interface (UI) resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application, and perform, on the first application, the operation corresponding to the action instruction.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component, and the action attribute information of the component is the operation performed after the component is triggered;
  • the context attribute information of the component is the running state in which the voice instruction of the component takes effect, and the running state includes a global state, an application state, or a page state.
  • the processor is further configured to obtain a current first running state of the terminal
  • the processor is specifically configured to: recognize, by the voice engine, the first text content corresponding to the voice instruction; match the first running state and the first text content with the voice UI resource to obtain the first context attribute information and the first voice attribute information corresponding to the first running state and the first text content; obtain the first action attribute information corresponding to the first context attribute information and the first voice attribute information; and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction; or: recognize, by the voice engine, the first text content corresponding to the voice instruction; match the first text content with the voice UI resource to obtain the first voice attribute information and the first context attribute information corresponding to the first text content; and, when the first running state is consistent with the first context attribute information, obtain the first action attribute information corresponding to the first voice attribute information and use the corresponding operation as the action instruction.
  • the processor is further configured to, when the voice instruction received by the microphone corresponds to at least two applications, output options for the at least two applications;
  • the microphone is specifically configured to receive the user's voice command for the first application, where the first application is the application with the highest preset priority among the at least two applications corresponding to the voice command.
  • in the embodiments of the present invention, when a voice instruction for a first application is received, the voice instruction is matched with the voice UI resource of the first application to obtain the action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes the voice attribute information, the action attribute information, and the context attribute information of each component of the first application, and the operation corresponding to the action instruction is performed on the first application.
  • the embodiments of the invention expand the processing capability of the voice assistant framework in the terminal: because voice attribute information, action attribute information, and context attribute information are added for the different components of each application, the terminal can obtain an application's voice UI resource after parsing the application, and the corresponding action instruction can be obtained by matching against that resource. Voice operation can thus be applied to various third-party applications, satisfying the user's need to use voice interaction with any newly installed application at any time and improving the terminal user's experience.
  • FIG. 1 is a flow chart of one embodiment of a voice control method according to the present invention
  • FIG. 2 is a flow chart of another embodiment of a voice control method according to the present invention
  • FIG. 3 is a block diagram of an embodiment of a voice control device according to the present invention
  • FIG. 4 is a block diagram of another embodiment of a voice control device according to the present invention
  • FIG. 5 is a block diagram of another embodiment of a voice control device of the present invention
  • FIG. 6 is a block diagram of an embodiment of a terminal of the present invention.
  • Step 101 Receive a voice instruction of a user to a first application.
  • the terminal can obtain the voice command issued by the user through its built-in microphone.
  • the voice instruction for the first application may include a voice instruction to open the first application; for example, when the first application is a mail application, the user's voice instruction for the first application may be a voice instruction to open the mail application; or
  • the voice instruction for the first application may include a voice instruction for a further operation on the first application on a page displayed after the first application is opened; for example, when the first application is a mail application, the voice instruction may be an instruction to forward a mail, or to reply to a mail, on the mail viewing page after the mail application is opened.
  • when the user's application-opening voice command corresponds to at least two applications, the terminal may output options for them through the display interface and receive the user's voice instruction for the first application selected from the at least two applications.
  • for example, the terminal can output the options "short message application" and "everyday chat application", and the user can select one of them as the first application; if the user selects "everyday chat application", a voice command selecting the everyday chat application can be issued.
  • alternatively, the terminal may select, according to the preset priorities of the at least two applications, the application with the highest priority as the first application.
  • for example, the user's application-opening voice command is "send text message", and both the built-in short message application and an installed everyday chat application can send text messages; if the priority of the "short message application" is higher than that of the "everyday chat application", the terminal takes the "short message application" as the first application according to the priorities.
  • Step 102 Match the voice instruction with the voice UI resource of the first application to obtain the action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes the voice attribute information, action attribute information, and context attribute information of each component of the first application.
  • each application may pre-define a voice user interface (UI) resource, where the voice UI resource may include voice attribute information of each component in the application.
  • the components of the application may include a LayOut component that launches the application, and various controls used after the application is launched, for example, buttons and check boxes.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is the operation performed after the component is triggered; and the context attribute information of the component is the running state in which the voice instruction of the component takes effect, the running state including a global state, an application state, or a page state.
  • the global state means that a voice command for the component takes effect when received by the terminal in any running state;
  • the application state means that a voice command for the component takes effect only while the terminal is running the application to which the component belongs;
  • the page state means that a voice command for the component takes effect only while the terminal is currently displaying the page of the application that contains the component.
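The per-component attribute model and the three context states above can be sketched as a small data structure. This is only an illustration under assumed names; the patent does not prescribe classes, and the `effective_in` helper is an invented simplification of the state check.

```python
# Sketch of a component's voice UI attributes as described in the text.
# The attribute names VoiceCommandText/Action/Context come from the patent;
# the classes and the effective_in() helper are assumptions.
from dataclasses import dataclass
from enum import Enum

class Context(Enum):
    GLOBAL = "global"            # command takes effect in any running state
    APPLICATION = "application"  # only while the owning application is running
    PAGE = "page"                # only while the component's page is displayed

@dataclass
class VoiceComponent:
    name: str                       # component name, e.g. "btn_next_page"
    voice_command_text: str         # VoiceCommandText: triggering text content
    voice_command_action: str       # VoiceCommandAction: operation when triggered
    voice_command_context: Context  # VoiceCommandContext: effective scope

    def effective_in(self, running_state: Context) -> bool:
        # A global command always takes effect; otherwise the terminal's
        # current running state must match the component's context scope.
        return (self.voice_command_context is Context.GLOBAL
                or self.voice_command_context is running_state)
```

For example, a globally scoped LayOut component is effective even in the page state, while a page-scoped button is not effective in the application state.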
  • the terminal may parse the voice UI resource of each application to obtain voice attribute information, action attribute information, and context attribute information of different components in the application. It should be noted that, in the embodiment of the present invention, the terminal may parse the voice UI resource of the application when installing an application, or may parse the voice UI resource of the application when the application is used for the first time.
  • the embodiment of the present invention is not limited thereto.
  • the terminal may save, to the voice engine, the correspondence between the voice attribute information, action attribute information, and context attribute information of each component of the parsed application and the component name of each component; for a given component type there can be multiple different component instances, and different instances of the same component type are distinguished by component name, that is, the component name of each component instance corresponds to that component's voice attribute information (VoiceCommandText), action attribute information (VoiceCommandAction), and context attribute information (VoiceCommandContext).
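The correspondence table described above can be sketched as a simple mapping keyed by component name. The dictionary layout, the application and component names, and the attribute values are all assumptions; the patent does not fix a storage format.

```python
# Sketch of the correspondence saved to the voice engine: each component name
# maps to that component's VoiceCommandText, VoiceCommandAction, and
# VoiceCommandContext. Layout and values are invented for illustration.
voice_engine_table = {}

def save_component(app, component_name, text, action, context):
    # Instances of the same component type (e.g. a "Next Page" button and a
    # "Previous Page" button) are distinguished by their component names.
    voice_engine_table[(app, component_name)] = {
        "VoiceCommandText": text,
        "VoiceCommandAction": action,
        "VoiceCommandContext": context,
    }

save_component("email", "layout_main", "open email application", "open", "global")
save_component("email", "btn_next_page", "next page", "scroll_next", "page")
```

Looking up a component name then yields the full attribute triple needed for matching.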
  • after the terminal receives the voice command, the first running state of the terminal can be obtained, and the first text content corresponding to the voice instruction is recognized by the voice engine. The first running state and the first text content are then matched with the voice UI resource of the first application to obtain the first context attribute information and the first voice attribute information, and the first action attribute information corresponding to them; the operation corresponding to the first action attribute information is the action instruction corresponding to the voice command issued by the user. Alternatively, the first text content may be matched with the voice UI resource of the first application to obtain the first voice attribute information and the first context attribute information corresponding to the first text content; when the first running state is consistent with the first context attribute information, the first action attribute information corresponding to the first context attribute information and the first voice attribute information is obtained, and the operation corresponding to the first action attribute information is used as the action instruction corresponding to the voice instruction.
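The two matching orders described above can be sketched over a tiny in-memory voice UI resource. All entry contents and state names here are invented for illustration; a real resource would come from parsing the application.

```python
# Sketch of the two matching orders over an assumed voice UI resource.
RESOURCE = [
    # (VoiceCommandText, VoiceCommandContext, VoiceCommandAction)
    ("open email application", "global", "open_email"),
    ("next page", "page", "scroll_next"),
]

def match_state_and_text(running_state, text):
    # Order 1: match the running state and the recognized text against the
    # resource together, then take the matched entry's action attribute.
    for cmd_text, context, action in RESOURCE:
        if cmd_text == text and context in ("global", running_state):
            return action
    return None

def match_text_then_state(running_state, text):
    # Order 2: match the text first, then check that the running state is
    # consistent with the matched context attribute before taking the action.
    for cmd_text, context, action in RESOURCE:
        if cmd_text == text:
            return action if context in ("global", running_state) else None
    return None
```

Both orders yield the same action instruction; they differ only in whether the running state is used during matching or checked afterwards.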
  • Step 103 Perform an operation corresponding to the action instruction on the first application.
  • this embodiment expands the processing capability of the voice assistant framework in the terminal: voice attribute information, action attribute information, and context attribute information are added for the different components of each application, so that the terminal can obtain an application's voice UI resource after parsing the application and, when receiving a voice command for the application, match it against the voice UI resource to obtain the corresponding action instruction. Voice operation can thus be applied to various third-party applications, satisfying the user's need to use voice interaction with any newly installed application at any time and improving the terminal user's experience.
  • Step 201 A terminal obtains a voice UI resource of an application.
  • each application may pre-define a voice user interface (UI) resource, where the voice UI resource may include voice attribute information of each component in the application.
  • the components of the application may include a LayOut component that launches the application, and various controls used after the application is launched, for example, buttons and check boxes.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is the operation performed after the component is triggered; and the context attribute information of the component is the running state in which the voice instruction of the component takes effect, the running state including a global state, an application state, or a page state.
  • the global state means that a voice command for the component takes effect when received by the terminal in any running state;
  • the application state means that a voice command for the component takes effect only while the terminal is running the application to which the component belongs;
  • the page state means that a voice command for the component takes effect only while the terminal is currently displaying the page of the application that contains the component.
  • the terminal may parse the voice UI resource of the application to obtain the voice attribute information, action attribute information, and context attribute information of the different components in the application, and save the correspondence between these attributes of each component and the component name of each component to the voice engine; for a given component type there can be multiple different component instances, and different instances of the same component type are distinguished by component name, that is, the component name of each component instance corresponds to that component's voice attribute information (VoiceCommandText), action attribute information (VoiceCommandAction), and context attribute information (VoiceCommandContext). For example, button components may be divided into a "Next Page" button, a "Previous Page" button, and so on.
  • for example, for the LayOut component of an email application, its voice attribute information and action attribute information can be defined accordingly, and its context attribute information VoiceCommandContext can be defined as "global", that is, a voice command to open the email application takes effect when received by the terminal in any running state.
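A hypothetical declarative definition for the email application's LayOut component, mirroring the three attribute names above, might look as follows. Only the "global" context value comes from the text; the other attribute values are invented for illustration.

```python
# Invented example of one component's voice UI attributes; only the
# "global" VoiceCommandContext value is taken from the description above.
email_layout = {
    "component": "LayOut",
    "VoiceCommandText": "open email application",  # text that triggers it
    "VoiceCommandAction": "open",                  # operation when triggered
    "VoiceCommandContext": "global",               # effective in any running state
}
```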
  • Step 202 The terminal receives a voice instruction of the user to the first application.
  • the terminal can obtain the voice command issued by the user through its built-in microphone.
  • the voice instruction for the first application may include a voice instruction to open the first application; for example, when the first application is a mail application, the user's voice instruction for the first application may be a voice instruction to open the mail application;
  • the voice instruction for the first application may also include a voice instruction for a further operation on the first application on a page displayed after the first application is opened, for example, a voice instruction to forward a mail, or to reply to a mail, on the mail viewing page after the mail application is opened.
  • Step 203 Obtain a current first running state of the terminal.
  • in this embodiment, the first application is an email application. Assume that the user's voice command to open the email application is received in step 202; the voice command may be specifically "open email" or "initiate email".
  • Step 204 Identify, by the voice engine, the first text content corresponding to the voice instruction.
  • the voice engine recognizes the voice instruction using a semantic recognition mode, which performs fuzzy recognition on the voice instruction; for example, whether the voice command issued by the user is "open email" or "initiate email", semantic analysis can determine that the first text content corresponding to the voice instruction is "open email application".
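The fuzzy recognition step above can be sketched as a normalization that collapses different phrasings of the same intent into a single first-text-content string. A real voice engine uses semantic analysis; the verb-synonym table here is only an invented illustration.

```python
# Invented sketch of fuzzy semantic recognition: "open email" and
# "initiate email" both normalize to the first text content
# "open email application". The synonym table is an assumption.
SYNONYMS = {"open": "open", "initiate": "open", "start": "open", "launch": "open"}

def first_text_content(utterance: str) -> str:
    words = utterance.lower().split()
    verb = SYNONYMS.get(words[0], words[0])  # collapse synonymous verbs
    target = " ".join(words[1:])
    if not target.endswith("application"):
        target += " application"
    return f"{verb} {target}"
```

Both example phrasings from the text collapse to the same first text content.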
  • Step 205 Match the first running state and the first text content with the voice UI resource of the first application, and obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content.
  • the voice engine saves the correspondence between the voice attribute information, action attribute information, and context attribute information of each component of the application and the component name of each component; so in this step, after the first running state and the first text content are obtained, they can be matched against this correspondence to obtain the first context attribute information and the first voice attribute information corresponding to the first running state and the first text content.
  • Step 206 Obtain the first action attribute information corresponding to the first context attribute information and the first voice attribute information, and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction.
  • in this embodiment, the voice engine can determine that the matched action attribute information under the corresponding component is "Open", that is, the action instruction corresponding to the user's voice command "open email" or "initiate email" is the operation triggered by "Open", namely opening the mail application.
  • alternatively, the first text content may first be matched with the voice UI resource of the first application to obtain the first voice attribute information and the first context attribute information corresponding to the first text content; when the first running state is consistent with the first context attribute information, the first action attribute information corresponding to the first context attribute information and the first voice attribute information is obtained, and the operation corresponding to the first action attribute information is the action instruction corresponding to the voice instruction.
  • the embodiment of the present invention does not limit the matching manner of the voice UI resources.
  • Step 207 Perform an operation corresponding to the action instruction on the first application.
  • this embodiment expands the processing capability of the voice assistant framework in the terminal: voice attribute information, action attribute information, and context attribute information are added for the different components of each application, so that when parsing an application the terminal can obtain its voice UI resource, and the corresponding action instruction can be obtained by matching against that resource. Voice operation is thereby enabled for various third-party applications, satisfying the user's need to use voice interaction with any newly installed application at any time and improving the terminal user's experience.
  • corresponding to the foregoing method embodiments, the present invention also provides embodiments of a voice control apparatus and a terminal. Referring to FIG. 3, it is a block diagram of an embodiment of a voice control apparatus according to the present invention.
  • the apparatus includes: a receiving unit 310, a matching unit 320, and an executing unit 330.
  • the receiving unit 310 is configured to receive the user's voice instruction for the first application;
  • the matching unit 320 is configured to match the voice command received by the receiving unit 310 with the voice UI resource of the first application to obtain the action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes the voice attribute information, action attribute information, and context attribute information of each component of the first application;
  • the executing unit 330 is configured to perform, on the first application, the operation corresponding to the action instruction obtained by the matching unit 320.
  • the receiving unit 310 may be specifically configured to receive a voice instruction from the user to open the first application, or to receive a voice instruction from the user for a further operation on the first application after the first application is opened.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is the operation performed after the component is triggered; the context attribute information of the component is the running state in which the voice instruction of the component takes effect, the running state including a global state, an application state, or a page state.
  • the receiving unit 310 may be configured to receive the user's application-opening voice command for the first application, where the first application is the application with the highest preset priority among the at least two applications corresponding to the application-opening voice command.
  • referring to FIG. 4, it is a block diagram of another embodiment of a voice control apparatus according to the present invention.
  • the apparatus includes: a parsing unit 410, a saving unit 420, a receiving unit 430, an obtaining unit 440, a matching unit 450, and an executing unit 460.
  • the parsing unit 410 is configured to obtain the voice attribute information, the action attribute information, and the context attribute information of the different components of the first application by parsing the first application.
  • the saving unit 420 is configured to save the voice UI resource of the first application to the voice engine, where the voice UI resource includes the voice attribute information, action attribute information, and context attribute information of each component of the first application obtained by the parsing unit 410.
  • the receiving unit 430 is configured to receive the user's voice instruction for the first application; the obtaining unit 440 is configured to obtain the current first running state of the terminal after the receiving unit 430 receives the voice instruction; the matching unit 450 is configured to match the first running state obtained by the obtaining unit 440 and the voice command received by the receiving unit 430 with the voice UI resource of the first application saved by the saving unit 420, to obtain the action instruction corresponding to the voice command; and the executing unit 460 is configured to perform, on the first application, the operation corresponding to the action instruction obtained by the matching unit 450.
  • the matching unit 450 may include (not shown in FIG. 4): a first instruction identification subunit, configured to identify, by using a voice engine, the first text content corresponding to the voice instruction; a first information matching subunit, configured to match the first running state and the first text content identified by the first instruction identification subunit against the voice UI resource of the first application saved by the saving unit, to obtain the first context attribute information and the first voice attribute information corresponding to the first running state and the first text content; and a first instruction obtaining subunit, configured to obtain the first action attribute information corresponding to the first context attribute information and the first voice attribute information obtained by the first information matching subunit, and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction.
  • alternatively, the matching unit 450 may include (not shown in FIG. 4): a second instruction identification subunit, configured to identify, by using a voice engine, the first text content corresponding to the voice instruction; a second information matching subunit, configured to match the first text content identified by the second instruction identification subunit against the voice UI resource of the first application saved by the saving unit, to obtain the first voice attribute information and the first context attribute information corresponding to the first text content; and a second instruction obtaining subunit, configured to, when the first running state is consistent with the first context attribute information obtained by the second information matching subunit, obtain the first action attribute information corresponding to the first context attribute information and the first voice attribute information, and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction.
  • the receiving unit 430 may be specifically configured to receive a voice instruction from the user to open the first application, or to receive a voice instruction for a further operation performed by the user on the first application on the page displayed after the first application is opened.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component;
  • the action attribute information of the component is an operation performed after the component is triggered;
  • the context attribute information of the component is the running state when the voice instruction of the component is in effect; the running state includes a global state, an application state, or a page state.
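The three per-component attributes described above can be modeled as a small data structure. The sketch below is illustrative only (the class, field, and example values are not taken from the patent); it simply assumes, as the text states, that a component carries the trigger text (voice attribute), the operation performed when triggered (action attribute), and the running state in which its voice instruction is valid (context attribute: global, application, or page state).

```python
from dataclasses import dataclass
from enum import Enum

class RunState(Enum):
    GLOBAL = "global"            # instruction valid anywhere on the terminal
    APPLICATION = "application"  # valid while the application is running
    PAGE = "page"                # valid only on a specific page

@dataclass(frozen=True)
class VoiceComponent:
    voice_attr: str         # text content that triggers the component
    action_attr: str        # operation performed after the component is triggered
    context_attr: RunState  # running state in which the instruction is in effect

# A voice UI resource is the collection of such components for one application.
voice_ui_resource = [
    VoiceComponent("open settings", "launch_settings_page", RunState.APPLICATION),
    VoiceComponent("play", "start_playback", RunState.PAGE),
]
```

The enumeration mirrors the three running states named in the text; nothing beyond that structure is implied.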
  • FIG. 5 is a block diagram of another embodiment of a voice control apparatus according to the present invention.
  • the apparatus includes: a parsing unit 510, a saving unit 520, an output unit 530, a receiving unit 540, a matching unit 550, and an executing unit 560.
  • the parsing unit 510 is configured to obtain the voice attribute information, the action attribute information, and the context attribute information of the different components of the first application by parsing the first application.
  • the saving unit 520 is configured to save the voice UI resource of the first application to the voice engine, where the voice UI resource includes the voice attribute information, the action attribute information, and the context attribute information of each component of the first application obtained by the parsing unit 510.
  • the output unit 530 is configured to output options for the at least two applications when an application-open voice instruction received from the user corresponds to at least two applications; the receiving unit 540 is configured to receive a voice instruction for the first application selected by the user from the at least two applications according to the options output by the output unit 530; the matching unit 550 is configured to match the voice instruction received by the receiving unit 540 against the voice UI resource of the first application saved by the saving unit 520, to obtain an action instruction corresponding to the voice instruction; the executing unit 560 is configured to perform, on the first application, the operation corresponding to the action instruction obtained by the matching unit 550.
  • the receiving unit 540 may be specifically configured to receive a voice instruction from the user to open the first application, or to receive a voice instruction for a further operation performed by the user on the first application on the page displayed after the first application is opened.
  • the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is an operation performed after the component is triggered; the context attribute information of the component is the running state when the voice instruction of the component is in effect; the running state includes a global state, an application state, or a page state.
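The FIG. 5 embodiment handles the case where one application-open voice instruction corresponds to several installed applications: the output unit presents options and the user's follow-up instruction selects one. A minimal sketch of that branching, with hypothetical function and application names (the patent does not prescribe any particular matching rule between the spoken name and installed applications):

```python
# Hypothetical sketch of the Fig. 5 behaviour: when an application-open voice
# instruction corresponds to at least two applications, options are output for
# the user to choose from; an unambiguous match is opened directly.

def handle_open_instruction(spoken_name, installed_apps):
    """Return ("open", app), ("options", apps) or ("none", None)."""
    matches = [app for app in installed_apps if spoken_name in app.lower()]
    if len(matches) >= 2:
        return ("options", matches)   # output unit presents these choices
    if matches:
        return ("open", matches[0])   # unambiguous: open it directly
    return ("none", None)

apps = ["Music Player A", "Music Player B", "Notes"]
```

The substring test is only a stand-in for whatever recognition-to-application mapping the voice engine actually performs.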
  • FIG. 6 is a block diagram of an embodiment of a terminal according to the present invention; the terminal includes a microphone 610, a memory 620, and a processor 630.
  • the memory 620 is configured to store a voice engine, and the microphone 610 is configured to receive a voice instruction from the user.
  • the processor 630 is configured to: after the microphone 610 receives a voice instruction from the user for the first application, match the voice instruction against the voice user interface (UI) resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes the voice attribute information, the action attribute information, and the context attribute information of each component of the first application; and perform, on the first application, the operation corresponding to the action instruction.
  • the microphone 610 may be specifically configured to receive a voice instruction from the user to open the first application, or to receive a voice instruction for a further operation performed by the user on the first application on the page displayed after the first application is opened.
  • the processor 630 may be further configured to obtain the current first running state of the terminal. in that case, the processor 630 may be specifically configured to: identify, by using the voice engine, the first text content corresponding to the voice instruction; match the first running state and the first text content against the voice UI resource to obtain the first context attribute information and the first voice attribute information corresponding to the first running state and the first text content; obtain the first action attribute information corresponding to the first context attribute information and the first voice attribute information; and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction. alternatively, the processor 630 may be specifically configured to: identify, by using the voice engine, the first text content corresponding to the voice instruction; match the first text content against the voice UI resource to obtain the first voice attribute information and the first context attribute information corresponding to the first text content; and, when the first running state is consistent with the first context attribute information, obtain the first action attribute information corresponding to the first voice attribute information and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction.
  • the processor 630 may be further configured to output options for the at least two applications when an application-open voice instruction received by the microphone from the user corresponds to at least two applications;
  • the microphone 610 may be specifically configured to receive a voice instruction for the first application selected by the user from the at least two applications according to the options.
  • alternatively, the microphone 610 may be specifically configured to receive an application-open voice instruction for the first application, where the first application is the application with the highest preset priority among the at least two applications corresponding to that application-open voice instruction.
  • the voice attribute information of the component is the text content corresponding to the voice instruction that triggers the component;
  • the action attribute information of the component is an operation performed after the component is triggered;
  • the context attribute information of the component is the running state when the voice instruction of the component is in effect; the running state includes a global state, an application state, or a page state.
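The processor description above names two matching orders: (a) match the running state and the recognized text against the voice UI resource together, and (b) match the text first and then check that the current running state is consistent with the matched component's context attribute. The sketch below is a hypothetical rendering of those two strategies (the tuple layout, names, and the "first text match wins" rule in strategy (b) are assumptions, not taken from the patent):

```python
# Illustrative sketch of the two matching strategies. A "component" is a
# (voice_attr, action_attr, context_attr) triple; speech-to-text is assumed
# to have been done already by the voice engine.

def match_state_and_text(components, running_state, text):
    """Strategy (a): match running state and text against the resource together."""
    for voice_attr, action_attr, context_attr in components:
        if voice_attr == text and context_attr == running_state:
            return action_attr
    return None

def match_text_then_state(components, running_state, text):
    """Strategy (b): match text first, then check state consistency
    (assumption: the first component whose text matches is the one checked)."""
    for voice_attr, action_attr, context_attr in components:
        if voice_attr == text:
            return action_attr if context_attr == running_state else None
    return None

components = [
    ("play", "start_playback", "page"),
    ("play", "open_player", "global"),
]
```

With duplicate trigger texts in different states, the two orders can disagree, which is why the ordering of the state check matters.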
  • in the embodiments of the present invention, when a voice instruction from the user for the first application is received, the voice instruction is matched against the voice UI resource of the first application to obtain the action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes the voice attribute information, the action attribute information, and the context attribute information of each component of the first application, and the operation corresponding to the action instruction is performed on the first application.
  • the embodiments of the present invention extend the processing capability of the voice assistant framework in the terminal. because the voice attribute information, the action attribute information, and the context attribute information of the different components are added in each application, the terminal can obtain the voice UI resource of an application after parsing that application, and can obtain the corresponding action instruction by matching a received voice instruction against the voice UI resource of the application. this makes various third-party applications operable by voice, satisfies the user's need to install an application at any time and interact with it by voice at any time, and improves the terminal user experience.
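When an application-open instruction corresponds to several applications, one alternative described above resolves the ambiguity without prompting: the application with the highest preset priority is opened. A minimal sketch of that route, with hypothetical names and priority values (the patent does not specify how priorities are represented or configured):

```python
# Hypothetical sketch: resolve an ambiguous application-open instruction by
# preset priority instead of presenting options to the user.

def pick_by_priority(candidates, priorities):
    """Return the candidate application with the highest preset priority.
    `priorities` maps application name -> priority number (higher wins;
    applications without a configured priority default to 0)."""
    return max(candidates, key=lambda app: priorities.get(app, 0))

preset_priorities = {"Music Player A": 10, "Music Player B": 5}
chosen = pick_by_priority(["Music Player A", "Music Player B"], preset_priorities)
```

This is the counterpart of the options-based resolution: the same ambiguity, settled by configuration rather than by a further user choice.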
  • the techniques in the embodiments of the present invention can be implemented by means of software plus a necessary general hardware platform. based on such an understanding, the technical solutions in the embodiments of the present invention may, in essence or in the part contributing to the prior art, be embodied in the form of a software product. the software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention or in portions of the embodiments.
  • the embodiments in this specification are described in a progressive manner; for the same or similar parts of the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. in particular, because the apparatus embodiments are basically similar to the method embodiments, they are described relatively simply, and for relevant parts reference may be made to the description of the method embodiments.
  • the embodiments of the present invention described above are not intended to limit the scope of the present invention. any modifications, equivalent substitutions, and improvements made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed are a language control method, device, and terminal. The method comprises: receiving a voice instruction from a user for a first application; matching the voice instruction against a voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction, the voice UI resource of the first application including voice attribute information, action attribute information, and context attribute information for each component of the first application; and performing, on the first application, an operation corresponding to the action instruction. The embodiments of the present invention extend the processing capability of a voice assistant framework in a terminal and enable various third-party applications to be operated by voice, thereby meeting the user's need to install an application at any time and interact by voice at any time, and improving the terminal user experience.

Description

Language control method, device and terminal

This application claims priority to Chinese Patent Application No. 201310375572.3, filed with the Chinese Patent Office on August 26, 2013 and entitled "Language control method, device and terminal", the entire contents of which are incorporated herein by reference.
Technical Field

The present invention relates to the field of communications technologies, and in particular, to a voice control method, apparatus, and terminal.
Background

Intelligent terminals typically use a graphical user interface (GUI) to output information to the end user. A GUI is a computer user interface presented graphically. Under the existing GUI architecture, when an application is launched, the graphical output of the application is presented on the screen, and the intent is conveyed to the user visually, including the text, colors, components, and region divisions shown in the graphics; the user learns visually which operations can be performed on the graphical interface and performs the corresponding operations by inputting touch gestures on the screen. As automatic speech recognition technology matures, the operations originally performed by gesture input can be completed by inputting voice commands, simplifying the user's use of the GUI. The voice assistant provided on existing intelligent terminals can perform voice operations on the applications built into the terminal, including telephone, messaging, search, calendar, and alarm-clock applications. The voice assistant presets the commands that each application can receive; after the user opens an application, the user enters the dialogue scenario of that application and inputs voice instructions through the dialogue to complete the desired operations.
However, during research into the prior art, the inventors found that the voice assistant framework in existing intelligent terminals serves only the terminal's built-in applications; after various third-party applications are downloaded to the terminal, the voice assistant in the terminal cannot be used to operate them by voice. It can be seen that the existing voice assistant framework has a limited degree of openness and cannot meet the user's need to install an application at any time and interact with it by voice at any time, resulting in a poor user experience.

Summary of the Invention
本发明实施例中提供了语音控制方法、 装置及终端, 以解决现有技术无法 随时安装应用随时使用语音交互, 从而导致智能终端的用户体验不高的问题。 为了解决上述技术问题, 本发明实施例公开了如下技术方案: 第一方面, 提供一种语音控制方法, 所述方法包括: 接收用户对第一应用的语音指令; 将所述语音指令与所述第一应用的语音用户接口 UI资源进行匹配,获得与 所述语音指令对应的动作指令, 所述第一应用的语音 UI资源包含所述第一应 用的每个组件的语音属性信息、 动作属性信息和上下文属性信息; 对所述第一应用执行与所述动作指令对应的操作。 结合第一方面, 在第一方面的第一种可能的实现方式中, 所述组件的语音属性信息为触发所述组件的语音指令对应的文本内容; 所述组件的动作属性信息为触发所述组件后执行的操作; 所述组件的上下文属性信息为所述组件的语音指令生效时的运行状态, 所 述运行状态包括全局状态、 应用状态或页面状态。 结合第一方面, 或第一方面的第一种可能的实现方式, 在第一方面的第二 种可能的实现方式中, 所述接收用户对第一应用的语音指令后, 所述方法还包 括: 获得所述终端当前的第一运行状态; 所述将所述语音指令与所述第一应用的语音 UI资源进行匹配,获得与所述 语音指令对应的动作指令, 包括: 通过语音引擎识别所述语音指令对应的第一文本内容; 将所述第一运行状 态和所述第一文本内容与所述语音 UI资源进行匹配, 获得与所述第一运行状 态和所述第一文本内容对应的第一上下文属性信息和第一语音属性信息;获得 与所述第一上下文属性信息和第一语音属性信息所对应的第一动作属性信息, 将所述第一动作属性信息对应的操作作为与所述语音指令对应的动作指令;或 者, 通过语音引擎识别所述语音指令对应的第一文本内容; 将所述第一文本内 容与所述语音 UI资源进行匹配, 获得与所述第一文本内容对应的第一语音属 性信息和第一上下文属性信息;当所述第一运行状态与所述第一上下文属性信 息一致时, 获得与所述第一语音属性信息所对应的第一动作属性信息, 将所述 第一动作属性信息对应的操作作为与所述语音指令对应的动作指令。 结合第一方面, 或第一方面的第一种可能的实现方式, 或第一方面的第二 种可能的实现方式,在第一方面的第三种可能的实现方式中, 所述接收用户对 第一应用的语音指令, 包括: 接收用户开启第一应用的语音指令; 或者, 接收用户在第一应用开启后的页面上对第一应用进行的进一步操作的语音 指令。 结合第一方面, 或第一方面的第一种可能的实现方式, 或第一方面的第二 种可能的实现方式, 或第一方面的第三种可能的实现方式,在第一方面的第四 种可能的实现方式中, 所述接收用户对第一应用的语音指令前, 所述方法还包括: 当接收到用户 的应用开启语音指令对应至少两个应用时,输出所述至少两个应用的选项; 所 述接收用户对所述第一应用的语音指令包括:接收用户根据所述选项对从所述 至少两个应用中选择的第一应用的语音指令; 或者, 所述接收用户对第一应用的语音指令包括: 接收用户对第一应用的应用开 启语音指令,所述第一应用为所述应用开启语音指令对应的至少两个应用中预 设优先级最高的应用。 第二方面, 提供一种语音控制装置, 所述装置包括: 接收单元, 用于接收用户对第一应用的语音指令; 匹配单元, 用于将所述接收单元接收到的语音指令与所述第一应用的语音In the embodiment of the present invention, a voice control method, a device, and a terminal are provided to solve the problem that the prior art cannot install the application to use the voice interaction at any time, thereby causing the user experience of the smart terminal to be not high. 
In order to solve the above technical problem, the embodiment of the present invention discloses the following technical solution: In a first aspect, a voice control method is provided, where the method includes: receiving a voice instruction of a user to a first application; The voice user interface UI resource of the first application is matched, and the action instruction corresponding to the voice instruction is obtained, where the voice UI resource of the first application includes voice attribute information and action attribute information of each component of the first application. And context attribute information; performing an operation corresponding to the action instruction on the first application. With reference to the first aspect, in a first possible implementation manner of the first aspect, the voice attribute information of the component is a text content corresponding to a voice instruction that triggers the component, and the action attribute information of the component is triggered by the An operation performed after the component; the context attribute information of the component is an operation state when the voice instruction of the component is in effect, and the operation state includes a global state, an application state, or a page state. 
With the first aspect, or the first possible implementation manner of the first aspect, in the second possible implementation manner of the first aspect, after the receiving, by the user, the voice instruction of the first application, the method further includes Obtaining a current first running state of the terminal; the matching the voice command with the voice UI resource of the first application, and obtaining an action instruction corresponding to the voice command, including: identifying, by using a voice engine Corresponding to the first text content corresponding to the voice instruction; matching the first running state and the first text content with the voice UI resource to obtain a first operation state and the first text content a first context attribute information and first voice attribute information; obtaining first action attribute information corresponding to the first context attribute information and the first voice attribute information, and using the operation corresponding to the first action attribute information as a context Determining an action instruction corresponding to the voice instruction; or identifying, by the voice engine, the first text content corresponding to the voice instruction; The content is matched with the voice UI resource, and the first voice attribute information and the first context attribute information corresponding to the first text content are obtained; when the first running state is consistent with the first context attribute information Obtaining first action attribute information corresponding to the first voice attribute information, and performing an operation corresponding to the first action attribute information as an action instruction corresponding to the voice instruction. 
With reference to the first aspect, or the first possible implementation manner of the first aspect, or the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, The voice command of the first application includes: receiving a voice instruction that the user turns on the first application; or receiving a voice instruction that the user performs further operations on the first application on the page after the first application is opened. With reference to the first aspect, or the first possible implementation of the first aspect, or the second possible implementation of the first aspect, or the third possible implementation of the first aspect, in the first aspect In the four possible implementation manners, before the receiving the voice instruction of the user to the first application, the method further includes: outputting the at least two applications when the application opening voice command of the user receives the at least two applications The receiving the user's voice instruction to the first application includes: receiving, by the user, a voice instruction of the first application selected from the at least two applications according to the option; or The voice command of the application includes: receiving, by the user, an application for opening the voice command to the first application, where the first application is the application with the highest priority among the at least two applications corresponding to the voice instruction of the application. In a second aspect, a voice control apparatus is provided, where the apparatus includes: a receiving unit, configured to receive a voice instruction of a user to a first application; a matching unit, configured to: receive, by the receiving unit, a voice instruction and the first An application voice
UI 资源进行匹配, 获得与所述语音指令对应的动作指令, 所述第一应用的语 音 UI资源包含所述第一应用的每个组件的语音属性信息、 动作属性信息和上 下文属性信息; 执行单元, 用于对所述第一应用执行与所述匹配单元获得的动作指令对应 的操作。 结合第二方面, 在第二方面的第一种可能的实现方式中, 所述组件的语音属性信息为触发所述组件的语音指令对应的文本内容; 所述组件的动作属性信息为触发所述组件后执行的操作; 所述组件的上下文属性信息为所述组件的语音指令生效时的运行状态, 所 述运行状态包括全局状态、 应用状态或页面状态。 结合第二方面, 或第二方面的第一种可能的实现方式, 在第二方面的第二 种可能的实现方式中, 所述装置还包括: 获得单元, 用于所述接收单元接收到所述语音指令后, 获得所述终端当前 的第一运行状态; 所述匹配单元包括: 第一指令识别子单元, 用于通过语音引擎识别所述语音指令对应的第一文 本内容; 第一信息匹配子单元, 用于将所述获得单元获得的第一运行状态和所 述第一指令识别子单元识别出的第一文本内容与所述语音 UI资源进行匹配, 获得与所述第一运行状态和所述第一文本内容对应的第一上下文属性信息和 第一语音属性信息; 第一指令获得子单元, 用于获得与所述第一信息匹配子单 元获得的第一上下文属性信息和第一语音属性信息所对应的第一动作属性信 息,将所述第一动作属性信息对应的操作作为与所述语音指令对应的动作指令; 或者, 第二指令识别子单元, 用于通过语音引擎识别所述语音指令对应的第一文 本内容; 第二信息匹配子单元, 用于将所述第二指令识别子单元识别出的第一 文本内容与所述语音 UI资源进行匹配, 获得与所述第一文本内容对应的第一 语音属性信息和第一上下文属性信息; 第二指令获得子单元, 用于当所述第一 运行状态与所述第二信息匹配子单元获得的第一上下文属性信息一致时,获得 与所述第一上下文属性信息和第一语音属性信息所对应的第一动作属性信息, 将所述第一动作属性信息对应的操作作为与所述语音指令对应的动作指令。 结合第二方面, 或第二方面的第一种可能的实现方式, 或第二方面的第二 种可能的实现方式, 在第二方面的第三种可能的实现方式中, 所述接收单元, 具体用于接收用户开启第一应用的语音指令, 或者,接收用户在第一应用开启 后的页面上对第一应用进行的进一步操作的语音指令。 结合第二方面, 或第二方面的第一种可能的实现方式, 或第二方面的第二 种可能的实现方式, 或第二方面的第三种可能的实现方式,在第二方面的第四 种可能的实现方式中, 所述装置还包括: 输出单元, 用于当接收到用户的应用开启语音指令对应 至少两个应用时, 输出所述至少两个应用的选项; 所述接收单元, 具体用于接 用的语音指令; 或者, 所述接收单元, 具体用于接收用户对第一应用的应用开启语音指令, 所述 第一应用为所述应用开启语音指令对应的至少两个应用中预设优先级最高的 应用。 第三方面, 提供一种终端, 所述终端包括: 麦克风、 存储器和处理器, 其 中, 所述存储器, 用于存储语音引擎; 所述麦克风, 用于接收用户的语音指令; 所述处理器, 用于当所述麦克风接收用户对第一应用的语音指令后, 将所 述语音指令与所述第一应用的语音用户接口 UI资源进行匹配, 获得与所述语 音指令对应的动作指令, 所述第一应用的语音 UI资源包含所述第一应用的每 个组件的语音属性信息、动作属性信息和上下文属性信息, 并对所述第一应用 执行与所述动作指令对应的操作。 结合第三方面, 在第三方面的第一种可能的实现方式中, 所述组件的语音属性信息为触发所述组件的语音指令对应的文本内容; 所述组件的动作属性信息为触发所述组件后执行的操作; 所述组件的上下文属性信息为所述组件的语音指令生效时的运行状态, 所 述运行状态包括全局状态、 应用状态或页面状态。 结合第三方面, 或第三方面的第一种可能的实现方式, 在第三方面的第二 种可能的实现方式中,所述处理器,还用于获得所述终端当前的第一运行状态; 所述处理器, 具体用于通过语音引擎识别所述语音指令对应的第一文本内 容, 将所述第一运行状态和所述第一文本内容与所述语音 UI资源进行匹配, 获得与所述第一运行状态和所述第一文本内容对应的第一上下文属性信息和 第一语音属性信息,获得与所述第一上下文属性信息和第一语音属性信息所对 应的第一动作属性信息,将所述第一动作属性信息对应的操作作为与所述语音 指令对应的动作指令; 或者,通过语音引擎识别所述语音指令对应的第一文本 内容; 将所述第一文本内容与所述语音 UI资源进行匹配, 获得与所述第一文 本内容对应的第一语音属性信息和第一上下文属性信息;当所述第一运行状态 
与所述第一上下文属性信息一致时,获得与所述第一语音属性信息所对应的第 一动作属性信息,将所述第一动作属性信息对应的操作作为与所述语音指令对 应的动作指令。 结合第三方面, 或第三方面的第一种可能的实现方式, 或第三方面的第二 种可能的实现方式, 在第三方面的第三种可能的实现方式中, 所述麦克风, 具 体用于接收用户开启第一应用的语音指令, 或者,接收用户在第一应用开启后 的页面上对第一应用进行的进一步操作的语音指令。 结合第三方面, 或第三方面的第一种可能的实现方式, 或第三方面的第二 种可能的实现方式, 或第三方面的第三种可能的实现方式,在第三方面的第四 种可能的实现方式中, 所述处理器, 还用于当通过所述麦克风接收到用户的应用开启语音指令对 应至少两个应用时, 输出所述至少两个应用的选项; 所述麦克风, 具体用于接 The UI resource is matched to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application; And performing, for the first application, an operation corresponding to the action instruction obtained by the matching unit. In conjunction with the second aspect, in a first possible implementation of the second aspect, The voice attribute information of the component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is an operation performed after the component is triggered; the context attribute information of the component is the voice of the component The running state when the command is in effect, and the running state includes a global state, an application state, or a page state. 
With reference to the second aspect, or the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the device further includes: an obtaining unit, where the receiving unit receives the After the voice command is performed, the current first running state of the terminal is obtained; the matching unit includes: a first instruction identifying subunit, configured to identify, by using a voice engine, a first text content corresponding to the voice instruction; a subunit, configured to match the first running state obtained by the obtaining unit and the first text content recognized by the first instruction identifying subunit with the voice UI resource, to obtain the first running state and a first context attribute information and a first voice attribute information corresponding to the first text content; a first instruction obtaining subunit, configured to obtain first context attribute information and a first voice obtained by matching the first information matching subunit The first action attribute information corresponding to the attribute information, and the operation corresponding to the first action attribute information is used as the motion corresponding to the voice instruction Or the second instruction identification subunit, configured to identify, by the speech engine, the first text content corresponding to the voice instruction; the second information matching subunit, configured to identify the second instruction identification subunit The first text content is matched with the voice UI resource, and the first voice attribute information and the first context attribute information corresponding to the first text content are obtained; the second instruction obtaining subunit is configured to be the first Obtaining the first action attribute information corresponding to the first context attribute information and the first voice attribute information when the running state is 
consistent with the first context attribute information obtained by the second information matching subunit, and the first The operation corresponding to the action attribute information is an action command corresponding to the voice command. With reference to the second aspect, or the first possible implementation manner of the second aspect, or the second possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the receiving unit, Specifically, it is used to receive a voice instruction that the user starts the first application, or receive a voice instruction that the user performs further operations on the first application on the page after the first application is opened. With reference to the second aspect, or the first possible implementation of the second aspect, or the second possible implementation of the second aspect, or the third possible implementation of the second aspect, in the second aspect In the four possible implementation manners, the device further includes: an output unit, configured to output an option of the at least two applications when the application opening voice instruction of the user receives the at least two applications; the receiving unit, The receiving unit is configured to receive a voice command, where the first application is the at least two applications corresponding to the voice command of the application. The application with the highest priority is preset. 
In a third aspect, a terminal is provided, where the terminal includes: a microphone, a memory, and a processor, where the memory is used to store a voice engine; The microphone is configured to receive a voice instruction of the user, where the processor is configured to: after the microphone receives a voice instruction of the first application, the voice instruction and the voice user interface UI of the first application The resource is matched to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application, and The first application performs an operation corresponding to the action instruction. With reference to the third aspect, in a first possible implementation manner of the third aspect, the voice attribute information of the component is a text content corresponding to a voice instruction that triggers the component, and the action attribute information of the component is triggered by the An operation performed after the component; the context attribute information of the component is an operation state when the voice instruction of the component is in effect, and the operation state includes a global state, an application state, or a page state. 
With reference to the third aspect, or the first possible implementation manner of the third aspect, in a second possible implementation manner of the third aspect, the processor is further configured to obtain a current first running state of the terminal The processor is specifically configured to identify, by using a voice engine, the first text content corresponding to the voice instruction, and match the first running state and the first text content with the voice UI resource to obtain Determining, by the first operating state, the first context attribute information and the first voice attribute information corresponding to the first text content, obtaining first action attribute information corresponding to the first context attribute information and the first voice attribute information, The operation corresponding to the first action attribute information is used as an action instruction corresponding to the voice instruction; or the first text content corresponding to the voice instruction is recognized by a voice engine; and the first text content and the voice are Matching the UI resources, obtaining first voice attribute information and first context attribute information corresponding to the first text content; when the first transport Row status When the first context attribute information is consistent, the first action attribute information corresponding to the first voice attribute information is obtained, and the operation corresponding to the first action attribute information is used as an action instruction corresponding to the voice instruction. . 
With reference to the third aspect, or the first possible implementation manner of the third aspect, or the second possible implementation manner of the third aspect, in a third possible implementation manner of the third aspect, the microphone is specifically configured to receive a voice instruction of the user for opening the first application, or receive a voice instruction of the user for a further operation on the first application on a page displayed after the first application is opened. With reference to the third aspect, or the first possible implementation manner of the third aspect, or the second possible implementation manner of the third aspect, or the third possible implementation manner of the third aspect, in a fourth possible implementation manner of the third aspect, the processor is further configured to output options of at least two applications when an application-opening voice instruction received by the microphone corresponds to the at least two applications; and the microphone is specifically configured to receive a voice instruction of the user for a first application selected from the at least two applications according to the output options.
Alternatively, the microphone is specifically configured to receive an application-opening voice instruction of the user for the first application, where the first application is the application with the highest preset priority among the at least two applications corresponding to the application-opening voice instruction. In the embodiments of the present invention, when a voice instruction of a user for a first application is received, the voice instruction is matched with a voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application, and an operation corresponding to the action instruction is performed on the first application. The embodiments of the present invention extend the processing capability of the voice assistant framework in a terminal: because voice attribute information, action attribute information, and context attribute information are added for different components within each application, the terminal can obtain the voice UI resource of an application after parsing the application, and when a voice instruction for the application is received, a corresponding action instruction can be obtained by matching the voice UI resource of the application.
In this way, various third-party applications can be operated by voice, which satisfies a user's need to install an application at any time and use voice interaction at any time, and improves the use experience of the terminal user.
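The application-selection behavior described in these implementation manners (outputting options for the user to choose among, or falling back to a preset priority) can be sketched purely as an illustration; the function and data names below are hypothetical and not part of the patent:

```python
# Hypothetical sketch: choosing the first application when an
# application-opening voice instruction maps to several candidates.

def candidates_for(command, registry):
    """Return all applications registered for an application-opening command."""
    return [app for app, commands in registry.items() if command in commands]

def pick_first_application(command, registry, priorities, user_choice=None):
    apps = candidates_for(command, registry)
    if len(apps) <= 1:
        return apps[0] if apps else None
    if user_choice is not None:          # branch 1: options were output, user selects
        return user_choice if user_choice in apps else None
    # branch 2: fall back to the application with the highest preset priority
    return max(apps, key=lambda app: priorities.get(app, 0))

registry = {"Messages": {"send text message"}, "DailyChat": {"send text message"}}
priorities = {"Messages": 2, "DailyChat": 1}

assert pick_first_application("send text message", registry, priorities) == "Messages"
assert pick_first_application("send text message", registry, priorities,
                              user_choice="DailyChat") == "DailyChat"
```

Both branches resolve the ambiguity before any matching against a voice UI resource takes place.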
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts. FIG. 1 is a flowchart of an embodiment of a voice control method according to the present invention; FIG. 2 is a flowchart of another embodiment of the voice control method according to the present invention; FIG. 3 is a block diagram of an embodiment of a voice control apparatus according to the present invention; FIG. 4 is a block diagram of another embodiment of the voice control apparatus according to the present invention; FIG. 5 is a block diagram of another embodiment of the voice control apparatus according to the present invention; FIG. 6 is a block diagram of an embodiment of a terminal according to the present invention.
DESCRIPTION OF EMBODIMENTS
To make a person skilled in the art understand the technical solutions in the embodiments of the present invention better, and make the foregoing objectives, features, and advantages of the embodiments of the present invention clearer, the following further describes the technical solutions in the embodiments of the present invention in detail with reference to the accompanying drawings. Referring to FIG. 1, which is a flowchart of an embodiment of a voice control method according to the present invention: Step 101: Receive a voice instruction of a user for a first application. A terminal can generally obtain a voice instruction sent by a user through a microphone disposed on the terminal. In this embodiment, when the user wants to operate a first application installed in the terminal, the user may send a voice instruction to the terminal.
A voice instruction for the first application may include a voice instruction for opening the first application; for example, when the first application is a mail application, the voice instruction of the user for the first application may be a voice instruction for opening the mail application. Alternatively, a voice instruction for the first application may also include a voice instruction for a further operation on the first application on a page displayed after the first application is opened; for example, when the first application is a mail application, the voice instruction for the first application may be, after the mail application is opened, a voice instruction for forwarding a mail or replying to a mail on the mail viewing page. In this embodiment, if the terminal receives an application-opening voice instruction sent by the user, and the application-opening voice instruction corresponds to at least two applications, the terminal may output options of the at least two applications through a display interface, and receive a voice instruction of the user for a first application selected from the at least two applications. For example, the application-opening voice instruction sent by the user is "send a text message", and both the short message application in the terminal and an installed daily chat application can implement the function of sending a text message; the terminal may then output the options "short message application" and "daily chat application", and the user may select one application from these options as the first application. Assuming the user selects "daily chat application", the user simply sends a voice instruction for the daily chat application. Alternatively, if the terminal receives an application-opening voice instruction sent by the user, and the application-opening voice instruction corresponds to at least two applications, the terminal may select, according to preset priorities of the at least two applications, the application with the highest priority as the first application.
For example, the application-opening voice instruction sent by the user is "send a text message", both the short message application in the terminal and the installed daily chat application can implement the function of sending a text message, and the priority of the "short message application" is higher than that of the "daily chat application"; the terminal then takes the "short message application" as the first application according to the priorities. Step 102: Match the voice instruction with the voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application. In this embodiment, each application may pre-define a voice user interface (UI) resource, where the voice UI resource may include voice attribute information of each component in the application
(VoiceCommandText), action attribute information (VoiceCommandAction), and context attribute information (VoiceCommandContext). The components of an application may include a LayOut component that starts the application, and various controls available after the application is started, for example, a button (Button) and a check box
(CheckBox). The voice attribute information of a component is the text content corresponding to a voice instruction that triggers the component; the action attribute information of a component is an operation performed after the component is triggered; and the context attribute information of a component is a running state in which a voice instruction of the component takes effect, where the running state includes a global state, an application state, or a page state.
The global state means that a voice instruction of the component received by the terminal in any running state can take effect; the application state means that a voice instruction of the component can take effect only when it is received while the currently opened application is running; and the page state means that a voice instruction of the component can take effect only when it is received on a particular page of an application. The terminal may parse the voice UI resource of each application to obtain the voice attribute information, action attribute information, and context attribute information of the different components in the application. It should be noted that, in the embodiments of the present invention, the terminal may parse the voice UI resource of an application when the application is installed, or may parse the voice UI resource of an application when the application is used for the first time; this is not limited in the embodiments of the present invention. The terminal may save, to the voice engine, the correspondence between the voice attribute information, action attribute information, and context attribute information of each component of the parsed application and the component name of each component. For the same type of component, there may be multiple different component instances; different component instances of the same component type are distinguished by component name, that is, the component name of each component instance corresponds to the voice attribute information (VoiceCommandText), action attribute information (VoiceCommandAction), and context attribute information (VoiceCommandContext) of that component instance. After the terminal receives a voice instruction, the terminal can obtain its current first running state and identify, by using the voice engine, first text content corresponding to the voice instruction.
Then, the first running state and the first text content are matched with the voice UI resource of the first application to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content, first action attribute information corresponding to the first context attribute information and the first voice attribute information is obtained, and the operation corresponding to the first action attribute information is used as the action instruction corresponding to the voice instruction sent by the user. Alternatively, the first text content may first be matched with the voice UI resource of the first application to obtain first voice attribute information and first context attribute information corresponding to the first text content; when the first running state is consistent with the first context attribute information, first action attribute information corresponding to the first context attribute information and the first voice attribute information is obtained, and the operation corresponding to the first action attribute information is used as the action instruction corresponding to the voice instruction. Step 103: Perform, on the first application, an operation corresponding to the action instruction. It can be seen from the foregoing embodiment that this embodiment extends the processing capability of the voice assistant framework in the terminal. Because voice attribute information, action attribute information, and context attribute information are added for different components within each application, the terminal can obtain the voice UI resource of an application after parsing the application; when a voice instruction for the application is received, a corresponding action instruction can be obtained by matching the voice UI resource of the application. In this way, various third-party applications can be operated by voice, which satisfies a user's need to install an application at any time and use voice interaction at any time, and improves the use experience of the terminal user.
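The two matching orders described in this embodiment can be sketched as follows; this is an illustrative Python sketch with hypothetical field names, not part of the patent:

```python
# Sketch of the two matching orders: state and text matched together,
# versus text matched first with a subsequent state-consistency check.

def match_state_first(state, text, resource):
    """Match the running state and the text content against the resource together."""
    for comp in resource:
        if comp["context"] == state and comp["text"] == text:
            return comp["action"]
    return None

def match_text_first(state, text, resource):
    """First match the text to obtain the attributes, then check state consistency."""
    for comp in resource:
        if comp["text"] == text:             # text -> voice/context attributes
            if comp["context"] == state:     # running state consistent with context
                return comp["action"]
    return None

resource = [
    {"text": "open email application", "context": "global", "action": "Open"},
    {"text": "next page", "context": "page", "action": "onClick"},
]

for match in (match_state_first, match_text_first):
    assert match("global", "open email application", resource) == "Open"
    assert match("application", "next page", resource) is None  # state mismatch
```

Both orders yield the same action instruction; they differ only in when the running state is consulted.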
Referring to FIG. 2, which is a flowchart of another embodiment of the voice control method according to the present invention: Step 201: The terminal obtains a voice UI resource of an application. In this embodiment, each application may pre-define a voice user interface (UI) resource, where the voice UI resource may include voice attribute information of each component in the application
(VoiceCommandText), action attribute information (VoiceCommandAction), and context attribute information (VoiceCommandContext). The components of an application may include a LayOut component that starts the application, and various controls available after the application is started, for example, a button (Button) and a check box
(CheckBox). The voice attribute information of a component is the text content corresponding to a voice instruction that triggers the component; the action attribute information of a component is an operation performed after the component is triggered; and the context attribute information of a component is a running state in which a voice instruction of the component takes effect, where the running state includes a global state, an application state, or a page state. The global state means that a voice instruction of the component received by the terminal in any running state can take effect; the application state means that a voice instruction of the component can take effect only when it is received while the currently opened application is running; and the page state means that a voice instruction of the component can take effect only when it is received on a particular page of an application.
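How the three context states scope the moment at which a component's voice instruction takes effect can be sketched as follows; the function and its parameters are illustrative assumptions, not from the patent:

```python
# Illustrative check of whether a component's voice instruction takes effect,
# given the terminal's current situation and the component's context attribute.

def instruction_effective(context_attr, current_app, current_page,
                          owner_app, owner_page):
    if context_attr == "global":       # effective in any running state
        return True
    if context_attr == "application":  # effective while the owning app is running
        return current_app == owner_app
    if context_attr == "page":         # effective only on the owning page
        return current_app == owner_app and current_page == owner_page
    return False

# A "next page" button belonging to the mail browsing page (context "page"):
assert instruction_effective("page", "email", "browse", "email", "browse")
assert not instruction_effective("page", "email", "edit", "email", "browse")
# A global application-opening instruction takes effect anywhere:
assert instruction_effective("global", None, None, "email", "browse")
```

The three states thus form progressively narrower scopes: global, then a single application, then a single page of that application.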
When an application is installed in the terminal, or after an application is opened in the terminal, the terminal may parse the voice UI resource of the application to obtain the voice attribute information, action attribute information, and context attribute information of the different components in the application, and save, to the voice engine, the correspondence between the voice attribute information, action attribute information, and context attribute information of each component and the component name of each component. For the same type of component, there may be multiple different component instances; different component instances of the same component type are distinguished by component name, that is, the component name of each component instance corresponds to the voice attribute information (VoiceCommandText), action attribute information (VoiceCommandAction), and context attribute information (VoiceCommandContext) of that component instance. For example, button components may include a "next page" button, a "previous page" button, and so on. For example, for the LayOut component of an email application, its voice attribute information
(VoiceCommandText) can be defined as "VoiceCommandText=Start Email Application", and its action attribute information (VoiceCommandAction) can be defined as
"VoiceCommandAction=Open", and its context attribute information (VoiceCommandContext) can be defined as "VoiceCommandContext=global", that is, a voice instruction for opening the email application received by the terminal in any running state can take effect. For another example, for the "next page" Button component on the mail browsing page displayed after the email application is opened, its voice attribute information
(VoiceCommandText) can be defined as "VoiceCommandText=next page", its action attribute information (VoiceCommandAction) can be defined as "VoiceCommandAction=onClick", and its context attribute information (VoiceCommandContext) can be defined as
"VoiceCommandContext=page", that is, the "next page" voice instruction takes effect only when it is received while the terminal is browsing the mail page, and does not take effect when, for example, the "next page" instruction is received on the mail editing page. Step 202: The terminal receives a voice instruction of a user for a first application. The terminal can generally obtain a voice instruction sent by the user through a microphone disposed on the terminal. In this embodiment, when the user wants to operate the first application installed in the terminal, the user may send a voice instruction to the terminal.
A voice instruction for the first application may include a voice instruction for opening the first application; for example, when the first application is a mail application, the voice instruction of the user for the first application may be a voice instruction for opening the mail application. A voice instruction for the first application may also include a voice instruction for a further operation on the first application on a page displayed after the first application is opened; for example, after the mail application is opened, a voice instruction for forwarding a mail or replying to a mail on the mail viewing page. Step 203: Obtain the current first running state of the terminal. In this embodiment, still taking the case in which the first application is an email application as an example, assume that a voice instruction of the user for opening the email application is received in step 202; the voice instruction may specifically be "open email", "start email", or the like. At this point, the terminal obtains the current first running state, where the first running state means that the terminal is currently in the global state, the application state, or the page state of an application. Step 204: Identify, by using the voice engine, first text content corresponding to the voice instruction. In this embodiment, the voice engine recognizes the voice instruction in a semantic recognition manner, and the semantic recognition manner performs fuzzy recognition on the voice instruction; for example, regardless of whether the voice instruction sent by the user is "open email" or "start email", semantic analysis can determine that the first text content corresponding to the voice instruction is "open email application". Step 205: Match the first running state and the first text content with the voice UI resource of the first application to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content.
According to step 201, the voice engine saves the correspondence between the voice attribute information, action attribute information, and context attribute information of each component of the application and the component name of each component. Therefore, in this step, after the first running state and the first text content are obtained, they can be matched in the correspondence to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content. For example, when the first text content corresponding to the voice instruction is "open email application" and the current first running state of the terminal is the global state, the voice engine matches the saved correspondence and obtains VoiceCommandContext and VoiceCommandText as "global" and "open email application" respectively. Step 206: Obtain first action attribute information corresponding to the first context attribute information and the first voice attribute information, and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction. Following step 205, after VoiceCommandContext and VoiceCommandText are determined to be "global" and "open email application", the VoiceCommandAction of the corresponding component, "Open", can be obtained through the voice engine; that is, the action instruction corresponding to the user's voice instruction "open email" or "start email" is the operation triggered by "Open", namely opening the mail application.
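The email-application example in steps 203 to 206 can be put together in one illustrative sketch, where the component names, the correspondence table, and the recognizer stub are all hypothetical: parsed attributes are stored keyed by component name, and the recognized text plus the running state select the action:

```python
# Illustrative end-to-end run of the email example (all names hypothetical).
voice_ui = {
    # component name -> (VoiceCommandText, VoiceCommandAction, VoiceCommandContext)
    "email_layout":     ("open email application", "Open",    "global"),
    "next_page_button": ("next page",              "onClick", "page"),
}

def recognize(utterance):
    """Stand-in for the voice engine's fuzzy semantic recognition."""
    synonyms = {"open email": "open email application",
                "start email": "open email application"}
    return synonyms.get(utterance, utterance)

def resolve(utterance, running_state):
    text = recognize(utterance)                          # step 204
    for name, (t, action, context) in voice_ui.items():  # steps 205 and 206
        if t == text and context == running_state:
            return action
    return None

assert resolve("open email", "global") == "Open"    # step 203: global state
assert resolve("start email", "global") == "Open"   # fuzzy recognition
assert resolve("next page", "page") == "onClick"
assert resolve("next page", "global") is None       # page-scoped instruction
```

Note that "open email" and "start email" both resolve to the same action because the recognizer normalizes them to one text content before matching, mirroring the fuzzy semantic recognition described above.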
It should be noted that, in addition to the matching manner of the voice UI resource shown in the foregoing step 205 and step 206, in an actual application, the first text content may first be matched with the voice UI resource of the first application to obtain first voice attribute information and first context attribute information corresponding to the first text content; when the first running state is consistent with the first context attribute information, first action attribute information corresponding to the first context attribute information and the first voice attribute information is obtained, and the operation corresponding to the first action attribute information is used as the action instruction corresponding to the voice instruction. Which matching manner of the voice UI resource is used in an actual application is not limited in the embodiments of the present invention. Step 207: Perform, on the first application, an operation corresponding to the action instruction. It can be seen from the foregoing embodiment that this embodiment extends the processing capability of the voice assistant framework in the terminal. Because voice attribute information, action attribute information, and context attribute information are added for different components within each application, the terminal can obtain the voice UI resource of an application after parsing the application; when a voice instruction for the application is received, a corresponding action instruction can be obtained by matching the voice UI resource of the application. In this way, various third-party applications can be operated by voice, which satisfies a user's need to install an application at any time and use voice interaction at any time, and improves the use experience of the terminal user. Corresponding to the embodiments of the voice control method of the present invention, the present invention further provides embodiments of a voice control apparatus and a terminal. Referring to FIG.
3, which is a block diagram of an embodiment of a voice control apparatus according to the present invention. The apparatus includes a receiving unit 310, a matching unit 320, and an executing unit 330. The receiving unit 310 is configured to receive a voice instruction of a user for a first application. The matching unit 320 is configured to match the voice instruction received by the receiving unit 310 with a voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes voice attribute information, action attribute information, and context attribute information of each component of the first application. The executing unit 330 is configured to perform, on the first application, an operation corresponding to the action instruction obtained by the matching unit 320. Optionally, the receiving unit 310 may be specifically configured to receive a voice instruction of the user for opening the first application, or receive a voice instruction of the user for a further operation on the first application on a page displayed after the first application is opened. The voice attribute information of a component is the text content corresponding to a voice instruction that triggers the component; the action attribute information of a component is an operation performed after the component is triggered; and the context attribute information of a component is a running state in which a voice instruction of the component takes effect, where the running state includes a global state, an application state, or a page state. Optionally, the receiving unit 310 may be specifically configured to receive an application-opening voice instruction of the user for the first application, where the first application is the application with the highest preset priority among at least two applications corresponding to the application-opening voice instruction. Referring to FIG.
4, which is a block diagram of another embodiment of the voice control apparatus according to the present invention. The apparatus includes a parsing unit 410, a saving unit 420, a receiving unit 430, an obtaining unit 440, a matching unit 450, and an executing unit 460. The parsing unit 410 is configured to obtain voice attribute information, action attribute information, and context attribute information of different components of a first application by parsing the first application. The saving unit 420 is configured to save a voice UI resource of the first application to a voice engine, where the voice UI resource includes the voice attribute information, action attribute information, and context attribute information of each component of the first application obtained by the parsing unit 410. The receiving unit 430 is configured to receive a voice instruction of a user for the first application. The obtaining unit 440 is configured to obtain the current first running state of the terminal after the receiving unit 430 receives the voice instruction. The matching unit 450 is configured to match the first running state obtained by the obtaining unit 440 and the voice instruction received by the receiving unit 430 with the voice UI resource of the first application saved by the saving unit 420 to obtain an action instruction corresponding to the voice instruction. The executing unit 460 is configured to perform, on the first application, an operation corresponding to the action instruction obtained by the matching unit 450. In an optional implementation manner, the matching unit 450 may include (not shown in FIG.
4): a first instruction identification subunit, configured to identify, by using a voice engine, first text content corresponding to the voice instruction; a first information matching subunit, configured to match the first running state obtained by the obtaining unit 440 and the first text content recognized by the first instruction identification subunit with the voice UI resource of the first application saved by the saving unit 420, to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content; and a first instruction obtaining subunit, configured to obtain first action attribute information corresponding to the first context attribute information and the first voice attribute information obtained by the first information matching subunit, and to use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction. In another optional implementation manner, the matching unit 450 may also include 
(not shown in FIG. 4): a second instruction identification subunit, configured to identify, by using a voice engine, first text content corresponding to the voice instruction; a second information matching subunit, configured to match the first text content recognized by the second instruction identification subunit with the voice UI resource of the first application saved by the saving unit 420, to obtain first voice attribute information and first context attribute information corresponding to the first text content; and a second instruction obtaining subunit, configured to: when the first running state is consistent with the first context attribute information obtained by the second information matching subunit, obtain first action attribute information corresponding to the first context attribute information and the first voice attribute information, and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction. Optionally, the receiving unit 430 may be specifically configured to receive a voice instruction by which the user opens the first application, or to receive a voice instruction for a further operation on the first application on a page displayed after the first application is opened. The voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of a component is the operation performed after the component is triggered; the context attribute information of a component is the running state in which the component's voice instruction takes effect, where the running state includes a global state, an application state, or a page state. 
Optionally, the receiving unit 430 may be specifically configured to receive an application-open voice instruction from the user for the first application, where the first application is the application with the highest preset priority among at least two applications corresponding to the application-open voice instruction. Referring to FIG. 5, which is a block diagram of another embodiment of a voice control apparatus according to the present invention, the apparatus includes: a parsing unit 510, a saving unit 520, an output unit 530, a receiving unit 540, a matching unit 550, and an executing unit 560. The parsing unit 510 is configured to obtain the voice attribute information, action attribute information, and context attribute information of the different components of a first application by parsing the first application. The saving unit 520 is configured to save the voice UI resource of the first application to a voice engine, where the voice UI resource includes the voice attribute information, action attribute information, and context attribute information of each component of the first application obtained by the parsing unit 510. The output unit 530 is configured to output options for at least two applications when an application-open voice instruction received from the user corresponds to the at least two applications. The receiving unit 540 is configured to receive the user's voice instruction for a first application selected from the at least two applications according to the options output by the output unit 530. The matching unit 550 is configured to match the voice instruction received by the receiving unit 540 with the voice UI resource of the first application saved by the saving unit 520, to obtain an action instruction corresponding to the voice instruction. The executing unit 560 is configured to perform, on the first application, the operation corresponding to the action instruction obtained by the matching unit 550. 
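As an illustrative sketch only (the disclosure does not specify any implementation, and all names below are hypothetical), the per-component voice UI resource described above, consisting of a voice attribute (trigger text), an action attribute (operation to perform), and a context attribute (running state in which the instruction takes effect), could be modeled like this:

```python
from dataclasses import dataclass
from enum import Enum


class RunningState(Enum):
    """The three context scopes named in the disclosure."""
    GLOBAL = "global"            # instruction valid anywhere on the terminal
    APPLICATION = "application"  # valid while the application is running
    PAGE = "page"                # valid only on a specific page


@dataclass
class VoiceComponent:
    """One UI component's entry in an application's voice UI resource."""
    voice_attr: str              # text content of the triggering voice instruction
    action_attr: str             # identifier of the operation performed when triggered
    context_attr: RunningState   # running state in which the instruction takes effect


# A voice UI resource is then simply the set of such entries for one application.
music_app_resource = [
    VoiceComponent("play", "action_play", RunningState.APPLICATION),
    VoiceComponent("next song", "action_next", RunningState.PAGE),
]
```

A parsing unit in the sense of FIG. 4 or FIG. 5 would build such a list for each installed application and hand it to the voice engine.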
Optionally, the receiving unit 540 may be specifically configured to receive a voice instruction by which the user opens the first application, or to receive a voice instruction for a further operation on the first application on a page displayed after the first application is opened. The voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of a component is the operation performed after the component is triggered; the context attribute information of a component is the running state in which the component's voice instruction takes effect, where the running state includes a global state, an application state, or a page state. Referring to FIG. 6, which is a block diagram of an embodiment of a terminal according to the present invention, the terminal includes: a microphone 610, a memory 620, and a processor 630. The memory 620 is configured to store a voice engine; the microphone 610 is configured to receive a user's voice instruction; and the processor 630 is configured to: after the microphone 610 receives the user's voice instruction for a first application, match the voice instruction with the voice user interface (UI) resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes the voice attribute information, action attribute information, and context attribute information of each component of the first application, and perform, on the first application, the operation corresponding to the action instruction. In an optional implementation manner, the microphone 610 may be specifically configured to receive a voice instruction by which the user opens the first application, or to receive a voice instruction for a further operation on the first application on a page displayed after the first application is opened. 
In another optional implementation manner, the processor 630 may be further configured to obtain the current first running state of the terminal. The processor 630 may be specifically configured to: identify, by using the voice engine, first text content corresponding to the voice instruction; match the first running state and the first text content with the voice UI resource to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content; obtain first action attribute information corresponding to the first context attribute information and the first voice attribute information; and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction. Alternatively, the processor 630 may be configured to: identify, by using the voice engine, first text content corresponding to the voice instruction; match the first text content with the voice UI resource to obtain first voice attribute information and first context attribute information corresponding to the first text content; and, when the first running state is consistent with the first context attribute information, obtain first action attribute information corresponding to the first voice attribute information and use the operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction. In another optional implementation manner, the processor 630 may be further configured to output options for at least two applications when an application-open voice instruction received from the user through the microphone corresponds to the at least two applications; and the microphone 610 may be specifically configured to receive the user's voice instruction for a first application selected from the at least two applications according to the options. 
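The second matching order described above (recognize the text first, then confirm the terminal's running state against the component's context attribute) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the resource entries, state strings, and function name are all assumptions.

```python
# Each entry: (voice_attr, action_attr, context_attr). Names are hypothetical.
MUSIC_RESOURCE = [
    ("play", "action_play", "application"),
    ("next song", "action_next", "page:now_playing"),
]


def match_instruction(first_text, first_running_state, resource):
    """Match recognized text against the voice attributes, then confirm that
    the terminal's current running state is consistent with the matching
    component's context attribute before returning its action attribute."""
    for voice_attr, action_attr, context_attr in resource:
        if voice_attr == first_text and context_attr == first_running_state:
            return action_attr
    return None  # no action instruction: text or running state did not match
```

Under this sketch, "play" yields `action_play` only while the application state is active; the same text in a different state yields no action instruction, which mirrors the consistency check the processor 630 performs.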
In another optional implementation manner, the microphone 610 may be specifically configured to receive an application-open voice instruction from the user for the first application, where the first application is the application with the highest preset priority among at least two applications corresponding to the application-open voice instruction. In the above embodiments, the voice attribute information of a component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of a component is the operation performed after the component is triggered; and the context attribute information of a component is the running state matching the execution of the component, where the running state includes a global state, an application state, or a page state. It can be seen from the above embodiments that, when a user's voice instruction for a first application is received, the voice instruction is matched with the voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction, where the voice UI resource of the first application includes the voice attribute information, action attribute information, and context attribute information of each component of the first application, and the operation corresponding to the action instruction is performed on the first application. The embodiments of the present invention extend the processing capability of the voice assistant framework in the terminal: because voice attribute information, action attribute information, and context attribute information are added for the different components within each application, after parsing an application the terminal can obtain the application's voice UI resource. 
When a voice instruction for the application is received, the corresponding action instruction can be obtained by matching against the application's voice UI resource, so that various third-party applications can be operated by voice. This satisfies the user's need to install an application at any time and immediately use voice interaction with it, improving the end-user experience. It will be apparent to those skilled in the art that the techniques in the embodiments of the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention. The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments are described relatively simply because they are basically similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The embodiments of the present invention described above do not limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
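The two disambiguation strategies the embodiments describe for application-open instructions, presenting options for every matching application or directly selecting the one with the highest preset priority, could be sketched as below. The registry layout, priority representation, and function name are assumptions for illustration, not part of the disclosure.

```python
def resolve_open_command(command, registry):
    """Disambiguate an application-open voice instruction.

    `registry` maps each open command to a list of (app_name, preset_priority)
    pairs. When several applications match, the terminal may either present
    the options to the user (strategy 1) or directly pick the application
    with the highest preset priority (strategy 2).
    """
    candidates = registry.get(command, [])
    if len(candidates) <= 1:
        return {"open": candidates[0][0] if candidates else None}
    return {
        "options": [name for name, _ in candidates],     # strategy 1: ask the user
        "open": max(candidates, key=lambda c: c[1])[0],  # strategy 2: highest priority
    }


registry = {
    "open music": [("MusicA", 2), ("MusicB", 5)],
    "open mail": [("Mail", 1)],
}
```

With this sketch, "open music" maps to two applications, so the terminal can either output `["MusicA", "MusicB"]` as options or open `MusicB` directly by priority, while "open mail" resolves without disambiguation.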

Claims

1. A voice control method, characterized in that the method comprises: receiving a user's voice instruction for a first application; matching the voice instruction with a voice user interface (UI) resource of the first application to obtain an action instruction corresponding to the voice instruction, wherein the voice UI resource of the first application comprises voice attribute information, action attribute information, and context attribute information of each component of the first application; and performing, on the first application, an operation corresponding to the action instruction.
2. The method according to claim 1, characterized in that: the voice attribute information of the component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is the operation performed after the component is triggered; and the context attribute information of the component is the running state in which the voice instruction of the component takes effect, wherein the running state comprises a global state, an application state, or a page state.
3. The method according to claim 1 or 2, characterized in that: after the receiving a user's voice instruction for a first application, the method further comprises: obtaining a current first running state of the terminal; and the matching the voice instruction with the voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction comprises: identifying, by using a voice engine, first text content corresponding to the voice instruction; matching the first running state and the first text content with the voice UI resource to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content; obtaining first action attribute information corresponding to the first context attribute information and the first voice attribute information; and using an operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction; or identifying, by using a voice engine, first text content corresponding to the voice instruction; matching the first text content with the voice UI resource to obtain first voice attribute information and first context attribute information corresponding to the first text content; and, when the first running state is consistent with the first context attribute information, obtaining first action attribute information corresponding to the first voice attribute information, and using an operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction.
4. The method according to any one of claims 1 to 3, characterized in that the receiving a user's voice instruction for a first application comprises: receiving a voice instruction by which the user opens the first application; or receiving a voice instruction for a further operation performed by the user on the first application on a page displayed after the first application is opened.
5. The method according to any one of claims 1 to 4, characterized in that: before the receiving a user's voice instruction for a first application, the method further comprises: when an application-open voice instruction received from the user corresponds to at least two applications, outputting options for the at least two applications; and the receiving a user's voice instruction for the first application comprises: receiving the user's voice instruction for a first application selected from the at least two applications according to the options; or the receiving a user's voice instruction for a first application comprises: receiving an application-open voice instruction from the user for the first application, wherein the first application is the application with the highest preset priority among at least two applications corresponding to the application-open voice instruction.
6. A voice control apparatus, characterized in that the apparatus comprises: a receiving unit, configured to receive a user's voice instruction for a first application; a matching unit, configured to match the voice instruction received by the receiving unit with a voice UI resource of the first application to obtain an action instruction corresponding to the voice instruction, wherein the voice UI resource of the first application comprises voice attribute information, action attribute information, and context attribute information of each component of the first application; and an executing unit, configured to perform, on the first application, an operation corresponding to the action instruction obtained by the matching unit.
7. The apparatus according to claim 6, characterized in that: the voice attribute information of the component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is the operation performed after the component is triggered; and the context attribute information of the component is the running state in which the voice instruction of the component takes effect, wherein the running state comprises a global state, an application state, or a page state.
8. The apparatus according to claim 6 or 7, characterized in that: the apparatus further comprises an obtaining unit, configured to obtain a current first running state of the terminal after the receiving unit receives the voice instruction; and the matching unit comprises: a first instruction identification subunit, configured to identify, by using a voice engine, first text content corresponding to the voice instruction; a first information matching subunit, configured to match the first running state obtained by the obtaining unit and the first text content recognized by the first instruction identification subunit with the voice UI resource, to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content; and a first instruction obtaining subunit, configured to obtain first action attribute information corresponding to the first context attribute information and the first voice attribute information obtained by the first information matching subunit, and to use an operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction; or a second instruction identification subunit, configured to identify, by using a voice engine, first text content corresponding to the voice instruction; a second information matching subunit, configured to match the first text content recognized by the second instruction identification subunit with the voice UI resource, to obtain first voice attribute information and first context attribute information corresponding to the first text content; and a second instruction obtaining subunit, configured to: when the first running state is consistent with the first context attribute information obtained by the second information matching subunit, obtain first action attribute information corresponding to the first context attribute information and the first voice attribute information, and use an operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction.
9. The apparatus according to any one of claims 6 to 8, characterized in that the receiving unit is specifically configured to receive a voice instruction by which the user opens the first application, or to receive a voice instruction for a further operation performed by the user on the first application on a page displayed after the first application is opened.
10. The apparatus according to any one of claims 6 to 9, characterized in that: the apparatus further comprises an output unit, configured to output options for at least two applications when an application-open voice instruction received from the user corresponds to the at least two applications, and the receiving unit is specifically configured to receive the user's voice instruction for a first application selected from the at least two applications according to the options; or the receiving unit is specifically configured to receive an application-open voice instruction from the user for the first application, wherein the first application is the application with the highest preset priority among at least two applications corresponding to the application-open voice instruction.
11. A terminal, characterized in that the terminal comprises: a microphone, a memory, and a processor, wherein: the memory is configured to store a voice engine; the microphone is configured to receive a user's voice instruction; and the processor is configured to: after the microphone receives the user's voice instruction for a first application, match the voice instruction with a voice user interface (UI) resource of the first application to obtain an action instruction corresponding to the voice instruction, wherein the voice UI resource of the first application comprises voice attribute information, action attribute information, and context attribute information of each component of the first application, and perform, on the first application, an operation corresponding to the action instruction.
12. The terminal according to claim 11, characterized in that: the voice attribute information of the component is the text content corresponding to the voice instruction that triggers the component; the action attribute information of the component is the operation performed after the component is triggered; and the context attribute information of the component is the running state in which the voice instruction of the component takes effect, wherein the running state comprises a global state, an application state, or a page state.
13. The terminal according to claim 11 or 12, characterized in that: the processor is further configured to obtain a current first running state of the terminal; and the processor is specifically configured to: identify, by using a voice engine, first text content corresponding to the voice instruction; match the first running state and the first text content with the voice UI resource to obtain first context attribute information and first voice attribute information corresponding to the first running state and the first text content; obtain first action attribute information corresponding to the first context attribute information and the first voice attribute information; and use an operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction; or identify, by using a voice engine, first text content corresponding to the voice instruction; match the first text content with the voice UI resource to obtain first voice attribute information and first context attribute information corresponding to the first text content; and, when the first running state is consistent with the first context attribute information, obtain first action attribute information corresponding to the first voice attribute information, and use an operation corresponding to the first action attribute information as the action instruction corresponding to the voice instruction.
14. The terminal according to any one of claims 11 to 13, characterized in that the microphone is specifically configured to receive a voice instruction by which the user opens the first application, or to receive a voice instruction for a further operation performed by the user on the first application on a page displayed after the first application is opened.
15. The terminal according to any one of claims 11 to 14, characterized in that: the processor is further configured to output options for at least two applications when an application-open voice instruction received from the user through the microphone corresponds to the at least two applications, and the microphone is specifically configured to receive the user's voice instruction for a first application selected from the at least two applications according to the options; or the microphone is specifically configured to receive an application-open voice instruction from the user for the first application, wherein the first application is the application with the highest preset priority among at least two applications corresponding to the application-open voice instruction.
PCT/CN2014/083505 2013-08-26 2014-08-01 Language control method, device and terminal WO2015027789A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2013103755723A CN103442138A (en) 2013-08-26 2013-08-26 Voice control method, device and terminal
CN201310375572.3 2013-08-26

Publications (1)

Publication Number Publication Date
WO2015027789A1 true WO2015027789A1 (en) 2015-03-05

Family

ID=49695800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/083505 WO2015027789A1 (en) 2013-08-26 2014-08-01 Language control method, device and terminal

Country Status (2)

Country Link
CN (1) CN103442138A (en)
WO (1) WO2015027789A1 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103442138A (en) * 2013-08-26 2013-12-11 华为终端有限公司 Voice control method, device and terminal
CN103841243A (en) * 2014-02-28 2014-06-04 深圳市中兴移动通信有限公司 Voice saving method and mobile terminal with voice saving function
US9589567B2 (en) * 2014-06-11 2017-03-07 Honeywell International Inc. Plant control system using voice as a control mechanism
CN104184890A (en) * 2014-08-11 2014-12-03 联想(北京)有限公司 Information processing method and electronic device
CN105575390A (en) * 2014-10-23 2016-05-11 中兴通讯股份有限公司 Voice control method and device
CN104318924A (en) * 2014-11-12 2015-01-28 沈阳美行科技有限公司 Method for realizing voice recognition function
CN106469040B (en) 2015-08-19 2019-06-21 华为终端有限公司 Communication means, server and equipment
CN105161106A (en) * 2015-08-20 2015-12-16 深圳Tcl数字技术有限公司 Voice control method of intelligent terminal, voice control device and television system
CN105487668B (en) * 2015-12-09 2020-06-16 腾讯科技(深圳)有限公司 Display method and device of terminal equipment
CN106023994B (en) * 2016-04-29 2020-04-03 杭州华橙网络科技有限公司 Voice processing method, device and system
CN107452383B (en) * 2016-05-31 2021-10-26 华为终端有限公司 Information processing method, server, terminal and information processing system
US10282218B2 (en) * 2016-06-07 2019-05-07 Google Llc Nondeterministic task initiation by a personal assistant module
CN106098063B (en) * 2016-07-01 2020-05-22 海信集团有限公司 Voice control method, terminal device and server
CN106448668A (en) * 2016-10-10 2017-02-22 山东浪潮商用***有限公司 Method for speech recognition and devices
CN107507614B (en) * 2017-07-28 2018-12-21 北京小蓦机器人技术有限公司 Method, equipment, system and the storage medium of natural language instructions are executed in conjunction with UI
CN107610700A (en) * 2017-09-07 2018-01-19 唐冬香 A kind of terminal control method and system based on MEMS microphone
CN108470566B (en) * 2018-03-08 2020-09-15 腾讯科技(深圳)有限公司 Application operation method and device
CN109741737B (en) * 2018-05-14 2020-07-21 北京字节跳动网络技术有限公司 Voice control method and device
CN110534110B (en) * 2018-05-25 2022-04-15 深圳市优必选科技有限公司 Robot and method, device and circuit for improving voice interaction recognition rate of robot
CN110691160A (en) * 2018-07-04 2020-01-14 青岛海信移动通信技术股份有限公司 Voice control method and device and mobile phone
CN109086028A (en) * 2018-07-27 2018-12-25 重庆柚瓣家科技有限公司 Voice UI and its implementation
CN109584879B (en) 2018-11-23 2021-07-06 华为技术有限公司 Voice control method and electronic equipment
JP7229906B2 (en) * 2019-12-06 2023-02-28 Tvs Regza株式会社 Command controller, control method and control program
CN111292742A (en) * 2020-01-14 2020-06-16 京东数字科技控股有限公司 Data processing method and device, electronic equipment and computer storage medium
WO2021195897A1 (en) * 2020-03-30 2021-10-07 华为技术有限公司 Voice control method and smart terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541574A (en) * 2010-12-13 2012-07-04 鸿富锦精密工业(深圳)有限公司 Application program opening system and method
CN102622085A (en) * 2012-04-11 2012-08-01 北京航空航天大学 Multidimensional sense man-machine interaction system and method
CN102868827A (en) * 2012-09-15 2013-01-09 潘天华 Method of using voice commands to control start of mobile phone applications
CN103442138A (en) * 2013-08-26 2013-12-11 华为终端有限公司 Voice control method, device and terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101262700B1 (en) * 2011-08-05 2013-05-08 삼성전자주식회사 Method for Controlling Electronic Apparatus based on Voice Recognition and Motion Recognition, and Electric Apparatus thereof
US9256396B2 (en) * 2011-10-10 2016-02-09 Microsoft Technology Licensing, Llc Speech recognition for context switching
CN102520788B (en) * 2011-11-16 2015-01-21 歌尔声学股份有限公司 Voice identification control method
CN102830915A (en) * 2012-08-02 2012-12-19 聚熵信息技术(上海)有限公司 Semanteme input control system and method
CN103200329A (en) * 2013-04-10 2013-07-10 威盛电子股份有限公司 Voice control method, mobile terminal device and voice control system

Also Published As

Publication number Publication date
CN103442138A (en) 2013-12-11

Similar Documents

Publication Publication Date Title
WO2015027789A1 (en) Language control method, device and terminal
US9666190B2 (en) Speech recognition using loosely coupled components
US11676605B2 (en) Method, interaction device, server, and system for speech recognition
KR102334950B1 (en) Method and apparatus for managing background application
US9594496B2 (en) Method and apparatus for playing IM message
JP2020009459A (en) Voice control of interactive whiteboard appliances
WO2019154153A1 (en) Message processing method, unread message display method and computer terminal
KR102220945B1 (en) Apparatus and method for displaying an related contents information related the opponent party in terminal
KR20130112885A (en) Methods and apparatus for providing input to a speech-enabled application program
WO2016180260A1 (en) Method and apparatus for displaying instant messaging window and computer readable medium
CN102984050A (en) Method, client and system for searching voices in instant messaging
CN108933946B (en) Voice-control-based live broadcasting attention method, storage medium, electronic equipment and system
WO2020228033A1 (en) Sdk plug-in loading method and apparatus, and mobile terminal and storage medium
CN113094143A (en) Cross-application message sending method and device, electronic equipment and readable storage medium
CN112767936A (en) Voice conversation method, device, storage medium and electronic equipment
WO2021134237A1 (en) Video recording method and apparatus, and computer-readable storage medium
WO2017219446A1 (en) Icon processing method and device
WO2017032146A1 (en) File sharing method and apparatus
WO2018058895A1 (en) Terminal control method and apparatus based on rcs message
KR20150088532A (en) Apparatus for providing service during call and method for using the apparatus
WO2022213943A1 (en) Message sending method, message sending apparatus, electronic device, and storage medium
WO2016188227A1 (en) Intelligent terminal shortcut establishment method and device
WO2020192245A1 (en) Application starting method and apparatus, and computer system and medium
US20120203538A1 (en) Techniques for announcing conference attendance changes in multiple languages
CN103941961A (en) Prompting method, device and facility for application updating

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14839534

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14839534

Country of ref document: EP

Kind code of ref document: A1