CN112151028A - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
CN112151028A
CN112151028A (Application CN202010906497.9A)
Authority
CN
China
Prior art keywords
voice
module
state information
application
application state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010906497.9A
Other languages
Chinese (zh)
Inventor
唐明明
徐龙生
郑津杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruying Intelligent Technology Co ltd
Original Assignee
Beijing Ruying Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruying Intelligent Technology Co ltd
Priority: CN202010906497.9A
Publication: CN112151028A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/08: Speech classification or search
    • G10L2015/088: Word spotting
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice recognition method and device for processing voice commands more quickly under specific conditions. The method comprises the following steps: acquiring application state information about an application module while the application module is in a dormant state; receiving input voice while the application module is in a dormant state; and, when the application state information belongs to preset application state information, sending the voice as a voice command, together with the application state information, to a voice processing back end for voice recognition.

Description

Voice recognition method and device
Technical Field
The present invention relates to the field of computer and communication technologies, and in particular, to a method and an apparatus for speech recognition.
Background
Speech recognition and control technologies are a core foundation of artificial intelligence, and many smart devices now support voice control. Speaking a wake-up word to wake the smart device is the first step in controlling it by voice. In some special scenarios, however, the user may be in a hurry, and having to speak the wake-up word before the voice command introduces an unwanted delay.
Disclosure of Invention
The invention provides a voice recognition method and device for processing voice commands more quickly under specific conditions.
The invention provides a voice recognition method, applied to a voice processing front end, comprising the following steps:
acquiring application state information about an application module while the application module is in a dormant state;
receiving input voice while the application module is in a dormant state;
and, when the application state information belongs to preset application state information, sending the voice as a voice command, together with the application state information, to a voice processing back end for voice recognition.
The technical solution provided by this embodiment of the invention can have the following beneficial effects: in specific application states, the voice command can be processed directly, without a wake-up word, which helps the system respond to and process the application's current state more quickly.
Optionally, the method further includes:
receiving a control command sent by the voice processing back end;
activating the application module;
and sending the control command to the application module.
The technical solution provided by this embodiment of the invention can have the following beneficial effects: activating the application module and delivering the control command proceed in sequence without a wake-up-word trigger, which helps the system respond quickly to the application's current state.
Optionally, the method further includes:
and, when the application state information does not belong to the preset application state information, sending the voice as a wake-up word to the voice processing back end for voice recognition.
The technical solution provided by this embodiment of the invention can have the following beneficial effects: when the application state information does not belong to the preset application state information, the voice can be processed in the conventional wake-up manner.
Optionally, the method further includes:
and sending a user identifier of the application module to the voice processing back end.
The technical solution provided by this embodiment of the invention can have the following beneficial effects: the application state is bound to the user identifier, so that the result of processing the voice better matches the user's needs.
Optionally, the application module is located in an external intelligent terminal;
the acquiring application state information about the application module includes:
and receiving the application state information of the application module sent by the external intelligent terminal.
The technical solution provided by this embodiment of the invention can have the following beneficial effects: the voice processing front end can process not only voice directed at local applications but also voice directed at applications on external associated devices.
The invention provides a voice recognition method, which is applied to a voice processing back end and comprises the following steps:
receiving application state information and voice sent by a voice processing front end;
matching the voice with a preset voice command corresponding to the application state information;
and, when a match is found, sending the control command corresponding to the matched voice command to the voice processing front end.
optionally, the voice includes a wake-up word and a command word;
the method further comprises the following steps:
parsing the wake-up word and the command word from the voice;
deleting the wake-up word;
the matching the voice with a preset voice command corresponding to the application state information includes:
and matching the command word with a preset voice command corresponding to the application state information.
optionally, the method further includes:
receiving voice sent by a voice processing front end;
judging whether the application state information is within its validity period;
when the application state information is within the validity period, matching the voice with a preset voice command corresponding to the application state information;
and, when the application state information is not within the validity period, processing the voice as normal voice.
optionally, the method further includes:
receiving a user identification of the application module sent by a voice processing front end;
and calling a scene context corresponding to the user identification, wherein the scene context comprises the corresponding relation between the application state information and the voice command.
the invention provides a voice recognition device, which is applied to a voice processing front end and comprises:
an acquisition module, used for acquiring application state information about an application module while the application module is in a dormant state;
a first receiving module, used for receiving input voice while the application module is in a dormant state;
and a first sending module, used for sending the voice as a voice command, together with the application state information, to a voice processing back end for voice recognition when the application state information belongs to preset application state information.
Optionally, the apparatus further comprises:
the second receiving module is used for receiving a control command sent by the voice processing back end;
an activation module for activating the application module;
and the second sending module is used for sending the control command to the application module.
Optionally, the apparatus further comprises:
and the third sending module is used for sending the voice as a wake-up word to the voice processing back end for voice recognition when the application state information does not belong to the preset application state information.
Optionally, the apparatus further comprises:
and the fourth sending module is used for sending the user identification of the application module to the voice processing back end.
Optionally, the application module is located in an external intelligent terminal;
the acquisition module includes:
and the receiving submodule is used for receiving the application state information of the application module sent by an external intelligent terminal.
The invention provides a speech recognition device, which is applied to a speech processing back end and comprises:
the first receiving module is used for receiving application state information and voice sent by the voice processing front end;
the first matching module is used for matching the voice with a preset voice command corresponding to the application state information;
and the sending module is used for sending, when a match is found, the control command corresponding to the matched voice command to the voice processing front end.
Optionally, the voice includes a wake-up word and a command word;
the device further comprises:
the analysis module is used for parsing the wake-up word and the command word from the voice;
a deleting module, used for deleting the wake-up word;
the first matching module includes:
and the matching sub-module is used for matching the command word with a preset voice command corresponding to the application state information.
Optionally, the apparatus further comprises:
the second receiving module is used for receiving the voice sent by the voice processing front end;
the judging module is used for judging whether the application state information is within its validity period;
the second matching module is used for matching the voice with a preset voice command corresponding to the application state information when the application state information is within the validity period;
and the third matching module is used for processing the voice as normal voice when the application state information is not within the validity period.
Optionally, the apparatus further comprises:
the third receiving module is used for receiving the user identification of the application module sent by the voice processing front end;
and the calling module is used for calling the scene context corresponding to the user identification, and the scene context comprises the corresponding relation between the application state information and the voice command.
The invention provides a voice recognition device, which is applied to a voice processing front end and comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring application state information about an application module when the application module is in a dormant state;
receiving input voice when the application module is in a dormant state;
and, when the application state information belongs to preset application state information, sending the voice as a voice command, together with the application state information, to a voice processing back end for voice recognition.
The invention provides a voice recognition device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving application state information and voice sent by a voice processing front end;
matching the voice with a preset voice command corresponding to the application state information;
and, when a match is found, sending the control command corresponding to the matched voice command to the voice processing front end.
The present invention provides a computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, perform the steps of the speech processing front-end method above.
The present invention provides a computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, perform the steps of the speech processing back-end method above.
The invention provides a system for speech recognition, comprising: means for a speech processing front-end, and means for a speech processing back-end.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method of speech recognition according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of speech recognition according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method of speech recognition according to an embodiment of the present invention;
FIG. 4 is a flow chart of a method of speech recognition according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method of speech recognition according to an embodiment of the present invention;
FIG. 6 is a block diagram of an apparatus for speech recognition according to an embodiment of the present invention;
FIG. 7 is a block diagram of an apparatus for speech recognition according to an embodiment of the present invention;
FIG. 8 is a block diagram of an apparatus for speech recognition in an embodiment of the present invention;
FIG. 9 is a block diagram of an apparatus for speech recognition according to an embodiment of the present invention;
FIG. 10 is a block diagram of an acquisition module in an embodiment of the invention;
FIG. 11 is a block diagram of an apparatus for speech recognition in an embodiment of the present invention;
FIG. 12 is a block diagram of an apparatus for speech recognition according to an embodiment of the present invention;
FIG. 13 is a block diagram of a first matching module in an embodiment of the invention;
FIG. 14 is a block diagram of an apparatus for speech recognition in an embodiment of the present invention;
fig. 15 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
In the related art, speaking a wake-up word to wake the smart device is the first step in controlling it by voice. In some special scenarios, however, the user may be in a hurry, and having to speak the wake-up word before the voice command introduces an unwanted delay.
To solve this problem, in certain specific application states this embodiment processes the voice directly as a voice command, without a wake-up word. This helps the system respond quickly to the application's current state and perform the corresponding processing.
Referring to fig. 1, the method for speech recognition in this embodiment includes:
Step 101: acquire application state information about an application module while the application module is in a dormant state.
Step 102: receive input voice while the application module is in a dormant state.
Step 103: when the application state information belongs to preset application state information, send the voice as a voice command, together with the application state information, to a voice processing back end for voice recognition.
In this embodiment, when the application module is in the activated state, the application state information about the application module may or may not be acquired. The received voice is sent as a voice command to the voice processing back end for voice recognition; the current application state information may or may not be sent along with it.
This embodiment may be implemented by a voice processing front end that receives the voice. The voice processing front end and the voice processing back end may be located in the same intelligent device, implementing offline voice recognition; alternatively, the front end may be located in an intelligent terminal and the back end in a cloud server, implementing online voice recognition. The intelligent device or terminal may be, for example, a home central control device or a smart speaker.
The voice processing front end and the application module may be located in the same intelligent device; when a function of the application module is triggered, the application module sends its current application state information to the voice processing front end through the operating system. Alternatively, they may be located in different intelligent devices: for example, the voice processing front end is a home central control device, and the application module resides in an intelligent device such as an alarm clock, a door-access controller, or a speaker. When a function of the application module is triggered, the application module sends its current application state information to the voice processing front end over the network.
Application state information identifies states of functions in the application module in which the user should not be kept waiting; these states may be preconfigured. For example, when the application module is an alarm clock, the application state information is the state in which the alarm is ringing; when the application module is a door-access module, it is the state in which a door-opening request has been received. Application state information may include an application identifier and a state identifier.
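The application identifier plus state identifier structure described above, together with the preset states, can be sketched as follows. This is a minimal illustration in Python; the class name, field names, and example states are assumptions for illustration only, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AppState:
    """Application state information: an application identifier
    plus a state identifier (illustrative encoding)."""
    app_id: str    # e.g. "alarm_clock", "door_access"
    state_id: str  # e.g. "ringing", "open_request"

# Preset application states in which received speech is treated
# directly as a voice command (no wake-up word required).
PRESET_STATES = {
    AppState("alarm_clock", "ringing"),
    AppState("door_access", "open_request"),
}

def is_preset(state: AppState) -> bool:
    """Return True if the state belongs to the preset states."""
    return state in PRESET_STATES
```

The frozen dataclass is hashable, so preset states can be kept in a set for constant-time lookup.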
In this embodiment, the wake-up word and the voice command are distinct. The wake-up word is used to wake the application module from the dormant state and typically includes identification information of the application module. After the wake-up word is received, the corresponding application module is activated, that is, switched from the dormant state to the activated state. A voice command is valid while the application module is in the activated state: the voice command is converted into a corresponding control command, which is sent to the application module for processing. For example, if the application module is an alarm clock and the current state is the alarm ringing, the voice command might be "turn off alarm", and the corresponding control command is to stop the ringing.
In this embodiment, when application state information is received and it matches the preset application state information, the received voice can be processed as a voice command rather than as a wake-up word, and the voice need not include identification information of the application module. For example, if the application module is an alarm clock and the current state is the alarm ringing, the voice command "close" is resolved, using the application state information, into the control command to stop the ringing; no control commands are generated for other application modules (for example, music playback is not stopped). Moreover, because the control command is determined from the application state information, interference from ambient sound is reduced, making recognition of the voice and derivation of the control command more accurate.
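The front-end routing described in steps 101 to 103 (and the fallback of step B1 below) can be sketched as follows. All names are illustrative assumptions; `RecordingBackend` is only a stand-in for the voice processing back end:

```python
class RecordingBackend:
    """Stand-in for the voice processing back end (illustrative);
    records which recognition path each utterance was sent down."""
    def __init__(self):
        self.calls = []

    def recognize_command(self, speech, state):
        self.calls.append(("command", speech, state))

    def recognize_wake_word(self, speech):
        self.calls.append(("wake_word", speech))

def handle_speech(state, speech, backend, preset_states):
    """If the current application state is one of the preset states,
    forward the speech as a voice command together with the state
    information; otherwise forward it as a wake-up word."""
    if state in preset_states:
        backend.recognize_command(speech, state)
        return "command"
    backend.recognize_wake_word(speech)
    return "wake_word"
```

The key point the sketch captures is that the routing decision depends only on the cached application state, not on the content of the speech itself.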
Optionally, the method further includes: step a 1-step A3.
Step A1: receive the control command sent by the voice processing back end.
Step A2: activate the application module.
Step A3: send the control command to the application module.
In this embodiment, when the voice processing front end receives the control command, it performs two operations: activating the application module and sending it the control command. The wake-up-word trigger is thus omitted, and the control command serves both functions.
The voice processing front end can determine which application module to activate from the previously received application state information. The application state information has a validity period at the front end: the validity period may be a preset duration, or it may end when the next application state information arrives, whichever comes first.
Alternatively, the control command contains identification information of the application module, and the voice processing front end determines the application module to activate from the control command itself.
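Steps A1 to A3, with both ways of identifying the target module (from the command itself or from the most recent application state information), can be sketched as follows. Field names and the module representation are assumptions for illustration:

```python
def on_control_command(command, modules, last_state=None):
    """On receiving a control command from the back end, activate
    the target application module and deliver the command.

    The target is taken from the command itself if it carries an
    application identifier, otherwise from the most recently received
    application state information (assumed still within its validity
    period). Returns the activated module's id, or None."""
    app_id = command.get("app_id") or (last_state and last_state["app_id"])
    if app_id is None or app_id not in modules:
        return None
    module = modules[app_id]
    module["active"] = True                 # step A2: activate the module
    module["pending"] = command["action"]   # step A3: deliver the command
    return app_id
```

The sketch shows why no wake-up word is needed here: activation and command delivery happen together in response to the back end's control command.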
Optionally, the method further includes: step B1.
Step B1: when the application state information does not belong to the preset application state information, send the voice as a wake-up word to the voice processing back end for voice recognition.
In this embodiment, when the application state information does not belong to the preset application state information, the current application state can be judged to be non-urgent and handled in the conventional wake-first, command-second manner. The voice received at this point is therefore processed as a wake-up word, and the application state information need not be sent when the voice is forwarded to the voice processing back end for recognition. This keeps the method compatible with the traditional voice processing flow.
Optionally, the method further includes: step C1.
Step C1: send the user identifier of the application module to the voice processing back end.
In this embodiment, a user identifier may also be sent to the voice processing back end. The user identifier corresponds to the application module and can be regarded as the application module's account identifier. The voice processing back end may serve multiple voice processing front ends, each of which may serve multiple application modules; through the user identifier, the back end knows which user's application module is involved. From the user identifier and the application state information, the back end can determine the application module and the corresponding control command for that user, so the resulting control command better matches the user's needs.
Step C1 may be performed together with step 103; that is, the user identifier, the voice command, and the application state information may be sent to the voice processing back end in a single message.
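A single message carrying the user identifier, the voice command, and the application state information, as described above, might look like the following. The JSON field names are assumptions; the patent does not specify a wire format:

```python
import json

def build_command_message(user_id, speech, state):
    """Bundle the user identifier, the voice command, and the
    application state information into one message for the back end
    (illustrative wire format; field names are assumptions)."""
    return json.dumps({
        "user_id": user_id,     # account identifier of the application module
        "speech": speech,       # the recognized or encoded voice command
        "app_state": state,     # {"app_id": ..., "state_id": ...}
    })
```

Sending all three fields at once lets the back end scope its command matching to this user's scene context in a single round trip.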
Optionally, the application module is located in an external intelligent terminal.
The step 101 comprises: step D1.
Step D1: and receiving application state information of the application module sent by an external intelligent terminal.
In this embodiment, the voice processing front end and the application module may be located in different intelligent devices; for example, the voice processing front end is a home central control device, and the application module resides in an intelligent device such as an alarm clock, a door-access controller, or a speaker. When a function of the application module is triggered, the application module sends its current application state information to the voice processing front end over the network.
The implementation process is described in detail by the following embodiments.
Referring to fig. 2, the method for speech recognition in this embodiment includes:
Step 201: acquire application state information about an application module while the application module is in a dormant state.
Step 202: receive input voice while the application module is in a dormant state.
Step 203: when the application state information belongs to preset application state information, treat the voice as a voice command and send the voice command, the application state information, and the user identifier of the application module to the voice processing back end for voice recognition.
When the application state information does not belong to the preset application state information, send the voice as a wake-up word to the voice processing back end for voice recognition.
Step 204: receive the control command sent by the voice processing back end.
Step 205: activate the application module.
Step 206: send the control command to the application module.
Referring to fig. 3, the method for speech recognition in this embodiment includes:
Step 301: receive the application state information and the voice sent by the voice processing front end.
Step 302: match the voice with a preset voice command corresponding to the application state information.
Step 303: when a match is found, send the control command corresponding to the matched voice command to the voice processing front end.
This embodiment may be implemented by a voice processing back end. The back end stores a context formed from multiple items of application state information, each corresponding to a set of voice commands and the control commands they map to. Matching the voice against the preset voice commands for the reported application state narrows the matching range and improves the accuracy of voice command recognition; when a match is found, a control command better suited to the application module is obtained. For example, if the application module is an alarm clock, the current state is the alarm ringing, and the voice command is "off", then "off" is matched against the alarm clock's voice commands, such as "turn off the alarm clock". Because both the alarm's application state information and the "off" voice command are present, the match can be confirmed; the similarity threshold for a match may be lowered, or the application state information may be combined with the received voice command and matched jointly against the preset commands, yielding the matching result "turn off the alarm" and the control command to stop the ringing.
In this embodiment, when the voice processing back end receives application state information, the received voice can be processed as a voice command rather than as a wake-up word. What the back end receives as a "voice command" is, in essence, voice.
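The state-scoped matching with a relaxed similarity threshold described above can be sketched as follows. The thresholds, the table layout, and the use of `difflib` as the similarity measure are all illustrative assumptions; a real system would match against acoustic or semantic representations:

```python
import difflib

def match_command(speech_text, state, command_table,
                  base_threshold=0.8, relaxed_threshold=0.5):
    """Match recognized text against the voice commands registered
    for the reported application state. Because the state already
    narrows the intent, the similarity threshold is relaxed when
    state-specific candidates exist (thresholds are illustrative)."""
    key = (state["app_id"], state["state_id"])
    candidates = command_table.get(key, {})
    threshold = relaxed_threshold if candidates else base_threshold
    best, best_score = None, 0.0
    for phrase, control in candidates.items():
        score = difflib.SequenceMatcher(None, speech_text, phrase).ratio()
        if score > best_score:
            best, best_score = control, score
    return best if best_score >= threshold else None
```

Restricting candidates to one application state is what lets the threshold drop safely: "close" only has to be distinguished from the alarm clock's commands, not from every command in the system.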
Optionally, the speech includes a wake word and a command word.
The method further comprises the following steps: step E1-step E2.
Step E1: parse the wake-up word and the command word from the voice.
Step E2: delete the wake-up word.
The step 302 includes: step E3.
Step E3: and matching the command word with a preset voice command corresponding to the application state information.
In this embodiment, the received voice may include a wake-up word; for example, the voice is "XX (the alarm clock's wake-up word), turn off alarm". When the voice processing back end recognizes the voice, it can identify the wake-up word, that is, parse the wake-up word and the command word from the voice. The wake-up word need not be processed and can be ignored or deleted; only the command word is processed, matched against the voice commands, and used to determine the control command. The wake-up-word processing step is thus omitted, and recognition is unaffected by the presence of a wake-up word.
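Steps E1 and E2, operating on recognized text, can be sketched as follows. The wake-up-word list is a placeholder assumption (the patent leaves the actual wake-up words, "XX", unspecified):

```python
WAKE_WORDS = ("hey alarm", "hey speaker")  # illustrative wake-up words

def strip_wake_word(text):
    """If the recognized text begins with a known wake-up word,
    parse it off (step E1) and discard it (step E2), keeping only
    the command word for matching (step E3)."""
    lowered = text.lower().strip()
    for wake in WAKE_WORDS:
        if lowered.startswith(wake):
            return lowered[len(wake):].lstrip(" ,")
    return lowered
```

Only the returned command word is matched against the state's preset voice commands; the wake-up word itself never reaches the matcher.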
Optionally, the method further comprises steps F1 to F4.
Step F1: receiving the speech sent by the speech processing front end.
Step F2: judging whether the application state information is within its validity period.
Step F3: when the application state information is within its validity period, matching the speech with a preset voice command corresponding to the application state information.
Step F4: when it is not within the validity period, processing the speech as normal speech.
In this embodiment, the speech processing back end may receive speech again after step 301, and at that point it judges whether the application state information is within its validity period. Application state information has a certain validity period at the back end: the period may be a preset duration (e.g., 30 seconds or 1 minute), or it may end when the next piece of application state information arrives, whichever comes first. If speech is received again without new application state information, and the previously received application state information is still within its validity period, the newly received speech is processed as a voice command associated with that application state information, i.e., matched against the preset voice commands corresponding to it. If the previously received application state information is no longer within its validity period, the speech is instead matched against preset general voice commands; that is, rather than matching it against the commands tied to the application state information, the back end processes it as normal speech, for example by checking whether it is the wake-up word of some application module. "Normal speech" in this embodiment is distinguished from speech treated directly as a voice command.
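The validity-period logic of steps F1 to F4 might be sketched as below. The class, its field names, and the use of a monotonic clock are assumptions; the 30-second duration is taken from the example in the text.

```python
import time

VALIDITY_SECONDS = 30  # e.g. 30 seconds, as in the embodiment


class BackEndSession:
    def __init__(self):
        self.app_state = None
        self.state_received_at = 0.0

    def on_app_state(self, state):
        # Receiving new application state information starts (or restarts)
        # the validity period, ending the previous one early.
        self.app_state = state
        self.state_received_at = time.monotonic()

    def state_is_valid(self, now=None):
        """Step F2: is the stored state still within its validity period?"""
        if self.app_state is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.state_received_at) <= VALIDITY_SECONDS

    def on_speech(self, speech):
        if self.state_is_valid():
            return ("match_with_state", self.app_state, speech)  # step F3
        return ("process_as_normal_speech", speech)              # step F4
```

Speech arriving inside the window is matched against the state's commands; once the window lapses, the same speech is handled as normal speech.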
In this embodiment, the duration of the validity period of the application state information may approximate the duration of the timer during which the application module remains activated: if the application module is in the activated state and receives no trigger before the timer expires, it transitions to the dormant state on timeout.
Optionally, the method further comprises steps G1 and G2.
Step G1: receiving the user identifier of the application module sent by the speech processing front end.
Step G2: invoking the scene context corresponding to the user identifier, where the scene context includes the correspondence between the application state information and the voice commands.
In this embodiment, step G1 and step 301 may be carried in the same message. The speech processing back end stores a context formed from multiple pieces of application state information, namely the scene context, and builds the correspondence among the user identifier, the application state information, and the voice commands. Voice commands can therefore be matched according to each user's habits, yielding more accurate results. Moreover, multiple utterances from the same user can form a speech context, and combining the logic running across those utterances yields still more accurate voice commands.
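A minimal sketch of the per-user scene-context lookup in steps G1 and G2 follows. The user identifiers, the tables, and the fallback to a shared context are illustrative assumptions.

```python
# Per-user scene contexts: application state information mapped to that
# user's voice-command table (contents assumed for illustration).
SCENE_CONTEXTS = {
    "user-001": {
        "alarm_ringing": {"turn off the alarm clock": "CMD_ALARM_OFF"},
    },
}

# Shared table used when no per-user context exists (an assumption; the
# patent does not specify fallback behavior).
GLOBAL_CONTEXT = {
    "alarm_ringing": {
        "turn off the alarm clock": "CMD_ALARM_OFF",
        "stop": "CMD_ALARM_OFF",
    },
}


def get_scene_context(user_id):
    """Step G2: invoke the scene context corresponding to the user identifier."""
    return SCENE_CONTEXTS.get(user_id, GLOBAL_CONTEXT)
```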
The implementation process is described in detail by the following embodiments.
Referring to fig. 4, the method for speech recognition in this embodiment includes:
Step 401: receiving the application state information, the user identifier of the application module, and the speech sent by the speech processing front end.
Step 402: invoking the scene context corresponding to the user identifier, where the scene context includes the correspondence between the application state information and the voice commands.
Step 403: matching the speech with a preset voice command corresponding to the application state information.
Step 404: when the match succeeds, sending the control command corresponding to the matched voice command to the speech processing front end; when the match fails, feeding back an indication of speech-recognition failure to the speech processing front end.
Step 405: receiving the speech sent by the speech processing front end.
Step 406: judging whether the application state information is within its validity period; if so, continuing to step 407; if not, continuing to step 408.
Step 407: matching the speech with a preset voice command corresponding to the application state information.
Step 408: processing the speech as normal speech.
The implementation is described below with the speech processing front end and back end working together.
Referring to fig. 5, the method for speech recognition in this embodiment includes:
Step 501: the speech processing front end acquires application state information about an application module while the application module is in a dormant state.
Step 502: the speech processing front end receives input speech while the application module is in a dormant state.
Step 503: when the application state information belongs to the preset application state information, the speech processing front end treats the speech as a voice command and sends it, together with the application state information, to the speech processing back end for recognition.
Step 504: the speech processing back end matches the speech with a preset voice command corresponding to the application state information.
Step 505: when the match succeeds, the speech processing back end sends the control command corresponding to the matched voice command to the speech processing front end.
Step 506: the speech processing front end activates the application module.
Step 507: the speech processing front end sends the control command to the application module.
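The combined flow of steps 501 to 507 can be condensed into a toy sketch like the following, with both ends modeled as plain functions. All names, tables, and the dictionary standing in for the application module are assumptions for illustration.

```python
# States the front end forwards to the back end (step 503 precondition).
PRESET_STATES = {"alarm_ringing"}

# Back-end command table: (state, speech) -> control command (assumed).
BACKEND_COMMANDS = {
    ("alarm_ringing", "turn off the alarm clock"): "CMD_ALARM_OFF",
}


def back_end_recognize(app_state, speech):
    """Steps 504-505: match the speech and return the control command."""
    return BACKEND_COMMANDS.get((app_state, speech))


def front_end_handle(app_state, speech, application):
    """Steps 501-503 and 506-507, for a dormant application module."""
    if app_state not in PRESET_STATES:
        return None  # the speech would instead be treated as a wake-up word
    command = back_end_recognize(app_state, speech)   # steps 503-505
    if command is not None:
        application["active"] = True                  # step 506: activate
        application["last_command"] = command         # step 507: deliver
    return command


app = {"active": False, "last_command": None}
front_end_handle("alarm_ringing", "turn off the alarm clock", app)
```

After the call, the dormant module has been activated and handed the alarm-off control command, mirroring steps 506 and 507.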
The above embodiments can be freely combined according to actual needs.
The implementation of speech recognition is described above; it may be carried out by a device whose internal structure and functions are described below.
Referring to fig. 6, the speech recognition apparatus in this embodiment, applied to a speech processing front end, includes: an obtaining module 601, a first receiving module 602, and a first sending module 603.
An obtaining module 601, configured to obtain application state information about an application module when the application module is in a dormant state.
A first receiving module 602, configured to receive an input voice when the application module is in a dormant state.
A first sending module 603, configured to, when the application state information belongs to preset application state information, send the voice as a voice command together with the application state information to a voice processing back end for voice recognition.
Optionally, as shown in fig. 7, the apparatus further includes: a second receiving module 701, an activating module 702 and a second sending module 703.
A second receiving module 701, configured to receive a control command sent by the voice processing back end.
An activation module 702 for activating the application module.
A second sending module 703, configured to send the control command to the application module.
Optionally, as shown in fig. 8, the apparatus further includes: and a third sending module 801.
A third sending module 801, configured to send the voice as a wakeup word to a voice processing back end for voice recognition when the application state information does not belong to preset application state information.
Optionally, as shown in fig. 9, the apparatus further includes: a fourth sending module 901.
A fourth sending module 901, configured to send the user identifier of the application module to the speech processing backend.
Optionally, the application module is located in an external intelligent terminal.
As shown in fig. 10, the obtaining module 601 includes: a receive submodule 1001.
The receiving submodule 1001 is configured to receive application state information of the application module sent by an external intelligent terminal.
Referring to fig. 11, the speech recognition apparatus in this embodiment is applied to a speech processing backend, and the apparatus includes: a first receiving module 1101, a first matching module 1102 and a sending module 1103.
The first receiving module 1101 is configured to receive application state information and voice sent by a voice processing front end.
A first matching module 1102, configured to match the voice with a preset voice command corresponding to the application state information.
A sending module 1103, configured to send, when the match succeeds, the control command corresponding to the matched voice command to the voice processing front end.
Optionally, the speech includes a wake word and a command word.
As shown in fig. 12, the apparatus further includes: a parsing module 1201 and a deletion module 1202.
And the analyzing module 1201 is configured to analyze the wake-up word and the command word from the speech.
A deleting module 1202, configured to delete the wakeup word.
As shown in fig. 13, the first matching module 1102 includes: a matching sub-module 1301.
The matching sub-module 1301 is configured to match the command word with a preset voice command corresponding to the application state information.
Optionally, as shown in fig. 14, the apparatus further includes: a second receiving module 1401, a judging module 1402, a second matching module 1403 and a third matching module 1404.
A second receiving module 1401, configured to receive the voice sent by the voice processing front end.
A determining module 1402, configured to determine whether the application state information is in a validity period.
A second matching module 1403, configured to match the voice with a preset voice command corresponding to the application state information when the application state information is within its validity period.
A third matching module 1404, configured to process the voice as normal voice when the application state information is not within its validity period.
Optionally, as shown in fig. 15, the apparatus further includes: a third receiving module 1501 and a calling module 1502.
A third receiving module 1501 is configured to receive the user identifier of the application module sent by the speech processing front end.
The invoking module 1502 is configured to invoke a scene context corresponding to the user identifier, where the scene context includes the correspondence between the application state information and the voice commands.
An apparatus for speech recognition, applied to a speech processing front end, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring application state information about an application module when the application module is in a dormant state;
receiving input voice when the application module is in a dormant state;
and when the application state information belongs to preset application state information, the voice is used as a voice command and is sent to a voice processing rear end together with the application state information for voice recognition.
An apparatus for speech recognition, applied to a speech processing backend, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving application state information and voice sent by a voice processing front end;
matching the voice with a preset voice command corresponding to the application state information;
and when the matching is consistent, sending the control command corresponding to the voice command which is consistent in matching to the voice processing front end.
A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of a speech processing front-end.
A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of a speech processing backend.
A system for speech recognition, comprising: a speech processing front-end and a speech processing back-end.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (23)

1. A method of speech recognition applied to a speech processing front-end, the method comprising:
acquiring application state information about an application module when the application module is in a dormant state;
receiving input voice when the application module is in a dormant state;
and when the application state information belongs to preset application state information, the voice is used as a voice command and is sent to a voice processing rear end together with the application state information for voice recognition.
2. The method of claim 1, wherein the method further comprises:
receiving a control command sent by a voice processing rear end;
activating the application module;
and sending the control command to the application module.
3. The method of claim 1, wherein the method further comprises:
and when the application state information does not belong to the preset application state information, sending the voice as a wake-up word to a voice processing rear end for voice recognition.
4. The method of claim 1, wherein the method further comprises:
and sending the user identification of the application module to the voice processing back end.
5. The method of claim 1, wherein the application module is located in an external smart terminal;
the acquiring application state information about the application module includes:
and receiving application state information of the application module sent by an external intelligent terminal.
6. A method of speech recognition, applied to a speech processing backend, the method comprising:
receiving application state information and voice sent by a voice processing front end;
matching the voice with a preset voice command corresponding to the application state information;
and when the matching is consistent, sending the control command corresponding to the voice command which is consistent in matching to the voice processing front end.
7. The method of claim 6, wherein the speech comprises a wake word and a command word;
the method further comprises the following steps:
analyzing a wake-up word and a command word from the voice;
deleting the wake-up word;
the matching the voice with a preset voice command corresponding to the application state information includes:
and matching the command word with a preset voice command corresponding to the application state information.
8. The method of claim 6, wherein the method further comprises:
receiving voice sent by a voice processing front end;
judging whether the application state information is in the validity period;
when the application state information is in the valid period, matching the voice with a preset voice command corresponding to the application state information;
and when the voice is not in the valid period, processing the voice as normal voice.
9. The method of claim 6, wherein the method further comprises:
receiving a user identification of the application module sent by a voice processing front end;
and calling a scene context corresponding to the user identification, wherein the scene context comprises the corresponding relation between the application state information and the voice command.
10. An apparatus for speech recognition, applied to a speech processing front-end, comprising:
the device comprises an acquisition module, a judging module and a judging module, wherein the acquisition module is used for acquiring application state information related to an application module when the application module is in a dormant state;
the first receiving module is used for receiving input voice when the application module is in a dormant state;
and the first sending module is used for sending the voice as a voice command and the application state information to a voice processing rear end for voice recognition when the application state information belongs to preset application state information.
11. The apparatus of claim 10, wherein the apparatus further comprises:
the second receiving module is used for receiving a control command sent by the voice processing back end;
an activation module for activating the application module;
and the second sending module is used for sending the control command to the application module.
12. The apparatus of claim 10, wherein the apparatus further comprises:
and the third sending module is used for sending the voice as a wake-up word to a voice processing rear end for voice recognition when the application state information does not belong to the preset application state information.
13. The apparatus of claim 10, wherein the apparatus further comprises:
and the fourth sending module is used for sending the user identification of the application module to the voice processing back end.
14. The apparatus of claim 10, wherein the application module is located in an external smart terminal;
the acquisition module includes:
and the receiving submodule is used for receiving the application state information of the application module sent by an external intelligent terminal.
15. An apparatus for speech recognition, applied to a speech processing backend, the apparatus comprising:
the first receiving module is used for receiving application state information and voice sent by the voice processing front end;
the first matching module is used for matching the voice with a preset voice command corresponding to the application state information;
and the sending module is used for sending the control command corresponding to the matched voice command to the voice processing front end when the matching is consistent.
16. The apparatus of claim 15, wherein the speech comprises a wake word and a command word;
the device further comprises:
the analysis module is used for analyzing the wake-up word and the command word from the voice;
a deleting module for deleting the wake-up word;
the first matching module includes:
and the matching sub-module is used for matching the command word with a preset voice command corresponding to the application state information.
17. The apparatus of claim 15, wherein the apparatus further comprises:
the second receiving module is used for receiving the voice sent by the voice processing front end;
the judging module is used for judging whether the application state information is in the validity period or not;
the second matching module is used for matching the voice with a preset voice command corresponding to the application state information when the application state information is in the validity period;
and the third matching module is used for processing the voice as normal voice when the application state information is not in the validity period.
18. The apparatus of claim 15, wherein the apparatus further comprises:
the third receiving module is used for receiving the user identification of the application module sent by the voice processing front end;
and the calling module is used for calling the scene context corresponding to the user identification, and the scene context comprises the corresponding relation between the application state information and the voice command.
19. An apparatus for speech recognition, applied to a speech processing front-end, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring application state information about an application module when the application module is in a dormant state;
receiving input voice when the application module is in a dormant state;
and when the application state information belongs to preset application state information, the voice is used as a voice command and is sent to a voice processing rear end together with the application state information for voice recognition.
20. An apparatus for speech recognition, applied to a speech processing backend, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving application state information and voice sent by a voice processing front end;
matching the voice with a preset voice command corresponding to the application state information;
and when the matching is consistent, sending the control command corresponding to the voice command which is consistent in matching to the voice processing front end.
21. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the steps of the method of any one of claims 1 to 5.
22. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, carry out the steps of the method of any one of claims 6 to 9.
23. A system for speech recognition, comprising: the device of any one of claims 10-14, and the device of any one of claims 15-18.
CN202010906497.9A 2020-09-01 2020-09-01 Voice recognition method and device Pending CN112151028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010906497.9A CN112151028A (en) 2020-09-01 2020-09-01 Voice recognition method and device


Publications (1)

Publication Number Publication Date
CN112151028A true CN112151028A (en) 2020-12-29

Family

ID=73890178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010906497.9A Pending CN112151028A (en) 2020-09-01 2020-09-01 Voice recognition method and device

Country Status (1)

Country Link
CN (1) CN112151028A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115312051A (en) * 2022-07-07 2022-11-08 青岛海尔科技有限公司 Voice control method and device for equipment, storage medium and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107564518A (en) * 2017-08-21 2018-01-09 百度在线网络技术(北京)有限公司 Smart machine control method, device and computer equipment
CN108335695A (en) * 2017-06-27 2018-07-27 腾讯科技(深圳)有限公司 Sound control method, device, computer equipment and storage medium
CN109192208A (en) * 2018-09-30 2019-01-11 深圳创维-Rgb电子有限公司 A kind of control method of electrical equipment, system, device, equipment and medium
CN109493849A (en) * 2018-12-29 2019-03-19 联想(北京)有限公司 Voice awakening method, device and electronic equipment
CN110503962A (en) * 2019-08-12 2019-11-26 惠州市音贝科技有限公司 Speech recognition and setting method, device, computer equipment and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination