CN112562670A - Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment - Google Patents

Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment

Info

Publication number
CN112562670A
CN112562670A
Authority
CN
China
Prior art keywords
corpus
recognition
keyword
intelligent
voice information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011411651.1A
Other languages
Chinese (zh)
Inventor
何海亮
Current Assignee
Shenzhen Oribo Technology Co Ltd
Original Assignee
Shenzhen Oribo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Oribo Technology Co Ltd filed Critical Shenzhen Oribo Technology Co Ltd
Priority to CN202011411651.1A priority Critical patent/CN112562670A/en
Publication of CN112562670A publication Critical patent/CN112562670A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the application provides an intelligent voice recognition method, an intelligent voice recognition device and an intelligent device, and relates to the technical field of smart homes. The method not only spares the user key operations, but also allows the intelligent device to be controlled even when the user does not speak exactly the corpus the intelligent device has learned. The intelligent voice recognition method comprises the following steps: receiving voice information; matching the voice information with a corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus; matching the target keyword with a business logic library and determining a skill sentence matched with the target keyword, wherein the business logic library comprises at least one skill sentence; and determining a control command corresponding to the voice information according to the skill sentence matched with the target keyword.

Description

Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment
Technical Field
The application relates to the technical field of smart homes, and in particular to an intelligent voice recognition method, an intelligent voice recognition device and an intelligent device.
Background
With the continuous development of intelligent devices, their functions have become numerous and complex, and a user must perform many cumbersome manual key operations to wake a desired function or obtain useful information, which degrades the user experience.
Disclosure of Invention
The present application provides an intelligent voice recognition method, an intelligent voice recognition device and an intelligent device in order to solve, for example, the problems described above.
In a first aspect, a method for intelligent speech recognition is provided, the method comprising: receiving voice information; matching the voice information with a corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus; matching the target keywords with a business logic library, and determining skill sentences matched with the target keywords, wherein the business logic library comprises at least one skill sentence; and determining a control command corresponding to the voice information according to the skill sentence matched with the target keyword.
In a second aspect, an intelligent voice recognition method is provided, which includes: the server receives voice information; the server matches the voice information with a corpus, determines a keyword corresponding to at least one recognition corpus in the corpus, and takes the keyword as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus; the server matches the target keyword with a business logic library to determine a skill sentence matched with the target keyword, wherein the business logic library comprises at least one skill sentence; the server determines a control command corresponding to the voice information according to the skill sentence and sends the control command to the intelligent device; and the intelligent device receives the control command and controls the smart home device to execute it.
In a third aspect, an intelligent speech recognition apparatus is provided, which includes: the device comprises a receiving module and a processing module. The receiving module is used for receiving voice information sent by the intelligent equipment; the processing module is used for matching the voice information with the corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus; the processing module is also used for matching the target keywords with the business logic library and determining the skill sentences matched with the target keywords, and the business logic library comprises at least one skill sentence; and the processing module is also used for determining a control command corresponding to the voice information according to the skill sentence matched with the target keyword.
In a fourth aspect, a smart device is provided, comprising: one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of the first aspect or the second aspect.
In a fifth aspect, a computer-readable storage medium is provided, in which a program code is stored, and the program code can be called by a processor to execute the method according to the first aspect or the second aspect.
In the intelligent voice recognition method, the intelligent voice recognition device and the intelligent device provided by the embodiments of the application, after receiving the voice information, the server can split it into at least one keyword and match each keyword with the recognition corpora in the corpus to determine which keywords correspond to a recognition corpus; these become the target keywords. The server then matches the target keywords with the business logic library to determine the corresponding skill sentence, and finally determines a control command from that skill sentence, so that a precise control command is derived from generalized voice information. Compared with the prior art, this not only spares the user key operations, but also allows the intelligent device to be controlled even when the user does not speak exactly the corpus the intelligent device has learned.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of an intelligent speech recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 4 is a flowchart of an intelligent speech recognition method provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 7 is an interaction timing diagram of an intelligent speech recognition method according to an embodiment of the present application;
FIG. 8 is a block diagram of an intelligent speech recognition apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram of an intelligent speech recognition apparatus according to an embodiment of the present application;
FIG. 10 is a block diagram of an intelligent device provided by an embodiment of the present application;
FIG. 11 is a block diagram of a storage unit storing an application program of the intelligent speech recognition method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the present application. The components of the embodiments, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
To address the problems described in the background, the related art applies a voice recognition function to intelligent devices, shortening the distance between the user and the device so that the user can control it more directly and conveniently.
At present, manufacturers often train intelligent devices with deep learning algorithms so that the devices master a fixed corpus, and the user must speak a control command strictly according to that corpus. However, the corpora for controlling some intelligent devices are too long and not colloquial enough, so the user can hardly hit them exactly and thus struggles to control the device, which harms the user experience.
After research, the inventor proposes the following scheme, which spares the user key operations and allows the intelligent device to be controlled even when the user does not speak exactly the corpus the device has learned.
The smart device may be a smart phone, a tablet computer, an electronic book, a smart control panel, etc. The intelligent household equipment can comprise an intelligent television, an intelligent curtain, an intelligent sound box, an intelligent refrigerator, an intelligent electric cooker and the like.
As shown in fig. 1, the server 20 may receive the voice information through the smart device 10, process the voice information, determine a control command corresponding to the voice information, and control the smart device 10 to perform a corresponding operation according to the control command. The server 20 may be a conventional server or a cloud server, among others.
In the embodiment of the present application, a corpus may be prestored in the smart device 10 or the server 20, where the corpus includes at least one recognition corpus. After receiving the voice message, the server 20 may directly call the pre-stored corpus to determine a keyword in the voice message corresponding to a recognition corpus, and use that keyword as the target keyword.
The corpus may be an intelligent home corpus, and the recognition corpus in the intelligent home corpus may include at least one of a name of the intelligent home device, a function of the intelligent home device, and a functional state of the intelligent home device.
Taking the smart home devices as smart televisions and the current requirements of users as watching televisions as examples, the names of the smart home devices may be "smart televisions", the functions of the smart home devices may be "on/off", and the functional states of the smart home devices may be "on". Certainly, the identification corpus in the smart home corpus may also include other functions and functional states of the smart television; the recognition corpus in the smart home corpus may also include names, functions, and functional states corresponding to other smart homes.
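The smart-home corpus described above can be sketched as a simple data structure. This is a hedged illustration only: the device names, functions, and functional states below, and the flat-set representation, are assumptions for the sake of example, not a schema the patent specifies.

```python
# Hypothetical sketch of a smart-home corpus: names, functions, and
# functional states of smart home devices, per the description above.
# All concrete entries are illustrative assumptions.
CORPUS = {
    "names": {"smart tv", "smart speaker", "smart curtain"},
    "functions": {"switch", "volume"},
    "states": {"turn on", "turn off"},
}

def recognition_corpora(corpus):
    """Flatten the corpus into a single set of recognition corpora."""
    return set().union(*corpus.values())
```

Matching a candidate keyword then reduces to a membership test against `recognition_corpora(CORPUS)`.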
In the embodiment of the present application, a business logic library may be prestored in the smart device 10 or the server 20, and the business logic library includes at least one skill sentence. After the server 20 determines the target keyword corresponding to the voice information through the corpus, it may directly call a pre-stored service logic library to determine the skill sentence corresponding to the target keyword.
The service logic library may be an intelligent home service logic library, and the skill sentences in the intelligent home service logic library may include all sentences which can be formed by names of the intelligent home devices, functions of the intelligent home devices, and functional states of the intelligent home devices.
Taking the smart home device as a smart television, and taking an operation corresponding to a skill sentence as "turning on the smart television" as an example, the skill sentence of the smart home service logic library may include a skill sentence consisting of "smart television", "switch", "turn on", and "turn on", for example, the skill sentence may be "turn on the switch of the smart television", "please turn on the switch of the smart television", "turn on the switch of the smart television", and the like.
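One hedged way to picture how such a library could be populated is to enumerate sentences from templates over a device name, function, and functional state. The templates below are hypothetical; the patent only says the library contains the sentences that can be formed from these components.

```python
# Toy sketch: enumerate candidate skill sentences from a device name,
# a function, and a functional state. The templates are assumptions.
def build_skill_sentences(name, function, state):
    templates = [
        "{state} the {function} of the {name}",
        "please {state} the {function} of the {name}",
        "{state} the {name}",
    ]
    return [t.format(name=name, function=function, state=state) for t in templates]
```

For the smart-TV example above, `build_skill_sentences("smart tv", "switch", "turn on")` yields variants such as "turn on the switch of the smart tv".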
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
As shown in fig. 2, an embodiment of the present application provides an intelligent speech recognition method, which is applicable to a server 20, and this embodiment describes a server-side process flow, where the method may include:
S110, receiving voice information.
As shown in fig. 3, when the smart device 10 is in the offline state or the awake state, the words spoken by the user may be transmitted to the server 20 through the smart device 10.
For example, the smart device 10 may be a smart control panel, and before the server 20 receives the voice message, the smart control panel may be in an offline state, and the user may speak "small housekeeper" and send the voice message to the server 20 through the smart control panel.
Before the server 20 receives the voice message, the intelligent control panel may also be in an awake state, and the user may say "turn on the speaker", and send the voice message to the server 20 through the intelligent control panel.
S120, matching the voice information with a corpus to determine a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus.
After the server 20 receives the voice information, it may parse the voice information to convert it into a language recognizable by a computer. The server 20 can then split the complete sentence spoken by the user into at least one keyword and match each keyword with the recognition corpora in the corpus; any keyword corresponding to a recognition corpus can be used as a target keyword in the voice information.
For example, if the voice message received by the server 20 is "open speaker", it may be split into "handle", "speaker" and "open", and each token is matched against the recognition corpora in the corpus. If the corpus includes the recognition corpora "open" and "smart speaker", then the keyword "open" matches the recognition corpus "open", the keyword "speaker" matches the recognition corpus "smart speaker", and "speaker" and "open" in the voice information can be used as the target keywords.
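The splitting-and-matching in this example can be sketched minimally as follows. Whitespace tokenization and the tiny corpus are simplifying assumptions; the patent does not specify a segmentation method.

```python
# Minimal sketch of step S120: tokenize the utterance and keep only
# tokens that match a recognition corpus. Corpus entries are assumed.
RECOGNITION_CORPUS = {"open", "speaker"}

def extract_target_keywords(utterance, corpus=RECOGNITION_CORPUS):
    tokens = utterance.lower().split()
    return [t for t in tokens if t in corpus]
```

Non-keywords such as "please" or "the" simply fail the membership test and are dropped.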
In some embodiments, in the case that a complete sentence spoken by a user is split into multiple keywords, the multiple keywords may all be matched with the corpus and serve as target keywords; alternatively, a part of the plurality of keywords may be matched with the corpus, and a part of the plurality of keywords matched with the corpus may be used as the target keyword.
In some embodiments, the words spoken by the user may include both keywords and non-keywords, with the non-keywords appearing before and/or after the keywords. When matching the voice information against the corpus, the server 20 can determine which words are keywords and which are non-keywords according to the recognition corpora; that is, the server 20 determines the target keywords from the recognition corpora while allowing the voice information to contain at least one non-keyword.
Taking the smart home corpus as an example, a non-keyword may be located before or after the name of the smart home device, before or after the function of the smart home device, or before or after the functional state of the smart home device.
A non-keyword may be, for example, a subject, a verb, or an auxiliary particle, which is not particularly limited in the embodiments of the present application. The subject may be, for example, "I", the verb may be, for example, "help", and the auxiliary particles may be, for example, the sentence-final particles rendered in this translation as "bar", "o" and "calash".
For example, if the user's requirement is "turn on the smart tv", the user may say "please help me to turn on the switch bar of the smart tv"; the target keywords may then be "turn on", "smart tv" and "switch". Before "turn on", the voice message further includes the non-keywords "please help me"; between "smart tv" and "switch" it includes a non-keyword (a possessive particle); and after "switch" it includes the non-keyword particle rendered as "bar".
In some embodiments, the number of words of a non-keyword before or after a keyword may be kept below a preset word count, so that the voice information does not become too long and degrade the recognition. The preset word count is not limited; for example, it may be 3 to 7 words, and optionally 5. In some embodiments, the manner of parsing the voice information is likewise not limited, as long as the parsed voice information can be recognized by the server 20.
For example, the server may recognize the voice information using Natural Language Understanding (NLU) technology, which lets the computer interpret the user's language; alternatively, a Dynamic Time Warping (DTW) algorithm may be used to identify the text corresponding to the voice information from its acoustic feature vectors. Of course, other techniques may also be used to analyze the voice information, which is not particularly limited in the embodiments of the application.
If a technique such as DTW is used to convert the voice information into text information in order to analyze it, then the step of matching the voice information with the corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information (the corpus comprising at least one recognition corpus) may include:
and S121, converting the voice information into text information.
The voice information may be converted into text information using DTW technology or the like.
S122, matching the text information with the corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus.
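As a hedged illustration of the DTW technique mentioned above, the following computes the classic DTW distance between two one-dimensional feature sequences. Real speech recognizers compare multi-dimensional acoustic feature vectors against word templates; this sketch only shows the core alignment recurrence.

```python
# Minimal dynamic time warping (DTW) distance between two 1-D sequences.
# d[i][j] holds the best alignment cost of a[:i] against b[:j].
def dtw_distance(a, b):
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # A step may advance either sequence or both (warping).
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Because DTW warps the time axis, a sequence with a repeated sample still aligns at zero cost, which is what makes it tolerant of variable speaking rates.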
S130, matching the target keywords with a business logic library, and determining the skill sentence matched with the target keywords, wherein the business logic library comprises at least one skill sentence.
After determining the target keyword according to step S120, the target keyword may be matched with the business logic library to determine a skill sentence corresponding to the target keyword.
For example, the target keywords are "speaker" and "open", and the skill statements in the business logic library include "open smart speaker", so that the skill statements matching "speaker" and "open" can be determined as "open smart speaker".
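This lookup can be sketched as selecting the skill sentence whose words contain all of the target keywords, regardless of the order in which the user spoke them. The library entries and the containment rule below are illustrative assumptions, not the patent's actual matching algorithm.

```python
# Sketch of step S130: return the first skill sentence whose word set
# contains every target keyword. Library entries are assumed examples.
SKILL_SENTENCES = ["open smart speaker", "close smart speaker", "open smart tv"]

def match_skill_sentence(target_keywords, library=SKILL_SENTENCES):
    for sentence in library:
        words = set(sentence.split())
        if all(keyword in words for keyword in target_keywords):
            return sentence
    return None
```

Because the check is set-based, `["speaker", "open"]` and `["open", "speaker"]` match the same skill sentence.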
In some embodiments, in the case that the voice message has a plurality of target keywords, the sequence of the target keywords in the voice message may be the same as or different from the sequence of the target keywords in the skill sentence.
For example, the user says "open sound box", where "sound box" and "open" are the target keywords; in the voice message, the order is: first "sound box", then "open". The skill sentence matched with the target keywords is "turn on smart speaker", and in the skill sentence the order is the reverse: first "turn on", then "smart speaker".
Or, the user says "please open the sound box", where "sound box" and "open" are again the target keywords; in the voice message, the order is: first "open", then "sound box". The skill sentence matched with the target keywords is "turn on smart speaker", and in the skill sentence the order is likewise: first "turn on", then "smart speaker".
S140, determining a control command corresponding to the voice information according to the skill sentence matched with the target keyword.
After determining the skill sentence according to step S130, the control command corresponding to the voice information may be determined according to the skill sentence corresponding to the target keyword.
In some embodiments, when determining the control command according to the skill sentence matched with the target keyword, the skill business logic may be triggered according to the skill sentence matched with the target keyword; and then, determining a control command corresponding to the voice information according to the skill business logic.
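The triggering of skill business logic described above can be sketched as a lookup from skill sentence to command. The mapping and the command format below are assumptions made for illustration only.

```python
# Hedged sketch of step S140: each skill sentence triggers skill
# business logic that yields a control command. Entries are assumed.
SKILL_BUSINESS_LOGIC = {
    "open smart speaker": {"device": "smart speaker", "action": "turn_on"},
    "close smart speaker": {"device": "smart speaker", "action": "turn_off"},
}

def control_command(skill_sentence):
    """Return the control command for a skill sentence, or None."""
    return SKILL_BUSINESS_LOGIC.get(skill_sentence)
```

In practice the "business logic" could be arbitrary code per skill rather than a static table; the table keeps the sketch minimal.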
The embodiment of the application provides an intelligent voice recognition method. After receiving the voice information, the server 20 can split it into at least one keyword and match each keyword with the recognition corpora in the corpus to determine the target keywords, then match the target keywords with the business logic library to determine the corresponding skill sentence, and finally determine a control command from that skill sentence, so that a precise control command is derived from generalized voice information. Compared with the prior art, this not only spares the user key operations, but also allows the intelligent device 10 to be controlled even when the user does not speak exactly the corpus the device 10 has learned.
As shown in fig. 4, an embodiment of the present application provides an intelligent speech recognition method, which is applicable to a server 20, and this embodiment describes a server-side process flow, where the method may include:
S110, receiving voice information.
S120, matching the voice information with a corpus to determine a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus.
S130, matching the target keywords with a business logic library, and determining the skill sentence matched with the target keywords, wherein the business logic library comprises at least one skill sentence.
S140, determining a control command corresponding to the voice information according to the skill sentence matched with the target keyword.
S150, the control command is sent to the intelligent equipment, so that the intelligent equipment controls the intelligent household equipment to execute the control command.
As shown in fig. 1, after determining the control command corresponding to the voice information, the server 20 may send the control command to the smart device 10 through communication modes such as Wireless Fidelity (WiFi), Bluetooth, Zigbee and hotspot, so that the smart device controls the smart home device to execute it. Of course, the server 20 may also send the control command to the smart device 10 through other communication modes, which is not limited in this embodiment.
As shown in fig. 5, taking the control command as "turn on the smart speaker" as an example, if the server 20 sends the control command to the smart control panel, the smart control panel further controls the smart speaker to turn from the offline state to the awake state after receiving the control command. That is, the server 20 sends the control command to the smart device 10, and the smart device 10 further controls the smart home device to execute the control command.
In other embodiments, if the smart device 10 is an intelligent home device, the smart device 10 may directly execute the corresponding operation after receiving the control command.
Taking the control command as "turn on the smart speaker" as an example, as shown in fig. 6, if the server 20 sends the control command to the smart speaker, the smart speaker is turned from the offline state to the awake state.
The intelligent control panel and the intelligent sound box can interact with each other through WiFi, Bluetooth, Zigbee, hotspot and other communication modes.
The embodiment of the application provides an intelligent voice recognition method. After receiving the voice information, the server 20 can split it into at least one keyword and match each keyword with the recognition corpora in the corpus to determine the target keywords, then match the target keywords with the business logic library to determine the corresponding skill sentence, determine a control command from that skill sentence, and send the control command to the intelligent device 10. A precise control command is thus derived from generalized voice information, and the intelligent device 10 controls the smart home device to execute the corresponding operation. Compared with the prior art, this not only spares the user key operations, but also allows the intelligent device 10 to be controlled even when the user does not speak exactly the corpus the device 10 has learned.
As shown in fig. 7, another embodiment of the present application provides an intelligent speech recognition method, which is applicable to interaction between the intelligent device 10 and the server 20. This embodiment describes the interaction flow between the intelligent device 10 and the server 20, and the method may include:
s210, the server receives the voice information.
As shown in fig. 3, when the smart device 10 is in the offline state or the awake state, the words spoken by the user may be transmitted to the server 20 through the smart device 10.
S220, the server matches the voice information with the corpus to determine a keyword corresponding to at least one recognition corpus in the corpus, the keyword is used as a target keyword in the voice information, and the corpus comprises at least one recognition corpus.
After the server 20 receives the voice information, it may parse the voice information to convert it into a language recognizable by a computer. After recognizing the parsed voice information, the server 20 may split a complete sentence into at least one keyword and match the at least one keyword with the recognition corpora in the corpus; among the at least one keyword, the keyword corresponding to a recognition corpus may be used as the target keyword in the voice information.
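As an illustration only, the keyword splitting and corpus matching of step S220 might be sketched as follows. The corpus contents and the longest-match-first strategy are assumptions for the sketch, not details taken from the patent.

```python
# Toy recognition corpus; real corpora would be far larger and
# would likely be built from smart home device names and functions.
RECOGNITION_CORPUS = {"speaker", "light", "turn on", "turn off", "brightness"}

def extract_target_keywords(text: str) -> list:
    """Split recognized text into candidate keywords and keep those
    that match an entry in the recognition corpus (step S220)."""
    words = text.lower().split()
    targets = []
    i = 0
    while i < len(words):
        matched = False
        # Try the two-word phrase first, then the single word,
        # so that "turn on" is matched before "turn".
        for span in (2, 1):
            phrase = " ".join(words[i:i + span])
            if phrase in RECOGNITION_CORPUS:
                targets.append(phrase)
                i += span
                matched = True
                break
        if not matched:
            i += 1  # word has no corresponding recognition corpus
    return targets
```

For example, `extract_target_keywords("please turn on the speaker")` keeps only the corpus-matched keywords "turn on" and "speaker" as target keywords, discarding filler words such as "please" and "the".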
S230, the server matches the target keyword with a business logic library to determine the skill sentence matched with the target keyword, wherein the business logic library comprises at least one skill sentence.
After determining the target keyword according to step S220, the target keyword may be matched with the business logic library to determine a skill sentence corresponding to the target keyword.
And S240, the server determines a control command corresponding to the voice information according to the skill sentence, and sends the control command to the intelligent equipment.
After the skill sentence is determined according to step S230, the control command corresponding to the voice information may be determined according to the skill sentence corresponding to the target keyword, and the control command may then be sent to the intelligent device 10 through communication modes such as WiFi, Bluetooth, Zigbee, and hotspot.
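The lookups of steps S230 and S240 can be sketched as a two-level table: the target keywords select a skill sentence from a business logic library, and the skill sentence selects a control command. The table contents and field names below are illustrative assumptions, not the patented implementation.

```python
# Hypothetical business logic library: a set of target keywords
# selects a skill sentence (step S230).
BUSINESS_LOGIC = {
    frozenset({"turn on", "speaker"}): "turn on the speaker",
    frozenset({"turn off", "speaker"}): "turn off the speaker",
}

# The skill sentence then selects the control command that is sent
# to the intelligent device (step S240).
CONTROL_COMMANDS = {
    "turn on the speaker": {"device": "speaker", "action": "power_on"},
    "turn off the speaker": {"device": "speaker", "action": "power_off"},
}

def resolve_control_command(target_keywords):
    """Map target keywords -> skill sentence -> control command."""
    skill_sentence = BUSINESS_LOGIC.get(frozenset(target_keywords))
    if skill_sentence is None:
        return None  # no skill sentence matches the target keywords
    return CONTROL_COMMANDS[skill_sentence]
```

Using a `frozenset` key makes the lookup order-independent, so "turn on the speaker" and "the speaker, turn it on" resolve to the same skill sentence once the same target keywords are extracted.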
In addition, the other explanations of steps S210 to S240 are the same as the explanations of steps S110 to S150 in the foregoing embodiment, and are not repeated herein.
And S250, the intelligent equipment receives the control command and controls the intelligent household equipment to execute the control command.
After receiving the control command, the smart device 10 may control the smart home device to execute a corresponding operation according to the control command.
For example, if the control command is "turn on the speaker", the intelligent control panel that receives this command may control the intelligent speaker to change from the offline state to the awake state.
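Step S250 on the device side can be sketched as a dispatch from the received control command to a handler for the matching smart home device. The class and command strings below are hypothetical and only mirror the example above.

```python
class SmartSpeaker:
    """Toy model of the intelligent speaker's offline/awake states."""
    def __init__(self):
        self.state = "offline"

    def wake(self):
        self.state = "awake"

def execute_control_command(command: str, speaker: SmartSpeaker) -> str:
    """Dispatch a received control command to the matching device
    handler (step S250) and report the resulting state."""
    handlers = {
        "turn on the speaker": speaker.wake,
    }
    handler = handlers.get(command)
    if handler is None:
        return "unsupported command"
    handler()
    return speaker.state
```

Here the control panel receiving "turn on the speaker" calls the speaker's wake handler, changing its state from offline to awake, as in the example above.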
The embodiment of the application provides an intelligent voice recognition method. After receiving voice information, the server 20 may split the voice information into at least one keyword and match the at least one keyword with the recognition corpora in a corpus to determine, among the at least one keyword, a target keyword corresponding to a recognition corpus. The server 20 then matches the target keyword with a business logic library to determine a skill sentence corresponding to the target keyword, determines a control command according to that skill sentence, and sends the control command to the intelligent device 10. In this way, an accurate control command is determined from generalized voice information, and the intelligent device 10 controls the intelligent home device to execute the corresponding operation. Compared with the prior art, the method not only spares the user button operations, but also allows the intelligent device 10 to be controlled even when the user does not speak exactly the corpora known to the intelligent device 10.
Fig. 8 shows a block diagram of an intelligent voice recognition apparatus 100 according to another embodiment of the present application; the intelligent voice recognition apparatus 100 may include a receiving module 101 and a processing module 102.
The receiving module 101 is configured to receive voice information sent by the smart device 10.
The processing module 102 is configured to match the voice information with a corpus, determine a keyword corresponding to at least one recognition corpus in the corpus, and take the keyword corresponding to the recognition corpus as a target keyword in the voice information, where the corpus includes the at least one recognition corpus.
The processing module 102 is further configured to match the target keyword with a business logic library and determine a skill sentence matched with the target keyword, where the business logic library includes at least one skill sentence.
The processing module 102 is further configured to determine a control command corresponding to the voice information according to the skill sentence matched with the target keyword.
On this basis, the processing module 102 is further configured to parse the voice information to convert it into a language recognizable by a computer; match the parsed voice information with the corpus, determine a keyword corresponding to at least one recognition corpus in the corpus, and take the keyword corresponding to the recognition corpus as the target keyword in the voice information, wherein the corpus comprises at least one recognition corpus.
The processing module 102 is further configured to convert the voice information into text information; matching the text information with a corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus.
The corpus can be an intelligent home device corpus, and the target keywords include at least one of names of the intelligent home devices, functions of the intelligent home devices, and functional states of the intelligent home devices.
On this basis, as shown in fig. 9, the intelligent voice recognition apparatus 100 may further include a sending module 103. The sending module 103 is configured to send the control command to the intelligent device 10 after the control command corresponding to the voice information is determined according to the skill sentence matched with the target keyword, so that the intelligent device 10 controls the intelligent home device to execute the control command.
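A minimal structural sketch of the apparatus 100 of figs. 8 and 9 follows; the module and class names mirror the description above, but the toy corpus, business logic table, and in-process "sending" are illustrative assumptions.

```python
class ReceivingModule:
    """Module 101: receives voice information (here already text)."""
    def receive(self, voice_info: str) -> str:
        return voice_info

class ProcessingModule:
    """Module 102: matches keywords against the corpus and then
    against the business logic library."""
    def __init__(self, corpus: set, logic: dict):
        self.corpus = corpus
        self.logic = logic

    def process(self, text: str):
        target_keywords = tuple(w for w in text.split() if w in self.corpus)
        return self.logic.get(target_keywords)  # control command or None

class SendingModule:
    """Module 103: records the control commands sent to the device."""
    def __init__(self):
        self.sent = []

    def send(self, command: str) -> str:
        self.sent.append(command)
        return command

class VoiceRecognitionApparatus:
    """Composes the three modules into one apparatus, as in fig. 9."""
    def __init__(self, corpus: set, logic: dict):
        self.receiving = ReceivingModule()
        self.processing = ProcessingModule(corpus, logic)
        self.sending = SendingModule()

    def handle(self, voice_info: str):
        text = self.receiving.receive(voice_info)
        command = self.processing.process(text)
        return self.sending.send(command) if command else None
```

Splitting the apparatus into receiving, processing, and sending modules keeps each step independently replaceable, e.g. the processing module could be swapped for a stricter matcher without touching the transport layers.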
The explanation and the advantageous effects of the intelligent voice recognition apparatus 100 provided in the embodiment of the present application are the same as those of the foregoing embodiments and are not repeated herein.
As shown in fig. 10, a block diagram of a smart device 10 according to another embodiment of the present application is shown, where the smart device 10 includes: one or more processors 11; a memory 12; and one or more applications 13, wherein the one or more applications 13 are stored in the memory and configured to be executed by the one or more processors 11, the one or more applications 13 configured to perform the methods of the foregoing embodiments.
Processor 11 may include one or more processing cores. The processor 11 connects the various parts of the smart device 10 using various interfaces and lines, and performs the various functions of the smart device 10 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 12 and by calling the data stored in the memory 12. Alternatively, the processor 11 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 11 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, the application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It is understood that the modem may also be implemented by a separate communication chip rather than being integrated into the processor 11.
The memory 12 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 12 may be used to store instructions, programs, code sets, or instruction sets. The memory 12 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the method embodiments described above, and the like. The data storage area may store data created by the smart device 10 during use (e.g., a phone book, audio and video data, chat log data), and the like.
Fig. 11 is a block diagram illustrating a computer-readable storage medium 200 according to another embodiment of the present application. The computer-readable storage medium 200 has stored therein a program code that can be called by a processor to execute the method described in the above-described method embodiments.
The computer-readable storage medium 200 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 200 includes a non-transitory computer-readable storage medium.
The computer-readable storage medium 200 has storage space for the program code that performs any of the method steps described above. The program code may be read from or written into one or more computer program products, and may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An intelligent speech recognition method, comprising:
receiving voice information;
matching the voice information with a corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus;
matching the target keywords with a business logic library, and determining skill sentences matched with the target keywords, wherein the business logic library comprises at least one skill sentence;
and determining a control command corresponding to the voice information according to the skill sentence matched with the target keyword.
2. The method according to claim 1, wherein the step of matching the voice information with a corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as the target keyword in the voice information, wherein the corpus comprises at least one recognition corpus, specifically comprises:
analyzing the voice information to convert the voice information into a language which can be recognized by a computer;
matching the analyzed voice information with the corpus to determine a keyword corresponding to at least one recognition corpus of the corpus, and taking the keyword corresponding to the recognition corpus as the target keyword in the voice information, wherein the corpus comprises at least one recognition corpus.
3. The method according to claim 1, wherein the step of matching the voice information with the corpus to determine a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus, specifically comprises:
converting the voice information into text information;
matching the text information with the corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus.
4. The method according to any one of claims 1-3, wherein after determining the control command corresponding to the voice information according to the skill sentence matched with the target keyword, the method further comprises:
and sending the control command to intelligent equipment so that the intelligent equipment controls the intelligent household equipment to execute the control command.
5. The method according to any one of claims 1-3, wherein the corpus is a corpus of smart home devices.
6. The method according to claim 5, wherein the target keyword comprises at least one of a name of the smart home device, a function of the smart home device, and a functional state of the smart home device.
7. An intelligent speech recognition method, comprising:
the server receives voice information;
the server matches the voice information with a corpus, determines a keyword corresponding to at least one recognition corpus in the corpus, and takes the keyword as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus;
the server matches the target keyword with a business logic library and determines a skill sentence matched with the target keyword, wherein the business logic library comprises at least one skill sentence;
the server determines a control command corresponding to the voice information according to the skill sentence and sends the control command to the intelligent equipment;
and the intelligent equipment receives the control command and controls the intelligent household equipment to execute the control command.
8. An intelligent speech recognition device, comprising:
the receiving module is used for receiving voice information sent by the intelligent equipment;
the processing module is used for matching the voice information with a corpus, determining a keyword corresponding to at least one recognition corpus in the corpus, and taking the keyword corresponding to the recognition corpus as a target keyword in the voice information, wherein the corpus comprises at least one recognition corpus;
the processing module is further configured to match the target keyword with a business logic library and determine a skill sentence matched with the target keyword, wherein the business logic library comprises at least one skill sentence;
the processing module is further configured to determine a control command corresponding to the voice information according to the skill sentence matched with the target keyword.
9. An intelligent device, comprising:
one or more processors;
a memory; and
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-6 or claim 7.
10. A computer-readable storage medium having program code stored therein, the program code being callable by a processor to perform the method according to any one of claims 1 to 6 or claim 7.
CN202011411651.1A 2020-12-03 2020-12-03 Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment Pending CN112562670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011411651.1A CN112562670A (en) 2020-12-03 2020-12-03 Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment

Publications (1)

Publication Number Publication Date
CN112562670A true CN112562670A (en) 2021-03-26

Family

ID=75048768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011411651.1A Pending CN112562670A (en) 2020-12-03 2020-12-03 Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment

Country Status (1)

Country Link
CN (1) CN112562670A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113611306A (en) * 2021-09-07 2021-11-05 云知声(上海)智能科技有限公司 Intelligent household voice control method and system based on user habits and storage medium
CN114049877A (en) * 2021-11-04 2022-02-15 北京奇天大胜网络科技有限公司 Voice digital human-television information interaction method and system based on Internet of things
CN114911381A (en) * 2022-04-15 2022-08-16 青岛海尔科技有限公司 Interactive feedback method and device, storage medium and electronic device
CN115421396A (en) * 2022-09-29 2022-12-02 深圳康佳电子科技有限公司 Intelligent household equipment control method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108305626A (en) * 2018-01-31 2018-07-20 百度在线网络技术(北京)有限公司 The sound control method and device of application program
CN110286601A (en) * 2019-07-01 2019-09-27 珠海格力电器股份有限公司 Method and device for controlling intelligent household equipment, control equipment and storage medium
CN110942773A (en) * 2019-12-10 2020-03-31 上海雷盎云智能技术有限公司 Method and device for controlling intelligent household equipment through voice
WO2020135067A1 (en) * 2018-12-24 2020-07-02 同方威视技术股份有限公司 Voice interaction method and device, robot, and computer readable storage medium
CN111599362A (en) * 2020-05-20 2020-08-28 湖南华诺科技有限公司 System and method for self-defining intelligent sound box skill and storage medium

Similar Documents

Publication Publication Date Title
US11302302B2 (en) Method, apparatus, device and storage medium for switching voice role
US10803869B2 (en) Voice enablement and disablement of speech processing functionality
CN112562670A (en) Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment
CN112201246B (en) Intelligent control method and device based on voice, electronic equipment and storage medium
CN105592343B (en) Display device and method for question and answer
US20160293168A1 (en) Method of setting personal wake-up word by text for voice control
JP5119055B2 (en) Multilingual voice recognition apparatus, system, voice switching method and program
KR102411619B1 (en) Electronic apparatus and the controlling method thereof
JP2017107078A (en) Voice interactive method, voice interactive device, and voice interactive program
JP2014191030A (en) Voice recognition terminal and voice recognition method using computer terminal
CN108882101B (en) Playing control method, device, equipment and storage medium of intelligent sound box
CN107844470B (en) Voice data processing method and equipment thereof
TW201440482A (en) Voice answering method and mobile terminal apparatus
US10540973B2 (en) Electronic device for performing operation corresponding to voice input
CN112420044A (en) Voice recognition method, voice recognition device and electronic equipment
CN113674742B (en) Man-machine interaction method, device, equipment and storage medium
CN112767916A (en) Voice interaction method, device, equipment, medium and product of intelligent voice equipment
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
KR20200057501A (en) ELECTRONIC APPARATUS AND WiFi CONNECTING METHOD THEREOF
CN113643684A (en) Speech synthesis method, speech synthesis device, electronic equipment and storage medium
CN110473524B (en) Method and device for constructing voice recognition system
CN112787899B (en) Equipment voice interaction method, computer readable storage medium and refrigerator
CN112002325B (en) Multi-language voice interaction method and device
CN114822598A (en) Server and speech emotion recognition method
CN114299941A (en) Voice interaction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination