CN112289312B - Speech instruction recognition method and device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN112289312B
CN112289312B (application CN202010663954.6A)
Authority
CN
China
Prior art keywords
voice
call
template
sub
intention
Prior art date
Legal status: Active (assumed; not a legal conclusion)
Application number
CN202010663954.6A
Other languages
Chinese (zh)
Other versions
CN112289312A (en)
Inventor
Zhang Zhihui (张智慧)
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN202010663954.6A
Publication of CN112289312A
Application granted
Publication of CN112289312B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech-to-text systems
    • G10L 2015/223: Execution procedure of a spoken command
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure relates to a voice instruction recognition method and device, an electronic device, and a computer-readable medium, and belongs to the technical field of speech recognition. The method comprises the following steps: acquiring voice information of a user and obtaining a corresponding voice utterance from the voice information, wherein the voice utterance comprises one or more sub-intent utterances; acquiring each single-intent utterance template in a single-intent template set, matching the voice utterance against the single-intent templates, and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance; and recognizing the sub-intent voice instruction corresponding to each sub-intent utterance according to its sub-intent template, and obtaining the complete voice instruction corresponding to the voice information from the sub-intent voice instructions. By matching a multi-intent voice utterance with single-intent utterance templates, the method and device can recognize several voice instructions in a single piece of user voice information at the same time, without adding multi-intent utterance templates.

Description

Speech instruction recognition method and device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of speech recognition technology, and in particular to a speech instruction recognition method, a speech instruction recognition device, an electronic device, and a computer-readable medium.
Background
With the spread of smart home devices, users can issue control instructions to various devices by voice, and such voice control brings great convenience to daily life.
However, existing voice instruction recognition methods limit a single utterance to a single intention for a single device: the user can neither control two or more devices at once nor issue several instructions to the same device at once, so voice recognition is inefficient.
In view of the foregoing, there is a need in the art for a method that can recognize multiple control intentions in the same utterance and thereby improve the efficiency of voice recognition.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a voice instruction recognition method, a voice instruction recognition device, an electronic device, and a computer-readable medium, so as to improve the efficiency of voice recognition at least to some extent.
According to a first aspect of the present disclosure, there is provided a method for recognizing a voice command, including:
acquiring voice information of a user and obtaining a corresponding voice utterance from the voice information, wherein the voice utterance comprises one or more sub-intent utterances;
acquiring each single-intent utterance template in a single-intent template set, matching the voice utterance against the single-intent templates, and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance;
and recognizing the sub-intent voice instruction corresponding to each sub-intent utterance according to its sub-intent template, and obtaining the complete voice instruction corresponding to the voice information from the sub-intent voice instructions.
In an exemplary embodiment of the disclosure, obtaining a corresponding voice utterance from the voice information includes:
performing speech recognition on the voice information to obtain the voice text corresponding to the voice information;
and preprocessing the voice text to obtain the voice utterance corresponding to the voice information.
In an exemplary embodiment of the present disclosure, preprocessing the voice text includes:
acquiring a punctuation library, comparing the voice text against each punctuation mark in the punctuation library in turn, and filtering the punctuation marks out of the voice text;
and acquiring a filler-word library, comparing the voice text against each filler word in the filler-word library in turn, and filtering the filler words out of the voice text.
In an exemplary embodiment of the disclosure, matching the voice utterance against the single-intent templates and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance includes:
obtaining, from each single-intent template, the corresponding template to be matched;
and matching the voice utterance against the single-intent templates and the templates to be matched, and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance.
In an exemplary embodiment of the present disclosure, obtaining, from a single-intent template, the corresponding template to be matched includes:
deleting the ending symbol of the single-intent template to obtain the template to be matched corresponding to that single-intent template.
In an exemplary embodiment of the present disclosure, matching the voice utterance against the single-intent templates and the templates to be matched, and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance, includes:
matching the voice utterance against all single-intent templates in turn;
if the voice utterance matches a single-intent template, taking the matched single-intent template as the sub-intent template corresponding to the sub-intent utterance in the voice utterance, and ending the matching process for the voice utterance;
if the voice utterance matches no single-intent template, matching the voice utterance against all templates to be matched in turn;
if the voice utterance matches a template to be matched, taking the matched template to be matched as the sub-intent template corresponding to that sub-intent utterance;
removing the sub-intent utterance that matched the template to be matched from the voice utterance, obtaining the remaining sub-intent utterances in the voice utterance;
matching the remaining sub-intent utterances against all single-intent templates in turn again, so as to determine the sub-intent template corresponding to each remaining sub-intent utterance;
and if the voice utterance matches no template to be matched either, determining the voice utterance to be an invalid voice utterance.
In an exemplary embodiment of the present disclosure, before acquiring each single-intent template in the single-intent template set, the method further includes:
determining the index vocabularies used in the single-intent templates;
and establishing different types of single-intent templates from the index vocabularies, and obtaining the single-intent template set from the different types of single-intent templates.
According to a second aspect of the present disclosure, there is provided a voice instruction recognition apparatus including:
a voice utterance acquisition module, configured to acquire voice information of a user and obtain a corresponding voice utterance from the voice information, wherein the voice utterance comprises one or more sub-intent utterances;
a template matching module, configured to acquire each single-intent utterance template in the single-intent template set, match the voice utterance against the single-intent templates, and determine the sub-intent template corresponding to each sub-intent utterance in the voice utterance;
and a voice instruction recognition module, configured to recognize the sub-intent voice instruction corresponding to each sub-intent utterance according to its sub-intent template, and obtain the complete voice instruction corresponding to the voice information from the sub-intent voice instructions.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of recognizing a voice instruction as described in any one of the above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method of recognizing a speech instruction according to any one of the above.
Exemplary embodiments of the present disclosure may have the following advantageous effects:
In the voice instruction recognition method of the exemplary embodiments of the present disclosure, by matching the one or more sub-intent utterances in the voice information with single-intent utterance templates, one or more voice instructions in the same piece of user voice information can be recognized at the same time, without adding or expanding multi-intent templates. By acquiring voice information that contains several instructions at once, the user can control two or more devices simultaneously, or issue several instructions to the same device simultaneously. The voice instruction recognition method of the example embodiments of the present disclosure can thus improve both the efficiency and the accuracy of voice instruction recognition.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 illustrates a flow diagram of a method of recognizing a voice command according to an example embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of obtaining a corresponding voice utterance from voice information in accordance with an example embodiment of the present disclosure;
FIG. 3 shows a flow diagram of determining a sub-intent template in accordance with an example embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram of sub-intent template matching in accordance with an example embodiment of the present disclosure;
FIG. 5 illustrates a flow diagram of a method of recognition of dual intent speech instructions in one embodiment in accordance with the present disclosure;
FIG. 6 illustrates a block diagram of a speech instruction recognition device of an example embodiment of the present disclosure;
fig. 7 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The present exemplary embodiment first provides a voice instruction recognition method. Referring to fig. 1, the method may include the following steps:
S110, acquiring voice information of a user and obtaining a corresponding voice utterance from the voice information, wherein the voice utterance comprises one or more sub-intent utterances.
S120, acquiring each single-intent utterance template in a single-intent template set, matching the voice utterance against the single-intent templates, and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance.
S130, recognizing the sub-intent voice instruction corresponding to each sub-intent utterance according to its sub-intent template, and obtaining the complete voice instruction corresponding to the voice information from the sub-intent voice instructions.
The voice instruction recognition method of the present exemplary embodiment may be applied to smart devices with a speech recognition function, for example a smart speaker or a smartphone that can receive voice information and control other smart home devices. The smart home devices may include common household appliances such as televisions, air conditioners, and water heaters. A smart speaker can receive the user's voice information directly, and a smartphone can receive it through a corresponding application, converting it into the corresponding voice instruction to be executed. For example, a user may name each smart home device in the application, e.g. name an air conditioner "air conditioner" and a bedroom television "bedroom television"; when controlling the devices by voice, the user only needs to speak voice information such as "turn on the air conditioner", "turn on the air conditioner heating mode", or "turn up the bedroom television", and the corresponding appliance is controlled to execute the corresponding action.
The voice instruction recognition method of the disclosed example embodiments builds on single-intent utterance recognition and is based mainly on template matching: regular-expression-style templates serve as the single-intent utterance templates, and these same single-intent templates are used directly to recognize multi-intent utterances accurately, without adding any new templates.
In the voice instruction recognition method of the present exemplary embodiment, by matching the one or more sub-intent utterances in the voice information with single-intent utterance templates, one or more voice instructions in the same piece of user voice information can be recognized at the same time, without adding or expanding multi-intent templates. By entering voice information that contains several instructions at once, the user can control two or more devices simultaneously, or issue several instructions to the same device simultaneously. This can improve both the efficiency and the accuracy of voice instruction recognition.
The above steps of the present exemplary embodiment will be described in more detail with reference to fig. 2 to 5.
In step S110, voice information of the user is acquired, and the corresponding voice utterance is obtained from the voice information, the voice utterance comprising one or more sub-intent utterances.
In this example embodiment, the voice utterance may be single-intent or multi-intent. A single-intent utterance carries only one instruction intention, such as "turn on the air conditioner"; a multi-intent utterance contains two or more sub-intents. For example, "turn on the air conditioner turn off the light" is multi-intent, with "turn on the air conditioner" and "turn off the light" as its two sub-intent utterances.
The voice information of the user can be acquired through an application on a smartphone, or through a smart device capable of speech recognition, such as a smart speaker. After the voice information has been acquired, as shown in fig. 2, the corresponding voice utterance is obtained from it in the following steps:
S210, performing speech recognition on the voice information to obtain the voice text corresponding to the voice information.
After the voice information is acquired, it is first converted into the corresponding text by the speech recognition function of the smart device.
S220, preprocessing the voice text to obtain the voice utterance corresponding to the voice information.
In this example embodiment, since punctuation marks and filler words carry no actual meaning in the text, they may be removed in a preprocessing step to reduce matching complexity. The specific method is as follows: acquire a punctuation library, compare the voice text against each punctuation mark in the library in turn, and filter the punctuation marks out of the voice text; then acquire a filler-word library, compare the voice text against each filler word in the library in turn, and filter the filler words out of the voice text.
Here "filler words" means words that carry no meaning related to the instruction, such as "please" or "help me", and sentence-final particles.
Punctuation contained in the voice text can be removed by building a punctuation library and comparing the text against it; for example, "turn on the air conditioner, turn off the light" becomes "turn on the air conditioner turn off the light". Likewise, by building a filler-word library and traversing it, filler words can be filtered out; for example, "help me turn on the air conditioner and turn off the light" becomes "turn on the air conditioner turn off the light" after filtering.
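A minimal sketch of this preprocessing step, using illustrative English stand-ins for the punctuation and filler-word libraries (the actual libraries 501 and 502 of the patent are not specified here):

```python
# Sketch of utterance preprocessing: strip punctuation, then filler words.
# The two lists are illustrative examples, not the patent's actual libraries.
PUNCTUATION = [",", ".", "!", "?", ";"]
FILLER_WORDS = ["please ", "help me ", " for me"]

def preprocess(text: str) -> str:
    """Return the cleaned voice utterance ready for template matching."""
    for mark in PUNCTUATION:          # filter punctuation marks
        text = text.replace(mark, "")
    for word in FILLER_WORDS:         # filter meaningless filler words
        text = text.replace(word, " ")
    return " ".join(text.split())     # normalize whitespace

print(preprocess("help me turn on the air conditioner, turn off the light"))
# prints: turn on the air conditioner turn off the light
```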
After the voice utterance corresponding to the voice information has been obtained in step S110, the corresponding voice instruction is recognized by template matching.
In step S120, each single-intent template in the single-intent template set is acquired, the voice utterance is matched against the single-intent templates, and the sub-intent template corresponding to each sub-intent utterance in the voice utterance is determined.
In this example embodiment, a rich set of single-intent templates of various types may be established by defining different types of index vocabularies, yielding the single-intent template set. The specific method is as follows: determine the index vocabularies to be used in the single-intent templates; establish different types of single-intent templates from those index vocabularies; and obtain the single-intent template set from the different types of single-intent templates.
A single-intent template built from index vocabularies is, for example, "^[open][device]$", where the "^" at the beginning and the "$" at the end of the template represent the match start anchor and end anchor respectively, and "[open]" and "[device]" are synonym-set indexes: "[open]" may stand for the synonym set of all open-class words, such as "open|turn on|start", and "[device]" stands for the set of all device names, including "air conditioner", "television", and so on. The single-intent utterances this template can match are "turn on the air conditioner", "turn on the television", and the like.
There are likewise templates for the device shut-down class, such as "^[close][device]$", which can match single-intent utterances of the shut-down type, such as "turn off the air conditioner".
In addition, there are template types for setting modes, adjusting parameters, and so on, such as "^[device][open][mode value][mode word]$", where "[mode value]" represents a synonym set of mode types such as "cooling|heating|sleep" and "[mode word]" represents equivalent expressions of "mode|function". The single-intent utterances this template can match are mode-setting utterances such as "air conditioner turn on cooling mode" or "air conditioner turn on heating function".
By establishing a rich single-intent template set, most forms of user instruction information can be covered, and common voice instructions can each find a corresponding template in the set to match against.
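Read as regular expressions, such index-vocabulary templates could be sketched in Python as follows; the synonym sets and device names here are illustrative assumptions, not the patent's data:

```python
import re

# Illustrative synonym-set indexes; the real sets would be far richer.
OPEN = r"(?:turn on|open|start)"
CLOSE = r"(?:turn off|close|stop)"
DEVICE = r"(?:the air conditioner|the television|the light)"
MODE_VALUE = r"(?:cooling|heating|sleep)"
MODE_WORD = r"(?:mode|function)"

# Single-intent templates: '^' and '$' are the match start and end anchors.
SINGLE_INTENT_TEMPLATES = [
    re.compile(rf"^{OPEN} {DEVICE}$"),                           # open-device class
    re.compile(rf"^{CLOSE} {DEVICE}$"),                          # close-device class
    re.compile(rf"^{DEVICE} {OPEN} {MODE_VALUE} {MODE_WORD}$"),  # set-mode class
]

assert SINGLE_INTENT_TEMPLATES[0].match("turn on the air conditioner")
assert SINGLE_INTENT_TEMPLATES[2].match("the air conditioner turn on cooling mode")
```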
After each single-intent template in the set has been acquired, the voice utterance can be matched against the single-intent templates. As shown in fig. 3, matching the voice utterance against the single-intent templates and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance may specifically include the following steps:
S310, obtaining, from each single-intent template, the corresponding template to be matched.
In this example embodiment, the template to be matched corresponding to a single-intent template may be obtained by deleting the ending symbol of that single-intent template. The templates to be matched are used to match the leading sub-intent utterances of a multi-intent utterance, that is, every sub-intent utterance except the last one.
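Under the regular-expression reading of the templates, step S310 amounts to dropping the "$" end anchor from the pattern. A minimal sketch, with an assumed pattern rather than the patent's template data:

```python
import re

def to_prefix_template(pattern: str) -> re.Pattern:
    """Delete the ending symbol '$' to get the template to be matched."""
    return re.compile(pattern.rstrip("$"))

single_intent = r"^(?:turn on|open) (?:the air conditioner|the light)$"
prefix = to_prefix_template(single_intent)

# The full template does not match a multi-intent utterance...
assert re.match(single_intent, "turn on the air conditioner turn off the light") is None
# ...but the template to be matched matches its leading sub-intent utterance.
m = prefix.match("turn on the air conditioner turn off the light")
assert m and m.group(0) == "turn on the air conditioner"
```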
S320, matching the voice utterance against the single-intent templates and the templates to be matched, and determining the sub-intent template corresponding to each sub-intent utterance in the voice utterance.
As shown in fig. 4, the matching process for the sub-intent templates in step S320 may specifically include the following steps:
S410, matching the voice utterance against all single-intent templates in turn.
In this example embodiment, since the voice utterance may be either single-intent or multi-intent, it is first matched against all single-intent templates in turn; that is, all templates in the single-intent template set are traversed to determine whether any of them matches the voice utterance.
S420, if the voice utterance matches a single-intent template, taking the matched single-intent template as the sub-intent template corresponding to the sub-intent utterance, and ending the matching process for the voice utterance.
If the voice utterance matches a complete single-intent template, the matched sub-intent utterance is a part that carries the ending symbol: either it is the last sub-intent utterance in the voice utterance, or the voice utterance is itself single-intent and has only one sub-intent utterance. The matched single-intent template is therefore taken directly as the sub-intent template corresponding to that sub-intent utterance, and the matching process for the voice utterance ends.
S430, if the voice utterance matches no single-intent template, matching the voice utterance against all templates to be matched in turn.
If the voice utterance matches no single-intent template, the sub-intent utterance at its start has no ending symbol and may be a leading sub-intent utterance of a multi-intent utterance, so the voice utterance is matched against all templates to be matched, from which the ending symbol has been removed.
S440, if the voice utterance matches a template to be matched, taking the matched template to be matched as the sub-intent template corresponding to that sub-intent utterance.
If the voice utterance matches a template to be matched, the matched template is taken as the sub-intent template of the currently matched sub-intent utterance, and further sub-intent utterances that have not yet been matched remain after it.
S450, removing the sub-intent utterance that matched the template to be matched from the voice utterance, obtaining the remaining sub-intent utterances in the voice utterance.
The successfully matched sub-intent utterance is removed from the whole voice utterance and recorded; what remains are the sub-intent utterances that have not yet been matched.
S460, matching the remaining sub-intent utterances against all single-intent templates in turn again, so as to determine the sub-intent template corresponding to each remaining sub-intent utterance.
The remaining unmatched sub-intent utterances are matched again by the method of steps S410 to S450, until every sub-intent utterance in the voice utterance has been matched.
S470, if the voice utterance matches no template to be matched either, determining the voice utterance to be invalid.
If the voice utterance matches neither any single-intent template nor any template to be matched, it is an invalid voice utterance for which no corresponding template exists in the template set. In this case, the user may be prompted that the entered voice instruction is invalid and asked to enter it again.
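Steps S410 to S470 can be sketched as a single matching loop over the utterance; the template set below is an illustrative assumption, and the concrete data structures are left open by the patent:

```python
import re

# Illustrative single-intent templates ('^'/'$' are the match anchors).
SINGLE_INTENT = [
    r"^(?:turn on|open) (?:the air conditioner|the light|the television)$",
    r"^(?:turn off|close) (?:the air conditioner|the light|the television)$",
]
# Templates to be matched: the same templates with the ending symbol removed.
TO_BE_MATCHED = [p.rstrip("$") for p in SINGLE_INTENT]

def recognize(utterance: str):
    """Split a (possibly multi-intent) utterance into sub-intent utterances.

    Returns the list of matched sub-intent utterances, or None when the
    utterance is invalid (S470)."""
    sub_intents = []
    while utterance:
        # S410/S420: a full single-intent match is the last sub-intent.
        if any(re.match(p, utterance) for p in SINGLE_INTENT):
            sub_intents.append(utterance)
            return sub_intents
        # S430/S440: otherwise try the end-anchor-stripped templates.
        for p in TO_BE_MATCHED:
            m = re.match(p, utterance)
            if m:
                # S450: record the match and keep the remaining utterance.
                sub_intents.append(m.group(0))
                utterance = utterance[m.end():].strip()
                break  # S460: re-enter the loop with the remainder
        else:
            return None  # S470: no template matches -> invalid utterance
    return sub_intents

print(recognize("turn on the air conditioner turn off the light"))
# prints: ['turn on the air conditioner', 'turn off the light']
```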
In step S130, the sub-intent voice instruction corresponding to each sub-intent utterance is recognized according to its sub-intent template, and the complete voice instruction corresponding to the voice information is obtained from the sub-intent voice instructions.
After the sub-intent templates corresponding to all sub-intent utterances in the voice utterance have been determined, the corresponding sub-intent voice instructions can be recognized from those templates. All sub-intent voice instructions are then combined into the complete voice instruction corresponding to the voice information, and the corresponding devices are controlled to execute the content of the instructions.
Fig. 5 shows a complete flow chart of dual-intention recognition in one embodiment of the present disclosure, which illustrates the above steps in this example embodiment. The specific steps of the flow chart are as follows:
Step S510, user utterance input.
For example, the voice input by the user is "help me turn on the air conditioner, turn off the light".
Step S520, utterance preprocessing.
Punctuation marks and function words in the user utterance are removed using the punctuation library 501 and the function-word library 502. After preprocessing, "turn on the air conditioner turn off the light" is obtained.
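A minimal sketch of this preprocessing step might look as follows. The punctuation and function-word lists are illustrative assumptions standing in for the punctuation library 501 and the function-word library 502:

```python
import re

# Hypothetical stand-ins for the punctuation library 501 and the
# function-word library 502 (the real libraries are not specified here).
PUNCTUATION = ",.!?;:，。！？；："
FUNCTION_WORDS = ["help me", "please", "could you"]

def preprocess(text: str) -> str:
    """Strip punctuation and function words, then collapse extra whitespace."""
    for mark in PUNCTUATION:
        text = text.replace(mark, " ")
    for word in FUNCTION_WORDS:
        text = re.sub(re.escape(word), " ", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Help me turn on the air conditioner, turn off the light"))
# -> turn on the air conditioner turn off the light
```

Filtering before template matching keeps the template set small: templates only ever see the content words that carry the intention.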
Step S530, constructing the single-intention templates.
A single-intention template set 503 is created for parsing the user's input utterance.
Step S540, removing the templates' ending symbols and matching the utterance.
First, the ending symbol "$" at the tail of every single-intention template is removed before matching. The first half of the utterance, "turn on the air conditioner", can then be matched by the front portion of the template ".*[turn on].*[device]$" once its ending symbol "$" has been removed.
Step S550, matching the remaining utterance with the complete single-intention templates.
Next, the ending symbol "$" is restored to all single-intention templates, and these complete templates are used to match the remaining part of the utterance. The remainder, "turn off the light", can now be matched by the template ".*[turn off].*[device]$". At this point, a dual-intention user utterance has been accurately recognized as two intentions: turning on the air conditioner and turning off the light.
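The two-phase trick of steps S540 and S550 can be illustrated with Python regular expressions, where `re.match` plays the role of a template with its "$" removed (prefix match) and `re.fullmatch` plays the role of the complete template. The template strings are placeholders, not the patent's actual vocabulary:

```python
import re

# Illustrative single-intention templates ending with "$"; the verb/device
# alternations are assumptions, not the patent's actual template set 503.
SINGLE_INTENT_TEMPLATES = [
    r"(turn on|open) the (air conditioner|light)$",
    r"(turn off|close) the (air conditioner|light)$",
]

utterance = "turn on the air conditioner turn off the light"

# Step S540: drop the trailing "$" so a template can match a prefix of the utterance.
for template in SINGLE_INTENT_TEMPLATES:
    prefix = re.match(template.rstrip("$"), utterance)
    if prefix:
        first_intent = prefix.group(0)
        remainder = utterance[prefix.end():].strip()
        break

# Step S550: restore the "$" and match the remainder against complete templates.
second_intent = next(
    m.group(0)
    for t in SINGLE_INTENT_TEMPLATES
    if (m := re.fullmatch(t.rstrip("$"), remainder))
)

print(first_intent)   # turn on the air conditioner
print(second_intent)  # turn off the light
```

Removing the anchor for the first pass is what lets a template written for a single intention latch onto the front of a multi-intention utterance; restoring it for the second pass guarantees the remainder is consumed exactly, with no trailing text left unaccounted for.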
Step S560, outputting the matching result.
The matching result is output, and the corresponding devices are controlled to execute the command content: the air conditioner is turned on, and the light is turned off.
It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, the present disclosure also provides a voice command recognition apparatus. Referring to fig. 6, the voice command recognition apparatus may include a voice utterance acquisition module 610, an utterance template matching module 620, and a voice command recognition module 630. Wherein:
the voice utterance acquisition module 610 may be configured to acquire voice information of a user and obtain a corresponding voice utterance from the voice information, where the voice utterance includes one or more sub-intention utterances;
the utterance template matching module 620 may be configured to obtain each single-intention utterance template in the single-intention template set, match the voice utterance against the single-intention utterance templates, and determine the sub-intention utterance template corresponding to each sub-intention utterance in the voice utterance;
the voice command recognition module 630 may be configured to recognize the sub-intention voice command corresponding to each sub-intention utterance according to its sub-intention utterance template, and obtain the complete voice command corresponding to the voice information from the sub-intention voice commands.
In some exemplary embodiments of the present disclosure, the voice utterance acquisition module 610 may include a voice text recognition unit and an utterance preprocessing unit. Wherein:
the voice text recognition unit may be configured to perform voice recognition on the voice information to obtain voice text information corresponding to the voice information;
the utterance preprocessing unit may be configured to preprocess the voice text information to obtain the voice utterance corresponding to the voice information.
In some exemplary embodiments of the present disclosure, the utterance preprocessing unit may include a punctuation filtering unit and a function-word filtering unit. Wherein:
the punctuation filtering unit may be configured to obtain a punctuation library, compare the voice text information against each punctuation mark in the punctuation library in sequence, and filter the punctuation marks out of the voice text information;
the function-word filtering unit may be configured to obtain a function-word library, compare the voice text information against each function word in the function-word library in sequence, and filter the function words out of the voice text information.
In some exemplary embodiments of the present disclosure, the utterance template matching module 620 may include a to-be-matched template determining unit and a sub-intention template matching unit. Wherein:
the to-be-matched template determining unit may be configured to obtain, from each single-intention utterance template, the corresponding to-be-matched template;
the sub-intention template matching unit may be configured to match the voice utterance against the single-intention utterance templates and the to-be-matched templates, and determine the sub-intention utterance template corresponding to each sub-intention utterance in the voice utterance.
In some exemplary embodiments of the present disclosure, the to-be-matched template determining unit may include an ending-symbol deleting unit, which may be configured to delete the ending symbol of a single-intention utterance template to obtain the to-be-matched template corresponding to that single-intention utterance template.
In some exemplary embodiments of the present disclosure, the sub-intention template matching unit may include a single-intention template matching unit, a first sub-intention template determining unit, a to-be-matched template matching unit, a second sub-intention template determining unit, a matched sub-intention removing unit, a remaining sub-intention matching unit, and an invalid utterance determining unit. Wherein:
the single-intention template matching unit may be configured to match the voice utterance against all single-intention utterance templates in sequence;
the first sub-intention template determining unit may be configured to, if the voice utterance successfully matches a single-intention utterance template, take the matched template as the sub-intention utterance template corresponding to the sub-intention utterance in the voice utterance, and end the matching process for the voice utterance;
the to-be-matched template matching unit may be configured to, if the voice utterance fails to match the single-intention utterance templates, match the voice utterance against all to-be-matched templates in sequence;
the second sub-intention template determining unit may be configured to, if the voice utterance successfully matches a to-be-matched template, take the matched to-be-matched template as the sub-intention utterance template corresponding to the sub-intention utterance;
the matched sub-intention removing unit may be configured to remove, from the voice utterance, the sub-intention utterance that was successfully matched by a to-be-matched template, to obtain the remaining sub-intention utterances in the voice utterance;
the remaining sub-intention matching unit may be configured to match the remaining sub-intention utterances against all single-intention utterance templates again in sequence, to determine the sub-intention utterance template corresponding to each remaining sub-intention utterance;
the invalid utterance determining unit may be configured to determine the voice utterance to be an invalid voice utterance if it also fails to match the to-be-matched templates.
In some exemplary embodiments of the present disclosure, the voice command recognition apparatus provided by the present disclosure may further include an index vocabulary determining module and an utterance template set establishing module. Wherein:
the index vocabulary determining module may be configured to determine the index vocabulary of the single-intention templates;
the utterance template set establishing module may be configured to establish different types of single-intention utterance templates according to the index vocabulary, and obtain the single-intention template set from the different types of single-intention utterance templates.
The specific details of each module/unit in the voice command recognition device are described in detail in the corresponding method embodiment section, and are not repeated here.
Fig. 7 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
It should be noted that, the computer system 700 of the electronic device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the system operation are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as needed.
In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the various functions defined in the system of the present application are performed.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules described above may be embodied in one module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module described above may be further divided into a plurality of modules to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (7)

1. A voice command recognition method, comprising:
acquiring voice information of a user, and obtaining a corresponding voice utterance from the voice information, wherein the voice utterance comprises one or more sub-intention utterances;
acquiring each single-intention utterance template in a single-intention template set, matching the voice utterance against the single-intention utterance templates, and determining a sub-intention utterance template corresponding to each sub-intention utterance in the voice utterance;
identifying a sub-intention voice command corresponding to each sub-intention utterance according to its sub-intention utterance template, and obtaining a complete voice command corresponding to the voice information from the sub-intention voice commands;
wherein the matching the voice utterance against the single-intention utterance templates and determining the sub-intention utterance template corresponding to each sub-intention utterance in the voice utterance comprises:
deleting the ending symbol of each single-intention utterance template to obtain a to-be-matched template corresponding to that single-intention utterance template;
matching the voice utterance against all single-intention utterance templates in sequence;
if the voice utterance successfully matches a single-intention utterance template, taking the matched single-intention utterance template as the sub-intention utterance template corresponding to the sub-intention utterance in the voice utterance, and ending the matching process for the voice utterance;
if the voice utterance fails to match the single-intention utterance templates, matching the voice utterance against all to-be-matched templates in sequence;
if the voice utterance successfully matches a to-be-matched template, taking the matched to-be-matched template as the sub-intention utterance template corresponding to the sub-intention utterance;
removing the successfully matched sub-intention utterance from the voice utterance to obtain the remaining sub-intention utterances in the voice utterance;
matching the remaining sub-intention utterances against all single-intention utterance templates again in sequence, to determine the sub-intention utterance template corresponding to each remaining sub-intention utterance; and
if the voice utterance also fails to match the to-be-matched templates, determining the voice utterance to be an invalid voice utterance.
2. The voice command recognition method according to claim 1, wherein the obtaining a corresponding voice utterance from the voice information comprises:
performing voice recognition on the voice information to obtain voice text information corresponding to the voice information; and
preprocessing the voice text information to obtain the voice utterance corresponding to the voice information.
3. The voice command recognition method according to claim 2, wherein the preprocessing the voice text information comprises:
acquiring a punctuation library, comparing the voice text information against each punctuation mark in the punctuation library in sequence, and filtering the punctuation marks out of the voice text information; and
acquiring a function-word library, comparing the voice text information against each function word in the function-word library in sequence, and filtering the function words out of the voice text information.
4. The voice command recognition method according to claim 1, wherein before the acquiring each single-intention utterance template in the single-intention template set, the method further comprises:
determining an index vocabulary of the single-intention templates; and
establishing different types of single-intention utterance templates according to the index vocabulary, and obtaining the single-intention template set from the different types of single-intention utterance templates.
5. A voice command recognition apparatus, comprising:
a voice utterance acquisition module, configured to acquire voice information of a user and obtain a corresponding voice utterance from the voice information, wherein the voice utterance comprises one or more sub-intention utterances;
an utterance template matching module, configured to acquire each single-intention utterance template in a single-intention template set, match the voice utterance against the single-intention utterance templates, and determine a sub-intention utterance template corresponding to each sub-intention utterance in the voice utterance; and
a voice command recognition module, configured to identify a sub-intention voice command corresponding to each sub-intention utterance according to its sub-intention utterance template, and obtain a complete voice command corresponding to the voice information from the sub-intention voice commands;
wherein the matching the voice utterance against the single-intention utterance templates and determining the sub-intention utterance template corresponding to each sub-intention utterance in the voice utterance comprises:
deleting the ending symbol of each single-intention utterance template to obtain a to-be-matched template corresponding to that single-intention utterance template;
matching the voice utterance against all single-intention utterance templates in sequence;
if the voice utterance successfully matches a single-intention utterance template, taking the matched single-intention utterance template as the sub-intention utterance template corresponding to the sub-intention utterance in the voice utterance, and ending the matching process for the voice utterance;
if the voice utterance fails to match the single-intention utterance templates, matching the voice utterance against all to-be-matched templates in sequence;
if the voice utterance successfully matches a to-be-matched template, taking the matched to-be-matched template as the sub-intention utterance template corresponding to the sub-intention utterance;
removing the successfully matched sub-intention utterance from the voice utterance to obtain the remaining sub-intention utterances in the voice utterance;
matching the remaining sub-intention utterances against all single-intention utterance templates again in sequence, to determine the sub-intention utterance template corresponding to each remaining sub-intention utterance; and
if the voice utterance also fails to match the to-be-matched templates, determining the voice utterance to be an invalid voice utterance.
6. An electronic device, comprising:
one or more processors; and
a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the voice command recognition method according to any one of claims 1 to 4.
7. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice command recognition method according to any one of claims 1 to 4.
CN202010663954.6A 2020-07-10 2020-07-10 Speech instruction recognition method and device, electronic equipment and computer readable medium Active CN112289312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010663954.6A CN112289312B (en) 2020-07-10 2020-07-10 Speech instruction recognition method and device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010663954.6A CN112289312B (en) 2020-07-10 2020-07-10 Speech instruction recognition method and device, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN112289312A CN112289312A (en) 2021-01-29
CN112289312B true CN112289312B (en) 2024-04-05

Family

ID=74419686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010663954.6A Active CN112289312B (en) 2020-07-10 2020-07-10 Speech instruction recognition method and device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN112289312B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105074816A (en) * 2013-02-25 2015-11-18 微软公司 Facilitating development of a spoken natural language interface
CN109388700A (en) * 2018-10-26 2019-02-26 广东小天才科技有限公司 A kind of intension recognizing method and system
CN109859752A (en) * 2019-01-02 2019-06-07 珠海格力电器股份有限公司 A kind of sound control method, device, storage medium and voice joint control system
CN110704641A (en) * 2019-10-11 2020-01-17 零犀(北京)科技有限公司 Ten-thousand-level intention classification method and device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309254A (en) * 2018-03-01 2019-10-08 富泰华工业(深圳)有限公司 Intelligent robot and man-machine interaction method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105074816A (en) * 2013-02-25 2015-11-18 微软公司 Facilitating development of a spoken natural language interface
CN109388700A (en) * 2018-10-26 2019-02-26 广东小天才科技有限公司 A kind of intension recognizing method and system
CN109859752A (en) * 2019-01-02 2019-06-07 珠海格力电器股份有限公司 A kind of sound control method, device, storage medium and voice joint control system
CN110704641A (en) * 2019-10-11 2020-01-17 零犀(北京)科技有限公司 Ten-thousand-level intention classification method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112289312A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
US10489112B1 (en) Method for user training of information dialogue system
CN108520743B (en) Voice control method of intelligent device, intelligent device and computer readable medium
CN111402861B (en) Voice recognition method, device, equipment and storage medium
CN110047481B (en) Method and apparatus for speech recognition
CN110164435A (en) Audio recognition method, device, equipment and computer readable storage medium
KR20180025121A (en) Method and apparatus for inputting information
KR20160015218A (en) On-line voice translation method and device
WO2015014122A1 (en) Voice interaction method and system and interaction terminal
KR20070090642A (en) Apparatus for providing voice dialogue service and method for operating the apparatus
JP2003308087A (en) System and method for updating grammar
WO2020233363A1 (en) Speech recognition method and device, electronic apparatus, and storage medium
CN109036406A (en) A kind of processing method of voice messaging, device, equipment and storage medium
JP2021140134A (en) Method, device, electronic apparatus, computer readable storage medium, and computer program for recognizing speech
CN111399629B (en) Operation guiding method of terminal equipment, terminal equipment and storage medium
CN112669842A (en) Man-machine conversation control method, device, computer equipment and storage medium
CN114330371A (en) Session intention identification method and device based on prompt learning and electronic equipment
CN112562670A (en) Intelligent voice recognition method, intelligent voice recognition device and intelligent equipment
CN110211576B (en) Voice recognition method, device and system
CN112163084B (en) Problem feedback method, device, medium and electronic equipment
CN113012683A (en) Speech recognition method and device, equipment and computer readable storage medium
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
CN111508481B (en) Training method and device of voice awakening model, electronic equipment and storage medium
CN112289312B (en) Speech instruction recognition method and device, electronic equipment and computer readable medium
CN111161735A (en) Voice editing method and device
CN112002325B (en) Multi-language voice interaction method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant