CN111179903A - Voice recognition method and device, storage medium and electric appliance - Google Patents


Info

Publication number
CN111179903A
CN111179903A (application CN201911395017.0A)
Authority
CN
China
Prior art keywords
emotion
information
voice
recognition
dialect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911395017.0A
Other languages
Chinese (zh)
Inventor
高宏
毛跃辉
王慧君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Gree Green Refrigeration Technology Center Co Ltd of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Gree Green Refrigeration Technology Center Co Ltd of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai and Gree Green Refrigeration Technology Center Co Ltd of Zhuhai
Priority to CN201911395017.0A
Publication of CN111179903A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G10L15/26 Speech to text systems
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a voice recognition method, a voice recognition device, a storage medium and an electric appliance, wherein the method comprises the following steps: obtaining valid voice information of a current user; performing dialect recognition on the valid voice information to determine whether it is a dialect; if it is determined to be a dialect, performing emotion feature recognition on the valid voice information to obtain emotion information of the current user; performing semantic understanding on the text recognition result of the valid voice information to obtain corresponding response text information; and performing dialect voice synthesis on the response text information in combination with the emotion information of the current user to obtain a corresponding dialect response voice, and broadcasting it. The scheme provided by the invention can recognize dialect speech and, by incorporating the emotion recognition result, improves the naturalness of the air conditioner's interactive voice.

Description

Voice recognition method and device, storage medium and electric appliance
Technical Field
The invention relates to the field of control, in particular to a voice recognition method, a voice recognition device, a storage medium and an electric appliance.
Background
At present, there are intelligent home control methods based on emotion recognition, which control smart devices in the home according to the recognition result so as to change the ambient conditions. However, existing speech emotion recognition methods are generally limited to Mandarin, so users who speak dialects have a poor experience.
Disclosure of Invention
The present invention aims to overcome the above-mentioned drawbacks of the prior art by providing a speech recognition method, a speech recognition device, a storage medium, and an electrical appliance, so as to solve the problem that prior-art speech emotion recognition is limited to Mandarin and has a low dialect recognition rate.
One aspect of the present invention provides a speech recognition method, including: obtaining effective voice information of a current user; carrying out dialect recognition on the effective voice information to determine whether the effective voice information is dialect; if the effective voice information is determined to be dialect, performing emotion feature recognition on the effective voice information to obtain emotion information of the current user; semantic understanding is carried out on the text recognition result of the effective voice information to obtain corresponding response text information; and combining the emotion information of the current user to carry out dialect voice synthesis on the response text information to obtain corresponding dialect response voice, and broadcasting.
Optionally, performing emotion feature recognition on the valid voice information to obtain emotion information of the current user, including: carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result; performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result; and obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
Optionally, performing speech emotion recognition on the valid voice information to obtain a corresponding speech emotion result comprises: performing emotion recognition on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result; and/or performing text semantic recognition on the valid voice information to obtain a corresponding semantic emotion result comprises: recognizing and converting the valid voice information into corresponding text information through a corresponding dialect corpus; and performing text semantic emotion recognition on the converted text information to obtain the corresponding semantic emotion result.
Optionally, the method further comprises: if the effective voice information contains control instruction information, controlling the electrical appliance to execute corresponding operation according to the control instruction information; and/or if the obtained emotion information of the current user comprises preset emotion information, carrying out conversation with the current user through a preset conversation corresponding to the preset emotion information.
Another aspect of the present invention provides a speech recognition apparatus, including: the voice acquisition unit is used for acquiring effective voice information of a current user; the dialect identification unit is used for carrying out dialect identification on the effective voice information so as to determine whether the effective voice information is dialect; the emotion recognition unit is used for carrying out emotion feature recognition on the effective voice information to obtain emotion information of the current user if the dialect recognition unit determines that the effective voice information is the dialect; the semantic understanding unit is used for carrying out semantic understanding on the text recognition result of the effective voice information to obtain corresponding response text information; the voice synthesis unit is used for carrying out dialect voice synthesis on the response text information by combining the emotion information of the current user to obtain corresponding dialect response voice; and the voice broadcasting unit is used for broadcasting the obtained dialect response voice.
Optionally, the emotion recognition unit includes: the voice emotion recognition subunit is used for carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result; the semantic emotion recognition subunit is used for performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result; and the emotion information obtaining subunit is used for obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
Optionally, the speech emotion recognition subunit performing speech emotion recognition on the valid voice information to obtain a corresponding speech emotion result comprises: performing emotion recognition on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result; and/or the semantic emotion recognition subunit performing text semantic recognition on the valid voice information to obtain a corresponding semantic emotion result comprises: recognizing and converting the valid voice information into corresponding text information through a corresponding dialect corpus; and performing text semantic emotion recognition on the converted text information to obtain the corresponding semantic emotion result.
Optionally, the apparatus further comprises: a control unit, configured to control the electrical appliance to execute a corresponding operation according to control instruction information if the valid voice information contains the control instruction information; and/or an interactive dialogue unit, configured to conduct a dialogue with the current user through a preset dialogue corresponding to preset emotion information if the obtained emotion information of the current user comprises the preset emotion information.
A further aspect of the invention provides a storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of any of the methods described above.
Yet another aspect of the present invention provides an appliance comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods described above when executing the program. The electrical appliance comprises an air conditioner.
In another aspect, the invention provides an electrical appliance comprising a speech recognition device according to any of the preceding claims. The electrical appliance comprises an air conditioner.
According to the technical scheme of the invention, dialect speech can be recognized, and the naturalness of the air conditioner's interactive voice is improved by incorporating the emotion recognition result. Adopting dialect speech recognition improves the universality of speech recognition; combining it with emotion recognition increases the fluency and intelligibility of the synthesized dialect speech, and the user can be given emotional comfort when necessary, bringing a better experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a step of performing emotion feature recognition on the valid speech information to obtain emotion information of a current user;
FIG. 3 is a schematic diagram of a speech recognition method according to another embodiment of the present invention;
FIG. 4 is a method diagram of another embodiment of a speech recognition method provided by the present invention;
FIG. 5 is a schematic diagram of a specific embodiment of a speech recognition method provided by the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a speech recognition apparatus provided in the present invention;
FIG. 7 is a block diagram of a specific implementation of an emotion recognition unit according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of another embodiment of a speech recognition apparatus provided in the present invention;
FIG. 9 is a schematic structural diagram of a speech recognition apparatus according to yet another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a voice recognition method. The voice recognition method can be used in an electric appliance to realize voice control of the appliance; for example, it may be used in an air conditioner.
Fig. 1 is a schematic method diagram of an embodiment of a speech recognition method provided by the present invention.
As shown in fig. 1, according to an embodiment of the present invention, the voice recognition method at least includes step S110, step S120, step S130, step S140 and step S150.
Step S110, obtaining the valid voice information of the current user.
For example, the voice information sent by the current user may be acquired through a microphone, and the voice sent by the user is subjected to front-end processing such as filtering and noise reduction to obtain pure voice information, which is the valid voice information.
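This front-end step can be sketched as a naive energy gate standing in for real filtering and noise reduction; the function name, frame length, and threshold are all hypothetical choices, not taken from the patent:

```python
import numpy as np

def extract_valid_speech(signal: np.ndarray, frame_len: int = 256,
                         energy_thresh: float = 0.01) -> np.ndarray:
    """Crude front end: keep only frames whose mean energy exceeds a
    threshold, discarding silence/noise-only frames. A real system would
    use spectral subtraction or a learned denoiser instead."""
    kept = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        if np.mean(frame ** 2) > energy_thresh:
            kept.append(frame)
    return np.concatenate(kept) if kept else np.empty(0)
```

Applied to a recording that is half silence and half speech, this keeps only the speech frames as the "valid voice information".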
Step S120, performing dialect recognition on the valid voice information to determine whether the valid voice information is a dialect.
Specifically, the valid voice information is analyzed in terms of features such as average energy, zero-crossing rate, frequency spectrum, cepstrum, and linear prediction coefficients to obtain corresponding corpus information. The obtained corpus information is then looked up in the dialect corpus to determine whether the speech is a dialect.
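Two of the time-domain features listed above, average energy and zero-crossing count, can be computed per frame as in the sketch below; the frame length is an arbitrary choice for illustration:

```python
import numpy as np

def frame_features(signal: np.ndarray, frame_len: int = 256):
    """Return (average energy, zero-crossing count) per frame, two of the
    time-domain features used to characterize the speech."""
    feats = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        energy = float(np.mean(frame ** 2))
        # A zero crossing is a sign change between consecutive samples.
        crossings = int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, crossings))
    return feats
```

A higher-pitched utterance yields more zero crossings per frame, which is one cue such a feature vector can carry into the corpus lookup.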
Step S130, if the effective voice information is determined to be dialect, performing emotion feature recognition on the effective voice information to obtain emotion information of the current user.
Fig. 2 is a flowchart illustrating a specific embodiment of a step of performing emotion feature recognition on the valid speech information to obtain emotion information of a current user. As shown in fig. 2, step S130 specifically includes step S131, step S132, and step S133.
And S131, carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result.
Specifically, emotion recognition is performed on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result: the emotional features of the valid voice information are extracted and analyzed, and an emotion recognition engine, relying on the preset emotion database, performs the recognition to obtain the corresponding speech emotion result.
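The corpus-based acoustic matching can be sketched as a nearest-neighbour lookup against labelled feature vectors; the feature set (pitch mean, normalized energy), the corpus entries, and every numeric value below are invented for illustration only:

```python
import numpy as np

# Toy "emotion corpus": acoustic feature vectors labelled with an emotion.
# A real engine would use far richer features and many examples per label.
EMOTION_CORPUS = {
    "angry":   np.array([220.0, 0.8]),
    "sad":     np.array([150.0, 0.2]),
    "neutral": np.array([180.0, 0.5]),
}

def acoustic_emotion(features: np.ndarray) -> str:
    """Nearest-neighbour match of utterance features against the corpus."""
    return min(EMOTION_CORPUS,
               key=lambda lbl: float(np.linalg.norm(EMOTION_CORPUS[lbl] - features)))
```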
Step S132, text semantic recognition is carried out on the effective voice information to obtain a corresponding semantic emotion result.
Specifically, the effective voice information is identified and converted into corresponding text information through a corresponding dialect corpus; and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
For example, if the valid voice information is recognized as Cantonese, it is recognized through a Cantonese corpus and converted into corresponding text information. The data in the corpus are all text-labeled, i.e., each dialect word, phrase, and sentence is in one-to-one correspondence with its text. After the valid voice information is converted into corresponding text information, the text can be segmented into words, and a sentiment-polarity (positive, neutral, and negative) probability analysis is performed on the segmented words to obtain the corresponding semantic emotion result.
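The polarity analysis over segmented words can be sketched as averaging lexicon scores; the lexicon, the thresholds, and the English tokens below are hypothetical stand-ins for the dialect-labelled corpus data:

```python
# Hypothetical polarity lexicon (the patent does not publish one); scores lie
# in [-1, 1], where negative values mean derogatory/negative polarity.
POLARITY = {"happy": 1.0, "great": 0.8, "ok": 0.0, "tired": -0.6, "angry": -0.9}

def semantic_emotion(words):
    """Map the mean polarity of segmented words to a coarse label."""
    scores = [POLARITY.get(w, 0.0) for w in words]
    mean = sum(scores) / len(scores) if scores else 0.0
    if mean > 0.2:
        return "positive", mean
    if mean < -0.2:
        return "negative", mean
    return "neutral", mean
```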
And step S133, obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
Specifically, after obtaining the speech emotion result and the semantic emotion result, the emotion and state of the current user are evaluated by combining the speech emotion result and the semantic emotion result, so as to obtain emotional state information of the current user, such as anger, irritability, fear, surprise, joy, self-confidence, neutrality, sadness, tiredness, and the like.
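One plausible way to combine the two results is weighted score fusion; the weighting and the dict-of-confidences interface are assumptions, since the patent only states that the two results are combined:

```python
def fuse_emotion(acoustic: dict, semantic: dict, w_acoustic: float = 0.6) -> str:
    """Weighted fusion of acoustic and semantic emotion confidences.

    Both inputs map emotion label -> confidence in [0, 1]; the label with
    the highest weighted score is taken as the user's emotion.
    """
    labels = set(acoustic) | set(semantic)
    fused = {label: w_acoustic * acoustic.get(label, 0.0)
                    + (1.0 - w_acoustic) * semantic.get(label, 0.0)
             for label in labels}
    return max(fused, key=fused.get)
```

Weighting the acoustic channel higher reflects one common design choice; the balance would be tuned on labelled data in practice.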
Step S140, semantic understanding is carried out on the text recognition result of the effective voice information, and corresponding response text information is obtained.
Specifically, the valid voice information is recognized and converted into corresponding text information through the corresponding dialect corpus, and semantic understanding is performed on the converted text information to obtain the corresponding response text information for answering the user. The converted text information is sent to a natural language processing engine for semantic understanding, and the response text information for answering the user is obtained from the understood semantic information.
And S150, combining the emotion information of the current user to carry out dialect voice synthesis on the response text information to obtain corresponding dialect response voice, and broadcasting.
Specifically, the tone, intonation, etc. of the synthesized dialect response speech are determined according to the current user's emotion. For example, the response text information is sent to a speech synthesis unit, which performs TTS synthesis through a preset speech library in combination with the emotion recognition result to synthesize the corresponding dialect speech. For example, if the user's speech is Cantonese, the response is synthesized as Cantonese speech; and if the user's emotion is tiredness, the response is synthesized with a cheerful tone and a gentle voice before being broadcast. Synthesizing speech in this way can improve the intelligibility and naturalness of the synthesized dialect speech.
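The emotion-to-prosody mapping can be sketched as a lookup table feeding a synthesis request; the parameter names and values are placeholders, since the patent names no specific TTS engine or parameter set:

```python
# Illustrative prosody presets keyed by detected emotion. Real TTS engines
# expose different parameter names; treat these as hypothetical.
PROSODY = {
    "tired":   {"rate": 0.90, "pitch_shift": 2,  "volume": 0.7},  # gentle, cheerful
    "angry":   {"rate": 0.95, "pitch_shift": -1, "volume": 0.6},  # calm, soothing
    "neutral": {"rate": 1.00, "pitch_shift": 0,  "volume": 0.8},
}

def synthesis_request(text: str, dialect: str, emotion: str) -> dict:
    """Bundle response text, dialect voice bank, and emotion-driven prosody
    into a request a TTS engine could consume."""
    return {
        "text": text,
        "voice": f"{dialect}-voice",  # e.g. a Cantonese voice bank
        "prosody": PROSODY.get(emotion, PROSODY["neutral"]),
    }
```

Unknown emotions fall back to the neutral preset, so the broadcast step always receives a complete request.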
Fig. 3 is a schematic method diagram of another embodiment of the speech recognition method provided by the present invention.
As shown in fig. 3, according to another embodiment of the present invention, the speech recognition method further includes step S160.
Step S160, if the valid voice information includes control instruction information, controlling the electrical appliance to execute corresponding operations according to the control instruction information.
Specifically, the voice recognition method can be used in an electric appliance and can be used for voice control of the electric appliance by a user. For example, if the valid voice information is recognized to have control instruction information for controlling the air conditioner to be turned on, the air conditioner is controlled to be turned on according to the control instruction.
Fig. 4 is a schematic method diagram of a speech recognition method according to another embodiment of the present invention.
As shown in fig. 4, according to still another embodiment of the present invention, the voice recognition method further includes step S170.
Step S170, if the obtained emotion information of the current user includes preset emotion information, performing a dialogue with the current user through a preset dialogue corresponding to the preset emotion information.
For example, the preset emotion information includes sadness, irritability, joy, and the like. If the current user is under negative emotions such as sadness or anger, a dialogue is conducted with the current user through the preset dialogue corresponding to sadness or anger: the system enters a chat mode, initiates questions and answers using fixed scripts, and interacts with the user by voice, understanding the context of the user's speech and responding accordingly, so as to provide psychological comfort to the user.
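A minimal sketch of the preset-dialogue lookup, with hypothetical English openers standing in for the fixed scripts the patent refers to:

```python
# Hypothetical canned openers per negative emotion; the patent only states
# that a preset dialogue corresponding to the emotion is used.
PRESET_DIALOGUE = {
    "sad":   "You sound a little down. Would you like to talk about it?",
    "angry": "Take a deep breath. Is there something I can help with?",
}

def maybe_start_chat(emotion: str):
    """Enter chat mode only when the emotion has a preset dialogue;
    return None otherwise so normal processing continues."""
    return PRESET_DIALOGUE.get(emotion)
```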
For clearly explaining the technical solution of the present invention, the following describes an execution flow of the speech recognition method provided by the present invention with a specific embodiment.
Fig. 5 is a schematic diagram of a specific embodiment of the speech recognition method provided by the present invention.
As shown in fig. 5, the speech uttered by the current user is captured through a microphone and, after front-end processing such as filtering and denoising, valid voice information is obtained. Dialect features are extracted and dialect recognition is performed: if an existing dialect corpus matches a corresponding dialect, the flow proceeds to the next step; otherwise, the speech data is discarded and the flow restarts. Once the dialect is confirmed, i.e., the dialect passes authentication, the emotional features of the valid voice information are extracted and analyzed, and an emotion recognition engine, relying on the emotion database, recognizes the speech under test and outputs the corresponding speech emotion result. The valid voice information is also converted into corresponding text information by means of the corresponding dialect corpus data, after which text semantic emotion recognition is performed to obtain the semantic emotion result. Based on the speech emotion result and the semantic emotion result, the user's current emotional mental state is evaluated, for example: anger, irritability, fear, surprise, joy, confidence, neutrality, sadness, tiredness, etc. The valid voice information is then sent to a natural language processing engine for semantic understanding to obtain corresponding response text information; in combination with the emotion recognition result, TTS synthesis of the response text information is performed through a speech library to obtain the synthesized dialect response voice, which is broadcast as the answer to the user's speech.
The invention also provides a voice recognition device. The voice recognition device can be used in an electric appliance to realize voice control of the appliance; for example, it may be used in an air conditioner.
Fig. 6 is a schematic structural diagram of an embodiment of a speech recognition apparatus provided in the present invention. As shown in fig. 6, the voice recognition apparatus 100 includes a voice acquisition unit 110, a dialect recognition unit 120, an emotion recognition unit 130, a semantic understanding unit 140, a voice synthesis unit 150, and a voice announcement unit 160.
The voice acquiring unit 110 is configured to acquire valid voice information of a current user; the dialect recognition unit 120 is configured to perform dialect recognition on the valid voice information to determine whether the valid voice information is dialect; the emotion recognition unit 130 is configured to, if the dialect recognition unit determines that the valid voice information is a dialect, perform emotion feature recognition on the valid voice information to obtain emotion information of the current user; the semantic understanding unit 140 is configured to perform semantic understanding on the text recognition result of the valid voice information to obtain corresponding response text information; the voice synthesis unit 150 is configured to perform dialect voice synthesis on the response text information in combination with the emotion information of the current user to obtain a corresponding dialect response voice; the voice broadcasting unit 160 is configured to broadcast the obtained dialect response voice.
The voice acquiring unit 110 acquires valid voice information of the current user. For example, the voice information sent by the current user may be acquired through a microphone, and the voice sent by the user is subjected to front-end processing such as filtering and noise reduction to obtain pure voice information, which is the valid voice information.
The dialect recognition unit 120 performs dialect recognition on the valid voice information to determine whether it is a dialect. Specifically, the valid voice information is analyzed in terms of features such as average energy, zero-crossing rate, frequency spectrum, cepstrum, and linear prediction coefficients to obtain corresponding corpus information. The obtained corpus information is then looked up in the dialect corpus to determine whether the speech is a dialect.
If the dialect recognition unit 120 determines that the valid speech information is a dialect, the emotion recognition unit 130 performs emotion feature recognition on the valid speech information to obtain emotion information of the current user.
FIG. 7 is a block diagram of a specific implementation of an emotion recognition unit according to an embodiment of the present invention. As shown in FIG. 7, emotion recognition section 130 includes speech emotion recognition subunit 131, semantic emotion recognition subunit 132, and emotion information derivation subunit 133.
The speech emotion recognition subunit 131 is configured to perform speech emotion recognition on the effective speech information to obtain a corresponding speech emotion result.
Specifically, emotion recognition is performed on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result: the emotional features of the valid voice information are extracted and analyzed, and an emotion recognition engine, relying on the preset emotion database, performs the recognition to obtain the corresponding speech emotion result.
The semantic emotion recognition subunit 132 is configured to perform text semantic recognition on the effective speech information to obtain a corresponding semantic emotion result.
Specifically, the effective voice information is identified and converted into corresponding text information through a corresponding dialect corpus; and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
For example, if the valid voice information is recognized as Cantonese, it is recognized through a Cantonese corpus and converted into corresponding text information. The data in the corpus are all text-labeled, i.e., each dialect word, phrase, and sentence is in one-to-one correspondence with its text. After the valid voice information is converted into corresponding text information, the text can be segmented into words, and a sentiment-polarity (positive, neutral, and negative) probability analysis is performed on the segmented words to obtain the corresponding semantic emotion result.
The emotion information obtaining subunit 133 is configured to obtain emotion information of the current user according to the speech emotion result and the semantic emotion result.
Specifically, after obtaining the speech emotion result and the semantic emotion result, the emotion and state of the current user are evaluated by combining the speech emotion result and the semantic emotion result, so as to obtain emotional state information of the current user, such as anger, irritability, fear, surprise, joy, self-confidence, neutrality, sadness, tiredness, and the like.
The semantic understanding unit 140 performs semantic understanding on the text recognition result of the effective voice information to obtain corresponding response text information.
Specifically, the valid voice information is recognized and converted into corresponding text information through the corresponding dialect corpus, and semantic understanding is performed on the converted text information to obtain the corresponding response text information for answering the user. The converted text information is sent to a natural language processing engine for semantic understanding, and the response text information for answering the user is obtained from the understood semantic information.
The voice synthesis unit 150 performs dialect voice synthesis on the response text information in combination with the emotion information of the current user to obtain corresponding dialect response voice, and the voice broadcast unit 160 broadcasts the obtained dialect response voice.
Specifically, the tone, intonation, and other characteristics of the synthesized dialect response speech are determined according to the current user's emotion. For example, the response text information is sent to the speech synthesis unit, where TTS synthesis is performed through a preset speech library in combination with the emotion recognition result, synthesizing the corresponding dialect speech. If the user speaks Cantonese, the response is synthesized as Cantonese speech; if the user's emotion is tiredness, the response is synthesized with a cheerful tone and a gentle voice before being broadcast. Synthesizing speech in this way can improve the intelligibility and naturalness of the synthesized dialect speech.
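Selecting synthesis parameters from the detected emotion, as in the tired-user example above, could be sketched like this. The style names and parameter values are assumptions rather than a real TTS API:

```python
# Hypothetical emotion -> synthesis-style table.
TTS_STYLES = {
    "tiredness": {"tone": "cheerful", "rate": 0.9,  "volume": "gentle"},
    "anger":     {"tone": "calm",     "rate": 0.85, "volume": "soft"},
}
DEFAULT_STYLE = {"tone": "neutral", "rate": 1.0, "volume": "normal"}

def synthesis_params(emotion, dialect):
    """Pick prosody settings for the detected emotion, keeping the user's dialect voice."""
    params = dict(TTS_STYLES.get(emotion, DEFAULT_STYLE))
    params["voice"] = dialect  # e.g. a Cantonese voice for a Cantonese-speaking user
    return params
```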
Fig. 8 is a schematic structural diagram of another embodiment of a speech recognition apparatus provided in the present invention. As shown in fig. 8, the speech recognition apparatus 100 further includes a control unit 170.
The control unit 170 is configured to control the electrical appliance to execute a corresponding operation according to the control instruction information if the valid voice information includes the control instruction information.
Specifically, the voice recognition apparatus can be applied to an electrical appliance so that the user can control the appliance by voice. For example, if control instruction information for turning on the air conditioner is recognized in the valid voice information, the air conditioner is controlled to turn on according to the control instruction.
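The control unit's behaviour can be sketched as matching the recognized text against known command phrases. The phrases and command codes below are illustrative assumptions:

```python
# Hypothetical command-phrase -> appliance-command table.
COMMANDS = {
    "turn on the air conditioner":  "AC_ON",
    "turn off the air conditioner": "AC_OFF",
}

def extract_command(text):
    """Return the appliance command found in the text, or None if there is none."""
    text = text.lower()
    for phrase, command in COMMANDS.items():
        if phrase in text:
            return command
    return None  # no control instruction: fall through to normal dialogue
```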
Fig. 9 is a schematic structural diagram of a speech recognition apparatus according to another embodiment of the present invention. As shown in fig. 9, the speech recognition apparatus 100 further includes an interactive dialog unit 180.
The interactive dialogue unit 180 is configured to perform a dialogue with the current user through a preset dialogue corresponding to the preset emotion information if the obtained emotion information of the current user includes the preset emotion information.
For example, the preset emotion information includes sadness, irritability, joy, and the like. If the current user is under a negative emotion such as sadness or anger, the apparatus converses with the current user through the preset dialogue corresponding to that emotion; that is, it enters a chat mode, initiates questions and answers using fixed dialogue scripts, and interacts with the user by voice, understanding the context of the user's speech and responding accordingly, so as to provide psychological comfort to the user.
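The chat-mode entry described above can be sketched as a lookup from the detected emotion to a preset opening line. The preset emotions and lines here are invented examples:

```python
# Hypothetical preset opening lines for emotions that trigger chat mode.
PRESET_DIALOGUES = {
    "sadness": "You sound a little down. Would you like to talk about it?",
    "anger":   "Take a deep breath. What happened?",
}

def maybe_start_chat(emotion):
    """Return a preset opening line for chat mode, or None to stay in normal mode."""
    return PRESET_DIALOGUES.get(emotion)
```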
The invention also provides a storage medium corresponding to the speech recognition method, on which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods described above.
The invention also provides an electric appliance corresponding to the voice recognition method, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of any one of the methods. The electrical appliance may be an air conditioner.
The invention also provides an electric appliance corresponding to the voice recognition device, which comprises the voice recognition device. The electrical appliance may be an air conditioner.
Therefore, the scheme provided by the invention can recognize the dialect speech of the other party and, by combining the emotion recognition result, improve the naturalness of the appliance's voice interaction. Adopting dialect speech recognition improves the universality of speech recognition; combining it with emotion recognition increases the fluency and intelligibility of the synthesized dialect speech; and the user can be given emotional comfort when necessary, bringing a better experience.
The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and implementations are within the scope and spirit of the invention and the following claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hardwired, or a combination of any of these. In addition, each functional unit may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and the parts serving as the control device may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other media capable of storing program codes.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A speech recognition method, comprising:
obtaining effective voice information of a current user;
carrying out dialect recognition on the effective voice information to determine whether the effective voice information is dialect;
if the effective voice information is determined to be dialect, performing emotion feature recognition on the effective voice information to obtain emotion information of the current user;
semantic understanding is carried out on the text recognition result of the effective voice information to obtain corresponding response text information;
and combining the emotion information of the current user to carry out dialect voice synthesis on the response text information to obtain corresponding dialect response voice, and broadcasting.
2. The method of claim 1,
wherein performing emotion feature recognition on the effective voice information to obtain the emotion information of the current user comprises:
carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result;
performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result;
and obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
3. The method of claim 2,
wherein carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result comprises:
carrying out emotion recognition on the effective voice information through a preset emotion corpus to obtain a corresponding voice emotion result;
and/or,
wherein performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result comprises:
recognizing and converting the effective voice information into corresponding text information through a corresponding dialect corpus;
and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
4. The method according to any one of claims 1-3, further comprising:
if the effective voice information contains control instruction information, controlling the electrical appliance to execute corresponding operation according to the control instruction information;
and/or,
and if the obtained emotion information of the current user comprises preset emotion information, carrying out dialogue with the current user through a preset dialogue corresponding to the preset emotion information.
5. A speech recognition apparatus, comprising:
the voice acquisition unit is used for acquiring effective voice information of a current user;
the dialect identification unit is used for carrying out dialect identification on the effective voice information so as to determine whether the effective voice information is dialect;
the emotion recognition unit is used for carrying out emotion feature recognition on the effective voice information to obtain emotion information of the current user if the dialect recognition unit determines that the effective voice information is the dialect;
the semantic understanding unit is used for carrying out semantic understanding on the text recognition result of the effective voice information to obtain corresponding response text information;
the voice synthesis unit is used for carrying out dialect voice synthesis on the response text information by combining the emotion information of the current user to obtain corresponding dialect response voice;
and the voice broadcasting unit is used for broadcasting the obtained dialect response voice.
6. The apparatus of claim 5, wherein the emotion recognition unit comprises:
the voice emotion recognition subunit is used for carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result;
the semantic emotion recognition subunit is used for performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result;
and the emotion information obtaining subunit is used for obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
7. The apparatus of claim 6,
wherein the voice emotion recognition subunit performing voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result comprises:
carrying out emotion recognition on the effective voice information through a preset emotion corpus to obtain a corresponding voice emotion result;
and/or,
wherein the semantic emotion recognition subunit performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result comprises:
recognizing and converting the effective voice information into corresponding text information through a corresponding dialect corpus;
and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
8. The apparatus of any one of claims 5-7, further comprising:
the control unit is used for controlling the electrical appliance to execute corresponding operation according to the control instruction information if the effective voice information contains the control instruction information;
and/or,
and the interactive dialogue unit is used for carrying out dialogue with the current user through a preset dialogue corresponding to the preset emotion information if the obtained emotion information of the current user comprises the preset emotion information.
9. A storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
10. An electrical appliance comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 4 when executing the program or comprising the speech recognition arrangement of any one of claims 5 to 8.
CN201911395017.0A 2019-12-30 2019-12-30 Voice recognition method and device, storage medium and electric appliance Pending CN111179903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395017.0A CN111179903A (en) 2019-12-30 2019-12-30 Voice recognition method and device, storage medium and electric appliance


Publications (1)

Publication Number Publication Date
CN111179903A true CN111179903A (en) 2020-05-19

Family

ID=70655913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395017.0A Pending CN111179903A (en) 2019-12-30 2019-12-30 Voice recognition method and device, storage medium and electric appliance

Country Status (1)

Country Link
CN (1) CN111179903A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327582A (en) * 2021-05-18 2021-08-31 北京声智科技有限公司 Voice interaction method and device, electronic equipment and storage medium
CN116741146A (en) * 2023-08-15 2023-09-12 成都信通信息技术有限公司 Dialect voice generation method, system and medium based on semantic intonation
CN117041430A (en) * 2023-10-09 2023-11-10 成都乐超人科技有限公司 Method and device for improving outbound quality and robustness of intelligent coordinated outbound system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105334743A (en) * 2015-11-18 2016-02-17 深圳创维-Rgb电子有限公司 Intelligent home control method and system based on emotion recognition
CN105654950A (en) * 2016-01-28 2016-06-08 百度在线网络技术(北京)有限公司 Self-adaptive voice feedback method and device
CN106297783A (en) * 2016-08-05 2017-01-04 易晓阳 A kind of interactive voice identification intelligent terminal
CN106683672A (en) * 2016-12-21 2017-05-17 竹间智能科技(上海)有限公司 Intelligent dialogue method and system based on emotion and semantics
CN108847239A (en) * 2018-08-31 2018-11-20 上海擎感智能科技有限公司 Interactive voice/processing method, system, storage medium, engine end and server-side
CN108877800A (en) * 2018-08-30 2018-11-23 出门问问信息科技有限公司 Voice interactive method, device, electronic equipment and readable storage medium storing program for executing
CN109065035A (en) * 2018-09-06 2018-12-21 珠海格力电器股份有限公司 information interaction method and device
CN109189980A (en) * 2018-09-26 2019-01-11 三星电子(中国)研发中心 The method and electronic equipment of interactive voice are carried out with user
CN110085262A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Voice mood exchange method, computer equipment and computer readable storage medium
CN110085221A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Speech emotional exchange method, computer equipment and computer readable storage medium
CN110379445A (en) * 2019-06-20 2019-10-25 深圳壹账通智能科技有限公司 Method for processing business, device, equipment and storage medium based on mood analysis
CN110599999A (en) * 2019-09-17 2019-12-20 寇晓宇 Data interaction method and device and robot



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200519)