CN111179903A - Voice recognition method and device, storage medium and electric appliance - Google Patents


Info

Publication number
CN111179903A
CN111179903A (application CN201911395017.0A)
Authority
CN
China
Prior art keywords
emotion
information
voice
recognition
dialect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911395017.0A
Other languages
Chinese (zh)
Inventor
高宏
毛跃辉
王慧君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Gree Green Refrigeration Technology Center Co Ltd of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Gree Green Refrigeration Technology Center Co Ltd of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai and Gree Green Refrigeration Technology Center Co Ltd of Zhuhai
Priority to CN201911395017.0A
Publication of CN111179903A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G10L15/26 Speech to text systems
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a voice recognition method, a voice recognition device, a storage medium and an electric appliance, wherein the method comprises the following steps: obtaining valid voice information of a current user; performing dialect recognition on the valid voice information to determine whether it is a dialect; if it is determined to be a dialect, performing emotion feature recognition on the valid voice information to obtain emotion information of the current user; performing semantic understanding on the text recognition result of the valid voice information to obtain corresponding response text information; and performing dialect voice synthesis on the response text information in combination with the emotion information of the current user to obtain a corresponding dialect response voice, and broadcasting it. The scheme provided by the invention can recognize dialect speech and, by incorporating the emotion recognition result, improves the naturalness of the air conditioner's interactive voice.

Description

Voice recognition method and device, storage medium and electric appliance
Technical Field
The invention relates to the field of control, in particular to a voice recognition method, a voice recognition device, a storage medium and an electric appliance.
Background
At present, there are intelligent home control methods based on emotion recognition, which control smart devices in the home according to the recognition result so as to change the ambient conditions. However, existing speech emotion recognition methods are generally limited to Mandarin, so users who speak dialects have a poor experience.
Disclosure of Invention
The present invention aims to overcome the above-mentioned drawbacks of the prior art by providing a speech recognition method, a speech recognition device, a storage medium, and an electrical appliance, so as to solve the problem that prior-art speech emotion recognition is limited to Mandarin and has a low dialect recognition rate.
One aspect of the present invention provides a speech recognition method, including: obtaining effective voice information of a current user; carrying out dialect recognition on the effective voice information to determine whether the effective voice information is dialect; if the effective voice information is determined to be dialect, performing emotion feature recognition on the effective voice information to obtain emotion information of the current user; semantic understanding is carried out on the text recognition result of the effective voice information to obtain corresponding response text information; and combining the emotion information of the current user to carry out dialect voice synthesis on the response text information to obtain corresponding dialect response voice, and broadcasting.
Optionally, performing emotion feature recognition on the valid voice information to obtain emotion information of the current user, including: carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result; performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result; and obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
Optionally, performing speech emotion recognition on the valid voice information to obtain a corresponding speech emotion result comprises: performing emotion recognition on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result; and/or performing text semantic recognition on the valid voice information to obtain a corresponding semantic emotion result comprises: recognizing and converting the valid voice information into corresponding text information through a corresponding dialect corpus; and performing text semantic emotion recognition on the converted text information to obtain the corresponding semantic emotion result.
Optionally, the method further comprises: if the effective voice information contains control instruction information, controlling the electrical appliance to execute corresponding operation according to the control instruction information; and/or if the obtained emotion information of the current user comprises preset emotion information, carrying out conversation with the current user through a preset conversation corresponding to the preset emotion information.
Another aspect of the present invention provides a speech recognition apparatus, including: the voice acquisition unit is used for acquiring effective voice information of a current user; the dialect identification unit is used for carrying out dialect identification on the effective voice information so as to determine whether the effective voice information is dialect; the emotion recognition unit is used for carrying out emotion feature recognition on the effective voice information to obtain emotion information of the current user if the dialect recognition unit determines that the effective voice information is the dialect; the semantic understanding unit is used for carrying out semantic understanding on the text recognition result of the effective voice information to obtain corresponding response text information; the voice synthesis unit is used for carrying out dialect voice synthesis on the response text information by combining the emotion information of the current user to obtain corresponding dialect response voice; and the voice broadcasting unit is used for broadcasting the obtained dialect response voice.
Optionally, the emotion recognition unit includes: the voice emotion recognition subunit is used for carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result; the semantic emotion recognition subunit is used for performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result; and the emotion information obtaining subunit is used for obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
Optionally, the speech emotion recognition subunit performing speech emotion recognition on the valid voice information to obtain a corresponding speech emotion result comprises: performing emotion recognition on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result; and/or the semantic emotion recognition subunit performing text semantic recognition on the valid voice information to obtain a corresponding semantic emotion result comprises: recognizing and converting the valid voice information into corresponding text information through a corresponding dialect corpus; and performing text semantic emotion recognition on the converted text information to obtain the corresponding semantic emotion result.
Optionally, the apparatus further comprises: a control unit, configured to control the electrical appliance to execute a corresponding operation according to control instruction information if the valid voice information contains the control instruction information; and/or an interactive dialogue unit, configured to conduct a dialogue with the current user through a preset dialogue corresponding to preset emotion information if the obtained emotion information of the current user comprises the preset emotion information.
A further aspect of the invention provides a storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of any of the methods described above.
Yet another aspect of the present invention provides an appliance comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods described above when executing the program. The electrical appliance comprises an air conditioner.
In another aspect, the invention provides an electrical appliance comprising a speech recognition device according to any of the preceding claims. The electrical appliance comprises an air conditioner.
According to the technical scheme of the invention, dialect speech can be recognized, and the naturalness of the air conditioner's interactive voice is improved by incorporating the emotion recognition result. Adopting dialect speech recognition improves the universality of speech recognition; combining it with emotion recognition increases the fluency and intelligibility of the synthesized dialect speech, and the user can be given emotional comfort when necessary, bringing a better experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a step of performing emotion feature recognition on the valid speech information to obtain emotion information of a current user;
FIG. 3 is a schematic diagram of a speech recognition method according to another embodiment of the present invention;
FIG. 4 is a method diagram of another embodiment of a speech recognition method provided by the present invention;
FIG. 5 is a schematic diagram of a specific embodiment of a speech recognition method provided by the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a speech recognition apparatus provided in the present invention;
FIG. 7 is a block diagram of a specific implementation of an emotion recognition unit according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of another embodiment of a speech recognition apparatus provided in the present invention;
FIG. 9 is a schematic structural diagram of a speech recognition apparatus according to yet another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a voice recognition method. The voice recognition method can be used in an electric appliance to realize voice control of the appliance; for example, it may be used in an air conditioner.
Fig. 1 is a schematic method diagram of an embodiment of a speech recognition method provided by the present invention.
As shown in fig. 1, according to an embodiment of the present invention, the voice recognition method at least includes step S110, step S120, step S130, step S140 and step S150.
Step S110, obtaining the valid voice information of the current user.
For example, the voice information sent by the current user may be acquired through a microphone, and the voice sent by the user is subjected to front-end processing such as filtering and noise reduction to obtain pure voice information, which is the valid voice information.
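This front-end step can be sketched as a naive energy gate standing in for real filtering and noise reduction; the function name, frame length, and threshold are all hypothetical choices, not taken from the patent:

```python
import numpy as np

def extract_valid_speech(signal: np.ndarray, frame_len: int = 256,
                         energy_thresh: float = 0.01) -> np.ndarray:
    """Crude front end: keep only frames whose mean energy exceeds a
    threshold, discarding silence/noise-only frames. A real system would
    use spectral subtraction or a learned denoiser instead."""
    kept = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        if np.mean(frame ** 2) > energy_thresh:
            kept.append(frame)
    return np.concatenate(kept) if kept else np.empty(0)
```

Applied to a recording that is half silence and half speech, this keeps only the speech frames as the "valid voice information".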
Step S120, performing dialect recognition on the valid voice information to determine whether the valid voice information is a dialect.
Specifically, the valid voice information is analyzed in terms of features such as average energy, zero-crossing rate, frequency spectrum, cepstrum, and linear prediction coefficients to obtain corresponding corpus information. The obtained corpus information is then looked up in the dialect corpus to determine whether the speech is a dialect.
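Two of the time-domain features listed above, average energy and zero-crossing count, can be computed per frame as in the sketch below; the frame length is an arbitrary choice for illustration:

```python
import numpy as np

def frame_features(signal: np.ndarray, frame_len: int = 256):
    """Return (average energy, zero-crossing count) per frame, two of the
    time-domain features used to characterize the speech."""
    feats = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        energy = float(np.mean(frame ** 2))
        # A zero crossing is a sign change between consecutive samples.
        crossings = int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, crossings))
    return feats
```

A higher-pitched utterance yields more zero crossings per frame, which is one cue such a feature vector can carry into the corpus lookup.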
Step S130, if the effective voice information is determined to be dialect, performing emotion feature recognition on the effective voice information to obtain emotion information of the current user.
Fig. 2 is a flowchart illustrating a specific embodiment of a step of performing emotion feature recognition on the valid speech information to obtain emotion information of a current user. As shown in fig. 2, step S130 specifically includes step S131, step S132, and step S133.
And S131, carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result.
Specifically, emotion recognition is performed on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result: the emotional features of the valid voice information are extracted and analyzed, and an emotion recognition engine, relying on the preset emotion database, performs the recognition to obtain the corresponding speech emotion result.
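The corpus-based acoustic matching can be sketched as a nearest-neighbour lookup against labelled feature vectors; the feature set (pitch mean, normalized energy), the corpus entries, and every numeric value below are invented for illustration only:

```python
import numpy as np

# Toy "emotion corpus": acoustic feature vectors labelled with an emotion.
# A real engine would use far richer features and many examples per label.
EMOTION_CORPUS = {
    "angry":   np.array([220.0, 0.8]),
    "sad":     np.array([150.0, 0.2]),
    "neutral": np.array([180.0, 0.5]),
}

def acoustic_emotion(features: np.ndarray) -> str:
    """Nearest-neighbour match of utterance features against the corpus."""
    return min(EMOTION_CORPUS,
               key=lambda lbl: float(np.linalg.norm(EMOTION_CORPUS[lbl] - features)))
```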
Step S132, text semantic recognition is carried out on the effective voice information to obtain a corresponding semantic emotion result.
Specifically, the effective voice information is identified and converted into corresponding text information through a corresponding dialect corpus; and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
For example, if the valid voice information is recognized as Cantonese, it is recognized through a Cantonese corpus and converted into corresponding text information. The data in the corpus are all text-labeled, i.e., each dialect word, phrase, and sentence is in one-to-one correspondence with its text. After the valid voice information is converted into corresponding text information, the text can be segmented into words, and a sentiment-polarity (positive, neutral, and negative) probability analysis is performed on the segmented words to obtain the corresponding semantic emotion result.
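The polarity analysis over segmented words can be sketched as averaging lexicon scores; the lexicon, the thresholds, and the English tokens below are hypothetical stand-ins for the dialect-labelled corpus data:

```python
# Hypothetical polarity lexicon (the patent does not publish one); scores lie
# in [-1, 1], where negative values mean derogatory/negative polarity.
POLARITY = {"happy": 1.0, "great": 0.8, "ok": 0.0, "tired": -0.6, "angry": -0.9}

def semantic_emotion(words):
    """Map the mean polarity of segmented words to a coarse label."""
    scores = [POLARITY.get(w, 0.0) for w in words]
    mean = sum(scores) / len(scores) if scores else 0.0
    if mean > 0.2:
        return "positive", mean
    if mean < -0.2:
        return "negative", mean
    return "neutral", mean
```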
And step S133, obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
Specifically, after obtaining the speech emotion result and the semantic emotion result, the emotion and state of the current user are evaluated by combining the speech emotion result and the semantic emotion result, so as to obtain emotional state information of the current user, such as anger, irritability, fear, surprise, joy, self-confidence, neutrality, sadness, tiredness, and the like.
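One plausible way to combine the two results is weighted score fusion; the weighting and the dict-of-confidences interface are assumptions, since the patent only states that the two results are combined:

```python
def fuse_emotion(acoustic: dict, semantic: dict, w_acoustic: float = 0.6) -> str:
    """Weighted fusion of acoustic and semantic emotion confidences.

    Both inputs map emotion label -> confidence in [0, 1]; the label with
    the highest weighted score is taken as the user's emotion.
    """
    labels = set(acoustic) | set(semantic)
    fused = {label: w_acoustic * acoustic.get(label, 0.0)
                    + (1.0 - w_acoustic) * semantic.get(label, 0.0)
             for label in labels}
    return max(fused, key=fused.get)
```

Weighting the acoustic channel higher reflects one common design choice; the balance would be tuned on labelled data in practice.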
Step S140, semantic understanding is carried out on the text recognition result of the effective voice information, and corresponding response text information is obtained.
Specifically, the valid voice information is recognized and converted into corresponding text information through the corresponding dialect corpus, and semantic understanding is performed on the converted text information to obtain the corresponding response text information for answering the user. The converted text information is sent to a natural language processing engine for semantic understanding, and the response text information for answering the user is obtained from the understood semantic information.
And S150, combining the emotion information of the current user to carry out dialect voice synthesis on the response text information to obtain corresponding dialect response voice, and broadcasting.
Specifically, the tone, intonation, etc. of the synthesized dialect response speech are determined according to the current user's emotion. For example, the response text information is sent to a speech synthesis unit, which performs TTS synthesis through a preset speech library in combination with the emotion recognition result to synthesize the corresponding dialect speech. For example, if the user's speech is Cantonese, the response is synthesized as Cantonese speech; and if the user's emotion is tiredness, the response is synthesized with a cheerful tone and a gentle voice before being broadcast. Synthesizing speech in this way can improve the intelligibility and naturalness of the synthesized dialect speech.
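The emotion-to-prosody mapping can be sketched as a lookup table feeding a synthesis request; the parameter names and values are placeholders, since the patent names no specific TTS engine or parameter set:

```python
# Illustrative prosody presets keyed by detected emotion. Real TTS engines
# expose different parameter names; treat these as hypothetical.
PROSODY = {
    "tired":   {"rate": 0.90, "pitch_shift": 2,  "volume": 0.7},  # gentle, cheerful
    "angry":   {"rate": 0.95, "pitch_shift": -1, "volume": 0.6},  # calm, soothing
    "neutral": {"rate": 1.00, "pitch_shift": 0,  "volume": 0.8},
}

def synthesis_request(text: str, dialect: str, emotion: str) -> dict:
    """Bundle response text, dialect voice bank, and emotion-driven prosody
    into a request a TTS engine could consume."""
    return {
        "text": text,
        "voice": f"{dialect}-voice",  # e.g. a Cantonese voice bank
        "prosody": PROSODY.get(emotion, PROSODY["neutral"]),
    }
```

Unknown emotions fall back to the neutral preset, so the broadcast step always receives a complete request.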
Fig. 3 is a schematic method diagram of another embodiment of the speech recognition method provided by the present invention.
As shown in fig. 3, according to another embodiment of the present invention, the speech recognition method further includes step S160.
Step S160, if the valid voice information includes control instruction information, controlling the electrical appliance to execute corresponding operations according to the control instruction information.
Specifically, the voice recognition method can be used in an electric appliance and can be used for voice control of the electric appliance by a user. For example, if the valid voice information is recognized to have control instruction information for controlling the air conditioner to be turned on, the air conditioner is controlled to be turned on according to the control instruction.
Fig. 4 is a schematic method diagram of a speech recognition method according to another embodiment of the present invention.
As shown in fig. 4, according to still another embodiment of the present invention, the voice recognition method further includes step S170.
Step S170, if the obtained emotion information of the current user includes preset emotion information, performing a dialogue with the current user through a preset dialogue corresponding to the preset emotion information.
For example, the preset emotion information includes sadness, irritability, joy, and the like. If the current user is under negative emotions such as sadness or anger, a dialogue is conducted with the current user through the preset dialogue corresponding to sadness or anger: the system enters a chat mode, initiates questions and answers using fixed scripts, and interacts with the user by voice, understanding the context of the user's speech and responding accordingly, so as to provide psychological comfort to the user.
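A minimal sketch of the preset-dialogue lookup, with hypothetical English openers standing in for the fixed scripts the patent refers to:

```python
# Hypothetical canned openers per negative emotion; the patent only states
# that a preset dialogue corresponding to the emotion is used.
PRESET_DIALOGUE = {
    "sad":   "You sound a little down. Would you like to talk about it?",
    "angry": "Take a deep breath. Is there something I can help with?",
}

def maybe_start_chat(emotion: str):
    """Enter chat mode only when the emotion has a preset dialogue;
    return None otherwise so normal processing continues."""
    return PRESET_DIALOGUE.get(emotion)
```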
For clearly explaining the technical solution of the present invention, the following describes an execution flow of the speech recognition method provided by the present invention with a specific embodiment.
Fig. 5 is a schematic diagram of a specific embodiment of the speech recognition method provided by the present invention.
As shown in fig. 5, the speech uttered by the current user is captured through a microphone and, after front-end processing such as filtering and denoising, valid voice information is obtained. Dialect features are extracted and dialect recognition is performed: if an existing dialect corpus matches a corresponding dialect, the flow proceeds to the next step; otherwise, the speech data is discarded and the flow restarts. Once the dialect is confirmed, i.e., the dialect passes authentication, the emotional features of the valid voice information are extracted and analyzed, and an emotion recognition engine, relying on the emotion database, recognizes the speech under test and outputs the corresponding speech emotion result. The valid voice information is also converted into corresponding text information by means of the corresponding dialect corpus data, after which text semantic emotion recognition is performed to obtain the semantic emotion result. Based on the speech emotion result and the semantic emotion result, the user's current emotional mental state is evaluated, for example: anger, irritability, fear, surprise, joy, confidence, neutrality, sadness, tiredness, etc. The valid voice information is then sent to a natural language processing engine for semantic understanding to obtain corresponding response text information; in combination with the emotion recognition result, TTS synthesis of the response text information is performed through a speech library to obtain the synthesized dialect response voice, which is broadcast as the answer to the user's speech.
The invention also provides a voice recognition device. The voice recognition device can be used in an electric appliance to realize voice control of the appliance; for example, it may be used in an air conditioner.
Fig. 6 is a schematic structural diagram of an embodiment of a speech recognition apparatus provided in the present invention. As shown in fig. 6, the voice recognition apparatus 100 includes a voice acquisition unit 110, a dialect recognition unit 120, an emotion recognition unit 130, a semantic understanding unit 140, a voice synthesis unit 150, and a voice announcement unit 160.
The voice acquiring unit 110 is configured to acquire valid voice information of a current user; the dialect recognition unit 120 is configured to perform dialect recognition on the valid voice information to determine whether the valid voice information is dialect; the emotion recognition unit 130 is configured to, if the dialect recognition unit determines that the valid voice information is a dialect, perform emotion feature recognition on the valid voice information to obtain emotion information of the current user; the semantic understanding unit 140 is configured to perform semantic understanding on the text recognition result of the valid voice information to obtain corresponding response text information; the voice synthesis unit 150 is configured to perform dialect voice synthesis on the response text information in combination with the emotion information of the current user to obtain a corresponding dialect response voice; the voice broadcasting unit 160 is configured to broadcast the obtained dialect response voice.
The voice acquiring unit 110 acquires valid voice information of the current user. For example, the voice information sent by the current user may be acquired through a microphone, and the voice sent by the user is subjected to front-end processing such as filtering and noise reduction to obtain pure voice information, which is the valid voice information.
The dialect recognition unit 120 performs dialect recognition on the valid voice information to determine whether it is a dialect. Specifically, the valid voice information is analyzed in terms of features such as average energy, zero-crossing rate, frequency spectrum, cepstrum, and linear prediction coefficients to obtain corresponding corpus information. The obtained corpus information is then looked up in the dialect corpus to determine whether the speech is a dialect.
If the dialect recognition unit 120 determines that the valid speech information is a dialect, the emotion recognition unit 130 performs emotion feature recognition on the valid speech information to obtain emotion information of the current user.
FIG. 7 is a block diagram of a specific implementation of an emotion recognition unit according to an embodiment of the present invention. As shown in FIG. 7, emotion recognition section 130 includes speech emotion recognition subunit 131, semantic emotion recognition subunit 132, and emotion information derivation subunit 133.
The speech emotion recognition subunit 131 is configured to perform speech emotion recognition on the effective speech information to obtain a corresponding speech emotion result.
Specifically, emotion recognition is performed on the valid voice information through a preset emotion corpus to obtain the corresponding speech emotion result: the emotional features of the valid voice information are extracted and analyzed, and an emotion recognition engine, relying on the preset emotion database, performs the recognition to obtain the corresponding speech emotion result.
The semantic emotion recognition subunit 132 is configured to perform text semantic recognition on the effective speech information to obtain a corresponding semantic emotion result.
Specifically, the effective voice information is identified and converted into corresponding text information through a corresponding dialect corpus; and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
For example, if the valid voice information is recognized as Cantonese, it is recognized through a Cantonese corpus and converted into corresponding text information. The data in the corpus are all text-labeled, i.e., each dialect word, phrase, and sentence is in one-to-one correspondence with its text. After the valid voice information is converted into corresponding text information, the text can be segmented into words, and a sentiment-polarity (positive, neutral, and negative) probability analysis is performed on the segmented words to obtain the corresponding semantic emotion result.
The emotion information obtaining subunit 133 is configured to obtain emotion information of the current user according to the speech emotion result and the semantic emotion result.
Specifically, after obtaining the speech emotion result and the semantic emotion result, the emotion and state of the current user are evaluated by combining the speech emotion result and the semantic emotion result, so as to obtain emotional state information of the current user, such as anger, irritability, fear, surprise, joy, self-confidence, neutrality, sadness, tiredness, and the like.
The semantic understanding unit 140 performs semantic understanding on the text recognition result of the effective voice information to obtain corresponding response text information.
Specifically, the valid voice information is recognized and converted into corresponding text information through the corresponding dialect corpus, and semantic understanding is performed on the converted text information to obtain the corresponding response text information for answering the user. The converted text information is sent to a natural language processing engine for semantic understanding, and the response text information for answering the user is obtained from the understood semantic information.
The voice synthesis unit 150 performs dialect voice synthesis on the response text information in combination with the emotion information of the current user to obtain corresponding dialect response voice, and the voice broadcast unit 160 broadcasts the obtained dialect response voice.
Specifically, the tone, intonation, and other characteristics of the synthesized dialect response speech are determined according to the current user's emotion. For example, the response text information is sent to the speech synthesis unit, where TTS synthesis is performed through a preset speech library in combination with the emotion recognition result, synthesizing the corresponding dialect speech. If the user speaks Cantonese, the response is synthesized as Cantonese speech; if the user's emotion is tiredness, the response is synthesized with a cheerful tone and a gentle voice before being broadcast. Synthesizing speech in this way can improve the intelligibility and naturalness of the synthesized dialect speech.
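Selecting synthesis parameters from the detected emotion, as in the tired-user example above, could be sketched like this. The style names and parameter values are assumptions rather than a real TTS API:

```python
# Hypothetical emotion -> synthesis-style table.
TTS_STYLES = {
    "tiredness": {"tone": "cheerful", "rate": 0.9,  "volume": "gentle"},
    "anger":     {"tone": "calm",     "rate": 0.85, "volume": "soft"},
}
DEFAULT_STYLE = {"tone": "neutral", "rate": 1.0, "volume": "normal"}

def synthesis_params(emotion, dialect):
    """Pick prosody settings for the detected emotion, keeping the user's dialect voice."""
    params = dict(TTS_STYLES.get(emotion, DEFAULT_STYLE))
    params["voice"] = dialect  # e.g. a Cantonese voice for a Cantonese-speaking user
    return params
```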
Fig. 8 is a schematic structural diagram of another embodiment of a speech recognition apparatus provided in the present invention. As shown in fig. 8, the speech recognition apparatus 100 further includes a control unit 170.
The control unit 170 is configured to control the electrical appliance to execute a corresponding operation according to the control instruction information if the valid voice information includes the control instruction information.
Specifically, the voice recognition apparatus can be applied to an electrical appliance so that the user can control the appliance by voice. For example, if control instruction information for turning on the air conditioner is recognized in the valid voice information, the air conditioner is controlled to turn on according to the control instruction.
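The control unit's behaviour can be sketched as matching the recognized text against known command phrases. The phrases and command codes below are illustrative assumptions:

```python
# Hypothetical command-phrase -> appliance-command table.
COMMANDS = {
    "turn on the air conditioner":  "AC_ON",
    "turn off the air conditioner": "AC_OFF",
}

def extract_command(text):
    """Return the appliance command found in the text, or None if there is none."""
    text = text.lower()
    for phrase, command in COMMANDS.items():
        if phrase in text:
            return command
    return None  # no control instruction: fall through to normal dialogue
```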
Fig. 9 is a schematic structural diagram of a speech recognition apparatus according to another embodiment of the present invention. As shown in fig. 9, the speech recognition apparatus 100 further includes an interactive dialog unit 180.
The interactive dialogue unit 180 is configured to perform a dialogue with the current user through a preset dialogue corresponding to the preset emotion information if the obtained emotion information of the current user includes the preset emotion information.
For example, the preset emotion information includes sadness, irritability, joy, and the like. If the current user is under a negative emotion such as sadness or anger, the apparatus converses with the current user through the preset dialogue corresponding to that emotion; that is, it enters a chat mode, initiates questions and answers using fixed dialogue scripts, and interacts with the user by voice, understanding the context of the user's speech and responding accordingly, so as to provide psychological comfort to the user.
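The chat-mode entry described above can be sketched as a lookup from the detected emotion to a preset opening line. The preset emotions and lines here are invented examples:

```python
# Hypothetical preset opening lines for emotions that trigger chat mode.
PRESET_DIALOGUES = {
    "sadness": "You sound a little down. Would you like to talk about it?",
    "anger":   "Take a deep breath. What happened?",
}

def maybe_start_chat(emotion):
    """Return a preset opening line for chat mode, or None to stay in normal mode."""
    return PRESET_DIALOGUES.get(emotion)
```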
The invention also provides a storage medium corresponding to the speech recognition method, on which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods described above.
The invention also provides an electric appliance corresponding to the voice recognition method, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of any one of the methods. The electrical appliance may be an air conditioner.
The invention also provides an electric appliance corresponding to the voice recognition device, which comprises the voice recognition device. The electrical appliance may be an air conditioner.
Therefore, the scheme provided by the invention can recognize the dialect speech of the other party and, by combining the emotion recognition result, improve the naturalness of the appliance's voice interaction. Adopting dialect speech recognition improves the universality of speech recognition; combining it with emotion recognition increases the fluency and intelligibility of the synthesized dialect speech; and the user can be given emotional comfort when necessary, bringing a better experience.
The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and implementations are within the scope and spirit of the invention and the following claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hardwired, or a combination of any of these. In addition, each functional unit may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and the parts serving as the control device may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other media capable of storing program codes.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A speech recognition method, comprising:
obtaining effective voice information of a current user;
carrying out dialect recognition on the effective voice information to determine whether the effective voice information is dialect;
if the effective voice information is determined to be dialect, performing emotion feature recognition on the effective voice information to obtain emotion information of the current user;
semantic understanding is carried out on the text recognition result of the effective voice information to obtain corresponding response text information;
and combining the emotion information of the current user to carry out dialect voice synthesis on the response text information to obtain corresponding dialect response voice, and broadcasting.
2. The method of claim 1,
wherein performing emotion feature recognition on the effective voice information to obtain the emotion information of the current user comprises:
carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result;
performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result;
and obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
3. The method of claim 2,
wherein carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result comprises:
carrying out emotion recognition on the effective voice information through a preset emotion corpus to obtain a corresponding voice emotion result;
and/or,
wherein performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result comprises:
recognizing and converting the effective voice information into corresponding text information through a corresponding dialect corpus;
and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
4. The method according to any one of claims 1-3, further comprising:
if the effective voice information contains control instruction information, controlling the electrical appliance to execute corresponding operation according to the control instruction information;
and/or,
and if the obtained emotion information of the current user comprises preset emotion information, carrying out dialogue with the current user through a preset dialogue corresponding to the preset emotion information.
5. A speech recognition apparatus, comprising:
the voice acquisition unit is used for acquiring effective voice information of a current user;
the dialect identification unit is used for carrying out dialect identification on the effective voice information so as to determine whether the effective voice information is dialect;
the emotion recognition unit is used for carrying out emotion feature recognition on the effective voice information to obtain emotion information of the current user if the dialect recognition unit determines that the effective voice information is the dialect;
the semantic understanding unit is used for carrying out semantic understanding on the text recognition result of the effective voice information to obtain corresponding response text information;
the voice synthesis unit is used for carrying out dialect voice synthesis on the response text information by combining the emotion information of the current user to obtain corresponding dialect response voice;
and the voice broadcasting unit is used for broadcasting the obtained dialect response voice.
6. The apparatus of claim 5, wherein the emotion recognition unit comprises:
the voice emotion recognition subunit is used for carrying out voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result;
the semantic emotion recognition subunit is used for performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result;
and the emotion information obtaining subunit is used for obtaining the emotion information of the current user according to the voice emotion result and the semantic emotion result.
7. The apparatus of claim 6,
wherein the voice emotion recognition subunit performing voice emotion recognition on the effective voice information to obtain a corresponding voice emotion result comprises:
carrying out emotion recognition on the effective voice information through a preset emotion corpus to obtain a corresponding voice emotion result;
and/or,
wherein the semantic emotion recognition subunit performing text semantic recognition on the effective voice information to obtain a corresponding semantic emotion result comprises:
recognizing and converting the effective voice information into corresponding text information through a corresponding dialect corpus;
and performing emotion recognition of text semantics on the converted text information to obtain a corresponding semantic emotion result.
8. The apparatus of any one of claims 5-7, further comprising:
the control unit is used for controlling the electrical appliance to execute corresponding operation according to the control instruction information if the effective voice information contains the control instruction information;
and/or,
and the interactive dialogue unit is used for carrying out dialogue with the current user through a preset dialogue corresponding to the preset emotion information if the obtained emotion information of the current user comprises the preset emotion information.
9. A storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
10. An electrical appliance comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 4 when executing the program or comprising the speech recognition arrangement of any one of claims 5 to 8.
CN201911395017.0A 2019-12-30 2019-12-30 Voice recognition method and device, storage medium and electric appliance Pending CN111179903A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395017.0A CN111179903A (en) 2019-12-30 2019-12-30 Voice recognition method and device, storage medium and electric appliance


Publications (1)

Publication Number Publication Date
CN111179903A true CN111179903A (en) 2020-05-19

Family

ID=70655913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395017.0A Pending CN111179903A (en) 2019-12-30 2019-12-30 Voice recognition method and device, storage medium and electric appliance

Country Status (1)

Country Link
CN (1) CN111179903A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327582A (en) * 2021-05-18 2021-08-31 北京声智科技有限公司 Voice interaction method and device, electronic equipment and storage medium
CN116741146A (en) * 2023-08-15 2023-09-12 成都信通信息技术有限公司 Dialect voice generation method, system and medium based on semantic intonation
CN117041430A (en) * 2023-10-09 2023-11-10 成都乐超人科技有限公司 Method and device for improving outbound quality and robustness of intelligent coordinated outbound system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105334743A (en) * 2015-11-18 2016-02-17 深圳创维-Rgb电子有限公司 Intelligent home control method and system based on emotion recognition
CN105654950A (en) * 2016-01-28 2016-06-08 百度在线网络技术(北京)有限公司 Self-adaptive voice feedback method and device
CN106297783A (en) * 2016-08-05 2017-01-04 易晓阳 A kind of interactive voice identification intelligent terminal
CN106683672A (en) * 2016-12-21 2017-05-17 竹间智能科技(上海)有限公司 Intelligent dialogue method and system based on emotion and semantics
CN108847239A (en) * 2018-08-31 2018-11-20 上海擎感智能科技有限公司 Interactive voice/processing method, system, storage medium, engine end and server-side
CN108877800A (en) * 2018-08-30 2018-11-23 出门问问信息科技有限公司 Voice interactive method, device, electronic equipment and readable storage medium storing program for executing
CN109065035A (en) * 2018-09-06 2018-12-21 珠海格力电器股份有限公司 information interaction method and device
CN109189980A (en) * 2018-09-26 2019-01-11 三星电子(中国)研发中心 The method and electronic equipment of interactive voice are carried out with user
CN110085262A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Voice mood exchange method, computer equipment and computer readable storage medium
CN110085221A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Speech emotional exchange method, computer equipment and computer readable storage medium
CN110379445A (en) * 2019-06-20 2019-10-25 深圳壹账通智能科技有限公司 Method for processing business, device, equipment and storage medium based on mood analysis
CN110599999A (en) * 2019-09-17 2019-12-20 寇晓宇 Data interaction method and device and robot



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200519)