CN112002319A - Voice recognition method and device of intelligent equipment

Info

Publication number: CN112002319A
Application number: CN202010779352.7A
Authority: CN
Inventors: 许业喜
Original assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Current assignee: Haier Uplus Intelligent Technology Beijing Co Ltd
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2020-11-27

Abstract

The invention provides a voice recognition method and device for intelligent equipment. The method comprises the following steps: an intelligent terminal receives a plurality of wake-up audio signals uploaded by a plurality of devices; the intelligent terminal selects one of the devices as a recording device according to the audio quality of the wake-up audio signals; the intelligent terminal receives the voice information recorded by the recording device and performs voice recognition on it; and the intelligent terminal sends a processing result to the recording device according to the recognized content. During voice interaction, the intelligent terminal screens the wake-up audio signals it receives and selects the device whose wake-up audio signal has the best quality as the recording device. This addresses the problem in the related art that multiple devices are woken up during voice interaction, and improves voice recognition accuracy.

Description

Voice recognition method and device of intelligent equipment

Technical Field

The invention relates to the technical field of voice of internet of things, in particular to a voice recognition method and device of intelligent equipment.

Background

With the development of voice technology and the maturing of Internet of Things technology, demand for convenient, natural voice interaction in the home is increasing.

However, traditional household products such as home appliances and furniture have limited computing capability and can hardly support a complete voice algorithm, and equipping each of them with full voice capability is very costly.

Related solutions on the market generally embed a voice module or add a smart screen on the device side, which requires major hardware changes and makes the cost of each piece of hardware extremely high. Building a distributed voice entry point in this way is therefore costly. When the devices are not networked with one another, devices that use the same command word may be woken up at the same time, which confuses the interaction. In addition, the processing capability of such a module (generally a low-end chip, ROM < 1G) is limited, so only a small set of command words can be supported offline.

Disclosure of Invention

The embodiment of the invention provides a voice recognition method and a voice recognition device for intelligent equipment, which at least solve the problem that a plurality of pieces of equipment are awakened during voice interaction in the related technology.

According to an embodiment of the present invention, there is provided a speech recognition method for a smart device, including: an intelligent terminal receives a plurality of wake-up audio signals uploaded by a plurality of devices; the intelligent terminal selects one of the plurality of devices as a recording device according to the audio quality of the plurality of wake-up audio signals; the intelligent terminal receives the voice information recorded by the recording device and performs voice recognition on the voice information; and the intelligent terminal sends a processing result to the recording device according to the recognized content.

Optionally, before the intelligent terminal receives the plurality of wake-up audio signals uploaded by the plurality of devices, the method further includes: the plurality of devices, which are in a monitoring state, each upload the wake-up audio signal they detect to the intelligent terminal.

Optionally, the intelligent terminal selecting one of the plurality of devices as a recording device according to the audio quality of the plurality of wake-up audio signals includes: the intelligent terminal compares audio indexes of the plurality of wake-up audio signals to determine the wake-up audio signal with the best audio quality, wherein the audio indexes comprise at least one of the following: amplitude, signal-to-noise ratio, voiceprint, and degree of recognition; and the device corresponding to the wake-up audio signal with the best audio quality is used as the recording device.

Optionally, after the intelligent terminal selects one of the plurality of devices as a recording device according to the audio quality, the method further includes: the recording device starts recording and uploads the recorded audio to the intelligent terminal in a streaming manner.

Optionally, the intelligent terminal performing voice recognition on the voice information uploaded by the recording device includes at least one of the following: the intelligent terminal uploads the voice information to the cloud, and the cloud performs voice recognition on the voice information; or the intelligent terminal itself performs voice recognition on the voice information.

Optionally, after the intelligent terminal receives the voice information recorded by the recording device, the method further includes: the intelligent terminal determines whether recording has finished according to the voice information uploaded by the recording device.

Optionally, the protocol data packet of the wake-up audio signal and/or the voice information that the intelligent terminal receives from the plurality of devices includes at least one of the following fields: device ID, timestamp, audio, VAD status, and operation mode.

According to another embodiment of the present invention, there is provided a speech recognition apparatus for a smart device, including: a receiving module, configured to receive a plurality of wake-up audio signals uploaded by a plurality of devices; a selecting module, configured to select one of the plurality of devices as a recording device according to the audio quality of the wake-up audio signals; a recognition module, configured to receive the voice information recorded by the recording device and perform voice recognition on the voice information; and a sending module, configured to send a processing result to the recording device according to the recognized content.

According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.

According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.

According to the embodiments of the invention, during voice interaction with the devices, the intelligent terminal screens the wake-up audio signals it receives and selects the device whose wake-up audio signal has the best quality as the recording device. This addresses the problem in the related art that multiple devices are woken up during voice interaction, and improves voice recognition accuracy.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of a method of speech recognition of a smart device according to an embodiment of the present invention;

FIG. 2 is a block diagram of a voice recognition apparatus of an intelligent device according to an embodiment of the present invention;

FIG. 3 is a schematic illustration of a speech recognition interaction flow for a smart device in accordance with an alternative embodiment of the present invention;

FIG. 4 is a schematic diagram of the interaction of a device with an intelligent terminal according to an alternative embodiment of the invention;

FIG. 5 is a schematic diagram of a proprietary protocol field definition form in accordance with an alternative embodiment of the invention;

FIG. 6 is a timing diagram of the interaction among a user, a device, an intelligent terminal, and a cloud service platform according to an alternative embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In order to better understand the technical solutions of the embodiments and the alternative embodiments of the present invention, the following description is made on possible application scenarios in the embodiments and the alternative embodiments of the present invention, but is not limited to the application of the following scenarios.

Example 1

In this embodiment, a speech recognition method of an intelligent device is provided, and fig. 1 is a flowchart of a speech recognition method of an intelligent device according to an embodiment of the present invention, as shown in fig. 1, the flowchart includes the following steps:

Step S101: the intelligent terminal receives a plurality of wake-up audio signals uploaded by a plurality of devices. In this embodiment, the intelligent terminal is a device with a certain amount of computing power that can independently perform logical operations and storage. Each device may have a built-in microphone (mic) and a Bluetooth or Wi-Fi module, and can connect to and interact with the intelligent terminal. The wake-up signal in this embodiment may be generated by an interaction such as a button press, a gesture, or voice, for example a user speaking a specific wake-up word.

Before step S101 in this embodiment, the plurality of devices are all in a monitoring state, and each uploads the wake-up audio signal it detects to the intelligent terminal over Wi-Fi or Bluetooth.

Step S102: the intelligent terminal selects one of the plurality of devices as a recording device according to the audio quality of the plurality of wake-up audio signals.

In this step, the intelligent terminal may compare audio indexes of the received wake-up audio signals, such as amplitude, signal-to-noise ratio, voiceprint, and degree of recognition, to determine the wake-up audio signal with the best audio quality, and use the device corresponding to that signal as the recording device. For example, a larger amplitude means a louder sound and hence a shorter distance to the user, and a higher signal-to-noise ratio means a clearer sound. The audio quality may be determined by combining several of these indexes. By screening the multi-channel audio, the intelligent terminal obtains high-quality audio, which helps improve recognition accuracy.

The multi-channel audio screening of this embodiment estimates each device's distance to the user from clarity and volume and selects the corresponding device, so that multiple devices are not woken up simultaneously during voice interaction and the device closest to the user is the one that responds.
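The text names amplitude and signal-to-noise ratio as audio indexes but does not say how they are measured. The following is a minimal sketch under stated assumptions: samples are floating-point PCM values, amplitude is taken as the root-mean-square level, and a noise-only segment is available for the SNR estimate. The function names `rms_amplitude` and `snr_db` are hypothetical and do not come from the patent.

```python
import math
from typing import Sequence


def rms_amplitude(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of PCM samples (assumed amplitude metric)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in decibels, assuming a separate noise-only segment was captured."""
    signal_rms = rms_amplitude(speech)
    noise_rms = rms_amplitude(noise)
    if noise_rms == 0.0:
        return math.inf   # no measurable noise floor
    if signal_rms == 0.0:
        return -math.inf  # silent signal
    return 20.0 * math.log10(signal_rms / noise_rms)
```

A device that hears the wake-up word from nearby would be expected to report a higher value on both functions than a device in another room, which is the comparison the selection step relies on.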

After step S102 in this embodiment, the method may further include: the recording device starts recording and uploads the recorded audio to the intelligent terminal in a streaming manner. With streaming, the audio can be processed as it arrives, which greatly reduces the time the intelligent terminal waits before analyzing it.
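Streaming upload can be illustrated by splitting the recording into small chunks that are sent as soon as they are available. This is only a sketch: the function name `stream_chunks` is hypothetical, and the default chunk size of 640 bytes (20 ms of 16 kHz, 16-bit mono audio) is an assumption, since the patent does not specify an audio format.

```python
from typing import Iterator


def stream_chunks(pcm: bytes, chunk_size: int = 640) -> Iterator[bytes]:
    """Yield fixed-size chunks so the receiver can start work before recording ends."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(pcm), chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield pcm[start:start + chunk_size]
```

Because the function is a generator, the sender can transmit each chunk immediately instead of holding the whole recording in a buffer until the user stops speaking.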

Step S103: the intelligent terminal receives the voice information recorded by the recording device and performs voice recognition on it.

In this step, the intelligent terminal may choose the voice recognition mode according to its networking state. For example, if the intelligent terminal is offline, it performs voice recognition itself or connects to other intelligent terminals in the local area network to perform voice recognition. If the intelligent terminal (for example, a mobile phone) is online, it can call the voice assistant interface to upload the audio to the cloud, and the cloud performs voice recognition. In this way, cloud resources can be fully used on the one hand, and the intelligent terminal's dependence on the network connection state is reduced on the other.
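The choice of recognition mode described in this step can be sketched as a small routing function. The names `Route` and `choose_route`, and the `has_local_engine` flag used to decide between local recognition and a peer terminal on the local area network, are assumptions made for illustration; the patent only states that both offline options exist.

```python
from enum import Enum


class Route(Enum):
    CLOUD = "cloud"        # upload audio through the voice assistant interface
    LOCAL = "local"        # recognize on the intelligent terminal itself
    LAN_PEER = "lan_peer"  # hand off to another terminal in the local network


def choose_route(online: bool, has_local_engine: bool) -> Route:
    """Pick where recognition runs based on the terminal's networking state."""
    if online:
        return Route.CLOUD
    return Route.LOCAL if has_local_engine else Route.LAN_PEER
```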

After step S103 of this embodiment, the method may further include: the intelligent terminal determines whether recording has finished according to the voice information uploaded by the recording device. For example, this can be determined from an end instruction carried in the voice information.
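One way to picture the end-of-recording check is a loop that accumulates streamed audio until a packet carries an end flag. This is a sketch under the assumption that each streamed packet is reduced to a pair of audio bytes and a boolean end-of-speech flag; the function name `collect_utterance` is hypothetical.

```python
from typing import Iterable, Tuple


def collect_utterance(packets: Iterable[Tuple[bytes, bool]]) -> bytes:
    """Concatenate streamed audio until a packet flags the end of speech."""
    audio = bytearray()
    for chunk, end_of_speech in packets:
        audio.extend(chunk)
        if end_of_speech:
            break  # anything after the end flag is ignored
    return bytes(audio)
```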

Step S104: the intelligent terminal sends a processing result to the recording device according to the recognized content. In this step, if the intelligent terminal is networked, it can obtain the voice recognition result from the cloud and forward it to the corresponding recording device; otherwise, it sends its own voice recognition result to the recording device. The recording device can then broadcast the result.

In this embodiment, data interaction between the intelligent terminal and the devices may follow a private protocol built on top of a general protocol. For example, the protocol data packet of the voice information and/or the wake-up audio signal that the intelligent terminal receives from the plurality of devices may include at least one of the following fields: device ID, timestamp, audio, VAD (Voice Activity Detection) status, and operation mode. The device ID identifies a specific networker, the timestamp field indicates the time at which the data was sent or received, the audio field carries the audio data, the VAD status field can indicate the end of recording, and the operation mode field indicates how the device was woken up, e.g., 01 indicates a button, 02 indicates voice wake-up, and 03 indicates other modes.

The private protocol may be designed on the basis of general protocols such as the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP); in addition, audio transmission may be implemented using the Real-time Transport Control Protocol (RTCP). By defining the private protocol, this embodiment lets the devices reuse the computing power of the intelligent terminal, which reduces the hardware cost of the devices.
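The field list above can be made concrete with a small encoder and decoder. The byte layout below is purely an assumption for illustration, since the text names the fields but not their sizes or order: a 32-bit device ID, a 64-bit millisecond timestamp, one byte each for operation mode and VAD status, a 16-bit audio length, then the audio bytes. The class name `Packet` and the constants are hypothetical; only the mode codes 01, 02, and 03 come from the text.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout, network byte order, 16 bytes in total:
# device ID (uint32), timestamp ms (uint64), mode (uint8),
# VAD status (uint8), audio length (uint16).
_HEADER = struct.Struct("!IQBBH")

MODE_BUTTON, MODE_VOICE, MODE_OTHER = 0x01, 0x02, 0x03  # codes from the text


@dataclass(frozen=True)
class Packet:
    device_id: int
    timestamp_ms: int
    mode: int
    vad_end: bool  # True when the VAD status marks the end of recording
    audio: bytes

    def encode(self) -> bytes:
        header = _HEADER.pack(self.device_id, self.timestamp_ms, self.mode,
                              int(self.vad_end), len(self.audio))
        return header + self.audio

    @classmethod
    def decode(cls, data: bytes) -> "Packet":
        device_id, ts, mode, vad, length = _HEADER.unpack_from(data)
        audio = data[_HEADER.size:_HEADER.size + length]
        if len(audio) != length:
            raise ValueError("truncated audio payload")
        return cls(device_id, ts, mode, bool(vad), audio)
```

An explicit length field lets the same packet format travel over a byte stream such as TCP, where message boundaries are not preserved.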

Through the above steps, the intelligent terminal selects the recording device, which solves the problem in the related art that multiple devices are woken up during voice interaction and improves recognition accuracy in noisy environments. Meanwhile, by defining a private protocol for data interaction between the intelligent terminal and the devices, the intelligent terminal can receive and analyze the voice information sent by multiple devices at the same time, so that the devices reuse the computing power of the intelligent terminal and their hardware cost is reduced.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

This embodiment also provides a speech recognition apparatus for an intelligent device. The apparatus is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 2 is a block diagram of a speech recognition apparatus of an intelligent device according to an embodiment of the present invention, and as shown in fig. 2, the apparatus includes a receiving module 10, a selecting module 20, a recognition module 30, and a transmitting module 40.

A receiving module 10, configured to receive a plurality of wake-up audio signals uploaded by a plurality of devices;

a selecting module 20, configured to select a corresponding device from the multiple devices as a recording device according to the audio quality of the wake-up audio signals;

the recognition module 30 is configured to receive the voice information recorded by the recording device, and perform voice recognition on the voice information;

and the sending module 40 is configured to send the processing result to the sound recording device according to the speech recognition content.

It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
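For a software implementation, the four modules can be pictured as callables wired together in one object. This is only a sketch: the class name `RecognitionApparatus`, the callable signatures, and the `record` callable that stands in for the recording device are assumptions for illustration and are not defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RecognitionApparatus:
    select: Callable[[Dict[int, bytes]], int]  # selecting module 20
    record: Callable[[int], bytes]             # stands in for the recording device
    recognize: Callable[[bytes], str]          # recognition module 30
    send: Callable[[int, str], None]           # sending module 40

    def handle(self, wake_signals: Dict[int, bytes]) -> int:
        """Receiving module 10: accept wake-up signals keyed by device ID."""
        recorder = self.select(wake_signals)
        text = self.recognize(self.record(recorder))
        self.send(recorder, text)
        return recorder
```

Passing the modules in as callables keeps the sketch neutral about whether they run in one processor or several, which matches the note above.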

Example 3

In order to facilitate understanding of the technical solutions provided by the present invention, the following detailed description will be made with reference to embodiments of specific scenarios.

At the present stage, related schemes are costly and require major changes to hardware products. Multiple devices fitted with the module may wake up simultaneously, causing confusion in the interaction, and voice capability is very limited offline.

The embodiment of the invention modularizes and deconstructs the original voice interaction system, and realizes the sharing of the voice computing capability through the connection between different devices.

Meanwhile, by reusing computing capability and network communication capability across devices, the cost of retrofitting a device can be reduced (only a mic and a signal processing module need to be added). The responding device is determined through networking within the home, which avoids several devices responding at once. Several devices can also record simultaneously, and the audio they record can be pre-processed: the highest-quality audio is selected for recognition according to multiple indexes of the recorded content, such as signal-to-noise ratio and completeness, and background noise is cancelled across the multiple audio channels. This improves recognition in a noisy home environment.

The embodiments of the present invention relate to the following terms:

User: a user of the device;

Device/networker: an independent product with a built-in mic and a Bluetooth or Wi-Fi module that can connect to the intelligent terminal;

Intelligent terminal: a device with a certain amount of complex computing capability that can independently perform logical operations and storage;

Platform: the collection of capabilities in the cloud.

Fig. 3 is a schematic diagram of a speech recognition interaction process of an intelligent device according to an alternative embodiment of the present invention, and as shown in fig. 3, the interaction process includes the following steps:

Step S301: during normal operation, the devices are in a monitoring state;

Step S302: when a user initiates an interaction through a button, a gesture, or voice, the devices detect the wake-up signal at the same time;

Step S303: each device checks whether it is connected to the intelligent terminal and, if so, uploads the wake-up audio information and its device code;

Step S304: the intelligent terminal selects a recording device according to the amplitude, signal-to-noise ratio, voiceprint, and recognition indexes of the audio;

Step S305: once the recording device is determined, it starts recording, and during recording it uploads the audio to the intelligent terminal by streaming;

Step S306: the intelligent terminal decides according to its networking state: if it is offline, it connects within the local area network for processing; if it is online, it invokes the voice assistant interface to upload the audio to the cloud for processing;

Step S307: the intelligent terminal determines from the audio whether recording has finished;

Step S308: the cloud or the intelligent terminal produces a processing result according to the audio content, and the result is broadcast on the device side.
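Step S304 presupposes that the intelligent terminal knows which uploads belong to the same wake-up event. The text does not say how this is decided; the sketch below assumes the timestamp field is used and that uploads within 300 ms of the first one belong together. The function name `group_wake_events` and the window length are hypothetical.

```python
from typing import List, Optional, Tuple


def group_wake_events(uploads: List[Tuple[int, int]],
                      window_ms: int = 300) -> List[List[int]]:
    """Group (device_id, timestamp_ms) uploads into wake-up events.

    Uploads whose timestamps fall within window_ms of the first upload in a
    group are treated as one utterance heard by several devices.
    """
    groups: List[List[int]] = []
    group_start: Optional[int] = None
    for device_id, ts in sorted(uploads, key=lambda u: u[1]):
        if group_start is None or ts - group_start > window_ms:
            groups.append([])  # start a new wake-up event
            group_start = ts
        groups[-1].append(device_id)
    return groups
```

Each resulting group would then be passed to the selection step, so that only devices that heard the same utterance compete to become the recording device.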

Fig. 4 is a schematic diagram of interaction between a device and an intelligent terminal according to an alternative embodiment of the present invention, and as shown in fig. 4, the intelligent terminal serves as an edge computing node to provide computing power for a networker.

In this embodiment, the device and the intelligent terminal encode and decode data according to a private protocol built on general protocols such as Bluetooth (BLE) and Wi-Fi. Fig. 5 is a schematic diagram of a private protocol field definition form according to an alternative embodiment of the present invention. As shown in fig. 5, the private protocol is defined as follows:

The key fields may include networker ID, timestamp, audio, VAD status, operation mode, etc.

The protocol can be designed on the basis of general protocols such as TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), and audio transmission is implemented using the Real-time Transport Control Protocol (RTCP).

The interaction process between the device and the intelligent terminal in this embodiment is as follows:

Step S401: the wake-up module on the device side runs continuously; after detecting a wake-up action, it encodes the signal and calls the BLE/Wi-Fi module to transmit it;

Step S402: after the BLE/Wi-Fi module on the intelligent terminal side receives the protocol data, it calls the protocol stack of the OS (operating system) layer and passes the encoded signal to the voice assistant APP;

Step S403: the voice assistant APP decodes the signal and responds according to the interaction definition;

Step S404: the voice assistant communicates with the device over the same path and sends a reply audio and a service instruction to the networker with the corresponding ID, which receives and executes them. The service instruction may include a tag indicating "whether to proceed to the next round".
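The effect of the "whether to proceed to the next round" tag in step S404 can be sketched from the networker's point of view. The class `ServiceInstruction`, the function `next_device_state`, and the state names are hypothetical; the patent specifies only that the instruction is addressed by ID and may carry such a tag.

```python
from dataclasses import dataclass


@dataclass
class ServiceInstruction:
    device_id: int      # ID of the networker the reply is addressed to
    reply_audio: bytes  # audio to be played back by the networker
    next_round: bool    # the "whether to proceed to the next round" tag


def next_device_state(own_id: int, instruction: ServiceInstruction) -> str:
    """Return the state a networker enters after receiving an instruction."""
    if instruction.device_id != own_id:
        return "listening"  # not addressed to this networker, so ignore it
    # Addressed to us: keep recording for a follow-up turn, or go back to monitoring.
    return "recording" if instruction.next_round else "listening"
```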

Fig. 6 is a timing diagram of interaction between a user, a device, and an intelligent terminal and a cloud service platform according to an alternative embodiment of the present invention, where as shown in fig. 6, the process includes the following steps:

Step S601: the configuration of the networker (device) is completed, and the networker enters a voice monitoring state.

Step S602: the user issues a voice wake-up signal, for example by uttering a specific wake-up word.

Step S603: if several networkers with a mic are installed in the user's home, the voice wake-up signals they collect are transmitted simultaneously to the intelligent terminal over the local area network (Wi-Fi or Bluetooth).

Step S604: the intelligent terminal evaluates and scores the audio, wherein:

the fields uploaded by a networker may include networker ID, timestamp, audio, and operation mode (scoring is performed only if the operation mode is voice wake-up and multiple requests arrive at the same time);

the scoring factors comprise sound wave amplitude, signal-to-noise ratio, and voiceprint: a larger sound wave amplitude means a louder sound and a shorter distance, a higher signal-to-noise ratio means a clearer sound, and the voiceprint is used to determine the number of speakers;

the scoring rule is

$$\text{final score} = \frac{\mathrm{SNR} \times \alpha + A \times \beta}{N},$$

where N is the number of speakers detected from the voiceprint, SNR (Signal-to-Noise Ratio) is the signal-to-noise ratio, A is the sound wave amplitude, α = 0.55, and β = 0.45;

if the user wakes a networker in another way, such as pressing a key or making a gesture, the ID of the networker that was operated is uploaded, and no responding device needs to be selected.

Step S605: the intelligent terminal determines the networker ID corresponding to the highest-scoring audio and issues a recording instruction to that networker.

Step S606: the networker prompts the user that voice interaction can begin.

Step S607: the networker enters a recording state.

Step S608: the networker uploads the recording to the intelligent terminal in real time.

Step S609: the intelligent terminal invokes a local engine or a platform service to perform recognition, for example by sending the recording to the cloud for recognition.

Step S610: the networker finishes recording.

Step S611: the intelligent terminal determines that the recording has finished and notifies the cloud.

Step S612: the cloud returns the recognition result to the intelligent terminal.

Step S613: the intelligent terminal distributes the recognition result to the networker with the corresponding ID for broadcasting.
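The scoring in step S604 and the selection in step S605 can be sketched as follows, reading the scoring rule as (SNR × α + A × β) / N with α = 0.55 and β = 0.45. Several points are assumptions for illustration: SNR and amplitude are taken to be pre-normalized to comparable scales (the text gives no units), N is clamped to at least 1 to avoid division by zero, and a request woken by a button or gesture is returned directly without scoring. The names `WakeRequest`, `score`, and `pick_recorder` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

ALPHA = 0.55       # weight on signal-to-noise ratio, from the text
BETA = 0.45        # weight on sound wave amplitude, from the text
MODE_VOICE = 0x02  # operation mode code for voice wake-up, from the text


@dataclass
class WakeRequest:
    device_id: int
    mode: int
    snr: float        # assumed pre-normalized
    amplitude: float  # assumed pre-normalized
    speakers: int     # N, number of speakers detected from the voiceprint


def score(req: WakeRequest) -> float:
    n = max(req.speakers, 1)  # guard against N = 0 (assumption)
    return (req.snr * ALPHA + req.amplitude * BETA) / n


def pick_recorder(requests: List[WakeRequest]) -> Optional[int]:
    """Return the ID of the networker that should record, or None if no request."""
    if not requests:
        return None
    for req in requests:
        if req.mode != MODE_VOICE:
            return req.device_id  # operated directly by the user, no scoring
    if len(requests) == 1:
        return requests[0].device_id
    return max(requests, key=score).device_id
```

With two voice requests of equal speaker count, the one with the higher weighted sum wins; dividing by N lowers the score of audio in which several people are speaking.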

The embodiments of the invention provide a method for determining the responding networker, and the user's position relative to it, from signal-to-noise ratio and sound intensity information, and also provide a method in which the intelligent terminal serves as an edge computing node and a voice networker borrows its capabilities to achieve low-cost voice interaction.

According to the embodiments of the invention, reusing the computing power of the intelligent terminal reduces the BOM cost and the development cost of the entry-point product.

Meanwhile, multi-channel audio screening obtains the highest-quality audio, which improves recognition accuracy.

Finally, multi-channel audio screening estimates each networker's distance to the user from clarity and volume, so that multiple networkers are not woken up simultaneously during voice interaction and the networker closest to the user is the one that responds.

Example 4

Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.

Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

s1, the intelligent terminal receives a plurality of awakening audio signals uploaded by a plurality of devices;

s2, the intelligent terminal selects a corresponding device from the multiple devices as a recording device according to the audio quality of the awakening audio signals;

s3, the intelligent terminal receives the voice information recorded by the recording equipment and carries out voice recognition on the voice information;

and S4, the intelligent terminal sends the processing result to the recording equipment according to the voice recognition content.

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Example 5

Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute, by means of a computer program, the steps of the method embodiments above, for example steps S1 to S4 listed in Example 4.

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A voice recognition method for an intelligent device, characterized by comprising the following steps:

an intelligent terminal receives a plurality of wake-up audio signals uploaded by a plurality of devices;

the intelligent terminal selects a corresponding device from the plurality of devices as a recording device according to the audio quality of the plurality of wake-up audio signals;

the intelligent terminal receives the voice information recorded by the recording device and performs voice recognition on the voice information;

and the intelligent terminal sends a processing result to the recording device according to the voice recognition content.

2. The method of claim 1, wherein before the intelligent terminal receives the plurality of wake-up audio signals uploaded by the plurality of devices, the method further comprises:

each of the plurality of devices in the monitoring state uploads the wake-up audio signal that it has detected to the intelligent terminal.

3. The method according to claim 1, wherein the intelligent terminal selecting a corresponding device from the plurality of devices as a recording device according to the audio quality of the plurality of wake-up audio signals comprises:

the intelligent terminal compares audio indexes of the plurality of wake-up audio signals to determine the wake-up audio signal with the best audio quality, wherein the audio indexes comprise at least one of the following: amplitude, signal-to-noise ratio, voiceprint, degree of identification;

and taking the device corresponding to the wake-up audio signal with the best audio quality as the recording device.
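
As a non-limiting illustration, the comparison of audio indexes recited in claim 3 might be sketched as follows. The claim does not state how the indexes are weighted or combined, so the ranking rule used here (higher signal-to-noise ratio first, amplitude as a tie-breaker) is purely an assumption, as are the names `WakeUpSignal`, `select_recording_device`, and all field names.

```python
from dataclasses import dataclass


@dataclass
class WakeUpSignal:
    """Audio indexes reported for one wake-up audio signal (hypothetical layout)."""
    device_id: str
    amplitude: float  # assumed: peak amplitude normalized to [0, 1]
    snr_db: float     # assumed: signal-to-noise ratio in dB


def select_recording_device(signals):
    """Return the device_id whose wake-up signal ranks best.

    Assumption: the signal with the highest SNR wins and amplitude breaks
    ties. The claim only says the indexes are compared, not how.
    """
    if not signals:
        raise ValueError("no wake-up audio signals received")
    best = max(signals, key=lambda s: (s.snr_db, s.amplitude))
    return best.device_id


# Three devices heard the same wake-up word with different quality.
signals = [
    WakeUpSignal("fridge", amplitude=0.42, snr_db=11.0),
    WakeUpSignal("speaker", amplitude=0.61, snr_db=18.5),
    WakeUpSignal("tv", amplitude=0.70, snr_db=18.5),
]
chosen = select_recording_device(signals)
print(chosen)
```

Other indexes named in the claim, such as voiceprint or degree of identification, could be added as further elements of the sort key under the same scheme.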

4. The method according to claim 1, wherein after the intelligent terminal selects a corresponding device from the plurality of devices as the recording device according to the audio quality, the method further comprises:

the recording device starts recording and streams the recorded audio to the intelligent terminal.

5. The method of claim 1, wherein the intelligent terminal performing voice recognition on the voice information recorded by the recording device comprises at least one of the following:

the intelligent terminal uploads the voice information to a cloud, and the cloud performs voice recognition on the voice information;

the intelligent terminal itself performs voice recognition on the voice information.

6. The method according to claim 1, wherein after the intelligent terminal receives the voice information recorded by the recording device, the method further comprises:

the intelligent terminal determines whether to end the recording according to the voice information uploaded by the recording device.
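
As a non-limiting illustration, the end-of-recording decision of claim 6 might be sketched as follows. The claim does not specify the criterion, so this sketch assumes that each uploaded audio frame carries a voice activity flag and that recording ends once speech has been heard and a run of silent frames follows. The function name `should_stop_recording` and the threshold `max_silent_frames` are hypothetical.

```python
def should_stop_recording(vad_flags, max_silent_frames=3):
    """Decide whether recording should end.

    vad_flags: iterable of bool, True when a frame contains speech.
    Assumption: stop after `max_silent_frames` consecutive silent frames,
    but only once some speech has been heard.
    """
    silent_run = 0
    heard_speech = False
    for is_speech in vad_flags:
        if is_speech:
            heard_speech = True
            silent_run = 0
        else:
            silent_run += 1
        if heard_speech and silent_run >= max_silent_frames:
            return True
    return False


# Speech followed by three silent frames ends the recording.
stop = should_stop_recording([True, True, False, False, False])
print(stop)
```

Requiring speech before counting silence keeps the terminal from ending the recording before the user has started to speak.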

7. The method of claim 1, wherein a protocol data packet of the voice information and/or the wake-up audio signal received by the intelligent terminal from the plurality of devices comprises at least one of the following fields: device ID, timestamp, audio, VAD status, mode of operation.
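
As a non-limiting illustration, a protocol data packet carrying the fields listed in claim 7 might be sketched as follows. The claim names the fields only; the JSON encoding, the base64 representation of the audio, the field names, and the example values for VAD status and mode of operation are all assumptions made for this sketch.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class AudioPacket:
    """One uploaded packet (hypothetical layout of the claimed fields)."""
    device_id: str
    timestamp_ms: int
    audio: bytes
    vad_status: str  # assumed values: "speech", "silence"
    mode: str        # assumed values: "wakeup", "recording"

    def to_json(self) -> str:
        # Raw audio bytes are base64-encoded so they survive JSON transport.
        return json.dumps({
            "device_id": self.device_id,
            "timestamp_ms": self.timestamp_ms,
            "audio": base64.b64encode(self.audio).decode("ascii"),
            "vad_status": self.vad_status,
            "mode": self.mode,
        })

    @classmethod
    def from_json(cls, text: str) -> "AudioPacket":
        raw = json.loads(text)
        return cls(
            device_id=raw["device_id"],
            timestamp_ms=raw["timestamp_ms"],
            audio=base64.b64decode(raw["audio"]),
            vad_status=raw["vad_status"],
            mode=raw["mode"],
        )


# Round trip: a device serializes a packet, the terminal decodes it.
packet = AudioPacket("speaker", 1596600000000, b"\x00\x01\x02\x03", "speech", "wakeup")
decoded = AudioPacket.from_json(packet.to_json())
print(decoded.device_id, decoded.mode)
```

A compact binary framing would serve equally well; JSON is used here only to keep the sketch readable.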

8. A voice recognition device for an intelligent device, comprising:

a receiving module, configured to receive a plurality of wake-up audio signals uploaded by a plurality of devices;

a selecting module, configured to select a corresponding device from the plurality of devices as a recording device according to the audio quality of the plurality of wake-up audio signals;

a recognition module, configured to receive the voice information recorded by the recording device and perform voice recognition on the voice information;

and a sending module, configured to send a processing result to the recording device according to the voice recognition content.

9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 7 when executed.

10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 7.