CN213586241U - Far-field voice interaction device and electronic equipment - Google Patents

Publication number
CN213586241U
CN213586241U CN202022827416.4U CN202022827416U CN213586241U CN 213586241 U CN213586241 U CN 213586241U CN 202022827416 U CN202022827416 U CN 202022827416U CN 213586241 U CN213586241 U CN 213586241U
Authority
CN
China
Prior art keywords
pins
circuit
electrically connected
control unit
pin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202022827416.4U
Other languages
Chinese (zh)
Inventor
于云涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Application filed by Hisense Visual Technology Co Ltd
Priority to CN202022827416.4U
Priority to PCT/CN2021/081502 (WO2022001200A1)

Abstract

The embodiment of the application provides a far-field voice interaction device and electronic equipment. The receiving circuit is electrically connected with the analog-to-digital conversion circuit, the output circuit is electrically connected with the power amplification circuit, and the analog-to-digital conversion circuit is electrically connected with the control unit; the power amplifying circuit is electrically connected with the control unit, or the analog-to-digital conversion circuit is electrically connected between the output circuit and the power amplifying circuit. Therefore, the control unit can receive the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal, and can perform signal noise reduction, echo cancellation, sound source positioning and beam forming voice algorithm processing on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal, so that the output circuit can output synthesized voice data, the far-field voice interaction function of the electronic equipment is completed, the direct connection between the microphone array and the control unit is realized, the overall design of a hardware architecture is optimized, and the component cost of the whole hardware architecture is reduced.

Description

Far-field voice interaction device and electronic equipment
Technical Field
The application relates to the technical field of voice, in particular to a far-field voice interaction device and electronic equipment.
Background
The far-field voice (far-field audio) interaction function enables voice control at a distance of about 5 meters, fully frees the user's hands, and has become a standard feature of electronic equipment.
However, the hardware resources of the chips offered by some chip solution providers are limited, and the interface definition of such a chip includes no hardware interface for direct connection to a microphone (mic). When electronic equipment adopts this type of chip, additional components need to be added to the hardware architecture; the far-field voice interaction function is then affected by these components, and the cost of the whole hardware solution increases.
Disclosure of Invention
The embodiment of the application provides a far-field voice interaction device and electronic equipment that can realize the direct connection between a microphone array and a control unit, complete the far-field voice interaction function of the electronic equipment, optimize the overall design of the hardware architecture, avoid the influence of other components on the far-field voice interaction power consumption, and reduce the component cost of the whole hardware architecture.
In a first aspect, an embodiment of the present application provides a far-field speech interaction apparatus, including: a receiving circuit, an analog-to-digital conversion circuit, an output circuit, a power amplification circuit and a control unit;
the receiving circuit is used for collecting original audio signals, the receiving circuit is electrically connected with the analog-to-digital conversion circuit, the output circuit is electrically connected with the power amplification circuit, the power amplification circuit is used for providing echo reference signals, and the output circuit is used for outputting voice data synthesized by the control unit based on the original audio signals and the echo reference signals;
the control unit comprises a first group of pins and a second group of pins;
a first data pin, a clock pin and a bit selection pin in the first group of pins are electrically connected with the analog-to-digital conversion circuit, and a second data pin in the first group of pins is electrically connected with the power amplification circuit;
and one data pin, one clock pin and one bit selection pin in the second group of pins are electrically connected with the power amplification circuit.
With the apparatus of the first aspect, based on the aforementioned electrical connection relationship, the analog-to-digital conversion circuit may convert the original audio signal collected by the receiving circuit into a digital signal in the format of I2S, and transmit the digital signal in the format of I2S to the control unit. Meanwhile, the power amplifying circuit may convert the echo reference signal into a digital signal of I2S format and transmit the digital signal of I2S format to the control unit. The control unit can perform voice algorithm processing such as signal noise reduction, echo cancellation, sound source positioning and beam forming on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal, so that the output circuit can output synthesized voice data. Therefore, the far-field voice interaction function of the electronic equipment is completed, direct connection between the microphone array and the control unit is realized, the overall design of the hardware architecture is optimized, the influence of other components on the far-field voice interaction power consumption is avoided, and the component cost of the whole hardware architecture is reduced.
In one possible design, a microphone array is provided in the receiving circuit and a loudspeaker array is provided in the output circuit.
In one possible design, the number of the first data pins is M, the total number of microphones in the microphone array is 2M, and M is a positive integer; the number of the second data pins is 1.
Therefore, the control unit can receive the digital signals corresponding to the original audio signals and the digital signals corresponding to the echo reference signals through the data pins, and the original audio signals are prevented from being influenced by other components.
In one possible design, one clock pin of the first set of pins is electrically connected with one clock pin of the second set of pins; one bit selection pin in the first group of pins is electrically connected with one bit selection pin in the second group of pins.
Therefore, the two groups of pins of the control unit share the same clock signal and bit selection signal, which reduces the number of separate signals that the control unit has to provide.
In a second aspect, an embodiment of the present application provides a far-field speech interaction apparatus, including: a receiving circuit, an analog-to-digital conversion circuit, an output circuit, a power amplification circuit and a control unit;
the receiving circuit is used for collecting original audio signals, the receiving circuit is electrically connected with the analog-to-digital conversion circuit, the output circuit is electrically connected with the power amplification circuit, the analog-to-digital conversion circuit is also electrically connected between the output circuit and the power amplification circuit, the power amplification circuit is used for providing echo reference signals, and the output circuit is used for outputting voice data synthesized by the control unit based on the original audio signals and the echo reference signals;
the control unit comprises a third group of pins and a fourth group of pins;
a third data pin, a clock pin and a bit selection pin in the third group of pins are all electrically connected with the analog-to-digital conversion circuit;
and one data pin, one clock pin and one bit selection pin in the fourth group of pins are electrically connected with the power amplification circuit.
With the apparatus of the second aspect, based on the aforementioned electrical connection relationship, the analog-to-digital conversion circuit may convert the original audio signal collected by the receiving circuit into a digital signal in the format of I2S, and transmit the digital signal in the format of I2S to the control unit. Meanwhile, the analog-to-digital conversion circuit can also convert the echo reference signal output by the power amplification circuit into a digital signal in an I2S format and transmit the digital signal in an I2S format to the control unit. The control unit can perform voice algorithm processing such as signal noise reduction, echo cancellation, sound source positioning and beam forming on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal, so that the output circuit can output synthesized voice data. Therefore, the far-field voice interaction function of the electronic equipment is completed, direct connection between the microphone array and the control unit is realized, the overall design of the hardware architecture is optimized, the influence of other components on the far-field voice interaction power consumption is avoided, and the component cost of the whole hardware architecture is reduced.
In one possible design, a microphone array is provided in the receiving circuit and a loudspeaker array is provided in the output circuit.
In one possible design, the number of the third data pins is M +1, the total number of microphones in the microphone array is 2M, and M is a positive integer.
Therefore, the control unit can receive the digital signals corresponding to the original audio signals and the digital signals corresponding to the echo reference signals through the data pins, and the original audio signals are prevented from being influenced by other components.
In one possible design, the number of third data pins is 1.
In this design, the analog-to-digital conversion circuit uses a software-configured time-division multiplexing (TDM) mode: after processing such as voltage division and filtering, it transmits the digital signal corresponding to the original audio signal and the digital signal corresponding to the echo reference signal in a time-shared manner over the same data pin in the third group of pins, so that the control unit receives both digital signals through one data pin.
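The single-pin TDM transfer described above can be illustrated with a minimal sketch. The slot order (all microphone channels first, then the echo reference channels) and the function names `tdm_pack` and `tdm_unpack` are assumptions made for illustration; the application does not specify the slot layout.

```python
def tdm_pack(mic_channels, ref_channels):
    """Interleave per-channel sample lists into one TDM word stream.

    Assumed slot order per frame: all microphone channels, then the echo
    reference channels. The real order depends on the ADC configuration.
    """
    channels = list(mic_channels) + list(ref_channels)
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must hold the same number of samples")
    stream = []
    for i in range(length):      # one TDM frame per sample instant
        for ch in channels:      # one slot per channel
            stream.append(ch[i])
    return stream


def tdm_unpack(stream, num_mics, num_refs):
    """Recover (mic_channels, ref_channels) from a TDM word stream."""
    slots = num_mics + num_refs
    if len(stream) % slots != 0:
        raise ValueError("stream does not contain a whole number of frames")
    channels = [stream[s::slots] for s in range(slots)]
    return channels[:num_mics], channels[num_mics:]


mics = [[1, 2], [10, 20], [100, 200], [1000, 2000]]  # 4 microphones, 2 samples each
refs = [[7, 8], [9, 6]]                              # 2 echo reference channels
stream = tdm_pack(mics, refs)
print(stream)
print(tdm_unpack(stream, num_mics=4, num_refs=2))
```

On the receiving side, the control unit would only need to know the number of slots per frame to separate the microphone channels from the echo reference channels again.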
In one possible design, the far-field speech interaction apparatus further includes: a WIFI6 module; two fourth data pins in the third group of pins are electrically connected with the WIFI6 module.
Therefore, coexistence of the far-field voice interaction function and the WIFI function is achieved. Moreover, owing to the low latency and high transmission rate of the WIFI6 module, services associated with the cloud server, such as speech recognition and semantic understanding, run more smoothly, and the voice experience of the user is better.
In one possible design, one clock pin of the third set of pins is electrically connected with one clock pin of the fourth set of pins; one bit selection pin in the third group of pins is electrically connected with one bit selection pin in the fourth group of pins.
Therefore, the two groups of pins of the control unit share the same clock signal and bit selection signal, which reduces the number of separate signals that the control unit has to provide.
In a third aspect, an embodiment of the present application provides an electronic device, including: a housing and the far-field speech interaction device of the first aspect and any one of the possible designs of the first aspect; or the housing and the far-field speech interaction device in any one of the possible designs of the second aspect and the second aspect.
In one possible design, the electronic device is a television or a sound box.
Drawings
FIG. 1 is a hardware architecture diagram of a first apparatus for implementing far-field voice interaction functionality;
fig. 2 is a schematic structural diagram of a far-field speech interaction apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a far-field speech interaction apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a far-field speech interaction device according to an embodiment of the present application.
Detailed Description
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a alone, b alone, c alone, a and b in combination, a and c in combination, b and c in combination, or a, b and c in combination, where a, b and c may each be single or multiple. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
With the continuous development of artificial intelligence technology, users increasingly want to control electronic devices by voice. Electronic equipment provided with a far-field voice interaction device has therefore emerged, greatly facilitating the work and daily life of users. The electronic device may include, but is not limited to, a television, a sound box, an air conditioner, a refrigerator, and other intelligent devices.
Taking a television as an example, when a user wants to ask the television about the current weather, the user can speak a corresponding voice instruction. The far-field voice interaction device in the television acquires the voice instruction and performs wake-up word recognition on it. After the recognition succeeds, the far-field voice interaction device sends the processed audio data to the cloud server. The cloud server performs speech recognition and semantic understanding on the processed audio data and transmits the synthesized voice data to the far-field voice interaction device in the television. The far-field voice interaction device then outputs the synthesized voice data, namely the current weather condition. In this way, the far-field voice interaction function of the electronic equipment is realized.
It should be noted that, in addition to the way of interacting with the cloud server, the far-field voice interaction device in the television set may also perform voice recognition and semantic understanding on the processed audio data by itself, which is not limited in the embodiment of the present application.
Referring to fig. 1, fig. 1 is a hardware architecture diagram of a first apparatus for implementing far-field voice interaction function. As shown in fig. 1, a first apparatus 400 for implementing far-field voice interaction function may include: a microphone array 401, an analog to digital converter (ADC) 402, a speaker array 403, a power Amplifier (AMP) 404, an ADC 405, a Micro Control Unit (MCU) 406, and a System on Chip (SoC) 407.
The microphone array 401 is electrically connected to the ADC 402, and the ADC 402 is electrically connected to the MCU 406. The speaker array 403 is electrically connected to the AMP 404, the AMP 404 is electrically connected to the ADC 405, and the ADC 405 is electrically connected to the MCU 406. The MCU 406 is also electrically connected to the SoC 407, and the SoC 407 is also electrically connected to the AMP 404.
The microphone array 401 outputs analog signals to the ADC 402, the ADC 402 converts the analog signals into digital signals in the Inter-IC Sound (I2S) format, and the ADC 402 inputs the digital signals in the I2S format into the MCU 406, so that the MCU 406 takes the digital signals in the I2S format as the raw audio signals collected by the microphone array 401, as required by the far-field speech algorithm.
Meanwhile, the echo reference signal is obtained from the analog output of the AMP 404. After processing such as voltage reduction and filtering, the analog signal is output to the ADC 405, the ADC 405 converts the analog signal into a digital signal in the I2S format, and the ADC 405 inputs the digital signal in the I2S format into the MCU 406, so that the MCU 406 takes the digital signal in the I2S format as the echo reference signal required by the far-field speech algorithm.
The MCU 406 performs signal synthesis, signal conversion, and signal phase control on the original audio signal and the echo reference signal. The MCU 406 transmits the processed audio data to the SoC 407 through a standard USB interface using the USB Audio Class (UAC) protocol.
Through the above procedure, the transmission of audio data between the microphone array 401 and the SoC 407 is completed. However, the overall architecture of the first apparatus 400 in fig. 1 is complex: the ADC 405, the MCU 406 and the corresponding power chip must be added, which increases the component cost of the entire architecture. Moreover, when the firmware program of the MCU 406 needs to be updated, the SoC 407 can only transmit the firmware program to the MCU 406 through the USB interface. Because the first apparatus 400 also needs the USB interface to transmit audio signals, it cannot provide the far-field voice interaction function while the firmware program of the MCU 406 is being upgraded.
In order to solve the above problems, embodiments of the present application provide a far-field speech interaction device and an electronic device, where an algorithm module for processing a digital signal corresponding to an echo reference signal and a digital signal corresponding to an original audio signal may be integrated with a control unit in the electronic device, so as to complete a far-field speech interaction function of the electronic device, implement direct connection between a microphone array and the control unit, optimize an overall design of a hardware architecture, avoid an influence of other components on far-field speech interaction power consumption, and reduce component costs of the entire hardware architecture.
Illustratively, the embodiment of the application can provide a far-field voice interaction device.
In the following, a specific implementation of the far-field voice interaction apparatus is described with reference to fig. 2 to 4.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a far-field speech interaction device according to an embodiment of the present application. As shown in fig. 2, the far-field speech interaction apparatus 100 according to the embodiment of the present application may include: a receiving circuit 101, an analog-to-digital conversion circuit 102, an output circuit 103, a power amplification circuit 104, and a control unit 105.
Wherein, a microphone array is arranged in the receiving circuit 101, and the microphone array is used for collecting original audio signals. The number, type and other parameters of the microphones in the microphone array are not limited in the application.
The output circuit 103 is provided with a speaker array for outputting voice data synthesized by the control unit 105 based on the original audio signal and the echo reference signal. The application does not limit the parameters such as the number and the type of the loudspeakers in the loudspeaker array.
The analog-to-digital conversion circuit 102 is used for implementing functions such as analog-to-digital conversion, and the power amplification circuit 104 is used for implementing functions such as power amplification.
In addition, the present application does not limit the specific implementation of the analog-to-digital conversion circuit 102, the power amplification circuit 104, and the control unit 105. For example, the analog-to-digital conversion circuit 102 may include an analog-to-digital converter and peripheral circuits, the power amplification circuit 104 may include a power amplifier and peripheral circuits, and the control unit 105 may be an SoC. The peripheral circuits may include resistors, capacitors, inductors and other components.
The receiving circuit 101 is electrically connected to the analog-to-digital conversion circuit 102, and the output circuit 103 is electrically connected to the power amplification circuit 104. The control unit 105 includes a first set of pins (illustrated in fig. 2 with the label "I2S 0") and a second set of pins (illustrated in fig. 2 with the label "I2S 1").
The first group of pins may include a first data pin, a second data pin, a clock pin, and a bit select pin. The second set of pins may include a data pin, a clock pin, and a bit select pin.
In some embodiments, the number of the first data pins is M, the total number of microphones in the microphone array is 2M, and M is a positive integer; the number of the second data pins is 1.
For example, when the microphone array is 4mic, the number of the first data pins is 2, and each pin transmits two channels of audio signals. As another example, when the microphone array is 8mic, the number of the first data pins is 4, and each pin transmits two channels of audio signals.
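The pin arithmetic in these examples can be written out as a short sketch. It relies only on the stated rule that each data pin carries two channels; the function name `first_design_data_pins` and the dictionary keys are hypothetical.

```python
def first_design_data_pins(num_mics):
    """Count the I2S input data pins of the first group of pins.

    Follows the rule in the text: 2M microphones need M first data pins
    (two channels per pin), plus one second data pin for the echo reference.
    """
    if num_mics <= 0 or num_mics % 2 != 0:
        raise ValueError("the text assumes an even, positive microphone count (2M)")
    m = num_mics // 2
    return {"first_data_pins": m, "second_data_pins": 1, "total_input_pins": m + 1}


for mics in (4, 8):
    print(mics, "microphones ->", first_design_data_pins(mics))
```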
For convenience of explanation, the electrical connection relationship of each pin will be described below by taking 4mic as an example.
The first data pins (indicated in fig. 2 by the labels "I2S 0 DIN 0" and "I2S 0 DIN 1"), a clock pin (indicated in fig. 2 by the label "I2S 0 BCLK"), and a bit select pin (indicated in fig. 2 by the label "I2S 0 WS") in the first group of pins are all electrically connected to the analog-to-digital conversion circuit 102, and the second data pin (indicated in fig. 2 by the label "I2S 0 DIN 2") in the first group of pins is electrically connected to the power amplification circuit 104.
One data pin (indicated by the label "I2S 1 DOUT 0" in fig. 2), one clock pin (indicated by the label "I2S 1 BCLK" in fig. 2), and one bit select pin (indicated by the label "I2S 1 WS" in fig. 2) in the second set of pins are all electrically connected to the power amplifier circuit 104.
In some embodiments, one clock pin of the first set of pins (illustrated in FIG. 2 by the label "I2S 0 BCLK") is electrically connected to one clock pin of the second set of pins (illustrated in FIG. 2 by the label "I2S 1 BCLK"), and one bit select pin of the first set of pins (illustrated in FIG. 2 by the label "I2S 0 WS") is electrically connected to one bit select pin of the second set of pins (illustrated in FIG. 2 by the label "I2S 1 WS").
Thus, the two sets of pins of the control unit 105 share the same clock signal and bit select signal, which reduces the number of separate signals that the control unit 105 has to provide.
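A consequence of sharing the clock and bit select signals is that both pin groups presumably run at the same sample rate and word length. The following sketch works out the bit clock for a standard two-channel I2S line; the 16 kHz sample rate and 32-bit word length are assumed values, not figures taken from the application.

```python
def i2s_bit_clock_hz(sample_rate_hz, bits_per_channel, channels_per_line=2):
    """Bit clock needed to shift every bit of every channel once per sample period."""
    return sample_rate_hz * bits_per_channel * channels_per_line


SAMPLE_RATE_HZ = 16_000   # assumed capture rate
BITS_PER_CHANNEL = 32     # assumed word length

ws_hz = SAMPLE_RATE_HZ    # the bit select (WS) line toggles once per sample period
bclk_hz = i2s_bit_clock_hz(SAMPLE_RATE_HZ, BITS_PER_CHANNEL)
print(ws_hz, bclk_hz)     # 16000 1024000
```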
In the embodiment of the present application, the receiving circuit 101 collects an original audio signal, the receiving circuit 101 outputs an analog signal to the analog-to-digital conversion circuit 102, and the analog-to-digital conversion circuit 102 converts the analog signal into a digital signal in the I2S format. The analog-to-digital conversion circuit 102 transmits the digital signal in the I2S format to the control unit 105 through a plurality of data pins among the first data pins.
Meanwhile, the echo reference signal is taken from the digital output of the power amplification circuit 104. The power amplification circuit 104 transmits a digital signal in the I2S format to the control unit 105 through the second data pin in the first group of pins.
The control unit 105 receives the digital signal corresponding to the original audio signal and the digital signal corresponding to the echo reference signal through different data pins.
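As a rough illustration of receiving the signals on separate data pins, the sketch below splits the word sequence captured on each pin into its two channels. It assumes that words on a line alternate between the two channels following the bit select signal; the pin names follow the labels of fig. 2, while `split_i2s_line`, `collect_channels` and the sample values are hypothetical.

```python
def split_i2s_line(words):
    """Split one I2S data line into its two channels.

    Assumes the words alternate channel 0, channel 1, channel 0, ...
    as selected by the bit select (WS) signal.
    """
    return list(words[0::2]), list(words[1::2])


def collect_channels(lines):
    """Map {pin name: word list} to {channel name: sample list}."""
    channels = {}
    for pin, words in lines.items():
        left, right = split_i2s_line(words)
        channels[pin + ".L"] = left
        channels[pin + ".R"] = right
    return channels


captured = {
    "DIN0": [11, 12, 13, 14],   # microphones 0 and 1 (made-up samples)
    "DIN1": [21, 22, 23, 24],   # microphones 2 and 3
    "DIN2": [91, 92, 93, 94],   # echo reference from the power amplification circuit
}
channels = collect_channels(captured)
for name, samples in channels.items():
    print(name, samples)
```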
Therefore, the original audio signal collected by the receiving circuit 101 is only output to the control unit 105 through the analog-to-digital conversion circuit 102, and the echo reference signal provided by the power amplification circuit 104 is directly transmitted to the control unit 105 without passing through other components.
An algorithm module, such as a far-field speech APK (Android application package), is integrated in the control unit 105. The algorithm module can perform speech algorithm processing such as signal noise reduction, acoustic echo cancellation (AEC), sound source localization, and beam forming on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal.
A speech engine is also integrated in the control unit 105. The algorithm module can transmit the processed audio data to the speech engine, so that the speech engine can perform wake-up word recognition on the processed audio data.
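The application does not say which echo cancellation algorithm the module uses. As one common possibility, the sketch below applies a normalized least-mean-squares (NLMS) adaptive filter, which estimates the echo from the echo reference signal and subtracts it from the microphone signal. The filter length, step size, and simulated echo path are all assumed values, and the simulated microphone signal contains only echo, with no near-end speech.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Return the residual after subtracting an adaptive echo estimate (NLMS)."""
    weights = [0.0] * taps
    history = [0.0] * taps            # recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate          # what remains after echo removal
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def power(signal):
    return sum(v * v for v in signal) / len(signal)


random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]   # loudspeaker reference
echo_path = [0.6, 0.3, -0.2]                             # hypothetical room response
mic = [
    sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(ref))
]
residual = nlms_echo_cancel(mic, ref)
print("echo power:", power(mic[-500:]))
print("residual power:", power(residual[-500:]))
```

Once the filter has adapted, the residual power should be far below the echo power, which is the effect echo cancellation is meant to achieve before wake-up word recognition.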
Upon recognizing that the wake-up word is included in the processed audio data, the speech engine will trigger a wake-up event. After the wake-up event is triggered, the voice engine converts the processed audio data and transmits the converted audio data to the cloud server, so that the cloud server performs voice recognition and semantic understanding on the converted audio data.
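The wake-up gating described above can be sketched as a small loop that forwards audio to the cloud only after the wake-up event. The callables `contains_wake_word` and `send_to_cloud` are hypothetical stand-ins for the speech engine and the network upload, and the choice not to forward the frame that holds the wake-up word itself is an assumption.

```python
def run_wake_gate(frames, contains_wake_word, send_to_cloud):
    """Forward frames to the cloud only after a wake-up event has been triggered."""
    awake = False
    sent = 0
    for frame in frames:
        if not awake:
            # Still waiting: check this frame for the wake-up word and move on.
            awake = contains_wake_word(frame)
            continue
        send_to_cloud(frame)
        sent += 1
    return sent


uploaded = []
count = run_wake_gate(
    ["background noise", "hello tv", "what is the weather"],  # text stands in for audio
    lambda frame: "hello tv" in frame,
    uploaded.append,
)
print(count, uploaded)
```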
The cloud server generates synthesized voice data through speech synthesis and transmits it to the control unit 105. The control unit 105 transmits the synthesized voice data through one data pin of the second group of pins to the power amplification circuit 104, which drives the output circuit 103, and thus the whole process of the far-field voice interaction function is completed.
It should be noted that, in addition to the way of interacting with the cloud server, the control unit 105 may also perform speech recognition and semantic understanding on the processed audio data by itself, which is not limited in this embodiment of the application.
In the embodiment of the present application, based on the aforementioned electrical connection relationship, the analog-to-digital conversion circuit may convert the original audio signal collected by the receiving circuit into a digital signal in the format of I2S, and transmit the digital signal in the format of I2S to the control unit. Meanwhile, the power amplifying circuit may convert the echo reference signal into a digital signal of I2S format and transmit the digital signal of I2S format to the control unit. The control unit can perform voice algorithm processing such as signal noise reduction, echo cancellation, sound source positioning and beam forming on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal, so that the output circuit can output synthesized voice data. Therefore, the far-field voice interaction function of the electronic equipment is completed, direct connection between the microphone array and the control unit is realized, the overall design of the hardware architecture is optimized, the influence of other components on the far-field voice interaction power consumption is avoided, and the component cost of the whole hardware architecture is reduced.
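The beam forming step is likewise not specified in the application. A minimal delay-and-sum sketch, one of the simplest beam forming techniques, is shown below: each microphone channel is shifted by an integer sample delay so that sound from the chosen direction lines up, and the channels are then averaged. The delays and the test signal are made-up values.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average the result."""
    length = len(channels[0])
    output = []
    for i in range(length):
        total = 0.0
        for channel, delay in zip(channels, delays):
            j = i + delay             # read ahead in channels that hear the sound later
            if 0 <= j < length:
                total += channel[j]
        output.append(total / len(channels))
    return output


# A pulse reaches microphone 0 at sample 10 and microphone 1 two samples later.
mic0 = [0.0] * 20
mic1 = [0.0] * 20
mic0[10] = 1.0
mic1[12] = 1.0

steered = delay_and_sum([mic0, mic1], delays=[0, 2])
unsteered = delay_and_sum([mic0, mic1], delays=[0, 0])
print(steered[10], unsteered[10])
```

With the matching delays the pulse adds coherently, while without steering it is only half as strong, which is why beam forming favours sound from the steered direction.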
Referring to fig. 3, fig. 3 is a schematic structural diagram of a far-field speech interaction device according to an embodiment of the present application.
As shown in fig. 3, the far-field speech interaction apparatus 200 according to the embodiment of the present application may include: a receiving circuit 201, an analog-to-digital conversion circuit 202, an output circuit 203, a power amplification circuit 204, and a control unit 205.
A microphone array is disposed in the receiving circuit 201, and the microphone array is used for collecting an original audio signal. The number, type and other parameters of the microphones in the microphone array are not limited in the application.
The output circuit 203 is provided with a speaker array for outputting voice data synthesized by the control unit 205 based on the original audio signal and the echo reference signal. The application does not limit the parameters such as the number and the type of the loudspeakers in the loudspeaker array.
The analog-to-digital conversion circuit 202 is used for realizing functions such as analog-to-digital conversion, and the power amplification circuit 204 is used for realizing functions such as power amplification.
In addition, the specific implementation of the analog-to-digital conversion circuit 202, the power amplification circuit 204, and the control unit 205 is not limited in this application. For example, the analog-to-digital conversion circuit 202 may include an analog-to-digital converter and peripheral circuits, the power amplification circuit 204 may include a power amplifier and peripheral circuits, and the control unit 205 may be an SoC. The peripheral circuits may include resistors, capacitors, inductors and other components.
The receiving circuit 201 is electrically connected to the analog-to-digital conversion circuit 202, the output circuit 203 is electrically connected to the power amplification circuit 204, and the analog-to-digital conversion circuit 202 is also electrically connected between the output circuit 203 and the power amplification circuit 204.
The control unit 205 includes a third set of pins (shown in fig. 3 with the label "I2S 2") and a fourth set of pins (shown in fig. 3 with the label "I2S 3").
The third group of pins may include a third data pin, a clock pin, and a bit select pin. The fourth set of pins may include a data pin, a clock pin, and a bit select pin.
In some embodiments, the number of the third data pins is M +1, the total number of microphones in the microphone array is 2M, and M is a positive integer.
For example, when the microphone array is 4mic, the number of the third data pins is 3, two of the data pins are used for transmitting the original audio signal collected by 4mic, the remaining one data pin is used for transmitting the echo reference signal output by the power amplification circuit 204, and each pin transmits the audio signal of two channels.
For another example, when the microphone array is 8mic, the number of the third data pins is 5, where four data pins are used to transmit the original audio signal collected by 8mic, the remaining one data pin is used to transmit the echo reference signal output by the power amplification circuit 204, and each pin transmits two channels of audio signals.
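The pin budget in the two examples above can be checked with a short calculation. The following is a minimal sketch, assuming (as the text states) that each data pin carries two channels and that exactly one extra data pin is reserved for the echo reference signal; the function name `third_data_pin_plan` is hypothetical.

```python
def third_data_pin_plan(num_mics: int) -> dict:
    """Return the data-pin budget for an array of `num_mics` microphones.

    Assumption taken from the text: every data pin carries two audio
    channels, and one additional pin carries the echo reference signal.
    """
    if num_mics <= 0 or num_mics % 2 != 0:
        raise ValueError("the text assumes an even, positive microphone count (2M)")
    m = num_mics // 2  # M data pins carry the 2M microphone channels
    return {"mic_pins": m, "echo_ref_pins": 1, "total": m + 1}


plan_4mic = third_data_pin_plan(4)  # the 4mic example
plan_8mic = third_data_pin_plan(8)  # the 8mic example
```

For 4mic this gives two microphone pins plus one echo reference pin, and for 8mic four microphone pins plus one echo reference pin, matching the counts of 3 and 5 given above.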
For convenience of explanation, the electrical connection relationship of each pin will be described below by taking 4mic as an example.
A third data pin (illustrated in FIG. 3 by the labels "I2S 2DIN 0" and "I2S 2DIN 1"), a clock pin (illustrated in FIG. 3 by the label "I2S 2 BCLK"), and a bit select pin (illustrated in FIG. 3 by the label "I2S 2 WS") in the third set of pins are electrically connected to the analog-to-digital conversion circuit 202.
One data pin (indicated by the label "I2S 3 DOUT 0" in fig. 3), one clock pin (indicated by the label "I2S 3 BCLK" in fig. 3), and one bit select pin (indicated by the label "I2S 3 WS" in fig. 3) in the fourth set of pins are all electrically connected to the power amplifier circuit 204.
In some embodiments, one clock pin of the third set of pins (illustrated in FIG. 3 with the label "I2S 2 BCLK") is electrically connected to one clock pin of the fourth set of pins (illustrated in FIG. 3 with the label "I2S 3 BCLK"), and one bit select pin of the third set of pins (illustrated in FIG. 3 with the label "I2S 2 WS") is electrically connected to one bit select pin of the fourth set of pins (illustrated in FIG. 3 with the label "I2S 3 WS").
Thus, the two sets of pins of the control unit 205 share the same clock signal and bit select signal, which reduces the number of distinct signals that the control unit 205 has to provide.
In the embodiment of the present application, the receiving circuit 201 collects an original audio signal, the receiving circuit 201 outputs an analog signal to the analog-to-digital conversion circuit 202, and the analog-to-digital conversion circuit 202 converts the analog signal into a digital signal in the I2S format. The analog-to-digital conversion circuit 202 transmits the digital signal in the I2S format to the control unit 205 through a plurality of data pins among the third data pins.
Meanwhile, the echo reference signal is taken from the analog output of the power amplification circuit 204. The power amplification circuit 204 transmits an analog signal to the analog-to-digital conversion circuit 202, and the analog-to-digital conversion circuit 202 converts the analog signal into a digital signal in the I2S format. The analog-to-digital conversion circuit 202 then transmits the digital signal in the I2S format to the control unit 205 through the remaining one of the third data pins.
The control unit 205 receives the digital signal corresponding to the original audio signal and the digital signal corresponding to the echo reference signal through different data pins.
Therefore, the original audio signal collected by the receiving circuit 201 and the echo reference signal provided by the power amplification circuit 204 are transmitted directly to the control unit 205 through the analog-to-digital conversion circuit 202 without passing through other components.
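To illustrate how the control unit might separate the signals arriving on different data pins, here is a minimal sketch. It assumes each data line delivers interleaved left/right samples (one pair per bit-select period) and that, in the 4mic case, two lines carry the microphones and one line carries the echo reference. The argument names `line_a`, `line_b`, `line_ref` and the microphone-to-line mapping are hypothetical and are not taken from the figure.

```python
def split_stereo(line_samples):
    """Split one data line's interleaved samples into (left, right) channels."""
    if len(line_samples) % 2 != 0:
        raise ValueError("expected whole left/right sample pairs")
    return line_samples[0::2], line_samples[1::2]


def unpack_4mic_capture(line_a, line_b, line_ref):
    """Map three two-channel data lines to four mic channels and an echo reference."""
    mic0, mic1 = split_stereo(line_a)      # assumed: first line carries mics 0 and 1
    mic2, mic3 = split_stereo(line_b)      # assumed: second line carries mics 2 and 3
    ref_l, ref_r = split_stereo(line_ref)  # remaining line carries the echo reference
    return {"mics": [mic0, mic1, mic2, mic3], "echo_ref": [ref_l, ref_r]}


capture = unpack_4mic_capture([1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12])
```

Because the microphone channels and the echo reference arrive on separate lines, no slot bookkeeping is needed beyond the left/right split.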
An algorithm module, such as a far-field speech APK (Android application package), is integrated in the control unit 205. The algorithm module can perform voice algorithm processing, such as signal noise reduction, acoustic echo cancellation (AEC), sound source localization, and beamforming, on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal.
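The application does not specify which echo cancellation algorithm the module uses. As an illustration only, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook choice, to show how an echo reference signal can be used to remove loudspeaker echo from a microphone signal. The filter length, step size, and simulated echo path are all assumptions.

```python
import random


def nlms_echo_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo (filtered `ref`) from `mic`."""
    w = [0.0] * taps        # adaptive filter coefficients
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        y = sum(wi * xi for wi, xi in zip(w, history))  # estimated echo
        e = d - y                                       # echo-reduced sample
        norm = sum(xi * xi for xi in history) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        out.append(e)
    return out


# Simulated data: the "microphone" hears only a delayed, attenuated copy of
# the reference (a hypothetical echo path with no near-end speech or noise).
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] + [0.6 * r for r in ref[:-1]]
residual = nlms_echo_cancel(mic, ref)
tail_in = sum(s * s for s in mic[-200:])        # echo energy before cancellation
tail_out = sum(s * s for s in residual[-200:])  # echo energy after convergence
```

In this idealized setting the residual echo energy falls far below the input echo energy once the filter has converged. A real device would also have to handle near-end speech, noise, and much longer echo paths.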
A speech engine is also integrated in the control unit 205. The algorithm module can transmit the processed audio data to the speech engine, so that the speech engine can detect the wake-up word in the processed audio data.
Upon recognizing that the wake-up word is included in the processed audio data, the speech engine will trigger a wake-up event. After the wake-up event is triggered, the voice engine converts the processed audio data and transmits the converted audio data to the cloud server, so that the cloud server performs voice recognition and semantic understanding on the converted audio data.
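The wake-up gating described above can be sketched as a small state machine. This is a hedged illustration, not the speech engine's actual interface: `contains_wake_word` stands in for whatever detector the engine uses, and the choice to forward only the frames that follow the wake-up frame is an assumption.

```python
def gate_on_wake_word(frames, contains_wake_word):
    """Return the frames that would be forwarded after a wake-up event.

    `contains_wake_word` is a hypothetical detector callable; frames seen
    before the wake-up event are never forwarded.
    """
    awake = False
    forwarded = []
    for frame in frames:
        if not awake:
            awake = contains_wake_word(frame)  # wake-up event is triggered here
            continue
        forwarded.append(frame)  # would be converted and sent to the cloud server
    return forwarded


frames = ["noise", "hi tv", "what time", "is it"]
sent = gate_on_wake_word(frames, lambda f: f == "hi tv")
```

Here only the two frames after the wake-up frame are forwarded, so audio captured before the wake-up word never leaves the device.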
The cloud server generates synthesized voice data through speech synthesis and transmits it to the control unit 205. The control unit 205 transmits the synthesized voice data through one data pin of the fourth group of pins to the power amplification circuit 204 and on to the output circuit 203, completing the whole process of the far-field voice interaction function.
It should be noted that, in addition to the way of interacting with the cloud server, the control unit 205 may also perform speech recognition and semantic understanding on the processed audio data by itself, which is not limited in this embodiment of the application.
In the embodiment of the present application, based on the aforementioned electrical connection relationship, the analog-to-digital conversion circuit may convert the original audio signal collected by the microphone array into a digital signal in the format of I2S, and transmit the digital signal in the format of I2S to the control unit. Meanwhile, the analog-to-digital conversion circuit can also convert the echo reference signal output by the power amplification circuit into a digital signal in the format of I2S and transmit the digital signal in the format of I2S to the control unit. The control unit can perform voice algorithm processing such as signal noise reduction, echo cancellation, sound source positioning and beam forming on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal, so that the output circuit can output synthesized voice data. Therefore, the far-field voice interaction function of the electronic equipment is completed, direct connection between the microphone array and the control unit is realized, the overall design of the hardware architecture is optimized, the influence of other components on the far-field voice interaction power consumption is avoided, and the component cost of the whole hardware architecture is reduced.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a far-field speech interaction device according to an embodiment of the present application.
As shown in fig. 4, the far-field speech interaction apparatus according to the embodiment of the present application may include: a receiving circuit 301, an analog-to-digital conversion circuit 302, an output circuit 303, a power amplification circuit 304, and a control unit 305.
A microphone array is disposed in the receiving circuit 301, and the microphone array is used for collecting an original audio signal. The number, type and other parameters of the microphones in the microphone array are not limited in the application.
The output circuit 303 is provided with a speaker array, and the speaker array is used for outputting voice data synthesized by the control unit 305 based on the original audio signal and the echo reference signal. The application does not limit parameters such as the number and type of loudspeakers in the speaker array.
The analog-to-digital conversion circuit 302 is used for realizing functions such as analog-to-digital conversion, and the power amplification circuit 304 is used for realizing functions such as power amplification.
In addition, the specific implementations of the analog-to-digital conversion circuit 302, the power amplification circuit 304, and the control unit 305 are not limited in this application. For example, the analog-to-digital conversion circuit 302 may include an analog-to-digital converter and peripheral circuits, the power amplification circuit 304 may include a power amplifier and peripheral circuits, and the control unit 305 may be an SoC. The peripheral circuits may include components such as resistors, capacitors, and inductors.
The receiving circuit 301 is electrically connected to an analog-to-digital conversion circuit 302, the output circuit 303 is electrically connected to a power amplification circuit 304, and the analog-to-digital conversion circuit 302 is also electrically connected between the output circuit 303 and the power amplification circuit 304.
The control unit 305 includes a third set of pins (shown in fig. 4 labeled "I2S 2") and a fourth set of pins (shown in fig. 4 labeled "I2S 3").
The third group of pins may include a third data pin, a clock pin, and a bit select pin. The fourth set of pins may include a data pin, a clock pin, and a bit select pin.
In some embodiments, the number of the third data pins is 1. The number of microphones in the microphone array is not limited in the present application; for example, the array may be 4mic or 8mic.
For convenience of explanation, the electrical connection relationship of each pin will be described below by taking 4mic as an example.
A third data pin (indicated by the label "I2S 2DIN 0" in FIG. 4), a clock pin (indicated by the label "I2S 2 BCLK" in FIG. 4), and a bit select pin (indicated by the label "I2S 2 WS" in FIG. 4) in the third set of pins are electrically connected to the analog-to-digital conversion circuit 302.
One data pin (indicated by the label "I2S 3 DOUT 0" in fig. 4), one clock pin (indicated by the label "I2S 3 BCLK" in fig. 4), and one bit select pin (indicated by the label "I2S 3 WS" in fig. 4) in the fourth set of pins are all electrically connected to the power amplifier circuit 304.
In some embodiments, one clock pin of the third set of pins (illustrated in FIG. 4 with the label "I2S 2 BCLK") is electrically connected to one clock pin of the fourth set of pins (illustrated in FIG. 4 with the label "I2S 3 BCLK"), and one bit select pin of the third set of pins (illustrated in FIG. 4 with the label "I2S 2 WS") is electrically connected to one bit select pin of the fourth set of pins (illustrated in FIG. 4 with the label "I2S 3 WS").
Thus, the two sets of pins of the control unit 305 share the same clock signal and bit select signal, which reduces the number of distinct signals that the control unit 305 has to provide.
In this embodiment, the receiving circuit 301 collects an original audio signal, the receiving circuit 301 outputs an analog signal to the analog-to-digital conversion circuit 302, and the analog-to-digital conversion circuit 302 converts the analog signal into a digital signal in the I2S format. Analog-to-digital conversion circuit 302 transmits a digital signal in the format of I2S to control unit 305 through a third data pin in a third set of pins.
Also, the echo reference signal is taken from the analog output of the power amplification circuit 304. The power amplification circuit 304 transmits an analog signal to the analog-to-digital conversion circuit 302, and the analog-to-digital conversion circuit 302 converts the analog signal into a digital signal in the I2S format. The analog-to-digital conversion circuit 302 further transmits the digital signal in the I2S format to the control unit 305 through a third data pin of the third group of pins.
The control unit 305 receives the digital signal corresponding to the original audio signal and the digital signal corresponding to the echo reference signal through the same data pin, i.e., the third data pin (indicated by the label "I2S 2DIN 0" in fig. 4). To this end, the analog-to-digital conversion circuit 302 adopts a software-configured time-division multiplexing (TDM) mode: after processing such as voltage division and filtering, it transmits the digital signal corresponding to the original audio signal and the digital signal corresponding to the echo reference signal to the control unit 305 in separate time slots over that single data pin in the third group of pins.
Therefore, the original audio signal collected by the receiving circuit 301 and the echo reference signal provided by the power amplification circuit 304 are transmitted directly to the control unit 305 through the analog-to-digital conversion circuit 302 alone, without passing through other components.
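The time-division scheme above can be illustrated with a small demultiplexing sketch. It assumes a frame of six slots on the single data pin, with slots 0 to 3 carrying the four microphones and slots 4 and 5 carrying the echo reference. The slot count and slot order are hypothetical, since the application only states that TDM is configured by software.

```python
def demux_tdm(samples, slots_per_frame, mic_slots, ref_slots):
    """Split a single TDM sample stream into microphone and echo reference channels."""
    if slots_per_frame <= 0 or len(samples) % slots_per_frame != 0:
        raise ValueError("stream must contain whole TDM frames")
    # Channel s consists of every slots_per_frame-th sample, starting at slot s.
    channels = [samples[s::slots_per_frame] for s in range(slots_per_frame)]
    return {
        "mics": [channels[s] for s in mic_slots],
        "echo_ref": [channels[s] for s in ref_slots],
    }


# Two frames of six slots each; the sample values are just slot indices here.
stream = list(range(12))
tdm = demux_tdm(stream, slots_per_frame=6, mic_slots=[0, 1, 2, 3], ref_slots=[4, 5])
```

Compared with the multi-pin arrangement, the same channels reach the control unit over one data pin, at the cost of a higher bit clock and agreed slot positions.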
An algorithm module, such as a far-field speech apk (Android application package), is integrated in the control unit 305. The algorithm module may perform speech algorithm processing for signal noise reduction, Echo Cancellation (AEC), sound source localization, and beamforming on the digital signal corresponding to the Echo reference signal and the digital signal corresponding to the original audio signal.
A speech engine is also integrated in the control unit 305. The algorithm module can transmit the processed audio data to the speech engine, so that the speech engine can detect the wake-up word in the processed audio data.
Upon recognizing that the wake-up word is included in the processed audio data, the speech engine will trigger a wake-up event. After the wake-up event is triggered, the voice engine converts the processed audio data and transmits the converted audio data to the cloud server, so that the cloud server performs voice recognition and semantic understanding on the converted audio data.
The cloud server generates synthesized voice data through speech synthesis and transmits it to the control unit 305. The control unit 305 transmits the synthesized voice data through one data pin of the fourth group of pins to the power amplification circuit 304 and on to the output circuit 303, completing the whole process of the far-field voice interaction function.
It should be noted that, in addition to the way of interacting with the cloud server, the control unit 305 may also perform speech recognition and semantic understanding on the processed audio data by itself, which is not limited in this embodiment of the application.
In the embodiment of the present application, based on the aforementioned electrical connection relationship, the analog-to-digital conversion circuit may convert the original audio signal collected by the receiving circuit into a digital signal in the format of I2S, and transmit the digital signal in the format of I2S to the control unit through a data pin. In addition, the analog-to-digital conversion circuit can also convert an echo reference signal output by the power amplification circuit into a digital signal in an I2S format, and transmit the digital signal in the I2S format to the control unit through the same data pin. The control unit can perform voice algorithm processing such as signal noise reduction, echo cancellation, sound source positioning and beam forming on the digital signal corresponding to the echo reference signal and the digital signal corresponding to the original audio signal, so that the output circuit can output synthesized voice data. Therefore, the far-field voice interaction function of the electronic equipment is completed, direct connection between the microphone array and the control unit is realized, the number of pins of the control unit is reduced, the overall design of the hardware architecture is optimized, the influence of other components on the far-field voice interaction power consumption is avoided, and the component cost of the whole hardware architecture is reduced.
The data pins in the third set of pins are often multiplexed with other functions. Thus, in some embodiments, the far-field voice interaction device may further include a WIFI6 module 306. Correspondingly, the third group of pins may further include two fourth data pins (illustrated in fig. 4 with labels "I2S 2DIN 1" and "I2S 2DIN 2"). The two fourth data pins in the third group of pins are electrically connected with the WIFI6 module 306.
In the embodiment of the present application, the two fourth data pins in the third group of pins are configured as a high-speed serial data port, such as UART4_TXD and UART4_RXD. Thus, the WIFI6 module 306 transmits WIFI data to the control unit 305 through the two fourth data pins in the third group of pins, completing data transmission between the WIFI6 module 306 and the control unit 305.
Therefore, the far-field voice interaction function and the WIFI function can coexist. Moreover, owing to the low latency and high transmission rate of the WIFI6 module 306, services such as speech recognition and semantic understanding that rely on the cloud server run more smoothly, which improves the user's voice experience.
Exemplarily, the embodiment of the application also provides an electronic device. The electronic device may include: a housing and the far field speech interaction device of the previous embodiment.
In some embodiments, the electronic device may include, but is not limited to, a television, a sound box, an air conditioner, a refrigerator, and other smart devices.
The electronic device of the embodiment of the present application may be configured to implement the aforementioned technical solution of the far-field speech interaction apparatus, and the implementation principle and the technical effect of the electronic device are similar, which are not described herein again.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A far-field speech interaction device, comprising:
the device comprises a receiving circuit, an analog-to-digital conversion circuit, an output circuit, a power amplification circuit and a control unit;
the receiving circuit is used for collecting original audio signals, the receiving circuit is electrically connected with the analog-to-digital conversion circuit, the output circuit is electrically connected with the power amplification circuit, the power amplification circuit is used for providing echo reference signals, and the output circuit is used for outputting voice data synthesized by the control unit based on the original audio signals and the echo reference signals;
the control unit comprises a first group of pins and a second group of pins;
a first data pin, a clock pin and a bit selection pin in the first group of pins are electrically connected with the analog-to-digital conversion circuit, and a second data pin in the first group of pins is electrically connected with the power amplification circuit;
and a data pin, a clock pin and a bit selection pin in the second group of pins are all electrically connected with the power amplification circuit.
2. The apparatus of claim 1, wherein a microphone array is disposed in the receiving circuit and a speaker array is disposed in the output circuit.
3. The apparatus of claim 2, wherein the number of the first data pins is M, the total number of microphones in the microphone array is 2M, and M is a positive integer; the number of the second data pins is 1.
4. The apparatus according to any one of claims 1 to 3,
one clock pin in the first group of pins is electrically connected with one clock pin in the second group of pins; and one bit selection pin in the first group of pins is electrically connected with one bit selection pin in the second group of pins.
5. A far-field speech interaction device, comprising:
the device comprises a receiving circuit, an analog-to-digital conversion circuit, an output circuit, a power amplification circuit and a control unit;
the receiving circuit is used for collecting original audio signals, the receiving circuit is electrically connected with the analog-to-digital conversion circuit, the output circuit is electrically connected with the power amplification circuit, the analog-to-digital conversion circuit is also electrically connected between the output circuit and the power amplification circuit, the power amplification circuit is used for providing echo reference signals, and the output circuit is used for outputting voice data synthesized by the control unit based on the original audio signals and the echo reference signals;
the control unit comprises a third group of pins and a fourth group of pins;
a third data pin, a clock pin and a bit selection pin in the third group of pins are electrically connected with the analog-to-digital conversion circuit;
and one data pin, one clock pin and one bit selection pin in the fourth group of pins are electrically connected with the power amplification circuit.
6. The apparatus of claim 5, wherein a microphone array is disposed in the receiving circuit and a speaker array is disposed in the output circuit.
7. The apparatus of claim 6, wherein the number of the third data pins is M +1, the total number of microphones in the microphone array is 2M, and M is a positive integer.
8. The apparatus of claim 5 or 6, wherein the number of the third data pins is 1.
9. The apparatus of claim 5 or 6, further comprising: a WIFI6 module; two fourth data pins in the third group of pins are electrically connected with the WIFI6 module.
10. The apparatus according to any one of claims 5 to 7,
one clock pin in the third group of pins is electrically connected with one clock pin in the fourth group of pins; and one bit selection pin in the third group of pins is electrically connected with one bit selection pin in the fourth group of pins.
11. An electronic device, comprising: a housing and the far-field speech interaction device of any of claims 1-4; or a housing and a far-field speech interaction device according to any of claims 5 to 10.
12. The device of claim 11, wherein the electronic device is a television or a sound box.
CN202022827416.4U 2020-07-03 2020-11-30 Far-field voice interaction device and electronic equipment Active CN213586241U (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202022827416.4U CN213586241U (en) 2020-11-30 2020-11-30 Far-field voice interaction device and electronic equipment
PCT/CN2021/081502 WO2022001200A1 (en) 2020-07-03 2021-03-18 Display device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202022827416.4U CN213586241U (en) 2020-11-30 2020-11-30 Far-field voice interaction device and electronic equipment

Publications (1)

Publication Number Publication Date
CN213586241U true CN213586241U (en) 2021-06-29

Family

ID=76544387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202022827416.4U Active CN213586241U (en) 2020-07-03 2020-11-30 Far-field voice interaction device and electronic equipment

Country Status (1)

Country Link
CN (1) CN213586241U (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113823310A (en) * 2021-11-24 2021-12-21 南昌龙旗信息技术有限公司 Voice interruption wake-up circuit applied to tablet computer

Similar Documents

Publication Publication Date Title
CN108260051B (en) Voice remote control system, portable transmission equipment and intelligent equipment
US10923138B2 (en) Sound collection apparatus for far-field voice
CN102820032B (en) Speech recognition system and method
CN110349582B (en) Display device and far-field voice processing circuit
CN213586241U (en) Far-field voice interaction device and electronic equipment
CN208691406U (en) Far field speech collecting system for smart television
CN102497609A (en) Wireless network sound system
CN106792321B (en) Split wireless earphone and communication method thereof
CN103079142B (en) A kind of wireless bass big gun bass time delay adjustable system and method
CN210270874U (en) Front-end audio test device and audio test system
US11600288B2 (en) Sound signal processing device
CN210016611U (en) Extensible sound box system based on A2B bus
WO2020216089A1 (en) Voice control system and method, and voice suite and voice apparatus
CN103501478B (en) Communication audio switching device and method in a kind of car
CN217135683U (en) Multi-channel far-field voice circuit
CN216565265U (en) Distributed extensible teleconference system
CN212231697U (en) Voice acquisition device
CN112218205B (en) Annular network audio system based on INIC
CN205647937U (en) Audio device
CN110366017A (en) A kind of smart television voice cam device and intelligent TV set
CN216122906U (en) Audio wireless transmission device and audio transmission system
CN208227287U (en) Intelligent audio processing equipment
CN113823310B (en) Voice interruption wake-up circuit applied to tablet computer
CN208890784U (en) One kind is for the received radio station of land sky call
CN201063796Y (en) Wireless image signal transmission apparatus

Legal Events

Date Code Title Description
GR01 Patent grant