CN107123423B - Voice pickup device and multimedia equipment - Google Patents

Voice pickup device and multimedia equipment

Info

Publication number
CN107123423B
Authority
CN
China
Prior art keywords
voice
unit
face
acquisition unit
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710423629.0A
Other languages
Chinese (zh)
Other versions
CN107123423A (en)
Inventor
于豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Whaley Technology Co Ltd
Original Assignee
Whaley Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Whaley Technology Co Ltd
Priority to CN201710423629.0A
Publication of CN107123423A
Application granted
Publication of CN107123423B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention provides a voice pickup device and a multimedia device. The voice pickup device comprises: an image acquisition unit for acquiring an image; a face recognition unit electrically connected with the image acquisition unit and used for performing face recognition on the acquired image; a voice acquisition unit for acquiring voice signals; a steering adjustment unit connected with the voice acquisition unit and used for adjusting the orientation of the voice acquisition unit; and a processing control unit electrically connected with the image acquisition unit, the face recognition unit, the voice acquisition unit and the steering adjustment unit. When the face recognition unit recognizes that a face exists in the acquired image, the processing control unit controls the steering adjustment unit so that the voice acquisition unit is aimed at the position of the face, and controls the voice acquisition unit to acquire the voice signal from that position. The device can thus automatically turn toward the user according to the position of the user's face and perform directional pickup of the voice signal uttered by the user, reducing external noise interference.

Description

Voice pickup device and multimedia equipment
Technical Field
The invention relates to the technical field of audio pickup, in particular to a voice pickup device and multimedia equipment.
Background
With the continuous development of audio pickup technology, its applications are becoming more and more extensive. However, many technical problems in audio pickup technology itself remain to be solved. Taking voice pickup as an example, although the voice pickup devices currently on the market can pick up a voice signal uttered by a user, they usually pick up a large amount of external noise at the same time. As a result, the picked-up voice signal is mixed with a large amount of external noise, the corresponding voice recognition accuracy is low, and the voice pickup distance is short.
Disclosure of Invention
In order to overcome the above-mentioned deficiencies in the prior art, an object of the present invention is to provide a voice pickup apparatus and a multimedia device which can automatically turn toward a user according to the position of the user's face and perform directional voice pickup on the voice signal uttered by the user, thereby reducing external noise interference and improving the corresponding voice recognition accuracy and voice pickup distance.
As for the voice pickup apparatus, a preferred embodiment of the present invention provides a voice pickup apparatus. The device comprises:
an image acquisition unit for acquiring an image;
a face recognition unit electrically connected with the image acquisition unit and used for performing face recognition on the image acquired by the image acquisition unit;
a voice acquisition unit for acquiring voice signals;
a steering adjustment unit connected with the voice acquisition unit and used for adjusting the orientation of the voice acquisition unit; and
a processing control unit electrically connected with the image acquisition unit, the face recognition unit, the voice acquisition unit and the steering adjustment unit;
wherein, when the face recognition unit recognizes that a face exists in the image acquired by the image acquisition unit, the processing control unit controls the operation of the steering adjustment unit so that the voice acquisition unit is aimed at the position of the face in the image, and controls the voice acquisition unit to acquire the voice signal from the position of the face.
In a preferred embodiment of the present invention, the apparatus further includes a noise acquisition unit for acquiring noise signals in the surrounding environment of the apparatus;
the processing control unit is electrically connected with the noise acquisition unit, so as to perform denoising processing on the voice signal acquired by the voice acquisition unit according to the noise signal acquired by the noise acquisition unit and obtain denoised voice information.
In a preferred embodiment of the present invention, the apparatus further includes a network communication unit for performing data interaction;
the apparatus is in communication connection with a server through the network communication unit, so as to send the denoised voice information obtained by the processing control unit to the server for voice recognition, or to receive a control instruction obtained by the server after performing voice recognition on the denoised voice information.
In a preferred embodiment of the present invention, the apparatus further includes a voice recognition unit for performing voice recognition;
the voice recognition unit is electrically connected with the processing control unit, so as to perform voice recognition on the denoised voice information obtained by the processing control unit and obtain a corresponding control instruction.
In a preferred embodiment of the present invention, when the face recognition unit recognizes a plurality of faces, the processing control unit obtains the control authority corresponding to each face, and controls the steering adjustment unit to drive the voice acquisition unit to face the position of the face with the highest control authority, so as to acquire the voice signal from that position.
In a preferred embodiment of the present invention, the voice acquisition unit includes at least one voice sampling microphone, the noise acquisition unit includes at least one noise sampling microphone, and the at least one voice sampling microphone cooperates with the at least one noise sampling microphone to form a microphone array, so as to acquire, respectively, the voice signal from the position of the face and the noise signal in the surrounding environment of the voice pickup apparatus.
In a preferred embodiment of the present invention, the processing control unit includes an audio processing subunit;
the audio processing subunit is used for amplifying the noise signal acquired by the noise acquisition unit and the voice signal acquired by the voice acquisition unit, performing phase inversion processing on the amplified noise signal, and mixing and superposing the phase-inverted noise signal with the amplified voice signal, so as to eliminate the noise signal in the voice signal and obtain the denoised voice information.
In a preferred embodiment of the present invention, the apparatus further includes a rotation control unit connected to the image acquisition unit and configured to control the image acquisition direction of the image acquisition unit.
In a preferred embodiment of the present invention, when the face recognition unit recognizes that a face exists in the image acquired by the image acquisition unit, the processing control unit processes face information of the recognized face to obtain an orientation corresponding to a mouth in the face, and controls the steering adjustment unit according to the orientation corresponding to the mouth, so that the voice acquisition unit aligns with the orientation corresponding to the mouth to acquire a voice signal.
In terms of a multimedia device, a preferred embodiment of the present invention provides a multimedia device. The multimedia equipment comprises the voice pickup device, carries out voice recognition on the voice signals collected by the voice pickup device, obtains control instructions matched with the voice signals, and executes corresponding operations according to the control instructions.
Compared with the prior art, the voice pickup apparatus and the multimedia device provided by the preferred embodiments of the invention have the following beneficial effects: they can automatically turn toward the user according to the position of the user's face and perform directional voice pickup on the voice signal uttered by the user, thereby reducing external noise interference and improving the corresponding voice recognition accuracy and voice pickup distance. Specifically, the voice pickup apparatus acquires an image through the image acquisition unit; performs face recognition on the acquired image through the face recognition unit electrically connected with the image acquisition unit; acquires voice signals through the voice acquisition unit; and adjusts the orientation of the voice acquisition unit through the steering adjustment unit connected with the voice acquisition unit. Through the processing control unit electrically connected with the image acquisition unit, the face recognition unit, the voice acquisition unit and the steering adjustment unit, when the face recognition unit recognizes that a face exists in the image acquired by the image acquisition unit, the apparatus controls the operation of the steering adjustment unit so that the voice acquisition unit is aimed at the position of the face in the image, and controls the voice acquisition unit to acquire the voice signal from that position. Directional pickup of the voice signal uttered by the user is thereby realized, external noise interference is reduced, and the corresponding voice recognition accuracy and voice pickup distance are improved.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting the scope of the claims; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a block diagram of a voice pickup apparatus according to a first embodiment of the present invention.
Fig. 2 is a block diagram of a voice pickup apparatus according to a second embodiment of the present invention.
Fig. 3 is a block diagram of a voice pickup apparatus according to a third embodiment of the present invention.
Fig. 4 is a block diagram of a voice pickup apparatus according to a fourth embodiment of the present invention.
Fig. 5 is a block diagram of a voice pickup apparatus according to a fifth embodiment of the present invention.
Reference numerals: 100-voice pickup apparatus; 110-image acquisition unit; 120-face recognition unit; 130-voice acquisition unit; 140-steering adjustment unit; 150-processing control unit; 151-audio processing subunit; 160-noise acquisition unit; 170-network communication unit; 180-voice recognition unit; 190-rotation control unit.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be mechanical or electrical; it may be direct, or indirect through an intervening medium, or it may be an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case. Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one item from another and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be further noted that the terms "upper", "lower", "left", "right", and the like indicate orientations or positional relationships based on those shown in the drawings, or on those in which the product of the present invention is conventionally placed when in use. They are used only for convenience and simplification of description, and do not indicate or imply that the devices or elements referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention.
How to provide a voice pickup apparatus and a multimedia device that can automatically turn toward a user according to the position of the user's face and perform directional voice pickup on the voice signal uttered by the user, so as to reduce external noise interference and improve the corresponding voice recognition accuracy and voice pickup distance, is a technical problem that those skilled in the art urgently need to solve.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
First embodiment:
Fig. 1 is a block diagram of a voice pickup apparatus 100 according to a first embodiment of the present invention. In the embodiment of the present invention, the voice pickup apparatus 100 is used for directionally picking up a voice signal uttered by a user, so as to improve the accuracy of the corresponding voice recognition. The voice pickup apparatus 100 includes an image acquisition unit 110, a face recognition unit 120, a voice acquisition unit 130, a steering adjustment unit 140, and a processing control unit 150.
In an embodiment of the present invention, the image acquisition unit 110 is used for acquiring an image. The image acquisition unit 110 includes a camera, through which it acquires images of the environment around the voice pickup apparatus 100 to determine the specific position of the user of the voice pickup apparatus 100, so as to facilitate directional voice pickup.
In this embodiment, the camera may acquire images of the environment in a fixed direction, or may acquire images of the environment in different directions as required; the user or the manufacturer of the voice pickup apparatus 100 may choose between these settings according to different requirements. In the present embodiment, the camera may be, but is not limited to, a digital camera, an analog camera, and the like.
In an embodiment of the present invention, the face recognition unit 120 is configured to perform face recognition on the image acquired by the image acquisition unit 110 to determine an orientation of a user of the voice pickup apparatus 100 in the image. Specifically, in this embodiment, the face recognition unit 120 is electrically connected to the image acquisition unit 110, so as to perform face recognition on the image acquired by the image acquisition unit 110. In this embodiment, when the face recognition unit 120 recognizes that a face exists in the image acquired by the image acquisition unit 110, the image is analyzed to acquire face information corresponding to the face from the image, and according to the face information and the position of the face in the image, an orientation where a user corresponding to the face is located in the surrounding environment of the voice pickup apparatus 100 and an orientation corresponding to the face are obtained.
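As an illustration of how the position of a face in the image might be mapped to an orientation relative to the apparatus, the following is a minimal sketch. It assumes a pinhole camera whose optical axis points straight ahead of the voice pickup apparatus 100 and whose horizontal and vertical fields of view are known. The function name face_direction, the bounding-box format, and the field-of-view values are hypothetical and are not specified in this disclosure.

```python
import math

def face_direction(box, image_size, fov_deg=(60.0, 40.0)):
    """Map a face bounding box to (pan, tilt) angles in degrees.

    box        -- (left, top, width, height) of the face in pixels (assumed format)
    image_size -- (width, height) of the image in pixels
    fov_deg    -- assumed horizontal and vertical field of view of the camera
    """
    left, top, w, h = box
    img_w, img_h = image_size
    # Center of the face in pixel coordinates.
    cx = left + w / 2.0
    cy = top + h / 2.0
    # Focal lengths in pixels under the pinhole model.
    fx = (img_w / 2.0) / math.tan(math.radians(fov_deg[0]) / 2.0)
    fy = (img_h / 2.0) / math.tan(math.radians(fov_deg[1]) / 2.0)
    # Positive pan is to the right of the optical axis; positive tilt is above it.
    pan = math.degrees(math.atan((cx - img_w / 2.0) / fx))
    tilt = math.degrees(math.atan((img_h / 2.0 - cy) / fy))
    return pan, tilt
```

Under these assumptions, a face centered in the image yields angles of zero, and a face at the edge of the image yields half of the corresponding field of view. A real apparatus would also have to account for any offset between the camera and the microphone.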
In this embodiment, the face recognition unit 120 may be connected to a cloud computing server through a network, so as to perform face recognition on the image acquired by the image acquisition unit 110 in cooperation with the computing capability of the cloud computing server; the face recognition of the image can also be realized only by a software functional module and/or a hardware module included in the face recognition unit 120 for performing face recognition.
In the embodiment of the present invention, the voice acquisition unit 130 is used for acquiring a voice signal. When the face recognition unit 120 recognizes that a face exists in the image acquired by the image acquisition unit 110, the voice acquisition unit 130 may acquire the voice signal from the position of the face, so as to realize directional voice pickup and reduce external noise interference.
In this embodiment, the voice acquisition unit 130 includes at least one voice sampling microphone, through which it acquires the voice signal from the position of the face of the user of the voice pickup apparatus 100. In the present embodiment, the voice sampling microphone is preferably a microphone having high sensitivity and high directivity, and may be, but is not limited to, an electrodynamic microphone, a condenser microphone, an electromagnetic microphone, a piezoelectric microphone, a semiconductor microphone, and the like.
In this embodiment of the present invention, the steering adjustment unit 140 is configured to adjust the orientation of the voice acquisition unit 130, so that the voice acquisition unit 130 can be aimed at the direction of the face recognized by the face recognition unit 120 in the image acquired by the image acquisition unit 110 and acquire the voice signal from that direction, thereby implementing directional voice pickup for the user corresponding to the face. In this embodiment, the steering adjustment unit 140 is connected to the voice acquisition unit 130 to adjust the orientation of the voice acquisition unit 130.
In this embodiment, the steering adjustment unit 140 includes a steering component for driving the voice sampling microphone in the voice acquisition unit 130 to perform steering adjustment, and a steering control component for controlling the steering component. The steering component is directly connected with the voice sampling microphone in the voice acquisition unit 130, so that the voice sampling microphone can be adjusted upward, downward, leftward and rightward. The steering control component is electrically connected with the steering component, so as to control the steering component, according to the position information of the face recognized by the face recognition unit 120, to drive the voice sampling microphone in the voice acquisition unit 130 to acquire the voice signal from the position of the face.
In the embodiment of the present invention, the processing control unit 150 is configured to process the signal and control other unit modules in the voice pickup apparatus 100 according to a processing result. Specifically, in this embodiment, the processing control unit 150 is electrically connected to the image acquisition unit 110, the face recognition unit 120, the voice acquisition unit 130, and the steering adjustment unit 140, so that when the face recognition unit 120 recognizes that a face exists in an image acquired by the image acquisition unit 110, the steering adjustment unit 140 is controlled to operate according to the direction information corresponding to the face, so that the voice acquisition unit 130 aligns with the direction of the face in the image, and controls the voice acquisition unit 130 to acquire a voice signal of the direction of the face, thereby realizing directional voice pickup of a user corresponding to the face.
In this embodiment, the processing control unit 150 includes a memory, in which the processing control unit 150 may store the face information of specific users of the voice pickup apparatus 100 and the corresponding control authorities, so that when the face recognition unit 120 recognizes a plurality of faces, the direction of the face with the highest control authority among the recognized faces is selected as the direction toward which the voice acquisition unit 130 should be aimed. Specifically, when the face recognition unit 120 recognizes a plurality of faces, the processing control unit 150 matches the face information of each recognized face against the face information of the specific users stored in the memory. When a face is matched successfully, its control authority is looked up in the memory through the stored correspondence between the face information of the specific user and the control authority; when a face is not matched successfully, its control authority defaults to the lowest control authority; when the control authority of every recognized face is the lowest control authority, the processing control unit 150 randomly selects one of the faces as the face with the highest control authority. Once the processing control unit 150 has obtained the position of the face with the highest control authority, it controls the steering adjustment unit 140 to drive the voice acquisition unit 130 to face that position, so as to acquire the voice signal from that position and realize the corresponding directional voice pickup.
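The selection rule described above can be sketched as follows. This is a minimal illustration only: the numeric encoding of control authority (a larger value meaning a higher authority), the name select_target_face, and the tie-breaking choice for equal non-lowest authorities (the first such face wins) are assumptions that this disclosure does not specify.

```python
import random

LOWEST_AUTHORITY = 0  # assumed encoding: larger value = higher control authority

def select_target_face(matches, stored_authority, rng=random):
    """Return the index of the face the voice acquisition unit should be aimed at.

    matches          -- matches[i] is the stored user ID matched to recognized
                        face i, or None if face i matched no stored user
    stored_authority -- dict mapping stored user ID -> control authority
    rng              -- source of randomness (the random module by default)
    """
    if not matches:
        return None
    # Faces that match no stored user default to the lowest control authority.
    levels = [stored_authority.get(user, LOWEST_AUTHORITY) for user in matches]
    best = max(levels)
    if best == LOWEST_AUTHORITY:
        # Every recognized face has the lowest authority: choose one at random.
        return rng.randrange(len(matches))
    # Otherwise choose the (first) face with the highest authority.
    return levels.index(best)
```

For example, with stored authorities {"alice": 3, "bob": 1} and matches ["bob", None, "alice"], the function returns index 2, the face matched to the user with the highest control authority.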
In this embodiment, the memory may be, but is not limited to, a random access memory, a read only memory, a programmable read only memory, an erasable read only memory, an electrically erasable read only memory, and the like.
In this embodiment, when the face recognition unit 120 recognizes that a face exists in the image acquired by the image acquisition unit 110, the processing control unit 150 processes the face information of the recognized face to obtain an orientation corresponding to the mouth in the face, and controls the steering adjustment unit 140 to drive the voice acquisition unit 130 to align with the orientation of the mouth of the face according to the orientation corresponding to the mouth, so that the voice acquisition unit 130 acquires the voice signal in the orientation corresponding to the mouth.
In this embodiment, the processing control unit 150 may enter the face information of a specific user through the camera in the image acquisition unit 110, and may enter the control authority of the specific user through an external input device. The processing control unit 150 may also enter the face information and the corresponding control authority of a specific user through a network. The specific entry method may be set according to requirements.
Second embodiment:
Fig. 2 is a block diagram of a voice pickup apparatus 100 according to a second embodiment of the present invention. In the embodiment of the present invention, the structure, the operation principle and the technical effects of the voice pickup apparatus 100 provided by the second embodiment are similar to those of the voice pickup apparatus 100 provided by the first embodiment, except that the voice pickup apparatus 100 provided by the second embodiment may further include a noise acquisition unit 160, and the processing control unit 150 further includes an audio processing subunit 151 for processing audio signals.
In the embodiment of the present invention, the noise acquisition unit 160 is configured to acquire noise signals in the surrounding environment of the voice pickup apparatus 100. Specifically, the processing control unit 150 is electrically connected to the noise acquisition unit 160, so that when the voice acquisition unit 130 acquires the voice signal from the direction corresponding to the face recognized by the face recognition unit 120, the processing control unit 150 controls the noise acquisition unit 160 to acquire the noise signal in the surrounding environment of the voice pickup apparatus 100, and performs denoising processing on the voice signal acquired by the voice acquisition unit 130 according to the noise signal acquired by the noise acquisition unit 160, so as to obtain denoised voice information.
In this embodiment, the noise acquisition unit 160 includes at least one noise sampling microphone, through which it acquires the noise signal. The at least one voice sampling microphone in the voice acquisition unit 130 and the at least one noise sampling microphone in the noise acquisition unit 160 form a microphone array for acquiring airborne audio signals. The audio signals include the voice signal from the direction of the face and the noise signal in the surrounding environment of the voice pickup apparatus 100. In one implementation of this embodiment, the number of noise sampling microphones is preferably even, and the noise sampling microphones are disposed on both sides of the voice sampling microphone.
In this embodiment, the orientation of the noise sampling microphone may be fixed. It may point directly in front of the voice pickup apparatus 100, or be deflected by an angle toward the outside of the voice pickup apparatus 100; the specific orientation may be set by the user or the manufacturer of the voice pickup apparatus 100 according to different needs. The noise sampling microphone may be, but is not limited to, an electrodynamic microphone, a condenser microphone, an electromagnetic microphone, a piezoelectric microphone, a semiconductor microphone, and the like.
In this embodiment of the present invention, the processing control unit 150 performs denoising processing on the voice signal acquired by the voice acquisition unit 130 through the audio processing subunit 151. Specifically, in one implementation of this embodiment, after obtaining the voice signal acquired by the voice acquisition unit 130 and the noise signal acquired by the noise acquisition unit 160, the audio processing subunit 151 amplifies the two signals separately, performs phase inversion processing on the amplified noise signal, and mixes and superposes the phase-inverted noise signal with the amplified voice signal to eliminate the noise signal in the voice signal, thereby obtaining the denoised voice information. In another implementation of this embodiment, the audio processing subunit 151 may instead amplify the noise signal acquired by the noise acquisition unit 160 and the voice signal acquired by the voice acquisition unit 130, and then filter the amplified voice signal according to the amplified noise signal to obtain the denoised voice information.
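The first implementation above (amplify, invert the phase of the noise, then mix) can be sketched on sampled signals as follows. This is a minimal illustration under idealized assumptions: both signals are sampled synchronously, and the noise component in the voice channel is a scaled copy of the signal in the noise channel. The function name cancel_noise and the gain parameters are hypothetical, and the disclosure does not state how the gains are chosen.

```python
import math

def cancel_noise(voice, noise, voice_gain=1.0, noise_gain=1.0):
    """Amplify both signals, invert the noise, and mix them sample by sample."""
    if len(voice) != len(noise):
        raise ValueError("signals must have the same number of samples")
    amplified_voice = [voice_gain * v for v in voice]
    # Phase inversion: negating a signal shifts every frequency component by 180 degrees.
    inverted_noise = [-noise_gain * n for n in noise]
    # Mixing and superposition is the sample-wise sum of the two signals.
    return [v + n for v, n in zip(amplified_voice, inverted_noise)]

# Synthetic example: a 440 Hz "voice" tone contaminated by 50 Hz hum.
rate = 8000
speech = [math.sin(2 * math.pi * 440 * t / rate) for t in range(80)]
hum = [math.sin(2 * math.pi * 50 * t / rate) for t in range(80)]
voice_mic = [s + 0.5 * h for s, h in zip(speech, hum)]  # voice channel: speech plus half-amplitude hum
cleaned = cancel_noise(voice_mic, hum, noise_gain=0.5)   # noise channel: hum only
```

In this idealized case the hum cancels almost exactly. In practice the cancellation is only as good as the match in amplitude and timing between the noise reaching the two kinds of microphones.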
Third embodiment:
Fig. 3 is a block diagram of a voice pickup apparatus 100 according to a third embodiment of the present invention. In the embodiment of the present invention, the structure, the operation principle, and the technical effects of the voice pickup apparatus 100 provided by the third embodiment are similar to those of the voice pickup apparatus 100 provided by the second embodiment, except that the voice pickup apparatus 100 provided by the third embodiment may further include a voice recognition unit 180.
In the embodiment of the present invention, the voice recognition unit 180 is used for performing voice recognition on a voice signal. Specifically, the voice recognition unit 180 is electrically connected to the processing control unit 150 and performs voice recognition on the denoised voice information obtained by the processing control unit 150 to obtain a corresponding control instruction. The control instruction corresponds to the denoised voice information and is used to control the electronic device that includes the voice pickup apparatus 100.
The fourth embodiment:
Fig. 4 is a block diagram of a voice pickup apparatus 100 according to a fourth embodiment of the present invention. The structure, operating principle, and technical effects of the voice pickup apparatus 100 of the fourth embodiment are similar to those of the voice pickup apparatus 100 of the second embodiment, except that the voice pickup apparatus 100 of the fourth embodiment may further include a network communication unit 170.
In the embodiment of the present invention, the network communication unit 170 is configured to perform data interaction. The network communication unit 170 is electrically connected to the processing control unit 150, and the voice pickup apparatus 100 communicates with a server through the network communication unit 170. Through this connection, the apparatus sends the denoised voice information obtained by the processing control unit 150 to the server for voice recognition, or receives a control instruction that the server obtains by performing voice recognition on the denoised voice information. The control instruction corresponds to the denoised voice information and is used to control the electronic device that includes the voice pickup apparatus 100.
The fifth embodiment:
Fig. 5 is a block diagram of a voice pickup apparatus 100 according to a fifth embodiment of the present invention. The structure, operating principle, and technical effects of the voice pickup apparatus 100 of the fifth embodiment are similar to those of the voice pickup apparatus 100 of the fourth embodiment, except that the voice pickup apparatus 100 of the fifth embodiment may further include a rotation control unit 190.
In the embodiment of the present invention, the rotation control unit 190 is connected to the image acquisition unit 110 to control the image acquisition direction of the image acquisition unit 110. Specifically, the rotation control unit 190 includes a rotation assembly, which drives the camera in the image acquisition unit 110 to change its image acquisition direction, and a rotation control assembly, which controls the rotation assembly. The rotation assembly is directly connected to the camera in the image acquisition unit 110, and the rotation control assembly is electrically connected to the rotation assembly so as to rotate the camera according to a preset rotation strategy, thereby acquiring images of the environment in different directions.
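The patent does not specify what the preset rotation strategy is. As one possible illustration, the sketch below generates a simple back-and-forth sweep of pan angles that a rotation control assembly could step through; the function name `sweep_angles`, the integer-degree units, and the sweep pattern itself are assumptions made for this example.

```python
def sweep_angles(min_deg, max_deg, step_deg):
    """Return one back-and-forth sweep of camera pan angles, in degrees.

    Hypothetical strategy: step from min_deg up to max_deg, then return
    to min_deg along the same positions.
    """
    if step_deg <= 0:
        raise ValueError("step_deg must be positive")
    if min_deg > max_deg:
        raise ValueError("min_deg must not exceed max_deg")
    forward = list(range(min_deg, max_deg + 1, step_deg))
    # The return pass starts one position before the turning point so that
    # the turning point is not visited twice in a row.
    backward = forward[-2::-1]
    return forward + backward


# Example: sweep a 120-degree field in 30-degree steps.
positions = sweep_angles(-60, 60, 30)
```

At each position the image acquisition unit would capture a frame and pass it to the face recognition unit; the sweep could stop as soon as a face is found.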
An embodiment of the present invention also provides a multimedia device. The multimedia device includes the voice pickup apparatus 100 provided in any one of the first, second, third, fourth, and fifth embodiments. The multimedia device performs voice recognition on the voice signal that the voice pickup apparatus 100 acquires from the direction of the user's face, obtains a control instruction matching the voice signal, and executes the corresponding operation according to the control instruction. In this embodiment, the multimedia device may be, but is not limited to, a smart speaker, a smart television, a smart washing machine, a smart refrigerator, a smart robot, or the like.
In summary, the voice pickup apparatus and the multimedia device provided in the preferred embodiments of the present invention can automatically turn toward the position of the user's face and perform directional pickup of the voice signal uttered by the user, thereby reducing external noise interference and improving both the voice recognition accuracy and the voice pickup distance. Specifically, the voice pickup apparatus acquires an image through the image acquisition unit; performs face recognition on the acquired image through the face recognition unit, which is electrically connected to the image acquisition unit; collects voice signals through the voice acquisition unit; and adjusts the orientation of the voice acquisition unit through the steering adjustment unit, which is connected to the voice acquisition unit. The processing control unit is electrically connected to the image acquisition unit, the face recognition unit, the voice acquisition unit, and the steering adjustment unit. When the face recognition unit recognizes a face in the image acquired by the image acquisition unit, the processing control unit controls the steering adjustment unit so that the voice acquisition unit is aimed at the position of the face in the image, and controls the voice acquisition unit to collect the voice signal at that position, which achieves the directional pickup described above.
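The step from "face position in the image" to "orientation of the voice acquisition unit" is not spelled out in the text. A minimal sketch of one way to do it is shown below, assuming a pinhole camera with a known horizontal field of view and a microphone mounted on the same axis as the camera; the function name `pan_angle_for_face` and all of its parameters are hypothetical.

```python
import math


def pan_angle_for_face(face_center_x, image_width, horizontal_fov_deg):
    """Return the pan angle (degrees) that points at a face in the image.

    Assumptions (not from the patent): pinhole camera model, known
    horizontal field of view, microphone co-located with the camera.
    Negative angles are left of the optical axis, positive angles right.
    """
    if not 0.0 < horizontal_fov_deg < 180.0:
        raise ValueError("horizontal_fov_deg must be between 0 and 180")
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    # Horizontal offset of the face from the image centre, in pixels.
    offset_px = face_center_x - half_width
    return math.degrees(math.atan2(offset_px, focal_px))


# A face centred in a 640-pixel-wide frame needs no rotation.
centre_angle = pan_angle_for_face(320, 640, 90.0)
# A face at the right edge of a 90-degree field of view is about 45 degrees off-axis.
edge_angle = pan_angle_for_face(640, 640, 90.0)
```

The steering adjustment unit would then be commanded to rotate the voice sampling microphone by the returned angle. A vertical (tilt) angle could be computed the same way from the face's vertical position.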
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A voice pickup apparatus, characterized in that the apparatus comprises:
an image acquisition unit for acquiring an image;
a face recognition unit electrically connected to the image acquisition unit, for performing face recognition on the image acquired by the image acquisition unit;
a voice acquisition unit for acquiring voice signals;
a steering adjustment unit connected to the voice acquisition unit, for adjusting the orientation of the voice acquisition unit; and
a processing control unit electrically connected to the image acquisition unit, the face recognition unit, the voice acquisition unit, and the steering adjustment unit;
wherein, when the face recognition unit recognizes that a face exists in the image acquired by the image acquisition unit, the processing control unit controls the operation of the steering adjustment unit so that the voice acquisition unit is aligned with the position of the face in the image, and controls the voice acquisition unit to acquire the voice signal at the position of the face;
the processing control unit is specifically configured to, when the face recognition unit recognizes that a face exists in the image acquired by the image acquisition unit, process face information of the recognized face to obtain the orientation corresponding to the mouth in the face, and control the steering adjustment unit according to that orientation so that the voice acquisition unit is aligned with the orientation corresponding to the mouth to acquire a voice signal;
when the face recognition unit identifies a plurality of faces, the processing control unit obtains the control authority corresponding to each face and controls the steering adjustment unit to drive the voice acquisition unit to face the position of the face with the highest control authority, so as to acquire the voice signal at that position.
2. The apparatus of claim 1, further comprising a noise acquisition unit for acquiring noise signals in the environment surrounding the apparatus;
the processing control unit is electrically connected to the noise acquisition unit so as to denoise the voice signal acquired by the voice acquisition unit according to the noise signal acquired by the noise acquisition unit, and obtain denoised voice information.
3. The apparatus of claim 2, wherein the apparatus further comprises a network communication unit for data interaction;
the apparatus is in communication connection with a server through the network communication unit so as to send the denoised voice information obtained by the processing control unit to the server for voice recognition, or to receive a control instruction obtained by the server after performing voice recognition on the denoised voice information.
4. The apparatus according to claim 2, wherein the apparatus further comprises a voice recognition unit for performing voice recognition;
the voice recognition unit is electrically connected to the processing control unit to perform voice recognition on the denoised voice information obtained by the processing control unit and obtain a corresponding control instruction.
5. The apparatus of claim 2, wherein the voice acquisition unit comprises at least one voice sampling microphone, the noise acquisition unit comprises at least one noise sampling microphone, and at least one of the voice sampling microphones cooperates with at least one of the noise sampling microphones to form a microphone array for acquiring voice signals at the face position and noise signals in the environment surrounding the voice pickup apparatus.
6. The apparatus of claim 2, wherein the processing control unit comprises an audio processing subunit;
the audio processing subunit is used for amplifying the noise signal acquired by the noise acquisition unit and the voice signal acquired by the voice acquisition unit, performing phase inversion on the amplified noise signal, and mixing and superposing the phase-inverted noise signal with the amplified voice signal to eliminate the noise signal in the voice signal and obtain the denoised voice information.
7. The apparatus according to claim 1, further comprising a rotation control unit connected to the image acquisition unit for controlling an image acquisition direction of the image acquisition unit.
8. A multimedia device, characterized in that the multimedia device comprises the voice pickup apparatus of any one of claims 1 to 7, and the multimedia device performs voice recognition on the voice signal collected by the voice pickup apparatus, obtains a control instruction matched with the voice signal, and executes a corresponding operation according to the control instruction.
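The multi-face rule recited in claim 1 (aim at the face with the highest control authority) can be illustrated with a short sketch. The data model below is an assumption made for the example: the claims do not say how control authority is represented, so `DetectedFace`, its fields, and the integer authority levels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DetectedFace:
    """Hypothetical record for one recognized face."""
    user_id: str
    authority: int      # assumed convention: larger value = higher authority
    center_x: float     # horizontal position of the face in the image, pixels


def select_target_face(faces: Sequence[DetectedFace]) -> Optional[DetectedFace]:
    """Return the face with the highest control authority, or None if empty.

    When several faces share the highest authority, the first one in the
    sequence is returned (a tie-break chosen for this sketch only).
    """
    if not faces:
        return None
    return max(faces, key=lambda face: face.authority)


faces_in_frame = [
    DetectedFace(user_id="guest", authority=1, center_x=120.0),
    DetectedFace(user_id="owner", authority=3, center_x=480.0),
    DetectedFace(user_id="child", authority=2, center_x=300.0),
]
target = select_target_face(faces_in_frame)
```

Here `target` is the `owner` record, and its `center_x` would be the position toward which the steering adjustment unit turns the voice acquisition unit.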

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710423629.0A CN107123423B (en) 2017-06-07 2017-06-07 Voice pickup device and multimedia equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710423629.0A CN107123423B (en) 2017-06-07 2017-06-07 Voice pickup device and multimedia equipment

Publications (2)

Publication Number Publication Date
CN107123423A CN107123423A (en) 2017-09-01
CN107123423B true CN107123423B (en) 2021-05-18

Family

ID=59730052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710423629.0A Active CN107123423B (en) 2017-06-07 2017-06-07 Voice pickup device and multimedia equipment

Country Status (1)

Country Link
CN (1) CN107123423B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109696658B (en) 2017-10-23 2021-08-24 京东方科技集团股份有限公司 Acquisition device, sound acquisition method, sound source tracking system and sound source tracking method
CN107864430A (en) * 2017-11-03 2018-03-30 杭州聚声科技有限公司 A kind of sound wave direction propagation control system and its control method
CN108615534B (en) * 2018-04-04 2020-01-24 百度在线网络技术(北京)有限公司 Far-field voice noise reduction method and system, terminal and computer readable storage medium
CN108831462A (en) * 2018-06-26 2018-11-16 北京奇虎科技有限公司 Vehicle-mounted voice recognition methods and device
CN110767228B (en) * 2018-07-25 2022-06-03 杭州海康威视数字技术股份有限公司 Sound acquisition method, device, equipment and system
CN110767221A (en) * 2018-07-26 2020-02-07 珠海格力电器股份有限公司 Household appliance and method for determining control authority
CN109461443A (en) * 2018-09-28 2019-03-12 广州智伴人工智能科技有限公司 A kind of no key opening device
CN110210196B (en) * 2019-05-08 2023-01-06 北京地平线机器人技术研发有限公司 Identity authentication method and device
CN110186171B (en) * 2019-05-30 2021-09-10 广东美的制冷设备有限公司 Air conditioner, method of controlling the same, and computer-readable storage medium
CN110223686A (en) * 2019-05-31 2019-09-10 联想(北京)有限公司 Audio recognition method, speech recognition equipment and electronic equipment
CN111276142B (en) * 2020-01-20 2023-04-07 北京声智科技有限公司 Voice wake-up method and electronic equipment
CN112770029B (en) * 2020-12-30 2022-02-25 国家电网有限公司客户服务中心 Intelligent device

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005114576A1 (en) * 2004-05-21 2005-12-01 Asahi Kasei Kabushiki Kaisha Operation content judgment device
KR101199349B1 (en) * 2004-08-27 2012-11-09 엘지전자 주식회사 Mobile phone having image communication function
CN100524465C (en) * 2006-11-24 2009-08-05 北京中星微电子有限公司 A method and device for noise elimination
CN101101752B (en) * 2007-07-19 2010-12-01 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
CN101833624B (en) * 2010-05-05 2014-12-10 中兴通讯股份有限公司 Information machine and access control method thereof
US8395653B2 (en) * 2010-05-18 2013-03-12 Polycom, Inc. Videoconferencing endpoint having multiple voice-tracking cameras
CN102196333B (en) * 2010-12-16 2013-12-25 宁波三维技术有限公司 Long-distance sound pickup device for video positioning
CN103167149A (en) * 2012-09-20 2013-06-19 深圳市金立通信设备有限公司 System and method of safety of mobile phone based on face recognition
CN102932212A (en) * 2012-10-12 2013-02-13 华南理工大学 Intelligent household control system based on multichannel interaction manner
US9414153B2 (en) * 2014-05-08 2016-08-09 Panasonic Intellectual Property Management Co., Ltd. Directivity control apparatus, directivity control method, storage medium and directivity control system
CN104202694B (en) * 2014-07-31 2018-03-13 广东美的制冷设备有限公司 The orientation method and system of voice pick device
CN104361638A (en) * 2014-11-13 2015-02-18 安徽省新方尊铸造科技有限公司 Highway tolling system based on facial recognition technology
CN105263052A (en) * 2015-10-13 2016-01-20 微鲸科技有限公司 Audio-video push method and system based on face identification
CN105898635B (en) * 2016-04-26 2019-02-12 宁波桑德纳电子科技有限公司 A kind of sound pick up equipment that outdoor uses at a distance
CN105915798A (en) * 2016-06-02 2016-08-31 北京小米移动软件有限公司 Camera control method in video conference and control device thereof
CN106346487B (en) * 2016-08-25 2018-09-21 威仔软件科技(苏州)有限公司 Interactive VR sand table show robot

Also Published As

Publication number Publication date
CN107123423A (en) 2017-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant