CN107123423A - Voice pick device and multimedia equipment - Google Patents

Voice pickup device and multimedia equipment

Info

Publication number
CN107123423A
Authority
CN
China
Prior art keywords
voice
unit
face
collecting unit
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710423629.0A
Other languages
Chinese (zh)
Other versions
CN107123423B (en
Inventor
于豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ismartv Network Technologies Co ltd
Original Assignee
Whaley Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Whaley Technology Co Ltd filed Critical Whaley Technology Co Ltd
Priority to CN201710423629.0A priority Critical patent/CN107123423B/en
Publication of CN107123423A publication Critical patent/CN107123423A/en
Application granted granted Critical
Publication of CN107123423B publication Critical patent/CN107123423B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Studio Devices (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention provides a voice pickup device and multimedia equipment. The device includes: an image acquisition unit for capturing images; a face recognition unit electrically connected to the image acquisition unit for performing face recognition on the captured images; a voice collection unit for collecting voice signals; a steering adjustment unit connected to the voice collection unit for adjusting the direction in which the voice collection unit points; and a processing control unit electrically connected to the image acquisition unit, the face recognition unit, the voice collection unit, and the steering adjustment unit. When the face recognition unit recognizes a face in a captured image, the processing control unit controls the steering adjustment unit so that the voice collection unit is aimed at the direction of the face, and controls the voice collection unit to collect the voice signal from that direction. The device can thus steer automatically according to the position of the user's face and perform directional pickup of the voice signal uttered by the user, reducing interference from ambient noise.

Description

Voice pickup device and multimedia equipment
Technical field
The present invention relates to the technical field of audio pickup, and in particular to a voice pickup device and multimedia equipment.
Background
As audio pickup technology continues to develop, it is being applied ever more widely. The technology itself, however, still has many technical problems to be solved. Take voice pickup as an example: voice pickup devices currently on the market can pick up the voice signal uttered by a user, but in doing so they usually also pick up a large amount of ambient noise. The picked-up voice signal is therefore mixed with a large amount of ambient noise, the resulting speech recognition accuracy is low, and the voice pickup distance is short.
Summary of the invention
To overcome the above deficiencies of the prior art, an object of the present invention is to provide a voice pickup device and multimedia equipment that can steer automatically according to the position of the user's face and perform directional pickup of the voice signal uttered by the user, reducing ambient noise interference and improving the corresponding speech recognition accuracy and voice pickup distance.
With regard to the voice pickup device, preferred embodiments of the present invention provide a voice pickup device. The device includes:
an image acquisition unit for capturing images;
a face recognition unit electrically connected to the image acquisition unit, for performing face recognition on the images captured by the image acquisition unit;
a voice collection unit for collecting voice signals;
a steering adjustment unit connected to the voice collection unit, for adjusting the direction in which the voice collection unit points; and
a processing control unit electrically connected to the image acquisition unit, the face recognition unit, the voice collection unit, and the steering adjustment unit.
When the face recognition unit recognizes a face in an image captured by the image acquisition unit, the processing control unit controls the operation of the steering adjustment unit so that the voice collection unit is aimed at the direction of the face in the image, and controls the voice collection unit to collect the voice signal from the direction of the face.
In a preferred embodiment of the present invention, the device further includes a noise collection unit for collecting noise signals in the environment around the device.
The processing control unit is electrically connected to the noise collection unit, so as to denoise the voice signal collected by the voice collection unit according to the noise signal collected by the noise collection unit and obtain denoised voice information.
In a preferred embodiment of the present invention, the device further includes a network communication unit for data exchange.
The network communication unit is electrically connected to the processing control unit. Through the network communication unit the device is communicatively connected to a server, so as to send the denoised voice information obtained by the processing control unit to the server for speech recognition, or to receive from the server the control instruction obtained by performing speech recognition on the denoised voice information.
In a preferred embodiment of the present invention, the device further includes a speech recognition unit for performing speech recognition.
The speech recognition unit is electrically connected to the processing control unit, so as to perform speech recognition on the denoised voice information obtained by the processing control unit and obtain the corresponding control instruction.
In a preferred embodiment of the present invention, when the face recognition unit recognizes multiple faces, the processing control unit obtains the control authority corresponding to each face and controls the steering adjustment unit to drive the voice collection unit to point toward the face with the highest control authority, so as to collect the voice signal from the direction of that face.
In a preferred embodiment of the present invention, the voice collection unit includes at least one voice sampling microphone and the noise collection unit includes at least one noise sampling microphone. The at least one voice sampling microphone and the at least one noise sampling microphone together form a microphone array, which collects, respectively, the voice signal from the direction of the face and the noise signal in the environment around the voice pickup device.
In a preferred embodiment of the present invention, the processing control unit includes an audio processing subunit.
The audio processing subunit amplifies the noise signal collected by the noise collection unit and the voice signal collected by the voice collection unit, inverts the phase of the amplified noise signal, and mixes (superimposes) it with the amplified voice signal, so as to cancel the noise signal contained in the voice signal and obtain denoised voice information.
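The amplify, invert, and mix sequence can be sketched in a few lines. This is a minimal illustration, not the circuit of the invention: it assumes both signals are already sampled at the same rate and time-aligned, that the noise microphone hears the same noise that leaks into the voice microphone, and that a single hypothetical `gain` stands in for the amplification stage. The function name `cancel_by_inversion` is an illustrative choice.

```python
from typing import List, Sequence


def cancel_by_inversion(voice: Sequence[float],
                        noise: Sequence[float],
                        gain: float = 1.0) -> List[float]:
    """Amplify both signals, invert the noise, and mix sample by sample."""
    if len(voice) != len(noise):
        raise ValueError("voice and noise must have the same number of samples")
    mixed = []
    for v, n in zip(voice, noise):
        amplified_voice = gain * v
        inverted_noise = -(gain * n)   # phase inversion of the amplified noise
        mixed.append(amplified_voice + inverted_noise)  # mixing (superposition)
    return mixed


# Toy example: the voice microphone records speech plus noise.
speech = [1.0, 2.0, 3.0]
ambient = [5.0, -1.0, 0.0]
recorded = [s + a for s, a in zip(speech, ambient)]
denoised = cancel_by_inversion(recorded, ambient)
```

In practice the two microphones never hear identical noise, so cancellation of this kind is only partial; the toy example cancels exactly because the same `ambient` list is used on both sides.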
In a preferred embodiment of the present invention, the above sliding part includes a slider accommodated in the chute, and the engaging component is fixedly connected to the slider, so that the engaging component can slide relative to the light-bar carrier plate in a direction away from the support column.
In a preferred embodiment of the present invention, the device further includes a rotation control unit connected to the image acquisition unit, for controlling the image capture direction of the image acquisition unit.
In a preferred embodiment of the present invention, when the face recognition unit recognizes a face in an image captured by the image acquisition unit, the processing control unit processes the facial information of the recognized face to obtain the direction of the mouth in the face, and controls the steering adjustment unit according to that direction so that the voice collection unit is aimed at the mouth and collects the voice signal from there.
With regard to the multimedia equipment, preferred embodiments of the present invention provide multimedia equipment that includes the above voice pickup device. The multimedia equipment performs speech recognition on the voice signal collected by the voice pickup device, obtains the control instruction that matches the voice signal, and performs the corresponding operation according to the control instruction.
Compared with the prior art, the voice pickup device and multimedia equipment provided by preferred embodiments of the present invention have the following beneficial effects: they can steer automatically according to the position of the user's face and perform directional pickup of the voice signal uttered by the user, reducing ambient noise interference and improving the corresponding speech recognition accuracy and voice pickup distance. Specifically, the voice pickup device captures images with the image acquisition unit; performs face recognition on the captured images with the face recognition unit, which is electrically connected to the image acquisition unit; collects voice signals with the voice collection unit; and adjusts the direction of the voice collection unit with the steering adjustment unit connected to it. The processing control unit is electrically connected to the image acquisition unit, the face recognition unit, the voice collection unit, and the steering adjustment unit. When the face recognition unit recognizes a face in an image captured by the image acquisition unit, the processing control unit controls the operation of the steering adjustment unit so that the voice collection unit is aimed at the direction of the face in the image, and controls the voice collection unit to collect the voice signal from that direction. This achieves directional pickup of the voice signal uttered by the user, reduces ambient noise interference, and improves the corresponding speech recognition accuracy and voice pickup distance.
To make the above objects, features, and advantages of the present invention more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting the scope of the claims; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of the voice pickup device provided by the first embodiment of the present invention.
Fig. 2 is a block diagram of the voice pickup device provided by the second embodiment of the present invention.
Fig. 3 is a block diagram of the voice pickup device provided by the third embodiment of the present invention.
Fig. 4 is a block diagram of the voice pickup device provided by the fourth embodiment of the present invention.
Fig. 5 is a block diagram of the voice pickup device provided by the fifth embodiment of the present invention.
Reference numerals: 100 - voice pickup device; 110 - image acquisition unit; 120 - face recognition unit; 130 - voice collection unit; 140 - steering adjustment unit; 150 - processing control unit; 160 - noise collection unit; 170 - network communication unit; 180 - speech recognition unit; 190 - rotation control unit; 151 - audio processing subunit.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be defined or explained again in subsequent drawings.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "arranged", "mounted", "coupled", and "connected" are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediary, or an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances. In addition, the terms "first", "second", "third", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
In the description of the present invention, it should also be noted that orientations or positional relationships indicated by terms such as "up", "down", "left", and "right" are based on the orientations or positional relationships shown in the drawings, or on those in which the product of the invention is usually placed in use. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be understood as limiting the present invention.
How to provide a voice pickup device and multimedia equipment that can steer automatically according to the position of the user's face and perform directional pickup of the voice signal uttered by the user, reducing ambient noise interference and improving the corresponding speech recognition accuracy and voice pickup distance, is a technical problem that those skilled in the art urgently need to solve.
Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where no conflict arises, the following embodiments and the features in them may be combined with one another.
First embodiment:
Refer to Fig. 1, a block diagram of the voice pickup device 100 provided by the first embodiment of the present invention. In this embodiment of the invention, the voice pickup device 100 performs directional pickup of the voice signal uttered by a user, improving the corresponding speech recognition accuracy. The voice pickup device 100 includes an image acquisition unit 110, a face recognition unit 120, a voice collection unit 130, a steering adjustment unit 140, and a processing control unit 150.
In this embodiment of the invention, the image acquisition unit 110 is used to capture images. The image acquisition unit 110 includes a camera, through which it captures images of the environment around the voice pickup device 100 in order to determine the specific position of the user of the voice pickup device 100 and thereby enable directional voice pickup.
In this embodiment, the camera may capture images of the environment in a fixed direction at regular intervals, or may capture images of the environment in different directions as required. The specific behavior can be configured by the user or manufacturer of the voice pickup device 100 according to their needs. In this embodiment, the camera may be, but is not limited to, a digital camera, an analog camera, or the like.
In this embodiment of the invention, the face recognition unit 120 performs face recognition on the images captured by the image acquisition unit 110, in order to determine where the user of the voice pickup device 100 is located in the image. Specifically, in this embodiment, the face recognition unit 120 is electrically connected to the image acquisition unit 110 so as to perform face recognition on the images it captures. When the face recognition unit 120 recognizes a face in an image captured by the image acquisition unit 110, it analyzes the image to extract the facial information corresponding to the face and, from the facial information and the position of the face in the image, obtains the direction in which the corresponding user is located in the environment around the voice pickup device 100 and the direction corresponding to the face.
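The text does not say how a face position in the image is turned into a direction. One plausible approach, sketched below, uses a pinhole camera model: given the camera's field of view, the pixel offset of the face center from the image center maps to an azimuth and elevation angle. Everything here is an assumption for illustration: the function name `pixel_to_bearing`, the field-of-view parameters, and the premise that the camera's optical axis coincides with the microphone's neutral direction.

```python
import math
from typing import Tuple


def pixel_to_bearing(cx: float, cy: float,
                     width: int, height: int,
                     hfov_deg: float, vfov_deg: float) -> Tuple[float, float]:
    """Map a face-center pixel to (azimuth, elevation) in degrees.

    Angles are relative to the camera's optical axis: positive azimuth is to
    the right of the image center, positive elevation is above it.
    """
    # Focal lengths in pixels, derived from the assumed fields of view.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    azimuth = math.degrees(math.atan2(cx - width / 2, fx))
    # Image rows grow downward, so flip the sign for elevation.
    elevation = math.degrees(math.atan2(height / 2 - cy, fy))
    return azimuth, elevation


# A face at the right edge of a 640x480 frame with a 60 degree horizontal
# field of view lies half the field of view to the right of center.
az, el = pixel_to_bearing(640, 240, 640, 480, hfov_deg=60.0, vfov_deg=45.0)
```

A real device would also need to account for any offset between the camera and the microphone pivot, which this sketch ignores.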
In this embodiment, the face recognition unit 120 may be connected to a cloud computing server over a network, so as to use the computing capability of the cloud computing server to perform face recognition on the images captured by the image acquisition unit 110. Alternatively, face recognition may be performed solely by the software function modules and/or hardware modules for face recognition included in the face recognition unit 120 itself.
In this embodiment of the invention, the voice collection unit 130 is used to collect voice signals. When the face recognition unit 120 recognizes a face in an image captured by the image acquisition unit 110, the voice collection unit 130 can collect the voice signal from the direction of the face, achieving directional voice pickup and reducing ambient noise interference.
In this embodiment, the voice collection unit 130 includes at least one voice sampling microphone, through which it collects the voice signal from the direction of the face of the user of the voice pickup device 100. The voice sampling microphone is preferably a highly sensitive, highly directional microphone, and may be, but is not limited to, a moving-coil microphone, a condenser microphone, an electromagnetic microphone, a piezoelectric microphone, a semiconductor microphone, or the like.
In this embodiment of the invention, the steering adjustment unit 140 adjusts the direction of the voice collection unit 130 so that the voice collection unit 130 can be aimed at the direction of the face that the face recognition unit 120 has recognized in the image captured by the image acquisition unit 110. The voice collection unit 130 then collects the voice signal from the direction of the face, achieving directional voice pickup for the corresponding user. In this embodiment, the steering adjustment unit 140 is connected to the voice collection unit 130 so as to adjust its direction.
In this embodiment, the steering adjustment unit 140 includes a steering assembly that drives the voice sampling microphone in the voice collection unit 130 to change direction, and a steering control assembly that controls the steering assembly. The steering assembly is directly connected to the voice sampling microphone in the voice collection unit 130, so that the microphone can be adjusted up, down, left, and right. The steering control assembly is electrically connected to the steering assembly and, according to the direction information of the face recognized by the face recognition unit 120, controls the steering assembly to drive the voice sampling microphone in the voice collection unit 130 to collect the voice signal from the direction of the face.
In this embodiment of the invention, the processing control unit 150 processes signals and controls the other unit modules in the voice pickup device 100 according to the processing results. Specifically, in this embodiment, the processing control unit 150 is electrically connected to the image acquisition unit 110, the face recognition unit 120, the voice collection unit 130, and the steering adjustment unit 140. When the face recognition unit 120 recognizes a face in an image captured by the image acquisition unit 110, the processing control unit 150 controls the operation of the steering adjustment unit 140 according to the direction information of the face so that the voice collection unit 130 is aimed at the direction of the face in the image, and controls the voice collection unit 130 to collect the voice signal from that direction, achieving directional voice pickup for the corresponding user.
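The control sequence of the processing control unit (face found, steer, then capture) can be summarized as a single step function. The sketch below is only one way to express that flow: `FaceBearing`, `control_step`, and the `steer` and `capture` callbacks are hypothetical stand-ins for the face recognition result, the steering adjustment unit, and the voice collection unit, none of which the text specifies at the interface level.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class FaceBearing:
    """Direction of a recognized face relative to the device (assumed form)."""
    azimuth_deg: float
    elevation_deg: float


def control_step(faces: Sequence[FaceBearing],
                 steer: Callable[[float, float], None],
                 capture: Callable[[], List[float]]) -> Optional[List[float]]:
    """One pass of the control flow: steer toward a face, then capture audio.

    Returns None when no face was recognized, so nothing is captured.
    """
    if not faces:
        return None
    target = faces[0]  # single-face case; multi-face selection is separate
    steer(target.azimuth_deg, target.elevation_deg)
    return capture()


# Stub hardware: record steering commands and return fixed samples.
steer_log = []
samples = control_step(
    [FaceBearing(12.0, -3.0)],
    steer=lambda az, el: steer_log.append((az, el)),
    capture=lambda: [0.0, 0.1],
)
```

Passing the hardware in as callbacks keeps the decision logic testable without a camera, motor, or microphone attached.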
In this embodiment, the processing control unit 150 includes a memory, in which it can store the facial information and corresponding control authority of specific users of the voice pickup device 100. When the face recognition unit 120 recognizes multiple faces, the direction of the recognized face with the highest control authority is chosen as the direction toward which the voice collection unit 130 should point. Specifically, when the face recognition unit 120 recognizes multiple faces, the processing control unit 150 matches the facial information of each recognized face against the facial information of the specific users stored in the memory. When a match succeeds, the control authority of the matched face is looked up in the memory through the association between the specific user's facial information and control authority. When a match fails, the control authority of the unmatched face defaults to the lowest control authority. When every recognized face has the lowest control authority, the processing control unit 150 randomly selects one of the faces to be treated as the face with the highest control authority. Once the processing control unit 150 has obtained the direction of the face with the highest control authority, it controls the steering adjustment unit 140 to drive the voice collection unit 130 to point toward that face, so as to collect the voice signal from its direction and achieve the corresponding directional voice pickup. In this embodiment, the memory may be, but is not limited to, a random access memory, a read-only memory, a programmable read-only memory, an erasable read-only memory, an electrically erasable read-only memory, or the like.
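The selection rule for multiple faces can be written compactly. In the sketch below, face matching is abstracted away: each recognized face is represented by the identifier of the stored user it matched, or `None` if matching failed. The names `pick_target_index` and `LOWEST_AUTHORITY`, the integer authority levels, and the choice to take the first face when several share the highest authority are assumptions; the text only specifies the default-to-lowest and random-pick rules.

```python
import random
from typing import Dict, Optional, Sequence

LOWEST_AUTHORITY = 0  # assumed numeric value for the lowest control authority


def pick_target_index(matched_ids: Sequence[Optional[str]],
                      authority_by_id: Dict[str, int],
                      rng: random.Random = random.Random()) -> int:
    """Return the index of the face the microphone should point toward."""
    if not matched_ids:
        raise ValueError("at least one recognized face is required")
    # Unmatched faces (None) and unknown ids default to the lowest authority.
    levels = [
        authority_by_id.get(face_id, LOWEST_AUTHORITY) if face_id is not None
        else LOWEST_AUTHORITY
        for face_id in matched_ids
    ]
    best = max(levels)
    if best == LOWEST_AUTHORITY:
        # Every face has the lowest authority: choose one at random.
        return rng.randrange(len(matched_ids))
    return levels.index(best)  # first face with the highest authority


stored = {"parent": 3, "child": 1}
chosen = pick_target_index(["child", None, "parent"], stored)
```

The returned index can then be used to look up that face's direction and pass it to the steering step.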
In this embodiment, when the face recognition unit 120 recognizes a face in an image captured by the image acquisition unit 110, the processing control unit 150 processes the facial information of the recognized face to obtain the direction of the mouth in the face, and controls the steering adjustment unit 140 according to that direction to drive the voice collection unit 130 to aim at the mouth, so that the voice collection unit 130 collects the voice signal from the direction of the mouth.
In this embodiment, the processing control unit 150 can enroll the facial information of specific users through the camera in the image acquisition unit 110, and can enroll their control authority through an external input device. The processing control unit 150 can also enroll the facial information and corresponding control authority of specific users over a network. The specific enrollment method can be configured as required.
Second embodiment:
Fig. 2 is refer to, is the block diagram for the voice pick device 100 that second embodiment of the invention is provided.In this hair In bright embodiment, shape design, operation principle and the technique effect of acquirement of the voice pick device 100 that second embodiment is provided Similar with the voice pick device 100 that first embodiment is provided, difference is, the voice pickup dress that second embodiment is provided Noise Acquisition unit 160 can also be included by putting 100, and the processing and control element (PCE) 150 also includes being used to carry out audio signal The audio frequency process subelement 151 of processing.
In embodiments of the present invention, the Noise Acquisition unit 160 is used for the surrounding enviroment of voice pick device 100 In noise signal be acquired.Specifically, the processing and control element (PCE) 150 is electrically connected with the Noise Acquisition unit 160, Entered with the voice signal in the face correspondence orientation recognized in 130 pairs of the voice collecting unit face identification unit 120 During row collection, the noise signal in 160 pairs of the Noise Acquisition unit surrounding enviroment of voice pick device 100 is controlled to carry out Collection, and the language that the noise signal collected according to the Noise Acquisition unit 160 is collected to the voice collecting unit 130 Message number is carried out except processing of making an uproar, and obtains removing the voice messaging after making an uproar.
In the present embodiment, the Noise Acquisition unit 160 includes at least one noise samples microphone, and the noise is adopted Collection unit 160 is acquired by noise samples microphone at least one described to the noise signal.The voice collecting list Made an uproar described at least one at least one described Noise Acquisition unit 160 of phonetic sampling microphone cooperation in member 130 Sound sampling microphone one microphone array of formation, for being acquired to the audio signal in air.Wherein, the audio signal Noise signal in voice signal and the surrounding enviroment of voice pick device 100 including orientation where face.In this implementation In a kind of embodiment of example, the number of the noise samples microphone is preferably even number, the noise samples microphone point The both sides of the phonetic sampling microphone are not arranged on.
In the present embodiment, the direction of the noise samples microphone can be fixed, the noise samples microphone Direction can be directed towards the front of the voice pick device 100 or towards the voice pick device 100 Outside deflects the direction of certain angle, specifically towards user of service that can be by the voice pick device 100 or factory Family carries out different settings as needed.Wherein, the noise samples microphone may be, but not limited to, moving coil microphone, Condenser microphone, electromagnetic microphone, piezoelectric microphone and semiconductor microphone etc..
In embodiments of the present invention, the processing control unit 150 denoises the voice signal collected by the voice collecting unit 130 through the audio processing subunit 151. Specifically, in one implementation of the present embodiment, upon obtaining the voice signal collected by the voice collecting unit 130 and the noise signal collected by the noise acquisition unit 160, the audio processing subunit 151 amplifies each of the two signals, inverts the phase of the amplified noise signal, and mixes (superimposes) the inverted noise signal with the amplified voice signal, thereby cancelling the noise component in the voice signal and obtaining the denoised voice information. In another implementation of the present embodiment, the audio processing subunit 151 may amplify the noise signal collected by the noise acquisition unit 160 and the voice signal collected by the voice collecting unit 130, and then filter the amplified voice signal according to the amplified noise signal to obtain the denoised voice information.
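The amplify, invert, and mix sequence of the first implementation can be illustrated with a minimal Python sketch. It is not the patented circuit: it assumes, for illustration only, that the noise microphone captures exactly the noise component present in the voice channel and that both channels are sample-aligned. The function names, gains, and synthetic test tones are all hypothetical.

```python
import math


def amplify(signal, gain):
    """Scale every sample by a constant gain."""
    return [gain * sample for sample in signal]


def invert_phase(signal):
    """Flip the sign of every sample (a 180-degree phase inversion)."""
    return [-sample for sample in signal]


def mix(first, second):
    """Superimpose two equal-length signals sample by sample."""
    if len(first) != len(second):
        raise ValueError("signals must have the same length")
    return [a + b for a, b in zip(first, second)]


def denoise_by_inversion(voice_channel, noise_channel,
                         voice_gain=1.0, noise_gain=1.0):
    """Amplify both channels, invert the noise, and mix it into the voice."""
    amplified_voice = amplify(voice_channel, voice_gain)
    amplified_noise = amplify(noise_channel, noise_gain)
    return mix(amplified_voice, invert_phase(amplified_noise))


def make_demo_signals(num_samples=800, sample_rate=8000):
    """Synthetic data (assumed): a 200 Hz 'voice' tone plus a 50 Hz hum."""
    times = [n / sample_rate for n in range(num_samples)]
    speech = [math.sin(2 * math.pi * 200 * t) for t in times]
    noise = [0.5 * math.sin(2 * math.pi * 50 * t) for t in times]
    voice_channel = mix(speech, noise)  # what the voice microphone hears
    return speech, noise, voice_channel


if __name__ == "__main__":
    speech, noise, voice_channel = make_demo_signals()
    cleaned = denoise_by_inversion(voice_channel, noise)
    residual = max(abs(c - s) for c, s in zip(cleaned, speech))
    print(f"largest residual after cancellation: {residual:.2e}")
```

In a real device the two microphones would pick up the noise with different gains and delays, so the cancellation would only be as good as the matching between the channels; the sketch ignores that on purpose.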
Third embodiment:
Please refer to Fig. 3, which is a block diagram of the voice pickup device 100 provided by the third embodiment of the present invention. The voice pickup device 100 provided by the third embodiment is similar in structure, working principle, and technical effect to the voice pickup device 100 provided by the second embodiment; the difference is that the voice pickup device 100 provided by the third embodiment may further include a voice recognition unit 180.
In embodiments of the present invention, the voice recognition unit 180 is used to perform speech recognition on voice signals. Specifically, the voice recognition unit 180 is electrically connected to the processing control unit 150, so as to perform speech recognition on the denoised voice information obtained by the processing control unit 150 and obtain a corresponding control instruction. The control instruction is used to control the electronic equipment that includes the voice pickup device 100, and corresponds to the denoised voice information.
Fourth embodiment:
Please refer to Fig. 4, which is a block diagram of the voice pickup device 100 provided by the fourth embodiment of the present invention. The voice pickup device 100 provided by the fourth embodiment is similar in structure, working principle, and technical effect to the voice pickup device 100 provided by the second embodiment; the difference is that the voice pickup device 100 provided by the fourth embodiment may further include a network communication unit 170.
In embodiments of the present invention, the network communication unit 170 is used for data interaction. The network communication unit 170 is electrically connected to the processing control unit 150, and the voice pickup device 100 is communicatively connected to a server through the network communication unit 170, so as to send the denoised voice information obtained by the processing control unit 150 to the server for speech recognition, or to receive from the server the control instruction obtained by performing speech recognition on the denoised voice information. The control instruction is used to control the electronic equipment that includes the voice pickup device 100, and corresponds to the denoised voice information.
Fifth embodiment:
Please refer to Fig. 5, which is a block diagram of the voice pickup device 100 provided by the fifth embodiment of the present invention. The voice pickup device 100 provided by the fifth embodiment is similar in structure, working principle, and technical effect to the voice pickup device 100 provided by the fourth embodiment; the difference is that the voice pickup device 100 provided by the fifth embodiment may further include a rotation control unit 190.
In embodiments of the present invention, the rotation control unit 190 is connected to the image acquisition unit 110 to control the image capture direction of the image acquisition unit 110. Specifically, the rotation control unit 190 includes a rotating assembly for driving the camera in the image acquisition unit 110 to adjust its image capture direction, and a rotation control assembly for controlling the rotating assembly. The rotating assembly is directly connected to the camera in the image acquisition unit 110, and the rotation control assembly is electrically connected to the rotating assembly, so as to control the camera in the image acquisition unit 110 to rotate according to a preset rotation strategy and thereby capture images of the environment in different directions.
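The text does not say what the preset rotation strategy looks like. As one possible reading, the sketch below assumes a simple back-and-forth pan sweep between two limit angles; the function name `sweep_angles`, the angle limits, and the step size are hypothetical and chosen only for illustration.

```python
from itertools import islice


def sweep_angles(min_deg, max_deg, step_deg):
    """Yield pan angles that sweep back and forth between two limits forever.

    Assumed strategy: start at min_deg, advance by step_deg, and reverse
    direction whenever the next step would leave [min_deg, max_deg].
    """
    if min_deg >= max_deg:
        raise ValueError("min_deg must be smaller than max_deg")
    if step_deg <= 0 or step_deg > max_deg - min_deg:
        raise ValueError("step_deg must be positive and fit inside the range")
    angle = min_deg
    direction = 1
    while True:
        yield angle
        candidate = angle + direction * step_deg
        if candidate > max_deg or candidate < min_deg:
            direction = -direction  # bounce off the limit
            candidate = angle + direction * step_deg
        angle = candidate


if __name__ == "__main__":
    # First six camera headings of a sweep from -30 to +30 degrees.
    print(list(islice(sweep_angles(-30, 30, 30), 6)))
```

A rotation control assembly following this kind of strategy would command the rotating assembly to each yielded angle in turn and hand the captured frame to face recognition before moving on.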
The embodiments of the present invention also provide a multimedia equipment. The multimedia equipment includes the voice pickup device 100 provided by any one of the first, second, third, fourth, and fifth embodiments above. The multimedia equipment performs speech recognition on the voice signal that the voice pickup device 100 collects from the orientation of the user's face, obtains a control instruction matching the voice signal, and performs the corresponding operation according to the control instruction. In the present embodiment, the multimedia equipment may be, but is not limited to, a smart speaker, a smart TV, a smart washing machine, a smart refrigerator, or a smart robot.
In summary, the voice pickup device and multimedia equipment provided by the preferred embodiments of the present invention can automatically steer toward the position of the user's face and perform directional pickup of the voice signal uttered by the user, reducing interference from external noise and improving the corresponding speech recognition accuracy and voice pickup distance. Specifically, the voice pickup device captures images through the image acquisition unit; performs face recognition on the captured images through the face recognition unit, which is electrically connected to the image acquisition unit; collects voice signals through the voice collecting unit; and adjusts the orientation of the voice collecting unit through the steering adjustment unit, which is connected to the voice collecting unit. The processing control unit is electrically connected to the image acquisition unit, the face recognition unit, the voice collecting unit, and the steering adjustment unit. When the face recognition unit recognizes that a face is present in an image captured by the image acquisition unit, the processing control unit controls the operation of the steering adjustment unit so that the voice collecting unit is aimed at the orientation of the face in the image, and controls the voice collecting unit to collect the voice signal from that orientation. The device thereby achieves directional pickup of the voice signal uttered by the user, reduces interference from external noise, and improves the corresponding speech recognition accuracy and voice pickup distance.
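The text does not state how the orientation of a face is derived from its position in the image. One common approach, shown here purely as an assumption, is a pinhole-camera model that converts the horizontal pixel position of the face into a pan angle for the steering adjustment unit. The function names, the bounding-box format `(x, y, width, height)`, and the premise that camera and microphone share a pan axis are all hypothetical.

```python
import math


def face_azimuth_deg(face_center_x, image_width, horizontal_fov_deg):
    """Horizontal angle of a face relative to the optical axis (pinhole model).

    Positive values mean the face is to the right of the image centre.
    """
    if image_width <= 0 or not 0 < horizontal_fov_deg < 180:
        raise ValueError("invalid image width or field of view")
    # Focal length in pixels implied by the horizontal field of view.
    focal_px = (image_width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    offset_px = face_center_x - image_width / 2
    return math.degrees(math.atan2(offset_px, focal_px))


def steering_target(face_boxes, image_width, horizontal_fov_deg):
    """Return the pan angle for the first detected face, or None if no face.

    face_boxes is a list of (x, y, width, height) tuples in pixels.
    """
    if not face_boxes:
        return None  # no face recognized: leave the microphone where it is
    x, _, width, _ = face_boxes[0]
    return face_azimuth_deg(x + width / 2, image_width, horizontal_fov_deg)


if __name__ == "__main__":
    # A face centred at pixel column 1440 in a 1920-pixel-wide, 90-degree view.
    angle = steering_target([(1340, 400, 200, 200)], 1920, 90)
    print(f"steer voice collecting unit to {angle:.1f} degrees")
```

If the microphone were mounted away from the camera, the angle would additionally need a parallax correction that depends on the distance to the face, which this sketch does not attempt.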
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A voice pickup device, characterized in that the device comprises:
an image acquisition unit for capturing images;
a face recognition unit, electrically connected to the image acquisition unit, for performing face recognition on the images captured by the image acquisition unit;
a voice collecting unit for collecting voice signals;
a steering adjustment unit, connected to the voice collecting unit, for adjusting the orientation of the voice collecting unit; and
a processing control unit electrically connected to the image acquisition unit, the face recognition unit, the voice collecting unit, and the steering adjustment unit;
wherein, when the face recognition unit recognizes that a face is present in an image captured by the image acquisition unit, the processing control unit controls the operation of the steering adjustment unit so that the voice collecting unit is aimed at the orientation of the face in the image, and controls the voice collecting unit to collect the voice signal from the orientation of the face.
2. The device according to claim 1, characterized in that the device further comprises a noise acquisition unit for collecting noise signals in the environment surrounding the device;
the processing control unit is electrically connected to the noise acquisition unit, so as to denoise the voice signal collected by the voice collecting unit according to the noise signal collected by the noise acquisition unit, obtaining denoised voice information.
3. The device according to claim 2, characterized in that the device further comprises a network communication unit for data interaction;
the network communication unit is electrically connected to the processing control unit, and the device is communicatively connected to a server through the network communication unit, so as to send the denoised voice information obtained by the processing control unit to the server for speech recognition, or to receive from the server the control instruction obtained by performing speech recognition on the denoised voice information.
4. The device according to claim 2, characterized in that the device further comprises a voice recognition unit for performing speech recognition;
the voice recognition unit is electrically connected to the processing control unit, so as to perform speech recognition on the denoised voice information obtained by the processing control unit and obtain a corresponding control instruction.
5. The device according to claim 1, characterized in that, when the face recognition unit recognizes a plurality of faces, the processing control unit obtains the control authority corresponding to each face and controls the steering adjustment unit to drive the voice collecting unit toward the orientation of the face with the highest control authority, so as to collect the voice signal from the orientation of that face.
6. The device according to claim 2, characterized in that the voice collecting unit comprises at least one voice sampling microphone and the noise acquisition unit comprises at least one noise sampling microphone, the at least one voice sampling microphone cooperating with the at least one noise sampling microphone to form a microphone array, so as to collect, respectively, the voice signal from the orientation of the face and the noise signal in the environment surrounding the voice pickup device.
7. The device according to claim 2, characterized in that the processing control unit comprises an audio processing subunit;
the audio processing subunit is used to amplify the noise signal collected by the noise acquisition unit and the voice signal collected by the voice collecting unit, invert the phase of the amplified noise signal, and mix the inverted noise signal with the amplified voice signal, so as to cancel the noise signal in the voice signal and obtain the denoised voice information.
8. The device according to claim 1, characterized in that the device further comprises a rotation control unit, connected to the image acquisition unit, for controlling the image capture direction of the image acquisition unit.
9. The device according to any one of claims 1-8, characterized in that, when the face recognition unit recognizes that a face is present in an image captured by the image acquisition unit, the processing control unit processes the facial information of the recognized face to obtain the orientation corresponding to the mouth in the face, and controls the steering adjustment unit according to the orientation corresponding to the mouth, so that the voice collecting unit is aimed at the orientation corresponding to the mouth to collect the voice signal.
10. A multimedia equipment, characterized in that the multimedia equipment comprises the voice pickup device according to any one of claims 1-9, and the multimedia equipment performs speech recognition on the voice signal collected by the voice pickup device, obtains a control instruction matching the voice signal, and performs the corresponding operation according to the control instruction.
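The selection rule of claim 5 (aim at the face with the highest control authority when several faces are recognized) can be illustrated with a short Python sketch. The record layout, the integer authority scale, and the tie-breaking behaviour (the first face found wins) are assumptions made for illustration and are not specified by the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedFace:
    """Hypothetical record produced by face recognition for one person."""
    name: str
    azimuth_deg: float  # orientation of the face relative to the device
    authority: int      # larger value means higher control authority


def pick_target_face(faces):
    """Return the face with the highest control authority, or None if empty.

    On a tie, max() keeps the first face encountered.
    """
    if not faces:
        return None
    return max(faces, key=lambda face: face.authority)


if __name__ == "__main__":
    faces = [
        RecognizedFace("guest", -20.0, 1),
        RecognizedFace("owner", 15.0, 3),
        RecognizedFace("child", 5.0, 2),
    ]
    target = pick_target_face(faces)
    print(f"steer toward {target.name} at {target.azimuth_deg} degrees")
```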
CN201710423629.0A 2017-06-07 2017-06-07 Voice pickup device and multimedia equipment Active CN107123423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710423629.0A CN107123423B (en) 2017-06-07 2017-06-07 Voice pickup device and multimedia equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710423629.0A CN107123423B (en) 2017-06-07 2017-06-07 Voice pickup device and multimedia equipment

Publications (2)

Publication Number Publication Date
CN107123423A true CN107123423A (en) 2017-09-01
CN107123423B CN107123423B (en) 2021-05-18

Family

ID=59730052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710423629.0A Active CN107123423B (en) 2017-06-07 2017-06-07 Voice pickup device and multimedia equipment

Country Status (1)

Country Link
CN (1) CN107123423B (en)

Also Published As

Publication number Publication date
CN107123423B (en) 2021-05-18

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Details of TR01 (transfer of patent right):
Effective date of registration: 20240530
Patentee after: SHANGHAI ISMARTV NETWORK TECHNOLOGIES Co.,Ltd.
Address after: Room 212, Building 14, No. 350 Xianxia Road, Changning District, Shanghai, 200050
Country or region after: China
Patentee before: WHALEY TECHNOLOGY Co.,Ltd.
Address before: 201210 3rd floor, building e, Shangtou Shengyin building, 666 shengxia Road, Pudong New Area, Shanghai
Country or region before: China