CN108769400A

CN108769400A - Method and device for localized recording

Info

Publication number: CN108769400A
Application number: CN201810501516.2A
Authority: CN
Inventors: 林萌; 姜南
Original assignee: Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Current assignee: Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Priority date: 2018-05-23
Filing date: 2018-05-23
Publication date: 2018-11-06

Abstract

The invention discloses a method and device for localized recording. The method includes: obtaining image information of a target sound source; obtaining location information of the target sound source from the image information; obtaining multi-channel recording information; identifying the sound information of the target sound source in the recording information according to the location information of the target sound source; and processing the sound information of the target sound source. With this technical solution, the sound information of the target sound source in the multi-channel recording information is determined from the location information of the target sound source, which achieves localized recording in the direction of the target sound source. In recording mode, however the target sound source moves, the recording device can directly identify its sound and record it in a localized manner, giving a better recording result and an improved user experience.

Description

Method and device for localized recording

Technical field

The present invention relates to the field of audio and video processing, and in particular to a method and device for localized recording.

Background art

With the rise of network apps, live audio and video streaming has become a very popular industry. At present most mobile terminals have a recording function, for example mobile phones, tablet computers, music players, and professional voice recorders, to meet users' recording needs in daily life and at work.

When a user uses the recording function of a mobile terminal, the recording environment often contains various kinds of ambient noise. The recording techniques currently used during video capture by mobile phones and other handheld terminals on the market either use a single MIC to capture the ambient sound as the soundtrack of the video, or use multiple MICs to capture ambient sound signals from several directions and then simply mix the multi-channel signals into the soundtrack. The soundtrack obtained with either scheme contains ambient noise that the photographer does not want to capture. Neither scheme can record, in a directional manner, only the sound from the direction the photographer specifies, which degrades the recording.

To address this problem, the prior art includes a directional recording scheme that records video using two MICs facing opposite directions on a handheld terminal together with a camera. The audio signal captured by the MIC facing away from the camera is taken as the reference signal for noise cancellation, and the frequencies matching that reference signal are removed from the audio signal captured by the MIC on the camera side, so as to achieve directional recording during video capture. However, the sound recorded by the camera-side microphone still contains all the sounds in the environment, so the sound of the target sound source is subject to interference from other sound sources. For example, when this scheme is applied in anchor (live-streaming) mode, it does not distinguish the anchor's voice from background music or background noise. When the anchor does not perform close to the recording device, the anchor's voice is interfered with by background music or ambient noise, so the voice is not loud or clear enough and the final recording suffers. This scheme cannot achieve localized recording of a target sound source, which is inconvenient for users.

Summary of the invention

In view of this, the purpose of the embodiments of the present invention is to provide a method and device for localized recording, so as to solve the problem that existing directional recording devices cannot perform localized recording of a target sound source.

According to a first aspect, a first embodiment of the present invention provides a method for localized recording, including: obtaining image information of a target sound source; obtaining location information of the target sound source from the image information; obtaining multi-channel recording information; identifying the sound information of the target sound source in the recording information according to the location information of the target sound source; and processing the sound information of the target sound source.

With this technical solution, localized recording in the direction of the target sound source can be achieved. In recording mode, however the target sound source moves, the recording device can directly identify the sound of the target sound source and so obtain a better recording.

With reference to the first aspect, in a first embodiment of the first aspect, obtaining the location information of the target sound source from the image information includes: extracting the direction information and depth of field information of the target sound source from the image information; and calculating the location information of the target sound source from the direction information and the depth of field information. The position of the target sound source can be obtained in this way.

With reference to the first aspect, in a first embodiment of the first aspect, the multi-channel recording information includes at least three channels. The multi-channel recording information makes it possible to capture all sound source information, including that of the target sound source.

With reference to the first aspect, in a second embodiment of the first aspect, identifying the sound information of the target sound source in the recording information according to the location information of the target sound source includes: obtaining the phase and signal amplitude of each channel of the multi-channel recording information; comparing the phase and signal amplitude differences between the channels to obtain the location information of the sound sources in the multi-channel recording information; and comparing the obtained sound source locations with the location information of the target sound source obtained from the image information, so as to identify the information of the target sound source in the multi-channel recording information. In this way the sound information coming from the target sound source can be identified in the multi-channel recording information, which localizes the sound of the target sound source.

With reference to the first aspect, in the above embodiments of the first aspect, processing the sound information of the target sound source includes: enhancing the sound information of the target sound source to obtain first recording information; and/or weakening the sound source information in the recording information other than the sound information of the target sound source to obtain second recording information. This processing makes the sound information of the target sound source stand out in the recording information and strengthens the localized recording of the target sound source.

With reference to the first aspect, the above embodiments of the first aspect further include generating a recording file from the first recording information and the second recording information, to obtain an audio file of the localized recording.

With reference to the first aspect, the above embodiments of the first aspect further include synchronizing and combining the recording file with the corresponding video signal, to obtain an audio-video file of the localized recording.

According to a second aspect, an embodiment of the present invention provides a localized recording device, the device including: an image information acquisition module, for obtaining image information of a target sound source; a location information acquisition module, for obtaining location information of the target sound source from the image information; a recording information acquisition module, for obtaining the recording information of multiple microphone channels; a target sound source sound information determining module, for identifying the sound information of the target sound source in the recording information according to the location information of the target sound source; and a sound information processing module, for processing the sound information of the target sound source.

This technical solution provides a device that can achieve localized recording in the direction of the target location. The device can carry out the above method for localized recording of a target sound source, so that in recording mode, however the target sound source moves, the recording device can directly identify the sound of the target sound source and obtain a better recording.

With reference to the second aspect, in a first embodiment of the second aspect, the location information acquisition module includes: a direction receiving module, configured to obtain the direction of the target sound source from the image information of the target sound source; and a depth of field parameter acquisition module, configured to obtain the depth of field information of the target sound source from the image information of the target sound source. The location information acquisition module can thus obtain the position of the target sound source.

With reference to the second aspect, in a second embodiment of the second aspect, the target sound source sound information determining module includes: a phase and signal amplitude acquisition module; a multi-channel recording information sound source location acquisition module; and a target sound source information identification module. These modules compare the multi-channel recording information and identify the sound information coming from the target sound source.

Description of the drawings

The features and advantages of the present invention can be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:

Fig. 1 is a flow diagram of a method for localized recording provided by Embodiment 1 of the present invention;

Fig. 2 is a flow diagram of identifying the sound information of the target sound source in the recording information, in a method for localized recording provided by Embodiment 2 of the present invention;

Fig. 3 is a structural block diagram of a device for localized recording provided by Embodiment 3 of the present invention.

Detailed description

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention.

All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.

Embodiment one

Fig. 1 is a flow diagram of a method for localized recording provided by Embodiment 1 of the present invention. The method can be executed by a localized recording device, which can be implemented in software and hardware and can generally be integrated in a mobile terminal. In this embodiment the mobile terminal may specifically be a terminal device such as a mobile phone, tablet computer, or voice recorder.

As shown in Fig. 1, the method includes:

Step 101: obtain image information of the target sound source.

The image information of the target sound source can be obtained by a camera integrated in the mobile terminal. The target sound source is the person or object that emits the target sound, and the target sound is the sound one wishes to record in the current scene, which depends on the recording scenario. For example, during a live stream, the face of the streamer is the target sound source; during an instrumental performance, the instrument being played is the target sound source. The camera integrated in the mobile terminal photographs the target sound source, which yields its image information. Illustratively, the image information of the target sound source can be obtained through the camera's face recognition or human body contour recognition.

Step 102: obtain location information of the target sound source from the image information.

In the captured image information, the target sound source can be identified by image recognition. The location information of the target sound source can then be obtained from the position of the target sound source in the image, the position of the mobile terminal that captured the image, and the distance between the mobile terminal and the target sound source. For example, during a live stream in anchor mode, the position of the mobile terminal is taken as the reference coordinate; if, in the image captured by the mobile terminal, the streamer is located 30 centimeters directly behind the mobile terminal, this gives the location information of the target sound source. The distance between the camera and the photographed object can be obtained by image processing, and a camera with a ranging function can also obtain the positional relationship between the subject and the camera. In other embodiments, the distance between the photographed target and the camera can also be obtained by laser ranging, infrared ranging, or similar means.

Illustratively, obtaining the location information of the target sound source from the image information may include: extracting the direction information and depth of field information of the target sound source from the image information; and then calculating the location information of the target sound source from the direction information and the depth of field information.
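The calculation from direction and depth information can be sketched as follows. This is only an illustration under assumptions the text does not state: a pinhole camera model, known horizontal and vertical fields of view, and a depth value interpreted as distance along the optical axis. The function name `locate_from_image` and all parameters are hypothetical.

```python
import math

def locate_from_image(px, py, width, height, hfov_deg, vfov_deg, depth_m):
    """Estimate the target position in camera coordinates (meters).

    px, py: pixel coordinates of the target center (origin at top-left).
    hfov_deg, vfov_deg: assumed camera fields of view (not given in the text).
    depth_m: assumed distance of the target along the optical axis.
    Returns ((x, y, z), azimuth_deg, elevation_deg).
    """
    # Focal lengths in pixels under the assumed pinhole model.
    fx = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(vfov_deg) / 2)
    # Offsets from the image center; image y grows downward, so flip it.
    dx = px - width / 2
    dy = height / 2 - py
    # Back-project the pixel to a 3D point at the given depth.
    x = depth_m * dx / fx
    y = depth_m * dy / fy
    azimuth = math.degrees(math.atan2(x, depth_m))
    elevation = math.degrees(math.atan2(y, math.hypot(x, depth_m)))
    return (x, y, depth_m), azimuth, elevation
```

For instance, a target detected at the image center with a depth of 0.3 m maps to a point 0.3 m straight ahead of the camera, with zero azimuth and elevation.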

Step 103: obtain multi-channel recording information.

The multi-channel recording information here includes at least two channels and is recorded by a microphone array composed of multiple microphones at different positions. When the user operates the mobile terminal to start recording, all or some of the microphones in the mobile terminal can be turned on automatically and controlled to record, and the recorded audio signals of at least two of the opened microphones are obtained.

With multiple microphones at different positions and angles, sound arriving from a given direction can be enhanced or suppressed. In this way a microphone array can effectively enhance a particular sound signal in a noisy environment. It has good noise suppression and speech enhancement capability, and the microphones do not need to be pointed at the sound source at all times. In practice, a suitable number of recording microphones can be chosen according to actual recording needs; the more microphones, the easier it is to achieve good noise reduction and speech enhancement. For recording modes with higher requirements on target sound source localization, such as a moving target sound source in anchor mode, an array of three or more microphones is preferred, so that localization covers a more complete and wider range of angles and the target sound source is recorded more accurately.

Step 104: identify the sound information of the target sound source in the recording information according to the location information of the target sound source.

Illustratively, when multi-channel recording information is obtained by a microphone array, the microphones are placed at different positions, so for a sound source in a given direction the sound it emits reaches each microphone at a different time. The time and distance for sound from the same source to reach different microphones differ, and the strength of the sound signal captured by each microphone differs as well. This produces different phases and signal amplitudes, and these differences vary with the position of the sound source. When the straight-line distance between the sound source and a microphone is relatively short, the sound reaches that microphone relatively early and the sound signal is relatively strong; when the straight-line distance is relatively long, the sound reaches that microphone relatively late and the sound signal is relatively weak. Therefore, by analyzing the two or more channels of recording information obtained in step 103 according to this principle, the angle and distance of each sound source relative to the microphone array can be determined. These are compared with the position of the target sound source determined in step 102 to determine which sound source in the recording information is the target sound source, that is, to identify the sound information coming from the target sound source in the multi-channel recording information.
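The relation between arrival-time difference and source direction can be illustrated with a short sketch. It uses a far-field, plane-wave simplification for a single pair of microphones, which is an assumption made here for brevity; the text does not give a formula at this point, and Embodiment 2 refers instead to a near-field spherical wavefront model. The function name, the sign convention, and the speed-of-sound constant are likewise illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C

def arrival_angle_deg(delay_s, mic_spacing_m):
    """Far-field angle of arrival from the delay between two microphones.

    delay_s > 0 means the sound reached microphone A before microphone B.
    Returns the angle from broadside (0 = straight ahead), positive toward A.
    """
    # Path difference divided by spacing gives the sine of the angle.
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Clamp so measurement noise cannot push the value outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source straight ahead of the pair, while a delay equal to the spacing divided by the speed of sound corresponds to a source on the line through both microphones.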

Step 105: process the sound information of the target sound source.

Once the sound information of the target sound source has been obtained, it is the sound one wishes to record. The sound information of the target sound source can be further enhanced within the multi-channel recording information before recording. This raises attributes such as the intensity and loudness of the target sound source in the recording file and strengthens the localized recording of the target sound source.

As an alternative embodiment, step 105 can instead retain the sound information of the target sound source unchanged and weaken the sound information of the other sound sources before recording. This also makes the sound information of the target sound source stand out and strengthens the localized recording of the target sound source.

Alternatively, the sound information of the target sound source can be enhanced and the sound information of sound sources in other recording directions weakened before recording. This removes, as far as possible, the interference of sound sources in other recording directions with the sound of the target sound source, that is, it filters out the sound of sound sources in recording directions other than that of the target sound source, which keeps the sound of the target sound source clear and gives a better recording.

Enhancing the sound information of the target sound source and weakening the sound information of the other sound sources can be done by raising the gain of the sound information of the target sound source or lowering the gain of the sound information of the non-target sound sources. The amount by which the gain is raised or lowered can be set by system default or by the user. For example, a lifting coefficient and a reduction coefficient can be set. By adjusting the two coefficients together, the proportion of sound retained from recording directions other than that of the target sound source can be chosen as the situation requires, which avoids some distorted sound.
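A minimal sketch of the two-coefficient gain adjustment is given below. It assumes the sound of the target sound source and the remaining sound have already been separated into two sample sequences, which the text does not describe how to do. The function name, the default coefficient values, and the final clamping step are illustrative assumptions.

```python
def mix_with_gains(target, others, boost=2.0, cut=0.25, limit=1.0):
    """Combine the target signal and the residual sound with separate gains.

    target, others: equal-length lists of float samples in [-1.0, 1.0].
    boost: hypothetical lifting coefficient applied to the target sound.
    cut: hypothetical reduction coefficient applied to all other sound.
    """
    if len(target) != len(others):
        raise ValueError("signals must have the same length")
    # Raise the target gain and lower the gain of everything else.
    mixed = [boost * t + cut * o for t, o in zip(target, others)]
    # Clamp to the valid sample range so the output never exceeds full scale.
    return [max(-limit, min(limit, s)) for s in mixed]
```

Setting `cut` to zero removes the other directions entirely, while a small non-zero value keeps some ambience, which corresponds to choosing the retained proportion described above.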

After step 105, the method may further include a step of generating a recording file from the first recording information, obtained by enhancing the sound information of the target sound source, and/or the second recording information, obtained by weakening the sound information of the sound sources other than the target sound source, so as to obtain an audio file of the localized recording.

After the above step, the method may further include synchronizing and combining the recording file with the corresponding video signal, to obtain an audio-video file of the localized recording.

In the method for localized recording provided by Embodiment 1 of the present invention, the location information of the target sound source is obtained from its image information, the sound information coming from the target sound source is identified in the multi-channel recording information according to that location information, and the sound information of the target sound source is then processed so that it stands out, which achieves localized recording of the sound of the target sound source. In recording mode, however the target sound source moves, the recording device can directly identify and emphasize the sound of the target sound source, giving a better recording and an improved user experience.

Embodiment two

Fig. 2 is a flow diagram of a method for localized recording provided by Embodiment 2 of the present invention. This embodiment is an optimization based on Embodiment 1 above: the step of "identifying the sound information of the target sound source in the recording information according to the location information of the target sound source" is refined. The method of this embodiment includes the following steps:

Step 101: obtain image information of the target sound source. This is the same as in Embodiment 1 and is not repeated here.

Step 102: obtain location information of the target sound source from the image information. This is the same as in Embodiment 1 and is not repeated here.

Step 103: obtain multi-channel recording information. In this scheme the recording information has three or more channels, that is, three or more microphones are used. The specific steps are the same as in Embodiment 1 and are not repeated here.

Step 104: identify the sound information of the target sound source in the recording information according to the location information of the target sound source.

Specifically, this step may include:

Step 1041: obtain the phase and signal amplitude of each channel of the multi-channel recording information.

That is, obtain the phase and signal amplitude of the recording information captured by each microphone in the microphone array.

Step 1042: compare the phase and signal amplitude differences between the channels of recording information to obtain the location information of the sound sources in the multi-channel recording information.

The signal received by the first microphone is taken as the reference signal, and the differences in phase and signal amplitude between the signal received by each microphone and the signal received by the first microphone are calculated. The cross-power spectrum phase method is then used to obtain the generalized cross-correlation function between each signal and the reference signal. The sample index at which the cross-correlation function reaches its maximum is found and converted into the delay difference between the two signals, which gives the time delay between the signal received by each microphone and the signal received by the first microphone. Using these time delays, each approximate direction angle is calculated with the direction estimation formula derived from the near-field spherical wavefront model, and the approximate direction angles are combined to obtain the estimated direction angle of the measured sound source. From this, the angle and distance of each sound source relative to the microphone array are determined, which gives the location information of the sound source.
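The delay estimation part of step 1042 can be sketched as follows. The cross-power spectrum phase approach is often referred to as GCC-PHAT, and the sketch follows that common formulation: normalize the cross-power spectrum to unit magnitude, transform back, and take the lag of the peak. It is an illustration rather than the patent's own implementation. A naive discrete Fourier transform keeps the example dependency-free and is only suitable for short frames. The function names are hypothetical, and the near-field direction formula mentioned in the text is not reproduced because the text does not give it.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat_delay(sig, ref, sample_rate):
    """Delay of `sig` relative to `ref` in seconds (positive: sig arrives later)."""
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    a = dft(list(sig) + [0.0] * (n - len(sig)))
    b = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum between the signal and the reference.
    cross = [x * y.conjugate() for x, y in zip(a, b)]
    # PHAT weighting: keep only the phase of each frequency bin.
    cross = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    # Back to the lag domain; the peak marks the most likely delay.
    corr = [v.real for v in dft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    lag = peak if peak < n // 2 else peak - n  # map index to a signed lag
    return lag / sample_rate
```

Multiplying the returned delay by the speed of sound gives the path-length difference between the two microphones, which is the quantity a direction estimation formula would take as input.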

Step 1043: compare the obtained sound source locations with the location information of the target sound source obtained from the image information, and identify the information of the target sound source in the multi-channel recording information.
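One possible way to carry out this comparison is to pick the acoustically estimated source whose direction is closest to the image-derived direction, within a tolerance. The sketch below assumes both sets of positions are expressed in the same device-centered coordinate frame. The angular criterion, the 15-degree tolerance, and the function name are assumptions, since the text does not specify how the comparison is made.

```python
import math

def pick_target_source(acoustic_sources, target_pos, max_angle_deg=15.0):
    """Return the index of the source closest in direction to the target, or None.

    acoustic_sources: list of (x, y, z) positions estimated from the microphones.
    target_pos: (x, y, z) position of the target derived from the image.
    max_angle_deg: hypothetical tolerance for accepting a match.
    """
    def angle_between(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        if norm == 0.0:
            return 180.0  # a zero vector has no direction; treat as no match
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

    best, best_angle = None, max_angle_deg
    for i, pos in enumerate(acoustic_sources):
        ang = angle_between(pos, target_pos)
        if ang <= best_angle:
            best, best_angle = i, ang
    return best
```

Returning `None` when no source falls within the tolerance lets the caller fall back to unprocessed recording rather than enhancing the wrong source.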

Step 105: process the sound information of the target sound source.

With the above method, the information of the target sound source in the multi-channel recording information can be identified more accurately, which improves both the identification and the resulting localized recording.

Embodiment three

Fig. 3 is a structural block diagram of a localized recording device provided by Embodiment 3 of the present invention. The device can be implemented in software and/or hardware, is typically integrated in a mobile terminal, and can achieve localized recording by executing the method for localized recording. As shown in Fig. 3, the device includes: an image information acquisition module 301, a location information acquisition module 302, a recording information acquisition module 303, a target sound source sound information determining module 304, and a sound information processing module 305.

The image information acquisition module is for obtaining image information of the target sound source; the location information acquisition module is for obtaining location information of the target sound source from the image information; the recording information acquisition module is for obtaining the recording information of multiple microphone channels; the target sound source sound information determining module is for identifying the sound information of the target sound source in the recording information according to the location information of the target sound source; and the sound information processing module is for processing the sound information of the target sound source.

In recording mode, however the target sound source moves, the above localized recording device can directly identify the sound of the target sound source and so obtain a better recording.

Preferably, the location information acquisition module may include a direction receiving module and a depth of field parameter acquisition module. The direction receiving module is configured to obtain the direction of the target sound source from the image information of the target sound source; the depth of field parameter acquisition module is configured to obtain the depth of field information of the target sound source from the image information of the target sound source.

Preferably, the target sound source sound information determining module may include: a phase and signal amplitude acquisition module; a multi-channel recording information sound source location acquisition module; and a target sound source information identification module.

Preferably, the sound information processing module includes an enhancing module, a weakening module, and a recording file generation module. The enhancing module is for intelligently enhancing the sound information coming from the target sound source; the weakening module is for weakening the sound information of non-target sound sources; and the recording file generation module is for generating a recording file from the recording information processed by the enhancing module and the weakening module.

Preferably, the enhancing module is a gain lifting unit, for raising the gain of the sound information of the target sound source to obtain the first recording information; the weakening module is a gain lowering unit, for lowering the gain of the sound information of non-target sound sources in the recording information to obtain the second recording information. Preferably, the device further includes an audio information storage and synthesis module, configured to store the recording file in a given format and to synchronize and combine it with the captured video signal.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and all such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A method for localized recording, characterized by including:

obtaining image information of a target sound source;

obtaining location information of the target sound source from the image information;

obtaining multi-channel recording information;

identifying the sound information of the target sound source in the recording information according to the location information of the target sound source; and

processing the sound information of the target sound source.

2. The method according to claim 1, characterized in that obtaining the location information of the target sound source from the image information includes:

extracting the direction information and depth of field information of the target sound source from the image information; and

calculating the location information of the target sound source from the direction information and the depth of field information.

3. The method according to claim 1, characterized in that the multi-channel recording information includes at least three channels.

4. The method according to claim 1, characterized in that identifying the sound information of the target sound source in the recording information according to the location information of the target sound source includes:

obtaining the phase and signal amplitude of each channel of the multi-channel recording information;

comparing the phase and signal amplitude differences between the channels of recording information to obtain the location information of the sound sources in the multi-channel recording information; and

comparing the obtained sound source locations with the location information of the target sound source obtained from the image information, and identifying the information of the target sound source in the multi-channel recording information.

5. The method according to any one of claims 1-4, characterized in that processing the sound information of the target sound source includes:

enhancing the sound information of the target sound source to obtain first recording information;

and/or weakening the sound source information in the recording information other than the sound information of the target sound source to obtain second recording information.

6. The method according to claim 5, characterized by further including generating a recording file from the first recording information and the second recording information.

7. The method according to claim 6, characterized by further including synchronizing and combining the recording file with the corresponding video signal.

8. A localized recording device, characterized in that the device includes:

an image information acquisition module, for obtaining image information of a target sound source;

a location information acquisition module, for obtaining location information of the target sound source from the image information;

a recording information acquisition module, for obtaining the recording information of multiple microphone channels;

a target sound source sound information determining module, for identifying the sound information of the target sound source in the recording information according to the location information of the target sound source; and

a sound information processing module, for processing the sound information of the target sound source.

9. The device according to claim 8, characterized in that the location information acquisition module includes:

a direction receiving submodule, for obtaining the direction of the target sound source from the image information of the target sound source; and

a depth of field parameter acquisition submodule, for obtaining the depth of field information of the target sound source from the image information of the target sound source.

10. The device according to either of claims 8 and 9, characterized in that the target sound source sound information determining module includes:

a phase and signal amplitude acquisition module, for obtaining the phase and signal amplitude of each channel of the multi-channel recording information;

a multi-channel recording information sound source location acquisition module, for comparing the phase and signal amplitude differences between the channels of recording information to obtain the location information of the sound sources in the multi-channel recording information; and

a target sound source information identification module, for comparing the obtained sound source locations with the location information of the target sound source obtained from the image information, and identifying the information of the target sound source in the multi-channel recording information.