CN109920405A

CN109920405A - Multi-channel speech recognition method, apparatus, device, and readable storage medium

Info

Publication number: CN109920405A
Application number: CN201910164535.5A
Authority: CN
Inventors: 陈建哲; 彭汉迎; 欧阳能钧
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Apollo Zhilian Beijing Technology Co Ltd
Priority date: 2019-03-05
Filing date: 2019-03-05
Publication date: 2019-06-21

Abstract

Embodiments of the present invention provide a multi-channel speech recognition method, apparatus, device, and readable storage medium. In the method, audio data captured by multiple microphone arrays is received, and beamforming is applied to each channel of audio data to obtain the audio signal that corresponds to that channel's audio collection region, attenuating the audio signals in that channel arriving from other directions. Interference suppression is then applied to the multiple audio signals to obtain the voice signal corresponding to each audio collection region, which reduces the interference that noise from other audio collection regions causes to each voice signal. Speech recognition is performed on each voice signal to obtain the speech recognition result for each audio collection region, which improves the recognition rate. When several people speak at the same time, mutual interference between the voice signals is suppressed and a speech recognition result is obtained for each audio collection position, improving the efficiency and accuracy of speech recognition.

Description

Multi-channel speech recognition method, apparatus, device, and readable storage medium

Technical field

The embodiments of the present invention relate to the technical field of speech recognition, and in particular to a multi-channel speech recognition method, apparatus, device, and readable storage medium.

Background

At present, the head unit of a vehicle is typically equipped with only one two-channel microphone in the front row, consisting of a left-channel microphone and a right-channel microphone. It is mainly used to capture audio data near the driver's seat; speech recognition is performed on the captured audio data to recognize voice commands, such as instructions that the driver issues to the head unit.

However, when a passenger in the front passenger seat or a rear seat issues a voice command to the head unit, the sound source is far from the microphone, so the captured audio data is of poor quality and the speech recognition rate is very low. In particular, when several people issue voice commands at the same time, this causes reverberation and makes the commands even harder to recognize correctly.

Summary of the invention

Embodiments of the present invention provide a multi-channel speech recognition method, apparatus, device, and readable storage medium, to solve the problem that in-vehicle speech recognition methods in the prior art have a very low recognition rate.

One aspect of the embodiments of the present invention provides a multi-channel speech recognition method, comprising:

receiving audio data captured by multiple microphone arrays, wherein each microphone array points to one audio collection region in a vehicle and is used to capture one channel of audio data;

performing beamforming on each channel of audio data according to the position of each microphone array relative to its corresponding audio collection region, to obtain the audio signal in each channel of audio data that corresponds to the corresponding audio collection region;

performing interference suppression on the multiple audio signals to obtain the voice signal corresponding to each audio collection region; and

performing speech recognition on the voice signal corresponding to each audio collection region to obtain the speech recognition result corresponding to each audio collection region.

Another aspect of the embodiments of the present invention provides a multi-channel speech recognition apparatus, comprising:

a data acquisition module, configured to receive audio data captured by multiple microphone arrays, wherein each microphone array points to one audio collection region in a vehicle and is used to capture one channel of audio data;

a beamforming module, configured to perform beamforming on each channel of audio data according to the position of each microphone array relative to its corresponding audio collection region, to obtain the audio signal in each channel of audio data that corresponds to the corresponding audio collection region;

an interference suppression module, configured to perform interference suppression on the multiple audio signals to obtain the voice signal corresponding to each audio collection region; and

a speech recognition module, configured to perform speech recognition on the voice signal corresponding to each audio collection region to obtain the speech recognition result corresponding to each audio collection region.

Another aspect of the embodiments of the present invention provides a multi-channel speech recognition device, comprising:

a memory, a processor, and a computer program that is stored in the memory and can run on the processor,

wherein the processor implements the multi-channel speech recognition method described above when running the computer program.

Another aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program,

wherein the computer program implements the multi-channel speech recognition method described above when executed by a processor.

With the multi-channel speech recognition method, apparatus, device, and readable storage medium provided by the embodiments of the present invention, audio data captured by multiple microphone arrays is received, where each microphone array points to one audio collection region in the vehicle and captures one channel of audio data. Beamforming is performed on each channel of audio data according to the position of each microphone array relative to its corresponding audio collection region, which yields the audio signal in each channel that corresponds to that region and attenuates the audio signals in that channel arriving from other directions, thereby suppressing audio from other directions. Interference suppression is then performed on the multiple audio signals to obtain the voice signal corresponding to each audio collection region, which further reduces the interference that noise from other audio collection regions causes to each voice signal and yields a cleaner voice signal for each region. Speech recognition is performed on the voice signal corresponding to each audio collection region to obtain the speech recognition result for each region. As a result, whichever audio collection region of the vehicle a sound source is in, a corresponding microphone array can capture its audio data accurately and an accurate speech recognition result can be obtained, which improves the recognition rate. In addition, when several people speak at the same time from different positions, mutual interference between the voice signals can be suppressed and the speech recognition result for each audio collection position can be identified, which greatly improves the efficiency and accuracy of speech recognition.

Brief description of the drawings

Fig. 1 is a flowchart of the multi-channel speech recognition method provided by Embodiment one of the present invention;

Fig. 2 is a flowchart of the multi-channel speech recognition method provided by Embodiment two of the present invention;

Fig. 3 is a schematic structural diagram of the multi-channel speech recognition apparatus provided by Embodiment three of the present invention;

Fig. 4 is a schematic structural diagram of the multi-channel speech recognition device provided by Embodiment five of the present invention.

The above drawings show specific embodiments of the present invention, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concept in any way, but to explain the concept of the invention to those skilled in the art by reference to specific embodiments.

Detailed description of the embodiments

Exemplary embodiments are described in detail here, and examples of them are illustrated in the drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the embodiments of the present invention, as detailed in the appended claims.

The terms "first", "second", and so on in the embodiments of the present invention are used for descriptive purposes only and should not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. In the description of the following embodiments, "plurality" means two or more, unless specifically defined otherwise.

The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention are described below with reference to the drawings.

Embodiment one

Fig. 1 is a flowchart of the multi-channel speech recognition method provided by Embodiment one of the present invention. To address the very low recognition rate of in-vehicle speech recognition methods in the prior art, this embodiment provides a multi-channel speech recognition method. The method in this embodiment is applied to a speech recognition device, which may be a vehicle-mounted terminal device installed in the vehicle, or a computer device that can communicate with the vehicle-mounted terminal device and perform speech recognition. In other embodiments the method may also be applied to other devices; this embodiment uses a speech recognition device as an illustrative example.

As shown in Fig. 1, the specific steps of the method are as follows:

Step S101: receive audio data captured by multiple microphone arrays, where each microphone array points to one audio collection region in the vehicle and is used to capture one channel of audio data.

The embodiments of the present invention apply to speech recognition in a vehicle. A vehicle usually has several seats, for example the driver's seat, the front passenger seat, and other seats. Multiple microphone arrays are installed in the vehicle; each microphone array points to one audio collection region and captures the audio data in that region. Each audio collection region corresponds to the position of one seat, and the audio collection regions correspond one-to-one with the seats in the vehicle; that is, each microphone array points to one seat, and the microphone arrays are arranged in correspondence with the seats. For example, for a four-seat self-driving vehicle, four microphone arrays are installed, each pointing to one of the four seats.

In this embodiment, during speech recognition, the microphone arrays may capture audio data in real time and send it to the speech recognition device, which receives the audio data captured by each microphone array.

Each channel of audio data may include the identifier of the microphone array that captured it, so that the channels can be told apart.

Step S102: according to the position of each microphone array relative to its corresponding audio collection region, perform beamforming on each channel of audio data to obtain the audio signal in each channel that corresponds to the corresponding audio collection region.

Here, the audio collection region corresponding to a microphone array is the audio collection region that the microphone array points to. The audio collection region corresponding to a channel of audio data is the region pointed to by the microphone array that captured that audio data.

In this step, after the multiple channels of audio data are received, beamforming is applied to the audio data captured by each microphone array according to the position of that array relative to the audio collection region it points to. This yields the audio signal in that channel that corresponds to its audio collection region and attenuates the audio signals in that channel arriving from other directions, thereby suppressing audio from other directions.

By applying beamforming to each channel of audio data in this step, the audio signal corresponding to the corresponding audio collection region is obtained from each channel; that is, the audio signal corresponding to each audio collection region is obtained.
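As a rough illustration of the beamforming in step S102, the following is a minimal delay-and-sum sketch in Python. The patent does not name a particular beamforming algorithm; delay-and-sum is assumed here only because it is the simplest one. The function name `delay_and_sum` and the integer per-microphone `delays` are hypothetical; in practice the delays would be derived from the position of the array relative to its audio collection region.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Align each microphone channel by its steering delay and average.

    delays[i] is the number of samples by which sound from the target
    region reaches microphone i later than it reaches the earliest
    microphone (hypothetical input, assumed known from the geometry).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    # Only keep samples for which every shifted channel has data.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    if n <= 0:
        return []
    out = []
    for t in range(n):
        # Advancing channel i by delays[i] lines up the target wavefront,
        # so the target adds coherently while other directions do not.
        total = sum(ch[t + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out


# A pulse that reaches microphone 1 one sample after microphone 0.
mic0 = [0.0, 1.0, 0.0, 0.0, 0.0]
mic1 = [0.0, 0.0, 1.0, 0.0, 0.0]
steered_at_source = delay_and_sum([mic0, mic1], [0, 1])
steered_elsewhere = delay_and_sum([mic0, mic1], [1, 0])
```

With the delays that match the source, the pulse keeps its full amplitude; with mismatched delays it is spread out and halved, which is the attenuation of other directions that the step describes.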

Step S103: perform interference suppression on the multiple audio signals to obtain the voice signal corresponding to each audio collection region.

In this embodiment, because beamforming may not completely eliminate audio signals from other directions, the audio signal obtained for each audio collection region may still contain audio from other directions, that is, audio emitted by sound sources in other audio collection regions. In this step, after the audio signal corresponding to each audio collection region is obtained, interference suppression is applied to the multiple audio signals: the parts of each region's audio signal that correspond to other audio collection regions are removed, yielding a cleaner voice signal for the audio collection region corresponding to that audio signal.

In addition, beamforming and interference suppression may be performed by a digital signal processing (DSP) module on the speech recognition device or by a standalone DSP chip; this embodiment imposes no specific limitation here.

Step S104: perform speech recognition on the voice signal corresponding to each audio collection region to obtain the speech recognition result corresponding to each audio collection region.

After the voice signal corresponding to each audio collection region is obtained, speech recognition is performed on each of these voice signals separately to obtain the recognition result for the voice signal of each audio collection region.

In addition, the speech recognition in this step may be performed by the DSP module on the speech recognition device or by a standalone speech recognition engine; this embodiment imposes no specific limitation here.
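The four steps S101 to S104 can be sketched as one function that chains pluggable stages. This is only an outline under stated assumptions: the name `run_pipeline`, the region identifiers, and the three stage callables are hypothetical, and the stubs in the usage lines stand in for real beamforming, interference suppression, and recognition.

```python
from typing import Callable, Dict, List

Signal = List[float]


def run_pipeline(
    raw: Dict[str, List[Signal]],
    beamform: Callable[[str, List[Signal]], Signal],
    suppress: Callable[[Dict[str, Signal]], Dict[str, Signal]],
    recognize: Callable[[Signal], str],
) -> Dict[str, str]:
    """Map each audio collection region to its recognition result."""
    # S101: raw maps a region identifier to the channels of its array.
    # S102: beamform each array's audio toward its own region.
    beams = {region: beamform(region, chans) for region, chans in raw.items()}
    # S103: interference suppression sees all regions at once, because it
    # removes from each signal what belongs to the other regions.
    voices = suppress(beams)
    # S104: recognize each region's voice signal separately.
    return {region: recognize(sig) for region, sig in voices.items()}


# Usage with trivial stand-in stages (assumptions, not real algorithms).
raw = {
    "driver": [[1.0, 2.0], [3.0, 4.0]],
    "rear_left": [[0.0, 0.0], [0.0, 0.0]],
}
mean_beam = lambda region, chans: [sum(v) / len(v) for v in zip(*chans)]
identity = lambda beams: dict(beams)
label = lambda sig: "speech" if any(sig) else "silence"
results = run_pipeline(raw, mean_beam, identity, label)
```

Keeping the stages as parameters mirrors the text, which leaves open whether a DSP module, a standalone DSP chip, or a separate recognition engine performs each stage.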

In this embodiment of the present invention, audio data captured by multiple microphone arrays is received, where each microphone array points to one audio collection region in the vehicle and captures one channel of audio data. Beamforming is performed on each channel of audio data according to the position of each microphone array relative to its corresponding audio collection region, which yields the audio signal in each channel that corresponds to that region and attenuates the audio signals in that channel arriving from other directions, thereby suppressing audio from other directions. Interference suppression is then performed on the multiple audio signals to obtain the voice signal corresponding to each audio collection region, which further reduces the interference that noise from other audio collection regions causes to each voice signal and yields a cleaner voice signal for each region. Speech recognition is performed on the voice signal corresponding to each audio collection region to obtain the speech recognition result for each region. As a result, whichever audio collection region of the vehicle a sound source is in, a corresponding microphone array can capture its audio data accurately and an accurate speech recognition result can be obtained, which improves the recognition rate. In addition, when several people speak at the same time from different positions, mutual interference between the voice signals can be suppressed and the speech recognition result for each audio collection position can be identified, which greatly improves the efficiency and accuracy of speech recognition.

Embodiment two

Fig. 2 is a flowchart of the multi-channel speech recognition method provided by Embodiment two of the present invention. On the basis of Embodiment one above, in this embodiment, before beamforming is performed on each channel of audio data according to the position of each microphone array relative to its corresponding audio collection region to obtain the audio signal corresponding to that region, the method further includes: obtaining the position of each microphone array relative to its corresponding audio collection region. After speech recognition is performed on the voice signal corresponding to each audio collection region to obtain the corresponding recognition result, the method further includes: calculating the average energy amplitude of the voice signal corresponding to each audio collection region; and removing the recognition results of voice signals whose average energy amplitude is less than a preset threshold. As shown in Fig. 2, the specific steps of the method are as follows:

Step S201: receive audio data captured by multiple microphone arrays, where each microphone array points to one audio collection region in the vehicle and is used to capture one channel of audio data.

In this embodiment, each microphone array is installed close to its corresponding audio collection region; the specific installation position of the microphone arrays is not limited.

For example, for a four-seat vehicle, microphone arrays that each point to one of the four seats and capture audio data from the sound source on that seat may be installed; the four microphone arrays may be mounted on the roof above the four seats respectively.

Step S202: obtain the position of each microphone array relative to its corresponding audio collection region.

The position of each microphone array relative to its corresponding audio collection region includes the angular range and the distance range of the microphone array relative to that region.

In one application scenario of this embodiment, a technician may use beamforming technology to preset the position of each microphone array relative to its corresponding audio collection region; once the microphone arrays are installed, the position of each array relative to its region is fixed.

One feasible implementation of this step is as follows:

The speech recognition device may obtain the preset position of each microphone array relative to its corresponding audio collection region.

Optionally, the preset position of each microphone array relative to its corresponding audio collection region may be stored in advance in the vehicle-mounted terminal device of the vehicle, and the speech recognition device may obtain from the vehicle-mounted terminal device the position of each microphone array on the vehicle relative to its corresponding audio collection region.

In this embodiment, to capture the voice signal of a sound source more accurately, after the microphone arrays are installed, a technician may sit in each seat of the vehicle in turn and emit positioning audio, so as to obtain the position of each microphone array relative to its corresponding audio collection region. This may be implemented as follows:

For any one microphone array: receive the positioning audio, captured by that array, that is emitted by a sound source in its corresponding audio collection region; perform sound source localization on the positioning audio to calculate the position of the sound source relative to that array; and use that position as the position of the array relative to its corresponding audio collection region, thereby calibrating the position of the array relative to its region.

For example, for a four-seat vehicle, microphone arrays that each point to one of the four seats and capture audio data from the sound source on that seat are installed in the vehicle. When the sound source on one of the seats emits sound, the microphone array corresponding to that seat captures the audio data, and the speech recognition device can use sound source localization to determine the position of the sound source relative to that microphone array and take it as the position of the array relative to the audio collection region corresponding to that seat.
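The patent leaves the sound source localization technique open. One common approach, assumed here for illustration only, estimates the time difference of arrival between two microphones of an array by cross-correlation and converts it to an arrival angle under a far-field model. The function names, the two-microphone simplification, and the far-field assumption are all hypothetical; inside a car cabin a near-field model may be more appropriate.

```python
import math
from typing import Sequence


def estimate_lag(ref: Sequence[float], other: Sequence[float],
                 max_lag: int) -> int:
    """Return the lag in samples at which `other` best matches `ref`.

    A positive result means the sound reaches `other` later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation of ref with other shifted by `lag` samples.
        score = 0.0
        for t, x in enumerate(ref):
            u = t + lag
            if 0 <= u < len(other):
                score += x * other[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag: int, sample_rate: float, mic_spacing_m: float,
                      speed_of_sound: float = 343.0) -> float:
    """Far-field arrival angle relative to broadside for one mic pair."""
    s = speed_of_sound * lag / (sample_rate * mic_spacing_m)
    s = max(-1.0, min(1.0, s))  # clamp against rounding and noise
    return math.degrees(math.asin(s))


# The same burst arrives two samples later at the second microphone.
mic_a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
lag = estimate_lag(mic_a, mic_b, max_lag=3)
angle = arrival_angle_deg(lag, sample_rate=16000.0, mic_spacing_m=0.1)
```

An angle obtained this way, together with a distance estimate, would give the calibrated position of the array relative to its audio collection region described above.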

In this embodiment, the speech recognition device may obtain the position of each microphone array relative to each audio collection region. After sound source localization has determined the position of a sound source relative to a microphone array, the device can determine, from the preset position of each microphone array relative to its corresponding audio collection region, whether the sound source lies within the audio collection region corresponding to that microphone array.
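Since the position is described as an angular range plus a distance range, the membership check can be sketched as a simple interval test. The class name `RegionPosition`, its fields, and the numeric values below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionPosition:
    """Angular and distance range of a region, as seen from its array."""
    min_angle_deg: float
    max_angle_deg: float
    min_distance_m: float
    max_distance_m: float

    def contains(self, angle_deg: float, distance_m: float) -> bool:
        # A localized source belongs to the region only if both its
        # angle and its distance fall inside the stored ranges.
        return (self.min_angle_deg <= angle_deg <= self.max_angle_deg
                and self.min_distance_m <= distance_m <= self.max_distance_m)


# Illustrative values only; real ranges depend on the cabin layout.
driver_region = RegionPosition(-30.0, 30.0, 0.2, 0.8)
in_region = driver_region.contains(angle_deg=10.0, distance_m=0.5)
```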

In another application scenario of this embodiment, when a person in the vehicle wants to control the vehicle by voice, the vehicle's speech recognition function usually has to be woken up first with a preset wake word. After recognizing the wake word, the speech recognition device may use the audio of the wake word as the positioning audio, perform sound source localization on it, and calculate the position of its sound source relative to the microphone array. If the sound source is determined to lie within the audio collection region corresponding to that microphone array, the position of the sound source relative to the array is used, for the current speech recognition process, as the position of the array relative to its corresponding audio collection region. Beamforming each channel of audio data on this basis yields a more accurate audio signal and can improve recognition accuracy for the audio data uttered by that person.

In this embodiment, obtaining the position of each microphone array relative to its corresponding audio collection region in this step may be performed the first time speech recognition is carried out after the speech recognition device is powered on. The position of each microphone array relative to its corresponding audio collection region is then stored and can be read and used directly in subsequent speech recognition, which improves the efficiency of speech recognition.

Optionally, each time speech recognition is performed, an audio segment of a preset time period may be cut from the audio data captured by each microphone array and used as the positioning audio to update, for the current speech recognition process, the position of that microphone array relative to its corresponding audio collection region. Beamforming each channel of audio data on this basis yields a more accurate audio signal and can improve recognition accuracy for the audio data uttered by that person. The preset time period may be a period at the start or at the end of the audio data, and may be set by a technician according to the actual application scenario and experience; this embodiment imposes no specific limitation here.
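Cutting the positioning segment out of the captured audio can be sketched as a slice from the start or the end of the sample buffer. The function name `positioning_segment` and its parameters are hypothetical; the duration stands for the preset time period, whose value the text leaves to the technician.

```python
from typing import List, Sequence


def positioning_segment(samples: Sequence[float], sample_rate: float,
                        duration_s: float,
                        from_end: bool = False) -> List[float]:
    """Return the first or last `duration_s` seconds of `samples`."""
    wanted = max(0, int(round(duration_s * sample_rate)))
    n = min(len(samples), wanted)  # never ask for more than is available
    if n == 0:
        return []
    return list(samples[-n:]) if from_end else list(samples[:n])


audio = [float(i) for i in range(10)]  # 1 second at a toy rate of 10 Hz
head = positioning_segment(audio, sample_rate=10.0, duration_s=0.3)
tail = positioning_segment(audio, sample_rate=10.0, duration_s=0.3,
                           from_end=True)
```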

Step S203: according to the position of each microphone array relative to its corresponding audio collection region, perform beamforming on each channel of audio data to obtain the audio signal in each channel that corresponds to the corresponding audio collection region.

In this step, after the multiple channels of audio data are received, beamforming is applied to the audio data captured by each microphone array according to the position of that array relative to the audio collection region it points to. This yields the audio signal in that channel that corresponds to its audio collection region and attenuates the audio signals in that channel arriving from other directions, thereby suppressing audio from other directions.

Step S204: perform interference suppression on the multiple audio signals to obtain the voice signal corresponding to each audio collection region.

Specifically, interference suppression on the multiple audio signals to obtain the voice signal corresponding to each audio collection region may be implemented as follows:

Take each audio signal in turn as the target audio and perform sound source localization on it to determine its sound source position. According to the sound source position of the target audio, determine whether the target audio contains audio signals emitted by sound sources in other audio collection regions. If it does, remove the audio signals corresponding to the other audio collection regions from the target audio to obtain the voice signal for the audio collection region corresponding to the target audio.

If the target audio does not contain audio signals emitted by sound sources in other audio collection regions, the target audio can be used directly as the voice signal for its corresponding audio collection region.

After the sound source position of the target audio is determined, if the target audio corresponds to multiple sound sources, the position of each sound source relative to the microphone array corresponding to the target audio can be determined. From the position of each microphone array relative to each audio collection region, the audio collection region in which each sound source lies can then be determined, and it can be judged whether any of the sound sources lies in another audio collection region, and therefore whether the target audio contains audio signals emitted by sound sources in other audio collection regions.

For example, suppose two people in the driver's seat and the front passenger seat utter a first voice command and a second voice command respectively. The audio data captured by the first microphone array, which corresponds to the driver's seat, and by the second microphone array, which corresponds to the front passenger seat, may then both contain information from the two commands. Suppose that after beamforming, the first audio signal obtained for the driver's seat still contains part of the signal of the second voice command. Sound source localization on the first audio signal can determine that there are two sound sources and give their positions relative to the first microphone array. Combined with the position of each microphone array relative to each audio collection region, it can be determined that the two sound sources lie in the audio collection regions of the driver's seat and the front passenger seat respectively. It can therefore be concluded that the first audio signal contains an audio signal corresponding to another audio collection region. According to the attribute parameters of the second audio signal, which corresponds to the front passenger seat, the second audio signal is removed from the first audio signal to obtain the voice signal corresponding to the first audio signal, that is, the voice signal for the driver's seat. The second audio signal for the front passenger seat can be processed in the same way to obtain the voice signal for the front passenger seat.
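The text does not say how the second audio signal is removed from the first, nor what its attribute parameters are. As a stand-in, the sketch below assumes the leaked component is a scaled copy of the other region's signal and subtracts its least-squares projection. The function name `remove_interference` and the toy signals are hypothetical, and a practical system would more likely use an adaptive filter that also handles delay and filtering of the leaked signal.

```python
from typing import List, Sequence


def remove_interference(target: Sequence[float],
                        reference: Sequence[float]) -> List[float]:
    """Subtract from `target` the part that is a scaled copy of `reference`.

    The gain is the least-squares fit of `reference` to `target`, so any
    part of the wanted voice that happens to correlate with `reference`
    is removed as well; this is a known limit of the simplification.
    """
    n = min(len(target), len(reference))
    tgt, ref = list(target[:n]), list(reference[:n])
    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        return tgt  # silent reference: nothing to remove
    gain = sum(t * r for t, r in zip(tgt, ref)) / ref_energy
    return [t - gain * r for t, r in zip(tgt, ref)]


# Toy signals: the driver's voice plus half of the passenger's signal.
driver_voice = [1.0, -1.0, 1.0, -1.0]
passenger_signal = [1.0, 1.0, -1.0, -1.0]
first_audio = [d + 0.5 * p for d, p in zip(driver_voice, passenger_signal)]
cleaned = remove_interference(first_audio, passenger_signal)
```

In this toy case the two voices are uncorrelated, so the subtraction recovers the driver's voice exactly; real speech signals would only be approximately separated.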

Step S205: perform speech recognition in parallel on the voice signals corresponding to the audio collection regions to obtain the speech recognition result corresponding to each audio collection region.

In this embodiment, after the voice signal corresponding to each audio collection region is obtained, speech recognition may be performed in parallel on these voice signals to obtain the recognition result for the voice signal of each audio collection region.

Specifically, the voice signal corresponding to each audio collection region may be fed into its own speech recognition module, so that speech recognition is performed in parallel on the voice signals of all audio collection regions. This yields the recognition result for the voice signal of each audio collection region and can greatly improve the efficiency of speech recognition.
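Running one recognizer per region in parallel can be sketched with a thread pool from the Python standard library. The `recognize` callable is a hypothetical placeholder for whatever speech recognition module is used; the function name `recognize_in_parallel` and the region identifiers are likewise assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def recognize_in_parallel(
    voices: Dict[str, List[float]],
    recognize: Callable[[List[float]], str],
) -> Dict[str, str]:
    """Recognize every region's voice signal concurrently."""
    if not voices:
        return {}
    # One worker per region, so no region waits for another to finish.
    with ThreadPoolExecutor(max_workers=len(voices)) as pool:
        futures = {region: pool.submit(recognize, signal)
                   for region, signal in voices.items()}
        return {region: future.result()
                for region, future in futures.items()}


# Stand-in recognizer that only reports the signal length.
fake_recognize = lambda signal: f"{len(signal)} samples"
parallel_results = recognize_in_parallel(
    {"driver": [0.1, 0.2, 0.3], "front_passenger": [0.4]},
    fake_recognize,
)
```

Threads suit recognizers that wait on a DSP, a native library, or a remote service; a CPU-bound pure-Python recognizer would need processes instead to gain real parallelism.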

In the present embodiment, after identifying the recognition result of voice signal in each audio collection region, it can also walk Rapid S206 and S207 removes the null result in speech recognition result to checking treatment, screening is carried out in speech recognition result, with Improve the accuracy of speech recognition.

Step S206, the average energy amplitude of the corresponding voice signal in each audio collection region is calculated.

In the present embodiment, the average energy amplitude of the corresponding voice signal in audio collection region is calculated, can be used existing The method that any voice signal average energy amplitude is calculated in technology realizes that details are not described herein again for the present embodiment.

Step S207: remove the recognition results corresponding to voice signals whose average energy amplitude is less than a preset threshold.

After the average energy amplitude of the voice signal corresponding to each audio collection region has been calculated, the average energy amplitude of each voice signal is compared with the preset threshold. The speech recognition result of a voice signal whose average energy amplitude is less than the preset threshold is treated as an invalid recognition result, and the speech recognition result of a voice signal whose average energy amplitude is greater than or equal to the preset threshold is treated as a valid recognition result. The speech recognition results obtained in step S205 are screened accordingly: the invalid recognition results are removed, and the remaining results form the final speech recognition result.

The preset threshold can be set by a technician according to the actual application scenario and experience; this embodiment does not specifically limit it.
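The screening in step S207 can be sketched as a simple filter. The function name `screen_results`, the region names, the energy values and the threshold of 0.05 are all hypothetical and serve only to show the comparison rule (results are kept when the energy is greater than or equal to the threshold).

```python
def screen_results(results, energies, threshold):
    """Keep only results whose signal energy is at or above the threshold.

    results maps region name -> recognition result (output of step S205).
    energies maps region name -> average energy amplitude (output of step S206).
    threshold is the preset threshold; its value here is illustrative only.
    """
    return {
        region: text
        for region, text in results.items()
        if energies[region] >= threshold
    }


# Made-up values: the front passenger channel has very low energy.
results = {"driver": "open the window", "front_passenger": "open the widow"}
energies = {"driver": 0.42, "front_passenger": 0.003}
final = screen_results(results, energies, threshold=0.05)
```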

For example, after the person in the driver's seat speaks a phrase to be recognized, the microphone array corresponding to the front passenger seat also collects audio data, and the speech recognition equipment produces a speech recognition result for that channel as well. The two speech recognition results, for the driver's seat and the front passenger seat, should be identical. However, after beam forming and interference suppression, the energy amplitude of the voice signal corresponding to the front passenger seat is very small. If the energy amplitude of the voice signal corresponding to the front passenger seat is less than the preset threshold, the speech recognition result for the front passenger channel is likely to be erroneous; that result can be discarded and the speech recognition result for the driver's channel retained, thereby improving the accuracy of speech recognition.

In this embodiment of the present invention, before beam forming is performed on each channel of audio data according to the position of each microphone array relative to its corresponding audio collection region to obtain the audio signal corresponding to that region, the position of each microphone array relative to its corresponding audio collection region is calibrated, so that the audio signal obtained by beam forming is more accurate. By performing speech recognition concurrently on the voice signals corresponding to the audio collection regions, the speech recognition result for each audio collection region is obtained and the efficiency of speech recognition is further improved. Furthermore, by calculating the average energy amplitude of the voice signal corresponding to each audio collection region and removing the recognition results of voice signals whose average energy amplitude is less than the preset threshold, a secondary verification of the speech recognition results is completed and the invalid recognition results are removed, improving the accuracy of speech recognition.

Embodiment three

Fig. 3 is a schematic structural diagram of the multi-path voice recognition device provided by Embodiment three of the present invention. The multi-path voice recognition device provided by this embodiment can execute the processing flow provided by the multi-path voice recognition method embodiments. As shown in Fig. 3, the multi-path voice recognition device 30 includes: a data acquisition module 301, a beam forming module 302, an interference suppression processing module 303 and a speech recognition module 304.

Specifically, the data acquisition module 301 is configured to receive audio data collected by multichannel microphone arrays, where each microphone array is directed at one audio collection region in the vehicle and is used to collect one channel of audio data.

The beam forming module 302 is configured to perform beam forming on each channel of audio data according to the position of each microphone array relative to its corresponding audio collection region, to obtain the audio signal in each channel of audio data that corresponds to the corresponding audio collection region.

The interference suppression processing module 303 is configured to perform interference suppression on the multiple channels of audio signals, to obtain the voice signal corresponding to each audio collection region.

The speech recognition module 304 is configured to perform speech recognition on the voice signal corresponding to each audio collection region, to obtain the speech recognition result corresponding to each audio collection region.

The device provided by this embodiment of the present invention can be specifically used to execute the method embodiment provided by Embodiment one above; its specific functions are not repeated here.
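The way modules 301 to 304 hand data to one another can be sketched as a small pipeline. This is only an assumed wiring for illustration: the class name `MultiPathRecognizer`, the callable signatures, and the placeholder stages (identity processing and a dummy recognizer) are hypothetical and do not implement real beam forming, interference suppression or recognition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signal = List[float]


@dataclass
class MultiPathRecognizer:
    """Illustrative wiring of modules 302-304; module 301 is the run() input."""
    beamform: Callable[[str, Signal], Signal]                   # module 302
    suppress: Callable[[Dict[str, Signal]], Dict[str, Signal]]  # module 303
    recognize: Callable[[Signal], str]                          # module 304

    def run(self, audio_by_region: Dict[str, Signal]) -> Dict[str, str]:
        # Module 301: audio_by_region is the received multichannel audio data.
        beamformed = {r: self.beamform(r, a) for r, a in audio_by_region.items()}
        # Module 303 sees all channels at once so it can compare them.
        voices = self.suppress(beamformed)
        return {r: self.recognize(v) for r, v in voices.items()}


# Placeholder stages (assumptions): identity processing and a dummy recognizer.
device = MultiPathRecognizer(
    beamform=lambda region, audio: audio,
    suppress=lambda signals: signals,
    recognize=lambda voice: f"{len(voice)} samples",
)
output = device.run({"driver": [0.1, 0.2], "rear_left": [0.0]})
```

Beam forming is applied per channel, while the suppression stage receives every channel together, which mirrors the fact that it needs the other channels to decide what counts as interference.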

Embodiment four

On the basis of Embodiment three above, in this embodiment the speech recognition module is further configured to:

Calculate the average energy amplitude of the voice signal corresponding to each audio collection region; and remove the recognition results corresponding to voice signals whose average energy amplitude is less than a preset threshold.

Optionally, the speech recognition module is further configured to:

Perform speech recognition concurrently on the voice signals corresponding to the audio collection regions, to obtain the speech recognition result corresponding to each audio collection region.

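Optionally, the interference suppression processing module is further configured to:

Take each channel of audio signal in turn as the target audio, and perform sound source localization on the target audio to determine the sound source position of the target audio; judge, according to the sound source position of the target audio, whether the target audio contains an audio signal emitted by a sound source in another audio collection region; and, if it does, remove the audio signal corresponding to the other audio collection region from the target audio, to obtain the voice signal of the audio collection region corresponding to the target audio.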

Optionally, the data acquisition module is further configured to:

Obtain the position of each microphone array relative to its corresponding audio collection region.

Optionally, the data acquisition module is further configured to:

For any one microphone array, receive the positioning audio that is emitted by a sound source in the corresponding audio collection region and collected by that microphone array; perform sound source localization on the positioning audio to calculate the position of its sound source relative to that microphone array; and take the position of the sound source of the positioning audio relative to that microphone array as the position of that microphone array relative to its corresponding audio collection region.
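The text does not specify a sound source localization algorithm. As a hedged illustration of the calibration idea, the sketch below estimates only the arrival angle of the positioning audio from the time difference of arrival between two microphones of one array, using a brute-force cross-correlation. The two-microphone far-field model, the function names, the sample rate and the microphone spacing are all assumptions; distance estimation is not covered.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def best_lag(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means the sound reached `other` later than `ref`.
    Uses a brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    def score(lag):
        return sum(
            ref[i] * other[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(other)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def arrival_angle(ref, other, sample_rate, mic_spacing):
    """Estimate the far-field arrival angle in degrees from two microphones.

    0 degrees is broadside (directly in front of the pair); a positive
    angle means the source is on the side of the `ref` microphone.
    """
    max_lag = int(mic_spacing / SPEED_OF_SOUND * sample_rate) + 1
    delay = best_lag(ref, other, max_lag) / sample_rate
    # Clamp to the valid asin domain in case of a noisy delay estimate.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


# Positioning audio modelled as a single pulse that arrives 2 samples later
# at the second microphone (illustrative data only).
mic_a = [0, 0, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 0, 0, 0]
angle = arrival_angle(mic_a, mic_b, sample_rate=16000, mic_spacing=0.1)
```

An angle estimated in this way from a known seat position could then be stored as part of that array's calibrated position relative to its audio collection region.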

Optionally, the data acquisition module is further configured to:

Obtain a preset position of each microphone array relative to its corresponding audio collection region.

Optionally, the position of each microphone array relative to its corresponding audio collection region includes:

The angular range and the distance range of each microphone array relative to its corresponding audio collection region.
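A position expressed as an angular range plus a distance range can be represented with a small data structure. The sketch below is an assumed representation: the class name `RegionPosition`, its field names, the units and the numeric values are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionPosition:
    """Position of an audio collection region relative to one microphone array."""
    min_angle_deg: float
    max_angle_deg: float
    min_distance_m: float
    max_distance_m: float

    def contains(self, angle_deg: float, distance_m: float) -> bool:
        """Return True if a located sound source falls inside this region."""
        return (
            self.min_angle_deg <= angle_deg <= self.max_angle_deg
            and self.min_distance_m <= distance_m <= self.max_distance_m
        )


# Illustrative numbers only; real ranges depend on the vehicle cabin layout.
driver_region = RegionPosition(-30.0, 30.0, 0.3, 0.9)
```

A membership test of this kind is one way a located sound source could be attributed to its own region or to a neighbouring one.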

In this embodiment, the audio collection regions in the vehicle correspond one-to-one to the seats in the vehicle.

The device provided by this embodiment of the present invention can be specifically used to execute the method embodiment provided by Embodiment two above; its specific functions are not repeated here.

Embodiment five

Fig. 4 is a schematic structural diagram of the multi-path voice recognition equipment provided by Embodiment five of the present invention. As shown in Fig. 4, the equipment 40 includes: a processor 401, a memory 402, and a computer program stored in the memory 402 and executable by the processor 401.

When executing the computer program stored in the memory 402, the processor 401 implements the multi-path voice recognition method provided by any of the above method embodiments.

In addition, an embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the multi-path voice recognition method provided by any of the above method embodiments.

In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.

Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the following claims.

It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims

1. A multi-path voice recognition method, characterized by comprising:

Receiving audio data collected by multichannel microphone arrays, wherein each microphone array is directed at one audio collection region in a vehicle and is used to collect one channel of audio data;

Performing beam forming on each channel of the audio data according to the position of each microphone array relative to its corresponding audio collection region, to obtain the audio signal in each channel of the audio data that corresponds to the corresponding audio collection region;

Performing interference suppression on the multiple channels of the audio signals, to obtain the voice signal corresponding to each audio collection region;

Performing speech recognition on the voice signal corresponding to each audio collection region, to obtain the speech recognition result corresponding to each audio collection region.

2. The method according to claim 1, characterized in that the performing speech recognition on the voice signal corresponding to each audio collection region, to obtain the speech recognition result corresponding to each audio collection region, comprises:

Performing speech recognition concurrently on the voice signals corresponding to the audio collection regions, to obtain the speech recognition result corresponding to each audio collection region.

3. The method according to claim 1, characterized in that the performing interference suppression on the multiple channels of the audio signals, to obtain the voice signal corresponding to each audio collection region, comprises:

Taking each channel of the audio signal in turn as target audio, and performing sound source localization on the target audio to determine the sound source position of the target audio;

Judging, according to the sound source position of the target audio, whether the target audio contains an audio signal emitted by a sound source in another audio collection region;

If the target audio contains an audio signal emitted by a sound source in another audio collection region, removing the audio signal corresponding to the other audio collection region from the target audio, to obtain the voice signal of the audio collection region corresponding to the target audio.

4. The method according to claim 1, characterized in that, before the performing beam forming on each channel of the audio data according to the position of each microphone array relative to its corresponding audio collection region, to obtain the audio signal in each channel of the audio data that corresponds to the corresponding audio collection region, the method further comprises:

Obtaining the position of each microphone array relative to its corresponding audio collection region.

5. The method according to claim 4, characterized in that the obtaining the position of each microphone array relative to its corresponding audio collection region comprises:

For any one microphone array, receiving the positioning audio that is emitted by a sound source in the corresponding audio collection region and collected by that microphone array;

Performing sound source localization on the positioning audio, to calculate the position of the sound source of the positioning audio relative to that microphone array;

Taking the position of the sound source of the positioning audio relative to that microphone array as the position of that microphone array relative to its corresponding audio collection region.

6. The method according to claim 4, characterized in that the obtaining the position of each microphone array relative to its corresponding audio collection region comprises:

Obtaining a preset position of each microphone array relative to its corresponding audio collection region.

7. The method according to claim 4, characterized in that the position of each microphone array relative to its corresponding audio collection region comprises:

The angular range and the distance range of each microphone array relative to its corresponding audio collection region.

8. The method according to claim 1, characterized in that, after the performing speech recognition on the voice signal corresponding to each audio collection region, to obtain the speech recognition result corresponding to each audio collection region, the method further comprises:

Calculating the average energy amplitude of the voice signal corresponding to each audio collection region;

Removing the recognition results corresponding to voice signals whose average energy amplitude is less than a preset threshold.

9. The method according to any one of claims 1-8, characterized in that the audio collection regions in the vehicle correspond one-to-one to the seats in the vehicle.

10. A multi-path voice recognition device, characterized by comprising:

A data acquisition module, configured to receive audio data collected by multichannel microphone arrays, wherein each microphone array is directed at one audio collection region in a vehicle and is used to collect one channel of audio data;

A beam forming module, configured to perform beam forming on each channel of the audio data according to the position of each microphone array relative to its corresponding audio collection region, to obtain the audio signal in each channel of the audio data that corresponds to the corresponding audio collection region;

An interference suppression processing module, configured to perform interference suppression on the multiple channels of the audio signals, to obtain the voice signal corresponding to each audio collection region;

A speech recognition module, configured to perform speech recognition on the voice signal corresponding to each audio collection region, to obtain the speech recognition result corresponding to each audio collection region.

11. Multi-path voice recognition equipment, characterized by comprising:

A memory, a processor, and a computer program stored in the memory and executable on the processor,

Wherein the processor implements the method according to any one of claims 1-9 when running the computer program.

12. A computer-readable storage medium, characterized in that it stores a computer program,

Wherein the computer program implements the method according to any one of claims 1-9 when executed by a processor.