CN109409353A - Vehicle whistle monitoring method and system based on DCNN target identification - Google Patents
- Publication number
- CN109409353A CN109409353A CN201710713579.XA CN201710713579A CN109409353A CN 109409353 A CN109409353 A CN 109409353A CN 201710713579 A CN201710713579 A CN 201710713579A CN 109409353 A CN109409353 A CN 109409353A
- Authority
- CN
- China
- Prior art keywords
- whistle
- acoustic pressure
- frame
- target
- control unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A vehicle whistle monitoring method and system based on DCNN target identification, comprising: a control unit, a microphone array, and a high-definition image acquisition and processing device, wherein: the microphone array and the high-definition image acquisition and processing device respectively acquire the whistle acoustic signal, a video stream, and still images in real time and output them to the control unit; the control unit obtains an acoustic pressure cloud map from the whistle acoustic signal using a beamforming algorithm and superimposes it on the video stream, obtaining a video stream with the acoustic pressure coverage area, and extracts the regions of interest of all targets from the still image; finally, the overlap ratio between the acoustic pressure coverage area and the regions of interest is calculated, and the number plate information of the target is further extracted. The present invention is efficient and reliable and can effectively prevent misjudgment.
Description
Technical field
The present invention relates to a technology in the field of intelligent traffic management, and specifically to a vehicle whistle monitoring method and system based on deep convolutional neural network (DCNN) target identification.
Background technique
Current motor-vehicle whistle capture and evidence-collection systems based on microphone-array sound source localization are mostly 16-element analog microphone arrays in a 4x4 arrangement, supplemented by single-picture capture. Such systems have various shortcomings. The microphone array and the optical camera must be installed separately, about 20 meters apart, so practical use requires erecting a new dedicated police pole, making deployment and maintenance relatively difficult. Although a single still image can show the appearance and plate of the offending vehicle, it cannot show the behavior of the offense as it unfolds, and is easily disputed by the offending driver. Most importantly, the prior art relies solely on acoustic localization and therefore only on the time-frequency characteristics of the whistle sound, so it cannot effectively distinguish power-assisted bicycles, motorcycles, and ordinary motor vehicles. Moreover, because sound is subject to reflection and diffraction, and the road environment is complex and constantly changing, the microphone array is prone to localization errors when vehicles cluster together; misjudgments occur easily, especially when two or more motor vehicles queue end to end.
Summary of the invention
In view of the above shortcomings of the prior art, the present invention proposes a vehicle whistle monitoring method and system based on deep convolutional neural network target identification. It fuses the microphone array's localization of the whistle sound with the deep convolutional neural network's analysis and localization of all kinds of targets in the image, uses computer AI to intelligently determine the vehicle that actually honked, and provides its license plate together with a video of the offense overlaid with the acoustic pressure cloud map, forming a chain of evidence and further improving the accuracy of existing systems.
The present invention is achieved by the following technical solutions:
The present invention relates to a vehicle whistle monitoring method based on deep convolutional neural network target identification: real-time whistle recognition and monitoring obtains the acoustic pressure cloud map at the moment the whistle occurs, and a low-resolution video stream and a high-resolution still image are acquired synchronously at the same moment. The acoustic pressure cloud map is then superimposed on the video stream, and a deep convolutional neural network identifies the still image to obtain the regions of interest (ROI) of all objects in it. Next, an image coordinate projection relationship is obtained through single-frame registration between the low-resolution video and the high-resolution image, yielding the acoustic pressure coverage area projected onto the still image. Finally, the regions of interest are compared with the acoustic pressure coverage area, and the object with the highest overlap ratio is determined to be the honking object.
The single-frame registration is as follows: the control unit runs a registration program, which automatically captures two frames at the same moment (one from the low-resolution camera and one from the high-definition camera), registers them, obtains at least five matched feature control points in the two frames, and computes the coordinate projection relationship from the low-resolution camera to the high-definition camera.
The time difference of this synchronization does not exceed 40 ms.
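The coordinate projection between the two cameras can be illustrated with a least-squares affine fit to the matched control points (the method collects at least five pairs; an affine model needs at least three non-collinear ones). A minimal pure-Python sketch; all function names here are hypothetical, not from the patent:

```python
def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def fit_affine(src, dst):
    # Least-squares fit of x' = a*x + b*y + tx, y' = c*x + d*y + ty
    # from matched control points (at least 3 non-collinear pairs).
    ata = [[0.0] * 3 for _ in range(3)]
    atx, aty = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]   # normal equations A^T A
            atx[i] += row[i] * u
            aty[i] += row[i] * v
    return solve3(ata, atx), solve3(ata, aty)

def project(params, pt):
    # Map a low-resolution-camera point into high-definition-camera coordinates
    (a, b, tx), (c, d, ty) = params
    x, y = pt
    return (a * x + b * y + tx, c * x + d * y + ty)
```

In practice a projective (homography) model may fit better for non-coplanar scenes; the affine form is kept here only to keep the sketch solver small.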
The acoustic pressure cloud map is produced by performing real-time whistle recognition on the acoustic signals acquired by the microphone array; when a whistle is detected, real-time sound source localization is carried out using a beamforming algorithm, generating an acoustic pressure cloud map containing the whistle sound source coordinates.
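For intuition, beamforming-based localization can be sketched in a simplified narrowband, far-field, linear-array form (the patent itself uses a near-field spherical-wave cross-spectral formulation over a 2-D image grid); all parameter values below are illustrative assumptions:

```python
import cmath
import math

C = 343.0                  # speed of sound (m/s)
F = 3000.0                 # assumed horn-like tone (Hz)
K = 2 * math.pi * F / C    # wavenumber

def steered_power(mic_x, theta_src, theta_look):
    # Narrowband delay-and-sum: a far-field source at angle theta_src puts
    # phase K*x*sin(theta_src) on the mic at position x; steering toward
    # theta_look applies the conjugate phase, so the coherent sum peaks
    # when the look direction matches the source direction.
    s = sum(cmath.exp(1j * K * x * (math.sin(theta_look) - math.sin(theta_src)))
            for x in mic_x)
    return abs(s) ** 2

def localize(mic_x, theta_src, grid):
    # Scan candidate directions, return the one with maximum steered power;
    # the coordinate of the maximum plays the role of the whistle source
    # coordinate in the acoustic pressure cloud map.
    return max(grid, key=lambda th: steered_power(mic_x, theta_src, th))
```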
The high-resolution still image is, but is not limited to, an image shot by the high-definition image acquisition and processing device upon triggering, or a single frame extracted from the high-definition video stream according to the trigger signal.
The regions of interest are obtained by a pre-trained deep convolutional neural network algorithm that identifies the type of each moving target in the picture and outputs each moving target's region of interest (ROI) in the picture; preferably, when a target is a motor vehicle, its license plate information is further identified.
The present invention also relates to a vehicle whistle evidence-collection system based on deep convolutional neural network (DCNN) target identification, comprising: a control unit, a microphone array, and a high-definition image acquisition and processing device, wherein: the microphone array and the high-definition image acquisition and processing device respectively acquire the whistle acoustic signal, a video stream, and still images in real time and output them to the control unit; the control unit obtains an acoustic pressure cloud map from the whistle acoustic signal using a beamforming algorithm and superimposes it on the video stream, obtaining a video stream with the acoustic pressure coverage area, and extracts the regions of interest of all targets from the still image; finally, the overlap ratio between the acoustic pressure coverage area and the regions of interest is calculated, and the number plate information of the target is further extracted.
The microphone array comprises: a housing and, arranged in the housing, a sensor board, a low-resolution camera, a digital signal acquisition module, and an interface module.
An image processing module is provided in the control unit. The image processing module is a GPU-accelerated embedded image processing module that runs a pre-trained deep convolutional neural network recognition algorithm, so as to determine the type, coordinates, range, and number plate of each target in the image frame.
Preferably, the vehicle whistle evidence-collection system further comprises an execution unit. The execution unit receives the number plate of the honking vehicle pushed by the control unit and, depending on the configuration, displays the vehicle number plate or the offense picture.
The execution unit is, but is not limited to, a high-brightness LED display or a large intelligent-transportation full-color LED screen.
Technical effect
Compared with the prior art, the present invention can effectively reduce the system's misjudgment rate. It achieves a very high discrimination rate for honking by non-target vehicles, such as power-assisted bicycles and motorcycles. It is also very effective at excluding misjudgments caused by honking outside the monitored range whose sound is reflected into the monitored range.
Description of the drawings
Fig. 1 is a structural schematic diagram of the vehicle whistle evidence-collection system;
Fig. 2 is an installation schematic diagram of the microphone array and the high-definition image acquisition and processing device;
Fig. 3 is an overall installation schematic diagram;
Fig. 4 is a program flow chart of the control unit;
Fig. 5 is a program flow chart of the image processing module;
Fig. 6 is a schematic diagram of the image after superimposing the acoustic pressure cloud map;
Fig. 7 is a schematic diagram of the labeling after the whistle target is identified;
Fig. 8 is a schematic diagram of the final output image;
In the figures: 1 is the high-definition image acquisition and processing device, 2 is the microphone array, 3 is the police pole, 4 is the electronic box;
Fig. 9 is a schematic diagram of the effects in the application examples;
In the figure: A~H are illustrative screenshots from the application examples;
The license numbers and digital content in the drawings of the present invention have been modified and bear no relationship to any actual plate or vehicle.
Specific embodiment
As shown in Figs. 1 to 3, this embodiment comprises: a police pole 3, an electronic box 4, a control unit with an image processing module, an execution unit, a microphone array 2, and a high-definition image acquisition and processing device 1, wherein: the microphone array 2 acquires the whistle acoustic signal and a video stream in real time; the high-definition image acquisition and processing device 1 acquires high-resolution still pictures in real time; the image processing module determines the type and region of every moving target from the high-resolution still pictures acquired by the high-definition image acquisition and processing device 1, and also identifies the number plate when a moving target is a motor vehicle; the control unit obtains the coordinates and range of the whistle target from the whistle acoustic signal acquired by the microphone array 2, combines this with the result of the image processing module to judge whether a vehicle whistle has occurred, and then reports and pushes the corresponding identified number plate to the execution unit, which, depending on the configuration, displays the vehicle number plate or the offense picture.
The microphone array 2 and the high-definition image acquisition and processing device 1 can be, but are not limited to being, mounted on the crossbar of the police pole 3, facing the monitored area.
The microphone array 2 comprises: a housing and, arranged in the housing, a sensor board, a low-resolution camera, a digital signal acquisition module, and a lightning-protection and power-supply interface module.
The high-definition image acquisition and processing device 1 is a standard electronic-police high-definition image acquisition and processing device 1 that provides high-definition video stream output.
The image processing module is a GPU-accelerated embedded image processing module that runs a pre-trained deep convolutional neural network recognition algorithm, so as to determine the type, coordinates, range, and number plate of each target in the image frame.
The control unit is a fanless industrial computer.
The execution unit is a high-brightness LED display or a large intelligent-transportation full-color LED screen.
The microphone array 2 of this embodiment uses 32 channels of digital micromechanical (MEMS) acoustic sensors, treated to be waterproof yet acoustically transparent. The PDM signal sequences output by the sensors are acquired and demodulated by an FPGA-based digital signal acquisition board, yielding 32 channels of synchronously acquired acoustic signals. The low-resolution camera at the center of the microphone array 2 has a maximum resolution of 720P and acquires a real-time low-resolution video stream signal; after the FPGA signal processing board merges it with the 32 channels of acoustic signals, the combined data is sent to the control unit through an Ethernet switch.
The high-definition image acquisition and processing device 1 of this embodiment is a 7-megapixel high-definition camera commonly used by electronic police, configured to output a standard RTSP video stream signal.
The image processing module of this embodiment is a GPU-based embedded system, model NVIDIA Jetson-TX1, on which a pre-trained target identification system based on a deep convolutional neural network runs.
The control unit of this embodiment is a fanless industrial computer with an Intel i7 processor running the Windows Embedded operating system; after power-up, the acquisition and control programs run automatically.
The execution unit of this embodiment uses a 1.2 m x 1.2 m outdoor high-brightness LED display that can show three offending vehicle number plates simultaneously.
Step 1: The control unit performs image registration between the low-resolution camera on the microphone array 2 and the high-definition camera of the high-definition image acquisition and processing device 1.
The image registration is as follows: the control unit runs a registration program, which automatically captures two frames at the same moment (time difference no more than 40 ms), one from the low-resolution camera and one from the high-definition camera, registers them, obtains at least five matched feature control points in the two frames, and computes the coordinate projection relationship from the low-resolution camera to the high-definition camera.
Step 2: As shown in Fig. 4, the control unit performs real-time whistle recognition on the acoustic signals acquired by the microphone array 2. When a whistle is detected, it carries out real-time sound source localization using a beamforming algorithm, generates an acoustic pressure cloud map, obtains the whistle sound source coordinates, superimposes the map on the low-resolution video stream signal to form a video stream with the acoustic pressure cloud map, and sends a trigger signal to the image processing module, as shown in Fig. 6. The specific steps are:
2.1 Using the sound signals acquired in real time by the 32-channel microphone array, judge every 40 ms whether the acquired signal matches the whistle characteristics; when it does, execute step 2.2.
The whistle characteristics are judged using, but not limited to, methods such as the audio spectrum signature of the whistle, its frequency-domain features, and the frequency bin of peak power.
2.2 Perform sound source localization on the honking vehicle using a spherical-wave beamforming algorithm with auto-spectrum (diagonal) removal, and generate the acoustic pressure cloud map; the coordinate corresponding to the maximum acoustic pressure in the cloud map is the whistle sound source coordinate.
The beamforming algorithm is specifically:

$$V(\vec{k},\omega)=\frac{1}{M^{2}-M}\sum_{m=1}^{M}\sum_{\substack{n=1\\ n\neq m}}^{M}C_{nm}(\omega)\,e^{-j\vec{k}\cdot(\vec{r}_m-\vec{r}_n)}$$

where $V(\vec{k},\omega)$ is the beamformed mean-square value, $\vec{k}$ is the focus direction, $\omega$ is the angular frequency, $M$ is the number of sensors, $C_{nm}$ is the cross-spectrum between the sound pressure signal received by microphone $m$ and that received by microphone $n$, $\vec{r}_m$ is the coordinate vector of microphone $m$, and $\vec{r}_n$ is the coordinate vector of microphone $n$.
2.3 Extract a low-resolution image from the low-resolution video stream signal, select the portion of the acoustic pressure cloud map within a 0.2 dB dynamic range, superimpose it on the low-resolution image as a translucent alpha-blended graphic, and obtain the video stream with the acoustic pressure cloud map after video compression.
2.4 The control unit sends a trigger signal, including the trigger time and the whistle sound source coordinates, to the image processing module.
Step 3: As shown in Fig. 5, within 40 ms of receiving the trigger signal, the image processing module extracts a single frame from the RTSP high-definition video stream output by the high-definition camera, identifies the types of the moving targets in the picture, such as trucks, cars, motorcycles, and pedestrians, with a pre-trained deep convolutional neural network algorithm, and outputs the coverage range (ROI) of each moving target in the picture; when a moving target is a motor vehicle, its license plate information is identified, and the processing result is output to the control unit.
Identifying the types of the moving targets in the picture with the pre-trained deep convolutional neural network algorithm specifically includes the following steps:
S1: Perform image normalization on the extracted single frame.
S2: Feed the normalized image into a trained SSD (Single Shot MultiBox Detector) target detection neural network for detection, generating feature maps of multiple different scales.
The SSD target detection neural network uses VGG-16 as its base network.
The training method of the SSD target detection neural network is as follows: using the deep-learning-based SSD target recognition algorithm, the image is discretized into groups of default boxes of different sizes and aspect ratios generated at each feature map point; each default box is scored for its match with each object class, and adjustments to the box are computed to better match the object's shape.
During training, the Pascal VOC2007 and Pascal VOC2012 datasets are used as training samples. First, each ground-truth box is matched to the previously generated default box with the best jaccard overlap; then all default boxes are matched against the ground-truth boxes, and every default box whose jaccard overlap is greater than 0.5 is found.
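The matching rule above can be sketched directly; the box format (corner coordinates) and helper names are illustrative assumptions:

```python
def jaccard(a, b):
    # Boxes as (x1, y1, x2, y2); jaccard overlap = intersection over union
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_defaults(gt_boxes, default_boxes, thr=0.5):
    # SSD-style matching: each ground truth claims its best default box,
    # then every default box with overlap > thr is also matched.
    matched = set()
    for g in gt_boxes:
        matched.add(max(range(len(default_boxes)),
                        key=lambda i: jaccard(default_boxes[i], g)))
    for i, d in enumerate(default_boxes):
        if any(jaccard(d, g) > thr for g in gt_boxes):
            matched.add(i)
    return matched
```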
The training objective function consists of a localization loss function and a confidence loss function, namely:

$$L(x,c,l,g)=\frac{1}{N}\bigl(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\bigr)$$

where $N$ is the number of matched default boxes (the loss is set to 0 when $N$ is 0) and $\alpha$ is set to 1 by cross-validation; $x$ is the match indicator, i.e. $x_{ij}^{p}$ indicates whether the $i$-th default box matches the $j$-th ground-truth box of class $p$; $c$ is the confidence, $l$ is the predicted box, $g$ is the ground-truth box, $L_{loc}(x,l,g)$ is the localization loss function, and $L_{conf}(x,c)$ is the confidence loss function.
The localization loss function is the smooth L1 loss between the predicted box $l$ and the ground-truth box $g$:

$$L_{loc}(x,l,g)=\sum_{i\in Pos}^{N}\sum_{m\in\{cx,cy,w,h\}}x_{ij}^{k}\,\mathrm{smooth}_{L1}\bigl(l_{i}^{m}-\hat{g}_{j}^{m}\bigr)$$

with

$$\hat{g}_{j}^{cx}=\frac{g_{j}^{cx}-d_{i}^{cx}}{d_{i}^{w}},\quad \hat{g}_{j}^{cy}=\frac{g_{j}^{cy}-d_{i}^{cy}}{d_{i}^{h}},\quad \hat{g}_{j}^{w}=\log\frac{g_{j}^{w}}{d_{i}^{w}},\quad \hat{g}_{j}^{h}=\log\frac{g_{j}^{h}}{d_{i}^{h}}$$

where $(cx,cy)$ is the center of the default box, $w$ is the width of the default box, $h$ is the height of the default box, and $d$ is the default box. The function regresses compensations to the default box's center and to its width and height.
The confidence loss function $L_{conf}(x,c)$ is the softmax loss over the class confidences $c$:

$$L_{conf}(x,c)=-\sum_{i\in Pos}^{N}x_{ij}^{p}\log(\hat{c}_{i}^{p})-\sum_{i\in Neg}\log(\hat{c}_{i}^{0}),\qquad \hat{c}_{i}^{p}=\frac{\exp(c_{i}^{p})}{\sum_{p}\exp(c_{i}^{p})}$$
The base of the detection network uses a standard image-classification architecture, after which convolutional feature layers of multiple progressively decreasing scales are added, generating feature maps of multiple different scales.
S3: For each feature-map cell in every feature layer obtained in S2, generate a group of default boxes of different sizes and aspect ratios; different aspect ratios are applied to different default boxes, so that each point in each feature map generates six default boxes.
The scale $s_k$ of the default boxes is computed as:

$$s_k=s_{min}+\frac{s_{max}-s_{min}}{m-1}(k-1),\quad k\in[1,m]$$

where $m$ is the number of feature maps, $s_{min}$ is the scale for the lowest feature map, and $s_{max}$ is the scale for the highest feature map.
The aspect ratios of the default boxes are $a_r\in\{1,2,3,\tfrac{1}{2},\tfrac{1}{3}\}$; the width of each default box is $s_k\sqrt{a_r}$ and its height is $s_k/\sqrt{a_r}$. For the default box with aspect ratio $a_r=1$, an additional default box with scale $s_k'=\sqrt{s_k s_{k+1}}$ is added.
The centers of the default boxes are:

$$\left(\frac{i+0.5}{|f_k|},\frac{j+0.5}{|f_k|}\right)$$

where $|f_k|$ is the size of the $k$-th feature map and $i,j\in[0,|f_k|)$.
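The scale, aspect-ratio, and center formulas can be sketched directly; the $s_{min}$, $s_{max}$, and feature-map size used in the example are illustrative:

```python
import math

def default_box_scales(m, s_min=0.2, s_max=0.9):
    # Linear scale schedule across m feature maps: s_k for k = 1..m
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def default_boxes_for_cell(i, j, fk, sk, sk_next,
                           ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    # Six boxes per cell: five aspect ratios, plus the extra box of scale
    # sqrt(sk * sk_next) for aspect ratio 1. Boxes as (cx, cy, w, h) in
    # normalized image coordinates.
    cx, cy = (i + 0.5) / fk, (j + 0.5) / fk
    boxes = [(cx, cy, sk * math.sqrt(ar), sk / math.sqrt(ar)) for ar in ratios]
    s_extra = math.sqrt(sk * sk_next)
    boxes.append((cx, cy, s_extra, s_extra))
    return boxes
```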
In practical applications, the default box settings can be adjusted to suit the specific situation.
S4: Traverse each feature layer's default boxes of different sizes and aspect ratios with a set of convolutional filters, generating the localization compensations for the four sides of each default box and the confidences of each class appearing in it; default boxes whose confidence exceeds a certain threshold are taken as candidate boxes.
S5: Apply non-maximum suppression to all candidate boxes that may contain objects, removing redundant boxes and obtaining the most likely positions of each type of target, i.e. the final target detection result, and output the class and ROI information of each moving target in the image.
Identifying the license plate information when a moving target is a motor vehicle specifically includes the following steps:
1) The image processing module obtains candidate license plate regions from the original extracted single frame using image processing techniques.
The candidate regions are obtained by combining color localization, Sobel localization, and text localization to locate the plate position.
The color localization process is as follows: convert the color space of the image from RGB to HSV, traverse all pixels of the image in turn, and mark pixels that match the target color as white and all others as black; then apply a closing operation, find all contours in the image, take their minimum bounding rectangles, remove rectangles whose tilt angle exceeds a threshold, and rotate and size-normalize the remaining rectangles to obtain candidate license plate positions.
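The color-matching step can be sketched with the standard RGB-to-HSV conversion from the Python standard library; the hue/saturation gate below is a hypothetical choice for blue plates, not a value taken from the patent:

```python
import colorsys

def is_plate_blue(r, g, b):
    # Hypothetical HSV gate for a blue plate color (h, s, v all in [0, 1])
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return 0.55 <= h <= 0.70 and s >= 0.4 and v >= 0.3

def color_mask(image):
    # image: rows of (r, g, b) pixels; output 1 (white) where the pixel
    # matches the target color and 0 (black) otherwise.
    return [[1 if is_plate_blue(*px) else 0 for px in row] for row in image]
```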
The Sobel localization process is as follows: apply Gaussian blur and grayscale conversion to the image, perform the Sobel operation to remove interfering noise, binarize the result, and then, as above, use a closing operation and contour extraction to crop candidate license plate regions.
The text localization process is as follows: extract regions using the MSER (maximally stable extremal regions) method, filter out regions whose size obviously does not match license plate characters, and then use a text classifier together with vehicle license plate characteristics to judge and obtain candidate license plate regions.
2) Confirm whether each candidate license plate region is a plate and obtain the plate picture.
The plate is confirmed as follows: the candidate license plate region is fed into a pre-trained support vector machine (SVM) model, which judges whether it is a genuine plate; if it is, the plate ROI is output and the next step is entered; otherwise the region is discarded.
3) Perform character segmentation on the plate picture to obtain character pictures.
The character segmentation is as follows: apply grayscale conversion, binarization, a closing operation, and contour extraction to the plate picture and take minimum bounding rectangles, obtaining the bounding rectangles of all characters. Because contour extraction tends to fracture Chinese characters, the first English character of the plate is determined by its relative position, and the bounding rectangle of the Chinese character is deduced back from it by an offset. The bounding rectangles of all characters are then cropped out and normalized to a uniform format.
4) Feed the character pictures into an artificial neural network model for recognition to obtain the plate character string.
The recognition is as follows: the normalized character pictures are fed one by one into a pre-trained artificial neural network character recognition model, which predicts the specific character represented by each segment; after ordering, the plate character string is output.
Step 4: Based on the coordinate projection relationship from step 1 and the real-time sound source localization from step 2, the control unit computes the coverage area of the whistle target projected onto the high-definition picture, compares this coverage area with the coverage areas of all targets from step 3, finds the target with the largest overlap, and computes the overlap ratio. When the overlap ratio exceeds a preset threshold, the whistle target is considered to have committed an illegal honking act, as shown in Fig. 7.
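The overlap-ratio decision of step 4 can be sketched as follows; normalizing the intersection by the acoustic coverage area and the 0.3 threshold are illustrative assumptions:

```python
def pick_honking_target(coverage, rois, threshold=0.3):
    # coverage: acoustic-pressure area projected onto the high-definition
    # picture, as (x1, y1, x2, y2); rois: {target_id: (x1, y1, x2, y2)}.
    # Returns the id of the most-overlapping target if its overlap ratio
    # exceeds the threshold, else None (no illegal honk attributed).
    def inter_area(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        return ix * iy

    def overlap_ratio(roi):
        # intersection normalized by the acoustic coverage area
        cov = (coverage[2] - coverage[0]) * (coverage[3] - coverage[1])
        return inter_area(coverage, roi) / cov if cov else 0.0

    best = max(rois, key=lambda k: overlap_ratio(rois[k]), default=None)
    if best is not None and overlap_ratio(rois[best]) > threshold:
        return best
    return None
```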
Step 5: The control unit generates a whistle-behavior evidence picture from the result of step 4 and sends the license plate information of the offending vehicle to the execution unit and the traffic police processing platform.
The whistle-behavior evidence picture includes the following content: the still picture and dynamic video superimposed with the acoustic pressure cloud map, the offending vehicle's features, and its license plate features.
Step 6: The execution unit displays the number plate or picture of the offending vehicle according to its preset behavior, as shown in Fig. 8, as a warning display.
This embodiment is an acousto-optic integrated honking capture device combining an acoustic array with an image acquisition and processing unit. It fuses and analyzes the microphone array 2's localization of the whistle sound and the deep convolutional neural network's localization of all kinds of targets in the image, uses computer AI to intelligently determine the vehicle that actually honked, and provides its license plate together with a video of the offense overlaid with the acoustic pressure cloud map, forming a chain of evidence and further improving the accuracy of existing systems.
Application 1 (non-motor-vehicle honking): As shown in Fig. 9A, before the present invention, the sound localization pointed at a black motorcycle, but because the motorcycle had no plate, the plate nearest the localized coordinate was found instead, causing the honk to be misattributed to the taxi behind.
As shown in Figs. 9B1~9B3, with the present invention, the honking position is identified as a motorcycle, avoiding the misjudgment of nearby vehicles.
Application 2 (non-motor-vehicle honking): As shown in Fig. 9C, before the present invention, the sound localization pointed at a white motorcycle, but because the motorcycle had no plate, the plate nearest the localized coordinate was found instead, causing the honk to be misattributed to the taxi alongside.
As shown in Figs. 9D1~9D3, with the present invention, the honking position is identified as a motorcycle, avoiding the misjudgment of nearby vehicles.
Application 3 (overlapping honking positions): As shown in Fig. 9E, before the present invention, the sound localization leaned toward the taxi behind, but since its plate was occluded, the plate recognition program identified the plate nearest the localized point, causing the honk to be misattributed to the minibus in front.
As shown in Figs. 9F1~9F3, with the present invention, the honking position is identified as lying at the intersection of the two vehicles, leaning toward the taxi behind, so the misjudgment is avoided.
Application 4 (honking position estimation with multiple localization targets): As shown in Fig. 9G, before the present invention, the sound was localized to a vehicle at the edge of the picture, but since that vehicle had not driven into the picture and had no visible plate, the plate recognition program identified the plate nearest the localized point, causing the honk to be misattributed to the grey car.
As shown in Figs. 9H1~9H3, with the present invention, the honking position is identified as not lying within the grey car's region, so the misjudgment is avoided.
Those skilled in the art can make local adjustments to the above specific implementation in different ways without departing from the principle and purpose of the present invention. The protection scope of the present invention is defined by the claims and is not limited by the above specific implementation; each implementation within that scope is bound by the present invention.
Claims (13)
- 1. A vehicle whistle monitoring method based on deep convolutional neural network target identification, characterized in that: real-time whistle recognition and monitoring obtains the acoustic pressure cloud map at the moment the whistle occurs, and a low-resolution video stream and a high-resolution still image are acquired synchronously at the same moment; the acoustic pressure cloud map is then superimposed on the video stream, and a deep convolutional neural network identifies the still image to obtain the regions of interest of all objects in it; next, an image coordinate projection relationship is obtained through single-frame registration between the low-resolution video and the high-resolution image, yielding the acoustic pressure coverage area projected onto the still image; finally, the regions of interest are compared with the acoustic pressure coverage area, and the object with the highest overlap ratio is determined to be the honking object.
- 2. The method according to claim 1, characterized in that the single-frame registration is as follows: the control unit runs a registration program, which automatically captures two frames at the same moment (one from the low-resolution camera and one from the high-definition camera), registers them, obtains at least five matched feature control points in the two frames, and computes the coordinate projection relationship from the low-resolution camera to the high-definition camera.
- 3. The method according to claim 1, characterized in that the acoustic pressure cloud map is produced by performing real-time whistle recognition on the acoustic signals acquired by the microphone array; when a whistle is detected, real-time sound source localization is carried out using a beamforming algorithm, generating an acoustic pressure cloud map containing the whistle sound source coordinates.
- 4. The method according to claim 3, characterized in that the beamforming algorithm is specifically: V(k, ω) = (1/M²) Σₘ Σₙ C_nm e^(−ik·(r_m − r_n)), where V(k, ω) is the beamformed mean-square value, k is the focus-direction wave-number vector, ω is the angular frequency, M is the number of sensors, C_nm is the cross-spectrum of the sound-pressure signal received by microphone m relative to that received by microphone n, r_m is the coordinate vector of microphone m, and r_n is the coordinate vector of microphone n.
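The cross-spectral beamforming sum of claim 4 can be written directly as a double loop over microphone pairs. A minimal sketch, assuming 2-D microphone coordinates and a precomputed cross-spectral matrix `csm` with `csm[n][m]` holding C_nm:

```python
import cmath

def beamform_power(k, mics, csm):
    """Cross-spectral beamforming output for focus wave-number vector k:
    V = (1/M^2) * sum_m sum_n C_nm * exp(-i k . (r_m - r_n)).
    mics: list of microphone coordinate vectors r_m;
    csm: M x M cross-spectral matrix, csm[n][m] = C_nm."""
    M = len(mics)
    total = 0j
    for m in range(M):
        for n in range(M):
            dot = sum(ki * (rm - rn)
                      for ki, rm, rn in zip(k, mics[m], mics[n]))
            total += csm[n][m] * cmath.exp(-1j * dot)
    return (total / M**2).real  # V is real for a Hermitian cross-spectral matrix
```

Scanning `k` over a grid of candidate directions and plotting `beamform_power` at each point is what produces the sound-pressure cloud map: the output peaks (at 1.0 for a unit-amplitude plane wave) when the focus direction matches the true source direction.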
- 5. The method according to claim 1, characterized in that the regions of interest are obtained by a deep convolutional neural network algorithm, trained in advance, that recognizes the type of each moving target in the picture and outputs the region of interest of each moving target in the picture.
- 6. The method according to claim 5, characterized in that when the target is a motor vehicle, its license plate information is further recognized.
- 7. The method according to claim 1 or 5, characterized in that the deep convolutional neural network uses an SSD target-detection neural network and is trained with the SSD target-recognition algorithm based on deep learning; default boxes of different sizes and aspect ratios are generated at the points of the different feature maps into which the image is discretized, each default box serving as the matching starting point for a different kind of object, and adjustments to each box are computed to better match the shape of the object.
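The default boxes of claim 7 are laid out on a grid per feature map, with one box per aspect ratio per cell. A minimal sketch, assuming square feature maps and one scale per map (the SSD paper additionally adds an extra square box per cell, omitted here for brevity):

```python
import math

def default_boxes(fmap_sizes, scales, ratios):
    """Generate SSD-style default boxes (cx, cy, w, h) in [0, 1]
    normalized coordinates: one box per aspect ratio, centered on
    each cell of each square feature map."""
    boxes = []
    for f, s in zip(fmap_sizes, scales):
        for i in range(f):
            for j in range(f):
                cx, cy = (j + 0.5) / f, (i + 0.5) / f  # cell center
                for ar in ratios:
                    # width grows and height shrinks with aspect ratio
                    boxes.append((cx, cy, s * math.sqrt(ar), s / math.sqrt(ar)))
    return boxes
```

Coarse feature maps (small `f`, large scale) catch large objects while fine maps catch small ones, which is why SSD predicts from several feature maps at once.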
- 8. The method according to claim 5, characterized in that in the training, the objective function is composed of a localization loss function and a confidence loss function, namely: L(x, c, l, g) = (1/N)(L_conf(x, c) + αL_loc(x, l, g)), where N is the number of matched default boxes (when N is 0 the loss is set to 0) and α is set to 1 by cross-validation; x indicates matching, i.e. x_ij^p indicates whether the i-th default box matches the j-th ground-truth box of category p; c is the confidence, l is the predicted box, g is the ground-truth box, L_loc(x, l, g) is the localization loss function, and L_conf(x, c) is the confidence loss function.
- 9. The method according to claim 8, characterized in that the localization loss function refers to the smooth L1 loss between the predicted box l and the ground-truth box g: L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_ij^k smoothL1(l_i^m − ĝ_j^m), where ĝ_j^cx = (g_j^cx − d_i^cx)/d_i^w, ĝ_j^cy = (g_j^cy − d_i^cy)/d_i^h, ĝ_j^w = log(g_j^w/d_i^w), ĝ_j^h = log(g_j^h/d_i^h), (cx, cy) is the center of the default box, w is the width of the default box, h is the height of the default box, and d is the default box; the function regresses the compensations to the default box's center and to its width and height. The confidence loss function L_conf(x, c) is the softmax loss over the class confidences c: L_conf(x, c) = −Σ_{i∈Pos} x_ij^p log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0), where ĉ_i^p = exp(c_i^p)/Σ_p exp(c_i^p).
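The pieces of the loss in claims 8 and 9 are simple scalar functions. A minimal sketch of the per-box components, assuming boxes are given as (cx, cy, w, h) tuples (a real training loop would vectorize this over all default boxes):

```python
import math

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 for |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def encode_offsets(g, d):
    """Encode ground-truth box g against default box d, both (cx, cy, w, h):
    center offsets normalized by the default box size, log size ratios."""
    return ((g[0] - d[0]) / d[2], (g[1] - d[1]) / d[3],
            math.log(g[2] / d[2]), math.log(g[3] / d[3]))

def loc_loss(pred, g, d):
    """Localization loss for one matched default box: smooth L1 summed
    over the four encoded offsets (cx, cy, w, h)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, encode_offsets(g, d)))

def conf_loss(scores, label):
    """Softmax cross-entropy confidence loss: -log softmax(scores)[label],
    computed stably via the log-sum-exp trick."""
    z = max(scores)
    log_sum = z + math.log(sum(math.exp(s - z) for s in scores))
    return log_sum - scores[label]
```

The network predicts the encoded offsets directly, so `loc_loss` is zero exactly when the prediction equals `encode_offsets(g, d)`; the smooth L1 form keeps gradients bounded for badly mismatched boxes.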
- 10. A vehicle whistle evidence-collection system based on deep convolutional neural network target recognition, characterized by comprising: a control unit, a microphone array, and a high-resolution image acquisition and processing device, wherein: the microphone array and the high-resolution image acquisition and processing device collect the whistle acoustic signal, the video stream, and the still image in real time and output them to the control unit; the control unit obtains a sound-pressure cloud map from the whistle acoustic signal using a beamforming algorithm and superimposes it on the video stream to obtain a video stream with a sound-pressure coverage area, and extracts the regions of interest of all targets from the still image; finally, the control unit computes the overlap ratio between the sound-pressure coverage area and the regions of interest, and further extracts the license plate information of the target.
- 11. The system according to claim 10, characterized in that the microphone array comprises: a housing, and a sensor board, a low-resolution camera, a digital signal acquisition module, and an interface module arranged in the housing.
- 12. The system according to claim 10, characterized in that an image processing module is provided in the control unit, the image processing module being an embedded image processing module based on GPU-accelerated computation, for running the deep convolutional neural network recognition algorithm trained in advance, so as to determine the type, coordinates, extent, and license plate number of each target in the image frame.
- 13. The system according to claim 10, characterized in that the vehicle whistle evidence-collection system further comprises an execution unit, which receives the license plate number of the whistling vehicle pushed by the control unit and, according to the configuration, displays the vehicle license plate or the violation picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710713579.XA CN109409353A (en) | 2017-08-18 | 2017-08-18 | Vehicle whistle monitoring method and system based on DCNN target identification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710713579.XA CN109409353A (en) | 2017-08-18 | 2017-08-18 | Vehicle whistle monitoring method and system based on DCNN target identification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109409353A (en) | 2019-03-01 |
Family
ID=65463169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710713579.XA Pending CN109409353A (en) | 2017-08-18 | 2017-08-18 | Vehicle whistle monitoring method and system based on DCNN target identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109409353A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106355893A (en) * | 2016-10-28 | 2017-01-25 | 东方智测(北京)科技有限公司 | Method and system for real-time positioning of whistling motor vehicle |
Non-Patent Citations (2)
Title |
---|
WEI LIU ET AL: "SSD: Single Shot MultiBox Detector", arXiv:1512.02325v5 * |
CHU Zhigang et al.: "Improved algorithm of near-field beamforming for sound source identification", Transactions of the Chinese Society of Agricultural Engineering * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110426675A (en) * | 2019-06-28 | 2019-11-08 | 中国计量大学 | A kind of sound phase instrument auditory localization result evaluation method based on image procossing |
CN110647949A (en) * | 2019-10-21 | 2020-01-03 | 中国计量大学 | Calibration method of automobile whistling snapshot device based on deep learning |
CN110647949B (en) * | 2019-10-21 | 2024-02-20 | 中国计量大学 | Calibration method of car whistle snapshot device based on deep learning |
CN112906426A (en) * | 2019-11-19 | 2021-06-04 | 杭州海康威视数字技术股份有限公司 | Vehicle monitoring method, device and equipment and storage medium |
CN113362851A (en) * | 2020-03-06 | 2021-09-07 | 上海其高电子科技有限公司 | Traffic scene sound classification method and system based on deep learning |
CN111783777A (en) * | 2020-07-07 | 2020-10-16 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN111783777B (en) * | 2020-07-07 | 2023-11-24 | 抖音视界有限公司 | Image processing method, apparatus, electronic device, and computer readable medium |
CN111780869A (en) * | 2020-08-06 | 2020-10-16 | 江苏聆世科技有限公司 | Multi-camera single-microphone array automobile whistle detection equipment |
CN112906795A (en) * | 2021-02-23 | 2021-06-04 | 江苏聆世科技有限公司 | Whistle vehicle judgment method based on convolutional neural network |
CN113241094A (en) * | 2021-05-08 | 2021-08-10 | 南京师范大学 | Automobile whistle identification method based on subband spectral entropy method and deep convolutional neural network |
CN113241094B (en) * | 2021-05-08 | 2024-05-07 | 南京师范大学 | Automobile whistle identification method based on subband spectral entropy method and deep convolutional neural network |
CN113591725A (en) * | 2021-08-03 | 2021-11-02 | 世邦通信股份有限公司 | Whistling vehicle extraction method, device, equipment and medium |
CN113591725B (en) * | 2021-08-03 | 2023-08-22 | 世邦通信股份有限公司 | Method, device, equipment and medium for extracting whistle vehicle |
CN114283588A (en) * | 2022-01-07 | 2022-04-05 | 刘高峰 | Method for preventing false alarm of vehicle whistling snapshot system |
CN117726977A (en) * | 2024-02-07 | 2024-03-19 | 南京百伦斯智能科技有限公司 | Experimental operation key node scoring method and system based on DCNN |
CN117726977B (en) * | 2024-02-07 | 2024-04-12 | 南京百伦斯智能科技有限公司 | Experimental operation key node scoring method and system based on DCNN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109409353A (en) | Vehicle whistle monitoring method and system based on DCNN target identification | |
CN111368687B (en) | Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation | |
Marin et al. | Learning appearance in virtual scenarios for pedestrian detection | |
CN102765365B (en) | Pedestrian detection method based on machine vision and pedestrian anti-collision warning system based on machine vision | |
Cui et al. | Vehicle localisation using a single camera | |
CN107891808B (en) | Driving reminding method and device and vehicle | |
CN102509090B (en) | A kind of vehicle feature recognition device based on public safety video image in sky net engineering | |
Guo et al. | Nighttime vehicle lamp detection and tracking with adaptive mask training | |
CN107886034B (en) | Driving reminding method and device and vehicle | |
CN104378582A (en) | Intelligent video analysis system and method based on PTZ video camera cruising | |
KR20120072020A (en) | Method and apparatus for detecting run and road information of autonomous driving system | |
CN101105893A (en) | Automobile video frequency discrimination speed-testing method | |
CN103530640B (en) | Unlicensed vehicle checking method based on AdaBoost Yu SVM | |
CN107031661A (en) | A kind of lane change method for early warning and system based on blind area camera input | |
CN110580808B (en) | Information processing method and device, electronic equipment and intelligent traffic system | |
CN104063882A (en) | Vehicle video speed measuring method based on binocular camera | |
KR20210052031A (en) | Deep Learning based Traffic Flow Analysis Method and System | |
CN106845359A (en) | Tunnel portal driving prompt apparatus and method based on infrared emission | |
CN104200213B (en) | One kind being based on multipart vehicle checking method | |
WO2014079058A1 (en) | Method and system for processing video image | |
KR102338912B1 (en) | Recognition method of vehicles license plate using super resolution | |
Wu et al. | Block-based hough transform for recognition of zebra crossing in natural scene images | |
CN113487878A (en) | Motor vehicle illegal line pressing running detection method and system | |
JP2002190023A (en) | Device and method for discriminating car model, and storage medium storing car model discriminating program readable in computer | |
Haselhoff et al. | Radar-vision fusion for vehicle detection by means of improved haar-like feature and adaboost approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190301 |