A video object detection and recognition method
Technical field
The present invention relates to the field of image processing, and in particular to a video object detection and recognition method.
Background technique
In recent years, with the rapid development of the Internet, the era of big data has arrived, and every industry's demand for valid data grows by the day. At the same time, more and more videos are uploaded to the network, becoming an emerging treasure-house of data; if valid data could be extracted from video and analyzed, it would bring huge benefits. Video object detection has therefore become a much sought-after research topic in computing, but over the years target detection research has concentrated on still images, and research on video detection is comparatively scarce. Conventional video detection methods first split a video into still image frames and then detect frame by frame. Because video has strong temporal locality and adjacent frames are highly similar, frame-by-frame detection after splitting produces a large amount of wasted work, increasing computational overhead and reducing detection speed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a video object detection and recognition method that addresses the problem of existing conventional video detection methods, which must detect a large number of redundant image frames, leading to high system overhead and low detection efficiency.
To solve the above technical problem, an embodiment of the present invention provides a video object detection and recognition method, comprising:
S1: judging whether the degree of difference between a reference frame and a frame to be detected is less than or equal to a preset difference threshold;
S2: if it is less than or equal to the threshold, assigning the reference frame's detection result to the frame to be detected and outputting it;
S3: otherwise, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame.
Further, before judging whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold, the method also includes:
obtaining a video stream to be detected;
splitting the obtained video stream into individual image frames;
judging whether the current image frame is the first frame;
if it is the first frame, performing image enhancement and target detection on the current image frame and, once detection is complete, setting the current image frame as the reference frame;
if it is not the first frame, judging whether the current image frame is a frame to be detected that has not been skipped, and if so, executing S1.
Further, the step of judging, when the current image frame is not the first frame, whether it is a non-skipped frame to be detected and, if so, executing S1 includes:
taking the start frame as frame 0, judging whether the position f_now of the current image frame satisfies the formula f_now % f_skip = 0, where f_skip is the detection frame rate, indicating that a difference judgement is performed every f_skip frames;
if the formula is satisfied, the current image frame is a non-skipped frame to be detected, and S1 is executed;
if it is not satisfied, assigning the reference frame's detection result to the current image frame and outputting it as the current frame's detection result.
Further, the indices for judging the degree of difference between the reference frame and the frame to be detected include: mean squared error, grayscale difference, histogram difference, peak signal-to-noise ratio, or structural similarity.
Further, the mean squared error MSE is calculated as: MSE = (1/n) Σ_{i=1..n} (X_ref,i − X_det,i)^2, where X_ref,i is the value of pixel i of the reference frame, X_det,i is the value of pixel i of the frame to be detected, and n is the number of image pixels;
the grayscale difference is calculated as: D = (1/n) Σ_{i=1..n} |G_ref,i − G_det,i|, where G_ref,i is the gray value of pixel i of the reference frame and G_det,i is the gray value of pixel i of the frame to be detected;
the peak signal-to-noise ratio is calculated as: PSNR = 10 · log10(MAX_I^2 / MSE), where MAX_I is the maximum colour value of the image.
Further, the difference judgement modes include: full-image judgement or grid judgement.
Further, the image enhancement modes include one or more of: brightness adjustment, contrast adjustment, sharpening, dehazing, auto levels, and histogram equalization.
Further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm.
Further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm;
using motion-guided propagation to correct the detection result of the Yolov3 algorithm by means of a temporal-information correction technique and, once correction is complete, making the frame to be detected the new reference frame.
Further, the method also includes:
generating new data from existing images using a deep convolutional generative adversarial network, expanding the training set of the Yolov3 algorithm, and performing secondary training on the Yolov3 model.
The advantageous effects of the above technical solutions of the present invention are as follows:
In the above scheme, it is judged whether the degree of difference between the reference frame and the frame to be detected is less than or equal to a preset difference threshold; if so, the reference frame's detection result is assigned to the frame to be detected and output; otherwise, image enhancement and target detection are performed on the frame to be detected and, once detection is complete, the frame to be detected becomes the new reference frame, serving as the standard for subsequent difference detection. In this way, by performing image enhancement and target detection only on frames to be detected that differ greatly, the detection of redundant frames in the video is reduced, accelerating video detection and reducing system overhead.
Description of the drawings
Fig. 1 is a schematic flow diagram of the video object detection and recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the video object detection and recognition system provided by an embodiment of the present invention;
Fig. 3 is a detailed flow diagram of the video object detection and recognition method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a frame to be detected provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of an enhanced frame to be detected provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of a reference frame used for difference judgement, provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of frame to be detected 1 used for difference judgement, provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of frame to be detected 2 used for difference judgement, provided by an embodiment of the present invention.
Specific embodiment
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below in conjunction with the attached drawings and specific embodiments.
Aiming at the problem that existing conventional video detection methods must detect a large number of redundant image frames, causing high system overhead and low detection efficiency, the present invention provides a video object detection and recognition method.
Embodiment one
As shown in Fig. 1, the video object detection and recognition method provided by an embodiment of the present invention comprises:
S1: judging whether the degree of difference between a reference frame and a frame to be detected is less than or equal to a preset difference threshold;
S2: if it is less than or equal to the threshold, assigning the reference frame's detection result to the frame to be detected and outputting it;
S3: otherwise, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame.
The video object detection and recognition method of the embodiment of the present invention judges whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold; if so, it assigns the reference frame's detection result to the frame to be detected and outputs it; otherwise, it performs image enhancement and target detection on the frame to be detected and, once detection is complete, makes the frame to be detected the new reference frame, the standard for subsequent difference detection. In this way, by performing image enhancement and target detection only on frames to be detected that differ greatly, the detection of redundant frames in the video is reduced, accelerating video detection and reducing system overhead.
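The flow just described can be sketched as a simple gating loop (a minimal illustration, not the invention's implementation; diff_fn and detect_fn are hypothetical placeholders for the chosen difference index and detection model):

```python
def detect_video(frames, threshold, diff_fn, detect_fn):
    """Frame-difference gating (steps S1-S3): run full detection only when
    the current frame differs from the reference frame by more than the
    threshold; otherwise reuse the reference frame's detection result."""
    results = []
    reference, ref_result = None, None
    for frame in frames:
        if reference is None or diff_fn(reference, frame) > threshold:
            ref_result = detect_fn(frame)  # detect on a significantly changed frame
            reference = frame              # this frame becomes the new reference
        # a redundant frame simply inherits the reference frame's result
        results.append(ref_result)
    return results
```

With an absolute-difference metric, only frames that change by more than the threshold trigger detection; all intervening frames inherit the reference result.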
In a specific embodiment of the aforementioned video object detection and recognition method, further, before judging whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold, the method also includes:
obtaining a video stream to be detected;
splitting the obtained video stream into individual image frames;
judging whether the current image frame is the first frame;
if it is the first frame, performing image enhancement and target detection on the current image frame and, once detection is complete, setting the current image frame as the reference frame;
if it is not the first frame, judging whether the current image frame is a frame to be detected that has not been skipped, and if so, executing S1.
In the present embodiment, let the degree of difference between the reference frame and the frame to be detected be diff and the difference threshold be T. The two images (reference frame and frame to be detected) are first converted to matrices, the degree of difference diff between them is computed, and it is judged whether diff is less than or equal to the preset threshold T. If so, the two frames are considered close enough that the frame to be detected can be regarded as a redundant frame, and the reference frame's detection result is assigned to it and output. Otherwise, the two images are considered significantly different: image enhancement is applied to the frame to be detected to improve the recognizability of targets in the image, target detection is then performed on the enhanced frame, and, once detection is complete, the frame to be detected becomes the new reference frame, the standard for subsequent difference detection.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the step of judging, when the current image frame is not the first frame, whether it is a non-skipped frame to be detected and, if so, executing S1 includes:
taking the start frame as frame 0, judging whether the position f_now of the current image frame satisfies the formula f_now % f_skip = 0, where f_skip is the detection frame rate, indicating that a difference judgement is performed every f_skip frames;
if the formula is satisfied, the current image frame is a non-skipped frame to be detected, and S1 is executed;
if it is not satisfied, assigning the reference frame's detection result to the current image frame and outputting it as the current frame's detection result.
In the present embodiment, even though difference judgement reduces the detection of redundant frames, a difference computation between the reference frame and each frame to be detected is still required. For high-frame-rate or slowly changing video, adjacent frames differ very little, and the difference computation itself becomes overhead; the present embodiment uses frame skipping to reduce this part of the overhead. Specifically, a time parameter t_skip could be set so that the system performs one difference check per time span t_skip and performs none within each interval, accelerating video detection by skipping redundant frames. Since the choice of t_skip depends on the frame rate of the video being detected, the embodiment instead sets a detection frame rate f_skip, realizing frame skipping by having the system perform one difference judgement every f_skip frames. In practical applications, the parameter f_skip is set flexibly according to the video frame rate and how much the video changes, ensuring the detection effect meets demand.
In the present embodiment, setting the detection frame rate f_skip controls how often difference checks are performed; by reducing useless judgements and skipping the detection of redundant frames, video object detection is accelerated, realizing a trade-off between target detection precision and efficiency.
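The frame-skipping rule can be sketched as follows (a minimal illustration; the function name is an assumption, not from the source):

```python
def frames_to_check(n_frames, f_skip):
    """With the start frame counted as frame 0, only frames whose position
    f_now satisfies f_now % f_skip == 0 undergo the difference judgement;
    every other frame simply inherits the reference frame's result."""
    return [f_now for f_now in range(n_frames) if f_now % f_skip == 0]
```

With f_skip = 5, a 12-frame clip is difference-checked only at frames 0, 5, and 10.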
In the present embodiment, several judgement indices can serve as quantitative criteria for measuring the degree of difference between the reference frame and the frame to be detected.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the indices for judging the degree of difference between the reference frame and the frame to be detected include: mean squared error, grayscale difference, histogram difference, peak signal-to-noise ratio, or structural similarity.
The mean squared error MSE is calculated as: MSE = (1/n) Σ_{i=1..n} (X_ref,i − X_det,i)^2, where X_ref,i is the value of pixel i of the reference frame, X_det,i is the value of pixel i of the frame to be detected, and n is the number of image pixels;
the grayscale difference is calculated as: D = (1/n) Σ_{i=1..n} |G_ref,i − G_det,i|, where G_ref,i is the gray value of pixel i of the reference frame and G_det,i is the gray value of pixel i of the frame to be detected;
histogram difference: the histograms of the two images are computed separately and normalized, and their similarity is judged according to a chosen distance metric;
the peak signal-to-noise ratio (PSNR) is calculated as: PSNR = 10 · log10(MAX_I^2 / MSE), where MAX_I is the maximum colour value of the image;
structural similarity (SSIM): measures the similarity of two images in terms of brightness, contrast, and structure. In practice, the image is divided into N blocks by a sliding window; the Gaussian-weighted mean, variance, and covariance of each window are computed, the structural similarity SSIM of corresponding blocks is calculated, and the average is taken as the structural similarity of the two images. SSIM takes values in [0, 1]; the smaller the value, the greater the difference between the two images.
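The MSE and PSNR indices above can be computed directly on flattened pixel sequences (a minimal sketch consistent with the formulas in this embodiment):

```python
import math

def mse(ref, det):
    """Mean squared error between the reference frame and the frame to be
    detected, both given as equal-length sequences of pixel values."""
    return sum((r - d) ** 2 for r, d in zip(ref, det)) / len(ref)

def psnr(ref, det, max_i=255):
    """Peak signal-to-noise ratio derived from MSE; max_i is the maximum
    colour value of the image (255 for 8-bit channels)."""
    m = mse(ref, det)
    return float("inf") if m == 0 else 10 * math.log10(max_i ** 2 / m)
```

A larger MSE (or smaller PSNR) indicates a larger difference between the two frames.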
In the present embodiment, which index is used to measure the degree of difference depends on the actual situation. Experiments show that:
SSIM is suited to measuring style changes of a whole picture and poorly suited to the local differences that occur between adjacent frames;
histogram difference reflects the difference in the distribution of gray values of two pictures but carries no spatial position information, so considerable information is lost;
PSNR can be regarded as a transformation of MSE;
grayscale difference considers less comprehensive information than MSE and loses much of the colour information.
All things considered, the present embodiment uses MSE as the default and preferred difference index, while the other indices are provided as judgement criteria that can be substituted as the case requires in practical applications.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the difference judgement modes include: full-image judgement or grid judgement.
In the present embodiment, to balance the influence of difference judgement on the precision and efficiency of the video object detection and recognition method, the degree of difference between the reference frame and the frame to be detected is computed in either of two modes:
full-image judgement: the difference index of the two frames is computed directly over the whole image, and this serves as the degree of difference;
grid judgement: the two frames are first divided into a p × p grid, the difference index is computed on each cell, the p^2 results are sorted in descending order, and the mean of the top m values serves as the degree of difference.
In the present embodiment, m and p are tunable parameters that can be customized in practical applications according to specific requirements and the observed detection effect.
In the present embodiment, taking mean squared error as an example, the two difference judgement modes are described in detail:
full-image judgement: the MSE of the two frames is computed directly over the whole image, and this serves as the degree of difference;
grid judgement: the two frames are first divided into a p × p grid, the MSE is computed on each cell, the p^2 results are sorted in descending order, and the mean of the top m values serves as the degree of difference.
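Grid judgement can be sketched as follows (a minimal illustration on images represented as lists of rows; the per-cell metric is passed in, matching the interchangeable indices of this embodiment, and p is assumed to divide the image dimensions):

```python
def grid_difference(ref, det, p, m, metric):
    """Split both images into a p x p grid, compute the difference metric
    on each cell, sort the p*p scores in descending order, and return the
    mean of the top m values as the degree of difference."""
    h, w = len(ref), len(ref[0])
    ch, cw = h // p, w // p  # cell height and width
    scores = []
    for gy in range(p):
        for gx in range(p):
            # flatten the corresponding cell of each image
            flat_a = [v for row in ref[gy*ch:(gy+1)*ch] for v in row[gx*cw:(gx+1)*cw]]
            flat_b = [v for row in det[gy*ch:(gy+1)*ch] for v in row[gx*cw:(gx+1)*cw]]
            scores.append(metric(flat_a, flat_b))
    scores.sort(reverse=True)
    return sum(scores[:m]) / m
```

Averaging only the largest m cell scores makes a small but concentrated change count as a large difference, which full-image judgement would dilute.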
In a specific embodiment of the aforementioned video object detection and recognition method, further, the image enhancement modes include one or more of: brightness adjustment, contrast adjustment, sharpening, dehazing, auto levels, and histogram equalization.
In the present embodiment, the difference judgement yields the frames to be detected that have changed greatly and must be detected. To improve the recognizability of targets in the frame to be detected, and in turn the detection accuracy, the present embodiment first applies image enhancement to frames to be detected that differ significantly from the reference frame, where the image enhancement modes include:
contrast and brightness adjustment: improves the contrast and brightness of the frame to be detected;
sharpening: enhances the edges and contours of targets and the regions of abrupt gray-level change, making them clearer in the image or highlighting the features of certain linear target elements;
dehazing: weakens the influence of heavy fog on the frame to be detected, making the image clear;
auto levels: takes the minimum and maximum pixel values of each channel as black and white and redistributes the pixel values in between proportionally, making the picture's colours richer;
histogram equalization: transforms the image gray levels so that the histogram becomes as flat as possible, making balanced use of all gray levels in the dynamic range.
In the present embodiment, one or more of contrast adjustment, brightness adjustment, sharpening, dehazing, auto levels, histogram equalization, and similar methods can be used to enhance the frame to be detected. The chosen enhancement method is a configurable item that can be set in practical applications according to the concrete scene and the target detection effect, ensuring the video detection effect meets demand. The purpose of image enhancement is to improve target recognizability and strengthen the useful features in the image for the benefit of target detection; the enhancement is not shown in the output video and serves only as a module within the image processing stage.
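As one example of the listed enhancement modes, linear contrast and brightness adjustment can be sketched as follows (the gain alpha and offset beta are illustrative values, not taken from the source):

```python
def adjust_contrast_brightness(pixels, alpha=1.2, beta=10):
    """Linear enhancement: scale each pixel by the contrast gain alpha,
    add the brightness offset beta, and clip to the 8-bit range."""
    return [max(0, min(255, round(alpha * v + beta))) for v in pixels]
```

In practice, alpha and beta would be tuned per scene so that targets become easier for the detector to recognize.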
In a specific embodiment of the aforementioned video object detection and recognition method, further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 (You Only Look Once) algorithm.
In the present embodiment, Yolov3 is a target detection algorithm; the algorithm's name does not yet have a unified Chinese translation.
In the present embodiment, after image enhancement is applied to a frame to be detected with a large difference, refined target detection is performed on it. The detection algorithm chosen is the efficient detection model Yolov3. Unlike conventional target detection algorithms, Yolov3 treats target localization and classification as a single regression task, completed in one stage by one fully convolutional neural network, which gives it very fast detection speed. Yolov3 also offers advantages such as a high-resolution classifier, fine-grained features, and multi-scale training and prediction, and its target detection overhead is small, meeting the needs of this system. Yolov3 performs target detection on the input enhanced frame to be detected and returns the coordinates of the rectangular boxes surrounding the targets together with the target categories; the detection result, once obtained, can be used to label the redundant frames.
In the present embodiment, using Yolov3 to perform refined target detection and recognition on the enhanced frame to be detected further reduces the detection overhead and thus the hardware configuration requirements. Depending on the practical application scenario and hardware environment, however, the detection algorithm can be replaced by another that better meets the detection demand: it could be replaced by a two-stage detection algorithm to further improve detection accuracy, or by a better detection algorithm proposed later, improving overall system performance. The cost and difficulty of replacing the detection algorithm are small, so the system remains durable in use and versatile.
In a specific embodiment of the aforementioned video object detection and recognition method, further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm;
using motion-guided propagation to correct the detection result of the Yolov3 algorithm by means of a temporal-information correction technique and, once correction is complete, making the frame to be detected the new reference frame.
The Yolov3 algorithm is characterized by small target detection overhead, fast detection speed, and high detection accuracy. In the present embodiment, to further improve the ability of video object detection and recognition, a temporal-information correction technique is used to assist in correcting the detection results. Specifically, when image frames are detected one by one, instability of the algorithm's results or problems with video quality can cause the same target to be missed on some adjacent frames. Most targets are stationary or slow-moving across adjacent image frames, so a target should appear at a similar location in the following frame as in the previous one; such missed detections can therefore be recovered from the detection results of adjacent frames.
In the present embodiment, motion-guided propagation is used and the detection results of the Yolov3 algorithm are corrected by the temporal-information correction technique; once correction is complete, the frame to be detected becomes the new reference frame. Specifically: using motion-guided propagation (MGP), the detection result of the previous frame is propagated, as guidance information along the time dimension, onto the frame to be detected (the current detection frame) to form part of its detection result, compensating with labels for targets missed in the frame to be detected; a non-maximum-suppression-like (NMS) algorithm then removes propagated boxes that duplicate existing labels, thereby reducing the miss rate.
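The NMS-style removal of duplicate propagated boxes can be sketched as follows (a minimal illustration with axis-aligned boxes given as (x1, y1, x2, y2); the overlap threshold is an illustrative assumption):

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """NMS-style suppression: visit boxes in descending score order and
    keep a box only if it does not overlap an already kept box too much.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

A box propagated from the previous frame that largely overlaps a box already detected on the current frame is discarded, so each target keeps a single label.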
In the present embodiment, the idea of MGP comes from T-CNN (Tubelets with Convolutional Neural Networks), an emerging pipelined deep-learning framework based on convolutional neural networks and designed specifically for video object detection. The insight behind MGP is that even a detector that performs well on still images can lose objects on certain frames when detecting adjacent images; motion information such as optical flow can then be used to propagate part of the detection result to adjacent frames to reduce misses, which in effect is target tracking based on optical-flow information. The use of optical-flow fields is by now quite mature, and its theory is not repeated here.
In the present embodiment, pairing the efficient Yolov3 detection algorithm with the temporal-information correction technique further improves target detection precision.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the method also includes:
generating new data from existing images using a deep convolutional generative adversarial network, expanding the training set of the Yolov3 algorithm, and performing secondary training on the Yolov3 model.
In the present embodiment, Yolov3 is a general-purpose target detection algorithm, but a Yolov3 model that uses only pre-trained weights detects rather poorly, so in the present embodiment it is first given secondary training on a public data set to lift its detection ability. Depending on the actual scene, the Yolov3 model can then undergo customized training on a scene-specific private data set (that is, the existing images). Such existing data sets often contain little valid data, so to expand the training set and increase data diversity, a deep convolutional generative adversarial network (DCGAN) can be used to generate new pictures on the basis of the private data set.
In the present embodiment, the DCGAN comprises a generator network and a discriminator network that oppose each other; training continually adjusts the weights of both networks, after which the DCGAN can expand limited data into a large amount of qualified data. Deep networks place high demands on the quantity and quality of training data; when high-quality training data is scarce, the present embodiment can use a DCGAN, tailored to the concrete scene, to generate a large amount of training data, enabling customized training of Yolov3 and improving the detection ability of the Yolov3 model. That is, on the training-data side, the present embodiment uses a deep convolutional generative adversarial network to generate new pictures (new data) on the basis of existing images, thereby expanding and enriching the training set, increasing data diversity, and improving target detection precision.
In summary, the video object detection and recognition method of the embodiment of the present invention first performs a difference judgement between the reference frame and the frame to be detected; if the difference is small, it assigns the reference frame's detection result to the frame to be detected and outputs it; if the difference is large, it applies image enhancement to the frame to be detected, performs target detection and recognition with the efficient detection model Yolov3, and corrects the detection result with the temporal-information correction technique. In addition, new training data generated with a DCGAN assists the training of the Yolov3 model, improving target detection precision. In this way, video detection can be greatly accelerated while detection accuracy is maintained, and the requirements on hardware devices can also be reduced.
Embodiment two
The present invention also provides a specific embodiment of a video object detection and recognition system. Since the system provided by the present invention corresponds to the specific embodiment of the aforementioned video object detection and recognition method, the system can achieve the object of the present invention by executing the process steps of that method embodiment. The explanations given in the method embodiment therefore also apply to the specific embodiment of the system provided by the present invention and will not be repeated in detail below.
As shown in Fig. 2, an embodiment of the present invention also provides a video object detection and recognition system, comprising an image preprocessing module and a detection module; the image preprocessing module includes a difference judgement unit and an image enhancement unit, and the detection module includes a target detection unit.
The difference judgement unit judges whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold and, if so, assigns the reference frame's detection result to the frame to be detected and outputs it; if the difference exceeds the preset threshold, the image enhancement unit performs image enhancement on the frame to be detected, the target detection unit performs target detection on the enhanced frame, and, once detection is complete, the frame to be detected becomes the new reference frame.
In the present embodiment, the target detection unit performs refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm.
In the present embodiment, the detection module further includes an MGP result correction unit, which uses motion-guided propagation to correct the detection results of the Yolov3 algorithm by means of the temporal-information correction technique and, once correction is complete, makes the frame to be detected the new reference frame.
In the present embodiment, the detection module further includes a DCGAN data generation unit, which generates new data from existing images using a deep convolutional generative adversarial network, expanding and enriching the training set of the Yolov3 algorithm and performing secondary training on the Yolov3 model.
In the present embodiment, the video stream to be detected is passed to the difference detection unit of the image pre-processing module, which performs frame-skipping and computes the degree of difference against the reference frame. The difference detection unit is responsible for frame-position judgement, grid division, computation of judgement indices such as MSE, and comparison against the difference threshold. A frame to be detected whose difference exceeds the preset threshold T is passed to the image enhancement unit of the pre-processing module, which strengthens the useful features of the frame and improves target recognizability; this unit implements image enhancement operations such as dehazing, sharpening, Auto Levels and histogram equalization. The enhanced frame to be detected is then passed on to the object detection unit of the detection module for fine-grained target detection; this unit is a Yolov3 model with customized training. After detection, the frame to be detected and its detection results are passed together to the MGP result correction unit of the detection module, where temporal-information correction yields more accurate detection and localization on top of the Yolov3 results. The function of the MGP result correction unit is target tracking based on optical-flow information: it estimates each target's direction of motion in the frame to be detected relative to the previous frame, transfers the previous frame's detection results onto the frame to be detected along that direction, and thereby performs compensation detection of targets. A class-wise non-maximum suppression (NMS) pass must also be applied to the combined set of detection boxes to eliminate duplicate outlines of the same target, ultimately reducing the miss rate of target detection and improving detection accuracy. Within the detection module, the present invention also provides an independent DCGAN data generation unit, whose core is a DCGAN model for generating image data; it serves to enrich the dataset and mainly undertakes data augmentation, compensating for the poor detection performance of a Yolov3 model trained on insufficient high-quality data.
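The class-wise NMS pass mentioned above can be sketched in plain Python. This is a generic greedy NMS, not the exact routine of the embodiment; the `(box, score)` tuple format and the 0.5 overlap threshold are illustrative choices (in practice it would be run once per target class).

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(dets, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box first, then drop any
    later box overlapping a kept box by more than `thresh`."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)  # (box, score)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) <= thresh for k in kept):
            kept.append((box, score))
    return kept
```

Running this per class on the merged set of direct detections and MGP-propagated boxes removes the "same target outlined twice" phenomenon the text describes.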
In the present embodiment, as can be seen from Fig. 2, the system proposed by the present invention is a cascade of modules, and each module can be further divided into units. Each unit carries a distinct function, and the coupling between units is loose, which makes the contents of the units easy to replace. No hard rule is imposed on which technique a given system unit is implemented with: the computer industry develops rapidly and new techniques emerge constantly in every field, so when a better, more advanced or more suitable technique is found, it can replace the one currently used in the system. In other words, the working principle of the system stays fixed while the implementation of each unit keeps improving. This gives the system lasting usability and good generality.
In summary, the video object detection and recognition system of the embodiments of the present invention first applies a difference judgement mechanism in the image pre-processing module, using frame-skipping and difference computation to greatly reduce the number of image frames that actually require target detection, under the premise that detection quality still meets demand, thereby accelerating video detection. It then performs image enhancement on the frames to be detected, strengthening the useful features in the images and improving target recognizability, thereby improving video detection precision. In the detection module, the high-performance Yolov3 algorithm performs fine-grained target detection on the images, with the advantages of low computational overhead, fast detection speed and good detection accuracy; at the same time, the technique of motion-guided propagation uses the detection results of the previous frame to correct the current detection results, further improving system detection precision. Finally, the system adds a data augmentation technique based on a deep convolutional generative adversarial network, which conveniently diversifies a small training set for retraining the model, improving the model's detection ability. The whole system adopts a modular design: each function is packaged into a module, the modules are cascaded, inter-module coupling is weak, and modules can be replaced as needed, giving the system good adaptability. The system can not only perform target detection on video efficiently, but also greatly reduces the demands on the hardware environment.
Embodiment three
For a better understanding of the video object detection recognition method and system of the embodiments of the present invention, they are described in detail below in conjunction with Fig. 3. The workflow may include:
A11. Before video detection, the system parameters must be preset, including the detection frame rate f_skip (a difference test is performed every f_skip frames; set according to the specific video frame rate and how often the targets of interest change), the difference judgement index (MSE, SSIM, PSNR, etc.; MSE by default), the difference threshold T (controls how strict the difference judgement is: the smaller the value, the lower the tolerance for differences between two frames and the more frequently image frames undergo target detection), the difference judgement mode (full-image mode or grid mode), and the image enhancement mode (sharpening, dehazing, Auto Levels, etc.; choose the method that works best for the specific video scene). These parameters affect how well the system works and should be adjusted case by case. Once the parameters are set, detection and recognition of the video begins. In the present embodiment the video frame rate is 30 fps and the image size is 1920*1080;
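The presets of step A11 could be collected in a small configuration object; the following is a sketch with illustrative names, and the default values for T and the two mode fields are placeholders rather than values prescribed by the embodiment.

```python
from dataclasses import dataclass

@dataclass
class DetectorConfig:
    """System parameters described in step A11 (names are illustrative)."""
    f_skip: int = 5                    # run a difference test every f_skip frames
    metric: str = "MSE"                # difference index: "MSE", "SSIM" or "PSNR"
    threshold: float = 30.0            # difference threshold T (tuned per video)
    judge_mode: str = "grid"           # "full" (whole image) or "grid" (e.g. 10x10)
    enhance_mode: str = "auto_levels"  # "sharpen", "dehaze", "auto_levels", ...

cfg = DetectorConfig()
```

As the text notes, T in particular has no universal default and is best found experimentally for each video.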
A12. Obtain the video stream to be detected, split it into individual image frames using the Opencv library, and carry out subsequent processing with each frame as an independent unit of work;
A13. Judge whether the current image frame is the first frame. If it is, it serves as the start frame; since the first frame has no reference frame to compare against, image enhancement and target detection are performed on it directly, and after detection is complete the current image frame is set as the reference frame.
In the present embodiment, image enhancement is implemented with the Auto Levels algorithm. Taking Fig. 4 as the frame to be enhanced, the result after Auto Levels is shown in Fig. 5. It can be seen that Auto Levels adjusts the distribution of pixel values and raises the contrast, making the image clearer and the targets more recognizable; this provides good pre-processing for the subsequent Yolov3 target detection. Common alternative methods include histogram equalization, dehazing and sharpening. The enhanced first frame is then passed to the Yolov3-based detection module for target detection. Yolov3 extracts features from the image using the Darknet framework and, with a cleverly designed loss function, performs target classification, outputting the detection results as an xml file. In the present embodiment the Yolov3 source code is adjusted so that the detected result-box coordinates and corresponding target classes are retained in the program, to be used for calibrating the results of subsequent frames and for performing MGP;
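Auto Levels as described, clipping the darkest and brightest tails of the pixel distribution and linearly stretching the remainder to [0, 255], can be sketched as follows. The 1% clip fraction and the flat-list image representation are illustrative assumptions, not parameters fixed by the embodiment.

```python
def auto_levels(pixels, clip=0.01):
    """Auto Levels on a flat list of 8-bit pixel values: treat the
    darkest/brightest `clip` fraction as outliers, then linearly
    stretch the remaining range to [0, 255]."""
    s = sorted(pixels)
    lo = s[int(clip * (len(s) - 1))]          # black point
    hi = s[int((1 - clip) * (len(s) - 1))]    # white point
    if hi == lo:
        return pixels[:]                      # flat image: nothing to stretch
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]
```

On a low-contrast frame this spreads the pixel values across the full range, which matches the observed effect in Fig. 5 (contrast raised, image clearer).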
A14. From the second image frame onwards, the system works according to Fig. 3. First the position of the current image frame is judged (denote it f_now, with the start frame counted as frame 0). If it satisfies the formula:
f_now % f_skip = 0
then the current image frame is a frame to be detected that is not skipped, and difference judgement is performed on it; otherwise the reference-frame detection result retained in the system is assigned to the current frame and output as its detection result, to guarantee detection efficiency;
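The frame-position test above reduces to a single modulus check; a minimal sketch:

```python
def needs_difference_test(f_now, f_skip):
    """A frame is sent to the difference-judgement unit only when its
    position (start frame counted as 0) is a multiple of f_skip; all
    other frames simply reuse the retained reference-frame result."""
    return f_now % f_skip == 0
```

With f_skip = 5, frames 0, 5, 10, ... are candidates for difference judgement and the remaining four out of every five frames are handled by result reuse alone.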
A15. In the difference judgement step, the inputs are the reference frame and an un-skipped frame to be detected, and the two are compared using the configured difference index. The present embodiment illustrates with MSE. Suppose the reference frame used for difference judgement is Fig. 6 and frame-to-be-detected 1 is Fig. 7 (70 frames, a time interval of 2~3 s, after Fig. 6). The two differ little, so there is no real need to run target detection on frame 1; computing the difference with full-image mode + MSE gives 24.387, a small value. When the reference frame is compared with frame-to-be-detected 2 (Fig. 8, 170 frames, a time interval of 5~6 s, after Fig. 6), the two visibly differ greatly and frame 2 should undergo target detection; full-image mode + MSE gives 65.396, a large value. This illustrates the value of the difference judgement unit. It is worth noting that calling a difference value "larger" or "smaller" above means comparing it with the preset difference threshold T. Setting T is relatively difficult, and it directly determines whether the system can be both efficient and accurate; a T value suited to a given video is obtained experimentally, taking the specific video content into account. As for the choice of difference judgement mode, again taking the reference frame and frame 2 as an example, grid mode + MSE (the experiment uses a 10*10 grid; after computing the per-cell MSE and sorting in descending order, the average of the top 10 MSE values is taken as the result) gives a difference of 106.450. Compared with full-image mode, the difference value obtained in grid mode often better reflects the real difference between the images, because grid mode pays more attention to local information: it uses the grid cells in which the two images differ most, taking their MSE as a representative value for the difference between the two pictures. This better matches the temporal-locality principle of consecutive video frames: two adjacent frames rarely change across the whole picture, and only parts change. However, grid mode takes slightly longer to compute than full-image mode: in the example above, full-image mode needs 0.0182 s while grid mode needs 0.0293 s, so the choice between the two modes should still be made according to practice.
Under the hardware environment of the experiment, performing target detection on a single image frame with the Yolov3 algorithm takes 1.6371 s. Combined with the preset parameter f_skip, the approximate speed-up this invention brings to video detection can be calculated. The specific parameters are as follows: a video 5 minutes long, at 30 fps, with image size 1920*1080. Method 1 performs frame-by-frame video detection with Yolov3; method 2 is the combination proposed by the present invention of frame-skipping + difference judgement + Yolov3 detection. The Opencv library splits the video into 9023 image frames; ignoring the time the program spends splitting the video, on image enhancement and on other pre-processing, only the rough difference in video detection performance between methods 1 and 2 is calculated. With the grid mode + MSE combination, one difference judgement on a frame takes 0.0293 s; the detection frame-rate parameter is preset to f_skip = 5; one Yolov3 target detection on a single image frame takes 1.6371 s. Taking the two extreme cases as examples:
A151. Every difference degree is below the difference threshold T: in this case only the start frame undergoes Yolov3 target detection. The number of difference judgements is:
n1 ≈ 9023 ÷ 5 ≈ 1804
and the time used, T_min, is approximately:
T_min ≈ 1804 × 0.0293 + 1.6371 = 54.4943 s
A152. Every difference degree exceeds the difference threshold T: in this case, after each difference judgement, target detection with the Yolov3 algorithm must also be run. The time used, T_max, is approximately:
T_max ≈ 1804 × (0.0293 + 1.6371) + 1.6371 = 3007.8227 s
A153. Conventional frame-by-frame detection with the Yolov3 model takes approximately:
T_yolo ≈ 9023 × 1.6371 = 14771.5533 s
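The three timing estimates above can be reproduced as a short calculation; the per-operation times (0.0293 s per difference test, 1.6371 s per Yolov3 detection) are the measured values assumed in this embodiment.

```python
# Worked timing estimate for the 9023-frame example with f_skip = 5.
t_diff, t_yolo_1, frames, f_skip = 0.0293, 1.6371, 9023, 5
n = frames // f_skip                        # ~1804 difference tests

t_min = n * t_diff + t_yolo_1               # A151: every test below threshold T
t_max = n * (t_diff + t_yolo_1) + t_yolo_1  # A152: every test above threshold T
t_framewise = frames * t_yolo_1             # A153: conventional frame-by-frame

speedup_lo = t_framewise / t_max            # worst case, ~4.9x
speedup_hi = t_framewise / t_min            # best case, ~271.1x
```

This reproduces the 4.9~271.1x range quoted in the following paragraph; the true speed-up for a given video lies between the two extremes, depending on how many difference tests exceed T.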
The calculated speed-up factor of video detection using the present invention is therefore roughly 4.9~271.1 times. The main reason for the wide range is that the present embodiment illustrates with the two extreme cases; in practice the proportion of image frames whose difference exceeds the threshold T is unknown, so the factor fluctuates. On the other hand, the example and theoretical calculation in the present embodiment only summarize the acceleration this system brings to video detection; other factors that actually affect the acceleration ratio include video length, video frame rate, image size, the frame-skip parameter f_skip, computer hardware performance, and the programming language used to implement the system, as well as the time spent on internal procedures such as passing on the previous frame's detection results and MGP. The exact acceleration ratio of this system is therefore hard to determine, but the speed-up effect is obvious. The present invention changes the method of video detection itself, greatly improving detection speed while detection accuracy is preserved, which means that for the same video task the required hardware is greatly reduced. Reducing video detection's dependence on hardware is also one of the outstanding advantages of the invention.
A16. After target detection is complete, motion-guided propagation (MGP) corrects the detection results of the Yolov3 algorithm, further improving detection accuracy. The finely detected image frame is then set as the new reference frame and its frame position is judged: if it is the last frame of the video, this video detection ends; if not, the detection boxes are drawn on the image frame, the next image frame is extracted, and the process of steps A14-A16 is repeated.
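The overall loop of steps A13-A16 (first-frame bootstrap, frame-skip gating, difference judgement, enhancement plus fine detection, reference-frame replacement, and result reuse) can be sketched as follows. The MGP correction and box-drawing steps are omitted, and `enhance`, `detect` and `difference` stand in for the real units and are supplied by the caller.

```python
def detect_video(frames, cfg, enhance, detect, difference):
    """Top-level flow of steps A13-A16 with pluggable units.
    cfg holds f_skip and the difference threshold T."""
    results, ref_frame, ref_result = [], None, None
    for f_now, frame in enumerate(frames):
        # First frame (no reference), or an un-skipped frame whose
        # difference from the reference exceeds T: run fine detection.
        if ref_frame is None or (f_now % cfg["f_skip"] == 0 and
                                 difference(ref_frame, frame) > cfg["T"]):
            ref_result = detect(enhance(frame))  # enhance, then fine-detect
            ref_frame = frame                    # becomes the new reference
        results.append(ref_result)               # otherwise reuse the result
    return results
```

A toy run with integers standing in for frames shows the reuse behaviour: a detection only fires again once a tested frame drifts past the threshold.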
A17. To assist the Yolov3 model in detecting better and improve the detection ability of the whole system, the present invention also provides an independent data generation unit that enriches the content and diversity of the dataset. It is suited to cases where high-quality data is scarce and insufficient to fully train the Yolov3 model, and it is this part that implements data augmentation. Yolov3 performs feature extraction and classification on top of Darknet and is therefore based on deep convolutional neural networks, which means its performance only emerges after repeated training on large amounts of data. Many video detection tasks have few training videos, and because of the temporal locality of video, even though a video yields a large number of image frames, most are repetitive and redundant while effective, high-quality images are far too few. In such cases the data generation capability provided by the invention works well. The data generation unit is based on a DCGAN model: starting from the existing image data, it generates new images that meet the requirements of the given training images and uses them as new training data, so that the network is trained more adequately and the detection ability of Yolov3 is improved. The DCGAN data generation unit is another embodiment of the completeness and rich functionality of the system of the present invention.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above are preferred embodiments of the present invention. It should be pointed out that those skilled in the art can make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.