A video object detection and recognition method
Technical field
The present invention relates to the field of image processing, and in particular to a video object detection and recognition method.
Background technique
In recent years, with the rapid development of the Internet, the era of big data has arrived, and every industry's demand for valid data grows by the day. At the same time, more and more videos are uploaded to the network, becoming an emerging treasure-house of data; if valid data could be extracted from video and analyzed, it would bring huge benefits. Video object detection has therefore become a much sought-after research topic in computing, but over the years target detection research has concentrated on still images, and research on video detection is comparatively scarce. Conventional video detection methods first split a video into still image frames and then detect frame by frame. Because video has strong temporal locality and adjacent frames are highly similar, frame-by-frame detection after splitting produces a large amount of wasted work, increasing computational overhead and reducing detection speed.
Summary of the invention
The technical problem to be solved by the present invention is to provide a video object detection and recognition method that addresses the problem of existing conventional video detection methods, which must detect a large number of redundant image frames, leading to high system overhead and low detection efficiency.
To solve the above technical problem, an embodiment of the present invention provides a video object detection and recognition method, comprising:
S1: judging whether the degree of difference between a reference frame and a frame to be detected is less than or equal to a preset difference threshold;
S2: if it is less than or equal to the threshold, assigning the reference frame's detection result to the frame to be detected and outputting it;
S3: otherwise, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame.
Further, before judging whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold, the method also includes:
obtaining a video stream to be detected;
splitting the obtained video stream into individual image frames;
judging whether the current image frame is the first frame;
if it is the first frame, performing image enhancement and target detection on the current image frame and, once detection is complete, setting the current image frame as the reference frame;
if it is not the first frame, judging whether the current image frame is a frame to be detected that has not been skipped, and if so, executing S1.
Further, the step of judging, when the current image frame is not the first frame, whether it is a non-skipped frame to be detected and, if so, executing S1 includes:
taking the start frame as frame 0, judging whether the position f_now of the current image frame satisfies the formula f_now % f_skip = 0, where f_skip is the detection frame rate, indicating that a difference judgement is performed every f_skip frames;
if the formula is satisfied, the current image frame is a non-skipped frame to be detected, and S1 is executed;
if it is not satisfied, assigning the reference frame's detection result to the current image frame and outputting it as the current frame's detection result.
Further, the indices for judging the degree of difference between the reference frame and the frame to be detected include: mean squared error, grayscale difference, histogram difference, peak signal-to-noise ratio, or structural similarity.
Further, the mean squared error MSE is calculated as: MSE = (1/n) Σ_{i=1..n} (X_ref,i − X_det,i)^2, where X_ref,i is the value of pixel i of the reference frame, X_det,i is the value of pixel i of the frame to be detected, and n is the number of image pixels;
the grayscale difference is calculated as: D = (1/n) Σ_{i=1..n} |G_ref,i − G_det,i|, where G_ref,i is the gray value of pixel i of the reference frame and G_det,i is the gray value of pixel i of the frame to be detected;
the peak signal-to-noise ratio is calculated as: PSNR = 10 · log10(MAX_I^2 / MSE), where MAX_I is the maximum colour value of the image.
Further, the difference judgement modes include: full-image judgement or grid judgement.
Further, the image enhancement modes include one or more of: brightness adjustment, contrast adjustment, sharpening, dehazing, auto levels, and histogram equalization.
Further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm.
Further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm;
using motion-guided propagation to correct the detection result of the Yolov3 algorithm by means of a temporal-information correction technique and, once correction is complete, making the frame to be detected the new reference frame.
Further, the method also includes:
generating new data from existing images using a deep convolutional generative adversarial network, expanding the training set of the Yolov3 algorithm, and performing secondary training on the Yolov3 model.
The advantageous effects of the above technical solutions of the present invention are as follows:
In the above scheme, it is judged whether the degree of difference between the reference frame and the frame to be detected is less than or equal to a preset difference threshold; if so, the reference frame's detection result is assigned to the frame to be detected and output; otherwise, image enhancement and target detection are performed on the frame to be detected and, once detection is complete, the frame to be detected becomes the new reference frame, serving as the standard for subsequent difference detection. In this way, by performing image enhancement and target detection only on frames to be detected that differ greatly, the detection of redundant frames in the video is reduced, accelerating video detection and reducing system overhead.
Description of the drawings
Fig. 1 is a schematic flow diagram of the video object detection and recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the video object detection and recognition system provided by an embodiment of the present invention;
Fig. 3 is a detailed flow diagram of the video object detection and recognition method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a frame to be detected provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of an enhanced frame to be detected provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of a reference frame used for difference judgement, provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of frame to be detected 1 used for difference judgement, provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of frame to be detected 2 used for difference judgement, provided by an embodiment of the present invention.
Specific embodiment
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below in conjunction with the attached drawings and specific embodiments.
Aiming at the problem that existing conventional video detection methods must detect a large number of redundant image frames, causing high system overhead and low detection efficiency, the present invention provides a video object detection and recognition method.
Embodiment one
As shown in Fig. 1, the video object detection and recognition method provided by an embodiment of the present invention comprises:
S1: judging whether the degree of difference between a reference frame and a frame to be detected is less than or equal to a preset difference threshold;
S2: if it is less than or equal to the threshold, assigning the reference frame's detection result to the frame to be detected and outputting it;
S3: otherwise, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame.
The video object detection and recognition method of the embodiment of the present invention judges whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold; if so, it assigns the reference frame's detection result to the frame to be detected and outputs it; otherwise, it performs image enhancement and target detection on the frame to be detected and, once detection is complete, makes the frame to be detected the new reference frame, the standard for subsequent difference detection. In this way, by performing image enhancement and target detection only on frames to be detected that differ greatly, the detection of redundant frames in the video is reduced, accelerating video detection and reducing system overhead.
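The flow just described can be sketched as a simple gating loop (a minimal illustration, not the invention's implementation; diff_fn and detect_fn are hypothetical placeholders for the chosen difference index and detection model):

```python
def detect_video(frames, threshold, diff_fn, detect_fn):
    """Frame-difference gating (steps S1-S3): run full detection only when
    the current frame differs from the reference frame by more than the
    threshold; otherwise reuse the reference frame's detection result."""
    results = []
    reference, ref_result = None, None
    for frame in frames:
        if reference is None or diff_fn(reference, frame) > threshold:
            ref_result = detect_fn(frame)  # detect on a significantly changed frame
            reference = frame              # this frame becomes the new reference
        # a redundant frame simply inherits the reference frame's result
        results.append(ref_result)
    return results
```

With an absolute-difference metric, only frames that change by more than the threshold trigger detection; all intervening frames inherit the reference result.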
In a specific embodiment of the aforementioned video object detection and recognition method, further, before judging whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold, the method also includes:
obtaining a video stream to be detected;
splitting the obtained video stream into individual image frames;
judging whether the current image frame is the first frame;
if it is the first frame, performing image enhancement and target detection on the current image frame and, once detection is complete, setting the current image frame as the reference frame;
if it is not the first frame, judging whether the current image frame is a frame to be detected that has not been skipped, and if so, executing S1.
In the present embodiment, let the degree of difference between the reference frame and the frame to be detected be diff and the difference threshold be T. The two images (reference frame and frame to be detected) are first converted to matrices, the degree of difference diff between them is computed, and it is judged whether diff is less than or equal to the preset threshold T. If so, the two frames are considered close enough that the frame to be detected can be regarded as a redundant frame, and the reference frame's detection result is assigned to it and output. Otherwise, the two images are considered significantly different: image enhancement is applied to the frame to be detected to improve the recognizability of targets in the image, target detection is then performed on the enhanced frame, and, once detection is complete, the frame to be detected becomes the new reference frame, the standard for subsequent difference detection.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the step of judging, when the current image frame is not the first frame, whether it is a non-skipped frame to be detected and, if so, executing S1 includes:
taking the start frame as frame 0, judging whether the position f_now of the current image frame satisfies the formula f_now % f_skip = 0, where f_skip is the detection frame rate, indicating that a difference judgement is performed every f_skip frames;
if the formula is satisfied, the current image frame is a non-skipped frame to be detected, and S1 is executed;
if it is not satisfied, assigning the reference frame's detection result to the current image frame and outputting it as the current frame's detection result.
In the present embodiment, even though difference judgement reduces the detection of redundant frames, a difference computation between the reference frame and each frame to be detected is still required. For high-frame-rate or slowly changing video, adjacent frames differ very little, and the difference computation itself becomes overhead; the present embodiment uses frame skipping to reduce this part of the overhead. Specifically, a time parameter t_skip could be set so that the system performs one difference check per time span t_skip and performs none within each interval, accelerating video detection by skipping redundant frames. Since the choice of t_skip depends on the frame rate of the video being detected, the embodiment instead sets a detection frame rate f_skip, realizing frame skipping by having the system perform one difference judgement every f_skip frames. In practical applications, the parameter f_skip is set flexibly according to the video frame rate and how much the video changes, ensuring the detection effect meets demand.
In the present embodiment, setting the detection frame rate f_skip controls how often difference checks are performed; by reducing useless judgements and skipping the detection of redundant frames, video object detection is accelerated, realizing a trade-off between target detection precision and efficiency.
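The frame-skipping rule can be sketched as follows (a minimal illustration; the function name is an assumption, not from the source):

```python
def frames_to_check(n_frames, f_skip):
    """With the start frame counted as frame 0, only frames whose position
    f_now satisfies f_now % f_skip == 0 undergo the difference judgement;
    every other frame simply inherits the reference frame's result."""
    return [f_now for f_now in range(n_frames) if f_now % f_skip == 0]
```

With f_skip = 5, a 12-frame clip is difference-checked only at frames 0, 5, and 10.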
In the present embodiment, several judgement indices can serve as quantitative criteria for measuring the degree of difference between the reference frame and the frame to be detected.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the indices for judging the degree of difference between the reference frame and the frame to be detected include: mean squared error, grayscale difference, histogram difference, peak signal-to-noise ratio, or structural similarity.
The mean squared error MSE is calculated as: MSE = (1/n) Σ_{i=1..n} (X_ref,i − X_det,i)^2, where X_ref,i is the value of pixel i of the reference frame, X_det,i is the value of pixel i of the frame to be detected, and n is the number of image pixels;
the grayscale difference is calculated as: D = (1/n) Σ_{i=1..n} |G_ref,i − G_det,i|, where G_ref,i is the gray value of pixel i of the reference frame and G_det,i is the gray value of pixel i of the frame to be detected;
histogram difference: the histograms of the two images are computed separately and normalized, and their similarity is judged according to a chosen distance metric;
the peak signal-to-noise ratio (PSNR) is calculated as: PSNR = 10 · log10(MAX_I^2 / MSE), where MAX_I is the maximum colour value of the image;
structural similarity (SSIM): measures the similarity of two images in terms of brightness, contrast, and structure. In practice, the image is divided into N blocks by a sliding window; the Gaussian-weighted mean, variance, and covariance of each window are computed, the structural similarity SSIM of corresponding blocks is calculated, and the average is taken as the structural similarity of the two images. SSIM takes values in [0, 1]; the smaller the value, the greater the difference between the two images.
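The MSE and PSNR indices above can be computed directly on flattened pixel sequences (a minimal sketch consistent with the formulas in this embodiment):

```python
import math

def mse(ref, det):
    """Mean squared error between the reference frame and the frame to be
    detected, both given as equal-length sequences of pixel values."""
    return sum((r - d) ** 2 for r, d in zip(ref, det)) / len(ref)

def psnr(ref, det, max_i=255):
    """Peak signal-to-noise ratio derived from MSE; max_i is the maximum
    colour value of the image (255 for 8-bit channels)."""
    m = mse(ref, det)
    return float("inf") if m == 0 else 10 * math.log10(max_i ** 2 / m)
```

A larger MSE (or smaller PSNR) indicates a larger difference between the two frames.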
In the present embodiment, which index is used to measure the degree of difference depends on the actual situation. Experiments show that:
SSIM is suited to measuring style changes of a whole picture and poorly suited to the local differences that occur between adjacent frames;
histogram difference reflects the difference in the distribution of gray values of two pictures but carries no spatial position information, so considerable information is lost;
PSNR can be regarded as a transformation of MSE;
grayscale difference considers less comprehensive information than MSE and loses much of the colour information.
All things considered, the present embodiment uses MSE as the default and preferred difference index, while the other indices are provided as judgement criteria that can be substituted as the case requires in practical applications.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the difference judgement modes include: full-image judgement or grid judgement.
In the present embodiment, to balance the influence of difference judgement on the precision and efficiency of the video object detection and recognition method, the degree of difference between the reference frame and the frame to be detected is computed in either of two modes:
full-image judgement: the difference index of the two frames is computed directly over the whole image, and this serves as the degree of difference;
grid judgement: the two frames are first divided into a p × p grid, the difference index is computed on each cell, the p^2 results are sorted in descending order, and the mean of the top m values serves as the degree of difference.
In the present embodiment, m and p are tunable parameters that can be customized in practical applications according to specific requirements and the observed detection effect.
In the present embodiment, taking mean squared error as an example, the two difference judgement modes are described in detail:
full-image judgement: the MSE of the two frames is computed directly over the whole image, and this serves as the degree of difference;
grid judgement: the two frames are first divided into a p × p grid, the MSE is computed on each cell, the p^2 results are sorted in descending order, and the mean of the top m values serves as the degree of difference.
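Grid judgement can be sketched as follows (a minimal illustration on images represented as lists of rows; the per-cell metric is passed in, matching the interchangeable indices of this embodiment, and p is assumed to divide the image dimensions):

```python
def grid_difference(ref, det, p, m, metric):
    """Split both images into a p x p grid, compute the difference metric
    on each cell, sort the p*p scores in descending order, and return the
    mean of the top m values as the degree of difference."""
    h, w = len(ref), len(ref[0])
    ch, cw = h // p, w // p  # cell height and width
    scores = []
    for gy in range(p):
        for gx in range(p):
            # flatten the corresponding cell of each image
            flat_a = [v for row in ref[gy*ch:(gy+1)*ch] for v in row[gx*cw:(gx+1)*cw]]
            flat_b = [v for row in det[gy*ch:(gy+1)*ch] for v in row[gx*cw:(gx+1)*cw]]
            scores.append(metric(flat_a, flat_b))
    scores.sort(reverse=True)
    return sum(scores[:m]) / m
```

Averaging only the largest m cell scores makes a small but concentrated change count as a large difference, which full-image judgement would dilute.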
In a specific embodiment of the aforementioned video object detection and recognition method, further, the image enhancement modes include one or more of: brightness adjustment, contrast adjustment, sharpening, dehazing, auto levels, and histogram equalization.
In the present embodiment, the difference judgement yields the frames to be detected that have changed greatly and must be detected. To improve the recognizability of targets in the frame to be detected, and in turn the detection accuracy, the present embodiment first applies image enhancement to frames to be detected that differ significantly from the reference frame, where the image enhancement modes include:
contrast and brightness adjustment: improves the contrast and brightness of the frame to be detected;
sharpening: enhances the edges and contours of targets and the regions of abrupt gray-level change, making them clearer in the image or highlighting the features of certain linear target elements;
dehazing: weakens the influence of heavy fog on the frame to be detected, making the image clear;
auto levels: takes the minimum and maximum pixel values of each channel as black and white and redistributes the pixel values in between proportionally, making the picture's colours richer;
histogram equalization: transforms the image gray levels so that the histogram becomes as flat as possible, making balanced use of all gray levels in the dynamic range.
In the present embodiment, one or more of contrast adjustment, brightness adjustment, sharpening, dehazing, auto levels, histogram equalization, and similar methods can be used to enhance the frame to be detected. The chosen enhancement method is a configurable item that can be set in practical applications according to the concrete scene and the target detection effect, ensuring the video detection effect meets demand. The purpose of image enhancement is to improve target recognizability and strengthen the useful features in the image for the benefit of target detection; the enhancement is not shown in the output video and serves only as a module within the image processing stage.
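As one example of the listed enhancement modes, linear contrast and brightness adjustment can be sketched as follows (the gain alpha and offset beta are illustrative values, not taken from the source):

```python
def adjust_contrast_brightness(pixels, alpha=1.2, beta=10):
    """Linear enhancement: scale each pixel by the contrast gain alpha,
    add the brightness offset beta, and clip to the 8-bit range."""
    return [max(0, min(255, round(alpha * v + beta))) for v in pixels]
```

In practice, alpha and beta would be tuned per scene so that targets become easier for the detector to recognize.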
In a specific embodiment of the aforementioned video object detection and recognition method, further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 (You Only Look Once) algorithm.
In the present embodiment, Yolov3 is a target detection algorithm; the algorithm's name does not yet have a unified Chinese translation.
In the present embodiment, after image enhancement is applied to a frame to be detected with a large difference, refined target detection is performed on it. The detection algorithm chosen is the efficient detection model Yolov3. Unlike conventional target detection algorithms, Yolov3 treats target localization and classification as a single regression task, completed in one stage by one fully convolutional neural network, which gives it very fast detection speed. Yolov3 also offers advantages such as a high-resolution classifier, fine-grained features, and multi-scale training and prediction, and its target detection overhead is small, meeting the needs of this system. Yolov3 performs target detection on the input enhanced frame to be detected and returns the coordinates of the rectangular boxes surrounding the targets together with the target categories; the detection result, once obtained, can be used to label the redundant frames.
In the present embodiment, using Yolov3 to perform refined target detection and recognition on the enhanced frame to be detected further reduces the detection overhead and thus the hardware configuration requirements. Depending on the practical application scenario and hardware environment, however, the detection algorithm can be replaced by another that better meets the detection demand: it could be replaced by a two-stage detection algorithm to further improve detection accuracy, or by a better detection algorithm proposed later, improving overall system performance. The cost and difficulty of replacing the detection algorithm are small, so the system remains durable in use and versatile.
In a specific embodiment of the aforementioned video object detection and recognition method, further, performing image enhancement and target detection on the frame to be detected and, once detection is complete, making the frame to be detected the new reference frame includes:
performing image enhancement on the frame to be detected;
performing refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm;
using motion-guided propagation to correct the detection result of the Yolov3 algorithm by means of a temporal-information correction technique and, once correction is complete, making the frame to be detected the new reference frame.
The Yolov3 algorithm is characterized by small target detection overhead, fast detection speed, and high detection accuracy. In the present embodiment, to further improve the ability of video object detection and recognition, a temporal-information correction technique is used to assist in correcting the detection results. Specifically, when image frames are detected one by one, instability of the algorithm's results or problems with video quality can cause the same target to be missed on some adjacent frames. Most targets are stationary or slow-moving across adjacent image frames, so a target should appear at a similar location in the following frame as in the previous one; such missed detections can therefore be recovered from the detection results of adjacent frames.
In the present embodiment, motion-guided propagation is used and the detection results of the Yolov3 algorithm are corrected by the temporal-information correction technique; once correction is complete, the frame to be detected becomes the new reference frame. Specifically: using motion-guided propagation (MGP), the detection result of the previous frame is propagated, as guidance information along the time dimension, onto the frame to be detected (the current detection frame) to form part of its detection result, compensating with labels for targets missed in the frame to be detected; a non-maximum-suppression-like (NMS) algorithm then removes propagated boxes that duplicate existing labels, thereby reducing the miss rate.
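The NMS-style removal of duplicate propagated boxes can be sketched as follows (a minimal illustration with axis-aligned boxes given as (x1, y1, x2, y2); the overlap threshold is an illustrative assumption):

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """NMS-style suppression: visit boxes in descending score order and
    keep a box only if it does not overlap an already kept box too much.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

A box propagated from the previous frame that largely overlaps a box already detected on the current frame is discarded, so each target keeps a single label.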
In the present embodiment, the idea of MGP comes from T-CNN (Tubelets with Convolutional Neural Networks), an emerging pipelined deep-learning framework based on convolutional neural networks and designed specifically for video object detection. The insight behind MGP is that even a detector that performs well on still images can lose objects on certain frames when detecting adjacent images; motion information such as optical flow can then be used to propagate part of the detection result to adjacent frames to reduce misses, which in effect is target tracking based on optical-flow information. The use of optical-flow fields is by now quite mature, and its theory is not repeated here.
In the present embodiment, pairing the efficient Yolov3 detection algorithm with the temporal-information correction technique further improves target detection precision.
In a specific embodiment of the aforementioned video object detection and recognition method, further, the method also includes:
generating new data from existing images using a deep convolutional generative adversarial network, expanding the training set of the Yolov3 algorithm, and performing secondary training on the Yolov3 model.
In the present embodiment, Yolov3 is a general-purpose target detection algorithm, but a Yolov3 model that uses only pre-trained weights detects rather poorly, so in the present embodiment it is first given secondary training on a public data set to lift its detection ability. Depending on the actual scene, the Yolov3 model can then undergo customized training on a scene-specific private data set (that is, the existing images). Such existing data sets often contain little valid data, so to expand the training set and increase data diversity, a deep convolutional generative adversarial network (DCGAN) can be used to generate new pictures on the basis of the private data set.
In the present embodiment, the DCGAN comprises a generator network and a discriminator network that oppose each other; training continually adjusts the weights of both networks, after which the DCGAN can expand limited data into a large amount of qualified data. Deep networks place high demands on the quantity and quality of training data; when high-quality training data is scarce, the present embodiment can use a DCGAN, tailored to the concrete scene, to generate a large amount of training data, enabling customized training of Yolov3 and improving the detection ability of the Yolov3 model. That is, on the training-data side, the present embodiment uses a deep convolutional generative adversarial network to generate new pictures (new data) on the basis of existing images, thereby expanding and enriching the training set, increasing data diversity, and improving target detection precision.
In summary, the video object detection and recognition method of the embodiment of the present invention first performs a difference judgement between the reference frame and the frame to be detected; if the difference is small, it assigns the reference frame's detection result to the frame to be detected and outputs it; if the difference is large, it applies image enhancement to the frame to be detected, performs target detection and recognition with the efficient detection model Yolov3, and corrects the detection result with the temporal-information correction technique. In addition, new training data generated with a DCGAN assists the training of the Yolov3 model, improving target detection precision. In this way, video detection can be greatly accelerated while detection accuracy is maintained, and the requirements on hardware devices can also be reduced.
Embodiment two
The present invention also provides a specific embodiment of a video object detection and recognition system. Since the system provided by the present invention corresponds to the specific embodiment of the aforementioned video object detection and recognition method, the system can achieve the object of the present invention by executing the process steps of that method embodiment. The explanations given in the method embodiment therefore also apply to the specific embodiment of the system provided by the present invention and will not be repeated in detail below.
As shown in Fig. 2, an embodiment of the present invention also provides a video object detection and recognition system, comprising an image preprocessing module and a detection module; the image preprocessing module includes a difference judgement unit and an image enhancement unit, and the detection module includes a target detection unit.
The difference judgement unit judges whether the degree of difference between the reference frame and the frame to be detected is less than or equal to the preset difference threshold and, if so, assigns the reference frame's detection result to the frame to be detected and outputs it; if the difference exceeds the preset threshold, the image enhancement unit performs image enhancement on the frame to be detected, the target detection unit performs target detection on the enhanced frame, and, once detection is complete, the frame to be detected becomes the new reference frame.
In the present embodiment, the target detection unit performs refined target detection and recognition on the enhanced frame to be detected using the Yolov3 algorithm.
In the present embodiment, the detection module further includes an MGP result correction unit, which uses motion-guided propagation to correct the detection results of the Yolov3 algorithm by means of the temporal-information correction technique and, once correction is complete, makes the frame to be detected the new reference frame.
In the present embodiment, the detection module further includes a DCGAN data generation unit, which generates new data from existing images using a deep convolutional generative adversarial network, expanding and enriching the training set of the Yolov3 algorithm and performing secondary training on the Yolov3 model.
In the present embodiment, the video stream to be detected is passed to the difference detection unit of the image pre-processing module, which performs frame-skipping and computes the degree of difference against the reference frame. The difference detection unit is responsible for frame-position judgement, grid division, computation of judgement indices such as MSE, and comparison against the difference threshold. A frame to be detected whose difference exceeds the preset threshold T is passed to the image enhancement unit of the pre-processing module, which strengthens the useful features of the frame and improves target recognizability; this unit implements image enhancement operations such as dehazing, sharpening, Auto Levels and histogram equalization. The enhanced frame to be detected is then passed on to the object detection unit of the detection module for fine-grained target detection; this unit is a Yolov3 model with customized training. After detection, the frame to be detected and its detection results are passed together to the MGP result correction unit of the detection module, where temporal-information correction yields more accurate detection and localization on top of the Yolov3 results. The function of the MGP result correction unit is target tracking based on optical-flow information: it estimates each target's direction of motion in the frame to be detected relative to the previous frame, transfers the previous frame's detection results onto the frame to be detected along that direction, and thereby performs compensation detection of targets. A class-wise non-maximum suppression (NMS) pass must also be applied to the combined set of detection boxes to eliminate duplicate outlines of the same target, ultimately reducing the miss rate of target detection and improving detection accuracy. Within the detection module, the present invention also provides an independent DCGAN data generation unit, whose core is a DCGAN model for generating image data; it serves to enrich the dataset and mainly undertakes data augmentation, compensating for the poor detection performance of a Yolov3 model trained on insufficient high-quality data.
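The class-wise NMS pass mentioned above can be sketched in plain Python. This is a generic greedy NMS, not the exact routine of the embodiment; the `(box, score)` tuple format and the 0.5 overlap threshold are illustrative choices (in practice it would be run once per target class).

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(dets, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box first, then drop any
    later box overlapping a kept box by more than `thresh`."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)  # (box, score)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) <= thresh for k in kept):
            kept.append((box, score))
    return kept
```

Running this per class on the merged set of direct detections and MGP-propagated boxes removes the "same target outlined twice" phenomenon the text describes.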
In the present embodiment, as can be seen from Fig. 2, the system proposed by the present invention is a cascade of modules, and each module can be further divided into units. Each unit carries a distinct function, and the coupling between units is loose, which makes the contents of the units easy to replace. No hard rule is imposed on which technique a given system unit is implemented with: the computer industry develops rapidly and new techniques emerge constantly in every field, so when a better, more advanced or more suitable technique is found, it can replace the one currently used in the system. In other words, the working principle of the system stays fixed while the implementation of each unit keeps improving. This gives the system lasting usability and good generality.
In summary, the video object detection and recognition system of the embodiments of the present invention first applies a difference judgement mechanism in the image pre-processing module, using frame-skipping and difference computation to greatly reduce the number of image frames that actually require target detection, under the premise that detection quality still meets demand, thereby accelerating video detection. It then performs image enhancement on the frames to be detected, strengthening the useful features in the images and improving target recognizability, thereby improving video detection precision. In the detection module, the high-performance Yolov3 algorithm performs fine-grained target detection on the images, with the advantages of low computational overhead, fast detection speed and good detection accuracy; at the same time, the technique of motion-guided propagation uses the detection results of the previous frame to correct the current detection results, further improving system detection precision. Finally, the system adds a data augmentation technique based on a deep convolutional generative adversarial network, which conveniently diversifies a small training set for retraining the model, improving the model's detection ability. The whole system adopts a modular design: each function is packaged into a module, the modules are cascaded, inter-module coupling is weak, and modules can be replaced as needed, giving the system good adaptability. The system can not only perform target detection on video efficiently, but also greatly reduces the demands on the hardware environment.
Embodiment three
For a better understanding of the video object detection recognition method and system of the embodiments of the present invention, they are described in detail below in conjunction with Fig. 3. The workflow may include:
A11. Before video detection, the system parameters must be preset, including the detection frame rate f_skip (a difference test is performed every f_skip frames; set according to the specific video frame rate and how often the targets of interest change), the difference judgement index (MSE, SSIM, PSNR, etc.; MSE by default), the difference threshold T (controls how strict the difference judgement is: the smaller the value, the lower the tolerance for differences between two frames and the more frequently image frames undergo target detection), the difference judgement mode (full-image mode or grid mode), and the image enhancement mode (sharpening, dehazing, Auto Levels, etc.; choose the method that works best for the specific video scene). These parameters affect how well the system works and should be adjusted case by case. Once the parameters are set, detection and recognition of the video begins. In the present embodiment the video frame rate is 30 fps and the image size is 1920*1080;
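The presets of step A11 could be collected in a small configuration object; the following is a sketch with illustrative names, and the default values for T and the two mode fields are placeholders rather than values prescribed by the embodiment.

```python
from dataclasses import dataclass

@dataclass
class DetectorConfig:
    """System parameters described in step A11 (names are illustrative)."""
    f_skip: int = 5                    # run a difference test every f_skip frames
    metric: str = "MSE"                # difference index: "MSE", "SSIM" or "PSNR"
    threshold: float = 30.0            # difference threshold T (tuned per video)
    judge_mode: str = "grid"           # "full" (whole image) or "grid" (e.g. 10x10)
    enhance_mode: str = "auto_levels"  # "sharpen", "dehaze", "auto_levels", ...

cfg = DetectorConfig()
```

As the text notes, T in particular has no universal default and is best found experimentally for each video.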
A12. Obtain the video stream to be detected, split it into individual image frames using the Opencv library, and carry out subsequent processing with each frame as an independent unit of work;
A13. Judge whether the current image frame is the first frame. If it is, it serves as the start frame; since the first frame has no reference frame to compare against, image enhancement and target detection are performed on it directly, and after detection is complete the current image frame is set as the reference frame.
In the present embodiment, image enhancement is implemented with the Auto Levels algorithm. Taking Fig. 4 as the frame to be enhanced, the result after Auto Levels is shown in Fig. 5. It can be seen that Auto Levels adjusts the distribution of pixel values and raises the contrast, making the image clearer and the targets more recognizable; this provides good pre-processing for the subsequent Yolov3 target detection. Common alternative methods include histogram equalization, dehazing and sharpening. The enhanced first frame is then passed to the Yolov3-based detection module for target detection. Yolov3 extracts features from the image using the Darknet framework and, with a cleverly designed loss function, performs target classification, outputting the detection results as an xml file. In the present embodiment the Yolov3 source code is adjusted so that the detected result-box coordinates and corresponding target classes are retained in the program, to be used for calibrating the results of subsequent frames and for performing MGP;
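Auto Levels as described, clipping the darkest and brightest tails of the pixel distribution and linearly stretching the remainder to [0, 255], can be sketched as follows. The 1% clip fraction and the flat-list image representation are illustrative assumptions, not parameters fixed by the embodiment.

```python
def auto_levels(pixels, clip=0.01):
    """Auto Levels on a flat list of 8-bit pixel values: treat the
    darkest/brightest `clip` fraction as outliers, then linearly
    stretch the remaining range to [0, 255]."""
    s = sorted(pixels)
    lo = s[int(clip * (len(s) - 1))]          # black point
    hi = s[int((1 - clip) * (len(s) - 1))]    # white point
    if hi == lo:
        return pixels[:]                      # flat image: nothing to stretch
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, round((p - lo) * scale))) for p in pixels]
```

On a low-contrast frame this spreads the pixel values across the full range, which matches the observed effect in Fig. 5 (contrast raised, image clearer).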
A14. From the second image frame onwards, the system works according to Fig. 3. First the position of the current image frame is judged (denote it f_now, with the start frame counted as frame 0). If it satisfies the formula:
f_now % f_skip = 0
then the current image frame is a frame to be detected that is not skipped, and difference judgement is performed on it; otherwise the reference-frame detection result retained in the system is assigned to the current frame and output as its detection result, to guarantee detection efficiency;
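The frame-position test above reduces to a single modulus check; a minimal sketch:

```python
def needs_difference_test(f_now, f_skip):
    """A frame is sent to the difference-judgement unit only when its
    position (start frame counted as 0) is a multiple of f_skip; all
    other frames simply reuse the retained reference-frame result."""
    return f_now % f_skip == 0
```

With f_skip = 5, frames 0, 5, 10, ... are candidates for difference judgement and the remaining four out of every five frames are handled by result reuse alone.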
A15. In the difference judgement step, the inputs are the reference frame and an un-skipped frame to be detected, and the two are compared using the configured difference index. The present embodiment illustrates with MSE. Suppose the reference frame used for difference judgement is Fig. 6 and frame-to-be-detected 1 is Fig. 7 (70 frames, a time interval of 2~3 s, after Fig. 6). The two differ little, so there is no real need to run target detection on frame 1; computing the difference with full-image mode + MSE gives 24.387, a small value. When the reference frame is compared with frame-to-be-detected 2 (Fig. 8, 170 frames, a time interval of 5~6 s, after Fig. 6), the two visibly differ greatly and frame 2 should undergo target detection; full-image mode + MSE gives 65.396, a large value. This illustrates the value of the difference judgement unit. It is worth noting that calling a difference value "larger" or "smaller" above means comparing it with the preset difference threshold T. Setting T is relatively difficult, and it directly determines whether the system can be both efficient and accurate; a T value suited to a given video is obtained experimentally, taking the specific video content into account. As for the choice of difference judgement mode, again taking the reference frame and frame 2 as an example, grid mode + MSE (the experiment uses a 10*10 grid; after computing the per-cell MSE and sorting in descending order, the average of the top 10 MSE values is taken as the result) gives a difference of 106.450. Compared with full-image mode, the difference value obtained in grid mode often better reflects the real difference between the images, because grid mode pays more attention to local information: it uses the grid cells in which the two images differ most, taking their MSE as a representative value for the difference between the two pictures. This better matches the temporal-locality principle of consecutive video frames: two adjacent frames rarely change across the whole picture, and only parts change. However, grid mode takes slightly longer to compute than full-image mode: in the example above, full-image mode needs 0.0182 s while grid mode needs 0.0293 s, so the choice between the two modes should still be made according to practice.
Under the hardware environment of the experiment, performing target detection on a single image frame with the Yolov3 algorithm takes 1.6371 s. Combined with the preset parameter f_skip, the approximate speed-up this invention brings to video detection can be calculated. The specific parameters are as follows: a video 5 minutes long, at 30 fps, with image size 1920*1080. Method 1 performs frame-by-frame video detection with Yolov3; method 2 is the combination proposed by the present invention of frame-skipping + difference judgement + Yolov3 detection. The Opencv library splits the video into 9023 image frames; ignoring the time the program spends splitting the video, on image enhancement and on other pre-processing, only the rough difference in video detection performance between methods 1 and 2 is calculated. With the grid mode + MSE combination, one difference judgement on a frame takes 0.0293 s; the detection frame-rate parameter is preset to f_skip = 5; one Yolov3 target detection on a single image frame takes 1.6371 s. Taking the two extreme cases as examples:
A151. Every difference degree is below the difference threshold T: in this case only the start frame undergoes Yolov3 target detection. The number of difference judgements is:
n1 ≈ 9023 ÷ 5 ≈ 1804
and the time used, T_min, is approximately:
T_min ≈ 1804 × 0.0293 + 1.6371 = 54.4943 s
A152. Every difference degree exceeds the difference threshold T: in this case, after each difference judgement, target detection with the Yolov3 algorithm must also be run. The time used, T_max, is approximately:
T_max ≈ 1804 × (0.0293 + 1.6371) + 1.6371 = 3007.8227 s
A153. Conventional frame-by-frame detection with the Yolov3 model takes approximately:
T_yolo ≈ 9023 × 1.6371 = 14771.5533 s
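The three timing estimates above can be reproduced as a short calculation; the per-operation times (0.0293 s per difference test, 1.6371 s per Yolov3 detection) are the measured values assumed in this embodiment.

```python
# Worked timing estimate for the 9023-frame example with f_skip = 5.
t_diff, t_yolo_1, frames, f_skip = 0.0293, 1.6371, 9023, 5
n = frames // f_skip                        # ~1804 difference tests

t_min = n * t_diff + t_yolo_1               # A151: every test below threshold T
t_max = n * (t_diff + t_yolo_1) + t_yolo_1  # A152: every test above threshold T
t_framewise = frames * t_yolo_1             # A153: conventional frame-by-frame

speedup_lo = t_framewise / t_max            # worst case, ~4.9x
speedup_hi = t_framewise / t_min            # best case, ~271.1x
```

This reproduces the 4.9~271.1x range quoted in the following paragraph; the true speed-up for a given video lies between the two extremes, depending on how many difference tests exceed T.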
The calculated speed-up factor of video detection using the present invention is therefore roughly 4.9~271.1 times. The main reason for the wide range is that the present embodiment illustrates with the two extreme cases; in practice the proportion of image frames whose difference exceeds the threshold T is unknown, so the factor fluctuates. On the other hand, the example and theoretical calculation in the present embodiment only summarize the acceleration this system brings to video detection; other factors that actually affect the acceleration ratio include video length, video frame rate, image size, the frame-skip parameter f_skip, computer hardware performance, and the programming language used to implement the system, as well as the time spent on internal procedures such as passing on the previous frame's detection results and MGP. The exact acceleration ratio of this system is therefore hard to determine, but the speed-up effect is obvious. The present invention changes the method of video detection itself, greatly improving detection speed while detection accuracy is preserved, which means that for the same video task the required hardware is greatly reduced. Reducing video detection's dependence on hardware is also one of the outstanding advantages of the invention.
A16. After target detection is complete, motion-guided propagation (MGP) corrects the detection results of the Yolov3 algorithm, further improving detection accuracy. The finely detected image frame is then set as the new reference frame and its frame position is judged: if it is the last frame of the video, this video detection ends; if not, the detection boxes are drawn on the image frame, the next image frame is extracted, and the process of steps A14-A16 is repeated.
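The overall loop of steps A13-A16 (first-frame bootstrap, frame-skip gating, difference judgement, enhancement plus fine detection, reference-frame replacement, and result reuse) can be sketched as follows. The MGP correction and box-drawing steps are omitted, and `enhance`, `detect` and `difference` stand in for the real units and are supplied by the caller.

```python
def detect_video(frames, cfg, enhance, detect, difference):
    """Top-level flow of steps A13-A16 with pluggable units.
    cfg holds f_skip and the difference threshold T."""
    results, ref_frame, ref_result = [], None, None
    for f_now, frame in enumerate(frames):
        # First frame (no reference), or an un-skipped frame whose
        # difference from the reference exceeds T: run fine detection.
        if ref_frame is None or (f_now % cfg["f_skip"] == 0 and
                                 difference(ref_frame, frame) > cfg["T"]):
            ref_result = detect(enhance(frame))  # enhance, then fine-detect
            ref_frame = frame                    # becomes the new reference
        results.append(ref_result)               # otherwise reuse the result
    return results
```

A toy run with integers standing in for frames shows the reuse behaviour: a detection only fires again once a tested frame drifts past the threshold.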
A17. To assist the Yolov3 model in detecting better and improve the detection ability of the whole system, the present invention also provides an independent data generation unit that enriches the content and diversity of the dataset. It is suited to cases where high-quality data is scarce and insufficient to fully train the Yolov3 model, and it is this part that implements data augmentation. Yolov3 performs feature extraction and classification on top of Darknet and is therefore based on deep convolutional neural networks, which means its performance only emerges after repeated training on large amounts of data. Many video detection tasks have few training videos, and because of the temporal locality of video, even though a video yields a large number of image frames, most are repetitive and redundant while effective, high-quality images are far too few. In such cases the data generation capability provided by the invention works well. The data generation unit is based on a DCGAN model: starting from the existing image data, it generates new images that meet the requirements of the given training images and uses them as new training data, so that the network is trained more adequately and the detection ability of Yolov3 is improved. The DCGAN data generation unit is another embodiment of the completeness and rich functionality of the system of the present invention.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above are preferred embodiments of the present invention. It should be pointed out that those skilled in the art can make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.