CN112069967A - Night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion - Google Patents


Info

Publication number: CN112069967A (application CN202010896881.5A; granted as CN112069967B)
Authority: CN (China)
Prior art keywords: detection, frame, frames, current frame, pedestrian
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 郭全民, 张文平, 田英侠, 柴改霞, 杨建华, 陈阳
Current / Original Assignee: Xian Technological University
Application filed by Xian Technological University; priority to CN202010896881.5A

Classifications

    • G06V40/103 — Recognition of biometric, human-related or animal-related patterns in image or video data; static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Pattern recognition; classification techniques
    • G06V20/40 — Scenes; scene-specific elements in video content


Abstract

The invention discloses a night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion. The method adaptively selects detection frames according to an inter-frame difference threshold and a maximum frame-separation threshold. The inter-frame difference threshold is determined from the correlation between the cosine angle of the feature vectors of two frame images, the number of interval frames, and the visual effect of the detected sequence, so that the number of detection frames is reduced as far as possible while the visual characteristics of the human eye are still satisfied; this improves the processing efficiency of night-vision anti-halation pedestrian detection and avoids both the missed detections caused by too large a content difference between video frames and the redundant detections caused by too small a difference during intermittent cyclic detection-tracking. The maximum frame-separation threshold is introduced to avoid the stale template updates and tracking-loss errors that occur when the content difference between video frames stays too small for too long, improving the precision and fault tolerance of night-vision anti-halation pedestrian detection.

Description

Night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion
Technical Field
The invention belongs to the technical field of night-vision anti-halation, and relates in particular to an adaptive intermittent cyclic detection-tracking method for night-vision anti-halation — a night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion.
Background
The night-vision anti-halation technology of heterogeneous image fusion combines the halation-free character of infrared images with the rich color and detail information of visible images; it provides a new approach to the halation problem in night driving and has good application prospects.
The heterogeneous-fusion night-vision anti-halation method eliminates halation in the night-vision image through infrared and visible-light image fusion, restoring the color and detail information of the image and improving the imaging quality. It has developed from the early single-transform fusion (wavelets, color spaces, and the like) to composite fusion methods that combine color spaces with multi-resolution and multi-scale transforms, yielding fused images with more thorough halation removal and richer detail and color. Applied to night pedestrian detection, however, it still faces two challenges. First, although composite fusion is effective, its algorithmic complexity is high, so the pedestrian detection and tracking efficiency of the whole system is low and the speed hardly meets practical requirements. Second, although the fused image is free of halation and rich in detail and color in dark regions, insufficient light and poor imaging conditions at night leave its quality well below that of daytime images, causing missed pedestrians and low tracking precision.
Disclosure of Invention
Aiming at the low processing speed and low precision of existing heterogeneous-video-fusion night-vision anti-halation pedestrian detection and tracking methods, the invention designs a heterogeneous-video-fusion night-vision anti-halation pedestrian detection and tracking method that improves the speed and precision of pedestrian detection and tracking in night-vision halation dynamic scenes.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a night vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion comprises the following steps:
step 1, self-adaptively selecting a detection frame, wherein the step specifically comprises the following steps:
step 1.1, extract a video frame, and take the 1st frame of the video sequence as the reference frame and the 2nd frame as the current frame;
step 1.2, calculate the number n of interval frames between the current frame and the reference frame according to the following formula:
n = n_c − n_r
where n_c is the current frame number and n_r is the reference frame number.
Step 1.3, calculate the maximum frame-separation threshold N according to the following formula:
N = f × T
where f is the frame rate and T is the maximum detection time threshold; T = 0.2 s (at f = 25 frames/s, for example, N = 25 × 0.2 = 5).
Step 1.4, if n ≥ N, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 1.2;
if n < N, continue to step 1.5;
step 1.5, calculate the cosine angle θ between the feature vectors of the current frame and the reference frame:
θ = arccos( (R · C) / (‖R‖ · ‖C‖) )
where the reference frame feature vector R = [r_0, r_1, ..., r_63] and the current frame feature vector C = [c_0, c_1, ..., c_63].
Step 1.6, compare θ with the set inter-frame difference threshold τ, the threshold τ taking a value in [1.5, 2.2]; only a current frame exceeding the threshold is detected:
if θ > τ, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 1.2;
if θ ≤ τ, the current frame C is not a detection frame: keep the reference frame unchanged, set the frame after C as the new current frame, and jump back to step 1.2.
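For illustration, the selection loop of step 1 can be written as the following minimal sketch. It is not the patent's implementation: the helper frame_feature stands for the 64-bin RGB histogram feature of step 1.5 (a sketch of it appears in the detailed description below), and the angle θ and threshold τ are assumed to be in degrees, since the text does not state the unit.

```python
import numpy as np

def cosine_angle(r, c):
    """Cosine angle between two frame feature vectors (degrees, assumed unit)."""
    cos = np.dot(r, c) / (np.linalg.norm(r) * np.linalg.norm(c))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_detection_frames(frames, frame_feature, f=25, T=0.2, tau=1.8):
    """Adaptively pick detection-frame indices from a list of video frames."""
    N = int(f * T)                      # maximum frame-separation threshold (step 1.3)
    detected = [0]                      # the 1st frame serves as the first reference frame
    ref_idx, ref_vec = 0, frame_feature(frames[0])
    for cur_idx in range(1, len(frames)):
        n = cur_idx - ref_idx           # number of interval frames (step 1.2)
        cur_vec = frame_feature(frames[cur_idx])
        # detect when the interval is too long (step 1.4) or the content
        # difference exceeds the inter-frame difference threshold (step 1.6)
        if n >= N or cosine_angle(ref_vec, cur_vec) > tau:
            detected.append(cur_idx)
            ref_idx, ref_vec = cur_idx, cur_vec
    return detected
```

At 25 frames/s this reproduces the behavior described later in the embodiment: at most N = 5 frames may pass before a detection is forced, however small the content difference.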
Step 2, start pedestrian detection once the detection condition is met, with the following specific steps:
step 2.1, cut the detection frame into several equal-sized sub-images, obtain the initial detection result S_i of each sub-image with the pedestrian detector, and form the set S according to the following formula:
S = (S_0 ∪ S_1 ∪ … ∪ S_m) − ((S_0 ∩ S_1) ∪ (S_1 ∩ S_2) ∪ … ∪ (S_{m−1} ∩ S_m))
where m is the number of cuts and i = 0, 1, ..., m.
Step 2.2, compute the first-screened detection set S_I according to the following formula:
S_I = S − S_O
where S_O is the set of all detection boxes whose overlap with another detection box exceeds the threshold y_1; take y_1 = 0.7.
Step 2.3, further screen the detection boxes within the boundary range of the redundant area.
Step 2.3.1, screen the boxes in the redundant area according to the position coordinates of the detection boxes;
step 2.3.2, compute the redundant-box set S_R according to the following formula:
[The defining formula of S_R is reproduced only as an image in the original; it compares candidate boxes a and b against the rejection thresholds y_2 and y_3.]
where a and b are candidate boxes and y_2, y_3 are the rejection thresholds; take y_2 = 0.8, y_3 = 0.6.
Step 2.4, obtain the pedestrian detection-box set S_P of the detection frame according to the following formula:
S_P = S_I − S_R
Step 3, track the pedestrian after it is detected, with the following specific steps:
step 3.1, take the detection box in the detection frame as the sample image block, obtain the training samples x = {x_i | i = 1, 2, ..., n} by cyclic shifts, and input them into the classifier for training;
step 3.2, solve the DFT of the tracker template α according to the following formula:
α̂ = ŷ / (k̂^xx′ + λ)
where k̂^xx′ is the DFT of the kernel correlation whose elements are k(x, x′), ŷ is the DFT of the regression values y = {y_i | i = 1, 2, ..., n} corresponding to the training samples x = {x_i | i = 1, 2, ..., n}, and λ is a regularization parameter.
Step 3.3, compute the response according to the following formula and transform it from the frequency domain to the time domain; the region with the largest value is the position of the tracked pedestrian:
f̂(z) = k̂^xz ⊙ α̂
where k̂^xz is the DFT of the kernel correlation whose elements are k(x, z′), and the detection samples z = {z_i | i = 1, 2, ..., n}, z_i = P^i z, are constructed by cyclic dense sampling.
In step 1.6, τ preferably takes the value 1.8. This value lies in the middle of the stable region of NCIE, far from the inflection point where NCIE changes abruptly, and gives a high number of interval frames, so the number of detection frames is reduced as far as possible while the video sequence remains continuous — the lowest detection frame count for continuous video content — effectively reducing the detected frames while meeting the visual characteristics of the human eye.
The adaptive intermittent cyclic detection-tracking method for night-vision anti-halation designed by the invention solves the insufficient real-time performance caused by detecting every frame, by adopting an intermittent detection cycle; by selecting the detection frames adaptively it avoids the redundant or missed detections introduced by a fixed intermittent detection scheme, improving the detection and tracking precision for night-vision anti-halation pedestrians.
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the low processing speed and low precision of pedestrian detection and tracking in night-halation dynamic scenes, the proposed adaptive intermittent cyclic detection-tracking method triggers the detection end of the detection-tracking working mode in intermittent cycles, greatly reducing the number of detection frames and greatly raising the detection speed;
2. The designed adaptive intermittent cyclic detection-tracking method selects detection frames adaptively according to the content difference of the video sequence, avoiding both the missed detections caused by too large a content difference between video frames and the redundant detections caused by too small a difference during intermittent cyclic detection-tracking, and effectively improves the detection precision.
Drawings
FIG. 1 is the adaptive intermittent cyclic detection-tracking flow diagram;
FIG. 2 is the 1st detection-frame image of the fused slow video sequence;
FIG. 3 is the 4th detection-frame image of the fused slow video sequence;
FIG. 4 is the 6th detection-frame image of the fused slow video sequence;
FIG. 5 is the 11th detection-frame image of the fused slow video sequence;
FIG. 6 is the 1st detection-frame image of the fused fast video sequence;
FIG. 7 is the 3rd detection-frame image of the fused fast video sequence;
FIG. 8 is the 4th detection-frame image of the fused fast video sequence;
FIG. 9 is the 7th detection-frame image of the fused fast video sequence;
FIG. 10 is the 1st frame image of the night-vision anti-halation video sequence detected by the synchronous detection-tracking method;
FIG. 11 is the 2nd frame image detected by the synchronous detection-tracking method;
FIG. 12 is the 3rd frame image detected by the synchronous detection-tracking method;
FIG. 13 is the 1st frame image detected by the intermittent cyclic detection-tracking method;
FIG. 14 is the 6th frame image detected by the intermittent cyclic detection-tracking method;
FIG. 15 is the 1st frame image detected by the adaptive intermittent cyclic detection-tracking method;
FIG. 16 is the 3rd frame image detected by the adaptive intermittent cyclic detection-tracking method;
FIG. 17 is the 4th frame image detected by the adaptive intermittent cyclic detection-tracking method;
FIG. 18 is the 7th frame image detected by the adaptive intermittent cyclic detection-tracking method.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
In order to solve the low processing speed and low precision of pedestrian detection and tracking in night-halation dynamic scenes, the invention designs an adaptive intermittent cyclic detection-tracking method suited to heterogeneous-fusion night-vision anti-halation pedestrian detection and tracking. The method adaptively selects detection frames according to an inter-frame difference threshold and a maximum frame-separation threshold. The inter-frame difference threshold is determined from the correlation between the cosine angle of the feature vectors of two frame images, the number of interval frames, and the visual effect of the detected sequence, so the number of detection frames is reduced as far as possible while the visual characteristics of the human eye are still satisfied; this improves the processing efficiency of night-vision anti-halation pedestrian detection and avoids both the missed detections caused by too large a content difference between video frames and the redundant detections caused by too small a difference during intermittent cyclic detection-tracking. The maximum frame-separation threshold avoids the stale template updates and tracking-loss errors that occur when the content difference between frames stays too small for too long, improving the precision and fault tolerance of night-vision anti-halation pedestrian detection.
A night vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion comprises the following steps:
step 1, adaptive selection of detection frames
step 1.1, extract a video frame, and take the 1st frame of the video sequence as the reference frame and the 2nd frame as the current frame;
step 1.2, calculate the number n of interval frames between the current frame and the reference frame according to the following formula:
n = n_c − n_r (1)
where n_c is the current frame number and n_r is the reference frame number.
Step 1.3, calculate the maximum frame-separation threshold N according to the following formula:
N = f × T (2)
where f is the frame rate and T is the maximum detection time threshold. When video content is observed, about 0.2 s elapses from the moment the light signal is transmitted from the eye to the brain until the visual residue (persistence of vision) disappears, so T = 0.2 s guarantees that the video observed by the human eye remains continuous.
Step 1.4, if n ≥ N, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 1.2; if n < N, continue to step 1.5;
step 1.5, calculate the cosine angle θ between the feature vectors of the current frame and the reference frame:
θ = arccos( (R · C) / (‖R‖ · ‖C‖) ) (3)
where the reference frame feature vector R = [r_0, r_1, ..., r_63] and the current frame feature vector C = [c_0, c_1, ..., c_63]. The feature vectors are constructed as follows (a code sketch follows step 1.6 below):
step 1.5.1, acquire the RGB histogram of the visible halation image;
step 1.5.2, construct the RGB histogram feature vector by mapping the three-dimensional RGB vector to a one-dimensional feature vector, with the following processing steps:
step 1.5.2.1, map each RGB pixel value to an integer in the range [0, 63] according to the following formula:
index_i = [B_i/64] × 4^2 + [G_i/64] × 4^1 + [R_i/64] × 4^0,  1 ≤ i ≤ N (4)
where index_i is the mapping value of the i-th pixel, R_i, G_i, B_i are the channel values of the i-th pixel, and N is the total number of pixels. ([B_i/64], [G_i/64], [R_i/64]) is a three-digit base-4 number (digits weighted by 4^2, 4^1, 4^0), each channel being quantized into the four partitions (0, 1, 2, 3).
Step 1.5.2.2, count the occurrences of each mapping value over the whole image; the counts of the 64 mapping values form the one-dimensional vector X = (Num_0, Num_1, ..., Num_63). This keeps the characteristics of each color channel of the whole image while avoiding the huge computation of using the RGB histogram directly;
step 1.6, compare θ with the set inter-frame difference threshold τ, and detect only a current frame exceeding the threshold.
If θ > τ, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 1.2; if θ ≤ τ, the current frame C is not a detection frame: keep the reference frame unchanged, set the frame after C as the new current frame, and jump back to step 1.2;
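A compact sketch of the feature construction of steps 1.5.1-1.5.2.2 follows (the image is assumed to be an H × W × 3 uint8 array with channels in R, G, B order; the channel layout is an assumption for illustration, not fixed by the text):

```python
import numpy as np

def frame_feature(img):
    """64-bin histogram feature X = (Num_0, ..., Num_63) per formula (4)."""
    q = (img // 64).astype(np.int64)  # quantize each channel into partitions 0..3
    # index_i = [B_i/64]*4^2 + [G_i/64]*4^1 + [R_i/64]*4^0   (formula (4))
    index = q[..., 2] * 16 + q[..., 1] * 4 + q[..., 0]
    return np.bincount(index.ravel(), minlength=64)  # Num_k = count of mapping value k
```

The resulting vector feeds the cosine-angle computation of formula (3).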
the threshold τ is determined as follows:
if the feature-vector cosine-angle threshold is set too large, the number of interval frames becomes too large and the observed video content turns discontinuous; if it is set too small, the number of interval frames is too small and redundancy remains between frames. To reduce the number of detection frames as much as possible while the video sequence stays continuous, the key is therefore to find the optimal balance point between the number of interval detection frames and the visual effect. The best value of the inter-frame difference threshold τ is determined by studying the relation between the feature-vector cosine-angle threshold, the number of interval detection frames, and the visual effect: the number of interval frames is measured by the frame-removal rate, and the visual effect after interval detection is judged subjectively from the visual characteristics of the human eye and objectively from the overall correlation index of the video sequence.
In terms of human visual characteristics, with the playing lengths of the video sequences before and after interval detection kept equal, if the human eye can hardly perceive a difference between the two sequences in actual playback, the video content is still continuous and the undetected frames in between are redundant; otherwise, the undetected frames contain valid frames.
As the objective index, the overall correlation of the video sequence after interval detection is measured by the nonlinear correlation information entropy (NCIE). For K video frames, the NCIE that quantifies their mutual nonlinear correlation is
NCIE = 1 − H_{R_N} (5)
where the nonlinear correlation matrix R_N is given by formula (6) and the nonlinear joint entropy H_{R_N} by formula (7):
R_N = {NCC_ij}, 1 ≤ i ≤ K, 1 ≤ j ≤ K (6)
H_{R_N} = − Σ_{i=1}^{K} (λ_i / K) · log_K (λ_i / K) (7)
where NCC_ij is the nonlinear correlation coefficient between the i-th and the j-th frame images, and λ_i (i = 1, ..., K) are the eigenvalues of the nonlinear correlation matrix.
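For concreteness, a small sketch of formulas (5)-(7), taking the K × K nonlinear correlation matrix R_N as given (how the coefficients NCC_ij are computed is not reproduced in this text):

```python
import numpy as np

def ncie(R_N):
    """Nonlinear correlation information entropy NCIE = 1 - H_{R_N} (formulas (5)-(7))."""
    K = R_N.shape[0]
    lam = np.linalg.eigvalsh(R_N)           # eigenvalues of the symmetric correlation matrix
    p = lam[lam > 1e-12] / K                # terms with lambda_i = 0 contribute nothing
    H = -np.sum(p * np.log(p) / np.log(K))  # nonlinear joint entropy, log base K (formula (7))
    return 1.0 - H
```

An NCIE close to that of the original sequence indicates that the interval-detected sequence is still perceived as continuous.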
By performing experiments on video sequences with different pedestrian speeds in different night-vision halation scenes, the number of interval frames is computed as the cosine-angle threshold is gradually increased; the overall correlation of the sequence after interval detection is judged by its NCIE value, and the inter-frame difference threshold τ is determined from the trend of the overall correlation.
According to the research results, the number of interval frames grows as the feature-vector cosine-angle threshold increases. Video sequences with slow pedestrian motion have the higher frame-removal rate: at τ = 2 the removal rate of different videos is 62%–76%. Sequences with fast object motion have a comparatively low removal rate: at τ = 2 it is 30%–38%.
The overall trend of NCIE is to decrease as τ increases. For τ ≤ 2 the NCIE value changes relatively smoothly and stays very close to the value before interval detection, and the human eye cannot perceive a difference between the two sequences in actual playback. For 2 < τ < 2.5 NCIE begins to drop sharply and differs markedly from its value before interval detection, and the human eye can perceive the difference between the two sequences. For τ ≥ 2.5 the NCIE value oscillates, but as a whole stays below its value for τ ≤ 2. This shows that an inflection point exists within 1.5 < τ ≤ 2.5 at which NCIE changes abruptly, the overall correlation of the video sequence weakens, and the video content begins to appear discontinuous.
In summary, the inter-frame difference threshold τ must keep NCIE in its stable region while giving a high frame-removal rate. The results show that NCIE changes abruptly beyond τ = 2.2 while the requirement of a high number of interval frames is still met, so the threshold τ is reasonably valued within [1.5, 2.2]. Preferably τ = 1.8: this value lies in the middle of the NCIE stable region, far from the inflection point of the abrupt NCIE change, and has a high number of interval frames, so the number of detection frames is reduced as far as possible while the video sequence remains continuous — the lowest detection frame count for continuous video content — effectively reducing the detected frames while meeting the visual characteristics of the human eye.
Step 2, start pedestrian detection once the detection condition is met, with the following specific steps:
step 2.1, cut the detection frame into several equal-sized sub-images, obtain the initial detection result S_i of each sub-image with the pedestrian detector, and form the set S according to the following formula:
S = (S_0 ∪ S_1 ∪ … ∪ S_m) − ((S_0 ∩ S_1) ∪ (S_1 ∩ S_2) ∪ … ∪ (S_{m−1} ∩ S_m)) (8)
where m is the number of cuts and i = 0, 1, ..., m.
Step 2.2, compute the first-screened detection set S_I according to the following formula:
S_I = S − S_O (9)
where S_O is the set of all detection boxes whose overlap with another detection box exceeds the threshold y_1. Experiments on actual video sequences show that with the overlap threshold y_1 = 0.7 few overlapping boxes remain and no detected pedestrian is lost.
Step 2.3, further screen the detection boxes within the boundary range of the redundant area.
Step 2.3.1, screen the boxes in the redundant area according to the position coordinates of the detection boxes;
step 2.3.2, compute the redundant-box set S_R according to formula (10):
[The defining formula (10) of S_R is reproduced only as an image in the original; it compares candidate boxes a and b against the rejection thresholds y_2 and y_3.]
where a and b are candidate boxes and y_2, y_3 are the rejection thresholds. Experiments on actual video sequences show that with y_2 = 0.8 and y_3 = 0.6 complete pedestrians are detected accurately and with high precision.
Step 2.4, obtain the pedestrian detection-box set S_P of the detection frame according to the following formula:
S_P = S_I − S_R (11)
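A hedged sketch of the screening of step 2 follows: the first screening of formula (9) is implemented with "intersection over the smaller box" as the overlap measure, and the redundant-box pass of formula (10), whose exact form appears only as an image, is only indicated. Both the overlap definition and the box representation are assumptions for illustration:

```python
def box_overlap(a, b):
    """Overlap of boxes a, b = (x1, y1, x2, y2): intersection area over the
    smaller box area (assumed measure)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    if iw * ih == 0.0:
        return 0.0
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return iw * ih / min(area(a), area(b))

def first_screening(boxes, y1=0.7):
    """Formula (9): S_I = S - S_O, dropping every box whose overlap with
    another box exceeds y1 (duplicates from overlapping sub-images)."""
    s_i = [a for i, a in enumerate(boxes)
           if not any(box_overlap(a, b) > y1
                      for j, b in enumerate(boxes) if j != i)]
    # The redundant-box set S_R of formula (10) (thresholds y2 = 0.8, y3 = 0.6)
    # would then be removed: S_P = S_I - S_R.
    return s_i
```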
Step 3, track the pedestrian after it is detected, with the following specific steps:
step 3.1, take the detection box in the detection frame as the sample image block, obtain the training samples x = {x_i | i = 1, 2, ..., n} by cyclic shifts, and input them into the classifier for training;
step 3.2, solve the DFT of the tracker template α according to the following formula:
α̂ = ŷ / (k̂^xx′ + λ) (12)
where k̂^xx′ is the DFT of the kernel correlation whose elements are k(x, x′), ŷ is the DFT of the regression values y = {y_i | i = 1, 2, ..., n} corresponding to the training samples x = {x_i | i = 1, 2, ..., n}, and λ is a regularization parameter.
Step 3.3, compute the response according to the following formula and transform it from the frequency domain to the time domain; the region with the largest value is the position of the tracked pedestrian:
f̂(z) = k̂^xz ⊙ α̂ (13)
where k̂^xz is the DFT of the kernel correlation whose elements are k(x, z′), and the detection samples z = {z_i | i = 1, 2, ..., n}, z_i = P^i z, are constructed by cyclic dense sampling.
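A minimal frequency-domain sketch of steps 3.2-3.3 for a single-channel template follows; a linear kernel is used for the correlation, and the kernel choice and normalization are assumptions, since the patent text does not reproduce them:

```python
import numpy as np

def train_template(x, y, lam=1e-4):
    """Step 3.2: DFT of the tracker template, alpha_hat = y_hat / (k_hat^xx' + lambda)."""
    X = np.fft.fft2(x)
    kxx = X * np.conj(X) / x.size           # linear-kernel autocorrelation in the DFT domain
    return np.fft.fft2(y) / (kxx + lam)

def locate(alpha_hat, x, z):
    """Step 3.3: response f_hat(z) = k_hat^xz * alpha_hat; the maximum of the
    time-domain response gives the tracked pedestrian position."""
    kxz = np.fft.fft2(z) * np.conj(np.fft.fft2(x)) / x.size
    response = np.real(np.fft.ifft2(kxz * alpha_hat))
    return np.unravel_index(np.argmax(response), response.shape)
```

Here x is the training image block of step 3.1, y the array of regression values, and z the detection sample patch in the next frame.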
A specific simulation example is given below to illustrate the present invention.
Example:
the environment of the embodiment: a visible-light camera (Basler acA1280-60gc) and a far-infrared camera (Gobi-640-GigE) simultaneously acquire visible and infrared video in a night halation scene at a resolution of 640 × 480, and the image data are transmitted to the image processing platform through a gigabit network port. The platform is a portable computer with an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz, an NVIDIA GeForce GTX 1050 graphics card, and a 64-bit Windows 10 operating system. The processing software is MATLAB 2018 and Visual Studio 2017 combined with the OpenCV 3.4.1 library.
The main content is as follows: an adaptive intermittent cyclic detection-tracking method is designed (see FIG. 1) in which the detection end of the detection-tracking working mode is triggered by adaptive intermittent cycles, reducing the number of frames processed by the detection algorithm and thereby addressing the low processing speed and low precision of pedestrian detection and tracking in night-halation dynamic scenes. The specific steps are as follows:
One, adaptive selection of detection frames
1. Extract a video frame; take the 1st frame of the video sequence as the reference frame and the 2nd frame as the current frame.
2. Calculate the number n of frames between the reference frame and the current frame according to formula (1).
3. Calculate the maximum frame-separation threshold N according to formula (2).
4. If n ≥ N, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 2. If n < N, continue to step 5.
5. Calculate the cosine angle θ between the feature vectors of the current frame and the reference frame according to formula (3).
6. Determine the inter-frame difference threshold τ according to formulas (5), (6) and (7), so that the number of detection frames is reduced as much as possible while the video sequence remains continuous.
7. Compare θ with the set threshold τ; only a current frame exceeding the threshold is detected. If θ > τ, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 2. If θ ≤ τ, the current frame C is not a detection frame: keep the reference frame, set the frame after C as the new current frame, and jump back to step 2.
Two groups of video sequences in a night-vision anti-halation scene were selected at random for the experiment: Slow video is a sequence with slow pedestrian motion, Fast video one with fast motion. The original frame rate is 25 frames/s, so N = 5 according to formula (2); the playing lengths of Slow video and Fast video are 14.84 s and 15 s respectively. Following the frame-selection flow (see FIG. 1), the adaptive intermittent cyclic detection-tracking method reduces the number of detected frames of the two videos from 371 and 375 to 96 and 231 frames. The first four adaptively selected detection frames of the fused sequences are listed here; they correspond to original frames 1, 4, 6 and 11 of Slow video (see FIGS. 2-5) and original frames 1, 3, 4 and 7 of Fast video (see FIGS. 6-9). The experimental results of the intermittent cyclic detection-tracking method and of the adaptive intermittent cyclic detection-tracking method of the invention are shown in Tables 1 and 2.
Table 1: consecutive detection frames selected in Slow video
[Table 1 is reproduced as an image in the original publication.]
Table 2: consecutive detection frames selected in Fast video
[Table 2 is reproduced as an image in the original publication.]
As the results in Tables 1 and 2 show, in Slow video only four of the first 11 original frames — frames 1, 4, 6 and 11 — need to be detected by the adaptive intermittent cyclic method. Frames 1, 4 and 6 satisfy (n < N) and (θ ≥ τ), with adaptive intervals of 0.12 s and 0.08 s; between frames 6 and 11 the condition n ≥ N is reached, so frame 11 must be detected, with an adaptive interval of 0.2 s. Plain intermittent cyclic detection selects frames 1, 6 and 11, with a fixed interval of 0.2 s. In Fast video, of the first 7 original frames the adaptive method selects frames 1, 3, 4 and 7, all satisfying (n < N) and (θ ≥ τ), with adaptive detection intervals of 0.08 s, 0.04 s and 0.12 s; intermittent cyclic detection selects frames 1 and 6, with an interval of 0.2 s. The adaptive intermittent cyclic detection-tracking method thus improves real-time performance, avoids the missed detections of fixed intermittent cyclic detection, and raises both the speed and the precision of heterogeneous-video-fusion night-vision anti-halation pedestrian detection and tracking.
Two, video frame detection-tracking
1. Cut the detection frame into several equal-sized sub-images and compute the set S from the detection results of the sub-images according to formula (8).
2. Obtain the first-screened detection set S_I according to formula (9).
3. Calculate the redundant-box set S_R according to formula (10).
4. Obtain the pedestrian detection-box set S_P of the detection frame according to formula (11).
5. Solve the DFT α̂ of the tracker template according to formula (12).
6. Determine the position of the tracked pedestrian according to formula (13).
7. Track the next frame image.
A group of video sequences in a night-vision anti-halation scene was selected at random, with an original frame rate of 25 frames/s, 375 original frames and a playing length of 15 s, and the night-vision anti-halation video was processed with three methods: synchronous detection-tracking, intermittent cyclic detection-tracking, and adaptive intermittent cyclic detection-tracking. Synchronous detection-tracking detects all 375 frames, intermittent cyclic detection detects 62 frames, and adaptive intermittent cyclic detection detects 231 frames. The first consecutive detection frames selected by the three methods are listed here: the synchronous detection-tracking method detects frames 1, 2 and 3 of the night-vision anti-halation video sequence (see FIGS. 10-12), intermittent cyclic detection-tracking detects frames 1 and 6 (see FIGS. 13-14), and the adaptive intermittent cyclic detection-tracking method detects frames 1, 3, 4 and 7 (see FIGS. 15-18). Table 3 gives the accuracy and average detection time of the three detection-tracking methods.
Table 3: comparison of the results of the three methods on night-vision anti-halation video
[Table 3 is reproduced as an image in the original publication.]
As Table 3 shows, the detection speed of the intermittent cyclic detection-tracking method is more than 3 times that of synchronous detection-tracking, which removes the low processing speed and video stalling of the synchronous method; however, its fixed-interval detection also causes missed detections, so the pedestrian-detection accuracy drops somewhat. Compared with the intermittent cyclic method, the adaptive intermittent cyclic detection-tracking method improves the accuracy by 9.47%, while the detection speed falls from 50 FPS to 30 FPS — still meeting the real-time requirement of pedestrian detection and tracking. In overall performance, the adaptive intermittent cyclic detection-tracking method improves the night-vision anti-halation pedestrian detection and tracking accuracy while satisfying the real-time requirement; it outperforms both the synchronous and the intermittent cyclic detection-tracking methods and is better suited to pedestrian detection and tracking against a dynamic night-halation background.
The invention has been described through specific examples, which are intended to aid understanding rather than to limit it. A person skilled in the art may make several simple deductions, modifications or substitutions according to the idea of the invention.

Claims (2)

1. A night vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion comprises the following steps:
step 1, self-adaptively selecting a detection frame, wherein the step specifically comprises the following steps:
step 1.1, extract a video frame, and take the 1st frame of the video sequence as the reference frame and the 2nd frame as the current frame;
step 1.2, calculate the number n of interval frames between the current frame and the reference frame according to the following formula:
n = n_c − n_r
where n_c is the current frame number and n_r is the reference frame number;
step 1.3, calculate the maximum frame-separation threshold N according to the following formula:
N = f × T
where f is the frame rate and T is the maximum detection time threshold, T = 0.2 s;
step 1.4, if n ≥ N, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 1.2;
if n < N, continue to step 1.5;
step 1.5, calculate the cosine angle θ between the feature vectors of the current frame and the reference frame:
θ = arccos( (R · C) / (‖R‖ · ‖C‖) )
where the reference frame feature vector R = [r_0, r_1, ..., r_63] and the current frame feature vector C = [c_0, c_1, ..., c_63];
step 1.6, compare θ with the set inter-frame difference threshold τ, the threshold τ taking a value in [1.5, 2.2]; only a current frame exceeding the threshold is detected:
if θ > τ, the current frame C is a detection frame: set C as the new reference frame, set the frame after C as the new current frame, and jump back to step 1.2;
if θ ≤ τ, the current frame C is not a detection frame: keep the reference frame unchanged, set the frame after C as the new current frame, and jump back to step 1.2;
step 2, start pedestrian detection once the detection condition is met, with the following specific steps:
step 2.1, cut the detection frame into several equal-sized sub-images, obtain the initial detection result S_i of each sub-image with the pedestrian detector, and form the set S according to the following formula:
S = (S_0 ∪ S_1 ∪ … ∪ S_m) − ((S_0 ∩ S_1) ∪ (S_1 ∩ S_2) ∪ … ∪ (S_{m−1} ∩ S_m))
where m is the number of cuts and i = 0, 1, ..., m;
step 2.2, compute the first-screened detection set S_I according to the following formula:
S_I = S − S_O
where S_O is the set of all detection boxes whose overlap with another detection box exceeds the threshold y_1; take y_1 = 0.7;
step 2.3, further screen the detection boxes within the boundary range of the redundant area;
step 2.3.1, screen the boxes in the redundant area according to the position coordinates of the detection boxes;
step 2.3.2, compute the redundant-box set S_R according to the following formula:
[The defining formula of S_R is reproduced only as an image in the original; it compares candidate boxes a and b against the rejection thresholds y_2 and y_3.]
where a and b are candidate boxes and y_2, y_3 are the rejection thresholds; take y_2 = 0.8, y_3 = 0.6;
step 2.4, obtain the pedestrian detection-box set S_P of the detection frame according to the following formula:
S_P = S_I − S_R
and step 3, track the pedestrian after it is detected, with the following specific steps:
step 3.1, take the detection box in the detection frame as the sample image block, obtain the training samples x = {x_i | i = 1, 2, ..., n} by cyclic shifts, and input them into the classifier for training;
step 3.2, solve the DFT of the tracker template α according to the following formula:
α̂ = ŷ / (k̂^xx′ + λ)
where k̂^xx′ is the DFT of the kernel correlation whose elements are k(x, x′), ŷ is the DFT of the regression values y = {y_i | i = 1, 2, ..., n} corresponding to the training samples x = {x_i | i = 1, 2, ..., n}, and λ is a regularization parameter;
step 3.3, compute the response according to the following formula and transform it from the frequency domain to the time domain; the region with the largest value is the position of the tracked pedestrian:
f̂(z) = k̂^xz ⊙ α̂
where k̂^xz is the DFT of the kernel correlation whose elements are k(x, z′); the detection samples z = {z_i | i = 1, 2, ..., n}, z_i = P^i z, are constructed by cyclic dense sampling.
2. The night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion according to claim 1, characterized in that: in step 1.6, τ takes the value 1.8.
Application CN202010896881.5A, filed 2020-08-31 with priority date 2020-08-31 — Night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion — Active; granted as CN112069967B.

Priority Applications (1)

Application Number: CN202010896881.5A — Priority/Filing Date: 2020-08-31 — Title: Night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion

Publications (2)

Publication Number — Publication Date
CN112069967A — 2020-12-11
CN112069967B — 2022-12-06

Family ID: 73665359

Family Applications (1)
CN202010896881.5A (Active; granted as CN112069967B) — priority/filing date 2020-08-31 — Night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion

Country Status (1)
CN — CN112069967B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party

US20080036576A1 * — priority 2006-05-31, published 2008-02-14 — Mobileye Technologies Ltd. — Fusion of far infrared and visible images in enhanced obstacle detection in automotive applications
US20090237511A1 * — priority 2008-03-18, published 2009-09-24 — BAE Systems Information and Electronic Systems Integration Inc. — Multi-window/multi-target tracking (MW/MT tracking) for point source objects
CN106023129A * — priority 2016-05-26, published 2016-10-12 — Xian Technological University — Infrared and visible light image fused automobile anti-halation video image processing method
CN109166149A * — priority 2018-08-13, published 2019-01-08 — Wuhan University — Positioning and three-dimensional wire-frame reconstruction method and system fusing a binocular camera and an IMU
CN109166131A * — priority 2018-09-29, published 2019-01-08 — Xian Technological University — Infrared and visible light fused vehicle night-vision anti-halation image segmentation and evaluation method
CN111339369A * — priority 2020-02-25, published 2020-06-26 — Foshan University — Video retrieval method, system, computer equipment and storage medium based on depth features

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party

KAI ZHANG et al.: "Infrared Weak-Small Target Image Fusion Based on Contrast and Wavelet Transform", AISS '19
柴改霞 et al.: "Evaluation method for automobile anti-halation images fused from visible and infrared light", Infrared Technology (红外技术)
郭全民 et al.: "Adaptive partition quality evaluation of night-vision anti-halation fused images", Journal of Electronics & Information Technology (电子与信息学报)
郭全民 et al.: "Automobile anti-halation system based on infrared and visible light image fusion", Infrared and Laser Engineering (红外与激光工程)

Also Published As

Publication number Publication date
CN112069967B (en) 2022-12-06


Legal Events

Date Code Title Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant