CN110929560A - Video semi-automatic target labeling method integrating target detection and tracking - Google Patents

Info

Publication number
CN110929560A
CN110929560A (application CN201910963482.3A; granted publication CN110929560B)
Authority
CN
China
Prior art keywords
frame
target
value
tracking
image
Prior art date
Legal status
Granted
Application number
CN201910963482.3A
Other languages
Chinese (zh)
Other versions
CN110929560B (en)
Inventor
徐英
谷雨
刘俊
彭冬亮
陈庆林
Current Assignee
Hangzhou Electronic Science and Technology University
Original Assignee
Hangzhou Electronic Science and Technology University
Priority date
Filing date
Publication date
Application filed by Hangzhou Electronic Science and Technology University filed Critical Hangzhou Electronic Science and Technology University
Priority to CN201910963482.3A priority Critical patent/CN110929560B/en
Publication of CN110929560A publication Critical patent/CN110929560A/en
Application granted granted Critical
Publication of CN110929560B publication Critical patent/CN110929560B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/253 Fusion techniques of extracted features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V2201/07 Indexing scheme: target detection


Abstract

The invention discloses a semi-automatic video target labeling method that integrates target detection and tracking. The target is labeled manually in an initial frame; in subsequent frames, an image-based target detection algorithm and an image-sequence-based video target tracking algorithm are fused to estimate the position of the target in the image, and the tracking algorithm is used to judge whether target labeling has finished. If labeling has finished, video key frames are extracted according to the saliency value of the target in each frame to obtain the target labeling result; otherwise, estimation of the target position in the video images continues. Because key frames are extracted according to target saliency, they reflect the diversity of target changes. The method was tested experimentally on multi-shot, multi-ship videos, and its effectiveness was verified.

Description

Video semi-automatic target labeling method integrating target detection and tracking
Technical Field
The invention belongs to the field of video data annotation, and relates to a video target labeling method that integrates target detection and target tracking and extracts video key frames according to target saliency.
Background
In recent years, deep learning has developed rapidly, driving continuous new breakthroughs in the fields of target detection and target tracking. Because deep learning requires the support of big data, obtaining a large amount of accurately labeled training data with sample diversity is the key to achieving excellent performance with deep learning techniques.
At present, training data are mainly acquired by two methods: manual labeling and automatic labeling. Manual labeling marks the target position and label in a single image by hand; since a video contains a large number of consecutive image frames, manual labeling is inefficient, while the spatio-temporal continuity of video targets makes automatic labeling possible. In the prior art, using only a correlation-filtering-based target tracking algorithm for video target labeling yields results whose accuracy cannot meet the requirements of training data. Using only a target detection algorithm, the detector marks every target in subsequent frames that matches the category of the initial-frame target and cannot judge whether they are the same target as in the initial frame; moreover, the detector may miss detections due to target jitter, blurring and similar factors, causing inconsistent labeling of the video target. The invention fuses the detection and tracking algorithms, combining the advantages of both: it improves the accuracy of automatic labeling, uses the spatio-temporal continuity of the tracking algorithm to identify the same target, remedies the detector's missed detections, and automatically judges when the target disappears, thereby improving labeling efficiency.
The invention provides a semi-automatic video labeling method: the target position is labeled manually in an initial frame, labeled automatically in subsequent frames, and finally several key frames are extracted automatically to produce the labeling result. The main problems to be solved are: (1) improving the accuracy and consistency of video target labeling; (2) automatically determining target disappearance and the end of labeling, so as to reduce manual participation and improve labeling efficiency; (3) ensuring the extracted key frames reflect the diversity of changes in target scale, angle, illumination and the like.
Since neither a target detection algorithm nor a target tracking algorithm alone can meet the requirements of automatic video target labeling, the invention fuses target detection and target tracking through reasonable rules, greatly improving the efficiency and accuracy of video target labeling; in addition, a method for extracting video key frames based on target saliency is provided, so that the extracted key frames accurately reflect the diversity of target changes.
Disclosure of Invention
The invention provides a semi-automatic video target labeling method integrating target detection and tracking, aiming to solve the technical problems that existing automatic labeling means have low precision and poor continuity, while manual labeling is slow.
First, a certain frame of the video is selected as the initial frame, the initial position of the target is labeled manually, and the category label of the target is determined. In subsequent frames, an image-based target detection algorithm and an image-sequence-based video target tracking algorithm are fused to estimate the position of the target in the image, and the tracking algorithm is used to judge whether target labeling has finished. If labeling has finished, video key frames are extracted according to the saliency value of the target in each frame to obtain the target labeling result; otherwise, estimation of the target position in the video images continues.
The technical scheme adopted by the invention comprises the following steps:
1. the video semi-automatic target marking method integrating target detection and tracking is characterized by comprising the following steps of:
step (1), selecting a certain frame as the initial frame in a certain shot of the video, manually labeling the initial position and size of the target, and determining the category label of the target;
step (2), adopting automatic labeling for other subsequent frames after the initial frame, specifically fusing an image-based target detection algorithm and an image sequence-based video target tracking algorithm to estimate the position of a target in an image; the method comprises the following steps:
2.1 detecting the target in each frame of image by adopting YOLO V3 and marking a detection frame;
YOLO V3 is trained on samples obtained by resizing the labeled target images to a fixed scale; the number of YOLO layers is increased to 4, and four feature maps with different receptive fields, at scales 13×13, 26×26, 52×52 and 104×104, are obtained through multi-scale feature fusion. The 13×13 feature map is predicted with the three prior boxes (116×90), (156×198) and (373×326) to detect large objects; the 26×26 feature map with the three prior boxes (30×61), (62×45) and (59×119) to detect medium-sized objects; the 52×52 feature map with the three prior boxes (10×13), (16×30) and (33×23) to detect small objects; and the 104×104 feature map with the three newly added prior boxes (5×6), (8×15) and (16×10) to detect even smaller targets;
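The scale-to-prior assignment of step 2.1 can be sketched as follows; the `best_prior` helper and its shape-IoU matching rule are illustrative assumptions (the patent only lists the prior boxes per scale), written in Python:

```python
# Prior boxes per detection scale from step 2.1, as (w, h) in pixels of the
# fixed network input. best_prior picks the anchor whose shape best matches
# a labeled box, as measured by IoU of the two shapes aligned at the origin.
PRIORS = {
    13:  [(116, 90), (156, 198), (373, 326)],   # large objects
    26:  [(30, 61), (62, 45), (59, 119)],       # medium objects
    52:  [(10, 13), (16, 30), (33, 23)],        # small objects
    104: [(5, 6), (8, 15), (16, 10)],           # smallest objects (added layer)
}

def best_prior(w, h):
    def shape_iou(prior):
        pw, ph = prior
        inter = min(w, pw) * min(h, ph)
        return inter / (w * h + pw * ph - inter)
    # Return (scale, prior) with the highest shape overlap.
    return max(((s, p) for s, ps in PRIORS.items() for p in ps),
               key=lambda sp: shape_iou(sp[1]))
```

A wide 115×90 box lands on the coarse 13×13 map, while a tiny 6×6 box is handled by the newly added 104×104 layer.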
2.2 acquiring a tracking frame of the target by adopting a KCF related filtering tracking algorithm;
First, the HOG feature is extracted according to the target position and size in the previous frame; it is then transformed to the frequency domain via the Fourier transform, the resulting frequency-domain feature is mapped to a high-dimensional space through a Gaussian kernel function, and the filter template α is obtained according to formula (1):

$$\hat{\alpha} = \frac{\hat{g}}{\hat{k}^{xx} + \lambda} \qquad (1)$$
where x denotes the HOG feature of the sample, the hat ˆ denotes the Fourier transform, g is a two-dimensional Gaussian function whose peak is at the centre, and λ is a regularization parameter controlling overfitting during training; $k^{xx}$ denotes the kernel autocorrelation of x in the high-dimensional space, computed according to formula (2):

$$k^{xx} = \exp\left(-\frac{1}{\sigma^{2}}\Big(\|x\|^{2} + \|x\|^{2} - 2\,\mathcal{F}^{-1}\big(\textstyle\sum_{c}\hat{x}_{c}^{*}\odot\hat{x}_{c}\big)\Big)\right) \qquad (2)$$
where σ is the width parameter of the Gaussian kernel function, controlling its radial range of action, * denotes the complex conjugate, ⊙ denotes the element-wise product, $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, and c indexes the channels of the HOG feature x;
when the target is tracked on the image of the t-th frame, the correlation filter α is updated as:

$$\hat{\alpha}_{t} = (1-\eta)\,\hat{\alpha}_{t-1} + \eta\,\hat{\alpha} \qquad (3)$$

where η is the update parameter;
to accommodate scale changes of the target, the current-frame filter α_t is scaled so that the size of the target in the next frame can be predicted; the scaling ratios are [1.1, 1.05, 1, 0.95, 0.9];
On the (t+1)-th frame image, the HOG feature z of the candidate sample is extracted at the position of the t-th-frame target; combining each of the scale-adjusted filters above, each corresponding filtered output response map f is given by formula (4):

$$f_{m} = \mathcal{F}^{-1}\big(\hat{k}^{xz}_{m} \odot \hat{\alpha}_{m}\big) \qquad (4)$$

where m = 1, 2, 3, 4, 5 corresponds to the scaling ratios [1.1, 1.05, 1, 0.95, 0.9] respectively, x denotes the HOG feature of the t-th-frame target, and $k^{xz}$ is the kernel cross-correlation of x and z, computed as in formula (2); the maximum value f_max is selected from the maxima max(f) of the 5 response maps; the position corresponding to f_max is the position of the target centre, the scaling ratio corresponding to f_max gives the target size, and the tracking box of the (t+1)-th frame is obtained;
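As an illustrative sketch of formulas (1), (2) and (4), the following NumPy code implements a single-channel kernelized correlation filter; the single-channel simplification (the patent uses multi-channel HOG features), the σ and λ defaults, and all function names are assumptions:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=0.5):
    # Kernel correlation of formula (2), single-channel, computed with FFTs.
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    cross = np.real(np.fft.ifft2(np.conj(xf) * zf))       # circular correlation
    d = (np.sum(x ** 2) + np.sum(z ** 2) - 2 * cross) / x.size
    return np.exp(-np.maximum(d, 0) / sigma ** 2)

def train(x, g, lam=1e-4):
    # Formula (1): alpha-hat = g-hat / (k-hat^{xx} + lambda).
    return np.fft.fft2(g) / (np.fft.fft2(gaussian_kernel(x, x)) + lam)

def detect(alpha_f, x, z):
    # Formula (4): response map f = F^{-1}(k-hat^{xz} . alpha-hat).
    return np.real(np.fft.ifft2(np.fft.fft2(gaussian_kernel(x, z)) * alpha_f))
```

Training on a patch and detecting on a circularly shifted copy moves the response peak by exactly the shift, which is how the tracker localizes the target centre.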
2.3 fusing the results of target detection and target tracking to determine the labeled target frame;
First judge whether the current frame image contains any detection box; if not, the target box is the tracking box. If there is exactly one detection box, compute the IOU of the tracking box and the detection box: if the IOU is greater than a threshold, the target box is the detection box and the KCF tracking algorithm is re-initialized with it; otherwise the target box is the tracking box. If there are multiple detection boxes, compute the IOU of the tracking box with each detection box and select the maximum: if the maximum IOU is greater than the threshold, the target box is the detection box corresponding to the maximum IOU and the KCF tracking algorithm is re-initialized with it; otherwise the target box is the tracking box;
the IOU value evaluates the degree of overlap between the tracking box and each detection box in the current frame:

$$IOU = \frac{S_{I}}{S_{U}} \qquad (5)$$

where $S_I$ denotes the overlapping area of the tracking box and a detection box in the same frame, and $S_U$ denotes the area of their union, i.e. the sum of the areas of the tracking box and the detection box minus the overlapping area;
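A minimal sketch of the IOU computation and the fusion rule of step 2.3; the (x, y, w, h) box format, the function names and the 0.5 default threshold are assumptions not fixed by the patent:

```python
def iou(box_a, box_b):
    # Boxes as (x, y, w, h); returns S_I / S_U as in step 2.3.
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    s_i = iw * ih
    s_u = box_a[2] * box_a[3] + box_b[2] * box_b[3] - s_i
    return s_i / s_u if s_u > 0 else 0.0

def fuse(track_box, det_boxes, thresh=0.5):
    # Decision rule of step 2.3: take the best-overlapping detection box if it
    # clears the threshold (and signal that the KCF tracker should be
    # re-initialized from it); otherwise keep the tracking box.
    if not det_boxes:
        return track_box, False
    best = max(det_boxes, key=lambda d: iou(track_box, d))
    if iou(track_box, best) > thresh:
        return best, True
    return track_box, False
```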
step (3), judging whether the target marking is finished or not according to a target tracking algorithm;
judging whether max (f) is smaller than a set threshold value theta and whether the Peak Sidelobe Ratio (PSR) is smaller than the set threshold value theta according to a response graph f of the KCF correlation filtering trackerPSRWhen, namely:
max(f)<θandPSR<θPSR(7)
if yes, judging that the target marking is finished, and turning to the step (4) to select the key frame; otherwise, turning to the step (2), and continuing to estimate the position of the target in the next frame image;
the peak-to-sidelobe ratio (PSR) is calculated as follows:

$$PSR = \frac{\max(f) - \mu_{\Phi}(f)}{\sigma_{\Phi}(f)} \qquad (6)$$

where max(f) is the peak value of the correlation-filter response map f, Φ = 0.5, and μ_Φ(f) and σ_Φ(f) are respectively the mean and standard deviation of the 50% response region centred on the peak of f;
step (4), calculating the saliency value of the target in each frame of the current shot, and extracting a set number of video key frames according to these saliency values to obtain the target labeling result; the method comprises the following steps:
4.1 LBP (local binary pattern) extracts the texture features of the image. The basic idea: within a 3×3 pixel neighbourhood, the centre pixel is taken as a threshold and the gray values of the 8 adjacent pixels are compared with it; if a neighbouring pixel value is greater than or equal to the centre value, that position is marked 1, otherwise 0. Comparing the 8 points of the 3×3 neighbourhood yields an 8-bit binary number, which is converted to a decimal number to obtain the LBP value of the centre pixel; this value reflects the texture information of the region. The calculation formula is (8):

$$LBP(x_{0}, y_{0}) = \sum_{p=0}^{7} 2^{p}\, s(j_{p} - j_{0}) \qquad (8)$$

where $(x_0, y_0)$ are the coordinates of the centre pixel, p indexes the p-th pixel of the neighbourhood, $j_p$ is the gray value of a neighbourhood pixel and $j_0$ is the gray value of the centre pixel; s(x) is the sign function:

$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$$
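Formula (8) for a single 3×3 patch can be sketched as follows; the clockwise neighbour ordering is an assumption, since the patent does not fix which neighbour corresponds to p = 0:

```python
def lbp_value(patch):
    # patch: 3x3 nested list of gray values; returns the LBP code of the
    # centre pixel with s(x) = 1 for x >= 0, per step 4.1.
    c = patch[1][1]
    # 8 neighbours, clockwise from the top-left corner (assumed ordering).
    nbrs = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum(2 ** p for p, j in enumerate(nbrs) if j >= c)
```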
4.2 The color saliency map is calculated as follows:

$$S_{color}(x, y) = \sum_{i} \big|\, patch_{i}(x, y) - patch_{gaussian,i}(x, y) \,\big| \qquad (9)$$

where patch is the original image of the target-box region, patch_gaussian is that image after Gaussian filtering with a 5×5 Gaussian kernel and a standard deviation of 0, |·| denotes the absolute value, i indexes the image channels, and (x, y) are pixel coordinates;
4.3 An edge-saliency feature map is obtained for the pixels of the target edge region in each frame's target box.

In the target edge region inside the target box, pixel values jump. Taking the derivative of the pixel values, the first derivative has an extremum at the edge position; the extremum marks the edge, which is the principle used by the Sobel operator. If the second derivative is taken over the pixel values, the derivative value at the edge is 0. The Laplacian is implemented by first computing the second-order x and y derivatives with Sobel operators and then summing them to obtain the edge-saliency feature map:

$$Laplace(I) = \frac{\partial^{2} I}{\partial x^{2}} + \frac{\partial^{2} I}{\partial y^{2}} \qquad (10)$$

where I denotes the image inside the target box, and (x, y) denote the pixel coordinates of the target edge region within the target box;
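A sketch of the Laplacian edge-saliency map of step 4.3; plain wrap-around second differences are used here instead of Sobel second derivatives, which is a simplifying assumption:

```python
import numpy as np

def edge_saliency(img):
    # Sum of second-order x and y differences (a discrete Laplacian), per
    # step 4.3, with circular boundary handling via np.roll.
    d2x = np.roll(img, -1, axis=1) - 2 * img + np.roll(img, 1, axis=1)
    d2y = np.roll(img, -1, axis=0) - 2 * img + np.roll(img, 1, axis=0)
    return np.abs(d2x + d2y)
```

A flat region produces zero response, while both sides of a step edge light up.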
4.4 The LBP texture feature, the color-saliency feature and the edge-saliency feature are fused by average weighting to obtain the fusion value mean; the fusion calculation is:

$$mean_{t} = \frac{1}{3N}\sum_{(x,y)}\Big( I_{t}^{LBP}(x,y) + I_{t}^{color}(x,y) + I_{t}^{edge}(x,y) \Big) \qquad (11)$$

where $I_{t}^{LBP}(x,y)$, $I_{t}^{color}(x,y)$ and $I_{t}^{edge}(x,y)$ respectively denote the values at pixel (x, y) of the LBP texture, color-saliency and edge-saliency feature maps in the t-th frame, and N is the number of pixels in the target box;
4.5 The color-histogram change value Dist is obtained by computing the Bhattacharyya distance between the color histogram of the target region selected in the initial frame and that of the t-th frame:

$$Dist(H_{0}, H_{t}) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_{0}\,\bar{H}_{t}\, n^{2}}} \sum_{j=1}^{n} \sqrt{H_{0}(j)\, H_{t}(j)}} \qquad (12)$$

where $H_0$ is the color histogram of the target box manually labeled in the initial frame, $H_t$ is the color histogram of the target box automatically labeled in the t-th frame, and n is the total number of color-histogram bins; $\bar{H}_{k}$ is given by:

$$\bar{H}_{k} = \frac{1}{n} \sum_{j=1}^{n} H_{k}(j) \qquad (13)$$

where k = 0 or t;
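The Bhattacharyya distance of step 4.5 can be sketched in the OpenCV-style normalized form; the function name is an assumption:

```python
import numpy as np

def bhattacharyya(h0, ht):
    # Bhattacharyya distance between two histograms, per step 4.5:
    # sqrt(1 - sum(sqrt(H0*Ht)) / sqrt(mean(H0) * mean(Ht) * n^2)).
    h0 = np.asarray(h0, dtype=float)
    ht = np.asarray(ht, dtype=float)
    n = h0.size
    bc = np.sum(np.sqrt(h0 * ht))                    # Bhattacharyya coefficient
    norm = np.sqrt(h0.mean() * ht.mean() * n * n)
    return float(np.sqrt(max(0.0, 1.0 - bc / norm)))
```

Identical histograms give a distance of 0; histograms with no overlapping mass give 1, so the value grows as the target's appearance drifts from the initial frame.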
4.6 The scale change value is obtained from the change in width and height between the initial-frame target box and the t-th-frame target box:

$$Scale_{t} = \frac{|w_{t} - w_{0}|}{w_{0}} + \frac{|h_{t} - h_{0}|}{h_{0}} \qquad (14)$$

where $w_0$ and $h_0$ are the width and height of the initial-frame target box, and $w_t$ and $h_t$ are the width and height of the t-th-frame target box;
4.7 From the fusion value, the color-histogram change value and the scale change value of the image target-box region, the saliency value of the t-th-frame target is computed as:

$$S_{t} = mean_{t} + Dist_{t} + Scale_{t}, \quad t = 1, \dots, T \qquad (15)$$

where T denotes the total number of frames of the video;
4.8 From the saliency value $S_t$ of the target in each video frame, construct a saliency-value line graph and find all its peaks and the corresponding frames;

assuming the video has T frames, let n be the set number of key frames to extract and k the number of saliency peaks. If n ≤ k, sort the peaks in descending order and extract the frames corresponding to the first n peaks as key frames; if k < n ≤ T, extract the frames corresponding to all peaks and draw the remaining n − k key frames randomly without repetition; if n > T, all video frames are taken as key frames;
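The key-frame selection rule of step 4.8 can be sketched as follows; the simple three-point peak test and the fixed random seed are assumptions made for illustration:

```python
import random

def select_keyframes(saliency, n, seed=0):
    # saliency: per-frame saliency values S_t; n: number of key frames wanted.
    T = len(saliency)
    if n >= T:                       # more key frames requested than frames
        return list(range(T))
    # Peaks of the saliency line graph (strict local maxima).
    peaks = [t for t in range(1, T - 1)
             if saliency[t - 1] < saliency[t] > saliency[t + 1]]
    peaks.sort(key=lambda t: saliency[t], reverse=True)
    if n <= len(peaks):              # enough peaks: take the n strongest
        return sorted(peaks[:n])
    # Not enough peaks: take them all, fill the rest randomly without repeats.
    rest = [t for t in range(T) if t not in peaks]
    extra = random.Random(seed).sample(rest, n - len(peaks))
    return sorted(peaks + extra)
```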
and (5) returning to the step (1) to label the target of the next video shot.
Compared with the prior art, the invention has the following remarkable advantages: (1) the invention creatively fuses the target detection algorithm and the target tracking algorithm, thereby improving the accuracy of target positioning and the continuity of target state estimation in the video image; (2) only the target initial position needs to be marked manually in the initial frame, and the marking is automatically judged to be finished in the marking process, so that the times of artificial participation are reduced; (3) and fusing the LBP texture features, the color saliency features and the edge saliency features of the target region, and calculating the target saliency by combining the color histogram change and the scale change, so that the extracted key frame can reflect the diversity of the target change.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of fused target detection and target tracking;
FIG. 3 is a flow chart of target saliency calculation;
FIG. 4 is the detection result for the 2nd frame image of an example video;
FIG. 5 is the tracking result for the 2nd frame image of the example video;
FIG. 6 is the fused detection-and-tracking result for the 2nd frame image of the example video;
FIG. 7 is the peak change curve of the KCF response map for the 2nd shot of the example video;
FIG. 8 is the peak-to-sidelobe-ratio change curve of the KCF response map for the 2nd shot of the example video;
FIG. 9 is the 243rd frame image of the 2nd shot of the example video;
FIG. 10 is the 1st frame image of the 3rd shot of the example video;
FIG. 11 is the target saliency curve for the 6th shot of the example video;
FIG. 12 shows the key frames extracted for the 6th shot of the example video.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, the method comprises the following steps:
And (1), selecting a certain frame in the video image as the initial frame, manually labeling the initial position of the target, and determining the category label of the target.
And (2) fusing an image-based target detection algorithm and an image sequence-based video target tracking algorithm to estimate the position of a target in an image in a subsequent frame. The invention adopts a YOLO V3 detection algorithm and a KCF related filtering tracking algorithm, and a fusion method is shown as a figure 2 and specifically comprises the following steps:
2.1 The detector of the invention adopts YOLO V3, a fast algorithm among current mainstream detection networks that meets the real-time and accuracy requirements of video annotation technology. It comprises the feature-extraction network Darknet-53 and a prediction network. The Darknet-53 network adopts ResNet-style shortcut connections, avoiding gradient vanishing. In the prediction stage, the algorithm uses anchor-based region-of-interest extraction as in the RPN network, and its FPN (Feature Pyramid Network) uses feature maps at 3 scales: small feature maps provide semantic information while large feature maps carry finer-grained information, and the small feature maps are upsampled and fused with the larger scales, achieving a better detection effect. In addition, compared with V1 and V2, YOLO V3 no longer uses a softmax loss function but instead uses independent logistic (sigmoid) classifiers with a binary cross-entropy loss, thereby supporting multi-label prediction.
The invention carries out the following improvement and optimization on the basis of the original model:
First, the training parameters of the feature-extraction part are initialized with the darknet53.conv.74 pre-trained model; then the number of YOLO layers in the original model is increased to 4, and four feature maps with different receptive fields, at scales 13×13, 26×26, 52×52 and 104×104, are obtained through multi-scale feature fusion. The 13×13 feature map is predicted with the three prior boxes (116×90), (156×198) and (373×326) to detect large objects; the 26×26 feature map with (30×61), (62×45) and (59×119) to detect medium-sized objects; the 52×52 feature map with (10×13), (16×30) and (33×23) to detect small objects; and the 104×104 feature map with the newly added prior boxes (5×6), (8×15) and (16×10) to detect even smaller targets. Compared with the original model, the improved detection network integrates lower-layer features, improving the detection rate of small targets.
In each detection operation, the (t+1)-th frame image is input and first resized to a fixed scale; after passing through the feature-extraction network and the prediction network, detection boxes containing the object category and a score value are obtained as the detection result of the (t+1)-th frame.
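The fixed-scale resize in the detection step is commonly implemented as a letterbox transform that preserves aspect ratio; the following geometry helper is a sketch under that assumption (the patent only states that the image is resized to a fixed scale):

```python
def letterbox(img_w, img_h, net=416):
    # Scale so the longer side fits the network input, then centre with padding.
    scale = net / max(img_w, img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x, pad_y = (net - new_w) // 2, (net - new_h) // 2
    return new_w, new_h, pad_x, pad_y
```

The same scale and padding are inverted afterwards to map the predicted boxes back to original image coordinates.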
2.2 The KCF correlation-filtering tracking algorithm first extracts the HOG feature according to the target position and size in the t-th frame, transforms it to the frequency domain via the Fourier transform, maps the resulting frequency-domain feature to a high-dimensional space through a Gaussian kernel function, and obtains the filter template α according to formula (1):

$$\hat{\alpha} = \frac{\hat{g}}{\hat{k}^{xx} + \lambda} \qquad (1)$$
where x denotes the HOG feature of the sample, the hat ˆ denotes the Fourier transform, g is a two-dimensional Gaussian function whose peak is at the centre, and λ is a regularization parameter controlling overfitting during training. $k^{xx}$ denotes the kernel autocorrelation of x in the high-dimensional space, computed according to formula (2):

$$k^{xx} = \exp\left(-\frac{1}{\sigma^{2}}\Big(\|x\|^{2} + \|x\|^{2} - 2\,\mathcal{F}^{-1}\big(\textstyle\sum_{c}\hat{x}_{c}^{*}\odot\hat{x}_{c}\big)\Big)\right) \qquad (2)$$
where σ is the width parameter of the Gaussian kernel function, controlling its radial range of action, * denotes the complex conjugate, ⊙ denotes the element-wise product, $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, and c indexes the channels of the HOG feature x.
When performing target tracking on the t-th frame image, the correlation filter α is updated by:

$$\hat{\alpha}_{t} = (1-\eta)\,\hat{\alpha}_{t-1} + \eta\,\hat{\alpha} \qquad (3)$$

where η is the update parameter. To accommodate scale changes of the target, the current-frame filter α_t is scaled by the ratios [1.1, 1.05, 1, 0.95, 0.9]; the candidate HOG feature z is extracted at the t-th-frame target position on the (t+1)-th frame image, and each scaled filter yields a filtered output response map by formula (4):

$$f_{m} = \mathcal{F}^{-1}\big(\hat{k}^{xz}_{m} \odot \hat{\alpha}_{m}\big) \qquad (4)$$

where m = 1, 2, 3, 4, 5 corresponds to the scaling ratios [1.1, 1.05, 1, 0.95, 0.9] respectively, and x denotes the HOG feature of the t-th-frame target; the maximum value f_max is selected from the maxima max(f) of the 5 response maps; the position corresponding to f_max is the position of the target centre, the scaling ratio corresponding to f_max gives the target size, and the tracking box of the (t+1)-th frame is obtained.
and 2.3, fusing the results of target detection and target tracking to determine the labeled target frame.
First judge whether the current frame image contains any detection box; if not, the target box is the tracking box. If there is exactly one detection box, compute the IOU of the tracking box and the detection box: if the IOU is greater than a threshold, the target box is the detection box and the KCF tracking algorithm is re-initialized with it; otherwise the target box is the tracking box. If there are multiple detection boxes, compute the IOU of the tracking box with each detection box and select the maximum: if the maximum IOU is greater than the threshold, the target box is the detection box corresponding to the maximum IOU and the KCF tracking algorithm is re-initialized with it; otherwise the target box is the tracking box.
The IOU value evaluates the degree of overlap between the tracking box and each detection box in the current frame:

$$IOU = \frac{S_{I}}{S_{U}} \qquad (5)$$

where $S_I$ denotes the overlapping area of the tracking box and a detection box in the same frame, and $S_U$ denotes the area of their union, i.e. the sum of the areas of the tracking box and the detection box minus the overlapping area.
and (3) the peak value of the response image f of the KCF correlation filtering tracker represents the confidence that the corresponding position is the target, and the higher the peak value is, the higher the probability that the position is the target is. The peak-to-side lobe ratio (PSR) measures the peak intensity of the correlation filtering output, and the higher the PSR value is, the higher the reliability of the tracking result is. If the peak value and the PSR are lower than the set threshold values, the target is possibly disappeared, and therefore the video target marking is judged to be finished. The peak side lobe ratio (PSR) is calculated as follows:
$$PSR = \frac{\max(f) - \mu_{\Phi}(f)}{\sigma_{\Phi}(f)} \qquad (6)$$

where max(f) is the peak value of the correlation-filter response map f, Φ = 0.5, and μ_Φ(f) and σ_Φ(f) are respectively the mean and standard deviation of the 50% response region centred on the peak of f. If max(f) is smaller than the set threshold θ and PSR is smaller than the set threshold θ_PSR, i.e.:

$$\max(f) < \theta \ \ \text{and} \ \ PSR < \theta_{PSR} \qquad (7)$$
then the target labeling is judged to be finished, and the process turns to step (4) to select key frames; otherwise it turns to step (2) and continues to estimate the position of the target in the next frame image.
And (4) calculating the significant value of the target in each frame, as shown in fig. 3. During labeling, the target region is obtained from the target frame produced in step (2); the LBP texture feature, color saliency feature and edge saliency feature of the target region are then fused, and the significant value of the target is computed by further combining the color histogram change and the scale change. The specific steps are as follows:
4.1 LBP (local binary pattern) extracts the texture features of the target region. The basic idea is: within a 3×3 pixel neighborhood, the center pixel is taken as the threshold and the gray values of the 8 neighboring pixels are compared with it; if a neighboring pixel is greater than the center pixel value, that position is marked 1, otherwise 0. The comparisons of the 8 points in the 3×3 neighborhood produce an 8-bit binary number, which is converted to a decimal number to obtain the LBP value of the center pixel; this value reflects the LBP texture information of the region. The specific calculation formula is shown in (8):

$$\mathrm{LBP}(x_0, y_0) = \sum_{p=0}^{7} s(j_p - j_0)\, 2^p \tag{8}$$

where $(x_0, y_0)$ are the coordinates of the center pixel, p indexes the p-th pixel of the neighborhood, $j_p$ is the gray value of the neighborhood pixel, and $j_0$ is the gray value of the center pixel. s(x) is a sign function:

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{9}$$
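The per-pixel LBP calculation of step 4.1 can be sketched as follows (illustrative only; the neighbor ordering, and hence the bit numbering, is an assumption):

```python
import numpy as np

def lbp_value(gray, x0, y0):
    """8-neighbour LBP code of the pixel at (y0, x0) in a 2-D grayscale array."""
    j0 = gray[y0, x0]
    # Neighbours ordered clockwise from the top-left; any fixed order yields
    # a valid (if differently numbered) LBP code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for p, (dy, dx) in enumerate(offsets):
        jp = gray[y0 + dy, x0 + dx]
        code += (1 if jp >= j0 else 0) << p   # s(jp - j0) * 2^p
    return code
```

A dark center surrounded by brighter neighbours yields the code 255 (all eight bits set); a bright center surrounded by darker neighbours yields 0.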
4.2 the calculation formula of the color saliency map is as follows:
$$S_c(x, y) = \sum_i \left| \mathrm{patch}_i(x, y) - \mathrm{patch}_{\mathrm{gaussian},\, i}(x, y) \right| \tag{10}$$

where patch is the target-region image, $\mathrm{patch}_{\mathrm{gaussian}}$ is the image obtained by filtering patch with a 5×5 Gaussian kernel and a standard deviation of 0, |·| denotes the absolute value, i indexes the channels of the image, and (x, y) are the horizontal and vertical pixel coordinates.
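The color saliency map of step 4.2 can be sketched as follows (illustrative; a 5×5 binomial kernel stands in for the Gaussian filter, whose exact coefficients the text does not fix):

```python
import numpy as np

def gaussian_blur5(channel):
    """5x5 separable Gaussian-like smoothing of one channel (binomial kernel
    assumed as a stand-in for the patent's 5x5 Gaussian filter)."""
    k = np.array([1., 4., 6., 4., 1.])
    k /= k.sum()
    padded = np.pad(channel.astype(float), 2, mode='edge')
    # Separable convolution: rows first, then columns.
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, tmp)

def color_saliency(patch):
    """Per-pixel colour saliency: summed absolute difference between the
    patch and its blurred version, over all channels."""
    return sum(np.abs(patch[..., i].astype(float) - gaussian_blur5(patch[..., i]))
               for i in range(patch.shape[-1]))
```

A uniform patch has zero saliency everywhere, while an isolated bright pixel produces a strong local response.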
4.3 In the edge regions of the target-region image the pixel values "jump". Taking the derivative of these pixel values, the first derivative reaches an extremum at the edge position; this is the principle used by the Sobel operator: the extremum marks the edge. If the second derivative of the pixel values is taken instead, the derivative value at the edge is 0. The Laplacian is accordingly implemented by first using the Sobel operator to compute the second-order x and y derivatives and then summing them to obtain the edge saliency feature map; the calculation formula is as follows:

$$E(x, y) = \frac{\partial^2 I(x, y)}{\partial x^2} + \frac{\partial^2 I(x, y)}{\partial y^2} \tag{11}$$

where I denotes the image and (x, y) denote the pixel coordinates of the target edge region in the target frame;
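The edge saliency map of step 4.3 can be sketched with discrete second differences standing in for the second-order Sobel derivatives (an assumption; the exact operator coefficients are not given in the text):

```python
import numpy as np

def edge_saliency(gray):
    """Edge saliency map: |d2I/dx2 + d2I/dy2| via discrete second differences,
    a numpy stand-in for summing second-order Sobel derivatives."""
    g = np.pad(gray.astype(float), 1, mode='edge')
    d2x = g[1:-1, 2:] - 2 * g[1:-1, 1:-1] + g[1:-1, :-2]   # second x-difference
    d2y = g[2:, 1:-1] - 2 * g[1:-1, 1:-1] + g[:-2, 1:-1]   # second y-difference
    return np.abs(d2x + d2y)
```

On a vertical step edge the response is zero inside the flat regions and concentrated on the two columns adjacent to the step, matching the "second derivative is 0 exactly at the edge" behavior described above.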
4.4 The LBP texture feature, the color saliency feature and the edge saliency feature are fused by average weighting to obtain a fusion value mean; the fusion calculation formula is:

$$\mathrm{mean}_t = \frac{1}{3N} \sum_{(x, y)} \left[ T_t(x, y) + C_t(x, y) + E_t(x, y) \right] \tag{12}$$

where $T_t(x, y)$, $C_t(x, y)$ and $E_t(x, y)$ respectively denote the values of pixel (x, y) in the LBP texture feature map, the color saliency feature map and the edge saliency feature map of the t-th frame, and N is the number of pixels in the target region.
4.5 The color histogram of the target-region image represents the distribution of color components in the image, showing the different colors present and the number of pixels of each color. The color histogram change value Dist is obtained by calculating the Bhattacharyya distance between the color histogram of the target region selected in the initial frame and that of the target region in the t-th frame; the larger the Dist value, the lower the similarity and the more obvious the change of the target. The calculation formula is as follows:

$$\mathrm{Dist}(H_0, H_t) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_0 \bar{H}_t}\, n} \sum_{j=1}^{n} \sqrt{H_0(j)\, H_t(j)}} \tag{13}$$

where $H_0$ is the color histogram of the target region selected in the initial frame, $H_t$ is the color histogram of the target region in the t-th frame, and n represents the total number of color histogram bins; the mean bin value $\bar{H}_k$ is given by:

$$\bar{H}_k = \frac{1}{n} \sum_{j=1}^{n} H_k(j) \tag{14}$$
where k is 0 or t.
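The Bhattacharyya distance calculation of step 4.5 can be sketched as follows (this follows OpenCV's HISTCMP_BHATTACHARYYA convention, which the description appears to use):

```python
import numpy as np

def bhattacharyya(h0, ht):
    """Bhattacharyya distance between two histograms: 0 for identical shapes,
    1 for fully disjoint support."""
    h0 = np.asarray(h0, dtype=float)
    ht = np.asarray(ht, dtype=float)
    n = h0.size
    # Normalised Bhattacharyya coefficient over the n bins.
    score = np.sum(np.sqrt(h0 * ht)) / np.sqrt(h0.mean() * ht.mean() * n * n)
    return np.sqrt(max(0.0, 1.0 - score))
```

Identical histograms give Dist = 0; histograms with no overlapping bins give Dist = 1.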
4.6 the scale change value is obtained by calculating the width and height change of the initial frame target frame and the t frame target frame, and the calculation formula is as follows:
Figure BDA0002229725840000116
wherein
Figure BDA0002229725840000117
For the width and height of the target box of the initial frame,
Figure BDA0002229725840000118
and
Figure BDA0002229725840000119
the width and height of the target box of the t-th frame.
4.7 through the above calculation, the calculation formula of the target significant value of the t-th frame is as follows:
Figure BDA00022297258400001110
where T represents the total number of video frames for a shot.
And 4.8, a significant-value line graph is drawn from the significant value of the target in each frame of the scene shot, and all peaks and their corresponding frames are obtained. Suppose the shot has T video frames, the number of key frames to be extracted is n, and the number of peaks is k. If n < k, the peaks are sorted in descending order and the frames corresponding to the first n peaks are extracted as key frames; if k < n, the frames corresponding to all peaks are extracted and the remaining n − k key frames are drawn randomly without repetition; if n > T, all video frames are used as key frames.
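The key-frame selection rule of step 4.8 can be sketched as follows (illustrative; "local maxima strictly greater than both neighbours" and a seeded random fill are assumptions where the text leaves the details open):

```python
import numpy as np

def select_key_frames(saliency, n, rng=None):
    """Pick n key-frame indices from a per-frame significant-value sequence."""
    s = np.asarray(saliency, dtype=float)
    T = len(s)
    if n >= T:                       # more key frames requested than frames
        return list(range(T))
    # Local maxima: strictly greater than both neighbours (assumption).
    peaks = [i for i in range(1, T - 1) if s[i] > s[i - 1] and s[i] > s[i + 1]]
    peaks.sort(key=lambda i: s[i], reverse=True)   # largest peaks first
    chosen = peaks[:n]
    if len(chosen) < n:              # too few peaks: fill randomly, no repeats
        rng = rng or np.random.default_rng(0)
        rest = [i for i in range(T) if i not in chosen]
        chosen += list(rng.choice(rest, size=n - len(chosen), replace=False))
    return sorted(chosen)
```

For the sequence [0, 3, 0, 2, 0, 1, 0] with n = 2, the two largest local maxima are at frames 1 and 3.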
And (5), returning to step (1) to label the target of the next shot.
In order to verify the effectiveness of the proposed method, a multi-shot, multi-ship video was used for experimental testing. The video contains 9 scene shots with multiple ships; the frame count of each scene shot is shown in Table 1. To speed up computation, the experiment performs labeling once every 5 frames.
TABLE 1 video shot and frame number
Figure BDA0002229725840000121
In the target detection stage, the single-stage target detection algorithm YOLO V3 is trained on a large number of labeled samples carrying ship category and position information to obtain a detection model, which is then used as the detector. Considering that the original algorithm has a low capability of detecting small targets, the method adds small-scale anchors on the original basis to remedy the low detection precision, improving the detection capability for targets of various scales while maintaining the detection speed and achieving accurate real-time detection. In the target tracking stage, the KCF tracking algorithm parameters are set to λ = 1×10⁻⁴, σ = 0.5 and η = 0.02. Considering that the original algorithm cannot adapt to changes of target scale, scale estimation is added to the KCF tracking algorithm, and the improved KCF tracking algorithm is used as the tracker.
In the stage of fusing the detection and tracking results, the IOU threshold is set to 0.5. If the IOU value between the tracking frame and every detection frame is less than 0.5, the detector has not detected the target to be labeled, and the target frame is the tracking frame. If the IOU value between the tracking frame and one or more detection frames is greater than 0.5, the detector has detected the target to be labeled, and the target frame is the detection frame corresponding to the maximum IOU value. For example, after the target is manually labeled in the 1st frame of video shot 1, the detection and tracking results of the 2nd frame are shown in fig. 4 and fig. 5. As can be seen from the figures, the detector's result contains multiple targets, while the tracker's result has only one. Calculating the IOU values between the tracking frame and each detection frame, only one detection frame has an IOU value with the tracking frame greater than the 0.5 threshold; the fused output target frame, shown in fig. 6, is that detection frame.
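The fusion rule used in this stage can be sketched as follows (illustrative; the (x, y, w, h) box format and the `iou` helper are assumptions):

```python
def iou(a, b):
    """Intersection-over-union of (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    s_i = ix * iy
    s_u = a[2] * a[3] + b[2] * b[3] - s_i
    return s_i / s_u if s_u > 0 else 0.0

def fuse_boxes(track_box, det_boxes, iou_threshold=0.5):
    """Return (target_box, reinit_tracker) following the fusion rule:
    the tracker is re-initialised with a detection box only when some
    detection overlaps the tracking box with IOU above the threshold."""
    if not det_boxes:
        return track_box, False
    best = max(det_boxes, key=lambda d: iou(track_box, d))
    if iou(track_box, best) > iou_threshold:
        return best, True          # detector found the labelled target
    return track_box, False        # keep the tracker's estimate
```

The second return value models the re-initialisation of the KCF tracker with the winning detection frame.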
When judging whether the target labeling is finished, the peak threshold of the KCF tracker is set to θ = 0.3 and the peak-to-sidelobe-ratio threshold to θ_PSR = 3.5; when both the peak value and the PSR fall below their thresholds, the labeling is finished. For example, when the target disappears while labeling the 2nd shot of the video, the response map of the KCF tracking algorithm has a small peak and a small peak-to-sidelobe ratio, as shown in fig. 7 and fig. 8. Over labeling steps 0-47 of the scene shot, the peak and PSR values of the response map of the KCF tracking algorithm are large; at the 48th step they become small and the target disappears. The scene shot actually corresponds exactly to frame 243, labeling is performed once every 5 frames, and the shot switches at the frame after frame 243. The 243rd frame image of shot 2 and the 1st frame image of shot 3 are shown in fig. 9 and fig. 10. It can be seen from the figures that the video switches from shot 2 to shot 3, causing the target to disappear, which shows that the method judges the end of labeling accurately.
When the tracker judges that labeling of a video shot's target is finished, a target significant-value curve of the shot is obtained from the per-frame significant values, and key frames are extracted at the local maxima of the curve; in the experiment, 10 frames are extracted from each shot as key frames. For example, the target significant-value curve of shot 6 is shown in fig. 11. The local maxima are sorted from large to small, the frames corresponding to the first 10 local maxima are taken as key frames, and the extracted key frames are shown in fig. 12 (a-j). As can be seen from the figure, the extracted key frames are highly representative and accurately reflect the diversity of the target's changes in size, angle and so on.
The results of this experiment are shown in table 2,
TABLE 2 Key Frames for each shot
Shot Key frames
1 5,10,25,30,40,50,55,65,75,80
2 90,110,125,135,145,160,180,195,205,215
3 325,340,365,380,400,420,430,445,460,480
4 1099,1109,1119,1139,1149,1159,1169,1179,1329,1369
5 1424,1519,1559,1594,1604,1624,1634,1674,1754,1764
6 1779,1854,1869,1994,2054,2064,2089,2114,2144,2154
7 2194,2199,2214,2229,2249,2269,2279,2289,2294,2314
8 2349,2359,2379,2399,2414,2424,2444,2459,2474,2539
9 2974,3094,3164,3179,3189,3199,3214,3229,3259,3274
The extraction ranges of the key frames all lie within the corresponding shots, which further proves that the method can distinguish different shots and automatically judge the end of target labeling. Because the method adopts the local maxima of the target significant value as the basis for key-frame extraction, the extracted key frames are representative. The experimental results show that the proposed video target labeling method, which fuses a target detection algorithm and a target tracking algorithm, achieves high accuracy.

Claims (1)

1. The video semi-automatic target marking method integrating target detection and tracking is characterized by comprising the following steps of:
step (1), selecting a certain frame in a certain shot of the video as the initial frame, manually labeling the initial position and size of the target, and determining the category label of the target;
step (2), adopting automatic labeling for other subsequent frames after the initial frame, specifically fusing an image-based target detection algorithm and an image sequence-based video target tracking algorithm to estimate the position of a target in an image; the method comprises the following steps:
2.1 detecting the target in each frame of image by adopting YOLO V3 and marking a detection frame;
the YOLO V3 detector is obtained by resizing the labeled target images to a fixed scale as training samples and training YOLO V3, wherein the number of YOLO layers is increased to 4, and four receptive-field feature maps of different scales, 13×13, 26×26, 52×52 and 104×104, are obtained through multi-scale feature fusion; three prior boxes, (116×90), (156×198) and (373×326), predict the 13×13 feature map to detect larger objects; three prior boxes, (30×61), (62×45) and (59×119), predict the 26×26 feature map to detect medium-sized objects; three prior boxes, (10×13), (16×30) and (33×23), predict the 52×52 feature map to detect smaller objects; and three newly added prior boxes, (5×6), (8×15) and (16×10), predict the 104×104 feature map to detect even smaller targets;
2.2 acquiring a tracking frame of the target by adopting a KCF related filtering tracking algorithm;
firstly, HOG features are extracted according to the target position and size in the previous frame; the features are then transformed to the frequency domain by Fourier transform, the frequency-domain features are mapped to a high-dimensional space by a Gaussian kernel function, and the filter template α is obtained according to formula (1):

$$\hat{\alpha} = \frac{\hat{g}}{\hat{k}^{xx} + \lambda} \tag{1}$$

where x represents the HOG feature of the sample, ^ denotes the Fourier transform, g is a two-dimensional Gaussian function whose peak is at the center, and λ is a regularization parameter used to control overfitting during training; $k^{xx}$ denotes the kernel autocorrelation of x in the high-dimensional space, computed as in formula (2):

$$k^{xx} = \exp\!\left(-\frac{1}{\sigma^2}\left(2\|x\|^2 - 2\,\mathcal{F}^{-1}\!\left(\sum_{c} \hat{x}_c^{*} \odot \hat{x}_c\right)\right)\right) \tag{2}$$
where σ is the width parameter of the Gaussian kernel function, controlling the radial range of action of the function, * denotes the complex conjugate, ⊙ denotes the element-wise product, $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, and c is the number of channels of the HOG feature x;
when the target is tracked on the t-th frame image, the correlation filter α is updated as follows:

$$\hat{\alpha}_t = (1 - \eta)\,\hat{\alpha}_{t-1} + \eta\,\hat{\alpha} \tag{3}$$

where η is an update parameter;

to accommodate scale changes of the target, the filter $\alpha_t$ of the current frame is scaled so as to predict the size of the target in the next frame, with scaling ratios [1.1, 1.05, 1, 0.95, 0.9];
candidate-sample HOG features z are extracted at the t-th-frame target position on the (t+1)-th-frame image; combining each of the scale-adjusted filters above, each corresponding filter output response map f is given by formula (4):

$$f_m = \mathcal{F}^{-1}\!\left(\hat{k}^{x z_m} \odot \hat{\alpha}_m\right) \tag{4}$$

where m = 1, 2, 3, 4, 5, corresponding to the scaling ratios [1.1, 1.05, 1, 0.95, 0.9] respectively; x represents the HOG feature of the t-th-frame target; and $k^{x z_m}$ is the kernel cross-correlation of x and the candidate feature $z_m$;

the maximum value $f_{max}$ is screened from the maxima max(f) of the 5 response maps f; the position corresponding to $f_{max}$ is the position of the target center, the scaling ratio corresponding to $f_{max}$ gives the target size, and the tracking frame of the (t+1)-th frame is thereby obtained;
2.3 fusing the results of target detection and target tracking to determine the labeled target frame;
firstly, judging whether the current frame image contains any detection frame; if not, the target frame is the tracking frame; if it does, continuing to judge whether there is only one detection frame: if there is only one, calculating the IOU value of the tracking frame and the detection frame, and if the IOU value is larger than a threshold value, the target frame is the detection frame and the KCF tracking algorithm is initialized with the detection frame, otherwise the target frame is the tracking frame; if there are multiple detection frames, calculating the IOU value of the tracking frame with each detection frame and screening out the maximum IOU value, and if the maximum IOU value is larger than the threshold value, the target frame is the detection frame corresponding to the maximum IOU value and the KCF tracking algorithm is initialized with that detection frame, otherwise the target frame is the tracking frame;
the IOU value is used for evaluating the degree of overlap between the tracking frame and each detection frame in the current frame, with the formula:

$$\mathrm{IOU} = \frac{S_I}{S_U} \tag{5}$$

where $S_I$ denotes the overlapping area of the tracking frame and a detection frame in the same frame image, and $S_U$ denotes the area of their union, i.e., the total area of the tracking frame and the detection frame minus the overlapping area;
step (3), judging whether the target marking is finished or not according to a target tracking algorithm;
judging, from the response map f of the KCF correlation filtering tracker, whether max(f) is smaller than the set threshold θ and whether the peak-to-sidelobe ratio (PSR) is smaller than the set threshold $\theta_{PSR}$, namely:

$$\max(f) < \theta \ \text{and} \ \mathrm{PSR} < \theta_{PSR} \tag{7}$$
if yes, judging that the target marking is finished, and turning to the step (4) to select the key frame; otherwise, turning to the step (2), and continuing to estimate the position of the target in the next frame image;
the peak-to-sidelobe ratio (PSR) is calculated as follows:

$$\mathrm{PSR} = \frac{\max(f) - \mu_\Phi(f)}{\sigma_\Phi(f)} \tag{6}$$

where max(f) is the peak value of the correlation filter response map f, Φ = 0.5, and $\mu_\Phi(f)$ and $\sigma_\Phi(f)$ are respectively the mean and standard deviation of the 50% response region centered at the peak of f;
step (4), calculating a significant value of each frame of target in the current shot; extracting a set number of video key frames according to the significant value of each frame of target to obtain a target labeling result; the method comprises the following steps:
4.1 LBP (local binary pattern) extracts the texture features of the image; the basic idea is: within a 3×3 pixel neighborhood, the center pixel is taken as the threshold and the gray values of the 8 neighboring pixels are compared with it; if a neighboring pixel is greater than the center pixel value, that position is marked 1, otherwise 0; the comparisons of the 8 points in the 3×3 neighborhood produce an 8-bit binary number, which is converted to a decimal number to obtain the LBP value of the center pixel, and this value reflects the LBP texture information of the region; the specific calculation formula is shown in (8):

$$\mathrm{LBP}(x_0, y_0) = \sum_{p=0}^{7} s(j_p - j_0)\, 2^p \tag{8}$$

where $(x_0, y_0)$ are the coordinates of the center pixel, p indexes the p-th pixel of the neighborhood, $j_p$ is the gray value of the neighborhood pixel, and $j_0$ is the gray value of the center pixel; s(x) is a sign function:

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{9}$$
4.2 the calculation formula of the color saliency map is as follows:
$$S_c(x, y) = \sum_i \left| \mathrm{patch}_i(x, y) - \mathrm{patch}_{\mathrm{gaussian},\, i}(x, y) \right| \tag{10}$$

where patch is the original image of the target frame region, $\mathrm{patch}_{\mathrm{gaussian}}$ is the image obtained by filtering patch with a 5×5 Gaussian kernel and a standard deviation of 0, |·| denotes the absolute value, i indexes the channels, and (x, y) are the pixel coordinates;
4.3 obtaining the edge saliency feature map for the pixel points of the target edge region in the target frame of each frame image:

in the target edge region within the target frame, pixel values "jump"; taking the derivative of these pixel values, the first derivative reaches an extremum at the edge position, i.e. the extremum marks the edge, which is the principle used by the Sobel operator; if the second derivative of the pixel values is taken, the derivative value at the edge is 0; the Laplacian is accordingly implemented by first using the Sobel operator to compute the second-order x and y derivatives and then summing them to obtain the edge saliency feature map; the calculation formula is as follows:

$$E(x, y) = \frac{\partial^2 I(x, y)}{\partial x^2} + \frac{\partial^2 I(x, y)}{\partial y^2} \tag{11}$$

where I represents the image in the target frame, and (x, y) represent the pixel coordinates of the target edge region in the target frame;
4.4 the LBP texture feature, the color saliency feature and the edge saliency feature are fused by average weighting to obtain a fusion value mean; the fusion calculation formula is:

$$\mathrm{mean}_t = \frac{1}{3N} \sum_{(x, y)} \left[ T_t(x, y) + C_t(x, y) + E_t(x, y) \right] \tag{12}$$

where $T_t(x, y)$, $C_t(x, y)$ and $E_t(x, y)$ respectively denote the values of pixel (x, y) in the LBP texture feature map, the color saliency feature map and the edge saliency feature map of the t-th frame, and N is the number of pixels in the target region;
4.5 the color histogram change value Dist is obtained by calculating the Bhattacharyya distance between the color histogram of the target region selected in the initial frame and that of the target region in the t-th frame; the calculation formula is:

$$\mathrm{Dist}(H_0, H_t) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_0 \bar{H}_t}\, n} \sum_{j=1}^{n} \sqrt{H_0(j)\, H_t(j)}} \tag{13}$$

where $H_0$ is the color histogram of the manually labeled target frame in the initial frame, $H_t$ is the color histogram of the automatically labeled target frame in the t-th frame, and n represents the total number of color histogram bins; the mean bin value $\bar{H}_k$ is given by:

$$\bar{H}_k = \frac{1}{n} \sum_{j=1}^{n} H_k(j) \tag{14}$$
wherein k is 0 or t;
4.6 the scale change value is obtained by calculating the width and height change of the initial frame target frame and the t frame target frame, and the calculation formula is as follows:
Figure FDA0002229725830000045
where $w_0$ and $h_0$ are the width and height of the target frame in the initial frame, and $w_t$ and $h_t$ are the width and height of the target frame in the t-th frame;
4.7 according to the fusion value, the color histogram change value and the scale change value of the image target frame region, the calculation formula of the target significant value of the t-th frame is as follows:
Figure FDA0002229725830000046
wherein T represents the total number of frames of the video;
4.8 a significant-value line graph is constructed from the significant value $S_t$ of the target in each frame of the video, and all peaks and their corresponding frames are obtained;

assuming the video has T frames and the number of key frames to be extracted is set to n, with k significant-value peaks: if n < k, the peaks are sorted in descending order and the frames corresponding to the first n peaks are extracted as key frames; if k < n < T, the frames corresponding to all peaks are extracted and the remaining n − k key frames are drawn randomly without repetition; if n > T, all video frames are used as key frames;
and (5) returning to the step (1) to label the target of the next video shot.
CN201910963482.3A 2019-10-11 2019-10-11 Video semi-automatic target labeling method integrating target detection and tracking Active CN110929560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910963482.3A CN110929560B (en) 2019-10-11 2019-10-11 Video semi-automatic target labeling method integrating target detection and tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910963482.3A CN110929560B (en) 2019-10-11 2019-10-11 Video semi-automatic target labeling method integrating target detection and tracking

Publications (2)

Publication Number Publication Date
CN110929560A true CN110929560A (en) 2020-03-27
CN110929560B CN110929560B (en) 2022-10-14

Family

ID=69848801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910963482.3A Active CN110929560B (en) 2019-10-11 2019-10-11 Video semi-automatic target labeling method integrating target detection and tracking

Country Status (1)

Country Link
CN (1) CN110929560B (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415370A (en) * 2020-04-13 2020-07-14 中山大学 Embedded infrared complex scene target real-time tracking method and system
CN111626990A (en) * 2020-05-06 2020-09-04 北京字节跳动网络技术有限公司 Target detection frame processing method and device and electronic equipment
CN111652080A (en) * 2020-05-12 2020-09-11 合肥的卢深视科技有限公司 Target tracking method and device based on RGB-D image
CN111681260A (en) * 2020-06-15 2020-09-18 深延科技(北京)有限公司 Multi-target tracking method and tracking system for aerial images of unmanned aerial vehicle
CN111709971A (en) * 2020-05-29 2020-09-25 西安理工大学 Semi-automatic video labeling method based on multi-target tracking
CN111754545A (en) * 2020-06-16 2020-10-09 江南大学 Dual-filter video multi-target tracking method based on IOU matching
CN111768668A (en) * 2020-03-31 2020-10-13 杭州海康威视数字技术股份有限公司 Experimental operation scoring method, device, equipment and storage medium
CN111882582A (en) * 2020-07-24 2020-11-03 广州云从博衍智能科技有限公司 Image tracking correlation method, system, device and medium
CN112070071A (en) * 2020-11-11 2020-12-11 腾讯科技(深圳)有限公司 Method and device for labeling objects in video, computer equipment and storage medium
CN112132855A (en) * 2020-09-22 2020-12-25 山东工商学院 Self-adaptive Gaussian function target tracking method based on foreground segmentation guidance
CN112164097A (en) * 2020-10-20 2021-01-01 南京莱斯网信技术研究院有限公司 Ship video detection sample acquisition method
CN112257612A (en) * 2020-10-23 2021-01-22 华侨大学 Unmanned aerial vehicle video frame filtering method and device based on edge intelligence
CN112308082A (en) * 2020-11-05 2021-02-02 湖南科技大学 Dynamic video image segmentation method based on dual-channel convolution kernel and multi-frame feature fusion
CN112395957A (en) * 2020-10-28 2021-02-23 连云港杰瑞电子有限公司 Online learning method for video target detection
CN112489089A (en) * 2020-12-15 2021-03-12 中国人民解放军国防科技大学 Airborne ground moving target identification and tracking method for micro fixed wing unmanned aerial vehicle
CN113034551A (en) * 2021-05-31 2021-06-25 南昌虚拟现实研究院股份有限公司 Target tracking and labeling method and device, readable storage medium and computer equipment
CN113095239A (en) * 2021-04-15 2021-07-09 深圳市英威诺科技有限公司 Key frame extraction method, terminal and computer readable storage medium
CN113112519A (en) * 2021-04-23 2021-07-13 电子科技大学 Key frame screening method based on interested target distribution
CN113705643A (en) * 2021-08-17 2021-11-26 荣耀终端有限公司 Target detection method and device and electronic equipment
WO2021237678A1 (en) * 2020-05-29 2021-12-02 深圳市大疆创新科技有限公司 Target tracking method and device
CN113761981A (en) * 2020-06-05 2021-12-07 北京四维图新科技股份有限公司 Automatic driving visual perception method and device and storage medium
CN114463370A (en) * 2020-11-09 2022-05-10 北京理工大学 Two-dimensional image target tracking optimization method and device
WO2022116545A1 (en) * 2020-12-04 2022-06-09 全球能源互联网研究院有限公司 Interaction method and apparatus based on multi-feature recognition, and computer device
CN114697702A (en) * 2022-03-23 2022-07-01 咪咕文化科技有限公司 Audio and video marking method, device, equipment and storage medium
CN114882211A (en) * 2022-03-01 2022-08-09 广州文远知行科技有限公司 Time sequence data automatic labeling method and device, electronic equipment, medium and product
CN114972418A (en) * 2022-03-30 2022-08-30 北京航空航天大学 Maneuvering multi-target tracking method based on combination of nuclear adaptive filtering and YOLOX detection
CN115018885A (en) * 2022-08-05 2022-09-06 四川迪晟新达类脑智能技术有限公司 Multi-scale target tracking algorithm suitable for edge equipment
CN115082862A (en) * 2022-07-07 2022-09-20 南京杰迈视讯科技有限公司 High-precision pedestrian flow statistical method based on monocular camera
CN115424207A (en) * 2022-09-05 2022-12-02 南京星云软件科技有限公司 Self-adaptive monitoring system and method
CN116109975A (en) * 2023-02-08 2023-05-12 广州宝立科技有限公司 Power grid safety operation monitoring image processing method and intelligent video monitoring system
CN116912289A (en) * 2023-08-09 2023-10-20 北京航空航天大学 Weak and small target layering visual tracking method oriented to edge intelligence
CN117635637A (en) * 2023-11-28 2024-03-01 北京航空航天大学 Autonomous conceived intelligent target dynamic detection system
CN117671801A (en) * 2024-02-02 2024-03-08 中科方寸知微(南京)科技有限公司 Real-time target detection method and system based on binary reduction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403175A (en) * 2017-09-21 2017-11-28 昆明理工大学 Visual tracking method and Visual Tracking System under a kind of movement background
CN107767405A (en) * 2017-09-29 2018-03-06 华中科技大学 A kind of nuclear phase for merging convolutional neural networks closes filtered target tracking

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403175A (en) * 2017-09-21 2017-11-28 昆明理工大学 Visual tracking method and Visual Tracking System under a kind of movement background
CN107767405A (en) * 2017-09-29 2018-03-06 华中科技大学 A kind of nuclear phase for merging convolutional neural networks closes filtered target tracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIMO OJALA et al.: "A comparative study of texture measures with classification based on feature distributions", PATTERN RECOGNITION *

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768668A (en) * 2020-03-31 2020-10-13 杭州海康威视数字技术股份有限公司 Experimental operation scoring method, device, equipment and storage medium
CN111768668B (en) * 2020-03-31 2022-09-02 杭州海康威视数字技术股份有限公司 Experimental operation scoring method, device, equipment and storage medium
CN111415370A (en) * 2020-04-13 2020-07-14 中山大学 Embedded infrared complex scene target real-time tracking method and system
CN111626990A (en) * 2020-05-06 2020-09-04 北京字节跳动网络技术有限公司 Target detection frame processing method and device and electronic equipment
CN111652080A (en) * 2020-05-12 2020-09-11 合肥的卢深视科技有限公司 Target tracking method and device based on RGB-D image
CN111652080B (en) * 2020-05-12 2023-10-17 合肥的卢深视科技有限公司 Target tracking method and device based on RGB-D image
CN111709971A (en) * 2020-05-29 2020-09-25 西安理工大学 Semi-automatic video labeling method based on multi-target tracking
WO2021237678A1 (en) * 2020-05-29 2021-12-02 深圳市大疆创新科技有限公司 Target tracking method and device
CN113761981A (en) * 2020-06-05 2021-12-07 北京四维图新科技股份有限公司 Automatic driving visual perception method and device and storage medium
CN113761981B (en) * 2020-06-05 2023-07-11 北京四维图新科技股份有限公司 Automatic driving visual perception method, device and storage medium
CN111681260A (en) * 2020-06-15 2020-09-18 深延科技(北京)有限公司 Multi-target tracking method and tracking system for aerial images of unmanned aerial vehicle
CN111754545A (en) * 2020-06-16 2020-10-09 江南大学 Dual-filter video multi-target tracking method based on IOU matching
CN111754545B (en) * 2020-06-16 2024-05-03 江南大学 IOU (input-output unit) matching-based double-filter video multi-target tracking method
CN111882582B (en) * 2020-07-24 2021-10-08 广州云从博衍智能科技有限公司 Image tracking correlation method, system, device and medium
CN111882582A (en) * 2020-07-24 2020-11-03 广州云从博衍智能科技有限公司 Image tracking correlation method, system, device and medium
CN112132855A (en) * 2020-09-22 2020-12-25 山东工商学院 Self-adaptive Gaussian function target tracking method based on foreground segmentation guidance
CN112132855B (en) * 2020-09-22 2022-05-20 山东工商学院 Target tracking method of self-adaptive Gaussian function based on foreground segmentation guide
CN112164097B (en) * 2020-10-20 2024-03-29 南京莱斯网信技术研究院有限公司 Ship video detection sample collection method
CN112164097A (en) * 2020-10-20 2021-01-01 南京莱斯网信技术研究院有限公司 Ship video detection sample acquisition method
CN112257612A (en) * 2020-10-23 2021-01-22 华侨大学 Unmanned aerial vehicle video frame filtering method and device based on edge intelligence
CN112257612B (en) * 2020-10-23 2023-06-02 华侨大学 Unmanned aerial vehicle video frame filtering method and device based on edge intelligence
CN112395957A (en) * 2020-10-28 2021-02-23 连云港杰瑞电子有限公司 Online learning method for video target detection
CN112395957B (en) * 2020-10-28 2024-06-04 连云港杰瑞电子有限公司 Online learning method for video target detection
CN112308082B (en) * 2020-11-05 2023-04-07 湖南科技大学 Dynamic video image segmentation method based on dual-channel convolution kernel and multi-frame feature fusion
CN112308082A (en) * 2020-11-05 2021-02-02 湖南科技大学 Dynamic video image segmentation method based on dual-channel convolution kernel and multi-frame feature fusion
CN114463370A (en) * 2020-11-09 2022-05-10 北京理工大学 Two-dimensional image target tracking optimization method and device
CN112070071B (en) * 2020-11-11 2021-03-26 腾讯科技(深圳)有限公司 Method and device for labeling objects in video, computer equipment and storage medium
CN112070071A (en) * 2020-11-11 2020-12-11 腾讯科技(深圳)有限公司 Method and device for labeling objects in video, computer equipment and storage medium
WO2022116545A1 (en) * 2020-12-04 2022-06-09 全球能源互联网研究院有限公司 Interaction method and apparatus based on multi-feature recognition, and computer device
CN112489089A (en) * 2020-12-15 2021-03-12 中国人民解放军国防科技大学 Airborne ground moving target identification and tracking method for micro fixed wing unmanned aerial vehicle
CN112489089B (en) * 2020-12-15 2022-06-07 中国人民解放军国防科技大学 Airborne ground moving target identification and tracking method for micro fixed wing unmanned aerial vehicle
CN113095239A (en) * 2021-04-15 2021-07-09 深圳市英威诺科技有限公司 Key frame extraction method, terminal and computer readable storage medium
CN113112519A (en) * 2021-04-23 2021-07-13 电子科技大学 Key frame screening method based on interested target distribution
CN113034551A (en) * 2021-05-31 2021-06-25 南昌虚拟现实研究院股份有限公司 Target tracking and labeling method and device, readable storage medium and computer equipment
CN113705643A (en) * 2021-08-17 2021-11-26 荣耀终端有限公司 Target detection method and device and electronic equipment
CN114882211A (en) * 2022-03-01 2022-08-09 广州文远知行科技有限公司 Time sequence data automatic labeling method and device, electronic equipment, medium and product
CN114697702A (en) * 2022-03-23 2022-07-01 咪咕文化科技有限公司 Audio and video marking method, device, equipment and storage medium
CN114697702B (en) * 2022-03-23 2024-01-30 咪咕文化科技有限公司 Audio and video marking method, device, equipment and storage medium
CN114972418A (en) * 2022-03-30 2022-08-30 北京航空航天大学 Maneuvering multi-target tracking method based on combination of nuclear adaptive filtering and YOLOX detection
CN114972418B (en) * 2022-03-30 2023-11-21 北京航空航天大学 Maneuvering multi-target tracking method based on combination of kernel adaptive filtering and YOLOX detection
CN115082862A (en) * 2022-07-07 2022-09-20 南京杰迈视讯科技有限公司 High-precision pedestrian flow statistical method based on monocular camera
CN115018885A (en) * 2022-08-05 2022-09-06 四川迪晟新达类脑智能技术有限公司 Multi-scale target tracking algorithm suitable for edge equipment
CN115424207A (en) * 2022-09-05 2022-12-02 南京星云软件科技有限公司 Self-adaptive monitoring system and method
CN116109975B (en) * 2023-02-08 2023-10-20 广州宝立科技有限公司 Power grid safety operation monitoring image processing method and intelligent video monitoring system
CN116109975A (en) * 2023-02-08 2023-05-12 广州宝立科技有限公司 Power grid safety operation monitoring image processing method and intelligent video monitoring system
CN116912289A (en) * 2023-08-09 2023-10-20 北京航空航天大学 Weak and small target layering visual tracking method oriented to edge intelligence
CN116912289B (en) * 2023-08-09 2024-01-30 北京航空航天大学 Weak and small target layering visual tracking method oriented to edge intelligence
CN117635637A (en) * 2023-11-28 2024-03-01 北京航空航天大学 Autonomous conceived intelligent target dynamic detection system
CN117635637B (en) * 2023-11-28 2024-06-11 北京航空航天大学 Autonomous conceived intelligent target dynamic detection system
CN117671801A (en) * 2024-02-02 2024-03-08 中科方寸知微(南京)科技有限公司 Real-time target detection method and system based on binary reduction
CN117671801B (en) * 2024-02-02 2024-04-23 中科方寸知微(南京)科技有限公司 Real-time target detection method and system based on binary reduction

Also Published As

Publication number Publication date
CN110929560B (en) 2022-10-14

Similar Documents

Publication Publication Date Title
CN110929560B (en) Video semi-automatic target labeling method integrating target detection and tracking
CA2780595A1 (en) Method and multi-scale attention system for spatiotemporal change determination and object detection
CN112734761B (en) Industrial product image boundary contour extraction method
CN113822352B (en) Infrared dim target detection method based on multi-feature fusion
CN111369570B (en) Multi-target detection tracking method for video image
CN113111878B (en) Infrared weak and small target detection method under complex background
CN108319961B (en) Image ROI rapid detection method based on local feature points
Zhang et al. Automatic detection of road traffic signs from natural scene images based on pixel vector and central projected shape feature
CN110689003A (en) Low-illumination imaging license plate recognition method and system, computer equipment and storage medium
Wang et al. Unstructured road detection using hybrid features
CN110473255B (en) Ship mooring post positioning method based on multiple grid division
CN116381672A (en) X-band multi-expansion target self-adaptive tracking method based on twin network radar
Zhang et al. A covariance-based method for dynamic background subtraction
CN111666811A (en) Method and system for extracting traffic sign area in traffic scene image
CN110619653A (en) Early warning control system and method for preventing collision between ship and bridge based on artificial intelligence
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN113536896A (en) Small target detection method, device and storage medium based on improved fast RCNN
Han et al. Bayesian filtering and integral image for visual tracking
CN106446832B (en) Video-based pedestrian real-time detection method
CN110334703B (en) Ship detection and identification method in day and night image
Hommos et al. Hd Qatari ANPR system
CN106951831B (en) Pedestrian detection tracking method based on depth camera
CN111583341B (en) Cloud deck camera shift detection method
CN114757967A (en) Multi-scale anti-occlusion target tracking method based on manual feature fusion
Lan et al. Robust visual object tracking with spatiotemporal regularisation and discriminative occlusion deformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant