CN109903310A - Target tracking method, apparatus, computer device and computer storage medium - Google Patents
Target tracking method, apparatus, computer device and computer storage medium
- Publication number
- CN109903310A CN109903310A CN201910064675.5A CN201910064675A CN109903310A CN 109903310 A CN109903310 A CN 109903310A CN 201910064675 A CN201910064675 A CN 201910064675A CN 109903310 A CN109903310 A CN 109903310A
- Authority
- CN
- China
- Prior art keywords
- frame
- target
- prediction box
- current image
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
Abstract
The invention belongs to the technical field of image processing and provides a target tracking method, an apparatus, a computer device and a storage medium. The target tracking method includes: detecting targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image; obtaining second target boxes in the previous frame image of the current image, and predicting the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image; matching the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes; and updating the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes. The invention improves the robustness and scene adaptability of target tracking.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a target tracking method, an apparatus, a computer device and a computer storage medium.
Background technique
Target tracking refers to tracking moving objects (such as cars and pedestrians in traffic video) in a video or image sequence to obtain the position of each moving object in every frame. Target tracking is widely used in fields such as video surveillance, autonomous driving and video entertainment.
Current target tracking mainly follows the track-by-detection framework: the position of each target is detected by a detector on every frame of the video or image sequence, and the target positions in the current frame are then matched against the target positions in the previous frame. However, the robustness of existing target tracking schemes is low; for example, when the illumination changes, tracking performance degrades.
Summary of the invention
In view of the foregoing, it is necessary to propose a target tracking method, an apparatus, a computer device and a computer storage medium that can improve the robustness and scene adaptability of target tracking.
The first aspect of the application provides a target tracking method, the method comprising:
detecting targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image;
obtaining second target boxes in the previous frame image of the current image, and predicting the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image;
matching the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes;
updating the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes.
In another possible implementation, the target detector is a Faster region-based convolutional neural network (Faster R-CNN) model. The Faster R-CNN model comprises a region proposal network and a Fast R-CNN, and performs the following training steps before detecting the predefined type targets in the image:
a first training step: initializing the region proposal network with an Imagenet model, and training the region proposal network with a first training sample set;
a second training step: generating candidate boxes of each sample image in the first training sample set with the region proposal network trained in the first training step, and training the Fast R-CNN with the candidate boxes;
a third training step: initializing the region proposal network with the Fast R-CNN trained in the second training step, and training the region proposal network with the first training sample set;
a fourth training step: initializing the Fast R-CNN with the region proposal network trained in the third training step, keeping the shared convolutional layers fixed, and training the Fast R-CNN with the first training sample set.
In another possible implementation, the Faster R-CNN model uses the ZF architecture, and the region proposal network and the Fast R-CNN share 5 convolutional layers.
In another possible implementation, the predictor is a deep neural network model built with a feature pyramid network.
In another possible implementation, before the positions of the second target boxes in the current image are predicted with the predictor to obtain the prediction boxes of the second target boxes in the current image, the method further comprises:
training the predictor with a second training sample set, the second training sample set comprising sample images with different illumination, deformation and fast-moving objects.
In another possible implementation, matching the first target boxes in the current image against the prediction boxes comprises:
computing the overlap area ratio between each first target box and each prediction box, and determining every matched pair of first target box and prediction box according to the overlap area ratios; or
computing the distance between the center points of each first target box and each prediction box, and determining every matched pair of first target box and prediction box according to the distances.
In another possible implementation, updating the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes comprises:
if a first target box matches a prediction box, taking the position of the first target box in the current image as the updated position of the target corresponding to the prediction box;
if a first target box matches no prediction box, taking the position of the first target box in the current image as the position of a new target;
if a prediction box matches no first target box, marking the target corresponding to the prediction box as a lost target.
The second aspect of the application provides a target tracking apparatus, the apparatus comprising:
a detection module, configured to detect targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image;
a prediction module, configured to obtain second target boxes in the previous frame image of the current image, and to predict the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image;
a matching module, configured to match the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes;
an update module, configured to update the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes.
The third aspect of the application provides a computer device, the computer device comprising a processor, the processor being configured to implement the target tracking method when executing a computer program stored in a memory.
The fourth aspect of the application provides a computer storage medium on which a computer program is stored, the computer program implementing the target tracking method when executed by a processor.
The present invention detects targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image; obtains second target boxes in the previous frame image of the current image, and predicts the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image; matches the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes; and updates the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes. The present invention improves the robustness and scene adaptability of target tracking.
Brief description of the drawings
Fig. 1 is a flow chart of the target tracking method provided by an embodiment of the present invention.
Fig. 2 is a structure diagram of the target tracking apparatus provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the computer device provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the SiamFC model.
Specific embodiments
To better understand the objects, features and advantages of the present invention, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, in the absence of conflict, the embodiments of the application and the features in the embodiments can be combined with each other.
In the following description, numerous specific details are set forth in order to facilitate a full understanding of the present invention. The described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the invention. The terms used herein in the specification of the present invention are intended merely to describe specific embodiments and are not intended to limit the present invention.
Preferably, the target tracking method of the invention is applied in one or more computer devices. A computer device is an apparatus capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance; its hardware includes but is not limited to a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, etc.
The computer device can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The computer device can carry out human-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel, a voice control device or the like.
Embodiment one
Fig. 1 is a flow chart of the target tracking method provided by embodiment one of the present invention. The target tracking method is applied to a computer device.
The target tracking method of the present invention tracks moving objects of a specific type (such as pedestrians) in a video or image sequence to obtain the position of each moving object in every frame image. The target tracking method overcomes the shortcoming that existing schemes cannot track fast-moving targets, and improves the robustness of target tracking.
As shown in Fig. 1, the target tracking method includes:
Step 101: detect the predefined type targets in the current image with the target detector to obtain the first target boxes in the current image.
The predefined type targets may include pedestrians, cars, aircraft, ships, etc. The predefined type may cover a single type of target (such as pedestrians) or multiple types of targets (such as pedestrians and cars).
The target detector can be a neural network model with classification and regression functions. In the present embodiment, the target detector can be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
The Faster RCNN model comprises a Region Proposal Network (RPN) and a Fast Region-based Convolutional Neural Network (Fast RCNN).
The region proposal network and the Fast RCNN share convolutional layers, which are used to extract the feature maps of an image. The region proposal network generates candidate boxes of the image according to the feature maps and inputs the generated candidate boxes to the Fast RCNN. The Fast RCNN screens and adjusts the candidate boxes according to the feature maps to obtain the target boxes of the image.
Before the first target boxes of the predefined type targets are detected in the current image by the target detector, the target detector is trained with a first training sample set. During training, the convolutional layers extract the feature map of each sample image in the first training sample set, the region proposal network obtains the candidate boxes in each sample image according to the feature maps, and the Fast RCNN screens and adjusts the candidate boxes according to the feature maps to obtain the target boxes of each sample image. The target boxes may include target boxes of different types of targets (such as pedestrians, cars, aircraft, ships, etc.).
In a preferred embodiment, the Faster RCNN model uses the ZF architecture, and the region proposal network and the Fast RCNN share 5 convolutional layers. The ZF architecture is a commonly used network structure proposed by Matthew D Zeiler and Rob Fergus in the 2013 paper "Visualizing and Understanding Convolutional Networks"; it is a variant of the AlexNet network. ZF fine-tunes AlexNet, uses the ReLU activation function and the cross-entropy cost function, and retains more of the original pixel information by using smaller convolution kernels.
In one embodiment, the Faster RCNN model can be trained with the first training sample set according to the following steps:
(1) initialize the region proposal network with an Imagenet model, and train the region proposal network with the first training sample set;
(2) use the region proposal network trained in (1) to generate the candidate boxes of each sample image in the first training sample set, and train the Fast RCNN with the candidate boxes; at this point, the region proposal network and the Fast RCNN do not yet share convolutional layers;
(3) initialize the region proposal network with the Fast RCNN trained in (2), and train the region proposal network with the first training sample set;
(4) initialize the Fast RCNN with the region proposal network trained in (3), keep the shared convolutional layers fixed, and train the Fast RCNN with the first training sample set. At this point, the region proposal network and the Fast RCNN share the same convolutional layers and constitute a unified network model.
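The four-step alternating schedule can be expressed as a small orchestration sketch. The callables `train_rpn`, `gen_proposals` and `train_fast_rcnn` are hypothetical placeholders for framework-specific training routines; freezing the shared layers already in step (3) follows the original Faster R-CNN recipe and is an assumption of this sketch:

```python
def alternating_training(train_rpn, gen_proposals, train_fast_rcnn):
    # Step 1: RPN initialized from an ImageNet-pretrained model, trained alone.
    rpn = train_rpn(init="imagenet", freeze_shared_conv=False)
    # Step 2: proposals from the step-1 RPN train a separate Fast R-CNN
    # (no convolutional layers are shared yet).
    fast = train_fast_rcnn(proposals=gen_proposals(rpn), freeze_shared_conv=False)
    # Step 3: RPN re-initialized from the trained Fast R-CNN, so the two
    # networks now share convolutional features; only RPN-specific layers train.
    rpn = train_rpn(init=fast, freeze_shared_conv=True)
    # Step 4: Fast R-CNN re-initialized from the step-3 RPN with the shared
    # convolutional layers kept fixed, yielding one unified network.
    fast = train_fast_rcnn(proposals=gen_proposals(rpn), freeze_shared_conv=True)
    return rpn, fast
```

The value of writing the schedule out is that the hand-off between the two sub-networks, and which steps freeze the shared layers, becomes explicit.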
The region proposal network generates many candidate boxes; several candidate boxes with the highest target classification scores can be screened out and input to the Fast RCNN, so as to speed up training and detection.
The back-propagation algorithm can be used to train the region proposal network. During training, the network parameters of the region proposal network are adjusted to minimize a loss function. The loss function indicates the difference between the predicted confidence of the candidate boxes predicted by the region proposal network and the true confidence. The loss function may include two parts: a target classification loss and a regression loss.
The loss function can be defined as:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*)
where i is the index of a candidate box in a training batch (mini-batch), and L_cls(p_i, p_i*) is the target classification loss of the candidate box. N_cls is the size of the training batch, such as 256. p_i is the predicted probability that the i-th candidate box is a target. p_i* is the GT label: if the candidate box is positive (the assigned label is a positive label, called a positive candidate box), p_i* is 1; if the candidate box is negative (the assigned label is a negative label, called a negative candidate box), p_i* is 0. The classification loss may be calculated as L_cls(p_i, p_i*) = -[p_i* log p_i + (1 - p_i*) log(1 - p_i)].
L_reg(t_i, t_i*) is the regression loss of the candidate box. λ is a balance weight, which can be taken as 10. N_reg is the number of candidate boxes. The regression loss may be calculated as L_reg(t_i, t_i*) = R(t_i - t_i*). t_i is a coordinate vector, i.e. t_i = (t_x, t_y, t_w, t_h), representing the 4 parameterized coordinates of the candidate box (such as the upper-left corner coordinates, the width and the height). t_i* = (t_x*, t_y*, t_w*, t_h*) is the coordinate vector of the GT bounding box corresponding to a positive candidate box (such as the upper-left corner coordinates, the width and the height of the real target box). R is the robust loss function (smooth L1), defined as:
smooth_L1(x) = 0.5 x² if |x| < 1, and |x| - 0.5 otherwise.
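Under these definitions the two-part loss can be computed directly. The following sketch (the helper names are ours, not from the patent) mirrors the formulas term by term, with the regression term gated by the label p_i* so only positive candidate boxes contribute:

```python
import math

def smooth_l1(x):
    # Robust smooth-L1 loss: quadratic near zero, linear in the tails.
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_loss(p, p_star, t, t_star, n_cls=256, lam=10.0):
    # p: predicted object probabilities p_i; p_star: 0/1 GT labels p_i*
    # t, t_star: 4-d parameterized box coordinates t_i and t_i*
    cls = sum(-(ps * math.log(pi) + (1 - ps) * math.log(1 - pi))
              for pi, ps in zip(p, p_star)) / n_cls
    n_reg = len(t)
    # regression term counts only positive candidate boxes (p_i* == 1)
    reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
              for ps, ti, ts in zip(p_star, t, t_star)) / n_reg
    return cls + lam * reg
```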
The training method of the Fast RCNN can refer to the training method of the region proposal network and is not repeated here.
In the present embodiment, a hard negative mining (Hard Negative Mining, HNM) method is added to the training of the Fast RCNN. For negative samples that are wrongly classified as positive samples by the Fast RCNN (i.e. hard examples), the information of these negative samples is recorded; during the next training iteration, these negative samples are input to the first training sample set again, and the weight of their loss is increased to enhance their influence on the classifier. This guarantees that the classifier keeps being trained on harder and harder negative samples, so that the features it learns progress from easy to hard and the covered sample distribution becomes more diverse.
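A minimal sketch of this mining loop, assuming a hypothetical `classify` callable and a per-sample loss-weight table (both our own constructs for illustration):

```python
def hard_negative_mining(classify, negatives, weights, boost=2.0):
    # classify(sample) -> predicted label (1 = positive); every sample in
    # `negatives` has true label 0, so a prediction of 1 is a hard example.
    hard = [s for s in negatives if classify(s) == 1]
    for s in hard:
        # record the hard example and boost its loss weight for the next
        # training iteration, increasing its influence on the classifier
        weights[s] = weights.get(s, 1.0) * boost
    return hard, weights
```

Repeating this between iterations keeps feeding the classifier the negatives it currently finds hardest.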
In other embodiments, the target detector can also be another neural network model, such as a region-based convolutional neural network (RCNN) model or a Fast RCNN model.
When the predefined type targets in an image are detected by the target detector, the image is input to the target detector; the target detector detects the predefined type targets in the image and outputs the positions of the first target boxes of the predefined type targets in the image. For example, the target detector outputs 6 first target boxes in the image. A first target box is presented in the form of a rectangle. The position of a first target box can be represented by position coordinates, which may include the upper-left corner coordinates (x, y) and the width and height (w, h).
The target detector can also output the type of each first target box; for example, it outputs 5 first target boxes of the pedestrian type and 1 first target box of the car type.
Step 102: obtain the second target boxes in the previous frame image of the current image, and predict the positions of the second target boxes in the current image with the predictor to obtain the prediction boxes of the second target boxes in the current image.
The second target boxes in the previous frame image are the target boxes obtained by detecting the predefined type targets in the previous frame image with the target detector.
Predicting the positions of the second target boxes in the current image to obtain the prediction boxes of the second target boxes in the current image means predicting the position of each second target box in the current image and obtaining the prediction box of each second target box in the current image. For example, if 4 pedestrian target boxes are detected in the previous frame image of the current image, the positions of the 4 pedestrian target boxes in the current image are predicted (namely, the positions in the current image of the 4 pedestrians corresponding to the 4 pedestrian target boxes are predicted) to obtain the prediction boxes of the 4 pedestrian target boxes in the current image.
The predictor can be a deep neural network model.
Before the second target boxes are predicted by the predictor, the predictor is trained with a second training sample set. The features learned by the predictor are deep features, in which color features account for a small proportion, so the influence of illumination is limited. Therefore, the predictor can overcome the influence of illumination to a certain extent, improving the robustness and scene adaptability of target tracking. In the present embodiment, the second training sample set may include a large number of sample images with different illumination, deformation and fast-moving objects. Therefore, the predictor can further overcome the influence of illumination and can, to a certain extent, overcome the influence of deformation and fast motion, so that the present invention can track fast-moving targets and improves the robustness of target tracking.
In the present embodiment, a feature pyramid network (Feature Pyramid Network, FPN) can be built in the deep neural network model, and the positions of the second target boxes in the current image are predicted with the deep neural network model built with the feature pyramid network. The feature pyramid network connects the high-level features of low resolution and high semantic information with the low-level features of high resolution and low semantic information in a top-down manner, so that the features at all scales have rich semantic information. The connection method of the feature pyramid network is to upsample the higher-level features by a factor of 2 and then combine them with the corresponding previous-layer features (the previous layer passes through a 1*1 convolution kernel); the combination is an element-wise addition of pixels. Through such connections, each layer of the prediction feature maps fuses features of different resolutions and different semantic strengths, and object detection at the corresponding resolution is performed on each fused feature map. This ensures that each layer has features of suitable resolution and strong semantics. Building a feature pyramid network in the deep neural network model can improve the performance of second target box prediction, so that second target boxes that have deformed can still be well predicted.
In one embodiment, the predictor can be a SiamFC (Fully-Convolutional Siamese Network) model, such as a SiamFC model built with a feature pyramid network.
Fig. 4 is a schematic diagram of the SiamFC model. In Fig. 4, z represents the template image, i.e. a second target box in the previous frame image; x represents the search region, i.e. the current image; φ represents a feature mapping operation that maps the original images to a specific feature space, and can be implemented with the convolutional layers and pooling layers of a CNN; 6*6*128 represents the feature obtained after z passes through φ, a feature of 6*6 size with 128 channels; similarly, 22*22*128 is the feature of x after φ; * represents the convolution operation: the 22*22*128 feature is convolved with the 6*6*128 feature as the convolution kernel to obtain a 17*17 score map, which represents the similarity between each position in the search region and the template image. The position in the search region with the highest similarity to the template image is the position of the prediction box.
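The cross-correlation that produces the score map can be sketched as follows. The explicit loop is for clarity only, and the toy dimensions in the test differ from the 6*6*128 / 22*22*128 sizes of Fig. 4; the function itself is size-generic:

```python
import numpy as np

def score_map(phi_z, phi_x):
    # phi_z: template feature (C, kh, kw), e.g. (128, 6, 6)
    # phi_x: search-region feature (C, H, W), e.g. (128, 22, 22)
    # Sliding the template over the search feature and summing over channels
    # yields an (H-kh+1) x (W-kw+1) score map, 17x17 for the Fig. 4 sizes.
    c, kh, kw = phi_z.shape
    _, H, W = phi_x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(phi_x[:, i:i + kh, j:j + kw] * phi_z)
    return out
```

The argmax of the score map gives the predicted position: `np.unravel_index(np.argmax(s), s.shape)`.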
Step 103: match the first target boxes in the current image against the prediction boxes to obtain the matching results between the first target boxes and the prediction boxes.
The matching result of a first target box and a prediction box may be: the first target box matches the prediction box; the first target box matches no prediction box; or the prediction box matches no first target box.
In the present embodiment, the overlap area ratio (Intersection over Union, IOU) between each first target box and each prediction box can be calculated, and every matched pair of first target box and prediction box is determined according to the overlap area ratios.
For example, the first target boxes include first target boxes A1, A2, A3 and A4, and the prediction boxes include prediction boxes P1, P2, P3 and P4. Prediction box P1 corresponds to second target box B1, prediction box P2 corresponds to second target box B2, prediction box P3 corresponds to second target box B3, and prediction box P4 corresponds to second target box B4. For first target box A1, the overlap area ratios of first target box A1 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A1 and prediction box P1 is the largest and is greater than or equal to a preset threshold (such as 70%), it is determined that first target box A1 matches prediction box P1. Similarly, for first target box A2, the overlap area ratios of first target box A2 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A2 and prediction box P2 is the largest and is greater than or equal to the preset threshold (such as 70%), it is determined that first target box A2 matches prediction box P2. For first target box A3, the overlap area ratios of first target box A3 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A3 and prediction box P3 is the largest and is greater than or equal to the preset threshold (such as 70%), it is determined that first target box A3 matches prediction box P3. For first target box A4, the overlap area ratios of first target box A4 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A4 and prediction box P4 is the largest and is greater than or equal to the preset threshold (such as 70%), it is determined that first target box A4 matches prediction box P4.
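The overlap-area matching walked through above can be sketched as follows. The greedy per-detection argmax and the (x, y, w, h) box format (upper-left corner plus width and height, as described in step 101) are illustrative choices:

```python
def iou(a, b):
    # boxes as (x, y, w, h) with (x, y) the upper-left corner
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def match_by_iou(dets, preds, thresh=0.7):
    # For each first target box, pick the prediction box with the largest
    # overlap area ratio; accept the pair only if it reaches the threshold.
    matches = []
    for i, d in enumerate(dets):
        best = max(range(len(preds)), key=lambda j: iou(d, preds[j]))
        if iou(d, preds[best]) >= thresh:
            matches.append((i, best))
    return matches
```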
Alternatively, the distance between the central point of the first object frame and the central point of the prediction block can be calculated, and each pair of matched first object frame and prediction block is determined according to the distance.
For example, in an example in which the first object frames include first object frame A1, first object frame A2, first object frame A3 and first object frame A4, and the prediction blocks include prediction block P1, prediction block P2, prediction block P3 and prediction block P4, for first object frame A1, the distance between the central point of first object frame A1 and the central point of each of prediction block P1, prediction block P2, prediction block P3 and prediction block P4 is calculated; if the distance between the central points of first object frame A1 and prediction block P1 is the smallest and is less than or equal to a preset distance (such as 10 pixels), it is determined that first object frame A1 matches prediction block P1. Similarly, if the distance between the central points of first object frame A2 and prediction block P2 is the smallest and is less than or equal to the preset distance, first object frame A2 matches prediction block P2; if the distance between the central points of first object frame A3 and prediction block P3 is the smallest and is less than or equal to the preset distance, first object frame A3 matches prediction block P3; and if the distance between the central points of first object frame A4 and prediction block P4 is the smallest and is less than or equal to the preset distance, first object frame A4 matches prediction block P4.
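The alternative matching by central-point distance can likewise be sketched, under the same (x, y, w, h) box convention; the function names are hypothetical:

```python
import math

def center(box):
    """Central point of a box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def match_by_center_distance(target_boxes, pred_boxes, max_dist=10.0):
    """For each first object frame, match the prediction block whose central
    point is nearest, provided the distance is at most `max_dist` pixels."""
    matches = {}
    if not pred_boxes:
        return matches
    centers = [center(p) for p in pred_boxes]
    for i, t in enumerate(target_boxes):
        cx, cy = center(t)
        dists = [math.hypot(cx - px, cy - py) for (px, py) in centers]
        best = min(range(len(pred_boxes)), key=lambda j: dists[j])
        if dists[best] <= max_dist:
            matches[i] = best
    return matches
```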
Step 104: according to the matching result of the first object frame and the prediction block, update the position of the target in the present image.
According to the matching result of the first object frame and the prediction block, updating the position of the target in the present image may include:
if the first object frame matches the prediction block, taking the position of the first object frame in the present image as the updated position of the target corresponding to the prediction block;
if the first object frame does not match any prediction block, taking the position of the first object frame in the present image as the position of a new target;
if the prediction block does not match any first object frame, marking the target corresponding to the prediction block in the present image as a lost target.
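The three update rules of step 104 can be sketched as follows. This is an illustrative sketch: the returned structures (a dict of updated positions, a list of new-target boxes, a list of lost prediction indices) are an assumption, as the patent only specifies the three cases:

```python
def update_tracks(target_boxes, pred_boxes, matches):
    """Apply the three update rules of step 104.

    matches maps target-box index -> prediction-block index.
    Matched pair      -> detection position replaces the predicted position.
    Unmatched target  -> new target.
    Unmatched block   -> lost target (returned by prediction index)."""
    updated, new_targets, lost = {}, [], []
    matched_preds = set(matches.values())
    for i, box in enumerate(target_boxes):
        if i in matches:
            updated[matches[i]] = box
        else:
            new_targets.append(box)
    for j in range(len(pred_boxes)):
        if j not in matched_preds:
            lost.append(j)
    return updated, new_targets, lost
```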
The method for tracking a target of embodiment one detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image; obtains the second target frame in the previous frame image of the present image and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image; matches the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block; and updates the position of the target in the present image according to the matching result of the first object frame and the prediction block. Embodiment one improves the robustness and scene adaptability of target tracking.
Embodiment two
Fig. 2 is the structure chart of the target tracker provided by Embodiment 2 of the present invention. The target tracker 20 is applied in a computer installation. The apparatus tracks a specific type of moving object (such as a pedestrian) in a video or image sequence and obtains the position of the moving object in each frame image. The target tracker 20 can improve the robustness and scene adaptability of target tracking. As shown in Fig. 2, the target tracker 20 may include a detection module 201, a prediction module 202, a matching module 203 and an update module 204.
Detection module 201 is configured to detect the predefined type target in the present image using the object detector to obtain the first object frame in the present image.
The predefined type target may include a pedestrian, an automobile, an aircraft, a ship, etc. The predefined type target can be one type of target (such as pedestrians) or a plurality of types of targets (such as pedestrians and automobiles).
The object detector can be a neural network model with classification and regression functions. In the present embodiment, the object detector can be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
The Faster RCNN model includes a Region Proposal Network (RPN) and a Fast Region-based Convolutional Neural Network (Fast RCNN). The region proposal network and the fast region convolutional neural network share convolutional layers, which are used to extract the feature map of an image. The region proposal network generates candidate frames of the image according to the feature map and inputs the generated candidate frames into the fast region convolutional neural network. The fast region convolutional neural network screens and adjusts the candidate frames according to the feature map to obtain the target frames of the image.
Before the object detector is used to detect the predefined type target in the present image to obtain the first object frame, the object detector is trained using a first training sample set. During training, the convolutional layers extract the feature map of each sample image in the first training sample set; the region proposal network obtains the candidate frames in each sample image according to the feature map; and the fast region convolutional neural network screens and adjusts the candidate frames according to the feature map to obtain the target frames of each sample image. The target frames may include target frames of different types of targets (such as pedestrians, automobiles, aircraft, ships, etc.).
In a preferred embodiment, the Faster RCNN model uses the ZF framework, and the region proposal network and the fast region convolutional neural network share 5 convolutional layers.
In one embodiment, the Faster RCNN model can be trained using the first training sample set according to the following steps:
(1) initialize the region proposal network using an Imagenet model, and train the region proposal network using the first training sample set;
(2) use the region proposal network trained in (1) to generate the candidate frames of each sample image in the first training sample set, and train the fast region convolutional neural network using the candidate frames. At this point, the region proposal network and the fast region convolutional neural network do not yet share convolutional layers;
(3) initialize the region proposal network using the fast region convolutional neural network trained in (2), and train the region proposal network using the first training sample set;
(4) initialize the fast region convolutional neural network using the region proposal network trained in (3), keep the convolutional layers fixed, and train the fast region convolutional neural network using the first training sample set. At this point, the region proposal network and the fast region convolutional neural network share the same convolutional layers and constitute a unified network model.
The region proposal network produces a large number of candidate frames; a number of the highest-scoring candidate frames can be screened according to their target classification scores and input into the fast region convolutional neural network, to accelerate training and detection.
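The screening of the highest-scoring candidate frames can be sketched as follows (illustrative only; the function name and the value of `n` are assumptions):

```python
def top_n_proposals(boxes, scores, n=300):
    """Keep the n candidate frames with the highest target classification
    score before they are fed to the fast region convolutional network."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return [boxes[i] for i in order]
```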
A back-propagation algorithm can be used to train the region proposal network; during training, the network parameters of the region proposal network are adjusted to minimize a loss function. The loss function indicates the difference between the predicted confidence of the candidate frames predicted by the region proposal network and the true confidence. The loss function may include two parts: a target classification loss and a regression loss.
$$L(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*)+\lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i,t_i^*)$$

Wherein, i is the index of a candidate frame in a training batch (mini-batch). $L_{cls}(p_i,p_i^*)$ is the target classification loss of the candidate frame, and $N_{cls}$ is the size of the training batch, such as 256. $p_i$ is the predicted probability that the i-th candidate frame is a target. $p_i^*$ is the GT label: if the candidate frame is positive (the label assigned is a positive label, referred to as a positive candidate frame), $p_i^*$ is 1; if the candidate frame is negative (the label assigned is a negative label, referred to as a negative candidate frame), $p_i^*$ is 0. The classification loss may be calculated as $L_{cls}(p_i,p_i^*)=-\left[p_i^*\log p_i+(1-p_i^*)\log(1-p_i)\right]$. $L_{reg}(t_i,t_i^*)$ is the regression loss of the candidate frame. λ is a balance weight, which can be taken as 10. $N_{reg}$ is the number of candidate frames. The regression loss may be calculated as $L_{reg}(t_i,t_i^*)=R(t_i-t_i^*)$. $t_i$ is a coordinate vector, i.e. $t_i=(t_x,t_y,t_w,t_h)$, indicating the 4 parameterized coordinates of the candidate frame (such as the coordinates of the upper left corner and the width and height). $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate frame, i.e. $t_i^*=(t_x^*,t_y^*,t_w^*,t_h^*)$ (such as the coordinates of the upper left corner, the width and the height of the real target frame). R is the robust loss function (smooth L1), defined as:

$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^2 & \text{if } |x|<1\\ |x|-0.5 & \text{otherwise}\end{cases}$$
The training method of the fast region convolutional network can refer to the training method of the region proposal network and is not repeated here.
In the present embodiment, a Hard Negative Mining (HNM) method is added in the training of the fast region convolutional network. For negative samples that are wrongly classified as positive samples by the fast region convolutional network (i.e. hard examples), the information of these negative samples is recorded; during the next training iteration, these negative samples are input into the first training sample set again, and the weight of their loss is increased to enhance their influence on the classifier. This ensures that the classifier keeps learning to classify ever harder negative samples, so that the features learned by the classifier progress from easy to hard and the covered sample distribution is more diverse.
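One pass of the hard negative mining described above might look like the following sketch; the sample representation, the `classify` callback and the weight boost factor are all assumptions for illustration:

```python
def mine_hard_negatives(samples, classify, base_weight=1.0, boost=2.0):
    """Collect negatives the current classifier wrongly calls positive and
    increase their loss weight for the next training iteration."""
    hard = []
    for sample in samples:
        if sample["label"] == 0 and classify(sample) == 1:  # misclassified negative
            sample["loss_weight"] = sample.get("loss_weight", base_weight) * boost
            hard.append(sample)
    return hard
```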
In other examples, the object detector can also be another neural network model, such as a region convolutional neural network (RCNN) model or a fast region convolutional neural network (Fast RCNN) model.
When the object detector is used to detect the predefined type target in an image, the image is input into the object detector; the object detector detects the predefined type target in the image and outputs the position of the first object frame of the predefined type target in the image. For example, the object detector outputs 6 first object frames in the image. A first object frame is presented in the form of a rectangular frame. The position of a first object frame can be indicated with position coordinates, which may include the top-left corner coordinates (x, y) and the frame width and height (w, h).
The object detector can also output the type of each first object frame, for example, outputting 5 first object frames of the pedestrian type and 1 first object frame of the car type.
Prediction module 202 is configured to obtain the second target frame in the previous frame image of the present image, and predict the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image.
The second target frame in the previous frame image is the target frame obtained by detecting the predefined type target in the previous frame image using the object detector.
Predicting the position of the second target frame in the present image to obtain the prediction block of the second target frame in the present image means predicting the position of each second target frame in the present image to obtain the prediction block of each second target frame in the present image. For example, if 4 pedestrian target frames are detected in the previous frame image of the present image, the positions of the 4 pedestrian target frames in the present image are predicted (that is, the positions in the present image of the 4 pedestrians corresponding to the 4 pedestrian target frames are predicted) to obtain the prediction blocks of the 4 pedestrian target frames in the present image.
The predictor can be a deep neural network model.
Before the predictor is used to predict the second target frame, the predictor is trained using a second training sample set. The features learned by the predictor are depth features, in which color features account for a small proportion, so the influence of illumination is limited. Therefore, the predictor can overcome the influence of illumination to a certain extent, improving the robustness and scene adaptability of target tracking. In the present embodiment, the second training sample set may include a large number of sample images of objects under different illumination, with deformation and in high-speed motion. Therefore, the predictor can further overcome the influence of illumination, and can overcome the influence of deformation and high-speed motion to a certain extent, so that the present invention realizes tracking of high-speed moving targets and improves the robustness of target tracking.
In the present embodiment, a Feature Pyramid Network (FPN) can be constructed in the deep neural network model, and the position of the second target frame in the present image is predicted using the deep neural network model in which the feature pyramid network is constructed. The feature pyramid network connects high-level features of low resolution and high semantic information with low-level features of high resolution and low semantic information through top-down lateral connections, so that the features at all scales have rich semantic information. The connection method of the feature pyramid network is to upsample the high-level feature by a factor of 2 and then combine it with the corresponding previous-layer feature (the previous layer passes through a 1*1 convolution kernel), the combination being a pixel-wise addition. Through such connections, the prediction feature map used at each layer fuses features of different resolutions and different semantic strengths, and object detection at the corresponding resolution is performed on the fused feature map of each resolution. This ensures that each layer has features of suitable resolution and strong semantics. Constructing a feature pyramid network in the deep neural network model can improve the performance of second target frame prediction, so that a second target frame in which deformation occurs can still be predicted well.
In one embodiment, the predictor can be a SiamFC (Fully-Convolutional Siamese Network) model, such as a SiamFC network model in which a feature pyramid network is constructed. Fig. 4 is a schematic diagram of the SiamFC model.
In Fig. 4, z represents the template image, i.e. the second target frame in the previous frame image; x represents the search region, i.e. the present image; φ represents a feature mapping operation that maps the original image to a specific feature space, which can use the convolutional layers and pooling layers in a CNN; 6*6*128 represents the feature obtained from z after φ, a feature of 128 channels and size 6*6; similarly, 22*22*128 is the feature of x after φ; * represents the convolution operation: the 22*22*128 feature is convolved with the 6*6*128 convolution kernel to obtain a 17*17 score map, which represents the similarity of each position in the search region to the template image. The position in the search region with the highest similarity to the template image is the position of the prediction block.
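The * operation of Fig. 4 is a "valid" cross-correlation; a single-channel sketch is enough to verify the 22*22 to 17*17 size relation (SiamFC additionally sums this over all 128 channels, which is omitted here):

```python
def cross_correlate(search, template):
    """'Valid' cross-correlation of a template feature map over a search
    feature map, both single-channel lists of rows. A 6x6 template on a
    22x22 search map yields a (22-6+1) x (22-6+1) = 17x17 score map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    out_h, out_w = sh - th + 1, sw - tw + 1
    return [[sum(search[i + u][j + v] * template[u][v]
                 for u in range(th) for v in range(tw))
             for j in range(out_w)]
            for i in range(out_h)]
```

The argmax of the resulting score map gives the predicted position, matching the description above.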
Matching module 203 is configured to match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block.
The matching result of the first object frame and the prediction block may include: the first object frame matches the prediction block; the first object frame does not match any prediction block; or the prediction block does not match any first object frame.
In the present embodiment, the overlapping area ratio (Intersection over Union, IOU) of the first object frame and the prediction block can be calculated, and each pair of matched first object frame and prediction block is determined according to the overlapping area ratio.
For example, the first object frames include first object frame A1, first object frame A2, first object frame A3 and first object frame A4, and the prediction blocks include prediction block P1, prediction block P2, prediction block P3 and prediction block P4. Prediction block P1 corresponds to second target frame B1, prediction block P2 corresponds to second target frame B2, prediction block P3 corresponds to second target frame B3, and prediction block P4 corresponds to second target frame B4. For first object frame A1, the overlapping area ratio of first object frame A1 with each of prediction block P1, prediction block P2, prediction block P3 and prediction block P4 is calculated; if the overlapping area ratio of first object frame A1 and prediction block P1 is the largest and is greater than or equal to a preset threshold (such as 70%), it is determined that first object frame A1 matches prediction block P1. Similarly, if the overlapping area ratio of first object frame A2 and prediction block P2 is the largest and is greater than or equal to the preset threshold, first object frame A2 matches prediction block P2; if the overlapping area ratio of first object frame A3 and prediction block P3 is the largest and is greater than or equal to the preset threshold, first object frame A3 matches prediction block P3; and if the overlapping area ratio of first object frame A4 and prediction block P4 is the largest and is greater than or equal to the preset threshold, first object frame A4 matches prediction block P4.
Alternatively, the distance between the central point of the first object frame and the central point of the prediction block can be calculated, and each pair of matched first object frame and prediction block is determined according to the distance.
For example, in an example in which the first object frames include first object frame A1, first object frame A2, first object frame A3 and first object frame A4, and the prediction blocks include prediction block P1, prediction block P2, prediction block P3 and prediction block P4, for first object frame A1, the distance between the central point of first object frame A1 and the central point of each of prediction block P1, prediction block P2, prediction block P3 and prediction block P4 is calculated; if the distance between the central points of first object frame A1 and prediction block P1 is the smallest and is less than or equal to a preset distance (such as 10 pixels), it is determined that first object frame A1 matches prediction block P1. Similarly, if the distance between the central points of first object frame A2 and prediction block P2 is the smallest and is less than or equal to the preset distance, first object frame A2 matches prediction block P2; if the distance between the central points of first object frame A3 and prediction block P3 is the smallest and is less than or equal to the preset distance, first object frame A3 matches prediction block P3; and if the distance between the central points of first object frame A4 and prediction block P4 is the smallest and is less than or equal to the preset distance, first object frame A4 matches prediction block P4.
Update module 204 is configured to update the position of the target in the present image according to the matching result of the first object frame and the prediction block.
According to the matching result of the first object frame and the prediction block, updating the position of the target in the present image may include:
if the first object frame matches the prediction block, taking the position of the first object frame in the present image as the updated position of the target corresponding to the prediction block;
if the first object frame does not match any prediction block, taking the position of the first object frame in the present image as the position of a new target;
if the prediction block does not match any first object frame, marking the target corresponding to the prediction block in the present image as a lost target.
The present embodiment provides a target tracker 20. The target tracker tracks a specific type of moving object (such as a pedestrian) in a video or image sequence and obtains the position of the moving object in each frame image. The target tracker 20 detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image; obtains the second target frame in the previous frame image of the present image, and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image; matches the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block; and updates the position of the target in the present image according to the matching result of the first object frame and the prediction block. The present embodiment improves the robustness and scene adaptability of target tracking.
Embodiment three
The present embodiment provides a computer storage medium in which a computer program is stored. When the computer program is executed by a processor, the steps in the above method for tracking a target embodiment are realized, such as steps 101-104 shown in Fig. 1:
Step 101: detect the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Step 102: obtain the second target frame in the previous frame image of the present image, and predict the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Step 103: match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Step 104: according to the matching result of the first object frame and the prediction block, update the position of the target in the present image.
Alternatively, when the computer program is executed by the processor, the functions of the modules in the above apparatus embodiment are realized, such as modules 201-204 in Fig. 2:
Detection module 201 detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Prediction module 202 obtains the second target frame in the previous frame image of the present image, and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Matching module 203 is configured to match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Update module 204 is configured to update the position of the target in the present image according to the matching result of the first object frame and the prediction block.
Example IV
Fig. 3 is a schematic diagram of the computer installation provided by embodiment four of the present invention. The computer installation 30 includes a memory 301, a processor 302, and a computer program 303 stored in the memory 301 and executable on the processor 302, such as a target tracking program. When executing the computer program 303, the processor 302 realizes the steps in the above method for tracking a target embodiment, such as steps 101-104 shown in Fig. 1:
Step 101: detect the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Step 102: obtain the second target frame in the previous frame image of the present image, and predict the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Step 103: match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Step 104: according to the matching result of the first object frame and the prediction block, update the position of the target in the present image.
Alternatively, when the computer program is executed by the processor, the functions of the modules in the above apparatus embodiment are realized, such as modules 201-204 in Fig. 2:
Detection module 201 detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Prediction module 202 obtains the second target frame in the previous frame image of the present image, and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Matching module 203 is configured to match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Update module 204 is configured to update the position of the target in the present image according to the matching result of the first object frame and the prediction block.
Illustratively, the computer program 303 can be divided into one or more modules, which are stored in the memory 301 and executed by the processor 302 to complete the method. The one or more modules can be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 303 in the computer installation 30. For example, the computer program 303 can be divided into the detection module 201, prediction module 202, matching module 203 and update module 204 in Fig. 2; for the specific functions of each module, refer to embodiment two.
The computer installation 30 can be computing equipment such as a desktop computer, a notebook, a palm computer or a cloud server. Those skilled in the art will understand that the schematic diagram of Fig. 3 is only an example of the computer installation 30 and does not constitute a limitation on the computer installation 30, which may include more or fewer components than illustrated, combine certain components, or have different components; for example, the computer installation 30 can also include input-output equipment, network access equipment, a bus, etc.
The processor 302 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor, or the processor 302 can be any conventional processor; the processor 302 is the control center of the computer installation 30 and connects the various parts of the entire computer installation 30 using various interfaces and lines.
The memory 301 can be used for storing the computer program 303. The processor 302 realizes the various functions of the computer installation 30 by running or executing the computer program or modules stored in the memory 301 and calling the data stored in the memory 301. The memory 301 can mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program needed for at least one function (such as a sound-playing function, an image-playing function, etc.), and the data storage area can store data created according to the use of the computer installation 30 (such as audio data, a phone directory, etc.). In addition, the memory 301 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one disk memory, a flash memory device or other volatile solid-state memory component.
If the integrated modules of the computer device 30 are implemented in the form of software function modules and are sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be completed by instructing the relevant hardware through a computer program. The computer program may be stored in a computer storage medium, and when executed by a processor, the computer program may implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, a computer-readable medium does not include electric carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the modules is only a logical function division, and there may be other division manners in actual implementation.
The modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules; they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The above integrated module may be implemented in the form of hardware, or in the form of hardware plus software function modules.
The above integrated module implemented in the form of a software function module may be stored in a computer-readable storage medium. The above software function module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute part of the steps of the methods of each embodiment of the present invention.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential attributes. Therefore, from any point of view, the embodiments should be regarded as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims are included in the present invention. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. Multiple modules or devices stated in the system claims may also be implemented by one module or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent replacements can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A target tracking method, characterized in that the method comprises:
detecting targets of a predefined type in a current image using a target detector, to obtain first target frames in the current image;
obtaining second target frames in a previous frame image of the current image, and predicting positions of the second target frames in the current image using a predictor, to obtain prediction frames of the second target frames in the current image;
matching the first target frames in the current image with the prediction frames, to obtain matching results of the first target frames and the prediction frames;
updating positions of targets in the current image according to the matching results of the first target frames and the prediction frames.
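The four claimed steps form one per-frame loop: detect, predict, match, update. Purely as an illustration (not the patent's own code), a minimal Python sketch of that loop, with the detector, predictor, and matcher passed in as hypothetical callables, could look like:

```python
def track_frame(detector, predictor, matcher, tracks, current_image):
    """One iteration of the claimed tracking loop (illustrative sketch).

    tracks: dict mapping a target id to its last known frame (box).
    detector, predictor, matcher: hypothetical callables standing in
    for the trained detector, the predictor, and the matching step.
    """
    # Step 1: first target frames detected in the current image.
    detections = detector(current_image)
    # Step 2: prediction frames - where each previous-frame box should now lie.
    predictions = {tid: predictor(box, current_image)
                   for tid, box in tracks.items()}
    # Step 3: match detections against predictions.
    matches, new_detections, lost_ids = matcher(detections, predictions)
    # Step 4: update target positions from the matching result.
    for tid, det in matches:
        tracks[tid] = det                         # matched: adopt detected position
    for det in new_detections:
        tracks[max(tracks, default=0) + 1] = det  # unmatched detection: new target
    for tid in lost_ids:
        tracks.pop(tid)                           # unmatched prediction: lost target
    return tracks
```

Claims 6 and 7 constrain what the matching step computes and how the three update cases are handled.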
2. The method according to claim 1, characterized in that the target detector is an accelerated region convolutional neural network model, the accelerated region convolutional neural network model comprises a region proposal network and a fast region convolutional neural network, and the accelerated region convolutional neural network model is trained through the following steps before detecting the targets of the predefined type in the image:
a first training step: initializing the region proposal network with an Imagenet model, and training the region proposal network with a first training sample set;
a second training step: generating candidate frames for each sample image in the first training sample set using the region proposal network trained in the first training step, and training the fast region convolutional neural network with the candidate frames;
a third training step: initializing the region proposal network with the fast region convolutional neural network trained in the second training step, and training the region proposal network with the first training sample set;
a fourth training step: initializing the fast region convolutional neural network with the region proposal network trained in the third training step, keeping the convolutional layers fixed, and training the fast region convolutional neural network with the first training sample set.
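This four-step schedule mirrors the well-known alternating training of Faster R-CNN, in which a region proposal network and a Fast R-CNN detector are trained in turns. A schematic sketch follows, with all model-construction and optimization routines passed in as hypothetical callables, since the patent gives no training code:

```python
def alternating_training(imagenet_weights, samples,
                         init_rpn, train_rpn, init_frcnn, train_frcnn):
    """Four-step alternating training schedule (schematic sketch).

    The init_*/train_* arguments are hypothetical stand-ins; train_rpn
    returns a trained RPN that maps a sample image to candidate frames.
    """
    # Step 1: initialize the RPN from ImageNet weights; train on the sample set.
    rpn = train_rpn(init_rpn(imagenet_weights), samples)
    # Step 2: candidate frames from the trained RPN train the Fast R-CNN.
    candidates = [rpn(img) for img in samples]
    frcnn = train_frcnn(init_frcnn(imagenet_weights), samples, candidates)
    # Step 3: re-initialize the RPN from the trained Fast R-CNN; retrain it.
    rpn = train_rpn(init_rpn(frcnn), samples)
    # Step 4: shared convolutional layers kept fixed; fine-tune the Fast R-CNN.
    candidates = [rpn(img) for img in samples]
    frcnn = train_frcnn(init_frcnn(rpn), samples, candidates,
                        freeze_shared=True)
    return rpn, frcnn
```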
3. The method according to claim 2, characterized in that the accelerated region convolutional neural network model uses the ZF framework, and the region proposal network and the fast region convolutional neural network share 5 convolutional layers.
4. The method according to claim 1, characterized in that the predictor is a deep neural network model built with a feature pyramid network.
5. The method according to claim 1, characterized in that before the predicting positions of the second target frames in the current image using a predictor, to obtain the prediction frames of the second target frames in the current image, the method further comprises:
training the predictor with a second training sample set, wherein the second training sample set comprises sample images with different illumination, deformation, and fast-moving targets.
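Such a training set is commonly built by augmenting sample images. Purely as an illustration (the patent does not prescribe how the second training sample set is constructed), an illumination change and fast motion can be simulated on a grayscale image stored as nested lists:

```python
def scale_brightness(image, factor):
    """Simulate an illumination change by scaling intensities, clipped to 0..255."""
    return [[min(255, max(0, round(px * factor))) for px in row] for row in image]

def shift_right(image, dx):
    """Simulate fast horizontal motion by shifting columns, zero-padding the gap."""
    width = len(image[0])
    return [[row[c - dx] if 0 <= c - dx < width else 0 for c in range(width)]
            for row in image]
```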
6. The method according to claim 1, characterized in that the matching the first target frames in the current image with the prediction frames comprises:
calculating an overlap area ratio of the first target frames and the prediction frames, and determining each pair of matched first target frame and prediction frame according to the overlap area ratio; or
calculating a distance between center points of the first target frames and the prediction frames, and determining each pair of matched first target frame and prediction frame according to the distance.
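The two matching criteria in this claim can be computed directly. A generic sketch follows (the `(x, y, w, h)` box format is an assumption, not specified by the claim):

```python
import math

def iou(box_a, box_b):
    """Overlap area ratio (intersection over union) of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_distance(box_a, box_b):
    """Euclidean distance between the center points of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))
```

Pairs whose overlap ratio exceeds a threshold (or whose center distance falls below one) would then be declared matched.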
7. The method according to claim 1, characterized in that the updating positions of targets in the current image according to the matching results of the first target frames and the prediction frames comprises:
if a first target frame matches a prediction frame, taking the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
if a first target frame does not match any prediction frame, taking the position of the first target frame in the current image as the position of a new target;
if a prediction frame does not match any first target frame, taking the target corresponding to the prediction frame in the current image as a lost target.
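The three update rules can be applied mechanically once the matched pairs are known. A minimal sketch, assuming targets are keyed by an id and boxes are opaque values (neither representation is fixed by the claim):

```python
def apply_update_rules(predictions, detections, matched_pairs):
    """Apply the three update rules of this claim (illustrative sketch).

    predictions: dict target_id -> prediction frame in the current image.
    detections: list of first target frames from the detector.
    matched_pairs: list of (target_id, detection_index) matches.
    Returns (updated_positions, new_targets, lost_target_ids).
    """
    matched_ids = {tid for tid, _ in matched_pairs}
    matched_dets = {di for _, di in matched_pairs}
    # Rule 1: matched -> the detected position becomes the updated position.
    updated = {tid: detections[di] for tid, di in matched_pairs}
    # Rule 2: a detection matching no prediction -> position of a new target.
    new_targets = [det for di, det in enumerate(detections)
                   if di not in matched_dets]
    # Rule 3: a prediction matching no detection -> its target is lost.
    lost_ids = [tid for tid in predictions if tid not in matched_ids]
    return updated, new_targets, lost_ids
```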
8. A target tracking device, characterized in that the device comprises:
a detection module, configured to detect targets of a predefined type in a current image using a target detector, to obtain first target frames in the current image;
a prediction module, configured to obtain second target frames in a previous frame image of the current image, and predict positions of the second target frames in the current image using a predictor, to obtain prediction frames of the second target frames in the current image;
a matching module, configured to match the first target frames in the current image with the prediction frames, to obtain matching results of the first target frames and the prediction frames;
an update module, configured to update positions of targets in the current image according to the matching results of the first target frames and the prediction frames.
9. A computer device, characterized in that: the computer device comprises a processor, and the processor is configured to execute a computer program stored in a memory to implement the target tracking method according to any one of claims 1-7.
10. A computer storage medium storing a computer program, characterized in that: the target tracking method according to any one of claims 1-7 is implemented when the computer program is executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910064675.5A CN109903310A (en) | 2019-01-23 | 2019-01-23 | Method for tracking target, device, computer installation and computer storage medium |
PCT/CN2019/091160 WO2020151167A1 (en) | 2019-01-23 | 2019-06-13 | Target tracking method and device, computer device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910064675.5A CN109903310A (en) | 2019-01-23 | 2019-01-23 | Method for tracking target, device, computer installation and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109903310A true CN109903310A (en) | 2019-06-18 |
Family
ID=66944120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910064675.5A Pending CN109903310A (en) | 2019-01-23 | 2019-01-23 | Method for tracking target, device, computer installation and computer storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109903310A (en) |
WO (1) | WO2020151167A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks |
CN107516303A (en) * | 2017-09-01 | 2017-12-26 | 成都通甲优博科技有限责任公司 | Multi-object tracking method and system |
CN108875588A (en) * | 2018-05-25 | 2018-11-23 | 武汉大学 | Across camera pedestrian detection tracking based on deep learning |
CN109117794A (en) * | 2018-08-16 | 2019-01-01 | 广东工业大学 | A kind of moving target behavior tracking method, apparatus, equipment and readable storage medium storing program for executing |
2019
- 2019-01-23 CN CN201910064675.5A patent/CN109903310A/en active Pending
- 2019-06-13 WO PCT/CN2019/091160 patent/WO2020151167A1/en active Application Filing
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110605724A (en) * | 2019-07-01 | 2019-12-24 | 青岛联合创智科技有限公司 | Intelligence endowment robot that accompanies |
CN110490902A (en) * | 2019-08-02 | 2019-11-22 | 西安天和防务技术股份有限公司 | Method for tracking target, device, computer equipment applied to smart city |
CN110490902B (en) * | 2019-08-02 | 2022-06-14 | 西安天和防务技术股份有限公司 | Target tracking method and device applied to smart city and computer equipment |
CN110443210B (en) * | 2019-08-08 | 2021-11-26 | 北京百度网讯科技有限公司 | Pedestrian tracking method and device and terminal |
CN110443210A (en) * | 2019-08-08 | 2019-11-12 | 北京百度网讯科技有限公司 | A kind of pedestrian tracting method, device and terminal |
CN112446229B (en) * | 2019-08-27 | 2024-07-16 | 北京地平线机器人技术研发有限公司 | Pixel coordinate acquisition method and device for marker link |
CN112446229A (en) * | 2019-08-27 | 2021-03-05 | 北京地平线机器人技术研发有限公司 | Method and device for acquiring pixel coordinates of marker post |
CN110517292A (en) * | 2019-08-29 | 2019-11-29 | 京东方科技集团股份有限公司 | Method for tracking target, device, system and computer readable storage medium |
US11455735B2 (en) | 2019-08-29 | 2022-09-27 | Beijing Boe Technology Development Co., Ltd. | Target tracking method, device, system and non-transitory computer readable storage medium |
CN110738125A (en) * | 2019-09-19 | 2020-01-31 | 平安科技(深圳)有限公司 | Method, device and storage medium for selecting detection frame by using Mask R-CNN |
CN110738125B (en) * | 2019-09-19 | 2023-08-01 | 平安科技(深圳)有限公司 | Method, device and storage medium for selecting detection frame by Mask R-CNN |
WO2021051601A1 (en) * | 2019-09-19 | 2021-03-25 | 平安科技(深圳)有限公司 | Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium |
CN110738687A (en) * | 2019-10-18 | 2020-01-31 | 上海眼控科技股份有限公司 | Object tracking method, device, equipment and storage medium |
CN112749590B (en) * | 2019-10-30 | 2023-02-07 | 上海高德威智能交通***有限公司 | Object detection method, device, computer equipment and computer readable storage medium |
CN112749590A (en) * | 2019-10-30 | 2021-05-04 | 上海高德威智能交通***有限公司 | Object detection method, device, computer equipment and computer readable storage medium |
CN110838125B (en) * | 2019-11-08 | 2024-03-19 | 腾讯医疗健康(深圳)有限公司 | Target detection method, device, equipment and storage medium for medical image |
CN110838125A (en) * | 2019-11-08 | 2020-02-25 | 腾讯医疗健康(深圳)有限公司 | Target detection method, device, equipment and storage medium of medical image |
CN111199182A (en) * | 2019-11-12 | 2020-05-26 | 恒大智慧科技有限公司 | Lost object method, system and storage medium based on intelligent community |
WO2021142571A1 (en) * | 2020-01-13 | 2021-07-22 | 深圳大学 | Twin dual-path target tracking method |
CN111709975A (en) * | 2020-06-22 | 2020-09-25 | 上海高德威智能交通***有限公司 | Multi-target tracking method and device, electronic equipment and storage medium |
CN111709975B (en) * | 2020-06-22 | 2023-11-03 | 上海高德威智能交通***有限公司 | Multi-target tracking method, device, electronic equipment and storage medium |
CN111754541B (en) * | 2020-07-29 | 2023-09-19 | 腾讯科技(深圳)有限公司 | Target tracking method, device, equipment and readable storage medium |
CN111754541A (en) * | 2020-07-29 | 2020-10-09 | 腾讯科技(深圳)有限公司 | Target tracking method, device, equipment and readable storage medium |
CN112150505A (en) * | 2020-09-11 | 2020-12-29 | 浙江大华技术股份有限公司 | Target object tracker updating method and device, storage medium and electronic device |
CN112184770A (en) * | 2020-09-28 | 2021-01-05 | 中国电子科技集团公司第五十四研究所 | Target tracking method based on YOLOv3 and improved KCF |
CN112308045A (en) * | 2020-11-30 | 2021-02-02 | 深圳集智数字科技有限公司 | Detection method and device for dense crowd and electronic equipment |
CN112308045B (en) * | 2020-11-30 | 2023-11-24 | 深圳集智数字科技有限公司 | Method and device for detecting dense crowd and electronic equipment |
CN113034541A (en) * | 2021-02-26 | 2021-06-25 | 北京国双科技有限公司 | Target tracking method and device, computer equipment and storage medium |
CN113112866A (en) * | 2021-04-14 | 2021-07-13 | 深圳市旗扬特种装备技术工程有限公司 | Intelligent traffic early warning method and intelligent traffic early warning system |
CN113673541A (en) * | 2021-10-21 | 2021-11-19 | 广州微林软件有限公司 | Image sample generation method for target detection and application |
CN115457036A (en) * | 2022-11-10 | 2022-12-09 | 中国平安财产保险股份有限公司 | Detection model training method, intelligent counting method and related equipment |
CN117315028B (en) * | 2023-10-12 | 2024-04-30 | 北京多维视通技术有限公司 | Method, device, equipment and medium for positioning fire point of outdoor fire scene |
CN117315028A (en) * | 2023-10-12 | 2023-12-29 | 北京多维视通技术有限公司 | Method, device, equipment and medium for positioning fire point of outdoor fire scene |
Also Published As
Publication number | Publication date |
---|---|
WO2020151167A1 (en) | 2020-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109903310A (en) | Method for tracking target, device, computer installation and computer storage medium | |
CN109886998A (en) | Multi-object tracking method, device, computer installation and computer storage medium | |
CN108121986B (en) | Object detection method and device, computer device and computer readable storage medium | |
Zhang et al. | C2FDA: Coarse-to-fine domain adaptation for traffic object detection | |
Lee et al. | Optimal hyperparameter tuning of convolutional neural networks based on the parameter-setting-free harmony search algorithm | |
Yi et al. | ASSD: Attentive single shot multibox detector | |
CN112052787B (en) | Target detection method and device based on artificial intelligence and electronic equipment | |
CN109584248A (en) | Infrared surface object instance dividing method based on Fusion Features and dense connection network | |
CN106845430A (en) | Pedestrian detection and tracking based on acceleration region convolutional neural networks | |
CN107851191A (en) | The priori based on context for the object detection in image | |
CN110334705A (en) | A kind of Language Identification of the scene text image of the global and local information of combination | |
CN106204522A (en) | The combined depth of single image is estimated and semantic tagger | |
KR102462934B1 (en) | Video analysis system for digital twin technology | |
CN110533695A (en) | A kind of trajectory predictions device and method based on DS evidence theory | |
CN111126459A (en) | Method and device for identifying fine granularity of vehicle | |
WO2019108250A1 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
CN115512251A (en) | Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement | |
CN109766822A (en) | Gesture identification method neural network based and system | |
Pei et al. | Localized traffic sign detection with multi-scale deconvolution networks | |
CN110196917A (en) | Personalized LOGO format method for customizing, system and storage medium | |
Li et al. | Gated auxiliary edge detection task for road extraction with weight-balanced loss | |
Tao et al. | An adaptive frame selection network with enhanced dilated convolution for video smoke recognition | |
Wang et al. | Detection and tracking based tubelet generation for video object detection | |
Deng et al. | Deep learning in crowd counting: A survey | |
Vaishali | Real-time object detection system using caffe model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||