CN109903310A - Target tracking method, apparatus, computer device and computer storage medium - Google Patents
Target tracking method, apparatus, computer device and computer storage medium
- Publication number
- CN109903310A CN109903310A CN201910064675.5A CN201910064675A CN109903310A CN 109903310 A CN109903310 A CN 109903310A CN 201910064675 A CN201910064675 A CN 201910064675A CN 109903310 A CN109903310 A CN 109903310A
- Authority
- CN
- China
- Prior art keywords
- frame
- target
- prediction box
- current image
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
Abstract
The invention belongs to the technical field of image processing and provides a target tracking method, an apparatus, a computer device and a storage medium. The target tracking method includes: detecting targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image; obtaining second target boxes in the previous frame image of the current image, and predicting the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image; matching the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes; and updating the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes. The invention improves the robustness and scene adaptability of target tracking.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to a target tracking method, an apparatus, a computer device and a computer storage medium.
Background technique
Target tracking refers to tracking moving objects (such as cars and pedestrians in traffic video) in a video or image sequence to obtain the position of each moving object in every frame. Target tracking is widely used in fields such as video surveillance, autonomous driving and video entertainment.
Current target tracking mainly follows the track-by-detection framework: the position of each target is detected by a detector on every frame of the video or image sequence, and the target positions in the current frame are then matched against the target positions in the previous frame. However, the robustness of existing target tracking schemes is low; for example, when the illumination changes, tracking performance degrades.
Summary of the invention
In view of the foregoing, it is necessary to propose a target tracking method, an apparatus, a computer device and a computer storage medium that can improve the robustness and scene adaptability of target tracking.
The first aspect of the application provides a target tracking method, the method comprising:
detecting targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image;
obtaining second target boxes in the previous frame image of the current image, and predicting the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image;
matching the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes;
updating the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes.
In another possible implementation, the target detector is a Faster region-based convolutional neural network (Faster R-CNN) model. The Faster R-CNN model comprises a region proposal network and a Fast R-CNN, and performs the following training steps before detecting the predefined type targets in the image:
a first training step: initializing the region proposal network with an Imagenet model, and training the region proposal network with a first training sample set;
a second training step: generating candidate boxes of each sample image in the first training sample set with the region proposal network trained in the first training step, and training the Fast R-CNN with the candidate boxes;
a third training step: initializing the region proposal network with the Fast R-CNN trained in the second training step, and training the region proposal network with the first training sample set;
a fourth training step: initializing the Fast R-CNN with the region proposal network trained in the third training step, keeping the shared convolutional layers fixed, and training the Fast R-CNN with the first training sample set.
In another possible implementation, the Faster R-CNN model uses the ZF architecture, and the region proposal network and the Fast R-CNN share 5 convolutional layers.
In another possible implementation, the predictor is a deep neural network model built with a feature pyramid network.
In another possible implementation, before the positions of the second target boxes in the current image are predicted with the predictor to obtain the prediction boxes of the second target boxes in the current image, the method further comprises:
training the predictor with a second training sample set, the second training sample set comprising sample images with different illumination, deformation and fast-moving objects.
In another possible implementation, matching the first target boxes in the current image against the prediction boxes comprises:
computing the overlap area ratio between each first target box and each prediction box, and determining every matched pair of first target box and prediction box according to the overlap area ratios; or
computing the distance between the center points of each first target box and each prediction box, and determining every matched pair of first target box and prediction box according to the distances.
In another possible implementation, updating the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes comprises:
if a first target box matches a prediction box, taking the position of the first target box in the current image as the updated position of the target corresponding to the prediction box;
if a first target box matches no prediction box, taking the position of the first target box in the current image as the position of a new target;
if a prediction box matches no first target box, marking the target corresponding to the prediction box as a lost target.
The second aspect of the application provides a target tracking apparatus, the apparatus comprising:
a detection module, configured to detect targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image;
a prediction module, configured to obtain second target boxes in the previous frame image of the current image, and to predict the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image;
a matching module, configured to match the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes;
an update module, configured to update the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes.
The third aspect of the application provides a computer device, the computer device comprising a processor, the processor being configured to implement the target tracking method when executing a computer program stored in a memory.
The fourth aspect of the application provides a computer storage medium on which a computer program is stored, the computer program implementing the target tracking method when executed by a processor.
The present invention detects targets of a predefined type in a current image with a target detector to obtain first target boxes in the current image; obtains second target boxes in the previous frame image of the current image, and predicts the positions of the second target boxes in the current image with a predictor to obtain prediction boxes of the second target boxes in the current image; matches the first target boxes in the current image against the prediction boxes to obtain matching results between the first target boxes and the prediction boxes; and updates the positions of the targets in the current image according to the matching results between the first target boxes and the prediction boxes. The present invention improves the robustness and scene adaptability of target tracking.
Brief description of the drawings
Fig. 1 is a flow chart of the target tracking method provided by an embodiment of the present invention.
Fig. 2 is a structure diagram of the target tracking apparatus provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the computer device provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the SiamFC model.
Specific embodiments
To better understand the objects, features and advantages of the present invention, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, in the absence of conflict, the embodiments of the application and the features in the embodiments can be combined with each other.
In the following description, numerous specific details are set forth in order to facilitate a full understanding of the present invention. The described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the invention. The terms used herein in the specification of the present invention are intended merely to describe specific embodiments and are not intended to limit the present invention.
Preferably, the target tracking method of the invention is applied in one or more computer devices. A computer device is an apparatus capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance; its hardware includes but is not limited to a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, etc.
The computer device can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The computer device can carry out human-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel, a voice control device or the like.
Embodiment one
Fig. 1 is a flow chart of the target tracking method provided by embodiment one of the present invention. The target tracking method is applied to a computer device.
The target tracking method of the present invention tracks moving objects of a specific type (such as pedestrians) in a video or image sequence to obtain the position of each moving object in every frame image. The target tracking method overcomes the shortcoming that existing schemes cannot track fast-moving targets, and improves the robustness of target tracking.
As shown in Fig. 1, the target tracking method includes:
Step 101: detect the predefined type targets in the current image with the target detector to obtain the first target boxes in the current image.
The predefined type targets may include pedestrians, cars, aircraft, ships, etc. The predefined type may cover a single type of target (such as pedestrians) or multiple types of targets (such as pedestrians and cars).
The target detector can be a neural network model with classification and regression functions. In the present embodiment, the target detector can be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
The Faster RCNN model comprises a Region Proposal Network (RPN) and a Fast Region-based Convolutional Neural Network (Fast RCNN).
The region proposal network and the Fast RCNN share convolutional layers, which are used to extract the feature maps of an image. The region proposal network generates candidate boxes of the image according to the feature maps and inputs the generated candidate boxes to the Fast RCNN. The Fast RCNN screens and adjusts the candidate boxes according to the feature maps to obtain the target boxes of the image.
Before the first target boxes of the predefined type targets are detected in the current image by the target detector, the target detector is trained with a first training sample set. During training, the convolutional layers extract the feature map of each sample image in the first training sample set, the region proposal network obtains the candidate boxes in each sample image according to the feature maps, and the Fast RCNN screens and adjusts the candidate boxes according to the feature maps to obtain the target boxes of each sample image. The target boxes may include target boxes of different types of targets (such as pedestrians, cars, aircraft, ships, etc.).
In a preferred embodiment, the Faster RCNN model uses the ZF architecture, and the region proposal network and the Fast RCNN share 5 convolutional layers. The ZF architecture is a commonly used network structure proposed by Matthew D Zeiler and Rob Fergus in the 2013 paper "Visualizing and Understanding Convolutional Networks"; it is a variant of the AlexNet network. ZF fine-tunes AlexNet, uses the ReLU activation function and the cross-entropy cost function, and retains more of the original pixel information by using smaller convolution kernels.
In one embodiment, the Faster RCNN model can be trained with the first training sample set according to the following steps:
(1) initialize the region proposal network with an Imagenet model, and train the region proposal network with the first training sample set;
(2) use the region proposal network trained in (1) to generate the candidate boxes of each sample image in the first training sample set, and train the Fast RCNN with the candidate boxes; at this point, the region proposal network and the Fast RCNN do not yet share convolutional layers;
(3) initialize the region proposal network with the Fast RCNN trained in (2), and train the region proposal network with the first training sample set;
(4) initialize the Fast RCNN with the region proposal network trained in (3), keep the shared convolutional layers fixed, and train the Fast RCNN with the first training sample set. At this point, the region proposal network and the Fast RCNN share the same convolutional layers and constitute a unified network model.
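The four-step alternating schedule can be expressed as a small orchestration sketch. The callables `train_rpn`, `gen_proposals` and `train_fast_rcnn` are hypothetical placeholders for framework-specific training routines; freezing the shared layers already in step (3) follows the original Faster R-CNN recipe and is an assumption of this sketch:

```python
def alternating_training(train_rpn, gen_proposals, train_fast_rcnn):
    # Step 1: RPN initialized from an ImageNet-pretrained model, trained alone.
    rpn = train_rpn(init="imagenet", freeze_shared_conv=False)
    # Step 2: proposals from the step-1 RPN train a separate Fast R-CNN
    # (no convolutional layers are shared yet).
    fast = train_fast_rcnn(proposals=gen_proposals(rpn), freeze_shared_conv=False)
    # Step 3: RPN re-initialized from the trained Fast R-CNN, so the two
    # networks now share convolutional features; only RPN-specific layers train.
    rpn = train_rpn(init=fast, freeze_shared_conv=True)
    # Step 4: Fast R-CNN re-initialized from the step-3 RPN with the shared
    # convolutional layers kept fixed, yielding one unified network.
    fast = train_fast_rcnn(proposals=gen_proposals(rpn), freeze_shared_conv=True)
    return rpn, fast
```

The value of writing the schedule out is that the hand-off between the two sub-networks, and which steps freeze the shared layers, becomes explicit.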
The region proposal network generates many candidate boxes; several candidate boxes with the highest target classification scores can be screened out and input to the Fast RCNN, so as to speed up training and detection.
The back-propagation algorithm can be used to train the region proposal network. During training, the network parameters of the region proposal network are adjusted to minimize a loss function. The loss function indicates the difference between the predicted confidence of the candidate boxes predicted by the region proposal network and the true confidence. The loss function may include two parts: a target classification loss and a regression loss.
The loss function can be defined as:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* · L_reg(t_i, t_i*)
where i is the index of a candidate box in a training batch (mini-batch), and L_cls(p_i, p_i*) is the target classification loss of the candidate box. N_cls is the size of the training batch, such as 256. p_i is the predicted probability that the i-th candidate box is a target. p_i* is the GT label: if the candidate box is positive (the assigned label is a positive label, called a positive candidate box), p_i* is 1; if the candidate box is negative (the assigned label is a negative label, called a negative candidate box), p_i* is 0. The classification loss may be calculated as L_cls(p_i, p_i*) = -[p_i* log p_i + (1 - p_i*) log(1 - p_i)].
L_reg(t_i, t_i*) is the regression loss of the candidate box. λ is a balance weight, which can be taken as 10. N_reg is the number of candidate boxes. The regression loss may be calculated as L_reg(t_i, t_i*) = R(t_i - t_i*). t_i is a coordinate vector, i.e. t_i = (t_x, t_y, t_w, t_h), representing the 4 parameterized coordinates of the candidate box (such as the upper-left corner coordinates, the width and the height). t_i* = (t_x*, t_y*, t_w*, t_h*) is the coordinate vector of the GT bounding box corresponding to a positive candidate box (such as the upper-left corner coordinates, the width and the height of the real target box). R is the robust loss function (smooth L1), defined as:
smooth_L1(x) = 0.5 x² if |x| < 1, and |x| - 0.5 otherwise.
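Under these definitions the two-part loss can be computed directly. The following sketch (the helper names are ours, not from the patent) mirrors the formulas term by term, with the regression term gated by the label p_i* so only positive candidate boxes contribute:

```python
import math

def smooth_l1(x):
    # Robust smooth-L1 loss: quadratic near zero, linear in the tails.
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rpn_loss(p, p_star, t, t_star, n_cls=256, lam=10.0):
    # p: predicted object probabilities p_i; p_star: 0/1 GT labels p_i*
    # t, t_star: 4-d parameterized box coordinates t_i and t_i*
    cls = sum(-(ps * math.log(pi) + (1 - ps) * math.log(1 - pi))
              for pi, ps in zip(p, p_star)) / n_cls
    n_reg = len(t)
    # regression term counts only positive candidate boxes (p_i* == 1)
    reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
              for ps, ti, ts in zip(p_star, t, t_star)) / n_reg
    return cls + lam * reg
```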
The training method of the Fast RCNN can refer to the training method of the region proposal network and is not repeated here.
In the present embodiment, a hard negative mining (Hard Negative Mining, HNM) method is added to the training of the Fast RCNN. For negative samples that are wrongly classified as positive samples by the Fast RCNN (i.e. hard examples), the information of these negative samples is recorded; during the next training iteration, these negative samples are input to the first training sample set again, and the weight of their loss is increased to enhance their influence on the classifier. This guarantees that the classifier keeps being trained on harder and harder negative samples, so that the features it learns progress from easy to hard and the covered sample distribution becomes more diverse.
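A minimal sketch of this mining loop, assuming a hypothetical `classify` callable and a per-sample loss-weight table (both our own constructs for illustration):

```python
def hard_negative_mining(classify, negatives, weights, boost=2.0):
    # classify(sample) -> predicted label (1 = positive); every sample in
    # `negatives` has true label 0, so a prediction of 1 is a hard example.
    hard = [s for s in negatives if classify(s) == 1]
    for s in hard:
        # record the hard example and boost its loss weight for the next
        # training iteration, increasing its influence on the classifier
        weights[s] = weights.get(s, 1.0) * boost
    return hard, weights
```

Repeating this between iterations keeps feeding the classifier the negatives it currently finds hardest.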
In other embodiments, the target detector can also be another neural network model, such as a region-based convolutional neural network (RCNN) model or a Fast RCNN model.
When the predefined type targets in an image are detected by the target detector, the image is input to the target detector; the target detector detects the predefined type targets in the image and outputs the positions of the first target boxes of the predefined type targets in the image. For example, the target detector outputs 6 first target boxes in the image. A first target box is presented in the form of a rectangle. The position of a first target box can be represented by position coordinates, which may include the upper-left corner coordinates (x, y) and the width and height (w, h).
The target detector can also output the type of each first target box; for example, it outputs 5 first target boxes of the pedestrian type and 1 first target box of the car type.
Step 102: obtain the second target boxes in the previous frame image of the current image, and predict the positions of the second target boxes in the current image with the predictor to obtain the prediction boxes of the second target boxes in the current image.
The second target boxes in the previous frame image are the target boxes obtained by detecting the predefined type targets in the previous frame image with the target detector.
Predicting the positions of the second target boxes in the current image to obtain the prediction boxes of the second target boxes in the current image means predicting the position of each second target box in the current image and obtaining the prediction box of each second target box in the current image. For example, if 4 pedestrian target boxes are detected in the previous frame image of the current image, the positions of the 4 pedestrian target boxes in the current image are predicted (namely, the positions in the current image of the 4 pedestrians corresponding to the 4 pedestrian target boxes are predicted) to obtain the prediction boxes of the 4 pedestrian target boxes in the current image.
The predictor can be a deep neural network model.
Before the second target boxes are predicted by the predictor, the predictor is trained with a second training sample set. The features learned by the predictor are deep features, in which color features account for a small proportion, so the influence of illumination is limited. Therefore, the predictor can overcome the influence of illumination to a certain extent, improving the robustness and scene adaptability of target tracking. In the present embodiment, the second training sample set may include a large number of sample images with different illumination, deformation and fast-moving objects. Therefore, the predictor can further overcome the influence of illumination and can, to a certain extent, overcome the influence of deformation and fast motion, so that the present invention can track fast-moving targets and improves the robustness of target tracking.
In the present embodiment, a feature pyramid network (Feature Pyramid Network, FPN) can be built in the deep neural network model, and the positions of the second target boxes in the current image are predicted with the deep neural network model built with the feature pyramid network. The feature pyramid network connects the high-level features of low resolution and high semantic information with the low-level features of high resolution and low semantic information in a top-down manner, so that the features at all scales have rich semantic information. The connection method of the feature pyramid network is to upsample the higher-level features by a factor of 2 and then combine them with the corresponding previous-layer features (the previous layer passes through a 1*1 convolution kernel); the combination is an element-wise addition of pixels. Through such connections, each layer of the prediction feature maps fuses features of different resolutions and different semantic strengths, and object detection at the corresponding resolution is performed on each fused feature map. This ensures that each layer has features of suitable resolution and strong semantics. Building a feature pyramid network in the deep neural network model can improve the performance of second target box prediction, so that second target boxes that have deformed can still be well predicted.
In one embodiment, the predictor can be a SiamFC (Fully-Convolutional Siamese Network) model, such as a SiamFC model built with a feature pyramid network.
Fig. 4 is a schematic diagram of the SiamFC model. In Fig. 4, z represents the template image, i.e. a second target box in the previous frame image; x represents the search region, i.e. the current image; φ represents a feature mapping operation that maps the original images to a specific feature space, and can be implemented with the convolutional layers and pooling layers of a CNN; 6*6*128 represents the feature obtained after z passes through φ, a feature of 6*6 size with 128 channels; similarly, 22*22*128 is the feature of x after φ; * represents the convolution operation: the 22*22*128 feature is convolved with the 6*6*128 feature as the convolution kernel to obtain a 17*17 score map, which represents the similarity between each position in the search region and the template image. The position in the search region with the highest similarity to the template image is the position of the prediction box.
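The cross-correlation that produces the score map can be sketched as follows. The explicit loop is for clarity only, and the toy dimensions in the test differ from the 6*6*128 / 22*22*128 sizes of Fig. 4; the function itself is size-generic:

```python
import numpy as np

def score_map(phi_z, phi_x):
    # phi_z: template feature (C, kh, kw), e.g. (128, 6, 6)
    # phi_x: search-region feature (C, H, W), e.g. (128, 22, 22)
    # Sliding the template over the search feature and summing over channels
    # yields an (H-kh+1) x (W-kw+1) score map, 17x17 for the Fig. 4 sizes.
    c, kh, kw = phi_z.shape
    _, H, W = phi_x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(phi_x[:, i:i + kh, j:j + kw] * phi_z)
    return out
```

The argmax of the score map gives the predicted position: `np.unravel_index(np.argmax(s), s.shape)`.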
Step 103: match the first target boxes in the current image against the prediction boxes to obtain the matching results between the first target boxes and the prediction boxes.
The matching result of a first target box and a prediction box may be: the first target box matches the prediction box; the first target box matches no prediction box; or the prediction box matches no first target box.
In the present embodiment, the overlap area ratio (Intersection over Union, IOU) between each first target box and each prediction box can be calculated, and every matched pair of first target box and prediction box is determined according to the overlap area ratios.
For example, the first target boxes include first target boxes A1, A2, A3 and A4, and the prediction boxes include prediction boxes P1, P2, P3 and P4. Prediction box P1 corresponds to second target box B1, prediction box P2 corresponds to second target box B2, prediction box P3 corresponds to second target box B3, and prediction box P4 corresponds to second target box B4. For first target box A1, the overlap area ratios of first target box A1 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A1 and prediction box P1 is the largest and is greater than or equal to a preset threshold (such as 70%), it is determined that first target box A1 matches prediction box P1. Similarly, for first target box A2, the overlap area ratios of first target box A2 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A2 and prediction box P2 is the largest and is greater than or equal to the preset threshold (such as 70%), it is determined that first target box A2 matches prediction box P2. For first target box A3, the overlap area ratios of first target box A3 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A3 and prediction box P3 is the largest and is greater than or equal to the preset threshold (such as 70%), it is determined that first target box A3 matches prediction box P3. For first target box A4, the overlap area ratios of first target box A4 with prediction boxes P1, P2, P3 and P4 are calculated; if the overlap area ratio of first target box A4 and prediction box P4 is the largest and is greater than or equal to the preset threshold (such as 70%), it is determined that first target box A4 matches prediction box P4.
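The overlap-area matching walked through above can be sketched as follows. The greedy per-detection argmax and the (x, y, w, h) box format (upper-left corner plus width and height, as described in step 101) are illustrative choices:

```python
def iou(a, b):
    # boxes as (x, y, w, h) with (x, y) the upper-left corner
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def match_by_iou(dets, preds, thresh=0.7):
    # For each first target box, pick the prediction box with the largest
    # overlap area ratio; accept the pair only if it reaches the threshold.
    matches = []
    for i, d in enumerate(dets):
        best = max(range(len(preds)), key=lambda j: iou(d, preds[j]))
        if iou(d, preds[best]) >= thresh:
            matches.append((i, best))
    return matches
```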
Alternatively, the distance between the central point of the first object frame and the central point of the prediction block can be calculated, and each pair of matched first object frame and prediction block is determined according to the distance.
For example, in an example in which the first object frames include first object frame A1, first object frame A2, first object frame A3 and first object frame A4, and the prediction blocks include prediction block P1, prediction block P2, prediction block P3 and prediction block P4, for first object frame A1, the distance between the central point of first object frame A1 and the central point of each of prediction block P1, prediction block P2, prediction block P3 and prediction block P4 is calculated; if the distance between the central points of first object frame A1 and prediction block P1 is the smallest and is less than or equal to a preset distance (such as 10 pixels), it is determined that first object frame A1 matches prediction block P1. Similarly, if the distance between the central points of first object frame A2 and prediction block P2 is the smallest and is less than or equal to the preset distance, first object frame A2 matches prediction block P2; if the distance between the central points of first object frame A3 and prediction block P3 is the smallest and is less than or equal to the preset distance, first object frame A3 matches prediction block P3; and if the distance between the central points of first object frame A4 and prediction block P4 is the smallest and is less than or equal to the preset distance, first object frame A4 matches prediction block P4.
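The alternative matching by central-point distance can likewise be sketched, under the same (x, y, w, h) box convention; the function names are hypothetical:

```python
import math

def center(box):
    """Central point of a box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def match_by_center_distance(target_boxes, pred_boxes, max_dist=10.0):
    """For each first object frame, match the prediction block whose central
    point is nearest, provided the distance is at most `max_dist` pixels."""
    matches = {}
    if not pred_boxes:
        return matches
    centers = [center(p) for p in pred_boxes]
    for i, t in enumerate(target_boxes):
        cx, cy = center(t)
        dists = [math.hypot(cx - px, cy - py) for (px, py) in centers]
        best = min(range(len(pred_boxes)), key=lambda j: dists[j])
        if dists[best] <= max_dist:
            matches[i] = best
    return matches
```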
Step 104: according to the matching result of the first object frame and the prediction block, update the position of the target in the present image.
According to the matching result of the first object frame and the prediction block, updating the position of the target in the present image may include:
if the first object frame matches the prediction block, taking the position of the first object frame in the present image as the updated position of the target corresponding to the prediction block;
if the first object frame does not match any prediction block, taking the position of the first object frame in the present image as the position of a new target;
if the prediction block does not match any first object frame, marking the target corresponding to the prediction block in the present image as a lost target.
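The three update rules of step 104 can be sketched as follows. This is an illustrative sketch: the returned structures (a dict of updated positions, a list of new-target boxes, a list of lost prediction indices) are an assumption, as the patent only specifies the three cases:

```python
def update_tracks(target_boxes, pred_boxes, matches):
    """Apply the three update rules of step 104.

    matches maps target-box index -> prediction-block index.
    Matched pair      -> detection position replaces the predicted position.
    Unmatched target  -> new target.
    Unmatched block   -> lost target (returned by prediction index)."""
    updated, new_targets, lost = {}, [], []
    matched_preds = set(matches.values())
    for i, box in enumerate(target_boxes):
        if i in matches:
            updated[matches[i]] = box
        else:
            new_targets.append(box)
    for j in range(len(pred_boxes)):
        if j not in matched_preds:
            lost.append(j)
    return updated, new_targets, lost
```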
The method for tracking a target of embodiment one detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image; obtains the second target frame in the previous frame image of the present image and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image; matches the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block; and updates the position of the target in the present image according to the matching result of the first object frame and the prediction block. Embodiment one improves the robustness and scene adaptability of target tracking.
Embodiment two
Fig. 2 is the structure chart of the target tracker provided by Embodiment 2 of the present invention. The target tracker 20 is applied in a computer installation. The apparatus tracks a specific type of moving object (such as a pedestrian) in a video or image sequence and obtains the position of the moving object in each frame image. The target tracker 20 can improve the robustness and scene adaptability of target tracking. As shown in Fig. 2, the target tracker 20 may include a detection module 201, a prediction module 202, a matching module 203 and an update module 204.
Detection module 201 is configured to detect the predefined type target in the present image using the object detector to obtain the first object frame in the present image.
The predefined type target may include a pedestrian, an automobile, an aircraft, a ship, etc. The predefined type target can be one type of target (such as pedestrians) or a plurality of types of targets (such as pedestrians and automobiles).
The object detector can be a neural network model with classification and regression functions. In the present embodiment, the object detector can be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
The Faster RCNN model includes a Region Proposal Network (RPN) and a Fast Region-based Convolutional Neural Network (Fast RCNN). The region proposal network and the fast region convolutional neural network share convolutional layers, which are used to extract the feature map of an image. The region proposal network generates candidate frames of the image according to the feature map and inputs the generated candidate frames into the fast region convolutional neural network. The fast region convolutional neural network screens and adjusts the candidate frames according to the feature map to obtain the target frames of the image.
Before the object detector is used to detect the predefined type target in the present image to obtain the first object frame, the object detector is trained using a first training sample set. During training, the convolutional layers extract the feature map of each sample image in the first training sample set; the region proposal network obtains the candidate frames in each sample image according to the feature map; and the fast region convolutional neural network screens and adjusts the candidate frames according to the feature map to obtain the target frames of each sample image. The target frames may include target frames of different types of targets (such as pedestrians, automobiles, aircraft, ships, etc.).
In a preferred embodiment, the Faster RCNN model uses the ZF framework, and the region proposal network and the fast region convolutional neural network share 5 convolutional layers.
In one embodiment, the Faster RCNN model can be trained using the first training sample set according to the following steps:
(1) initialize the region proposal network using an Imagenet model, and train the region proposal network using the first training sample set;
(2) use the region proposal network trained in (1) to generate the candidate frames of each sample image in the first training sample set, and train the fast region convolutional neural network using the candidate frames. At this point, the region proposal network and the fast region convolutional neural network do not yet share convolutional layers;
(3) initialize the region proposal network using the fast region convolutional neural network trained in (2), and train the region proposal network using the first training sample set;
(4) initialize the fast region convolutional neural network using the region proposal network trained in (3), keep the convolutional layers fixed, and train the fast region convolutional neural network using the first training sample set. At this point, the region proposal network and the fast region convolutional neural network share the same convolutional layers and constitute a unified network model.
The region proposal network produces a large number of candidate frames; a number of the highest-scoring candidate frames can be screened according to their target classification scores and input into the fast region convolutional neural network, to accelerate training and detection.
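The screening of the highest-scoring candidate frames can be sketched as follows (illustrative only; the function name and the value of `n` are assumptions):

```python
def top_n_proposals(boxes, scores, n=300):
    """Keep the n candidate frames with the highest target classification
    score before they are fed to the fast region convolutional network."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return [boxes[i] for i in order]
```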
A back-propagation algorithm can be used to train the region proposal network; during training, the network parameters of the region proposal network are adjusted to minimize a loss function. The loss function indicates the difference between the predicted confidence of the candidate frames predicted by the region proposal network and the true confidence. The loss function may include two parts: a target classification loss and a regression loss.
$$L(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*)+\lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i,t_i^*)$$

Wherein, i is the index of a candidate frame in a training batch (mini-batch). $L_{cls}(p_i,p_i^*)$ is the target classification loss of the candidate frame, and $N_{cls}$ is the size of the training batch, such as 256. $p_i$ is the predicted probability that the i-th candidate frame is a target. $p_i^*$ is the GT label: if the candidate frame is positive (the label assigned is a positive label, referred to as a positive candidate frame), $p_i^*$ is 1; if the candidate frame is negative (the label assigned is a negative label, referred to as a negative candidate frame), $p_i^*$ is 0. The classification loss may be calculated as $L_{cls}(p_i,p_i^*)=-\left[p_i^*\log p_i+(1-p_i^*)\log(1-p_i)\right]$. $L_{reg}(t_i,t_i^*)$ is the regression loss of the candidate frame. λ is a balance weight, which can be taken as 10. $N_{reg}$ is the number of candidate frames. The regression loss may be calculated as $L_{reg}(t_i,t_i^*)=R(t_i-t_i^*)$. $t_i$ is a coordinate vector, i.e. $t_i=(t_x,t_y,t_w,t_h)$, indicating the 4 parameterized coordinates of the candidate frame (such as the coordinates of the upper left corner and the width and height). $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate frame, i.e. $t_i^*=(t_x^*,t_y^*,t_w^*,t_h^*)$ (such as the coordinates of the upper left corner, the width and the height of the real target frame). R is the robust loss function (smooth L1), defined as:

$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^2 & \text{if } |x|<1\\ |x|-0.5 & \text{otherwise}\end{cases}$$
The training method of the fast region convolutional network can refer to the training method of the region proposal network and is not repeated here.
In the present embodiment, a Hard Negative Mining (HNM) method is added in the training of the fast region convolutional network. For negative samples that are wrongly classified as positive samples by the fast region convolutional network (i.e. hard examples), the information of these negative samples is recorded; during the next training iteration, these negative samples are input into the first training sample set again, and the weight of their loss is increased to enhance their influence on the classifier. This ensures that the classifier keeps learning to classify ever harder negative samples, so that the features learned by the classifier progress from easy to hard and the covered sample distribution is more diverse.
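One pass of the hard negative mining described above might look like the following sketch; the sample representation, the `classify` callback and the weight boost factor are all assumptions for illustration:

```python
def mine_hard_negatives(samples, classify, base_weight=1.0, boost=2.0):
    """Collect negatives the current classifier wrongly calls positive and
    increase their loss weight for the next training iteration."""
    hard = []
    for sample in samples:
        if sample["label"] == 0 and classify(sample) == 1:  # misclassified negative
            sample["loss_weight"] = sample.get("loss_weight", base_weight) * boost
            hard.append(sample)
    return hard
```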
In other examples, the object detector can also be another neural network model, such as a region convolutional neural network (RCNN) model or a fast region convolutional neural network (Fast RCNN) model.
When the object detector is used to detect the predefined type target in an image, the image is input into the object detector; the object detector detects the predefined type target in the image and outputs the position of the first object frame of the predefined type target in the image. For example, the object detector outputs 6 first object frames in the image. A first object frame is presented in the form of a rectangular frame. The position of a first object frame can be indicated with position coordinates, which may include the top-left corner coordinates (x, y) and the frame width and height (w, h).
The object detector can also output the type of each first object frame, for example, outputting 5 first object frames of the pedestrian type and 1 first object frame of the car type.
Prediction module 202 is configured to obtain the second target frame in the previous frame image of the present image, and predict the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image.
The second target frame in the previous frame image is the target frame obtained by detecting the predefined type target in the previous frame image using the object detector.
Predicting the position of the second target frame in the present image to obtain the prediction block of the second target frame in the present image means predicting the position of each second target frame in the present image to obtain the prediction block of each second target frame in the present image. For example, if 4 pedestrian target frames are detected in the previous frame image of the present image, the positions of the 4 pedestrian target frames in the present image are predicted (that is, the positions in the present image of the 4 pedestrians corresponding to the 4 pedestrian target frames are predicted) to obtain the prediction blocks of the 4 pedestrian target frames in the present image.
The predictor can be a deep neural network model.
Before the predictor is used to predict the second target frame, the predictor is trained using a second training sample set. The features learned by the predictor are depth features, in which color features account for a small proportion, so the influence of illumination is limited. Therefore, the predictor can overcome the influence of illumination to a certain extent, improving the robustness and scene adaptability of target tracking. In the present embodiment, the second training sample set may include a large number of sample images of objects under different illumination, with deformation and in high-speed motion. Therefore, the predictor can further overcome the influence of illumination, and can overcome the influence of deformation and high-speed motion to a certain extent, so that the present invention realizes tracking of high-speed moving targets and improves the robustness of target tracking.
In the present embodiment, a Feature Pyramid Network (FPN) can be constructed in the deep neural network model, and the position of the second target frame in the present image is predicted using the deep neural network model in which the feature pyramid network is constructed. The feature pyramid network connects high-level features of low resolution and high semantic information with low-level features of high resolution and low semantic information through top-down lateral connections, so that the features at all scales have rich semantic information. The connection method of the feature pyramid network is to upsample the high-level feature by a factor of 2 and then combine it with the corresponding previous-layer feature (the previous layer passes through a 1*1 convolution kernel), the combination being a pixel-wise addition. Through such connections, the prediction feature map used at each layer fuses features of different resolutions and different semantic strengths, and object detection at the corresponding resolution is performed on the fused feature map of each resolution. This ensures that each layer has features of suitable resolution and strong semantics. Constructing a feature pyramid network in the deep neural network model can improve the performance of second target frame prediction, so that a second target frame in which deformation occurs can still be predicted well.
In one embodiment, the predictor can be a SiamFC (Fully-Convolutional Siamese Network) model, such as a SiamFC network model in which a feature pyramid network is constructed. Fig. 4 is a schematic diagram of the SiamFC model.
In Fig. 4, z represents the template image, i.e. the second target frame in the previous frame image; x represents the search region, i.e. the present image; φ represents a feature mapping operation that maps the original image to a specific feature space, which can use the convolutional layers and pooling layers in a CNN; 6*6*128 represents the feature obtained from z after φ, a feature of 128 channels and size 6*6; similarly, 22*22*128 is the feature of x after φ; * represents the convolution operation: the 22*22*128 feature is convolved with the 6*6*128 convolution kernel to obtain a 17*17 score map, which represents the similarity of each position in the search region to the template image. The position in the search region with the highest similarity to the template image is the position of the prediction block.
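The * operation of Fig. 4 is a "valid" cross-correlation; a single-channel sketch is enough to verify the 22*22 to 17*17 size relation (SiamFC additionally sums this over all 128 channels, which is omitted here):

```python
def cross_correlate(search, template):
    """'Valid' cross-correlation of a template feature map over a search
    feature map, both single-channel lists of rows. A 6x6 template on a
    22x22 search map yields a (22-6+1) x (22-6+1) = 17x17 score map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    out_h, out_w = sh - th + 1, sw - tw + 1
    return [[sum(search[i + u][j + v] * template[u][v]
                 for u in range(th) for v in range(tw))
             for j in range(out_w)]
            for i in range(out_h)]
```

The argmax of the resulting score map gives the predicted position, matching the description above.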
Matching module 203 is configured to match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block.
The matching result of the first object frame and the prediction block may include: the first object frame matches the prediction block; the first object frame does not match any prediction block; or the prediction block does not match any first object frame.
In the present embodiment, the overlapping area ratio (Intersection over Union, IOU) of the first object frame and the prediction block can be calculated, and each pair of matched first object frame and prediction block is determined according to the overlapping area ratio.
For example, the first object frames include first object frame A1, first object frame A2, first object frame A3 and first object frame A4, and the prediction blocks include prediction block P1, prediction block P2, prediction block P3 and prediction block P4. Prediction block P1 corresponds to second target frame B1, prediction block P2 corresponds to second target frame B2, prediction block P3 corresponds to second target frame B3, and prediction block P4 corresponds to second target frame B4. For first object frame A1, the overlapping area ratio of first object frame A1 with each of prediction block P1, prediction block P2, prediction block P3 and prediction block P4 is calculated; if the overlapping area ratio of first object frame A1 and prediction block P1 is the largest and is greater than or equal to a preset threshold (such as 70%), it is determined that first object frame A1 matches prediction block P1. Similarly, if the overlapping area ratio of first object frame A2 and prediction block P2 is the largest and is greater than or equal to the preset threshold, first object frame A2 matches prediction block P2; if the overlapping area ratio of first object frame A3 and prediction block P3 is the largest and is greater than or equal to the preset threshold, first object frame A3 matches prediction block P3; and if the overlapping area ratio of first object frame A4 and prediction block P4 is the largest and is greater than or equal to the preset threshold, first object frame A4 matches prediction block P4.
Alternatively, the distance between the central point of the first object frame and the central point of the prediction block can be calculated, and each pair of matched first object frame and prediction block is determined according to the distance.
For example, in an example in which the first object frames include first object frame A1, first object frame A2, first object frame A3 and first object frame A4, and the prediction blocks include prediction block P1, prediction block P2, prediction block P3 and prediction block P4, for first object frame A1, the distance between the central point of first object frame A1 and the central point of each of prediction block P1, prediction block P2, prediction block P3 and prediction block P4 is calculated; if the distance between the central points of first object frame A1 and prediction block P1 is the smallest and is less than or equal to a preset distance (such as 10 pixels), it is determined that first object frame A1 matches prediction block P1. Similarly, if the distance between the central points of first object frame A2 and prediction block P2 is the smallest and is less than or equal to the preset distance, first object frame A2 matches prediction block P2; if the distance between the central points of first object frame A3 and prediction block P3 is the smallest and is less than or equal to the preset distance, first object frame A3 matches prediction block P3; and if the distance between the central points of first object frame A4 and prediction block P4 is the smallest and is less than or equal to the preset distance, first object frame A4 matches prediction block P4.
Update module 204 is configured to update the position of the target in the present image according to the matching result of the first object frame and the prediction block.
According to the matching result of the first object frame and the prediction block, updating the position of the target in the present image may include:
if the first object frame matches the prediction block, taking the position of the first object frame in the present image as the updated position of the target corresponding to the prediction block;
if the first object frame does not match any prediction block, taking the position of the first object frame in the present image as the position of a new target;
if the prediction block does not match any first object frame, marking the target corresponding to the prediction block in the present image as a lost target.
The present embodiment provides a target tracker 20. The target tracker tracks a specific type of moving object (such as a pedestrian) in a video or image sequence and obtains the position of the moving object in each frame image. The target tracker 20 detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image; obtains the second target frame in the previous frame image of the present image, and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image; matches the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block; and updates the position of the target in the present image according to the matching result of the first object frame and the prediction block. The present embodiment improves the robustness and scene adaptability of target tracking.
Embodiment three
The present embodiment provides a computer storage medium in which a computer program is stored. When the computer program is executed by a processor, the steps in the above method for tracking a target embodiment are realized, such as steps 101-104 shown in Fig. 1:
Step 101: detect the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Step 102: obtain the second target frame in the previous frame image of the present image, and predict the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Step 103: match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Step 104: according to the matching result of the first object frame and the prediction block, update the position of the target in the present image.
Alternatively, when the computer program is executed by the processor, the functions of the modules in the above apparatus embodiment are realized, such as modules 201-204 in Fig. 2:
Detection module 201 detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Prediction module 202 obtains the second target frame in the previous frame image of the present image, and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Matching module 203 is configured to match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Update module 204 is configured to update the position of the target in the present image according to the matching result of the first object frame and the prediction block.
Example IV
Fig. 3 is a schematic diagram of the computer installation provided by embodiment four of the present invention. The computer installation 30 includes a memory 301, a processor 302, and a computer program 303 stored in the memory 301 and executable on the processor 302, such as a target tracking program. When executing the computer program 303, the processor 302 realizes the steps in the above method for tracking a target embodiment, such as steps 101-104 shown in Fig. 1:
Step 101: detect the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Step 102: obtain the second target frame in the previous frame image of the present image, and predict the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Step 103: match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Step 104: according to the matching result of the first object frame and the prediction block, update the position of the target in the present image.
Alternatively, when the computer program is executed by the processor, the functions of the modules in the above apparatus embodiment are realized, such as modules 201-204 in Fig. 2:
Detection module 201 detects the predefined type target in the present image using the object detector to obtain the first object frame in the present image;
Prediction module 202 obtains the second target frame in the previous frame image of the present image, and predicts the position of the second target frame in the present image using the predictor to obtain the prediction block of the second target frame in the present image;
Matching module 203 is configured to match the first object frame in the present image with the prediction block to obtain the matching result of the first object frame and the prediction block;
Update module 204 is configured to update the position of the target in the present image according to the matching result of the first object frame and the prediction block.
Illustratively, the computer program 303 can be divided into one or more modules, which are stored in the memory 301 and executed by the processor 302 to complete the method. The one or more modules can be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 303 in the computer installation 30. For example, the computer program 303 can be divided into the detection module 201, prediction module 202, matching module 203 and update module 204 in Fig. 2; for the specific functions of each module, refer to embodiment two.
The computer installation 30 can be computing equipment such as a desktop computer, a notebook, a palm computer or a cloud server. Those skilled in the art will understand that the schematic diagram of Fig. 3 is only an example of the computer installation 30 and does not constitute a limitation on the computer installation 30, which may include more or fewer components than illustrated, combine certain components, or have different components; for example, the computer installation 30 can also include input-output equipment, network access equipment, a bus, etc.
The processor 302 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor, or the processor 302 can be any conventional processor; the processor 302 is the control center of the computer installation 30 and connects the various parts of the entire computer installation 30 using various interfaces and lines.
The memory 301 can be used for storing the computer program 303. The processor 302 realizes the various functions of the computer installation 30 by running or executing the computer program or modules stored in the memory 301 and calling the data stored in the memory 301. The memory 301 can mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program needed for at least one function (such as a sound-playing function, an image-playing function, etc.), and the data storage area can store data created according to the use of the computer installation 30 (such as audio data, a phone directory, etc.). In addition, the memory 301 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one disk memory, a flash memory device or other volatile solid-state memory component.
If the integrated modules of the computer device 30 are implemented in the form of software function modules and are sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be completed by instructing the relevant hardware through a computer program. The computer program may be stored in a computer storage medium, and when executed by a processor, the computer program may implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, a computer-readable medium does not include electric carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the modules is only a logical function division, and there may be other division manners in actual implementation.
The modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules; they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The above integrated module may be implemented in the form of hardware, or in the form of hardware plus software function modules.
The above integrated module implemented in the form of a software function module may be stored in a computer-readable storage medium. The above software function module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute part of the steps of the methods of each embodiment of the present invention.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential attributes. Therefore, from any point of view, the embodiments should be regarded as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims are included in the present invention. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. Multiple modules or devices stated in the system claims may also be implemented by one module or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent replacements can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A target tracking method, characterized in that the method comprises:
detecting targets of a predefined type in a current image using a target detector, to obtain first target frames in the current image;
obtaining second target frames in a previous frame image of the current image, and predicting positions of the second target frames in the current image using a predictor, to obtain prediction frames of the second target frames in the current image;
matching the first target frames in the current image with the prediction frames, to obtain matching results of the first target frames and the prediction frames;
updating positions of targets in the current image according to the matching results of the first target frames and the prediction frames.
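The four claimed steps form one per-frame loop: detect, predict, match, update. Purely as an illustration (not the patent's own code), a minimal Python sketch of that loop, with the detector, predictor, and matcher passed in as hypothetical callables, could look like:

```python
def track_frame(detector, predictor, matcher, tracks, current_image):
    """One iteration of the claimed tracking loop (illustrative sketch).

    tracks: dict mapping a target id to its last known frame (box).
    detector, predictor, matcher: hypothetical callables standing in
    for the trained detector, the predictor, and the matching step.
    """
    # Step 1: first target frames detected in the current image.
    detections = detector(current_image)
    # Step 2: prediction frames - where each previous-frame box should now lie.
    predictions = {tid: predictor(box, current_image)
                   for tid, box in tracks.items()}
    # Step 3: match detections against predictions.
    matches, new_detections, lost_ids = matcher(detections, predictions)
    # Step 4: update target positions from the matching result.
    for tid, det in matches:
        tracks[tid] = det                         # matched: adopt detected position
    for det in new_detections:
        tracks[max(tracks, default=0) + 1] = det  # unmatched detection: new target
    for tid in lost_ids:
        tracks.pop(tid)                           # unmatched prediction: lost target
    return tracks
```

Claims 6 and 7 constrain what the matching step computes and how the three update cases are handled.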
2. The method according to claim 1, characterized in that the target detector is an accelerated region convolutional neural network model, the accelerated region convolutional neural network model comprises a region proposal network and a fast region convolutional neural network, and the accelerated region convolutional neural network model is trained through the following steps before detecting the targets of the predefined type in the image:
a first training step: initializing the region proposal network with an Imagenet model, and training the region proposal network with a first training sample set;
a second training step: generating candidate frames for each sample image in the first training sample set using the region proposal network trained in the first training step, and training the fast region convolutional neural network with the candidate frames;
a third training step: initializing the region proposal network with the fast region convolutional neural network trained in the second training step, and training the region proposal network with the first training sample set;
a fourth training step: initializing the fast region convolutional neural network with the region proposal network trained in the third training step, keeping the convolutional layers fixed, and training the fast region convolutional neural network with the first training sample set.
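This four-step schedule mirrors the well-known alternating training of Faster R-CNN, in which a region proposal network and a Fast R-CNN detector are trained in turns. A schematic sketch follows, with all model-construction and optimization routines passed in as hypothetical callables, since the patent gives no training code:

```python
def alternating_training(imagenet_weights, samples,
                         init_rpn, train_rpn, init_frcnn, train_frcnn):
    """Four-step alternating training schedule (schematic sketch).

    The init_*/train_* arguments are hypothetical stand-ins; train_rpn
    returns a trained RPN that maps a sample image to candidate frames.
    """
    # Step 1: initialize the RPN from ImageNet weights; train on the sample set.
    rpn = train_rpn(init_rpn(imagenet_weights), samples)
    # Step 2: candidate frames from the trained RPN train the Fast R-CNN.
    candidates = [rpn(img) for img in samples]
    frcnn = train_frcnn(init_frcnn(imagenet_weights), samples, candidates)
    # Step 3: re-initialize the RPN from the trained Fast R-CNN; retrain it.
    rpn = train_rpn(init_rpn(frcnn), samples)
    # Step 4: shared convolutional layers kept fixed; fine-tune the Fast R-CNN.
    candidates = [rpn(img) for img in samples]
    frcnn = train_frcnn(init_frcnn(rpn), samples, candidates,
                        freeze_shared=True)
    return rpn, frcnn
```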
3. The method according to claim 2, characterized in that the accelerated region convolutional neural network model uses the ZF framework, and the region proposal network and the fast region convolutional neural network share 5 convolutional layers.
4. The method according to claim 1, characterized in that the predictor is a deep neural network model built with a feature pyramid network.
5. The method according to claim 1, characterized in that before the predicting positions of the second target frames in the current image using a predictor, to obtain the prediction frames of the second target frames in the current image, the method further comprises:
training the predictor with a second training sample set, wherein the second training sample set comprises sample images with different illumination, deformation, and fast-moving targets.
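Such a training set is commonly built by augmenting sample images. Purely as an illustration (the patent does not prescribe how the second training sample set is constructed), an illumination change and fast motion can be simulated on a grayscale image stored as nested lists:

```python
def scale_brightness(image, factor):
    """Simulate an illumination change by scaling intensities, clipped to 0..255."""
    return [[min(255, max(0, round(px * factor))) for px in row] for row in image]

def shift_right(image, dx):
    """Simulate fast horizontal motion by shifting columns, zero-padding the gap."""
    width = len(image[0])
    return [[row[c - dx] if 0 <= c - dx < width else 0 for c in range(width)]
            for row in image]
```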
6. The method according to claim 1, characterized in that the matching the first target frames in the current image with the prediction frames comprises:
calculating an overlap area ratio of the first target frames and the prediction frames, and determining each pair of matched first target frame and prediction frame according to the overlap area ratio; or
calculating a distance between center points of the first target frames and the prediction frames, and determining each pair of matched first target frame and prediction frame according to the distance.
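The two matching criteria in this claim can be computed directly. A generic sketch follows (the `(x, y, w, h)` box format is an assumption, not specified by the claim):

```python
import math

def iou(box_a, box_b):
    """Overlap area ratio (intersection over union) of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_distance(box_a, box_b):
    """Euclidean distance between the center points of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))
```

Pairs whose overlap ratio exceeds a threshold (or whose center distance falls below one) would then be declared matched.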
7. The method according to claim 1, characterized in that the updating positions of targets in the current image according to the matching results of the first target frames and the prediction frames comprises:
if a first target frame matches a prediction frame, taking the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
if a first target frame does not match any prediction frame, taking the position of the first target frame in the current image as the position of a new target;
if a prediction frame does not match any first target frame, taking the target corresponding to the prediction frame in the current image as a lost target.
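The three update rules can be applied mechanically once the matched pairs are known. A minimal sketch, assuming targets are keyed by an id and boxes are opaque values (neither representation is fixed by the claim):

```python
def apply_update_rules(predictions, detections, matched_pairs):
    """Apply the three update rules of this claim (illustrative sketch).

    predictions: dict target_id -> prediction frame in the current image.
    detections: list of first target frames from the detector.
    matched_pairs: list of (target_id, detection_index) matches.
    Returns (updated_positions, new_targets, lost_target_ids).
    """
    matched_ids = {tid for tid, _ in matched_pairs}
    matched_dets = {di for _, di in matched_pairs}
    # Rule 1: matched -> the detected position becomes the updated position.
    updated = {tid: detections[di] for tid, di in matched_pairs}
    # Rule 2: a detection matching no prediction -> position of a new target.
    new_targets = [det for di, det in enumerate(detections)
                   if di not in matched_dets]
    # Rule 3: a prediction matching no detection -> its target is lost.
    lost_ids = [tid for tid in predictions if tid not in matched_ids]
    return updated, new_targets, lost_ids
```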
8. A target tracking device, characterized in that the device comprises:
a detection module, configured to detect targets of a predefined type in a current image using a target detector, to obtain first target frames in the current image;
a prediction module, configured to obtain second target frames in a previous frame image of the current image, and predict positions of the second target frames in the current image using a predictor, to obtain prediction frames of the second target frames in the current image;
a matching module, configured to match the first target frames in the current image with the prediction frames, to obtain matching results of the first target frames and the prediction frames;
an update module, configured to update positions of targets in the current image according to the matching results of the first target frames and the prediction frames.
9. A computer device, characterized in that: the computer device comprises a processor, and the processor is configured to execute a computer program stored in a memory to implement the target tracking method according to any one of claims 1-7.
10. A computer storage medium storing a computer program, characterized in that: the target tracking method according to any one of claims 1-7 is implemented when the computer program is executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910064675.5A CN109903310A (en) | 2019-01-23 | 2019-01-23 | Method for tracking target, device, computer installation and computer storage medium |
PCT/CN2019/091160 WO2020151167A1 (en) | 2019-01-23 | 2019-06-13 | Target tracking method and device, computer device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910064675.5A CN109903310A (en) | 2019-01-23 | 2019-01-23 | Method for tracking target, device, computer installation and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109903310A true CN109903310A (en) | 2019-06-18 |
Family
ID=66944120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910064675.5A Pending CN109903310A (en) | 2019-01-23 | 2019-01-23 | Method for tracking target, device, computer installation and computer storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109903310A (en) |
WO (1) | WO2020151167A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks |
CN107516303A (en) * | 2017-09-01 | 2017-12-26 | 成都通甲优博科技有限责任公司 | Multi-object tracking method and system |
CN108875588A (en) * | 2018-05-25 | 2018-11-23 | 武汉大学 | Across camera pedestrian detection tracking based on deep learning |
CN109117794A (en) * | 2018-08-16 | 2019-01-01 | 广东工业大学 | A kind of moving target behavior tracking method, apparatus, equipment and readable storage medium storing program for executing |
2019
- 2019-01-23 CN CN201910064675.5A patent/CN109903310A/en active Pending
- 2019-06-13 WO PCT/CN2019/091160 patent/WO2020151167A1/en active Application Filing
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110605724A (en) * | 2019-07-01 | 2019-12-24 | 青岛联合创智科技有限公司 | Intelligence endowment robot that accompanies |
CN110490902A (en) * | 2019-08-02 | 2019-11-22 | 西安天和防务技术股份有限公司 | Method for tracking target, device, computer equipment applied to smart city |
CN110490902B (en) * | 2019-08-02 | 2022-06-14 | 西安天和防务技术股份有限公司 | Target tracking method and device applied to smart city and computer equipment |
CN110443210B (en) * | 2019-08-08 | 2021-11-26 | 北京百度网讯科技有限公司 | Pedestrian tracking method and device and terminal |
CN110443210A (en) * | 2019-08-08 | 2019-11-12 | 北京百度网讯科技有限公司 | A kind of pedestrian tracting method, device and terminal |
CN112446229B (en) * | 2019-08-27 | 2024-07-16 | 北京地平线机器人技术研发有限公司 | Pixel coordinate acquisition method and device for marker link |
CN112446229A (en) * | 2019-08-27 | 2021-03-05 | 北京地平线机器人技术研发有限公司 | Method and device for acquiring pixel coordinates of marker post |
CN110517292A (en) * | 2019-08-29 | 2019-11-29 | 京东方科技集团股份有限公司 | Method for tracking target, device, system and computer readable storage medium |
US11455735B2 (en) | 2019-08-29 | 2022-09-27 | Beijing Boe Technology Development Co., Ltd. | Target tracking method, device, system and non-transitory computer readable storage medium |
CN110738125A (en) * | 2019-09-19 | 2020-01-31 | 平安科技(深圳)有限公司 | Method, device and storage medium for selecting detection frame by using Mask R-CNN |
CN110738125B (en) * | 2019-09-19 | 2023-08-01 | 平安科技(深圳)有限公司 | Method, device and storage medium for selecting detection frame by Mask R-CNN |
WO2021051601A1 (en) * | 2019-09-19 | 2021-03-25 | 平安科技(深圳)有限公司 | Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium |
CN110738687A (en) * | 2019-10-18 | 2020-01-31 | 上海眼控科技股份有限公司 | Object tracking method, device, equipment and storage medium |
CN112749590B (en) * | 2019-10-30 | 2023-02-07 | 上海高德威智能交通***有限公司 | Object detection method, device, computer equipment and computer readable storage medium |
CN112749590A (en) * | 2019-10-30 | 2021-05-04 | 上海高德威智能交通***有限公司 | Object detection method, device, computer equipment and computer readable storage medium |
CN110838125B (en) * | 2019-11-08 | 2024-03-19 | 腾讯医疗健康(深圳)有限公司 | Target detection method, device, equipment and storage medium for medical image |
CN110838125A (en) * | 2019-11-08 | 2020-02-25 | 腾讯医疗健康(深圳)有限公司 | Target detection method, device, equipment and storage medium of medical image |
CN111199182A (en) * | 2019-11-12 | 2020-05-26 | 恒大智慧科技有限公司 | Lost object method, system and storage medium based on intelligent community |
WO2021142571A1 (en) * | 2020-01-13 | 2021-07-22 | 深圳大学 | Twin dual-path target tracking method |
CN111709975A (en) * | 2020-06-22 | 2020-09-25 | 上海高德威智能交通***有限公司 | Multi-target tracking method and device, electronic equipment and storage medium |
CN111709975B (en) * | 2020-06-22 | 2023-11-03 | 上海高德威智能交通***有限公司 | Multi-target tracking method, device, electronic equipment and storage medium |
CN111754541B (en) * | 2020-07-29 | 2023-09-19 | 腾讯科技(深圳)有限公司 | Target tracking method, device, equipment and readable storage medium |
CN111754541A (en) * | 2020-07-29 | 2020-10-09 | 腾讯科技(深圳)有限公司 | Target tracking method, device, equipment and readable storage medium |
CN112150505A (en) * | 2020-09-11 | 2020-12-29 | 浙江大华技术股份有限公司 | Target object tracker updating method and device, storage medium and electronic device |
CN112184770A (en) * | 2020-09-28 | 2021-01-05 | 中国电子科技集团公司第五十四研究所 | Target tracking method based on YOLOv3 and improved KCF |
CN112308045A (en) * | 2020-11-30 | 2021-02-02 | 深圳集智数字科技有限公司 | Detection method and device for dense crowd and electronic equipment |
CN112308045B (en) * | 2020-11-30 | 2023-11-24 | 深圳集智数字科技有限公司 | Method and device for detecting dense crowd and electronic equipment |
CN113034541A (en) * | 2021-02-26 | 2021-06-25 | 北京国双科技有限公司 | Target tracking method and device, computer equipment and storage medium |
CN113112866A (en) * | 2021-04-14 | 2021-07-13 | 深圳市旗扬特种装备技术工程有限公司 | Intelligent traffic early warning method and intelligent traffic early warning system |
CN113673541A (en) * | 2021-10-21 | 2021-11-19 | 广州微林软件有限公司 | Image sample generation method for target detection and application |
CN115457036A (en) * | 2022-11-10 | 2022-12-09 | 中国平安财产保险股份有限公司 | Detection model training method, intelligent counting method and related equipment |
CN117315028B (en) * | 2023-10-12 | 2024-04-30 | 北京多维视通技术有限公司 | Method, device, equipment and medium for positioning fire point of outdoor fire scene |
CN117315028A (en) * | 2023-10-12 | 2023-12-29 | 北京多维视通技术有限公司 | Method, device, equipment and medium for positioning fire point of outdoor fire scene |
Also Published As
Publication number | Publication date |
---|---|
WO2020151167A1 (en) | 2020-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109903310A (en) | Method for tracking target, device, computer installation and computer storage medium | |
CN109886998A (en) | Multi-object tracking method, device, computer installation and computer storage medium | |
CN108121986B (en) | Object detection method and device, computer device and computer readable storage medium | |
Zhang et al. | C2FDA: Coarse-to-fine domain adaptation for traffic object detection | |
Lee et al. | Optimal hyperparameter tuning of convolutional neural networks based on the parameter-setting-free harmony search algorithm | |
Yi et al. | ASSD: Attentive single shot multibox detector | |
CN112052787B (en) | Target detection method and device based on artificial intelligence and electronic equipment | |
CN109584248A (en) | Infrared surface object instance dividing method based on Fusion Features and dense connection network | |
CN106845430A (en) | Pedestrian detection and tracking based on acceleration region convolutional neural networks | |
CN107851191A (en) | The priori based on context for the object detection in image | |
CN110334705A (en) | A kind of Language Identification of the scene text image of the global and local information of combination | |
CN106204522A (en) | The combined depth of single image is estimated and semantic tagger | |
KR102462934B1 (en) | Video analysis system for digital twin technology | |
CN110533695A (en) | A kind of trajectory predictions device and method based on DS evidence theory | |
CN111126459A (en) | Method and device for identifying fine granularity of vehicle | |
WO2019108250A1 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
CN115512251A (en) | Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement | |
CN109766822A (en) | Gesture identification method neural network based and system | |
Pei et al. | Localized traffic sign detection with multi-scale deconvolution networks | |
CN110196917A (en) | Personalized LOGO format method for customizing, system and storage medium | |
Li et al. | Gated auxiliary edge detection task for road extraction with weight-balanced loss | |
Tao et al. | An adaptive frame selection network with enhanced dilated convolution for video smoke recognition | |
Wang et al. | Detection and tracking based tubelet generation for video object detection | |
Deng et al. | Deep learning in crowd counting: A survey | |
Vaishali | Real-time object detection system using caffe model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||