CN111612002A - Multi-target object motion tracking method based on neural network - Google Patents
- Publication number
- CN111612002A CN111612002A CN202010501800.7A CN202010501800A CN111612002A CN 111612002 A CN111612002 A CN 111612002A CN 202010501800 A CN202010501800 A CN 202010501800A CN 111612002 A CN111612002 A CN 111612002A
- Authority
- CN
- China
- Prior art keywords
- frame
- target object
- target
- neural network
- predicted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30204—Marker
Abstract
The invention relates to a multi-target object motion tracking method based on a neural network. The method comprises: dividing an image into a plurality of grid cells; predicting a plurality of bounding boxes for each cell, where the predicted bounding boxes enclose the same object and each bounding box regresses its own position information and predicts a confidence value; multiplying each cell's predicted category by the confidence value of each bounding box to obtain a result value; processing the retained bounding boxes to obtain a unique prediction reference box for a given class of object; setting several different preselection boxes for objects sharing the same center, so that multiple overlapping target objects can be detected; calculating a total loss function and reducing the target loss function through a weight-adjustment algorithm; arranging the label data and corresponding image data into a training set and a verification set, setting the YOLO neural network configuration parameters, using the prepared training set as the input for training the YOLO model and then the verification set as the input for testing it; and tracking the motion of multiple target objects with the trained YOLO model.
Description
Technical Field
The invention relates to computer video recognition technology, and in particular to a multi-target object motion tracking method based on a neural network.
Background
Existing methods for counting target objects include RFID and laser line-scan counting. RFID places high demands on the outer packaging of parcels, since a chip must be embedded in each package's outer wrapping, which is costly. Laser line-scan counting is limited in that stacked objects are missed, as are target objects whose outer-packaging label faces downward.
Disclosure of Invention
The invention aims to provide a multi-target object motion tracking method based on a neural network, which is used for solving the problems in the prior art.
The invention relates to a multi-target object motion tracking method based on a neural network, comprising the following steps: designing and training a YOLO neural network model, which includes: dividing an image into a plurality of grid cells, where if the center of a detection target falls within a cell, that cell is responsible for predicting the corresponding target; predicting a plurality of bounding boxes per cell, where the predicted bounding boxes enclose the same object and each bounding box regresses its own position information and predicts a confidence value; predicting a plurality of categories per cell, and computing the tensor of the neural network output layer from the number of cells, the number of bounding boxes, and the number of categories; multiplying the cell's predicted categories by the confidence value to obtain a result value representing the probability that the predicted bounding box belongs to a given category, and filtering out bounding boxes whose result value falls below a threshold; processing the retained bounding boxes to obtain a unique prediction reference box for a given class of object; setting several different preselection boxes for objects sharing the same center, taking different windows at the same center so that multiple overlapping target objects can be detected; performing loss calculations on the center-point position of the predicted bounding box, on its width and height, on its category, and on its confidence; calculating a total loss function and reducing the target loss function through a weight-adjustment algorithm; acquiring a large amount of image data under continuous photographing conditions, the images containing various types of target objects; sorting, selecting, and labeling the acquired image data, marking the upper-left and lower-right corner coordinates of each target object in the image and arranging them into the label data required for YOLO model training; arranging the label data and corresponding image data into a training set and a verification set, setting the YOLO neural network configuration parameters, using the prepared training set as the input for training the YOLO model and then the verification set as the input for testing it; carrying out YOLO model training; and tracking the motion of multiple target objects with the trained YOLO model.
According to an embodiment of the multi-target object motion tracking method based on a neural network, tracking multi-target object motion with the trained YOLO model includes: measuring the motion speed of an object with an encoder, and calculating the position information and identification box of a single target object in the current frame from the target object's position and identification box in the previous frame; establishing a correspondence between multiple target objects appearing successively in time, and calling the YOLO model to obtain the position information and identification boxes of the multiple target objects in the current frame; and solving the correspondence between target objects in two adjacent frames, where a matched correspondence indicates the same target object in both frames, which is not counted twice.
According to an embodiment of the multi-target object motion tracking method based on a neural network, the confidence value is calculated as:

Confidence = Pr(Object) × IOU_pred^truth

where Pr(Object) represents the confidence: it takes 1 if a detection target object falls in the grid cell and 0 otherwise, and IOU_pred^truth is the intersection-over-union between the prediction box and the actual object labeling box.
In an embodiment of the method for tracking motion of a multi-target object based on a neural network, multiplying the plurality of predicted classes of the cell by the confidence value is computed as:

Pr(Class_i | Object) × Pr(Object) × IOU_pred^truth = Pr(Class_i) × IOU_pred^truth

where Pr(Class_i | Object) is the category information predicted by each cell, indicating the probability that a detection target belongs to class i given that the cell contains a detection target, and Pr(Object) × IOU_pred^truth is the confidence value.
According to an embodiment of the multi-target object motion tracking method based on a neural network, the case where the center points of two objects fall in the same grid cell is handled by setting preselection boxes: 5 to 9 preselection boxes with different aspect ratios and different areas are set for objects at the same center.
According to an embodiment of the multi-target object motion tracking method based on a neural network, the preselection boxes are clustered using K-means to obtain their widths and heights; each preselection box has a different area and a different aspect ratio.
According to an embodiment of the multi-target object motion tracking method based on a neural network, during YOLO model training the difference between the prediction box output by the model and the actual box of the trained object is compared and a target loss function is set, so that the loss for the center-point position of the predicted bounding box is calculated as:

loss_xy = λ_coord · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} · [ (x_i − x̂_i)² + (y_i − ŷ_i)² ]

where (x_i, y_i) are the center-point coordinates of the actual object box, (x̂_i, ŷ_i) are the center-point coordinates of the prediction box, λ_coord is a constant weight, and 1_{ij}^{obj} is a coefficient constant that takes the value 1 when the center point of the predicted bounding box lies in the i-th grid cell and the image region of the predicted bounding box contains the target object, and 0 otherwise;
the calculation of the width and height loss of the prediction bounding box comprises the following steps:
wherein, wiAnd hiIs the length and width of the actual frame of the object,andis the length and width of the prediction box, λcoordAn image region representing the prediction bounding box does not contain any object;
the loss for the category of the predicted bounding box is calculated as:

loss_class = Σ_{i=0}^{S²} 1_i^{obj} · Σ_{c∈classes} (p_i(c) − p̂_i(c))²

where p_i(c) is the actual probability of an object of class c and p̂_i(c) is the probability that the prediction box belongs to class c;
the calculation of the confidence loss of the prediction bounding box comprises the following steps:
wherein, CiIs the score of the degree of confidence that the user is,the intersection of the bounding box with the actual object box is predicted,andtaking the opposite value;
calculating the total loss function includes summing the four terms above:

loss = loss_xy + loss_wh + loss_class + loss_conf.
According to an embodiment of the multi-target object motion tracking method based on a neural network, reducing the target loss function through the weight-adjustment algorithm includes:

V_k = γ · V_{k−1} + η · ∇_θ L(θ_{k−1});
θ_k = θ_{k−1} − V_k;

where ∇_θ L is the gradient of the target loss function, η is the learning rate, θ_k is a connection weight, γ is the magnitude of the momentum, and V_k is an intermediate variable.
According to an embodiment of the multi-target object motion tracking method based on a neural network, after the YOLO model has trained for a period of time, the recognition effect on the verification set is output, including recognition precision and recall. Whether the YOLO model meets the requirements is judged from the model's output; if not, the neural network configuration parameters are modified and the YOLO model is optimized again, until the precision and recall output by the model finally meet the requirements.
According to an embodiment of the method for tracking the motion of a multi-target object based on a neural network, during counting, the frame in which a target object first appears in a sequence of consecutive frames is taken as the tracking start frame and the frame in which it disappears as the end frame; the correspondence of the target object is solved for adjacent frames between the start and end frames, so as to track target objects newly appearing in the sequence.
The multi-target object motion tracking method based on a neural network uses an industrial camera to track and count multiple target objects. It consists mainly of a multi-target object detection part and a multi-target object tracking part: based on processing of the video stream, it performs multi-target detection and recognition on each frame of the image and uses the motion information of the targets to achieve tracking and counting. Using the YOLO deep learning algorithm, a general model is modified and trained for this target-detection task and then combined with a moving-object tracking and deduplication algorithm, achieving online real-time tracking of multi-target object motion with a deep learning method.
Drawings
FIG. 1 is a schematic diagram of a multi-target object system engineering document;
FIG. 2 is a graph of collected data;
FIG. 3 is a schematic diagram of a txt file;
FIG. 4 is a schematic diagram of a txt file data being collated into an xml file containing a target object using python;
FIG. 5 is a schematic view of an initial model configuration file;
FIG. 6 is a schematic diagram of the setup of the anchor box;
FIG. 7 is a diagram illustrating parameter settings and network connection settings;
FIG. 8 is a diagram of the effect of model identification;
FIG. 9 is a flow chart of a multi-target object motion tracking method based on a neural network.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
The invention relates to a multi-target object motion tracking method based on a neural network, which mainly comprises: detecting and recognizing target objects with a deep learning network, and returning target motion information with an encoder to track the targets in combination with the detection results. The deep learning network adopted is the YOLO framework, which converts the object-detection problem into a regression problem: given an input image, the bounding box of the target and its classification category are regressed directly over multiple locations of the image. Training the model on whole images better distinguishes target regions from background regions; YOLO takes one image as input and directly outputs the final result, including the boxes and the name and score of the object in each box. After the recognition result and position information of the target objects are obtained, the number of target objects is counted in combination with the objects' motion information, so that the same object is not counted twice.
The invention relates to a multi-target object motion tracking method based on a neural network, which comprises the following steps: establishing a target detection deep learning network model:
the first step is as follows: setting a YOLO neural network model:
1. YOLO first divides an image into S × S grid cells. If the center of a detection target falls within a grid cell, that cell is responsible for predicting the target; the center of the grid cell does not necessarily coincide with the center of the object.
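This cell-responsibility rule can be sketched minimally as follows (an illustrative sketch only; the function name and arguments are assumptions, not from the disclosure):

```python
def responsible_cell(cx, cy, img_w, img_h, s):
    """Return (row, col) of the S x S grid cell responsible for an object
    whose center lies at pixel (cx, cy) in an img_w x img_h image."""
    # clamp to the last cell so a center exactly on the far edge stays valid
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col
```

For example, with S = 7 on a 448 × 448 image, a center at (224, 224) falls in cell (3, 3).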
2. B bounding boxes are predicted for each grid cell, and the B predicted bounding boxes enclose the same object. Each bounding box carries, in addition to its own location information (x, y, w, h), a confidence value. Here x and y are the offsets of the predicted bounding-box center relative to the grid-cell boundary, and w and h are the width and height of the predicted bounding box as ratios of the width and height of the whole image; both pairs must be normalized to 0–1. The confidence is determined by how confident the model is that the predicted bounding box contains a detection target and how accurate the box prediction is, and is calculated as:

Confidence = Pr(Object) × IOU_pred^truth

where Pr(Object) represents the confidence: it takes 1 if a detection target falls in the grid cell and 0 otherwise, and IOU_pred^truth is the intersection-over-union (degree of overlap) between the prediction box and the actual object labeling box.
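The confidence computation Pr(Object) × IOU can be illustrated with a small sketch (the (x1, y1, x2, y2) box format and function names are assumptions for illustration):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence(object_in_cell, pred_box, truth_box):
    """Pr(Object) * IOU: zero when no target falls in the cell."""
    return (1.0 if object_in_cell else 0.0) * iou(pred_box, truth_box)
```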
3. Each grid cell also predicts C category probabilities. With S × S cells, each predicting B bounding boxes and C classes, the output layer of the neural network is a tensor of size S × S × (5 × B + C).
4. The predicted category information of each grid cell is multiplied by the confidence of each bounding box:

Pr(Class_i | Object) × Pr(Object) × IOU_pred^truth = Pr(Class_i) × IOU_pred^truth

On the left, Pr(Class_i | Object) is the category information predicted by each cell, i.e., the probability that an object belongs to class i given that the grid cell contains an object, and Pr(Object) × IOU_pred^truth is the predicted confidence value of each bounding box. The product is a class-specific confidence score for each bounding box, representing both the probability that the predicted bounding box belongs to a given class and the accuracy of the box. A threshold is then set to filter out bounding boxes whose confidence score is relatively low (less than 0.5).
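A minimal sketch of this class-score multiplication and threshold filtering (function names and the 0.5 default are illustrative, not from the disclosure):

```python
def class_scores(class_probs, box_confidence):
    """Pr(Class_i | Object) * confidence -> class-specific scores."""
    return [p * box_confidence for p in class_probs]

def filter_boxes(boxes, scores, threshold=0.5):
    """Keep (box, best_class, score) tuples whose best class score
    clears the threshold; low-score boxes are discarded."""
    kept = []
    for box, s in zip(boxes, scores):
        best = max(range(len(s)), key=lambda i: s[i])
        if s[best] >= threshold:
            kept.append((box, best, s[best]))
    return kept
```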
5. The retained predicted bounding boxes are processed with the NMS (non-maximum suppression) algorithm to obtain a unique prediction reference box for each class of object. The prediction may produce several bounding boxes for the same target object; since each retained box carries category information and a confidence score, the box with the highest confidence score within a category is taken as the prediction reference box, and predicted boxes with a large overlap area with the reference box are removed.
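Greedy per-class NMS as described can be sketched roughly as follows (a simplified version; names and the 0.5 overlap threshold are assumptions):

```python
def _iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(dets, iou_thresh=0.5):
    """dets: [(box, score)] for one class. Keep the highest-scoring box,
    drop boxes overlapping a kept box too much, repeat."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(_iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```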
6. The concept of an anchor box (preselection box) is added to the network to handle the case where the center points of two objects fall in the same grid cell, so that one cell must predict two objects. Anchor boxes are generally set manually; 5 or 9 preselection boxes with different aspect ratios and different areas can be set for objects at the same center. An anchor box is equivalent to taking a different window at the same center point, allowing several overlapping target objects to be detected. In YOLO, the anchor boxes are computed by K-means clustering, which yields their widths and heights; their center coordinates need not be computed. Each anchor box has a different area and a different aspect ratio.
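The K-means anchor clustering with an IoU-based distance might be sketched as follows (the initialization scheme, iteration count, and all names are illustrative assumptions):

```python
import random

def _wh_iou(wh, anchor):
    """IoU of two boxes that share a center, compared by (w, h) only."""
    inter = min(wh[0], anchor[0]) * min(wh[1], anchor[1])
    union = wh[0] * wh[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(whs, k, iters=50, seed=0):
    """Cluster labeled-box (w, h) pairs into k anchors using 1 - IoU
    as the distance; centroids are the mean (w, h) of each cluster."""
    rng = random.Random(seed)
    anchors = rng.sample(whs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in whs:
            best = max(range(k), key=lambda i: _wh_iou(wh, anchors[i]))
            clusters[best].append(wh)
        for i, c in enumerate(clusters):
            if c:
                anchors[i] = (sum(w for w, _ in c) / len(c),
                              sum(h for _, h in c) / len(c))
    return anchors
```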
7. During model training, the difference between the prediction box output by the model and the actual box of the trained object must be compared, so a target loss function is set. The information carried by a prediction box comprises the center-point coordinates, the width and height of the box, the probability of belonging to each category, and the prediction confidence. The loss for the predicted center coordinates is:

loss_xy = λ_coord · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} · [ (x_i − x̂_i)² + (y_i − ŷ_i)² ]

where (x_i, y_i) are the center coordinates of the actual object box, (x̂_i, ŷ_i) are the center coordinates of the prediction box, and λ_coord and 1_{ij}^{obj} are constant coefficients. The loss for the width and height of the predicted bounding box is:

loss_wh = λ_coord · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} · [ (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]

where w_i and h_i are the width and height of the actual object box and ŵ_i and ĥ_i are the width and height of the prediction box. The loss for the category of the predicted bounding box is:

loss_class = Σ_{i=0}^{S²} 1_i^{obj} · Σ_{c∈classes} (p_i(c) − p̂_i(c))²

where p_i(c) is the actual probability of an object of class c and p̂_i(c) is the probability that the prediction box belongs to class c. The loss for the confidence of the predicted bounding box is:

loss_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{obj} · (C_i − Ĉ_i)² + λ_noobj · Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_{ij}^{noobj} · (C_i − Ĉ_i)²

where C_i is the confidence score, Ĉ_i is the intersection-over-union of the predicted bounding box with the actual object box, and 1_{ij}^{noobj} takes the opposite value of 1_{ij}^{obj}.

The total loss function is the sum of these terms:

loss = loss_xy + loss_wh + loss_class + loss_conf.
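The loss terms and their sum can be illustrated with a simplified pure-Python sketch over flattened predictor slots (the slot layout and the λ values 5.0 and 0.5 are illustrative assumptions, not taken from the disclosure):

```python
import math

def yolo_loss(pred, truth, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared YOLO-style loss. pred/truth: lists of dicts with keys
    x, y, w, h, conf, cls (class-probability list). obj_mask[i] is 1 if
    slot i is responsible for a real object (only confidence is penalized
    otherwise, scaled by lambda_noobj)."""
    loc = wh = conf = cls = 0.0
    for p, t, m in zip(pred, truth, obj_mask):
        if m:
            loc += (p["x"] - t["x"]) ** 2 + (p["y"] - t["y"]) ** 2
            wh += (math.sqrt(p["w"]) - math.sqrt(t["w"])) ** 2 \
                + (math.sqrt(p["h"]) - math.sqrt(t["h"])) ** 2
            conf += (p["conf"] - t["conf"]) ** 2
            cls += sum((pc - tc) ** 2 for pc, tc in zip(p["cls"], t["cls"]))
        else:
            conf += lambda_noobj * (p["conf"] - t["conf"]) ** 2
    return lambda_coord * (loc + wh) + conf + cls
```

A perfect prediction yields zero loss; a background slot that wrongly predicts confidence 1.0 contributes λ_noobj · 1.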
8. the adjustment strategy of the weight value adopts mini Batch SGD and momentiurn, namely an algorithm of adding momentum by using small Batch random gradient. Through the weight adjustment algorithm, the model is adjusted in the direction of the target loss function generally, and the process fluctuates. The specific calculation formula is as follows:
θk=θk-1-Vk;
in the above formula, the first and second carbon atoms are,is the gradient of the objective loss function, η is the learning rate, θiIs a certain connection weight, gamma is the magnitude of momentum, ViIs an intermediate variable.
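The two update formulas can be sketched directly (the list-as-vector representation and the default hyperparameter values are assumptions for illustration):

```python
def momentum_step(theta, grad, velocity, eta=0.1, gamma=0.9):
    """One mini-batch SGD + momentum update:
    V_k = gamma * V_{k-1} + eta * grad;  theta_k = theta_{k-1} - V_k."""
    v_new = [gamma * v + eta * g for v, g in zip(velocity, grad)]
    theta_new = [t - v for t, v in zip(theta, v_new)]
    return theta_new, v_new
```

Repeated steps with the same gradient accelerate: the velocity accumulates, which is the point of the momentum term.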
Acquiring image data for model training: under continuous photographing, a large amount of image data is acquired in the application scene, with images containing various kinds of target objects.
The YOLO model training includes:
The acquired image data are sorted, selected, and labeled, marking the upper-left and lower-right corner coordinates of each target object in the image in a txt file, as shown in FIG. 3. The txt file data are collated with python into an xml file containing the target objects, as shown in FIG. 4. Finally the labels are arranged into the label data required for YOLO model training, and the label data and corresponding image data are arranged into a training set and a verification set.
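The txt-to-xml collation step might look roughly like the following sketch (the label-line format "name xmin ymin xmax ymax", the VOC-style tag names, and the example filename are all assumptions about the actual files, which are not specified here):

```python
import xml.etree.ElementTree as ET

def txt_line_to_voc(line, img_w, img_h, filename):
    """Convert one hypothetical label line 'name xmin ymin xmax ymax'
    into a Pascal-VOC-style annotation element."""
    name, x1, y1, x2, y2 = line.split()
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(img_w)
    ET.SubElement(size, "height").text = str(img_h)
    obj = ET.SubElement(ann, "object")
    ET.SubElement(obj, "name").text = name
    box = ET.SubElement(obj, "bndbox")
    for tag, val in zip(("xmin", "ymin", "xmax", "ymax"), (x1, y1, x2, y2)):
        ET.SubElement(box, tag).text = val
    return ann
```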
The source code and initial configuration files of the general YOLO model are downloaded, the environment is deployed on a computer, and the general model's data input/output interface is changed to facilitate generating the target-detection model. The initial configuration consists mainly of cfg and weights files: cfg is the network-structure configuration and parameter file, and the weights file holds the network's connection weights. During model training, the program relies on the manually designed cfg file, and the weights file is continuously updated. FIG. 5 is a schematic diagram of the initial model configuration file.
The YOLO neural network configuration parameters (number of network layers, connections, candidate boxes, etc.) are set. The width and height values of the anchor boxes (preselection boxes) are generated with K-means, and the network layers (convolutional layers, pooling layers, etc.) and layer connections are set manually, written into the cfg file used to configure the model parameters. FIG. 6 is a schematic diagram of the anchor-box settings; FIG. 7 is a schematic diagram of the parameter and network-connection settings, as shown in FIGS. 6 and 7. The generated training and verification sets are then imported to train the YOLO model.
After a period of training, the YOLO model outputs its recognition effect on the verification set, including recognition precision and recall. Whether the YOLO model needs adjustment and retraining is judged from the model's output; if the requirements are not met, the process returns to the fifth step, the neural network configuration parameters (number of layers, connections, preselection boxes, etc.) are modified, and the YOLO model is optimized continuously until the precision, recall, and other outputs finally meet the requirements. FIG. 8 is a graph of the model's recognition effect, as shown in FIG. 8. Finally, the model is encapsulated behind an interface so that it can be called by the target-object motion tracking module.
Fig. 9 is a flowchart of the multi-target object motion tracking method based on a neural network. As shown in FIG. 9, performing multi-target object motion tracking with the trained YOLO model includes the following.
after a good deep learning model for target detection is obtained, the technology needs to be applied to multi-target object tracking, namely, the same object in pictures of continuous frames needs to be tracked. The target object in the image is likely to have a defect, the target object is tracked, the shielding and stacking conditions exist between the objects, the tracking difficulty is a bit large, and the step of tracking the target object is designed:
the first step is as follows: the tracking process needs to incorporate the motion information of the object, requiring the installation of an encoder. And measuring the motion speed of the object by using an encoder, and calculating the position information and the identification frame of the target object of the current frame by taking the position of the target object of the previous frame and the identification frame as references.
The second step: a correspondence is established between multiple target objects appearing successively in time. The target-detection model interface is called to obtain the position information and identification boxes of the multiple target objects in the current frame, and the correspondence between the identification boxes calculated in the first step and those obtained in this step is solved. When a correspondence appears, the object is not a newly appearing object but one already present in the previous frame, and it is not counted again.
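Solving the correspondence between the encoder-predicted boxes and the current detections might be sketched as greedy IoU matching (the one-to-one greedy scheme and the 0.3 threshold are illustrative assumptions):

```python
def match_boxes(predicted, detected, iou_thresh=0.3):
    """Greedily match encoder-predicted boxes to current-frame detections
    by IoU; unmatched detections are new objects to be counted."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    matches, new_objects, used = [], [], set()
    for j, d in enumerate(detected):
        best_i, best_iou = None, iou_thresh
        for i, p in enumerate(predicted):
            if i in used:
                continue
            v = iou(p, d)
            if v >= best_iou:
                best_i, best_iou = i, v
        if best_i is None:
            new_objects.append(j)        # not seen before: count it
        else:
            used.add(best_i)
            matches.append((best_i, j))  # same object as previous frame
    return matches, new_objects
```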
The third step: during continuous counting, the image photographed at a certain position is taken as the initial frame, the position at which the target object leaves the camera's field of view is taken as the end frame, and the previous two steps are repeated in between.
Build a complete counting software framework that integrates the neural network recognition module, the target object counting module, the display module, the camera module, etc. Prepare the hardware environment (Intel i5 processor, 4 GB RAM, GTX 1050 or later graphics card, 100 GB storage space, 14-inch display) together with the camera, bracket, network cable, encoder, etc. The software environment is Windows 10, with VS and QT as development software.
The key to the multi-target moving object detection and tracking system is the recognition and tracking of moving target objects. For the detection module, at the same accuracy the YOLOv3 model is faster than Fast R-CNN; the improved small YOLOv3 network architecture is simpler and quicker still, processing one 448 × 448 image in 78 ms with a GPU. Changes to the network structure must balance speed against accuracy, and the appropriate structure is finally determined through continuous adjustment and optimization.
The invention discloses a multi-target object motion tracking method based on a neural network, which effectively avoids counting the same parcel repeatedly by using the motion information of objects to find correspondences. The algorithm handles cases where the identification frame of the same target object changes greatly or a tracking frame is dropped: the correspondence is computed from the proportion that the intersection area of the new and old identification frames occupies in the new frame or the old frame, and a threshold is set to select the corresponding identification frame for updating and replacement.
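The correspondence test described above can be sketched as follows. The ratio is the intersection area relative to the new or old frame (not a symmetric IoU), which matches the text's description; the threshold value 0.5 and the function names are assumptions for illustration.

```python
# Illustrative sketch: intersection area of a new and an old
# identification frame, taken as a proportion of either frame's
# area; a threshold decides whether both frames are the same object.

def area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def intersection(a, b):
    x1 = max(a[0], b[0]); y1 = max(a[1], b[1])
    x2 = min(a[2], b[2]); y2 = min(a[3], b[3])
    return area((x1, y1, x2, y2))

def same_object(new_box, old_box, thresh=0.5):
    inter = intersection(new_box, old_box)
    # proportion relative to either frame, so the test stays robust
    # when the identification frame of the same object changes size
    ratio = max(inter / area(new_box), inter / area(old_box))
    return ratio >= thresh
```

Taking the larger of the two proportions keeps a match alive when one of the boxes shrinks or grows sharply between frames, which is the frame-drop situation the algorithm is designed to tolerate.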
The method modifies the parameters of the open-source YOLO deep learning model, including the number of network layers, the preset candidate frame sizes, the learning rate, etc. The model's input and output interfaces are then adapted and wrapped into a function module for system calls. Function code that describes and predicts object states from the motion information of the multi-target objects is written. Finally, these two core function modules are integrated, and other function code such as data processing and output display control is added to form a complete multi-target object tracking system.
The invention mainly adopts a neural network algorithm to recognize target objects and achieves high accuracy. The deep learning algorithm has a high recognition rate and good fault tolerance under complex conditions, while the tracking and counting algorithm is simple and stable. The system runs stably and is simple to operate. By combining a deep learning target detection algorithm with a counting de-duplication algorithm, the invention effectively solves target object detection and counting under complex conditions. The system places no requirements on the outer packaging of the target objects, and the required cost is low.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.
Claims (10)
1. A multi-target object motion tracking method based on a neural network is characterized by comprising the following steps:
designing and training a YOLO neural network model, comprising:
dividing an image into a plurality of grids, wherein if the center of a detection target falls in a grid, that grid is responsible for predicting the corresponding detection target;
predicting a plurality of bounding boxes by each grid, wherein the predicted bounding boxes frame the same object, and each bounding box regresses its own position information and predicts a confidence value;
predicting a plurality of categories by each grid, wherein the tensor of the output layer of the neural network is calculated from the number of grids, the number of bounding boxes, and the number of categories;
multiplying the plurality of categories predicted by the grid by the confidence value to obtain a result value, wherein the result value represents the probability that the predicted bounding box belongs to a certain category, and filtering out, by a threshold value, the bounding boxes whose result value is lower than the threshold value;
processing the retained bounding box to obtain a unique prediction reference frame of a certain class of objects;
setting a plurality of different preselection frames for objects at the same center, and taking different windows for the same center so as to detect a plurality of target objects that overlap together;
performing loss calculation on the center point position of the predicted bounding box; performing loss calculation on the width and height of the predicted bounding box; performing loss calculation on the category of the predicted bounding box; performing loss calculation on the confidence of the predicted bounding box; calculating a total loss function, and reducing the target loss function through a weight adjustment algorithm;
under the condition of continuous photographing, acquiring a large amount of image data, wherein the images contain various types of target objects; and
sorting, selecting and labeling the collected image data, marking the coordinates of the upper left corner and the lower right corner of each target object in an image, and sorting these into the label data required for YOLO model training; sorting the label data and the corresponding image data into a training set and a verification set; setting the neural network configuration parameters of YOLO, taking the sorted training set as the input for YOLO model training and the verification set as the input for YOLO model testing; and carrying out YOLO model training; and
and tracking the motion of the multi-target object through the trained YOLO model.
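The step in claim 1 of "processing the retained bounding box to obtain a unique prediction reference frame of a certain class of objects" is, in standard YOLO pipelines, non-maximum suppression. The sketch below assumes this standard technique; the IoU threshold 0.5 is a conventional choice, not taken from the patent.

```python
# Hedged sketch: reduce the retained, score-filtered boxes of one
# class to a unique frame per object via non-maximum suppression.

def iou(a, b):
    x1 = max(a[0], b[0]); y1 = max(a[1], b[1])
    x2 = min(a[2], b[2]); y2 = min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(scored_boxes, iou_thresh=0.5):
    """scored_boxes: list of (bbox, score) for a single class."""
    scored_boxes = sorted(scored_boxes, key=lambda x: -x[1])
    kept = []
    for box, score in scored_boxes:
        # keep a box only if it does not heavily overlap a better one
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```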
2. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein the tracking the motion of the multi-target object through the trained YOLO model comprises:
measuring the motion speed of an object with an encoder, and calculating the position information and identification frame of the target object in the current frame, taking the position and identification frame of the target object in the previous frame as a reference;
establishing the correspondence between multi-target objects appearing successively in time, and calling the YOLO model to obtain the position information and identification frames of the multi-target objects in the current frame; and
solving the correspondence between target objects in two adjacent frames of images, wherein when a matched correspondence appears, the object is the same target object in the two adjacent frames of images and is not counted repeatedly.
3. The method for tracking the motion of a multi-target object based on the neural network as claimed in claim 1, wherein the calculation of the confidence value comprises:
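The formula for claim 3 was an image and did not survive into this text. In the standard YOLO formulation, which this claim appears to follow, the confidence value is defined as below; treat this as an assumed reconstruction rather than the claim's own formula.

```latex
% Assumed reconstruction from the standard YOLO definition; the
% claim's own formula image was not preserved in this text.
\mathrm{confidence} = \Pr(\mathrm{Object}) \times \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}}
```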
4. The method of claim 1, wherein multiplying the plurality of categories predicted by the grid by the confidence value comprises:
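Claim 4's formula image is likewise missing. A minimal sketch of the multiplication it describes follows; the data layout, names, and the threshold value 0.2 are illustrative assumptions.

```python
# Sketch: each box's class probabilities are multiplied by its
# confidence to give class-specific scores, and boxes scoring below
# a threshold are discarded (the filtering step of claim 1).

def filter_boxes(boxes, threshold=0.2):
    """boxes: list of (bbox, confidence, class_probs dict)."""
    kept = []
    for bbox, conf, class_probs in boxes:
        cls, p = max(class_probs.items(), key=lambda kv: kv[1])
        score = conf * p   # probability the box belongs to class cls
        if score >= threshold:
            kept.append((bbox, cls, score))
    return kept
```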
5. The method for tracking the motion of multiple target objects based on the neural network as claimed in claim 1, wherein the pre-selection frames are set to handle the situation in which the center points of two objects fall on the same grid, and 5 to 9 pre-selection frames with different aspect ratios and different areas are set for objects at the same center.
6. The neural network-based multi-target object motion tracking method according to claim 1, wherein the pre-selection boxes are clustered by using K-means to obtain the length and width of the pre-selection boxes, and each pre-selection box has a different area size and a different aspect ratio.
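The K-means clustering of claim 6 can be sketched as below. For brevity this uses Euclidean distance on (width, height) pairs and deterministic initialization from the first k samples; the YOLO papers actually cluster with a 1 − IoU distance, so treat this purely as an illustration of the clustering step, not the patent's exact procedure.

```python
# Minimal K-means sketch for clustering labeled box (width, height)
# pairs into k pre-selection frames of differing areas and aspect
# ratios. Euclidean distance is an illustrative simplification.

def kmeans_anchors(whs, k, iters=50):
    centers = list(whs[:k])            # deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in whs:
            j = min(range(k), key=lambda i: (w - centers[i][0]) ** 2
                                            + (h - centers[i][1]) ** 2)
            clusters[j].append((w, h))
        for j, cl in enumerate(clusters):
            if cl:                      # recenter on the cluster mean
                centers[j] = (sum(w for w, _ in cl) / len(cl),
                              sum(h for _, h in cl) / len(cl))
    return sorted(centers)
```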
7. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein in the process of training the YOLO model, the difference between the predicted frame output by the model and the actual frame of the trained object is compared and a target loss function is set, wherein the loss calculation for the center point position of the predicted bounding box is as follows:
wherein x_i, y_i are the position coordinates of the center point of the actual frame of the object, x̂_i, ŷ_i are the position coordinates of the center point of the prediction frame, λ_coord is a weight constant, and 1_ij^obj is a coefficient constant: when the center point of the predicted bounding box is in the i-th grid and the image area in which the predicted bounding box is located contains the target object, its value is 1, otherwise its value is 0;
the calculation of the width and height loss of the prediction bounding box comprises the following steps:
wherein w_i and h_i are the width and height of the actual frame of the object, ŵ_i and ĥ_i are the width and height of the prediction frame, λ_coord is the weight constant, and 1_ij^noobj indicates that the image region of the predicted bounding box does not contain any object;
and performing loss calculation on the category of the prediction bounding box:
wherein p_i(c) is the actual probability of an object of a certain category, and p̂_i(c) is the probability that the prediction frame belongs to that category;
the calculation of the confidence loss of the prediction bounding box comprises the following steps:
wherein C_i is the confidence score, Ĉ_i is the predicted confidence given by the intersection of the predicted bounding box with the actual object frame, and 1_ij^noobj takes the opposite value of 1_ij^obj;
calculating the total loss function includes:
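The formula bodies of claim 7 were images and did not survive into this text. The loss terms it names match the standard YOLOv1 objective, reconstructed below as an assumption for reference; the patent's actual formulas may differ in detail.

```latex
% Assumed reconstruction (standard YOLOv1 total loss); the patent's
% own formula images were not preserved in this text.
\begin{aligned}
L ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\mathrm{obj}}
      \left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
  &+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\mathrm{obj}}
      \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2
           +\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
  &+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}
      \left(C_i-\hat{C}_i\right)^2
   + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\mathrm{noobj}} \left(C_i-\hat{C}_i\right)^2 \\
  &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}}
      \sum_{c\in\mathrm{classes}} \left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```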
8. the method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein the reducing the target loss function by the weight adjusting algorithm comprises:
θ_k = θ_{k−1} − V_k;
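Claim 8's update rule θ_k = θ_{k−1} − V_k reads as a momentum-style gradient step, with V_k accumulating past gradients. A minimal sketch under that reading follows; the momentum coefficient, learning rate, and interpretation of V_k are assumptions, since the claim's definition of V_k is not preserved here.

```python
# Hedged sketch of a momentum-style weight update:
#   V_k = mu * V_{k-1} + lr * g_k
#   theta_k = theta_{k-1} - V_k

def momentum_step(theta, v, grad, lr=0.1, mu=0.9):
    v_new = mu * v + lr * grad     # accumulate the velocity term V_k
    theta_new = theta - v_new      # apply theta_k = theta_{k-1} - V_k
    return theta_new, v_new
```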
9. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein after a period of training the YOLO model outputs its recognition effect on the verification set, the recognition effect comprising recognition accuracy and recall rate; whether the YOLO model meets the requirements is judged from this output; if not, the configuration parameters of the neural network are modified and the YOLO model is optimized again until the recognition accuracy and recall rate output by the final model meet the requirements.
10. The method for tracking the motion of a multi-target object based on the neural network as claimed in claim 2, wherein during the counting process, the frame in which a target object first appears in a sequence of continuous frame images is taken as the tracking start frame, the frame in which the target object disappears is taken as the end frame, and the correspondence of the target object is solved for the adjacent frames between the start frame and the end frame, so as to track each newly appearing target object in the continuous frame images.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010501800.7A CN111612002A (en) | 2020-06-04 | 2020-06-04 | Multi-target object motion tracking method based on neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111612002A true CN111612002A (en) | 2020-09-01 |
Family
ID=72196934
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010501800.7A Pending CN111612002A (en) | 2020-06-04 | 2020-06-04 | Multi-target object motion tracking method based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111612002A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112306104A (en) * | 2020-11-17 | 2021-02-02 | 广西电网有限责任公司 | Image target tracking holder control method based on grid weighting |
CN112329768A (en) * | 2020-10-23 | 2021-02-05 | 上善智城(苏州)信息科技有限公司 | Improved YOLO-based method for identifying fuel-discharging stop sign of gas station |
CN112613564A (en) * | 2020-12-25 | 2021-04-06 | 桂林汉璟智能仪器有限公司 | Target detection post-processing method for eliminating overlapped frames |
CN112784694A (en) * | 2020-12-31 | 2021-05-11 | 杭州电子科技大学 | EVP-YOLO-based indoor article detection method |
CN112926681A (en) * | 2021-03-29 | 2021-06-08 | 复旦大学 | Target detection method and device based on deep convolutional neural network |
CN113283307A (en) * | 2021-04-30 | 2021-08-20 | 北京雷石天地电子技术有限公司 | Method and system for identifying object in video and computer storage medium |
CN113470073A (en) * | 2021-07-06 | 2021-10-01 | 浙江大学 | Animal center tracking method based on deep learning |
CN114022558A (en) * | 2022-01-05 | 2022-02-08 | 深圳思谋信息科技有限公司 | Image positioning method and device, computer equipment and storage medium |
CN114648685A (en) * | 2022-03-23 | 2022-06-21 | 成都臻识科技发展有限公司 | Method and system for converting anchor-free algorithm into anchor-based algorithm |
WO2022162766A1 (en) * | 2021-01-27 | 2022-08-04 | オリンパス株式会社 | Information processing system, endoscope system, information processing method, and annotation data generation method |
CN115410136A (en) * | 2022-11-01 | 2022-11-29 | 济钢防务技术有限公司 | Laser explosive disposal system emergency safety control method based on convolutional neural network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447033A (en) * | 2018-11-14 | 2019-03-08 | 北京信息科技大学 | Vehicle front obstacle detection method based on YOLO |
CN110059554A (en) * | 2019-03-13 | 2019-07-26 | 重庆邮电大学 | A kind of multiple branch circuit object detection method based on traffic scene |
CN110837762A (en) * | 2018-08-17 | 2020-02-25 | 南京理工大学 | Convolutional neural network pedestrian recognition method based on GoogLeNet |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111612002A (en) | Multi-target object motion tracking method based on neural network | |
CN111062413B (en) | Road target detection method and device, electronic equipment and storage medium | |
CN109978893B (en) | Training method, device, equipment and storage medium of image semantic segmentation network | |
CN109784293B (en) | Multi-class target object detection method and device, electronic equipment and storage medium | |
CN105574550A (en) | Vehicle identification method and device | |
CN110765865B (en) | Underwater target detection method based on improved YOLO algorithm | |
CN111160469A (en) | Active learning method of target detection system | |
CN111368636A (en) | Object classification method and device, computer equipment and storage medium | |
US20230137337A1 (en) | Enhanced machine learning model for joint detection and multi person pose estimation | |
CN114821102A (en) | Intensive citrus quantity detection method, equipment, storage medium and device | |
CN108133235A (en) | A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure | |
CN110084284A (en) | Target detection and secondary classification algorithm and device based on region convolutional neural networks | |
CN110070106A (en) | Smog detection method, device and electronic equipment | |
CN111353440A (en) | Target detection method | |
CN112785557A (en) | Belt material flow detection method and device and belt material flow detection system | |
CN110490058B (en) | Training method, device and system of pedestrian detection model and computer readable medium | |
CN110414544B (en) | Target state classification method, device and system | |
CN112241736A (en) | Text detection method and device | |
CN113192017A (en) | Package defect identification method, device, equipment and storage medium | |
CN111666872A (en) | Efficient behavior identification method under data imbalance | |
Klausner et al. | Distributed multilevel data fusion for networked embedded systems | |
CN113887455B (en) | Face mask detection system and method based on improved FCOS | |
CN113496501B (en) | Method and system for detecting invader in dynamic scene based on video prediction | |
CN115171011A (en) | Multi-class building material video counting method and system and counting equipment | |
CN112001388B (en) | Method for detecting circular target in PCB based on YOLOv3 improved model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200901 |