CN111612002A - Multi-target object motion tracking method based on neural network - Google Patents

Multi-target object motion tracking method based on neural network

Info

Publication number
CN111612002A
Authority
CN
China
Prior art keywords
frame
target object
target
neural network
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010501800.7A
Other languages
Chinese (zh)
Inventor
刘哲
黄博才
刘少君
罗建坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Qiezhi Intelligent Technology Co ltd
Original Assignee
Guangzhou Qiezhi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Qiezhi Intelligent Technology Co ltd filed Critical Guangzhou Qiezhi Intelligent Technology Co ltd
Priority to CN202010501800.7A priority Critical patent/CN111612002A/en
Publication of CN111612002A publication Critical patent/CN111612002A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-target object motion tracking method based on a neural network. The method divides an image into a number of grid cells; each cell predicts several bounding boxes that frame the same object, and each bounding box regresses its own position information and predicts a confidence value. The category predicted by each cell is multiplied by the confidence value of each bounding box to obtain a result value; the retained bounding boxes are processed to obtain a unique prediction reference box for each class of object. Several different preselection boxes are set for objects sharing the same center, so that multiple overlapping target objects can be detected. A total loss function is calculated, and the target loss function is reduced by a weight-adjustment algorithm. The label data and the corresponding image data are organized into a training set and a verification set, the YOLO neural network configuration parameters are set, the prepared training set is used as the input for training the YOLO model, and the verification set is then used as the input for testing it. Finally, multi-target object motion is tracked with the trained YOLO model.

Description

Multi-target object motion tracking method based on neural network
Technical Field
The invention relates to a computer video identification technology, in particular to a multi-target object motion tracking method based on a neural network.
Background
Existing methods for counting target objects include RFID and laser line-scan counting. RFID places high demands on the outer packaging of parcels, since a chip must be embedded in every package, which is costly. Laser line-scan counting is limited in that it misses stacked objects and objects whose label on the outer packaging faces downward.
Disclosure of Invention
The invention aims to provide a multi-target object motion tracking method based on a neural network that solves the above problems in the prior art.
The multi-target object motion tracking method based on a neural network according to the invention comprises designing and training a YOLO neural network model, which includes: dividing an image into a number of grid cells, where if the center of a detection target falls in a cell, that cell is responsible for predicting the corresponding detection target; having each cell predict several bounding boxes that frame the same object, each bounding box regressing its own position information and predicting a confidence value; having each cell predict several categories, the output-layer tensor of the neural network being calculated from the number of cells, the number of bounding boxes, and the number of categories; multiplying the categories predicted by a cell by each box's confidence value to obtain a result value representing the probability that the predicted bounding box belongs to a given class, and filtering out bounding boxes whose result value falls below a threshold; processing the retained bounding boxes to obtain a unique prediction reference box for each class of object; setting several different preselection boxes for objects sharing the same center, i.e., taking different windows around the same center so that multiple overlapping target objects can be detected; performing loss calculations on the center position, the width and height, the category, and the confidence of the predicted bounding boxes; calculating the total loss function and reducing the target loss function through a weight-adjustment algorithm; acquiring a large amount of image data under continuous photographing, the images containing target objects of various types; sorting, selecting, and labeling the acquired image data, marking the top-left and bottom-right coordinates of each target object in the image, organizing these into the label data required for YOLO model training, organizing the label data and the corresponding image data into a training set and a verification set, setting the YOLO neural network configuration parameters, taking the prepared training set as the input for training the YOLO model and then the verification set as the input for testing it; performing the YOLO model training; and tracking multi-target object motion with the trained YOLO model.
According to an embodiment of the multi-target object motion tracking method based on the neural network, tracking multi-target object motion with the trained YOLO model includes: measuring the motion speed of the objects with an encoder and, taking the position and identification frame of a target object in the previous frame as the reference, calculating the position information and identification frame of that single target object in the current frame; establishing correspondences between multi-target objects that appear one after another in time, and calling the YOLO model to obtain the position information and identification frames of the multi-target objects in the current frame; and solving the correspondence between target objects in two adjacent frames of images, so that when a matching correspondence appears, the object is recognized as the same target object in both frames and is not counted again.
According to an embodiment of the multi-target object motion tracking method based on the neural network, the calculating of the confidence value includes:
\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred}

wherein Pr(Object) represents the confidence, taking the value 1 if a detection target object falls in the grid and otherwise 0, and \mathrm{IOU}^{truth}_{pred} is the intersection-over-union between the prediction box and the actual object annotation box.
In an embodiment of the method for tracking motion of a multi-target object based on a neural network, the step of multiplying the plurality of predicted classes of the mesh by the confidence value comprises:
\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \mathrm{IOU}^{truth}_{pred}

wherein \Pr(\text{Class}_i \mid \text{Object}) is the category information predicted by each grid, representing the probability that the detection target belongs to a certain class given that the grid contains the detection target, and \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} is the confidence value.
According to an embodiment of the multi-target object motion tracking method based on the neural network, the case where the center points of two objects fall in the same grid is handled by setting preselection boxes: 5 to 9 preselection boxes with different aspect ratios and different areas are set for objects at the same center.
According to one embodiment of the multi-target object motion tracking method based on the neural network, the preselection boxes are obtained by K-means clustering, which yields their lengths and widths; each preselection box has a different area and a different aspect ratio.
According to an embodiment of the multi-target object motion tracking method based on the neural network, during training of the YOLO model the prediction box output by the model is compared against the actual box of the trained object and a target loss function is set, the loss over the center position of the predicted bounding box being calculated as:

L_{xy} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]

wherein x_i, y_i are the position coordinates of the center point of the actual object box, \hat{x}_i, \hat{y}_i are the position coordinates of the center point of the prediction box, \lambda_{coord} is a constant weight, and \mathbb{1}^{obj}_{ij} is a coefficient constant which, when the center point of the predicted bounding box lies in the i-th grid, takes the value 1 if the image region of the predicted bounding box contains the target object and otherwise 0;
the loss calculation over the width and height of the predicted bounding box comprises:

L_{wh} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]

wherein w_i and h_i are the width and height of the actual object box, \hat{w}_i and \hat{h}_i are the width and height of the prediction box, and \mathbb{1}^{noobj}_{ij} indicates that the image region of the predicted bounding box contains no object;
the loss calculation over the category of the predicted bounding box comprises:

L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2

wherein p_i(c) is the actual probability of the object belonging to a certain class and \hat{p}_i(c) is the predicted probability of the prediction box belonging to that class;
the loss calculation over the confidence of the predicted bounding box comprises:

L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} (C_i - \hat{C}_i)^2

wherein C_i is the confidence score, \hat{C}_i is the intersection-over-union of the predicted bounding box with the actual object box, and \mathbb{1}^{noobj}_{ij} takes the value opposite to \mathbb{1}^{obj}_{ij};
calculating the total loss function comprises summing the above terms:

L = L_{xy} + L_{wh} + L_{cls} + L_{conf}
According to an embodiment of the multi-target object motion tracking method based on the neural network, reducing the target loss function through the weight-adjustment algorithm includes:

g_k = \nabla_{\theta} L(\theta_{k-1})

V_k = \gamma V_{k-1} + \eta\, g_k

\theta_k = \theta_{k-1} - V_k

wherein \nabla_{\theta} L is the gradient of the target loss function, \eta is the learning rate, \theta is a connection weight, \gamma is the magnitude of the momentum, and V_k is an intermediate variable.
According to an embodiment of the multi-target object motion tracking method based on the neural network, after the YOLO model has been trained for a period of time it outputs its recognition performance on the verification set, including identification precision and recall; whether the YOLO model meets the requirements is judged from the model's output, and if it does not, the neural network configuration parameters are modified and the YOLO model is optimized again until the precision and recall finally output by the model meet the requirements.
According to an embodiment of the multi-target object motion tracking method based on the neural network, during counting the frame in which a target object appears in a sequence of consecutive frame images is taken as the tracking start frame and the frame in which it disappears as the end frame, and the correspondence of the target object is solved over the images of adjacent frames between the start frame and the end frame, so as to track target objects newly appearing in the sequence.
The multi-target object motion tracking method based on a neural network tracks and counts multiple target objects using an industrial camera. It consists mainly of a multi-target detection part and a multi-target tracking part: each frame of the video stream is processed for multi-target detection and identification, and the motion information of the targets is then used for tracking and counting. A general YOLO deep-learning model is modified and trained for this target-detection task, and combined with a moving-object tracking and de-duplication algorithm, achieving on-line, real-time tracking of multi-target object motion with a deep-learning method.
Drawings
FIG. 1 is a schematic diagram of a multi-target object system engineering document;
FIG. 2 is a graph of collected data;
FIG. 3 is a schematic diagram of a txt file;
FIG. 4 is a schematic diagram of txt file data collated into an xml file containing the target object using Python;
FIG. 5 is a schematic view of an initial model configuration file;
FIG. 6 is a schematic diagram of the setup of the anchor box;
FIG. 7 is a diagram illustrating parameter settings and network connection settings;
FIG. 8 is a diagram of the effect of model identification;
FIG. 9 is a flow chart of a multi-target object motion tracking method based on a neural network.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
The multi-target object motion tracking method based on a neural network according to the invention mainly comprises: detecting and identifying the target objects with a deep-learning network, and returning target motion information with an encoder to track the targets in combination with the detection results. The deep-learning network adopted is the YOLO framework, which converts the Object Detection problem into a Regression problem: given an input image, bounding boxes of the targets and their classification categories are regressed directly over multiple locations of the image. Training on whole images lets the model better distinguish target regions from background; YOLO takes an image as input and directly outputs the final result, including the boxes and, for each box, the name and score of the object it contains. After the identification results and position information of the target objects are obtained, the number of target objects is counted in combination with the objects' motion information so that the same object is not counted twice.
The invention relates to a multi-target object motion tracking method based on a neural network, which comprises the following steps: establishing a target detection deep learning network model:
the first step is as follows: setting a YOLO neural network model:
1. YOLO first divides an image into SxS grids (grid cells). If the center of a detected object falls within a grid cell, that cell is responsible for predicting the detected object; note that the center of the cell does not necessarily coincide with the center of the object.
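For illustration, a minimal Python sketch (not code from the patent; the grid size S = 7 is an assumed value) of how the responsible grid cell and the regressed center offsets can be determined:

    # Illustrative sketch: find the S x S grid cell responsible for an object
    # whose center is given in normalized [0, 1) image coordinates.
    S = 7  # assumed grid size

    def responsible_cell(cx, cy, s=S):
        col = int(cx * s)  # grid column containing the center
        row = int(cy * s)  # grid row containing the center
        # offsets of the center from the cell's top-left corner:
        # the (x, y) values that the network regresses
        x_off = cx * s - col
        y_off = cy * s - row
        return row, col, x_off, y_off

    print(responsible_cell(0.53, 0.28))  # -> (1, 3, ~0.71, ~0.96)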
2. Each grid cell predicts B bounding boxes, and the predicted B bounding boxes frame the same object. Each bounding box carries a confidence value in addition to its own position information (x, y, w, h), where x and y are the offsets of the predicted box center relative to the grid-cell boundary and w and h are the width and height of the predicted box as ratios of the whole image's width and height. The confidence is determined by how certain the predicted box is to contain a detected object and by how accurate the box prediction is; the confidence value is calculated as follows:
\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred}

wherein Pr(Object) represents the confidence: if a detected object falls in the grid cell, Pr(Object) takes the value 1, and otherwise 0; \mathrm{IOU}^{truth}_{pred} is the intersection-over-union (degree of overlap) between the predicted box and the actual object annotation box. Note for the calculation: the coordinates (x, y) in the bounding-box information are the offsets of the predicted box center relative to the grid-cell boundary and must be normalized to 0-1, and the width and height (w, h) are the ratios of the predicted box's width and height to those of the whole image, likewise normalized to 0-1.
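The intersection-over-union itself can be computed as in the following sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (a representation chosen here for illustration):

    def iou(box_a, box_b):
        # Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).
        ix1 = max(box_a[0], box_b[0]); iy1 = max(box_a[1], box_b[1])
        ix2 = min(box_a[2], box_b[2]); iy2 = min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0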
3. Each grid cell also predicts C pieces of category information. With SxS grid cells, each predicting B bounding boxes and C classes, the output layer of the neural network is a tensor of size S x S x (5·B + C).
4. The category information predicted by each grid cell is multiplied by the confidence of each bounding box:

\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \mathrm{IOU}^{truth}_{pred}

\Pr(\text{Class}_i \mid \text{Object}) on the left side is the category information predicted by each grid cell, representing the probability that an object belongs to class i given that the grid cell contains the object, and \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} is the predicted confidence value of each bounding box. The multiplication yields a class-specific confidence score for each bounding box, which expresses both the probability that the predicted box belongs to a certain class and the accuracy of the box. A threshold is then set to filter out the bounding boxes whose confidence score is relatively low (value less than 0.5).
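A numpy sketch of this multiplication and threshold filtering might look as follows; the array shapes and the 0.5 threshold follow the text, while the function and variable names are assumptions for illustration:

    import numpy as np

    def class_scores(class_probs, confidences, threshold=0.5):
        # class_probs: (S*S, C) per-cell Pr(Class_i | Object)
        # confidences: (S*S, B) per-box Pr(Object) * IOU
        # score[cell, box, cls] = confidence * class probability
        scores = confidences[:, :, None] * class_probs[:, None, :]
        keep = scores >= threshold  # mask of boxes surviving the filter
        return scores, keep

    scores, keep = class_scores(np.random.rand(49, 20), np.random.rand(49, 2))
    print(scores.shape, keep.sum())  # (49, 2, 20) and the number of survivors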
5. NMS (non-maximum suppression) is applied to the retained predicted bounding boxes to obtain a unique prediction reference box for each class of object. The prediction may produce several boxes for the same target object; since each retained box carries category information and a confidence score, the box with the highest confidence score within a category is taken as the prediction reference box, and predicted boxes with a large overlap area with the reference box are removed.
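A greedy NMS sketch, reusing the iou() helper from above (the 0.5 overlap threshold is an assumed value):

    def nms(boxes, scores, iou_thresh=0.5):
        # Keep the highest-scoring box in each cluster of overlapping boxes.
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            # drop remaining boxes that overlap the reference box too much
            order = [i for i in order if iou(boxes[i], boxes[best]) < iou_thresh]
        return keep  # indices of the retained reference boxes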
6. The concept of the anchor box (preselection box) is added to the network to handle the case where the center points of two objects fall in the same grid, so that one grid must predict two objects. Anchor boxes are generally preselection boxes set by hand; 5 or 9 preselection boxes with different aspect ratios and different areas can be set for objects at the same center. An anchor box amounts to taking a different window around the same center point, which makes it possible to detect several overlapping target objects. In YOLO the anchor boxes are computed by K-means clustering, which yields their lengths and widths (their center coordinates need not be computed); each anchor box has a different area and a different aspect ratio.
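A sketch of such anchor clustering, assuming wh is an (N, 2) float numpy array of normalized ground-truth box widths and heights and using 1 - IOU (with boxes aligned at a common center) as the distance, as is customary for YOLO anchor selection:

    import numpy as np

    def kmeans_anchors(wh, k=5, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        anchors = wh[rng.choice(len(wh), size=k, replace=False)].copy()
        for _ in range(iters):
            # IOU of every box with every anchor when their centers coincide
            inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                     np.minimum(wh[:, None, 1], anchors[None, :, 1]))
            union = (wh[:, 0] * wh[:, 1])[:, None] + \
                    anchors[:, 0] * anchors[:, 1] - inter
            assign = np.argmax(inter / union, axis=1)  # closest anchor
            for j in range(k):
                if np.any(assign == j):
                    anchors[j] = wh[assign == j].mean(axis=0)
        return anchors  # k rows of (width, height)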
7. During model training, the prediction box output by the model must be compared against the actual box of the trained object, so a target loss function is set. The information carried by a prediction box comprises the center coordinates, the width and height of the box, the probability of belonging to each category, and the prediction confidence. The loss over the predicted center coordinates is calculated as:

L_{xy} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]

In the above formula, x_i, y_i are the center coordinates of the actual object box, \hat{x}_i, \hat{y}_i are the center coordinates of the prediction box, and \lambda_{coord} and \mathbb{1}^{obj}_{ij} are constant coefficients. The loss over the width and height of the predicted bounding box is calculated as:
L_{wh} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]

In the above formula, w_i, h_i are the width and height of the actual object box, \hat{w}_i, \hat{h}_i are the width and height of the prediction box, and \lambda_{coord} and \mathbb{1}^{obj}_{ij} are constant coefficients. The loss over the category of the predicted bounding box is calculated as:
L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2

In the above formula, p_i(c) is the actual probability of the object belonging to a certain class, \hat{p}_i(c) is the predicted probability of the prediction box belonging to that class, and \mathbb{1}^{obj}_{i} is a constant coefficient. The loss over the confidence of the predicted bounding box is calculated as:
L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} (C_i - \hat{C}_i)^2

In the above formula, C_i is the confidence score and \hat{C}_i is the intersection-over-union of the predicted bounding box with the actual object box. The total loss function is the sum of the above terms:

L = L_{xy} + L_{wh} + L_{cls} + L_{conf}
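The following numpy sketch assembles these four terms. It assumes predictions and ground truth have been flattened to one row per candidate box; the dictionary key names and the lambda_coord = 5.0 and lambda_noobj = 0.5 values are taken from the YOLOv1 paper and are assumptions, not stated in this text:

    import numpy as np

    def yolo_loss(pred, truth, obj, lambda_coord=5.0, lambda_noobj=0.5):
        # pred/truth: dicts with 'xy' (N, 2), 'wh' (N, 2, non-negative),
        # 'conf' (N,) and 'cls' (N, C); obj: (N,) indicator, 1 where the
        # box is responsible for an object, 0 otherwise.
        noobj = 1.0 - obj
        l_xy = lambda_coord * np.sum(obj[:, None] * (truth['xy'] - pred['xy'])**2)
        l_wh = lambda_coord * np.sum(
            obj[:, None] * (np.sqrt(truth['wh']) - np.sqrt(pred['wh']))**2)
        l_cls = np.sum(obj[:, None] * (truth['cls'] - pred['cls'])**2)
        l_conf = np.sum(obj * (truth['conf'] - pred['conf'])**2) + \
                 lambda_noobj * np.sum(noobj * (truth['conf'] - pred['conf'])**2)
        return l_xy + l_wh + l_cls + l_conf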
8. the adjustment strategy of the weight value adopts mini Batch SGD and momentiurn, namely an algorithm of adding momentum by using small Batch random gradient. Through the weight adjustment algorithm, the model is adjusted in the direction of the target loss function generally, and the process fluctuates. The specific calculation formula is as follows:
Figure BDA0002525001530000103
Figure BDA0002525001530000104
θk=θk-1-Vk
in the above formula, the first and second carbon atoms are,
Figure BDA0002525001530000105
is the gradient of the objective loss function, η is the learning rate, θiIs a certain connection weight, gamma is the magnitude of momentum, ViIs an intermediate variable.
Acquiring image data for model training: with continuous photographing, a large amount of image data containing various kinds of target objects is acquired in the application scene.
The YOLO model training includes:
FIG. 3 is a schematic diagram of a txt file, and FIG. 4 shows txt file data collated into an xml file containing the target object. The acquired image data are sorted, selected, and labeled, and the top-left and bottom-right coordinates of each target object in the image are marked, producing a txt file as shown in FIG. 3. The txt file data are collated into an xml file containing the target object using Python, as shown in FIG. 4. Finally the annotations are organized into the label data required for YOLO model training, and the label data and the corresponding image data are organized into a training set and a verification set.
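A sketch of the collation step using only Python's standard library is given below. The per-line txt layout (class name followed by the two corner coordinates) and the VOC-style xml tags are assumptions; the patent only states that the top-left and bottom-right coordinates are recorded.

    import xml.etree.ElementTree as ET

    def txt_to_xml(txt_path, xml_path, image_name, width, height):
        root = ET.Element('annotation')
        ET.SubElement(root, 'filename').text = image_name
        size = ET.SubElement(root, 'size')
        ET.SubElement(size, 'width').text = str(width)
        ET.SubElement(size, 'height').text = str(height)
        with open(txt_path) as f:
            for line in f:  # assumed format: "name x1 y1 x2 y2" per object
                name, x1, y1, x2, y2 = line.split()
                obj = ET.SubElement(root, 'object')
                ET.SubElement(obj, 'name').text = name
                box = ET.SubElement(obj, 'bndbox')
                for tag, val in zip(('xmin', 'ymin', 'xmax', 'ymax'),
                                    (x1, y1, x2, y2)):
                    ET.SubElement(box, tag).text = val
        ET.ElementTree(root).write(xml_path)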
Download the source code and initial configuration files of the general YOLO model, deploy the environment on a computer accordingly, and change the general model's data input/output interface to ease generation of the target-detection model. The initial configuration mainly comprises cfg and weights files: the cfg file holds the network-structure configuration and parameters, and the weights file holds the network's connection weights. During model training the program actually relies on the manually designed cfg file, while the weights file is continuously updated. FIG. 5 is a schematic diagram of the initial model configuration files.
Set the YOLO neural network configuration parameters (number of network layers, connections, candidate boxes, and so on). The length and width values of the anchor boxes (preselection boxes) are generated with K-means, and the network layers (convolutional layers, pooling layers, etc.) and layer connections are set manually by editing the dedicated cfg file that configures the model parameters. FIG. 6 is a schematic diagram of the anchor box settings and FIG. 7 shows the parameter and network-connection settings, as shown in FIG. 6 and FIG. 7. Then import the generated training set and verification set to train the YOLO model.
After a period of training, the YOLO model outputs its recognition performance on the verification set, including identification precision and recall. Judge from this output whether the model needs adjustment and retraining; if it does not meet the requirements, return to the fifth step, modify the neural network configuration parameters (number of layers, connections, preselection boxes, etc.), and keep optimizing the YOLO model until the precision, recall, and other outputs finally meet the requirements. FIG. 8 is a model recognition effect graph, as shown in FIG. 8. Finally, encapsulate the model interface so that the target-object motion-tracking function module can call it.
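For reference, the precision and recall reported on the verification set reduce to the usual counts of correct detections, spurious detections, and missed objects; a minimal sketch:

    def precision_recall(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0  # correct / predicted
        recall = tp / (tp + fn) if tp + fn else 0.0     # correct / actual
        return precision, recall

    # e.g. 95 correct detections, 5 spurious, 10 missed:
    print(precision_recall(95, 5, 10))  # -> (0.95, ~0.905)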
Fig. 9 is a flowchart of the multi-target object motion tracking method based on a neural network. As shown in Fig. 9, performing neural-network-based multi-target object motion tracking with the trained YOLO model includes:
after a good deep learning model for target detection is obtained, the technology needs to be applied to multi-target object tracking, namely, the same object in pictures of continuous frames needs to be tracked. The target object in the image is likely to have a defect, the target object is tracked, the shielding and stacking conditions exist between the objects, the tracking difficulty is a bit large, and the step of tracking the target object is designed:
the first step is as follows: the tracking process needs to incorporate the motion information of the object, requiring the installation of an encoder. And measuring the motion speed of the object by using an encoder, and calculating the position information and the identification frame of the target object of the current frame by taking the position of the target object of the previous frame and the identification frame as references.
The second step: establish the correspondence between multi-target objects appearing one after another in time. Call the target-detection model interface to obtain the position information and identification frames of the multi-target objects in the current frame, then solve the correspondence between the identification frames calculated in the first step and those obtained in this step. When a correspondence appears, the object is not a newly appearing object but one already present in the previous frame, and it is not counted again.
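A greedy sketch of this correspondence search, reusing the iou() helper from the detection part (the 0.3 matching threshold is an assumed value):

    def match_objects(predicted_boxes, detected_boxes, iou_thresh=0.3):
        matched, new_objects, used = [], [], set()
        for j, det in enumerate(detected_boxes):
            best_i, best_v = -1, iou_thresh
            for i, pred in enumerate(predicted_boxes):
                if i not in used and iou(pred, det) > best_v:
                    best_i, best_v = i, iou(pred, det)
            if best_i >= 0:
                used.add(best_i)
                matched.append((best_i, j))  # same object: do not recount
            else:
                new_objects.append(j)        # newly appearing object: count it
        return matched, new_objects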
The third step: in the continuous counting process, take the image photographed when the target object appears at a certain position as the start frame, take the image where the target object leaves the camera's field of view as the end frame, and repeat the previous two steps.
Build a complete counting software framework integrating the neural-network recognition module, the target-object counting module, the display module, the camera module, and so on. Prepare the hardware environment (Intel i5 processor, 4 GB RAM, a GTX 1050 or later graphics card, 100 GB of storage, a 14-inch display) together with the camera, bracket, network cable, encoder, etc. The software environment is Windows 10, with VS and QT as development software.
The key to the multi-target moving-object detection and tracking system is recognizing and tracking the moving target objects. For the detection module, at the same accuracy the YOLOv3 model is faster than Fast-R-CNN, and the improved small YOLOv3 network architecture is simpler and quicker still, needing 78 ms to process one 448 x 448 image when a GPU is available. Changes to the network structure must balance speed against accuracy, and the appropriate structure is finally determined through continual adjustment and optimization.
The multi-target object motion tracking method based on a neural network according to the invention uses the motion information of objects to find correspondences, which effectively prevents the same parcel from being counted more than once. The algorithm allows for large changes in the identification frame of the same target object and for dropped tracking frames: it computes the correspondence using the proportion of the intersection area of the new and old identification frames relative to the new or the old frame, and sets a threshold to select the corresponding identification frame for updating and replacement.
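A sketch of that decision rule, where the 0.6 threshold is an assumed value not given in the text:

    def should_replace(old_box, new_box, ratio_thresh=0.6):
        # Share of the intersection area in the new or old frame's area.
        ix1 = max(old_box[0], new_box[0]); iy1 = max(old_box[1], new_box[1])
        ix2 = min(old_box[2], new_box[2]); iy2 = min(old_box[3], new_box[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_old = (old_box[2] - old_box[0]) * (old_box[3] - old_box[1])
        area_new = (new_box[2] - new_box[0]) * (new_box[3] - new_box[1])
        if area_old <= 0 or area_new <= 0:
            return False
        return (inter / area_new >= ratio_thresh or
                inter / area_old >= ratio_thresh)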
The method takes the open-source YOLO deep-learning model and modifies its parameters, including the number of network layers, the preset candidate-box sizes, and the learning rate. The model's input and output interfaces are then adapted and wrapped into a function module for system calls. Function code is written that uses the motion information of the multi-target objects to describe and predict object states. Finally, the two core function modules are integrated and other function code, such as the system's data processing and output-display control, is added to form a complete multi-target object tracking system.
The invention mainly adopts a neural-network algorithm to identify target objects and achieves high accuracy. The deep-learning algorithm has a high recognition rate and a high tolerance for complex conditions, and the tracking and counting algorithm is simple and stable. The system runs stably and is easy to operate. By combining a deep-learning target-detection algorithm with a counting de-duplication algorithm, the invention effectively solves target-object detection and counting under complex conditions. The system places no requirements on the target object's outer packaging, and the required cost is low.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A multi-target object motion tracking method based on a neural network is characterized by comprising the following steps:
designing and training a YOLO neural network model, comprising:
dividing an image into a plurality of grids, wherein if the center of a detection target falls in a grid, that grid is responsible for predicting the corresponding detection target;
predicting a plurality of bounding boxes with each grid, wherein the predicted bounding boxes frame the same object and each bounding box regresses its own position information and predicts a confidence value;
predicting a plurality of categories with each grid, the tensor of the output layer of the neural network being calculated from the number of grids, the number of bounding boxes, and the number of categories;
multiplying the plurality of categories predicted by the grid by the confidence value to obtain a result value representing the probability that the predicted bounding box belongs to a certain category, and filtering out bounding boxes whose result value is below a threshold;
processing the retained bounding boxes to obtain a unique prediction reference box for a certain class of objects;
setting a plurality of different preselection boxes for objects at the same center, taking different windows around the same center to detect a plurality of overlapping target objects;
performing loss calculation on the center position of the predicted bounding box and on its width and height; performing loss calculation on the category of the predicted bounding box; performing loss calculation on the confidence of the predicted bounding box; calculating a total loss function and reducing the target loss function through a weight-adjustment algorithm;
acquiring a large amount of image data under continuous photographing, the images containing target objects of various types; and
sorting, selecting, and labeling the acquired image data, marking the top-left and bottom-right coordinates of the target objects in the images, organizing them into the label data required for YOLO model training, organizing the label data and the corresponding image data into a training set and a verification set, setting the YOLO neural network configuration parameters, taking the prepared training set as the input for training the YOLO model and then the verification set as the input for testing the YOLO model; performing YOLO model training;
and tracking the motion of the multi-target object through the trained YOLO model.
2. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein the tracking the motion of the multi-target object through the trained YOLO model comprises:
measuring the motion speed of the object with an encoder and, taking the position and identification frame of the target object in the previous frame as the reference, calculating the position information and identification frame of the single target object in the current frame;
establishing a correspondence between multi-target objects appearing one after another in time, and calling the YOLO model to obtain the position information and identification frames of the multi-target objects in the current frame;
and solving the correspondence between target objects in two adjacent frames of images, wherein when a matching correspondence appears, the object is the same target object in the two adjacent frames and is not counted again.
3. The method for tracking the motion of a multi-target object based on the neural network as claimed in claim 1, wherein the calculation of the confidence value comprises:
\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred}

wherein Pr(Object) represents the confidence, taking the value 1 if a detection target object falls in the grid and otherwise 0, and \mathrm{IOU}^{truth}_{pred} is the intersection-over-union between the prediction box and the actual object annotation box.
4. The method of claim 1, wherein multiplying the predicted categories of the mesh by the confidence value comprises:
\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \mathrm{IOU}^{truth}_{pred}

wherein \Pr(\text{Class}_i \mid \text{Object}) is the category information predicted by each grid, representing the probability that the detection target belongs to a certain class given that the grid contains the detection target, and \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} is the confidence value.
5. The method for tracking the motion of multiple target objects based on the neural network as claimed in claim 1, wherein preselection boxes are set to handle the case where the center points of two objects fall in the same grid, and 5 to 9 preselection boxes with different aspect ratios and different areas are set for objects at the same center.
6. The neural network-based multi-target object motion tracking method according to claim 1, wherein the pre-selection boxes are clustered by using K-means to obtain the length and width of the pre-selection boxes, and each pre-selection box has a different area size and a different aspect ratio.
7. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein in the process of training the YOLO model the prediction box output by the model is compared against the actual box of the trained object and a target loss function is set, the loss over the center position of the predicted bounding box being calculated as:

L_{xy} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]

wherein x_i, y_i are the position coordinates of the center point of the actual object box, \hat{x}_i, \hat{y}_i are the position coordinates of the center point of the prediction box, \lambda_{coord} is a constant weight, and \mathbb{1}^{obj}_{ij} is a coefficient constant which, when the center point of the predicted bounding box lies in the i-th grid, takes the value 1 if the image region of the predicted bounding box contains the target object and otherwise 0;
the loss calculation over the width and height of the predicted bounding box comprises:

L_{wh} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]

wherein w_i and h_i are the width and height of the actual object box, \hat{w}_i and \hat{h}_i are the width and height of the prediction box, and \mathbb{1}^{noobj}_{ij} indicates that the image region of the predicted bounding box contains no object;
the loss calculation over the category of the predicted bounding box comprises:

L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2

wherein p_i(c) is the actual probability of the object belonging to a certain class and \hat{p}_i(c) is the predicted probability of the prediction box belonging to that class;
the loss calculation over the confidence of the predicted bounding box comprises:

L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} (C_i - \hat{C}_i)^2

wherein C_i is the confidence score, \hat{C}_i is the intersection-over-union of the predicted bounding box with the actual object box, and \mathbb{1}^{noobj}_{ij} takes the value opposite to \mathbb{1}^{obj}_{ij};
calculating the total loss function comprises summing the above terms:

L = L_{xy} + L_{wh} + L_{cls} + L_{conf}
8. the method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein the reducing the target loss function by the weight adjusting algorithm comprises:
g_k = \nabla_{\theta} L(\theta_{k-1})

V_k = \gamma V_{k-1} + \eta\, g_k

\theta_k = \theta_{k-1} - V_k

wherein \nabla_{\theta} L is the gradient of the target loss function, \eta is the learning rate, \theta is a connection weight, \gamma is the magnitude of the momentum, and V_k is an intermediate variable.
9. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein after a period of training the YOLO model outputs its recognition performance on the verification set, including identification precision and recall; whether the YOLO model meets the requirements is judged from the model's output, and if it does not, the neural network configuration parameters are modified and the YOLO model is optimized again until the precision and recall finally output by the model meet the requirements.
10. The method for tracking the motion of a multi-target object based on the neural network as claimed in claim 2, wherein during counting the frame in which a target object appears in a sequence of consecutive frame images is taken as the tracking start frame and the frame in which it disappears as the end frame, and the correspondence of the target object is solved over the images of adjacent frames between the start frame and the end frame, so as to track target objects newly appearing in the sequence of consecutive frame images.
CN202010501800.7A 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network Pending CN111612002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010501800.7A CN111612002A (en) 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010501800.7A CN111612002A (en) 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network

Publications (1)

Publication Number Publication Date
CN111612002A true CN111612002A (en) 2020-09-01

Family

ID=72196934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010501800.7A Pending CN111612002A (en) 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network

Country Status (1)

Country Link
CN (1) CN111612002A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837762A (en) * 2018-08-17 2020-02-25 南京理工大学 Convolutional neural network pedestrian recognition method based on GoogLeNet
CN109447033A (en) * 2018-11-14 2019-03-08 北京信息科技大学 Vehicle front obstacle detection method based on YOLO
CN110059554A (en) * 2019-03-13 2019-07-26 重庆邮电大学 A kind of multiple branch circuit object detection method based on traffic scene

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329768A (en) * 2020-10-23 2021-02-05 上善智城(苏州)信息科技有限公司 Improved YOLO-based method for identifying fuel-discharging stop sign of gas station
CN112306104A (en) * 2020-11-17 2021-02-02 广西电网有限责任公司 Image target tracking holder control method based on grid weighting
CN112613564A (en) * 2020-12-25 2021-04-06 桂林汉璟智能仪器有限公司 Target detection post-processing method for eliminating overlapped frames
CN112784694A (en) * 2020-12-31 2021-05-11 杭州电子科技大学 EVP-YOLO-based indoor article detection method
WO2022162766A1 (en) * 2021-01-27 2022-08-04 オリンパス株式会社 Information processing system, endoscope system, information processing method, and annotation data generation method
CN112926681A (en) * 2021-03-29 2021-06-08 复旦大学 Target detection method and device based on deep convolutional neural network
CN112926681B (en) * 2021-03-29 2022-11-29 复旦大学 Target detection method and device based on deep convolutional neural network
CN113283307A (en) * 2021-04-30 2021-08-20 北京雷石天地电子技术有限公司 Method and system for identifying object in video and computer storage medium
CN113470073A (en) * 2021-07-06 2021-10-01 浙江大学 Animal center tracking method based on deep learning
CN114022558A (en) * 2022-01-05 2022-02-08 深圳思谋信息科技有限公司 Image positioning method and device, computer equipment and storage medium
CN114648685A (en) * 2022-03-23 2022-06-21 成都臻识科技发展有限公司 Method and system for converting anchor-free algorithm into anchor-based algorithm
CN115410136A (en) * 2022-11-01 2022-11-29 济钢防务技术有限公司 Laser explosive disposal system emergency safety control method based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN111612002A (en) Multi-target object motion tracking method based on neural network
CN111062413B (en) Road target detection method and device, electronic equipment and storage medium
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN109784293B (en) Multi-class target object detection method and device, electronic equipment and storage medium
CN105574550A (en) Vehicle identification method and device
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN111160469A (en) Active learning method of target detection system
CN111368636A (en) Object classification method and device, computer equipment and storage medium
US20230137337A1 (en) Enhanced machine learning model for joint detection and multi person pose estimation
CN114821102A (en) Intensive citrus quantity detection method, equipment, storage medium and device
CN108133235A (en) A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure
CN110084284A (en) Target detection and secondary classification algorithm and device based on region convolutional neural networks
CN110070106A (en) Smog detection method, device and electronic equipment
CN111353440A (en) Target detection method
CN112785557A (en) Belt material flow detection method and device and belt material flow detection system
CN110490058B (en) Training method, device and system of pedestrian detection model and computer readable medium
CN110414544B (en) Target state classification method, device and system
CN112241736A (en) Text detection method and device
CN113192017A (en) Package defect identification method, device, equipment and storage medium
CN111666872A (en) Efficient behavior identification method under data imbalance
Klausner et al. Distributed multilevel data fusion for networked embedded systems
CN113887455B (en) Face mask detection system and method based on improved FCOS
CN113496501B (en) Method and system for detecting invader in dynamic scene based on video prediction
CN115171011A (en) Multi-class building material video counting method and system and counting equipment
CN112001388B (en) Method for detecting circular target in PCB based on YOLOv3 improved model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200901)