CN111612002A - Multi-target object motion tracking method based on neural network - Google Patents

Multi-target object motion tracking method based on neural network

Info

Publication number
CN111612002A
Authority
CN
China
Prior art keywords
frame
target object
target
neural network
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010501800.7A
Other languages
Chinese (zh)
Inventor
刘哲
黄博才
刘少君
罗建坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Qiezhi Intelligent Technology Co ltd
Original Assignee
Guangzhou Qiezhi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Qiezhi Intelligent Technology Co ltd filed Critical Guangzhou Qiezhi Intelligent Technology Co ltd
Priority to CN202010501800.7A priority Critical patent/CN111612002A/en
Publication of CN111612002A publication Critical patent/CN111612002A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-target object motion tracking method based on a neural network. The method divides an image into a number of grid cells; each cell predicts several bounding boxes that frame the same object, and each bounding box regresses its own position information and predicts a confidence value. The category predicted by each cell is multiplied by the confidence value of each bounding box to obtain a result value; the retained bounding boxes are processed to obtain a unique prediction reference box for each class of object. Several different preselection boxes are set for objects sharing the same center, so that multiple overlapping target objects can be detected. A total loss function is calculated, and the target loss function is reduced by a weight-adjustment algorithm. The label data and the corresponding image data are organized into a training set and a verification set, the YOLO neural network configuration parameters are set, the prepared training set is used as the input for training the YOLO model, and the verification set is then used as the input for testing it. Finally, multi-target object motion is tracked with the trained YOLO model.

Description

Multi-target object motion tracking method based on neural network
Technical Field
The invention relates to a computer video identification technology, in particular to a multi-target object motion tracking method based on a neural network.
Background
Existing methods for counting target objects include RFID and laser line-scan counting. RFID places high demands on the outer packaging of parcels, since a chip must be embedded in every package, which is costly. Laser line-scan counting is limited in that it misses stacked objects and objects whose label on the outer packaging faces downward.
Disclosure of Invention
The invention aims to provide a multi-target object motion tracking method based on a neural network that solves the above problems in the prior art.
The multi-target object motion tracking method based on a neural network according to the invention comprises designing and training a YOLO neural network model, which includes: dividing an image into a number of grid cells, where if the center of a detection target falls in a cell, that cell is responsible for predicting the corresponding detection target; having each cell predict several bounding boxes that frame the same object, each bounding box regressing its own position information and predicting a confidence value; having each cell predict several categories, the output-layer tensor of the neural network being calculated from the number of cells, the number of bounding boxes, and the number of categories; multiplying the categories predicted by a cell by each box's confidence value to obtain a result value representing the probability that the predicted bounding box belongs to a given class, and filtering out bounding boxes whose result value falls below a threshold; processing the retained bounding boxes to obtain a unique prediction reference box for each class of object; setting several different preselection boxes for objects sharing the same center, i.e., taking different windows around the same center so that multiple overlapping target objects can be detected; performing loss calculations on the center position, the width and height, the category, and the confidence of the predicted bounding boxes; calculating the total loss function and reducing the target loss function through a weight-adjustment algorithm; acquiring a large amount of image data under continuous photographing, the images containing target objects of various types; sorting, selecting, and labeling the acquired image data, marking the top-left and bottom-right coordinates of each target object in the image, organizing these into the label data required for YOLO model training, organizing the label data and the corresponding image data into a training set and a verification set, setting the YOLO neural network configuration parameters, taking the prepared training set as the input for training the YOLO model and then the verification set as the input for testing it; performing the YOLO model training; and tracking multi-target object motion with the trained YOLO model.
According to an embodiment of the multi-target object motion tracking method based on the neural network, tracking multi-target object motion with the trained YOLO model includes: measuring the motion speed of the objects with an encoder and, taking the position and identification frame of a target object in the previous frame as the reference, calculating the position information and identification frame of that single target object in the current frame; establishing correspondences between multi-target objects that appear one after another in time, and calling the YOLO model to obtain the position information and identification frames of the multi-target objects in the current frame; and solving the correspondence between target objects in two adjacent frames of images, so that when a matching correspondence appears, the object is recognized as the same target object in both frames and is not counted again.
According to an embodiment of the multi-target object motion tracking method based on the neural network, the calculating of the confidence value includes:
\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred}

wherein Pr(Object) represents the confidence, taking the value 1 if a detection target object falls in the grid and otherwise 0, and \mathrm{IOU}^{truth}_{pred} is the intersection-over-union between the prediction box and the actual object annotation box.
In an embodiment of the method for tracking motion of a multi-target object based on a neural network, the step of multiplying the plurality of predicted classes of the mesh by the confidence value comprises:
\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \mathrm{IOU}^{truth}_{pred}

wherein \Pr(\text{Class}_i \mid \text{Object}) is the category information predicted by each grid, representing the probability that the detection target belongs to a certain class given that the grid contains the detection target, and \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} is the confidence value.
According to an embodiment of the multi-target object motion tracking method based on the neural network, the case where the center points of two objects fall in the same grid is handled by setting preselection boxes: 5 to 9 preselection boxes with different aspect ratios and different areas are set for objects at the same center.
According to one embodiment of the multi-target object motion tracking method based on the neural network, the preselection boxes are obtained by K-means clustering, which yields their lengths and widths; each preselection box has a different area and a different aspect ratio.
According to an embodiment of the multi-target object motion tracking method based on the neural network, during training of the YOLO model the prediction box output by the model is compared against the actual box of the trained object and a target loss function is set, the loss over the center position of the predicted bounding box being calculated as:

L_{xy} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]

wherein x_i, y_i are the position coordinates of the center point of the actual object box, \hat{x}_i, \hat{y}_i are the position coordinates of the center point of the prediction box, \lambda_{coord} is a constant weight, and \mathbb{1}^{obj}_{ij} is a coefficient constant which, when the center point of the predicted bounding box lies in the i-th grid, takes the value 1 if the image region of the predicted bounding box contains the target object and otherwise 0;
the loss calculation over the width and height of the predicted bounding box comprises:

L_{wh} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]

wherein w_i and h_i are the width and height of the actual object box, \hat{w}_i and \hat{h}_i are the width and height of the prediction box, and \mathbb{1}^{noobj}_{ij} indicates that the image region of the predicted bounding box contains no object;
the loss calculation over the category of the predicted bounding box comprises:

L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2

wherein p_i(c) is the actual probability of the object belonging to a certain class and \hat{p}_i(c) is the predicted probability of the prediction box belonging to that class;
the loss calculation over the confidence of the predicted bounding box comprises:

L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} (C_i - \hat{C}_i)^2

wherein C_i is the confidence score, \hat{C}_i is the intersection-over-union of the predicted bounding box with the actual object box, and \mathbb{1}^{noobj}_{ij} takes the value opposite to \mathbb{1}^{obj}_{ij};
calculating the total loss function comprises summing the above terms:

L = L_{xy} + L_{wh} + L_{cls} + L_{conf}
According to an embodiment of the multi-target object motion tracking method based on the neural network, reducing the target loss function through the weight-adjustment algorithm includes:

g_k = \nabla_{\theta} L(\theta_{k-1})

V_k = \gamma V_{k-1} + \eta\, g_k

\theta_k = \theta_{k-1} - V_k

wherein \nabla_{\theta} L is the gradient of the target loss function, \eta is the learning rate, \theta is a connection weight, \gamma is the magnitude of the momentum, and V_k is an intermediate variable.
According to an embodiment of the multi-target object motion tracking method based on the neural network, after the YOLO model has been trained for a period of time it outputs its recognition performance on the verification set, including identification precision and recall; whether the YOLO model meets the requirements is judged from the model's output, and if it does not, the neural network configuration parameters are modified and the YOLO model is optimized again until the precision and recall finally output by the model meet the requirements.
According to an embodiment of the multi-target object motion tracking method based on the neural network, during counting the frame in which a target object appears in a sequence of consecutive frame images is taken as the tracking start frame and the frame in which it disappears as the end frame, and the correspondence of the target object is solved over the images of adjacent frames between the start frame and the end frame, so as to track target objects newly appearing in the sequence.
The multi-target object motion tracking method based on a neural network tracks and counts multiple target objects using an industrial camera. It consists mainly of a multi-target detection part and a multi-target tracking part: each frame of the video stream is processed for multi-target detection and identification, and the motion information of the targets is then used for tracking and counting. A general YOLO deep-learning model is modified and trained for this target-detection task, and combined with a moving-object tracking and de-duplication algorithm, achieving on-line, real-time tracking of multi-target object motion with a deep-learning method.
Drawings
FIG. 1 is a schematic diagram of a multi-target object system engineering document;
FIG. 2 is a graph of collected data;
FIG. 3 is a schematic diagram of a txt file;
FIG. 4 is a schematic diagram of txt file data collated into an xml file containing the target object using Python;
FIG. 5 is a schematic view of an initial model configuration file;
FIG. 6 is a schematic diagram of the setup of the anchor box;
FIG. 7 is a diagram illustrating parameter settings and network connection settings;
FIG. 8 is a diagram of the effect of model identification;
FIG. 9 is a flow chart of a multi-target object motion tracking method based on a neural network.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
The multi-target object motion tracking method based on a neural network according to the invention mainly comprises: detecting and identifying the target objects with a deep-learning network, and returning target motion information with an encoder to track the targets in combination with the detection results. The deep-learning network adopted is the YOLO framework, which converts the Object Detection problem into a Regression problem: given an input image, bounding boxes of the targets and their classification categories are regressed directly over multiple locations of the image. Training on whole images lets the model better distinguish target regions from background; YOLO takes an image as input and directly outputs the final result, including the boxes and, for each box, the name and score of the object it contains. After the identification results and position information of the target objects are obtained, the number of target objects is counted in combination with the objects' motion information so that the same object is not counted twice.
The invention relates to a multi-target object motion tracking method based on a neural network, which comprises the following steps: establishing a target detection deep learning network model:
the first step is as follows: setting a YOLO neural network model:
1. YOLO first divides an image into SxS grids (grid cells). If the center of a detected object falls within a grid cell, that cell is responsible for predicting the detected object; note that the center of the cell does not necessarily coincide with the center of the object.
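For illustration, a minimal Python sketch (not code from the patent; the grid size S = 7 is an assumed value) of how the responsible grid cell and the regressed center offsets can be determined:

    # Illustrative sketch: find the S x S grid cell responsible for an object
    # whose center is given in normalized [0, 1) image coordinates.
    S = 7  # assumed grid size

    def responsible_cell(cx, cy, s=S):
        col = int(cx * s)  # grid column containing the center
        row = int(cy * s)  # grid row containing the center
        # offsets of the center from the cell's top-left corner:
        # the (x, y) values that the network regresses
        x_off = cx * s - col
        y_off = cy * s - row
        return row, col, x_off, y_off

    print(responsible_cell(0.53, 0.28))  # -> (1, 3, ~0.71, ~0.96)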
2. Each grid cell predicts B bounding boxes, and the predicted B bounding boxes frame the same object. Each bounding box carries a confidence value in addition to its own position information (x, y, w, h), where x and y are the offsets of the predicted box center relative to the grid-cell boundary and w and h are the width and height of the predicted box as ratios of the whole image's width and height. The confidence is determined by how certain the predicted box is to contain a detected object and by how accurate the box prediction is; the confidence value is calculated as follows:
\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred}

wherein Pr(Object) represents the confidence: if a detected object falls in the grid cell, Pr(Object) takes the value 1, and otherwise 0; \mathrm{IOU}^{truth}_{pred} is the intersection-over-union (degree of overlap) between the predicted box and the actual object annotation box. Note for the calculation: the coordinates (x, y) in the bounding-box information are the offsets of the predicted box center relative to the grid-cell boundary and must be normalized to 0-1, and the width and height (w, h) are the ratios of the predicted box's width and height to those of the whole image, likewise normalized to 0-1.
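The intersection-over-union itself can be computed as in the following sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (a representation chosen here for illustration):

    def iou(box_a, box_b):
        # Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).
        ix1 = max(box_a[0], box_b[0]); iy1 = max(box_a[1], box_b[1])
        ix2 = min(box_a[2], box_b[2]); iy2 = min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0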
3. Each grid cell also predicts C pieces of category information. With SxS grid cells, each predicting B bounding boxes and C classes, the output layer of the neural network is a tensor of size S x S x (5·B + C).
4. The category information predicted by each grid cell is multiplied by the confidence of each bounding box:

\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \mathrm{IOU}^{truth}_{pred}

\Pr(\text{Class}_i \mid \text{Object}) on the left side is the category information predicted by each grid cell, representing the probability that an object belongs to class i given that the grid cell contains the object, and \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} is the predicted confidence value of each bounding box. The multiplication yields a class-specific confidence score for each bounding box, which expresses both the probability that the predicted box belongs to a certain class and the accuracy of the box. A threshold is then set to filter out the bounding boxes whose confidence score is relatively low (value less than 0.5).
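A numpy sketch of this multiplication and threshold filtering might look as follows; the array shapes and the 0.5 threshold follow the text, while the function and variable names are assumptions for illustration:

    import numpy as np

    def class_scores(class_probs, confidences, threshold=0.5):
        # class_probs: (S*S, C) per-cell Pr(Class_i | Object)
        # confidences: (S*S, B) per-box Pr(Object) * IOU
        # score[cell, box, cls] = confidence * class probability
        scores = confidences[:, :, None] * class_probs[:, None, :]
        keep = scores >= threshold  # mask of boxes surviving the filter
        return scores, keep

    scores, keep = class_scores(np.random.rand(49, 20), np.random.rand(49, 2))
    print(scores.shape, keep.sum())  # (49, 2, 20) and the number of survivors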
5. NMS (non-maximum suppression) is applied to the retained predicted bounding boxes to obtain a unique prediction reference box for each class of object. The prediction may produce several boxes for the same target object; since each retained box carries category information and a confidence score, the box with the highest confidence score within a category is taken as the prediction reference box, and predicted boxes with a large overlap area with the reference box are removed.
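A greedy NMS sketch, reusing the iou() helper from above (the 0.5 overlap threshold is an assumed value):

    def nms(boxes, scores, iou_thresh=0.5):
        # Keep the highest-scoring box in each cluster of overlapping boxes.
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            # drop remaining boxes that overlap the reference box too much
            order = [i for i in order if iou(boxes[i], boxes[best]) < iou_thresh]
        return keep  # indices of the retained reference boxes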
6. The concept of the anchor box (preselection box) is added to the network to handle the case where the center points of two objects fall in the same grid, so that one grid must predict two objects. Anchor boxes are generally preselection boxes set by hand; 5 or 9 preselection boxes with different aspect ratios and different areas can be set for objects at the same center. An anchor box amounts to taking a different window around the same center point, which makes it possible to detect several overlapping target objects. In YOLO the anchor boxes are computed by K-means clustering, which yields their lengths and widths (their center coordinates need not be computed); each anchor box has a different area and a different aspect ratio.
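A sketch of such anchor clustering, assuming wh is an (N, 2) float numpy array of normalized ground-truth box widths and heights and using 1 - IOU (with boxes aligned at a common center) as the distance, as is customary for YOLO anchor selection:

    import numpy as np

    def kmeans_anchors(wh, k=5, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        anchors = wh[rng.choice(len(wh), size=k, replace=False)].copy()
        for _ in range(iters):
            # IOU of every box with every anchor when their centers coincide
            inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                     np.minimum(wh[:, None, 1], anchors[None, :, 1]))
            union = (wh[:, 0] * wh[:, 1])[:, None] + \
                    anchors[:, 0] * anchors[:, 1] - inter
            assign = np.argmax(inter / union, axis=1)  # closest anchor
            for j in range(k):
                if np.any(assign == j):
                    anchors[j] = wh[assign == j].mean(axis=0)
        return anchors  # k rows of (width, height)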
7. During model training, the prediction box output by the model must be compared against the actual box of the trained object, so a target loss function is set. The information carried by a prediction box comprises the center coordinates, the width and height of the box, the probability of belonging to each category, and the prediction confidence. The loss over the predicted center coordinates is calculated as:

L_{xy} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]

In the above formula, x_i, y_i are the center coordinates of the actual object box, \hat{x}_i, \hat{y}_i are the center coordinates of the prediction box, and \lambda_{coord} and \mathbb{1}^{obj}_{ij} are constant coefficients. The loss over the width and height of the predicted bounding box is calculated as:
L_{wh} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]

In the above formula, w_i, h_i are the width and height of the actual object box, \hat{w}_i, \hat{h}_i are the width and height of the prediction box, and \lambda_{coord} and \mathbb{1}^{obj}_{ij} are constant coefficients. The loss over the category of the predicted bounding box is calculated as:
L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2

In the above formula, p_i(c) is the actual probability of the object belonging to a certain class, \hat{p}_i(c) is the predicted probability of the prediction box belonging to that class, and \mathbb{1}^{obj}_{i} is a constant coefficient. The loss over the confidence of the predicted bounding box is calculated as:
L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} (C_i - \hat{C}_i)^2

In the above formula, C_i is the confidence score and \hat{C}_i is the intersection-over-union of the predicted bounding box with the actual object box. The total loss function is the sum of the above terms:

L = L_{xy} + L_{wh} + L_{cls} + L_{conf}
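The following numpy sketch assembles these four terms. It assumes predictions and ground truth have been flattened to one row per candidate box; the dictionary key names and the lambda_coord = 5.0 and lambda_noobj = 0.5 values are taken from the YOLOv1 paper and are assumptions, not stated in this text:

    import numpy as np

    def yolo_loss(pred, truth, obj, lambda_coord=5.0, lambda_noobj=0.5):
        # pred/truth: dicts with 'xy' (N, 2), 'wh' (N, 2, non-negative),
        # 'conf' (N,) and 'cls' (N, C); obj: (N,) indicator, 1 where the
        # box is responsible for an object, 0 otherwise.
        noobj = 1.0 - obj
        l_xy = lambda_coord * np.sum(obj[:, None] * (truth['xy'] - pred['xy'])**2)
        l_wh = lambda_coord * np.sum(
            obj[:, None] * (np.sqrt(truth['wh']) - np.sqrt(pred['wh']))**2)
        l_cls = np.sum(obj[:, None] * (truth['cls'] - pred['cls'])**2)
        l_conf = np.sum(obj * (truth['conf'] - pred['conf'])**2) + \
                 lambda_noobj * np.sum(noobj * (truth['conf'] - pred['conf'])**2)
        return l_xy + l_wh + l_cls + l_conf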
8. the adjustment strategy of the weight value adopts mini Batch SGD and momentiurn, namely an algorithm of adding momentum by using small Batch random gradient. Through the weight adjustment algorithm, the model is adjusted in the direction of the target loss function generally, and the process fluctuates. The specific calculation formula is as follows:
Figure BDA0002525001530000103
Figure BDA0002525001530000104
θk=θk-1-Vk
in the above formula, the first and second carbon atoms are,
Figure BDA0002525001530000105
is the gradient of the objective loss function, η is the learning rate, θiIs a certain connection weight, gamma is the magnitude of momentum, ViIs an intermediate variable.
Acquiring image data for model training: with continuous photographing, a large amount of image data containing various kinds of target objects is acquired in the application scene.
The YOLO model training includes:
FIG. 3 is a schematic diagram of a txt file, and FIG. 4 shows txt file data collated into an xml file containing the target object. The acquired image data are sorted, selected, and labeled, and the top-left and bottom-right coordinates of each target object in the image are marked, producing a txt file as shown in FIG. 3. The txt file data are collated into an xml file containing the target object using Python, as shown in FIG. 4. Finally the annotations are organized into the label data required for YOLO model training, and the label data and the corresponding image data are organized into a training set and a verification set.
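A sketch of the collation step using only Python's standard library is given below. The per-line txt layout (class name followed by the two corner coordinates) and the VOC-style xml tags are assumptions; the patent only states that the top-left and bottom-right coordinates are recorded.

    import xml.etree.ElementTree as ET

    def txt_to_xml(txt_path, xml_path, image_name, width, height):
        root = ET.Element('annotation')
        ET.SubElement(root, 'filename').text = image_name
        size = ET.SubElement(root, 'size')
        ET.SubElement(size, 'width').text = str(width)
        ET.SubElement(size, 'height').text = str(height)
        with open(txt_path) as f:
            for line in f:  # assumed format: "name x1 y1 x2 y2" per object
                name, x1, y1, x2, y2 = line.split()
                obj = ET.SubElement(root, 'object')
                ET.SubElement(obj, 'name').text = name
                box = ET.SubElement(obj, 'bndbox')
                for tag, val in zip(('xmin', 'ymin', 'xmax', 'ymax'),
                                    (x1, y1, x2, y2)):
                    ET.SubElement(box, tag).text = val
        ET.ElementTree(root).write(xml_path)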
Download the source code and initial configuration files of the general YOLO model, deploy the environment on a computer accordingly, and change the general model's data input/output interface to ease generation of the target-detection model. The initial configuration mainly comprises cfg and weights files: the cfg file holds the network-structure configuration and parameters, and the weights file holds the network's connection weights. During model training the program actually relies on the manually designed cfg file, while the weights file is continuously updated. FIG. 5 is a schematic diagram of the initial model configuration files.
Set the YOLO neural network configuration parameters (number of network layers, connections, candidate boxes, and so on). The length and width values of the anchor boxes (preselection boxes) are generated with K-means, and the network layers (convolutional layers, pooling layers, etc.) and layer connections are set manually by editing the dedicated cfg file that configures the model parameters. FIG. 6 is a schematic diagram of the anchor box settings and FIG. 7 shows the parameter and network-connection settings, as shown in FIG. 6 and FIG. 7. Then import the generated training set and verification set to train the YOLO model.
After a period of training, the YOLO model outputs its recognition performance on the verification set, including identification precision and recall. Judge from this output whether the model needs adjustment and retraining; if it does not meet the requirements, return to the fifth step, modify the neural network configuration parameters (number of layers, connections, preselection boxes, etc.), and keep optimizing the YOLO model until the precision, recall, and other outputs finally meet the requirements. FIG. 8 is a model recognition effect graph, as shown in FIG. 8. Finally, encapsulate the model interface so that the target-object motion-tracking function module can call it.
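For reference, the precision and recall reported on the verification set reduce to the usual counts of correct detections, spurious detections, and missed objects; a minimal sketch:

    def precision_recall(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0  # correct / predicted
        recall = tp / (tp + fn) if tp + fn else 0.0     # correct / actual
        return precision, recall

    # e.g. 95 correct detections, 5 spurious, 10 missed:
    print(precision_recall(95, 5, 10))  # -> (0.95, ~0.905)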
Fig. 9 is a flowchart of the multi-target object motion tracking method based on a neural network. As shown in Fig. 9, performing neural-network-based multi-target object motion tracking with the trained YOLO model includes:
after a good deep learning model for target detection is obtained, the technology needs to be applied to multi-target object tracking, namely, the same object in pictures of continuous frames needs to be tracked. The target object in the image is likely to have a defect, the target object is tracked, the shielding and stacking conditions exist between the objects, the tracking difficulty is a bit large, and the step of tracking the target object is designed:
the first step is as follows: the tracking process needs to incorporate the motion information of the object, requiring the installation of an encoder. And measuring the motion speed of the object by using an encoder, and calculating the position information and the identification frame of the target object of the current frame by taking the position of the target object of the previous frame and the identification frame as references.
The second step: establish the correspondence between multi-target objects appearing one after another in time. Call the target-detection model interface to obtain the position information and identification frames of the multi-target objects in the current frame, then solve the correspondence between the identification frames calculated in the first step and those obtained in this step. When a correspondence appears, the object is not a newly appearing object but one already present in the previous frame, and it is not counted again.
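A greedy sketch of this correspondence search, reusing the iou() helper from the detection part (the 0.3 matching threshold is an assumed value):

    def match_objects(predicted_boxes, detected_boxes, iou_thresh=0.3):
        matched, new_objects, used = [], [], set()
        for j, det in enumerate(detected_boxes):
            best_i, best_v = -1, iou_thresh
            for i, pred in enumerate(predicted_boxes):
                if i not in used and iou(pred, det) > best_v:
                    best_i, best_v = i, iou(pred, det)
            if best_i >= 0:
                used.add(best_i)
                matched.append((best_i, j))  # same object: do not recount
            else:
                new_objects.append(j)        # newly appearing object: count it
        return matched, new_objects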
The third step: in the continuous counting process, take the image photographed when the target object appears at a certain position as the start frame, take the image where the target object leaves the camera's field of view as the end frame, and repeat the previous two steps.
Build a complete counting software framework integrating the neural-network recognition module, the target-object counting module, the display module, the camera module, and so on. Prepare the hardware environment (Intel i5 processor, 4 GB RAM, a GTX 1050 or later graphics card, 100 GB of storage, a 14-inch display) together with the camera, bracket, network cable, encoder, etc. The software environment is Windows 10, with VS and QT as development software.
The key to the multi-target moving-object detection and tracking system is recognizing and tracking the moving target objects. For the detection module, at the same accuracy the YOLOv3 model is faster than Fast-R-CNN, and the improved small YOLOv3 network architecture is simpler and quicker still, needing 78 ms to process one 448 x 448 image when a GPU is available. Changes to the network structure must balance speed against accuracy, and the appropriate structure is finally determined through continual adjustment and optimization.
The multi-target object motion tracking method based on a neural network according to the invention uses the motion information of objects to find correspondences, which effectively prevents the same parcel from being counted more than once. The algorithm allows for large changes in the identification frame of the same target object and for dropped tracking frames: it computes the correspondence using the proportion of the intersection area of the new and old identification frames relative to the new or the old frame, and sets a threshold to select the corresponding identification frame for updating and replacement.
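A sketch of that decision rule, where the 0.6 threshold is an assumed value not given in the text:

    def should_replace(old_box, new_box, ratio_thresh=0.6):
        # Share of the intersection area in the new or old frame's area.
        ix1 = max(old_box[0], new_box[0]); iy1 = max(old_box[1], new_box[1])
        ix2 = min(old_box[2], new_box[2]); iy2 = min(old_box[3], new_box[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_old = (old_box[2] - old_box[0]) * (old_box[3] - old_box[1])
        area_new = (new_box[2] - new_box[0]) * (new_box[3] - new_box[1])
        if area_old <= 0 or area_new <= 0:
            return False
        return (inter / area_new >= ratio_thresh or
                inter / area_old >= ratio_thresh)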
The method takes the open-source YOLO deep-learning model and modifies its parameters, including the number of network layers, the preset candidate-box sizes, and the learning rate. The model's input and output interfaces are then adapted and wrapped into a function module for system calls. Function code is written that uses the motion information of the multi-target objects to describe and predict object states. Finally, the two core function modules are integrated and other function code, such as the system's data processing and output-display control, is added to form a complete multi-target object tracking system.
The invention mainly adopts a neural-network algorithm to identify target objects and achieves high accuracy. The deep-learning algorithm has a high recognition rate and a high tolerance for complex conditions, and the tracking and counting algorithm is simple and stable. The system runs stably and is easy to operate. By combining a deep-learning target-detection algorithm with a counting de-duplication algorithm, the invention effectively solves target-object detection and counting under complex conditions. The system places no requirements on the target object's outer packaging, and the required cost is low.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A multi-target object motion tracking method based on a neural network is characterized by comprising the following steps:
designing and training a YOLO neural network model, comprising:
dividing an image into a plurality of grids, wherein if the center of a detection target falls in a grid, that grid is responsible for predicting the corresponding detection target;
predicting a plurality of bounding boxes with each grid, wherein the predicted bounding boxes frame the same object and each bounding box regresses its own position information and predicts a confidence value;
predicting a plurality of categories with each grid, the tensor of the output layer of the neural network being calculated from the number of grids, the number of bounding boxes, and the number of categories;
multiplying the plurality of categories predicted by the grid by the confidence value to obtain a result value representing the probability that the predicted bounding box belongs to a certain category, and filtering out bounding boxes whose result value is below a threshold;
processing the retained bounding boxes to obtain a unique prediction reference box for a certain class of objects;
setting a plurality of different preselection boxes for objects at the same center, taking different windows around the same center to detect a plurality of overlapping target objects;
performing loss calculation on the center position of the predicted bounding box and on its width and height; performing loss calculation on the category of the predicted bounding box; performing loss calculation on the confidence of the predicted bounding box; calculating a total loss function and reducing the target loss function through a weight-adjustment algorithm;
acquiring a large amount of image data under continuous photographing, the images containing target objects of various types; and
sorting, selecting, and labeling the acquired image data, marking the top-left and bottom-right coordinates of the target objects in the images, organizing them into the label data required for YOLO model training, organizing the label data and the corresponding image data into a training set and a verification set, setting the YOLO neural network configuration parameters, taking the prepared training set as the input for training the YOLO model and then the verification set as the input for testing the YOLO model; performing YOLO model training;
and tracking the motion of the multi-target object through the trained YOLO model.
2. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein the tracking the motion of the multi-target object through the trained YOLO model comprises:
measuring the motion speed of the object with an encoder and, taking the position and identification frame of the target object in the previous frame as the reference, calculating the position information and identification frame of the single target object in the current frame;
establishing a correspondence between multi-target objects appearing one after another in time, and calling the YOLO model to obtain the position information and identification frames of the multi-target objects in the current frame;
and solving the correspondence between target objects in two adjacent frames of images, wherein when a matching correspondence appears, the object is the same target object in the two adjacent frames and is not counted again.
3. The method for tracking the motion of a multi-target object based on the neural network as claimed in claim 1, wherein the calculation of the confidence value comprises:
\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred}

wherein Pr(Object) represents the confidence, taking the value 1 if a detection target object falls in the grid and otherwise 0, and \mathrm{IOU}^{truth}_{pred} is the intersection-over-union between the prediction box and the actual object annotation box.
4. The method of claim 1, wherein multiplying the predicted categories of the mesh by the confidence value comprises:
\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \mathrm{IOU}^{truth}_{pred}

wherein \Pr(\text{Class}_i \mid \text{Object}) is the category information predicted by each grid, representing the probability that the detection target belongs to a certain class given that the grid contains the detection target, and \Pr(\text{Object}) \times \mathrm{IOU}^{truth}_{pred} is the confidence value.
5. The method for tracking the motion of multiple target objects based on the neural network as claimed in claim 1, wherein preselection boxes are set to handle the case where the center points of two objects fall in the same grid, and 5 to 9 preselection boxes with different aspect ratios and different areas are set for objects at the same center.
6. The neural network-based multi-target object motion tracking method according to claim 1, wherein the pre-selection boxes are clustered by using K-means to obtain the length and width of the pre-selection boxes, and each pre-selection box has a different area size and a different aspect ratio.
7. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein in the process of training the YOLO model the prediction box output by the model is compared against the actual box of the trained object and a target loss function is set, the loss over the center position of the predicted bounding box being calculated as:

L_{xy} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]

wherein x_i, y_i are the position coordinates of the center point of the actual object box, \hat{x}_i, \hat{y}_i are the position coordinates of the center point of the prediction box, \lambda_{coord} is a constant weight, and \mathbb{1}^{obj}_{ij} is a coefficient constant which, when the center point of the predicted bounding box lies in the i-th grid, takes the value 1 if the image region of the predicted bounding box contains the target object and otherwise 0;
the loss calculation over the width and height of the predicted bounding box comprises:

L_{wh} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} \left[ (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right]

wherein w_i and h_i are the width and height of the actual object box, \hat{w}_i and \hat{h}_i are the width and height of the prediction box, and \mathbb{1}^{noobj}_{ij} indicates that the image region of the predicted bounding box contains no object;
the loss calculation over the category of the predicted bounding box comprises:

L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}^{obj}_{i} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2

wherein p_i(c) is the actual probability of the object belonging to a certain class and \hat{p}_i(c) is the predicted probability of the prediction box belonging to that class;
the loss calculation over the confidence of the predicted bounding box comprises:

L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{obj}_{ij} (C_i - \hat{C}_i)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}^{noobj}_{ij} (C_i - \hat{C}_i)^2

wherein C_i is the confidence score, \hat{C}_i is the intersection-over-union of the predicted bounding box with the actual object box, and \mathbb{1}^{noobj}_{ij} takes the value opposite to \mathbb{1}^{obj}_{ij};
calculating the total loss function comprises summing the above terms:

L = L_{xy} + L_{wh} + L_{cls} + L_{conf}
8. the method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein the reducing the target loss function by the weight adjusting algorithm comprises:
g_k = \nabla_{\theta} L(\theta_{k-1})

V_k = \gamma V_{k-1} + \eta\, g_k

\theta_k = \theta_{k-1} - V_k

wherein \nabla_{\theta} L is the gradient of the target loss function, \eta is the learning rate, \theta is a connection weight, \gamma is the magnitude of the momentum, and V_k is an intermediate variable.
9. The method for tracking the motion of the multi-target object based on the neural network as claimed in claim 1, wherein after a period of training the YOLO model outputs its recognition performance on the verification set, including identification precision and recall; whether the YOLO model meets the requirements is judged from the model's output, and if it does not, the neural network configuration parameters are modified and the YOLO model is optimized again until the precision and recall finally output by the model meet the requirements.
10. The method for tracking the motion of a multi-target object based on the neural network as claimed in claim 2, wherein during counting the frame in which a target object appears in a sequence of consecutive frame images is taken as the tracking start frame and the frame in which it disappears as the end frame, and the correspondence of the target object is solved over the images of adjacent frames between the start frame and the end frame, so as to track target objects newly appearing in the sequence of consecutive frame images.
CN202010501800.7A 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network Pending CN111612002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010501800.7A CN111612002A (en) 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010501800.7A CN111612002A (en) 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network

Publications (1)

Publication Number Publication Date
CN111612002A true CN111612002A (en) 2020-09-01

Family

ID=72196934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010501800.7A Pending CN111612002A (en) 2020-06-04 2020-06-04 Multi-target object motion tracking method based on neural network

Country Status (1)

Country Link
CN (1) CN111612002A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837762A (en) * 2018-08-17 2020-02-25 南京理工大学 Convolutional neural network pedestrian recognition method based on GoogLeNet
CN109447033A (en) * 2018-11-14 2019-03-08 北京信息科技大学 Vehicle front obstacle detection method based on YOLO
CN110059554A (en) * 2019-03-13 2019-07-26 重庆邮电大学 A kind of multiple branch circuit object detection method based on traffic scene

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329768A (en) * 2020-10-23 2021-02-05 上善智城(苏州)信息科技有限公司 Improved YOLO-based method for identifying fuel-discharging stop sign of gas station
CN112306104A (en) * 2020-11-17 2021-02-02 广西电网有限责任公司 Image target tracking holder control method based on grid weighting
CN112613564A (en) * 2020-12-25 2021-04-06 桂林汉璟智能仪器有限公司 Target detection post-processing method for eliminating overlapped frames
CN112784694A (en) * 2020-12-31 2021-05-11 杭州电子科技大学 EVP-YOLO-based indoor article detection method
WO2022162766A1 (en) * 2021-01-27 2022-08-04 オリンパス株式会社 Information processing system, endoscope system, information processing method, and annotation data generation method
CN112926681A (en) * 2021-03-29 2021-06-08 复旦大学 Target detection method and device based on deep convolutional neural network
CN112926681B (en) * 2021-03-29 2022-11-29 复旦大学 Target detection method and device based on deep convolutional neural network
CN113283307A (en) * 2021-04-30 2021-08-20 北京雷石天地电子技术有限公司 Method and system for identifying object in video and computer storage medium
CN113470073A (en) * 2021-07-06 2021-10-01 浙江大学 Animal center tracking method based on deep learning
CN114022558A (en) * 2022-01-05 2022-02-08 深圳思谋信息科技有限公司 Image positioning method and device, computer equipment and storage medium
CN114648685A (en) * 2022-03-23 2022-06-21 成都臻识科技发展有限公司 Method and system for converting anchor-free algorithm into anchor-based algorithm
CN115410136A (en) * 2022-11-01 2022-11-29 济钢防务技术有限公司 Laser explosive disposal system emergency safety control method based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN111612002A (en) Multi-target object motion tracking method based on neural network
CN111062413B (en) Road target detection method and device, electronic equipment and storage medium
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN109784293B (en) Multi-class target object detection method and device, electronic equipment and storage medium
CN105574550A (en) Vehicle identification method and device
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN111160469A (en) Active learning method of target detection system
CN111368636A (en) Object classification method and device, computer equipment and storage medium
US20230137337A1 (en) Enhanced machine learning model for joint detection and multi person pose estimation
CN114821102A (en) Intensive citrus quantity detection method, equipment, storage medium and device
CN108133235A (en) A kind of pedestrian detection method based on neural network Analysis On Multi-scale Features figure
CN110084284A (en) Target detection and secondary classification algorithm and device based on region convolutional neural networks
CN110070106A (en) Smog detection method, device and electronic equipment
CN111353440A (en) Target detection method
CN112785557A (en) Belt material flow detection method and device and belt material flow detection system
CN110490058B (en) Training method, device and system of pedestrian detection model and computer readable medium
CN110414544B (en) Target state classification method, device and system
CN112241736A (en) Text detection method and device
CN113192017A (en) Package defect identification method, device, equipment and storage medium
CN111666872A (en) Efficient behavior identification method under data imbalance
Klausner et al. Distributed multilevel data fusion for networked embedded systems
CN113887455B (en) Face mask detection system and method based on improved FCOS
CN113496501B (en) Method and system for detecting invader in dynamic scene based on video prediction
CN115171011A (en) Multi-class building material video counting method and system and counting equipment
CN112001388B (en) Method for detecting circular target in PCB based on YOLOv3 improved model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200901)