CN117953546B - Pedestrian retrograde judgment method based on multi-target tracking


Info

Publication number: CN117953546B
Application number: CN202410354175.6A
Authority: CN (China)
Prior art keywords: track, target detection, frame, pedestrian, target
Legal status: Active (granted)
Other versions: CN117953546A (Chinese, zh)
Inventors: 张鹏, 董克, 李志超, 李末, 王泽灏, 赵威, 李爱华, 肖景洋, 李刚, 吴敏思
Current and original assignee: Shenyang Elysan Electronic Technology Co ltd
Application filed by Shenyang Elysan Electronic Technology Co ltd on 2024-03-27 (priority date 2024-03-27); published as CN117953546A on 2024-04-30; granted as CN117953546B on 2024-06-11.

Classifications

    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06T7/62: Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V2201/07: Target detection (indexing scheme)
    • Y02T10/40: Engine management systems (climate change mitigation technologies related to transportation)


Abstract

A pedestrian retrograde judgment method based on multi-target tracking belongs to the technical field of behavior detection and comprises the following steps: step 1, constructing a target detection data set; step 2, constructing an improved YOLOv7 target detection model; step 3, training the improved YOLOv7 target detection model; step 4, acquiring image frames F_t (t = 1, 2, 3, ...) from a monitoring video stream, sending each acquired image frame into the target detection model trained in step 3, and performing target detection on the input frame to obtain a pedestrian target detection result; step 5, tracking pedestrian targets; and step 6, judging retrograde behavior. The invention adds an attention mechanism to the YOLOv7 target detection model, improving the model's ability to distinguish and localize target objects.

Description

Pedestrian retrograde judgment method based on multi-target tracking
Technical Field
The invention relates to the technical field of behavior detection, in particular to a pedestrian retrograde judgment method based on multi-target tracking.
Background
A subway station is an important component of an urban rail transit system. It contains many interconnected passageways that provide smooth, convenient and efficient transfer service, shortening journey times and reducing transfer costs. To ensure throughput, subway stations are usually equipped with unidirectional passages, escalators and the like, in which passengers must travel in a specified direction. When traffic is low, passengers can move freely in a passage and keep a relatively wide spacing. At peak hours or in crowds, however, maintaining order in unidirectional passages is of paramount importance, and operators often have to post dedicated staff to prevent the congestion, or even serious collisions, caused by passengers walking against the flow. In recent years subway stations have generally been fitted with camera surveillance; applying intelligent video analysis for pedestrian retrograde detection to the real-time pictures of unidirectional areas can improve safety, optimize service, improve operational scheduling and provide effective emergency response. These measures help raise a station's operating efficiency, passenger experience and overall level of safety.
Chinese patent application publication No. CN115188072A discloses a method and system for detecting pedestrians moving against the direction of an escalator. The method has the following characteristics: first, the camera is installed ahead of the escalator's running direction and matched to the escalator's installation inclination; second, the escalator region must be marked as an ROI in advance; third, any pedestrian recognized by the YOLO detector is treated as a moving pedestrian, the picture resolution is 416×416, and the detection accuracy is relatively low; fourth, an optical-flow method analyzes the tracking video of each pedestrian to generate a color-value map, in which the lower 1/4 region of the frame represents the movement direction of the escalator and the remaining 3/4 represents the movement direction of the pedestrian. This way of demarcating zones introduces a certain amount of noise and is not accurate enough; optical flow is sensitive to brightness and occlusion, and to regions that lack texture or contain repeated texture. If an image region has no obvious texture variation, the motion of its pixels is hard to compute accurately, and repeated texture can likewise make the result ambiguous.
Chinese patent application publication No. CN111860282A discloses a method and system for subway-section passenger flow statistics and pedestrian retrograde detection, characterized as follows: first, DeepSORT is used for multi-target tracking, which performs poorly under occlusion and interference, easily misjudges or loses targets, and is sensitive to changes in target appearance such as posture or orientation. Second, trip wires must be placed while avoiding regions that tend to be densely crowded, and retrograde motion is judged from pedestrian targets crossing the wire. This method can only make retrograde judgments at the trip-wire positions, has a narrow detection range, and cannot effectively filter out misjudgments caused by lingering and loitering behavior.
Chinese patent application publication No. CN110532852A discloses a deep-learning-based method for detecting abnormal pedestrian events in subway stations. It is characterized as follows: first, a YOLO detector and DeepSORT are used for target detection and tracking, which perform poorly under occlusion; second, it needs the first 15 frames of data, and the rule must be satisfied more than twice cumulatively before retrograde motion is declared, so real-time detection cannot be achieved.
Chinese patent application publication No. CN113361351A discloses a retrograde judgment method and system based on image recognition, which recognizes face orientation in single frames. Because consecutive frames are not associated, situations such as adjusting position on an escalator or turning to talk can cause a certain degree of misjudgment.
In summary, existing pedestrian retrograde detection methods still have shortcomings in practical application, mainly in the following respects:
1. Occlusion: when a pedestrian is occluded by other objects or pedestrians, traditional methods based on feature extraction and classification are prone to detection errors and cannot accurately infer the pedestrian's direction.
2. Complex background interference: in complex environments, the accuracy of retrograde detection may be affected by the background. For example, blurred differences in color or texture between pedestrians and the background can make detection more difficult.
3. Posture change: existing methods are sensitive to changes in pedestrian posture, and pedestrians take different postures and actions while walking, which is a challenge for direction-detection algorithms.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention provides a pedestrian retrograde judgment method based on multi-target tracking, which adds an attention mechanism to the YOLOv7 target detection model and improves the model's ability to distinguish and localize target objects.
In order to achieve the above purpose, the main technical scheme adopted by the invention is as follows:
a pedestrian retrograde judgment method based on multi-target tracking comprises the following steps:
step 1, constructing a target detection data set
Collecting open-source person datasets from the web together with pedestrian image data from the monitored scene, using LabelImg to mark rectangular boxes around head-and-shoulder targets, and dividing the images whose annotations meet the standard into a training set and a validation set;
step 2, constructing an improved YOLOv7 target detection model
The improved YOLOv7 target detection model is based on the YOLOv7 network structure; a convolution layer with an attention mechanism module CBAM is added between layer 50 and layer 51, consisting of a 1×1 standard convolution followed by a CBAM, and a CBAM is also added inside the SPPCSPC layer at layer 51 of the YOLOv7 network structure;
step 3, training the improved YOLOv7 target detection model
Training the target detection model constructed in step 2 on the training set obtained in step 1 to obtain a trained target detection model;
Step 4, acquiring image frames F_t (t = 1, 2, 3, ...) from a monitoring video stream, sending each acquired frame into the target detection model trained in step 3, and performing target detection on the input frame to obtain a pedestrian target detection result, which comprises the pedestrian's position information marked by a target detection box and a target detection category, the categories comprising the front and the back of a pedestrian;
Step 5, pedestrian target tracking
Carrying out multi-target tracking on the pedestrian target detection result obtained in the step 4 by adopting an OC-SORT algorithm;
the step 5 specifically comprises the following steps:
Step 5.1, obtaining the tracker predictions
Let T = {T_1, T_2, ..., T_n} be the set of center-point tracks of the target detection boxes obtained from pedestrian detection; for each track, a Kalman filter computes its predicted box B̂_t in the current image frame from the track's observed positions in previous frames;
Step 5.2, extracting each track's velocity vector V_t from the track set T and associating the target detection boxes D_t with the predicted boxes B̂_t;
Let the center points of the two most recent consecutive frame positions be p_{t-1} = (u_{t-1}, v_{t-1}) and p_t = (u_t, v_t); then V_t = (p_t - p_{t-1}) / ‖p_t - p_{t-1}‖.
The association cost is C = C_HIoU + λ·C_v. C_HIoU is computed from the IoU value and the height intersection ratio between a predicted box and a target detection box, HIoU = [S_I / (S_p + S_d - S_I)] · [h_I / (h_p + h_d - h_I)], where S_p is the area of the predicted box, S_d the area of the target detection box, S_I their intersection area, h_p the height of the predicted box, h_d the height of the target detection box, and h_I the overlap height of the predicted box and the target detection box; only the results whose HIoU exceeds a preset threshold are retained. C_v is the angular difference between the track's velocity vector and the velocity vector formed by the track's previously observed position and the newly detected position, and λ is a weight factor. According to the association cost C, linear assignment is performed with the Hungarian algorithm to obtain the optimal matching between predicted boxes and target detection boxes, returning three kinds of results: T_matched (matched tracks), T_unmatched (unmatched tracks), and D_unmatched (unmatched target detection boxes);
Step 5.3, if unmatched target detection boxes D_unmatched and unmatched tracks T_unmatched remain, performing a second round of association: for the unmatched tracks, computing the association cost between the remaining target detection boxes and each track's last observed position, retaining the results exceeding the threshold, performing linear assignment with the Hungarian algorithm to obtain the optimal matching, and updating T_unmatched and D_unmatched;
Step 5.4, track update
For each successfully matched detection result and track, the detection result is used to update the track's velocity vector and the tracker's observed position; the miss count of each unmatched track in T_unmatched is incremented by 1;
Step 5.5, creating new tracks
For each unmatched detection result in D_unmatched, a new track and tracker are created, and the detection is initialized as the first observed position of the new track.
Step 6, retrograde behavior determination
Traversing all tracks in T, selecting the tracks whose recorded observation length exceeds 3, extracting each selected track's initial observation position box_0 and its observation position box_t at time t, computing the displacement variation of the target within the track from the extracted observations, and judging whether the track exhibits retrograde behavior by combining the target detection category and the velocity-vector result.
Further, in step 2, the convolution layer added between layer 50 and layer 51 has 1024 output channels, and the CBAM added between layer 50 and layer 51 has a channel-reduction coefficient reduction = 16 and a convolution kernel size k = 49.
Further, in step 2, the attention mechanism module CBAM is composed of a channel attention module and a spatial attention module, where the channel attention module comprises a fully connected layer and two ReLU activation layers, and the spatial attention module comprises a convolution layer and a sigmoid activation function.
Further, in step 3, the training method is the YOLOv7 model training procedure, with training parameters: 400 iterations, image size 640, batch size 32, and learning rate 0.01.
Further, in step 5, the OC-SORT algorithm uses a Kalman-filter-based multi-target tracker to predict and update the state of each target track, and performs matching through HIoU together with track-direction consistency.
Further, if the track set T is empty at the current moment, step 5.5 is executed.
Further, a step 5.6 is included: if a track's miss count reaches 30, the track is deleted.
Further, the method in step 6 for judging whether a track exhibits retrograde behavior is: the displacement variation is greater than 0.25, the count ratio of the front category in the track record is greater than 0.7, the target detection category at the last observation position is front, and the vertical component of the track's velocity vector V_t is positive, indicating motion toward the camera; a track satisfying all of the above conditions is judged to be retrograde.
The beneficial effects of the invention are as follows:
1. The invention adopts an improved YOLOv7 target detection model whose added attention mechanism lets the detector extract richer image features, improving the model's ability to distinguish and localize target objects.
2. The method adopts the OC-SORT algorithm for multi-target tracking, improving the tracking of targets in crowded and occluded scenes.
3. Based on the tracking result, a retrograde judgment can be made from as few as 3 frames, meeting real-time requirements; the judgment combines human behavioral characteristics, making the result more accurate.
4. The method is highly portable and runs well in a new scene after only a small amount of tuning.
Detailed Description
The present invention is described in detail below with reference to specific embodiments in order to better explain it.
The invention provides a pedestrian retrograde judgment method based on multi-target tracking, applied in settings where pedestrians must be checked for retrograde behavior, for example a unidirectional subway passage, where the behavior of a pedestrian facing the camera and approaching it is judged as retrograde.
The method comprises the following steps:
step 1, constructing a target detection data set
Collecting open-source person datasets from the web and pedestrian image data from the monitored scene, 5500 images in total; using the image annotation tool LabelImg to mark rectangular boxes around head-and-shoulder targets; and dividing the images whose annotations meet the standard into a training set and a validation set.
Specifically, the rectangular-box labels fall into three categories: front, back and others, where front requires the complete facial features to be visible, back is a complete back view, and all remaining angles are labeled others.
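A short sketch may make the step-1 data preparation concrete. The directory layout, file extension and the 9:1 split ratio below are assumptions for illustration; the patent only states that the labeled images are divided into a training set and a validation set.

```python
# Minimal sketch of the step-1 train/validation split (assumed 9:1 ratio).
import random
from pathlib import Path

images = sorted(Path("dataset/images").glob("*.jpg"))  # placeholder path
random.seed(0)                                         # reproducible split
random.shuffle(images)
cut = int(0.9 * len(images))
train_set, val_set = images[:cut], images[cut:]
print(f"{len(train_set)} training images, {len(val_set)} validation images")
```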
Step 2, constructing an improved YOLOv7 target detection model
The improved YOLOv7 target detection model is based on the YOLOv7 network structure. Between layer 50 and layer 51, a convolution layer with an attention mechanism module CBAM is added, consisting of a 1×1 standard convolution followed by a CBAM; specifically, the number of output channels is 1024, the CBAM channel-reduction coefficient is reduction = 16, and the convolution kernel size is k = 49. A CBAM is also added inside the SPPCSPC layer at layer 51 of the YOLOv7 network structure, likewise with reduction = 16 and kernel size k = 49.
Specifically, the attention mechanism module CBAM is composed of two parts: a channel attention module and a spatial attention module. The channel attention module comprises a fully connected layer and two ReLU activation layers, and the spatial attention module comprises a convolution layer and a sigmoid activation function.
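For reference, the standard CBAM formulation (Woo et al., 2018) can be sketched in PyTorch as below. This is an illustrative reconstruction, not the authors' code: it uses the patent's reduction = 16, and its channel branch follows the usual two-linear-layer MLP with one ReLU, which differs slightly from the wording above.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP applied to both the average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # 2-channel input: channel-wise average map and max map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))   # channel attention, then spatial attention
```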
Step 3, training the improved YOLOv7 target detection model
The target detection model constructed in step 2 is trained on the training set obtained in step 1 to obtain a trained target detection model. Specifically, the training method is the YOLOv7 model training procedure, with training parameters: 400 iterations, image size 640, batch size 32, and learning rate 0.01.
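The step-3 hyperparameters can be collected in one place; the dictionary keys below are assumed names for illustration, since the patent specifies the values but not the training script.

```python
# Step-3 training parameters as stated in the patent.
train_cfg = {
    "epochs": 400,     # number of training iterations
    "img_size": 640,   # input image size
    "batch_size": 32,  # number of pictures per batch
    "lr0": 0.01,       # initial learning rate
}
```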
Step 4, acquiring image frames F_t (t = 1, 2, 3, ...) from the monitoring video stream, sending each acquired frame into the target detection model trained in step 3, and performing target detection on the input frame to obtain a pedestrian target detection result, which comprises the pedestrian position information marked by a target detection box and a target detection category; the categories comprise the front and the back of a pedestrian. Once trained, the model can run detection continuously, as sketched below.
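A hedged sketch of this acquisition-and-detection loop follows. `detect_head_shoulder` is a hypothetical wrapper around the trained YOLOv7 model (not an API from the patent), returning one (u, v, w, h, score, cls) tuple per pedestrian with cls in {"front", "back"}; the stream URL is a placeholder.

```python
import cv2

def detect_head_shoulder(frame):
    """Hypothetical wrapper around the trained detector; returns a list of
    (u, v, w, h, score, cls) tuples, cls being "front" or "back"."""
    return []  # placeholder: plug in the trained YOLOv7 model here

cap = cv2.VideoCapture("rtsp://camera/stream")  # placeholder stream URL
t = 0
while cap.isOpened():
    ok, frame = cap.read()                      # image frame F_t
    if not ok:
        break
    t += 1
    detections = detect_head_shoulder(frame)
    # ... hand `detections` to the OC-SORT tracking stage of step 5 ...
cap.release()
```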
Step 5, pedestrian target tracking
Multi-target tracking is performed on the pedestrian detection results obtained in step 4 using the OC-SORT algorithm. OC-SORT uses a Kalman-filter-based multi-target tracker to predict and update the state of each target track, and matches tracks through HIoU together with track-direction consistency.
Specifically, step 5 includes the following steps:
Step 5.1, obtaining the tracker predictions
Let T = {T_1, T_2, ..., T_n} be the set of center-point tracks of the target detection boxes obtained from pedestrian detection. For each track, a Kalman filter computes its predicted box B̂_t in the current image frame from the track's observed positions in previous frames. The track set T records, for every frame of the image, the center-point coordinates of each successfully matched pedestrian together with the detection category at that position; the tracks are mutually independent, and each is assigned a unique ID i to distinguish different pedestrian targets.
Specifically, this step uses a Kalman filter with a linear constant-velocity model to approximate the displacement of each pedestrian target between frames. The relevant parameters are as follows:
The state of a pedestrian target is modeled as x = [u, v, s, r, u̇, v̇, ṡ]ᵀ, where x is the target's state vector, u and v are the horizontal and vertical coordinates of the center of the target detection box, s and r are the area and aspect ratio of the target detection box, and u̇, v̇, ṡ are the time derivatives of u, v, s respectively.
The observation vector is z = [u, v, s, r]ᵀ, converted from the initial bounding box obtained by target detection.
The state covariance matrix P, the process noise covariance Q and the measurement noise covariance R are initialized to preset constant matrices.
Following the Kalman prediction step, the state information x_{t-1} of a track at time t-1 yields the track's predicted position at time t, i.e. the predicted box B̂_t.
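The prediction step can be sketched with numpy as below, assuming the 7-dimensional state described above; the covariance initializations are placeholders, since the patent only states that P, Q and R are preset constants.

```python
import numpy as np

# Constant-velocity transition matrix for x = [u, v, s, r, du, dv, ds].
F = np.eye(7)
F[0, 4] = F[1, 5] = F[2, 6] = 1.0   # u += du, v += dv, s += ds per frame
H = np.eye(4, 7)                    # observation model: z = [u, v, s, r]

def kalman_predict(x: np.ndarray, P: np.ndarray, Q: np.ndarray):
    """One Kalman prediction step: returns predicted state and covariance."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def state_to_box(x: np.ndarray) -> np.ndarray:
    """Convert a state [u, v, s, r, ...] to a box (x1, y1, x2, y2)."""
    u, v, s, r = x[:4]
    w = np.sqrt(s * r)          # s = w*h, r = w/h  =>  w = sqrt(s*r)
    h = s / w
    return np.array([u - w / 2, v - h / 2, u + w / 2, v + h / 2])
```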
If the track set T is empty at the current moment, step 5.5 is executed directly.
Step 5.2, extracting each track's velocity vector V_t from the track set T and associating the target detection boxes D_t with the predicted boxes B̂_t;
The velocity vector V_t is computed from the position change over the two most recent consecutive frames of the track: the difference between the two center points is taken and normalized. Let the center points of the two most recent consecutive frame positions be p_{t-1} = (u_{t-1}, v_{t-1}) and p_t = (u_t, v_t); then V_t = (p_t - p_{t-1}) / ‖p_t - p_{t-1}‖.
The association cost is C = C_HIoU + λ·C_v. C_HIoU is computed from the IoU value and the height intersection ratio between a predicted box and a target detection box, HIoU = [S_I / (S_p + S_d - S_I)] · [h_I / (h_p + h_d - h_I)], where S_p is the area of the predicted box, S_d the area of the target detection box, S_I their intersection area, h_p the height of the predicted box, h_d the height of the target detection box, and h_I the overlap height of the predicted box and the target detection box; only the results whose HIoU exceeds a preset threshold are retained. C_v is the angular difference between the track's velocity vector and the velocity vector formed by the track's previously observed position and the newly detected position, and λ is a weight factor set to a preset constant. According to the association cost C, linear assignment is performed with the Hungarian algorithm to obtain the optimal matching between predicted boxes and target detection boxes, returning three kinds of results: T_matched (matched tracks), T_unmatched (unmatched tracks), and D_unmatched (unmatched target detection boxes);
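The association described above can be sketched as follows. The weight lam and the HIoU retention threshold are placeholders (the patent fixes them as preset constants); boxes are (x1, y1, x2, y2) arrays, and scipy's linear_sum_assignment plays the role of the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hiou(a, b) -> float:
    """IoU multiplied by the height intersection ratio of two boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    h_i = max(0.0, y2 - y1)                     # overlap height h_I
    h_a, h_b = a[3] - a[1], b[3] - b[1]
    return iou * h_i / (h_a + h_b - h_i + 1e-9)

def angle_cost(v_track, p_prev, p_new) -> float:
    """Angular difference between the track velocity and the direction
    from the previous observation to the new detection (the C_v term)."""
    d = np.asarray(p_new, float) - np.asarray(p_prev, float)
    d /= np.linalg.norm(d) + 1e-9
    cos = float(np.clip(np.dot(v_track, d), -1.0, 1.0))
    return np.arccos(cos) / np.pi               # normalized to [0, 1]

def associate(pred_boxes, det_boxes, velocities, prev_centers,
              lam=0.2, thresh=0.1):
    cost = np.zeros((len(pred_boxes), len(det_boxes)))
    sim = np.zeros_like(cost)
    for i, pb in enumerate(pred_boxes):
        for j, db in enumerate(det_boxes):
            sim[i, j] = hiou(pb, db)
            center = ((db[0] + db[2]) / 2, (db[1] + db[3]) / 2)
            cost[i, j] = (1.0 - sim[i, j]) + lam * angle_cost(
                velocities[i], prev_centers[i], center)
    rows, cols = linear_sum_assignment(cost)     # Hungarian assignment
    return [(i, j) for i, j in zip(rows, cols) if sim[i, j] >= thresh]
```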
Step 5.3, if unmatched target detection boxes D_unmatched and unmatched tracks T_unmatched remain, performing a second round of association: for the unmatched tracks, computing the association cost between the remaining target detection boxes and each track's last observed position, retaining the results exceeding the threshold, performing linear assignment with the Hungarian algorithm to obtain the optimal matching, and updating T_unmatched and D_unmatched;
Step 5.4, track update
For each successfully matched detection result and track, the detection result is used to update the track's velocity vector and the tracker's observed position; the miss count of each unmatched track in T_unmatched is incremented by 1;
Step 5.5, creating new tracks
For each unmatched detection result in D_unmatched, a new track and tracker are created, and the detection is initialized as the first observed position of the new track.
Step 5.6, if a track's miss count reaches 30, meaning the track has not been updated within the set time, the track is deleted.
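The bookkeeping of steps 5.4 to 5.6 can be sketched as below; the class and field names are illustrative, with MAX_MISSES following the stated deletion threshold of 30.

```python
from dataclasses import dataclass, field

MAX_MISSES = 30  # step 5.6: delete a track after 30 consecutive misses

@dataclass
class Track:
    track_id: int
    observations: list = field(default_factory=list)  # (u, v, w, h, score, cls)
    misses: int = 0

    def update(self, obs):
        """Step 5.4: a matched detection refreshes the track."""
        self.observations.append(obs)
        self.misses = 0

def step_tracks(tracks, matches, unmatched_dets, next_id):
    matched_ids = {id(trk) for trk, _ in matches}
    for trk, det in matches:
        trk.update(det)
    for trk in tracks:                       # step 5.4: unmatched tracks
        if id(trk) not in matched_ids:
            trk.misses += 1
    for det in unmatched_dets:               # step 5.5: create new tracks
        trk = Track(next_id)
        trk.update(det)
        tracks.append(trk)
        next_id += 1
    tracks[:] = [t for t in tracks if t.misses < MAX_MISSES]  # step 5.6
    return next_id
```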
Step 6, retrograde behavior determination
Traverse all tracks in T and select those whose recorded observation length exceeds 3. For each selected track, extract the initial observation position box_0 and the observation position box_t at time t; compute the displacement variation of the target within the track from the extracted observations, and judge whether the track exhibits retrograde behavior by combining the target detection category and the velocity-vector result. An observation position is box = (u, v, w, h, score, cls), where u is the abscissa of the center point of the target detection box, v is the ordinate of the center point, w is the width of the target detection box, h is the height of the target detection box, score is the confidence, and cls is the detection category. In a real scene the height of the detection box does not fluctuate greatly under occlusion, limb motion and similar conditions, so it reflects the true size of the target well. The following quantities are computed from the observations extracted from a track:
Displacement variation: shift = (v_t - v_0) / ((h_t + h_0) / 2), where v_t is the ordinate of the center point of the target detection box at time t, v_0 is the ordinate of the center point at the initial time, h_t is the height of the target detection box at time t, and h_0 is the height of the target detection box at the initial time;
Count ratio of the front category in the track record: ratio_front = count(cls = front) / count(all observations), where count(·) denotes the number of observations satisfying the condition.
The method for judging whether a track exhibits retrograde behavior is: the displacement variation shift is greater than 0.25; the count ratio of the front category in the track record, ratio_front, is greater than 0.7; the target detection category cls_t at the last observation position, i.e. the observation at time t, is front; and the velocity vector V_t of the track at time t has a positive vertical component, i.e. the target is moving toward the camera. A track satisfying all of the above conditions is judged to be retrograde.
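Putting step 6 together, a decision sketch follows; it assumes the observation tuples used earlier and interprets the velocity condition as requiring downward (toward-camera) image motion, i.e. a positive vertical component.

```python
def is_retrograde(observations) -> bool:
    """observations: list of (u, v, w, h, score, cls), one per matched frame."""
    if len(observations) <= 3:           # only tracks longer than 3 are judged
        return False
    _, v0, _, h0, _, _ = observations[0]
    _, vt, _, ht, _, cls_t = observations[-1]
    shift = (vt - v0) / ((ht + h0) / 2)  # normalized displacement variation
    front_ratio = sum(o[5] == "front" for o in observations) / len(observations)
    vy = vt - observations[-2][1]        # sign of the vertical velocity
    return shift > 0.25 and front_ratio > 0.7 and cls_t == "front" and vy > 0
```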
While embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that alterations, modifications, substitutions and variations may be made by those of ordinary skill in the art without departing from the scope of the invention.

Claims (6)

1. A pedestrian retrograde judgment method based on multi-target tracking is characterized by comprising the following steps:
step 1, constructing a target detection data set
collecting open-source person datasets from the web together with pedestrian image data from the monitored scene, using LabelImg to mark rectangular boxes around head-and-shoulder targets, and dividing the images whose annotations meet the standard into a training set and a validation set;
step 2, constructing an improved YOLOv7 target detection model
the improved YOLOv7 target detection model is based on the YOLOv7 network structure; a convolution layer with an attention mechanism module CBAM is added between layer 50 and layer 51, consisting of a 1×1 standard convolution followed by a CBAM, and a CBAM is added inside the SPPCSPC layer at layer 51 of the YOLOv7 network structure;
step 3, training the improved YOLOv7 target detection model
training the target detection model constructed in step 2 on the training set obtained in step 1 to obtain a trained target detection model;
step 4, acquiring image frames F_t (t = 1, 2, 3, ...) from a monitoring video stream, sending each acquired frame into the target detection model trained in step 3, and performing target detection on the input frame to obtain a pedestrian target detection result, which comprises the pedestrian's position information marked by a target detection box and a target detection category, the categories comprising the front and the back of a pedestrian;
Step 5, pedestrian target tracking
Carrying out multi-target tracking on the pedestrian target detection result obtained in the step 4 by adopting an OC-SORT algorithm;
the step 5 specifically comprises the following steps:
step 5.1, obtaining the tracker predictions
let T = {T_1, T_2, ..., T_n} be the set of center-point tracks of the target detection boxes obtained from pedestrian detection; for each track, a Kalman filter computes its predicted box B̂_t in the current image frame from the track's observed positions in previous frames;
if the track set T is empty at the current moment, step 5.5 is executed;
step 5.2, extracting each track's velocity vector V_t from the track set T and associating the target detection boxes D_t with the predicted boxes B̂_t;
letting the center points of the two most recent consecutive frame positions be p_{t-1} = (u_{t-1}, v_{t-1}) and p_t = (u_t, v_t), then V_t = (p_t - p_{t-1}) / ‖p_t - p_{t-1}‖;
the association cost is C = C_HIoU + λ·C_v, where C_HIoU is computed from the IoU value and the height intersection ratio between a predicted box and a target detection box, HIoU = [S_I / (S_p + S_d - S_I)] · [h_I / (h_p + h_d - h_I)], with S_p the area of the predicted box, S_d the area of the target detection box, S_I their intersection area, h_p the height of the predicted box, h_d the height of the target detection box, and h_I the overlap height of the predicted box and the target detection box; only the results whose HIoU exceeds a preset threshold are retained; C_v is the angular difference between the track's velocity vector and the velocity vector formed by the track's previously observed position and the newly detected position, and λ is a weight factor; according to the association cost C, linear assignment is performed with the Hungarian algorithm to obtain the optimal matching between predicted boxes and target detection boxes, returning three kinds of results: T_matched (matched tracks), T_unmatched (unmatched tracks), and D_unmatched (unmatched target detection boxes);
step 5.3, if unmatched target detection boxes D_unmatched and unmatched tracks T_unmatched remain, performing a second round of association: for the unmatched tracks, computing the association cost between the remaining target detection boxes and each track's last observed position, retaining the results exceeding the threshold, performing linear assignment with the Hungarian algorithm to obtain the optimal matching, and updating T_unmatched and D_unmatched;
step 5.4, track update
for each successfully matched detection result and track, using the detection result to update the track's velocity vector and the tracker's observed position, and incrementing the miss count of each unmatched track in T_unmatched by 1;
step 5.5, creating new tracks
for each unmatched detection result in D_unmatched, creating a new track and tracker and initializing the detection as the first observed position of the new track;
step 6, retrograde behavior determination
traversing all tracks in T, selecting the tracks whose recorded observation length exceeds 3, extracting each selected track's initial observation position box_0 and its observation position box_t at time t, computing the displacement variation of the target within the track from the extracted observations, and judging whether the track exhibits retrograde behavior from the target detection category and the velocity-vector result; the method in step 6 for judging whether a track exhibits retrograde behavior is: the displacement variation is greater than 0.25, the count ratio of the front category in the track record is greater than 0.7, the target detection category at the last observation position is front, and the vertical component of the track's velocity vector V_t is positive; a track satisfying all of the above conditions is judged to be retrograde.
2. The pedestrian retrograde judgment method based on multi-target tracking according to claim 1, wherein: in step 2, the convolution layer added between layer 50 and layer 51 has 1024 output channels, and the CBAM added between layer 50 and layer 51 has a channel-reduction coefficient reduction = 16 and a convolution kernel size k = 49.
3. The pedestrian retrograde judgment method based on multi-target tracking according to claim 1, wherein: in step 2, the attention mechanism module CBAM is composed of a channel attention module and a spatial attention module, where the channel attention module comprises a fully connected layer and two ReLU activation layers, and the spatial attention module comprises a convolution layer and a sigmoid activation function.
4. The pedestrian retrograde judgment method based on multi-target tracking according to claim 1, wherein: in step 3, the training method is the YOLOv7 model training procedure, with training parameters: 400 iterations, image size 640, batch size 32, and learning rate 0.01.
5. The pedestrian retrograde judgment method based on multi-target tracking according to claim 1, wherein: in step 5, the OC-SORT algorithm uses a Kalman-filter-based multi-target tracker to predict and update the state of each target track, and performs matching through HIoU together with track-direction consistency.
6. The pedestrian retrograde judgment method based on multi-target tracking according to claim 1, wherein: a step 5.6 is also included: if a track's miss count reaches 30, the track is deleted.
CN202410354175.6A 2024-03-27 2024-03-27 Pedestrian retrograde judgment method based on multi-target tracking Active CN117953546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410354175.6A 2024-03-27 2024-03-27 Pedestrian retrograde judgment method based on multi-target tracking CN117953546B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410354175.6A 2024-03-27 2024-03-27 Pedestrian retrograde judgment method based on multi-target tracking CN117953546B (en)

Publications (2)

Publication Number Publication Date
CN117953546A CN117953546A (en) 2024-04-30
CN117953546B (en) 2024-06-11

Family

ID=90792507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410354175.6A Active CN117953546B (en) 2024-03-27 2024-03-27 Pedestrian retrograde judgment method based on multi-target tracking

Country Status (1)

Country Link
CN (1) CN117953546B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860282A (en) * 2020-07-15 2020-10-30 中国电子科技集团公司第三十八研究所 Subway section passenger flow volume statistics and pedestrian retrograde motion detection method and system
CN114529799A (en) * 2022-01-06 2022-05-24 浙江工业大学 Aircraft multi-target tracking method based on improved YOLOV5 algorithm
WO2023124133A1 (en) * 2021-12-29 2023-07-06 上海商汤智能科技有限公司 Traffic behavior detection method and apparatus, electronic device, storage medium, and computer program product


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Dark-SORT: Multi-Person Tracking in Underground Coal Mines Using Adaptive Discrete Weighting; Rui Wang et al.; IEEE Access, vol. 11; 2023-12-07 *
Experiment study on pedestrian abnormal behavior detection and crowd stability analysis in cross passages; Cuiling Li et al.; International Workshop on Automation, Control, and Communication Engineering (IWACCE 2022), vol. 12492; 2022-12-09 *
Research on an intelligent road-network operation-state monitoring system based on traffic video; Li Xueqian; China Master's Theses Full-text Database, Engineering Science & Technology II, No. 3; 2021-03-15 *

Also Published As

Publication number Publication date
CN117953546A (en) 2024-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant