CN112541440B - Subway people stream network fusion method and people stream prediction method based on video pedestrian recognition - Google Patents


Info

Publication number
CN112541440B
CN112541440B (application CN202011485904.XA)
Authority
CN
China
Prior art keywords
pedestrian
track
traffic
target
subway
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011485904.XA
Other languages
Chinese (zh)
Other versions
CN112541440A (en)
Inventor
徐超
高思斌
李少利
李永强
戴李杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETHIK Group Ltd
Original Assignee
CETHIK Group Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETHIK Group Ltd filed Critical CETHIK Group Ltd
Priority to CN202011485904.XA
Priority to PCT/CN2020/137804 (WO2022126669A1)
Publication of CN112541440A
Application granted
Publication of CN112541440B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 - Recognition of crowd images, e.g. recognition of crowd congestion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Abstract

The invention discloses a subway people flow network fusion method and a people flow prediction method based on video pedestrian recognition. Video data from subway stations are used to statistically analyse the specific directions of the people flows entering and leaving each station. Fusing the subway network with the ground traffic network places all pedestrian movement within a single large network, in which each subway station becomes a node and each subway line and ground road becomes an edge. A graph neural network then infers the people flow changes across the whole network, so that the number and direction of the people flows at each station can be analysed and predicted. This enables a finer-grained people flow analysis at the entrances and exits of each subway station, supports resource scheduling for each station and its entrances, and allows the mutual influence of above-ground and underground people flows to be predicted in time, so that traffic early warning can be issued above ground or underground, congestion avoided, and station security measures deployed in advance.

Description

Subway people stream network fusion method and people stream prediction method based on video pedestrian recognition
Technical Field
The invention belongs to the technical field of smart cities, and particularly relates to a subway people flow network fusion method and a people flow prediction method based on video pedestrian recognition.
Background
Video monitoring is increasingly used in the digital security field, and counting people through video is correspondingly important: people counting data from stations, tourist attractions, exhibition areas, commercial streets and similar places make it possible to mobilise personnel effectively, allocate resources and provide better security.
Existing subway people flow prediction is generally based on the entry and exit card-swiping data of each station, so the result only covers the people flow at station gates. Typically, a station passenger flow prediction model is built by analysing the historical card-swiping data of a subway station together with a road network map and is used to predict future passenger flow changes, for example the number of entries and exits at each station for every 10-minute period from 00:00 to 24:00 on a future day. Existing people flow statistics schemes mainly monitor the crowd density of subway carriages in real time through infrared sensors, cameras, communication data and the like. Image-based methods, for example, exploit the different degrees to which crowds of different densities occlude the light sources in images captured by cameras: crowd distribution images at different densities are analysed and compared over time to estimate the crowd density inside the carriages.
However, the existing schemes have the following problems: each station generally has several entrances and exits, and each entrance connects to several possible routes. Existing schemes cannot analyse the subdivided directions of the people flow entering and leaving a subway station, so the above-ground and underground traffic networks cannot be deeply fused and the resources of the station entrances cannot be allocated reasonably; likewise, the influence of the station's incoming and outgoing people flow on ground traffic cannot be obtained.
Disclosure of Invention
The invention aims to provide a subway people flow network fusion method and a people flow prediction method based on video pedestrian recognition, which connect above-ground traffic routes with underground subway routes, refine the people flow of each station and the outbound or inbound direction of that flow, and improve the accuracy of people flow prediction.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the subway people flow network fusion method based on video pedestrian recognition is used to realise fused statistics of subway and ground people flows to assist traffic early warning, and comprises the following steps:
step 1, receiving monitoring images of all entrances and exits of a subway station, the monitoring images being acquired by image acquisition devices arranged at the entrances;
step 2, extracting pedestrian target coordinate frame information and pedestrian target feature information from the monitoring images, the pedestrian target feature information comprising pedestrian features, the pedestrian entry/exit state and the pedestrian's outbound or inbound direction;
step 3, based on the monitoring images of the same image acquisition device, performing similarity calculation on the pedestrian target coordinate frame information and pedestrian target feature information to obtain a pedestrian track for each pedestrian;
step 4, matching the similarity of pedestrian target feature information across the pedestrian tracks of different image acquisition devices, combining successfully matched tracks, and updating the pedestrian track of the corresponding pedestrian;
step 5, obtaining the subway routes, subway stations and station entrances within a designated area and the ground traffic routes corresponding to each entrance, and fusing them to construct a subway traffic network map of the area;
step 6, counting, from the pedestrian tracks within the latest preset time period, the total incoming and outgoing people flow of each station and the incoming and outgoing people flow on the traffic routes corresponding to each entrance of each station;
and step 7, superimposing, on the subway traffic network map, the total incoming and outgoing people flow of each station and the incoming and outgoing people flow on the traffic routes corresponding to each entrance, to obtain a people flow movement network map fusing subway and ground people flows.
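For illustration only, steps 5 to 7 above can be sketched with plain Python structures; the function name, input layout and all sample values are hypothetical and not part of the claimed method:

```python
def build_flow_network(stations, gate_routes, trajectories):
    """Fuse subway stations with the ground route at each entrance into
    one graph (step 5), then overlay per-station totals and per-entrance
    in/out flow counts from the tracked trajectories (steps 6-7)."""
    graph = {
        # one vertex per station, carrying its total in/out people flow
        "vertices": {s: {"in": 0, "out": 0} for s in stations},
        # one edge per (station, entrance), carrying its ground route
        "edges": {k: {"route": r, "in": 0, "out": 0}
                  for k, r in gate_routes.items()},
    }
    for station, gate, direction in trajectories:  # direction: "in"/"out"
        graph["vertices"][station][direction] += 1
        graph["edges"][(station, gate)][direction] += 1
    return graph
```

Each trajectory here is assumed to be already reduced to the station, entrance and direction recovered in steps 1 to 4; a real deployment would additionally key devices by their ids and timestamps.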
The following provides several preferred options; they are not additional limitations on the overall scheme above but further additions or preferences, and, absent technical or logical contradiction, each option may be combined individually with the overall scheme or combined with other options.
Preferably, performing similarity calculation on the pedestrian target coordinate frame information and pedestrian target feature information, based on the monitoring images of the same image acquisition device, to obtain a pedestrian track for each pedestrian comprises:
step 3.1, acquiring the pedestrian target coordinate frame information and pedestrian target feature information of the current image acquisition device in the current monitoring image;
step 3.2, judging whether the tracking track set corresponding to the image acquisition device is empty, the tracking track set being used for storing pedestrian tracks; if it is not empty, executing step 3.3; otherwise, directly adding the acquired pedestrian target coordinate frame information and pedestrian target feature information into the tracking track set and ending;
step 3.3, obtaining estimated target coordinate frame information by unscented Kalman filtering based on the pedestrian tracks in the set;
step 3.4, calculating, one by one, the coordinate frame similarity between the current pedestrian targets and the stored pedestrian targets from the pedestrian target coordinate frame information and the estimated target coordinate frame information; calculating, one by one, the feature similarity between the current and stored pedestrian targets from the pedestrian target feature information of the tracks and that of the current monitoring image; and obtaining the similarity between current and stored pedestrian targets as the weighted sum of the coordinate frame similarity and the feature similarity;
step 3.5, matching the pedestrian tracks in the set against the currently acquired pedestrian targets with the Hungarian matching algorithm, based on those similarities;
step 3.6, if a pedestrian target is not successfully matched this time, directly adding its coordinate frame information and feature information into the tracking track set and marking it as a new track; if a pedestrian track and a pedestrian target are successfully matched, updating that pedestrian's track with the matched target's coordinate frame and feature information; if a track marked as a new track is successfully matched several times, removing its new-track mark; if a track fails to match for several consecutive frames, considering that the pedestrian has left the monitoring range of the current image acquisition device and marking the track as a departure track; and if a track marked as a departure track is not matched within a specified time threshold, considering the track finished and deleting it from the tracking track set.
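For illustration, the association of steps 3.4 and 3.5 can be sketched as follows. The method itself predicts boxes with unscented Kalman filtering and assigns with the Hungarian algorithm; this dependency-free sketch substitutes a greedy best-first assignment, and the weights and threshold are illustrative assumptions, not values fixed by the patent:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(x * x for x in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def associate(tracks, detections, w_box=0.5, w_feat=0.5, min_sim=0.3):
    """Step 3.4: weighted sum of box and feature similarity; step 3.5:
    one-to-one assignment (greedy best-first here, standing in for the
    Hungarian algorithm named by the method).  tracks carry the box
    estimated by the filter of step 3.3; both sides are dicts with
    "box" and "feat".  Returns matched (track_idx, detection_idx)
    pairs; unmatched indices drive the bookkeeping of step 3.6."""
    scored = sorted(
        ((w_box * iou(t["box"], d["box"]) + w_feat * cosine(t["feat"], d["feat"]), i, j)
         for i, t in enumerate(tracks) for j, d in enumerate(detections)),
        reverse=True)
    pairs, used_t, used_d = [], set(), set()
    for s, i, j in scored:
        if s < min_sim:
            break
        if i not in used_t and j not in used_d:
            pairs.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return pairs
```

An optimal assignment (e.g. `scipy.optimize.linear_sum_assignment`) would be used where the greedy result is not good enough.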
Preferably, matching the similarity of pedestrian target feature information across the pedestrian tracks of different image acquisition devices, combining successfully matched tracks and updating the pedestrian track of the corresponding pedestrian comprises:
step 4.1, taking the tracking track set of one image acquisition device and calculating, one by one, the similarity between its pedestrian tracks marked as new tracks and the pedestrian tracks marked as departure tracks in the tracking track sets of the other image acquisition devices;
step 4.2, if the similarity is larger than a preset threshold, the two pedestrian tracks are successfully matched;
and step 4.3, combining the two successfully matched pedestrian tracks into a new pedestrian track for the pedestrian, and replacing the corresponding pedestrian track in the tracking track set containing the new track with the combined track.
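Steps 4.1 to 4.3 can be sketched as follows, assuming each track carries an aggregated appearance feature vector; the 0.8 threshold and the track layout are hypothetical:

```python
def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(x * x for x in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def merge_cross_camera(new_track, leaving_tracks, threshold=0.8):
    """Match a track newly started under one camera against tracks that
    recently left other cameras (step 4.1), by appearance-feature
    similarity against a preset threshold (step 4.2), and merge the
    trajectories on success (step 4.3).  Tracks are dicts with a "feat"
    appearance vector and a "points" trajectory list."""
    best, best_sim = None, threshold
    for cand in leaving_tracks:
        s = cosine(new_track["feat"], cand["feat"])
        if s > best_sim:
            best, best_sim = cand, s
    if best is None:
        return new_track                       # no match: keep the new track
    return {"feat": new_track["feat"],         # merged trajectory
            "points": best["points"] + new_track["points"]}
```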
The invention also provides a people flow prediction method that performs people flow prediction based on the fusion of subway and ground people flows to assist traffic early warning, comprising:
obtaining the people flow movement network map within a specified time period with the subway people flow network fusion method based on video pedestrian recognition;
predicting, with a graph neural network and based on the people flow movement network map, the total incoming and outgoing predicted people flow of each station in a future specified time period;
obtaining, from the incoming and outgoing people flow on the traffic routes corresponding to each entrance of each station in the map, the average incoming and outgoing proportion of each such route;
and distributing the total incoming and outgoing predicted people flow of each station according to those average proportions to obtain the incoming and outgoing predicted people flow on the traffic route corresponding to each entrance of each station.
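The final distribution step reduces to proportional allocation, sketched below; the gate names and history values are hypothetical:

```python
def distribute_prediction(total_in, gate_in_history):
    """Split a station's predicted total incoming flow over its gates
    using each gate's historical share of the incoming flow (the
    "average incoming proportion" of the method).  gate_in_history maps
    gate -> list of past per-interval counts."""
    means = {g: sum(h) / len(h) for g, h in gate_in_history.items()}
    total_mean = sum(means.values())
    if total_mean == 0:
        share = 1.0 / len(means)               # no history: split evenly
        return {g: total_in * share for g in means}
    return {g: total_in * m / total_mean for g, m in means.items()}
```

The same allocation would be applied separately to the outgoing totals.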
Preferably, the people flow movement network map within the specified time period is obtained with the subway people flow network fusion method based on video pedestrian recognition described above, namely steps 1 to 7, optionally including the single-device tracking of steps 3.1 to 3.6 and the cross-device track merging of steps 4.1 to 4.3.
Preferably, predicting, with a graph neural network and based on the people flow movement network map, the total incoming and outgoing predicted people flow of each station in a future specified time period comprises:
in the people flow movement network map, taking the subway stations as vertices and the traffic routes corresponding to the entrances of each station as edges, each vertex carrying a feature vector containing its total incoming and outgoing people flow; the model of the people flow movement network map is constructed as:
G_t = (V_t, ε, W)

where G_t is the people flow movement network map at time t, V_t is the vector composed of the feature vectors of all vertices, ε is the set of edges between vertices, W is the weighted adjacency matrix, and t is the current moment;
when predicting the people flow at each vertex, the feature vectors of the vertex from time t-M+1 to time t in the historical period are used to predict its feature vectors from time t+1 to time t+H in the future specified period, where M and H are preset coefficients; the people flow prediction target model is constructed as:

v̂_{t+1}, ..., v̂_{t+H} = argmax_{v_{t+1}, ..., v_{t+H}} log P(v_{t+1}, ..., v_{t+H} | v_{t-M+1}, ..., v_t)

where v̂_{t+1}, ..., v̂_{t+H} are the predicted feature vectors from time t+1 to time t+H, and v_{t-M+1}, ..., v_t are the input feature vectors from time t-M+1 to time t;
based on this people flow prediction target model, the model is solved with a graph neural network to obtain the total incoming and outgoing predicted people flow of each station in the future specified time period.
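For illustration, the history/horizon windowing that feeds such a graph model can be sketched as below; the function name and sample series are hypothetical:

```python
def sliding_windows(series, M, H):
    """Build (history, target) pairs for the forecasting target model:
    from v_{t-M+1}..v_t predict v_{t+1}..v_{t+H}.  series is a list of
    per-interval vertex feature vectors; a spatio-temporal graph
    network such as STGCN would consume the history windows together
    with the weighted adjacency matrix W."""
    samples = []
    for t in range(M - 1, len(series) - H):
        history = series[t - M + 1: t + 1]   # v_{t-M+1} ... v_t
        target = series[t + 1: t + H + 1]    # v_{t+1} ... v_{t+H}
        samples.append((history, target))
    return samples
```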
According to the subway people flow network fusion method and people flow prediction method based on video pedestrian recognition, video data from subway stations are used to statistically analyse the specific directions of the people flows entering and leaving each station, and above-ground road traffic routes and underground subway lines are connected into one complete traffic network measurable both above and below ground. Fusing the subway network with the ground traffic network places all pedestrian movement within a single large network, in which each subway station becomes a node and each subway line and ground road becomes an edge. A graph neural network then infers the people flow changes across the whole network, so that the number and direction of the people flows at each station can be analysed and predicted. This enables finer-grained people flow analysis at the entrances of each subway station, supports resource scheduling for each station and its entrances, and allows the mutual influence of above-ground and underground people flows to be predicted in time, so that traffic early warning can be issued above ground or underground, congestion avoided, and station security measures deployed in advance.
Drawings
FIG. 1 is a flow chart of the subway people flow network fusion method based on video pedestrian recognition;
FIG. 2 is a training schematic diagram of the SSD target detection network of the present invention;
FIG. 3 is a schematic diagram of training the MobileNet neural network according to the present invention;
FIG. 4 is a schematic diagram of the distillation operation of the neural network of the present invention;
FIG. 5 is a flow chart of pedestrian trajectory tracking in accordance with the present invention;
FIG. 6 is a flow chart of the multi-factor fusion pedestrian target tracking method of the present invention;
FIG. 7 is a flow chart of the target tracking method based on pedestrian image features of the present invention;
FIG. 8 is a schematic representation of one embodiment of the subway traffic network map of the present invention;
FIG. 9 is a schematic diagram of problem modeling based on spatio-temporal sequences in accordance with the present invention;
FIG. 10 is a schematic diagram of the STGCN framework of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are apparently only some, rather than all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art from these embodiments without inventive effort fall within the protection scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In one embodiment, a subway people flow network fusion method based on video pedestrian recognition is provided, which establishes the connection between above-ground traffic routes and underground subway routes and analyses the incoming and outgoing people flow directions of every entrance of every subway station. Associating above-ground (i.e. ground) and underground (i.e. subway) people flows overcomes the shortcoming of existing statistics, which consider only a single above-ground or underground layer and ignore the mutual influence of the two, so that their statistical or prediction accuracy suffers. Based on this fused above-ground and underground network, people flow statistics can drive resource scheduling for every entrance of every station; the above-ground traffic network can be combined to give people flow early warning for subway stations, and the underground subway network can likewise be combined to give people flow early warning for above-ground traffic routes, improving the foresight and timeliness of traffic control.
As shown in fig. 1, the subway stream network fusion method based on video pedestrian recognition in this embodiment includes the following steps:
and step 1, receiving monitoring images of all entrances and exits of the subway station, wherein the monitoring images comprise images acquired by image acquisition equipment arranged at all the entrances and exits.
Because the above-ground and underground people flow data must be associated, the monitoring range of each image acquisition device must, when the device is deployed, cover the whole entrance and the ground traffic route corresponding to that entrance, laying the foundation for recognising pedestrians' entry/exit states and outbound or inbound directions.
The image capturing devices in this embodiment may be optical cameras, binocular cameras, TOF cameras, etc., and each image capturing device has a unique device id in order to distinguish each image capturing device. Thus, the device id of the image acquisition device corresponding to the monitoring image and the corresponding timestamp are acquired while the monitoring image is received.
It should be noted that one image acquisition device at each entrance of a subway station generally satisfies the requirements of the invention, but the invention is not limited to one device per entrance: where monitoring requirements or accuracy demand it, several image acquisition devices may be installed at one entrance to capture video more comprehensively, and devices may also be installed inside the station and along the traffic routes corresponding to the entrances to expand the capture range and obtain more complete people flow statistics and pedestrian tracks.
Step 2, extracting pedestrian target coordinate frame information and pedestrian target feature information from the monitoring images, where the pedestrian target feature information comprises the pedestrian features, the pedestrian entering/exiting state, and the pedestrian's exit or entry direction.
The pedestrian target coordinate frame information and the pedestrian target feature information are the basic information for identifying and locating a pedestrian target; this embodiment extracts both with neural networks. Many target recognition and feature recognition neural networks already exist, and this embodiment does not limit which are adopted. To ease understanding, an SSD target detection network extracts the pedestrian target coordinate frame information, and a MobileNet neural network extracts the pedestrian target feature information.
As shown in fig. 2, the training and application process of the SSD target detection network as the target recognition algorithm includes the following steps:
1. Construct a pedestrian target data set comprising image data and annotation data, where the annotation data marks the region of each pedestrian target in the corresponding image.
2. Compute the aspect ratios of the pedestrian targets in the data set and cluster them to obtain n cluster centers, i.e., n aspect ratios, which are adopted as the anchor-frame ratios of the target recognition network.
3. Enhance the image data, e.g., with color changes, random cropping, zooming in and out, and rotation, to obtain the training data.
4. Input the training data into the SSD target detection network for target recognition; the network outputs the coordinate frame of each pedestrian in the image.
5. Remove duplicate frames from the coordinate frames output by the SSD target detection network using non-maximum suppression (NMS) to obtain the final output coordinate frames.
6. Select the frames whose intersection-over-union with an annotated coordinate frame exceeds an upper threshold as positive samples, and those below a lower threshold as negative samples. Randomly select samples from the positive and negative sets, in a certain proportion and number, as training samples for the neural network.
7. Compute the loss function from the final training-sample coordinate frames and the annotation data, and adjust the network parameters by back propagation.
8. After sufficient training, the final SSD target detection network is obtained.
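The aspect-ratio clustering in step 2 above can be sketched as a one-dimensional k-means. This is only an illustrative sketch; the quantile initialization is an assumption, not something the embodiment specifies:

```python
import numpy as np

def cluster_aspect_ratios(ratios, n_clusters=5, n_iter=100):
    """Cluster pedestrian box aspect ratios (w/h) with 1-D k-means;
    the resulting centers serve as anchor-frame ratios."""
    ratios = np.asarray(ratios, dtype=float)
    # deterministic init: spread centers over the ratio range via quantiles
    centers = np.quantile(ratios, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(n_iter):
        # assign each ratio to its nearest center
        labels = np.abs(ratios[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([
            ratios[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)
```

The n sorted centers are then plugged in as the anchor ratios of the detection network.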
When applying the SSD target detection network, the trained network receives an image as input and outputs pedestrian coordinate frames; non-maximum suppression is performed on these frames to delete duplicates, and finally a confidence threshold is set so that a coordinate frame is output as pedestrian target coordinate frame information only when its confidence exceeds that threshold.
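The non-maximum suppression step can be sketched as a minimal greedy NMS over (x1, y1, x2, y2) boxes:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; boxes are (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop boxes overlapping box i
    return keep
```

The kept indices are the final, de-duplicated pedestrian coordinate frames.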
As shown in fig. 3, the training and application process of the MobileNet neural network as the pedestrian re-identification algorithm includes the following steps:
1. Construct a pedestrian re-identification data set comprising image data and the pedestrian id corresponding to each image.
2. Select three pictures per training step: two different pictures of pedestrian A and one picture of another pedestrian. After image enhancement, input the three pictures into the MobileNet neural network, which outputs the pedestrian features.
3. Compute a ternary (triplet) loss from the network outputs. This loss makes the image features of the same pedestrian more similar and those of different pedestrians less similar, so it is suitable for training a pedestrian re-identification algorithm. After computing the loss, train the network by back propagation.
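The ternary loss in step 3 can be sketched in a minimal single-triplet form on L2-normalised embeddings; the margin value is an illustrative assumption:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet (ternary) loss: pull the two images of the same pedestrian
    together, push the other pedestrian's image away by at least `margin`."""
    def normalize(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)
    a, p, n = normalize(anchor), normalize(positive), normalize(negative)
    d_ap = np.linalg.norm(a - p)   # distance anchor <-> same pedestrian
    d_an = np.linalg.norm(a - n)   # distance anchor <-> different pedestrian
    return max(0.0, d_ap - d_an + margin)
```

A loss of zero means the negative is already farther than the positive by the margin, which is the geometry the re-identification features should have.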
When deploying the target detection network, the neural network needs to be distilled so that it runs faster, as shown in fig. 4. The teacher model is a trained, relatively large neural network; it is generally accurate, but has many parameters and runs slowly. The student model generally has few parameters; trained directly on the annotation data alone it is hard to train well, but distillation lets the student learn from the annotation data and the teacher model simultaneously, which yields good results.
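The joint learning from annotation data and teacher described above can be sketched as a weighted blend of hard-label cross-entropy and a soft cross-entropy against the teacher's temperature-softened outputs; the temperature and weighting values here are illustrative assumptions, not the embodiment's settings:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T   # temperature-scaled logits
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label,
                      T=4.0, alpha=0.5):
    """Blend of the hard-label term (annotation data) and the soft term
    (teacher's softened distribution); alpha balances the two."""
    p_student = softmax(student_logits)
    hard = -np.log(p_student[true_label] + 1e-12)          # label term
    q_teacher = softmax(teacher_logits, T)
    q_student = softmax(student_logits, T)
    soft = -(q_teacher * np.log(q_student + 1e-12)).sum()  # teacher term
    return alpha * hard + (1 - alpha) * soft
```

A student that agrees with both the label and the teacher incurs a lower loss than one that contradicts them, which is the training signal distillation relies on.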
When applying the MobileNet neural network, the final trained network receives image data and outputs the pedestrian target feature information corresponding to each pedestrian.
The target recognition network, feature recognition network and corresponding training and application methods provided in this embodiment extract the corresponding data from the monitoring images completely, comprehensively and accurately, providing high-quality basic information for people flow statistics.
Since a pedestrian's real trajectory cannot be accurately identified from the pedestrian's state in a single monitoring image alone, trajectory tracking is required. As shown in fig. 5, the multi-target tracking method in this embodiment has two parts. The first is a multi-factor-fusion pedestrian target tracking method within the monitoring picture of a single image acquisition device, which generates the pedestrian trajectory under that device. The second is a cross-device target tracking method based on pedestrian features, used to match the trajectories of the same pedestrian under different image acquisition devices: it computes similarity directly from the pedestrian target feature information, judges two trajectories to belong to the same pedestrian when the similarity exceeds a threshold, and associates the related trajectories. Combining the two methods yields cross-region pedestrian trajectory data, enabling complete trajectory tracking and improving the accuracy of people flow statistics.
Step 3, based on the monitoring images of a single image acquisition device, compute similarities from the pedestrian target coordinate frame information and the pedestrian target feature information to obtain the pedestrian trajectory of each pedestrian. This is the multi-factor-fusion pedestrian target tracking method; as shown in fig. 6, it includes the following steps:
and 3.1, acquiring pedestrian target coordinate frame information and pedestrian target characteristic information of the current image acquisition equipment in the current monitoring image.
Step 3.2, judging whether a tracking track set corresponding to the image acquisition equipment is empty or not, wherein the tracking track set is used for storing the pedestrian track of the pedestrian, and if the tracking track set is not empty, executing the step 3.3; otherwise, the obtained pedestrian target coordinate frame information and the pedestrian target characteristic information are directly added into the tracking track set and ended.
And 3.3, obtaining estimated target coordinate frame information by adopting unscented Kalman filtering based on the pedestrian track in the track set.
The unscented Kalman filtering is developed based on Kalman filtering and transformation, and the Kalman filtering under the linear assumption is applied to a nonlinear system by using lossless transformation. Unscented Kalman filtering is used to estimate the location of each pedestrian trajectory that is already present at the current time.
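At the core of unscented Kalman filtering is the unscented transform. A minimal numpy sketch of propagating a Gaussian through a nonlinear function follows, using standard Merwe-style sigma-point weights; this is not the embodiment's exact filter, only the transform it builds on:

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f,
    the core of the UKF predict step."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    sqrt_cov = np.linalg.cholesky((n + lam) * cov)
    # 2n+1 sigma points around the mean
    sigma = np.vstack([mean, mean + sqrt_cov.T, mean - sqrt_cov.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha ** 2 + beta)
    y = np.array([f(s) for s in sigma])       # push points through f
    y_mean = wm @ y
    diff = y - y_mean
    y_cov = (wc[:, None] * diff).T @ diff     # weighted outer products
    return y_mean, y_cov
```

For a linear f the transform is exact, which is a convenient sanity check; for the nonlinear motion of a pedestrian box it gives the predicted frame used in step 3.3.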
Step 3.4, compute, one by one, the coordinate-frame similarity between each current pedestrian target and each stored pedestrian target from the pedestrian target coordinate frame information and the estimated target coordinate frame information; compute, one by one, the feature similarity between each current (i.e., newly acquired) pedestrian target and each stored pedestrian target from the feature information of the pedestrian trajectories and the feature information in the monitoring image; and obtain the overall similarity between each current and stored pedestrian target as a weighted sum of the coordinate-frame similarity and the feature similarity.
Based on the pedestrian target estimation frames predicted from the trajectory set by unscented Kalman filtering and the pedestrian target frames of the current detection, compute indices such as the IoU of the pedestrian targets, the distance between their center points, and the difference of their sizes; then take a weighted sum of these indices and the feature similarity of the pedestrian target feature information (e.g., cosine similarity of the features) to construct the similarity between each existing trajectory and each pedestrian to be matched.
The similarity thus obtained integrates the coordinate-frame similarity and the feature similarity, matching pedestrian targets from multiple aspects and markedly improving trajectory accuracy. It should be noted that computing coordinate-frame and feature similarities is mature technology in pedestrian trajectory tracking and is not described in this embodiment; the weights of the weighted sum are set according to the emphasis of actual use.
To express the pairwise similarities intuitively, a similarity matrix can be used: the stored pedestrian targets run along one axis and the current pedestrian targets along the other, and each entry of the matrix is the similarity between the corresponding stored and current targets.
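The multi-factor similarity matrix of steps 3.3-3.4 can be sketched as follows. Only box IoU and feature cosine similarity are combined here, with illustrative equal weights; the embodiment's full version may also fold in center-point distance and size difference:

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def cosine(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_matrix(track_boxes, track_feats, det_boxes, det_feats,
                      w_box=0.5, w_feat=0.5):
    """Rows: stored tracks (UKF-predicted boxes); columns: current
    detections. Each cell is the weighted sum of box IoU and feature
    cosine similarity (weights are illustrative)."""
    sim = np.zeros((len(track_boxes), len(det_boxes)))
    for i, (tb, tf) in enumerate(zip(track_boxes, track_feats)):
        for j, (db, df) in enumerate(zip(det_boxes, det_feats)):
            sim[i, j] = w_box * iou(tb, db) + w_feat * cosine(tf, df)
    return sim
```

This matrix is exactly the input the Hungarian matching of step 3.5 consumes.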
Step 3.5, match the pedestrian trajectories in the trajectory set with the currently acquired pedestrian targets using the Hungarian matching algorithm, based on the similarities between current and stored pedestrian targets.
Step 3.6, if a pedestrian target was not matched successfully this time, directly add its pedestrian target coordinate frame information and pedestrian target feature information to the tracking trajectory set and mark it as a new trajectory; if a pedestrian trajectory and a pedestrian target were matched successfully, update that pedestrian's trajectory with the target's coordinate frame and feature information; if a trajectory marked as new has been matched successfully several times, remove its new mark; if a trajectory fails to match over several consecutive frames, consider that its pedestrian has left the monitoring range of the current image acquisition device and mark the trajectory as a leaving trajectory; if a trajectory marked as leaving is not matched successfully within a specified time threshold, consider the trajectory finished and delete it from the tracking trajectory set.
While generating each pedestrian's trajectory under a single image acquisition device, the trajectories are updated in real time. When a pedestrian newly enters the monitoring range, the new pedestrian is confirmed only after several consecutive successful matches, avoiding false detections; and when a pedestrian has left and its trajectory fails to match within the time threshold, the trajectory is deleted, reducing storage and matching pressure and increasing matching speed.
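Steps 3.5-3.6 can be sketched with the Hungarian algorithm, here assuming SciPy's `linear_sum_assignment`; the gating threshold is an illustrative value:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(sim, min_sim=0.3):
    """Hungarian assignment between stored tracks (rows) and current
    detections (columns); pairs below min_sim are rejected."""
    rows, cols = linear_sum_assignment(-sim)   # maximise total similarity
    matches, matched_r, matched_c = [], set(), set()
    for r, c in zip(rows, cols):
        if sim[r, c] >= min_sim:
            matches.append((int(r), int(c)))
            matched_r.add(int(r))
            matched_c.add(int(c))
    # tracks with no detection this frame -> candidates for "leaving"
    unmatched_tracks = [r for r in range(sim.shape[0]) if r not in matched_r]
    # detections with no track -> candidates for "new trajectory"
    new_detections = [c for c in range(sim.shape[1]) if c not in matched_c]
    return matches, unmatched_tracks, new_detections
```

The three returned lists map directly onto the bookkeeping of step 3.6: update matched trajectories, mark unmatched trajectories toward leaving, and open new trajectories for unmatched detections.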
Step 4, match the similarity of the pedestrian target feature information across the pedestrian trajectories corresponding to different image acquisition devices, merge successfully matched trajectories, and update the corresponding pedestrians' trajectories. This is the target tracking method based on pedestrian image features; as shown in fig. 7, it includes the following steps:
Step 4.1, take the tracking trajectory set corresponding to one image acquisition device, and compute, one by one, the similarity between the trajectories marked as new in that set and the trajectories marked as leaving in the tracking trajectory sets of the other image acquisition devices.
Since under normal circumstances a pedestrian cannot appear in two monitoring pictures at the same time, this embodiment matches only against the leaving trajectories of other image acquisition devices. This ensures the matching results conform to normal pedestrian movement behavior, while also reducing feature-matching pressure and speeding up cross-region matching.
Step 4.2, if the similarity exceeds a preset threshold, the two pedestrian trajectories are matched successfully. In this embodiment, the similarity is computed from the pedestrian target feature information carried by the two trajectories, which may be the feature information from the most recent monitoring image of each trajectory, or the average over the most recent several monitoring images. The similarity may be cosine similarity, matched with the Hungarian algorithm.
Step 4.3, merge the two successfully matched trajectories into the pedestrian's new trajectory, and replace the corresponding trajectory in the tracking trajectory set where the new trajectory is located with it.
In this embodiment, the two matched trajectory segments are merged, preferably by splicing them in time order to obtain a trajectory that conforms to the pedestrian's real movement path. The merged cross-region trajectory is moved to the tracking trajectory set where the new trajectory is located; that is, the leaving trajectory is moved from its original set into the set holding the new trajectory, realizing merged management of pedestrian trajectories.
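The splicing and set bookkeeping of step 4.3 can be sketched as follows; the (timestamp, x, y) point layout and the dict-keyed trajectory sets are illustrative assumptions:

```python
def merge_tracks(leaving_track, new_track):
    """Splice a 'leaving' track from one camera onto a 'new' track from
    another, in timestamp order. Each point is (timestamp, x, y)."""
    return sorted(leaving_track + new_track, key=lambda point: point[0])

def update_track_sets(leaving_set, new_set, leaving_id, new_id):
    """Move the merged trajectory into the set holding the new track,
    mirroring the bookkeeping described above (hypothetical dict layout)."""
    merged = merge_tracks(leaving_set.pop(leaving_id), new_set[new_id])
    new_set[new_id] = merged
    return new_set
```

After the move, the leaving trajectory no longer exists in its original set and the new trajectory carries the full cross-region path.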
Step 5, acquire the subway routes, subway stations, the entrances of each station within the designated area, and the ground traffic route corresponding to each entrance, and fuse them to construct the subway traffic network map of the designated area.
In this embodiment, the ground traffic route corresponding to an entrance is understood as the ground road where that entrance is located. Identifying this road is the basic operation of the invention's fusion of ground and underground traffic networks; the same applies to the installation of image acquisition equipment in step 1: if devices are installed not only at the entrances but also along all ground roads within a preset range of the subway station, the corresponding ground traffic routes may also include other ground roads connected to the road where the entrance is located. The resulting subway traffic network map is shown in fig. 8, where points represent subway stations, solid lines represent ground roads, and broken lines represent subway routes; that is, subway stations are vertices, subway routes and traffic routes are edges, and the connection points between traffic routes and stations illustrate the entrances. Naturally, since the invention mainly counts the people flow at subway station entrances and on ground roads, the subway traffic network map may also be a graph with only subway stations as vertices and ground roads as edges.
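The fused network map of step 5 can be sketched as an adjacency list; the tuple layouts for subway lines and entrance roads are illustrative assumptions:

```python
def build_metro_graph(stations, subway_lines, ground_roads):
    """Adjacency-list fusion graph: stations are vertices; subway lines
    and the ground road at each entrance are edges.

    subway_lines: iterable of (station_a, station_b)
    ground_roads: iterable of (station, entrance_id, road_name)
    """
    graph = {s: [] for s in stations}
    for a, b in subway_lines:
        # subway routes are undirected edges between stations
        graph[a].append((b, "subway"))
        graph[b].append((a, "subway"))
    for station, entrance, road in ground_roads:
        # an entrance links its station to a ground traffic route
        graph[station].append((road, f"ground:{entrance}"))
    return graph
```

People flow values measured in steps 6-7 can then be attached to these edges and vertices to form the people flow mobile network map.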
Step 6, count, from the latest pedestrian trajectories within the preset time period, the total incoming and outgoing people flow of each station and the incoming and outgoing people flow on the traffic route corresponding to each entrance of each station.
A pedestrian trajectory contains at least the pedestrian's movement path within a single image acquisition device, and the path is directed, so whether the pedestrian is entering or leaving the station can be identified from the trajectory; the total incoming and outgoing people flow within the specified time period is then obtained by counting.
Since the monitoring range of each image acquisition device includes a ground traffic route, the monitoring picture captures a pedestrian entering the picture from some direction of the traffic route before entering the station, or leaving the picture toward some direction of the traffic route after exiting. The corresponding incoming and outgoing people flow on the traffic route can thus be obtained, including each pedestrian's exit or entry direction (for example, where a pedestrian exiting the station can only turn left or right onto the route, the outgoing people flow on the traffic route comprises the flow entering the route after a left turn and the flow entering it after a right turn).
Step 7, on the basis of the subway traffic network map, superpose the total incoming and outgoing people flow of each subway station and the incoming and outgoing people flow on the traffic route corresponding to each entrance of each station, obtaining the people flow mobile network map fusing the subway and ground people flows.
The resulting people flow mobile network map shows the incoming and outgoing people flow at each entrance of each subway station, including the incoming flow, the outgoing flow, the direction in which the outgoing flow moves onto the traffic route (for example, if the traffic route corresponding to the entrance extends north-south, the flow heading south versus heading north after exiting), and the direction from which the incoming flow arrives on the traffic route (for example, the flow arriving from the south of the route versus from the north).
The pedestrians' departure or arrival directions are obtained by superposing multiple monitoring images in time order together with each pedestrian's entering/exiting state and exit or entry direction in each frame, and then generating the pedestrian trajectory. This makes it convenient to analyze possible congestion on ground traffic routes caused by subway traffic and to take evacuation and early-warning measures in time, and it is an important aid to traffic early warning around scenic spots, urban trunk roads and nearby urban events.
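The per-entrance, per-direction counting of steps 6-7 can be sketched as follows; the (entrance_id, state, direction) record layout is an illustrative assumption about how each finished trajectory is summarized:

```python
from collections import Counter

def count_flows(tracks):
    """Each track record: (entrance_id, state, direction), where state is
    'enter' or 'exit' and direction is the ground-route heading inferred
    from the time-ordered trajectory (field names are illustrative)."""
    totals = Counter()
    for entrance, state, direction in tracks:
        totals[(entrance, state)] += 1                 # in/out per entrance
        totals[(entrance, state, direction)] += 1      # ...split by heading
    return totals
```

Summing the per-entrance counters over a station gives the station's total incoming and outgoing people flow; the direction-keyed counters give the flows superposed onto the traffic routes.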
In another embodiment, a people flow prediction method is further provided, which performs prediction based on the fusion of subway and ground people flows to assist traffic early warning. The people flow prediction method includes:
obtaining a people flow mobile network map within a specified time period by the subway people flow network fusion method based on video pedestrian recognition;
predicting, based on the people flow mobile network map, the total incoming and outgoing people flow of each station within a specified future time period using a graph neural network;
obtaining, from the incoming and outgoing people flow on the traffic route corresponding to each entrance of each station in the people flow mobile network map, the average incoming and outgoing proportion of the traffic route corresponding to each entrance of each station; and
distributing each station's total predicted incoming and outgoing people flow according to the average proportions, obtaining the predicted incoming and outgoing people flow on the traffic route corresponding to each entrance of each station.
In another embodiment, obtaining the people flow mobile network map within a specified time period by the subway people flow network fusion method based on video pedestrian recognition includes:
step 1, receiving monitoring images of all entrances and exits of a subway station, wherein the monitoring images are acquired by image acquisition equipment arranged at all the entrances and exits;
step 2, extracting pedestrian target coordinate frame information and pedestrian target feature information from the monitoring images, where the pedestrian target feature information comprises the pedestrian features, the pedestrian entering/exiting state, and the pedestrian's exit or entry direction;
Step 3, based on the monitoring image of the same image acquisition equipment, performing similarity calculation according to the pedestrian target coordinate frame information and the pedestrian target characteristic information to obtain a pedestrian track aiming at the same pedestrian;
step 4, matching the similarity of the pedestrian target characteristic information based on the pedestrian tracks corresponding to different image acquisition devices, combining the pedestrian tracks successfully matched, and updating the pedestrian tracks corresponding to the pedestrians;
step 5, obtaining subway routes, subway stations, entrances and exits of all stations in the designated area and ground traffic routes corresponding to all entrances and exits, and fusing and constructing a subway traffic network map in the designated area;
step 6, counting total incoming and outgoing traffic of each station and incoming and outgoing traffic on traffic lines corresponding to each access in each station according to the latest pedestrian track in the preset time period;
and 7, superposing total inflow and outflow people flow of each station of the subway station and inflow and outflow people flow on traffic lines corresponding to each entrance and exit in each station on the basis of the subway traffic network map to obtain a subway and ground people flow fusion people flow mobile network map.
In another embodiment, the step of performing similarity calculation according to the pedestrian target coordinate frame information and the pedestrian target feature information based on the monitoring image of the same image acquisition device to obtain a pedestrian track for the same pedestrian includes:
step 3.1, acquiring pedestrian target coordinate frame information and pedestrian target characteristic information of the current image acquisition equipment in the current monitoring image;
step 3.2, judging whether a tracking track set corresponding to the image acquisition equipment is empty or not, wherein the tracking track set is used for storing the pedestrian track of the pedestrian, and if the tracking track set is not empty, executing the step 3.3; otherwise, directly adding the obtained pedestrian target coordinate frame information and the pedestrian target characteristic information into the tracking track set and ending;
and 3.3, obtaining estimated target coordinate frame information by adopting unscented Kalman filtering based on the pedestrian track in the track set.
Step 3.4, compute, one by one, the coordinate-frame similarity between each current pedestrian target and each stored pedestrian target from the pedestrian target coordinate frame information and the estimated target coordinate frame information; compute, one by one, the feature similarity between each current pedestrian target and each stored pedestrian target from the feature information of the pedestrian trajectories and the feature information in the current monitoring image; and obtain the overall similarity between each current and stored pedestrian target as a weighted sum of the coordinate-frame similarity and the feature similarity.
Step 3.5, matching the pedestrian track in the track set and the pedestrian target acquired at the time by adopting a Hungary matching algorithm based on the similarity between the current pedestrian target and the stored pedestrian targets;
step 3.6, if a pedestrian target was not matched successfully this time, directly add its pedestrian target coordinate frame information and pedestrian target feature information to the tracking trajectory set and mark it as a new trajectory; if a pedestrian trajectory and a pedestrian target were matched successfully, update that pedestrian's trajectory with the target's coordinate frame and feature information; if a trajectory marked as new has been matched successfully several times, remove its new mark; if a trajectory fails to match over several consecutive frames, consider that its pedestrian has left the monitoring range of the current image acquisition device and mark the trajectory as a leaving trajectory; if a trajectory marked as leaving is not matched successfully within a specified time threshold, consider the trajectory finished and delete it from the tracking trajectory set.
In another embodiment, the matching of the similarity of the pedestrian target feature information based on the pedestrian tracks corresponding to different image acquisition devices, and combining the pedestrian tracks successfully matched, and updating the pedestrian track of the corresponding pedestrian, includes:
step 4.1, taking a tracking track set corresponding to one image acquisition device, and calculating the similarity between the pedestrian tracks marked as new tracks in the tracking track set and the pedestrian tracks marked as leaving tracks in the tracking track sets corresponding to the other image acquisition devices one by one;
step 4.2, if the similarity is larger than a preset threshold, the matching of the two pedestrian tracks is successful;
and 4.3, combining the two successfully matched pedestrian tracks to obtain a new pedestrian track of the pedestrian, and replacing the corresponding pedestrian track in the tracking track set where the new track is positioned by using the new pedestrian track.
For the specific details of obtaining the people flow mobile network map within a specified time period by the subway people flow network fusion method based on video pedestrian recognition, reference may be made to the description of that method above, and they are not repeated here. In this embodiment, the underground-and-ground people flow network map represents the subway network of the whole city as a graph: subway stations are vertices, and the subway lines together with the ground traffic routes connecting the entrances of the stations are edges (where an edge corresponds to a subway line, this method produces no people flow data for it, or the people flow inside the subway can be obtained, on top of the invention, from existing sources such as station card-swiping records). Each vertex has a feature vector composed of the people volume counted from video, and an adjacency matrix can be defined to encode the pairwise dependency between vertices. Thus the subway network does not need to represent stations on a grid or capture features with CNNs; it can be described by a general network graph, and a graph convolutional network (GCN) can effectively capture irregular spatio-temporal dependencies at the network level rather than the grid level.
Problem modeling applies mathematical modeling to people flow prediction for the underground and above-ground people flow statistical network on each edge, using the network's historical people flow values to predict its people flow over the next several days. Specifically, the problem can be modeled with the spatio-temporal sequence shown in fig. 9: we define a city-wide subway network on a graph and focus on structured time-series people flow. The model of the people flow mobile network map is:
G_t = (V_t, ε, W)
where G_t is the people flow mobile network map at time t, a graph composed of multiple nodes; V_t is a finite set of nodes representing the vertices of the graph, used to monitor the people flow of each node, i.e., V_t is the vector of the people flow at all nodes; ε denotes the set of edges between vertices; and W, which expresses the connectivity between vertices, is the weighted adjacency matrix.
When predicting the people flow at each vertex, given the historical data of the first t moments, one or more future moments are predicted. Given the historical data from time t−M+1 to time t, the people flow from time t+1 to time t+H is predicted, and the people flow prediction target model is constructed as:
v̂_{t+1}, ..., v̂_{t+H} = argmax_{v_{t+1}, ..., v_{t+H}} log P(v_{t+1}, ..., v_{t+H} | v_{t−M+1}, ..., v_t)
where v̂_{t+1}, ..., v̂_{t+H} are the predicted feature vectors (total incoming/outgoing people flow) from time t+1 to time t+H, and v_{t−M+1}, ..., v_t are the input feature vectors from time t−M+1 to time t; the hat over v is the mathematical notation distinguishing predicted variables.
It is easy to understand that if the input data is a station's total incoming people flow from time t−M+1 to time t, the obtained predictive feature vector is likewise the total incoming people flow over the specified future time period; similarly, if the input data is a station's total outgoing people flow from time t−M+1 to time t, the prediction is the total outgoing people flow over the specified future time period.
Based on the constructed flow prediction target model, a graph neural network is adopted to solve the model, obtaining the total incoming and outgoing predicted flow of each station over the specified future time period.
For the graph neural network model, high-order features are extracted directly from the graph-structured data in the spatial domain, using a Chebyshev polynomial approximation; the Chebyshev graph convolution formula is as follows:

$$\Theta *_{\mathcal{G}} x = \Theta(L)\,x \approx \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, x$$

where $x$ is the graph signal, $\theta_k$ are the Chebyshev coefficients, $T_k$ is the Chebyshev polynomial of order k evaluated at the rescaled graph Laplacian $\tilde{L} = 2L/\lambda_{\max} - I_n$, and K is the kernel size of the graph convolution.
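A minimal sketch of this Chebyshev graph convolution uses the polynomial recurrence T_0(L~)x = x, T_1(L~)x = L~x, T_k(L~)x = 2L~T_{k-1}(L~)x - T_{k-2}(L~)x; the Laplacian, signal, and coefficients below are toy values, not from the patent:

```python
# Chebyshev-approximated graph convolution:
#   sum_{k=0}^{K-1} theta_k * T_k(L~) x
# computed with the recurrence T_0 x = x, T_1 x = L~ x,
# T_k x = 2 L~ T_{k-1} x - T_{k-2} x.

def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def cheb_conv(L_tilde, x, theta):
    """Chebyshev graph convolution of signal x with coefficients theta."""
    T_prev, T_curr = x[:], matvec(L_tilde, x)        # T_0 x, T_1 x
    out = [theta[0] * t0 for t0 in T_prev]
    if len(theta) > 1:
        out = [o + theta[1] * t1 for o, t1 in zip(out, T_curr)]
    for k in range(2, len(theta)):
        T_next = [2 * t - p for t, p in zip(matvec(L_tilde, T_curr), T_prev)]
        out = [o + theta[k] * t for o, t in zip(out, T_next)]
        T_prev, T_curr = T_curr, T_next
    return out

# Toy rescaled Laplacian L~ and node signal x (hypothetical numbers)
L_tilde = [[0.0, -0.5], [-0.5, 0.0]]
x = [1.0, 2.0]
print(cheb_conv(L_tilde, x, theta=[0.5, 0.25]))  # prints [0.25, 0.875]
```

The recurrence avoids explicitly forming the dense polynomial of the Laplacian, which is the point of the Chebyshev approximation.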
the model framework used is that of STGCN, and is composed of a plurality of space-time convolution modules, each of which is structured like a sandwich (as shown in fig. 10), with two gating sequence convolution layers and a space-diagram convolution module in between. The Temporal Gated-Conv is used for capturing time correlation and consists of a 1-D Conv and a gating linear unit GLU; the Spatial Graph-Conv is used for capturing Spatial correlation and mainly comprises the chebyshev Graph convolution module.
After the total incoming and outgoing predicted flow of each station is obtained, the total predicted flow is distributed according to the ratio average value corresponding to each entrance and exit of the station. The ratio average value of incoming flow at each entrance of a station is calculated from the incoming flow at each entrance over the corresponding time period; similarly, the ratio average value of outgoing flow at each entrance of a station is calculated from the outgoing flow at each entrance over the corresponding time period. When the total predicted flow is distributed, the incoming portion is distributed according to the incoming ratio average value (i.e., the ratio average value of incoming flow) and the outgoing portion according to the outgoing ratio average value (i.e., the ratio average value of outgoing flow), so that a traceable prediction result is obtained, guaranteeing that the prediction has practical application value.
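The proportional distribution step can be sketched as follows (the entrance names and counts are hypothetical):

```python
# Distribute a station's total predicted flow across its entrances in
# proportion to the historical per-entrance counts for the matching
# time slot, as described above.
def allocate(total_pred, historical_counts):
    """Split total_pred across entrances by historical proportion."""
    total_hist = sum(historical_counts.values())
    return {gate: total_pred * c / total_hist
            for gate, c in historical_counts.items()}

# Historical inbound counts per entrance in the matching time slot
hist = {"gate_1": 300, "gate_2": 100, "gate_3": 100}
print(allocate(1000, hist))
# prints {'gate_1': 600.0, 'gate_2': 200.0, 'gate_3': 200.0}
```

The same helper would be applied twice per station, once with the incoming ratio averages and once with the outgoing ratio averages.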
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as no contradiction arises between the combined features, any such combination should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the invention, which are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the invention is defined by the appended claims.

Claims (7)

1. A subway people flow network fusion method based on video pedestrian recognition, used for realizing fused statistics of subway and ground people flows to assist traffic early warning, characterized by comprising the following steps:
step 1, receiving monitoring images of all entrances and exits of a subway station, wherein the monitoring images are acquired by image acquisition equipment arranged at all the entrances and exits;
step 2, extracting pedestrian target coordinate frame information and pedestrian target characteristic information from the monitoring image, wherein the pedestrian target characteristic information comprises pedestrian characteristics, pedestrian in/out state, and pedestrian entering or exiting direction;
step 3, based on the monitoring image of the same image acquisition equipment, performing similarity calculation according to the pedestrian target coordinate frame information and the pedestrian target characteristic information to obtain a pedestrian track aiming at the same pedestrian;
step 4, matching the similarity of the pedestrian target characteristic information based on the pedestrian tracks corresponding to different image acquisition devices, combining the pedestrian tracks successfully matched, and updating the pedestrian tracks corresponding to the pedestrians;
step 5, obtaining subway routes, subway stations, entrances and exits of all stations in the designated area and ground traffic routes corresponding to all entrances and exits, and fusing and constructing a subway traffic network map in the designated area;
Step 6, counting total incoming and outgoing traffic of each station and incoming and outgoing traffic on traffic lines corresponding to each access in each station according to the latest pedestrian track in the preset time period;
and 7, superposing total inflow and outflow people flow of each station of the subway station and inflow and outflow people flow on traffic lines corresponding to each entrance and exit in each station on the basis of the subway traffic network map to obtain a subway and ground people flow fusion people flow mobile network map.
2. The subway stream network fusion method based on video pedestrian recognition as set forth in claim 1, wherein the step of performing similarity calculation based on the monitored image of the same image acquisition device according to pedestrian target coordinate frame information and pedestrian target feature information to obtain a pedestrian track for the same pedestrian comprises:
step 3.1, acquiring pedestrian target coordinate frame information and pedestrian target characteristic information of the current image acquisition equipment in the current monitoring image;
step 3.2, judging whether a tracking track set corresponding to the image acquisition equipment is empty or not, wherein the tracking track set is used for storing the pedestrian track of the pedestrian, and if the tracking track set is not empty, executing the step 3.3; otherwise, directly adding the obtained pedestrian target coordinate frame information and the pedestrian target characteristic information into the tracking track set and ending;
Step 3.3, obtaining estimated target coordinate frame information by adopting unscented Kalman filtering based on the pedestrian track in the track set;
step 3.4, calculating the coordinate frame similarity between the current pedestrian target and the stored pedestrian target one by one according to the pedestrian target coordinate frame information and the estimated target coordinate frame information, calculating the feature similarity between the current pedestrian target and the stored pedestrian target one by one based on the pedestrian target feature information of the pedestrian track and the pedestrian target feature information in the current monitoring image, and obtaining the similarity between the current pedestrian target and the stored pedestrian target by one based on weighted summation of the coordinate frame similarity and the feature similarity;
step 3.5, matching the pedestrian track in the track set and the pedestrian target acquired at the time by adopting a Hungary matching algorithm based on the similarity between the current pedestrian target and the stored pedestrian targets;
step 3.6, if a pedestrian target not successfully matched this time exists, directly adding the pedestrian target coordinate frame information and pedestrian target characteristic information corresponding to that pedestrian target into the tracking track set and marking it as a new track; if a pedestrian track and a pedestrian target are successfully matched, updating the pedestrian track of the pedestrian according to the pedestrian target coordinate frame information and pedestrian target characteristic information corresponding to the pedestrian target; if a pedestrian track marked as a new track in the tracking track set is successfully matched a plurality of times, removing its new-track mark; if a pedestrian track in the tracking track set fails to be matched for a plurality of consecutive frames, considering that the pedestrian target has left the monitoring range of the current image acquisition equipment and marking the pedestrian track as a departure track; and if a pedestrian track marked as a departure track is not successfully matched within a specified time threshold, considering that the pedestrian track has ended and deleting it from the tracking track set.
3. The subway pedestrian flow network fusion method based on video pedestrian recognition according to claim 2, wherein the matching of the similarity of pedestrian target feature information based on the pedestrian tracks corresponding to different image acquisition devices, combining the pedestrian tracks successfully matched, and updating the pedestrian tracks corresponding to pedestrians, comprises:
step 4.1, taking a tracking track set corresponding to one image acquisition device, and calculating the similarity between the pedestrian tracks marked as new tracks in the tracking track set and the pedestrian tracks marked as leaving tracks in the tracking track sets corresponding to the other image acquisition devices one by one;
step 4.2, if the similarity is larger than a preset threshold, the matching of the two pedestrian tracks is successful;
and 4.3, combining the two successfully matched pedestrian tracks to obtain a new pedestrian track of the pedestrian, and replacing the corresponding pedestrian track in the tracking track set where the new track is positioned by using the new pedestrian track.
4. A people flow prediction method, used for performing people flow prediction based on the fusion of subway and ground people flows to assist traffic early warning, characterized by comprising the following steps:
obtaining a pedestrian flow mobile network diagram in a specified time period by using a subway pedestrian flow network fusion method based on video pedestrian recognition;
Predicting total incoming and outgoing predicted people flow of each station in a specified time period in the future by utilizing a graph neural network based on a people flow mobile network graph;
based on the inflow and outflow people flow on the traffic route corresponding to each gateway of each site in the people flow mobile network diagram, acquiring the inflow and outflow proportion average value of the traffic route corresponding to each gateway of each site;
distributing the total incoming and outgoing predicted traffic of each station according to the incoming and outgoing ratio average value to obtain the incoming and outgoing predicted traffic on the traffic route corresponding to each access of each station;
the method for obtaining the people flow mobile network map in the appointed time period by utilizing the subway people flow network fusion method based on video pedestrian recognition comprises the following steps:
step 1, receiving monitoring images of all entrances and exits of a subway station, wherein the monitoring images are acquired by image acquisition equipment arranged at all the entrances and exits;
step 2, extracting pedestrian target coordinate frame information and pedestrian target characteristic information from the monitoring image, wherein the pedestrian target characteristic information comprises pedestrian characteristics, pedestrian in/out state, and pedestrian entering or exiting direction;
step 3, based on the monitoring image of the same image acquisition equipment, performing similarity calculation according to the pedestrian target coordinate frame information and the pedestrian target characteristic information to obtain a pedestrian track aiming at the same pedestrian;
Step 4, matching the similarity of the pedestrian target characteristic information based on the pedestrian tracks corresponding to different image acquisition devices, combining the pedestrian tracks successfully matched, and updating the pedestrian tracks corresponding to the pedestrians;
step 5, obtaining subway routes, subway stations, entrances and exits of all stations in the designated area and ground traffic routes corresponding to all entrances and exits, and fusing and constructing a subway traffic network map in the designated area;
step 6, counting total incoming and outgoing traffic of each station and incoming and outgoing traffic on traffic lines corresponding to each access in each station according to the latest pedestrian track in the preset time period;
and 7, superposing total inflow and outflow people flow of each station of the subway station and inflow and outflow people flow on traffic lines corresponding to each entrance and exit in each station on the basis of the subway traffic network map to obtain a subway and ground people flow fusion people flow mobile network map.
5. The people stream prediction method of claim 4, wherein the step of obtaining the pedestrian track for the same pedestrian by performing similarity calculation based on the pedestrian target coordinate frame information and the pedestrian target feature information based on the monitoring image of the same image acquisition device comprises:
Step 3.1, acquiring pedestrian target coordinate frame information and pedestrian target characteristic information of the current image acquisition equipment in the current monitoring image;
step 3.2, judging whether a tracking track set corresponding to the image acquisition equipment is empty or not, wherein the tracking track set is used for storing the pedestrian track of the pedestrian, and if the tracking track set is not empty, executing the step 3.3; otherwise, directly adding the obtained pedestrian target coordinate frame information and the pedestrian target characteristic information into the tracking track set and ending;
step 3.3, obtaining estimated target coordinate frame information by adopting unscented Kalman filtering based on the pedestrian track in the track set;
step 3.4, calculating the coordinate frame similarity between the current pedestrian target and the stored pedestrian target one by one according to the pedestrian target coordinate frame information and the estimated target coordinate frame information, calculating the feature similarity between the current pedestrian target and the stored pedestrian target one by one based on the pedestrian target feature information of the pedestrian track and the pedestrian target feature information in the current monitoring image, and obtaining the similarity between the current pedestrian target and the stored pedestrian target by one based on weighted summation of the coordinate frame similarity and the feature similarity;
Step 3.5, matching the pedestrian track in the track set and the pedestrian target acquired at the time by adopting a Hungary matching algorithm based on the similarity between the current pedestrian target and the stored pedestrian targets;
step 3.6, if a pedestrian target not successfully matched this time exists, directly adding the pedestrian target coordinate frame information and pedestrian target characteristic information corresponding to that pedestrian target into the tracking track set and marking it as a new track; if a pedestrian track and a pedestrian target are successfully matched, updating the pedestrian track of the pedestrian according to the pedestrian target coordinate frame information and pedestrian target characteristic information corresponding to the pedestrian target; if a pedestrian track marked as a new track in the tracking track set is successfully matched a plurality of times, removing its new-track mark; if a pedestrian track in the tracking track set fails to be matched for a plurality of consecutive frames, considering that the pedestrian target has left the monitoring range of the current image acquisition equipment and marking the pedestrian track as a departure track; and if a pedestrian track marked as a departure track is not successfully matched within a specified time threshold, considering that the pedestrian track has ended and deleting it from the tracking track set.
6. The pedestrian flow prediction method as set forth in claim 5, wherein the matching of the similarity of the pedestrian target feature information based on the pedestrian trajectories corresponding to different image acquisition devices, combining the pedestrian trajectories successfully matched, and updating the pedestrian trajectories corresponding to pedestrians, includes:
step 4.1, taking a tracking track set corresponding to one image acquisition device, and calculating the similarity between the pedestrian tracks marked as new tracks in the tracking track set and the pedestrian tracks marked as leaving tracks in the tracking track sets corresponding to the other image acquisition devices one by one;
step 4.2, if the similarity is larger than a preset threshold, the matching of the two pedestrian tracks is successful;
and 4.3, combining the two successfully matched pedestrian tracks to obtain a new pedestrian track of the pedestrian, and replacing the corresponding pedestrian track in the tracking track set where the new track is positioned by using the new pedestrian track.
7. The traffic prediction method according to claim 4, wherein predicting the total incoming and outgoing predicted traffic of each site within a specified future time period based on the traffic mobile network map using a map neural network comprises:
in the people stream mobile network diagram, a station of a subway is taken as a vertex, traffic routes corresponding to all entrances and exits of the station are taken as edges, each vertex is provided with a characteristic vector containing total incoming and outgoing people flow, and a model for constructing the people stream mobile network diagram is as follows:
G t =(V t ,ε,W)
wherein G_t is the people flow mobile network graph at time t, V_t is the finite set of vertices carrying the flow feature vectors of all nodes, ε represents the edge set between vertices, W is the weight of the adjacency matrix, and t is the current moment;
when predicting the people flow of each vertex, based on the characteristic vector of the vertex from the time t-M+1 to the time t in the historical time period, predicting the characteristic vector from the time t+1 to the time t+H in the future appointed time period, wherein M, H is a preset coefficient, and constructing a people flow prediction target model as follows:
$$\hat{v}_{t+1},\ldots,\hat{v}_{t+H} = \mathop{\arg\max}_{v_{t+1},\ldots,v_{t+H}} \log P\left(v_{t+1},\ldots,v_{t+H} \mid v_{t-M+1},\ldots,v_{t}\right)$$

where $\hat{v}_{t+1},\ldots,\hat{v}_{t+H}$ are the predicted feature vectors from time t+1 to time t+H, and $v_{t-M+1},\ldots,v_{t}$ are the input feature vectors from time t-M+1 to time t;
and solving the traffic prediction target model by adopting a graph neural network based on the constructed traffic prediction target model to obtain the total incoming and outgoing predicted traffic of each station in a specified time period in the future.
CN202011485904.XA 2020-12-16 2020-12-16 Subway people stream network fusion method and people stream prediction method based on video pedestrian recognition Active CN112541440B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011485904.XA CN112541440B (en) 2020-12-16 2020-12-16 Subway people stream network fusion method and people stream prediction method based on video pedestrian recognition
PCT/CN2020/137804 WO2022126669A1 (en) 2020-12-16 2020-12-19 Subway pedestrian flow network fusion method based on video pedestrian recognition, and pedestrian flow prediction method


Publications (2)

Publication Number Publication Date
CN112541440A CN112541440A (en) 2021-03-23
CN112541440B true CN112541440B (en) 2023-10-17

Family

ID=75018974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011485904.XA Active CN112541440B (en) 2020-12-16 2020-12-16 Subway people stream network fusion method and people stream prediction method based on video pedestrian recognition

Country Status (2)

Country Link
CN (1) CN112541440B (en)
WO (1) WO2022126669A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705402A (en) * 2021-08-18 2021-11-26 中国科学院自动化研究所 Video behavior prediction method, system, electronic device and storage medium
CN113705470A (en) * 2021-08-30 2021-11-26 北京市商汤科技开发有限公司 Method and device for acquiring passenger flow information, computer equipment and storage medium
CN114119648A (en) * 2021-11-12 2022-03-01 史缔纳农业科技(广东)有限公司 Pig counting method for fixed channel
CN114390079B (en) * 2022-03-24 2022-06-03 成都秦川物联网科技股份有限公司 Smart city public place management method and Internet of things system
CN116095269B (en) * 2022-11-03 2023-10-20 南京戴尔塔智能制造研究院有限公司 Intelligent video security system and method thereof
CN116012949B (en) * 2023-02-06 2023-11-17 南京智蓝芯联信息科技有限公司 People flow statistics and identification method and system under complex scene
CN116631176B (en) * 2023-05-31 2023-12-15 河南海融软件有限公司 Control method and system for station passenger flow distribution state
CN116456558B (en) * 2023-06-13 2023-09-01 广州新科佳都科技有限公司 Self-adaptive control method and system for lighting equipment in subway station
CN116977934A (en) * 2023-08-02 2023-10-31 无锡八英里电子科技有限公司 Cloud-edge combined people flow early warning control method and system
CN116935447B (en) * 2023-09-19 2023-12-26 华中科技大学 Self-adaptive teacher-student structure-based unsupervised domain pedestrian re-recognition method and system
CN117058627B (en) * 2023-10-13 2023-12-26 阳光学院 Public place crowd safety distance monitoring method, medium and system
CN117273285B (en) * 2023-11-21 2024-02-02 北京市运输事业发展中心 Passenger transport data acquisition system based on large passenger flow station of rail transit
CN117275243B (en) * 2023-11-22 2024-02-02 上海随申行智慧交通科技有限公司 Regional flow control prediction and early warning method based on multi-source traffic trip data and application
CN117435934A (en) * 2023-12-22 2024-01-23 中国科学院自动化研究所 Matching method, device and storage medium of moving target track based on bipartite graph
CN117746343B (en) * 2024-02-20 2024-05-14 济南格林信息科技有限公司 Personnel flow detection method and system based on contour map

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425967A (en) * 2013-07-21 2013-12-04 浙江大学 Pedestrian flow monitoring method based on pedestrian detection and tracking
CN109522854A (en) * 2018-11-22 2019-03-26 广州众聚智能科技有限公司 A kind of pedestrian traffic statistical method based on deep learning and multiple target tracking
CN111612281A (en) * 2020-06-23 2020-09-01 中国人民解放军国防科技大学 Method and device for predicting pedestrian flow peak value of subway station and computer equipment
CN111612206A (en) * 2020-03-30 2020-09-01 清华大学 Street pedestrian flow prediction method and system based on space-time graph convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI686748B (en) * 2018-12-07 2020-03-01 國立交通大學 People-flow analysis system and people-flow analysis method


Also Published As

Publication number Publication date
WO2022126669A1 (en) 2022-06-23
CN112541440A (en) 2021-03-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant