CN109064484B - Crowd movement behavior identification method based on fusion of subgroup division and momentum features


Info

Publication number: CN109064484B (application CN201810236397.2A)
Authority: CN (China)
Prior art keywords: point, sub, points, motion, feature tracking
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109064484A (en)
Inventors: 陈志, 陈璐, 岳文静, 周传, 刘玲, 龚凯, 掌静
Original and current assignee: Nanjing University of Posts and Telecommunications
Application filed 2018-03-21 by Nanjing University of Posts and Telecommunications; priority to CN201810236397.2A
Publication of CN109064484A: 2018-12-21; grant and publication of CN109064484B: 2022-02-08

Classifications

    • G06T7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
    • G06V20/53: Scenes; surveillance or monitoring of activities; recognition of crowd images, e.g. recognition of crowd congestion
    • G06T2207/10016: Image acquisition modality; video; image sequence
    • G06T2207/20081: Special algorithmic details; training; learning
    • G06T2207/20164: Image segmentation details; salient point detection; corner detection


Abstract

The invention discloses a crowd movement behavior recognition method based on the fusion of subgroup division and momentum features. First, the spatio-temporal information of moving targets in video image frames is acquired by corner-point tracking and background modeling; spatially adjacent people are divided into several subgroups using the spatial region information of the group distribution in the foreground, and the subgroups are further divided according to their motion correlation over a period of time to obtain subgroups with motion consistency. Second, on the basis of the subgroup segmentation, three momentum features of the crowd motion are extracted and fused. Finally, the fused features and the pixel features of the video frames are taken as the input of a differential recurrent convolutional neural network for training; the training video segments are labeled with different description words by manual annotation, and the labeled data are used to adjust the output of the network. A good training result is obtained, so that crowd movement behaviors can be recognized effectively.

Description

Crowd movement behavior identification method based on fusion of subgroup division and momentum features
Technical Field
The invention relates to a crowd movement behavior recognition method based on the fusion of subgroup division and momentum features. Momentum features are extracted on the basis of subgroups; the extracted three-dimensional momentum feature video data are input to a differential recurrent convolutional neural network for training and converted into crowd behavior labels, achieving the goal of crowd motion behavior identification. The invention belongs to the field of cross-technology applications of image processing, video detection, and artificial intelligence.
Background
The purpose of crowd motion behavior recognition is to divide a dense crowd into subgroups through motion trajectories and foreground extraction from a sequence of images, and to recognize crowd motion behavior on the basis of these subgroups. Recognition of activities at the group level has increasingly become a hot problem in computer vision, with wide applications in intelligent video surveillance, public safety, sports competitions, and so on. Algorithms for identifying crowd motion behavior in video image frames mainly include the Harris corner detection algorithm, Gaussian mixture background modeling, and momentum feature fusion.
(1) Harris corner detection algorithm: the algorithm slides a fixed window over the image in every direction and compares the window contents before and after each shift; if the gray level of the pixels within the window changes strongly for a shift in any direction, a corner exists inside the window. Corner points preserve the important features of the image while effectively reducing the amount of data, so the information content is high; this effectively increases computation speed, facilitates reliable image matching, and makes real-time processing possible.
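As an illustration, a minimal corner-detection sketch in Python with OpenCV (the numeric parameters for block size, aperture, k, and response threshold are assumptions, not values specified by the patent):

```python
# Hedged sketch: detect candidate feature tracking points with the Harris
# corner detector. All numeric parameters are illustrative assumptions.
import cv2
import numpy as np

def detect_corners(frame_bgr, max_corners=500):
    """Return (x, y) coordinates of Harris corners in one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    # Keep locations whose corner response exceeds 1% of the maximum.
    ys, xs = np.where(response > 0.01 * response.max())
    return np.stack([xs, ys], axis=1)[:max_corners]
```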
(2) Gaussian mixture background modeling: the basic idea is to compare the input image with a background model and, according to information such as difference values, flag the abnormal pixels that do not conform to the model, thereby distinguishing foreground pixels from background pixels. The method represents the distribution of gray values in the image with a gray-level histogram and, assuming that the gray values of each pixel across the image sequence obey a normal distribution, uses the statistical result to segment the image.
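A minimal foreground-extraction sketch with OpenCV's built-in Gaussian mixture subtractor (MOG2) follows. The history length is an assumption; varThreshold is set to λ² = 6.25 because MOG2 thresholds the squared distance, which matches the patent's rule |I - μ| < λ·σ with λ = 2.5:

```python
# Hedged sketch: Gaussian mixture background modeling via OpenCV MOG2.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200,
                                                varThreshold=2.5 ** 2,
                                                detectShadows=False)

def foreground_mask(frame_bgr):
    """Return a binary mask: 255 for foreground pixels, 0 for background."""
    return subtractor.apply(frame_bgr)
```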
(3) Momentum feature fusion: the method starts at the level of crowd subgroups, takes aggregated groups of people as the research target, and constructs momentum features with the subgroup as the unit. All subgroups in a scene are detected with an expectation-maximization algorithm, and for each detected group the momentum features such as collectiveness, stability, and conflict are extracted, forming subgroup-oriented momentum features. These group-level momentum features reduce the dependence of individual momentum features on the scene; constructing scene-independent momentum features in group units improves the robustness and extensibility of crowd motion behavior analysis.
Disclosure of Invention
The invention aims to provide a crowd movement behavior recognition method based on the fusion of subgroup division and momentum features. It addresses two problems in crowd-dense scenes: the division errors caused by crowd overlap in micro-level segmentation, and the neglect of crowd motion details caused by overly coarse granularity in macro-level segmentation. On this basis, a crowd motion momentum feature fusion training model is provided, so that crowd movement behaviors can be recognized effectively.
The crowd movement behavior identification method based on the fusion of subgroup division and momentum features disclosed by the invention comprises the following steps:
step 1): a user inputs a continuous video, the video is divided into consecutive video frames, each single pedestrian in each video frame serves as a feature tracking point P, and the motion information of point P is represented by the four-dimensional vector P = (P_x, P_y, P_v, P_d), where P_x, P_y are the spatial coordinates of the feature tracking point, P_v is the displacement magnitude of the point, and P_d is the motion direction of the point, taking the value

$$P_d = D_i, \quad \theta \in \left[\frac{(i-1)\pi}{6}, \frac{i\pi}{6}\right), \quad i = 1, 2, \ldots, 12,$$

where θ is the direction angle defined in step 1.3.2; the set of all feature tracking points of an image frame is denoted O_I = {P_1, P_2, P_3, ...};
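A small container for the four-dimensional tracking point and its direction quantization might look as follows; the names are illustrative, and the 12-sector quantization mirrors step 1.3.2:

```python
# Hedged sketch: the feature tracking point P = (Px, Py, Pv, Pd) and the
# quantization of the direction angle theta into 12 labels D1..D12.
import math
from dataclasses import dataclass

@dataclass
class TrackPoint:
    x: float   # Px: spatial x coordinate
    y: float   # Py: spatial y coordinate
    v: float   # Pv: displacement magnitude between frames
    d: int     # Pd: quantized motion direction label, 1..12

def quantize_direction(dx: float, dy: float) -> int:
    """Map a displacement vector to one of 12 equal sectors of [0, 2*pi)."""
    theta = math.atan2(dy, dx) % (2.0 * math.pi)
    return int(theta // (math.pi / 6.0)) + 1   # each sector spans pi/6
```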
Step 2): the movement characteristics of the sub-population are determined by the momentum characteristics, and three different momentum characteristics are defined on the basis of the characteristic tracking points in the sub-population and the sub-population: the motion direction consistency, the space stability and the crowd friction conflict are improved; each sub-population contains H feature tracking points, namely Ck=(P1,P2,...,PH);
Step 3): calculating the average value of the description factors in 5 continuous frames, and constructing a vector by using the three average values
Figure GDA0003307300630000022
Omega (C), rho (C)), jointly form a three-channel image, form data of 224 x 3 dimensions, input into a differential recurrent neural network DRCNN for training, and convert into 4096-dimensional feature vectors, wherein the differential recurrent neural network connects a VGG-16 (Visual Geometry Group-16) model and a 3-layer stacked long-short term memory recurrent neural network LSTM into an end-to-end model, and finally converts the feature vectors into crowd behavior labels by adopting an output function, and the training video segments are labeled as different description vocabularies according to the difference of behavior occurrence main bodies, behavior occurrence places and behaviors by adopting an artificial labeling method, and the result of the differential recurrent neural network is adjusted by using the labeled data to realize the recognition of the crowd motion behaviors.
Wherein:
the step 1) is specifically as follows:
step 1.1): position information of the feature tracking points in consecutive video frames is obtained through the Harris corner detection algorithm to obtain the foreground features of the target group. The Harris corner detection algorithm slides a fixed window over the image in every direction and compares the situations before and after the shift to judge the degree of gray-level change of the pixels in the window; when the change is large for a shift in any direction, a corner exists in the window. The positions of each feature tracking point in consecutive video frames are connected in series to obtain its motion trajectory T, and the set of motion trajectories of all feature tracking points is T_I = {T_1, T_2, T_3, ...};
Step 1.2): and (3) performing foreground extraction by using a Gaussian mixture background modeling method, and for the gray value of a pixel in the current video frame, when the difference value of the mean value of the S-th Gaussian distribution in the Gaussian mixture background meets the formula: i (x, y, t) -mus t|<λ·σs tIs considered to beMatching successfully, i.e. the pixel is background, wherein I (x, y, t) represents the pixel value of the pixel (x, y) at time t, μst represents the mean gray value at the time of the S-th Gaussian distribution t, λ represents the multiple coefficient of the standard deviation, σst represents the variance of the gray value of the S-th Gaussian distribution at the t moment, the space size of a target group and the distance relation between the target group and surrounding groups are obtained through foreground extraction, the space size is called as foreground plaque, and a plaque set is marked as BI={B1,B2,B3,...,BkDividing spatially adjacent individuals through spatial relation change;
step 1.3): the dense crowd is divided using the two kinds of spatio-temporal information obtained above;
the step 1.3) is specifically as follows:
step 1.3.1): the set O_I of feature tracking points is divided into subsets containing a number of points, denoted O_I = {C_I, F_I}, where C_I = {C_1, C_2, ..., C_K} is the set of point sets with motion consistency into which the image frame is divided, forming the subgroups of the crowd division, and F_I is the set of rejected points; if a feature tracking point P lies within the contour range of a patch, the point is assigned to that patch, otherwise it is rejected;
step 1.3.2): the attribute P_d of a feature tracking point P represents the motion relation of the point between frames; the cosine of the angle between the displacement vector of the feature tracking point P and the X axis is computed to obtain the direction angle θ, the direction range 0 to 2π is divided into 12 equal parts, and each interval is labeled D_i (i = 1, 2, 3, ..., 12) to assign P_d, where the specific division formula is:

$$P_d = D_i, \quad \theta \in \left[\frac{(i-1)\pi}{6}, \frac{i\pi}{6}\right), \quad i = 1, 2, \ldots, 12;$$
through the patch range division and the motion direction constraint, feature tracking points with similar motion trends are divided into the same subgroup;
step 1.3.3): abnormal points are corrected: for the K nearest neighbors of a feature tracking point P, the frequency of occurrence of each attribute value P_d is computed, the value with the highest frequency is denoted D_i, and the P_d of the feature tracking point itself is denoted D_j; for all feature tracking points, when i + 1 = j or i - 1 = j, the P_d of P is corrected to D_i. Abnormal points are removed: the number I of points with the same motion trend among the L nearest neighbors of the feature tracking point P is computed, a critical value M with M ≤ L is set, and when I < M, the point P is regarded as an abnormal point and removed from subgroup C_k.
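A sketch of the correction and removal rules of step 1.3.3, assuming SciPy is available for nearest-neighbor queries; the defaults for K and L are illustrative, while M = 5 follows the embodiment below:

```python
# Hedged sketch: majority-vote correction of direction labels Pd among K
# nearest neighbors, then removal of points whose motion trend is shared
# by fewer than M of their L nearest neighbors.
from collections import Counter
import numpy as np
from scipy.spatial import cKDTree

def correct_and_filter(xy, d_labels, K=8, L=8, M=5):
    """xy: (n, 2) point coordinates; d_labels: (n,) direction labels 1..12."""
    xy = np.asarray(xy, dtype=float)
    tree = cKDTree(xy)
    _, nbrs = tree.query(xy, k=K + 1)      # column 0 is the point itself
    d = np.asarray(d_labels).copy()
    for i, nb in enumerate(nbrs):
        d_i = Counter(d[nb[1:]].tolist()).most_common(1)[0][0]
        if abs(int(d[i]) - int(d_i)) == 1:  # adjacent sector: correct Pd
            d[i] = d_i
    keep = np.array([np.sum(d[nb[1:L + 1]] == d[i]) >= M
                     for i, nb in enumerate(nbrs)])
    return d, keep                          # corrected labels, inlier mask
```

The rule "i + 1 = j or i - 1 = j" is taken literally here; the circular adjacency between sectors D12 and D1 is not treated specially, since the patent does not state it.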
The step 2) is specifically as follows:
step 2.1): extraction of the motion direction consistency feature: the coordinates and displacements of the feature tracking points in each subgroup are computed and averaged to obtain the centroid coordinates and average displacement of each subgroup over several consecutive frames, giving the overall motion trend vector $\vec{V}$ of each subgroup; the velocity correlation between the overall motion trend vector of each subgroup and the neighboring vectors is computed according to the formula

$$\phi(C) = \frac{1}{N} \sum_{k=1}^{N} \frac{\vec{V} \cdot \vec{V}_k}{\left|\vec{V}\right| \left|\vec{V}_k\right|},$$

where $\vec{V}$ is the motion trend vector of each subgroup, $\vec{V}_k$ is the motion trend vector of the k-th adjacent point, and N is the number of divided subgroups; a larger value of φ(C) indicates a higher velocity correlation;
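Read directly off the reconstructed formula, the consistency factor is the mean cosine similarity between a subgroup's trend vector and its neighbors' trend vectors; a minimal sketch:

```python
# Hedged sketch: direction-consistency factor phi(C) as mean cosine
# similarity between a subgroup trend vector and neighbor trend vectors.
import numpy as np

def direction_consistency(V, neighbor_Vs):
    """V: (2,) subgroup trend vector; neighbor_Vs: (N, 2) neighbor vectors."""
    V = np.asarray(V, dtype=float)
    W = np.asarray(neighbor_Vs, dtype=float)
    cos = W @ V / (np.linalg.norm(W, axis=1) * np.linalg.norm(V) + 1e-12)
    return float(cos.mean())
```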
step 2.2): extraction of the spatial stability momentum feature: spatial stability means that each feature tracking point keeps a stable set of neighbors within a certain time range and maintains a specific topological structure during that time; the stability of each feature tracking point P_i in the subgroup at time t is obtained from the formula

$$\omega_1(C) = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| N_1(P_i) \backslash N_1(P_i)^T \right|}{K},$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, |N_1(P_i)\N_1(P_i)^T| is the average number of points that maintain a stable adjacency relation with the neighbors of a feature tracking point in the subgroup from time 1 to T, and K is the number of nearest neighbors; the stability of the distance between each feature tracking point in the subgroup and the adjacent points in its neighborhood can be expressed by the formula

$$\omega_2(C) = \frac{1}{N} \sum_{i=1}^{N} \bar{d}_k(P_i),$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, and $\bar{d}_k(P_i)$ is the average distance between the feature tracking point and its k adjacent points; the two stabilities are added to form the overall stability of the subgroup, denoted ω(C_k);
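One plausible reading of the two stability terms, under the assumption that stability means the fraction of K nearest neighbors preserved between time 1 and time T plus the mean neighbor distance:

```python
# Hedged sketch: spatial stability omega(C) = neighbor-retention term plus
# mean neighbor-distance term; the patent's exact normalization may differ.
import numpy as np
from scipy.spatial import cKDTree

def spatial_stability(xy_t1, xy_tT, K=8):
    """xy_t1, xy_tT: (n, 2) coordinates of the same points at times 1 and T."""
    xy_t1 = np.asarray(xy_t1, dtype=float)
    xy_tT = np.asarray(xy_tT, dtype=float)
    n = len(xy_t1)
    nb1 = cKDTree(xy_t1).query(xy_t1, k=K + 1)[1][:, 1:]
    nbT = cKDTree(xy_tT).query(xy_tT, k=K + 1)[1][:, 1:]
    kept = [len(set(nb1[i]) & set(nbT[i])) / K for i in range(n)]
    dist = [np.linalg.norm(xy_tT[nbT[i]] - xy_tT[i], axis=1).mean()
            for i in range(n)]
    return float(np.mean(kept)) + float(np.mean(dist))
```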
step 2.3): extraction of the crowd friction conflict momentum feature: the conflict is computed by the formula

$$\rho(C) = \frac{1}{N} \sum_{i=1}^{N} \left( \alpha \, \frac{\vec{V} \cdot \vec{V}_k}{\left|\vec{V}\right| \left|\vec{V}_k\right|} + \beta \, \operatorname{avrg}\!\big(N_{other}(P_i)\big) \right),$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, $\vec{V}$ is the motion trend vector of each subgroup, $\vec{V}_k$ is the motion trend vector of the k-th adjacent point, avrg(N_{other}(P_i)) is the average number of feature tracking points belonging to other subgroups contained in the adjacency of a feature tracking point of the subgroup, and α and β are weight coefficients.
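A sketch of the conflict factor under the same assumptions, combining the velocity correlation toward foreign subgroups with the count of foreign points among each point's neighbors; the α and β values are placeholders, since the patent leaves them as weight coefficients:

```python
# Hedged sketch: conflict factor rho(C) for one subgroup.
import numpy as np
from scipy.spatial import cKDTree

def conflict(xy, labels, V_by_label, group, K=8, alpha=0.5, beta=0.5):
    """xy: (n, 2) all points; labels: (n,) subgroup ids; group: target id."""
    xy = np.asarray(xy, dtype=float)
    labels = np.asarray(labels)
    idx = np.where(labels == group)[0]
    nbrs = cKDTree(xy).query(xy[idx], k=K + 1)[1][:, 1:]
    V = np.asarray(V_by_label[group], dtype=float)
    vals = []
    for nb in nbrs:
        foreign = nb[labels[nb] != group]   # neighbors from other subgroups
        cors = [float(np.dot(V, V_by_label[labels[j]])
                      / (np.linalg.norm(V)
                         * np.linalg.norm(V_by_label[labels[j]]) + 1e-12))
                for j in foreign]
        cor = float(np.mean(cors)) if cors else 0.0
        vals.append(alpha * cor + beta * len(foreign))
    return float(np.mean(vals))
```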
In step 1.2), the multiple coefficient λ of the standard deviation takes the value 2.5.
Advantageous effects: the crowd movement behavior identification method based on the fusion of subgroup division and momentum features provided by the invention has the following specific beneficial effects:
1. The invention analyzes the subgroup division of crowds in crowd-dense scenes from the viewpoint of subgroups, which not only overcomes the difficulty of separating moving individuals from a dense crowd, but also recovers the internal characteristics that are ignored when the crowd is treated as a single whole for study.
2. The invention provides an algorithm based on spatio-temporal constraints, which takes the temporal motion relations and the spatial proximity among individuals in a moving group as the basis of group division; the two conditions constrain each other to divide the group into subgroups with motion consistency. The algorithm is applicable to surveillance video streams of different crowd densities and observation angles, is simple to implement, and runs fast.
3. The invention trains the abstracted three-dimensional video data with a differential recurrent convolutional neural network, separating the steps of feature extraction and parameter learning, and is suitable for group videos of various resolutions and motion scenes.
Drawings
Fig. 1 is a flow chart of a subgroup division algorithm.
FIG. 2 is a flow chart of a momentum feature fusion algorithm.
Detailed Description
Some embodiments of the invention are described in more detail below with reference to the accompanying drawings.
The user inputs 15 seconds of continuous video, which is divided into one image frame every 0.5 seconds, i.e., D_I = {D_1, D_2, D_3, ..., D_30}, where D_t is the image frame at time t. In the crowd-dense scene, people with motion consistency are divided into subgroups; the algorithm flow chart is shown in Fig. 1.
Taking the first 15 frames of the video image sequence, each single pedestrian serves as a feature tracking point, and the four-dimensional vector P = (P_x, P_y, P_v, P_d) represents the motion information of a feature tracking point P, where P_x, P_y are the spatial coordinates of the feature tracking point, P_v is the displacement magnitude of the point, and P_d is the motion direction of the point, taking the value

$$P_d = D_i, \quad \theta \in \left[\frac{(i-1)\pi}{6}, \frac{i\pi}{6}\right), \quad i = 1, 2, \ldots, 12.$$
The coordinate position information of the feature tracking points in the 15 video frames is obtained through the Harris corner detection algorithm; the coordinate positions of each feature tracking point in consecutive video frames are connected in series to obtain its motion trajectory T, and the set of motion trajectories of all feature tracking points is T_I = {T_1, T_2, T_3, ...}. Each trajectory comprises a set of feature tracking points T_i = {P_1, P_2, P_3, ..., P_k}, and the continuous motion trajectories represent the motion relations of the population within a period of time.
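The "connecting in series" of corner positions across frames can be realized, for instance, with pyramidal Lucas-Kanade optical flow; this is one common choice, not a method mandated by the patent:

```python
# Hedged sketch: build trajectories T_i by tracking Harris corners across
# frames with pyramidal Lucas-Kanade optical flow.
import cv2
import numpy as np

def track_trajectories(gray_frames, init_pts):
    """gray_frames: list of grayscale frames; init_pts: (n, 2) corner coords."""
    pts = np.float32(init_pts).reshape(-1, 1, 2)
    trajs = [[tuple(p[0])] for p in pts]
    for prev, cur in zip(gray_frames, gray_frames[1:]):
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, cur, pts, None)
        for tr, p, ok in zip(trajs, pts, status.ravel()):
            if ok:
                tr.append(tuple(p[0]))
    return trajs
```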
Foreground extraction is performed with Gaussian mixture background modeling; if the difference between the gray value of a pixel in the current video frame and the mean of the K-th Gaussian distribution in the mixture background satisfies the formula

$$\left| I(x, y, t) - \mu_K^t \right| < \lambda \cdot \sigma_K^t,$$

the match is considered successful, i.e., the pixel is taken as background, where λ is the multiple coefficient of the standard deviation and λ takes the value 2.5. Foreground extraction yields the spatial extent of the target group and its distance relation to the surrounding groups; these extents are called foreground patches, and the patch set is denoted B_I = {B_1, B_2, B_3, ..., B_k}. Spatially adjacent individuals are divided according to changes of the spatial relation.
The dense crowd is divided using the two kinds of spatio-temporal information obtained above. The set O_I of feature tracking points is divided into subsets containing a number of points, denoted O_I = {C_I, F_I}, where C_I = {C_1, C_2, ..., C_K} is the set of point sets with motion consistency into which the image frame is divided, forming the subgroups of the crowd division, and F_I is the set of rejected points. The spatial information of a patch is represented by a rectangular frame: a rectangular area is delimited by the coordinate values of the patch boundary pixel positions, and the coordinates of the lower-right and upper-left corners of the rectangle are acquired; if the coordinate position of a feature tracking point P is within the contour range of a patch, the point is assigned to that patch, otherwise it is rejected. The attribute P_d of a feature tracking point P represents the motion relation of the point between frames. First, the cosine of the angle between the displacement vector of the feature tracking point P and the X axis is computed to obtain the direction angle θ. The direction range 0 to 2π is divided into 12 equal parts, and each interval is labeled D_i (i = 1, 2, 3, ..., 12) to assign P_d, where the specific division is:

$$P_d = D_i, \quad \theta \in \left[\frac{(i-1)\pi}{6}, \frac{i\pi}{6}\right), \quad i = 1, 2, \ldots, 12.$$

Through the patch range division and the motion direction constraint, feature tracking points with similar motion trends are divided into the same subgroup.
Abnormal points are corrected and removed. First, abnormal points are corrected: for the K nearest neighbors of a feature tracking point P, the frequency of occurrence of each attribute value P_d is computed; the value with the highest frequency is denoted D_i, and the P_d of the feature tracking point itself is denoted D_j; for all feature tracking points, if i + 1 = j or i - 1 = j, the P_d of P is corrected to D_i. Second, abnormal points are removed: the number I of points with the same motion trend among the L nearest neighbors of the feature tracking point P is computed, a critical value M (M ≤ L) is set, here M = 5, and if I < M, the point P is regarded as an abnormal point and removed from subgroup C_k.
The movement characteristics of a subgroup are determined by its momentum features, and three different momentum features are defined on the basis of the subgroups and the feature tracking points within them: motion direction consistency, spatial stability, and crowd friction conflict. Each subgroup is assumed to contain H feature tracking points, i.e., C_k = (P_1, P_2, ..., P_H).
Extraction of the motion direction consistency feature: the coordinates and displacements of the feature tracking points in each subgroup are computed and averaged to obtain the centroid coordinates and average displacement of each subgroup over several consecutive frames, giving the overall motion trend vector $\vec{V}$ of each subgroup; the velocity correlation between the overall motion trend vector of each subgroup and the neighboring vectors is computed according to the formula

$$\phi(C) = \frac{1}{N} \sum_{k=1}^{N} \frac{\vec{V} \cdot \vec{V}_k}{\left|\vec{V}\right| \left|\vec{V}_k\right|},$$

where $\vec{V}$ is the motion trend vector of each subgroup, $\vec{V}_k$ is the motion trend vector of the k-th adjacent point, and N is the number of divided subgroups; a larger value of φ(C) indicates a higher velocity correlation.
Extraction of the spatial stability momentum feature: spatial stability means that each feature tracking point keeps a stable set of neighbors within a certain time range and maintains a specific topological structure during that time; for each feature tracking point P_i in the subgroup, the region composed of its K nearest neighbors at time t is N_t(P_i), and the stability can be expressed by the formula

$$\omega_1(C) = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| N_1(P_i) \backslash N_1(P_i)^T \right|}{K},$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, |N_1(P_i)\N_1(P_i)^T| is the average number of points that maintain a stable adjacency relation with the neighbors of a feature point in the subgroup from time 1 to T, and K is the number of nearest neighbors; the stability of the distance between each feature tracking point in the subgroup and the adjacent points in its neighborhood can be expressed by the formula

$$\omega_2(C) = \frac{1}{N} \sum_{i=1}^{N} \bar{d}_k(P_i),$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, and $\bar{d}_k(P_i)$ is the average distance between the feature tracking point and its k adjacent points; the two stabilities are added to form the overall stability of the subgroup, denoted ω(C_k).
Extraction of the group friction conflict momentum feature: the conflict is computed by the formula

$$\rho(C) = \frac{1}{N} \sum_{i=1}^{N} \left( \alpha \, \frac{\vec{V} \cdot \vec{V}_k}{\left|\vec{V}\right| \left|\vec{V}_k\right|} + \beta \, \operatorname{avrg}\!\big(N_{other}(P_i)\big) \right),$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, $\vec{V}$ is the motion trend vector of each subgroup, $\vec{V}_k$ is the motion trend vector of the k-th adjacent point, avrg(N_{other}(P_i)) is the average number of feature tracking points belonging to other subgroups contained in the adjacency of a feature tracking point of the subgroup, and α and β are weight coefficients.
The average values of the description factors over the first 5 frames are computed, and the three averages are used to construct the vector (φ(C), ω(C), ρ(C)). Analogous to the RGB values of a pixel, the three components together form a three-channel image, giving data of dimension 224 × 224 × 3, which are input into a differential recurrent convolutional neural network (DRCNN) for training and converted into 4096-dimensional feature vectors. The DRCNN connects a VGG-16 (Visual Geometry Group-16) model and a 3-layer stacked long short-term memory recurrent neural network (LSTM) into an end-to-end model to improve training accuracy; an output function converts the feature vectors into crowd behavior labels. Using manual annotation, the training video segments are labeled with different description words according to the subject of the behavior, the place where the behavior occurs, and the behavior itself, and the labeled data are used to adjust the output of the DRCNN, thereby realizing the recognition of crowd motion behaviors.
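A compact sketch of the described DRCNN in PyTorch, with a VGG-16 backbone truncated at its second fully connected layer (4096-dimensional per-frame features) feeding a 3-layer stacked LSTM; the hidden size and the number of behavior labels are assumptions:

```python
# Hedged sketch: VGG-16 + 3-layer LSTM end-to-end model, as described above.
import torch
import torch.nn as nn
import torchvision.models as models

class DRCNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 512):
        super().__init__()
        vgg = models.vgg16(weights=None)
        self.backbone = vgg.features
        self.avgpool = vgg.avgpool
        # Keep VGG-16's classifier up to its second fc layer -> 4096-dim.
        self.fc = nn.Sequential(*list(vgg.classifier.children())[:5])
        self.lstm = nn.LSTM(input_size=4096, hidden_size=hidden,
                            num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                    # clips: (B, T, 3, 224, 224)
        b, t = clips.shape[:2]
        x = clips.reshape(b * t, *clips.shape[2:])
        x = self.avgpool(self.backbone(x)).flatten(1)
        feats = self.fc(x).reshape(b, t, 4096)   # per-frame 4096-dim features
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])             # behavior-label logits
```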

Claims (4)

1. A crowd movement behavior identification method based on the fusion of subgroup division and momentum features, characterized by comprising the following steps:
step 1): a user inputs a continuous video, the video is divided into consecutive video frames, each single pedestrian in each video frame serves as a feature tracking point P, and the motion information of point P is represented by the four-dimensional vector P = (P_x, P_y, P_v, P_d), where P_x, P_y are the spatial coordinates of the feature tracking point, P_v is the displacement magnitude of the point, and P_d is the motion direction of the point, taking the value

$$P_d = D_i, \quad \theta \in \left[\frac{(i-1)\pi}{6}, \frac{i\pi}{6}\right), \quad i = 1, 2, \ldots, 12;$$

the set of all feature tracking points of an image frame is denoted O_I = {P_1, P_2, P_3, ...};
Step 2): the movement characteristics of the sub-population are determined by the momentum characteristics, and three different momentum characteristics are defined on the basis of the characteristic tracking points in the sub-population and the sub-population: the motion direction consistency, the space stability and the crowd friction conflict are improved; each sub-population contains H feature tracking points, namely Ck=(P1,P2,...,PH);
Step 3): computing descriptions within successive 5 framesAverage value of the factor, constructing a vector using three average values: (
Figure FDA0003307300620000013
Omega (C), rho (C)), jointly form a three-channel image, form data of 224 x 3 dimensions, input into a differential recursive convolutional neural network DRCNN for training, convert into 4096-dimensional characteristic vectors, the said differential recursive convolutional neural network connects VGG-16 model and 3 layers of stacked long and short term memory recursive neural network LSTM to the end-to-end model, finally adopt the output function to convert the characteristic vectors into crowd behavior labels, adopt the method of artificial marking to train the video segment according to behavior occurrence subject, behavior occurrence place, behavior difference, mark as different description vocabulary, use the data with mark to adjust the result of the differential recursive neural network, realize the recognition of crowd movement behavior;
the step 1) is specifically as follows:
step 1.1): position information of the feature tracking points in consecutive video frames is obtained through the Harris corner detection algorithm to obtain the foreground features of the target group; the Harris corner detection algorithm slides a fixed window over the image in every direction and compares the situations before and after the shift to judge the degree of gray-level change of the pixels in the window, and when the change is large for a shift in any direction, a corner exists in the window; the positions of each feature tracking point in consecutive video frames are connected in series to obtain its motion trajectory T, and the set of motion trajectories of all feature tracking points is T_I = {T_1, T_2, T_3, ...};
Step 1.2): and (3) performing foreground extraction by using a Gaussian mixture background modeling method, and for the gray value of a pixel in the current video frame, when the difference value of the mean value of the S-th Gaussian distribution in the Gaussian mixture background meets the formula:
Figure FDA0003307300620000012
the matching is considered to be successful, i.e. the pixel is taken as background, wherein I (x, y, t) represents that the pixel point (x, y) is at the time of tThe value of the pixel of (a) is,
Figure FDA0003307300620000021
represents the mean gray value at the time t of the S-th gaussian distribution, lambda represents the coefficient of the multiple of the standard deviation,
Figure FDA0003307300620000022
representing the variance of the gray value of the S-th Gaussian distribution t moment, extracting the space size of the target group and the distance relation between the space size and the surrounding groups through the foreground, and obtaining the foreground plaque, wherein the plaque set is marked as BI={B1,B2,B3,...,BkDividing spatially adjacent individuals through spatial relation change;
step 1.3): the dense crowd is divided using the two kinds of spatio-temporal information obtained above.
2. The crowd movement behavior identification method based on the fusion of subgroup division and momentum features according to claim 1, characterized in that step 1.3) specifically comprises:
step 1.3.1): the set O_I of feature tracking points is divided into subsets containing a number of points, denoted O_I = {C_I, F_I}, where C_I = {C_1, C_2, ..., C_K} is the set of point sets with motion consistency into which the image frame is divided, forming the subgroups of the crowd division, and F_I is the set of rejected points; if a feature tracking point P lies within the contour range of a patch, the point is assigned to that patch, otherwise it is rejected;
step 1.3.2): the attribute P_d of a feature tracking point P represents the motion relation of the point between frames; the cosine of the angle between the displacement vector of the feature tracking point P and the X axis is computed to obtain the direction angle θ, the direction range 0 to 2π is divided into 12 equal parts, and each interval is labeled D_i (i = 1, 2, 3, ..., 12) to assign P_d, where the specific division formula is:

$$P_d = D_i, \quad \theta \in \left[\frac{(i-1)\pi}{6}, \frac{i\pi}{6}\right), \quad i = 1, 2, \ldots, 12;$$
through the patch range division and the motion direction constraint, feature tracking points with similar motion trends are divided into the same subgroup;
step 1.3.3): abnormal points are corrected: for the K nearest neighbors of a feature tracking point P, the frequency of occurrence of each attribute value P_d is computed, the value with the highest frequency is denoted D_i, and the P_d of the feature tracking point itself is denoted D_j; for all feature tracking points, when i + 1 = j or i - 1 = j, the P_d of P is corrected to D_i; abnormal points are removed: the number I of points with the same motion trend among the L nearest neighbors of the feature tracking point P is computed, a critical value M with M ≤ L is set, and when I < M, the point P is regarded as an abnormal point and removed from subgroup C_k.
3. The crowd movement behavior identification method based on the fusion of subgroup division and momentum features according to claim 1, characterized in that step 2) specifically comprises:
step 2.1): extraction of the motion direction consistency feature: the coordinates and displacements of the feature tracking points in each subgroup are computed and averaged to obtain the centroid coordinates and average displacement of each subgroup over several consecutive frames, giving the overall motion trend vector $\vec{V}$ of each subgroup; the velocity correlation between the overall motion trend vector of each subgroup and the neighboring vectors is computed according to the formula

$$\phi(C) = \frac{1}{N} \sum_{k=1}^{N} \frac{\vec{V} \cdot \vec{V}_k}{\left|\vec{V}\right| \left|\vec{V}_k\right|},$$

where $\vec{V}$ is the motion trend vector of each subgroup, $\vec{V}_k$ is the motion trend vector of the k-th adjacent point, and N is the number of divided subgroups; a larger value of φ(C) indicates a higher velocity correlation;
step 2.2): extraction of the spatial stability momentum feature: spatial stability means that each feature tracking point keeps a stable set of neighbors within a certain time range and maintains a specific topological structure during that time; the stability of each feature tracking point P_i in the subgroup at time t is obtained from the formula

$$\omega_1(C) = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| N_1(P_i) \backslash N_1(P_i)^T \right|}{K},$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, |N_1(P_i)\N_1(P_i)^T| is the average number of points that maintain a stable adjacency relation with the neighbors of a feature tracking point in the subgroup from time 1 to T, and K is the number of nearest neighbors; the stability of the distance between each feature tracking point in the subgroup and the adjacent points in its neighborhood can be expressed by the formula

$$\omega_2(C) = \frac{1}{N} \sum_{i=1}^{N} \bar{d}_k(P_i),$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, and $\bar{d}_k(P_i)$ is the average distance between the feature tracking point and its k adjacent points; the two stabilities are added to form the overall stability of the subgroup, denoted ω(C_k);
step 2.3): extraction of the crowd friction conflict momentum feature: the conflict is computed by the formula

$$\rho(C) = \frac{1}{N} \sum_{i=1}^{N} \left( \alpha \, \frac{\vec{V} \cdot \vec{V}_k}{\left|\vec{V}\right| \left|\vec{V}_k\right|} + \beta \, \operatorname{avrg}\!\big(N_{other}(P_i)\big) \right),$$

where N is the number of divided subgroups, P_i is the i-th feature tracking point, $\vec{V}$ is the motion trend vector of each subgroup, $\vec{V}_k$ is the motion trend vector of the k-th adjacent point, avrg(N_{other}(P_i)) is the average number of feature tracking points belonging to other subgroups contained in the adjacency of a feature tracking point of the subgroup, and α and β are weight coefficients.
4. The crowd movement behavior identification method based on the fusion of subgroup division and momentum features according to claim 1, characterized in that the multiple coefficient λ of the standard deviation in step 1.2) takes the value 2.5.
CN201810236397.2A (priority and filing date 2018-03-21): Crowd movement behavior identification method based on fusion of subgroup division and momentum features. Status: Active. Granted as CN109064484B.

Priority Applications (1)

Application Number: CN201810236397.2A; Priority Date: 2018-03-21; Filing Date: 2018-03-21; Title: Crowd movement behavior identification method based on fusion of subgroup division and momentum features; Granted publication: CN109064484B

Publications (2)

CN109064484A (en): published 2018-12-21
CN109064484B (en): published 2022-02-08

Family

ID=64819919

Family Applications (1)

Application Number: CN201810236397.2A; Priority Date: 2018-03-21; Filing Date: 2018-03-21; Status: Active; Title: Crowd movement behavior identification method based on fusion of subgroup division and momentum features

Country Status (1)

CN: CN109064484B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829382B (en) * 2018-12-30 2020-04-24 北京宇琪云联科技发展有限公司 Abnormal target early warning tracking system and method based on intelligent behavior characteristic analysis
CN109918996A (en) * 2019-01-17 2019-06-21 平安科技(深圳)有限公司 The illegal action identification method of personnel, system, computer equipment and storage medium
CN110263690A (en) * 2019-06-12 2019-09-20 成都信息工程大学 A kind of group behavior feature extraction based on small group and description method and system
CN110443789B (en) * 2019-08-01 2021-11-26 四川大学华西医院 Method for establishing and using immune fixed electrophoretogram automatic identification model
CN110472604B (en) * 2019-08-20 2021-05-14 中国计量大学 Pedestrian and crowd behavior identification method based on video
CN112850436A (en) * 2019-11-28 2021-05-28 宁波微科光电股份有限公司 Pedestrian trend detection method and system of elevator intelligent light curtain
CN111160150A (en) * 2019-12-16 2020-05-15 盐城吉大智能终端产业研究院有限公司 Video monitoring crowd behavior identification method based on depth residual error neural network convolution
CN111429185B (en) * 2020-03-27 2023-06-02 京东城市(北京)数字科技有限公司 Crowd figure prediction method, device, equipment and storage medium
CN111739651A (en) * 2020-06-16 2020-10-02 南京众智未来人工智能研究院有限公司 Multi-body space detection system and method based on group identification
CN111933298B (en) * 2020-08-14 2024-02-13 医渡云(北京)技术有限公司 Crowd relation determining method and device, electronic equipment and medium
CN113283387B (en) * 2021-06-23 2022-11-11 华东交通大学 Group abnormal behavior detection method and device


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529465A (en) * 2016-11-07 2017-03-22 燕山大学 Pedestrian cause and effect relation identification method based on momentum kinetic model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dynamic Image Networks for Action Recognition; Hakan Bilen et al.; CVPR; 2016; pp. 79-85 *
结合群组动量特征与卷积神经网络的人群行为分析 (Crowd behavior analysis combining group momentum features and convolutional neural networks); 成金庚 et al.; 科学技术与工程 (Science Technology and Engineering); May 2017; vol. 17, no. 14; pp. 3034-3042 *

Also Published As

Publication number: CN109064484A; Publication date: 2018-12-21


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant