CN113627383A - Pedestrian loitering re-identification method for panoramic intelligent security - Google Patents

Pedestrian loitering re-identification method for panoramic intelligent security

Info

Publication number
CN113627383A
Authority
CN
China
Prior art keywords
pedestrian
loitering
panoramic
identification
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110978611.3A
Other languages
Chinese (zh)
Inventor
张楠
黄绩
程德强
寇旗旗
赵凯
吕晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology (CUMT)
Priority to CN202110978611.3A
Publication of CN113627383A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pedestrian loitering re-identification method for panoramic intelligent security, which mainly comprises four parts: video picture acquisition, picture quality evaluation, pedestrian detection and re-identification, and loitering judgment. The first part, picture acquisition, previews the security video in real time, captures pictures by the second, and stores them in memory. The second part, picture quality evaluation, screens out captured pictures that are blurred, heavily occluded, or otherwise unusable. The third part combines pedestrian detection and pedestrian re-identification: the captured panoramic picture is fed into a single neural network that jointly handles the two tasks. The fourth part is pedestrian loitering judgment, which decides whether a pedestrian is loitering by checking whether the camera id is the same and how long the interval between pictures is.

Description

Pedestrian loitering re-identification method for panoramic intelligent security
Technical Field
The invention belongs to the technical field of pedestrian loitering re-identification, and particularly relates to a pedestrian loitering re-identification method for panoramic intelligent security.
Background
With the rapid development of modern information technology and the arrival of new-infrastructure policies, the concept of the "smart city" is gradually becoming reality. A smart city integrates the real world and the digital world on top of the digital city, the Internet of Things and cloud computing, enabling intelligent management and operation of the city. Intelligent security is a principal application scenario of the smart city, and the video surveillance system plays a key role in building panoramic intelligent security. For videos in intelligent video surveillance, unattended intelligent analysis such as target detection, classification, identification, tracking, feature point extraction and motion estimation is attracting increasing attention. Pedestrian loitering detection determines whether a person stays in a place longer than a certain period of time or moves along an abnormal trajectory (for example, repeatedly walking back and forth in one place), and is a common video analysis technique.
Pedestrian loitering re-identification is currently a hot topic in intelligent security and is mainly divided into four parts: video picture processing, pedestrian detection, pedestrian re-identification and loitering judgment. Pedestrian loitering detection is usually treated as a sub-problem of pedestrian abnormal-behavior recognition and has certain shortcomings. First, traditional pedestrian behavior recognition methods manually extract features of abnormal pedestrian behavior and feed them into a simple classifier such as a support vector machine; this consumes considerable manpower and financial resources, detects loitering behavior poorly, and is not tailored to the specific problem. Second, traditional abnormal-behavior recognition mainly obtains pedestrians by foreground-background modeling, which is easily disturbed by noisy background information and therefore has a high error rate. For pedestrian detection, the traditional foreground-background separation approach based on background modeling also performs poorly and needs improvement. Existing pedestrian re-identification technology can compensate for the limited field of view of fixed cameras; combined with pedestrian detection and tracking, it can be applied to panoramic intelligent security and effectively assist in detecting loitering pedestrians. Pedestrian re-identification methods are mainly divided into unsupervised and supervised pedestrian re-identification.
Because the identities of pedestrians in surveillance video are unknown, only unsupervised pedestrian re-identification can be used. At present, unsupervised pedestrian re-identification still has much room for improvement in discriminating the similarity between samples and in recognizing hard samples (pedestrians with similar appearance and clothes but different identities). Result uncertainty caused by different viewpoints, low resolution, illumination change, occlusion, background clutter and unreliable bounding-box generation also needs to be addressed by a suitable method. To summarize the existing shortcomings: traditional pedestrian loitering detection performs poorly when a tracked pedestrian is lost or when a pedestrian leaves the video picture and later returns; and the panoramic picture obtained from a real-time surveillance video stream contains a large amount of background information, so feeding it into the pedestrian re-identification step results in poor identification efficiency due to the noisy background. Therefore a pedestrian loitering re-identification method for panoramic intelligent security needs to be designed to solve these problems.
Disclosure of Invention
The invention aims to provide a pedestrian loitering re-identification method for panoramic intelligent security, which can solve the problems.
The technical scheme adopted by the invention is as follows:
a pedestrian loitering re-identification method for panoramic intelligent security comprises the following steps:
a. collecting panoramic pictures of the real-time security monitoring video;
b. carrying out quality evaluation on the acquired panoramic picture;
c. carrying out pedestrian detection and re-identification combined processing on the screened panoramic picture;
d. carrying out pedestrian loitering judgment on the pseudo label assignment result obtained in the pedestrian detection and re-identification step.
The invention is further improved in that: in step a, the security video panoramic picture is acquired with the Python version of OpenCV; the cv2 and NumPy libraries are prepared, real-time preview of the security camera is realized from its pushed stream, frames are read from the camera to capture the panoramic picture, and the picture is saved.
The invention is further improved in that: in step b, picture quality evaluation is based on an image evaluation model; whether the pedestrians in the panoramic picture are blurred or heavily occluded is judged, and video screenshots whose evaluation results are below a scoring threshold are screened out.
The invention is further improved in that: in step c, a new deep learning framework is adopted to jointly process pedestrian detection and pedestrian re-identification.
The invention is further improved in that: the new deep learning framework is mainly divided into five modules: a convolutional neural network (CNN) module, a pedestrian detection module, a pooling layer module, a mutual-nearest-neighbor-based pseudo label assignment module and a loss function module; pedestrian loitering judgment is based on the pseudo label assignment result: within a group of similar pedestrian features sharing the same pseudo label, it is first judged whether the camera ids of the pedestrian pictures are the same; if they differ, the possibility of loitering is excluded; if the camera id is the same, the frame-number interval between the pedestrian pictures is then judged.
The invention is further improved in that: the basic model of the convolutional neural network (CNN) module is ResNet50; the backbone consists of the first four layers of ResNet50; an attention mechanism is added at the first layer of the network and at the last convolutional layer used for feature extraction; and Instance Normalization (IN) is added before the ReLU of each residual block to suppress the influence of the image background:

$$ y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^{2} + \varepsilon}}, \qquad \mu_{ti} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} x_{tilm}, \qquad \sigma_{ti}^{2} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} \left( x_{tilm} - \mu_{ti} \right)^{2} $$

where $x_{tijk}$ denotes the $tijk$-th element of the activation map, $j$ and $k$ index the spatial dimensions, $i$ is the feature channel, $t$ is the index of the image in the batch, $\varepsilon$ is a small constant, and $H$ and $W$ are the height and width respectively.
The invention is further improved in that: the pedestrian detection module operates on the feature map; it transforms the pedestrian features in the feature map with a 512 × 3 × 3 convolutional layer, uses anchors and a SoftMax classifier at each position of the feature map to predict whether an anchor box contains a pedestrian, and further comprises a linear regressor for adjusting the anchor box position.
The invention is further improved in that: the pooling layer module comprises region-of-interest pooling (RoI Pooling) and global average pooling (Global Average Pooling).
The invention is further improved in that: the mutual-nearest-neighbor-based pseudo label assignment module calculates the mutual nearest-neighbor relations of all feature vectors and then, using transitivity, divides the whole feature vector space into a number of different clusters to obtain the pseudo labels.
The invention is further improved in that: the loss function module comprises a cross-entropy loss function and a triplet loss function.
Beneficial effects:
firstly, aiming at the poor detection performance of existing pedestrian loitering methods, deep learning is combined with the traditional method for the first time: pedestrian detection and re-identification are completed within a single CNN convolutional neural network, which simplifies the complexity of detecting pedestrian loitering with a purely deep-learning method and improves the loitering detection accuracy of the traditional method;
secondly, when the panoramic picture obtained from the real-time surveillance camera passes through the pedestrian detection module, the large amount of background information it contains interferes with the extraction of pedestrian features. The invention uses an attention mechanism and IN (Instance Normalization): by adding the attention mechanism after the ResNet50 convolutional layers and adding an IN module inside the residual blocks, the influence of the picture background is weakened and more attention is paid to extracting pedestrian features.
Drawings
FIG. 1 is a system block diagram of a pedestrian loitering re-identification method for panoramic intelligent security;
FIG. 2 is a security video picture capture flow diagram of the present invention;
FIG. 3 is a pedestrian detection and re-identification framework of the present invention;
FIG. 4 is a ResNet50 residual block with IN added (left) and a ResNet50 residual block without IN added (right) of the present invention;
FIG. 5 is a network abstraction backbone model of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
The invention provides a pedestrian loitering re-identification method applied to panoramic intelligent security. The method mainly comprises four parts: video picture acquisition, picture quality evaluation, pedestrian detection and re-identification, and loitering judgment. The first part, picture acquisition, previews the security video in real time, captures pictures by the second, and stores them in memory. The second part, picture quality evaluation, screens out captured pictures that are blurred, heavily occluded, or otherwise unusable. The third part combines pedestrian detection and pedestrian re-identification: the captured panoramic picture is fed into a single neural network that jointly handles the two tasks. The fourth part is pedestrian loitering judgment, which decides whether a pedestrian is loitering by checking whether the camera id is the same and how long the interval between pictures is. The overall process is shown in FIG. 1.
A first part: video picture capture
The security video picture acquisition process is shown in fig. 2. We use the python version of OpenCV to implement a real-time video preview screenshot. Firstly, cv2 and a NumPy library are prepared, real-time preview is realized on a security camera in a flow pushing mode, and the mode of the camera is as follows:videoCapture=cv2.VideoCapture(1). Because the moving speed of the pedestrian does not reach a very high speed, the pedestrian can be extracted according to the second, the pedestrian can be intercepted once every two frames of the camera according to the frames per second of the camera, and the stored pictures are placed in an updatable storage.
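A minimal sketch of this capture loop is given below, assuming the camera is reachable as a local device index or a pull-stream URL; the output directory, the file naming and the one-second interval are illustrative choices rather than values fixed by the description:

    import os
    import time
    import cv2  # OpenCV, as named in the description

    def capture_panoramas(source=1, out_dir="frames", seconds_between=1):
        """Preview a security stream and save one panoramic frame per interval."""
        os.makedirs(out_dir, exist_ok=True)
        videoCapture = cv2.VideoCapture(source)            # e.g. 1 or "rtsp://..."
        fps = videoCapture.get(cv2.CAP_PROP_FPS) or 25     # fall back if FPS is unknown
        step = max(1, int(round(fps * seconds_between)))   # frames between captures
        frame_id = 0
        while True:
            ok, frame = videoCapture.read()
            if not ok:
                break
            cv2.imshow("preview", frame)                   # real-time preview
            if frame_id % step == 0:                       # capture roughly once per second
                name = f"{int(time.time())}_{frame_id}.jpg"
                cv2.imwrite(os.path.join(out_dir, name), frame)
            frame_id += 1
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
        videoCapture.release()
        cv2.destroyAllWindows()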
A second part: picture quality assessment
In the pictures intercepted by the panoramic security video in seconds, the pedestrian characteristics and the effect possibly reflected by some pictures are not good. For example, the pedestrian cannot distinguish specific personal features in a fuzzy manner, and the pedestrian is blocked by buildings, obstacles, passing vehicles and the like to have most of main features, and the pedestrian cannot be selected in such cases because the pedestrian in the next step cannot be effectively detected and identified. An image evaluation model is adopted to evaluate pedestrian images in the captured images, judge the fuzzy degree and the integrity degree of the pedestrians, and screen out the video screenshots of which the evaluation results are lower than a grading threshold value.
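The patent does not specify the image evaluation model; the sketch below stands in for it with a simple variance-of-Laplacian blur score, and both the scoring function and the threshold value are assumptions made only for illustration:

    import cv2
    import numpy as np

    def sharpness_score(image_bgr):
        """Higher means sharper; variance of the Laplacian is a common blur proxy."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    def filter_frames(paths, score_threshold=100.0):
        """Keep only screenshots whose quality score reaches the threshold."""
        kept = []
        for p in paths:
            img = cv2.imread(p)
            if img is None:
                continue
            if sharpness_score(img) >= score_threshold:
                kept.append(p)
        return kept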
And a third part: pedestrian detection and re-identification
The part is a core part of pedestrian loitering detection, the pedestrian detection and the pedestrian re-identification are regarded as two independent tasks in the past task, the pedestrian detection and the pedestrian re-identification are jointly processed, and a new deep learning framework is provided as shown in fig. 3.
The framework is mainly divided into 5 modules, the first Module is a convolutional neural network CNN, the basic model of the CNN is ResNet50, in (input normalization) is added into a residual block in order to eliminate the influence of an image background, and in order to extract pedestrian features more intensively, Attention mechanisms are added into the ResNet network, namely Channel Attention Module and Spatial Attention Module respectively. The second module is pedestrian detection, which uses convolution layer to convert the pedestrian character in the character graph, uses anchor point and SoftMax classifier to predict whether the anchor frame contains pedestrian at each position of the character graph, and it also includes a linear regression to adjust the position of the anchor frame. The third module is a Pooling layer, which is to send the region of 1024 × 14 obtained from the feature map into a region-of-interest Pooling layer (Rol Pooling), then pass through a Global Average Pooling layer (Global Average Pooling), and finally integrate to obtain 2048-dimensional feature vectors. The fourth module is based on Mutual neighbor Pseudo label assignment (Mutual neighbor Neighbors Pseudo label), which can better explore the similarity between samples. The last module is a penalty function that calculates the penalty based on the assigned pseudo label.
The first module: CNN convolutional neural network
The basic model of the CNN is ResNet50; the backbone actually consists of the first four layers of ResNet50, each containing a set of residual blocks, and IN (Instance Normalization, a variant of Batch Normalization, BN) is added just before the ReLU of each residual block, as shown in FIG. 4.
The residual block is thus reconstructed by adding IN before the ReLU. IN is a variant of Batch Normalization (BN); the computational difference between them is that IN normalizes features using the statistics of a single sample rather than the statistics of a batch of samples. IN is mainly used in style transfer to filter instance-specific contrast out of the content, and adding it can significantly improve model performance. It can be written as:
$$ y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^{2} + \varepsilon}}, \qquad \mu_{ti} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} x_{tilm}, \qquad \sigma_{ti}^{2} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} \left( x_{tilm} - \mu_{ti} \right)^{2} $$

where $x_{tijk}$ denotes the $tijk$-th element of the activation map, $j$ and $k$ index the spatial dimensions, $i$ is the feature channel, $t$ is the index of the image in the batch, $\varepsilon$ is a small constant, and $H$ and $W$ are the height and width respectively.
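As a cross-check of the formulas above, the following NumPy sketch computes IN exactly as defined, normalizing each sample and channel over its spatial dimensions; the (T, C, H, W) tensor layout and the value of ε are illustrative assumptions:

    import numpy as np

    def instance_norm(x, eps=1e-5):
        """Instance Normalization: per-sample, per-channel normalization over H and W.

        x has shape (T, C, H, W): batch index t, feature channel i, spatial dims (H, W).
        """
        mu = x.mean(axis=(2, 3), keepdims=True)                 # mu_{ti}
        var = ((x - mu) ** 2).mean(axis=(2, 3), keepdims=True)  # sigma^2_{ti}
        return (x - mu) / np.sqrt(var + eps)                    # y_{tijk}

    # Example: a batch of 2 activation maps with 4 channels of size 8x8.
    y = instance_norm(np.random.randn(2, 4, 8, 8))
    assert np.allclose(y.mean(axis=(2, 3)), 0.0, atol=1e-6)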
The CNN backbone starts with a 7 × 7 convolutional layer (the first convolutional layer, conv1), followed by four blocks (conv2_x to conv5_x) containing 3, 4, 6 and 3 residual units respectively. We add the attention mechanism at the first layer of the network and at the last convolutional layer used for feature extraction; the ResNet50 backbone with these additions is shown in FIG. 5. Given an input image, the stem CNN produces feature maps with 1024 channels at 1/16 of the resolution of the original image.
A second module: pedestrian detection
The panoramic picture generates a feature map through a convolutional neural network CNN, and pedestrian detection is to predict a pedestrian boundary frame on the basis of the feature map. We add 512 x 3 convolution layer on the feature map, then anchor classification and anchor regression to predict the pedestrian interested region, and generate many interested region bounding boxes in the pedestrian region prediction stage, and the output of this stage is the bounding box list of the possible positions of the pedestrian. We first transformed the pedestrian features using 512 x 3 convolutional layers, using 9 anchor points and a SoftMax classifier at each position of the feature map to predict whether each bounding box contains a pedestrian. Then, to further determine the anchor frame, the linear regressor is used to adjust the anchor frame position, and we will leave the first 128 adjusted bounding boxes after non-maximum suppression as the final choice. Since pedestrian detection will inevitably contain some false alarms and misalignment cases, the SoftMax classifier and linear regression are again used to exclude non-people and refine the location.
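A minimal sketch of such a detection head is shown below in PyTorch (an assumed framework; the patent only names the operations). It applies the 512 × 3 × 3 convolution, then per-anchor SoftMax classification and box regression over 9 anchors at every feature-map position; the class name and channel counts are illustrative, and proposal generation plus non-maximum suppression are omitted:

    import torch
    import torch.nn as nn

    class PedestrianRPNHead(nn.Module):
        """Anchor classification + regression head over a 1024-channel feature map."""
        def __init__(self, in_channels=1024, num_anchors=9):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)  # 512 x 3 x 3
            self.cls = nn.Conv2d(512, num_anchors * 2, kernel_size=1)  # pedestrian / background
            self.reg = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # box offsets

        def forward(self, feat):
            h = torch.relu(self.conv(feat))
            scores = self.cls(h)                       # (N, 2*A, H, W)
            n, _, hh, ww = scores.shape
            scores = scores.view(n, 2, -1, hh, ww)     # split the 2-way logits per anchor
            probs = torch.softmax(scores, dim=1)       # SoftMax over {background, pedestrian}
            deltas = self.reg(h)                       # linear regression of anchor positions
            return probs, deltas

    # Usage on a 1024-channel feature map at 1/16 resolution of a 640x480 frame:
    probs, deltas = PedestrianRPNHead()(torch.randn(1, 1024, 30, 40))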
A third module: pooling layer
The pooling layer includes two layers of region-of-interest pooling and global average pooling, and the pedestrian region prediction stage generates many regions of interest, which may slow down performance and processing speed, where region-of-interest pooling is needed. Region of interest pooling for each region of interest from the input list, a portion of the corresponding input feature map is taken and scaled to some predefined size, the scaling being done by: (1) dividing the prediction region into equal sized portions (the number of which is the same as the output dimension); (2) finding the maximum value of each part; (3) these maxima are copied to the output. Finally, from a list of bounding rectangle boxes with different sizes, a list of corresponding feature maps with a fixed size can be obtained quickly. The dimensionality of the region of interest pooling output does not actually depend on the size of the input element map, nor on the region of interest size, which is determined only by the number of portions into which we divide the predicted pedestrian.
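The three scaling steps above can be written directly as the following NumPy sketch, a simplified max-pooling variant that assumes box coordinates are already given in feature-map cells (real implementations interpolate sub-cell boundaries):

    import numpy as np

    def roi_max_pool(feature_map, box, out_size=(14, 14)):
        """RoI pooling by the three steps in the text: split, take max, copy to output.

        feature_map: (C, H, W) array; box: (x1, y1, x2, y2) in feature-map coordinates.
        """
        c = feature_map.shape[0]
        x1, y1, x2, y2 = [int(v) for v in box]
        region = feature_map[:, y1:y2, x1:x2]
        out_h, out_w = out_size
        # (1) divide the region into equally sized parts (same count as the output size)
        ys = np.linspace(0, region.shape[1], out_h + 1, dtype=int)
        xs = np.linspace(0, region.shape[2], out_w + 1, dtype=int)
        out = np.zeros((c, out_h, out_w), dtype=feature_map.dtype)
        for i in range(out_h):
            for j in range(out_w):
                part = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                                 xs[j]:max(xs[j + 1], xs[j] + 1)]
                # (2) find the maximum of each part and (3) copy it to the output
                out[:, i, j] = part.max(axis=(1, 2))
        return out

    # A 1024-channel map pooled to the fixed 1024 x 14 x 14 size used later in the network.
    pooled = roi_max_pool(np.random.randn(1024, 40, 60), box=(5, 3, 33, 35))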
The region-of-interest pooling layer obtains 1024 × 14 × 14 regions from the feature map; these are then sent through the remaining ResNet50 layers (conv4_4 to conv5_3) followed by a global average pooling layer, and finally integrated into 2048-dimensional feature vectors.
A fourth module: mutual nearest neighbor based pseudo label allocation
After 2048-dimensional feature vectors are obtained, we learn the similarity between pedestrian feature vectors based on a Mutual Nearest Neighbors Pseudo label assignment Method (MNNPL). The MNNPL method is based on a transitive k-nearest-neighbor relationship, where k-nearest-neighbor means that two samples are located in k-nearest-neighbor of each other, and is expressed as follows:
Figure 966746DEST_PATH_IMAGE016
wherein
Figure 428951DEST_PATH_IMAGE017
Representing the nearest neighbor of k,
Figure 839073DEST_PATH_IMAGE018
the subscript of (b) represents the image index.
The MNNPL method performs well in clustering. In particular, mutual nearest neighbor requires two samples to be among each other's k nearest neighbors; when k is small this is a strong constraint, which can be used to address the small inter-class distance caused by similar clothing. Meanwhile, there is a certain correlation between viewing angles: although the front and back of a pedestrian differ considerably, both are somewhat similar to the side, so the side can be used to establish a connection between the front and the back. For example, take the front, side and back views of a pedestrian and let $f_{front}$, $f_{side}$ and $f_{back}$ be the corresponding features. Since the back of a pedestrian may differ significantly from the front, $f_{front}$ and $f_{back}$ are not mutual k-nearest neighbors. However, both the front and the back are similar to the side, so $f_{front}$ and $f_{side}$ are mutual k-nearest neighbors, as are $f_{back}$ and $f_{side}$; that is, label($f_{front}$) = label($f_{side}$) and label($f_{side}$) = label($f_{back}$). By transitivity, label($f_{front}$) = label($f_{back}$). Thus the transitive mutual nearest-neighbor relation alleviates the large intra-class distance caused by viewpoint. The overall process of MNNPL is roughly as follows: first, compute the mutual nearest-neighbor relations of all feature vectors; second, divide the whole feature vector space into a number of different clusters using transitivity and obtain the pseudo labels.
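A compact sketch of this assignment procedure is given below, under the assumption that cosine similarity is used to rank neighbors (the patent does not name the distance) and that transitivity is applied by taking connected components of the mutual-neighbor graph:

    import numpy as np

    def mnnpl_pseudo_labels(features, k=5):
        """Assign pseudo labels by mutual k-nearest neighbors plus transitivity.

        features: (N, D) array of feature vectors; returns (N,) integer labels.
        """
        f = features / np.linalg.norm(features, axis=1, keepdims=True)
        sim = f @ f.T                                    # cosine similarity
        np.fill_diagonal(sim, -np.inf)                   # exclude self
        knn = np.argsort(-sim, axis=1)[:, :k]            # N_k(x_i)
        n = len(features)
        in_knn = np.zeros((n, n), dtype=bool)
        in_knn[np.arange(n)[:, None], knn] = True
        mutual = in_knn & in_knn.T                       # x_i in N_k(x_j) and x_j in N_k(x_i)
        # Transitive closure via connected components (iterative DFS).
        labels = -np.ones(n, dtype=int)
        cur = 0
        for s in range(n):
            if labels[s] >= 0:
                continue
            stack = [s]
            while stack:
                u = stack.pop()
                if labels[u] >= 0:
                    continue
                labels[u] = cur
                stack.extend(np.flatnonzero(mutual[u] & (labels < 0)).tolist())
            cur += 1
        return labels

    labels = mnnpl_pseudo_labels(np.random.randn(100, 2048), k=5)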
A fifth module: loss function
The cross entropy loss and triplet loss functions are applied simultaneously in the training. After all 2048-dimensional feature vectors are clustered to generate pseudo labels, the pseudo labels and classification results generated by a classifier are used together to calculate cross entropy loss, and the cross entropy loss is used
Figure 148066DEST_PATH_IMAGE026
The sampling method forms small batches, i.e. each small batch is composed of k pictures of p pedestrians, and the cross entropy loss can be written as follows:
Figure 895443DEST_PATH_IMAGE027
Figure 168161DEST_PATH_IMAGE028
is an image
Figure 51803DEST_PATH_IMAGE029
Belong to the label
Figure 400876DEST_PATH_IMAGE030
The prediction probability of (2).
In this batch, in addition to calculating the classification penalty, the hard triplet penalty is also calculated as follows:
Figure 792805DEST_PATH_IMAGE031
wherein
Figure 744581DEST_PATH_IMAGE032
Figure 358096DEST_PATH_IMAGE033
Figure 737125DEST_PATH_IMAGE034
Respectively features extracted from the anchor image, the positive examples and the negative examples,
Figure 114885DEST_PATH_IMAGE035
is the super-edge parameter.
The total loss is the sum of the above two losses, which is defined as
Figure 745718DEST_PATH_IMAGE036
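The two losses can be sketched as follows in PyTorch (an assumed framework): the cross-entropy term uses the classifier logits against the pseudo labels, and the batch-hard triplet term takes, for each anchor in a p × k batch, its hardest positive and hardest negative; the margin value here is illustrative:

    import torch
    import torch.nn.functional as F

    def total_loss(logits, embeddings, pseudo_labels, margin=0.3):
        """L = L_ce + L_tri over a p x k mini-batch with assigned pseudo labels."""
        l_ce = F.cross_entropy(logits, pseudo_labels)

        dist = torch.cdist(embeddings, embeddings)               # pairwise L2 distances
        same = pseudo_labels[:, None] == pseudo_labels[None, :]
        eye = torch.eye(len(embeddings), dtype=torch.bool, device=embeddings.device)
        pos = dist.masked_fill(~same | eye, float('-inf')).max(dim=1).values  # hardest positive
        neg = dist.masked_fill(same, float('inf')).min(dim=1).values          # hardest negative
        l_tri = F.relu(pos - neg + margin).mean()                # hard triplet loss

        return l_ce + l_tri

    # Example with p=4 identities x k=4 images, 2048-d features and 4-way classifier logits.
    y = torch.arange(4).repeat_interleave(4)
    loss = total_loss(torch.randn(16, 4), torch.randn(16, 2048), y)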
The fourth part: loitering judgment
In the step of pedestrian detection and re-identification, the obtained panoramic picture is input into a convolutional neural network to detect pedestrians and extract the characteristics of the pedestrians, k nearest neighbors of all characteristic vectors are calculated through a KMNN algorithm, and finally the k nearest neighbors are divided into a plurality of clusters by utilizing transmissibility and pseudo labels are distributed. In this process, we find a class of pictures with features similar to pedestrians. According to the final pseudo label distribution result, carrying out wandering judgment. Because each panoramic picture has some attribute information such as camera id, frame number and the like, in a group of similar pedestrian features with the same pseudo tag, whether the camera id of the pedestrian picture is the same or not is judged firstly, and if the camera id of the pedestrian picture is different, the possibility of wandering pedestrians is eliminated; if the camera id is the same, judging whether the frame number interval of the pedestrian picture is larger than M (M is a constant, the normal non-loitering stay time is the number of frames per second), and if the frame number interval of the pedestrian picture is larger than M, determining that the pedestrian has loitering behavior; if M is less than M, it is considered that the pedestrian is less likely to wander.
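A sketch of this decision rule is shown below; each captured picture is assumed to carry (camera_id, frame_number) metadata, and M is the frame-count threshold corresponding to a normal, non-loitering stay (both the record format and the example threshold are illustrative):

    from itertools import combinations

    def loitering_in_cluster(records, m_frames):
        """records: list of (camera_id, frame_number) for pictures sharing one pseudo label.

        Returns True if any two sightings come from the same camera and are more than
        m_frames apart, i.e. the pedestrian reappears after a long interval.
        """
        for (cam_a, frame_a), (cam_b, frame_b) in combinations(records, 2):
            if cam_a != cam_b:
                continue                      # different cameras: not counted as loitering
            if abs(frame_a - frame_b) > m_frames:
                return True                   # same camera, interval longer than M frames
        return False

    # Example: 25 fps camera, M corresponding to a 60-second normal stay.
    M = 25 * 60
    print(loitering_in_cluster([("cam3", 120), ("cam3", 40000), ("cam7", 500)], M))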
The invention has the beneficial effects that:
firstly, the traditional method is combined with deep learning: panoramic picture frames are continuously acquired from the real-time security video by the traditional method, the acquired panoramic pictures are fed into the deep learning model, and the two tasks of pedestrian detection and pedestrian re-identification are combined. This not only simplifies the complex network that a purely deep-learning approach would need to detect pedestrian loitering, but also improves the loitering detection accuracy of the traditional method;
secondly, to counter the influence of background information in pedestrian pictures, a method of simultaneously adding an IN module and an attention mechanism to the ResNet50 network is proposed for the first time: the IN (Instance Normalization) module suppresses the influence of the picture background, while the attention mechanism focuses more on extracting pedestrian features and pays less attention to background information. Adding the IN module and the attention mechanism together effectively reduces the influence of background information on the final pedestrian re-identification result.
The above are merely preferred embodiments of the present invention.

Claims (10)

1. A pedestrian loitering re-identification method for panoramic intelligent security, characterized by comprising the following steps:
a. collecting panoramic pictures of the real-time security monitoring video;
b. carrying out quality evaluation on the acquired panoramic picture;
c. carrying out pedestrian detection and re-identification combined processing on the screened panoramic picture;
d. carrying out pedestrian loitering judgment on the pseudo label assignment result obtained in the pedestrian detection and re-identification step.
2. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 1, characterized in that: in step a, the security video panoramic picture is acquired with the Python version of OpenCV; the cv2 and NumPy libraries are prepared, real-time preview of the security camera is realized from its pushed stream, frames are read from the camera to capture the panoramic picture, and the picture is saved.
3. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 1, characterized in that: in step b, picture quality evaluation is based on an image evaluation model; whether the pedestrians in the panoramic picture are blurred or heavily occluded is judged, and video screenshots whose evaluation results are below a scoring threshold are screened out.
4. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 1, characterized in that: in step c, a new deep learning framework is adopted to jointly process pedestrian detection and pedestrian re-identification.
5. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 4, characterized in that: the new deep learning framework is mainly divided into five modules: a convolutional neural network (CNN) module, a pedestrian detection module, a pooling layer module, a mutual-nearest-neighbor-based pseudo label assignment module and a loss function module; pedestrian loitering judgment is based on the pseudo label assignment result: within a group of similar pedestrian features sharing the same pseudo label, it is first judged whether the camera ids of the pedestrian pictures are the same; if they differ, the possibility of loitering is excluded; if the camera id is the same, the frame-number interval between the pedestrian pictures is then judged.
6. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 5, characterized in that: the basic model of the convolutional neural network (CNN) module is ResNet50; the backbone consists of the first four layers of ResNet50; an attention mechanism is added at the first layer of the network and at the last convolutional layer used for feature extraction; and Instance Normalization (IN) is added before the ReLU of each residual block to suppress the influence of the image background:

$$ y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^{2} + \varepsilon}}, \qquad \mu_{ti} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} x_{tilm}, \qquad \sigma_{ti}^{2} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} \left( x_{tilm} - \mu_{ti} \right)^{2} $$

where $x_{tijk}$ denotes the $tijk$-th element of the activation map, $j$ and $k$ index the spatial dimensions, $i$ is the feature channel, $t$ is the index of the image in the batch, $\varepsilon$ is a small constant, and $H$ and $W$ are the height and width respectively.
7. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 5, characterized in that: the pedestrian detection module operates on the feature map; it transforms the pedestrian features in the feature map with a 512 × 3 × 3 convolutional layer, uses anchors and a SoftMax classifier at each position of the feature map to predict whether an anchor box contains a pedestrian, and further comprises a linear regressor for adjusting the anchor box position.
8. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 5, characterized in that: the pooling layer module comprises region-of-interest pooling (RoI Pooling) and global average pooling (Global Average Pooling).
9. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 5, characterized in that: the mutual-nearest-neighbor-based pseudo label assignment module calculates the mutual nearest-neighbor relations of all feature vectors and then, using transitivity, divides the whole feature vector space into a number of different clusters to obtain the pseudo labels.
10. The pedestrian loitering re-identification method for panoramic intelligent security according to claim 5, characterized in that: the loss function module comprises a cross-entropy loss function and a triplet loss function.
CN202110978611.3A 2021-08-25 2021-08-25 Pedestrian loitering re-identification method for panoramic intelligent security Pending CN113627383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110978611.3A CN113627383A (en) 2021-08-25 2021-08-25 Pedestrian loitering re-identification method for panoramic intelligent security

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110978611.3A CN113627383A (en) 2021-08-25 2021-08-25 Pedestrian loitering re-identification method for panoramic intelligent security

Publications (1)

Publication Number Publication Date
CN113627383A true CN113627383A (en) 2021-11-09

Family

ID=78387606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110978611.3A Pending CN113627383A (en) 2021-08-25 2021-08-25 Pedestrian loitering re-identification method for panoramic intelligent security

Country Status (1)

Country Link
CN (1) CN113627383A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948425A (en) * 2019-01-22 2019-06-28 中国矿业大学 A kind of perception of structure is from paying attention to and online example polymerize matched pedestrian's searching method and device
US20210232813A1 (en) * 2020-01-23 2021-07-29 Tongji University Person re-identification method combining reverse attention and multi-scale deep supervision
CN112069940A (en) * 2020-08-24 2020-12-11 武汉大学 Cross-domain pedestrian re-identification method based on staged feature learning
CN112785572A (en) * 2021-01-21 2021-05-11 上海云从汇临人工智能科技有限公司 Image quality evaluation method, device and computer readable storage medium
CN112733814A (en) * 2021-03-30 2021-04-30 上海闪马智能科技有限公司 Deep learning-based pedestrian loitering retention detection method, system and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TONG XIAO et al.: "Joint Detection and Identification Feature Learning for Person Search", arXiv, pages 1-10 *
YANWEN CHONG et al.: "Learning domain invariant and specific representation for cross-domain person re-identification", Applied Intelligence, pages 5219-5232 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662521A (en) * 2021-11-16 2022-06-24 成都考拉悠然科技有限公司 Method and system for detecting wandering behavior of pedestrian


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination