CN115937906B - Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction

Info

Publication number: CN115937906B
Authority: CN (China)
Prior art keywords: shielding, feature, image, occlusion, pedestrian
Legal status: Active (granted)
Application number: CN202310121979.7A
Other languages: Chinese (zh)
Other versions: CN115937906A
Inventors: 韩守东, 章孜闻, 郭维, 刘东海生
Current assignee: Hangzhou Tuke Intelligent Information Technology Co., Ltd.
Original assignee: Wuhan Tuke Intelligent Technology Co., Ltd.
Application filed by: Wuhan Tuke Intelligent Technology Co., Ltd.
Priority application: CN202310121979.7A
Publication of CN115937906A; application granted and published as CN115937906B

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a method for re-identifying pedestrians in occluded scenes based on occlusion suppression and feature reconstruction, and belongs to the technical field of image processing. The method first uses a random grid-aligned block occlusion augmentation strategy to generate augmented image samples with simulated occlusion, which are used to train an occlusion sensor in a self-supervised manner so that it can predict the occluded positions in a pedestrian image. An occlusion-suppression encoder then extracts features from the input image: the encoder divides the image into blocks and performs full information exchange among the image blocks with a self-attention mechanism, and during this exchange the occlusion perception result is used to suppress feature propagation from occluded positions, producing a global feature that attends to the non-occluded regions of interest. Finally, a feature repair network reconstructs the complete pedestrian feature, yielding a robust feature representation. The global features constructed in this way reduce occlusion interference and improve retrieval accuracy in occluded scenes.

Description

Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction
Technical Field
The invention relates to the field of pedestrian re-identification in image processing and machine vision, and in particular to a method for re-identifying pedestrians in occluded scenes based on occlusion suppression and feature reconstruction.
Background
Pedestrian re-identification is an important research topic in computer vision that aims to match images of the same pedestrian across different cameras, and it can be applied to tasks such as pedestrian retrieval in surveillance scenes. Conventional pedestrian re-identification based on complete pedestrian images has achieved great success in recent years; however, pedestrian re-identification in occluded scenes, where a partially occluded pedestrian image must be used as the query to retrieve matches from a gallery, remains a great challenge. Pedestrian targets are frequently occluded in real surveillance scenes, so improving the robustness of the model in occluded scenes greatly improves the practicability of pedestrian re-identification methods.
The difficulty of occluded pedestrian re-identification lies in two aspects. First, it is difficult to extract discriminative features when key parts of the pedestrian are occluded; second, when the pedestrian is occluded by other people, non-target pedestrians introduce interfering features that easily lead to false matches. Current work mainly addresses the occlusion problem in pedestrian re-identification from two directions. The first is to make full use of global information to produce a robust feature representation: discriminative features are mined from as many locations or scales as possible in order to cope with occluded scenes, which may reduce mistakes when certain areas are occluded. The second is to enhance the local features of key regions with additional cues. In occluded scenes, enhancing the local features of certain key locations is critical, and some works attempt to find the unoccluded key locations using additional cues.
In an occluded scene, if all local areas are used indiscriminately to extract a unified feature, the occluding object easily introduces feature interference. This can cause many mismatches, for example different pedestrian images being matched because they contain the same occluder. Existing work uses an additional model to extract the pedestrian skeleton and predict the visibility of each body part, then suppresses the occluded local features and enhances the visible ones. However, this introduces additional computational overhead, and the external model may fail when the target pedestrian is occluded by other pedestrians.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides an occluded-scene pedestrian re-identification method based on occlusion suppression and feature reconstruction, which can improve pedestrian re-identification accuracy in occluded scenes.
According to a first aspect of the present invention, there is provided an occlusion scene pedestrian re-recognition method based on occlusion suppression and feature reconstruction, comprising:
step 1, performing data augmentation on a complete pedestrian image using a grid-aligned block occlusion augmentation strategy to generate an occlusion-augmented image that simulates occlusion and the occlusion label corresponding to the occlusion-augmented image;
step 2, constructing an occlusion sensor and training the occlusion sensor with the occlusion-augmented image and the corresponding occlusion label;
step 3, constructing a feature extraction network and extracting features from the complete pedestrian image and the occlusion-augmented image respectively to obtain complete pedestrian image features and occlusion-augmented image features; when extracting features from the occlusion-augmented image, suppressing occlusion interference using the occlusion perception result of the occlusion sensor;
step 4, constructing a feature reconstruction network and training the feature reconstruction network with the occlusion-augmented image features and the complete pedestrian image features;
step 5, performing occlusion perception on an occluded pedestrian image from a real scene with the occlusion sensor to obtain an occlusion perception result; extracting features from the occluded pedestrian image and suppressing the features of the occluded regions with the occlusion perception result to obtain a global feature that attends to the visible regions of the pedestrian; performing feature reconstruction on the global feature with the feature reconstruction network to obtain the final global feature for pedestrian re-identification; and computing feature distances based on the final global feature to complete pedestrian re-identification.
On the basis of the above technical solution, the invention can also be improved as follows.
Optionally, generating the occlusion-augmented image and the corresponding occlusion label in step 1 includes:
scaling the complete pedestrian image to a set size and dividing it into image blocks according to a uniform grid;
taking a number of occlusion-augmented images as one batch and setting an occlusion ratio; cyclically and randomly generating grid-aligned rectangles and adding them to a mask set until the total area of all rectangles in the mask set matches the occlusion ratio, forming an irregular block mask from the mask set; and generating several random masks for the batch of images after sampling several times;
and at each sampling, randomly selecting another complete pedestrian image with a different identity from the batch and randomly selecting the region of the same shape to cover the complete pedestrian image being processed, generating the occlusion-augmented image and the corresponding occlusion label.
Optionally, the occlusion sensor consists of several self-attention modules and one linear layer;
the input of the occlusion sensor is an input sequence composed of the image-block feature embedding sequence and an occlusion indication feature initialized as an all-zero vector.
Optionally, the training process of the occlusion sensor includes:
the self-attention modules integrate the information carried in the image-block feature embeddings and update the occlusion indication feature $f_{occ}$; a linear layer converts the updated occlusion indication feature $f_{occ}$ into the occlusion perception result $s$; and the occlusion prediction is supervised by the occlusion label corresponding to the occlusion-augmented image.
Optionally, the feature extraction network constructed in step 3 consists of multiple layers of self-attention modules;
the input of the feature extraction network is an input sequence composed of the image-block feature embedding sequence and an identity classification indication feature.
Optionally, the extraction process of the feature extraction network includes: integrating the information carried in the image-block feature embeddings and updating the identity classification indication feature, with each self-attention layer generating an attention matrix; the N elements of the first row of the attention matrix represent the strength of information transfer from the N image-block embeddings to the identity indication feature;
when extracting features from the occlusion-augmented image, the attention matrix is corrected according to the occlusion perception result so that the feature embeddings of image blocks with high occlusion scores receive smaller weights during feature exchange, and the corrected attention is used to compute the feature update.
Optionally, the feature reconstruction network constructed in step 4 consists of two branches of self-attention layers, the two branches being a global feature construction network and a complete feature inference network;
the global feature construction network constructs the complete global feature $f_{full}$ from the complete pedestrian image features; the complete feature inference network infers the reconstructed global feature $f_{rec}$ from the occlusion-augmented image features.
Optionally, the objective of the feature reconstruction network is that the complete feature inference network constructs, from an incomplete image, features as similar as possible to the features of the complete image;
the overall training loss of the feature reconstruction network is
$L = \lambda_1 L_{id} + \lambda_2 L_{tri} + \lambda_3 L_{occ} + \lambda_4 L_{inf}$,
where $L_{id}$ denotes the identity classification loss, $L_{tri}$ the triplet loss, $L_{occ}$ the occlusion prediction loss, and the inference loss is $L_{inf} = d(f_{rec}, f_{full})$, with $d(\cdot,\cdot)$ denoting the Euclidean distance; $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are the balance weights of the four losses.
The method for re-identifying pedestrians in occluded scenes based on occlusion suppression and feature reconstruction provided by the invention has the following beneficial effects:
1. a data augmentation strategy is designed that generates occlusion-augmented images and the corresponding labels, used to improve the model's understanding of occlusion;
2. the way key visible parts are determined for local feature enhancement is improved: occlusion perception is performed in a self-supervised manner instead of relying on an additional model;
3. the way local features are enhanced during feature extraction is improved: instead of generating separate local features and re-weighting and fusing them, global features that attend to specific local regions are generated directly, which better suits a feature extraction network based on a self-attention mechanism;
4. a feature reconstruction network is designed that can repair incomplete features in occluded scenes and reconstruct complete global features, improving the accuracy of pedestrian re-identification in occluded scenes.
Drawings
FIG. 1 is an overall network structure diagram of an embodiment of occluded-scene pedestrian re-identification based on occlusion suppression and feature reconstruction provided by the invention;
FIG. 2 is a network training flowchart of an embodiment of occluded-scene pedestrian re-identification based on occlusion suppression and feature reconstruction provided by the invention;
FIG. 3 is a network prediction flowchart of an embodiment of occluded-scene pedestrian re-identification based on occlusion suppression and feature reconstruction provided by the invention;
FIG. 4 is an example of the labels generated by the grid-aligned block occlusion augmentation of the occluded-scene pedestrian re-identification method based on occlusion suppression and feature reconstruction provided by the invention;
FIG. 5 is a schematic diagram of the feature reconstruction of an embodiment of the occluded-scene pedestrian re-identification method based on occlusion suppression and feature reconstruction.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings; the examples are provided to illustrate the invention and are not to be construed as limiting its scope.
In the occluded-scene pedestrian re-identification method based on occlusion suppression and feature reconstruction, a random grid-aligned block occlusion augmentation strategy is first used to generate augmented image samples with simulated occlusion, with which the occlusion sensor is trained in a self-supervised manner so that it can predict the occluded positions in a pedestrian image. An occlusion-suppression encoder then extracts features from the input image: the encoder divides the image into blocks and performs full information exchange among the image blocks with a self-attention mechanism, and during this exchange the occlusion perception result is used to suppress feature propagation from occluded positions, producing a global feature that attends to the non-occluded regions of interest. Finally, a feature repair network reconstructs the complete pedestrian feature, yielding a robust feature representation.
Fig. 1 is the overall network structure diagram of an embodiment of occluded-scene pedestrian re-identification based on occlusion suppression and feature reconstruction provided by the invention, and fig. 2 and fig. 3 are the corresponding network training and network prediction flowcharts. As can be seen from figs. 1-3, the re-identification method includes:
step 1, performing data augmentation on the complete pedestrian images using the grid-aligned block occlusion augmentation strategy to generate occlusion-augmented images that simulate occlusion and the corresponding occlusion labels.
In a specific implementation, the complete pedestrian images may be obtained from a public pedestrian re-identification dataset.
Step 2, constructing an occlusion sensor and training the occlusion sensor with the occlusion-augmented images and the corresponding occlusion labels.
Step 3, constructing a feature extraction network and extracting features from the complete pedestrian images and the occlusion-augmented images respectively to obtain complete pedestrian image features and occlusion-augmented image features; when extracting features from an occlusion-augmented image, the occlusion perception result of the occlusion sensor is used to suppress occlusion interference.
Step 4, constructing a feature reconstruction network and training the feature reconstruction network with the occlusion-augmented image features and the complete pedestrian image features.
Step 5, performing occlusion perception on an occluded pedestrian image from a real scene with the occlusion sensor to obtain an occlusion perception result; extracting features from the occluded pedestrian image and suppressing the features of the occluded regions with the occlusion perception result to obtain a global feature that attends to the visible regions of the pedestrian; performing feature reconstruction on the global feature with the feature reconstruction network to obtain the final global feature for pedestrian re-identification; and computing feature distances based on the final global feature to complete pedestrian re-identification.
In the method for re-identifying pedestrians in occluded scenes based on occlusion suppression and feature reconstruction provided by the invention, first, a grid-aligned block occlusion augmentation strategy is proposed to train the model's ability to perceive occlusion in a self-supervised manner; second, an occlusion sensor is constructed that predicts the occlusion score of each image block after the pedestrian image is partitioned; third, a feature reconstruction network is constructed that can repair the occluded, incomplete features and reconstruct complete global features.
Example 1
Embodiment 1 of the present invention is an embodiment of the occluded-scene pedestrian re-identification method based on occlusion suppression and feature reconstruction; as can be seen from figs. 1-3, the embodiment includes:
step 1, performing data augmentation on the complete pedestrian images using the grid-aligned block occlusion augmentation strategy to generate occlusion-augmented images that simulate occlusion and the corresponding occlusion labels.
In a specific implementation, the complete pedestrian images may be obtained from a public pedestrian re-identification dataset.
In a possible embodiment, generating the occlusion-augmented image and the corresponding occlusion label in step 1 includes:
scaling the complete pedestrian image to a set size and dividing it into image blocks according to a uniform grid;
taking a number of occlusion-augmented images as one batch and setting an occlusion ratio; cyclically and randomly generating grid-aligned rectangles and adding them to a mask set until the total area of all rectangles in the mask set matches the occlusion ratio, forming an irregular block mask from the mask set; and generating several random masks for the batch of images after sampling several times;
and at each sampling, selecting another complete pedestrian image with a different identity from the batch and randomly selecting the region of the same shape to cover the complete pedestrian image being processed, generating the occlusion-augmented image and the corresponding occlusion label.
In a specific implementation, the process may be as follows. First, the original image is scaled to $H \times W$ and divided, according to a uniform grid, into $N = HW/P^2$ image blocks, where $P$ denotes the side length of an image block. An occlusion ratio $r$ is set, and rectangles of random size, whose length and width are integer multiples of $P$, are continually generated and added to an initially empty set until the total area of all rectangles in the set is sufficiently close to $r$ times the image area, which completes the construction of the random irregular block mask. Each sampling generates a random mask for all images in a batch; another pedestrian image with a different identity is randomly selected from the same batch, and the region of the same shape is taken from it to cover the original image, yielding the occlusion-augmented image. At the same time, an occlusion label matrix $Y$ of size $(H/P) \times (W/P)$ is generated, whose elements take the value 0 or 1, where 0 denotes an original image block and 1 denotes a simulated-occlusion image block.
FIG. 4 shows an example of the grid-aligned block occlusion augmentation of the occluded-scene pedestrian re-identification method based on occlusion suppression and feature reconstruction; in fig. 4, white represents the mask and black represents the original image.
Step 2, constructing an occlusion sensor and training the occlusion sensor with the occlusion-augmented images and the corresponding occlusion labels.
In one possible embodiment, the occlusion sensor consists of multiple self-attention modules and one linear layer; specifically, a 3-layer self-attention module may be used.
The input of the occlusion sensor is an input sequence $S_{occ} = [f_{occ}; e_1, \ldots, e_N]$ composed of the image-block feature embedding sequence $e_1, \ldots, e_N$ and an occlusion indication feature $f_{occ}$ initialized as an all-zero vector.
In a possible embodiment, the training process of the occlusion sensor includes:
the occlusion sensor applies a self-attention mechanism to the above input; the self-attention modules integrate the information carried in the image-block feature embeddings to update the occlusion indication feature $f_{occ}$, and a linear layer converts the updated occlusion indication feature $f_{occ}$ into the occlusion perception result $s$. The occlusion prediction is supervised by the occlusion label corresponding to the occlusion-augmented image obtained in step 1, producing the occlusion prediction loss $L_{occ} = \mathrm{CE}(s, Y)$, where $\mathrm{CE}$ denotes the cross-entropy function.
Through the above steps, the construction and training of the occlusion sensor are completed.
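As an illustration of the structure described above, the following PyTorch-style sketch builds an occlusion sensor from a few standard self-attention layers and one linear layer; the embedding dimension, the use of nn.TransformerEncoderLayer and the binary-cross-entropy form of the supervision are assumptions made for the sketch, not the exact network of the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OcclusionSensor(nn.Module):
    def __init__(self, dim=768, num_patches=128, depth=3, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)   # the self-attention modules
        self.head = nn.Linear(dim, num_patches)                         # maps f_occ to N occlusion scores

    def forward(self, patch_embed):
        # patch_embed: (B, N, dim) image-block feature embeddings e_1..e_N, with N == num_patches
        B, N, D = patch_embed.shape
        f_occ = torch.zeros(B, 1, D, device=patch_embed.device)         # occlusion indication feature, all zeros
        seq = self.encoder(torch.cat([f_occ, patch_embed], dim=1))      # [f_occ; e_1, ..., e_N]
        return torch.sigmoid(self.head(seq[:, 0]))                      # (B, N) occlusion perception result s

def occlusion_prediction_loss(sensor, patch_embed, occ_labels):
    # occ_labels: (B, gh, gw) labels from the augmentation, flattened to (B, N)
    s = sensor(patch_embed)
    return F.binary_cross_entropy(s, occ_labels.float().flatten(1))

Here the per-block scores are read out from the updated occlusion indication feature through the linear layer, following the description; other readout variants are possible.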
Step 3, constructing a feature extraction network and extracting features from the complete pedestrian images and the occlusion-augmented images respectively to obtain complete pedestrian image features and occlusion-augmented image features; when extracting features from an occlusion-augmented image, the occlusion perception result of the occlusion sensor is used to suppress occlusion interference.
In a possible embodiment, the feature extraction network constructed in step 3 uses the occlusion perception result to suppress the feature embeddings of occluded image blocks and consists of multiple layers of self-attention modules; specifically, an 11-layer self-attention module may be used.
The input of the feature extraction network is an input sequence $S_{cls} = [f_{cls}; e_1, \ldots, e_N]$ composed of the image-block feature embedding sequence and an identity classification indication feature $f_{cls}$.
In one possible embodiment, the feature extraction network applies a self-attention mechanism to the above input, and the extraction process includes: integrating the information carried in the image-block feature embeddings and updating the identity classification indication feature $f_{cls}$. During this process, each self-attention layer generates an attention matrix, and the N elements of the first row of the attention matrix represent the strength of information transfer from the N image-block embeddings to the identity indication feature $f_{cls}$.
When extracting features from the occlusion-augmented image, the attention matrix is corrected according to the occlusion perception result, so that the feature embeddings of image blocks with high occlusion scores receive smaller weights during feature exchange, and the corrected attention map $\tilde{A}$ is used to compute the feature update.
In a specific implementation, the correction may be $\tilde{A} = A \odot (\mathbf{1} - \gamma s)$, where $\odot$ denotes the Hadamard product, $\mathbf{1}$ denotes the unit vector, and $\gamma$ indicates the degree of correction of the attention map.
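The attention correction above can be illustrated with the following sketch, which down-weights the attention row that feeds the identity classification indication feature; the tensor layout, the clamping and the renormalisation of the corrected row are assumptions added for the sketch.

import torch

def suppress_occluded_attention(attn, s, gamma=1.0):
    # attn: (B, heads, 1 + N, 1 + N) attention map with the identity indication feature at index 0
    # s:    (B, N) occlusion perception result from the occlusion sensor, values in [0, 1]
    corrected = attn.clone()
    weight = (1.0 - gamma * s).clamp(min=0.0)                        # implements A~ = A ⊙ (1 - gamma * s)
    corrected[:, :, 0, 1:] = attn[:, :, 0, 1:] * weight[:, None, :]
    # renormalise the identity row so it remains a valid attention distribution (an added assumption)
    row = corrected[:, :, 0, :]
    corrected[:, :, 0, :] = row / row.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return corrected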
Step 4, constructing a feature reconstruction network and training the feature reconstruction network with the occlusion-augmented image features and the complete pedestrian image features.
Fig. 5 is a schematic diagram of the feature reconstruction in an embodiment of the occluded-scene pedestrian re-identification method based on occlusion suppression and feature reconstruction provided by the invention. As can be seen from fig. 5, in a possible embodiment, the feature reconstruction network constructed in step 4 consists of two branches of self-attention layers, the two branches being a global feature construction network and a complete feature inference network.
The global feature construction network constructs the complete global feature $f_{full}$ from the complete pedestrian image features; the complete feature inference network infers the reconstructed global feature $f_{rec}$ from the occlusion-augmented image features.
The feature reconstruction network is trained in a self-supervised manner: at the feature level, it restores the incomplete features that attend only to part of the visible regions to the global features of the complete pedestrian, and the resulting features are more discriminative for pedestrian re-identification in occluded scenes.
Specifically, the goal of the feature reconstruction network is that the complete feature inference network constructs, from an incomplete image, features as similar as possible to the features of the complete image.
The overall training loss of the feature reconstruction network is
$L = \lambda_1 L_{id} + \lambda_2 L_{tri} + \lambda_3 L_{occ} + \lambda_4 L_{inf}$,
where $L_{id}$ denotes the identity classification loss, $L_{tri}$ the triplet loss, $L_{occ}$ the occlusion prediction loss, and the inference loss is $L_{inf} = d(f_{rec}, f_{full})$, with $d(\cdot,\cdot)$ denoting the Euclidean distance; the inference loss is back-propagated only through the branch of the complete feature inference network. $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are the balance weights of the four losses.
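A compact sketch of the two-branch feature reconstruction and of the overall loss $L = \lambda_1 L_{id} + \lambda_2 L_{tri} + \lambda_3 L_{occ} + \lambda_4 L_{inf}$ is given below; the single-layer branches, the detach used to keep the inference loss from propagating into the construction branch, and all hyper-parameters are assumptions for illustration.

import torch
import torch.nn as nn

class FeatureReconstruction(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        # global feature construction branch (fed with complete-pedestrian-image features)
        self.construct = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        # complete feature inference branch (fed with occlusion-suppressed features)
        self.infer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, full_feats, occ_feats):
        # full_feats, occ_feats: (B, 1 + N, dim) token sequences with the global token at index 0
        f_full = self.construct(full_feats)[:, 0]    # complete global feature f_full
        f_rec = self.infer(occ_feats)[:, 0]          # reconstructed global feature f_rec
        return f_full, f_rec

def total_loss(l_id, l_tri, l_occ, f_rec, f_full, lambdas=(1.0, 1.0, 1.0, 1.0)):
    # inference loss: Euclidean distance between the reconstructed and the complete global feature;
    # f_full is detached so that L_inf back-propagates only through the inference branch
    l_inf = torch.norm(f_rec - f_full.detach(), p=2, dim=-1).mean()
    l1, l2, l3, l4 = lambdas
    return l1 * l_id + l2 * l_tri + l3 * l_occ + l4 * l_inf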
Step 5, performing occlusion perception on an occluded pedestrian image from a real scene with the occlusion sensor to obtain an occlusion perception result; extracting features from the occluded pedestrian image and suppressing the features of the occluded regions with the occlusion perception result to obtain a global feature that attends to the visible regions of the pedestrian; performing feature reconstruction on the global feature with the feature reconstruction network to obtain the final global feature for pedestrian re-identification; and computing feature distances based on the final global feature to complete pedestrian re-identification.
In the prediction stage of the method for re-identifying pedestrians in occluded scenes based on occlusion suppression and feature reconstruction, the occlusion sensor trained in step 2 first performs occlusion perception on the occluded pedestrian images in the real-scene query set to obtain the occlusion perception result $s$. Then, the feature extraction network obtained in step 3 extracts features from the occluded pedestrian image, and the occlusion perception result is used to suppress the features of the occluded regions by correcting the attention map, generating a global feature that attends to the visible regions of the pedestrian. Finally, one branch of the feature reconstruction network trained in step 4, namely the complete feature inference network, performs feature reconstruction on the obtained global feature to produce the final global feature for pedestrian re-identification. The feature distances between this global feature and the gallery features extracted in the same way are then computed, the results are ranked by cosine distance and output in order from near to far, and the pedestrian re-identification task is completed.
The beneficial effects include:
1. a data augmentation strategy is designed that generates occlusion-augmented images and the corresponding labels, used to improve the model's understanding of occlusion;
2. the way key visible parts are determined for local feature enhancement is improved: the method no longer relies on an additional model but performs occlusion perception in a self-supervised manner;
3. the way local features are enhanced during feature extraction is improved: instead of generating separate local features and re-weighting and fusing them, global features that attend to specific local regions are generated directly, which better suits a feature extraction network based on a self-attention mechanism;
4. a feature reconstruction network is designed that can repair incomplete features in occluded scenes and reconstruct complete global features, improving the accuracy of pedestrian re-identification in occluded scenes.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of an embodiment that are not described in detail, reference may be made to the related descriptions of the other embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A method for re-identifying pedestrians in occluded scenes based on occlusion suppression and feature reconstruction, characterized by comprising the following steps:
step 1, performing data augmentation on a complete pedestrian image using a grid-aligned block occlusion augmentation strategy to generate an occlusion-augmented image that simulates occlusion and the occlusion label corresponding to the occlusion-augmented image;
step 2, constructing an occlusion sensor and training the occlusion sensor with the occlusion-augmented image and the corresponding occlusion label;
step 3, constructing a feature extraction network and extracting features from the complete pedestrian image and the occlusion-augmented image respectively to obtain complete pedestrian image features and occlusion-augmented image features; when extracting features from the occlusion-augmented image, suppressing occlusion interference using the occlusion perception result of the occlusion sensor;
step 4, constructing a feature reconstruction network and training the feature reconstruction network with the occlusion-augmented image features and the complete pedestrian image features; the feature reconstruction network constructed in step 4 consists of two branches of self-attention layers, the two branches being a global feature construction network and a complete feature inference network;
the global feature construction network constructs the complete global feature $f_{full}$ from the complete pedestrian image features; the complete feature inference network infers the reconstructed global feature $f_{rec}$ from the occlusion-augmented image features;
step 5, in the prediction stage, performing occlusion perception on an occluded pedestrian image from a real scene with the occlusion sensor trained in step 2 to obtain an occlusion perception result; extracting features from the occluded pedestrian image with the feature extraction network obtained in step 3, and suppressing the features of the occluded regions with the occlusion perception result by correcting the attention map to obtain a global feature that attends to the visible regions of the pedestrian; performing feature reconstruction on the global feature with one branch of the feature reconstruction network trained in step 4 to obtain the final global feature for pedestrian re-identification; computing the feature distances between the global feature and the gallery features extracted in the same way, ranking by cosine distance and outputting in order from near to far, to complete pedestrian re-identification;
wherein generating the occlusion-augmented image and the corresponding occlusion label in step 1 includes:
scaling the complete pedestrian image to a set size and dividing it into image blocks according to a uniform grid;
taking a number of occlusion-augmented images as one batch and setting an occlusion ratio; cyclically and randomly generating grid-aligned rectangles and adding them to a mask set until the total area of all rectangles in the mask set matches the occlusion ratio, forming an irregular block mask from the mask set; and generating several random masks for the batch of images after sampling several times;
and at each sampling, randomly selecting another complete pedestrian image with a different identity from the batch and randomly selecting the region of the same shape to cover the complete pedestrian image being processed, generating the occlusion-augmented image and the corresponding occlusion label.
2. The re-identification method according to claim 1, wherein the occlusion sensor consists of multiple self-attention modules and one linear layer;
the input of the occlusion sensor is an input sequence composed of the image-block feature embedding sequence and an occlusion indication feature initialized as an all-zero vector.
3. The re-identification method according to claim 2, wherein the training process of the occlusion sensor comprises:
the self-attention modules integrate the information carried in the image-block feature embeddings and update the occlusion indication feature $f_{occ}$; a linear layer converts the updated occlusion indication feature $f_{occ}$ into the occlusion perception result $s$; and the occlusion prediction is supervised by the occlusion label corresponding to the occlusion-augmented image.
4. The re-identification method according to claim 1, wherein the feature extraction network constructed in step 3 consists of multiple layers of self-attention modules;
the input of the feature extraction network is an input sequence composed of the image-block feature embedding sequence and an identity classification indication feature.
5. The re-identification method according to claim 4, wherein the extraction process of the feature extraction network comprises: integrating the information carried in the image-block feature embeddings and updating the identity classification indication feature, with each self-attention layer generating an attention matrix; the N elements of the first row of the attention matrix represent the strength of information transfer from the N image-block embeddings to the identity indication feature;
when extracting features from the occlusion-augmented image, the attention matrix is corrected according to the occlusion perception result so that the feature embeddings of image blocks with high occlusion scores receive smaller weights during feature exchange, and the corrected attention is used to compute the feature update.
6. The re-identification method according to claim 1, wherein the goal of the feature reconstruction network is that the complete feature inference network constructs, from an incomplete image, features as similar as possible to the features of the complete image;
the overall training loss of the feature reconstruction network is
$L = \lambda_1 L_{id} + \lambda_2 L_{tri} + \lambda_3 L_{occ} + \lambda_4 L_{inf}$,
where $L_{id}$ denotes the identity classification loss, $L_{tri}$ the triplet loss, $L_{occ}$ the occlusion prediction loss, and the inference loss is $L_{inf} = d(f_{rec}, f_{full})$, with $d(\cdot,\cdot)$ denoting the Euclidean distance; $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are the balance weights of the four losses.
CN202310121979.7A 2023-02-16 2023-02-16 Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction Active CN115937906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310121979.7A CN115937906B (en) 2023-02-16 2023-02-16 Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310121979.7A CN115937906B (en) 2023-02-16 2023-02-16 Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction

Publications (2)

Publication Number Publication Date
CN115937906A CN115937906A (en) 2023-04-07
CN115937906B true CN115937906B (en) 2023-06-06

Family

ID=85827197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310121979.7A Active CN115937906B (en) 2023-02-16 2023-02-16 Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction

Country Status (1)

Country Link
CN (1) CN115937906B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022222766A1 (en) * 2021-04-21 2022-10-27 中山大学 Semantic segmentation-based face integrity measurement method and system, device and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9659390B2 (en) * 2011-10-28 2017-05-23 Carestream Health, Inc. Tomosynthesis reconstruction with rib suppression
US11443165B2 (en) * 2018-10-18 2022-09-13 Deepnorth Inc. Foreground attentive feature learning for person re-identification
CN110929578B (en) * 2019-10-25 2023-08-08 南京航空航天大学 Anti-shielding pedestrian detection method based on attention mechanism
CN111310718A (en) * 2020-03-09 2020-06-19 成都川大科鸿新技术研究所 High-accuracy detection and comparison method for face-shielding image
CN112465872B (en) * 2020-12-10 2022-08-26 南昌航空大学 Image sequence optical flow estimation method based on learnable occlusion mask and secondary deformation optimization
CN112801051A (en) * 2021-03-29 2021-05-14 哈尔滨理工大学 Method for re-identifying blocked pedestrians based on multitask learning
CN114022823A (en) * 2021-11-16 2022-02-08 北京信息科技大学 Shielding-driven pedestrian re-identification method and system and storable medium
CN114419671B (en) * 2022-01-18 2024-03-26 北京工业大学 Super-graph neural network-based pedestrian shielding re-identification method
CN114639122A (en) * 2022-03-22 2022-06-17 江苏大学 Attitude correction pedestrian re-recognition method based on convolution generation countermeasure network
CN115565207B (en) * 2022-11-29 2023-04-07 武汉图科智能科技有限公司 Occlusion scene downlink person detection method with feature simulation fused

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022222766A1 (en) * 2021-04-21 2022-10-27 中山大学 Semantic segmentation-based face integrity measurement method and system, device and storage medium

Also Published As

Publication number Publication date
CN115937906A (en) 2023-04-07

Similar Documents

Publication Publication Date Title
Hendrycks et al. Scaling out-of-distribution detection for real-world settings
CN110135319A (en) A kind of anomaly detection method and its system
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN108665487A Substation's manipulating object and object localization method based on the fusion of infrared and visible light
CN111626184B (en) Crowd density estimation method and system
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN109948471A (en) Based on the traffic haze visibility detecting method for improving InceptionV4 network
CN105243154A (en) Remote sensing image retrieval method and system based on significant point characteristics and spare self-encodings
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN116503399B (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN112884802A (en) Anti-attack method based on generation
CN110334622A (en) Based on the pyramidal pedestrian retrieval method of self-adaptive features
Li et al. Gated auxiliary edge detection task for road extraction with weight-balanced loss
Sang et al. Fully residual convolutional neural networks for aerial image segmentation
CN113033523B (en) Method and system for constructing falling judgment model and falling judgment method and system
Lin et al. SUR-Net: A deep network for fish detection and segmentation with limited training data
CN115937906B (en) Occlusion scene pedestrian re-identification method based on occlusion suppression and feature reconstruction
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN103473562B (en) Automatic training and identifying system for specific human body action
Chan-Hon-Tong et al. Object detection in remote sensing images with center only
CN114170678A (en) Pedestrian trajectory prediction method and device based on multiple space maps and time fusion
Pandya et al. A novel approach for vehicle detection and classification
Prathima et al. Detection of Armed Assailants in Hostage Situations-A Machine Learning based approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.

Address before: 430000 B033, No. 05, 4th floor, building 2, international enterprise center, No. 1, Guanggu Avenue, Donghu New Technology Development Zone, Wuhan, Hubei (Wuhan area of free trade zone)

Patentee before: Wuhan Tuke Intelligent Technology Co.,Ltd.