CN108960124B - Image processing method and device for pedestrian re-identification

Image processing method and device for pedestrian re-identification

Info

Publication number
CN108960124B
Authority
CN
China
Prior art keywords
pedestrian
image
training
network model
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810691348.8A
Other languages
Chinese (zh)
Other versions
CN108960124A (en)
Inventor
郭英强
张默
孙海涌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Moshanghua Technology Co ltd
Original Assignee
Beijing Moshanghua Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Moshanghua Technology Co ltd filed Critical Beijing Moshanghua Technology Co ltd
Priority to CN201810691348.8A priority Critical patent/CN108960124B/en
Publication of CN108960124A publication Critical patent/CN108960124A/en
Application granted granted Critical
Publication of CN108960124B publication Critical patent/CN108960124B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method and device for pedestrian re-identification. The image processing method for pedestrian re-identification comprises: extracting position information of a pedestrian in an image to be identified; segmenting a pedestrian image according to the position information; performing a preset network model test on the pedestrian image, wherein the preset network model comprises at least an attention branch when being tested; and identifying a target image according to the training result of the preset network model. This solves the technical problem of low recognition accuracy in pedestrian image re-identification. The application achieves the following beneficial effects: the representation capability for the pedestrian in the pedestrian picture is improved, key information in the image is extracted, and the influence of the background is weakened.

Description

Image processing method and device for pedestrian re-identification
Technical Field
The application relates to the technical field of image recognition, in particular to an image processing method for pedestrian re-recognition.
Background
With the advance of smart city construction and the innovation of intelligent security technology, intelligent video analysis in video surveillance systems has become an effective means of saving manpower and material resources and improving the level of monitoring safety. The pedestrian re-identification task can retrieve relevant pedestrians across the surveillance field and reduce labor cost. However, detected pedestrian pictures usually suffer from strong interference and cluttered backgrounds, the region occupied by the pedestrian is not fixed, and it is difficult to achieve the kind of alignment possible in face recognition.
No effective solution has yet been proposed for the problem of low recognition accuracy in pedestrian image re-identification in the related art.
Disclosure of Invention
The present application mainly aims to provide an image processing method and an image processing device for pedestrian re-identification, so as to solve the problem of low identification accuracy in pedestrian image re-identification.
In order to achieve the above object, according to one aspect of the present application, there is provided an image processing method for pedestrian re-recognition.
The image processing method for pedestrian re-identification according to the application comprises the following steps:
extracting the position information of the pedestrian in the image to be identified;
segmenting a pedestrian image according to the position information;
performing a preset network model test on the pedestrian image, wherein the preset network model comprises at least an attention branch when being tested; and
identifying a target image according to the training result of the preset network model.
Further, the extracting the position information of the pedestrian in the image to be recognized comprises:
acquiring a pedestrian video screenshot in an image acquisition device;
training a model that at least comprises a DPM algorithm for detecting pedestrian position information according to the pedestrian video screenshot; and
performing a position detection task with the model that at least comprises the DPM algorithm to obtain the position information of the pedestrian in the image to be recognized.
Further, the segmenting the pedestrian image according to the position information includes:
and segmenting the pedestrian from the image to be recognized according to the trained model for extracting the position of the pedestrian image, and storing the segmented pedestrian image.
Further, the performing of the preset network model training on the pedestrian image comprises:
training a depth residual error network model through the pedestrian image; and
and extracting high-dimensional features of the pedestrian image according to the depth residual error network model.
Further, the identifying the target image according to the preset network model training result includes:
training a combined Bayesian matrix according to the preset network model training result;
calculating the similarity between the pedestrian image and a target image according to the combined Bayesian matrix;
and screening the pedestrian images meeting the preset conditions.
In order to achieve the above object, according to another aspect of the present application, there is provided an image processing apparatus for pedestrian re-recognition.
An image processing apparatus for pedestrian re-recognition according to the present application includes:
the extraction module is used for extracting the position information of the pedestrian in the image to be identified;
the segmentation module is used for segmenting a pedestrian image according to the position information;
the training module is used for executing preset network model training on the pedestrian image, wherein the training of the preset network model at least comprises an attention branch; and
and the recognition module is used for recognizing the target image according to the preset network model training result.
Further, the extraction module comprises:
the acquisition unit is used for acquiring a pedestrian video screenshot in the image acquisition device;
the first training unit is used for training a model which is used for detecting pedestrian position information and at least comprises a DPM algorithm according to the pedestrian video screenshot; and
and the detection unit is used for executing a position detection task through the model at least comprising the DPM algorithm to obtain the position information of the pedestrian in the image to be recognized.
Further, the segmentation module comprises:
and the segmentation and storage unit is used for segmenting the pedestrian from the image to be recognized according to the trained model for extracting the position of the pedestrian image and storing the segmented pedestrian image.
Further, the training module comprises:
the second training unit is used for training a depth residual error network model through the pedestrian image; and
and the extraction unit is used for extracting the high-dimensional characteristics of the pedestrian image according to the depth residual error network model.
Further, the identification module comprises:
the third training unit is used for training a combined Bayesian matrix according to the preset network model training result;
the calculating unit is used for calculating the similarity between the pedestrian image and the target image according to the combined Bayesian matrix;
and the screening unit is used for screening the pedestrian images meeting the preset conditions.
In the embodiments of the application, the position information of the pedestrian in the image to be recognized is extracted, the pedestrian image is segmented according to the position information, and a deep learning network model is used to extract the image features of the pedestrian. The similarity between the pedestrian image and the target image is then calculated through the joint Bayesian matrix, and several pedestrian images with the highest similarity are finally output for manual comparison. This achieves the technical effect of finding the target image in massive video data and solves the technical problem of low recognition accuracy in pedestrian image re-identification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a schematic diagram of an image processing method for pedestrian re-identification according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of an image processing method for pedestrian re-identification according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of an image processing method for pedestrian re-identification according to a third embodiment of the present application;
FIG. 4 is a schematic diagram of an image processing method for pedestrian re-identification according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram of an image processing apparatus for pedestrian re-identification according to a first embodiment of the present application;
FIG. 6 is a schematic diagram of an image processing apparatus for pedestrian re-identification according to a second embodiment of the present application;
fig. 7 is a schematic diagram of an image processing apparatus for pedestrian re-recognition according to a third embodiment of the present application;
FIG. 8 is a schematic diagram of an image processing apparatus for pedestrian re-identification according to a fourth embodiment of the present application;
FIG. 9 is a flowchart of an image processing method for pedestrian re-identification according to an embodiment of the present application;
FIG. 10a is a diagram of an attention branch structure according to an embodiment of the present application; and
fig. 10b is a diagram of a prior convolutional neural network structure according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, such that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the present application and its embodiments, and are not used to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
Furthermore, the terms "mounted," "disposed," "provided," "connected," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements or components. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to an embodiment of the present invention, there is provided an image processing method for pedestrian re-recognition, as shown in fig. 1, the method including steps S102 to S108 as follows:
step S102, extracting the position information of the pedestrian in the image to be identified;
preferably, the image to be recognized may be a video shot containing a pedestrian.
The position information of the pedestrian may be a specific motion frame position of the pedestrian image in the video.
The step of extracting the position information of the pedestrian in the image to be identified can be to acquire a video image containing the pedestrian, and the video image containing the pedestrian can be acquired through an image acquisition device such as a camera.
Step S104, segmenting a pedestrian image according to the position information;
preferably, segmenting the pedestrian image according to the position information may be cropping out the picture containing the pedestrian according to the specific motion frame position of the pedestrian image in the video.
Step S106, executing a preset network model test on the pedestrian image, wherein the preset network model at least comprises an attention branch when being tested; and
preferably, the preset network model may be a deep learning network model fused with a convolutional neural network and an attention branch.
Preferably, the preset network model can be a ResNet-50 neural network model. The network comprises 49 convolutional layers and 1 fully connected layer; a residual network structure is introduced into the network, so that a large number of layers can be stacked and higher-level feature vectors can be extracted, finally yielding a 512-dimensional vector representation of the pedestrian picture. The network is trained using the training set.
As shown in figs. 10a and 10b, an attention mechanism is added to the original convolutional neural network structure. In the attention mechanism network, after the network obtains a feature map, the feature map is processed along two routes: one route is passed directly to the Scale of the next layer, as shown in fig. 10a, while the other route undergoes a fully connected layer and a Softmax operation that performs a two-way classification and yields a probability for each region of the feature map. A Scale operation is then applied across the two routes, i.e., each region of the feature map is assigned a different weight, and the feature representation of the pedestrian picture is finally obtained through global pooling.
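For illustration only, the following is a minimal PyTorch sketch of such an attention branch operating on a feature map of shape (N, C, H, W). The module name, the choice of a single fully connected layer over the flattened feature map, and the 2048x12x4 feature-map size in the example are assumptions made for the sketch, not details taken from the patent.

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    """Sketch of the attention branch: FC + Softmax -> per-region weights -> Scale -> global pooling."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.num_regions = height * width
        # Route 2: fully connected layer + Softmax, one weight per spatial region.
        self.fc = nn.Linear(channels * height * width, self.num_regions)
        self.softmax = nn.Softmax(dim=1)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        n, c, h, w = feat.shape
        # Probability for each of the H*W regions of the feature map.
        weights = self.softmax(self.fc(feat.flatten(1)))   # (N, H*W)
        weights = weights.view(n, 1, h, w)                 # broadcast over channels
        # Scale operation: re-weight every region of the feature map (route 1 * route 2).
        scaled = feat * weights
        # Global pooling yields the final feature representation of the pedestrian picture.
        return self.pool(scaled).flatten(1)                # (N, C)

# Example with a ResNet-style feature map of size 2048 x 12 x 4.
feat = torch.randn(8, 2048, 12, 4)
print(AttentionBranch(2048, 12, 4)(feat).shape)            # torch.Size([8, 2048])
```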
And S108, identifying a target image according to the preset network model training result.
Preferably, the training results may be network model parameters.
The target image may be an image to be matched, for example, an image of a suspect when a police department searches for the suspect, and an image of a missing person when looking for the missing person.
The pedestrian image to be retrieved is passed through the trained network model, the similarity between the resulting pedestrian feature and the target pedestrian image is calculated, and it is judged whether the two images show the same target.
As shown in fig. 2, the extracting of the position information of the pedestrian in the image to be recognized includes steps S202 to S206 as follows:
step S202, acquiring a pedestrian video screenshot in an image acquisition device;
preferably, the image acquisition device may be a monitoring camera or the like.
The pedestrian video screenshots in the image acquisition device can be acquired through model training; that is, video screenshots containing pedestrian images are obtained from the massive video.
Step S204, training a model at least comprising a DPM algorithm for detecting pedestrian position information according to the pedestrian video screenshot; and
preferably, the DPM algorithm is a deformable part model, a part-based detection algorithm: the histogram of oriented gradients is computed first, and a support vector machine is then trained to obtain a gradient model of the object.
The model trained on the pedestrian video screenshots for detecting pedestrian position information, which at least comprises the DPM algorithm, can detect the pedestrian position using the DPM features. The pedestrian picture is then cropped to a size of 384x128 for the subsequent steps.
And step S206, executing a position detection task through the model at least comprising the DPM algorithm to obtain the position information of the pedestrian in the image to be identified.
Preferably, the pedestrian position is detected using the DPM features, and the pedestrian picture is cropped to a size of 384x128 for the subsequent steps.
As an embodiment of the present invention, the segmenting the pedestrian image according to the position information includes:
and segmenting the pedestrian from the image to be recognized according to the trained model for extracting the position of the pedestrian image, and storing the segmented pedestrian image.
Preferably, the pedestrian picture is cropped to a size of 384x128 and saved in the corresponding file location.
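For illustration, a minimal sketch of the detect-crop-save step is given below. Because no DPM implementation is shown in this document, OpenCV's stock HOG + linear-SVM pedestrian detector is used purely as a stand-in for the DPM-based detector described above, and the file paths are illustrative.

```python
import os
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frames/frame_0001.jpg")                # pedestrian video screenshot
boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))   # pedestrian position information

os.makedirs("pedestrians", exist_ok=True)
for i, (x, y, w, h) in enumerate(boxes):
    crop = frame[y:y + h, x:x + w]
    # Resize the segmented pedestrian to 384x128 (height x width) for the subsequent network.
    crop = cv2.resize(crop, (128, 384))                    # cv2.resize takes (width, height)
    cv2.imwrite(os.path.join("pedestrians", f"frame_0001_{i}.jpg"), crop)
```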
As shown in fig. 3, the performing of the preset network model training on the pedestrian image includes steps S302 to S304 as follows:
step S302, training a depth residual error network model through the pedestrian image; and
preferably, a ResNet-50 neural network model is used. The network comprises 49 convolutional layers and 1 fully connected layer; a residual network structure is introduced into the network, so that a large number of layers can be stacked and higher-level feature vectors can be extracted, finally yielding a 512-dimensional vector representation of the pedestrian picture. The network is trained using the training set.
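For illustration, the following is a minimal PyTorch sketch of such a backbone, assuming torchvision's ResNet-50 (49 convolutional layers plus 1 fully connected layer) with its classifier replaced by a 512-dimensional embedding head; the class name and the exact form of the head are assumptions based on the text rather than the patent's implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class PedestrianEmbedder(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        backbone = models.resnet50(weights=None)            # deep residual network
        # Keep everything up to and including global average pooling, drop the 1000-way fc.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        # 512-dimensional embedding head in place of the original classifier.
        self.embed = nn.Linear(backbone.fc.in_features, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)                     # (N, 2048)
        return self.embed(f)                                # (N, 512) pedestrian representation

# 384x128 pedestrian crops as produced by the detection step.
imgs = torch.randn(4, 3, 384, 128)
print(PedestrianEmbedder()(imgs).shape)                     # torch.Size([4, 512])
```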
And step S304, extracting the high-dimensional characteristics of the pedestrian image according to the depth residual error network model.
Preferably, the high-dimensional features may be facial features, other body-part features of the pedestrian, clothing features, and the like.
The pedestrian picture features are extracted using the neural network that combines the convolutional neural network and the attention mechanism, and the pedestrian feature library is stored.
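For illustration, a minimal sketch of building and saving such a pedestrian feature library is given below, assuming a feature-extraction network of the kind sketched above; the directory layout, file pattern and output file name are illustrative.

```python
import glob
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((384, 128)),   # match the crop size used earlier
    transforms.ToTensor(),
])

@torch.no_grad()
def build_feature_library(model: torch.nn.Module, image_dir: str) -> np.ndarray:
    model.eval()
    feats = []
    for path in sorted(glob.glob(f"{image_dir}/*.jpg")):
        img = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)
        feats.append(model(img).squeeze(0).cpu().numpy())
    library = np.stack(feats)                     # (num_images, feature_dim)
    np.save("pedestrian_features.npy", library)   # the pedestrian feature library
    return library
```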
As shown in fig. 4, the identifying a target image according to the preset network model training result includes steps S402 to S406 as follows:
step S402, training a combined Bayesian matrix according to the preset network model training result;
preferably, training the joint Bayesian matrix according to the preset network model training result may be training the joint Bayesian matrix on the high-dimensional features of the pedestrian extracted by the depth residual error network model.
It should be noted that joint Bayes is used to measure the pedestrian feature distance. The specific principle and calculation process are as follows:
According to the joint Bayesian algorithm, the distribution of a target pedestrian can be described as x = μ + ε, where x is defined as the representation of a pedestrian, μ represents the variation between different people, and ε represents the variation of the same person under different lighting, poses and expressions. The two latent variables μ and ε obey two Gaussian distributions, N(0, S_μ) and N(0, S_ε), where S_μ and S_ε represent the covariance matrices to be determined, which can be obtained by training.
Step S404, calculating the similarity between the pedestrian image and the target image according to the combined Bayesian matrix;
preferably, further, according to the joint Bayesian algorithm, the distance (log likelihood ratio) between two pedestrian features x1 and x2 is finally obtained as:
r(x1, x2) = x1^T A x1 + x2^T A x2 - 2 x1^T G x2
wherein A is: A = (S_μ + S_ε)^(-1) - (F + G)
and F and G can be found from the standard joint Bayesian closed forms:
F = S_ε^(-1), G = -(2 S_μ + S_ε)^(-1) S_μ S_ε^(-1)
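For illustration, a minimal NumPy sketch of the joint Bayesian similarity is given below, assuming the covariance matrices S_μ and S_ε have already been estimated; it uses the standard closed forms for F and G rather than any training procedure specific to this application, and a larger r means the two features are more likely the same pedestrian.

```python
import numpy as np

def joint_bayes_matrices(S_mu: np.ndarray, S_eps: np.ndarray):
    # Standard joint Bayesian closed forms.
    F = np.linalg.inv(S_eps)
    G = -np.linalg.inv(2 * S_mu + S_eps) @ S_mu @ np.linalg.inv(S_eps)
    A = np.linalg.inv(S_mu + S_eps) - (F + G)
    return A, G

def joint_bayes_similarity(x1: np.ndarray, x2: np.ndarray,
                           A: np.ndarray, G: np.ndarray) -> float:
    # r(x1, x2) = x1^T A x1 + x2^T A x2 - 2 x1^T G x2
    return float(x1 @ A @ x1 + x2 @ A @ x2 - 2 * x1 @ G @ x2)
```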
and step S406, screening the pedestrian images meeting the preset conditions.
Preferably, the preset condition may be the number of highest-ranked pedestrian images by similarity set by the system; for example, the 20 images with the highest similarity to the target pedestrian image are automatically screened out, displayed through an interface, and then judged manually.
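For illustration, a minimal sketch of screening the 20 most similar gallery images for manual review is given below; the similarity argument stands for any scoring function, for example the joint Bayesian one sketched above, and the variable names are illustrative.

```python
import numpy as np

def screen_top_k(target_feat: np.ndarray, library: np.ndarray,
                 similarity, k: int = 20):
    # Score every pedestrian feature in the library against the target feature.
    scores = np.array([similarity(target_feat, g) for g in library])
    order = np.argsort(-scores)[:k]            # indices of the k best matches
    return order, scores[order]

# Toy usage with a cosine similarity and random 512-d features.
cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
idx, top_scores = screen_top_k(np.random.rand(512), np.random.rand(100, 512), cos)
```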
As shown in fig. 9, the flow of the image processing method for pedestrian re-identification is as follows:
collecting a video image;
and detecting the pedestrians by the pictures containing the pedestrians, and cutting out the pictures only containing the single pedestrians.
The CNN is trained.
Training a network fusing the CNN and attention mechanism.
And (4) extracting pedestrian picture features by using the neural network luo fused with the CNN and the attention mechanism, and storing a pedestrian feature library.
And training a combined Bayesian matrix by using the obtained pedestrian features.
And calculating the similarity between the pedestrian features by using a combined Bayesian method.
And selecting 20 pieces with the highest similarity to be displayed through an interface, and judging by a person.
From the above description, it can be seen that the invention achieves the following technical effects. The invention discloses an image processing method for pedestrian re-identification based on a convolutional neural network and an attention mechanism, which comprises: extracting pedestrian features with a neural network that fuses the CNN and the attention mechanism and storing them in a feature library; training a joint Bayesian matrix with the pedestrian features; measuring the similarity between the target pedestrian and the pedestrians in the library with the joint Bayesian matrix; displaying the 20 most similar images through an interface according to the obtained similarity for manual judgment; and repeating the above process to complete pedestrian recognition. The method performs the pedestrian re-identification task accurately, represents the pedestrian image better, and greatly improves the recognition accuracy.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
According to an embodiment of the present invention, there is also provided an image processing apparatus for implementing the above-described image processing method for pedestrian re-recognition, as shown in fig. 5, the apparatus including:
the extraction module 10 is used for extracting the position information of the pedestrian in the image to be identified;
the segmentation module 20 is used for segmenting the pedestrian image according to the position information;
the training module 30 is configured to execute a preset network model test on the pedestrian image, where the preset network model test at least includes an attention branch; and
and the recognition module 40 is configured to recognize a target image according to the preset network model training result.
As shown in fig. 6, the extraction module includes:
the acquisition unit is used for acquiring a pedestrian video screenshot in the image acquisition device;
the first training unit is used for training a model which is used for detecting pedestrian position information and at least comprises a DPM algorithm according to the pedestrian video screenshot; and
and the detection unit is used for executing a position detection task through the model at least comprising the DPM algorithm to obtain the position information of the pedestrian in the image to be recognized.
Preferably, the segmentation module comprises:
and the segmentation and storage unit is used for segmenting the pedestrian from the image to be recognized according to the trained model for extracting the position of the pedestrian image and storing the segmented pedestrian image.
As shown in fig. 7, the training module 30 includes:
a second training unit 301, configured to train a depth residual error network model through the pedestrian image; and
an extracting unit 302, configured to extract a high-dimensional feature of the pedestrian image according to the depth residual error network model.
As shown in fig. 8, the identification module 40 includes:
a third training unit 401, configured to train a joint bayesian matrix according to the preset network model training result;
a calculating unit 402, configured to calculate a similarity between the pedestrian image and a target image according to the joint bayesian matrix;
a screening unit 403, configured to screen pedestrian images that meet preset conditions.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. An image processing method for pedestrian re-identification, comprising:
extracting the position information of the pedestrian in the image to be identified;
segmenting a pedestrian image according to the position information;
executing a preset network model test on the pedestrian image, wherein the preset network model at least comprises an attention branch when being tested; and
identifying a target image according to the preset network model training result;
the extracting of the position information of the pedestrian in the image to be recognized comprises:
acquiring a pedestrian video screenshot in an image acquisition device;
training a model at least comprising a DPM algorithm for detecting pedestrian position information according to the pedestrian video screenshot; and
performing a position detection task with the model that at least comprises the DPM algorithm to obtain the position information of the pedestrian in the image to be recognized; and cropping the pedestrian picture to a size of 384x128;
wherein, training the combined Bayesian matrix through the preset network model training result comprises:
measuring the pedestrian feature distance according to joint Bayes, wherein the distribution of a target pedestrian can be described as x = μ + ε, where x is defined as the representation of the pedestrian, μ represents the variation between different people, and ε represents the variation of the same person under different lighting, poses and expressions; the distributions of the two latent variables μ and ε obey two Gaussian distributions, N(0, S_μ) and N(0, S_ε), where S_μ and S_ε represent the covariance matrices to be determined;
calculating the similarity between the pedestrian image and a target image according to the combined Bayesian matrix; and
screening pedestrian images that satisfy a preset condition, including: screening the 20 pedestrian images with the highest similarity to the target pedestrian image;
wherein the position information of the pedestrian further includes: the specific action frame position of the pedestrian image in the video;
executing a preset network model test on the pedestrian image, wherein the preset network model at least comprises an attention branch when being tested, and the method comprises the following steps:
the method comprises the steps that a preset network model is a ResNet-50 neural network model, the network model comprises 49 convolutional layers and 1 full-connection layer, a residual error network structure is introduced into a network, and the network model is trained by using a training set;
adding an attention mechanism into the neural network model, processing the pedestrian image based on the attention mechanism network, and obtaining a characteristic representation of the pedestrian image;
in the attention mechanism network, after the network structure obtains a feature map, the feature map is processed along two routes: one route is passed directly to the Scale of the next layer, and the other route undergoes a fully connected layer and a Softmax operation that performs a two-way classification to obtain a probability for each region of the feature map; a Scale operation is then performed on the two routes, that is, each region of the feature map is assigned a different weight, and the feature representation of the pedestrian picture is finally obtained through global pooling;
training a combined Bayesian matrix through the preset network model training result, and further comprising:
training a combined Bayesian matrix according to the high-dimensional characteristics of the pedestrian image extracted by the depth residual error network model;
the image processing method for pedestrian re-identification comprises the following steps:
collecting a video image;
detecting pedestrians in the pictures containing pedestrians, and cropping out pictures that each contain a single pedestrian;
training a CNN;
training a network fusing the CNN and the attention mechanism;
extracting pedestrian picture features using a neural network that fuses the CNN and the attention mechanism, and storing a pedestrian feature library;
training a combined Bayesian matrix by using the obtained pedestrian characteristics;
calculating the similarity between pedestrian features by using a combined Bayesian method;
selecting the 20 images with the highest similarity to be displayed through an interface;
and repeating the image processing process of pedestrian re-identification to obtain a pedestrian identification result.
2. The image processing method according to claim 1, wherein the segmenting the pedestrian image according to the position information includes:
and segmenting the pedestrian from the image to be recognized according to the trained model for extracting the position of the pedestrian image, and storing the segmented pedestrian image.
3. The image processing method according to claim 1, wherein the performing of the preset network model training on the pedestrian image comprises:
training a depth residual error network model through the pedestrian image; and
and extracting high-dimensional features of the pedestrian image according to the depth residual error network model.
4. The image processing method according to claim 1, wherein the identifying a target image according to the preset network model training result comprises:
training a combined Bayesian matrix according to the preset network model training result;
calculating the similarity between the pedestrian image and a target image according to the combined Bayesian matrix;
and screening the pedestrian images meeting the preset conditions.
5. An image processing apparatus for pedestrian re-recognition, comprising:
the extraction module is used for extracting the position information of the pedestrian in the image to be identified;
the segmentation module is used for segmenting a pedestrian image according to the position information;
the training module is used for executing preset network model training on the pedestrian image, wherein the training of the preset network model at least comprises an attention branch; and
the recognition module is used for recognizing a target image according to the preset network model training result;
the extraction module comprises:
the acquisition unit is used for acquiring a pedestrian video screenshot in the image acquisition device;
the first training unit is used for training a model which is used for detecting pedestrian position information and at least comprises a DPM algorithm according to the pedestrian video screenshot; and
the detection unit is used for executing a position detection task through the model at least comprising the DPM algorithm to obtain the position information of the pedestrian in the image to be identified; cropping the pedestrian picture to 384x128 size;
wherein, training the combined Bayesian matrix through the preset network model training result comprises:
measuring the pedestrian feature distance according to joint Bayes, wherein the distribution of a target pedestrian can be described as x = μ + ε, where x is defined as the representation of the pedestrian, μ represents the variation between different people, and ε represents the variation of the same person under different lighting, poses and expressions, and the two latent variables μ and ε obey two Gaussian distributions, N(0, S_μ) and N(0, S_ε), where S_μ and S_ε represent the covariance matrices to be determined;
calculating the similarity between the pedestrian image and a target image according to the combined Bayesian matrix; and
screening pedestrian images that satisfy a preset condition, including: screening the 20 pedestrian images with the highest similarity to the target pedestrian image;
wherein the position information of the pedestrian further includes: the specific action frame position of the pedestrian image in the video;
the training module is used for executing a preset network model test on the pedestrian image, wherein the preset network model at least comprises an attention branch when being tested, and the training module comprises:
the method comprises the steps that a preset network model is a ResNet-50 neural network model, the network model comprises 49 convolutional layers and 1 full-connection layer, a residual error network structure is introduced into a network, and the network model is trained by using a training set;
adding an attention mechanism into the neural network model, processing the pedestrian image based on the attention mechanism network, and obtaining a characteristic representation of the pedestrian image;
in the attention mechanism network, after the network structure obtains a feature map, the feature map is processed along two routes: one route is passed directly to the Scale of the next layer, and the other route undergoes a fully connected layer and a Softmax operation that performs a two-way classification to obtain a probability for each region of the feature map; a Scale operation is then performed on the two routes, that is, each region of the feature map is assigned a different weight, and the feature representation of the pedestrian picture is finally obtained through global pooling;
training a combined Bayesian matrix through the preset network model training result, and further comprising:
training a combined Bayesian matrix according to the high-dimensional characteristics of the pedestrian image extracted by the depth residual error network model;
the image processing method for pedestrian re-identification comprises the following steps:
collecting a video image;
detecting pedestrians in the pictures containing pedestrians, and cropping out pictures that each contain a single pedestrian;
training a CNN;
training a network fusing the CNN and the attention mechanism;
extracting pedestrian picture features using a neural network that fuses the CNN and the attention mechanism, and storing a pedestrian feature library;
training a combined Bayesian matrix by using the obtained pedestrian characteristics;
calculating the similarity between pedestrian features by using a combined Bayesian method;
selecting the 20 images with the highest similarity to be displayed through an interface;
and repeating the image processing process of pedestrian re-identification to obtain a pedestrian identification result.
6. The image processing apparatus according to claim 5, wherein the segmentation module comprises:
and the segmentation and storage unit is used for segmenting the pedestrian from the image to be recognized according to the trained model for extracting the position of the pedestrian image and storing the segmented pedestrian image.
7. The image processing apparatus of claim 5, wherein the training module comprises:
the second training unit is used for training a depth residual error network model through the pedestrian image; and
and the extraction unit is used for extracting the high-dimensional characteristics of the pedestrian image according to the depth residual error network model.
8. The image processing apparatus according to claim 5, wherein the identification module comprises:
the third training unit is used for training a combined Bayesian matrix according to the preset network model training result;
the calculating unit is used for calculating the similarity between the pedestrian image and the target image according to the combined Bayesian matrix;
and the screening unit is used for screening the pedestrian images meeting the preset conditions.
CN201810691348.8A 2018-06-28 2018-06-28 Image processing method and device for pedestrian re-identification Active CN108960124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810691348.8A CN108960124B (en) 2018-06-28 2018-06-28 Image processing method and device for pedestrian re-identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810691348.8A CN108960124B (en) 2018-06-28 2018-06-28 Image processing method and device for pedestrian re-identification

Publications (2)

Publication Number Publication Date
CN108960124A CN108960124A (en) 2018-12-07
CN108960124B true CN108960124B (en) 2021-10-01

Family

ID=64487945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810691348.8A Active CN108960124B (en) 2018-06-28 2018-06-28 Image processing method and device for pedestrian re-identification

Country Status (1)

Country Link
CN (1) CN108960124B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977823B (en) * 2019-03-15 2021-05-14 百度在线网络技术(北京)有限公司 Pedestrian recognition and tracking method and device, computer equipment and storage medium
CN110096947A (en) * 2019-03-15 2019-08-06 昆明理工大学 A kind of pedestrian based on deep learning recognizer again
CN109948709B (en) * 2019-03-21 2020-06-23 南京博雅集智智能技术有限公司 Multitask attribute identification system of target object
CN110188611A (en) * 2019-04-26 2019-08-30 华中科技大学 A kind of pedestrian recognition methods and system again introducing visual attention mechanism
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN110458004B (en) * 2019-07-02 2022-12-27 浙江吉利控股集团有限公司 Target object identification method, device, equipment and storage medium
CN111507213A (en) * 2020-04-03 2020-08-07 北京三快在线科技有限公司 Image recognition method, image recognition device, storage medium and electronic equipment
CN112488061B (en) * 2020-12-18 2022-04-29 电子科技大学 Multi-aircraft detection and tracking method combined with ADS-B information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803055A (en) * 2015-11-26 2017-06-06 腾讯科技(深圳)有限公司 Face identification method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2410467A1 (en) * 2010-07-23 2012-01-25 Nederlandse Organisatie voor toegepast -natuurwetenschappelijk onderzoek TNO System and method for identifying image locations showing the same person in different images
CN106529442B (en) * 2016-10-26 2019-10-18 清华大学 A kind of pedestrian recognition method and device
US10380741B2 (en) * 2016-12-07 2019-08-13 Samsung Electronics Co., Ltd System and method for a deep learning machine for object detection
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model
CN108198200B (en) * 2018-01-26 2022-03-08 福州大学 Method for tracking specified pedestrian on line under cross-camera scene

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803055A (en) * 2015-11-26 2017-06-06 腾讯科技(深圳)有限公司 Face identification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Pedestrian Re-identification Based on Local Deep Matching"; Li Shaomei, et al.; Application Research of Computers; 2016-06-22; pp. 1235-1238 *
"Structuring and Retrieval of Surveillance Video in Cross-View Camera Networks"; Zhu Feng; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2017-09-15 (No. 9); abstract and pp. 1-97 *

Also Published As

Publication number Publication date
CN108960124A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
CN108960124B (en) Image processing method and device for pedestrian re-identification
CN111931684B (en) Weak and small target detection method based on video satellite data identification features
Salimi et al. Visual-based trash detection and classification system for smart trash bin robot
US9008365B2 (en) Systems and methods for pedestrian detection in images
US20150054824A1 (en) Object detection method, object detection device, and image pickup device
CN111813997B (en) Intrusion analysis method, device, equipment and storage medium
CN104615986B (en) The method that pedestrian detection is carried out to the video image of scene changes using multi-detector
CN108009466B (en) Pedestrian detection method and device
CN105574550A (en) Vehicle identification method and device
CN104166841A (en) Rapid detection identification method for specified pedestrian or vehicle in video monitoring network
CN102804208A (en) Automatically mining person models of celebrities for visual search applications
JP2019074945A (en) Apparatus and method for online recognition and setting screen used therefor
CN110674680B (en) Living body identification method, living body identification device and storage medium
CN112365586B (en) 3D face modeling and stereo judging method and binocular 3D face modeling and stereo judging method of embedded platform
CN104463232A (en) Density crowd counting method based on HOG characteristic and color histogram characteristic
EP2860661A1 (en) Mean shift tracking method
CN116647644B (en) Campus interactive monitoring method and system based on digital twin technology
CN111507353B (en) Chinese field detection method and system based on character recognition
CN107610224B (en) 3D automobile object class representation algorithm based on weak supervision and definite block modeling
US20170053172A1 (en) Image processing apparatus, and image processing method
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN116597411A (en) Method and system for identifying traffic sign by unmanned vehicle in extreme weather
CN115862113A (en) Stranger abnormity identification method, device, equipment and storage medium
CN111753601B (en) Image processing method, device and storage medium
CN111126112B (en) Candidate region determination method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181207

Assignee: Apple R&D (Beijing) Co., Ltd.

Assignor: BEIJING MOSHANGHUA TECHNOLOGY CO., LTD.

Contract record no.: 2019990000054

Denomination of invention: Image processing method and device for pedestrian re-identification

License type: Exclusive License

Record date: 20190211

GR01 Patent grant
GR01 Patent grant