CN112149470B - Pedestrian re-identification method and device - Google Patents

Pedestrian re-identification method and device

Info

Publication number
CN112149470B
Authority
CN
China
Prior art keywords: image, human body, identified, data, preset
Prior art date
Legal status (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Active
Application number
CN201910575126.4A
Other languages
Chinese (zh)
Other versions
CN112149470A (en)
Inventor
魏艾
Current Assignee (the listed assignees may be inaccurate)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (an assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910575126.4A priority Critical patent/CN112149470B/en
Publication of CN112149470A publication Critical patent/CN112149470A/en
Application granted granted Critical
Publication of CN112149470B publication Critical patent/CN112149470B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a pedestrian re-identification method and device. The method comprises the following steps: obtaining an image to be identified that contains a person picture; extracting human body features from the image to be identified to obtain its human body feature data; performing image segmentation on the image to be identified to obtain its image segmentation data; determining the image type of the image to be identified according to the image segmentation data; if the image type is a human body partial image, obtaining human body local feature data of the image to be identified based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data; and determining whether the person in the image to be identified is the person represented by preset human body local feature data based on the result of matching the human body local feature data of the image to be identified against the preset human body local feature data. Based on the above processing, the accuracy of the pedestrian re-identification result can be improved.

Description

Pedestrian re-identification method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a pedestrian re-recognition method and apparatus.
Background
Pedestrian re-identification (person re-identification), also called pedestrian re-recognition, is the process of using image processing technology to determine whether a person displayed in an image (which may be called an image to be identified) is a specific person. It is widely used in fields such as intelligent video surveillance and intelligent security.
In the related art, pedestrian re-identification may include the following steps: extracting human body features from the image to be identified to obtain human body feature data of the image (which may be called the human body feature data to be identified), matching the human body feature data to be identified against preset human body feature data, and determining from the matching result whether the person in the image to be identified is the person represented by the preset human body feature data.
In the above process, if the person picture in the image to be identified is incomplete, the human body feature data to be identified cannot effectively reflect the local features of the person. For example, when only the upper body of a person is displayed in the image to be identified, the human body feature data extracted from the image cannot effectively reflect the upper-body features of the person. Performing pedestrian re-identification directly on such human body feature data may therefore lower the accuracy of the re-identification result.
Disclosure of Invention
The embodiment of the application aims to provide a pedestrian re-identification method and device, which can improve the accuracy of a pedestrian re-identification result. The specific technical scheme is as follows:
in order to achieve the above object, an embodiment of the present application discloses a pedestrian re-recognition method, including:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be identified to obtain human body characteristic data of the image to be identified;
image segmentation is carried out on the image to be identified to obtain image segmentation data of the image to be identified, wherein the image segmentation data are used for representing a human body part to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be identified based on a local feature extraction network, the image segmentation data and the human body feature data corresponding to the image type trained in advance;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
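For orientation, the claimed steps can be sketched as the following control flow. This is an illustrative sketch only; every function name and argument below is a hypothetical placeholder, not terminology from the patent.

```python
# Illustrative control flow for the claimed method. All function names
# and arguments are hypothetical placeholders, not from the patent.

def re_identify(image, extract_features, segment, classify_type,
                local_nets, match_local, match_whole):
    """Route an image through the partial- or whole-body branch."""
    body_features = extract_features(image)    # human body feature data
    seg_data = segment(image)                  # image segmentation data
    image_type = classify_type(seg_data)       # e.g. "upper", "lower", "whole"
    if image_type != "whole":
        # Pick the local feature extraction network trained for this type.
        local_features = local_nets[image_type](seg_data, body_features)
        return match_local(local_features)
    return match_whole(body_features)
```

The per-image-type lookup in `local_nets` reflects the claim that the local feature extraction network corresponds to the image type determined in the previous step.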
Optionally, the local feature extraction network includes a squeeze-and-excitation network (SeNet);
the local feature extraction network, the image segmentation data and the human feature data corresponding to the image type based on the pre-training, and the human local feature data of the image to be identified are obtained, including:
inputting the human body characteristic data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
and based on the convolution processing result, weighting the output data of the SeNet to obtain the human body local characteristic data of the image to be identified.
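The three steps above (channel reweighting in the SeNet, fusion with the segmentation data followed by convolution, then spatial weighting of the SeNet output) can be sketched in NumPy as follows. This is a minimal illustration under simplifying assumptions: the excitation uses two fully connected layers as in the original squeeze-and-excitation design, and the convolution is reduced to a single 1x1 kernel; the patent does not specify these details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-excitation: x is (C, H, W); w1, w2 are FC weights."""
    squeeze = x.mean(axis=(1, 2))                         # global average pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # (C,) channel gates
    return x * excite[:, None, None]                      # reweight each channel

def local_features(x, seg_mask, w1, w2, conv_w):
    """Fuse the SeNet output with a segmentation mask, then weight it spatially.

    seg_mask: (1, H, W) map of the relevant body part;
    conv_w:   (C + 1,) weights of a 1x1 convolution (a simplification).
    """
    se_out = se_block(x, w1, w2)                          # step 1: SeNet output
    fused = np.concatenate([se_out, seg_mask], axis=0)    # step 2: feature fusion
    conv = np.tensordot(conv_w, fused, axes=([0], [0]))   # 1x1 conv -> (H, W)
    spatial = sigmoid(conv)[None, :, :]                   # spatial weights
    return se_out * spatial                               # step 3: weighting
```

The segmentation mask lets the spatial weights emphasize the pixels of the visible body part, which matches the stated purpose of combining segmentation data with the SeNet output.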
Optionally, the loss function of the local feature extraction network includes the difference between image feature data output by a preset network model and the human body local feature data output by the local feature extraction network, and the difference between the actual output result and the expected output result of the local feature extraction network. The training samples of the preset network model include images whose image type is the same as that of the image to be identified, and the preset network model is used for acquiring human body local feature data of images of that image type.
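As an illustration of such a two-term loss, the sketch below combines a feature-imitation term (mean squared difference between the preset model's features and the local network's features) with a task term (cross-entropy between the actual and expected output). The patent only speaks of "differences"; the choice of squared error and cross-entropy here is an assumption.

```python
import numpy as np

def local_feature_loss(student_feat, teacher_feat, logits, target):
    """Sum of a feature-imitation term and a task term.

    student_feat: local feature data from the local feature extraction network
    teacher_feat: feature data from the preset (teacher) network model
    logits:       actual output of the local feature extraction network
    target:       index of the expected output class
    """
    imitation = np.mean((student_feat - teacher_feat) ** 2)   # feature gap
    # Softmax cross-entropy between actual and expected outputs.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    task = -log_probs[target]
    return imitation + task
```

Structurally this resembles knowledge distillation: the preset model trained on same-type images acts as a teacher whose features the local network learns to imitate.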
Optionally, the determining whether the person in the image to be identified is the person represented by the preset human body local feature data based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data includes:
calculating the similarity between the human body local characteristic data of the image to be identified and preset human body local characteristic data;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data according to the obtained similarity.
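The patent does not fix a similarity measure; a common choice is cosine similarity with a decision threshold, sketched below. The threshold value 0.8 is purely illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(query_feat, preset_feat, threshold=0.8):
    """Decide identity by comparing the similarity to a threshold."""
    return cosine_similarity(query_feat, preset_feat) >= threshold
```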
Optionally, the method further comprises:
if the image type of the image to be identified is a human body integral image, determining whether the person in the image to be identified is the person represented by the preset human body characteristic data according to the human body characteristic data of the image to be identified and the preset human body characteristic data.
Optionally, the determining, according to the image segmentation data, an image type of the image to be identified includes:
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is larger than or equal to a first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is smaller than a second preset threshold value, determining that the image type of the image to be recognized is an upper half human body partial image;
If the proportion of the pixels belonging to the upper half of the human body in the human figure picture is smaller than the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be identified is the partial image of the lower half of the human body;
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is larger than or equal to the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be recognized is the human body integral image.
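The three threshold cases can be implemented directly on a per-pixel label map, as sketched below. The numeric identifiers and the "unknown" fallback (for images where neither ratio reaches its threshold, a case the three claims do not cover) are illustrative additions, not part of the claims.

```python
import numpy as np

# Hypothetical numeric identifiers for segmentation labels (not from the patent).
UPPER, LOWER, BACKGROUND = 1, 2, 0

def image_type(labels, t_upper=0.3, t_lower=0.3):
    """Classify a per-pixel label map into the three claimed image types.

    t_upper and t_lower stand in for the first and second preset thresholds.
    """
    total = labels.size
    upper_ratio = np.count_nonzero(labels == UPPER) / total
    lower_ratio = np.count_nonzero(labels == LOWER) / total
    if upper_ratio >= t_upper and lower_ratio < t_lower:
        return "upper_body_partial"
    if upper_ratio < t_upper and lower_ratio >= t_lower:
        return "lower_body_partial"
    if upper_ratio >= t_upper and lower_ratio >= t_lower:
        return "whole_body"
    return "unknown"   # neither ratio reaches its threshold (not claimed)
```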
In order to achieve the above object, an embodiment of the present application discloses a pedestrian re-recognition device, including:
the acquisition module is used for acquiring an image to be identified containing a figure picture;
the extraction module is used for extracting human body characteristics of the image to be identified to obtain human body characteristic data of the image to be identified;
The segmentation module is used for carrying out image segmentation on the image to be identified to obtain image segmentation data of the image to be identified, wherein the image segmentation data are used for representing a human body part to which the pixel points contained in the figure picture belong;
the determining module is used for determining the image type of the image to be identified according to the image segmentation data;
the first processing module is used for obtaining the human body local feature data of the image to be identified based on a local feature extraction network, the image segmentation data and the human body feature data corresponding to the image type trained in advance if the image type is a human body local image;
the identification module is used for determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
Optionally, the local feature extraction network includes a squeeze-and-excitation network (SeNet);
the first processing module is specifically configured to input the human body characteristic data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
And based on the convolution processing result, weighting the output data of the SeNet to obtain the human body local characteristic data of the image to be identified.
Optionally, the loss function of the local feature extraction network includes the difference between image feature data output by a preset network model and the human body local feature data output by the local feature extraction network, and the difference between the actual output result and the expected output result of the local feature extraction network. The training samples of the preset network model include images whose image type is the same as that of the image to be identified, and the preset network model is used for acquiring human body local feature data of images of that image type.
Optionally, the identifying module is specifically configured to calculate a similarity between the local feature data of the human body of the image to be identified and preset local feature data of the human body;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data according to the obtained similarity.
Optionally, the apparatus further includes:
and the second processing module is used for determining whether the person in the image to be identified is the person represented by the preset human body characteristic data according to the human body characteristic data and the preset human body characteristic data of the image to be identified if the image type of the image to be identified is the human body integral image.
Optionally, the determining module is specifically configured to determine that, if a proportion of pixels belonging to an upper body of the human body in the image to be identified is greater than or equal to a first preset threshold, and a proportion of pixels belonging to a lower body of the human body in the image to be identified is less than a second preset threshold, an image type of the image to be identified is an upper body human body partial image;
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is smaller than the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be identified is the partial image of the lower half of the human body;
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is larger than or equal to the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be recognized is the human body integral image.
In another aspect of the present application, in order to achieve the above object, an embodiment of the present application further discloses an electronic device, where the electronic device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the pedestrian re-recognition method according to the first aspect when executing the program stored in the memory.
In yet another aspect of the present application, there is also provided a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to implement the pedestrian re-identification method described in the first aspect above.
In yet another aspect of the present application, the embodiment of the present application further provides a computer program product containing instructions which, when executed on a computer, cause the computer to perform the pedestrian re-identification method described in the first aspect.
The embodiment of the application provides a pedestrian re-identification method: if the image type of the image to be identified is a human body partial image, the human body local feature data of the image are obtained based on a pre-trained local feature extraction network corresponding to that image type, the image segmentation data of the image, and its human body feature data; whether the person in the image is the person represented by preset human body local feature data is then determined from the result of matching the two sets of local feature data. Because the human body local feature data acquired through the local feature extraction network from the image segmentation data and the human body feature data can effectively reflect the local features of the person in the image, performing pedestrian re-identification with these local feature data improves the accuracy of the result compared with the prior art, in which re-identification is performed directly on the human body feature data of the image to be identified.
Of course, it is not necessary for any one product or method of practicing the application to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a pedestrian re-recognition method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a pedestrian re-recognition method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a SeNet configuration according to an embodiment of the present application;
fig. 4 is a block diagram of a space selection network according to an embodiment of the present application;
fig. 5 is a schematic diagram of a feature extraction network according to an embodiment of the present application;
fig. 6 is a flowchart of an example of a pedestrian re-recognition method according to an embodiment of the present application;
fig. 7 is a block diagram of a pedestrian re-recognition device according to an embodiment of the present application;
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the prior art, if a person displayed in an image to be identified is incomplete, human feature data obtained by extracting features of the image to be identified cannot effectively embody local features of the person, and further, pedestrian re-identification is directly performed according to the human feature data, which may result in lower accuracy of a pedestrian re-identification result.
In order to solve the above problems, the embodiment of the present application provides a pedestrian re-recognition method, which may be applied to an electronic device, where the electronic device may be a terminal or a server, and the electronic device is used to re-recognize a pedestrian for an image to be recognized.
After the electronic device obtains the image to be identified, the electronic device can perform image segmentation on the image to be identified to obtain image segmentation data of the image to be identified, and perform human body feature extraction on the image to be identified to obtain human body feature data of the image to be identified.
The electronic device may then determine an image type of the image to be identified based on the image segmentation data of the image to be identified. If the image type of the image to be identified is a human body local image, the electronic equipment can obtain the human body local feature data of the image to be identified based on the local feature extraction network corresponding to the image type trained in advance, the image segmentation data of the image to be identified and the human body feature data.
Furthermore, the electronic device may determine whether the person in the image to be identified is the person represented by the preset human body local feature data based on a matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
Because the human body local feature data obtained through the local feature extraction network from the image segmentation data and the human body feature data can effectively reflect the local features of the person in the image, the electronic device can perform pedestrian re-identification according to the human body local feature data of the image to be identified and the preset human body local feature data. Compared with the prior art, in which pedestrian re-identification is performed directly on the human body feature data of the image to be identified, this can improve the accuracy of the pedestrian re-identification result.
Referring to fig. 1, fig. 1 is a flowchart of a pedestrian re-recognition method according to an embodiment of the present application, where the method may include the following steps:
s101: and acquiring an image to be identified containing the figure picture.
The person picture in the image to be identified may be a complete person picture or an incomplete one; for example, it may show only the upper body of a person, or only the lower body.
The image to be identified can be one or a plurality of images. For example, the image to be identified may be a certain image frame in the monitoring video image, or the image to be identified may be a plurality of image frames in the monitoring video image.
In the embodiment of the application, when the user needs to determine whether the person displayed in a certain image (i.e., the image to be identified) is a specific person, the user may input the image to be identified to the electronic device. For example, the user may input the image to be identified through an input component of the electronic device.
Correspondingly, the electronic device can acquire the image to be identified, and further, the electronic device can judge whether the person in the image to be identified is a specific person according to the pedestrian re-identification method provided by the embodiment of the application.
If the image to be identified is an image frame, the electronic equipment can process the image to be identified according to the pedestrian re-identification method of the embodiment of the application; if the image to be identified is a plurality of image frames, the electronic device can process each image to be identified in turn according to the pedestrian re-identification method of the embodiment of the application.
S102: and extracting human body characteristics of the image to be identified to obtain the human body characteristic data of the image to be identified.
In the embodiment of the application, after the electronic device obtains the image to be identified, the electronic device may perform human body feature extraction on the image to be identified, and further, may obtain human body feature data of the image to be identified.
In this step, the method by which the electronic device extracts human body features from the image to be identified is consistent with the prior art. In one implementation, the electronic device may process the image to be identified with a pre-trained human body feature extraction network to obtain the human body feature data of the image. The human body feature extraction network may be ResNet-50 (a 50-layer residual network) or another neural network.
S103: and carrying out image segmentation on the image to be identified to obtain image segmentation data of the image to be identified.
The image segmentation data may be used to represent a human body part to which a pixel point included in the human picture belongs.
In the embodiment of the application, after the electronic device obtains the image to be identified, the electronic device may also perform image segmentation on the image to be identified, and thereby obtain the image segmentation data of the image.
In one implementation manner, according to a preset image segmentation method, the electronic device may use the identifier a to represent a pixel point belonging to the upper half of the human body in the image to be identified, and use the identifier B to represent a pixel point belonging to the lower half of the human body in the image to be identified, so as to obtain image segmentation data of the image to be identified.
In another implementation manner, according to a preset image segmentation method, the electronic device may use the identifier C to represent a pixel point belonging to a head of a human body in the image to be identified, use the identifier D to represent a pixel point belonging to a trunk of the human body in the image to be identified, and use the identifier E to represent a pixel point belonging to a limb of the human body in the image to be identified, so as to obtain image segmentation data of the image to be identified.
The manner of image segmentation of the image to be identified by the electronic device is not limited thereto, and in actual operation, the manner of image segmentation of the image to be identified by the electronic device may be determined by the user according to service requirements and experience.
S104: and determining the image type of the image to be identified according to the image segmentation data.
The image types may include a partial image of a human body and a whole image of a human body, among others. The human body partial image may be further divided into an upper body partial image and a lower body partial image.
In the embodiment of the application, after the electronic device obtains the image segmentation data of the image to be identified, the electronic device may determine the image type of the image to be identified according to the image segmentation data of the image to be identified.
Optionally, the electronic device may determine the image type of the image to be identified according to the proportions of pixels belonging to the upper body and to the lower body of the human body in the image to be identified; that is, S104 may include the following three cases:
In the first case, if the proportion of pixels belonging to the upper half of the human body in the person picture is greater than or equal to a first preset threshold and the proportion of pixels belonging to the lower half of the human body in the person picture is less than a second preset threshold, the image type of the image to be identified is determined to be an upper-body human body partial image.
In the second case, if the proportion of pixels belonging to the upper half of the human body in the person picture is less than the first preset threshold and the proportion of pixels belonging to the lower half of the human body in the person picture is greater than or equal to the second preset threshold, the image type of the image to be identified is determined to be a lower-body human body partial image.
In the third case, if the proportion of pixels belonging to the upper half of the human body in the person picture is greater than or equal to the first preset threshold and the proportion of pixels belonging to the lower half of the human body in the person picture is greater than or equal to the second preset threshold, the image type of the image to be identified is determined to be a whole human body image.
Wherein the first preset threshold and the second preset threshold may be empirically set by a user.
In one implementation, the electronic device may count the number of pixels belonging to the upper body of the human body (may be referred to as a first number) and the number of pixels belonging to the lower body of the human body (may be referred to as a second number) in the image to be identified according to the image segmentation data of the image to be identified.
If the ratio of the first number to the total number of the pixel points in the image to be identified is greater than or equal to a first preset threshold value, and the ratio of the second number to the total number of the pixel points in the image to be identified is less than a second preset threshold value, the electronic device can determine that the image to be identified is an upper body human body partial image.
If the ratio of the first number to the total number of the pixel points in the image to be identified is smaller than a first preset threshold value, and the ratio of the second number to the total number of the pixel points in the image to be identified is larger than or equal to a second preset threshold value, the electronic device can determine that the image to be identified is a lower body human body partial image.
If the ratio of the first number to the total number of the pixel points in the image to be identified is greater than or equal to a first preset threshold value, and the ratio of the second number to the total number of the pixel points in the image to be identified is greater than or equal to a second preset threshold value, the electronic device can determine that the image to be identified is a human body whole image.
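The three cases above can be sketched in code; the segmentation label values (1 for upper body, 2 for lower body) and the default threshold values are illustrative assumptions, not values from the embodiment:

```python
import numpy as np

def classify_image_type(seg, upper_thresh=0.2, lower_thresh=0.2):
    """Determine the image type from image segmentation data.

    seg: H x W array of per-pixel labels; 1 = upper body, 2 = lower body
         (label values are illustrative, not taken from the embodiment).
    """
    total = seg.size
    upper_ratio = np.count_nonzero(seg == 1) / total  # first number / total
    lower_ratio = np.count_nonzero(seg == 2) / total  # second number / total
    if upper_ratio >= upper_thresh and lower_ratio < lower_thresh:
        return "upper_body_partial"
    if upper_ratio < upper_thresh and lower_ratio >= lower_thresh:
        return "lower_body_partial"
    if upper_ratio >= upper_thresh and lower_ratio >= lower_thresh:
        return "whole_body"
    return "unknown"  # neither body half is sufficiently visible
```

The thresholds would in practice be set empirically by the user, as stated above.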
S105: if the image type is a human body local image, obtaining human body local feature data of the image to be identified based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data and the human body feature data.
In the embodiment of the application, when the electronic device determines that the image to be identified is a human body local image, the electronic device may process image segmentation data and human body feature data of the image to be identified according to the local feature extraction network corresponding to the image type of the image to be identified, so as to obtain the human body local feature data of the image to be identified.
It may be appreciated that if the image to be identified is a partial image of the upper body, the electronic device may process the image segmentation data and the human body feature data of the image to be identified according to a local feature extraction network (may be referred to as a first local feature extraction network) corresponding to the partial image of the upper body, to obtain the human body partial feature data of the image to be identified. The obtained human body local characteristic data (which can be called as upper body characteristic data) can effectively embody the upper body characteristics of the person in the image to be identified.
If the image to be identified is a lower body human body partial image, the electronic device may process the image segmentation data and the human body characteristic data of the image to be identified according to a partial characteristic extraction network (may be referred to as a second partial characteristic extraction network) corresponding to the lower body human body partial image, so as to obtain the human body partial characteristic data of the image to be identified. The obtained human body local characteristic data (which can be called as lower body characteristic data) can effectively embody the lower body characteristics of the characters in the image to be identified.
S106: and determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
The preset human body local feature data may be human body local feature data of a specific person image. The specific person image may be a person image in a preset gallery.
In the embodiment of the application, after the electronic device obtains the human body local feature data of the image to be identified, the electronic device may match the human body local feature data of the image to be identified with the preset human body local feature data, and determine whether the person in the image to be identified is the person represented by the preset human body local feature data according to the matching result.
In one implementation manner, for each person image in a preset gallery, the electronic device may perform human feature extraction on the person image to obtain human feature data of the person image, and perform image segmentation on the person image to obtain image segmentation data of the person image.
Then, the electronic device may process the human body feature data and the image segmentation data of the person image according to the first local feature extraction network to obtain the upper body feature data of the person image, and process the human body feature data and the image segmentation data of the person image according to the second local feature extraction network to obtain the lower body feature data of the person image, and further, the electronic device may obtain the upper body feature data and the lower body feature data of each person image in the preset gallery.
If the image to be identified is an upper body partial image, the electronic device can match the upper body characteristic data of the image to be identified with the upper body characteristic data of each character image in the preset gallery, and determine whether the character in the image to be identified is the character to which the character image in the preset gallery belongs according to the matching result.
If the image to be identified is a partial image of the lower body, the electronic device can match the lower body characteristic data of the image to be identified with the lower body characteristic data of each character image in the preset gallery, and determine whether the character in the image to be identified is the character to which the character image in the preset gallery belongs according to the matching result.
Optionally, the electronic device may perform pedestrian re-recognition according to the similarity of the feature data, referring to fig. 2, S106 may include the following steps:
S1061: and calculating the similarity between the human body local characteristic data of the image to be identified and the preset human body local characteristic data.
In the embodiment of the application, after the electronic device obtains the human body local feature data of the image to be identified, the electronic device may calculate the similarity between the human body local feature data of the image to be identified and the preset human body local feature data, so as to perform subsequent processing.
In this step, the electronic device may calculate, according to a preset similarity algorithm, similarity between the human body local feature data of the image to be identified and the preset human body local feature data. For example, the electronic device may calculate cosine similarity between the human body local feature data of the image to be identified and the preset human body local feature data, as similarity between the human body local feature data of the image to be identified and the preset human body local feature data.
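As an illustration of the preset similarity algorithm mentioned above, a minimal cosine similarity between two feature vectors might look as follows (a sketch, not the embodiment's implementation):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors: the dot product of the
    vectors divided by the product of their Euclidean norms."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical feature vectors yield a similarity of 1, orthogonal ones a similarity of 0.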
In one implementation manner, if the image to be identified is a partial image of the upper body, the electronic device may calculate similarity (which may be referred to as upper body similarity) between the upper body feature data of the image to be identified and the upper body feature data of each of the person images in the preset gallery.
If the image to be identified is a partial image of the lower body, the electronic device may calculate the similarity (which may be referred to as the lower body similarity) between the lower body feature data of the image to be identified and the lower body feature data of each person image in the preset gallery.
S1062: and determining whether the person in the image to be identified is the person represented by the preset human body local characteristic data according to the obtained similarity.
In the embodiment of the application, the electronic device may determine whether the person in the image to be identified is the person represented by the preset human body local feature data according to the similarity between the human body local feature data of the image to be identified and the preset human body local feature data.
In one implementation manner, if the image to be identified is a partial image of an upper body, and the calculated upper body similarity is greater than a first preset similarity threshold, the electronic device may determine that the person in the image to be identified is a person to which the person image in the preset gallery belongs. In addition, the electronic device may further determine a person image with the highest similarity to the upper body feature data of the image to be identified in the preset gallery, and determine the person in the image to be identified as the person in the person image.
If the image to be identified is a partial image of the lower body and the calculated lower body similarity is greater than a second preset similarity threshold, the electronic device may determine that the person in the image to be identified is the person to which the person image in the preset gallery belongs. In addition, the electronic device may further determine a person image with the highest similarity with the lower body feature data of the image to be identified in the preset gallery, and determine the person in the image to be identified as the person in the person image.
Alternatively, to further improve the accuracy of pedestrian re-identification, the local feature extraction network may include a SeNet (Squeeze-and-Excitation Network). Accordingly, S105 may include the following steps:
Step one, inputting the human body characteristic data into the SeNet to obtain output data of the SeNet.
In the embodiment of the application, the electronic device may input the human body feature data of the image to be identified into the pre-trained SeNet, to obtain output data of the SeNet.
The structure of SeNet can be seen in fig. 3. The human body characteristic data of the image to be identified may include C characteristic maps, each of which is h×w. Wherein, C can represent the channel number of the neural network for extracting human body characteristics of the image to be identified, and the value of C can be set by a user according to experience. H is the height of the image to be identified, and W is the width of the image to be identified.
The C feature maps of H×W are subjected to global pooling to obtain a 1×1×C vector. The 1×1×C vector passes through a fully connected (FC) layer to obtain a 1×1×C/r vector, where r is a preset coefficient whose value can be set empirically by the user. Then, the 1×1×C/r vector passes through a rectified linear unit (ReLU) and another fully connected layer to obtain a 1×1×C vector. The resulting 1×1×C vector is processed according to an activation function, still yielding a 1×1×C vector. Then, the human body feature data of the image to be identified is weighted (Scale) according to the 1×1×C vector obtained from the activation function, so as to obtain C weighted feature maps of H×W. The activation function may be a Sigmoid function.
The C values in the 1×1×C vector obtained from the activation function may respectively represent the weighting weights of the C H×W feature maps. In the weighting process, each H×W feature map may be multiplied by its corresponding weighting weight, yielding C weighted H×W feature maps.
Because different parts of the person influence different channels to different degrees, the C weighting weights obtained by the pre-trained SeNet can reflect the degree of influence of different human body parts on the channels. The feature maps are thus weighted over the channels with respect to different human body parts, and the C weighted H×W feature maps can more effectively reflect the features of the different human body parts.
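The channel-weighting steps described above (global pooling, FC bottleneck with coefficient r, ReLU, second FC, Sigmoid, Scale) can be sketched in numpy; the weight matrices w1 and w2 stand in for trained FC parameters, and biases are omitted for brevity:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-excitation channel weighting, following the steps in the
    text. x has shape (C, H, W); w1 has shape (C/r, C) and w2 has shape
    (C, C/r), standing in for trained fully connected weights."""
    c = x.shape[0]
    squeeze = x.mean(axis=(1, 2))                    # global pooling -> 1x1xC
    hidden = np.maximum(w1 @ squeeze, 0.0)           # FC + ReLU -> 1x1xC/r
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # FC + Sigmoid -> 1x1xC
    return x * weights.reshape(c, 1, 1)              # scale each HxW map
```

Each of the C feature maps is multiplied by its own scalar weight, exactly as described in the weighting process above.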
Step two, performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on the feature fusion result.
In the embodiment of the application, after the electronic device obtains the output data of the SeNet, the electronic device may perform feature fusion on the output data of the SeNet and the image segmentation data of the image to be identified, and perform convolution processing on the result of the feature fusion.
In one implementation, the electronic device may input the output data of the SeNet and the image segmentation data of the image to be identified into a fusion (Concat) layer of the network for feature fusion, and then input the result of the feature fusion into a Convolution (Conv) layer of the network to perform convolution processing and obtain the convolution processing result.
Step three, weighting the output data of the SeNet based on the convolution processing result to obtain the human body local feature data of the image to be identified.
In the embodiment of the application, after the convolution processing result is obtained, the electronic device may perform weighting processing on the output data of the SeNet based on the convolution processing result, so as to obtain the local feature data of the human body of the image to be identified.
Therefore, based on step two and step three, because the image segmentation data can represent different image areas in the image to be identified (which can also be called different image spaces in the image to be identified), the electronic device processes the image segmentation data and the human body feature data of the image to be identified with the neural network. The neural network can thereby learn how to use the image segmentation data and the human body feature data to weight the feature maps spatially with respect to different parts of the human body, so that the obtained human body local feature data can effectively represent the local features of the human body.
In one implementation, the network portion (which may be called a space selection network) corresponding to step two and step three can be seen in fig. 4. The feature data may be the C weighted H×W feature maps obtained in step one, and the image segmentation data may be an H×W×1 vector. After the fusion layer and the convolution layer, an H×W×1 vector can be obtained. The H×W values in the obtained vector may respectively represent the weighting weight of each pixel point in the feature map.
For each feature map obtained in step one, the electronic device may multiply the feature value of each pixel in the feature map by the weighting weight corresponding to that pixel in the H×W×1 vector obtained in step two, so as to obtain a weighted feature map.
Then, the electronic equipment can obtain the characteristic spectrum subjected to channel weighting and space weighting, and the obtained characteristic spectrum can effectively reflect the local characteristics of the human body.
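The spatial weighting of step three can be sketched as follows, assuming the H×W×1 weight vector produced by the fusion and convolution layers is already available (the trained convolution weights themselves are not specified in the text):

```python
import numpy as np

def spatial_weighting(feat, spatial_w):
    """Multiply every channel of the channel-weighted feature maps by a
    per-pixel weight map.

    feat:      (C, H, W) output of the channel-weighting (SE) step.
    spatial_w: (H, W) per-pixel weights, i.e. the H x W x 1 vector obtained
               from the fusion and convolution layers.
    """
    # Broadcasting applies the same H x W weight map to all C channels.
    return feat * spatial_w[np.newaxis, :, :]
```

The result is the feature map after both channel weighting and spatial weighting, as described in the surrounding text.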
It can be understood that in actual operation, after the feature map is obtained, the electronic device may further perform operations such as pooling, feature encoding, and the like on the feature map to obtain corresponding character features.
In addition, to further improve the accuracy of the local feature extraction network, the electronic device may also instruct the training process of the local feature extraction network based on the teacher network.
Optionally, the loss function of the local feature extraction network may include a difference between image feature data output by a preset network model and human body local feature data output by the local feature extraction network, and a difference between an actual output result and an expected output result of the local feature extraction network, a training sample of the preset network model includes an image having the same image type as the image to be identified, and the preset network model is used for acquiring human body local feature data of an image having the same image type as the image to be identified.
In the embodiment of the application, the electronic device may use the output result of the preset network model as a teacher signal to guide the training process of the local feature extraction network; the preset network model has a more complex structure and can effectively extract human body local feature data. The preset network model may include a network part for extracting upper body characteristic data (which may be referred to as an upper body teacher network) and a network part for extracting lower body characteristic data (which may be referred to as a lower body teacher network).
The desired output result of the local feature extraction network may be a sample identification of the training sample, the sample identification representing the person to which the training sample belongs.
The electronic device may acquire a difference value (which may be referred to as a first difference value) between the image feature data output by the preset network model and the human body local feature data output by the local feature extraction network, and a difference value (which may be referred to as a second difference value) between the actual output result and the expected output result of the local feature extraction network, and train the local feature extraction network according to the first difference value and the second difference value. In one implementation, the first difference value may be represented by the Euclidean distance and the second difference value may be represented by a softmax loss.
For example, the electronic device may train the upper body teacher network, calculate euclidean metrics of image feature data output by the upper body teacher network and human body local feature data output by the first local feature extraction network, and softmax loss of actual output results and expected output results of the first local feature extraction network, and train the first local feature extraction network according to a sum of the obtained euclidean metrics and the softmax loss. The input data of the upper body teacher network may be an upper body human body partial image of a pedestrian, the output data may be pedestrian information of the pedestrian, and the upper body teacher network is used for acquiring upper body characteristic data.
Similarly, the electronic device may train the lower body teacher network, calculate the Euclidean distance between the image feature data output by the lower body teacher network and the human body local feature data output by the second local feature extraction network, and the softmax loss between the actual output result and the expected output result of the second local feature extraction network, and train the second local feature extraction network according to the sum of the obtained Euclidean distance and softmax loss. The input data of the lower body teacher network may be a lower body human body partial image of a pedestrian, the output data may be pedestrian information of the pedestrian, and the lower body teacher network is used for acquiring lower body characteristic data.
Based on the processing, the upper body teacher network can effectively extract the characteristics of the upper body of the human body, and the first local characteristic extraction network is guided by the upper body teacher network to train, so that the first local characteristic extraction network can also effectively extract the characteristics of the upper body of the human body; similarly, the lower body teacher network can effectively extract the characteristics of the lower body of the human body, and the lower body teacher network is used for guiding the second local characteristic extraction network to train, so that the second local characteristic extraction network can also effectively extract the characteristics of the lower body of the human body.
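The training objective described above — the sum of the Euclidean distance between the teacher's and the student's feature data and the softmax loss against the sample identification — can be sketched as follows (all inputs are illustrative numpy arrays, not the embodiment's actual tensors):

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, logits, label):
    """First difference value: Euclidean distance between the feature data
    output by the teacher network and by the local feature extraction
    network. Second difference value: softmax (cross-entropy) loss between
    the actual output (logits) and the expected output (sample label).
    Returns the sum of the two, as used to train the student network."""
    first = np.linalg.norm(np.asarray(student_feat, dtype=float)
                           - np.asarray(teacher_feat, dtype=float))
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                          # numerical stability
    log_probs = z - np.log(np.exp(z).sum())  # log-softmax
    second = -log_probs[label]               # cross-entropy for true label
    return float(first + second)
```

When the student's features match the teacher's and the logits strongly favor the correct identity, both terms approach zero.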
In addition, the electronic device can also carry out pedestrian re-identification on the whole human body, and optionally, the method can further comprise the following steps:
if the image type of the image to be identified is a human body integral image, determining whether the person in the image to be identified is the person represented by the preset human body characteristic data according to the human body characteristic data of the image to be identified and the preset human body characteristic data.
In the embodiment of the application, when the electronic device determines that the image to be identified is a human body whole image, the electronic device may determine, according to the human body feature data of the image to be identified and the preset human body feature data, whether the person in the image to be identified is the person represented by the preset human body feature data.
In one implementation manner, the electronic device may calculate the similarity between the human body feature data of the image to be identified and the human body feature data of each human body image in the preset gallery, and then, the electronic device may determine whether the person in the image to be identified is the person to which the person image in the preset gallery belongs according to each similarity.
The method for determining whether the person in the image to be identified is the person to which the person image in the preset gallery belongs according to the similarities by the electronic device is similar to step S1062, and will not be repeated here.
In another implementation manner, the electronic device may perform weighting processing on the human body feature data of the image to be identified based on the image segmentation data of the image to be identified, to obtain weighted human body feature data (which may be referred to as first human body feature data). In addition, for each person image in the preset gallery, the electronic device may perform weighting processing on the human body feature data of that person image based on its image segmentation data, to obtain weighted human body feature data (which may be referred to as second human body feature data). Then, the electronic device may calculate the similarity between the first human body feature data and each piece of second human body feature data, and further determine, according to each similarity, whether the person in the image to be identified is the person to which a person image in the preset gallery belongs.
The method for determining whether the person in the image to be identified is the person to which the person image in the preset gallery belongs according to the similarities by the electronic device is similar to step S1062, and will not be repeated here.
For example, the human body feature data of the image to be recognized may include C feature maps, each of which is h×w. H is the height of the image to be identified, and W is the width of the image to be identified. The image segmentation data of the image to be identified may be a vector of h×w, where h×w values in the vector may respectively represent a weighting weight of each pixel point in the feature map.
For each feature map, the electronic device may multiply the feature value of each pixel in the feature map with the weighting weight corresponding to the pixel point in the image segmentation data, so as to obtain weighted integral feature data (i.e., first human feature data).
The method for the electronic device to acquire the second human body characteristic data is similar to the method for acquiring the first human body characteristic data, and will not be described herein.
Referring to fig. 5, fig. 5 is a block diagram of a feature extraction network according to an embodiment of the present application.
The backbone network can perform human body feature extraction on an input image to obtain human body feature data of the image; this network part may be referred to as the human body feature extraction network.
If the image is a whole human body image, human body characteristic data of the image is subjected to pooling and characteristic coding to obtain the whole body characteristics of the people in the image.
If the image is an upper body partial image, the human body characteristic data of the image is subjected to upper body channel selection processing to obtain the characteristic data after weighting processing on the channel. The upper body channel selection may be SeNet in the above embodiments.
Then, the upper body spatial selection processing may be performed on the processing result of upper body channel selection and the image division data of the image, to obtain the feature data after the spatial weighting processing. The upper body spatial selection may be the spatial selection network of fig. 4 in the above-described embodiment. Similarly, the upper body characteristics of the person in the image can be obtained by pooling and feature encoding the processing results of the upper body space selection.
The network part formed by the upper body channel selection, the upper body space selection, pooling and feature coding can be called an upper body feature extraction network.
In fig. 5, the process of acquiring the lower body features of the person in the image is similar to the process of acquiring the upper body features of the person in the image, and will not be described here. Similarly, the network part formed by the lower body channel selection, the lower body space selection, pooling and feature coding can be called a lower body feature extraction network.
Referring to fig. 6, fig. 6 is a flowchart of an example of a pedestrian re-recognition method according to an embodiment of the present application, where the method may include the following steps:
S601: and acquiring an image to be identified containing the figure picture.
S602: and extracting human body characteristics of the image to be identified to obtain the human body characteristic data of the image to be identified.
S603: and carrying out image segmentation on the image to be identified to obtain image segmentation data of the image to be identified.
The image segmentation data is used for representing a human body part to which a pixel point included in the figure picture belongs.
S604: and determining the image type of the image to be identified according to the image segmentation data of the image to be identified.
The image type is an upper body human body partial image, a lower body human body partial image or a human body whole image.
S605: if the image to be identified is an upper body human body partial image, processing image segmentation data and human body characteristic data of the image to be identified based on a pre-trained upper body characteristic extraction network to obtain upper body characteristic data of a person in the image to be identified.
S606: and calculating the similarity between the upper body characteristic data of the person in the image to be identified and the preset upper body characteristic data, and determining whether the person in the image to be identified is the person represented by the preset upper body characteristic data according to the obtained similarity.
S607: and if the image to be identified is a lower body human body partial image, processing image segmentation data and human body characteristic data of the image to be identified based on a pre-trained lower body characteristic extraction network to obtain lower body characteristic data of a person in the image to be identified.
S608: and calculating the similarity between the lower body characteristic data of the person in the image to be identified and the preset lower body characteristic data, and determining whether the person in the image to be identified is the person represented by the preset lower body characteristic data according to the obtained similarity.
S609: if the image to be identified is a human body integral image, calculating the similarity between human body characteristic data of the image to be identified and preset human body characteristic data, and determining whether the person in the image to be identified is the person represented by the preset human body characteristic data according to the obtained similarity.
Corresponding to the method embodiment of fig. 1, referring to fig. 7, fig. 7 is a block diagram of a pedestrian re-recognition device according to an embodiment of the present application, where the device may include:
an acquisition module 701, configured to acquire an image to be identified including a figure picture;
the extracting module 702 is configured to perform human feature extraction on the image to be identified, so as to obtain human feature data of the image to be identified;
the segmentation module 703 is configured to perform image segmentation on the image to be identified to obtain image segmentation data of the image to be identified, where the image segmentation data is used to represent a human body part to which a pixel point included in the figure picture belongs;
a determining module 704, configured to determine an image type of the image to be identified according to the image segmentation data;
the first processing module 705 is configured to obtain, if the image type is a human body local image, human body local feature data of the image to be identified based on a local feature extraction network, the image segmentation data, and the human body feature data corresponding to the image type trained in advance;
the identifying module 706 is configured to determine whether the person in the image to be identified is a person represented by the preset human body local feature data based on a matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
Optionally, the local feature extraction network includes a squeeze-and-excitation network SeNet;
the first processing module 705 is specifically configured to input the human body characteristic data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
and based on the convolution processing result, weighting the output data of the SeNet to obtain the human body local characteristic data of the image to be identified.
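The SeNet-based flow above (SE gating, fusion with the segmentation data, convolution, re-weighting) can be sketched roughly as follows. This is a minimal NumPy illustration, not the patented network: the tensor shapes, the fixed stand-in weights in `se_gate`, and the channel-mean stand-in for the convolution are all assumptions.

```python
import numpy as np

def se_gate(features, reduction=4):
    """Squeeze-and-excitation gate (illustrative): global average pooling
    per channel, a two-layer bottleneck, and a sigmoid that yields
    per-channel weights in (0, 1)."""
    c = features.shape[0]                                     # features: (C, H, W)
    squeeze = features.mean(axis=(1, 2))                      # (C,)
    w1 = np.full((c // reduction, c), 1.0 / c)                # stand-ins for learned weights
    w2 = np.full((c, c // reduction), reduction / c)
    excite = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ squeeze, 0.0))))
    return features * excite[:, None, None]                   # channel re-weighting

def local_feature(features, seg_mask):
    """The four steps of the embodiment, sketched:
    1) pass the human body feature data through the SE network;
    2) fuse its output with the image segmentation data (here: concatenation);
    3) convolve the fusion result (here: a channel mean stands in for a conv);
    4) weight the SE output with the convolution result."""
    se_out = se_gate(features)                                # step 1
    fused = np.concatenate([se_out, seg_mask[None]], axis=0)  # step 2
    attn = 1.0 / (1.0 + np.exp(-fused.mean(axis=0)))          # step 3, plus sigmoid
    return se_out * attn[None]                                # step 4: spatial weighting
```

In a trained implementation the stand-in weights would be learned parameters and the channel mean an actual convolution layer; the sketch only shows how the segmentation data gates the SE output spatially.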
Optionally, the loss function of the local feature extraction network includes a difference between image feature data output by a preset network model and human body local feature data output by the local feature extraction network, and a difference between an actual output result and an expected output result of the local feature extraction network, a training sample of the preset network model includes an image with the same image type as the image to be identified, and the preset network model is used for acquiring human body local feature data of an image with the same image type as the image to be identified.
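The loss described above combines a feature-imitation term (distance to the features produced by the preset network model trained on same-type images) with a task term (distance between the network's actual and expected outputs). A rough sketch; the squared-error and cross-entropy choices and the weighting `alpha` are assumptions not stated in the text:

```python
import numpy as np

def local_feature_loss(student_feat, teacher_feat, logits, label, alpha=0.5):
    """Illustrative loss for the local feature extraction network:
    - imitation: difference between the preset network model's (teacher's)
      image feature data and the local feature data output by the network;
    - task: difference between the actual output and the expected output."""
    imitation = np.mean((student_feat - teacher_feat) ** 2)
    shifted = np.exp(logits - logits.max())          # numerically stable softmax
    probs = shifted / shifted.sum()
    task = -np.log(probs[label] + 1e-12)             # cross-entropy on the true label
    return alpha * imitation + (1.0 - alpha) * task
```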
Optionally, the identifying module 706 is specifically configured to calculate a similarity between the local feature data of the human body of the image to be identified and preset local feature data of the human body;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data according to the obtained similarity.
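The similarity comparison can be sketched as cosine similarity against a threshold; both the metric and the threshold value are illustrative assumptions, since the embodiment only states that a similarity is computed and used for the decision.

```python
import numpy as np

def is_same_person(query_feat, preset_feat, threshold=0.8):
    """Compare the local feature data of the image to be identified with
    preset local feature data. Returns (decision, similarity); the cosine
    metric and the 0.8 threshold are illustrative choices."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    p = preset_feat / (np.linalg.norm(preset_feat) + 1e-12)
    similarity = float(q @ p)
    return similarity >= threshold, similarity
```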
Optionally, the apparatus further includes:
and the second processing module is used for determining whether the person in the image to be identified is the person represented by the preset human body characteristic data according to the human body characteristic data and the preset human body characteristic data of the image to be identified if the image type of the image to be identified is the human body integral image.
Optionally, the determining module 704 is specifically configured to determine that the image type of the image to be identified is an upper body partial image if the proportion of the pixel points belonging to the upper half of the human body in the figure picture is greater than or equal to a first preset threshold and the proportion of the pixel points belonging to the lower half of the human body in the figure picture is less than a second preset threshold;
if the proportion of the pixels belonging to the upper half of the human body in the figure picture is smaller than the first preset threshold, and the proportion of the pixels belonging to the lower half of the human body in the figure picture is greater than or equal to the second preset threshold, determining that the image type of the image to be identified is a lower body partial image;
if the proportion of the pixels belonging to the upper half of the human body in the figure picture is greater than or equal to the first preset threshold, and the proportion of the pixels belonging to the lower half of the human body in the figure picture is greater than or equal to the second preset threshold, determining that the image type of the image to be identified is the human body integral image.
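The three decision rules above can be sketched as follows, taking the segmentation-derived pixel proportions as inputs. The rules follow the embodiment; the concrete threshold values are assumptions, since the text only names a first and a second preset threshold.

```python
def image_type(upper_ratio, lower_ratio, first_threshold=0.3, second_threshold=0.3):
    """Decide the image type from the proportions of pixel points that the
    segmentation assigns to the upper and lower half of the human body."""
    if upper_ratio >= first_threshold and lower_ratio < second_threshold:
        return "upper body partial image"
    if upper_ratio < first_threshold and lower_ratio >= second_threshold:
        return "lower body partial image"
    if upper_ratio >= first_threshold and lower_ratio >= second_threshold:
        return "human body integral image"
    return "undetermined"  # neither half sufficiently visible
```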
The embodiment of the present application further provides an electronic device, as shown in fig. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with each other through the communication bus 804;
a memory 803 for storing a computer program;
the processor 801, when executing the program stored in the memory 803, implements the following steps:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be identified to obtain human body characteristic data of the image to be identified;
image segmentation is carried out on the image to be identified to obtain image segmentation data of the image to be identified, wherein the image segmentation data are used for representing a human body part to which pixel points contained in the figure picture belong;
Determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be identified based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
The communication bus 804 of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 804 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 802 is used for communication between the electronic device and other devices described above.
The memory 803 may include a random access memory (Random Access Memory, abbreviated as RAM) or may include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. Optionally, the memory 803 may also be at least one memory device located remotely from the aforementioned processor.
The processor 801 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
According to the electronic device provided by the embodiment of the application, pedestrian re-identification is performed according to the human body local feature data of the image to be identified and the preset human body local feature data. Compared with the prior art, in which pedestrian re-identification is performed directly according to the human body feature data of the image to be identified, this can improve the accuracy of the pedestrian re-identification result.
The embodiment of the application also provides a computer-readable storage medium in which instructions are stored; when run on a computer, the instructions cause the computer to execute the pedestrian re-identification method provided by the embodiment of the application.
Specifically, the pedestrian re-identification method includes:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be identified to obtain human body characteristic data of the image to be identified;
image segmentation is carried out on the image to be identified to obtain image segmentation data of the image to be identified, wherein the image segmentation data are used for representing a human body part to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be identified based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
It should be noted that other implementations of the pedestrian re-identification method are partially the same as those of the foregoing method embodiment, and are not repeated here.
By running the instructions stored in the computer-readable storage medium provided by the embodiment of the application, pedestrian re-identification is performed according to the human body local feature data of the image to be identified and the preset human body local feature data. Compared with the prior art, in which pedestrian re-identification is performed directly according to the human body feature data of the image to be identified, this can improve the accuracy of the pedestrian re-identification result.
The embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the pedestrian re-identification method provided by the embodiment of the application.
Specifically, the pedestrian re-identification method includes:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be identified to obtain human body characteristic data of the image to be identified;
image segmentation is carried out on the image to be identified to obtain image segmentation data of the image to be identified, wherein the image segmentation data are used for representing a human body part to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be identified based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
It should be noted that other implementations of the pedestrian re-identification method are partially the same as those of the foregoing method embodiment, and are not repeated here.
By running the computer program product provided by the embodiment of the application, pedestrian re-identification is performed according to the human body local feature data of the image to be identified and the preset human body local feature data. Compared with the prior art, in which pedestrian re-identification is performed directly according to the human body feature data of the image to be identified, this can improve the accuracy of the pedestrian re-identification result.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner, and for identical or similar parts between the embodiments, reference may be made to each other; each embodiment focuses on its differences from the other embodiments. In particular, for the apparatus, electronic device, computer-readable storage medium, and computer program product embodiments, the description is relatively brief because they are substantially similar to the method embodiment; for relevant details, see the partial description of the method embodiment.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (12)

1. A method of pedestrian re-identification, the method comprising:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be identified to obtain human body characteristic data of the image to be identified;
image segmentation is carried out on the image to be identified to obtain image segmentation data of the image to be identified, wherein the image segmentation data are used for representing a human body part to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be identified based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data;
determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data;
wherein the local feature extraction network comprises a squeeze-and-excitation network SeNet;
the local feature extraction network, the image segmentation data and the human feature data corresponding to the image type based on the pre-training, and the human local feature data of the image to be identified are obtained, including:
inputting the human body characteristic data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
and based on the convolution processing result, weighting the output data of the SeNet to obtain the human body local characteristic data of the image to be identified.
2. The method according to claim 1, wherein the loss function of the local feature extraction network includes a difference between image feature data output by a preset network model and human body local feature data output by the local feature extraction network, and a difference between an actual output result of the local feature extraction network and a desired output result, and the training sample of the preset network model includes an image having the same image type as the image to be identified, and the preset network model is used for acquiring human body local feature data of an image having the same image type as the image to be identified.
3. The method according to claim 1, wherein the determining whether the person in the image to be recognized is the person represented by the preset human body partial feature data based on the matching result of the human body partial feature data of the image to be recognized and the preset human body partial feature data includes:
calculating the similarity between the human body local characteristic data of the image to be identified and preset human body local characteristic data;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data according to the obtained similarity.
4. The method according to claim 1, wherein the method further comprises:
if the image type of the image to be identified is a human body integral image, determining whether the person in the image to be identified is the person represented by the preset human body characteristic data according to the human body characteristic data of the image to be identified and the preset human body characteristic data.
5. The method according to claim 1, wherein said determining the image type of the image to be identified from the image segmentation data comprises:
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is larger than or equal to a first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is smaller than a second preset threshold value, determining that the image type of the image to be recognized is an upper half human body partial image;
If the proportion of the pixels belonging to the upper half of the human body in the human figure picture is smaller than the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be identified is the partial image of the lower half of the human body;
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is larger than or equal to the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be recognized is the human body integral image.
6. A pedestrian re-identification device, the device comprising:
the acquisition module is used for acquiring an image to be identified containing a figure picture;
the extraction module is used for extracting human body characteristics of the image to be identified to obtain human body characteristic data of the image to be identified;
the segmentation module is used for carrying out image segmentation on the image to be identified to obtain image segmentation data of the image to be identified, wherein the image segmentation data are used for representing a human body part to which the pixel points contained in the figure picture belong;
The determining module is used for determining the image type of the image to be identified according to the image segmentation data;
the first processing module is configured to obtain, if the image type is a human body local image, human body local feature data of the image to be identified based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data;
the identification module is used for determining whether the person in the image to be identified is the person represented by the preset human body local feature data or not based on the matching result of the human body local feature data of the image to be identified and the preset human body local feature data;
the local feature extraction network comprises a squeeze-and-excitation network SeNet;
the first processing module is specifically configured to input the human body characteristic data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
and based on the convolution processing result, weighting the output data of the SeNet to obtain the human body local characteristic data of the image to be identified.
7. The apparatus according to claim 6, wherein the loss function of the local feature extraction network includes a difference between image feature data output by a preset network model and human body local feature data output by the local feature extraction network, and a difference between an actual output result of the local feature extraction network and a desired output result, and the training sample of the preset network model includes an image having the same image type as the image to be identified, and the preset network model is used for acquiring human body local feature data of an image having the same image type as the image to be identified.
8. The device according to claim 6, wherein the identification module is specifically configured to calculate a similarity between the body local feature data of the image to be identified and preset body local feature data;
and determining whether the person in the image to be identified is the person represented by the preset human body local feature data according to the obtained similarity.
9. The apparatus of claim 6, wherein the apparatus further comprises:
and the second processing module is used for determining whether the person in the image to be identified is the person represented by the preset human body characteristic data according to the human body characteristic data and the preset human body characteristic data of the image to be identified if the image type of the image to be identified is the human body integral image.
10. The apparatus according to claim 6, wherein the determining module is specifically configured to determine that the image type of the image to be identified is an upper body partial image if the proportion of the pixels belonging to the upper half of the human body in the figure picture is greater than or equal to a first preset threshold and the proportion of the pixels belonging to the lower half of the human body in the figure picture is less than a second preset threshold;
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is smaller than the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be identified is the partial image of the lower half of the human body;
if the proportion of the pixels belonging to the upper half of the human body in the human figure picture is larger than or equal to the first preset threshold value, and the proportion of the pixels belonging to the lower half of the human body in the human figure picture is larger than or equal to the second preset threshold value, determining that the image type of the image to be recognized is the human body integral image.
11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface, the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method steps of any one of claims 1-5 when executing a program stored on the memory.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-5.
CN201910575126.4A 2019-06-28 2019-06-28 Pedestrian re-identification method and device Active CN112149470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910575126.4A CN112149470B (en) 2019-06-28 2019-06-28 Pedestrian re-identification method and device


Publications (2)

Publication Number Publication Date
CN112149470A CN112149470A (en) 2020-12-29
CN112149470B true CN112149470B (en) 2023-09-05

Family

ID=73869397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910575126.4A Active CN112149470B (en) 2019-06-28 2019-06-28 Pedestrian re-identification method and device

Country Status (1)

Country Link
CN (1) CN112149470B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5444821A (en) * 1993-11-10 1995-08-22 United Microelectronics Corp. Artificial neuron element with electrically programmable synaptic weight for neural networks
WO2017101434A1 (en) * 2015-12-16 2017-06-22 深圳大学 Human body target re-identification method and system among multiple cameras
CN109325412A (en) * 2018-08-17 2019-02-12 平安科技(深圳)有限公司 Pedestrian recognition method, device, computer equipment and storage medium
CN109740541A (en) * 2019-01-04 2019-05-10 重庆大学 A kind of pedestrian weight identifying system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3689740B2 (en) * 2002-05-27 2005-08-31 国立大学法人広島大学 Image division processing method, image division processing device, real-time image processing method, real-time image processing device, and image processing integrated circuit
CN107644209A (en) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 Method for detecting human face and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Simulation Research on Gait Recognition Based on Feature Fusion and Neural Network; Zhang Qiuhong; Su Jin; Yang Xinfeng; Computer Simulation (Issue 08); 235-237, 245 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant