CN111027434A - Training method and device for pedestrian recognition model and electronic equipment

Training method and device for pedestrian recognition model and electronic equipment

Info

Publication number
CN111027434A
Authority
CN
China
Prior art keywords
picture
segmentation
loss value
component
pedestrian recognition
Prior art date
Legal status
Granted
Application number
CN201911215728.5A
Other languages
Chinese (zh)
Other versions
CN111027434B (en)
Inventor
黄厚景
林锦彬
Current Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Publication of CN111027434A
Application granted
Publication of CN111027434B
Legal status: Active

Classifications

    • G06V 40/10: Recognition of biometric, human-related or animal-related patterns in image or video data; human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06V 10/44: Extraction of image or video features; local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 20/30: Scenes; scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • Y02T 10/40: Engine management systems (cross-sectional climate-change mitigation tag)


Abstract

A training method and device for a pedestrian recognition model, and an electronic device, are disclosed. The model training method comprises the following steps: acquiring a first picture with a pedestrian identity label from a source domain, and acquiring a second picture without a pedestrian identity label from a target domain to be tested; inputting the first picture into a convolutional neural network to obtain a first feature map, and calculating an identity loss value from the first feature map and the identity label of the first picture; performing component segmentation on the first feature map to obtain a first segmentation region prediction map, and calculating a first segmentation loss value from the first segmentation region prediction map; inputting the second picture into the convolutional neural network to obtain a second feature map, performing component segmentation on the second feature map to obtain a second segmentation region prediction map, and calculating a second segmentation loss value from the second segmentation region prediction map; and adjusting the parameters of the pedestrian recognition model according to the identity loss value, the first segmentation loss value and the second segmentation loss value to obtain the trained pedestrian recognition model. This solves the technical problem of the poor cross-domain recognition performance of prior-art models.

Description

Training method and device for pedestrian recognition model and electronic equipment
Technical Field
The present disclosure relates to the field of software technologies, and in particular, to a training method and device for a pedestrian recognition model, and an electronic device.
Background
Pedestrian re-identification has a wide range of application scenarios, including pedestrian retrieval in video surveillance and multi-target tracking in intelligent retail. In practical applications, videos acquired in different scenes carry the characteristics of their respective domains. In cross-domain testing, i.e., training in a source domain and testing in a target domain, the performance of existing pedestrian re-identification models drops sharply: such models are usually trained only on source-domain samples, and the source-domain and target-domain samples may differ in morphological characteristics and the like, so the models perform worse on the target domain. This also reflects the weak generalization ability of existing models, and a new model training method is urgently needed to improve the generalization ability of pedestrian re-identification models.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems.
According to a first aspect of the present application, there is provided a training method of a pedestrian recognition model, the method comprising:
acquiring a first picture from a source domain, and acquiring a second picture from a target domain to be tested, wherein the pictures in the source domain carry pedestrian identity labels and the pictures in the target domain do not;
inputting the first picture as a first training sample into a convolutional neural network in a pedestrian recognition model to obtain a first feature map, and calculating according to the first feature map and the identity label of the first picture to obtain an identity loss value;
performing component segmentation on the first feature map to obtain a first segmentation region prediction map, and calculating a first segmentation loss value according to the first segmentation region prediction map;
inputting the second picture into the convolutional neural network as a second training sample to obtain a second feature map, performing component segmentation on the second feature map to obtain a second segmentation area prediction map, and calculating according to the second segmentation area prediction map to obtain a second segmentation loss value;
and performing parameter adjustment on the pedestrian recognition model according to the identity loss value, the first segmentation loss value and the second segmentation loss value to obtain the trained pedestrian recognition model.
According to a second aspect of the present application, there is provided a pedestrian recognition method, applied to the pedestrian recognition model obtained by training the training method of the pedestrian recognition model of the first aspect, the recognition method including:
respectively inputting a query picture and each test picture in a target domain into the pedestrian recognition model to recognize and output P feature vectors of the query picture and P feature vectors of each test picture, wherein the P feature vectors respectively correspond to P component regions of a recognized picture;
calculating the similarity between the query picture and each test picture based on the p-th feature vector of the query picture and the p-th feature vector of each test picture, where p ∈ [1, P];
obtaining a similar picture identification result of the query picture in the target domain based on the similarity between the query picture and each test picture;
and obtaining the pedestrian recognition result from the similar picture recognition result.
According to a third aspect of the present application, there is provided a training apparatus of a pedestrian recognition model, the apparatus including:
the image acquisition module is used for acquiring a first image from a source domain and acquiring a second image from a target domain to be tested, wherein the image in the source domain has the identity mark of a pedestrian, and the image in the target domain has no identity mark of the pedestrian;
the identification module is used for inputting the first picture as a first training sample into a convolutional neural network in a pedestrian identification model to obtain a first feature map, and calculating to obtain an identity loss value according to the first feature map and the identity label of the first picture;
the segmentation module is used for performing component segmentation on the first feature map to obtain a first segmentation region prediction map and calculating a first segmentation loss value according to the first segmentation region prediction map;
the recognition module is further configured to input the second picture into the convolutional neural network as a second training sample to obtain a second feature map, and the segmentation module is further configured to perform component segmentation on the second feature map to obtain a second segmentation area prediction map, and calculate a second segmentation loss value according to the second segmentation area prediction map;
and the parameter adjusting module is used for carrying out parameter adjustment on the pedestrian recognition model according to the identity loss value, the first segmentation loss value and the second segmentation loss value to obtain the trained pedestrian recognition model.
According to a fourth aspect of the present application, there is provided a pedestrian recognition device applied to the pedestrian recognition model obtained by training the training method of the pedestrian recognition model according to the first aspect, the recognition device comprising:
the input module is used for respectively inputting a query picture and each test picture in a target domain into the pedestrian recognition model to recognize and output P characteristic vectors of the query picture and P characteristic vectors of each test picture, wherein the P characteristic vectors respectively correspond to P component regions of a recognized picture;
the calculation module is used for calculating the similarity between the query picture and each test picture based on the p-th feature vector of the query picture and the p-th feature vector of each test picture, where p ∈ [1, P];
the query module is used for obtaining a similar picture identification result of the query picture in the target domain based on the similarity between the query picture and each test picture;
and the result acquisition module is used for obtaining the pedestrian recognition result from the similar picture recognition result.
According to a fifth aspect of the present application, there is provided a computer-readable storage medium storing a computer program for executing the method for training a pedestrian recognition model provided in the first aspect or executing the method for pedestrian recognition provided in the second aspect.
According to a sixth aspect of the present application, there is provided an electronic apparatus comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to execute the training method of the pedestrian recognition model provided in the first aspect, or execute the pedestrian recognition method provided in the second aspect.
One or more technical solutions in the embodiments of the present application have at least the following technical effects:
the embodiment of the application provides a training method of a pedestrian recognition model, which comprises the steps of obtaining training samples from a source domain with an identity label and a target domain without the identity label; identifying the identity of the picture in the source domain to obtain an identity loss value, and carrying out component segmentation on the picture in the source domain to obtain a segmentation loss value; performing component segmentation on the picture in the target domain to obtain a segmentation loss value; the model parameters are adjusted through the segmentation loss value of the picture component segmentation in the source domain and the segmentation loss value of the picture component segmentation in the target domain, so that the alignment between the model obtained through training and the picture component in the target domain is realized when the pedestrian recognition model is trained on the basis of the picture in the source domain, the model obtained through training naturally adapts to the scene of the target domain, the generalization capability of the pedestrian recognition model is improved, and the accuracy of cross-domain recognition is improved.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a schematic diagram of a pedestrian recognition model according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a method for training a pedestrian recognition model provided by an exemplary embodiment of the present application;
FIG. 3 is a flow chart for obtaining an identity loss value as provided by an exemplary embodiment of the present application;
FIG. 4 is a keypoint detection schematic provided by an exemplary embodiment of the present application;
FIG. 5 is a flow chart for obtaining segmentation loss values as provided by an exemplary embodiment of the present application;
FIG. 6 is a flow chart of a method of pedestrian identification provided by an exemplary embodiment of the present application;
FIG. 7 is a block diagram illustrating an exemplary training apparatus for a pedestrian recognition model according to an embodiment of the present disclosure;
FIG. 8 is a block diagram illustrating a refinement of a training apparatus for a pedestrian recognition model according to an exemplary embodiment of the present application;
fig. 9 is a block diagram illustrating a pedestrian recognition apparatus according to an exemplary embodiment of the present application;
FIG. 10 is a detailed block diagram of an apparatus for pedestrian recognition provided in an exemplary embodiment of the present application;
fig. 11 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
The embodiment provides a training method and device for a pedestrian recognition model, and an electronic device. During model training, pictures are obtained from the source domain and from the target domain to be tested; pedestrian identification is performed on the source-domain picture to obtain an identity loss value, and component segmentation on it yields a first segmentation loss value; component segmentation on the target-domain picture yields a second segmentation loss value; the parameters of the pedestrian recognition model are then adjusted according to the identity loss value, the first segmentation loss value and the second segmentation loss value to obtain the trained model. Adjusting the parameters with the two segmentation loss values aligns the source-domain and target-domain pictures while the model is trained on source-domain pictures, so the trained model naturally adapts to the target-domain scene. This solves the technical problem of the low generalization ability of prior-art pedestrian recognition models and improves the accuracy of cross-domain recognition.
Exemplary model Structure
An embodiment of the present application provides a pedestrian recognition model. As shown in fig. 1, the model includes: a convolutional neural network 11 (CNN), a part-aligned pooling module 12 (PAP), an embedding layer 13 (Em), a classifier 14 (FC), and a part segmentation module 15 (PS). The CNN extracts a feature map; the PAP pools the feature map extracted by the CNN; Em maps the feature vectors obtained by PAP pooling; the FC performs classification prediction on the feature vectors output by Em; and the PS performs component segmentation on the CNN-extracted feature map and computes the component segmentation loss.
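To make the structure concrete, the following is a minimal PyTorch sketch of how the five modules might be wired together. It is an illustrative assumption, not the patent's reference implementation: the ResNet-50 cut point and the dimensions C = 2048, d = 256, P = 9, K = 8 are chosen only for the example.

```python
import torch
import torch.nn as nn
import torchvision

class PedestrianRecognitionModel(nn.Module):
    """Sketch of the five-module structure: CNN, PAP, Em, FC and PS.
    All dimensions are illustrative assumptions."""
    def __init__(self, num_ids, P=9, K=8, C=2048, d=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        # CNN: everything up to the last convolutional stage (C x H x W maps)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        # Em: one embedding layer (FC + BN + ReLU) per part region
        self.embed = nn.ModuleList(
            nn.Sequential(nn.Linear(C, d), nn.BatchNorm1d(d), nn.ReLU())
            for _ in range(P))
        # FC: one pedestrian classifier per part region
        self.fc = nn.ModuleList(nn.Linear(d, num_ids) for _ in range(P))
        # PS: deconv (stride 2, 3x3) + BN + ReLU + 1x1 conv to K part classes
        self.ps = nn.Sequential(
            nn.ConvTranspose2d(C, 256, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.Conv2d(256, K, kernel_size=1))
        # PAP has no parameters here; see the pooling sketch further below.
```

Keeping every head on one shared backbone matches the description that all modules read the same CNN feature map.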
Exemplary method
Example 1
Fig. 2 is a flowchart illustrating a method for training a pedestrian recognition model according to an exemplary embodiment of the present application. The embodiment can be applied to a pedestrian recognition model, as shown in fig. 2, and includes the following steps:
step 201, a first picture is obtained from a source domain, and a second picture is obtained from a target domain to be tested, wherein the pictures in the source domain have identity marks of pedestrians, and the pictures in the target domain have no identity marks of pedestrians.
The source domain and the target domain refer to data sets from different scenes. Pictures from different scenes differ in clarity, brightness, viewing angle, picture style and so on; for example, images from road monitoring, images from home cameras and images captured by vehicle-mounted cameras have clearly different characteristics. For a model trained on the source domain, application in the target domain is cross-domain.
Step 202, inputting the first picture as a first training sample into a convolutional neural network in a pedestrian recognition model to obtain a first feature map, and calculating according to the first feature map and the identity label of the first picture to obtain an identity loss value.
The convolutional neural network CNN may adopt ResNet-50 and includes convolution, pooling, activation and other network layers. The CNN extracts features from the first picture to obtain a first feature map of dimension C×H×W, where C is the number of CNN channels, H the height of the first feature map, and W its width. The values of C, H and W are not specifically limited in this embodiment and may vary with the size and accuracy of the model. The first picture comes from the source domain. For its first feature map, the part-aligned pooling module PAP performs a maximum pooling operation over P component regions of the feature map to obtain P feature vectors of dimension C, and the embedding layer Em maps these to P feature vectors of dimension d; the classifier FC then classifies the feature vectors output by Em, i.e., identifies the pedestrian, and the identity loss value is computed from the recognition result and the identity label of the first picture. Note that P is 9 in fig. 1, but this embodiment does not limit the specific value of P.
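A sketch of the PAP step just described, under the assumption that each of the P component regions is given as a rectangular box in feature-map coordinates; the region format and batch handling are illustrative:

```python
import torch

def part_aligned_pooling(feat, regions, visible):
    """Max-pool P part regions of the feature maps into P C-dim vectors.
    feat: (B, C, H, W) CNN feature maps; regions: list of P boxes
    (top, bottom, left, right) in feature-map coordinates, shared across
    the batch for simplicity; visible: (B, P) float mask, 1 = visible.
    Invisible parts end up as zero vectors, as described in embodiment 2."""
    pooled = []
    for t, b, l, r in regions:
        pooled.append(feat[:, :, t:b, l:r].amax(dim=(2, 3)))  # (B, C) max pool
    pooled = torch.stack(pooled, dim=1)                       # (B, P, C)
    return pooled * visible.unsqueeze(-1)                     # zero out invisible
```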
And 203, performing component division on the first characteristic diagram to obtain a first divided region prediction diagram, and calculating to obtain a first divided loss value according to the first divided region prediction diagram.
The part segmentation is completed by a part segmentation module PS, and is used for predicting the pedestrian part region and calculating the loss value of the part segmentation.
And 204, inputting the second picture as a second training sample into the convolutional neural network to obtain a second feature map, performing component segmentation on the second feature map to obtain a second segmentation area prediction map, and calculating according to the second segmentation area prediction map to obtain a second segmentation loss value.
The second picture comes from the target domain and contains target-domain picture characteristics. The convolutional neural network CNN extracts its features, and the part segmentation module PS then performs component segmentation on the resulting feature map to calculate the second segmentation loss value corresponding to the second picture.
And step 205, performing parameter adjustment on the pedestrian recognition model according to the obtained identity loss value, the first segmentation loss value and the second segmentation loss value to obtain the trained pedestrian recognition model.
The identity loss value is used to adjust the parameters of the convolutional neural network CNN, the part-aligned pooling module PAP, the embedding layer Em and the classifier FC of the pedestrian recognition model. The first and second segmentation loss values are used to adjust the parameters of the CNN and the part segmentation module PS so as to constrain the component segmentation of source-domain and target-domain pictures. Aligning the components of source-domain and target-domain pictures during training improves the model's ability to adapt to the target-domain scene, i.e., its generalization ability.
In the above embodiment, the training process of the pedestrian recognition model is as follows (a single-step code sketch follows the list):
(1) inputting a picture into the convolutional neural network to obtain a feature map of dimension C×H×W;
(2) the part-aligned pooling module pools P regions of the feature map to obtain P feature vectors of dimension C; for example, the lower-left corner of fig. 1 shows the P regions used for feature pooling, with P = 9 in the figure;
(3) if the picture is from the source domain, the embedding layer maps the feature vectors to P feature vectors of dimension d, and the classifier classifies them and evaluates the loss function, i.e., the identity loss value;
(4) if the picture is from the source domain, the feature map is also passed through the part segmentation module to predict component regions, and the component segmentation loss function is evaluated, i.e., the first segmentation loss value;
(5) if the picture is from the target domain, the feature map is likewise passed through the part segmentation module to predict component regions, and the component segmentation loss function is evaluated, i.e., the second segmentation loss value;
(6) obtaining the total loss of model training, comprising: the identity loss value from classifying source-domain pictures with the classifier; the first segmentation loss value from predicting component regions of source-domain pictures; and the second segmentation loss value from predicting component regions of target-domain pictures;
(7) and adjusting model parameters based on the total loss of the model training.
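Steps (1) to (7) can be condensed into one hedged training-step sketch. The dictionary-based batch format is an assumption, and `part_aligned_pooling`, `identity_loss` and `part_seg_loss` are the helper sketches given elsewhere in this description:

```python
def train_step(model, optimizer, src, tgt, lam1=1.0, lam2=1.0):
    """One illustrative training iteration over a source batch `src` and a
    target batch `tgt` (dicts with images, part boxes, visibility masks,
    identity labels and segmentation labels; the format is assumed)."""
    src_feat = model.cnn(src["img"])                                   # (1)
    tgt_feat = model.cnn(tgt["img"])
    src_vecs = part_aligned_pooling(src_feat, src["regions"],
                                    src["visible"])                    # (2)
    l_id = identity_loss(model, src_vecs, src["visible"], src["pid"])  # (3)
    l_seg_src = part_seg_loss(model.ps(src_feat), src["seg"])          # (4)
    l_seg_tgt = part_seg_loss(model.ps(tgt_feat), tgt["seg"])          # (5)
    total = l_id + lam1 * l_seg_src + lam2 * l_seg_tgt                 # (6)
    optimizer.zero_grad()
    total.backward()                                                   # (7)
    optimizer.step()
    return total.item()
```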
And performing model training on the pedestrian recognition model through a large number of sample pictures from the source domain and the target domain to obtain the trained pedestrian recognition model.
Pictures from the target domain are added to the training of the pedestrian recognition model as training samples, and during training component segmentation is additionally performed on pictures from both the source domain and the target domain, yielding a first segmentation loss value from segmenting source-domain picture components and a second segmentation loss value from segmenting target-domain picture components. Model parameters are adjusted based on the first segmentation loss value, the second segmentation loss value and the identity loss value of pedestrian recognition, so that source-domain and target-domain pictures are component-aligned during training. The trained model therefore naturally adapts to the target-domain scene, which improves the generalization ability of the pedestrian recognition model and the accuracy of cross-domain recognition.
Example 2
For the training method of the pedestrian recognition model provided in embodiment 1, please refer to fig. 3, in which the identity loss value is obtained by the following steps:
Step 301, acquiring key points of the pedestrian in the first picture, and dividing the first picture into P component regions based on the key points, where P ≥ 2;
step 302, pooling each component region corresponding to the first feature map according to the division of the P component regions to obtain P feature vectors;
step 303, performing identity prediction according to the P eigenvectors to obtain a predicted value;
and step 304, obtaining an identity loss value according to the obtained predicted value and the identity label of the first picture.
In step 301, the pedestrian key points are key points that distinguish the various body structures of the human body and may include some or all of the following predetermined types of key points: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and right ankle. Pedestrian key point detection can be performed with a key point detection model; fig. 4 shows a detection result of such a model.
The component regions in step 301, as shown in the lower-left corner of fig. 1, may include the head (R1), upper torso (R2), lower torso (R3), thighs (R4), calves (R5), shoes (R6), upper body (R7), lower body (R8) and whole body (R9). Specifically:
  • head (R1): the region of the key points above the left and right shoulders, i.e., nose, left eye, right eye, left ear and right ear;
  • upper torso (R2) and lower torso (R3): the region between the left shoulder, right shoulder, left hip and right hip key points, divided evenly in two, with R2 the half near the shoulder key points and R3 the half near the hip key points;
  • thighs (R4): the region between the left hip, right hip, left knee and right knee key points;
  • calves (R5): the region between the left knee, right knee, left ankle and right ankle key points;
  • shoes (R6): the region below the left and right ankle key points;
  • upper body (R7): the region above the left and right hip key points;
  • lower body (R8): the region below the left and right hip key points;
  • whole body (R9): the whole map region in which all key points lie.
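For illustration, the region-to-keypoint correspondence above can be written down as a plain mapping; the keypoint names are assumptions following common pose-estimation conventions, not names defined by the patent:

```python
# Hypothetical mapping from the nine part regions to their bounding keypoints.
PART_KEYPOINTS = {
    "R1_head":        ["nose", "left_eye", "right_eye", "left_ear", "right_ear"],
    "R2_upper_torso": ["left_shoulder", "right_shoulder", "left_hip", "right_hip"],
    "R3_lower_torso": ["left_shoulder", "right_shoulder", "left_hip", "right_hip"],
    "R4_thighs":      ["left_hip", "right_hip", "left_knee", "right_knee"],
    "R5_calves":      ["left_knee", "right_knee", "left_ankle", "right_ankle"],
    "R6_shoes":       ["left_ankle", "right_ankle"],
    "R7_upper_body":  ["left_hip", "right_hip"],   # region above the hips
    "R8_lower_body":  ["left_hip", "right_hip"],   # region below the hips
    "R9_whole_body":  [],                          # whole map, always usable
}
```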
In a specific implementation, some key points may go undetected because of occlusion or errors of the key point detection model. When executing step 301, as many of the pedestrian's key points as possible are acquired from the first picture; the acquired key points are then checked against the full set of predetermined types. If some types are missing, the component regions corresponding to the missing key points are treated as invisible component regions, while the component regions whose key points are present are treated as visible component regions.
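A minimal sketch of that visibility check, assuming the PART_KEYPOINTS mapping above and a set of detected keypoint names:

```python
def region_visibility(detected, required=PART_KEYPOINTS):
    """Mark a part region visible only when every keypoint it depends on was
    detected; regions with missing keypoints are invisible and later receive
    a zero feature vector. `detected` is a set of detected keypoint names."""
    return {region: all(k in detected for k in keypoints)
            for region, keypoints in required.items()}
```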
In step 302, when each component region of the first feature map is pooled according to the division into P component regions to obtain the P feature vectors, each visible component region is pooled to obtain its feature vector, while a zero vector is output as the feature vector of each invisible component region.
For the P feature vectors obtained in step 302, the identity prediction of step 303 comprises: processing the P feature vectors through P embedding layers Em to obtain P feature vectors of dimension d, and then performing class prediction through P classifiers FC to obtain P predicted values. Each embedding layer comprises a fully connected layer, a BN layer and a ReLU layer; an FC layer is a fully connected layer, and the classifier of each component is referred to as a pedestrian classifier.
After step 303, step 304 computes the identity loss value, i.e., evaluates the loss function. The loss function may be a cross-entropy loss function, in which case the obtained loss value is a cross-entropy loss value; for an invisible component, the cross-entropy loss value is set to 0. The total cross-entropy loss value of one picture is the sum of the loss values of its visible components and can be obtained by the following formula one:
$$L_{id} = \sum_{p=1}^{P} v_p \, L_{ce}^{(p)}$$

where $v_p \in \{0, 1\}$ indicates whether the $p$-th component is visible ("1" visible, "0" invisible) and $L_{ce}^{(p)}$ is the cross-entropy loss value of the $p$-th component.
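Formula one translates directly into a visibility-masked sum of per-part cross-entropy losses. A sketch, assuming the model and pooling sketches given earlier and a batch size greater than one (the BN layers need it in training mode):

```python
import torch.nn.functional as F

def identity_loss(model, part_vecs, visible, pid):
    """Formula one: sum over parts of the cross-entropy loss, with invisible
    parts (v_p = 0) contributing zero. part_vecs: (B, P, C) pooled part
    features; visible: (B, P) float mask; pid: (B,) identity labels."""
    loss = part_vecs.new_zeros(())
    for p in range(part_vecs.shape[1]):
        logits = model.fc[p](model.embed[p](part_vecs[:, p]))  # (B, num_ids)
        ce = F.cross_entropy(logits, pid, reduction="none")    # per-sample CE
        loss = loss + (visible[:, p] * ce).mean()              # mask invisible
    return loss
```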
With the above method, component regions are divided based on predetermined key points, i.e., a component-division constraint is imposed on the feature map. This ensures that the features retain their local character, reinforces alignment on top of the component alignment, and reduces feature redundancy between different components. Moreover, identifying and handling whether each component region is visible lets the model also process occluded cases: since the visibility of each component region is determined from the key points in both the training and testing stages, the method adapts to pedestrians whose body parts are occluded in real scenes, further improving the model's generalization ability and recognition accuracy.
Example 3
In the training method of the pedestrian recognition model provided in embodiment 1, the first segmentation loss value and the second segmentation loss value are obtained in the same way; the first segmentation loss value is described below as an example. Referring to fig. 5, the first feature map output by the convolutional neural network CNN is component-segmented to obtain a first segmentation region prediction map, and the first segmentation loss value is calculated from it through the following steps:
step 501, performing component division on the first feature map to obtain a first divided region prediction map including K divided regions.
Component segmentation, which predicts the component regions, is done by the part segmentation module PS. The PS module may comprise a deconvolution layer, a BN layer, a ReLU layer and a 1×1 convolution layer. The deconvolution layer has stride 2 and kernel size 3×3, the output feature map has resolution 2H×2W, and the number of output channels is 256. The number of component segmentation categories is K = 8: background, head, upper body, upper arms, forearms, thighs, calves and shoes. The number of output channels of the 1×1 convolution layer equals the number of categories, 8.
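As a quick sanity check of the PS head sketched earlier (all dimensions, including `num_ids`, are assumptions), a stride-2 deconvolution with kernel 3, padding 1 and output padding 1 exactly doubles the spatial resolution:

```python
import torch

model = PedestrianRecognitionModel(num_ids=751)   # num_ids is an assumption
x = torch.randn(2, 2048, 24, 8)                   # an assumed C x H x W batch
logits = model.ps(x)
assert logits.shape == (2, 8, 48, 16)             # K = 8 classes, 2H x 2W
```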
Step 502, calculating a cross entropy classification loss value between each of the K partitioned areas and the corresponding partition label of the partitioned area; and the segmentation labels are obtained by predicting the first characteristic diagram through a preset segmentation model.
The segmentation labels of the segmentation regions could be produced by manual annotation, but that is time-consuming. To improve labeling efficiency, a preset segmentation model is used to predict on the first feature map, and the segmentation prediction result serves as the segmentation label. Component segmentation computes a cross-entropy classification loss for each pixel within each segmentation region.
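One way to realize this, sketched under the assumption of a hypothetical pre-trained `presegmenter` network: run it once over the training pictures and cache its per-pixel argmax predictions as pseudo-labels.

```python
import torch

with torch.no_grad():                              # labels only, no gradients
    seg_label = presegmenter(img).argmax(dim=1)    # (B, H, W) pseudo-labels
```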
Step 503, obtaining the average cross-entropy classification loss value over the K segmentation regions as the first segmentation loss value.
The first segmentation loss value is the total loss value over all the segmentation regions and is calculated as the following formula two:

$$L_{seg} = \frac{1}{K} \sum_{k=1}^{K} L_k$$

where $K$ is the number of categories, i.e., the number of segmentation regions, and $L_k$ is the average loss value over all pixels of the $k$-th category. This calculation first averages the loss within each category and then averages over all categories, in order to balance the contribution of different-sized components to the total loss value.
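Formula two in code: average the per-pixel cross-entropy within each class first, then across classes, so small parts weigh as much as large ones. A sketch; skipping classes absent from the batch is an added assumption to keep it runnable:

```python
import torch
import torch.nn.functional as F

def part_seg_loss(logits, labels):
    """Formula two. logits: (B, K, H, W) part segmentation scores;
    labels: (B, H, W) integer class labels in [0, K)."""
    pixel_ce = F.cross_entropy(logits, labels, reduction="none")  # (B, H, W)
    per_class = []
    for k in range(logits.shape[1]):
        mask = labels == k
        if mask.any():                       # L_k: mean loss of class-k pixels
            per_class.append(pixel_ce[mask].mean())
    return torch.stack(per_class).mean()     # then average over the classes
```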
After the identity loss value, the first segmentation loss value and the second segmentation loss value are obtained by calculation based on the embodiments 2 and 3, the calculation of the total loss function of the whole pedestrian recognition model training is as shown in the following formula three:
$$L = L_{id}^{S} + \lambda_1 L_{seg}^{S} + \lambda_2 L_{seg}^{T}$$

where $L_{id}^{S}$ is the total pedestrian classification loss, i.e., the identity loss value, over all pictures in the source domain; $L_{seg}^{S}$ is the total component segmentation loss over all pictures in the source domain, i.e., the first segmentation loss value; $L_{seg}^{T}$ is the total component segmentation loss over all pictures in the target domain, i.e., the second segmentation loss value; and $\lambda_1$ and $\lambda_2$ are the weights of the two segmentation loss terms. Both weights may be set to 1, or adjusted according to actual requirements.
Computing the segmentation loss of a whole picture from each component's own segmentation loss balances the contribution of different-sized components to the whole-picture segmentation loss, reduces feature redundancy between components, and aligns the components more accurately. Furthermore, computing the total model loss from the identity loss value, the first segmentation loss value and the second segmentation loss value, and adjusting the model parameters on that basis, balances the contributions of component segmentation and identity prediction to training and makes the trained model more stable.
Example 4
Based on the pedestrian recognition model trained in any of the above embodiments, the embodiment also provides a pedestrian recognition method, which can be applied to model testing and also applied to pedestrian recognition in a target domain. Referring to fig. 6, the identification method includes the following steps:
step 601, inputting a query picture and each test picture in a target domain into the pedestrian recognition model respectively, and recognizing and outputting P feature vectors of the query picture and P feature vectors of each test picture.
The P feature vectors correspond to the P component regions of the recognized picture and are the dimension-d feature vectors that the convolutional neural network, part-aligned pooling module and embedding layer of the pedestrian recognition model extract from each picture. If a component region is not visible, its feature vector is 0.
Step 602, calculating the similarity between the query picture and each test picture based on the p-th feature vector of the query picture and the p-th feature vector of each test picture, where p ∈ [1, P].
The similarity between pictures can be characterized by the similarity between their feature vectors, such as the distance between the vectors.
Step 603, obtaining a similar picture identification result of the query picture in the target domain based on the similarity between the query picture and each test picture.
If pictures whose similarity to the query picture exceeds a set threshold exist in the target domain, those pictures are output as the similar-picture recognition result for the query picture.
In a specific implementation, the similarity between the query picture and each test picture can be obtained by first computing the vector similarity between the p-th feature vector of the query picture and the p-th feature vector of the test picture, and then taking the mean of the P vector similarities as the similarity between the query picture and that test picture.
Since some component regions may be invisible, only the components visible in the query picture are considered when calculating the distance between the query picture and a test library (gallery) picture, i.e., a picture in the target domain. Specifically, the distance (similarity) between the query picture $I_q$ and any test library picture $I_g$, $g \in \{1, \dots, N\}$ (with $N$ the number of test library pictures), is calculated as formula four:

$$dist(I_q, I_g) = \frac{\sum_{p=1}^{P} v_p^{q} \cdot \mathrm{cos\_dist}\big(f_p^{q}, f_p^{g}\big)}{\sum_{p=1}^{P} v_p^{q}}$$

where $f_p^{q}$ and $f_p^{g}$ are the feature vectors of $I_q$ and $I_g$ obtained through the $p$-th embedding layer, $v_p^{q}$ indicates whether the $p$-th component of $I_q$ is visible, and cos_dist denotes the cosine distance. When a component of the query picture is invisible, its distance is not used for that query.
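A sketch of formula four, assuming the per-part embeddings have already been extracted; the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def query_gallery_distance(q_vecs, q_visible, g_vecs):
    """Formula four. q_vecs, g_vecs: (P, d) per-part embeddings of the query
    and one gallery picture; q_visible: (P,) float mask of the query's parts.
    Invisible query parts are excluded from the average."""
    cos_dist = 1.0 - F.cosine_similarity(q_vecs, g_vecs, dim=1)  # (P,)
    return (q_visible * cos_dist).sum() / q_visible.sum().clamp(min=1.0)
```

Ranking all gallery pictures by this distance and keeping the closest ones yields the similar-picture result described next.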
Based on the distances thus calculated, the results can be ranked and the n pictures closest to the query picture output as the similar pictures found by the test; alternatively, the computed similarity can be compared with a set threshold and the pictures exceeding it output as similar pictures. The identities of the persons in these similar pictures are considered similar to that of the query.
By acquiring the feature vector of each component region, computing the similarity between the corresponding component regions of the two pictures, and then computing the overall similarity between the pictures from the per-component similarities, the characteristics of every component region are fully taken into account, so the calculated picture similarity is more accurate.
Through the technical scheme, the embodiment can achieve one or more of the following beneficial effects:
1) The part-aligned pooling module PAP takes the feature map it pools from the convolutional neural network CNN, and the part segmentation module PS takes the feature map it segments from the same CNN; everything is obtained from the last convolutional layer of the CNN, and the different component regions of the feature map share the whole underlying network. Compared with the existing approach of cropping component patches and feeding each separately into a network to extract features, this model extracts features more efficiently.
2) When computing the distance between samples, the similarity between component regions is computed from the feature vectors of the mutually corresponding component regions of the two pictures, which realizes component alignment. To obtain the component regions, the key points on the pedestrian are first detected, and the exact position of each component region is derived from the correspondence between key points and component regions; component alignment is thus achieved through the key points on the pedestrian's body.
3) The acquired key points are checked for whether they include all key points of the predetermined types; if not, the component regions corresponding to the missing key points are treated as invisible component regions and those corresponding to the present key points as visible component regions. Identifying and handling component-region visibility lets the model also process occluded cases, and since visibility is determined from the key points in both the training and testing stages, the method adapts to pedestrians whose body parts are occluded in real scenes, further improving the model's generalization ability and recognition accuracy.
4) Dividing the component regions based on predetermined key points imposes a component-division constraint on the feature map, ensures that the features retain their local character, reinforces alignment on top of component alignment, and reduces feature redundancy between different components.
5) Training samples are acquired from a source domain with identity labels and a target domain without identity labels; identity recognition on the source-domain pictures yields an identity loss value and component segmentation on them yields a segmentation loss value; component segmentation on the target-domain pictures yields a second segmentation loss value. Adjusting the model parameters with the segmentation loss values of both domains aligns the picture components of the source and target domains while the pedestrian recognition model is trained on source-domain pictures, so the trained model naturally adapts to the target-domain scene, improving the generalization ability of the pedestrian recognition model and the accuracy of cross-domain recognition.
Exemplary devices
Based on the same inventive concept of the training method of the pedestrian recognition model provided in embodiments 1 to 3 of the present application, this embodiment correspondingly provides a training device of the pedestrian recognition model, please refer to fig. 7, and the device includes:
the image obtaining module 71 is configured to obtain a first image from a source domain, and obtain a second image from a target domain to be tested, where the image in the source domain has an identity label of a pedestrian, and the image in the target domain has no identity label of the pedestrian;
the identification module 72 is configured to input the first picture as a first training sample into a convolutional neural network in a pedestrian identification model to obtain a first feature map, and calculate an identity loss value according to the first feature map and an identity label of the first picture;
a dividing module 73, configured to perform component division on the first feature map to obtain a first divided region prediction map, and calculate a first division loss value according to the first divided region prediction map;
the recognition module 72 is further configured to input the second picture as a second training sample into the convolutional neural network to obtain a second feature map, and the segmentation module is further configured to perform component segmentation on the second feature map to obtain a second segmentation area prediction map, and calculate a second segmentation loss value according to the second segmentation area prediction map;
and a parameter adjusting module 74, configured to perform parameter adjustment on the pedestrian recognition model according to the identity loss value, the first segmentation loss value, and the second segmentation loss value, so as to obtain the trained pedestrian recognition model.
As an alternative implementation, referring to fig. 8, the identification module 72 may include:
a dividing unit 721, configured to obtain a key point of a pedestrian in the first picture, and divide the first picture into P component regions based on the key point, where P is greater than or equal to 2;
a pooling unit 723, configured to separately pool each component region corresponding to the first feature map according to the division of the P component regions, so as to obtain P feature vectors;
a prediction unit 724, configured to perform identity prediction according to the P feature vectors to obtain a prediction value;
the calculating unit 725 is configured to obtain the identity loss value according to the predicted value and the identity label of the first picture.
Wherein the dividing unit 721 includes: a key point obtaining subunit, configured to obtain a key point of a pedestrian in the first picture; the judging subunit is used for judging whether the acquired key points comprise all key points of a preset type, if not, acquiring a component region corresponding to the missing key points as an invisible component region, and acquiring a component region corresponding to the existing key points as a visible component region; the pooling unit 723 is specifically configured to: pooling visible component areas corresponding to the first feature map respectively to obtain a feature vector of each visible component area; outputting a zero vector to the invisible component region as a feature vector of the invisible region.
As an alternative embodiment, as shown in fig. 8, the segmentation module 73 includes:
a dividing unit 731, configured to perform component division on the first feature map to obtain a first divided region prediction map including K divided regions;
a calculating unit 732, configured to calculate, for each of the K partitioned areas, a cross entropy classification loss value between the partitioned area and a partition label corresponding to the partitioned area; the segmentation labels are obtained by predicting the first feature map through a preset segmentation model; and obtaining the average cross entropy classification loss value of the K segmentation areas as the first segmentation loss value.
Based on the same inventive concept of the pedestrian recognition method provided by the above embodiment 4, this embodiment further provides a pedestrian recognition apparatus correspondingly, which is applied to the pedestrian recognition model obtained by training in the above embodiment, as shown in fig. 9, the pedestrian recognition apparatus includes:
an input module 91, configured to input a query picture and each test picture in a target domain into the pedestrian recognition model, respectively, to recognize and output P feature vectors of the query picture and P feature vectors of each test picture, where the P feature vectors correspond to P component regions of an identified picture, respectively;
a calculating module 92, configured to calculate the similarity between the query picture and each test picture based on the p-th feature vector of the query picture and the p-th feature vector of each test picture, where p ∈ [1, P];
a query module 93, configured to obtain a similar picture identification result of the query picture in the target domain based on a similarity between the query picture and each test picture;
and a result obtaining module 94, configured to obtain the pedestrian recognition result according to the recognition result.
As an alternative implementation, as shown in fig. 10, the calculation module 92 may include:
the vector calculation unit 921 is configured to obtain a vector similarity between a pth feature vector of the query picture and a pth feature vector of each test picture;
the mean value calculating unit 922 is configured to obtain a mean value of vector similarities of P vector similarities corresponding to the query picture and each test picture, and use the mean value of vector similarities as a similarity between the query picture and each test picture.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Exemplary electronic device
Next, an electronic apparatus provided according to an embodiment of the present application is described with reference to fig. 11. FIG. 11 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 11, electronic device 110 includes one or more processors 111 and memory 112.
Processor 111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 110 to perform desired functions.
Memory 112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 111 to implement the above-described pedestrian recognition model training methods of the various embodiments of the present application and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 110 may further include: an input device 113 and an output device 114, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 113 may be, for example, a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input means 113 may be a communication network connector for receiving the acquired input signal from other devices.
The input device 113 may also include, for example, a keyboard, a mouse, and the like.
The output device 114 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 114 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 110 relevant to the present application are shown in fig. 11; components such as buses and input/output interfaces are omitted. In addition, electronic device 110 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method of training a pedestrian recognition model according to various embodiments of the present application described in the "exemplary methods" section of this specification above.
The computer program product may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" language. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method of training a pedestrian recognition model according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present application are merely examples and not limitations, and should not be considered essential to the various embodiments of the present application. Furthermore, the specific details disclosed above are provided for purposes of illustration and description only; the foregoing disclosure is not intended to be exhaustive or to limit the application to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples, and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or," as used herein, means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as," as used herein, means, and is used interchangeably with, the phrase "such as, but not limited to."
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A method of training a pedestrian recognition model, the method comprising:
acquiring a first picture from a source domain, and acquiring a second picture from a target domain to be tested, wherein the pictures in the source domain have pedestrian identity labels and the pictures in the target domain have no pedestrian identity labels;
inputting the first picture as a first training sample into a convolutional neural network in a pedestrian recognition model to obtain a first feature map, and calculating according to the first feature map and the identity label of the first picture to obtain an identity loss value;
performing component segmentation on the first feature map to obtain a first segmentation area prediction map, and calculating a first segmentation loss value according to the first segmentation area prediction map;
inputting the second picture into the convolutional neural network as a second training sample to obtain a second feature map, performing component segmentation on the second feature map to obtain a second segmentation area prediction map, and calculating according to the second segmentation area prediction map to obtain a second segmentation loss value;
and performing parameter adjustment on the pedestrian recognition model according to the identity loss value, the first segmentation loss value and the second segmentation loss value to obtain the trained pedestrian recognition model.
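For concreteness, a minimal PyTorch sketch of one training step of claim 1 follows. All names (backbone, id_head, seg_head, seg_teacher, the weights w1 and w2) are hypothetical illustrations rather than identifiers from the patent, and the weighted sum of the three loss values is only one assumed way to perform the parameter adjustment.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in convolutional neural network; any re-ID backbone could take its place.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
)
num_ids, num_parts = 751, 5                      # assumed identity count and part count
id_head = nn.Linear(256, num_ids)                # identity classifier over pooled features
seg_head = nn.Conv2d(256, num_parts + 1, 1)      # per-pixel part classes plus background
# e.g. optimizer = torch.optim.SGD(
#     [*backbone.parameters(), *id_head.parameters(), *seg_head.parameters()], lr=0.01)

def segmentation_loss(feat, seg_teacher):
    # Cross-entropy between the predicted segmentation area map and
    # pseudo-labels from a preset (frozen) segmentation model.
    logits = seg_head(feat)                       # (B, K+1, H, W)
    with torch.no_grad():
        labels = seg_teacher(feat).argmax(dim=1)  # (B, H, W) segmentation labels
    return F.cross_entropy(logits, labels)

def train_step(src_pics, src_ids, tgt_pics, seg_teacher, optimizer, w1=0.5, w2=0.5):
    feat_src = backbone(src_pics)                 # first feature map (source picture)
    feat_tgt = backbone(tgt_pics)                 # second feature map (target picture)
    id_logits = id_head(feat_src.mean(dim=(2, 3)))        # global pooling for brevity
    loss_id = F.cross_entropy(id_logits, src_ids)         # identity loss value
    loss = (loss_id
            + w1 * segmentation_loss(feat_src, seg_teacher)   # first segmentation loss
            + w2 * segmentation_loss(feat_tgt, seg_teacher))  # second segmentation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The point the claim turns on is that only the labeled source pictures contribute an identity loss, while both domains contribute segmentation losses; this is how the unlabeled target domain influences training.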
2. The method of claim 1, wherein the calculating the identity loss value according to the first feature map and the identity label of the first picture comprises:
acquiring key points of pedestrians in the first picture, and dividing the first picture into P component areas based on the key points, wherein P is greater than or equal to 2;
pooling each component area corresponding to the first feature map according to the division of the P component areas to obtain P feature vectors;
performing identity prediction according to the P feature vectors to obtain a predicted value;
and obtaining the identity loss value according to the predicted value and the identity label of the first picture.
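A compact sketch of the part pooling in claim 2, assuming the pedestrian key points have already been converted into (top, bottom) row spans on the feature map; the function and argument names are illustrative assumptions.

import torch

def part_pooled_vectors(feature_map, part_spans):
    # feature_map: (C, H, W) tensor for one picture; part_spans: P >= 2
    # (top, bottom) row spans derived from the pedestrian key points.
    vectors = []
    for top, bottom in part_spans:
        region = feature_map[:, top:bottom, :]    # rows belonging to this component area
        vectors.append(region.mean(dim=(1, 2)))   # average-pool to a C-dim feature vector
    return torch.stack(vectors)                   # (P, C): one vector per component area

Identity prediction then feeds the P feature vectors (for example, concatenated) to a classifier, and the loss between its predicted value and the picture's identity label gives the identity loss value.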
3. The method of claim 2, wherein the acquiring key points of pedestrians in the first picture and dividing the first picture into P component areas based on the key points comprises:
acquiring key points of pedestrians in the first picture;
determining whether the acquired key points include all key points of the preset types; if not, taking the component area corresponding to each missing key point as an invisible component area, and taking the component area corresponding to each present key point as a visible component area;
the pooling each component area corresponding to the first feature map according to the division of the P component areas to obtain P feature vectors comprises:
pooling each visible component area corresponding to the first feature map to obtain a feature vector of each visible component area;
and outputting a zero vector as the feature vector of each invisible component area.
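The visibility rule of claim 3 can be sketched as follows; the PART_KEYPOINTS mapping is an invented example, since the claim only requires preset key-point types per component area.

import torch

# Hypothetical mapping from component area index to the key points that must
# all be detected for that area to count as visible.
PART_KEYPOINTS = {
    0: ["head"],
    1: ["l_shoulder", "r_shoulder"],
    2: ["l_hip", "r_hip"],
    3: ["l_knee", "r_knee"],
    4: ["l_ankle", "r_ankle"],
}

def vectors_with_visibility(feature_map, part_spans, detected_keypoints):
    # feature_map: (C, H, W); detected_keypoints: set of key-point names found.
    C = feature_map.shape[0]
    vectors = []
    for p, (top, bottom) in enumerate(part_spans):
        if all(k in detected_keypoints for k in PART_KEYPOINTS[p]):
            # Visible component area: pool it normally.
            vectors.append(feature_map[:, top:bottom, :].mean(dim=(1, 2)))
        else:
            # Invisible component area: output a zero vector as its feature vector.
            vectors.append(torch.zeros(C))
    return torch.stack(vectors)                   # (P, C)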
4. The method according to claim 1, wherein the performing component segmentation on the first feature map to obtain a first segmentation area prediction map, and calculating a first segmentation loss value according to the first segmentation area prediction map comprises:
performing component segmentation on the first feature map to obtain a first segmentation area prediction map comprising K segmentation areas;
calculating a cross-entropy classification loss value between each of the K segmentation areas and the segmentation label corresponding to that area, wherein the segmentation labels are obtained by predicting the first feature map through a preset segmentation model;
and taking the average cross-entropy classification loss value over the K segmentation areas as the first segmentation loss value.
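A sketch of the first segmentation loss of claim 4, assuming the prediction map and the labels from the preset segmentation model have already been split into K per-region tensors; the shapes and names are assumptions.

import torch
import torch.nn.functional as F

def first_segmentation_loss(region_logits, region_labels):
    # region_logits: (K, num_classes, H, W) predicted scores per segmentation area;
    # region_labels: (K, H, W) long-typed labels from the preset segmentation model.
    K = region_logits.shape[0]
    per_region = [
        F.cross_entropy(region_logits[k:k + 1], region_labels[k:k + 1])
        for k in range(K)                         # cross-entropy loss of each area
    ]
    return torch.stack(per_region).mean()         # average over the K areas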
5. A pedestrian recognition method, applied to a pedestrian recognition model obtained by training according to the training method of the pedestrian recognition model of any one of claims 1 to 4, the recognition method comprising:
respectively inputting a query picture and each test picture in a target domain into the pedestrian recognition model to recognize and output P feature vectors of the query picture and P feature vectors of each test picture, wherein the P feature vectors respectively correspond to P component regions of a recognized picture;
calculating and obtaining the similarity between the query picture and each test picture based on the p-th feature vector of the query picture and the p-th feature vector of each test picture, wherein p belongs to [1, P];
obtaining a similar picture identification result of the query picture in the target domain based on the similarity between the query picture and each test picture;
and obtaining a pedestrian recognition result according to the similar picture identification result.
6. The identification method of claim 5, wherein the calculating and obtaining the similarity between the query picture and each test picture based on the p-th feature vector of the query picture and the p-th feature vector of each test picture comprises:
obtaining the vector similarity between the p-th feature vector of the query picture and the p-th feature vector of each test picture;
and obtaining a vector similarity mean value of the P vector similarities corresponding to the query picture and each test picture, and taking the vector similarity mean value as the similarity between the query picture and each test picture.
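Claims 5 and 6 together amount to part-wise retrieval. The sketch below assumes cosine similarity as the vector similarity, a choice the claims leave open, and ranks a hypothetical gallery dictionary of target-domain test pictures.

import torch
import torch.nn.functional as F

def picture_similarity(query_vecs, test_vecs):
    # query_vecs, test_vecs: (P, C) part feature vectors of two pictures.
    per_part = F.cosine_similarity(query_vecs, test_vecs, dim=1)  # (P,) similarities
    return per_part.mean().item()                 # vector similarity mean value (claim 6)

def rank_gallery(query_vecs, gallery):
    # gallery: dict mapping test-picture name -> (P, C) feature vectors.
    scored = [(name, picture_similarity(query_vecs, vecs))
              for name, vecs in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)  # most similar first

Note that the zero vectors of claim 3 yield a zero cosine term (PyTorch clamps the denominator), so invisible component areas simply dilute the mean rather than breaking the comparison.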
7. A training apparatus for a pedestrian recognition model, the apparatus comprising:
the picture acquisition module is used for acquiring a first picture from a source domain and acquiring a second picture from a target domain to be tested, wherein the pictures in the source domain have pedestrian identity labels and the pictures in the target domain have no pedestrian identity labels;
the identification module is used for inputting the first picture as a first training sample into a convolutional neural network in a pedestrian identification model to obtain a first feature map, and calculating to obtain an identity loss value according to the first feature map and the identity label of the first picture;
the segmentation module is used for performing component segmentation on the first feature map to obtain a first segmentation area prediction map, and calculating a first segmentation loss value according to the first segmentation area prediction map;
the recognition module is further configured to input the second picture into the convolutional neural network as a second training sample to obtain a second feature map, and the segmentation module is further configured to perform component segmentation on the second feature map to obtain a second segmentation area prediction map, and calculate a second segmentation loss value according to the second segmentation area prediction map;
and the parameter adjusting module is used for carrying out parameter adjustment on the pedestrian recognition model according to the identity loss value, the first segmentation loss value and the second segmentation loss value to obtain the trained pedestrian recognition model.
8. A pedestrian recognition device, applied to a pedestrian recognition model obtained by training according to the training method of the pedestrian recognition model of any one of claims 1 to 4, the recognition device comprising:
the input module is used for respectively inputting a query picture and each test picture in a target domain into the pedestrian recognition model to recognize and output P feature vectors of the query picture and P feature vectors of each test picture, wherein the P feature vectors respectively correspond to P component regions of a recognized picture;
the calculation module is used for calculating and obtaining the similarity between the query picture and each test picture based on the p-th feature vector of the query picture and the p-th feature vector of each test picture, wherein p belongs to [1, P];
the query module is used for obtaining a similar picture identification result of the query picture in the target domain based on the similarity between the query picture and each test picture;
and the result acquisition module is used for obtaining a pedestrian recognition result according to the similar picture identification result.
9. A computer-readable storage medium storing a computer program for executing the method for training a pedestrian recognition model according to any one of claims 1 to 4 or executing the method for pedestrian recognition according to any one of claims 5 to 6.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to execute the training method of the pedestrian recognition model according to any one of claims 1 to 4, or execute the pedestrian recognition method according to any one of claims 5 to 6.
CN201911215728.5A 2018-12-29 2019-12-02 Training method and device of pedestrian recognition model and electronic equipment Active CN111027434B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018116421390 2018-12-29
CN201811642139 2018-12-29

Publications (2)

Publication Number Publication Date
CN111027434A 2020-04-17
CN111027434B 2023-07-11

Family

ID=70207768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911215728.5A Active CN111027434B (en) 2018-12-29 2019-12-02 Training method and device of pedestrian recognition model and electronic equipment

Country Status (1)

Country Link
CN (1) CN111027434B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102006043910A1 (en) * 2006-09-19 2008-03-27 Siemens Ag Result filter and method for selecting the result data of an application for automatic pattern recognition
US20180268222A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Action recognition system for action recognition in unlabeled videos with domain adversarial learning and knowledge distillation
CN108108754A (en) * 2017-12-15 2018-06-01 北京迈格威科技有限公司 The training of identification network, again recognition methods, device and system again
CN108182394A (en) * 2017-12-22 2018-06-19 浙江大华技术股份有限公司 Training method, face identification method and the device of convolutional neural networks
CN108710896A (en) * 2018-04-24 2018-10-26 浙江工业大学 The field learning method of learning network is fought based on production
CN109101866A (en) * 2018-06-05 2018-12-28 中国科学院自动化研究所 Pedestrian recognition methods and system again based on segmentation outline

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHARLES R. QI et al.: "PointNet: Deep learning on point sets for 3D classification and segmentation" *
WEIJIAN DENG et al.: "Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification" *
YIFAN SUN et al.: "Beyond part models: Person retrieval with refined part pooling" *
范峻铭: "Multi-person target detection and tracking in intelligent video surveillance" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814618A (en) * 2020-06-28 2020-10-23 浙江大华技术股份有限公司 Pedestrian re-identification method, gait identification network training method and related device
CN111814618B (en) * 2020-06-28 2023-09-01 浙江大华技术股份有限公司 Pedestrian re-recognition method, gait recognition network training method and related devices
WO2022134067A1 (en) * 2020-12-25 2022-06-30 深圳市优必选科技股份有限公司 Method for training multi-task recognition model, and system and storage medium

Also Published As

Publication number Publication date
CN111027434B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
Chen et al. Supervised transformer network for efficient face detection
US8452096B2 (en) Identifying descriptor for person or object in an image
CN105354307B (en) Image content identification method and device
Sikka et al. Multiple kernel learning for emotion recognition in the wild
CN105512683B (en) Object localization method and device based on convolutional neural networks
CN111914642B (en) Pedestrian re-identification method, device, equipment and medium
JPWO2013065220A1 (en) Image recognition apparatus, image recognition method, and integrated circuit
WO2011037579A1 (en) Face recognition apparatus and methods
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
CN112668374A (en) Image processing method and device, re-recognition network training method and electronic equipment
CN111160169A (en) Face detection method, device, equipment and computer readable storage medium
US20240087352A1 (en) System for identifying companion animal and method therefor
CN109063790B (en) Object recognition model optimization method and device and electronic equipment
JP2013206458A (en) Object classification based on external appearance and context in image
KR102434574B1 (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
CN111027434B (en) Training method and device of pedestrian recognition model and electronic equipment
US20200311389A1 (en) Domain adaptation-based object recognition apparatus and method
CN114998748A (en) Remote sensing image target fine identification method, electronic equipment and storage medium
CN115203408A (en) Intelligent labeling method for multi-modal test data
Andiani et al. Face recognition for work attendance using multitask convolutional neural network (MTCNN) and pre-trained facenet
CN112287905A (en) Vehicle damage identification method, device, equipment and storage medium
KR102399673B1 (en) Method and apparatus for recognizing object based on vocabulary tree
CN111652148B (en) Face recognition method and device and electronic equipment
Faizi Robust face detection using template matching algorithm
CN114743264A (en) Shooting behavior detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant