WO2021196547A1 - Pedestrian re-identification method, device, electronic equipment, and storage medium - Google Patents

Pedestrian re-identification method, device, electronic equipment, and storage medium

Info

Publication number
WO2021196547A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
sub
pedestrian
feature map
sequence
Prior art date
Application number
PCT/CN2020/119546
Other languages
English (en)
French (fr)
Inventor
林宇翔 (Lin Yuxiang)
Original Assignee
北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京迈格威科技有限公司
Publication of WO2021196547A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/30 Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion

Definitions

  • Pedestrian re-identification (person re-identification) is an important research topic in computer vision. With the growing demand in video surveillance, smart security, and similar scenarios, pedestrian re-identification has received increasing attention.
  • Existing pedestrian re-identification methods can only perform re-identification when the pedestrian image to be recognized is a full-body image of the pedestrian to be recognized, that is, when the image contains the pedestrian's entire body.
  • Pedestrian images to be recognized are usually obtained from surveillance images containing one or more pedestrians, by cropping pedestrian bounding boxes produced by a pedestrian detection network. All features extracted from the pedestrian image to be recognized are compared with the preset features corresponding to each reference pedestrian image to determine which reference pedestrian image belongs to the same pedestrian, thereby completing re-identification.
  • In actual re-identification scenes, however, part of the body of the pedestrian to be recognized is often occluded in the vertical direction by other objects, such as other pedestrians, vehicles, or obstacles.
  • In that case, the acquired pedestrian image contains only part of the pedestrian's body; it is then not a full-body image of the pedestrian to be recognized to which it belongs, but a non-full-body image.
  • Similarly, a reference pedestrian image may not be a full-body image of the reference pedestrian to which it belongs.
  • In view of this, the present application provides a pedestrian re-identification method, device, electronic equipment, and storage medium.
  • a pedestrian re-identification method, including:
  • inputting the pedestrian image to be recognized into a feature extraction network to obtain all to-be-recognized sub-feature map sequences of the image output by the network, where the feature extraction network is configured to: extract the feature map of the pedestrian image to be recognized; for each preset division number among all preset division numbers, divide the feature map in the top-to-bottom direction by that division number to obtain the to-be-recognized sub-feature map sequence corresponding to that division number; and use all the obtained sequences as all to-be-recognized sub-feature map sequences of the pedestrian image to be recognized;
  • for each reference pedestrian image, calculating the similarity between the pedestrian image to be recognized and the reference pedestrian image;
  • based on the similarity between the pedestrian image to be recognized and each reference pedestrian image, generating a pedestrian re-identification result for the pedestrian image to be recognized.
  • a pedestrian re-identification device, including:
  • a feature extraction unit configured to input the pedestrian image to be recognized into the feature extraction network to obtain all to-be-recognized sub-feature map sequences of the image output by the network, where the feature extraction network is configured to: extract the feature map of the pedestrian image to be recognized; for each preset division number among all preset division numbers, divide the feature map in the top-to-bottom direction by that division number to obtain the corresponding to-be-recognized sub-feature map sequence; and use all the obtained sequences as all to-be-recognized sub-feature map sequences of the pedestrian image to be recognized;
  • a similarity calculation unit configured to calculate, for each reference pedestrian image, the similarity between the pedestrian image to be recognized and the reference pedestrian image;
  • a generating unit configured to generate a pedestrian re-identification result for the pedestrian image to be recognized based on the similarity between the pedestrian image to be recognized and each reference pedestrian image.
  • The pedestrian re-identification method and device provided in the embodiments of the present application take into account, when performing re-identification, that the pedestrian image to be recognized may be either a full-body image or a non-full-body image of the pedestrian to which it belongs, and that the reference pedestrian image may likewise be either a full-body image or a non-full-body image of the reference pedestrian to which it belongs.
  • Therefore, whether the pedestrian image to be recognized is a full-body or non-full-body image, and/or the reference pedestrian image is a full-body or non-full-body image, the similarity between the pedestrian image to be recognized and the reference pedestrian image can be calculated more accurately, and pedestrian re-identification can be completed more accurately.
  • Fig. 1 shows a flowchart of a pedestrian re-identification method provided by an embodiment of the present application;
  • Fig. 2 shows a schematic diagram of the effect of obtaining retained images at different preset retention ratios;
  • Fig. 3 shows a structural block diagram of a pedestrian re-identification device provided by an embodiment of the present application.
  • Fig. 1 shows a flowchart of a pedestrian re-identification method provided by an embodiment of the present application; the method includes:
  • Step 101: Use the feature extraction network to obtain all to-be-recognized sub-feature map sequences of the pedestrian image to be recognized.
  • The pedestrian image to be recognized is input into the feature extraction network, which outputs all to-be-recognized sub-feature map sequences of the image.
  • The feature extraction network is configured to: extract the feature map of the pedestrian image to be recognized; for each preset division number among all preset division numbers, divide the feature map in the top-to-bottom direction by that division number to obtain the to-be-recognized sub-feature map sequence corresponding to that division number; and use all the obtained sequences as all to-be-recognized sub-feature map sequences of the pedestrian image to be recognized.
  • The feature extraction network may include a convolutional neural network for extracting the feature map of the pedestrian image to be recognized.
  • The pedestrian image to be recognized belongs to the pedestrian appearing in it, and the only pedestrian-type object in the image is that pedestrian.
  • Within each to-be-recognized sub-feature map sequence, any two adjacent sub-feature maps do not overlap, and all sub-feature maps are of the same or substantially the same size.
  • For each preset division number, the feature map of the pedestrian image to be recognized is divided evenly in the top-to-bottom direction into that number of sub-feature maps of substantially the same size, and these sub-feature maps form the to-be-recognized sub-feature map sequence corresponding to that division number.
  • For example, all preset division numbers may be n/2, n/2+1, n/2+2, ..., n; assuming n is 6, they are 3, 4, 5, and 6.
  • With the preset division number 3, the feature map of the pedestrian image to be recognized is divided evenly from top to bottom into 3 sub-feature maps of substantially the same size, which form the sub-feature map sequence corresponding to division number 3.
  • With the preset division number 4, the feature map is divided evenly from top to bottom into 4 sub-feature maps of substantially the same size, which form the sub-feature map sequence corresponding to division number 4; and so on for 5 and 6.
  • Dividing the feature map in the top-to-bottom direction therefore yields the to-be-recognized sub-feature map sequences corresponding to the preset division numbers 3, 4, 5, and 6.
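  • To make the division step concrete, the following is a minimal sketch in PyTorch, assuming the feature map is a (C, H, W) tensor; the helper name split_feature_map is hypothetical and not from the application.

```python
import torch

def split_feature_map(feature_map: torch.Tensor, division_numbers=(3, 4, 5, 6)):
    """Divide a (C, H, W) feature map top-to-bottom into one sub-feature map
    sequence per preset division number (hypothetical helper)."""
    sequences = {}
    for k in division_numbers:
        # torch.chunk splits the height axis into k nearly equal parts,
        # matching the "substantially the same size" requirement above.
        sequences[k] = list(torch.chunk(feature_map, k, dim=1))
    return sequences

# Example: a feature map of height 24 divided by 3, 4, 5, and 6.
fm = torch.randn(256, 24, 8)
seqs = split_feature_map(fm)
print({k: [t.shape[1] for t in v] for k, v in seqs.items()})
# {3: [8, 8, 8], 4: [6, 6, 6, 6], 5: [5, 5, 5, 5, 4], 6: [4, 4, 4, 4, 4, 4]}
```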
  • The feature map of the pedestrian image to be recognized is divided uniformly from top to bottom because, in the vertical direction, there may be multiple possible relationships between the pedestrian image to be recognized and the full-body image of the pedestrian to which it belongs.
  • The ratio of a preset division number to the largest preset division number represents the assumed proportion that the pedestrian image to be recognized occupies of that full-body image.
  • For example, with preset division numbers 3, 4, 5, and 6, the largest is 6. Dividing the feature map by 3 assumes the pedestrian image to be recognized is the top 3/6 of the full-body image; dividing by 4 assumes it is the top 4/6; dividing by 5 assumes it is the top 5/6; and dividing by 6 assumes it is the full-body image itself.
  • Step 102: For each reference pedestrian image, calculate the similarity between the pedestrian image to be recognized and the reference pedestrian image based on the related sub-feature map sequences.
  • The reference pedestrian image belongs to the reference pedestrian appearing in it, and the only pedestrian-type object in the image is that reference pedestrian.
  • The feature map of the reference pedestrian image, also extracted by the feature extraction network, can be divided in advance in the top-to-bottom direction by each of the preset division numbers, and all the resulting reference sub-feature map sequences are used as all reference sub-feature map sequences of the reference pedestrian image.
  • For example, with preset division numbers 3, 4, 5, and 6, all reference sub-feature map sequences of the reference pedestrian image include: the reference sub-feature map sequence corresponding to division number 3, the sequence corresponding to 4, the sequence corresponding to 5, and the sequence corresponding to 6.
  • As with the pedestrian image to be recognized, the feature map of the reference pedestrian image is divided uniformly from top to bottom because, in the vertical direction, there may be multiple possible relationships between the reference pedestrian image and the full-body image of the reference pedestrian to which it belongs.
  • Each reference sub-feature map sequence of the reference pedestrian image corresponds to one to-be-recognized sub-feature map sequence of the pedestrian image to be recognized; corresponding sequences share the same preset division number.
  • For example, with preset division numbers 3, 4, 5, and 6, the pedestrian image to be recognized yields to-be-recognized sub-feature map sequences 1, 2, 3, and 4, corresponding to division numbers 3, 4, 5, and 6 respectively.
  • Likewise, the reference pedestrian image yields reference sub-feature map sequences 1, 2, 3, and 4, corresponding to division numbers 3, 4, 5, and 6 respectively.
  • To-be-recognized sequence 1 corresponds to reference sequence 1 (both for division number 3), to-be-recognized sequence 2 corresponds to reference sequence 2 (both for 4), to-be-recognized sequence 3 corresponds to reference sequence 3 (both for 5), and to-be-recognized sequence 4 corresponds to reference sequence 4 (both for 6).
  • The smallest similarity among all the calculated similarities can be used as the similarity between the pedestrian image to be recognized and the reference pedestrian image.
  • In some embodiments, calculating the similarity between the pedestrian image to be recognized and the reference pedestrian image includes: combining each to-be-recognized sub-feature map sequence of the pedestrian image to be recognized with each reference sub-feature map sequence of the reference pedestrian image to obtain multiple sub-feature map sequence combinations related to the reference pedestrian image, where each sequence combination includes one reference sub-feature map sequence and one to-be-recognized sub-feature map sequence; for each sequence combination, calculating the similarity between the two sub-feature maps in each sub-feature map combination of that sequence combination, where a sub-feature map combination consists of the sub-feature maps at the same position in the two sequences of the sequence combination; taking the average of all the similarities calculated for that sequence combination as the similarity corresponding to it; and, based on the similarity corresponding to each sequence combination, calculating the similarity between the pedestrian image to be recognized and the reference pedestrian image.
  • For example, all to-be-recognized sub-feature map sequences of the pedestrian image to be recognized include sequences 1 through 4, corresponding to division numbers 3, 4, 5, and 6; all reference sub-feature map sequences of the reference pedestrian image likewise include sequences 1 through 4.
  • To-be-recognized sequence 1 is combined with reference sequences 1, 2, 3, and 4 in turn, giving the sequence combinations of to-be-recognized sequence 1 with each reference sequence; to-be-recognized sequence 2 is likewise combined with reference sequences 1, 2, 3, and 4, and so on, for 4 × 4 = 16 sequence combinations in total.
  • The similarity corresponding to each sequence combination related to the reference pedestrian image can then be calculated, and, based on these per-combination similarities, the similarity between the pedestrian image to be recognized and the reference pedestrian image is obtained, for example by taking their median or their average.
  • In some embodiments, the smallest similarity among the similarities corresponding to the sequence combinations is used as the similarity between the pedestrian image to be recognized and the reference pedestrian image.
  • A sequence combination includes one to-be-recognized sub-feature map sequence and one reference sub-feature map sequence, and each sub-feature map combination of the sequence combination consists of the sub-feature maps located at the same position in the two sequences.
  • The position of a to-be-recognized sub-feature map within its sequence is counted in the top-to-bottom direction and matches the sub-feature map's position within the feature map of the pedestrian image to be recognized.
  • For example, when the feature map is divided into three sub-feature maps for the preset division number 3, the first sub-feature map in the resulting sequence is the topmost part of the feature map, the second is the middle part, and the third is the bottommost part.
  • Likewise, the position of a reference sub-feature map within its sequence is counted in the top-to-bottom direction and matches its position within the feature map of the reference pedestrian image.
  • Each to-be-recognized sub-feature map and the reference sub-feature map at the same position in the reference sequence together form one sub-feature map combination of the sequence combination.
  • For example, suppose the to-be-recognized sequence includes three sub-feature maps and the reference sequence includes three reference sub-feature maps.
  • The first to-be-recognized sub-feature map and the first reference sub-feature map form the first sub-feature map combination of the sequence combination, the second pair forms the second combination, and the third pair forms the third combination.
  • Calculating the similarity between the two sub-feature maps in each combination thus means calculating the similarity between the first to-be-recognized and first reference sub-feature maps, between the second pair, and between the third pair.
  • When the two sequences in a sequence combination correspond to different preset division numbers, let N denote the number of sub-feature maps in the sequence with the smaller preset division number.
  • Each sub-feature map in that shorter sequence corresponds to the sub-feature map at the same position among the first N sub-feature maps of the longer sequence, and each such pair forms a sub-feature map combination.
  • The sub-feature maps after the N-th position in the longer sequence do not participate in the similarity calculation.
  • For example, suppose the to-be-recognized sequence includes 3 sub-feature maps and the reference sequence includes 5 reference sub-feature maps; then N is 3, the sequence with the smaller division number is the to-be-recognized one, and the sequence with the larger division number is the reference one.
  • The first, second, and third to-be-recognized sub-feature maps pair with the first, second, and third reference sub-feature maps to form the first, second, and third sub-feature map combinations; the fourth and fifth reference sub-feature maps are not used.
  • The average of all the calculated similarities is used as the similarity corresponding to the sequence combination.
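  • Putting the similarity steps above together, a minimal sketch might look as follows. The application does not fix the per-pair similarity metric, so cosine similarity between globally average-pooled sub-feature maps is an assumption here; query_seqs and ref_seqs are dicts mapping division numbers to sub-feature map lists, as returned by the split_feature_map sketch above, and the final minimum follows the embodiment that takes the smallest per-combination similarity.

```python
import torch
import torch.nn.functional as F

def submap_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Similarity of two (C, h, w) sub-feature maps; cosine similarity of
    average-pooled maps is an assumed metric, not fixed by the application."""
    return F.cosine_similarity(a.mean(dim=(1, 2)), b.mean(dim=(1, 2)), dim=0).item()

def sequence_similarity(seq_q, seq_r) -> float:
    """Similarity of one sequence combination: average over the first N
    position-aligned pairs, N being the shorter sequence's length; sub-feature
    maps after the N-th position of the longer sequence are ignored."""
    n = min(len(seq_q), len(seq_r))
    return sum(submap_similarity(seq_q[i], seq_r[i]) for i in range(n)) / n

def image_similarity(query_seqs: dict, ref_seqs: dict) -> float:
    """Combine every to-be-recognized sequence with every reference sequence
    and take the smallest per-combination similarity as the image similarity."""
    return min(sequence_similarity(q, r)
               for q in query_seqs.values() for r in ref_seqs.values())
```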
  • In some embodiments, the feature extraction network is trained over multiple training stages, and in each stage the following operations are performed: for each pedestrian image used for training, obtain a retained image belonging to that training image from it according to the image's preset retention ratio; divide all retained images obtained in the stage into multiple groups of retained images; for each group, use the feature extraction network to extract the feature map of each retained image in the group; divide each retained image's feature map in the top-to-bottom direction by the preset division number corresponding to the preset retention ratio of the training image it belongs to, obtaining the retained image's sub-feature map sequence; calculate all losses corresponding to the group at least based on each retained image's sub-feature map sequence; and update the parameter values of the feature extraction network's parameters based on all the losses corresponding to the group.
  • Each training stage is one epoch: in each stage, all the pedestrian images used for training in the same training set are used to train the feature extraction network.
  • Multiple preset retention ratios can be set in advance, for example 1, (n-1)/n, (n-2)/n, ..., (n/2)/n; assuming n is 6, the preset retention ratios are 1, 5/6, 4/6, and 3/6.
  • In each training stage, each pedestrian image used for training has a preset retention ratio, and a retained image belonging to the training image is obtained from it.
  • When the preset retention ratio of a training image is 1, the training image itself is used directly as its retained image; in other words, the entire training image is retained.
  • When the preset retention ratio is less than 1, the retained image is obtained by cropping the training image according to its preset retention ratio in the top-to-bottom direction, keeping the portion of the image equal to the preset retention ratio. That is, the retained image belonging to the training image is the top part of the training image that occupies the preset retention ratio of it in the top-to-bottom direction.
  • For example, when the preset retention ratio of a training image is 5/6, the training image is cropped from top to bottom so that its top 5/6 is kept; the retained image belonging to that training image is thus the top 5/6 portion of the image.
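  • A sketch of the cropping step, assuming training images are (C, H, W) tensors; crop_retained_image is a hypothetical name.

```python
import torch

def crop_retained_image(image: torch.Tensor, retention_ratio: float) -> torch.Tensor:
    """Keep the top `retention_ratio` portion of a (C, H, W) training image
    in the top-to-bottom direction; a ratio of 1 keeps the whole image."""
    kept_rows = round(image.shape[1] * retention_ratio)
    return image[:, :kept_rows, :]

# Example: keeping the top 5/6 of a 240-row image leaves 200 rows.
img = torch.randn(3, 240, 96)
print(crop_retained_image(img, 5 / 6).shape)  # torch.Size([3, 200, 96])
```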
  • Fig. 2 shows a schematic diagram of the effect of obtaining retained images at different preset retention ratios.
  • In Fig. 2, four pedestrian images used for training are shown as rectangles; the numerical value inside each rectangle indicates that training image's preset retention ratio.
  • For the training image with preset retention ratio 1, the image itself is used directly as the retained image.
  • For the other training images, the dotted line inside the rectangle marks the bottom edge of the retained image, while the top edge of the retained image coincides with the top edge of the training image.
  • The retained images belonging to the training images with preset retention ratios 5/6, 4/6, and 3/6 are, respectively, the top 5/6, top 4/6, and top 3/6 portions of those images in the top-to-bottom direction.
  • In each training stage, for each preset retention ratio, the number of training images assigned that ratio can be kept basically the same; for example, with preset retention ratios 1, 5/6, 4/6, and 3/6, the numbers of training images assigned 1, 5/6, 4/6, and 3/6 are basically equal.
  • In some embodiments, the preset retention ratio of each training image in a training stage is determined randomly, so that for each training image, the total number of times each preset retention ratio is applied to it is uniform.
  • That is, in each training stage, for each training image, one ratio is selected with uniform probability from all preset retention ratios as that image's preset retention ratio. Consequently, for each training image and each preset retention ratio, the total number of times the ratio is applied to the image over all training stages is uniform.
  • For example, with preset retention ratios 1, 5/6, 4/6, and 3/6 determined randomly in this way, the total numbers of times the ratios 1, 5/6, 4/6, and 3/6 are applied to any given training image are basically the same.
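  • The per-stage random assignment and the grouping described below can be sketched as follows, reusing the crop_retained_image helper above; the ratio list and the equal-sized grouping are assumptions consistent with the description.

```python
import random

PRESET_RATIOS = [1.0, 5 / 6, 4 / 6, 3 / 6]  # the example preset retention ratios

def prepare_training_stage(training_images, group_size):
    """For one training stage (epoch): assign each training image a retention
    ratio uniformly at random, crop its retained image, and split the retained
    images into equal-sized groups (an incomplete final group is dropped)."""
    retained = []
    for image in training_images:
        ratio = random.choice(PRESET_RATIOS)  # uniform over all preset ratios
        retained.append((crop_retained_image(image, ratio), ratio))
    random.shuffle(retained)
    return [retained[i:i + group_size]
            for i in range(0, len(retained) - group_size + 1, group_size)]
```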
  • In each training stage, all the retained images obtained in the stage are divided into multiple groups of retained images; each group may include the same number of retained images, and each retained image belongs to exactly one group.
  • In each training pass of the stage, one group of retained images from the multiple groups obtained in the stage is used to train the feature extraction network, and different training passes use different groups.
  • In each training pass, the feature extraction network is used to extract the feature map of each retained image in the group; each retained image's feature map is then divided in the top-to-bottom direction by the preset division number corresponding to the preset retention ratio of the training image the retained image belongs to, yielding the retained image's sub-feature map sequence.
  • The preset division number corresponding to a preset retention ratio is the numerator of that ratio; for example, with ratios expressed over the maximum division number 6, the ratio 5/6 corresponds to the division number 5.
  • For example, suppose a group of retained images used in a training pass includes 3 retained images.
  • The first retained image was obtained by cropping its training image at the preset retention ratio 5/6; the corresponding preset division number is 5, so its feature map is divided from top to bottom into a sub-feature map sequence of 5 sub-feature maps.
  • The second retained image was obtained by cropping its training image at the preset retention ratio 4/6; the corresponding division number is 4, so its feature map is divided into a sequence of 4 sub-feature maps.
  • The third retained image was obtained by cropping its training image at the preset retention ratio 3/6; the corresponding division number is 3, so its feature map is divided into a sequence of 3 sub-feature maps.
  • All losses corresponding to the group can then be calculated at least based on the sub-feature map sequence of each retained image in the group, and the parameter values of the feature extraction network's parameters are updated based on all those losses.
  • All losses corresponding to a group of retained images may be the distance loss corresponding to that group. Metric learning can be used to calculate the distance loss corresponding to the group, and the parameter values of the feature extraction network's parameters are updated according to it.
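  • A sketch of the update step, assuming the feature extraction network is a torch.nn.Module and that stochastic gradient descent is the optimizer (the application does not specify one); the backbone here is a placeholder.

```python
import torch

# Placeholder backbone standing in for the real feature extraction network.
feature_extraction_network = torch.nn.Conv2d(3, 256, kernel_size=3)
optimizer = torch.optim.SGD(feature_extraction_network.parameters(), lr=0.01)

def update_step(all_losses):
    """Sum all losses corresponding to one group of retained images and
    update the feature extraction network's parameters by one gradient step."""
    total_loss = torch.stack(all_losses).sum()
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```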
  • The purpose of updating the parameter values of the feature extraction network is to make the feature maps or sub-feature maps of pedestrian images belonging to the same pedestrian increasingly similar, and those of pedestrian images belonging to different pedestrians increasingly dissimilar.
  • When calculating the distance loss, the distance between the sub-feature map sequence of each retained image and the sub-feature map sequence of every other retained image in the group can be calculated.
  • For example, suppose a group of retained images includes a retained image belonging to pedestrian image 1, one belonging to pedestrian image 2, and one belonging to pedestrian image 3, where pedestrian images 1 and 2 both depict pedestrian 1 and pedestrian image 3 depicts pedestrian 2, a different pedestrian.
  • The distances between the three sub-feature map sequences are calculated pairwise, and the triplet loss function (Triplet Loss) is used to compute the distance loss corresponding to the group based on the 3 calculated distances.
  • When the sub-feature map sequences of two retained images have the same length, each sub-feature map in the first sequence corresponds to the sub-feature map at the same position in the second sequence; the distance between each such pair is calculated, and the average of all the calculated distances is taken as the distance between the two sequences.
  • When the two sequences contain different numbers of sub-feature maps, let N denote the number of sub-feature maps in the shorter sequence. Each sub-feature map in the shorter sequence corresponds to the sub-feature map at the same position among the first N sub-feature maps of the longer sequence; the sub-feature maps after the N-th position in the longer sequence do not participate in the distance calculation.
  • The distance between each corresponding pair of sub-feature maps is calculated, and the average of all the calculated distances is taken as the distance between the two sequences.
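  • A sketch of the sequence distance and the triplet-based distance loss: Euclidean distance between average-pooled sub-feature maps and a margin of 0.3 are assumptions, since the application names Triplet Loss but fixes neither the metric nor the margin.

```python
import torch

def sequence_distance(seq_a, seq_b) -> torch.Tensor:
    """Average Euclidean distance over the first N position-aligned pairs of
    sub-feature maps; trailing sub-feature maps of the longer sequence are ignored."""
    n = min(len(seq_a), len(seq_b))
    dists = [torch.dist(seq_a[i].mean(dim=(1, 2)), seq_b[i].mean(dim=(1, 2)))
             for i in range(n)]
    return torch.stack(dists).mean()

def triplet_distance_loss(anchor_seq, positive_seq, negative_seq, margin=0.3):
    """Triplet loss over the pairwise sequence distances: pull sequences of
    the same pedestrian together and push a different pedestrian's away."""
    d_pos = sequence_distance(anchor_seq, positive_seq)  # same pedestrian
    d_neg = sequence_distance(anchor_seq, negative_seq)  # different pedestrian
    return torch.clamp(d_pos - d_neg + margin, min=0.0)
```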
  • In some embodiments, all losses corresponding to a group of retained images include the distance loss corresponding to the group and all classification losses corresponding to each retained image in the group. In that case, calculating all losses corresponding to the group at least based on each retained image's sub-feature map sequence includes:
  • for each retained image in the group, using all classifiers corresponding to the retained image to obtain all classification prediction results corresponding to it, based on the feature map of the training image the retained image belongs to and the retained image's sub-feature map sequence, where the feature map of the training image is the input of the complete-feature-map supervised classifier among all the classifiers and each sub-feature map in the retained image's sequence is the input of one of the other classifiers; and, for each retained image in the group, calculating all classification losses corresponding to it based on all its classification prediction results and the classification label of the training image it belongs to.
  • Metric learning and classifiers can thus be used simultaneously when training the feature extraction network, which speeds up the network's convergence and enhances the discriminative power of the features it extracts.
  • The number of classifiers used for training the feature extraction network may be n+1, where n is the maximum among all preset division numbers, i.e. the maximum preset division number, for example 6.
  • The input of a classifier is either the feature map of the training image the retained image belongs to or one sub-feature map from the retained image's sub-feature map sequence.
  • The classification prediction result output by a classifier may be a predetermined pedestrian identifier; each predetermined pedestrian identifier belongs to one pedestrian among all predetermined pedestrians, where the set of all predetermined pedestrians consists of the pedestrians to which the training images belong, and for each predetermined pedestrian the training set includes at least one training image of that pedestrian.
  • A classifier making a prediction from its input is thus equivalent to the classifier predicting which of all the predetermined pedestrians the training image the retained image belongs to depicts.
  • Alternatively, the prediction result output by a classifier may be a probability for each predetermined pedestrian, indicating the probability that the training image the retained image belongs to depicts that pedestrian.
  • Each classifier can be a softmax classifier, and the softmax loss function can be used to calculate the classification losses corresponding to a group of retained images.
  • The input of the (n+1)-th classifier among the n+1 classifiers is the feature map of the training image the retained image belongs to; this classifier is called the complete-feature-map supervised classifier. The inputs of classifiers 1 through n are the different sub-feature maps in the retained image's sub-feature map sequence.
  • All classifiers corresponding to any retained image include the (n+1)-th classifier, that is, the complete-feature-map supervised classifier.
  • In each training pass of a training stage, for each retained image in the group, all classifiers corresponding to the retained image are used to obtain all classification prediction results corresponding to it, based on the feature map of the training image it belongs to and the retained image's sub-feature map sequence.
  • Each sub-feature map in the retained image's sequence corresponds to one of the classifiers: the classifier whose index among the n+1 classifiers equals the sub-feature map's position in the sequence. Each sub-feature map is the input of its corresponding classifier.
  • For example, the first sub-feature map in the sequence is the input of the first classifier, which outputs the first classification prediction result corresponding to the retained image; the second sub-feature map is the input of the second classifier, which outputs the second classification prediction result, and so on.
  • The feature map of the training image the retained image belongs to is the input of the (n+1)-th classifier, the complete-feature-map supervised classifier, which outputs the last classification prediction result corresponding to the retained image.
  • The classification label of a training image may be a predetermined pedestrian identifier, namely that of the pedestrian the training image belongs to.
  • The n+1 classifiers may be n+1 softmax classifiers.
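  • A sketch of the n+1 softmax classifiers and the classification losses for one retained image. Linear classifiers over average-pooled features, the feature dimension, and the number of predetermined pedestrians are assumptions, and F.cross_entropy plays the role of the softmax loss; classifier k supervises the sub-feature map at position k, and the last classifier is the complete-feature-map supervised classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_MAX = 6               # maximum preset division number n
FEAT_DIM = 256          # assumed channel dimension of the feature maps
NUM_PEDESTRIANS = 751   # assumed number of predetermined pedestrians

# Classifiers 0..n-1 supervise the n possible sub-feature map positions;
# classifier n (the (n+1)-th) is the complete-feature-map supervised classifier.
classifiers = nn.ModuleList(
    [nn.Linear(FEAT_DIM, NUM_PEDESTRIANS) for _ in range(N_MAX + 1)])

def classification_losses(sub_maps, full_feature_map, label: int):
    """All classification losses for one retained image: one loss per
    sub-feature map (position k goes to classifier k) plus one loss for the
    feature map of the training image, via the complete-feature-map classifier."""
    target = torch.tensor([label])
    losses = []
    for k, sub in enumerate(sub_maps):
        logits = classifiers[k](sub.mean(dim=(1, 2)).unsqueeze(0))
        losses.append(F.cross_entropy(logits, target))
    full_logits = classifiers[N_MAX](full_feature_map.mean(dim=(1, 2)).unsqueeze(0))
    losses.append(F.cross_entropy(full_logits, target))
    return losses
```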
  • For example, suppose a group of retained images used in a training pass includes 3 retained images, and n is 6, so there are 7 softmax classifiers.
  • The first retained image was obtained by cropping its training image in the top-to-bottom direction at the preset retention ratio 5/6; the corresponding preset division number is 5, so its feature map is divided from top to bottom into a sequence of 5 sub-feature maps.
  • All softmax classifiers corresponding to the first retained image include: the 1st, 2nd, 3rd, 4th, and 5th softmax classifiers, plus the (n+1)-th, i.e. the 7th, softmax classifier.
  • The first sub-feature map in the sequence is the input of the 1st softmax classifier, which outputs the first classification prediction result corresponding to the first retained image; using the softmax loss function, the first classification loss corresponding to the first retained image is obtained from this prediction result and the classification label of the training image the first retained image belongs to.
  • The second sub-feature map is the input of the 2nd softmax classifier, which outputs the second classification prediction result corresponding to the first retained image; the softmax loss function likewise yields the second classification loss, and so on for the remaining sub-feature maps.
  • The feature map of the training image the first retained image belongs to is the input of the 7th classifier, which outputs the sixth classification prediction result corresponding to the first retained image; the softmax loss function yields the sixth classification loss.
  • All classification losses corresponding to the first retained image thus include the first through sixth classification losses corresponding to the first retained image.
  • the second retained image in the set of retained images is assigned to the retained image in the top-to-bottom direction by using the preset retention ratio of 4/6 of the pedestrian image used for training to which the second retained image belongs. Obtained by intercepting the pedestrian images during training.
  • the preset number of divisions corresponding to the preset retention ratio for the training image to which the second retained image belongs is 4. With the preset number of divisions of 4, the feature map of the second retained image is divided in a top-to-bottom direction to obtain the sub-feature map sequence of the second retained image.
  • the sub-feature map sequence of the second retained image includes 4 sub-feature maps.
  • All softmax classifier classifiers corresponding to the sub-feature map sequence of the second retained image include: the first softmax classifier, the second softmax classifier, the third softmax classifier, and the fourth softmax classifier , The n+1th is the 7th classifier.
  • the first sub-feature map in the sub-feature map sequence of the second retained image is used as the input of the first softmax classifier, and the first softmax classifier outputs the first classification prediction result corresponding to the second retained image,
  • the softmax loss function is used to obtain the first classification loss corresponding to the second retained image based on the first classification prediction result corresponding to the second retained image and the classification and labeling result of the pedestrian image used for training to which the retained image belongs.
  • the second sub-feature map in the sub-feature map sequence of the second retained image is used as the input of the second softmax classifier, and the second softmax classifier outputs the second classification prediction result corresponding to the second retained image. The softmax loss function is used to obtain the second classification loss corresponding to the second retained image based on the second classification prediction result corresponding to the second retained image and the classification and labeling result of the pedestrian image used for training to which the second retained image belongs. And so on.
  • the feature map of the pedestrian image used for training to which the second retained image belongs is used as the input of the seventh softmax classifier, and the seventh softmax classifier outputs the fifth classification prediction result corresponding to the second retained image. The softmax loss function is used to obtain the fifth classification loss corresponding to the second retained image based on the fifth classification prediction result corresponding to the second retained image and the classification and labeling result of the pedestrian image used for training to which the second retained image belongs.
  • All classification losses corresponding to the second retained image include: the first, second, third, fourth, and fifth classification losses corresponding to the second retained image.
  • the third retained image in the set of retained images is obtained by intercepting, in the top-to-bottom direction, the pedestrian image used for training to which the third retained image belongs, using the preset retention ratio of 3/6 of that pedestrian image.
  • the preset number of divisions corresponding to the preset retention ratio of the pedestrian image used for training to which the third retained image belongs is 3. With the preset number of divisions of 3, the feature map of the third retained image is divided in the top-to-bottom direction to obtain the sub-feature map sequence of the third retained image.
  • the sub-feature map sequence of the third retained image includes 3 sub-feature maps.
  • All softmax classifiers corresponding to the sub-feature map sequence of the third retained image include: the first softmax classifier, the second softmax classifier, the third softmax classifier, and the (n+1)-th, that is, the seventh softmax classifier.
  • the first sub-feature map in the sub-feature map sequence of the third retained image is used as the input of the first softmax classifier, and the first softmax classifier outputs the first classification prediction result corresponding to the third retained image. The softmax loss function is used to obtain the first classification loss corresponding to the third retained image based on the first classification prediction result corresponding to the third retained image and the classification and labeling result of the pedestrian image used for training to which the third retained image belongs. And so on.
  • the feature map of the pedestrian image used for training to which the third retained image belongs is used as the input of the seventh softmax classifier, and the seventh softmax classifier outputs the fourth classification prediction result corresponding to the third retained image. The softmax loss function is used to obtain the fourth classification loss corresponding to the third retained image based on the fourth classification prediction result corresponding to the third retained image and the classification and labeling result of the pedestrian image used for training to which the third retained image belongs.
  • All classification losses corresponding to the third retained image include: the first, second, third, and fourth classification losses corresponding to the third retained image.
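For illustration of the classifier wiring walked through above, the following is a minimal sketch in Python/PyTorch; the linear softmax classifiers, the global average pooling, the 2048-channel feature maps, and the number of identities are assumptions made for the sketch, not details fixed by the application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch, assuming n = 6 (the largest preset number of divisions),
# 2048-channel feature maps, and global average pooling; all illustrative.
N_MAX = 6
FEAT_DIM = 2048
NUM_IDENTITIES = 751  # hypothetical number of pedestrians in the training set

# n + 1 softmax classifiers: index i (0-based) supervises the (i+1)-th
# sub-feature map; index N_MAX is the complete-feature-map supervised one.
classifiers = nn.ModuleList(
    nn.Linear(FEAT_DIM, NUM_IDENTITIES) for _ in range(N_MAX + 1)
)

def pool(fmap: torch.Tensor) -> torch.Tensor:
    """Global-average-pool a (C, H, W) (sub-)feature map to a C-dim vector."""
    return fmap.mean(dim=(1, 2))

def classification_losses(sub_fmaps, full_fmap, label: int):
    """All classification losses for one retained image.

    sub_fmaps : sub-feature maps of the retained image, top to bottom; the
                list length equals its preset number of divisions (3, 4 or 5
                in the walkthrough above, or 6 for a full retained image).
    full_fmap : feature map of the pedestrian image used for training to
                which the retained image belongs.
    label     : classification labeling result, as a 0-based identity index.
    """
    target = torch.tensor([label])
    losses = []
    for i, sub in enumerate(sub_fmaps):
        # the i-th sub-feature map is the input of the i-th softmax classifier
        logits = classifiers[i](pool(sub).unsqueeze(0))
        losses.append(F.cross_entropy(logits, target))
    # the full feature map is the input of the (n+1)-th classifier
    logits = classifiers[N_MAX](pool(full_fmap).unsqueeze(0))
    losses.append(F.cross_entropy(logits, target))
    return losses  # e.g. 3 + 1 = 4 losses for retention ratio 3/6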
  • Step 103: Generate a pedestrian re-identification result based on the similarity between the pedestrian image to be identified and each reference pedestrian image.
  • the pedestrian re-identification result of the pedestrian image to be identified may be a pedestrian identifier corresponding to the pedestrian image to be identified.
  • Each reference pedestrian image corresponds to a pedestrian identifier.
  • the pedestrian identifier corresponding to a reference pedestrian image is the pedestrian identifier of the pedestrian to which that reference pedestrian image belongs.
  • the pedestrian identifier corresponding to the reference pedestrian image with the greatest similarity to the pedestrian image to be identified can serve as the pedestrian identifier corresponding to the pedestrian image to be identified. The identity of the pedestrian in the pedestrian image to be identified, that is, the pedestrian to which the image belongs, is thereby determined, and one pass of pedestrian re-identification is completed.
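As a minimal sketch of this selection step (the function and variable names are assumptions introduced for illustration):

```python
def re_identify(similarities, reference_ids):
    """Return the pedestrian identifier of the reference pedestrian image
    with the greatest similarity to the pedestrian image to be identified.

    similarities  : one similarity score per reference pedestrian image.
    reference_ids : the pedestrian identifier of each reference image.
    """
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return reference_ids[best]

# e.g. re_identify([0.41, 0.87, 0.55], ["id_7", "id_3", "id_9"]) -> "id_3"
```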
  • FIG. 3 shows a structural block diagram of a pedestrian re-identification device provided by an embodiment of the present application.
  • the device includes: a feature extraction unit 301, a similarity calculation unit 302, and a generation unit 303.
  • the feature extraction unit 301 is configured to input the pedestrian image to be identified into the feature extraction network to obtain all to-be-identified sub-feature map sequences of the pedestrian image to be identified output by the feature extraction network, wherein the feature extraction network is configured to: extract the feature map of the pedestrian image to be identified; for each preset number of divisions among all preset numbers of divisions, divide the feature map of the pedestrian image to be identified in the top-to-bottom direction with that preset number of divisions to obtain the to-be-identified sub-feature map sequence corresponding to that preset number of divisions; and take all obtained to-be-identified sub-feature map sequences as all to-be-identified sub-feature map sequences of the pedestrian image to be identified;
  • the similarity calculation unit 302 is configured to, for each reference pedestrian image, calculate the similarity between the pedestrian image to be identified and that reference pedestrian image based on all to-be-identified sub-feature map sequences of the pedestrian image to be identified and all reference sub-feature map sequences of that reference pedestrian image;
  • the generating unit 303 is configured to generate a pedestrian re-identification result of the pedestrian image to be identified based on the similarity between the pedestrian image to be identified and each reference pedestrian image.
  • all preset division numbers include: n/2, n, and at least one other preset division number, where n is the largest preset division number among all preset division numbers, and each other preset division number is greater than n/2 and less than n.
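For the n = 6 example used throughout, this set of preset division numbers is {3, 4, 5, 6}; a one-line sketch:

```python
def preset_division_numbers(n: int = 6):
    """All preset division numbers n/2, n/2 + 1, ..., n; for the n = 6
    running example this yields [3, 4, 5, 6]."""
    return list(range(n // 2, n + 1))
```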
  • the similarity calculation unit 302 is further configured to:
  • the to-be-identified sub-feature map sequences of the pedestrian image to be identified are combined with the reference sub-feature map sequences of the reference pedestrian image to obtain multiple sub-feature map sequence combinations related to the reference pedestrian image, where each sub-feature map sequence combination includes: one reference sub-feature map sequence and one to-be-identified sub-feature map sequence;
  • for each sub-feature map sequence combination, the similarity of the two sub-feature maps in each sub-feature map combination corresponding to that sequence combination is calculated, where a sub-feature map combination includes the sub-feature maps at the same position in the two sub-feature map sequences of the sequence combination; the average of all calculated similarities is taken as the similarity corresponding to the sub-feature map sequence combination;
  • the similarity between the pedestrian image to be identified and the reference pedestrian image is calculated based on the similarity corresponding to each sub-feature map sequence combination.
  • calculating the similarity between the pedestrian image to be identified and the reference pedestrian image based on the similarity corresponding to each sub-feature map sequence combination includes:
  • taking the smallest similarity among the similarities corresponding to the sub-feature map sequence combinations as the similarity between the pedestrian image to be identified and the reference pedestrian image.
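A minimal sketch of this similarity computation, assuming cosine similarity over globally pooled sub-feature vectors (the application does not prescribe a particular similarity measure or pooling):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two pooled sub-feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combination_similarity(query_seq, ref_seq) -> float:
    """Similarity corresponding to one sub-feature map sequence combination.

    Sub-feature maps at the same position are paired; when the two sequences
    come from different preset division numbers, only the first N positions
    of the longer sequence take part, N being the length of the shorter one.
    The combination's similarity is the average over the paired positions.
    """
    n = min(len(query_seq), len(ref_seq))
    return sum(cosine(query_seq[i], ref_seq[i]) for i in range(n)) / n

def image_similarity(query_seqs, ref_seqs) -> float:
    """Similarity between the pedestrian image to be identified and one
    reference pedestrian image: the smallest similarity over all sequence
    combinations.

    query_seqs / ref_seqs : dicts mapping each preset division number
    (e.g. 3, 4, 5, 6) to that image's list of pooled sub-feature vectors,
    ordered top to bottom.
    """
    return min(combination_similarity(q, r)
               for q in query_seqs.values()
               for r in ref_seqs.values())
```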
  • the pedestrian re-identification device further includes:
  • the training unit is configured to train the feature extraction network in multiple training stages, where in each training stage, the following operations are performed:
  • for each pedestrian image used for training, a retained image belonging to that pedestrian image is obtained from it with the preset retention ratio of that pedestrian image; all retained images obtained in the training stage are divided into multiple sets of retained images; for each set of retained images, the feature extraction network is used to extract the feature map of each retained image in the set;
  • for the feature map of each retained image, the feature map is divided in the top-to-bottom direction with the preset number of divisions corresponding to the preset retention ratio of the pedestrian image used for training to which the retained image belongs, to obtain the sub-feature map sequence of the retained image; all losses corresponding to the set of retained images are calculated based at least on the sub-feature map sequence of each retained image; and the parameter values of the parameters of the feature extraction network are updated based on all losses corresponding to the set of retained images.
  • the preset retention ratio of each pedestrian image used for training in a training stage is determined in a random manner, so that for each pedestrian image used for training, the total number of times each preset retention ratio is applied to that pedestrian image is uniform.
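A minimal sketch of this per-stage random assignment and of the top-to-bottom interception it drives (assuming n = 6 and NumPy-style (H, W, C) image arrays; both are illustrative assumptions):

```python
import random

# Preset retention ratios for n = 6: 1, 5/6, 4/6, 3/6 (numerator, denominator).
PRESET_RATIOS = [(6, 6), (5, 6), (4, 6), (3, 6)]

def assign_ratios(num_training_images: int):
    """At the start of a training stage, draw one preset retention ratio per
    pedestrian image used for training, uniformly at random; over many stages
    each ratio is then applied to each image a roughly equal number of
    times."""
    return [random.choice(PRESET_RATIOS) for _ in range(num_training_images)]

def retain(image, ratio):
    """Keep the top `ratio` portion of an (H, W, C) image array in the
    top-to-bottom direction; for ratio (6, 6) the image is kept whole."""
    num, den = ratio
    height = image.shape[0]
    return image[: height * num // den]
```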
  • all losses corresponding to the set of retained images include: the distance loss corresponding to the set of retained images, and all classification losses corresponding to each retained image in the set;
  • calculating all losses corresponding to the set of retained images includes:
  • calculating the distance loss corresponding to the set of retained images based on the sub-feature map sequence of each retained image;
  • for each retained image in the set, using all classifiers corresponding to the retained image to obtain, based on the feature map of the pedestrian image used for training to which the retained image belongs and the sub-feature map sequence of the retained image, all classification prediction results corresponding to the retained image, where the feature map of the pedestrian image used for training to which the retained image belongs is used as the input of the complete-feature-map supervised classifier among all the classifiers, and each sub-feature map in the sub-feature map sequence of the retained image is used as the input of one of all the classifiers;
  • for each retained image in the set, calculating all classification losses corresponding to the retained image based on all classification prediction results corresponding to the retained image and the classification and labeling result of the pedestrian image used for training to which the retained image belongs.
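A hedged sketch of how the losses might be combined for one parameter update, assuming a triplet-style metric-learning distance loss, a single-triplet batch, a margin of 0.3, and an unweighted sum; none of these details is fixed by the description above:

```python
import torch.nn.functional as F

def total_loss(anchor, positive, negative, cls_losses, margin: float = 0.3):
    """Combine the distance loss for a set of retained images with all of
    the per-image classification losses for one parameter update.

    anchor / positive / negative : (B, D) pooled sequence features of
    retained images, where anchor and positive share a pedestrian identity
    and negative does not (a plain triplet batch for brevity).
    cls_losses : flat list of the classification losses of every retained
    image in the set, e.g. produced by classification_losses() above.
    """
    distance_loss = F.triplet_margin_loss(anchor, positive, negative,
                                          margin=margin)
    return distance_loss + sum(cls_losses)
```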
  • In an exemplary embodiment, a storage medium including instructions is further provided, such as a memory including instructions, where the foregoing instructions may be executed by an electronic device to complete the foregoing method.
  • the storage medium may be a non-transitory computer-readable storage medium; for example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative work.
  • the various component embodiments of the present application may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the computing processing device according to the embodiments of the present application.
  • This application can also be implemented as a device or device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein.
  • Such a program for implementing the present application may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
  • any reference signs placed between parentheses should not be construed as limiting the claims.
  • the word “comprising” does not exclude the presence of elements or steps not listed in the claims.
  • the word “a” or “an” preceding an element does not exclude the presence of multiple such elements.
  • the application can be realized by means of hardware including several different elements and by means of a suitably programmed computer. In the unit claims listing several devices, several of these devices may be embodied in the same hardware item.
  • the use of the words first, second, third, etc., does not indicate any order; these words may be interpreted as names.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a pedestrian re-identification method, device, electronic equipment, and storage medium. The method includes: using a feature extraction network to obtain all to-be-identified sub-feature map sequences of a pedestrian image to be identified; for each reference pedestrian image, calculating the similarity between the pedestrian image to be identified and the reference pedestrian image based on all to-be-identified sub-feature map sequences of the pedestrian image to be identified and all reference sub-feature map sequences of the reference pedestrian image; and generating a pedestrian re-identification result of the pedestrian image to be identified based on the similarity between the pedestrian image to be identified and each reference pedestrian image. Whether the pedestrian image to be identified is a full-body or non-full-body image of the pedestrian to which it belongs, and/or the reference pedestrian image is a full-body or non-full-body image of the reference pedestrian to which it belongs, the similarity between the two images can be calculated relatively accurately, and pedestrian re-identification can be completed relatively accurately.

Description

Pedestrian re-identification method, device, electronic equipment and storage medium
This application claims priority to the Chinese patent application No. 202010246639.3, filed with the Chinese Patent Office on March 31, 2020 and entitled "Pedestrian re-identification method, device, electronic equipment and storage medium", the entire contents of which are incorporated herein by reference.

Claims (10)

  1. A pedestrian re-identification method, characterized in that the method comprises:
    inputting a pedestrian image to be identified into a feature extraction network to obtain all to-be-identified sub-feature map sequences of the pedestrian image to be identified output by the feature extraction network, wherein the feature extraction network is configured to: extract a feature map of the pedestrian image to be identified; for each preset division number among all preset division numbers, divide the feature map of the pedestrian image to be identified in the top-to-bottom direction with said preset division number to obtain a to-be-identified sub-feature map sequence corresponding to said preset division number; and take all obtained to-be-identified sub-feature map sequences as all to-be-identified sub-feature map sequences of the pedestrian image to be identified;
    for each reference pedestrian image, calculating a similarity between the pedestrian image to be identified and said reference pedestrian image based on all to-be-identified sub-feature map sequences of the pedestrian image to be identified and all reference sub-feature map sequences of said reference pedestrian image; and
    generating a pedestrian re-identification result of the pedestrian image to be identified based on the similarity between the pedestrian image to be identified and each reference pedestrian image.
  2. The method according to claim 1, wherein all preset division numbers comprise: n/2, n, and at least one other preset division number, where n is the largest preset division number among all preset division numbers, and each other preset division number is greater than n/2 and less than n.
  3. The method according to claim 1 or 2, wherein calculating the similarity between the pedestrian image to be identified and said reference pedestrian image based on all to-be-identified sub-feature map sequences of the pedestrian image to be identified and all reference sub-feature map sequences of said reference pedestrian image comprises:
    combining the to-be-identified sub-feature map sequences of the pedestrian image to be identified with the reference sub-feature map sequences of said reference pedestrian image to obtain multiple sub-feature map sequence combinations related to said reference pedestrian image, wherein each sub-feature map sequence combination comprises: one reference sub-feature map sequence and one to-be-identified sub-feature map sequence;
    for each sub-feature map sequence combination, calculating the similarity of the two sub-feature maps in each sub-feature map combination corresponding to said sequence combination, wherein a sub-feature map combination comprises the sub-feature maps at the same position in the two sub-feature map sequences of said sequence combination, and taking the average of all calculated similarities as the similarity corresponding to said sequence combination; and
    calculating the similarity between the pedestrian image to be identified and said reference pedestrian image based on the similarity corresponding to each sub-feature map sequence combination.
  4. The method according to claim 3, wherein calculating the similarity between the pedestrian image to be identified and said reference pedestrian image based on the similarity corresponding to each sub-feature map sequence combination comprises:
    taking the smallest similarity among the similarities corresponding to the sub-feature map sequence combinations as the similarity between the pedestrian image to be identified and said reference pedestrian image.
  5. The method according to any one of claims 1 to 4, further comprising:
    training the feature extraction network in multiple training stages, wherein in each training stage the following operations are performed:
    for each pedestrian image used for training, obtaining, with the preset retention ratio of said pedestrian image, a retained image belonging to said pedestrian image from said pedestrian image;
    dividing all retained images obtained in said training stage into multiple sets of retained images; and
    for each set of retained images among the multiple sets, extracting the feature map of each retained image in said set with the feature extraction network; for the feature map of each retained image, dividing the feature map of said retained image in the top-to-bottom direction with the preset division number corresponding to the preset retention ratio of the pedestrian image used for training to which said retained image belongs, to obtain the sub-feature map sequence of said retained image; calculating all losses corresponding to said set of retained images based at least on the sub-feature map sequence of each retained image; and updating the parameter values of the parameters of the feature extraction network based on all losses corresponding to said set of retained images.
  6. The method according to claim 5, wherein for each training stage, the preset retention ratio of each pedestrian image used for training in said training stage is determined in a random manner, so that for each pedestrian image used for training, the total number of times each preset retention ratio is applied to said pedestrian image is uniform.
  7. The method according to claim 5, wherein all losses corresponding to said set of retained images comprise: the distance loss corresponding to said set of retained images, and all classification losses corresponding to each retained image in said set;
    calculating all losses corresponding to said set of retained images based at least on the sub-feature map sequence of each retained image comprises:
    calculating the distance loss corresponding to said set of retained images based on the sub-feature map sequence of each retained image;
    for each retained image, using all classifiers corresponding to said retained image to obtain, based on the feature map of the pedestrian image used for training to which said retained image belongs and the sub-feature map sequence of said retained image, all classification prediction results corresponding to said retained image, wherein the feature map of the pedestrian image used for training to which said retained image belongs is used as the input of the complete-feature-map supervised classifier among all the classifiers, and each sub-feature map in the sub-feature map sequence of said retained image is used as the input of one of all the classifiers; and
    for each retained image, calculating all classification losses corresponding to said retained image based on all classification prediction results corresponding to said retained image and the classification and labeling result of the pedestrian image used for training to which said retained image belongs.
  8. A pedestrian re-identification device, characterized in that the device comprises:
    a feature extraction unit, configured to input a pedestrian image to be identified into a feature extraction network to obtain all to-be-identified sub-feature map sequences of the pedestrian image to be identified output by the feature extraction network, wherein the feature extraction network is configured to: extract a feature map of the pedestrian image to be identified; for each preset division number among all preset division numbers, divide the feature map of the pedestrian image to be identified in the top-to-bottom direction with said preset division number to obtain a to-be-identified sub-feature map sequence corresponding to said preset division number; and take all obtained to-be-identified sub-feature map sequences as all to-be-identified sub-feature map sequences of the pedestrian image to be identified;
    a similarity calculation unit, configured to, for each reference pedestrian image, calculate a similarity between the pedestrian image to be identified and said reference pedestrian image based on all to-be-identified sub-feature map sequences of the pedestrian image to be identified and all reference sub-feature map sequences of said reference pedestrian image; and
    a generating unit, configured to generate a pedestrian re-identification result of the pedestrian image to be identified based on the similarity between the pedestrian image to be identified and each reference pedestrian image.
  9. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the method according to any one of claims 1 to 7.
  10. A storage medium, wherein when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of claims 1 to 7.
PCT/CN2020/119546 2020-03-31 2020-09-30 Pedestrian re-identification method, device, electronic equipment and storage medium WO2021196547A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010246639.3A CN111611846A (zh) 2020-03-31 2020-03-31 Pedestrian re-identification method, device, electronic equipment and storage medium
CN202010246639.3 2020-03-31

Publications (1)

Publication Number Publication Date
WO2021196547A1 true WO2021196547A1 (zh) 2021-10-07

Family

ID=72203525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119546 WO2021196547A1 (zh) 2020-03-31 2020-09-30 Pedestrian re-identification method, device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111611846A (zh)
WO (1) WO2021196547A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611846A (zh) 2020-03-31 2020-09-01 北京迈格威科技有限公司 Pedestrian re-identification method, device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778464A (zh) * 2016-11-09 2017-05-31 深圳市深网视界科技有限公司 Pedestrian re-identification method and device based on deep learning
CN108960114A (zh) * 2018-06-27 2018-12-07 腾讯科技(深圳)有限公司 Human body recognition method and device, computer-readable storage medium and electronic device
CN109101865A (zh) * 2018-05-31 2018-12-28 湖北工业大学 Pedestrian re-identification method based on deep learning
CN109271870A (zh) * 2018-08-21 2019-01-25 平安科技(深圳)有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN109784258A (zh) * 2019-01-08 2019-05-21 华南理工大学 Pedestrian re-identification method based on multi-scale feature cutting and fusion
CN109886242A (zh) * 2019-03-01 2019-06-14 中国科学院重庆绿色智能技术研究院 Pedestrian re-identification method and ***
CN111611846A (zh) * 2020-03-31 2020-09-01 北京迈格威科技有限公司 Pedestrian re-identification method, device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875487B (zh) * 2017-09-29 2021-06-15 北京旷视科技有限公司 Training of a pedestrian re-identification network, and pedestrian re-identification based thereon
CN108564097B (zh) * 2017-12-05 2020-09-22 华南理工大学 Multi-scale object detection method based on a deep convolutional neural network
CN108416295B (zh) * 2018-03-08 2021-10-15 天津师范大学 Pedestrian re-identification method based on locally embedded deep features
CN110363047B (zh) * 2018-03-26 2021-10-26 普天信息技术有限公司 Face recognition method and device, electronic equipment and storage medium
KR101941994B1 (ko) * 2018-08-24 2019-01-24 전북대학교산학협력단 Pedestrian recognition and attribute extraction system based on a combined deep network
CN110569731B (zh) * 2019-08-07 2023-04-14 北京旷视科技有限公司 Face recognition method and device, and electronic equipment


Also Published As

Publication number Publication date
CN111611846A (zh) 2020-09-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20928664; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20928664; Country of ref document: EP; Kind code of ref document: A1)