CN117893790A - Target re-identification method and device based on feature alignment - Google Patents

Target re-identification method and device based on feature alignment

Info

Publication number
CN117893790A
CN117893790A
Authority
CN
China
Prior art keywords
image
feature
target
feature extraction
images
Prior art date
Legal status
Pending
Application number
CN202311719942.0A
Other languages
Chinese (zh)
Inventor
王旭岩
蒋召
Current Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202311719942.0A
Publication of CN117893790A


Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a target re-identification method and device based on feature alignment. The method comprises the following steps: constructing a target re-identification model from a feature alignment network, a plurality of feature extraction networks, and a pooling layer, fully connected layer and classification layer corresponding to each feature extraction network; inputting each group of training images into the target re-identification model, where each feature extraction network outputs the image features of the image of its corresponding type in the group, the feature alignment network outputs a fusion feature for the plurality of images in the group, and the classification layer corresponding to each feature extraction network outputs a recognition result for the image of that type; calculating a classification loss based on the recognition results of each group of training images, and calculating a feature alignment loss based on the image features and fusion feature of each group; and optimizing the target re-identification model according to the classification loss and the feature alignment loss.

Description

Target re-identification method and device based on feature alignment
Technical Field
The disclosure relates to the technical field of target detection, and in particular relates to a target re-identification method and device based on feature alignment.
Background
Target re-identification algorithms are increasingly widely applied. In complex scenes, target re-identification is affected by factors such as blurring and occlusion, resulting in low recognition accuracy. At present, the influence of blurring and occlusion is mainly mitigated by methods such as data enhancement and feature optimization, which involve a large workload and yield unsatisfactory results.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a target re-identification method, apparatus, electronic device, and computer-readable storage medium based on feature alignment, so as to solve the problem in the prior art that the accuracy of target re-identification is low due to blurring and occlusion.
In a first aspect of the embodiments of the present disclosure, a target re-identification method based on feature alignment is provided, including: constructing a feature alignment network using a Transformer module, wherein the Transformer module is composed of Transformer units; constructing a plurality of feature extraction networks for extracting features of different types of images using a residual network, and constructing a target re-identification model from the feature alignment network, the plurality of feature extraction networks, and a pooling layer, fully connected layer and classification layer corresponding to each feature extraction network; acquiring a training data set, wherein the training data set comprises a plurality of groups of training images, each group of training images comprises a plurality of images of different types of the same object, and the number of feature extraction networks in the target re-identification model is the same as the number of images in each group of training images; inputting each group of training images into the target re-identification model, where each feature extraction network outputs the image features of the image of its corresponding type in the group, the feature alignment network outputs a fusion feature for the plurality of images in the group, and the classification layer corresponding to each feature extraction network outputs a recognition result for the image of that type; calculating a classification loss based on the recognition results of each group of training images, and calculating a feature alignment loss based on the image features and fusion feature of each group; and optimizing the model parameters of the target re-identification model according to the classification loss and the feature alignment loss so as to complete training of the target re-identification model.
In a second aspect of the embodiments of the present disclosure, a target re-identification apparatus based on feature alignment is provided, including: a first construction module configured to construct a feature alignment network using a Transformer module, wherein the Transformer module is composed of Transformer units; a second construction module configured to construct a plurality of feature extraction networks for extracting features of different types of images using a residual network, and to construct a target re-identification model from the feature alignment network, the plurality of feature extraction networks, and a pooling layer, fully connected layer and classification layer corresponding to each feature extraction network; an acquisition module configured to acquire a training data set, wherein the training data set comprises a plurality of groups of training images, each group of training images comprises a plurality of images of different types of the same object, and the number of feature extraction networks in the target re-identification model is the same as the number of images in each group of training images; a processing module configured to input each group of training images into the target re-identification model, where each feature extraction network outputs the image features of the image of its corresponding type in the group, the feature alignment network outputs a fusion feature for the plurality of images in the group, and the classification layer corresponding to each feature extraction network outputs a recognition result for the image of that type; a calculation module configured to calculate a classification loss based on the recognition results of each group of training images and a feature alignment loss based on the image features and fusion feature of each group; and an optimization module configured to optimize the model parameters of the target re-identification model according to the classification loss and the feature alignment loss so as to complete training of the target re-identification model.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects. The disclosed embodiments construct a feature alignment network using a Transformer module, wherein the Transformer module is composed of Transformer units; construct a plurality of feature extraction networks for extracting features of different types of images using a residual network, and construct a target re-identification model from the feature alignment network, the plurality of feature extraction networks, and a pooling layer, fully connected layer and classification layer corresponding to each feature extraction network; acquire a training data set comprising a plurality of groups of training images, each group comprising a plurality of images of different types of the same object, the number of feature extraction networks in the model being the same as the number of images in each group; input each group of training images into the model, where each feature extraction network outputs the image features of the image of its corresponding type, the feature alignment network outputs a fusion feature for the images in the group, and the classification layer corresponding to each feature extraction network outputs a recognition result; calculate a classification loss based on the recognition results of each group and a feature alignment loss based on the image features and fusion feature of each group; and optimize the model parameters of the target re-identification model according to the classification loss and the feature alignment loss to complete training. By adopting these technical means, the problem of low target re-identification accuracy caused by blurring and occlusion in the prior art can be solved, and the accuracy of target re-identification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments or for the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present disclosure, and that other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flow chart of a target re-identification method based on feature alignment according to an embodiment of the disclosure;
Fig. 2 is a schematic flow chart of another target re-identification method based on feature alignment according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a target re-identification device based on feature alignment according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A method and apparatus for target re-recognition based on feature alignment according to embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a target re-identification method based on feature alignment according to an embodiment of the disclosure. The feature alignment-based target re-identification method of fig. 1 may be performed by a computer or server, or software on a computer or server. As shown in fig. 1, the target re-identification method based on feature alignment includes:
S101, constructing a feature alignment network using a Transformer module, wherein the Transformer module is composed of Transformer units;
S102, constructing a plurality of feature extraction networks for extracting features of different types of images using a residual network, and constructing a target re-identification model from the feature alignment network, the plurality of feature extraction networks, and a pooling layer, fully connected layer and classification layer corresponding to each feature extraction network;
S103, acquiring a training data set, wherein the training data set comprises a plurality of groups of training images, each group of training images comprises a plurality of images of different types of the same object, and the number of feature extraction networks in the target re-identification model is the same as the number of images in each group of training images;
S104, inputting each group of training images into the target re-identification model: outputting, through each feature extraction network, the image features of the image of the corresponding type in the group; outputting, through the feature alignment network, the fusion feature corresponding to the plurality of images in the group; and outputting, through the classification layer corresponding to each feature extraction network, the recognition result of the image of the corresponding type;
S105, calculating a classification loss based on the recognition results of each group of training images, and calculating a feature alignment loss based on the image features and fusion feature of each group;
S106, optimizing the model parameters of the target re-identification model according to the classification loss and the feature alignment loss to complete training of the target re-identification model.
The Transformer unit is the basic building block of the Transformer architecture and is mainly composed of three parts: an encoder (Encoder), a decoder (Decoder), and scaled dot-product attention (Scaled Dot-Product Attention). A Transformer module is composed of a plurality of Transformer units.
In some embodiments, a plurality of Transformer units are used to build a Transformer module, and a plurality of Transformer modules are used to build the feature alignment network. For example, the Transformer module is obtained by connecting three Transformer units in series, and the feature alignment network is obtained by connecting two Transformer modules in series.
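The patent gives no concrete implementation of the feature alignment network; the following is a minimal PyTorch sketch of the series arrangement described above, assuming each Transformer unit is a standard encoder layer and that the per-branch image features are stacked into a token sequence before fusion. The feature dimension, head count, and mean-pooling fusion are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class TransformerModule(nn.Module):
    """Three Transformer units (standard encoder layers) connected in series."""
    def __init__(self, dim=256, heads=4, units=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.units = nn.TransformerEncoder(layer, num_layers=units)

    def forward(self, x):
        return self.units(x)

class FeatureAlignmentNetwork(nn.Module):
    """Two Transformer modules in series; fuses per-branch image features."""
    def __init__(self, dim=256, num_modules=2):
        super().__init__()
        self.blocks = nn.Sequential(*[TransformerModule(dim) for _ in range(num_modules)])

    def forward(self, branch_features):
        # branch_features: one (batch, dim) tensor per feature extraction network
        tokens = torch.stack(branch_features, dim=1)  # (batch, n_branches, dim)
        fused = self.blocks(tokens)                   # (batch, n_branches, dim)
        return fused.mean(dim=1)                      # (batch, dim) fusion feature

align = FeatureAlignmentNetwork(dim=256)
f1, f2 = torch.randn(4, 256), torch.randn(4, 256)
fusion = align([f1, f2])
print(tuple(fusion.shape))  # (4, 256)
```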
The plurality of feature extraction networks for extracting features of different types of images are all residual networks. Before the target re-identification model is trained, the feature extraction networks are indistinguishable; but because each feature extraction network processes a different type of image, their parameters diverge once the model has been trained. Similarly, the pooling layers, fully connected layers and classification layers corresponding to the different feature extraction networks also start out identical.
Each group of training images contains a plurality of images of the same object; each image is of one type, and the images in the group are of different types. The different types of images include visible light images, infrared images, X-ray images, radar images, and the like. A visible light image is an image formed by light that the human eye can perceive, that is, an ordinary photograph; it provides characteristic information about the color, shape and texture of an object. An infrared image is captured by infrared radiation, an X-ray image by X-rays, and a radar image by radar.
Training the target re-identification model means training it to determine a specific object from among the objects contained in the training images. The most common form of target re-identification is pedestrian re-identification, so the target re-identification model may be a pedestrian re-identification model. Optimizing the model parameters according to the classification loss and the feature alignment loss may be done by computing a weighted sum of the two losses and optimizing the model parameters according to the weighted-sum result.
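The weighted sum of the two losses described above can be sketched as follows; the weight values are illustrative assumptions, as the patent does not specify them.

```python
def total_loss(classification_loss, alignment_loss, w_cls=1.0, w_align=0.5):
    """Weighted sum of the classification loss and the feature alignment loss."""
    return w_cls * classification_loss + w_align * alignment_loss

print(total_loss(2.0, 4.0))  # 4.0
```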
According to the technical scheme provided by the embodiments of the present application, a feature alignment network is constructed using a Transformer module composed of Transformer units; a plurality of feature extraction networks for extracting features of different types of images are constructed using a residual network, and a target re-identification model is constructed from the feature alignment network, the plurality of feature extraction networks, and a pooling layer, fully connected layer and classification layer corresponding to each feature extraction network; a training data set is acquired, comprising a plurality of groups of training images, each group comprising a plurality of images of different types of the same object, the number of feature extraction networks in the model being the same as the number of images in each group; each group of training images is input into the model, where each feature extraction network outputs the image features of the image of its corresponding type, the feature alignment network outputs a fusion feature for the images in the group, and the classification layer corresponding to each feature extraction network outputs a recognition result; a classification loss is calculated based on the recognition results of each group, and a feature alignment loss based on the image features and fusion feature of each group; and the model parameters are optimized according to the classification loss and the feature alignment loss to complete training. By adopting these technical means, the problem of low target re-identification accuracy caused by blurring and occlusion in the prior art can be solved, and the accuracy of target re-identification is improved.
Further, constructing the target re-identification model from the feature alignment network, the plurality of feature extraction networks and the pooling layer, fully connected layer and classification layer corresponding to each feature extraction network comprises: constructing, from each feature extraction network and its corresponding pooling layer, fully connected layer and classification layer, a target re-identification network for identifying images of the type corresponding to that feature extraction network; and connecting the feature alignment network to the output side of the target re-identification network for each type of image to obtain the target re-identification model.
A feature extraction network together with its corresponding pooling layer, fully connected layer and classification layer constitutes a target re-identification network, which is used to identify images of the type corresponding to that feature extraction network. For example, if the target re-identification model includes three feature extraction networks, then there are three target re-identification networks in the model, and the feature alignment network is connected to their output sides (the input of the feature alignment network is the output of the three target re-identification networks).
Further, each group of training images is input into the target re-identification model as follows: each feature extraction network processes the image of its corresponding type in the group to obtain that image's features; the pooling layer, fully connected layer and classification layer corresponding to each feature extraction network then process those image features in sequence to obtain the recognition result for the image of that type; and the feature alignment network processes the image features of the plurality of images in the group to obtain the fusion feature corresponding to those images.
Each feature extraction network, together with its corresponding pooling layer, fully connected layer and classification layer, is responsible for processing one type of image. The feature alignment network processes the image features output by all of the feature extraction networks.
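The per-type branch described above (feature extraction network followed by its pooling layer, fully connected layer and classification layer) can be sketched as follows. A tiny convolutional stack stands in for the residual network, and all layer sizes and the number of identities are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ReIDBranch(nn.Module):
    """One per-type branch: feature extractor -> pooling -> fully connected -> classifier."""
    def __init__(self, feat_dim=256, num_ids=100):
        super().__init__()
        # A tiny convolutional stack stands in for the residual network.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)             # pooling layer
        self.fc = nn.Linear(feat_dim, feat_dim)         # fully connected layer
        self.classifier = nn.Linear(feat_dim, num_ids)  # classification layer

    def forward(self, image):
        fmap = self.backbone(image)              # image features as a feature map
        feat = self.pool(fmap).flatten(1)        # (batch, feat_dim) image feature vector
        logits = self.classifier(self.fc(feat))  # recognition result over identities
        return feat, logits

branch = ReIDBranch()
feat, logits = branch(torch.randn(2, 3, 64, 64))
print(tuple(feat.shape), tuple(logits.shape))  # (2, 256) (2, 100)
```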
Further, after the classification layer corresponding to each feature extraction network outputs the recognition result of the image of the corresponding type, the method further includes: calculating the classification loss corresponding to each image based on the recognition result of each image in each group of training images and the label of that group, wherein the labels of all images in a group are the same; calculating the feature alignment loss of each group of training images based on the image features of its images and its fusion feature; optimizing the model parameters of the target re-identification network corresponding to each type according to the classification losses of all images of that type across the groups of training images; and optimizing the model parameters of the feature alignment network and the plurality of feature extraction networks according to the feature alignment losses of the groups of training images.
The label of an image marks a particular object in the image. Because each group of training images contains multiple images of different types of the same object, the labels of all images in a group are identical, and the label of each image in a group is the label of that group. The classification loss may be calculated using a cross-entropy loss function, with one loss per image. The feature alignment loss may be calculated using a Euclidean distance function: specifically, based on the image feature of each image in a group and the fusion feature of that group, the feature alignment loss of that image is computed as a Euclidean distance, and the per-image losses are summed to obtain the feature alignment loss of the group. Because each target re-identification network recognizes one type of image, its model parameters are optimized according to the classification losses of all images of that type across the groups of training images. The feature alignment network processes the outputs of all the feature extraction networks, so the model parameters of the feature alignment network and the feature extraction networks are optimized according to the feature alignment losses of the groups of training images.
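The feature alignment loss described above can be sketched in plain Python: the Euclidean distance between each image feature and the fusion feature is computed, and the per-image distances are summed over the group. The two-dimensional toy features are illustrative only.

```python
import math

def group_alignment_loss(image_features, fusion_feature):
    """Sum over the group of Euclidean distances between each image feature
    and the fusion feature."""
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(euclidean(f, fusion_feature) for f in image_features)

feats = [[1.0, 0.0], [0.0, 1.0]]   # one toy feature per image in the group
fusion = [0.0, 0.0]                # toy fusion feature
print(group_alignment_loss(feats, fusion))  # 2.0
```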
In some embodiments, the target re-identification model comprises a first feature extraction network that extracts features of a first type of image, a second feature extraction network that extracts features of a second type of image, and a third feature extraction network that extracts features of a third type of image, and each group of training images comprises a first-type image, a second-type image and a third-type image of the same object. Each group of training images is input into the target re-identification model as follows: the first-type image is processed by the first feature extraction network to obtain first image features, the second-type image by the second feature extraction network to obtain second image features, and the third-type image by the third feature extraction network to obtain third image features; the first image features are processed in sequence by the pooling layer, fully connected layer and classification layer corresponding to the first feature extraction network to obtain a first recognition result, and likewise the second and third image features by the layers corresponding to the second and third feature extraction networks to obtain the second and third recognition results; the first, second and third image features are processed by the feature alignment network to obtain the fusion feature; a classification loss is calculated based on the first, second and third recognition results, and a feature alignment loss based on the first, second and third image features and the fusion feature; and the model parameters of the target re-identification model are optimized according to the classification loss and the feature alignment loss to complete training of the target re-identification model.
For example, if the first, second and third types of image are a visible light image, an infrared image and a radar image respectively, the three images in each group of training images are input into the target re-identification model: the first feature extraction network and its corresponding pooling layer, fully connected layer and classification layer identify the object in the visible light image, the second feature extraction network and its corresponding layers identify the object in the infrared image, and the third feature extraction network and its corresponding layers identify the object in the radar image (yielding the first, second and third recognition results); and the feature alignment network processes the image features of the visible light, infrared and radar images to obtain the fusion feature. Each recognition result together with the label corresponds to a classification loss, and each image feature together with the fusion feature corresponds to a feature alignment loss.
In some embodiments, the target re-identification model comprises a first feature extraction network that extracts features of a first type of image and a second feature extraction network that extracts features of a second type of image, and each group of training images comprises a first-type image and a second-type image of the same object. Fig. 2 shows the process after each group of training images is input into the target re-identification model; Fig. 2 is a schematic flow chart of another target re-identification method based on feature alignment according to an embodiment of the disclosure. As shown in fig. 2, the method includes:
S201, processing the first-type image through the first feature extraction network to obtain first image features, and processing the second-type image through the second feature extraction network to obtain second image features;
S202, sequentially processing the first image features through the pooling layer, fully connected layer and classification layer corresponding to the first feature extraction network to obtain a first recognition result, and sequentially processing the second image features through the pooling layer, fully connected layer and classification layer corresponding to the second feature extraction network to obtain a second recognition result;
S203, processing the first image features and the second image features through the feature alignment network to obtain a fusion feature;
S204, calculating a classification loss based on the first recognition result and the second recognition result, and calculating a feature alignment loss based on the first image features, the second image features and the fusion feature;
S205, optimizing the model parameters of the target re-identification model according to the classification loss and the feature alignment loss to complete training of the target re-identification model.
For example, if the first and second types of image are a visible light image and an infrared image respectively, the two images in each group of training images are input into the target re-identification model: the first feature extraction network and its corresponding pooling layer, fully connected layer and classification layer identify the object in the visible light image, and the second feature extraction network and its corresponding layers identify the object in the infrared image (yielding the first and second recognition results); and the feature alignment network processes the image features of the visible light and infrared images to obtain the fusion feature. Each recognition result together with the label corresponds to a classification loss, and each image feature together with the fusion feature corresponds to a feature alignment loss.
Further, after the model parameters of the target re-recognition model are optimized according to the classification loss and the feature alignment loss to complete training of the target re-recognition model, the method further includes: removing the feature alignment network from the target re-recognition model to simplify the model structure of the target re-recognition model; acquiring a plurality of target images of different types of the same target object, and inputting the plurality of target images into the target re-recognition model, such that the classification layer corresponding to each feature extraction network outputs a recognition result for the target image of the type corresponding to that feature extraction network; and determining a re-recognition result of the target object based on the plurality of recognition results.
The feature alignment network is used to optimize the feature extraction networks during the training phase and can be removed during the inference phase (the phase in which target images are processed). Determining the re-recognition result of the target object based on the plurality of recognition results may be a weighted summation of the plurality of recognition results, with the weighted sum taken as the re-recognition result of the target object. In practice, each recognition result is a probability distribution over the possible identities of the object in the image, so weighting and summing the plurality of recognition results amounts to weighting and summing a plurality of probability distributions.
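The weighted summation described above can be sketched as follows; the weights and probability values are hypothetical, since the disclosure does not specify how the weights are chosen.

```python
import numpy as np

def re_identify(results, weights):
    """Weighted sum of per-branch probability distributions; the argmax
    of the combined distribution is the re-identified target."""
    combined = sum(w * r for w, r in zip(weights, results))
    return combined, int(np.argmax(combined))

# two branches (e.g. visible and infrared), 5 candidate identities
p_vis = np.array([0.10, 0.60, 0.10, 0.10, 0.10])
p_ir  = np.array([0.20, 0.50, 0.10, 0.10, 0.10])
combined, identity = re_identify([p_vis, p_ir], weights=[0.5, 0.5])
print(identity)  # → 1
```

With equal weights, the combined distribution is simply the average of the two; unequal weights would let a more reliable modality dominate the decision.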
Further, after the model parameters of the target re-recognition model are optimized according to the classification loss and the feature alignment loss to complete training of the target re-recognition model, the method further includes: connecting a pooling layer, a full connection layer and a classification layer after the feature alignment network in the target re-recognition model; acquiring a plurality of target images of different types of the same target object, and inputting the plurality of target images into the target re-recognition model: a recognition result is output through the classification layer corresponding to the feature alignment network, and a recognition result for the target image of the type corresponding to each feature extraction network is output through the classification layer corresponding to that feature extraction network; and determining a re-recognition result of the target object based on the plurality of recognition results.
According to the embodiments of the present application, the features extracted by the plurality of feature extraction networks are fused by means of the feature alignment network, and the fusion feature is introduced into the re-recognition of the target object, thereby improving the accuracy of target re-recognition.
Any combination of the above optional solutions may be adopted to form optional embodiments of the present disclosure, which are not described here in detail.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of a target re-identification apparatus based on feature alignment according to an embodiment of the present disclosure. As shown in fig. 3, the target re-recognition apparatus based on feature alignment includes:
a first construction module 301 configured to construct a feature alignment network using a Transformer module, wherein the Transformer module is composed of Transformer units;
a second construction module 302 configured to construct a plurality of feature extraction networks for extracting features of different types of images using the residual network, and construct a target re-recognition model using the feature alignment network, the plurality of feature extraction networks, and a pooling layer, a full connection layer, and a classification layer corresponding to each feature extraction network;
an obtaining module 303, configured to obtain a training dataset, where the training dataset includes a plurality of sets of training images, each set of training images including a plurality of images of different types related to the same object, and the number of feature extraction networks in the target re-recognition model is the same as the number of images in each set of training images;
a processing module 304 configured to input each set of training images into a target re-recognition model: outputting image features of images of the type corresponding to the feature extraction network in the group of training images through each feature extraction network, outputting fusion features corresponding to a plurality of images in the group of training images through a feature alignment network, and outputting recognition results of the images of the type corresponding to the feature extraction network in the group of training images through a classification layer corresponding to each feature extraction network;
a calculation module 305 configured to calculate a classification loss based on the plurality of recognition results of each set of training images, and calculate a feature alignment loss based on the plurality of image features and fusion features of each set of training images;
an optimization module 306 configured to optimize model parameters of the target re-recognition model in accordance with the classification loss and the feature alignment loss to complete training of the target re-recognition model.
According to the technical solution provided by the embodiments of the present application, a feature alignment network is constructed using a Transformer module, wherein the Transformer module is composed of Transformer units; a plurality of feature extraction networks for extracting features of different types of images are constructed using a residual network, and a target re-recognition model is constructed using the feature alignment network, the plurality of feature extraction networks, and the pooling layer, full connection layer and classification layer corresponding to each feature extraction network; a training data set is acquired, wherein the training data set comprises a plurality of sets of training images, each set of training images comprises a plurality of images of different types related to the same object, and the number of feature extraction networks in the target re-recognition model is the same as the number of images in each set of training images; each set of training images is input into the target re-recognition model: image features of the images of the type corresponding to each feature extraction network in the set of training images are output through that feature extraction network, fusion features corresponding to the plurality of images in the set of training images are output through the feature alignment network, and recognition results of the images of the type corresponding to each feature extraction network in the set of training images are output through the classification layer corresponding to that feature extraction network; a classification loss is calculated based on the plurality of recognition results of each set of training images, and a feature alignment loss is calculated based on the plurality of image features and fusion features of each set of training images; and the model parameters of the target re-recognition model are optimized according to the classification loss and the feature alignment loss to complete training of the target re-recognition model. By adopting the above technical means, the problem in the prior art that the accuracy of target re-identification is low due to blurring and occlusion can be solved, and the accuracy of target re-identification is improved.
In some embodiments, the second construction module 302 is further configured to construct, using each feature extraction network and the pooling layer, full connection layer and classification layer corresponding to that feature extraction network, a target re-recognition network for recognizing the image of the type corresponding to the feature extraction network; and to connect the feature alignment network to the output side of the target re-recognition network for each type of image, so as to obtain the target re-recognition model.
In some embodiments, the processing module 304 is further configured to process, through each feature extraction network, an image of the feature extraction network corresponding type in the set of training images to obtain image features of the feature extraction network corresponding type of image in the set of training images; processing image features of the images of the type corresponding to the feature extraction network in the group of training images sequentially through a pooling layer, a full connection layer and a classification layer corresponding to each feature extraction network to obtain recognition results of the images of the type corresponding to the feature extraction network in the group of training images; and processing the image characteristics of the plurality of images in the group of training images through a characteristic alignment network to obtain fusion characteristics corresponding to the plurality of images in the group of training images.
In some embodiments, the optimization module 306 is further configured to calculate a classification penalty for each image in each set of training images based on the recognition result for the image and the labels for the set of training images, wherein the labels for all images in each set of training images are the same; calculating the feature alignment loss of each set of training images based on the image features of the images and the fusion features of the set of training images; optimizing model parameters of a target re-identification network corresponding to each type according to the classification loss corresponding to all images of the type in the plurality of groups of training images; optimizing model parameters of a feature alignment network and a plurality of feature extraction networks according to the feature alignment loss of the plurality of groups of training images.
In some embodiments, the optimization module 306 is further configured such that the target re-recognition model includes a first feature extraction network that extracts features of a first type of image, a second feature extraction network that extracts features of a second type of image, and a third feature extraction network that extracts features of a third type of image, and each set of training images includes the first type of image, the second type of image and the third type of image with respect to the same object; each set of training images is input into the target re-recognition model: the first type of image is processed through the first feature extraction network to obtain a first image feature, the second type of image is processed through the second feature extraction network to obtain a second image feature, and the third type of image is processed through the third feature extraction network to obtain a third image feature; the first image feature is processed sequentially through the pooling layer, full connection layer and classification layer corresponding to the first feature extraction network to obtain a first recognition result, the second image feature is processed sequentially through the pooling layer, full connection layer and classification layer corresponding to the second feature extraction network to obtain a second recognition result, and the third image feature is processed sequentially through the pooling layer, full connection layer and classification layer corresponding to the third feature extraction network to obtain a third recognition result; the first image feature, the second image feature and the third image feature are processed through the feature alignment network to obtain a fusion feature; a classification loss is calculated based on the first, second and third recognition results, and a feature alignment loss is calculated based on the first, second and third image features and the fusion feature; and the model parameters of the target re-recognition model are optimized according to the classification loss and the feature alignment loss to complete training of the target re-recognition model.
In some embodiments, the optimization module 306 is further configured such that the target re-recognition model includes a first feature extraction network that extracts features of a first type of image and a second feature extraction network that extracts features of a second type of image, each set of training images including the first type of image and the second type of image with respect to the same object. Processing the first type image through a first feature extraction network to obtain a first image feature; processing the second type image through a second feature extraction network to obtain a second image feature; processing the first image features sequentially through a pooling layer, a full-connection layer and a classification layer corresponding to the first feature extraction network to obtain a first identification result, and processing the second image features sequentially through the pooling layer, the full-connection layer and the classification layer corresponding to the second feature extraction network to obtain a second identification result; processing the first image feature and the second image feature through a feature alignment network to obtain a fusion feature; calculating a classification loss based on the first recognition result and the second recognition result, and calculating a feature alignment loss based on the first image feature, the second image feature and the fusion feature; and optimizing model parameters of the target re-identification model according to the classification loss and the characteristic alignment loss so as to complete training of the target re-identification model.
In some embodiments, the optimization module 306 is further configured to remove the feature alignment network from the target re-recognition model to simplify the model structure of the target re-recognition model; acquire a plurality of target images of different types of the same target object, and input the plurality of target images into the target re-recognition model, such that the classification layer corresponding to each feature extraction network outputs a recognition result for the target image of the type corresponding to that feature extraction network; and determine a re-recognition result of the target object based on the plurality of recognition results.
In some embodiments, the optimization module 306 is further configured to connect a pooling layer, a full connection layer and a classification layer after the feature alignment network in the target re-recognition model; acquire a plurality of target images of different types of the same target object, and input the plurality of target images into the target re-recognition model: a recognition result is output through the classification layer corresponding to the feature alignment network, and a recognition result for the target image of the type corresponding to each feature extraction network is output through the classification layer corresponding to that feature extraction network; and determine a re-recognition result of the target object based on the plurality of recognition results.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 4 provided by an embodiment of the present disclosure. As shown in fig. 4, the electronic device 4 of this embodiment includes: a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented when the processor 401 executes the computer program 403; alternatively, the functions of the modules/units in the above-described apparatus embodiments are implemented when the processor 401 executes the computer program 403.
The electronic device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, the processor 401 and the memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation on the electronic device 4, which may include more or fewer components than shown, or different components.
The processor 401 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 4. Memory 402 may also include both internal storage units and external storage devices of electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division of the functional units and modules described above is merely illustrative; in practical applications, the above functions may be allocated to different functional units and modules as needed, i.e. the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such an understanding, the present disclosure may implement all or part of the flow of the methods in the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (10)

1. A target re-identification method based on feature alignment, comprising:
constructing a feature alignment network by using a Transformer module, wherein the Transformer module is composed of Transformer units;
constructing a plurality of feature extraction networks for extracting features of different types of images by using a residual error network, and constructing a target re-identification model by using the feature alignment network, the plurality of feature extraction networks and a pooling layer, a full connection layer and a classification layer corresponding to each feature extraction network;
acquiring a training data set, wherein the training data set comprises a plurality of groups of training images, each group of training images comprises a plurality of images of different types related to the same object, and the number of feature extraction networks in the target re-recognition model is the same as the number of images in each group of training images;
Inputting each set of training images into the target re-recognition model: outputting image features of the images of the type corresponding to the feature extraction network in the group of training images through each feature extraction network, outputting fusion features corresponding to a plurality of images in the group of training images through the feature alignment network, and outputting recognition results of the images of the type corresponding to the feature extraction network in the group of training images through a classification layer corresponding to each feature extraction network;
calculating a classification loss based on a plurality of recognition results of each set of training images, and calculating a feature alignment loss based on a plurality of image features and fusion features of each set of training images;
and optimizing model parameters of the target re-recognition model according to the classification loss and the characteristic alignment loss to complete training of the target re-recognition model.
2. The method of claim 1, wherein constructing a target re-recognition model using the feature alignment network, the plurality of feature extraction networks, and the pooling layer, the full-connection layer, and the classification layer corresponding to each feature extraction network comprises:
constructing, by using each feature extraction network and the pooling layer, full connection layer and classification layer corresponding to the feature extraction network, a target re-identification network for identifying the image of the type corresponding to the feature extraction network;
And connecting the characteristic alignment network to the output side of a target re-recognition network for recognizing each type of image to obtain the target re-recognition model.
3. The method of claim 1, wherein each set of training images is input into the target re-recognition model:
processing the image of the type corresponding to the feature extraction network in the group of training images through each feature extraction network to obtain the image features of the image of the type corresponding to the feature extraction network in the group of training images;
processing image features of the images of the type corresponding to the feature extraction network in the group of training images sequentially through a pooling layer, a full connection layer and a classification layer corresponding to each feature extraction network to obtain recognition results of the images of the type corresponding to the feature extraction network in the group of training images;
and processing the image characteristics of the plurality of images in the group of training images through the characteristic alignment network to obtain fusion characteristics corresponding to the plurality of images in the group of training images.
4. The method according to claim 2, wherein after outputting the recognition result of the image of the type corresponding to the feature extraction network in the set of training images through the classification layer corresponding to the feature extraction network, the method further comprises:
Calculating the classification loss corresponding to each image based on the identification result of each image in each group of training images and the labels of the group of training images, wherein the labels of all images in each group of training images are the same;
calculating the feature alignment loss of each set of training images based on the image features of the images and the fusion features of the set of training images;
optimizing model parameters of a target re-identification network corresponding to each type according to the classification loss corresponding to all images of the type in the plurality of groups of training images;
and optimizing model parameters of the feature alignment network and the feature extraction networks according to the feature alignment loss of the plurality of groups of training images.
5. The method according to claim 1, wherein the method further comprises:
the target re-recognition model comprises a first feature extraction network for extracting features of a first type of image, a second feature extraction network for extracting features of a second type of image and a third feature extraction network for extracting features of a third type of image, and each group of training images comprises the first type of image, the second type of image and the third type of image related to the same object;
inputting each set of training images into the target re-recognition model:
Processing the first type image through the first feature extraction network to obtain a first image feature, processing the second type image through the second feature extraction network to obtain a second image feature, and processing the third type image through the third feature extraction network to obtain a third image feature;
processing the first image features sequentially through a pooling layer, a full-connection layer and a classification layer corresponding to the first feature extraction network to obtain a first identification result, processing the second image features sequentially through the pooling layer, the full-connection layer and the classification layer corresponding to the second feature extraction network to obtain a second identification result, and processing the third image features sequentially through the pooling layer, the full-connection layer and the classification layer corresponding to the third feature extraction network to obtain a third identification result;
processing the first image feature, the second image feature and the third image feature through the feature alignment network to obtain a fusion feature;
calculating the classification loss based on the first, second, and third recognition results, and calculating the feature alignment loss based on the first, second, third, and fusion features;
And optimizing model parameters of the target re-recognition model according to the classification loss and the characteristic alignment loss to complete training of the target re-recognition model.
6. The method according to claim 1, wherein the method further comprises:
the target re-recognition model comprises a first feature extraction network for extracting features of a first type of image and a second feature extraction network for extracting features of a second type of image, each set of training images comprising the first type of image and the second type of image with respect to the same object;
inputting each set of training images into the target re-recognition model:
processing the first type image through the first feature extraction network to obtain a first image feature, and processing the second type image through the second feature extraction network to obtain a second image feature;
processing the first image features sequentially through a pooling layer, a full-connection layer and a classification layer corresponding to the first feature extraction network to obtain a first identification result, and processing the second image features sequentially through the pooling layer, the full-connection layer and the classification layer corresponding to the second feature extraction network to obtain a second identification result;
Processing the first image feature and the second image feature through the feature alignment network to obtain a fusion feature;
calculating the classification loss based on the first recognition result and the second recognition result, and calculating the feature alignment loss based on the first image feature, the second image feature and the fusion feature;
and optimizing model parameters of the target re-recognition model according to the classification loss and the characteristic alignment loss to complete training of the target re-recognition model.
7. The method of claim 1, wherein after optimizing model parameters of the target re-recognition model in accordance with the classification loss and the feature alignment loss to complete training of the target re-recognition model, the method further comprises:
removing the characteristic alignment network from the target re-recognition model to simplify the model structure of the target re-recognition model;
acquiring a plurality of target images of different types of the same target object, and inputting the plurality of target images into the target re-identification model:
outputting a recognition result of the target image of the type corresponding to the feature extraction network through the classification layer corresponding to each feature extraction network;
And determining a re-recognition result of the target object based on the plurality of recognition results.
8. A feature alignment-based target re-identification apparatus, comprising:
a first construction module configured to construct a feature alignment network using a Transformer module, wherein the Transformer module is composed of Transformer units;
the second construction module is configured to construct a plurality of feature extraction networks for extracting features of different types of images by using a residual network, and construct a target re-identification model by using the feature alignment network, the plurality of feature extraction networks and a pooling layer, a full connection layer and a classification layer corresponding to each feature extraction network;
the system comprises an acquisition module, a target re-identification module and a storage module, wherein the acquisition module is configured to acquire a training data set, the training data set comprises a plurality of groups of training images, each group of training images comprises a plurality of images of different types related to the same object, and the number of feature extraction networks in the target re-identification model is the same as the number of images in each group of training images;
a processing module configured to input each set of training images into the target re-recognition model: outputting image features of the images of the type corresponding to the feature extraction network in the group of training images through each feature extraction network, outputting fusion features corresponding to a plurality of images in the group of training images through the feature alignment network, and outputting recognition results of the images of the type corresponding to the feature extraction network in the group of training images through a classification layer corresponding to each feature extraction network;
a computing module configured to compute a classification loss based on the plurality of recognition results of each group of training images, and to compute a feature alignment loss based on the plurality of image features and the fusion features of each group of training images; and
an optimization module configured to optimize the model parameters of the target re-identification model according to the classification loss and the feature alignment loss, so as to complete training of the target re-identification model.
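The computing and optimization modules jointly minimize a classification loss over the per-branch recognition results and a feature alignment loss that pulls each branch's image features toward the fused features produced by the feature alignment network. The claims do not fix the exact loss functions, so the sketch below assumes cross-entropy for classification, mean-squared error for alignment, and equal weighting of the two terms:

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label (assumed classification loss)."""
    return -np.log(probs[label] + 1e-12)

def total_loss(recognition_results, image_features, fusion_feature, label):
    """Sum of per-branch classification and feature alignment losses.

    recognition_results: list of probability vectors, one per feature extraction network.
    image_features:      list of feature vectors, one per feature extraction network.
    fusion_feature:      fused feature output by the feature alignment network.
    The MSE alignment term and the 1:1 weighting are assumptions.
    """
    cls_loss = sum(cross_entropy(p, label) for p in recognition_results)
    align_loss = sum(np.mean((f - fusion_feature) ** 2) for f in image_features)
    return cls_loss + align_loss

# Example with two branches and two candidate identities.
probs = [np.array([0.8, 0.2]), np.array([0.6, 0.4])]
feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
fused = np.array([0.5, 0.5])
loss = total_loss(probs, feats, fused, label=0)
```

Minimizing the alignment term drives all branches toward a shared feature space, which is what allows the alignment network to be discarded after training (claim 7) without losing the cross-type consistency it enforced.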
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202311719942.0A 2023-12-14 2023-12-14 Target re-identification method and device based on feature alignment Pending CN117893790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311719942.0A CN117893790A (en) 2023-12-14 2023-12-14 Target re-identification method and device based on feature alignment


Publications (1)

Publication Number Publication Date
CN117893790A (en) 2024-04-16

Family

ID=90643236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311719942.0A Pending CN117893790A (en) 2023-12-14 2023-12-14 Target re-identification method and device based on feature alignment

Country Status (1)

Country Link
CN (1) CN117893790A (en)

Similar Documents

Publication Publication Date Title
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN113015022A (en) Behavior recognition method and device, terminal equipment and computer readable storage medium
CN114612987A (en) Expression recognition method and device
CN116612500B (en) Pedestrian re-recognition model training method and device
CN116912636B (en) Target identification method and device
CN116912635B (en) Target tracking method and device
CN117893790A (en) Target re-identification method and device based on feature alignment
CN117372818B (en) Target re-identification method and device
CN112418089A (en) Gesture recognition method and device and terminal
CN116912634B (en) Training method and device for target tracking model
CN112950652A (en) Robot and hand image segmentation method and device thereof
CN117474037B (en) Knowledge distillation method and device based on space distance alignment
CN116912889B (en) Pedestrian re-identification method and device
CN116630639B (en) Object image identification method and device
CN116912633B (en) Training method and device for target tracking model
CN116912518B (en) Image multi-scale feature processing method and device
CN116912920B (en) Expression recognition method and device
CN118135186A (en) Target detection method and device based on multi-scale features
CN117475215A (en) Training method and device for target recognition model
CN118038016A (en) Target re-identification method and device based on local feature and global feature combined optimization
CN117953346A (en) Training method and device for target re-identification model based on feature consistency
CN118071985A (en) Target re-identification method and device based on local feature supervision
CN118038015A (en) Target re-identification method and device based on local feature classification
CN116959077A (en) Image recognition method, device, electronic equipment and readable storage medium
CN118038215A (en) Model knowledge distillation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination