CN112395449A - Face retrieval method and device - Google Patents

Face retrieval method and device

Info

Publication number
CN112395449A
CN112395449A
Authority
CN
China
Prior art keywords
feature
face
face sample
image
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911089829.2A
Other languages
Chinese (zh)
Inventor
陈凯
龚文洪
申皓全
王铭学
赖昌材
胡翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2020/100547 (published as WO2021027440A1)
Priority to EP20852877.8A (published as EP4012578A4)
Publication of CN112395449A
Priority to US17/671,253 (published as US11881052B2)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178 - Human faces, e.g. facial parts, sketches or expressions: estimating age from face image; using age information for improving recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application provides a face retrieval method and a face retrieval apparatus, relating to the fields of artificial intelligence and computer vision. The method comprises the following steps: acquiring a face image to be retrieved; performing feature extraction on the face image through a first feature extraction model to obtain a first face feature; inputting the face image and the first face feature into a first feature mapping model to obtain an output standard feature corresponding to the first face feature, wherein the first feature mapping model is trained according to target features corresponding to face sample images; and performing face retrieval on the face image according to the standard feature. Because the features extracted by a plurality of feature extraction models are spliced (concatenated) and the spliced features serve as the basis for constructing the standard features, the face retrieval system can select a suitable feature space by using the combined effect of the plurality of feature extraction models, improving the accuracy of face retrieval.

Description

Face retrieval method and device
Technical Field
The application relates to the fields of artificial intelligence and computer vision, and in particular to a face retrieval method and device.
Background
With the development of science and technology, face retrieval has emerged as a biometric recognition technology that integrates computer image processing with statistical knowledge of biological features. At present, face retrieval is widely applied in identity recognition and identity verification scenarios (such as security monitoring and access-control gates).
In face retrieval, given a face image to be retrieved, a face retrieval system compares it with a plurality of face images in a specified face library to find the most similar face image or images. The system does not directly compute the similarity between the face image to be retrieved and the images in the face library; instead, it represents every image as features and computes similarities between those features to find the most similar face image or images. Generally, different feature extraction models extract different face features, and face features from different models cannot be compared directly. To make them comparable, the face features extracted by all the feature extraction models must be mapped to the same feature space, where the feature comparison is performed.
Disclosure of Invention
The application provides a face retrieval method and a face retrieval apparatus, which select a suitable feature space by using the combined effect of a plurality of feature extraction models and thereby improve the accuracy of face retrieval.
In a first aspect, the present application provides a face retrieval method, which may be applied to scenarios such as identity recognition and identity verification. The face retrieval method may include: acquiring a face image to be retrieved, where the face image may be captured by a camera or uploaded manually by a user; performing feature extraction on the face image through a first feature extraction model to obtain a first face feature; inputting the face image and the first face feature into a first feature mapping model to obtain an output standard feature corresponding to the first face feature, where the first feature mapping model is trained according to target features corresponding to face sample images, and the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model; and performing face retrieval on the face image according to the standard feature.
In the present application, the first face feature of a face image may be divided into structured features and unstructured features. A structured feature represents a face attribute, that is, a specific physical property of the face image, such as age, gender, and/or angle, extracted through a structured feature extraction model. An unstructured feature is a vector without specific physical meaning, composed of a string of numbers (also called a feature vector) extracted through an unstructured feature extraction model; the similarity between feature vectors can represent the similarity between the face image to be retrieved and a face template image.
In the present application, the face feature and the face image are jointly used as the input of the feature mapping model. When a suitable standard feature is difficult to obtain from the face feature alone, the additional information provided by the face image yields a more suitable standard feature, improving the accuracy of face retrieval.
In the present application, the features extracted by a plurality of feature extraction models are spliced, and the spliced features serve as the basis for constructing the standard features, so the face retrieval system can select a suitable feature space by using the combined effect of the feature extraction models, improving the accuracy of face retrieval. Furthermore, each face image only needs to pass through one feature extraction model and one feature mapping model to obtain its standard feature, so the computation of the system does not grow multiplicatively with the number of models. Finally, the feature mapping models correspond one-to-one to the feature extraction models, so the face retrieval system does not need to train a huge number of feature mapping models, which further reduces the computation of the system. A sketch of this retrieval flow follows.
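As an illustration of the method of this first aspect, the following Python sketch walks through the steps under stated assumptions: `extract_model`, `mapping_model`, the gallery layout, and the threshold value are hypothetical stand-ins, not details fixed by the application.

```python
# Minimal sketch of the retrieval flow, assuming callable models.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # similarity between two feature vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(face_image, extract_model, mapping_model, gallery, threshold=0.8):
    """gallery: list of (image_id, standard_feature) pairs whose standard
    features were produced by a mapping model trained toward the same
    target features."""
    first_feature = extract_model(face_image)                    # feature extraction
    standard_feature = mapping_model(face_image, first_feature)  # feature mapping
    hits = [(image_id, cosine_similarity(standard_feature, feature))
            for image_id, feature in gallery]
    # keep candidates above the first threshold, most similar first
    hits = [h for h in hits if h[1] > threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)
```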
Based on the first aspect, in some possible embodiments, the target features corresponding to a face sample image are obtained by splicing a plurality of face sample features, and the face sample features are obtained by performing feature extraction on the face sample image with a plurality of second feature extraction models. The plurality of second feature extraction models includes the first feature extraction model, and the second feature extraction models differ from one another at least in training samples, model structure, training strategy, or feature dimension.
In the present application, the plurality of second feature extraction models may come from one vendor or from different vendors. The plurality of second feature extraction models may include the first feature extraction model; that is, the second feature extraction models may include other feature extraction models in addition to the first feature extraction model. The training samples, model structures, training strategies, or feature dimensions may differ between the second feature extraction models.
Based on the first aspect, in some possible embodiments, the method further includes: acquiring a face sample image; inputting the face sample image into the first feature extraction model to obtain an output first face sample feature; and training a second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model, where the second feature mapping model corresponds to the first feature extraction model.
In the present application, the second feature mapping model may be trained on samples by one vendor alone, or jointly by that vendor and other vendors. Specifically, the input of the second feature mapping model is the face sample image and the first face sample feature, and its output is a standard feature. The optimization goal of the training is to fit the standard feature to the target feature as closely as possible. The target feature is obtained by splicing a plurality of face sample features, which are extracted from the face sample image by the vendor's first feature extraction model and by a plurality of second feature extraction models from the same or different vendors. When the training converges, the trained second feature mapping model is the first feature mapping model.
Based on the first aspect, in some possible embodiments, before the second feature mapping model is trained according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model satisfying the objective function, the method further includes: acquiring a face sample image; inputting the face sample image into N second feature extraction models to obtain N output second face sample features, where N is a positive integer greater than or equal to 2; multiplying the N second face sample features by N preset coefficients in one-to-one correspondence; splicing the multiplied N second face sample features to obtain a spliced face sample feature; and acquiring the target feature corresponding to the face sample image according to the spliced face sample feature, where the dimension of the target feature is less than or equal to the sum of the feature dimensions of the N second feature extraction models.
Based on the first aspect, in some possible embodiments, the method further includes: acquiring a face sample image that carries identity information; inputting the face sample image into the N second feature extraction models to obtain the N output second face sample features; and performing face recognition on the face sample image according to the N second face sample features and the identity information to obtain the N preset coefficients.
Based on the first aspect, in some possible embodiments, the method further includes: configuring N equal preset coefficients for the N second feature extraction models; or configuring the N preset coefficients for the N second feature extraction models according to a preset judgment criterion.
Based on the first aspect, in some possible embodiments, the method further includes: acquiring coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range; multiplying the N second face sample features by a coefficient combination in one-to-one correspondence; splicing the multiplied N second face sample features to obtain a spliced face sample feature; and performing face retrieval on the face sample image according to the spliced face sample feature to obtain, among the coefficient combinations, the preset coefficients that satisfy a preset condition. A sketch of this coefficient search appears below.
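A minimal sketch of this coefficient search, assuming a labelled set of face sample images and a hypothetical `evaluate_retrieval` scoring function (for example, top-1 retrieval accuracy); the grid values and the preset condition are illustrative only.

```python
import itertools
import numpy as np

def search_coefficients(model_features, labels, evaluate_retrieval,
                        grid=(0.5, 1.0, 1.5), preset_condition=0.95):
    """model_features: list of N arrays, each (num_samples, d_i), one per
    second feature extraction model; labels: identity of each sample."""
    for coefficients in itertools.product(grid, repeat=len(model_features)):
        # multiply features by the candidate coefficients, then splice
        spliced = np.concatenate(
            [c * f for c, f in zip(coefficients, model_features)], axis=1)
        # face retrieval on the face sample images with the spliced features
        if evaluate_retrieval(spliced, labels) >= preset_condition:
            return coefficients  # preset condition satisfied
    return None
```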
Based on the first aspect, in some possible embodiments, obtaining the target feature corresponding to the face sample image according to the spliced face sample feature includes: reducing the dimension of the spliced face sample feature; and determining the dimension-reduced face sample feature as the target feature corresponding to the face sample image. A sketch of this step follows.
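A sketch of the target-feature construction under assumptions: PCA stands in for the dimension-reduction step, which the application does not pin to a specific method.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_target_features(model_features, coefficients, target_dim):
    """model_features: list of N arrays of shape (num_samples, d_i).
    target_dim must not exceed sum(d_i), matching the dimension
    constraint stated above."""
    spliced = np.concatenate(
        [c * f for c, f in zip(coefficients, model_features)], axis=1)
    assert target_dim <= spliced.shape[1]
    # dimension-reduced face sample features become the target features
    return PCA(n_components=target_dim).fit_transform(spliced)
```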
Based on the first aspect, in some possible embodiments, the second feature mapping model includes a unique module and a shared module;
correspondingly, training the second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model includes: inputting the face sample image and a plurality of first face sample features into the unique module to obtain output third face sample features, where the plurality of first face sample features are extracted from the face sample image through a plurality of different first feature extraction models; inputting the third face sample features into the shared module to obtain standard features corresponding to the plurality of first face sample features; and training the unique module and the shared module according to the face sample image, the first face sample features, the standard features corresponding to the first face sample features, and the target feature corresponding to the face sample image to obtain the first feature mapping model. An illustrative sketch of this module structure follows.
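An illustrative PyTorch sketch of the unique-module / shared-module structure: one unique module per feature extraction model, and a single shared module mapping all third face sample features to standard features. Layer types and sizes are assumptions for illustration, not details specified by the application.

```python
import torch
import torch.nn as nn

class UniqueModule(nn.Module):
    """Per-extraction-model module: face sample image (here, a precomputed
    image embedding) plus first face sample feature in, third face sample
    feature out."""
    def __init__(self, image_dim, feature_dim, hidden_dim):
        super().__init__()
        self.fc = nn.Linear(image_dim + feature_dim, hidden_dim)

    def forward(self, image_embedding, first_feature):
        x = torch.cat([image_embedding, first_feature], dim=1)
        return torch.relu(self.fc(x))

class SharedModule(nn.Module):
    """Single module shared by all extraction models: third face sample
    feature in, standard feature out."""
    def __init__(self, hidden_dim, standard_dim):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, standard_dim)

    def forward(self, third_feature):
        return self.fc(third_feature)
```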
Based on the first aspect, in some possible embodiments, the second feature mapping model includes an image branch module, a feature branch module, and a synthesis module;
correspondingly, training the second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model includes: inputting the face sample image into the image branch module to obtain an output fourth face sample feature; inputting the first face sample feature into the feature branch module to obtain an output fifth face sample feature, where the first face sample feature is extracted from the face sample image through the first feature extraction model; inputting the fourth face sample feature and the fifth face sample feature together into the synthesis module to obtain the standard feature corresponding to the first face sample feature; and training the image branch module, the feature branch module, and the synthesis module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model. An illustrative sketch of this branch structure follows.
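An illustrative PyTorch sketch of the image-branch / feature-branch / synthesis structure; the convolutional backbone and layer sizes are assumptions, not details specified by the application.

```python
import torch
import torch.nn as nn

class BranchMappingModel(nn.Module):
    def __init__(self, feature_dim, standard_dim):
        super().__init__()
        # image branch module: face sample image -> fourth face sample feature
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128))
        # feature branch module: first face sample feature -> fifth feature
        self.feature_branch = nn.Sequential(nn.Linear(feature_dim, 128), nn.ReLU())
        # synthesis module: (fourth, fifth) -> standard feature
        self.synthesis = nn.Linear(256, standard_dim)

    def forward(self, face_image, first_feature):
        fourth = self.image_branch(face_image)
        fifth = self.feature_branch(first_feature)
        return self.synthesis(torch.cat([fourth, fifth], dim=1))
```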
Based on the first aspect, in some possible implementations, performing face retrieval on the face image according to the standard feature includes: determining the similarity between the standard feature and the standard feature of a first face sample image, where the first face sample image is any one of a plurality of face sample images; and when the similarity is greater than a first threshold, taking the first face sample image as a retrieval target of the face image.
In a second aspect, the present application provides a face retrieval apparatus, including: an interface module configured to acquire a face image to be retrieved, where the face image may be captured by a camera or uploaded manually by a user; a feature extraction module configured to perform feature extraction on the face image through a first feature extraction model to obtain a first face feature; a feature mapping module configured to input the face image and the first face feature into a first feature mapping model to obtain an output standard feature corresponding to the first face feature, where the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model, and the first feature mapping model is trained according to target features corresponding to face sample images; and a face retrieval module configured to perform face retrieval on the face image according to the standard feature.
Based on the second aspect, in some possible embodiments, the target features corresponding to the face sample image are obtained by splicing a plurality of face sample features, and the face sample features are obtained by performing feature extraction on the face sample image with a plurality of second feature extraction models.
Based on the second aspect, in some possible embodiments, the apparatus further includes a mapping model training module configured to: acquire a face sample image; input the face sample image into the first feature extraction model to obtain an output first face sample feature; and train a second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model, where the second feature mapping models correspond one-to-one to the first feature extraction models.

Based on the second aspect, in some possible embodiments, the apparatus further includes a target feature acquisition module configured to: acquire a face sample image before the mapping model training module obtains the first feature mapping model satisfying the objective function; input the face sample image into N second feature extraction models to obtain N output second face sample features, where N is a positive integer greater than or equal to 2; multiply the N second face sample features by N preset coefficients in one-to-one correspondence; splice the multiplied N second face sample features to obtain a spliced face sample feature; and acquire the target feature corresponding to the face sample image according to the spliced face sample feature, where the dimension of the target feature is less than or equal to the sum of the feature dimensions of the N second feature extraction models.
Based on the second aspect, in some possible embodiments, the target feature acquisition module is further configured to: acquire a face sample image that carries identity information; input the face sample image into the N second feature extraction models to obtain the N output second face sample features; and perform face recognition on the face sample image according to the N second face sample features and the identity information to obtain the N preset coefficients.
Based on the second aspect, in some possible embodiments, the target feature acquisition module is specifically configured to configure N equal preset coefficients for the N second feature extraction models, or to configure the N preset coefficients for the N second feature extraction models according to a preset judgment criterion.
Based on the second aspect, in some possible embodiments, the target feature acquisition module is specifically configured to: acquire coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range; multiply the N second face sample features by a coefficient combination in one-to-one correspondence; splice the multiplied N second face sample features to obtain a spliced face sample feature; and perform face retrieval on the face sample image according to the spliced face sample feature to obtain, among the coefficient combinations, the preset coefficients that satisfy a preset condition.
Based on the second aspect, in some possible embodiments, the target feature acquisition module is further configured to reduce the dimension of the spliced face sample feature, and to determine the dimension-reduced face sample feature as the target feature corresponding to the face sample image.
Based on the second aspect, in some possible embodiments, the second feature mapping model includes a unique module and a shared module;
correspondingly, the mapping model training module is further configured to: input the face sample image and a plurality of first face sample features into the unique module to obtain output third face sample features, where the plurality of first face sample features are extracted from the face sample image through a plurality of different first feature extraction models; input the third face sample features into the shared module to obtain standard features corresponding to the plurality of first face sample features; and train the unique module and the shared module according to the face sample image, the first face sample features, the standard features corresponding to the first face sample features, and the target feature corresponding to the face sample image to obtain the first feature mapping model.
Based on the second aspect, in some possible embodiments, the second feature mapping model includes an image branch module, a feature branch module, and a synthesis module;
correspondingly, the mapping model training module is further configured to: input the face sample image into the image branch module to obtain an output fourth face sample feature; input the first face sample feature into the feature branch module to obtain an output fifth face sample feature, where the first face sample feature is extracted from the face sample image through the first feature extraction model; input the fourth face sample feature and the fifth face sample feature together into the synthesis module to obtain the standard feature corresponding to the first face sample feature; and train the image branch module, the feature branch module, and the synthesis module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model.
Based on the second aspect, in some possible embodiments, the face retrieval module is specifically configured to: determine the similarity between the standard feature and the standard feature of a first face sample image, where the first face sample image is any one of a plurality of face sample images; and when the similarity is greater than a first threshold, take the first face sample image as a retrieval target of the face image.
The interface module mentioned in the second aspect may be a receiving interface, a receiving circuit, a receiver, or the like; the feature extraction module, the feature mapping module, the face retrieval module, the mapping model training module, and the target feature acquisition module may be implemented by one or more processors.
In a third aspect, the present application provides a face retrieval device, which may include a processor and a communication interface. The processor may be configured to support the face retrieval device in implementing the functions described in the first aspect or any possible implementation of the first aspect; for example, the processor may acquire the face image to be retrieved through the communication interface.
In some possible embodiments, the face retrieval device may further include a memory for storing the computer-executable instructions and data necessary for the face retrieval device. When the face retrieval device runs, the processor executes the computer-executable instructions stored in the memory to cause the face retrieval device to perform the face retrieval method according to the first aspect or any one of its possible embodiments.
In a fourth aspect, the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the face retrieval method according to any one of the implementations of the first aspect.
In a fifth aspect, the present application provides a computer program or computer program product that, when executed on a computer, causes the computer to implement the face retrieval method according to any one of the implementations of the first aspect.
It should be understood that the second to fifth aspects of the present application are consistent with the technical solution of the first aspect, the beneficial effects obtained by these aspects and their corresponding implementations are similar, and details are not described again.
Drawings
To describe the technical solutions in the embodiments or the background of the present application more clearly, the drawings used in the embodiments or the background of the present application are described below.
FIG. 1 is a schematic flowchart of a face retrieval method;
FIG. 2 is a schematic flowchart of another face retrieval method;
FIG. 3 is a schematic flowchart of another face retrieval method;
FIG. 4 is a schematic diagram of face features in an embodiment of the present application;
FIG. 5 is a schematic flowchart of a face retrieval method in an embodiment of the present application;
FIG. 6 is a schematic flowchart of a method for training a first feature mapping model in an embodiment of the present application;
FIG. 7 is a schematic flowchart of a method for training a feature mapping model in an embodiment of the present application;
FIG. 8 is a schematic flowchart of a method for obtaining target features corresponding to a face sample image in an embodiment of the present application;
FIG. 9 is a schematic flowchart of a method for training the unique module and the shared module in an embodiment of the present application;
FIG. 10 is a schematic flowchart of a method for performing feature mapping with a first feature mapping model in an embodiment of the present application;
FIG. 11 is a schematic flowchart of a method for training the image branch module, the feature branch module, and the synthesis module in an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a face retrieval apparatus in an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a face retrieval device in an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. In the following description, reference is made to the accompanying drawings which form a part hereof and in which is shown by way of illustration specific aspects of embodiments of the present application or in which specific aspects of embodiments of the present application may be employed. It should be understood that embodiments of the present application may be used in other ways and may include structural or logical changes not depicted in the drawings. For example, it should be understood that the disclosure in connection with the described methods may equally apply to the corresponding apparatus or system for performing the methods, and vice versa. For example, if one or more particular method steps are described, the corresponding apparatus may comprise one or more units, such as functional units, to perform the described one or more method steps (e.g., a unit performs one or more steps, or multiple units, each of which performs one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a particular apparatus is described based on one or more units, such as functional units, the corresponding method may comprise one step to perform the functionality of the one or more units (e.g., one step performs the functionality of the one or more units, or multiple steps, each of which performs the functionality of one or more of the plurality of units), even if such one or more steps are not explicitly described or illustrated in the figures. Further, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.
The embodiment of the application provides a face retrieval method that can be widely applied in identity recognition, identity verification, and similar scenarios. The face retrieval system does not directly compute the similarity between the face image to be retrieved and the face images in the face library; it represents every image as features and computes similarities between those features to find the most similar face image or images. Generally, a face retrieval system contains a plurality of feature extraction models, every face image must have features extracted by all of them, and the features extracted from the same face image are spliced, or compressed by a dimension-reduction method, to obtain the final output feature. FIG. 1 is a schematic flowchart of such a face retrieval method. As shown in FIG. 1, the face retrieval system contains different feature extraction models A, B, and C. Face image 1 and face image 2 are each input into the three feature extraction models to obtain features A1, B1, and C1 of face image 1 and features A2, B2, and C2 of face image 2. The face retrieval system then splices the three features of each face image to obtain the final output features, namely the face image 1 feature and the face image 2 feature, and finally compares the two, thereby completing the face retrieval. However, since each face image must pass through all the feature extraction models, the computation of the system is multiplied by the number of feature extraction models. Optionally, the feature extraction models A, B, and C may be feature extraction models with different feature dimensions, or a plurality of feature extraction models with the same function and the same feature dimension.
The embodiment of the application also provides another face retrieval method that uses feature mapping models: the face retrieval system maps the features extracted by one feature extraction model (the source-domain features) into the feature space corresponding to another feature extraction model (the target-domain feature space) and compares features in that space, thereby realizing mutual search between features and completing the face retrieval. For example, FIG. 2 is a schematic flowchart of another face retrieval method. As shown in FIG. 2, face image 1 passes through feature extraction model A to obtain feature A (the face image 1 feature), while face image 2 passes through feature extraction models B and C to obtain features B and C. Features B and C are mapped through feature mapping models into the feature space corresponding to model A, yielding face image 2 feature 1 and face image 2 feature 2, which are compared with the face image 1 feature (feature A) in that space. Feature A forms two pairs with face image 2 feature 1 and face image 2 feature 2, and each pair is compared in the feature space corresponding to model A, thereby realizing mutual search between features and completing the face retrieval. Because all features are mapped to the feature space corresponding to feature extraction model A, feature A needs no mapping and can be directly compared, as the face image 1 feature, with the face image 2 features. To allow comparison between any two kinds of features, a feature mapping model must be trained for each pair of feature types; when the number of feature extraction models is n (n ≥ 2), the number of feature mapping models is $\binom{n}{2}$, which can be expressed as equation (1):

$$\binom{n}{2} = \frac{n(n-1)}{2} \tag{1}$$
It can be seen that a large number of feature mapping models need to be trained in such a face retrieval system. Further, since a feature mapping model takes only the source-domain features as input, the effect of the feature mapping cannot be guaranteed when the expression capability of the source-domain features is poor.
The embodiment of the application further provides a face retrieval method in which the different face features extracted by different feature extraction models, which cannot be compared directly, are all mapped by the face retrieval system to the same feature space, where the feature comparison is performed; this realizes mutual search between features and completes the face retrieval. For example, FIG. 3 is a schematic flowchart of such a face retrieval method. As shown in FIG. 3, the face retrieval system inputs face image 1 into feature extraction model A to obtain feature A, inputs face image 2 into feature extraction model B to obtain feature B, and inputs face image 3 into feature extraction model C to obtain feature C. The system then maps features B and C into the same feature space (assumed to be the space corresponding to feature extraction model A) to obtain the face image 2 feature and the face image 3 feature. Since all features are mapped into the feature space corresponding to feature extraction model A, feature A needs no mapping and can be compared directly, as the face image 1 feature, with the face image 2 and face image 3 features; the face image 2 and face image 3 features can also be compared directly with each other in that space. This realizes mutual search between features and completes the face retrieval. However, the representation capabilities of feature extraction models A, B, and C differ across scenarios; when no single feature extraction model shows a clear advantage in every scenario, choosing which model's feature space should serve as the final mapped feature space becomes a problem. Further, since all features are finally mapped to the feature space corresponding to a single feature extraction model, the combined effect of the plurality of feature extraction models is not exploited.
To solve the above problems, an embodiment of the present application provides a face retrieval method, which may be applied to the above face retrieval system.
It should be noted that FIG. 4 is a schematic diagram of face features in an embodiment of the present application. As shown in FIG. 4, the face features in a face image may be divided into structured features and unstructured features. A structured feature represents a face attribute, that is, a specific physical property of the face image, such as age, gender, or angle, extracted through a structured feature extraction model. An unstructured feature is a vector without specific physical meaning, composed of a string of numbers (also called a feature vector) extracted through an unstructured feature extraction model; the similarity between feature vectors can represent the similarity between the face image to be retrieved and a face template image.
The structured feature extraction model and the unstructured feature extraction model mentioned above are both machine learning models (for example, convolutional neural networks (CNNs)). A CNN is essentially a mapping from input to output; it can learn a large number of mapping relationships between inputs and outputs without requiring any precise mathematical expression between them.
In some possible embodiments, the "feature" described in the following embodiments may be an unstructured feature, or may be a spliced feature obtained by splicing a structured feature and an unstructured feature, and the "feature extraction model" may be an unstructured feature extraction model, or may be a model combination composed of a structured feature extraction model and an unstructured feature extraction model, which is not specifically limited in this embodiment.
FIG. 5 is a schematic flowchart of a face retrieval method in an embodiment of the present application. As shown in FIG. 5, the method may include:
S501: acquiring a face image to be retrieved;
In the embodiment of the application, the face image acquired by the face retrieval system may be an image directly captured by the retrieval system, such as an image shot by a camera of the face retrieval system; an image manually input by a user (for example, when the user needs to retrieve a target person, an image of the target person is input directly into the face retrieval system); or an image of a person in the gallery of the face retrieval system.
In the embodiment of the application, the face retrieval system receives the input face image to be retrieved. Optionally, the face retrieval system may also receive input base-library images (that is, face template images). A face template image is compared with the face image, which realizes the mutual search of the features of the face images and completes the face retrieval.
S502: inputting the face image into a first feature extraction model to obtain a first face feature;
In the embodiment of the application, the face retrieval device may train the first feature extraction model with a large number of face sample images, so that the first face feature of a face image is obtained after the face retrieval device inputs the face image into the first feature extraction model. Optionally, the first feature extraction model may be an unstructured feature extraction model, or a model combination composed of a structured feature extraction model and an unstructured feature extraction model.
S503: inputting the face image and the first face feature into a first feature mapping model for feature mapping to obtain an output standard feature corresponding to the first face feature;
In this embodiment of the application, the face retrieval device may train, in advance, a first feature mapping model for the first feature extraction model using face sample images, where the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model. The face retrieval device may then input the face image to be retrieved and its corresponding first face feature into the first feature mapping model for feature mapping to obtain the output standard feature.
It should be noted that the first feature mapping model is trained according to face sample images and the target features corresponding to the face sample images. Optionally, the target features corresponding to a face sample image are obtained by splicing a plurality of face sample features, and the face sample features are obtained by performing feature extraction on the face sample image with a plurality of second feature extraction models. The plurality of second feature extraction models includes the first feature extraction model, and the second feature extraction models differ at least in training samples, model structure, training strategy, or feature dimension.
As a possible implementation, the first feature extraction model and a second feature extraction model may be the same feature extraction model or different feature extraction models. A second feature extraction model may come from the same vendor or from a different vendor, and the second feature extraction models may include other feature extraction models in addition to the first feature extraction model. The training samples, model structures, training strategies, or feature dimensions may also differ between second feature extraction models of different vendors. For example, the feature dimension of vendor A's second feature extraction model may be 256, while that of vendor B's is 512.
S504: and according to the standard characteristics, carrying out face retrieval on the face image.
In the embodiment of the application, after obtaining the standard feature of the face image to be retrieved through S503, the face retrieval device may directly compare this standard feature with the standard features of a plurality of face template images to find the most similar ones, thereby obtaining one or more face template images, realizing mutual feature search, and completing the face retrieval.
It should be noted that the plurality of face template images may be input into the face retrieval device together with the face image to be retrieved, with S501 to S503 executed in sequence to extract their features for comparison with the standard feature of the face image to be retrieved. Alternatively, the face template images may be input into the face retrieval device in advance, their features extracted, and the standard features corresponding to all face template images stored; the stored standard features are then read and compared once the standard feature of the face image to be retrieved is obtained, completing the face retrieval. Of course, feature extraction and feature comparison of the face image to be retrieved and the face template images may also be carried out in other ways, as long as the face retrieval can be completed; the embodiment of the application is not specifically limited. A sketch of the advance-extraction workflow appears below.
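A sketch of the advance-extraction workflow under assumptions (the storage format and the names are illustrative): standard features of the face template images are computed once and stored, then read back for comparison once the query's standard feature is available.

```python
import numpy as np

def precompute_gallery(template_images, extract_model, mapping_model, path):
    # extract and map every face template image, then persist the result
    standard_features = np.stack(
        [mapping_model(img, extract_model(img)) for img in template_images])
    np.save(path, standard_features)

def load_gallery(path):
    # read the stored standard features back for later comparison
    return np.load(path)
```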
Next, in S504, a face retrieval process is performed on the face image based on the standard features.
When the face retrieval device retrieves face images according to the standard feature, the standard feature needs to be compared with the standard features of a plurality of face sample images, which may be provided by the same vendor or by other vendors. The standard feature of the face sample image currently being compared is taken as the standard feature of the first face sample image; the similarity between the standard feature and the standard feature of the first face sample image is determined; and when the similarity is greater than a first threshold, the first face sample image is taken as the retrieval target of the face image.
The method can realize mutual feature search between different models of the same vendor, as well as mutual feature search between different vendors.
The following describes the training process of the first feature mapping model in S503.
FIG. 6 is a schematic flowchart of a method for training a first feature mapping model in an embodiment of the present application. Referring to FIG. 6, the method may include:
S601: acquiring a face sample image;
In the embodiment of the application, the face retrieval device obtains an input face sample image, which may be one or more images among a large number of face sample images.
S602: inputting the face sample image into a first feature extraction model to obtain an output first face sample feature;
In the embodiment of the application, the face retrieval device inputs the face sample image into the first feature extraction model to obtain the first face sample feature of the face sample image. The first face sample feature may be an unstructured feature of the face sample image, or a spliced feature formed by splicing a structured feature and an unstructured feature of the face sample image.
S603: and training the second feature mapping model according to the face sample image, the first face sample feature and the target feature corresponding to the face sample image to obtain a first feature mapping model.
The second feature mapping model corresponds to the first feature extraction model.
In the embodiment of the application, the face retrieval device may input the face sample image into a plurality of different feature extraction models to obtain a plurality of output face sample features, multiply each face sample feature by its corresponding preset coefficient, splice the multiplied face sample features, and finally obtain, from the spliced face sample features, the target feature corresponding to the face sample image (which may also be called the target standard feature). The face retrieval device then inputs the face sample image and the corresponding first face sample feature into the second feature mapping model to obtain an output standard sample feature, and maximizes the similarity between the standard sample feature and the target feature by adjusting the parameters of the second feature mapping model. The optimization objective function may be that the cosine similarity between the standard sample feature and the target feature is as large as possible. When the objective function converges, the second feature mapping model is trained, and the trained second feature mapping model is the first feature mapping model. A minimal training sketch follows.
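A minimal PyTorch training sketch for this objective, assuming a data loader that yields (face sample image, first face sample feature, target feature) triples; the optimizer, learning rate, and epoch count are assumptions rather than details fixed by the application.

```python
import torch
import torch.nn.functional as F

def train_mapping_model(mapping_model, data_loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(mapping_model.parameters(), lr=lr)
    for _ in range(epochs):
        for image, first_feature, target_feature in data_loader:
            standard_feature = mapping_model(image, first_feature)
            # objective: cosine similarity between the standard sample
            # feature and the target feature as large as possible
            loss = -F.cosine_similarity(
                standard_feature, target_feature, dim=1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return mapping_model
```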
In this embodiment of the application, the second feature mapping model may be trained on samples by one vendor alone, or jointly by that vendor and other vendors. Specifically, the face retrieval device of the vendor, or of the vendor together with other vendors, inputs the face sample image into a plurality of different feature extraction models to obtain a plurality of output face sample features, multiplies each face sample feature by its corresponding preset coefficient, splices the multiplied face sample features, and finally obtains the target feature corresponding to the face sample image from the spliced face sample features. The vendor's face retrieval device then inputs the face sample image and the corresponding first face sample feature into the second feature mapping model to obtain an output standard sample feature, and maximizes the similarity between the standard sample feature and the target feature (obtained by the vendor alone or in cooperation with other vendors) by adjusting the parameters of the second feature mapping model. The optimization objective function may again be that the cosine similarity between the standard sample feature and the target feature is as large as possible; when the objective function converges, the trained second feature mapping model is the first feature mapping model.
The solution of the present application can be used for mutual feature search between models of different versions from the same vendor. For example, suppose vendor A has deployed a face retrieval system and now needs to upgrade an old feature extraction model to a new one. Mutual feature search between the new and old models can be completed as follows: train new and old feature mapping models matched with the new and old feature extraction models; input the base-library images and the features extracted by the old feature extraction model into the old feature mapping model to obtain the standard features of the base-library images; input an image to be retrieved into the new feature extraction model to extract features, then input the extracted features and the image to be retrieved into the new feature mapping model to obtain the standard features of the image to be retrieved; and retrieve by comparing the standard features of the image to be retrieved with the standard features of the base-library images, ranking the more similar base-library images first.
The solution of the present application can also be used for mutual feature search between models of the same vendor running on different devices. For example, suppose the model vendor A runs on a central server is a large feature extraction model (with a more complex structure), while the model running on a camera is a small feature extraction model (with a lighter structure). Mutual feature search between the large and small models can be completed as follows: train large and small feature mapping models matched with the large and small feature extraction models; extract features from the images stored on the central server with the large feature extraction model, and input the features and images into the large feature mapping model to obtain the standard features of the central-server images; extract features from the images stored on the camera with the small feature extraction model, and input the features and images into the small feature mapping model to obtain the standard features of the camera images; and compute the similarity between the standard features of the central-server images and those of the camera images, using it for retrieval and ranking.
The solution of the present application can also be used for mutual feature search between models of different vendors. For example, suppose vendor A's model is feature extraction model A and vendor B's model is feature extraction model B. Mutual feature search between models A and B can be completed as follows: train feature mapping models A and B matched with feature extraction models A and B; extract features from the images assigned to vendor A with feature extraction model A, and input the features and images into feature mapping model A to obtain the standard features of vendor A's images; extract features from the images assigned to vendor B with feature extraction model B, and input the features and images into feature mapping model B to obtain the standard features of vendor B's images; and compare the standard features of vendor A's images with those of vendor B's images to obtain their similarity, using it for retrieval and ranking.
The plurality of feature extraction models may be unstructured feature extraction models, or may be a combination of a structured feature extraction model and an unstructured feature extraction model, and the embodiment of the present application is not particularly limited.
For example, fig. 7 is a flowchart illustrating a method for training a feature mapping model in an embodiment of the present application. Referring to fig. 7, in the training stage, the face retrieval device first obtains a target feature A corresponding to a face sample image, inputs the face sample image into a feature extraction model, and obtains the face sample feature output by the feature extraction model. The face retrieval device then inputs the face sample image and the face sample feature into the feature mapping model corresponding to the feature extraction model to obtain the output standard sample feature B. Next, the face retrieval device calculates the similarity between the target feature A and the standard sample feature B and, according to the optimization objective, namely that the cosine similarity between features A and B should be as large as possible, adjusts each parameter in the feature mapping model until the objective function converges, at which point the training of the feature mapping model is completed.
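The patent does not prescribe any framework for this training step; as a minimal sketch only, assuming a PyTorch setting and a placeholder `MappingNet` architecture (all layer sizes, dimensions, and names here are hypothetical), the step described above might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MappingNet(nn.Module):
    """Hypothetical mapping model: fuses a face image and an extracted
    feature into a standard feature (all dimensions are assumptions)."""
    def __init__(self, feat_dim=256, std_dim=256):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 128), nn.ReLU(),
        )
        self.head = nn.Linear(128 + feat_dim, std_dim)

    def forward(self, image, feature):
        return self.head(torch.cat([self.image_branch(image), feature], dim=1))

model = MappingNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(image, extracted_feature, target_feature_a):
    # Standard sample feature B produced by the mapping model
    standard_feature_b = model(image, extracted_feature)
    # Objective: cosine similarity between A and B as large as possible,
    # so minimize its negation
    loss = -F.cosine_similarity(target_feature_a, standard_feature_b, dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```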
In some possible embodiments, the cosine similarity of two features can be calculated by equation (2):

$$\mathrm{sim}(A,B) = \frac{\sum_{i=1}^{k} A_i B_i}{\sqrt{\sum_{i=1}^{k} A_i^{2}}\,\sqrt{\sum_{i=1}^{k} B_i^{2}}} \tag{2}$$

wherein $A_i$ and $B_i$ respectively represent the components of the features A and B, k is the number of components of the features A and B, and k is a positive integer.
Of course, besides cosine similarity, other functions capable of measuring the similarity between features (such as Euclidean similarity or Euclidean distance) may optionally be used as the objective function; the embodiment of the present application is not particularly limited in this respect.
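For reference, a direct transcription of equation (2), assuming the features are plain vectors of equal length:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Equation (2): dot product divided by the product of the L2 norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```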
In some possible embodiments, the face retrieval device may obtain the target feature corresponding to a face sample image as follows. First, the face retrieval device acquires a face sample image; optionally, the face retrieval device obtains an input face sample image, where the face sample image may be one or more images from a large set of face sample images. Then, the face retrieval device inputs the face sample image into N second feature extraction models to obtain N output second face sample features, where N is a positive integer greater than or equal to 2. Optionally, the face retrieval device inputs the face sample image into N different second feature extraction models; the second feature extraction models may be unstructured feature extraction models or model combinations composed of structured and unstructured feature extraction models, and the embodiment of the present application is not particularly limited. Through the N second feature extraction models, the face retrieval device obtains N output second face sample features, each second feature extraction model outputting one second face sample feature. Next, the face retrieval device multiplies the N second face sample features by N preset coefficients in a one-to-one correspondence. In this way, because the second feature extraction models differ in capability, the features extracted by different models are given different coefficients, so that each feature extraction model contributes in proportion to its capability. The face retrieval device then splices the multiplied N second face sample features to obtain a spliced face sample feature, and obtains the target feature corresponding to the face sample image from the spliced face sample feature, where the dimension of the target feature is less than or equal to the sum of the output dimensions of the N second feature extraction models. By splicing the features produced by the feature extraction models and using the spliced feature as the basis for constructing the target feature, the available information is maximized. A minimal sketch of this weighting-and-splicing step is given below.
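A minimal sketch of the weighted splicing, assuming N = 3 feature vectors and the illustrative coefficient values used in the fig. 8 example further below (real features would of course be much higher-dimensional):

```python
import numpy as np

# N second face sample features from N second feature extraction models
second_features = [
    np.array([0.04, 0.08]),   # from model A
    np.array([0.06, 0.03]),   # from model B
    np.array([0.05, 0.05]),   # from model C
]
preset_coefficients = [0.3, 0.5, 0.2]  # one coefficient per model

# Multiply each feature bitwise by its coefficient, then splice (concatenate)
weighted = [w * f for w, f in zip(preset_coefficients, second_features)]
spliced_feature = np.concatenate(weighted)
print(spliced_feature)  # [0.012 0.024 0.03  0.015 0.01  0.01 ]
```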
In some optional embodiments, after a batch of data is prepared, the face retrieval device may obtain a dimension-reduction matrix using, for example, a Principal Component Analysis (PCA) algorithm, a Linear Discriminant Analysis (LDA) algorithm, or an Auto Encoder (AE). Then, for any spliced face sample feature, the dimension-reduced face sample feature is obtained by multiplying it by the dimension-reduction matrix, and the dimension-reduced face sample feature is determined as the target feature corresponding to the face sample image. In the embodiment of the application, reducing the dimension of the spliced features on one hand shortens the feature comparison time, thereby improving retrieval efficiency, and on the other hand removes redundant information, thereby improving the robustness of the standard features.
It should be noted that an AE is a neural network that uses a back-propagation algorithm to make the output value approximate the input value, and it comprises an encoder and a decoder. For the embodiment of the present application, the encoder first compresses the input spliced feature into a latent space representation whose dimension is lower than that of the input spliced feature; the decoder then reconstructs an output (of the same dimension as the input spliced feature) from the latent space representation, and the output should be as close as possible to the input spliced feature. To achieve this, the objective function may be set to the cosine similarity between the input spliced feature and the reconstructed output. After the form of the objective function is determined, different face sample features are input, the gradient of the objective function with respect to the parameters of the encoder and decoder is calculated, and those parameters are adjusted based on the gradient until the change in the objective function after an update is smaller than a set value (i.e., the objective function converges); the parameters of the encoder and decoder are thus determined. The encoder, using the parameters obtained by training, can then perform the dimension reduction of the input spliced feature on its own.
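A minimal autoencoder sketch under these assumptions (PyTorch; the input and latent dimensions and layer widths are illustrative, and cosine similarity is used as the reconstruction objective as described above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAE(nn.Module):
    def __init__(self, in_dim=768, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        z = self.encoder(x)          # latent representation (lower dimension)
        return z, self.decoder(z)    # reconstruction, same dimension as x

ae = FeatureAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

def ae_step(spliced_feature):
    z, recon = ae(spliced_feature)
    # Maximize cosine similarity between input and reconstruction
    loss = -F.cosine_similarity(spliced_feature, recon, dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# After convergence, ae.encoder(spliced_feature) yields the target feature.
```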
For example, fig. 8 is a schematic flow chart of a method for obtaining the target feature corresponding to a face sample image in the embodiment of the present application. As shown in fig. 8, second feature extraction models A, B and C exist in the face retrieval device, with preset coefficients w_a = 0.3, w_b = 0.5 and w_c = 0.2, respectively. The face sample features obtained by inputting the face sample image into the second feature extraction models A, B and C are F_A = [0.04, …, 0.08], F_B = [0.06, …, 0.03] and F_C = [0.05, …, 0.05], respectively. The face retrieval device may multiply each face sample feature F_A, F_B and F_C bitwise by the corresponding coefficient w_a, w_b and w_c, i.e., weight the face sample features with the preset coefficients. Specifically, multiplying each bit of F_A by w_a yields WF_A = [0.012, …, 0.024], multiplying each bit of F_B by w_b yields WF_B = [0.03, …, 0.015], and multiplying each bit of F_C by w_c yields WF_C = [0.01, …, 0.01]. Next, the face retrieval device splices WF_A, WF_B and WF_C to obtain the spliced face sample feature CWF = [0.012, …, 0.024, 0.03, …, 0.015, 0.01, …, 0.01]. Finally, the CWF is multiplied by the dimension-reduction matrix to obtain the target feature SF = [0.03, …, 0.07].
In the embodiment of the present application, the above-mentioned preset coefficients may be obtained in, but are not limited to, the following manners.
In the first manner, the preset coefficients can be treated as learnable parameters and determined by training a face recognition model. Specifically, the face retrieval device may first obtain a face sample image carrying corresponding identity information; the face retrieval device then inputs the face sample image into N second feature extraction models to obtain N output second face sample features; finally, the face retrieval device performs face recognition on the face sample image according to the N second face sample features and the identity information to obtain the N preset coefficients. Optionally, in the process of obtaining the preset coefficients, a face recognition model may be used, whose input data may be the second face sample features extracted by each second feature extraction model together with the corresponding identity information; the optimization objective may be that features of face samples with the same identity information are as close as possible while features of face samples with different identity information are as far apart as possible. After the form of the objective function is determined, the face retrieval device inputs the N second face sample features and corresponding identity information into the face recognition model, calculates the gradient of the objective function with respect to the parameters of the face recognition model and each preset coefficient to be determined, and adjusts both based on the gradient until the change in the objective function after an update is smaller than a set value (i.e., the objective function converges); the values of the preset coefficients to be determined at that point are taken as final, and the preset coefficients are thus obtained.
In some possible embodiments, the objective function for obtaining the preset coefficients may be a triplet loss function as shown in equation (3):

$$L = \sum_{i=1}^{M} \max\left( \left\| f(x_i^{a}) - f(x_i^{p}) \right\|_2^{2} - \left\| f(x_i^{a}) - f(x_i^{n}) \right\|_2^{2} + \alpha,\; 0 \right) \tag{3}$$

wherein M is the number of training samples; $x_i^{a}$ and $f(x_i^{a})$ are a face sample image (the anchor) and its feature; $x_i^{p}$ and $f(x_i^{p})$ are a face sample image with the same identity information as the anchor (a positive sample) and its feature; and $x_i^{n}$ and $f(x_i^{n})$ are a face sample image with identity information different from the anchor (a negative sample) and its feature. α is the desired margin between the distance of a negative pair and the distance of a positive pair: when the negative-pair distance is greater than the positive-pair distance by at least α, the loss value of that triplet is 0; otherwise it is greater than 0.
In the embodiment of the application, minimizing this objective function achieves the goal that features of the same identity are as close as possible while features of different identities are as far apart as possible. It should be noted that the embodiment of the present application does not limit the form of the objective function; any objective function that can be used for training a single face recognition model can be used in the technical solution described in the embodiment of the present application.
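As a hedged sketch of the first manner (coefficients learned jointly under a triplet loss), assuming N = 3 second feature extraction models whose outputs are precomputed; the coefficient parameterization and margin are illustrative choices, not prescribed by the text, and for brevity the sketch learns only the coefficients, whereas the text also adjusts the recognition model's own parameters:

```python
import torch
import torch.nn as nn

N, feat_dim = 3, 256
coeffs = nn.Parameter(torch.ones(N))          # preset coefficients, learnable
triplet = nn.TripletMarginLoss(margin=0.2)    # margin plays the role of alpha
opt = torch.optim.SGD([coeffs], lr=1e-2)

def fuse(per_model_feats):
    # per_model_feats: tensor (N, batch, feat_dim) of second face sample features
    weighted = coeffs.view(N, 1, 1) * per_model_feats
    # splice per image: (batch, N * feat_dim)
    return weighted.permute(1, 0, 2).reshape(per_model_feats.shape[1], -1)

def step(anchor_feats, positive_feats, negative_feats):
    loss = triplet(fuse(anchor_feats), fuse(positive_feats), fuse(negative_feats))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```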
In the second manner, the preset coefficients may be pre-configured for the second feature extraction models; in this case, N preset coefficients of equal value are configured for the N second feature extraction models.
In the third manner, the preset coefficients may be N preset coefficients configured for the N second feature extraction models according to a preset criterion. For example, assuming the second feature extraction models A, B and C in the face retrieval device have retrieval accuracy rates of 0.98, 0.95 and 0.9, respectively, the face retrieval device may set the preset coefficients of the second feature extraction models A, B and C to 0.98, 0.95 and 0.9. As another example, the similarity between face sample images with the same identity information may serve as the criterion. Suppose a feature extraction model is evaluated on a batch of identity information, each identity corresponding to several different face sample images. For each identity, the face retrieval device computes the average similarity S_a between the features that the second feature extraction model extracts from every pair of face sample images; after obtaining S_a for all identities, it computes their average value AS_a. If the AS_a values of the second feature extraction models A, B and C are 0.8, 0.6 and 0.5, respectively, the preset coefficients of the second feature extraction models A, B and C are determined to be 0.8, 0.6 and 0.5.
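A small sketch of the AS_a criterion described above, assuming per-identity feature lists and the cosine similarity of equation (2); all names are illustrative:

```python
import itertools
import numpy as np

def cos_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def average_identity_similarity(features_by_identity):
    # features_by_identity: dict identity -> list of feature vectors produced
    # by one second feature extraction model
    per_identity_means = []
    for feats in features_by_identity.values():
        pair_sims = [cos_sim(a, b) for a, b in itertools.combinations(feats, 2)]
        if pair_sims:
            per_identity_means.append(np.mean(pair_sims))  # S_a per identity
    return float(np.mean(per_identity_means))              # AS_a for the model
```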
In the fourth manner, the preset coefficients may be obtained by a hyper-parameter search, and any method used for hyper-parameter search may be used for the coefficient search. Specifically, the face retrieval device may obtain coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range, multiply the N second face sample features correspondingly by each coefficient combination, and splice the multiplied N second face sample features to obtain a spliced face sample feature; finally, the face retrieval device performs face retrieval on the face sample image according to the spliced face sample features to obtain the preset coefficients in the coefficient combination that satisfy a preset condition. Taking grid search as an example, suppose the face retrieval device contains second feature extraction models A, B and C, each with a coefficient range from 0 to 1, and the range is divided into 10 parts, i.e., 11 candidate coefficients per model: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1. A, B and C thus yield 11 × 11 × 11 = 1331 coefficient combinations. The N face sample features are spliced using each of the 1331 coefficient combinations, face retrieval is performed with the spliced face sample features, and the set of hyper-parameters (i.e., the coefficient combination for A, B and C) with the highest retrieval accuracy is determined as the final preset coefficients.
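A grid-search sketch under these assumptions; the `retrieval_accuracy` callback is a hypothetical stand-in for running face retrieval with the spliced features and scoring the result:

```python
import itertools
import numpy as np

candidates = np.linspace(0.0, 1.0, 11)        # 0, 0.1, ..., 1

def grid_search(second_features, retrieval_accuracy):
    # second_features: list of N arrays, each of shape (num_images, dim_i)
    best_score, best_combo = -1.0, None
    for combo in itertools.product(candidates, repeat=len(second_features)):
        spliced = np.concatenate(
            [w * f for w, f in zip(combo, second_features)], axis=1)
        score = retrieval_accuracy(spliced)   # hypothetical evaluation
        if score > best_score:
            best_score, best_combo = score, combo
    return best_combo, best_score             # 11**3 = 1331 combos for N = 3
```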
Of course, the above manners are merely examples of determining the preset coefficients; the face retrieval device may also determine the preset coefficients in other manners, which is not specifically limited in the embodiment of the present application.
In some possible embodiments, the face retrieval device may further perform joint training on the mappings of a plurality of face sample features, in which case the second feature mapping model includes a unique module and a shared module. Suppose a neural network model comprises 7 layers: the first 4 layers may form the unique module and the last 3 layers the shared module. The shared module and the unique modules are both neural network layers; the difference is that the parameters of a unique module can change more flexibly to adapt to the characteristics of its second face sample features, whereas the parameters of the shared module must process the inputs of several unique modules, comprehensively utilizing all second face sample features, and are therefore more strongly constrained during training. In this way, each unique module can learn what is specific to its features, while the shared module can learn the attributes common to the features of all models.
Accordingly, the above S603 may include: inputting the face sample image and a plurality of first face sample features into the unique module for feature mapping to obtain output third face sample features, where the plurality of first face sample features are extracted from the face sample image by a plurality of different first feature extraction models; inputting the third face sample features into the shared module to obtain the standard features corresponding to the plurality of first face sample features; and training the unique module and the shared module according to the face sample image, the first face sample features, the standard features corresponding to the first face sample features, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
For example, a face sample image is input into a plurality of first feature extraction models to obtain a plurality of output first face sample features; the face retrieval device takes each first face sample feature and the face sample image as input simultaneously, and the optimization target is to fit the target feature corresponding to the face sample image. Fig. 9 is a schematic flow chart of a method for training unique modules and a shared module in an embodiment of the present application. As shown in fig. 9, two first feature extraction models A and B exist in the face retrieval device. The face sample image passes through the two first feature extraction models A and B to obtain first face sample features A and B, which then pass through the unique module of second feature mapping model A and the unique module of second feature mapping model B, respectively, to obtain the output third face sample features; meanwhile, the original face sample image is also taken as input to the unique modules of second feature mapping models A and B. Finally, the shared modules of mapping models A and B output the standard features F_A and F_B, respectively, and the optimization target of training is to make them as close as possible to the target feature of the face sample image, i.e., F_A should be as similar as possible to the target feature F, and F_B should also be as similar as possible to F. When the objective function converges, the second feature mapping model is trained, and the trained second feature mapping model is the first feature mapping model.
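A hedged PyTorch sketch of the unique/shared split described above; the 4-layer/3-layer split follows the example in the text, while the layer widths, the flattened image vector, and the fusion of image and feature inputs are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(dims):
    layers = []
    for i in range(len(dims) - 1):
        layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the activation after the last layer

class UniqueModule(nn.Module):
    """First 4 layers: specific to one feature extraction model."""
    def __init__(self, img_dim, feat_dim, hidden=512):
        super().__init__()
        self.net = mlp([img_dim + feat_dim, hidden, hidden, hidden, hidden])

    def forward(self, image_vec, feature):
        return self.net(torch.cat([image_vec, feature], dim=1))

shared = mlp([512, 512, 512, 256])      # last 3 layers, shared by all models
unique_a = UniqueModule(img_dim=1024, feat_dim=256)   # for extraction model A
unique_b = UniqueModule(img_dim=1024, feat_dim=128)   # for extraction model B

def joint_loss(image_vec, feat_a, feat_b, target):
    f_a = shared(unique_a(image_vec, feat_a))   # standard feature F_A
    f_b = shared(unique_b(image_vec, feat_b))   # standard feature F_B
    # Both F_A and F_B should be as similar as possible to the target F
    return -(F.cosine_similarity(f_a, target, dim=1)
             + F.cosine_similarity(f_b, target, dim=1)).mean()
```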
Further, after the first feature mapping models have been jointly trained, fig. 10 is a schematic flow chart of a method for performing feature mapping with the first feature mapping models in this embodiment; as shown in fig. 10, each first feature mapping model can be used independently to map different face features to standard features and to perform face retrieval.
In some possible embodiments, the second feature mapping model includes an image branch module, a feature branch module, and a synthesis module, where the image branch module may be a convolutional neural network, and the feature branch module and the synthesis module may be fully-connected neural networks; a fully-connected neural network functions similarly to a convolutional neural network, the difference being the way the neurons in the network are connected.
Accordingly, the above S603 may include: inputting the face sample image into the image branch module to obtain an output fourth face sample feature; inputting the first face sample feature into the feature branch module to obtain an output fifth face sample feature, where the first face sample feature is extracted from the face sample image by the first feature extraction model; inputting the fourth and fifth face sample features together into the synthesis module to obtain the standard feature corresponding to the first face sample feature; and training the image branch module, the feature branch module, and the synthesis module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
For example, a face sample image is input into the first feature extraction model to obtain an output first face sample feature; the face retrieval device takes the face sample image and the first face sample feature simultaneously as input of the first feature mapping model, and the optimization target is the target feature corresponding to the face sample image. Fig. 11 is a schematic flow chart of a method for training the image branch module, the feature branch module, and the synthesis module in an embodiment of the present application. As shown in fig. 11, the face sample image passes through the first feature extraction model to give the first face sample feature; the first face sample feature passes through the feature branch module to give the output fifth face sample feature; the face sample image passes through the image branch module to give the output fourth face sample feature; and the fourth and fifth face sample features are input together to the synthesis module to give the output standard feature. The training optimization target is to make this standard feature as close as possible to the target feature of the face sample image. When the objective function converges, the second feature mapping model is trained, and the trained second feature mapping model is the first feature mapping model. Further, after each first feature mapping model is trained, it may be used to map the face features obtained by the corresponding first feature extraction model to standard features for face retrieval. A sketch of such a branched mapping model follows.
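A minimal sketch of the branched second feature mapping model, assuming a small CNN for the image branch and fully-connected layers elsewhere (all sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchedMappingModel(nn.Module):
    def __init__(self, feat_dim=256, std_dim=256):
        super().__init__()
        # Image branch module: a convolutional neural network
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128),
        )
        # Feature branch module: a fully-connected network
        self.feature_branch = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 128))
        # Synthesis module: fuses the fourth and fifth face sample features
        self.synthesis = nn.Sequential(
            nn.Linear(128 + 128, 256), nn.ReLU(), nn.Linear(256, std_dim))

    def forward(self, image, first_face_feature):
        fourth = self.image_branch(image)                 # fourth face sample feature
        fifth = self.feature_branch(first_face_feature)   # fifth face sample feature
        return self.synthesis(torch.cat([fourth, fifth], dim=1))

model = BranchedMappingModel()
# Training objective, as above: maximize cosine similarity with the target, e.g.
# loss = -F.cosine_similarity(model(image, feature), target, dim=1).mean()
```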
It should be understood that, in this embodiment, the target features used in training the second feature mapping model may be obtained by one manufacturer alone, or by that manufacturer in cooperation with other manufacturers.
Therefore, in the embodiment of the application, the face features and the face image are jointly used as input to the feature mapping model; when suitable standard features are difficult to obtain from the face features alone, more suitable standard features can be obtained through the additional information provided by the face image, improving the accuracy of face retrieval. In addition, the features extracted by the feature extraction models are spliced, and the spliced features are used as the basis for constructing the standard features, so that the face retrieval device can select a suitable feature space using the combined effect of the feature extraction models, again improving the accuracy of face retrieval. Furthermore, each face image only needs to pass through one feature extraction model and one feature mapping model to obtain its standard features, so the computation load of the system does not grow multiplicatively with the number of models, which reduces the computation load of the system. Finally, the feature mapping models correspond one-to-one with the feature extraction models, and their numbers are consistent, so the face retrieval device does not need to train a huge number of feature mapping models, which further reduces the computation load of the system.
Based on the same inventive concept as the above method, an embodiment of the present application provides a face retrieval apparatus, which may be the face retrieval device described in the above embodiments, a chip or system-on-chip in the face retrieval device, or a functional module in the face retrieval device for implementing the methods described in the above embodiments. The face retrieval apparatus can realize the functions executed by the face retrieval device in the above embodiments, and these functions can be realized by hardware executing the corresponding software. The hardware or software comprises one or more modules corresponding to the functions. For example, in one possible implementation, fig. 12 is a schematic structural diagram of a face retrieval apparatus in an embodiment of the present application. Referring to fig. 12, the face retrieval apparatus 1200 includes: an interface module 1201, configured to acquire a face image to be retrieved; a feature extraction module 1202, configured to input the face image into the first feature extraction model to obtain face features; a feature mapping module 1203, configured to input the face image and the face features into a first feature mapping model to obtain the output standard features corresponding to the face features, where the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model, and the first feature mapping model is trained according to the target features corresponding to face sample images; and a face retrieval module 1204, configured to perform face retrieval on the face image according to the standard features.
In some possible embodiments, the apparatus further comprises: a mapping model training module, configured to acquire a face sample image; input the face sample image into the first feature extraction model to obtain an output first face sample feature; and train a second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image to obtain the first feature mapping model, where the second feature mapping models are in one-to-one correspondence with the first feature extraction models.
In some possible embodiments, the apparatus further comprises: a target feature obtaining module, configured to acquire a face sample image before the mapping model training module obtains the first feature mapping model satisfying the objective function; input the face sample image into N second feature extraction models to obtain N output second face sample features, where N is a positive integer greater than or equal to 2; multiply the N second face sample features by N preset coefficients in a one-to-one correspondence; splice the multiplied N second face sample features to obtain spliced face sample features; and obtain the target feature corresponding to the face sample image according to the spliced face sample features.
In some possible embodiments, the target feature obtaining module is further configured to acquire a face sample image having identity information; input the face sample image into the N second feature extraction models to obtain the N output second face sample features; and perform face recognition on the face sample image according to the N second face sample features and the identity information to obtain the N preset coefficients.
In some possible embodiments, the target feature obtaining module is specifically configured to configure N preset coefficients of equal value for the N second feature extraction models; or configure the N preset coefficients for the N second feature extraction models according to a preset criterion.
In some possible embodiments, the target feature obtaining module is specifically configured to obtain coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range; multiply the N second face sample features correspondingly by the coefficient combinations; splice the multiplied N second face sample features to obtain spliced face sample features; and perform face retrieval on the face sample image according to the spliced face sample features to obtain the preset coefficients in the coefficient combinations that satisfy a preset condition.
In some possible embodiments, the target feature obtaining module is further configured to reduce the dimension of the spliced face sample features, and determine the dimension-reduced face sample features as the target feature corresponding to the face sample image.
In some possible embodiments, the second feature mapping model comprises a unique module and a shared module;
correspondingly, the mapping model training module is further configured to input the face sample image and a plurality of first face sample features into the unique module to obtain output third face sample features, where the plurality of first face sample features are extracted from the face sample image by a plurality of different first feature extraction models; input the third face sample features into the shared module to obtain the standard features corresponding to the plurality of first face sample features; and train the unique module and the shared module according to the face sample image, the first face sample features, the standard features corresponding to the first face sample features, and the target feature corresponding to the face sample image to obtain the first feature mapping model.
It should be further noted that, for the specific implementation processes of the interface module 1201, the feature extraction module 1202, the feature mapping module 1203, the face retrieval module 1204, the mapping model training module, and the target feature obtaining module, reference may be made to the detailed description of the embodiments in fig. 4 to fig. 11; for brevity, details are not repeated here. In this embodiment of the present application, the interface module 1201 may be configured to perform S501 in the foregoing embodiment, the feature extraction module 1202 may be configured to perform S502, the feature mapping module 1203 may be configured to perform S503, and the face retrieval module 1204 may be configured to perform S504.
The interface module mentioned in the embodiment of the present application may be a receiving interface, a receiving circuit, a receiver, or the like; the feature extraction module, the feature mapping module, the face retrieval module, the mapping model training module, and the target feature acquisition module may be one or more processors.
Based on the same inventive concept as the above method, an embodiment of the present application provides a face retrieval device. Fig. 13 is a schematic structural diagram of the face retrieval device in the embodiment of the present application; referring to the solid lines in fig. 13, the face retrieval device 1300 may include a processor 1301 and a communication interface 1302. The processor 1301 may be configured to support the face retrieval device 1300 in implementing the functions involved in the above embodiments; for example, the processor 1301 may acquire the face image to be retrieved through the communication interface 1302.
In some possible embodiments, referring to the dotted lines in fig. 13, the face retrieval device 1300 may further include a memory 1303 for storing the computer-executable instructions and data necessary for the face retrieval device 1300. When the face retrieval device 1300 operates, the processor 1301 executes the computer-executable instructions stored in the memory 1303, so that the face retrieval device 1300 executes the face retrieval method described in the above embodiments.
Based on the same inventive concept as the above method, an embodiment of the present application provides a computer-readable storage medium, where instructions are stored, and when the instructions are executed on a computer, the computer-readable storage medium is used to execute the face retrieval method according to the above embodiments.
Based on the same inventive concept as the above method, the embodiments of the present application provide a computer program or a computer program product, which, when executed on a computer, causes the computer to implement the face retrieval method described in the above embodiments.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or any communication medium including a medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of means for performing the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by an interoperating hardware unit (including one or more processors as described above).
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (24)

1. A face retrieval method is characterized by comprising the following steps:
acquiring a face image to be retrieved;
performing feature extraction on the face image through a first feature extraction model to obtain a first face feature;
inputting the face image and the first face feature into a first feature mapping model to obtain an output standard feature corresponding to the first face feature, wherein the first feature mapping model is obtained by training according to a target feature corresponding to a face sample image; the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model;
and according to the standard characteristics, carrying out face retrieval on the face image.
2. The method according to claim 1, wherein the target features corresponding to the face sample image are obtained by stitching a plurality of face sample features, and the plurality of face sample features are obtained by performing feature extraction on the face sample image by a plurality of second feature extraction models; the plurality of second feature extraction models includes the first feature extraction model; the plurality of second feature extraction models have at least different training samples, model structures, training strategies, or feature dimensions.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
acquiring a human face sample image;
inputting the face sample image into the first feature extraction model to obtain an output first face sample feature;
training a second feature mapping model according to the face sample image, the first face sample feature and a target feature corresponding to the face sample image to obtain the first feature mapping model, wherein the second feature mapping model corresponds to the first feature extraction model.
4. The method according to claim 3, wherein before the training of a second feature mapping model according to the face sample image, the first face sample feature and the target feature corresponding to the face sample image to obtain the first feature mapping model satisfying an objective function, the method further comprises:
acquiring a human face sample image;
inputting the face sample image into N second feature extraction models to obtain N output second face sample features, wherein N is a positive integer greater than or equal to 2;
multiplying the N second face sample characteristics by N preset coefficients in a one-to-one correspondence manner;
splicing the multiplied N second face sample characteristics to obtain spliced face sample characteristics;
and acquiring target features corresponding to the face sample image according to the spliced face sample features, wherein the dimension of the target features is less than or equal to the sum of the dimensions of the N second feature extraction models.
5. The method of claim 4, further comprising:
acquiring a face sample image, wherein the face sample image has identity information;
inputting the face sample image into the N second feature extraction models to obtain the output N second face sample features;
and carrying out face recognition on the face sample image according to the N second face sample characteristics and the identity information to obtain the N preset coefficients.
6. The method of claim 4, further comprising:
configuring the N preset coefficients for the N second feature extraction models, wherein the N preset coefficients are equal; or,
and configuring the N preset coefficients for the N second feature extraction models according to a preset judgment criterion.
7. The method of claim 4, further comprising:
acquiring the coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range;
correspondingly multiplying the N second face sample characteristics by the coefficient combination;
splicing the multiplied N second face sample characteristics to obtain spliced face sample characteristics;
and according to the spliced human face sample characteristics, performing human face retrieval on the human face sample image to obtain the preset coefficients meeting preset conditions in the coefficient combination.
8. The method according to any one of claims 4 to 7, wherein the obtaining of the target feature corresponding to the face sample image according to the spliced face sample feature comprises:
reducing the dimension of the spliced human face sample features;
and determining the face sample features after dimension reduction as target features corresponding to the face sample images.
9. The method of any of claims 3 to 8, wherein the second feature mapping model comprises a unique module and a shared module;
the training a second feature mapping model according to the face sample image, the first face sample feature and the target feature corresponding to the face sample image to obtain the first feature mapping model comprises:
inputting the face sample image and a plurality of first face sample features into the unique module to obtain output third face sample features, wherein the plurality of first face sample features are obtained by extracting the face sample image through a plurality of different first feature extraction models;
inputting the third face sample characteristics into the sharing module to obtain standard characteristics corresponding to the plurality of first face sample characteristics;
training the unique module and the sharing module according to the face sample image, the first face sample features, the standard features corresponding to the first face sample features and the target features corresponding to the face sample image to obtain the first feature mapping model.
10. The method of any of claims 3 to 8, wherein the second feature mapping model comprises an image branching module, a feature branching module, and a synthesis module;
the training a second feature mapping model according to the face sample image, the first face sample feature and the target feature corresponding to the face sample image to obtain the first feature mapping model comprises:
inputting the face sample image into the image branching module to obtain a fourth output face sample characteristic;
inputting the first face sample feature into the feature branching module to obtain an output fifth face sample feature, wherein the first face sample feature is obtained by extracting the face sample image through a first feature extraction model;
inputting the fourth face sample feature and the fifth face sample feature into the synthesis module together to obtain a standard feature corresponding to the first face sample feature;
training the image branch module, the feature branch module and the comprehensive module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature and the target feature corresponding to the face sample image to obtain the first feature mapping model.
11. The method according to any one of claims 1 to 10, wherein the performing face retrieval on the face image according to the standard feature comprises:
determining similarity between the standard features and standard features of a first human face sample image, wherein the first human face sample image is any one of a plurality of human face sample images;
and when the similarity is greater than a first threshold, determining the first face sample image as a target of the face image retrieval.
12. A face retrieval apparatus, comprising:
the interface module is used for acquiring a face image to be retrieved;
the feature extraction module is used for extracting features of the face image through a first feature extraction model to obtain first face features;
the feature mapping module is used for inputting the face image and the first face feature into a first feature mapping model to obtain a standard feature corresponding to the output first face feature, and the first feature mapping model is obtained by training according to a target feature corresponding to a face sample image; the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model;
and the face retrieval module is used for carrying out face retrieval on the face image according to the standard characteristics.
13. The apparatus according to claim 12, wherein the target features corresponding to the face sample image are obtained by stitching a plurality of face sample features, and the plurality of face sample features are obtained by performing feature extraction on the face sample image by a plurality of second feature extraction models; the plurality of second feature extraction models includes the first feature extraction model; the plurality of second feature extraction models differ at least in terms of training samples, model structure, training strategy, or feature dimensions.
14. The apparatus of claim 12 or 13, further comprising: the mapping model training module is used for acquiring a face sample image; inputting the face sample image into the first feature extraction model to obtain an output first face sample feature; training a second feature mapping model according to the face sample image, the first face sample feature and a target feature corresponding to the face sample image to obtain the first feature mapping model, wherein the second feature mapping model is in one-to-one correspondence with the first feature extraction model.
15. The apparatus of claim 14, further comprising: the target characteristic acquisition module is used for acquiring a face sample image before the mapping model training module obtains the first characteristic mapping model meeting a target function; inputting the face sample image into N second feature extraction models to obtain N output second face sample features, wherein N is a positive integer greater than or equal to 2; multiplying the N second face sample characteristics by N preset coefficients in a one-to-one correspondence manner; splicing the multiplied N second face sample characteristics to obtain spliced face sample characteristics; and acquiring target features corresponding to the face sample image according to the spliced face sample features, wherein the dimension of the target features is less than or equal to the sum of the dimensions of the N second feature extraction models.
16. The apparatus of claim 14, wherein the target feature obtaining module is further configured to obtain a face sample image, and the face sample image has identity information; inputting the face sample image into N second feature extraction models to obtain the output N second face sample features; and carrying out face recognition on the face sample image according to the N second face sample characteristics and the identity information to obtain the N preset coefficients.
17. The apparatus according to claim 14, wherein the target feature obtaining module is specifically configured to configure the N preset coefficients for N second feature extraction models, where the N preset coefficients are equal; or, configuring the N preset coefficients for the N second feature extraction models according to a preset criterion.
18. The apparatus according to claim 14, wherein the target feature obtaining module is specifically configured to obtain N corresponding coefficient combinations of the second feature extraction models within a preset coefficient range; correspondingly multiplying the N second face sample characteristics by the coefficient combination; splicing the multiplied N second face sample characteristics to obtain spliced face sample characteristics; and according to the spliced human face sample characteristics, performing human face retrieval on the human face sample image to obtain the preset coefficients meeting preset conditions in the coefficient combination.
19. The apparatus according to any one of claims 15 to 18, wherein the target feature obtaining module is further configured to perform dimension reduction on the spliced human face sample features; and determining the face sample features after dimension reduction as target features corresponding to the face sample images.
20. The apparatus of any of claims 14 to 19, wherein the second feature mapping model comprises a unique module and a shared module;
the mapping model training module is further configured to input the face sample image and a plurality of first face sample features into the unique module to obtain an output third face sample feature, where the plurality of first face sample features are extracted from the face sample image through a plurality of different first feature extraction models; inputting the third face sample characteristics into the sharing module to obtain standard characteristics corresponding to the plurality of first face sample characteristics; training the unique module and the sharing module according to the face sample image, the first face sample features, the standard features corresponding to the first face sample features and the target features corresponding to the face sample image to obtain the first feature mapping model.
21. The apparatus of any of claims 14 to 19, wherein the second feature mapping model comprises an image branching module, a feature branching module, and an integration module;
the mapping model training module is further used for inputting the face sample image into the image branching module to obtain a fourth output face sample characteristic; inputting the first face sample feature into the feature branching module to obtain an output fifth face sample feature, wherein the first face sample feature is obtained by extracting the face sample image through a first feature extraction model; inputting the fourth face sample feature and the fifth face sample feature into the synthesis module together to obtain a standard feature corresponding to the first face sample feature; training the image branch module, the feature branch module and the comprehensive module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature and the target feature corresponding to the face sample image to obtain the first feature mapping model.
22. The apparatus according to any one of claims 12 to 21, wherein the face retrieval module is specifically configured to:
determining similarity between the standard features and standard features of a first human face sample image, wherein the first human face sample image is any one of a plurality of human face sample images;
and when the similarity is greater than a first threshold, determining the first face sample image as a target of the face image retrieval.
23. A face retrieval device, characterized by comprising: a processor and a communication interface;
the communication interface is coupled with the processor, and the processor acquires the face image to be retrieved through the communication interface;
the processor is configured to support the face retrieval device to implement the face retrieval method according to any one of claims 1 to 11.
24. The face retrieval device according to claim 23, wherein the face retrieval device further comprises: a memory for storing the computer-executable instructions and data necessary for the face retrieval device; when the face retrieval device is running, the processor executes the computer-executable instructions stored by the memory to cause the face retrieval device to perform the face retrieval method of any one of claims 1 to 11.
CN201911089829.2A 2019-08-15 2019-11-08 Face retrieval method and device Pending CN112395449A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2020/100547 WO2021027440A1 (en) 2019-08-15 2020-07-07 Face retrieval method and device
EP20852877.8A EP4012578A4 (en) 2019-08-15 2020-07-07 Face retrieval method and device
US17/671,253 US11881052B2 (en) 2019-08-15 2022-02-14 Face search method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019107550452 2019-08-15
CN201910755045 2019-08-15

Publications (1)

Publication Number Publication Date
CN112395449A true CN112395449A (en) 2021-02-23

Family

ID=74603726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089829.2A Pending CN112395449A (en) 2019-08-15 2019-11-08 Face retrieval method and device

Country Status (1)

Country Link
CN (1) CN112395449A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159185A (en) * 2021-04-23 2021-07-23 山东交通学院 Similar image retrieval method and system based on nested network model
CN113283313A (en) * 2021-05-10 2021-08-20 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN113283313B (en) * 2021-05-10 2022-10-11 长沙海信智能系统研究院有限公司 Information processing method, device and equipment
CN113822377A (en) * 2021-11-19 2021-12-21 南京理工大学 Fake face detection method based on contrast self-learning
CN113822377B (en) * 2021-11-19 2022-02-25 南京理工大学 Fake face detection method based on contrast self-learning

Similar Documents

Publication Publication Date Title
CN108491805B (en) Identity authentication method and device
CN109859743B (en) Audio identification method, system and machine equipment
RU2666631C2 (en) Training of dnn-student by means of output distribution
Wang et al. Industrial cyber-physical systems-based cloud IoT edge for federated heterogeneous distillation
CN112395449A (en) Face retrieval method and device
CN113762322A (en) Video classification method, device and equipment based on multi-modal representation and storage medium
CN113761261A (en) Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
US11881052B2 (en) Face search method and apparatus
CN111476783A (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN111597779A (en) Text generation method, device, equipment and storage medium
CN111539903B (en) Method and device for training face image synthesis model
CN113312989A (en) Finger vein feature extraction network based on aggregation descriptor and attention
WO2023020214A1 (en) Retrieval model training method and apparatus, retrieval method and apparatus, device and medium
CN112507947A (en) Gesture recognition method, device, equipment and medium based on multi-mode fusion
CN114565807A (en) Method and device for training target image retrieval model
CN113822125A (en) Processing method and device of lip language recognition model, computer equipment and storage medium
CN113033507A (en) Scene recognition method and device, computer equipment and storage medium
WO2021027555A1 (en) Face retrieval method and apparatus
CN114528762A (en) Model training method, device, equipment and storage medium
CN116152938A (en) Method, device and equipment for training identity recognition model and transferring electronic resources
CN111507218A (en) Matching method and device of voice and face image, storage medium and electronic equipment
CN117453949A (en) Video positioning method and device
CN114419514B (en) Data processing method, device, computer equipment and storage medium
CN116958724A (en) Training method and related device for product classification model
WO2022141094A1 (en) Model generation method and apparatus, image processing method and apparatus, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination