WO2021027440A1 - Face retrieval method and apparatus - Google Patents

Face retrieval method and apparatus

Info

Publication number
WO2021027440A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
face
face sample
image
features
Prior art date
Application number
PCT/CN2020/100547
Other languages
English (en)
French (fr)
Inventor
陈凯
龚文洪
申皓全
王铭学
赖昌材
胡翔宇
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Priority claimed from CN201911089829.2A (CN112395449A)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP20852877.8A (EP4012578A4)
Publication of WO2021027440A1
Priority to US17/671,253 (US11881052B2)


Classifications

    • G06V 40/172 — Human faces: classification, e.g. identification
    • G06F 16/583 — Retrieval of still image data characterised by using metadata automatically derived from the content
    • G06V 10/761 — Proximity, similarity or dissimilarity measures
    • G06V 10/7715 — Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 — Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 40/168 — Feature extraction; face representation
    • G06N 3/045 — Combinations of networks
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06V 10/82 — Image or video recognition or understanding using neural networks

Definitions

  • This application relates to the fields of artificial intelligence and computer vision, and in particular to a face retrieval method and device.
  • Face retrieval is an emerging biometric technology that combines computer image processing knowledge and biostatistics knowledge.
  • Face retrieval is widely used in identity recognition, identity verification, and other related scenarios (such as security monitoring and access control gates).
  • After acquiring a face image to be retrieved, the face retrieval system compares it with multiple face images in a designated face database to find the most similar face image or images.
  • In practice, the face retrieval system does not directly calculate the similarity between the face image to be retrieved and the face images in the face database. Instead, it represents all the images as features, uses these features to calculate similarity to each other, and then finds the most similar face image or images.
  • Different feature extraction models can extract different face features, but face features from different feature extraction models cannot be directly compared.
  • In view of this, the present application provides a face retrieval method and device, so as to use the combined effect of multiple feature extraction models to select an appropriate feature space and improve the accuracy of face retrieval.
  • In a first aspect, this application provides a face retrieval method, which can be applied to related scenarios such as identity recognition and identity verification.
  • The above face retrieval method may include: acquiring a face image to be retrieved, where the face image to be retrieved may be an image captured by a camera or an image manually uploaded by a user; performing feature extraction on the face image through a first feature extraction model to obtain a first face feature; and inputting the face image and the first face feature into a first feature mapping model to obtain the output standard feature corresponding to the first face feature.
  • The first feature mapping model is obtained by training based on the target feature corresponding to a face sample image; the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model; and face retrieval is performed on the face image according to the standard feature.
  • The first face feature in the face image can be divided into structured features and unstructured features. Structured features can include features used to characterize face attributes, where face attributes can refer to specific physical meanings of the face image, such as age, gender, and/or angle; they are extracted from the face image through a structured feature extraction model. Unstructured features can include vectors used to represent facial features, where a facial feature can refer to a feature that has no specific physical meaning in the face image; it consists of a string of numbers, can also be called a feature vector, and is extracted from the face image through an unstructured feature extraction model.
  • The similarity between feature vectors can be used to represent the similarity between the face image to be retrieved and the face template image.
  • In this way, the face retrieval system can use the combined effect of multiple feature extraction models to select an appropriate feature space, improving the accuracy of face retrieval. Furthermore, since each face image only needs to pass through one feature extraction model and one feature mapping model to obtain the standard feature, the computation of the system does not grow exponentially with the number of models, reducing the amount of system computation. Further, because the feature mapping models and the feature extraction models are in one-to-one correspondence, the number of feature mapping models is the same as the number of feature extraction models, so the face retrieval system does not need to train a huge number of feature mapping models, which also reduces the amount of system computation (a minimal sketch of the data flow follows).
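  • To make the data flow concrete, here is a minimal, hypothetical Python/NumPy sketch of the extract → map → compare pipeline. The two "models" are random linear projections standing in for a real feature extraction model and its matching feature mapping model; only the shape of the computation reflects the method above, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W_EXTRACT = rng.standard_normal((1024, 256))   # stand-in "model" weights
W_MAP = rng.standard_normal((256, 128))

def feature_extraction_model(image):
    """Stand-in for the first feature extraction model (e.g., a CNN)."""
    return image.reshape(-1) @ W_EXTRACT        # 256-dim first face feature

def feature_mapping_model(image, feature):
    """Stand-in for the first feature mapping model. The real model takes
    the (image, feature) pair; this toy version ignores the image."""
    return feature @ W_MAP                      # 128-dim standard feature

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def to_standard(image):
    return feature_mapping_model(image, feature_extraction_model(image))

query = rng.standard_normal((32, 32))                     # image to retrieve
gallery = [rng.standard_normal((32, 32)) for _ in range(5)]
q_std = to_standard(query)
scores = [cosine(q_std, to_standard(g)) for g in gallery]
print(int(np.argmax(scores)), max(scores))                # best match + score
```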
  • The target feature corresponding to the face sample image is obtained by splicing multiple face sample features, where the face sample features are obtained by multiple second feature extraction models performing feature extraction on the face sample image. The multiple second feature extraction models include the first feature extraction model, and they differ in at least their training samples, model structures, training strategies, or feature dimensions.
  • The multiple second feature extraction models can be second feature extraction models of the manufacturer itself or second feature extraction models from other manufacturers. The multiple second feature extraction models can include the first feature extraction model; that is, in addition to the first feature extraction model, there may be other feature extraction models. The training samples, model structures, training strategies, or feature dimensions of the multiple second feature extraction models differ from one another.
  • The above method further includes: obtaining a face sample image; inputting the face sample image into the first feature extraction model to obtain the output first face sample feature; and training a second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
  • the second feature mapping model corresponds to the first feature extraction model.
  • the second feature mapping model can be the feature mapping model obtained by the manufacturer through sample training; it can also be the feature mapping model obtained through collaborative training between the manufacturer and other manufacturers.
  • The input of the second feature mapping model is the face sample image and the first face sample feature, and the output of the second feature mapping model is the standard feature.
  • The optimization goal of training is to make the standard feature fit the target feature as closely as possible, where the target feature is obtained by splicing multiple face sample features.
  • The face sample features are extracted from the face sample image by the first feature extraction model of the manufacturer and by multiple second feature extraction models from the manufacturer or from other manufacturers; that is, the target feature is constructed from features extracted separately by the manufacturer and by other manufacturers. When training the second feature mapping model, it is necessary to make the output of the second feature mapping model fit the target feature as closely as possible. After the second feature mapping model is trained, it becomes the first feature mapping model; face retrieval is then performed on the face image according to the standard feature.
  • The second feature mapping model is trained based on the face sample image, the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
  • The above method further includes: obtaining a face sample image; and inputting the face sample image into N second feature extraction models to obtain the output N second face sample features, where N is a positive integer greater than or equal to 2.
  • The above method further includes: obtaining a face sample image that has identity information; inputting the face sample image into the N second feature extraction models to obtain the output N second face sample features; and performing face recognition on the face sample image according to the N second face sample features and the identity information, to obtain N preset coefficients.
  • The above method further includes: configuring N preset coefficients for the N second feature extraction models, where the N preset coefficients are equal; or configuring N preset coefficients for the N second feature extraction models according to a preset evaluation criterion.
  • The above method further includes: obtaining coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range; multiplying the N second face sample features by the corresponding coefficients; stitching the multiplied N second face sample features to obtain the stitched face sample feature; and performing face retrieval on the face sample image according to the stitched face sample feature, to obtain the preset coefficients in the coefficient combinations that meet a preset condition.
  • Obtaining the target feature corresponding to the face sample image includes: reducing the dimension of the stitched face sample feature, and determining the reduced-dimension face sample feature as the target feature corresponding to the face sample image.
  • The second feature mapping model includes a unique module and a shared module.
  • Training the second feature mapping model to obtain the first feature mapping model includes: inputting the face sample image and multiple first face sample features into the unique module to obtain the output third face sample feature, where the multiple first face sample features are extracted from the face sample image through multiple different first feature extraction models (see the sketch after this item); inputting the third face sample feature into the shared module to obtain the standard features corresponding to the multiple first face sample features; and training the unique module and the shared module according to the face sample image, the multiple first face sample features, the standard features corresponding to the multiple first face sample features, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
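  • The following PyTorch sketch shows one plausible shape for this unique-module/shared-module design: one unique module per first feature extraction model adapts its (image, feature) input to a common intermediate representation, and a single shared module maps that representation to the standard feature. All layer types and sizes are illustrative assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class UniqueModule(nn.Module):
    """One unique module per first feature extraction model: adapts that
    model's (face image, first face sample feature) input to a common
    intermediate representation (the "third face sample feature")."""
    def __init__(self, feat_dim: int, mid_dim: int = 256):
        super().__init__()
        self.image_net = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())       # -> 8 * 4 * 4 = 128
        self.fuse = nn.Linear(128 + feat_dim, mid_dim)

    def forward(self, image, feature):
        x = torch.cat([self.image_net(image), feature], dim=1)
        return torch.relu(self.fuse(x))

class SharedModule(nn.Module):
    """Shared by all unique modules: maps the intermediate representation
    into the standard feature space."""
    def __init__(self, mid_dim: int = 256, std_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(mid_dim, mid_dim), nn.ReLU(),
                                 nn.Linear(mid_dim, std_dim))

    def forward(self, x):
        return self.net(x)

# one unique module per extraction model (feature dims 256 and 512 assumed)
uniques = [UniqueModule(256), UniqueModule(512)]
shared = SharedModule()
img = torch.randn(4, 3, 112, 112)
feats = [torch.randn(4, 256), torch.randn(4, 512)]
standard = [shared(u(img, f)) for u, f in zip(uniques, feats)]
print([s.shape for s in standard])   # two batches of 128-dim standard features
```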
  • The second feature mapping model includes an image branch module, a feature branch module, and a synthesis module.
  • Training the second feature mapping model to obtain the first feature mapping model includes: inputting the face sample image into the image branch module to obtain the output fourth face sample feature; inputting the first face sample feature into the feature branch module to obtain the output fifth face sample feature, where the first face sample feature is extracted from the face sample image through the first feature extraction model; inputting the fourth face sample feature and the fifth face sample feature into the synthesis module to obtain the standard feature corresponding to the first face sample feature; and training the image branch module, the feature branch module, and the synthesis module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model (a sketch follows).
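  • Analogously, a minimal sketch of the image-branch / feature-branch / synthesis-module variant; layer choices and sizes are again illustrative assumptions.

```python
import torch
import torch.nn as nn

class MappingModel(nn.Module):
    def __init__(self, feat_dim: int = 256, std_dim: int = 128):
        super().__init__()
        self.image_branch = nn.Sequential(       # -> fourth face sample feature
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 128))
        self.feature_branch = nn.Sequential(     # -> fifth face sample feature
            nn.Linear(feat_dim, 128), nn.ReLU())
        self.synthesis = nn.Sequential(          # -> standard feature
            nn.Linear(128 + 128, 128), nn.ReLU(),
            nn.Linear(128, std_dim))

    def forward(self, image, feature):
        f4 = self.image_branch(image)
        f5 = self.feature_branch(feature)
        return self.synthesis(torch.cat([f4, f5], dim=1))

model = MappingModel()
std_feat = model(torch.randn(2, 3, 112, 112), torch.randn(2, 256))
print(std_feat.shape)   # torch.Size([2, 128])
```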
  • Performing face retrieval on the face image according to the standard feature includes: determining the similarity between the standard feature and the standard feature of a first face sample image, where the first face sample image is any one of multiple face sample images; when the similarity is greater than a first threshold, the first face sample image is the target of the face image retrieval.
  • In a second aspect, the present application provides a face retrieval device, including: an interface module, configured to acquire a face image to be retrieved, where the face image to be retrieved may be an image captured by a camera or an image manually uploaded by a user; a feature extraction module, configured to perform feature extraction on the face image through a first feature extraction model to obtain a first face feature; a feature mapping module, configured to input the face image and the first face feature into a first feature mapping model to obtain the output standard feature corresponding to the first face feature, where the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model, and the first feature mapping model is obtained by training based on the target feature corresponding to a face sample image; and a face retrieval module, configured to perform face retrieval on the face image according to the standard feature.
  • The target feature corresponding to the face sample image is obtained by splicing multiple face sample features, where the face sample features are obtained by multiple second feature extraction models performing feature extraction on the face sample image.
  • The above device further includes a mapping model training module, configured to: obtain a face sample image; input the face sample image into the first feature extraction model to obtain the output first face sample feature; and train a second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model, where the second feature mapping models correspond one-to-one to the first feature extraction models.
  • The above device further includes a target feature acquisition module, configured to: obtain a face sample image before the mapping model training module obtains the first feature mapping model that satisfies the objective function; input the face sample image into N second feature extraction models to obtain the output N second face sample features, where N is a positive integer greater than or equal to 2; multiply the N second face sample features by N preset coefficients in one-to-one correspondence; stitch the multiplied N second face sample features to obtain the stitched face sample feature; and obtain the target feature corresponding to the face sample image according to the stitched face sample feature, where the dimension of the target feature is less than or equal to the sum of the feature dimensions of the N second feature extraction models.
  • The target feature acquisition module is further configured to: obtain a face sample image that has identity information; input the face sample image into the N second feature extraction models to obtain the output N second face sample features; and perform face recognition on the face sample image according to the N second face sample features and the identity information, to obtain the N preset coefficients.
  • The target feature acquisition module is specifically configured to: configure N preset coefficients for the N second feature extraction models, where the N preset coefficients are equal; or configure N preset coefficients for the N second feature extraction models according to a preset evaluation criterion.
  • The target feature acquisition module is specifically configured to: obtain coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range; multiply the N second face sample features by the corresponding coefficient combinations; stitch the multiplied N second face sample features to obtain the stitched face sample feature; and perform face retrieval on the face sample image according to the stitched face sample feature, to obtain the preset coefficients in the coefficient combinations that meet a preset condition.
  • The target feature acquisition module is further configured to reduce the dimension of the stitched face sample feature, and determine the reduced-dimension face sample feature as the target feature corresponding to the face sample image.
  • The second feature mapping model includes a unique module and a shared module.
  • The mapping model training module is further configured to: input the face sample image and multiple first face sample features into the unique module to obtain the output third face sample feature, where the multiple first face sample features are extracted from the face sample image through multiple different first feature extraction models; input the third face sample feature into the shared module to obtain the standard features corresponding to the multiple first face sample features; and train the unique module and the shared module according to the face sample image, the multiple first face sample features, the standard features corresponding to the multiple first face sample features, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
  • The second feature mapping model includes an image branch module, a feature branch module, and a synthesis module.
  • The mapping model training module is further configured to: input the face sample image into the image branch module to obtain the output fourth face sample feature; input the first face sample feature into the feature branch module to obtain the output fifth face sample feature, where the first face sample feature is extracted from the face sample image through the first feature extraction model; input the fourth face sample feature and the fifth face sample feature into the synthesis module to obtain the standard feature corresponding to the first face sample feature; and train the image branch module, the feature branch module, and the synthesis module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
  • The face retrieval module is specifically configured to: determine the similarity between the standard feature and the standard feature of a first face sample image, where the first face sample image is any one of multiple face sample images; when the similarity is greater than a first threshold, the first face sample image is the target of the face image retrieval.
  • The interface module mentioned in the above second aspect may be a receiving interface, a receiving circuit, or a receiver, etc.; the feature extraction module, feature mapping module, face retrieval module, mapping model training module, and target feature acquisition module may be one or more processors.
  • this application provides a face retrieval device, which may include a processor and a communication interface, and the processor may be used to support the face retrieval device to implement the first aspect or any possible implementation manner of the first aspect.
  • the processor can obtain the face image to be retrieved through the communication interface.
  • the face retrieval device may further include a memory, and the memory is used to store the computer-executed instructions and data necessary for the face retrieval device.
  • The processor executes the computer-executable instructions stored in the memory, so that the face retrieval device executes the face retrieval method of the first aspect or any one of its possible implementations.
  • the present application provides a computer-readable storage medium, the computer-readable storage medium stores instructions, and when the instructions are run on a computer, they are used to execute any of the face retrieval methods in the first aspect.
  • This application provides a computer program or computer program product.
  • When the computer program or computer program product is executed on a computer, the computer implements the face retrieval method of any one of the above first aspects.
  • Figure 1 is a schematic flow diagram of a face retrieval method
  • Figure 2 is a schematic flow diagram of another face retrieval method
  • Figure 3 is a schematic flow diagram of another face retrieval method
  • FIG. 4 is a schematic diagram of facial features in an embodiment of the application.
  • FIG. 5 is a schematic flowchart of a face retrieval method in an embodiment of the application.
  • FIG. 6 is a schematic flowchart of a method for training a first feature mapping model in an embodiment of the application
  • FIG. 7 is a schematic flowchart of a method for training a feature mapping model in an embodiment of the application.
  • FIG. 8 is a schematic flowchart of a method for obtaining target features corresponding to a face sample image in an embodiment of the application
  • FIG. 9 is a schematic flowchart of a method for training a unique module and a shared module in an embodiment of the application.
  • FIG. 10 is a schematic flowchart of a method for performing feature mapping by a first feature mapping model in an embodiment of the application
  • FIG. 11 is a schematic flowchart of a method for training the image branch module, the feature branch module, and the synthesis module in an embodiment of the application;
  • FIG. 12 is a schematic structural diagram of a face retrieval device in an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of a face retrieval device in an embodiment of the application.
  • The corresponding device may include one or more units, such as functional units, to perform the described one or more method steps (for example, one unit performing one or more steps, or multiple units each performing one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the drawings.
  • Correspondingly, the corresponding method may include one step to perform the functionality of one or more units (for example, one step performing the functionality of one or more units, or multiple steps each performing the functionality of one or more of the multiple units), even if such one or more steps are not explicitly described or illustrated in the drawings.
  • the embodiment of the present application provides a face retrieval method, which can be widely used in related scenarios such as identity recognition and identity verification.
  • The face retrieval system does not directly calculate the similarity between the face image to be retrieved and the face images in the face database. Instead, it represents all the images as features, uses these features to calculate similarity to each other, and then finds the most similar face image or images.
  • Figure 1 is a schematic flow diagram of a face retrieval method.
  • As shown in Figure 1, there are different feature extraction models A, B, and C in the face retrieval system, and face image 1 and face image 2 are each input into these three feature extraction models to obtain features A1, B1, and C1 of face image 1, and features A2, B2, and C2 of face image 2.
  • The face retrieval system stitches the three features of the same face image to obtain the final output feature, namely the feature of face image 1 and the feature of face image 2.
  • The face retrieval system then compares the feature of face image 1 with the feature of face image 2, thus completing the face retrieval.
  • the feature extraction models A, B, and C may be feature extraction models in different feature dimensions, or multiple feature extraction models with the same function in the same feature dimension.
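  • A toy NumPy sketch of this concatenation scheme; the three random projections are hypothetical stand-ins for feature extraction models A, B, and C.

```python
import numpy as np

rng = np.random.default_rng(1)
P_A = rng.standard_normal((1024, 256))   # stand-ins for extraction models
P_B = rng.standard_normal((1024, 512))   # A, B and C (random projections)
P_C = rng.standard_normal((1024, 128))

def stitched_feature(image):
    x = image.reshape(-1)                                # 32x32 -> 1024 vector
    return np.concatenate([x @ P_A, x @ P_B, x @ P_C])   # features A/B/C stitched

f1 = stitched_feature(rng.standard_normal((32, 32)))     # face image 1
f2 = stitched_feature(rng.standard_normal((32, 32)))     # face image 2
sim = f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2))
print(sim)   # comparing the stitched features completes the retrieval step
```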
  • Another face retrieval method is also provided. In this method, a feature mapping model is used: the face retrieval system maps the features extracted by one feature extraction model (i.e., source-domain features) into the feature space corresponding to another feature extraction model (i.e., the target-domain feature space), and performs feature comparison in that feature space, thereby achieving mutual search between features and completing face retrieval.
  • Figure 2 is a schematic flow diagram of another face retrieval method.
  • As shown in Figure 2, face image 1 is input into feature extraction model A to obtain feature A (i.e., the feature of face image 1); face image 2 is input into feature extraction model B to obtain feature B; and face image 2 is input into feature extraction model C to obtain feature C.
  • Feature B and feature C are then respectively mapped into the feature space corresponding to model A via feature mapping models, to obtain feature 1 and feature 2 of face image 2, each of which is compared with the feature of face image 1 (i.e., feature A).
  • Feature A forms two feature pairs with feature 1 of face image 2 and feature 2 of face image 2, and each feature pair is compared in the feature space corresponding to model A, thereby realizing mutual search between features and completing face retrieval.
  • Since all the features are mapped into the feature space corresponding to feature extraction model A, feature A does not need to be mapped and can be directly compared, as the feature of face image 1, with the features of face image 2.
  • In this method, each pair of feature extraction models needs its own trained feature mapping model. When the number of feature extraction models is n (n ≥ 2), the number of feature mapping models can be as shown in the following formula (1):

    number of feature mapping models = C(n, 2) = n(n − 1) / 2    (1)
  • FIG. 3 is another schematic flow diagram of a face retrieval method in the embodiment of the application.
  • As shown in FIG. 3, the face retrieval system inputs face image 1 into feature extraction model A to obtain feature A, inputs face image 2 into feature extraction model B to obtain feature B, and inputs face image 3 into feature extraction model C to obtain feature C. Then, the face retrieval system maps feature B and feature C into the same feature space (assume the feature space corresponding to feature extraction model A) to obtain the features of face image 2 and face image 3. Since all features are mapped into the feature space corresponding to feature extraction model A, feature A does not need to be mapped and can be directly used as the feature of face image 1 for comparison with the features of face image 2 and face image 3.
  • The features of face image 2 and face image 3 can also be compared directly, thereby realizing mutual search between features and completing face retrieval.
  • However, the performance of a feature extraction model differs across scenarios, and no single feature extraction model stands out in every scenario, so choosing which model's feature space to use as the final mapped feature space is a problem. Further, since all features are ultimately mapped into the feature space corresponding to a single feature extraction model, the combined effect of multiple feature extraction models is not exploited.
  • In view of this, the embodiment of the present application provides a face retrieval method, which can be applied to the above face retrieval system.
  • The face retrieval system can be deployed in devices such as security monitoring systems and access control gates.
  • FIG. 4 is a schematic diagram of the facial features in the embodiments of the present application.
  • The facial features in a face image can be divided into structured features and unstructured features.
  • Structured features can include features used to characterize face attributes, where face attributes can refer to specific physical meanings of a face image, such as age, gender, and angle; they are extracted from the face image through a structured feature extraction model.
  • Unstructured features can include vectors used to represent facial features. These facial features can refer to features that have no specific physical meaning in the face image; they consist of a string of numbers, can also be called feature vectors, and are extracted from the face image through an unstructured feature extraction model.
  • the similarity between feature vectors can be used to represent the similarity between the face image to be retrieved and the face template image.
  • Optionally, the structured feature extraction model and the unstructured feature extraction model may be convolutional neural networks (CNN).
  • A CNN is essentially a mapping from input to output: it can learn a large number of mapping relationships between inputs and outputs without requiring any precise mathematical expression between input and output. After training samples are collected and the CNN is trained with them, the CNN has the ability to map between input-output pairs.
  • the structured feature extraction model and the unstructured feature extraction model may also be other machine learning models, which are not specifically limited in the embodiment of the present application.
  • the "features" described in the following examples can be unstructured features, or can be spliced features after splicing structured features and unstructured features
  • the "feature extraction model” can be It is an unstructured feature extraction model, or it can be a model combination composed of a structured feature extraction model and an unstructured feature extraction model, which is not specifically limited in the embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a face retrieval method in an embodiment of this application. As shown in FIG. 5, the method may include:
  • S501: Acquire a face image to be retrieved. The face image acquired by the face retrieval system may be an image directly captured by the retrieval system, such as an image taken by a camera of the face retrieval system; it may also be an image manually input into the face retrieval system by the user, for example when the user needs to retrieve a target person and directly inputs the target person's image into the face retrieval system; it may also be a person image in the face retrieval system's gallery.
  • the face retrieval system receives the input face image to be retrieved.
  • the face retrieval system can also receive the input base image (that is, the face template image).
  • the face template image can be used to compare with the face image, realize the feature search of the face image, and complete the face retrieval of the face image.
  • S502: Input the face image into the first feature extraction model to obtain the first face feature.
  • The face retrieval device may train the first feature extraction model with a large number of face sample images in advance, so that after the face retrieval device inputs the face image into the first feature extraction model, it obtains the first face feature of the face image.
  • The first feature extraction model may be an unstructured feature extraction model, or a model combination composed of a structured feature extraction model and an unstructured feature extraction model.
  • S503: Input the face image and the first face feature into the first feature mapping model for feature mapping, and obtain the output standard feature corresponding to the first face feature.
  • The face retrieval device may train the first feature mapping model for the first feature extraction model in advance using face sample images, where the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model. The face retrieval device can then input the face image to be retrieved and its corresponding first face feature into the first feature mapping model for feature mapping, to obtain the output standard feature.
  • the first feature mapping model is obtained by training based on the face sample image and the target feature corresponding to the face sample image.
  • the target feature corresponding to the face sample image is obtained by splicing multiple face sample features
  • the face sample feature is obtained by feature extraction of the face sample image by multiple second feature extraction models.
  • the multiple second feature extraction models include a first feature extraction model; the multiple second feature extraction models have at least different training samples, model structures, training strategies, or feature dimensions.
  • the first feature extraction model and the second feature extraction model may be the same feature extraction model, or may be different feature extraction models.
  • The second feature extraction models can be second feature extraction models of the manufacturer itself or second feature extraction models from other manufacturers. The second feature extraction models can include the first feature extraction model; that is, in addition to the first feature extraction model, there may be other feature extraction models.
  • the training samples, model structure, training strategy, or feature dimension between the second feature extraction models of different manufacturers may also be different. For example, the feature dimension of the second feature extraction model of manufacturer A is 256, and the feature dimension of the second feature extraction model of manufacturer B is 512.
  • S504: Perform face retrieval on the face image according to the standard feature.
  • After the face retrieval device obtains the standard feature of the face image to be retrieved through S503, it can directly compare the standard feature with the standard features of multiple face template images to find the most similar features, thereby obtaining one or more face template images, realizing feature mutual search, and completing face retrieval.
  • The multiple face template images mentioned above can be input into the face retrieval device together with the face image to be retrieved, with S501 to S503 executed in sequence to extract the facial features, which are then compared with the standard features of the face image to be retrieved; alternatively, the face template images can be input into the face retrieval device in advance to extract the face features and obtain the standard feature corresponding to each face template image, and these standard features are stored so that, after the standard feature of the face image to be retrieved is subsequently obtained, the standard features corresponding to the face template images are read and compared to complete the face retrieval.
  • the face image to be retrieved and the face template image can also be subjected to feature extraction and feature comparison in other ways, as long as the face retrieval can be completed, which is not specifically limited in the embodiment of the present application.
  • When the face retrieval device retrieves the face image based on the standard feature, it needs to compare the standard feature with the standard features of multiple face sample images.
  • The standard features of the multiple face sample images can be provided by the manufacturer itself or by other manufacturers. Let the standard feature of the face sample image currently being compared be the standard feature of the first face sample image, and determine the similarity between the standard feature and the standard feature of the first face sample image. When the similarity is greater than the first threshold, the first face sample image is the target of the face image retrieval (see the sketch below).
  • This method can realize feature mutual search between different models of the same manufacturer, as well as feature mutual search between different manufacturers.
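  • A minimal sketch of this comparison step; the threshold value and the synthetic gallery are illustrative, and in practice the standard features would come from the feature mapping models described above.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_std, gallery_std, threshold=0.8):
    """gallery_std maps image id -> stored standard feature; returns all
    images whose similarity exceeds the (illustrative) first threshold,
    most similar first."""
    hits = [(img_id, cosine(query_std, feat))
            for img_id, feat in gallery_std.items()]
    return sorted([h for h in hits if h[1] > threshold],
                  key=lambda h: h[1], reverse=True)

rng = np.random.default_rng(2)
gallery = {f"face_{i}": rng.standard_normal(128) for i in range(1000)}
query = gallery["face_42"] + 0.1 * rng.standard_normal(128)  # near-duplicate
print(retrieve(query, gallery)[:3])   # "face_42" should rank first
```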
  • FIG. 6 is a schematic flowchart of a method for training a first feature mapping model in an embodiment of the application. Referring to FIG. 6, the method may include:
  • S601: Obtain a face sample image. The face retrieval device obtains the input face sample image; the face sample image may be one image or multiple images among a large number of face sample images.
  • S602: Input the face sample image into the first feature extraction model to obtain the output first face sample feature.
  • the face retrieval device inputs the face sample image into the first feature extraction model to obtain the first face sample feature of the face sample image.
  • the first face sample feature may be an unstructured feature of the face sample image, or may be a spliced feature formed by splicing structured features and unstructured features of the face sample image.
  • the second feature mapping model corresponds to the first feature extraction model.
  • In the training phase, the face retrieval device may input the face sample image into multiple different feature extraction models to obtain multiple output face sample features, then multiply each face sample feature by its corresponding preset coefficient and stitch the multiplied face sample features. Finally, according to the stitched face sample features, the target feature corresponding to the face sample image is obtained, which can also be called the target standard feature.
  • The face retrieval device inputs the face sample image and its corresponding first face sample feature into the second feature mapping model to obtain the output standard sample feature, and adjusts the parameters of the second feature mapping model so that the similarity between the standard sample feature and the target feature is maximized.
  • The optimization objective function can be that the cosine similarity between the standard sample feature and the target feature is as large as possible. When the objective function converges, the second feature mapping model has been trained, and the trained second feature mapping model is the first feature mapping model.
  • The second feature mapping model may be a feature mapping model obtained by the manufacturer itself through sample training, or a feature mapping model obtained through collaborative training between the manufacturer and other manufacturers. Specifically, the face retrieval devices of the manufacturer alone, or of the manufacturer together with other manufacturers, input the face sample image into multiple different feature extraction models to obtain multiple face sample features, multiply each face sample feature by its corresponding preset coefficient, and stitch the multiplied face sample features; finally, the target feature corresponding to the face sample image is obtained from the stitched face sample features. Next, the face retrieval device of the manufacturer inputs the face sample image and its corresponding first face sample feature into the second feature mapping model to obtain the output standard sample feature.
  • The parameters are adjusted so that the similarity between the standard sample feature and the target feature is maximized (the target feature being obtained by the manufacturer alone or by the manufacturer in collaboration with other manufacturers). The optimization objective function can be that the cosine similarity between the standard sample feature and the target feature is as large as possible; when the objective function converges, the second feature mapping model has been trained, and the trained second feature mapping model is the first feature mapping model (a training-loop sketch follows).
  • The solution of the present invention can be used for feature mutual search between models of different versions from the same manufacturer.
  • For example, vendor A has deployed a face retrieval system and currently needs to upgrade the old feature extraction model to a new feature extraction model.
  • The feature mutual search between the old and new models can be completed by the following operations: train new and old feature mapping models matching the new and old feature extraction models; input the base library image and the features extracted from it by the old feature extraction model into the old feature mapping model to obtain the standard features of the base library image; input the image to be retrieved into the new feature extraction model to extract features, and input the image to be retrieved and the features extracted by the new feature extraction model into the new feature mapping model to obtain the standard features of the image to be retrieved; compare the standard features of the image to be retrieved with the standard features of the base library images, with base library images of higher similarity ranked first.
  • The solution of the present invention can also be used for feature mutual search between models of the same manufacturer running on different devices.
  • For example, the model of manufacturer A running on the central server is a large feature extraction model (with a more complex model structure), and the model running on the camera is a small feature extraction model (with a lighter model structure).
  • The feature mutual search between the large and small models can be completed by the following operations: train large and small feature mapping models matching the large and small feature extraction models; for images stored in the central server, use the large feature extraction model to extract features, and input the features and images into the large feature mapping model to obtain their standard features; likewise, for images on the camera, use the small feature extraction model and the small feature mapping model to obtain their standard features, after which the two sets of standard features can be compared directly.
  • the solution of the invention can also be used for feature mutual search between models of different manufacturers.
  • For example, the model of manufacturer A is feature extraction model A, and the model of manufacturer B is feature extraction model B.
  • The feature mutual search between models A and B can be completed by the following operations: train feature mapping models A and B matching feature extraction models A and B; for an image assigned to manufacturer A, use feature extraction model A to extract features, and input the features and image into feature mapping model A to obtain the standard features of manufacturer A's image; for an image assigned to manufacturer B, use feature extraction model B to extract features, and input the features and image into feature mapping model B to obtain the standard features of manufacturer B's image; compare the standard features of manufacturer A's image with the standard features of manufacturer B's image to obtain their similarity, which is used for retrieval and sorting.
  • multiple feature extraction models may be unstructured feature extraction models, or may be a model combination composed of a structured feature extraction model and an unstructured feature extraction model, which is not specifically limited in the embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a method for training a feature mapping model in an embodiment of the application.
  • In the training phase, the face retrieval device first obtains the target feature A corresponding to the face sample image, inputs the face sample image into the feature extraction model, and obtains the face sample feature from the feature extraction model.
  • The face retrieval device then inputs the face sample image and the face sample feature into the feature mapping model corresponding to the feature extraction model to obtain the output standard sample feature B. Next, the face retrieval device calculates the similarity between the target feature A and the standard sample feature B, and adjusts each parameter in the feature mapping model according to the optimization objective function, namely making the cosine similarity between features A and B as large as possible, until the objective function converges, thus completing the training of the feature mapping model.
  • The cosine similarity of two features can be calculated by formula (2):

    similarity(A, B) = (Σ_{i=1}^{k} A_i × B_i) / ( √(Σ_{i=1}^{k} A_i²) × √(Σ_{i=1}^{k} B_i²) )    (2)

  • where A_i and B_i represent the components of features A and B, respectively, and k is the number of components of A and B, k being a positive integer.
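  • A direct implementation of formula (2):

```python
import math

def cosine_similarity(A, B):
    """A, B: equal-length sequences of components A_i, B_i (formula (2))."""
    num = sum(a * b for a, b in zip(A, B))
    den = math.sqrt(sum(a * a for a in A)) * math.sqrt(sum(b * b for b in B))
    return num / den

print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 2.5]))  # ~0.996
```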
  • Optionally, the face retrieval device may obtain the target feature corresponding to the face sample image by the following method. First, the face retrieval device obtains a face sample image; optionally, the face retrieval device obtains an input face sample image, which may be one image or multiple images among a large number of face sample images. Then, the face retrieval device inputs the face sample image into N second feature extraction models to obtain the output N second face sample features, where N is a positive integer greater than or equal to 2. Optionally, the face retrieval device inputs the face sample image into N different second feature extraction models.
  • These second feature extraction models can be unstructured feature extraction models, or model combinations composed of structured feature extraction models and unstructured feature extraction models, which is not specifically limited in the embodiment of this application.
  • The face retrieval device can thus obtain the output N second face sample features, with each second feature extraction model outputting one second face sample feature.
  • Next, the face retrieval device multiplies the N second face sample features by N preset coefficients in one-to-one correspondence. In this way, features extracted by second feature extraction models of different strengths are assigned different coefficients, which effectively lets each feature extraction model play a role matching its capability.
  • The face retrieval device then splices the multiplied N second face sample features to obtain the spliced face sample feature.
  • Finally, the face retrieval device obtains the target feature corresponding to the face sample image according to the stitched face sample feature, where the dimension of the target feature is less than or equal to the sum of the feature dimensions of the N second feature extraction models. It can be seen that in the process of calculating the target feature, the features corresponding to multiple feature extraction models are spliced, and the spliced feature is used as the basis for constructing the target feature, which maximizes the available information (see the sketch below).
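  • A NumPy sketch of this construction, with N = 3 and illustrative coefficients; the random projection W is a stand-in for the dimensionality-reduction matrix discussed next.

```python
import numpy as np

rng = np.random.default_rng(3)
feats = [rng.standard_normal(256), rng.standard_normal(512),
         rng.standard_normal(128)]      # N = 3 second face sample features
coeffs = [0.98, 0.95, 0.90]             # N preset coefficients (illustrative)

stitched = np.concatenate([c * f for c, f in zip(coeffs, feats)])  # 896-dim
W = rng.standard_normal((stitched.size, 256))  # stand-in for the
target_feature = stitched @ W                  # dimensionality-reduction matrix
print(target_feature.shape)                    # (256,) <= 256 + 512 + 128
```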
  • Optionally, the face retrieval device can obtain a dimensionality-reduction matrix using, for example, a principal component analysis (PCA) algorithm, a linear discriminant analysis (LDA) algorithm, or an autoencoder (AE). The stitched face sample feature can be multiplied by the dimensionality-reduction matrix to obtain the reduced-dimension face sample feature, and the reduced-dimension face sample feature is determined as the target feature corresponding to the face sample image.
  • the dimensionality reduction of the spliced features can reduce the feature comparison time on the one hand, thereby improving retrieval efficiency, and on the other hand, can remove redundant information and improve the robustness of standard features.
  • An AE is a neural network that uses a backpropagation algorithm to make the output value approximate the input value. It includes an encoder and a decoder.
  • The encoder first compresses the input stitched feature into a latent space representation, whose dimension is lower than that of the input stitched feature; the decoder then reconstructs an output (of the same dimension as the input stitched feature) from the latent space representation, and the output should be as close as possible to the input stitched feature.
  • For example, the objective function can be set as the cosine similarity between the input stitched feature and the reconstructed output.
  • After the form of the objective function is determined, different face sample features are input, and the gradient of the objective function with respect to the parameters of the encoder and the decoder is calculated. Based on the gradient, the parameters of the encoder and decoder can be adjusted until the change of the objective function after an update is less than a set value (that is, the objective function has converged), at which point the parameters of the encoder and decoder are determined. The encoder can then use the trained parameters to perform the dimensionality reduction of input stitched features (a sketch follows).
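  • A minimal PyTorch autoencoder sketch under these assumptions; dimensions are illustrative, and the loss follows the cosine-similarity reconstruction objective described above.

```python
import torch
import torch.nn as nn

in_dim, latent_dim = 896, 256                  # e.g. 256 + 512 + 128 -> 256
encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                        nn.Linear(512, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                        nn.Linear(512, in_dim))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
cos = nn.CosineSimilarity(dim=1)

stitched = torch.randn(1024, in_dim)           # stitched face sample features
for step in range(100):
    recon = decoder(encoder(stitched))         # reconstruct from latent code
    loss = (1.0 - cos(recon, stitched)).mean() # output close to the input
    opt.zero_grad()
    loss.backward()
    opt.step()

target_features = encoder(stitched).detach()   # reduced-dimension features
```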
  • FIG. 8 is a schematic flowchart of a method for obtaining target features corresponding to a face sample image in an embodiment of the application.
  • The aforementioned preset coefficients can be obtained by, but not limited to, the following methods.
  • The preset coefficients can be treated as learnable parameters and determined by training a face recognition model.
  • Specifically, the face retrieval device can obtain a face sample image that has corresponding identity information; the face retrieval device then inputs the face sample image into the N second feature extraction models to obtain the output N second face sample features; next, the face retrieval device performs face recognition on the face sample image according to the N second face sample features and the identity information, and obtains the N preset coefficients.
  • In the recognition process, a face recognition model can be used, whose input data can be the second face sample features extracted by each second feature extraction model and the corresponding identity information; the optimization objective function can be that face sample features with the same identity information are as close as possible, while face sample features with different identity information are as far apart as possible.
  • After determining the form of the objective function, the face retrieval device inputs the N second face sample features and the corresponding identity information into the face recognition model, and calculates the gradient of the objective function with respect to the parameters of the face recognition model and the preset coefficients to be determined.
  • Based on the gradient, the parameters of the face recognition model and the preset coefficients to be determined can be adjusted until the change of the objective function after an update is less than a set value (that is, the objective function converges). The values of the preset coefficients at this point are used as the final values, thereby obtaining the aforementioned preset coefficients.
  • Optionally, the objective function used when obtaining the preset coefficients may be a triplet loss function as shown in formula (3):

    L = Σ_{i=1}^{M} max( ‖f(x_i^a) − f(x_i^p)‖² − ‖f(x_i^a) − f(x_i^n)‖² + α, 0 )    (3)

  • where M is the number of training samples; x_i^a and f(x_i^a) are a face sample image and its feature; x_i^p and f(x_i^p) are a face sample image with the same identity information as x_i^a and its feature; and x_i^n and f(x_i^n) are a face sample image with identity information different from x_i^a and its feature.
  • α is the expected difference between the distance of the positive sample pair and the distance of the negative sample pair: when the distance between the negative sample pair exceeds the distance between the positive sample pair by more than α, the triplet loss function value is 0; otherwise it is greater than 0.
  • In this embodiment of the application, minimizing the objective function achieves the goal that features of the same identity are as close as possible and features of different identities are as far apart as possible. It should be noted that the embodiments of this application do not limit the form of the objective function: any objective function that can be used to train a single face recognition model can be used in the technical solutions described herein. An illustrative sketch of the first method follows.
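  • As an illustration of the first method, here is a minimal sketch in which the preset coefficients are learnable parameters optimized under the triplet loss of formula (3); the use of PyTorch, the tensor shapes, and the margin value 0.2 are assumptions for illustration, not part of the original disclosure.

```python
import torch

class CoefficientLearner(torch.nn.Module):
    def __init__(self, num_models):
        super().__init__()
        # One learnable preset coefficient per second feature extraction model.
        self.coeffs = torch.nn.Parameter(torch.ones(num_models))

    def forward(self, feats):
        # feats: list of N tensors of shape (batch, dim_i), one per model.
        # Multiply each feature by its coefficient, then splice (concatenate).
        return torch.cat([w * f for w, f in zip(self.coeffs, feats)], dim=1)

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # Formula (3): hinge on the margin between squared positive-pair and
    # negative-pair distances; zero when the negative pair is farther by alpha.
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return torch.clamp(d_pos - d_neg + alpha, min=0).mean()
```

  • In this sketch, the spliced outputs for anchor, positive, and negative samples are fed to `triplet_loss`, and gradient descent updates both the recognition-model parameters and `coeffs` until the objective converges.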
  • In the second method, the preset coefficients may be pre-configured for the second feature extraction models; in this case, N preset coefficients with equal values are configured for the N second feature extraction models.
  • In the third method, the N preset coefficients can be configured for the N second feature extraction models according to a preset evaluation criterion. For example, using retrieval accuracy as the criterion: assuming there are three second feature extraction models A, B, and C in the face retrieval device with retrieval accuracies of 0.98, 0.95, and 0.9 respectively, the face retrieval device may determine the preset coefficients of models A, B, and C as 0.98, 0.95, and 0.9. As another example, the similarity between face sample images with the same identity information can be used as the criterion. Assume each second feature extraction model is evaluated on a batch of identities, each identity corresponding to multiple different face sample images. For each identity, the face retrieval device computes the average pairwise similarity $S_a$ between the features that the second feature extraction model extracts from each pair of face sample images; after obtaining $S_a$ for all identities, it computes the average $AS_a$ of these $S_a$ values. If the $AS_a$ of the second feature extraction models A, B, and C are 0.8, 0.6, and 0.5, the preset coefficients of models A, B, and C are determined to be 0.8, 0.6, and 0.5, as sketched below.
  • In the fourth method, the preset coefficients can be obtained by hyperparameter search; any method used for hyperparameter search can be used for the coefficient search.
  • Specifically, the face retrieval device can obtain coefficient combinations for the N second feature extraction models within a preset coefficient range; it then multiplies the N second face sample features by each coefficient combination correspondingly, and splices the multiplied N second face sample features to obtain spliced face sample features. Finally, the face retrieval device performs face retrieval on the face sample image according to the spliced face sample features, and obtains the preset coefficients of the combination that meets a preset condition, as sketched below.
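  • A minimal grid-search sketch follows, assuming three models, coefficients ranging from 0 to 1 in steps of 0.1 (11 values per model, 11×11×11 = 1331 combinations, matching the worked example given later in this application), and a caller-supplied `retrieval_accuracy` evaluation hook, which is a hypothetical placeholder.

```python
import itertools
import numpy as np

def search_coefficients(feats, retrieval_accuracy):
    """feats: list of three arrays of second face sample features, each of
    shape (num_samples, dim_i). retrieval_accuracy: hypothetical callable
    that scores spliced features by retrieval accuracy on the sample images."""
    grid = np.linspace(0.0, 1.0, 11)  # 0, 0.1, ..., 1
    best_combo, best_acc = None, -1.0
    for combo in itertools.product(grid, repeat=len(feats)):
        spliced = np.concatenate([w * f for w, f in zip(combo, feats)], axis=1)
        acc = retrieval_accuracy(spliced)
        if acc > best_acc:  # preset condition: highest retrieval accuracy
            best_combo, best_acc = combo, acc
    return best_combo
```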
  • Of course, the above methods are merely examples; the face retrieval device may also determine the preset coefficients by other methods, which are not specifically limited in the embodiments of this application.
  • the face retrieval device may also perform joint training on the mapping of multiple face sample features.
  • In this case, the above-mentioned second feature mapping model includes a unique module and a shared module. Suppose a neural network model contains 7 layers: the first 4 layers can be the unique module and the last 3 layers the shared module. Both the shared module and the unique modules are in fact neural network layers; the difference between them is that the parameters of a unique module can change more flexibly to adapt to the characteristics of its own second face sample features, whereas the shared module must process the inputs of multiple unique modules and comprehensively use all second face sample features, so its parameters are more strongly constrained during training. It follows that a unique module can learn the characteristics of its own features, while the shared module can learn the attributes shared by the features of all models; a sketch follows below.
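  • For illustration, a minimal sketch of this split follows, with fully connected layers standing in for the 7-layer model (4 unique + 3 shared); the layer widths, the use of PyTorch, and feeding only the face feature (omitting the image input for brevity) are assumptions.

```python
import torch

def mlp(dims):
    layers = []
    for i in range(len(dims) - 1):
        layers += [torch.nn.Linear(dims[i], dims[i + 1]), torch.nn.ReLU()]
    return torch.nn.Sequential(*layers[:-1])  # drop trailing activation

class JointMappingModel(torch.nn.Module):
    def __init__(self, feat_dims, std_dim=256):
        super().__init__()
        # First 4 layers: one unique module per second face sample feature,
        # free to adapt to that feature's own characteristics.
        self.unique = torch.nn.ModuleList(
            mlp([d, 512, 512, 512, 512]) for d in feat_dims)
        # Last 3 layers: a single shared module that processes the outputs of
        # all unique modules, learning attributes common to all model features.
        self.shared = mlp([512, 512, 512, std_dim])

    def forward(self, model_idx, face_feature):
        third = self.unique[model_idx](face_feature)  # third face sample feature
        return self.shared(third)                     # standard feature
```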
  • Correspondingly, the above-mentioned S603 may include: inputting the face sample image and the multiple first face sample features into the unique modules for feature mapping to obtain output third face sample features, where the multiple first face sample features are extracted from the face sample image by multiple different first feature extraction models; inputting the third face sample features into the shared module to obtain the standard features corresponding to the multiple first face sample features; and training the unique modules and the shared module according to the face sample image, the multiple first face sample features, the standard features corresponding to the multiple first face sample features, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
  • FIG. 9 is a schematic flowchart of the method for training the unique module and the shared module in an embodiment of this application.
  • Referring to FIG. 9, there are two first feature extraction models A and B in the face retrieval device. The face sample image passes through A and B to obtain first face sample features A and B, which then pass through the unique module A of second feature mapping model A and the unique module B of second feature mapping model B to obtain the output third face sample features; at the same time, the original face sample image is also used as an input of the unique modules of the second feature mapping models A and B. Finally, the shared modules of mapping models A and B output the standard features F_A and F_B respectively. The training optimization goal is to be as close as possible to the target feature of the face sample image: F_A should be as similar to the target feature F as possible, and F_B should likewise be as similar to F as possible. When the objective function converges, the second feature mapping model is trained, and the trained second feature mapping model is the first feature mapping model.
  • FIG. 10 is a schematic flowchart of the feature mapping performed by the first feature mapping models in an embodiment of this application. Referring to FIG. 10, after joint training, each first feature mapping model can be used separately to map different face features to standard features for face retrieval, for example as follows.
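  • For instance, reusing the hypothetical `JointMappingModel` sketch above, the feature from extraction model B is mapped through its own unique module and the shared module; the index convention and variable names are assumptions.

```python
import torch

# Hypothetical instance: two extraction models with 256- and 512-dim features.
joint_model = JointMappingModel(feat_dims=[256, 512])
joint_model.eval()
with torch.no_grad():
    # feature_from_model_b: a (batch, 512) tensor from extraction model B
    # (placeholder); index 1 selects the unique module trained for model B.
    standard_b = joint_model(1, feature_from_model_b)
# standard_b can now be compared directly with standard features obtained
# from any other extraction model's features.
```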
  • the above-mentioned second feature mapping model includes an image branching module, a feature branching module, and a synthesis module.
  • the image branching module may be a convolutional neural network, and the feature branching module and the synthesis module may be fully connected neural networks. A fully connected neural network serves a similar function to a convolutional neural network; the difference lies in how the neurons in the network are connected.
  • Correspondingly, the above S603 may include: inputting the face sample image into the image branching module to obtain the output fourth face sample feature; inputting the first face sample feature into the feature branching module to obtain the output fifth face sample feature, where the first face sample feature is extracted from the face sample image by the first feature extraction model; inputting the fourth and fifth face sample features together into the synthesis module to obtain the standard feature corresponding to the first face sample feature; and training the image branching module, the feature branching module, and the synthesis module according to the face sample image, the first face sample feature, the standard feature corresponding to the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model. A sketch of this structure follows below.
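  • A minimal sketch of this three-module structure follows, assuming 112×112 RGB input images, a 256-dimensional first face sample feature, and small illustrative layer sizes; none of these values are prescribed by this application.

```python
import torch

class BranchMappingModel(torch.nn.Module):
    def __init__(self, feat_dim=256, std_dim=256):
        super().__init__()
        # Image branching module: a convolutional neural network.
        self.image_branch = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 32, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(32, 128))            # fourth face sample feature
        # Feature branching module: a fully connected neural network.
        self.feature_branch = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 128), torch.nn.ReLU())  # fifth feature
        # Synthesis module: fuses both branches into the standard feature.
        self.synthesis = torch.nn.Sequential(
            torch.nn.Linear(128 + 128, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, std_dim))

    def forward(self, image, face_feature):
        fourth = self.image_branch(image)
        fifth = self.feature_branch(face_feature)
        return self.synthesis(torch.cat([fourth, fifth], dim=1))
```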
  • For example, the face sample image is input into the first feature extraction model to obtain the output first face sample feature, and the face retrieval device uses the face sample image and the first face sample feature together as the input of the feature mapping model; the optimization goal is to fit the target feature corresponding to the face sample image.
  • FIG. 11 is a schematic flowchart of the method for training the image branching module, the feature branching module, and the synthesis module in an embodiment of this application. Referring to FIG. 11, the face sample image passes through the first feature extraction model to obtain the first face sample feature; the first face sample feature passes through the feature branching module to obtain the output fifth face sample feature; the face sample image passes through the image branching module to obtain the output fourth face sample feature; and the fourth and fifth face sample features are input together into the synthesis module to obtain the output standard feature. The training optimization goal is to be as close as possible to the target feature of the face sample image.
  • When the objective function converges, the second feature mapping model is trained, and the trained second feature mapping model is the first feature mapping model. Further, after each first feature mapping model is trained, it can be used to map the face features obtained by the corresponding first feature extraction model to standard features for face retrieval; a training-step sketch follows below.
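  • A sketch of one training step for this model follows, reusing the hypothetical `BranchMappingModel` above, with the cosine-similarity objective written as minimizing 1 minus the similarity; the optimizer choice and the frozen `extract` placeholder for the first feature extraction model are assumptions.

```python
import torch
import torch.nn.functional as F

model = BranchMappingModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(image, target_feature, extract):
    face_feature = extract(image)           # first face sample feature (frozen)
    standard = model(image, face_feature)   # output standard feature
    # Optimization goal: fit the target feature of the face sample image.
    loss = (1 - F.cosine_similarity(standard, target_feature, dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```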
  • It should be understood that the target feature used when training the second feature mapping model may be obtained by the manufacturer alone, or by the manufacturer in collaboration with other manufacturers.
  • As can be seen from the above, the face retrieval device can use the combined effect of multiple feature extraction models to select an appropriate feature space and improve the accuracy of face retrieval. Further, since each face image only needs to pass through one feature extraction model and one feature mapping model to obtain its standard feature, the computation of the system does not multiply with the number of models, which reduces the system's computation. Further, since the feature mapping models are in one-to-one correspondence with the feature extraction models, the number of feature mapping models equals the number of feature extraction models, so the face retrieval device does not need to train a huge number of feature mapping models, again reducing the system's computation. The deployment flow is sketched below.
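  • To make the deployed flow concrete, the sketch below runs retrieval with one extraction model and its paired mapping model against a gallery of precomputed standard features; the function names, the cosine score, and the threshold 0.8 are illustrative assumptions (the application only requires that similarity exceed a first threshold).

```python
import numpy as np

def retrieve(image, extract, mapping_model, gallery, threshold=0.8):
    """gallery: {template_id: standard feature vector}. extract and
    mapping_model: the single trained extraction/mapping pair (placeholders)."""
    feature = extract(image)                  # one feature extraction model
    standard = mapping_model(image, feature)  # one feature mapping model
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {tid: cos(standard, g) for tid, g in gallery.items()}
    # Templates whose similarity exceeds the threshold are retrieval targets,
    # ranked from most to least similar.
    return sorted((t for t, s in scores.items() if s > threshold),
                  key=lambda t: -scores[t])
```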
  • Based on the same inventive concept as the foregoing method, an embodiment of the present application provides a face retrieval apparatus. The face retrieval apparatus may be the face retrieval apparatus in the face retrieval device described in the foregoing embodiments, or a chip or system-on-chip in the face retrieval apparatus; it may also be a functional module in the face retrieval device used to implement the methods described in the foregoing embodiments.
  • The face retrieval apparatus can implement the functions performed by the face retrieval device in the foregoing embodiments, and these functions may be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the aforementioned functions.
  • FIG. 12 is a schematic structural diagram of a face retrieval apparatus in an embodiment of this application. As shown in FIG. 12, the face retrieval apparatus 1200 includes: an interface module 1201, configured to obtain the face image to be retrieved; a feature extraction module 1202, configured to input the face image into the first feature extraction model to obtain the face features; a feature mapping module 1203, configured to input the face image and the face features into the first feature mapping model to obtain the output standard features corresponding to the face features, where the feature output dimension of the first feature extraction model is the same as the feature input dimension of the first feature mapping model, and the first feature mapping model is obtained by training according to the target feature corresponding to the face image; and a face retrieval module 1204, configured to perform face retrieval on the face image according to the standard features.
  • the above-mentioned apparatus further includes a mapping model training module, configured to obtain a face sample image; input the face sample image into the first feature extraction model to obtain the output first face sample feature; and train the second feature mapping model according to the face sample image, the first face sample feature, and the target feature corresponding to the face sample image, to obtain the first feature mapping model, where the second feature mapping models correspond to the first feature extraction models one-to-one.
  • the above-mentioned apparatus further includes a target feature acquisition module, configured to: before the mapping model training module obtains the first feature mapping model satisfying the objective function, obtain a face sample image; input the face sample image into the N second feature extraction models to obtain the N output second face sample features, where N is a positive integer greater than or equal to 2; multiply the N second face sample features by the N preset coefficients in one-to-one correspondence; splice the multiplied N second face sample features to obtain spliced face sample features; and obtain, according to the spliced face sample features, the target feature corresponding to the face sample image.
  • the target feature acquisition module is further configured to obtain a face sample image that has identity information; input the face sample image into the N second feature extraction models to obtain the N output second face sample features; and perform face recognition on the face sample image according to the N second face sample features and the identity information, to obtain the N preset coefficients.
  • the target feature acquisition module is specifically configured to configure N preset coefficients for the N second feature extraction models, where the N preset coefficients are equal; or to configure the N preset coefficients for the N second feature extraction models according to a preset evaluation criterion.
  • the target feature acquisition module is specifically configured to obtain coefficient combinations corresponding to the N second feature extraction models within a preset coefficient range; multiply the N second face sample features by the coefficient combinations correspondingly; splice the multiplied N second face sample features to obtain spliced face sample features; and perform face retrieval on the face sample image according to the spliced face sample features, to obtain the preset coefficients in the coefficient combinations that meet a preset condition.
  • the target feature acquisition module is further configured to reduce the dimensionality of the spliced face sample features, and determine the dimensionality-reduced face sample features as the target feature corresponding to the face sample image.
  • the second feature mapping model includes a unique module and a shared module
  • Correspondingly, the mapping model training module is further configured to input the face sample image and multiple first face sample features into the unique modules to obtain the output third face sample features, where the multiple first face sample features are extracted from the face sample image by multiple different first feature extraction models; input the third face sample features into the shared module to obtain the standard features corresponding to the multiple first face sample features; and train the unique modules and the shared module according to the face sample image, the multiple first face sample features, the standard features corresponding to the multiple first face sample features, and the target feature corresponding to the face sample image, to obtain the first feature mapping model.
  • For the specific implementation processes of the interface module 1201, the feature extraction module 1202, the feature mapping module 1203, the face retrieval module 1204, the mapping model training module, and the target feature acquisition module, refer to the detailed descriptions of the embodiments in FIG. 4 to FIG. 11; for brevity of the specification, details are not repeated here.
  • the interface module 1201 can be used to perform S501 in the above embodiment
  • the feature extraction module 1202 can be used to perform S502 in the above embodiment
  • the feature mapping module 1203 can be used to perform S503 in the above embodiment.
  • the face retrieval module 1204 can be used to execute S504 in the foregoing embodiment.
  • The interface module mentioned in the embodiments of this application may be a receiving interface, a receiving circuit, a receiver, or the like; the feature extraction module, the feature mapping module, the face retrieval module, the mapping model training module, and the target feature acquisition module may be one or more processors.
  • FIG. 13 is a schematic structural diagram of the face retrieval device in an embodiment of this application. As shown by the solid lines in FIG. 13, the face retrieval device 1300 may include a processor 1301 and a communication interface 1302. The processor 1301 may be configured to support the face retrieval device 1300 in implementing the functions involved in each of the foregoing embodiments; for example, the processor 1301 may obtain the face image to be retrieved through the communication interface 1302. As shown by the dashed lines in FIG. 13, the face retrieval device 1300 may further include a memory 1303, configured to store the computer-executable instructions and data necessary for the face retrieval device 1300.
  • the processor 1301 executes the computer-executable instructions stored in the memory 1303, so that the face retrieval device 1300 executes the face retrieval method described in each of the foregoing embodiments.
  • an embodiment of the present application provides a computer-readable storage medium.
  • The computer-readable storage medium stores instructions that, when run on a computer, are used to execute the face retrieval method described in each of the foregoing embodiments.
  • Embodiments of the present application further provide a computer program or computer program product; when the computer program or computer program product is executed on a computer, the computer is enabled to implement the face retrieval method described in each of the foregoing embodiments.
  • The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium that includes any medium facilitating transfer of a computer program from one place to another (for example, according to a communication protocol).
  • computer-readable media may generally correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media, such as signals or carrier waves.
  • Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementing the techniques described in this application.
  • the computer program product may include a computer-readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other temporary media, but are actually directed to non-transitory tangible storage media.
  • As used herein, disks and discs include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSP), general-purpose microprocessors, application-specific integrated circuits (ASIC), field programmable logic arrays (FPGA), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
  • In some aspects, the functions described with reference to the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the technology can be fully implemented in one or more circuits or logic elements.
  • The techniques of this application may be implemented in a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset).
  • Various components, modules, or units are described in this application to emphasize the functional aspects of the device for performing the disclosed technology, but they do not necessarily need to be implemented by different hardware units.
  • Rather, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units (including one or more processors as described above).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

A face retrieval method and apparatus, relating to the fields of artificial intelligence and computer vision. The method comprises: obtaining a face image to be retrieved (S501); inputting the face image into a first feature extraction model to obtain a first face feature (S502); inputting the face image and the first face feature into a first feature mapping model for feature mapping, to obtain an output standard feature corresponding to the first face feature (S503); and performing face retrieval on the face image according to the standard feature (S504). By splicing the features extracted by multiple feature extraction models and using the spliced feature as the basis for constructing the standard feature, the face retrieval system can use the combined effect of the multiple feature extraction models to select an appropriate feature space and improve the accuracy of face retrieval.

Description

一种人脸检索方法及装置
本申请要求于2019年08月15日提交中国专利局、申请号为201910755045.2、申请名称为“一种人脸检索方法及装置”和于2019年11月08日提交中国专利局、申请号为201911089829.2、申请名称为“一种人脸检索方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能和计算机视觉领域,特别涉及一种人脸检索方法及装置。
背景技术
随着科技的发展,人脸检索是一项融合了计算机图像处理知识以及生物统计学知识的新兴生物识别技术。目前人脸检索被广泛应用于身份识别、身份验证等相关场景(例如安防监控和门禁闸机等)。
在人脸检索技术中,通常是给定一张待检索的人脸图像,人脸检索***将其与指定人脸库中的多个人脸图像进行比对,找出最相似的一张人脸图像或多张人脸图像。人脸检索***并不直接计算待检索的人脸图像与人脸库中的人脸图像之间的相似度,而是将所有图像都表示成特征,并利用这些特征来计算与彼此的相似度,进而找出最相似的一张人脸图像或多张人脸图像。通常,不同特征提取模型能够提取出不同的人脸特征,但是来自不同特征提取模型的人脸特征是不能够直接进行比对的,为了使得来自不同特征提取模型的人脸特征能够直接比对,需要将所有特征提取模型提取出的人脸特征映射至同一个特征空间,并在该特征空间内进行特征比对,此时,选取怎样的特征空间来进行特征映射成为了一个问题,并且由于不同的人脸特征最终均映射至同一特征空间,如何发挥多特征提取模型的综合作用也是一个问题。
发明内容
本申请提供了一种人脸检索方法及装置,以利用多个特征提取模型的综合作用,选取适当的特征空间,提高人脸检索的准确率。
第一方面,本申请提供一种人脸检索方法,该方法可以应用于如身份识别、身份验证等相关场景中。上述人脸检索方法可以包括:获取待检索的人脸图像,待检索的人脸图像可以是摄像头拍摄的图像或者用户手动上传的图像;通过第一特征提取模型对人脸图像进行特征提取,得到第一人脸特征;将人脸图像和第一人脸特征输入第一特征映射模型,得到输出的第一人脸特征对应的标准特征,第一特征映射模型是根据人脸样本图像对应的目标特征训练得到的;所述第一特征提取模型的特征输出维数和所述第一特征映射模型的特征输入维数相同;根据标准特征,对人脸图像进行人脸检索。
在本申请中,人脸图像中的第一人脸特征可以分为结构化特征和非结构化特征,其中,结构化特征可以包括用于表征人脸属性的特征,人脸属性可以指人脸图像的一些具体的物理含义,例如年龄、性别和/或角度等,是通过结构化特征提取模型从人脸图像中提取出的;而非结构化特征可以包括用于表示人脸特征的向量,该人脸特征可以指人脸图像中没有具体物理含义的特征,由一串数字组成,又可以被称为特征向量,是通过非结构化特征提取模型从人脸图像中提取出的,特征向量之间的相似度可以用来代表待检索的人脸图像与人脸模板图像之间的相似度。
在本申请中,通过将人脸特征和人脸图像共同作为特征映射模型的输入,在仅使用人脸特征难以获取适当的标准特征时,通过人脸图像提供的额外信息获取更适当的标准特征,提高人脸检索的准确率。
在本申请中,通过将多个特征提取模型所提取的特征进行拼接,并将拼接后的特征作为标准特征的构建依据,使得人脸检索***能够利用多个特征提取模型的综合作用,选取适当的特征空间,提高人脸检索的准确率。进一步地,由于每幅人脸图像只需经过一个特征提取模型和一个特征映射模型来获得标准特征,使得***的计算量并不会随着模型的数目成倍增加,减少***计算量。进一步地,由于特征映射模型与特征提取模型是一一对应的,特征映射模型数目与特征提取模型的数目一致,使得人脸检索***无需训练数量巨大的特征映射模型,减少***计算量。
基于第一方面,在一些可能的实施方式中,人脸样本图像对应的目标特征是由多个人脸样本特征拼接得到的,人脸样本特征是由多个第二特征提取模型对人脸样本图像进行特征提取得到的;多个第二特征提取模型包括第一特征提取模型;多个第二特征提取模型至少具有不同的训练样本、模型结构、训练策略或特征维数不同。
在本申请中,多个第二特征提取模型可以是本厂家的第二特征提取模型也可以是来自其他不同厂家的第二特征提取模型;多个第二特征提取模型可以包括第一特征提取模型,即第二特征提取模型在包括第一特征提取模型的基础上,还可能有其他特征提取模型;多个第二特征提取模型之间的训练样本、模型结构、训练策略或特征维数不同。
基于第一方面,在一些可能的实施方式中,上述方法还包括:获取人脸样本图像;将人脸样本图像输入第一特征提取模型,得到输出的第一人脸样本特征;根据人脸样本图像、第一人脸样本特征以及人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到第一特征映射模型,第二特征映射模型与第一特征提取模型对应。
在本申请中,第二特征映射模型可以是本厂家通过样本训练得到的特征映射模型;也可以是本厂家和其他厂家协作训练得到特征映射模型,具体而言,第二特征映射模型的输入为人脸图像和第一人脸特征,第二特征映射模型的输出为标准特征,训练的优化目的为标准特征尽可能地拟合目标特征,而目标特征由多个人脸样本特征拼接得到的,多个人脸样本特征是由来自本厂家的第一特征提取模型和来自本厂家或其他不同厂家的多个第二特征提取模型对人脸样本图像进行特征提取得到的,目标特征由本厂家和其他厂家各自提取的多个人脸样本特征拼接而成,在对第二特征映射模型进行训练时,需要使得第二特征映射模型的输出尽可能地拟合目标特征,第二特征映射模型训练好后即为第一特征映射模型;根据标准特征,对人脸图像进行人脸检索。
基于第一方面,在一些可能的实施方式中,在根据人脸样本图像、第一人脸样本特征以及人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到满足目标函数的第一特征映射模型之前,上述方法还包括:获取人脸样本图像;将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征,N为大于或者等于2的正整数;将N个第二人脸样本特征与N个预设系数一一对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;根据拼接后的人脸样本特征,获取人脸样本图像对应的目标特征,其中,所述目标特征的维数小于或等于所述N个第二特征提取模型的维数之和。
基于第一方面,在一些可能的实施方式中,上述方法还包括:获取人脸样本图像,人脸样本图像具有身份信息;将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征;根据N个第二人脸样本特征和身份信息,对人脸样本图像进行人脸识别,得到N个预设系数。
基于第一方面,在一些可能的实施方式中,上述方法还包括:为N个第二特征提取模型配置N个预设系数,N个预设系数相等;或,根据预设评判准则,为N个第二特征提取模型配置N个预设系数。
基于第一方面,在一些可能的实施方式中,上述方法还包括:在预设的系数范围内,获取N个第二特征提取模型对应系数组合;将N个第二人脸样本特征与系数组合对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;根据拼接后的人脸样本特征,对人脸样本图像进行人脸检索,得到系数组合中满足预设条件的预设系数。
基于第一方面,在一些可能的实施方式中,根据拼接后的人脸样本特征,获取人脸样本图像对应的目标特征,包括:对拼接后的人脸样本特征进行降维;将降维后的人脸样本特征确定为人脸样本图像对应的目标特征。
基于第一方面,在一些可能的实施方式中,第二特征映射模型包括独有模块和共享模块;
相应地,根据人脸样本图像、第一人脸样本特征以及人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到第一特征映射模型,包括:将人脸样本图像和多个第一人脸样本特征输入独有模块,得到输出后的第三人脸样本特征,多个第一人脸样本特征是由人脸样本图像通过不同的多个第一特征提取模型提取得到的;将第三人脸样本特征输入共享模块,得到多个第一人脸样本特征对应的标准特征;根据人脸样本图像、多个第一人脸样本特征、多个第一人脸样本特征对应的标准特征以及人脸样本图像对应的目标特征,对独有模块和共享模块进行训练,得到第一特征映射模型。
基于第一方面,在一些可能的实施方式中,第二特征映射模型包括图像分支模块、特征分支模块和综合模块;
相应地,根据人脸样本图像、第一人脸样本特征以及人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到第一特征映射模型,包括:将人脸样本图像输入图像分支模块,得到输出后的第四人脸样本特征;将第一人脸样本特征输入特征分支模块,得到输出后的第五人脸样本特征,第一人脸样本特征是由人脸样本图像通 过第一特征提取模型提取得到的;将第四人脸样本特征和第五人脸样本特征共同输入综合模块,得到第一人脸样本特征对应的标准特征;根据人脸样本图像、第一人脸样本特征、第一人脸样本特征对应的标准特征以及人脸样本图像对应的目标特征,对图像分支模块、特征分支模块和综合模块进行训练,得到第一特征映射模型。
基于第一方面,在一些可能的实施方式中,根据所述标准特征,对人脸图像进行人脸检索,包括:确定标准特征与第一人脸样本图像的标准特征的相似度,第一人脸样板图像是多个人脸样本图像中的任一人脸样本图像;当相似度大于第一阈值时,所述第一人脸样本图像为人脸图像检索的目标。
第二方面,本申请提供一种人脸检索装置,包括:接口模块,用于获取待检索的人脸图像,待检索的人脸图像可以是摄像头拍摄的图像或者用户手动上传的图像;特征提取模块,用于通过第一特征提取模型对人脸图像进行特征提取,得到第一人脸特征;特征映射模块,用于将人脸图像和第一人脸特征输入第一特征映射模型,得到输出的第一人脸特征对应的标准特征,所述第一特征提取模型的特征输出维数和所述第一特征映射模型的特征输入维数相同,第一特征映射模型是根据人脸样本图像对应的目标特征训练得到的;人脸检索模块,用于根据标准特征,对人脸图像进行人脸检索。
基于第二方面,在一些可能的实施方式中,人脸样本图像对应的目标特征是由多个人脸样本特征拼接得到的,人脸样本特征是由多个第二特征提取模型对人脸样本图像进行特征提取得到的。
基于第二方面,在一些可能的实施方式中,上述装置还包括:映射模型训练模块,用于获取人脸样本图像;将人脸样本图像输入第一特征提取模型,得到输出的第一人脸样本特征;根据人脸样本图像、第一人脸样本特征以及人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到第一特征映射模型,第二特征映射模型与第一特征提取模型一一对应。基于第二方面,在一些可能的实施方式中,上述装置还包括:目标特征获取模块,用于映射模型训练模块得到满足目标函数的第一特征映射模型之前,获取人脸样本图像;将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征,N为大于或者等于2的正整数;将N个第二人脸样本特征与N个预设系数一一对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;根据拼接后的人脸样本特征,获取人脸样本图像对应的目标特征,其中,所述目标特征的维数小于或等于所述N个第二特征提取模型的维数之和。
基于第二方面,在一些可能的实施方式中,目标特征获取模块,还用于获取人脸样本图像,人脸样本图像具有身份信息;将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征;根据N个第二人脸样本特征和身份信息,对人脸样本图像进行人脸识别,得到N个预设系数。
基于第二方面,在一些可能的实施方式中,目标特征获取模块,具体用于为N个第二特征提取模型配置N个预设系数,N个预设系数相等;或,根据预设评判准则,为N个第二特征提取模型配置N个预设系数。
基于第二方面,在一些可能的实施方式中,目标特征获取模块,具体用于在预设的系数范围内,获取N个第二特征提取模型对应系数组合;将N个第二人脸样本特征与系数组合对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人 脸样本特征;根据拼接后的人脸样本特征,对人脸样本图像进行人脸检索,得到系数组合中满足预设条件的预设系数。
基于第二方面,在一些可能的实施方式中,目标特征获取模块,还用于对拼接后的人脸样本特征进行降维;将降维后的人脸样本特征确定为人脸样本图像对应的目标特征。
基于第二方面,在一些可能的实施方式中,第二特征映射模型包括独有模块和共享模块;
相应地,映射模型训练模块,还用于将人脸样本图像和多个第一人脸样本特征输入独有模块,得到输出后的第三人脸样本特征,多个第一人脸样本特征是由人脸样本图像通过不同的多个第一特征提取模型提取得到的;将第三人脸样本特征输入共享模块,得到多个第一人脸样本特征对应的标准特征;根据人脸样本图像、多个第一人脸样本特征、多个第一人脸样本特征对应的标准特征以及人脸样本图像对应的目标特征,对独有模块和共享模块进行训练,得到第一特征映射模型。
基于第二方面,在一些可能的实施方式中,第二特征映射模型包括图像分支模块、特征分支模块和综合模块;
相应地,映射模型训练模块,还用于将人脸样本图像输入图像分支模块,得到输出后的第四人脸样本特征;将第一人脸样本特征输入特征分支模块,得到输出后的第五人脸样本特征,第一人脸样本特征是由人脸样本图像通过第一特征提取模型提取得到的;将第四人脸样本特征和第五人脸样本特征共同输入综合模块,得到第一人脸样本特征对应的标准特征;根据人脸样本图像、第一人脸样本特征、第一人脸样本特征对应的标准特征以及人脸样本图像对应的目标特征,对图像分支模块、特征分支模块和综合模块进行训练,得到第一特征映射模型。
基于第二方面,在一些可能的实施方式中,人脸检索模块具体用于:确定标准特征与第一人脸样本图像的标准特征的相似度,第一人脸样板图像是多个人脸样本图像中的任一人脸样本图像;当相似度大于第一阈值时,第一人脸样本图像为所述人脸图像检索的目标。
上述第二方面中提到的接口模块可以为接收接口、接收电路或者接收器等;特征提取模块、特征映射模块、人脸检索模块、映射模型训练模块以及目标特征获取模块可以为一个或者多个处理器。
第三方面,本申请提供一种人脸检索设备,可以包括:处理器和通信接口,处理器可以用于支持人脸检索设备实现上述第一方面或者第一方面的任一种可能的实施方式中所涉及的功能,例如:处理器可以通过通信接口获取待检索的人脸图像。
基于第三方面,在一些可能的实施方式中,人脸检索设备还可以包括存储器,存储器,用于保存人脸检索设备必要的计算机执行指令和数据。当该人脸检索设备运行时,该处理器执行该存储器存储的该计算机执行指令,以使该人脸检索设备执行如上述第一方面或者第一方面的任一种可能的实施方式所述的人脸检索方法。
第四方面,本申请提供一种计算机可读存储介质,计算机可读存储介质存储有指令,当指令在计算机上运行时,用于执行上述第一方面中任一的人脸检索方法。
第五方面,本申请提供一种计算机程序或计算机程序产品,当计算机程序或计算 机程序产品在计算机上被执行时,使得计算机实现上述第一方面中任一的人脸检索方法。
应当理解的是,本申请的第二至五方面与本申请的第一方面的技术方案一致,各方面及对应的可行实施方式所取得的有益效果相似,不再赘述。
附图说明
为了更清楚地说明本申请实施例或背景技术中的技术方案,下面将对本申请实施例或背景技术中所需要使用的附图进行说明。
图1为一种人脸检索方法的流程示意图;
图2为另一种人脸检索方法的流程示意图;
图3为又一种人脸检索方法的流程示意图;
图4为本申请实施例中的人脸特征的示意图;
图5为本申请实施例中的人脸检索方法的流程示意图;
图6为本申请实施例中的训练第一特征映射模型的方法流程示意图;
图7为本申请实施例中的训练特征映射模型的方法流程示意图;
图8为本申请实施例中的获取人脸样本图像对应的目标特征的方法流程示意图;
图9为本申请实施例中的训练独有模块和共享模块的方法流程示意图;
图10为本申请实施例中的第一特征映射模型进行特征映射的方法流程示意图;
图11为本申请实施例中的训练图像分支模块、特征分支模块和综合模块的方法流程示意图;
图12为本申请实施例中的人脸检索装置的结构示意图;
图13为本申请实施例中的人脸检索设备的结构示意图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。以下描述中,参考形成本申请一部分并以说明之方式示出本申请实施例的具体方面或可使用本申请实施例的具体方面的附图。应理解,本申请实施例可在其它方面中使用,并可包括附图中未描绘的结构或逻辑变化。例如,应理解,结合所描述方法的揭示内容可以同样适用于用于执行所述方法的对应设备或***,且反之亦然。例如,如果描述一个或多个具体方法步骤,则对应的设备可以包含如功能单元等一个或多个单元,来执行所描述的一个或多个方法步骤(例如,一个单元执行一个或多个步骤,或多个单元,其中每个都执行多个步骤中的一个或多个),即使附图中未明确描述或说明这种一个或多个单元。另一方面,例如,如果基于如功能单元等一个或多个单元描述具体装置,则对应的方法可以包含一个步骤来执行一个或多个单元的功能性(例如,一个步骤执行一个或多个单元的功能性,或多个步骤,其中每个执行多个单元中一个或多个单元的功能性),即使附图中未明确描述或说明这种一个或多个步骤。进一步,应理解的是,除非另外明确提出,本文中所描述的各示例性实施例和/或方面的特征可以相互组合。
本申请实施例提供一种人脸检索方法,该方法可以广泛的应用于身份识别、身份验证等相关场景中。人脸检索***并不直接计算待检索的人脸图像与人脸库中的人脸图像之间 的相似度,而是将所有图像都表示成特征,并利用这些特征来计算与彼此的相似度,进而找出最相似的一张人脸图像或多张人脸图像。通常,人脸检索***中存在多个特征提取模型,所有人脸图像需要利用所有特征提取模型来提取特征,并将同一人脸图像所提取出的特征进行拼接或者利用降维方法压缩维度,以得到最终的输出特征。图1为一种人脸检索方法的流程示意图,参见图1所示,在人脸检索***中存在不同的特征提取模型A、B和C,将人脸图像1和人脸图像2分别输入这三个特征提取模型,以得到人脸图像1的特征A1、B1和C1以及人脸图像2的特征A2、B2和C2,然后,人脸检索***将同一人脸图像的三个特征进行拼接,得到最终的输出特征,即人脸图像1特征和人脸图像2特征,最后,人脸检索***将人脸图像1特征和人脸图像2特征进行比对,由此,完成人脸检索。但是,由于每幅人脸图像都需要经过所有的特征提取模型,这使得***的计算量随着特征提取模型的数目成倍增加。可选的,特征提取模型A、B和C可以为不同特征维度下的特征提取模型,也可以为同一特征维度下的多个功能相同的特征提取模型。
在本申请实施例中,还提供了一种人脸检索方法,在该方法中,采用了特征映射模型,人脸检索***将一个特征提取模型提取出的特征(即源域特征)映射至另一个特征提取模型对应的特征空间(即目标域特征空间),并在该特征空间内进行特征比对,由此,实现特征之间的互搜,进而完成人脸检索。例如,图2为另一种人脸检索方法的流程示意图,参见图2所示,人脸图像1经特征提取模型A得到特征A(即人脸图像1特征),人脸图像2经特征提取模型B得到特征B,人脸图像2经特征提取模型C得到特征C,特征B和特征C分别经特征映射模型映射至模型A对应的特征空间得到人脸图像2特征1和人脸图像2特征2,并在模型A对应的特征空间内与人脸图像1特征(即特征A)进行比对,特征A分别与人脸图像2特征1和人脸图像2特征2组成两组特征对,每组特征对在模型A对应的特征空间中进行比对,由此,实现特征之间的互搜,进而完成人脸检索。在本申请实施例中,由于所有特征均映射至特征提取模型A对应的特征空间,所以,特征A不用进行映射,可以直接作为人脸图像1特征与人脸图像2特征进行比对。为了实现特征之间的比对,每一对特征之间都需要训练一个特征映射模型,当特征提取模型数目为n(n≥2)时,特征映射模型的数目
$N$可以如以下公式(1)所示：
$$N=\binom{n}{2}=\frac{n(n-1)}{2} \tag{1}$$
可见,人脸检索***中需要训练数量巨大的特征映射模型。进一步地,由于特征映射模型只将源域特征作为输入,在源域特征表达能力不佳的情况下,不能保证特征映射的效果。
进一步地,在本申请实施例中,又提供了一种人脸检索方法,在该方法中,由于不同特征提取模型能够提取出不同的人脸特征,但是不同的人脸特征是不能够直接进行比对的,为了使得不同的人脸特征能够直接比对,人脸检索***需要将所有提取出的人脸特征映射至同一个特征空间,并在该特征空间内进行特征比对,由此,实现特征之间的互搜,进而完成人脸检索。例如,图3为本申请实施例中的人脸检索方法的又一种流程示意图,参见图3所示,人脸检索***将人脸图像1输入特征提取模型A得到特征A,人脸检索***将人脸图像2输入特征提取模型B得到特征B,人脸检索***将人脸图像3输入特征提取模型C得到特征C,然后,人脸检索***再将特征B和特征C映射至同一特征空间(假设 为特征提取模型A对应的特征空间)得到人脸图像2特征和人脸图像3特征,在本申请实施例中,由于所有特征均映射至特征提取模型A对应的特征空间,所以,特征A不用进行映射,可以直接作为人脸图像1特征与人脸图像2特征和人脸图像3特征进行比对,在特征提取模型A对应的特征空间内,人脸图像2特征和人脸图像3特征也可以直接比对,由此,实现特征之间的互搜,进而完成人脸检索。但是,对于特征提取模型A、B和C来说,在不同的场景下特征提取模型的表现能力是不一样的,在单个特征提取模型在各个场景下所呈现出的优势并不突出的情况下,究竟选择哪个模型对应的特征空间作为最终映射的特征空间是一个问题;进一步地,由于所有特征最终映射至单个特征提取模型对应的特征空间,如此,并没有发挥多个特征提取模型的综合作用。
为了解决上述问题,本申请实施例提供一种人脸检索方法,该人脸检索方法可以应用于上述人脸检索***中,在本申请实施例中,人脸检索***可以设置于如安防监控、门禁闸机等设备上。
需要说明的是,在本申请实施例中,图4为本申请实施例中的人脸特征的示意图,参见图4所示,人脸图像中的人脸特征可以分为结构化特征和非结构化特征,其中,结构化特征可以包括用于表征人脸属性的特征,人脸属性可以指人脸图像的一些具体的物理含义,例如年龄、性别、角度等,是通过结构化特征提取模型从人脸图像中提取出的;而非结构化特征可以包括用于表示人脸特征的向量,这些人脸特征可以指人脸图像中没有具体物理含义的特征,由一串数字组成,又可以被称为特征向量,是通过非结构化特征提取模型从人脸图像中提取出的,特征向量之间的相似度可以用来代表待检索的人脸图像与人脸模板图像之间的相似度。
上述结构化特征提取模型和非结构化特征提取模型均为机器学习模型(例如卷积神经网络(convolutional neural networks,CNN)。CNN本质上是一种输入到输出的映射,它能够学习大量的输入与输出之间的映射关系,而不需要任何输入和输出之间的精确的数学表达式,在收集好训练样本后,对CNN加以训练,CNN就具有输入输出对之间的映射能力。当然,结构化特征提取模型和非结构化特征提取模型还可以为其他的机器学习模型,本申请实施例不做具体限定。
在一些可能的实施方式中,在下述实施例中所述的“特征”可以为非结构化特征,也可以为结构化特征与非结构化特征进行拼接后的拼接特征,“特征提取模型”可以为非结构化特征提取模型,也可以为由结构化特征提取模型和非结构化特征提取模型组成的模型组合,对此,本申请实施例不做具体限定。
图5为本申请实施例中的人脸检索方法的流程示意图,参见图5所示,该方法可以包括:
S501:获取待检索的人脸图像;
在本申请实施例中,人脸检索***获取的人脸图像可以是检索***直接捕获的图像,如人脸检索***的摄像头拍摄的图像;也可以是用户手动输入人脸检索***的图像,如用户需要检索目标人物,直接将目标人物图像输入人脸检索***;还可以是在人脸检索***图库中的某一人物图像。
在本申请实施例中,人脸检索***接收输入的待检索的人脸图像。可选的,人脸 检索***还可以接收输入的底库图像(也就是人脸模板图像)。人脸模板图像可以用于与人脸图像进行比对,实现人脸图特征互搜,完成对人脸图像的人脸检索。
S502:将人脸图像输入第一特征提取模型,得到第一人脸特征;
在本申请实施例中,人脸检索设备可以使用大量的人脸样本图像对第一特征提取模型进行训练,使得人脸检索设备将人脸图像输入到第一特征提取模型后能够得到人脸图像的第一人脸特征。可选的,第一特征提取模型可以为非结构化特征提取模型,也可以为结构化特征提取模型和非结构化特征提取模型组成的模型组合。
S503:将人脸图像和第一人脸特征输入第一特征映射模型进行特征映射,得到输出的第一人脸特征对应的标准特征;
在本申请实施例中,人脸检索设备可以预先使用人脸样本图像为第一特征提取模型训练一个第一特征映射模型,第一特征提取模型的特征输出维数和第一特征映射模型的特征输入维数相同,然后,人脸检索设备可以将待检索的人脸图像和其对应的第一人脸特征输入第一特征映射模型进行特征映射,得到输出的标准特征。
需要说明的是,第一特征映射模型是根据人脸样本图像及人脸样本图像对应的目标特征训练得到的。可选的,人脸样本图像对应的目标特征是由多个人脸样本特征拼接得到的,人脸样本特征是由多个第二特征提取模型对人脸样本图像进行特征提取得到的。多个第二特征提取模型包括第一特征提取模型;多个第二特征提取模型至少具有不同的训练样本、模型结构、训练策略或特征维数不同。
作为一种可能的实施方式,第一特征提取模型与第二特征提取模型可以为相同的特征提取模型,也可以为不同的特征提取模型。第二特征提取模型可以是本厂家的第二特征提取模型也可以是来自其他不同厂家的第二特征提取模型;第二特征提取模型可以包括第一特征提取模型,即第二特征提取模型在包括第一特征提取模型的基础上,还可能有其他特征提取模型。不同厂家的第二特征提取模型之间的训练样本、模型结构、训练策略或特征维数也可以不同。例如,厂家A的第二特征提取模型的特征维数为256,厂家B的第二特征提取模型的特征维数为512。
S504:根据标准特征,对人脸图像进行人脸检索。
在本申请实施例中,人脸检索设备通过S503得到待检索的人脸图像的标准特征之后,可以直接将该标准特征与多个人脸模板图像的标准特征进行直接比对,找到最为相似的特征,进而获得一张或者多张人脸模板图像,实现特征互搜,完成人脸检索。
需要说明的是,上述多个人脸模板图像可以与待检索的人脸图像一同输入人脸检索设备,依次执行S501至S503,完成人脸特征的提取,进而与待检索的人脸图像的标准特征进行比对;或者,人脸模板图像预先输入人脸检索设备,完成人脸特征的提取,获得各个人脸模板图像对应的标准特征,然后将人脸模板图像对应的标准特征进行存储,以供后续获取待检索的人脸图像的标准特征后,读取各个人脸模板图像对应的标准特征并进行比对,进而完成人脸检索。当然,待检索的人脸图像与人脸模板图像还可以以其他方式进行特征提取和特征比对,只要能够完成人脸检索即可,本申请实施例不做具体限定。
下面对上述S504中,根据标准特征,对人脸图像进行人脸检索过程进行说明。
人脸识别装置再根据标准特征对人脸图像进行检索时,需要对比标准特征与多个 人脸样本图像的标准特征,该多个人脸样本图像的标准特征可以是本厂家提供的也可以是其他厂家提供的。将当前与标准特征进行对比的人脸样本图像的标准特征定为第一人脸样本图像的标准特征,确定标准特征与第一人脸样本图像的标准特征的相似度,当该相似度大于第一阈值时,第一人脸样本图像为人脸图像检索的目标。
该方法可以实现同厂家不同模型之间的特征互搜,以及不同厂家之间的特征互搜。
下面对上述S503中第一特征映射模型的训练过程进行说明。
图6为本申请实施例中的训练第一特征映射模型的方法流程示意图,参见图6所示,该方法可以包括:
S601:获取人脸样本图像;
在本申请实施例中,人脸检索设备获取输入的人脸样本图像,该人脸样本图像可以为大量的人脸样本图像中一幅图像或者多幅图像。
S602:将人脸样本图像输入第一特征提取模型,得到输出的第一人脸样本特征;
在本申请实施例中,人脸检索设备将人脸样本图像输入第一特征提取模型,得到人脸样本图像的第一人脸样本特征。其中,第一人脸样本特征可以为人脸样本图像的非结构化特征,也可以为由人脸样本图像的结构化特征和非结构化特征拼接后形成的拼接特征。
S603:根据人脸样本图像、第一人脸样本特征以及人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到第一特征映射模型。
其中,第二特征映射模型与第一特征提取模型对应。
在本申请实施例中,人脸检索设备可以将人脸样本图像输入多个不同的特征提取模型,得到输出的多个人脸样本特征,进而将各个人脸样本特征与对应的预设系数相乘,并将相乘后的人脸样本特征进行拼接,最后,根据拼接后的人脸样本特征得到人脸样本图像对应的目标特征,也可以称为目标标准特征。接下来,人脸检索设备将人脸样本图像及其对应的第一人脸样本特征输入第二特征映射模型得到输出的标准样本特征,通过调整第二特征映射模型的参数,使得标准样本特征与目标特征之间的相似度最大化,此时,优化的目标函数可以为标准样本特征与目标特征之间的余弦相似度尽可能地大,当目标函数收敛时,第二特征映射模型就训练好了,训练好的第二特征模型即为第一特征映射模型。
在本申请实施例中,第二特征映射模型可以是本厂家通过样本训练得到的特征映射模型;也可以是本厂家和其他厂家协作训练得到特征映射模型,具体而言,仅由本厂家或者本厂家与其他厂家的人脸检索设备将人脸样本图像输入多个不同的特征提取模型,得到输出的多个人脸样本特征,进而将各个人脸样本特征与对应的预设系数相乘,并将相乘后的人脸样本特征进行拼接,最后,根据拼接后的人脸样本特征得到人脸样本图像对应的目标特征。接下来,本厂家人脸检索设备将人脸样本图像及其对应的第一人脸样本特征输入第二特征映射模型得到输出的标准样本特征,通过调整第二特征映射模型的参数,使得标准样本特征与目标特征之间的相似度最大化(目标特征是仅由本厂家或者本厂家与其他厂家协作得到的),此时,优化的目标函数可以为标准样本特征与目标特征之间的余弦相似度尽可能地大,当目标函数收敛时,第二特征映射模型就训练好了,训练好的第二特征模型即为第一特征映射模型。
本发明方案可用于同一厂商的不同版本的模型之间的特征互搜。例如,厂商A已经部署了一套人脸检索***,目前需要将旧特征提取模型升级为新特征提取模型,则可以通过以下操作完成新旧模型之间的特征互搜:训练与新、旧特征提取模型配套的新、旧特征映射模型;将底库图像和旧特征提取模型提取好的特征共同输入旧特征映射模型,获得底库图像的标准特征;将待检索的图像输入新特征提取模型提取特征,将新特征提取模型提取的特征与待检索的图像共同输入新特征映射模型,获得待检索图像的标准特征;将待检索图像的标准特征与底库图像的标准特征进行检索比对,底库中相似度越高的图像排在越前列。
本发明方案还可用于同一厂商的运行在不同设备上的模型之间的特征互搜。例如,厂商A运行在中心服务器的模型为大特征提取模型(模型结构更为复杂),运行在摄像头上的模型为小特征提取模型(模型结构更为轻巧),则可以通过以下操作完成大小模型之间的特征互搜:训练与大、小特征提取模型配套的大、小特征映射模型;存储在中心服务器的图像利用大特征提取模型提取特征,将该特征和图像共同输入大特征映射模型,获得中心服务器图像的标准特征;存储在摄像头上的图像利用小特征提取模型提取特征,将该特征和图像共同输入小特征映射模型,获得摄像头图像的标准特征;将中心服务器图像的标准特征与摄像头图像的标准特征进行比对计算相似度,并用于检索排序。
本发明方案还可用于不同厂商的模型之间的特征互搜。例如,厂商A的模型为特征提取模型A,厂商B的模型为特征提取模型B,则可以通过以下操作完成A、B模型之间的特征互搜:训练与特征提取模型A、B配套的特征映射模型A、B;分配给厂商A的图像利用特征提取模型A提取特征,将该特征和图像共同输入特征映射模型A,获得厂商A图像的标准特征;分配给厂商B的图像利用特征提取模型B提取特征,将该特征和图像共同输入特征映射模型B,获得厂商B图像的标准特征;将厂商A图像的标准特征与厂商B图像的标准特征进行比对,获得它们的相似度,并用于检索排序。
需要说明的是,上述多个特征提取模型可以为非结构化特征提取模型,也可以为结构化特征提取模型和非结构化特征提取模型组成的模型组合,本申请实施例不做具体限定。
举例来说,图7为本申请实施例中的训练特征映射模型的方法流程示意图,参见图7所示,在训练阶段,人脸检索设备先获取人脸样本图像对应的目标特征A,并且将人脸样本图像输入特征提取模型,由特征提取模型得到人脸样本特征,然后,人脸检索设备再将人脸样本图像和人脸样本特征输入特征提取模型对应的特征映射模型,得到输出的标准样本特征B,然后,人脸检索设备计算目标特征A和标准样本特征B之间的相似度,并根据优化目标函数,即特征A和B之间的余弦相似度尽可能地大,来调整特征映射模型中的各个参数,直至目标函数收敛,如此便完成对特征映射模型的训练。
在一些可能的实施方式中,两个特征的余弦相似度可以通过公式(2)计算:
$$\mathrm{sim}(A,B)=\frac{\sum_{i=1}^{k}A_iB_i}{\sqrt{\sum_{i=1}^{k}A_i^{2}}\cdot\sqrt{\sum_{i=1}^{k}B_i^{2}}} \tag{2}$$
其中，上述$A_i$和$B_i$分别表示特征A和B的各个分量，k为A和B的分量数目，k为正整数。
当然,可选的,除了目标函数可以为余弦相似度之外,其他可以衡量特征之间相似度的函数(如欧几里得相似度、欧几里德距离等)都可以作为目标函数,本申请实施例不做具体限定。
在一些可能的实施方式中,人脸检索设备可以通过以下方法获取人脸样本图像对应的目标特征。首先,人脸检索设备获取人脸样本图像;可选的,人脸检索设备获取输入的人脸样本图像,该人脸样本图像可以为大量的人脸样本图像中一幅图像或者多幅图像;然后,人脸检索设备将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征;其中,N为大于或者等于2的正整数;可选的,人脸检索设备将人脸样本图像输入到N个不同的第二特征提取模型,这些第二特征提取模型可以为非结构化特征提取模型,也可以为结构化特征提取模型和非结构化特征提取模型组成的模型组合,本申请实施例不做具体限定。通过N个第二特征提取模型,人脸检索设备可以获得输出的N个第二人脸样本特征,一个第二特征提取模型输出一个第二人脸样本特征。接下来,人脸检索设备将N个第二人脸样本特征与N个预设系数一一对应地按位相乘;如此,对于本身能力有强弱的第二特征提取模型来说,为不同特征提取模型提取出的特征赋予不同的系数,可以有效发挥各特征提取模型与其能力匹配的作用;人脸检索设备将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;人脸检索设备根据拼接后的人脸样本特征,获取人脸样本图像对应的目标特征,目标特征的维数小于或等于所述N个第二特征提取模型的维数之和。可见,在计算目标特征的过程中,将多个特征提取模型对应的特征进行拼接,并将拼接后的特征作为目标特征的构建依据,能够最大化可利用信息。
在一些可选的实施方式中,在准备好一批数据后,人脸检索设备可以利用如主成分分析(principal component analysis,PCA)算法、线性判别分析(linear discriminant analysis,LDA)算法、自编码器(autoencoder,AE)等获得降维矩阵,然后,对任一拼接后的人脸样本特征,都可通过与降维矩阵相乘获得降维后的人脸样本特征,并将降维后的人脸样本特征确定为人脸样本图像对应的目标特征。在本申请实施例中,对拼接特征进行降维,一方面可以减少特征比对时间,从而提升检索效率,另一方面可以去除冗余信息,提升标准特征的鲁棒性。
需要说明的是,AE是一种利用反向传播算法使得输出值等于输入值的神经网络,包含编码器和解码器两部分,对本申请实施例而言,编码器先将输入的拼接特征压缩成潜在空间表征,潜在空间表征的维度低于输入拼接特征,然后解码器通过潜在空间表征来重构输出(维度与输入拼接特征相同),输出应与输入的拼接特征尽可能的近。为了达到这个目的,可以将目标函数设置为输入拼接特征和编码器输出目标特征的余弦相似度。在确定好目标函数的形式后,输入不同人脸样本特征,可以计算目标函数关于编码器和解码器中的参数的梯度,基于梯度可以对编码器和解码器中的参数进行调整,直至更新后训练后目标函数的变化小于设定值(即目标函数已收敛),如此编码器和解码器的参数都确定下来。此时,编码器利用训练获得的参数即可对输入的拼接特征完成降维的功能。
举例来说,图8为本申请实施例中的获取人脸样本图像对应的目标特征的方法流程示意图,参见图8所示,人脸检索设备中存在A、B和C三个第二特征提取模型,各个特征提取模型对应的预设系数分别为:$w_a=0.3$、$w_b=0.5$、$w_c=0.2$;输入的人脸样本图像经过第二特征提取模型A、B和C所获得的人脸样本特征分别为:$F_A=[0.04,…,0.08]$、$F_B=[0.06,…,0.03]$和$F_C=[0.05,…,0.05]$;人脸检索设备可以利用$w_a$、$w_b$和$w_c$对各人脸样本特征$F_A$、$F_B$和$F_C$对应地按位相乘,即利用预设系数对人脸样本特征进行加权,具体而言,$F_A$逐位与$w_a$相乘获得$WF_A=[0.012,…,0.024]$,$F_B$逐位与$w_b$相乘获得$WF_B=[0.03,…,0.015]$,$F_C$逐位与$w_c$相乘获得$WF_C=[0.01,…,0.01]$;接下来,人脸检索设备对$WF_A$、$WF_B$和$WF_C$进行拼接,得到拼接后的人脸样本特征$CWF=[0.012,…,0.024,0.03,…,0.015,0.01,…,0.01]$;最后,$CWF$经与降维矩阵相乘,得到目标特征$SF=[0.03,…,0.07]$。
在本申请实施例中,上述预设系数可以且不限于通过以下方式获得。
第一种方式,预设系数可以作为可学习的参数,通过训练人脸识别模型来确定预设系数。具体来说,首先,人脸检索设备可以获取人脸样本图像,此时,人脸样本图像具有对应的身份信息;然后,人脸检索设备将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征;接下来,人脸检索设备根据N个第二人脸样本特征和身份信息,对人脸样本图像进行人脸识别,得到N个预设系数。可选的,在获得预设系数的过程中,可以使用人脸识别模型,其输入数据可以为各个第二特征提取模型所提取的第二人脸样本特征以及对应的身份信息;优化的目标函数可以为同一身份信息的人脸样本特征尽可能的近,不同身份信息的人脸样本特征尽可能的远。在确定好目标函数的形式后,人脸检索设备将N个第二人脸样本特征和对应的身份信息输入人脸识别模型,计算目标函数关于人脸识别模型中的参数以及各个需要确定的预设系数的梯度,基于该梯度可以对人脸识别模型的参数以及需要确定的预设系数进行调整,直至更新后的目标函数的变化小于设定值(即目标函数收敛),将此时需要确定的预设系数的数值作为最终值,如此,便得到了上述预设系数。
在一些可能的实施方式中,获得预设系数时的目标函数可以为如公式(3)所示的三元组损失函数:
$$L=\sum_{i=1}^{M}\max\left(0,\ \left\|f(x_i^a)-f(x_i^p)\right\|_2^2-\left\|f(x_i^a)-f(x_i^n)\right\|_2^2+\alpha\right) \tag{3}$$
其中,M为训练样本的个数,$(x_i^a, f(x_i^a))$为人脸样本图像及其特征,$(x_i^p, f(x_i^p))$为与人脸样本图像的身份信息相同的人脸样本图像及其特征,$(x_i^n, f(x_i^n))$为与人脸样本的身份信息不同的人脸样本图像及其特征;α为期望的正样本对之间距离与负样本对之间距离的差值,当负样本对之间的距离比正样本对之间的距离大α时,则该三元组的损失函数值为0,否则大于0。
在本申请实施例中,通过最小化目标函数即可达到同一身份的特征尽可能近、不同身份的特征尽可能远的目的。需要注意的是,本申请实施例对目标函数的形式没有限制,可以用于训练单人脸识别模型的目标函数均可用于本申请实施例所述的技术方案。
第二种方式,预设系数可以为第二特征提取模型预先配置的,此时,为N个第二特征提取模型配置N个数值相等的预设系数。
第三种方式,预设系数可以为根据预设评判准则为N个第二特征提取模型配置N个预设系数。例如,以检索准确率作为评判准则,假设人脸检索设备中存在A、B和C三个第二特征提取模型,这些第二特征提取模型的检索准确率分别为0.98、0.95和0.9,人脸 检索设备可以将第二特征提取模型A、B和C的预设系数确定为0.98、0.95和0.9;再如,以具有同一身份信息的人脸样本图像之间的相似度作为评判标准,假设特征提取模型针对一批身份信息,每个身份信息对应多张不同的人脸样本图像,对每个身份信息而言,人脸检索设备计算两两人脸样本图像利用第二特征提取模型提取出的特征之间的平均相似度S a,在得出所有身份信息对应的S a后,计算这些S a的平均值AS a,如第二特征提取模型A、B和C的AS a分别为0.8、0.6和0.5,则将第二特征提取模型A、B和C的预设系数确定为0.8、0.6和0.5。
第四种方式,预设系数可以为基于超参数搜索得到的,用于超参数搜索的方法都可用于系数搜索。具体来说,人脸检索设备可以在预设的系数范围内,获取N个第二特征提取模型对应系数组合,然后,将N个第二人脸样本特征与系数组合对应相乘,再将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;最后,人脸检索设备根据拼接后的人脸样本特征,对人脸样本图像进行人脸检索,得到系数组合中满足预设条件的预设系数。例如,在网格搜索中,假设人脸检索设备中存在A、B和C三个第二特征提取模型,它们的系数范围均为0到1,将A、B和C的在系数范围内均分成10份,即A、B和C的系数有0、0.1、0.2、0.3、0.4、0.5、0.6、0.7、0.8、0.9和1,共11个系数,所以,A、B和C共有11×11×11=1331个系数组合,利用这1331个系数组合对N个人脸样本体征特征进行拼接,并采用拼接后的人脸样本特征进行人脸检索,将检索准确率最高的一组超参数(即A、B和C的系数组合)确定为最终的预设系数。
当然,上述几种方式仅为确定预设系数的举例,人脸检索设备还可以通过其他方式确定预设系数,本申请实施例不做具体限定。
在一些可能的实施方式中,人脸检索设备还可以对多个人脸样本特征的映射进行联合训练,上述第二特征映射模型包括独有模块和共享模块;假设,一个神经网络模型包含7层,其中前面4层可以为独有模块,后面3层可以为共享模块,共享模块和独有模块其实都是神经网络层,它们的不同点是,独有模块的参数可以更加灵活地变化,适应各个第二人脸样本特征本身的特点,而共享模块的参数需要处理多个独有模块的输入,综合利用所有第二人脸样本特征,其参数在训练过程中会有更强的限制。由此可以看出,独有模块可学习特征本身的特点,共享模块可学习各模型特征共有的属性。
相应地,上述S603可以包括:将人脸样本图像和多个第一人脸样本特征输入独有模块进行特征映射,得到输出的第三人脸样本特征,多个第一人脸样本特征是由人脸样本图像通过不同的多个第一特征提取模型提取得到的;将第三人脸样本特征输入共享模块,得到多个第一人脸样本特征对应的标准特征;根据人脸样本图像、多个第一人脸样本特征、多个第一人脸样本特征对应的标准特征以及人脸样本图像对应的目标特征,对独有模块和共享模块进行训练,得到第一特征映射模型。
举例来说,人脸样本图像输入多个第一特征提取模型,得到输出的多个第一人脸样本特征,人脸检索设备将各个第一人脸样本特征和人脸样本图像同时作为输入,优化的目标为拟合该人脸样本图像对应的目标特征。图9为本申请实施例中的训练独有模块和共享模块的方法流程示意图,参见图9所示,人脸检索设备中存在A、B两个第一特征提取模型,人脸样本图像经过A、B两个第一特征提取模型获得第一人脸样 本特征A、B后,再经由第二特征映射模型A的独有模块A和第二映射模型B的独有模块B得到输出的第三人脸样本特征,同时将原始的人脸样本图像作为第二特征映射模型A和B的独有模块的输入,最后再经由映射模型A和B的共享模块分别获得输出的标准特征F A和F B,而训练优化目标为与人脸样本图像的目标特征尽可能的近,即F A应与目标特征F尽可能相似,F B也应与F尽可能相似。当目标函数收敛时,第二特征映射模型就训练好了,训练好的第二特征模型即为第一特征映射模型。
进一步地,在联合训练好各个第一特征映射模型后,图10为本申请实施例中的第一特征映射模型进行特征映射的方法流程示意图,参见图10所示,每一第一特征映射模型均可以单独使用,用于将不同人脸特征映射至标准特征,并用于人脸检索。
在一些可能的实施方式中,上述第二特征映射模型包括图像分支模块、特征分支模块和综合模块,其中,图像分支模块可以为卷积神经网络,特征分支模块和综合模块可以为全连接神经网络,全连接神经网络和卷积神经网络的作用类似,不同点在于网络中神经元的连接方式不同。
相应地,上述S603可以包括:将人脸样本图像输入图像分支模块,得到输出的第四人脸样本特征;将第一人脸样本特征输入特征分支模块,得到输出的第五人脸样本特征,第一人脸样本特征是由人脸样本图像通过第一特征提取模型提取得到的;将第四和第五人脸样本特征共同输入综合模块,得到第一人脸样本特征对应的标准特征;根据人脸样本图像、第一人脸样本特征、第一人脸样本特征对应的标准特征以及人脸样本图像对应的目标特征,对图像分支模块、特征分支模块和综合模块进行训练,得到第一特征映射模型。
举例来说,人脸样本图像输入第一特征提取模型,得到输出的第一人脸样本特征,人脸检索设备将人脸样本图像和第一人脸样本特征同时作为第一特征映射模型的输入,优化的目标为拟合该人脸样本图像对应的目标特征。图11为本申请实施例中的训练图像分支模块、特征分支模块和综合模块的方法流程示意图,参见图11所示,人脸样本图像经过第一特征提取模型获得第一人脸样本特征,第一人脸样本特征经过特征分支模块得到输出后的第五人脸样本特征,人脸样本图像经过图像分支模块得到输出后的第四人脸样本特征,第四人脸样本特征和第五人脸样本特征共同输入综合模块,获得输出的标准特征,而训练优化目标为与人脸样本图像的目标特征尽可能的近。当目标函数收敛时,第二特征映射模型就训练好了,训练好的第二特征模型即为第一特征映射模型。进一步地,在训练好各个第一特征映射模型后,每一第一特征映射模型均可以用于将对应的第一特征提取模型获得的人脸特征映射至标准特征,并用于人脸检索。
应理解,在该实施方式中,在训练第二特征映射模型时所用到的目标特征可以是仅由本厂家得到的,也可以是本厂家与其他厂家协作得到的。
由上述可知,在本申请实施例中,通过将人脸特征和人脸图像共同作为特征映射模型的输入,在仅使用人脸特征难以获取适当的标准特征时,通过人脸图像提供的额外信息获取更适当的标准特征,提高人脸检索的准确率。另外,通过将多个特征提取模型所提取的特征进行拼接,并将拼接后的特征作为标准特征的构建依据,使得人脸检索设备能够利用多个特征提取模型的综合作用,选取适当的特征空间,提高人脸检索的准确率。进一步地,由于每幅人脸图像只需经过一个特征提取模型和一个特征映射模型来获得标准 特征,使得***的计算量并不会随着模型的数目成倍增加,减少***计算量。进一步地,由于特征映射模型与特征提取模型是一一对应的,特征映射模型数目与特征提取模型的数目一致,使得人脸检索设备无需训练数量巨大的特征映射模型,减少***计算量。
基于与上述方法相同的发明构思,本申请实施例提供一种人脸检索装置,该人脸检索装置可以为上述实施例所述人脸检索设备中的人脸检索装置或者人脸检索装置中的芯片或者片上***,还可以为人脸检索设备中用于实现上述各实施例所述的方法的功能模块。该人脸检索装置可以实现上述各实施例中人脸检索设备所执行的功能,所述功能可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个上述功能相应的模块。举例来说,一种可能的实施方式中,图12为本申请实施例中的人脸检索装置的结构示意图,参见图12所示,该人脸检索装置1200包括:接口模块1201,用于获取待检索的人脸图像;特征提取模块1202,用于将人脸图像输入第一特征提取模型,得到人脸特征;特征映射模块1203,用于将人脸图像和人脸特征输入第一特征映射模型,得到输出的人脸特征对应的标准特征,第一特征提取模型的特征输出维数和第一特征映射模型的特征输入维数相同,第一特征映射模型是根据人脸图像对应的目标特征训练得到的;人脸检索模块1204,用于根据标准特征,对人脸图像进行人脸检索。
在一些可能的实施方式中,上述装置还包括:映射模型训练模块,用于获取人脸样本图像;将人脸样本图像输入第一特征提取模型,得到输出的第一人脸样本特征;根据人脸样本图像、第一人脸样本特征以及人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到第一特征映射模型,第二特征映射模型与第一特征提取模型一一对应。
在一些可能的实施方式中,上述装置还包括:目标特征获取模块,用于映射模型训练模块得到满足目标函数的第一特征映射模型之前,获取人脸样本图像;将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征,N为大于或者等于2的正整数;将N个第二人脸样本特征与N个预设系数一一对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;根据拼接后的人脸样本特征,获取人脸样本图像对应的目标特征。
在一些可能的实施方式中,目标特征获取模块,还用于获取人脸样本图像,人脸样本图像具有身份信息;将人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征;根据N个第二人脸样本特征和身份信息,对人脸样本图像进行人脸识别,得到N个预设系数。
在一些可能的实施方式中,目标特征获取模块,具体用于为N个第二特征提取模型配置N个预设系数,N个预设系数相等;或,根据预设评判准则,为N个第二特征提取模型配置N个预设系数。
在一些可能的实施方式中,目标特征获取模块,具体用于在预设的系数范围内,获取N个第二特征提取模型对应系数组合;将N个第二人脸样本特征与系数组合对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;根据拼接后的人脸样本特征,对人脸样本图像进行人脸检索,得到系数组合中满足预设条件的预设系数。
在一些可能的实施方式中,目标特征获取模块,还用于对拼接后的人脸样本特征进行降维;将降维后的人脸样本特征确定为人脸样本图像对应的目标特征。
在一些可能的实施方式中,第二特征映射模型包括独有模块和共享模块;
相应地,映射模型训练模块,还用于将人脸样本图像和多个第一人脸样本特征输入独有模块,得到输出后的第三人脸样本特征,多个第一人脸样本特征是由人脸样本图像通过不同的多个第一特征提取模型提取得到的;将第三人脸样本特征输入共享模块,得到多个第一人脸样本特征对应的标准特征;根据人脸样本图像、多个第一人脸样本特征、多个第一人脸样本特征对应的标准特征以及人脸样本图像对应的目标特征,对独有模块和共享模块进行训练,得到第一特征映射模型。
还需要说明的是,接口模块1201、特征提取模块1202、特征映射模块1203、人脸检索模块1204、映射模型训练模块以及目标特征获取模块的具体实现过程可参考图4至图11实施例的详细描述,为了说明书的简洁,这里不再赘述。在本申请实施例中,接口模块1201可以用于执行上述实施例中的S501,特征提取模块1202可以用于执行上述实施例中的S502,特征映射模块1203可以用于执行上述实施例中的S503,人脸检索模1203可以用于执行上述实施例中的S504。
本申请实施例中提到的接口模块可以为接收接口、接收电路或者接收器等;特征提取模块、特征映射模块、人脸检索模块、映射模型训练模块以及目标特征获取模块可以为一个或者多个处理器。
基于与上述方法相同的发明构思,本申请实施例提供一种人脸检索设备,图13为本申请实施例中的人脸检索设备的结构示意图,参见图13中实线所示,人脸检索设备1300可以包括:处理器1301和通信接口1302,处理器1301可以用于支持人脸检索设备1300实现上述各个实施例中所涉及的功能,例如:处理器1301可以通过通信接口1302获取待检索的人脸图像。
在一些可能的实施方式中,参见图13中虚线所示,人脸检索设备1300还可以包括存储器1303,存储器1303,用于保存人脸检索设备1300必要的计算机执行指令和数据。当该人脸检索设备1300运行时,该处理器1301执行该存储器1303存储的该计算机执行指令,以使该人脸检索设备1300执行如上述各个实施例中所述的人脸检索方法。
基于与上述方法相同的发明构思,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质存储有指令,当指令在计算机上运行时,用于执行上述各个实施例所述的人脸检索方法。
基于与上述方法相同的发明构思,本申请实施例提供一种计算机程序或计算机程序产品,当计算机程序或计算机程序产品在计算机上被执行时,使得计算机实现上述各个实施例所述的人脸检索方法。
本领域技术人员能够领会,结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,根据通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或者并入在组合编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。
本申请的技术可在各种各样的装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本申请中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面,但未必需要由不同硬件单元实现。实际上,如上文所描述,各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中,或者通过互操作硬件单元(包含如上文所描述的一或多个处理器)来提供。
在上述实施例中,对各个实施例的描述各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (24)

  1. 一种人脸检索方法,其特征在于,包括:
    获取待检索的人脸图像;
    通过第一特征提取模型对所述人脸图像进行特征提取,得到第一人脸特征;
    将所述人脸图像和所述第一人脸特征输入第一特征映射模型,得到输出的所述第一人脸特征对应的标准特征,所述第一特征映射模型是根据人脸样本图像对应的目标特征训练得到的;所述第一特征提取模型的特征输出维数和所述第一特征映射模型的特征输入维数相同;
    根据所述标准特征,对所述人脸图像进行人脸检索。
  2. 根据权利要求1所述的方法,其特征在于,所述人脸样本图像对应的目标特征是由多个人脸样本特征拼接得到的,所述多个人脸样本特征是由多个第二特征提取模型对所述人脸样本图像进行特征提取得到的;所述多个第二特征提取模型包括所述第一特征提取模型;所述多个第二特征提取模型至少具有不同的训练样本、模型结构、训练策略或特征维数。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    获取人脸样本图像;
    将所述人脸样本图像输入所述第一特征提取模型,得到输出的第一人脸样本特征;
    根据所述人脸样本图像、所述第一人脸样本特征以及所述人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到所述第一特征映射模型,所述第二特征映射模型与所述第一特征提取模型对应。
  4. 根据权利要求3所述的方法,其特征在于,在所述根据所述人脸样本图像、所述第一人脸样本特征以及所述人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到满足目标函数的所述第一特征映射模型之前,所述方法还包括:
    获取人脸样本图像;
    将所述人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征,N为大于或者等于2的正整数;
    将所述N个第二人脸样本特征与N个预设系数一一对应相乘;
    将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;
    根据所述拼接后的人脸样本特征,获取所述人脸样本图像对应的目标特征,其中,所述目标特征的维数小于或等于所述N个第二特征提取模型的维数之和。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    获取人脸样本图像,所述人脸样本图像具有身份信息;
    将所述人脸样本图像输入所述N个第二特征提取模型,得到输出的所述N个第二人脸样本特征;
    根据所述N个第二人脸样本特征和所述身份信息,对所述人脸样本图像进行人脸识别,得到所述N个预设系数。
  6. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    为所述N个第二特征提取模型配置所述N个预设系数,所述N个预设系数相等;或,
    根据预设评判准则,为所述N个第二特征提取模型配置所述N个预设系数。
  7. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    在预设的系数范围内,获取所述N个第二特征提取模型对应系数组合;
    将所述N个第二人脸样本特征与所述系数组合对应相乘;
    将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;
    根据所述拼接后的人脸样本特征,对所述人脸样本图像进行人脸检索,得到所述系数组合中满足预设条件的所述预设系数。
  8. 根据权利要求4至7任一项所述的方法,其特征在于,所述根据所述拼接后的人脸样本特征,获取所述人脸样本图像对应的目标特征,包括:
    对所述拼接后的人脸样本特征进行降维;
    将降维后的人脸样本特征确定为所述人脸样本图像对应的目标特征。
  9. 根据权利要求3至8任一项所述的方法,其特征在于,所述第二特征映射模型包括独有模块和共享模块;
    所述根据所述人脸样本图像、所述第一人脸样本特征以及所述人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到所述第一特征映射模型,包括:
    将所述人脸样本图像和多个第一人脸样本特征输入所述独有模块,得到输出后的第三人脸样本特征,所述多个第一人脸样本特征是由所述人脸样本图像通过不同的多个第一特征提取模型提取得到的;
    将所述第三人脸样本特征输入所述共享模块,得到所述多个第一人脸样本特征对应的标准特征;
    根据所述人脸样本图像、所述多个第一人脸样本特征、所述多个第一人脸样本特征对应的标准特征以及所述人脸样本图像对应的目标特征,对所述独有模块和所述共享模块进行训练,得到所述第一特征映射模型。
  10. 根据权利要求3至8任一项所述的方法,其特征在于,所述第二特征映射模型包括图像分支模块、特征分支模块和综合模块;
    所述根据所述人脸样本图像、所述第一人脸样本特征以及所述人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到所述第一特征映射模型,包括:
    将所述人脸样本图像输入所述图像分支模块,得到输出后的第四人脸样本特征;
    将所述第一人脸样本特征输入所述特征分支模块,得到输出后的第五人脸样本特征,所述第一人脸样本特征是由所述人脸样本图像通过第一特征提取模型提取得到的;
    将所述第四人脸样本特征和所述第五人脸样本特征共同输入所述综合模块,得到所述第一人脸样本特征对应的标准特征;
    根据所述人脸样本图像、所述第一人脸样本特征、所述第一人脸样本特征对应的标准特征以及所述人脸样本图像对应的目标特征,对所述图像分支模块、所述特征分支模块和所述综合模块进行训练,得到所述第一特征映射模型。
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述根据所述标准特征,对所述人脸图像进行人脸检索,包括:
    确定所述标准特征与第一人脸样本图像的标准特征的相似度,所述第一人脸样板 图像是多个人脸样本图像中的任一人脸样本图像;
    当所述相似度大于第一阈值时,所述第一人脸样本图像为所述人脸图像检索的目标。
  12. 一种人脸检索装置,其特征在于,包括:
    接口模块,用于获取待检索的人脸图像;
    特征提取模块,用于通过第一特征提取模型对所述人脸图像进行特征提取,得到第一人脸特征;
    特征映射模块,用于将所述人脸图像和所述第一人脸特征输入第一特征映射模型,得到输出的所述第一人脸特征对应的标准特征,所述第一特征映射模型是根据人脸样本图像对应的目标特征训练得到的;所述第一特征提取模型的特征输出维数和所述第一特征映射模型的特征输入维数相同;
    人脸检索模块,用于根据所述标准特征,对所述人脸图像进行人脸检索。
  13. 根据权利要求12所述的装置,其特征在于,所述人脸样本图像对应的目标特征是由多个人脸样本特征拼接得到的,所述多个人脸样本特征是由多个第二特征提取模型对所述人脸样本图像进行特征提取得到的;所述多个第二特征提取模型包括所述第一特征提取模型;所述多个第二特征提取模型至少具有不同的训练样本、模型结构、训练策略或特征维数。
  14. 根据权利要求12或13所述的装置,其特征在于,所述装置还包括:映射模型训练模块,用于获取人脸样本图像;将所述人脸样本图像输入所述第一特征提取模型,得到输出的第一人脸样本特征;根据所述人脸样本图像、所述第一人脸样本特征以及所述人脸样本图像对应的目标特征,对第二特征映射模型进行训练,得到所述第一特征映射模型,所述第二特征映射模型与所述第一特征提取模型一一对应。
  15. 根据权利要求14所述的装置,其特征在于,所述装置还包括:目标特征获取模块,用于所述映射模型训练模块得到满足目标函数的所述第一特征映射模型之前,获取人脸样本图像;将所述人脸样本图像输入N个第二特征提取模型,得到输出的N个第二人脸样本特征,N为大于或者等于2的正整数;将所述N个第二人脸样本特征与N个预设系数一一对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼接后的人脸样本特征;根据所述拼接后的人脸样本特征,获取所述人脸样本图像对应的目标特征,其中,所述目标特征的维数小于或等于所述N个第二特征提取模型的维数之和。
  16. 根据权利要求14所述的装置,其特征在于,所述目标特征获取模块,还用于获取人脸样本图像,所述人脸样本图像具有身份信息;将所述人脸样本图像输入N个第二特征提取模型,得到输出的所述N个第二人脸样本特征;根据所述N个第二人脸样本特征和所述身份信息,对所述人脸样本图像进行人脸识别,得到所述N个预设系数。
  17. 根据权利要求14所述的装置,其特征在于,所述目标特征获取模块,具体用于为N个第二特征提取模型配置所述N个预设系数,所述N个预设系数相等;或,根据预设评判准则,为所述N个第二特征提取模型配置所述N个预设系数。
  18. 根据权利要求14所述的装置,其特征在于,所述目标特征获取模块,具体用于在预设的系数范围内,获取N个第二特征提取模型对应系数组合;将所述N个第二人脸样本特征与所述系数组合对应相乘;将相乘后的N个第二人脸样本特征进行拼接,得到拼 接后的人脸样本特征;根据所述拼接后的人脸样本特征,对所述人脸样本图像进行人脸检索,得到所述系数组合中满足预设条件的所述预设系数。
  19. 根据权利要求15至18任一项所述的装置,其特征在于,所述目标特征获取模块,还用于对所述拼接后的人脸样本特征进行降维;将降维后的人脸样本特征确定为所述人脸样本图像对应的目标特征。
  20. 根据权利要求14至19任一项所述的装置,其特征在于,所述第二特征映射模型包括独有模块和共享模块;
    所述映射模型训练模块,还用于将所述人脸样本图像和多个第一人脸样本特征输入所述独有模块,得到输出后的第三人脸样本特征,所述多个第一人脸样本特征是由所述人脸样本图像通过不同的多个第一特征提取模型提取得到的;将所述第三人脸样本特征输入所述共享模块,得到所述多个第一人脸样本特征对应的标准特征;根据所述人脸样本图像、所述多个第一人脸样本特征、所述多个第一人脸样本特征对应的标准特征以及所述人脸样本图像对应的目标特征,对所述独有模块和所述共享模块进行训练,得到所述第一特征映射模型。
  21. 根据权利要求14至19任一项所述的装置,其特征在于,所述第二特征映射模型包括图像分支模块、特征分支模块和综合模块;
    所述映射模型训练模块,还用于将所述人脸样本图像输入所述图像分支模块,得到输出后的第四人脸样本特征;将所述第一人脸样本特征输入所述特征分支模块,得到输出后的第五人脸样本特征,所述第一人脸样本特征是由所述人脸样本图像通过第一特征提取模型提取得到的;将所述第四人脸样本特征和所述第五人脸样本特征共同输入所述综合模块,得到所述第一人脸样本特征对应的标准特征;根据所述人脸样本图像、所述第一人脸样本特征、所述第一人脸样本特征对应的标准特征以及所述人脸样本图像对应的目标特征,对所述图像分支模块、所述特征分支模块和所述综合模块进行训练,得到所述第一特征映射模型。
  22. 根据权利要求12至21任一项所述的装置,其特征在于,所述人脸检索模块具体用于:
    确定所述标准特征与第一人脸样本图像的标准特征的相似度,所述第一人脸样板图像是多个人脸样本图像中的任一人脸样本图像;
    当所述相似度大于第一阈值时,所述第一人脸样本图像为所述人脸图像检索的目标。
  23. 一种人脸检索设备,其特征在于,包括:处理器和通信接口;
    所述通信接口,与所述处理器耦合,所述处理器通过所述通信接口获取待检索人脸图像;
    所述处理器,用于支持所述人脸检索设备实现上述权利要求1至11任一项所述的人脸解锁方法。
  24. 根据权利要求22所述的人脸检索设备,其特征在于,所述人脸检索设备还包括:存储器,用于保存所述人脸检索设备必要的计算机执行指令和数据;当所述人脸检索设备运行时,所述处理器执行所述存储器存储的所述计算机执行指令,以使所述人脸检索设备执行如权利要求1至11任一项所述的人脸检索方法。
PCT/CN2020/100547 2019-08-15 2020-07-07 一种人脸检索方法及装置 WO2021027440A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20852877.8A EP4012578A4 (en) 2019-08-15 2020-07-07 FACE RETRIEVING METHOD AND DEVICE
US17/671,253 US11881052B2 (en) 2019-08-15 2022-02-14 Face search method and apparatus

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910755045.2 2019-08-15
CN201910755045 2019-08-15
CN201911089829.2A CN112395449A (zh) 2019-08-15 2019-11-08 一种人脸检索方法及装置
CN201911089829.2 2019-11-08

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/671,253 Continuation US11881052B2 (en) 2019-08-15 2022-02-14 Face search method and apparatus

Publications (1)

Publication Number Publication Date
WO2021027440A1 true WO2021027440A1 (zh) 2021-02-18

Family

ID=74570484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100547 WO2021027440A1 (zh) 2019-08-15 2020-07-07 一种人脸检索方法及装置

Country Status (3)

Country Link
US (1) US11881052B2 (zh)
EP (1) EP4012578A4 (zh)
WO (1) WO2021027440A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065530A (zh) * 2021-05-12 2021-07-02 曼德电子电器有限公司 人脸识别方法和装置、介质、设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021027440A1 (zh) * 2019-08-15 2021-02-18 华为技术有限公司 一种人脸检索方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526997A (zh) * 2009-04-22 2009-09-09 无锡名鹰科技发展有限公司 嵌入式红外人脸图像识别方法及识别装置
CN103824052A (zh) * 2014-02-17 2014-05-28 北京旷视科技有限公司 一种基于多层次语义特征的人脸特征提取方法及识别方法
CN106503669A (zh) * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 一种基于多任务深度学习网络的训练、识别方法及***
CN108537120A (zh) * 2018-03-06 2018-09-14 安徽电科恒钛智能科技有限公司 一种基于深度学习的人脸识别方法及***
CN108921100A (zh) * 2018-07-04 2018-11-30 武汉高德智感科技有限公司 一种基于可见光图像与红外图像融合的人脸识别方法及***
US20190102528A1 (en) * 2017-09-29 2019-04-04 General Electric Company Automatic authentification for MES system using facial recognition

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100745981B1 (ko) * 2006-01-13 2007-08-06 삼성전자주식회사 보상적 특징에 기반한 확장형 얼굴 인식 방법 및 장치
KR100866792B1 (ko) * 2007-01-10 2008-11-04 삼성전자주식회사 확장 국부 이진 패턴을 이용한 얼굴 기술자 생성 방법 및장치와 이를 이용한 얼굴 인식 방법 및 장치
CN101159064B (zh) * 2007-11-29 2010-09-01 腾讯科技(深圳)有限公司 画像生成***以及按照图像生成画像的方法
EP2717223A4 (en) * 2011-05-24 2015-06-17 Nec Corp INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM
US9824296B2 (en) * 2011-11-10 2017-11-21 Canon Kabushiki Kaisha Event detection apparatus and event detection method
JP6112801B2 (ja) * 2012-08-22 2017-04-12 キヤノン株式会社 画像認識装置及び画像認識方法
JP5500303B1 (ja) * 2013-10-08 2014-05-21 オムロン株式会社 監視システム、監視方法、監視プログラム、ならびに該プログラムを記録した記録媒体
US10776652B2 (en) * 2017-09-28 2020-09-15 Baidu Usa Llc Systems and methods to improve visual feature detection using motion-related data
JP2021503139A (ja) * 2017-11-27 2021-02-04 日本電気株式会社 画像処理装置、画像処理方法および画像処理プログラム
CN108197532B (zh) 2017-12-18 2019-08-16 深圳励飞科技有限公司 人脸识别的方法、装置及计算机装置
CN108537143B (zh) 2018-03-21 2019-02-15 光控特斯联(上海)信息科技有限公司 一种基于重点区域特征比对的人脸识别方法与***
WO2021027440A1 (zh) * 2019-08-15 2021-02-18 华为技术有限公司 一种人脸检索方法及装置
US20220138472A1 (en) * 2020-10-30 2022-05-05 University Of Maryland, College Park System and Method for Detecting Fabricated Videos

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101526997A (zh) * 2009-04-22 2009-09-09 无锡名鹰科技发展有限公司 嵌入式红外人脸图像识别方法及识别装置
CN103824052A (zh) * 2014-02-17 2014-05-28 北京旷视科技有限公司 一种基于多层次语义特征的人脸特征提取方法及识别方法
CN106503669A (zh) * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 一种基于多任务深度学习网络的训练、识别方法及***
US20190102528A1 (en) * 2017-09-29 2019-04-04 General Electric Company Automatic authentification for MES system using facial recognition
CN108537120A (zh) * 2018-03-06 2018-09-14 安徽电科恒钛智能科技有限公司 一种基于深度学习的人脸识别方法及***
CN108921100A (zh) * 2018-07-04 2018-11-30 武汉高德智感科技有限公司 一种基于可见光图像与红外图像融合的人脸识别方法及***

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4012578A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065530A (zh) * 2021-05-12 2021-07-02 曼德电子电器有限公司 人脸识别方法和装置、介质、设备
CN113065530B (zh) * 2021-05-12 2023-05-30 曼德电子电器有限公司 人脸识别方法和装置、介质、设备

Also Published As

Publication number Publication date
US11881052B2 (en) 2024-01-23
EP4012578A4 (en) 2022-10-05
US20220165091A1 (en) 2022-05-26
EP4012578A1 (en) 2022-06-15

Similar Documents

Publication Publication Date Title
CN111523621B (zh) 图像识别方法、装置、计算机设备和存储介质
JP7322044B2 (ja) レコメンダシステムのための高効率畳み込みネットワーク
CN110197099B (zh) 跨年龄人脸识别及其模型训练的方法和装置
US9990558B2 (en) Generating image features based on robust feature-learning
WO2020156153A1 (zh) 音频识别方法、***和机器设备
CN108491805B (zh) 身份认证方法和装置
WO2020119350A1 (zh) 视频分类方法、装置、计算机设备和存储介质
Yang et al. Few-shot classification with contrastive learning
RU2666631C2 (ru) Обучение dnn-студента посредством распределения вывода
Wang et al. Industrial cyber-physical systems-based cloud IoT edge for federated heterogeneous distillation
US20170316287A1 (en) Image hash codes generated by a neural network
CN110347932B (zh) 一种基于深度学习的跨网络用户对齐方法
WO2016062095A1 (zh) 视频分类方法和装置
WO2021027440A1 (zh) 一种人脸检索方法及装置
CN110765882B (zh) 一种视频标签确定方法、装置、服务器及存储介质
JP2023523029A (ja) 画像認識モデル生成方法、装置、コンピュータ機器及び記憶媒体
CN113761261A (zh) 图像检索方法、装置、计算机可读介质及电子设备
CN112395449A (zh) 一种人脸检索方法及装置
CN113177616B (zh) 图像分类方法、装置、设备及存储介质
JP2023526787A (ja) データ・レコードを処理するための方法およびシステム
CN113393474A (zh) 一种基于特征融合的三维点云的分类和分割方法
CN113312989A (zh) 一种基于聚合描述子与注意力的指静脉特征提取网络
US20230297617A1 (en) Video retrieval method and apparatus, device, and storage medium
CN113191479A (zh) 联合学习的方法、***、节点及存储介质
WO2021027555A1 (zh) 一种人脸检索方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20852877

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020852877

Country of ref document: EP

Effective date: 20220310