WO2019128367A1 - Face verification method and apparatus based on triplet loss, and computer device and storage medium - Google Patents


Info

Publication number
WO2019128367A1
WO2019128367A1 (PCT/CN2018/109169, CN2018109169W)
Authority
WO
WIPO (PCT)
Prior art keywords
sample
face
image
training
neural network
Prior art date
Application number
PCT/CN2018/109169
Other languages
French (fr)
Chinese (zh)
Inventor
许丹丹
梁添才
章烈剽
龚文川
Original Assignee
广州广电运通金融电子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州广电运通金融电子股份有限公司 filed Critical 广州广电运通金融电子股份有限公司
Publication of WO2019128367A1 publication Critical patent/WO2019128367A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to a face authentication method, apparatus, computer device, and storage medium based on Triplet Loss.
  • Face authentication refers to comparing a scene photo captured on site with the ID photo from the person's identity document to determine whether they show the same person.
  • the key technology of face authentication is face recognition.
  • the method based on classification learning mainly computes a classification loss function (such as softmax loss, center loss, and related variants) on features extracted by a deep convolutional network, and optimizes the network accordingly.
  • the last layer of the network is used for classification.
  • the number of output nodes is always consistent with the total number of categories in the training data set.
  • This type of method is suitable when training samples are plentiful, especially when each category contains many samples; in that case the network achieves a good training effect and generalization ability.
  • However, when the number of categories reaches hundreds of thousands or more, the parameter count of the network's final classification layer (a fully connected layer) grows linearly with the category count and becomes very large, making the network difficult to train.
  • Another type of method is based on metric learning, which organizes training samples in the form of tuples (such as pairs or triplets). After feature extraction by the deep convolutional network, no classification layer is needed; instead, a metric loss between samples (such as contrastive loss or triplet loss) is computed directly from the convolutional feature vectors to optimize the network.
  • Because this method needs no classification layer, the network parameter count is not affected by the number of categories, and there is no limit on the number of categories in the training data set; it suffices to select same-class or different-class samples according to the corresponding strategy to construct suitable tuples.
  • The metric learning method is better suited to the case where the training data is broad but shallow (the number of sample categories is large, but same-class samples are few): by combining samples in different ways, a considerable amount of tuple data can be constructed for training. Moreover, metric learning focuses on the internal relationships within each tuple, which gives it an inherent advantage for 1:1 face verification.
  • However, existing metric-based learning methods use the Euclidean distance to measure the similarity between samples, and the Euclidean distance measures the absolute distance between points in space, which is directly related to the position coordinates of each point. This does not match the distribution properties of the face feature space and leads to low reliability of face recognition.
  • a face authentication method based on Triplet Loss, comprising:
  • the method further includes:
  • the training samples including a document face image and at least one scene face image labeled for each labeled object;
  • the triplet includes a reference sample, a positive sample, and a negative sample;
  • the convolutional neural network model is trained based on the supervision of the triplet loss function;
  • the triplet loss function is measured by the cosine distance, and the model parameters are optimized by a stochastic gradient descent algorithm;
  • the verification set data is input into the convolutional neural network model, and when the training end condition is reached, the trained convolutional neural network model for face authentication is obtained.
  • the step of training the convolutional neural network model according to the training samples and generating the triplet corresponding to each training sample by using OHEM includes:
  • the currently trained convolutional neural network model is used to extract features and measure the cosine distance between them; for each reference sample, from the images that do not belong to the same labeled object, the image with the smallest distance that belongs to a different category from the reference sample is selected as the negative sample of the reference sample.
  • the triplet loss function includes a definition of the cosine distance of same-class samples and a definition of the cosine distance of heterogeneous samples.
  • the triplet loss function is:
  • L = Σ_{i=1..N} { [cos(f(x_i^a), f(x_i^n)) - cos(f(x_i^a), f(x_i^p)) + α1]_+ + [α2 - cos(f(x_i^a), f(x_i^p))]_+ }
  • where cos(·,·) represents the cosine distance, calculated as cos(x, y) = x·y / (‖x‖ ‖y‖), and N represents the number of triplets.
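The cosine distance in the claim can be sketched in a few lines. This is an illustrative stdlib helper, not code from the patent:

```python
import math

def cosine_similarity(x, y):
    # cos(x, y) = x·y / (||x|| * ||y||); larger values mean the vectors
    # point in more similar directions, i.e. the two faces are more alike
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)
```

Note that the result depends only on vector direction, not magnitude, which is why the claim treats it as a similarity measure for feature vectors.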
  • the method further includes: initializing with basic model parameters trained on massive open-source face data, and adding a normalization layer and a triplet loss function layer after the feature output layer, to obtain the convolutional neural network model to be trained.
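The normalization layer mentioned above projects each feature vector onto the unit hypersphere, so that cosine comparisons reduce to dot products. A minimal sketch, with a hypothetical helper name and an assumed epsilon guard:

```python
import math

def l2_normalize(feat, eps=1e-12):
    # divide by the L2 norm so the output has unit length;
    # eps guards against division by zero for an all-zero vector
    norm = math.sqrt(sum(v * v for v in feat)) + eps
    return [v / norm for v in feat]
```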
  • a face authentication device based on Triplet Loss comprising: an image acquisition module, an image preprocessing module, a feature acquisition module, a calculation module, and an authentication module;
  • the image acquisition module is configured to obtain a document photo and a scene photo of the person based on the face authentication request;
  • the image preprocessing module is configured to perform face detection, key point positioning, and image preprocessing on the scene photo and the document photo, respectively, to obtain a scene face image corresponding to the scene photo and a document face image corresponding to the document photo;
  • the feature acquiring module is configured to input the scene face image and the document face image into a pre-trained convolutional neural network model for face authentication, and acquire the output of the convolutional neural network model a first feature vector corresponding to the scene face image, and a second feature vector corresponding to the document face image; wherein the convolutional neural network model is obtained based on the supervision training of the triplet loss function;
  • the calculation module is configured to calculate the cosine distance between the first feature vector and the second feature vector;
  • the authentication module is configured to compare the cosine distance with a preset threshold, and determine the face authentication result according to the comparison result.
  • the apparatus further includes: a sample acquisition module, a triplet acquisition module, a training module, and a verification module;
  • the sample obtaining module is configured to acquire a labeled training sample, where the training sample includes a document face image and at least one scene face image that are marked for each mark object;
  • the triplet acquisition module is configured to train the convolutional neural network model according to the training samples, and generate the triplet corresponding to each training sample by using OHEM; the triplet includes a reference sample, a positive sample, and a negative sample;
  • the training module is configured to train the convolutional neural network model according to the triplet of each training sample under the supervision of the triplet loss function; the triplet loss function is measured by the cosine distance, and a stochastic gradient descent algorithm is used to optimize the model parameters;
  • the verification module is configured to input the verification set data into the convolutional neural network model, and when the training end condition is reached, obtain a trained convolutional neural network model for face authentication.
  • a computer apparatus comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, the processor executing the computer program to implement the steps of the above-described Triplet Loss-based face authentication method.
  • a storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the steps of the above-described face authentication method based on Triplet Loss are implemented.
  • The face authentication method, apparatus, computer device, and storage medium based on Triplet Loss use a pre-trained convolutional neural network for face authentication, the convolutional neural network model being obtained through supervised training with the triplet loss function.
  • The similarity between the scene face image and the document face image is calculated as the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the document face image. The cosine distance measures the angle between space vectors and reflects differences in direction, which better matches the distribution properties of the face feature space and thus improves the reliability of face authentication.
  • FIG. 1 is a schematic structural diagram of a face authentication system based on Triplet Loss according to an embodiment;
  • FIG. 2 is a flow chart of a face authentication method based on Triplet Loss in an embodiment
  • FIG. 3 is a flow chart showing the steps of training a convolutional neural network model for face authentication in one embodiment
  • Figure 4 is a schematic diagram showing the probability of sample misclassification in the case where the interval between classes is uniform and the variance within the class is large;
  • Figure 5 is a schematic diagram showing the probability of sample misclassification in the case where the interval between classes is uniform and the variance within the class is small;
  • FIG. 6 is a schematic diagram of a migration learning process of face authentication based on Triplet Loss in an embodiment
  • FIG. 7 is a schematic structural diagram of a convolutional neural network model for face authentication in an embodiment
  • FIG. 8 is a schematic flowchart of a face authentication method based on Triplet Loss in an embodiment
  • FIG. 9 is a structural block diagram of a face authentication device based on a Triplet Loss in an embodiment
  • FIG. 10 is a structural block diagram of a face authentication device based on Triplet Loss in another embodiment.
  • FIG. 1 is a schematic structural diagram of a face authentication system based on Triplet Loss according to an embodiment.
  • the face authentication system includes a server 101 and an image capture device 102.
  • the server 101 is connected to the network of the image collection device 102.
  • the image collection device 102 collects a real-time scene photo of the user to be authenticated, and a photo of the ID, and transmits the collected real-time scene photo and ID photo to the server 101.
  • the server 101 determines whether the person in the scene photo and the person in the photo ID are the same person, and authenticates the identity of the authenticated user.
  • The image capture device 102 can be a camera or a user terminal having a camera function; for example, it can be a standalone camera, or a mobile terminal with an imaging function.
  • the face authentication system may further include a card reader for reading the ID photo stored in a document chip (such as an ID card chip).
  • FIG. 2 is a flow chart of a face authentication method based on Triplet Loss in one embodiment. As shown in Figure 2, the method includes:
  • The document photo refers to a photo associated with a document that can prove the person's identity, such as the ID photo printed on an identity card or the ID photo stored in its chip.
  • The document photo can be obtained by photographing the document, or by reading the ID photo stored in the document chip through a card reader.
  • the documents in this embodiment may be an identity card, a driver's license or a social security card.
  • the scene photo of the character is a photo taken by the user to be authenticated at the time of authentication, and the user to be authenticated is in the live environment.
  • the on-site environment refers to the environment in which the user is taking pictures, and the on-site environment is not limited.
  • the scene photo may be obtained by using a mobile terminal having a camera function to collect a scene photo and send it to the server.
  • Face authentication refers to comparing the photos of the scenes collected by the scene and the photos of the IDs in the identity information to determine whether they are the same person.
  • the face authentication request is triggered based on an actual application operation, for example, a face authentication request is triggered based on the user's account opening request.
  • the application prompts the user to perform a photo collection operation on the display interface of the user terminal, and after the photo collection is completed, sends the collected photo to the server for face authentication.
  • Face detection refers to recognizing a photo and obtaining a face area in the photo.
  • Key point positioning refers to detecting the facial key points in a photo and determining the position of each key point.
  • Key points of the face include the eyes, the tip of the nose, the tip of the mouth, the eyebrows, and the outline points of the various parts of the face.
  • The cascaded convolutional neural network (MTCNN) method based on multi-task joint learning can be used to perform face detection and face key point detection simultaneously; alternatively, a face detection method based on LBP features and a face key point detection method based on shape regression can be used.
  • Image pre-processing refers to performing portrait alignment and cropping processing according to the position of the detected face key point in each picture, thereby obtaining a size-normalized scene face image and a document face image.
  • The scene face image refers to the face image obtained by performing face detection, key point positioning, and image preprocessing on the scene photo.
  • The document face image refers to the face image obtained by performing face detection, key point positioning, and image preprocessing on the document photo.
  • the convolutional neural network model based on the supervision of the triplet loss function is pre-trained according to the training samples in advance.
  • the convolutional neural network includes a convolutional layer, a pooling layer, an activation function layer, and a fully connected layer, and each neuron parameter of each layer is determined by training.
  • Through forward propagation of the trained network, the first feature vector of the scene face image and the second feature vector of the document face image are obtained from the fully connected layer of the convolutional neural network model.
  • A triplet is constructed by randomly selecting a sample from the training data set as the reference sample, then randomly selecting a sample that belongs to the same person as the reference sample as the positive sample, and selecting a sample that does not belong to the same person as the negative sample.
  • These samples form a (reference sample, positive sample, negative sample) triplet.
  • The triplet has two main combinations: when a document image is the reference sample, both the positive sample and the negative sample are scene photos; when a scene image is the reference sample, both the positive sample and the negative sample are document photos.
  • A parameter-shared network is trained to obtain the feature representations of the three triplet elements.
  • The purpose of the improved triplet loss is to learn a mapping such that the distance between the feature representations of the reference sample and the positive sample is as small as possible, the distance between the feature representations of the reference sample and the negative sample is as large as possible, and there is a minimum margin between these two distances.
  • The cosine distance, also known as cosine similarity, measures the difference between two individuals using the cosine of the angle between their vectors in the vector space.
  • The larger the angle between the two feature vectors, the smaller the similarity between the scene face image and the document face image.
  • In conventional methods, the Euclidean distance is used to measure the similarity between samples.
  • The Euclidean distance measures the absolute distance between points in space and is directly related to the position coordinates of each point, which does not conform to the distribution properties of the face feature space.
  • In this embodiment, the cosine distance is used to measure the similarity between samples.
  • The cosine distance measures the angle between space vectors and reflects differences in direction rather than position, which better matches the distribution properties of the face feature space.
  • The cosine distance is calculated as cos(x, y) = x·y / (‖x‖ ‖y‖), where x represents the first feature vector and y represents the second feature vector.
  • S210 Compare the cosine distance and the preset threshold, and determine a face authentication result according to the comparison result.
  • The authentication result includes authentication success, i.e., the document photo and the scene photo belong to the same person.
  • The authentication result also includes authentication failure, i.e., the document photo and the scene photo do not belong to the same person.
  • When the cosine distance is greater than or equal to the preset threshold, the authentication succeeds; when the cosine distance is less than the preset threshold, the similarity between the document photo and the scene photo is below the preset threshold and the authentication fails.
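The decision rule above can be sketched as follows. The helper name and the threshold value 0.5 are illustrative assumptions; the embodiment does not fix a specific threshold:

```python
import math

def authenticate(scene_feat, id_feat, threshold=0.5):
    # compute the cosine similarity between the two feature vectors and
    # compare it against the preset threshold; True means "same person"
    dot = sum(a * b for a, b in zip(scene_feat, id_feat))
    norm = math.sqrt(sum(a * a for a in scene_feat)) * math.sqrt(sum(b * b for b in id_feat))
    return (dot / norm) >= threshold
```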
  • The above face authentication method based on Triplet Loss uses a pre-trained convolutional neural network for face authentication, where the convolutional neural network model is obtained through supervised training with the triplet loss function, and the similarity between the scene face image and the document face image is calculated as the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the document face image.
  • The cosine distance measures the angle between space vectors and reflects differences in direction; it therefore better matches the distribution properties of the face feature space and improves the reliability of face authentication.
  • the face authentication method further includes the step of training to obtain a convolutional neural network model for face authentication.
  • FIG. 3 is a flow diagram of the steps of training a convolutional neural network model for face authentication in one embodiment. As shown in Figure 3, this step includes:
  • The labeled object is a person.
  • The training samples are labeled by person.
  • The scene face image and the document face image belonging to the same person share the same label.
  • the scene face image and the document face image can be obtained by performing face detection, key point positioning, and image preprocessing on the marked scene photo and the ID photo.
  • Face detection refers to recognizing a photo and obtaining a face area in the photo.
  • Key point positioning refers to detecting the facial key points in a photo and determining the position of each key point.
  • Key points of the face include the eyes, the tip of the nose, the tip of the mouth, the eyebrows, and the outline points of the various parts of the face.
  • The cascaded convolutional neural network (MTCNN) method based on multi-task joint learning can be used to perform face detection and face key point detection simultaneously; alternatively, a face detection method based on LBP features and a face key point detection method based on shape regression can be used.
  • Image preprocessing refers to performing portrait alignment and cropping processing according to the position of the detected face key point in each picture, thereby obtaining a size normalized scene face image and a document face image.
  • the scene face image refers to the face image obtained by performing face detection, key point positioning and image preprocessing on the scene photo.
  • The document face image refers to the face image obtained after performing face detection, key point positioning, and image preprocessing on the document photo.
  • A sample is randomly selected as the reference sample; then a scene photo sample that belongs to the same person as the reference sample is randomly selected as the positive sample, and a scene sample that does not belong to the same person is selected as the negative sample, thereby forming a (reference sample, positive sample, negative sample) triplet.
  • the positive sample and the reference sample are the same kind of samples, that is, belong to the same person image.
  • a negative sample is a heterogeneous sample of a reference sample, that is, an image that does not belong to the same person.
  • The reference sample and the positive sample in the triplet are labeled in the training samples, while the negative sample is constructed online during training of the convolutional neural network.
  • The OHEM (Online Hard Example Mining) strategy is used to construct the triplets online, that is, within the network training loop.
  • In each iteration, the current network performs forward computation on the candidate triplets, and among the images in the training samples that do not belong to the same person as the reference sample, the image with the closest cosine distance is selected as the negative sample, thereby obtaining the triplet corresponding to each training sample.
  • The step of training the convolutional neural network based on the training samples and generating the triplet corresponding to each training sample comprises the following steps S1 and S2:
  • S1: randomly select an image as the reference sample, and select an image that belongs to the same labeled object but a different category from the reference sample as the positive sample.
  • the category refers to the type of image to which it belongs.
  • The categories of the training samples include the scene face image and the document face image. Because face authentication mainly compares a document photo with a scene photo, the reference sample and the positive sample should belong to different categories: if the reference sample is a scene face image, the positive sample is a document face image; if the reference sample is a document face image, the positive sample is a scene face image.
  • S2: the currently trained convolutional neural network model is used to extract features and measure the cosine distance between them; for each reference sample, from the images that do not belong to the same labeled object, the image with the smallest distance that belongs to a different category from the reference sample is selected as the negative sample of the reference sample.
  • The negative sample is selected from the labeled face images that do not belong to the same person as the reference sample.
  • The negative sample is constructed online using the OHEM strategy, that is, during each optimization iteration of the network.
  • Forward computation is performed on the candidate triplets, and the image in the training samples that does not belong to the same person as the reference sample, has the closest cosine distance, and belongs to a different category from the reference sample is selected as the negative sample. That is, the negative sample belongs to a different category from the reference sample: if a document photo is the reference sample of the triplet, both the positive sample and the negative sample are scene photos; conversely, if a scene photo is the reference sample, both the positive sample and the negative sample are document photos.
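The online hard-negative selection described above can be sketched as follows. The function names and the candidate-pool structure are illustrative assumptions, not the patent's code; features are assumed to come from the current network's forward pass:

```python
import math

def _cos(x, y):
    # cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def hardest_negative(anchor_feat, anchor_person, anchor_category, candidates):
    # candidates: list of (feature, person_id, category) tuples; pick the
    # most similar image that belongs to a different person AND a
    # different category (photo type) than the anchor
    best_idx, best_sim = None, -2.0
    for i, (feat, person, category) in enumerate(candidates):
        if person == anchor_person or category == anchor_category:
            continue
        sim = _cos(anchor_feat, feat)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

Selecting the most similar wrong-person image yields the "hard" negatives that make the triplet loss informative; random negatives would mostly already satisfy the margin and contribute zero gradient.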
  • In a typical application, the face verification terminal verifies the user's identity by comparing the photo in the user's ID chip with the scene photo.
  • The data collected in the background usually contains only two images per person, namely the document photo and the scene photo captured at comparison time, while the number of distinct individuals can be in the thousands. If data with so many categories and so few same-class samples were trained with a classification-based method, the classification layer parameters would be too large and the network very difficult to train, so a metric learning method is used instead.
  • Typical metric learning generally uses the triplet loss method: by constructing image triplets, an effective feature mapping is learned under which the feature distance between same-class samples is smaller than the feature distance between heterogeneous samples, thereby achieving correct comparison.
  • The purpose of the triplet loss is to make the distance between the feature representations of the reference sample and the positive sample as small as possible, to make the distance between the feature representations of the reference sample and the negative sample as large as possible, and to maintain a minimum margin between these two distances.
  • the triplet loss function includes a definition of the cosine distance of a homogeneous sample and a definition of the cosine distance of the heterogeneous sample.
  • the same type of sample refers to the reference sample and the positive sample
  • the heterogeneous sample refers to the reference sample and the negative sample.
  • the cosine distance of a similar sample refers to the cosine distance of the reference sample and the positive sample
  • the cosine distance of the heterogeneous sample refers to the cosine distance of the reference sample and the negative sample.
  • The original triplet loss method only considers the inter-class gap and does not consider the intra-class gap; if the intra-class distribution is not compact enough, the generalization ability of the network is weakened and its adaptability to the scene decreases.
  • In addition, the original triplet loss method uses the Euclidean distance to measure the similarity between samples, whereas in practice, after a face model is deployed, the cosine distance is more commonly used for feature comparison. The Euclidean distance measures the absolute distance between points in space and is directly related to the position coordinates of each point, while the cosine distance measures the angle between space vectors and reflects differences in direction rather than position, which better matches the distribution properties of the face feature space.
  • The triplet loss method performs iterative optimization by constructing triplet data online, feeding it into the network, and then back-propagating the metric loss of the triplet.
  • Each triplet contains three images: one reference sample, one positive sample of the same class as the reference sample, and one negative sample heterogeneous to the reference sample, labeled (anchor, positive, negative).
  • The basic idea of the original triplet loss is that, through metric learning, the distance between the reference sample and the positive sample is made smaller than the distance between the reference sample and the negative sample, and the difference between the two distances is greater than a minimum margin parameter α. The original triplet loss function is:
  • L = Σ_{i=1..N} [ ‖f(x_i^a) - f(x_i^p)‖² - ‖f(x_i^a) - f(x_i^n)‖² + α ]_+
  • where N is the number of triplets, f(x_i^a) is the feature vector of the reference sample, f(x_i^p) is the feature vector of the same-class positive sample, f(x_i^n) is the feature vector of the heterogeneous negative sample, and ‖·‖ denotes the L2 norm (Euclidean distance).
  • [·]_+ is defined as [z]_+ = max(z, 0).
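For a single triplet, the original Euclidean-distance loss can be sketched as follows; the function name and the margin value 0.2 are illustrative assumptions:

```python
def euclidean_triplet_loss(anchor, positive, negative, alpha=0.2):
    # [ ||a - p||^2 - ||a - n||^2 + alpha ]_+  for one triplet;
    # alpha is the minimum inter-class margin
    d_ap = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_an = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(d_ap - d_an + alpha, 0.0)
```

The loss is zero exactly when the negative is already at least `alpha` farther (in squared distance) from the anchor than the positive is, so only margin-violating triplets produce gradients.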
  • The original triplet loss function only constrains the relation between the same-class distance (anchor, positive) and the heterogeneous distance (anchor, negative); that is, it increases the inter-class interval as much as possible through the margin parameter α,
  • but it does not limit the intra-class distance, i.e., there is no constraint on the distance between same-class samples. If the intra-class distances are scattered and the variance is too large, the generalization ability of the network is weakened and the probability of sample misclassification increases.
  • Figure 4 is a schematic diagram showing the probability of sample misclassification when the interval between classes is uniform and the variance within the class is large.
  • Figure 5 is a schematic diagram showing the probability of sample misclassification when the interval between classes is uniform and the variance within each class is small.
  • the shaded part indicates the probability of sample misclassification.
  • With the same inter-class interval, the probability of sample misclassification when the intra-class variance is large is significantly greater than the probability when the intra-class variance is small.
  • Therefore, the present invention proposes an improved triplet loss method, which retains the inter-class distance constraint of the original method while adding a constraint on the intra-class distance, so that the intra-class distances are as concentrated as possible.
  • Its loss function expression is:
  • cos( ⁇ ) represents the cosine distance and is calculated as N is the number of triples
  • N represents the number of triples
  • The improved triplet loss function changes the metric from Euclidean distance to cosine distance, which keeps the training phase consistent with the deployment phase and improves the continuity of feature learning.
  • The first term of the new triplet loss function is consistent with the original triplet loss and is used to enlarge the inter-class gap.
  • The second term adds a distance constraint on same-class sample pairs (the positive pair) and is used to narrow the intra-class gap.
  • ⁇ 1 is an inter-class interval parameter, which ranges from 0 to 0.2
  • ⁇ 2 is an intra-class interval parameter, ranging from 0.8 to 1.0.
  • the obtained metric corresponds to the similarity between the two samples, so In the expression, only the samples with the cosine similarity of the negative tuple in the range of ⁇ 1 greater than the cosine similarity of the positive tuple will actually participate in the training.
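The two-term loss described above can be sketched as follows. This is a reconstruction from the surrounding description (the patent's formula images are not reproduced in this text), with illustrative values α₁ = 0.1 and α₂ = 0.9 taken from the stated ranges.

```python
import numpy as np

def cosine_sim(x, y):
    # cos(x, y) = x.y / (||x|| * ||y||), row-wise over (N, D) arrays
    return np.sum(x * y, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))

def improved_triplet_loss(anchor, positive, negative, alpha1=0.1, alpha2=0.9):
    cos_ap = cosine_sim(anchor, positive)  # positive-pair similarity
    cos_an = cosine_sim(anchor, negative)  # negative-pair similarity
    # Inter-class term: active only when the negative pair's similarity
    # comes within alpha1 of the positive pair's similarity.
    inter = np.maximum(cos_an - cos_ap + alpha1, 0.0)
    # Intra-class term: pushes positive-pair similarity up to at least
    # alpha2, concentrating same-class samples.
    intra = np.maximum(alpha2 - cos_ap, 0.0)
    return float((inter + intra).sum())
```

Note how the inter-class term vanishes for easy triplets, while the intra-class term still penalizes positive pairs whose similarity falls below α₂.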
  • The model is trained under this improved triplet loss function; back-propagation optimizes the model under the joint constraint of the inter-class loss and the intra-class loss, keeping same-class samples as close as possible in the feature space and different-class samples as far apart as possible, which improves the discriminability of the model and thus the reliability of face authentication.
  • 90% of the data in the person-image data pool is taken as the training set, and the remaining 10% as the verification set.
  • The improved triplet loss value is calculated with the formula above and fed back into the convolutional neural network for iterative optimization.
  • The model's performance on the verification set is monitored; when verification performance no longer improves, the model has reached convergence and the training phase terminates.
  • The face authentication method above adds a constraint on intra-class sample distances to the loss function of the original triplet loss, reducing the intra-class gap while enlarging the inter-class gap and increasing the generalization ability of the model;
  • the metric of the original triplet loss is changed from Euclidean distance to cosine distance, keeping the training and deployment metrics consistent and improving the continuity of feature learning.
  • The step of training the convolutional neural network further comprises: initializing with basic model parameters pre-trained on massive open-source face data, and adding a normalization layer and the improved triplet loss function layer after the feature output layer, to obtain the convolutional neural network to be trained.
  • A deep face recognition model trained in the conventional way on massive Internet face data suffers a large performance drop in person-ID comparison applications in specific scenarios, yet the sources of person-ID data in such scenarios are limited.
  • Direct training often yields unsatisfactory results due to insufficient samples. It is therefore highly necessary to develop a method for effectively extending training on small data sets, so that the face recognition accuracy of the model in the specific application scenario meets the needs of market applications.
  • Deep learning algorithms often rely on training with massive amounts of data.
  • The comparison between a document photo and a scene photo is a heterogeneous sample comparison problem.
  • A conventional deep face recognition model trained on massive Internet face data will see its performance drop significantly in this comparison application.
  • The sources of person-ID data are limited (the same person's ID-card image and corresponding scene image are required), so the amount of data available for training is small, and direct training may give poor results due to insufficient samples; deep transfer learning is therefore used.
  • When training the person-ID verification model, the idea of transfer learning is commonly adopted: a basic model with reliable performance on open-source test sets is trained first, and extended training is then performed on the limited person-ID data, so that the model automatically learns the feature representation of the specific modality and improves its performance. This process is shown in Figure 6.
  • The entire network is initialized with the pre-trained basic model parameters, and an L2 normalization layer and an improved triplet loss layer are then added after the feature output layer of the network, yielding the convolutional neural network to be trained; the network structure is shown in Figure 7.
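As a minimal sketch of the added normalization step (the pre-trained backbone and the loss layer are assumed to exist elsewhere), L2 normalization projects each feature vector onto the unit hypersphere, so the cosine distance used during training reduces to a dot product:

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """L2 normalization layer applied after the feature output layer.

    features: (N, D) array of raw feature vectors.
    Returns unit-length vectors; eps guards against division by zero.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)
```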
  • A schematic diagram of the face authentication method is shown in FIG. 8; it comprises three phases, namely a data acquisition and preprocessing phase, a training phase, and a deployment phase.
  • The card-reader module of the person-ID verification terminal reads the ID-card photo while the front camera captures a live photo; both pass through the face detector, key-point detector, and face alignment and cropping modules to obtain a normalized ID face image and scene face image.
  • In the training phase, 90% of the data in the person-image data pool is used as the training set and the remaining 10% as the verification set. Since person-ID comparison is mainly a comparison between a document photo and scene photos, if a document photo is taken as the anchor of a triplet, the other two images are scene photos; conversely, if a scene photo is taken as the anchor, the other two images are document photos.
  • The strategy of constructing triplets online with OHEM is as follows: during each iterative optimization of the network, the current network performs a forward pass over the candidate triplets, the valid triplets satisfying the conditions are filtered out, and the improved triplet loss is calculated with the formula above and fed back into the network for iterative optimization. Meanwhile, the model's performance on the verification set is monitored; when verification performance no longer improves, the model has reached convergence and the training phase terminates.
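A sketch of the hard-negative selection at the core of the OHEM strategy (function and variable names are illustrative, and features are assumed already L2-normalized so that cosine similarity is a dot product):

```python
import numpy as np

def hardest_negative_indices(anchor_feats, cand_feats, anchor_ids, cand_ids):
    """For each anchor, pick the candidate of a *different* identity whose
    cosine distance to the anchor is smallest, i.e. the hardest negative.
    """
    sims = anchor_feats @ cand_feats.T            # (A, C) cosine similarities
    same_id = anchor_ids[:, None] == cand_ids[None, :]
    sims = np.where(same_id, -np.inf, sims)       # exclude same-identity images
    return np.argmax(sims, axis=1)                # max similarity = min distance
```

Mining against the features of the network as it currently stands is what makes the triplet construction "online": the hardest negatives change as the model improves.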
  • In the deployment phase, the images acquired by the device pass through the same pre-processing procedure as in the training phase, and the feature vector of each face image is then obtained by a forward pass through the network.
  • The cosine distance is calculated to obtain the similarity between the two images, and a decision is made against a preset threshold: a similarity greater than the threshold means the same person, and otherwise not.
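The deployment-phase decision rule can be sketched as follows; the threshold value is illustrative and would in practice be chosen on a validation set for a target error rate:

```python
import numpy as np

def verify_faces(scene_feat, id_feat, threshold=0.75):
    """Compare two face feature vectors by cosine similarity.

    Returns (similarity, same_person); same_person is True when the
    similarity exceeds the preset threshold.
    """
    sim = float(np.dot(scene_feat, id_feat) /
                (np.linalg.norm(scene_feat) * np.linalg.norm(id_feat)))
    return sim, sim > threshold
```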
  • The original triplet loss function only defines the learning relationship for inter-class distances.
  • The face authentication method above improves the original triplet loss function by adding an intra-class distance constraint, so that during training the network minimizes the intra-class gap while enlarging the inter-class gap, improving the generalization ability of the network and thus the adaptability of the model.
  • The cosine distance replaces the Euclidean distance metric of the original triplet loss; it better matches the distribution properties of the face feature space and keeps the training phase consistent with the deployment phase, making comparison results more reliable.
  • A face authentication device comprising: an image acquisition module 902, an image pre-processing module 904, a feature acquisition module 906, a calculation module 908, and an authentication module 910.
  • The image acquisition module 902 is configured to obtain a document photo and a scene photo of the person based on a face authentication request.
  • The image pre-processing module 904 is configured to perform face detection, key-point positioning, and image pre-processing on the scene photo and the document photo respectively, to obtain a scene face image corresponding to the scene photo and a document face image corresponding to the document photo.
  • The feature acquisition module 906 is configured to input the scene face image and the document face image into a pre-trained convolutional neural network model for face authentication, and to obtain the first feature vector corresponding to the scene face image and the second feature vector corresponding to the document face image output by the convolutional neural network model.
  • the calculation module 908 is configured to calculate a cosine distance of the first feature vector and the second feature vector.
  • the authentication module 910 is configured to compare the cosine distance and the preset threshold, and determine a face authentication result according to the comparison result.
  • The face authentication device described above performs face authentication using a pre-trained convolutional neural network whose model is obtained through supervised training with the improved triplet loss function.
  • The similarity between the scene face image and the document face image is calculated as the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the document face image.
  • The cosine distance measures the angle between space vectors and reflects differences in direction rather than position, which better matches the distribution properties of the face feature space and improves the reliability of face authentication.
  • The face authentication device further includes: a sample acquisition module 912, a triplet acquisition module 914, a training module 916, and a verification module 918.
  • The sample acquisition module 912 is configured to obtain labeled training samples, the training samples including, for each labeled object, one labeled document face image and at least one labeled scene face image.
  • The triplet acquisition module 914 is configured to train the convolutional neural network model according to the training samples and to generate the triplet elements corresponding to each training sample through OHEM; the triplet elements include a reference sample, a positive sample, and a negative sample.
  • The triplet acquisition module 914 is configured to randomly select an image as the reference sample, to select an image belonging to the same label object but of a different category from the reference sample as the positive sample, and, following the OHEM strategy, to use the convolutional neural network model currently being trained
  • to compute the cosine distances between features; for each reference sample, from the face images not belonging to the same label object, the image with the smallest distance and a different category from the reference sample is selected as the negative sample of that reference sample.
  • When a document photo is the reference sample, both the positive and negative samples are scene photos; when a scene photo is the reference sample, both the positive and negative samples are document photos.
  • The training module 916 is configured to train the convolutional neural network model from the triplet elements of the training samples under the supervision of the triplet loss function; the triplet loss function uses cosine distance as the metric, and model parameters are optimized by stochastic gradient descent.
  • The improved triplet loss function includes a constraint on the cosine distance of same-class samples and a constraint on the cosine distance of different-class samples.
  • The improved triplet loss function is:

    L = Σ_{i=1}^{N} ( [ cos(f_i^a, f_i^n) − cos(f_i^a, f_i^p) + α₁ ]₊ + [ α₂ − cos(f_i^a, f_i^p) ]₊ )

  • where cos(·,·) denotes the cosine distance, computed as cos(x, y) = x·y / (‖x‖‖y‖), and N is the number of triplets.
  • the verification module 918 is configured to input the verification set data into the convolutional neural network model, and when the training end condition is reached, obtain a trained convolutional neural network model for face authentication.
  • The face authentication device further includes a model initialization module 920, configured to initialize with basic model parameters pre-trained on massive open-source face data, and to add a normalization layer and a triplet loss function layer after the feature output layer, obtaining the convolutional neural network to be trained.
  • The face authentication device above adds a constraint on intra-class sample distances to the loss function of the original triplet loss, reducing the intra-class gap while enlarging the inter-class gap and increasing the generalization ability of the model;
  • the metric of the original triplet loss is changed from Euclidean distance to cosine distance, keeping the training and deployment metrics consistent and improving the continuity of feature learning.
  • A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the face authentication method of each of the above embodiments when executing the computer program.
  • A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the face authentication method of each of the above embodiments.


Abstract

The present invention relates to a face verification method and apparatus based on Triplet Loss, and a computer device and a storage medium. The method comprises: based on a face verification request, acquiring a certificate photograph and a scenario photograph of a person; respectively performing face detection, key point positioning and image pre-processing on the scenario photograph and the certificate photograph, so as to obtain a scenario face image corresponding to the scenario photograph and a certificate face image corresponding to the certificate photograph; inputting the scenario face image and the certificate face image into a pre-trained convolutional neural network model used for face verification, and acquiring a first feature vector corresponding to the scenario face image and a second feature vector corresponding to the certificate face image, which feature vectors are output by the convolutional neural network model; calculating a cosine distance between the first feature vector and the second feature vector; and comparing the cosine distance with a pre-set threshold value, and determining a face verification result according to a comparison result. The method improves the reliability of face verification.

Description

Face authentication method, apparatus, computer device and storage medium based on Triplet Loss

Technical field

The present invention relates to the field of image processing technologies, and in particular to a face authentication method, apparatus, computer device, and storage medium based on Triplet Loss.

Background art

Face authentication refers to comparing a scene photo of a person captured on site with the document photo in the person's identity information, to determine whether they are the same person. The key technology of face authentication is face recognition.

With the rise of deep learning technology, face recognition has continuously broken through traditional technical bottlenecks, and its performance has improved greatly. In research applying deep learning to face recognition, there are two mainstream families of methods: classification-learning-based methods and metric-learning-based methods. Classification-learning-based methods mainly compute a classification loss (such as softmax loss, center loss, and related variants) on the features extracted by a deep convolutional network to optimize the network; the last layer of the network is a fully connected layer used for classification, whose number of output nodes must match the total number of classes in the training data set. These methods suit cases with many training samples, especially when each class is richly represented, in which case the network achieves good training results and generalization ability. However, when the number of classes reaches hundreds of thousands or more, the number of parameters in the final classification (fully connected) layer grows linearly and becomes very large, making the network difficult to train.

The other family is metric-learning-based methods, which organize training samples as tuples (such as pairs or triplets). After the deep convolutional network, no classification layer is needed; instead, a metric loss between samples (such as contrastive loss or triplet loss) is computed directly on the convolutional feature vectors to optimize the network. Because no classification layer is trained, the number of network parameters is unaffected by the number of classes, and there is no limit on the number of classes in the training data set; it suffices to select same-class or different-class samples according to a suitable strategy to construct tuples. Compared with classification learning, metric learning is better suited to training data that is broad but shallow (many sample classes but few samples per class): the different combinations of samples yield abundant tuple data for training, and metric learning focuses on the internal relationships within each tuple, giving it an inherent advantage for same-or-not decisions such as 1:1 face verification.

In practical applications, many institutions require real-name registration, for example for bank account opening, mobile phone number registration, and financial account opening. Real-name registration traditionally requires the user to bring an ID card to a designated site, where staff verify that the person matches the ID-card photo before the account can be opened. With the development of Internet technology, more and more institutions offer convenient services and no longer require customers to visit designated outlets: the user's location is unrestricted, the ID card is uploaded, the image acquisition device of a mobile terminal captures an on-site scene photo of the person, the system performs face authentication, and the account is opened once authentication passes. Traditional metric-based learning methods use the Euclidean distance to measure the similarity between samples, but the Euclidean distance measures the absolute distance between points in space and is directly related to the position coordinates of each point, which does not match the distribution properties of the face feature space and leads to low reliability of face recognition.
Summary of the invention

Based on this, it is necessary to provide a face authentication method, apparatus, computer device, and storage medium based on Triplet Loss to address the low reliability of traditional face authentication methods.

A face authentication method based on Triplet Loss, comprising:

obtaining a document photo and a scene photo of a person based on a face authentication request;

performing face detection, key-point positioning, and image pre-processing on the scene photo and the document photo respectively, to obtain a scene face image corresponding to the scene photo and a document face image corresponding to the document photo;

inputting the scene face image and the document face image into a pre-trained convolutional neural network model for face authentication, and obtaining a first feature vector corresponding to the scene face image and a second feature vector corresponding to the document face image output by the convolutional neural network model, wherein the convolutional neural network model is obtained through supervised training with a triplet loss function;

calculating the cosine distance between the first feature vector and the second feature vector;

comparing the cosine distance with a preset threshold, and determining a face authentication result according to the comparison result.

In one embodiment, the method further includes:

obtaining labeled training samples, the training samples including, for each labeled object, one labeled document face image and at least one labeled scene face image;

training the convolutional neural network model according to the training samples, and generating the triplet elements corresponding to each training sample through OHEM, the triplet elements including a reference sample, a positive sample, and a negative sample;

training the convolutional neural network model from the triplet elements of the training samples under the supervision of the triplet loss function, the triplet loss function using cosine distance as the metric and optimizing model parameters by stochastic gradient descent;

inputting the verification set data into the convolutional neural network model, and obtaining the trained convolutional neural network model for face authentication when the training end condition is reached.

In another embodiment, the step of training the convolutional neural network model according to the training samples and generating the triplet elements corresponding to each training sample through OHEM includes:

randomly selecting an image as the reference sample, and selecting an image that belongs to the same label object but is of a different category from the reference sample as the positive sample;

according to the OHEM strategy, using the convolutional neural network model currently being trained to compute the cosine distances between features, and, for each reference sample, selecting from the images not belonging to the label object the image with the smallest distance and a different category from the reference sample as the negative sample of that reference sample.

In another embodiment, the triplet loss function includes a constraint on the cosine distance of same-class samples and a constraint on the cosine distance of different-class samples.

In another embodiment, the triplet loss function is:
    L = Σ_{i=1}^{N} ( [ cos(f_i^a, f_i^n) − cos(f_i^a, f_i^p) + α₁ ]₊ + [ α₂ − cos(f_i^a, f_i^p) ]₊ )

where cos(·,·) denotes the cosine distance, computed as cos(x, y) = x·y / (‖x‖‖y‖); N is the number of triplets; f_i^a is the feature vector of the reference sample, f_i^p the feature vector of the same-class positive sample, and f_i^n the feature vector of the different-class negative sample; [x]₊ is defined as [x]₊ = max(x, 0); α₁ is the inter-class margin parameter and α₂ is the intra-class margin parameter.
In another embodiment, the method further includes: initializing with basic model parameters pre-trained on massive open-source face data, and adding a normalization layer and a triplet loss function layer after the feature output layer, to obtain the convolutional neural network model to be trained.

A face authentication apparatus based on Triplet Loss, comprising: an image acquisition module, an image pre-processing module, a feature acquisition module, a calculation module, and an authentication module;

the image acquisition module being configured to obtain a document photo and a scene photo of a person based on a face authentication request;

the image pre-processing module being configured to perform face detection, key-point positioning, and image pre-processing on the scene photo and the document photo respectively, to obtain a scene face image corresponding to the scene photo and a document face image corresponding to the document photo;

the feature acquisition module being configured to input the scene face image and the document face image into a pre-trained convolutional neural network model for face authentication, and to obtain a first feature vector corresponding to the scene face image and a second feature vector corresponding to the document face image output by the convolutional neural network model, the convolutional neural network model being obtained through supervised training with a triplet loss function;

the calculation module being configured to calculate the cosine distance between the first feature vector and the second feature vector;

the authentication module being configured to compare the cosine distance with a preset threshold and to determine a face authentication result according to the comparison result.

In another embodiment, the apparatus further includes: a sample acquisition module, a triplet acquisition module, a training module, and a verification module;

the sample acquisition module being configured to obtain labeled training samples, the training samples including, for each labeled object, one labeled document face image and at least one labeled scene face image;

the triplet acquisition module being configured to train the convolutional neural network model according to the training samples and to generate the triplet elements corresponding to each training sample through OHEM, the triplet elements including a reference sample, a positive sample, and a negative sample;

the training module being configured to train the convolutional neural network model from the triplet elements of the training samples under the supervision of the triplet loss function, the triplet loss function using cosine distance as the metric and optimizing model parameters by stochastic gradient descent;

the verification module being configured to input the verification set data into the convolutional neural network model and to obtain the trained convolutional neural network model for face authentication when the training end condition is reached.

A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above Triplet Loss-based face authentication method when executing the computer program.

A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above Triplet Loss-based face authentication method.

The Triplet Loss-based face authentication method, apparatus, computer device, and storage medium of the present invention perform face authentication with a pre-trained convolutional neural network. Because the convolutional neural network model is obtained through supervised training with a triplet loss function, the similarity between the scene face image and the document face image is calculated from the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the document face image, and the cosine distance measures the angle between space vectors and reflects differences in direction rather than position, the method better matches the distribution properties of the face feature space and improves the reliability of face authentication.
附图说明DRAWINGS
图1为一个实施例的基于Triplet Loss的人脸认证***的结构示意图;1 is a schematic structural diagram of a facelet authentication system based on a Triplet Loss according to an embodiment;
图2为一个实施例中基于Triplet Loss的人脸认证方法的流程图;2 is a flow chart of a face authentication method based on Triplet Loss in an embodiment;
图3为一个实施例中训练得到用于人脸认证的卷积神经网络模型的步骤的流程图;3 is a flow chart showing the steps of training a convolutional neural network model for face authentication in one embodiment;
图4为在类间间隔一致、类内方差较大情况下,样本错分的概率示意图;Figure 4 is a schematic diagram showing the probability of sample misclassification in the case where the interval between classes is uniform and the variance within the class is large;
图5为在类间间隔一致、类内方差较小情况下,样本错分的概率示意图;Figure 5 is a schematic diagram showing the probability of sample misclassification in the case where the interval between classes is uniform and the variance within the class is small;
图6为一个实施例中基于Triplet Loss的人脸认证的迁移学习过程的示意图;6 is a schematic diagram of a migration learning process of face authentication based on Triplet Loss in an embodiment;
图7为一个实施例中用于人脸认证的卷积神经网络模型的结构示意图;7 is a schematic structural diagram of a convolutional neural network model for face authentication in an embodiment;
图8为一个实施例中基于Triplet Loss的人脸认证方法的流程示意图；FIG. 8 is a schematic flowchart of a face authentication method based on Triplet Loss in an embodiment;
图9为一个实施例中基于Triplet Loss的人脸认证装置的结构框图;9 is a structural block diagram of a face authentication device based on a Triplet Loss in an embodiment;
图10为另一个实施例中基于Triplet Loss的人脸认证装置的结构框图。FIG. 10 is a structural block diagram of a face authentication device based on Triplet Loss in another embodiment.
具体实施方式DETAILED DESCRIPTION
图1为一个实施例的基于Triplet Loss的人脸认证系统的结构示意图。如图1所示，人脸认证系统包括服务器101和图像采集装置102。其中，服务器101与图像采集装置102网络连接。图像采集装置102采集待认证用户的实时场景照片，以及证件照片，并将采集的实时场景照片和证件照片发送至服务器101。服务器101判断场景照片的人物与证件照中的人物是否为同一人，对待认证用户的身份进行认证。基于具体的应用场景，图像采集装置102可以为摄像头，或是具有摄像功能的用户终端。以在开户现场为例，图像采集装置102可以为摄像头；以通过互联网进行金融账号开户为例，图像采集装置102可以为具有摄像功能的移动终端。FIG. 1 is a schematic structural diagram of a Triplet Loss-based face authentication system according to an embodiment. As shown in FIG. 1, the face authentication system includes a server 101 and an image capture device 102, connected over a network. The image capture device 102 captures a real-time scene photo of the user to be authenticated as well as an ID photo, and sends them to the server 101. The server 101 determines whether the person in the scene photo and the person in the ID photo are the same person, thereby authenticating the identity of the user to be authenticated. Depending on the application scenario, the image capture device 102 can be a camera or a user terminal with a camera function: at an on-site account-opening counter, for example, it can be a camera; when a financial account is opened over the Internet, it can be a mobile terminal with a camera function.
在其它的实施例中，人脸认证系统还可以包括读卡器，用于读取证件（如身份证等）芯片内的证件照。In other embodiments, the face authentication system may further include a card reader for reading the ID photo stored in the chip of a certificate (such as an ID card).
图2为一个实施例中基于Triplet Loss的人脸认证方法的流程图。如图2所示,该方法包括:2 is a flow chart of a face authentication method based on Triplet Loss in one embodiment. As shown in Figure 2, the method includes:
S202,基于人脸认证请求,获取证件照片和人物的场景照片。S202. Acquire a photo of the ID and a photo of the scene of the character based on the face authentication request.
其中,证件照片是指能够证明人物身份的证件所对应的照片,例如身份证上所印制的证件照或芯片内的证件照。证件照片的获取方式可以采用对证件进行拍照获取,也可以通过读卡器读取证件芯片所存储的证件照片。本实施例中的证件可以为身份证,驾驶证或社会保障卡等。Among them, the photo of the certificate refers to the photo corresponding to the document that can prove the identity of the person, such as the photo of the ID printed on the ID card or the photo of the ID in the chip. The way to obtain the photo of the ID card can be obtained by taking a photo of the document, or reading the photo of the ID stored in the ID chip through the card reader. The documents in this embodiment may be an identity card, a driver's license or a social security card.
人物的场景照片是指待认证用户在认证时所采集的、该待认证用户在现场环境的照片。现场环境是指用户在拍照时的所处环境，现场环境不受限制。场景照片的获取方式可以为，利用具有摄像功能的移动终端采集场景照片并发送至服务器。A scene photo of a person is a photo of the user to be authenticated in the live environment, captured at the time of authentication. The live environment is the environment the user is in when the photo is taken, and it is not restricted. A scene photo can be obtained, for example, by capturing it with a mobile terminal that has a camera function and sending it to the server.
人脸认证，是指对比现场采集的人物场景照片以及身份信息中的证件照片，判断是否为同一个人。人脸认证请求基于实际的应用操作触发，例如，基于用户的开户请求，触发人脸认证请求。应用程序在用户终端的显示界面提示用户进行照片的采集操作，并在照片采集完成后，将采集的照片发送至服务器，进行人脸认证。Face authentication refers to comparing the scene photo of a person captured on site with the ID photo in the identity information to determine whether they are the same person. A face authentication request is triggered by an actual application operation; for example, a user's account-opening request triggers a face authentication request. The application prompts the user on the display interface of the user terminal to capture photos, and after capture is complete, sends the captured photos to the server for face authentication.
S204,对场景照片和证件照片分别进行人脸检测、关键点定位和图像预处理,得到场景照片对应的场景人脸图像,以及证件照片对应的证件人脸图像。S204, performing face detection, key point positioning, and image preprocessing on the scene photo and the ID photo, respectively, obtaining a scene face image corresponding to the scene photo, and a document face image corresponding to the ID photo.
人脸检测是指识别照片并获取照片中的人脸区域。Face detection refers to recognizing a photo and obtaining a face area in the photo.
关键点定位,是指对照片中检测的人脸区域,获取人脸关键点在每幅照片中的位置。人脸关键点包括眼睛,鼻尖、嘴角尖、眉毛以及人脸各部件轮廓点。Key point positioning refers to the location of the face key detected in the photo and the position of the face key in each photo. Key points of the face include the eyes, the tip of the nose, the tip of the mouth, the eyebrows, and the outline points of the various parts of the face.
本实施例中，可采用基于多任务联合学习的级联卷积神经网络MTCNN方法同时完成人脸检测和人脸关键点检测，亦可采用基于LBP特征的人脸检测方法和基于形状回归的人脸关键点检测方法。In this embodiment, the multi-task cascaded convolutional neural network (MTCNN) method based on joint learning can be used to perform face detection and facial key point detection simultaneously; alternatively, an LBP-feature-based face detection method and a shape-regression-based facial key point detection method can be used.
图像预处理是指将根据检测的人脸关键点在每张图片中的位置，进行人像对齐和剪切处理，从而得到尺寸归一化的场景人脸图像和证件人脸图像。其中，场景人脸图像是指对场景照片进行人脸检测、关键点定位和图像预处理后得到的人脸图像，证件人脸图像是指对证件照片进行人脸检测、关键点定位和图像预处理后得到的人脸图像。Image preprocessing refers to performing portrait alignment and cropping according to the positions of the detected facial key points in each picture, obtaining size-normalized scene face images and ID face images. A scene face image is the face image obtained by performing face detection, key point localization and image preprocessing on a scene photo; an ID face image is the face image obtained by performing the same operations on an ID photo.
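The portrait alignment step above can be sketched as a least-squares similarity transform that maps detected landmarks onto fixed template coordinates before cropping. The template positions, crop geometry and NumPy implementation below are illustrative assumptions, not values from the patent; a full pipeline would pass the resulting matrix to an image-warping routine such as cv2.warpAffine:

```python
import numpy as np

# Template landmark positions (left eye, right eye, nose tip, left/right
# mouth corner) in a hypothetical 112x96 output crop -- illustrative only.
TEMPLATE = np.array([
    [30.3, 51.7], [65.5, 51.5], [48.0, 71.7], [33.5, 92.4], [62.7, 92.2]
])

def similarity_transform(src, dst):
    """Solve the least-squares similarity transform mapping src -> dst.

    Returns a 2x3 matrix [[a, -b, tx], [b, a, ty]] encoding scale,
    rotation and translation, suitable for an affine image warp.
    """
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    b = np.zeros(2 * n)
    # Row pairs: u = a*x - b*y + tx ; v = b*x + a*y + ty
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)])
    b[0::2] = dst[:, 0]
    b[1::2] = dst[:, 1]
    a_, b_, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([[a_, -b_, tx], [b_, a_, ty]])

# Detected landmarks in the raw photo (hypothetical detector output):
# the template scaled up and shifted, as if the face sat off-center.
detected = TEMPLATE * 2.0 + np.array([40.0, 25.0])
M = similarity_transform(detected, TEMPLATE)

# Applying M to the detected points recovers the template layout.
aligned = detected @ M[:, :2].T + M[:, 2]
print(np.allclose(aligned, TEMPLATE, atol=1e-6))  # True for this exact case
```

The same matrix `M`, applied to the whole photo rather than just the landmark points, yields the size-normalized face crop described in the text.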
S206,将场景人脸图像和证件人脸图像输入到预先训练好的用于人脸认证的卷积神经网络模型,并获取卷积神经网络模型输出的场景人脸图像对应的第一特征向量,以及证件人脸图像对应的第二特征向量。S206. Input the scene face image and the document face image into a pre-trained convolutional neural network model for face authentication, and obtain a first feature vector corresponding to the scene face image output by the convolutional neural network model, And a second feature vector corresponding to the document face image.
其中，卷积神经网络模型基于三元组损失函数的监督预先根据训练样本提前训练好的。卷积神经网络包括卷积层、池化层、激活函数层和全连接层，每层的各个神经元参数通过训练确定。利用训练好的卷积神经网络，通过网络前向传播，获取卷积神经网络模型的全连接层输出的场景人脸图像的第一特征向量，以及证件人脸图像对应的第二特征向量。The convolutional neural network model is trained in advance on training samples under the supervision of a triplet loss function. The convolutional neural network includes convolutional layers, pooling layers, activation function layers and a fully connected layer, and the neuron parameters of each layer are determined by training. With the trained convolutional neural network, a forward pass through the network yields the first feature vector of the scene face image and the second feature vector of the ID face image, both output by the fully connected layer of the model.
三元组(triplet)是指从训练数据集中随机选一个样本，该样本称为参考样本，然后再随机选取一个和参考样本属于同一人的样本作为正样本，选取不属于同一人的样本作为负样本，由此构成一个(参考样本、正样本、负样本)三元组。由于人证比对主要是基于证件照与场景照的比对，而不是证件照与证件照、或者场景照与场景照的比对，因此三元组的模式主要有两种组合：以证件照图像为参考样本时，正样本和负样本均为场景照；以场景照图像为参考样本时，正样本和负样本均为证件照。A triplet is formed by randomly selecting a sample from the training data set, called the reference sample; then randomly selecting a sample belonging to the same person as the reference sample as the positive sample, and a sample not belonging to the same person as the negative sample, which together constitute a (reference sample, positive sample, negative sample) triplet. Since person-ID verification mainly compares an ID photo with a scene photo, rather than an ID photo with an ID photo or a scene photo with a scene photo, triplets come in two main combinations: when an ID photo image is the reference sample, both the positive and negative samples are scene photos; when a scene photo image is the reference sample, both the positive and negative samples are ID photos.
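The two triplet combinations above can be illustrated with a small sketch; the toy dataset, field layout and helper function below are hypothetical, standing in for the labeled training set:

```python
import random

# Toy labeled dataset: (person_id, category, image_name). Names are
# placeholders for preprocessed face images.
dataset = [
    (0, "id", "id_0"), (0, "scene", "scene_0a"), (0, "scene", "scene_0b"),
    (1, "id", "id_1"), (1, "scene", "scene_1a"),
    (2, "id", "id_2"), (2, "scene", "scene_2a"),
]

def build_triplet(data, rng=random):
    """Pick a random anchor; the positive shares its person but has the
    opposite category, the negative has a different person and the same
    (opposite-to-anchor) category."""
    anchor = rng.choice(data)
    other_cat = "scene" if anchor[1] == "id" else "id"
    positives = [s for s in data if s[0] == anchor[0] and s[1] == other_cat]
    negatives = [s for s in data if s[0] != anchor[0] and s[1] == other_cat]
    return anchor, rng.choice(positives), rng.choice(negatives)

a, p, n = build_triplet(dataset)
assert a[0] == p[0] and a[0] != n[0]   # same person / different person
assert p[1] == n[1] and p[1] != a[1]   # both opposite in category to anchor
```

Whichever sample is drawn as the anchor, the positive and negative always come from the other photo category, matching the two patterns in the text.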
针对三元组中的每个样本，训练一个参数共享的网络，得到三个元素的特征表达。改进三元组损失(triplet loss)的目的就是通过学习，让参考样本和正样本的特征表达之间的距离尽可能小，而参考样本和负样本的特征表达之间的距离尽可能大，并且要让参考样本和正样本的特征表达之间的距离和参考样本和负样本的特征表达之间的距离之间有一个最小的间隔。For each sample in the triplet, a parameter-sharing network is trained to obtain the feature representation of the three elements. The purpose of the improved triplet loss is to learn so that the distance between the feature representations of the reference sample and the positive sample is as small as possible, the distance between the feature representations of the reference sample and the negative sample is as large as possible, and there is a minimum margin between these two distances.
S208,计算第一特征向量和第二特征向量的余弦距离。S208. Calculate a cosine distance of the first feature vector and the second feature vector.
余弦距离，也称为余弦相似度，是用向量空间中两个向量夹角的余弦值作为衡量两个个体间差异的大小的度量。第一特征向量和第二特征向量的余弦距离越大，表示场景人脸图像和证件人脸图像的相似度越大，第一特征向量和第二特征向量的余弦距离越小，表示场景人脸图像和证件人脸图像的相似度越小。当场景人脸图像和证件人脸图像的余弦距离越接近于1时，两张图像属于同一人的机率越大，当场景人脸图像和证件人脸图像的余弦距离越小，两张图像属于同一人的机率越小。The cosine distance, also known as cosine similarity, uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals. The larger the cosine distance between the first and second feature vectors, the greater the similarity between the scene face image and the ID face image; the smaller the cosine distance, the smaller the similarity. The closer the cosine distance between the scene face image and the ID face image is to 1, the more likely the two images belong to the same person; the smaller the cosine distance, the less likely they belong to the same person.
传统的三元组损失(triplet loss)方法中,使用欧式距离来度量样本之间的相似度。而欧氏距离衡量的是空间各点的绝对距离,跟各个点所在的位置坐标直接相关,这并不符合人脸特征空间的分布属性。本实施例中,考虑人脸特征空间的分布属性和实际应用场景,采用余弦距离来度量样本之间的相似度。余弦距离衡量的是空间向量的夹角,更加体现在方向上的差异,而不是位置,从而更符合人脸特征空间的分布属性。In the traditional triplet loss method, the Euclidean distance is used to measure the similarity between samples. The Euclidean distance measures the absolute distance of each point in the space, which is directly related to the position coordinates of each point, which does not conform to the distribution property of the face feature space. In this embodiment, considering the distribution attribute of the face feature space and the actual application scenario, the cosine distance is used to measure the similarity between the samples. The cosine distance measures the angle between the space vectors, which is more reflected in the direction, not the position, which is more in line with the distribution properties of the face feature space.
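The contrast between the two metrics can be seen in a small numeric check: two vectors pointing in the same direction but with different magnitudes are far apart in Euclidean distance yet identical under the cosine measure. A sketch with illustrative vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = 10.0 * x  # same direction as x, ten times the magnitude

euclidean = float(np.linalg.norm(x - y))
cosine = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
print(euclidean)  # about 33.67: far apart in absolute position
print(cosine)     # about 1.0: the directions are identical
```

This is why the text argues that the cosine distance, which depends only on the angle between feature vectors, is the better fit for the face feature space.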
具体地,余弦距离的计算公式为:Specifically, the formula for calculating the cosine distance is:
cos(x, y) = (x · y) / (‖x‖ · ‖y‖)
其中,x表示第一特征向量,y表示第二特征向量。Where x represents the first feature vector and y represents the second feature vector.
S210,比较余弦距离和预设阈值,并根据比较结果确定人脸认证结果。S210: Compare the cosine distance and the preset threshold, and determine a face authentication result according to the comparison result.
认证结果包括认证通过，即证件照片和场景照片属于同一人。认证结果还包括认证失败，即证件照片和场景照片不属于同一人。The authentication result includes authentication success, i.e. the ID photo and the scene photo belong to the same person, and authentication failure, i.e. the ID photo and the scene photo do not belong to the same person.
具体地，将余弦距离与预设阈值进行比较，当余弦距离大于预设阈值时，表示证件照片与场景照片的相似度大于预设阈值，认证成功；当余弦距离小于预设阈值时，表示证件照片与场景照片的相似度小于预设阈值，认证失败。Specifically, the cosine distance is compared with a preset threshold. When the cosine distance is greater than the preset threshold, the similarity between the ID photo and the scene photo exceeds the threshold and authentication succeeds; when the cosine distance is less than the preset threshold, the similarity is below the threshold and authentication fails.
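Steps S208 and S210 together can be sketched as follows. The feature vectors and the 0.6 threshold are illustrative assumptions; the patent leaves the preset threshold as a tunable parameter:

```python
import numpy as np

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||), the metric of step S208."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def verify(scene_feat, id_feat, threshold=0.6):
    """Step S210: authentication succeeds iff similarity exceeds the
    preset threshold. The 0.6 default is an illustrative value."""
    return cosine_similarity(scene_feat, id_feat) > threshold

scene = np.array([1.0, 0.0, 1.0])   # hypothetical scene-photo feature
idcard = np.array([1.0, 0.1, 0.9])  # hypothetical ID-photo feature
print(verify(scene, idcard))  # True: vectors point in nearly the same direction
```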
上述的基于Triplet Loss的人脸认证方法，利用预先训练的卷积神经网络进行人脸认证，由于卷积神经网络模型基于三元组损失函数的监督训练得到，而场景人脸图像和证件人脸图像的相似度根据场景人脸图像对应的第一特征向量和证件人脸图像对应的第二特征向量的余弦距离计算得到，余弦距离衡量的是空间向量的夹角，更加体现在方向上的差异，从而更符合人脸特征空间的分布属性，提高了人脸认证的可靠性。The above Triplet Loss-based face authentication method performs face authentication with a pre-trained convolutional neural network. The model is trained under the supervision of a triplet loss function, and the similarity between the scene face image and the ID face image is computed from the cosine distance between the first feature vector of the scene face image and the second feature vector of the ID face image. Because the cosine distance measures the angle between vectors and reflects differences in direction, it better matches the distribution of the face feature space, improving the reliability of face authentication.
在另一个实施例中,人脸认证方法还包括训练得到用于人脸认证的卷积神经网络模型的步骤。图3为一个实施例中训练得到用于人脸认证的卷积神经网络模型的步骤的流程图。如图3所示,该步骤包括:In another embodiment, the face authentication method further includes the step of training to obtain a convolutional neural network model for face authentication. 3 is a flow diagram of the steps of training a convolutional neural network model for face authentication in one embodiment. As shown in Figure 3, this step includes:
S302,获取带标记的训练样本,训练样本包括标记了属于每个标记对象的一张证件人脸图像和至少一张场景人脸图像。S302. Acquire a labeled training sample, where the training sample includes a document face image and at least one scene face image that are marked for each mark object.
本实施例中，标记对象即人，训练样本以人为单位，标记了同属于一个人的场景人脸图像和证件人脸图像。具体地，场景人脸图像和证件人脸图像可通过对带标记的场景照片和证件照片进行人脸检测、关键点定位和图像预处理得到。In this embodiment, the labeled object is a person: training samples are organized per person, with the scene face images and the ID face image of the same person sharing a label. Specifically, the scene face images and ID face images can be obtained by performing face detection, key point localization and image preprocessing on the labeled scene photos and ID photos.
人脸检测是指识别照片并获取照片中的人脸区域。Face detection refers to recognizing a photo and obtaining a face area in the photo.
关键点定位,是指对照片中检测的人脸区域,获取人脸关键点在每幅照片中的位置。人脸关键点包括眼睛,鼻尖、嘴角尖、眉毛以及人脸各部件轮廓点。Key point positioning refers to the location of the face key detected in the photo and the position of the face key in each photo. Key points of the face include the eyes, the tip of the nose, the tip of the mouth, the eyebrows, and the outline points of the various parts of the face.
本实施例中，可采用基于多任务联合学习的级联卷积神经网络MTCNN方法同时完成人脸检测和人脸关键点检测，亦可采用基于LBP特征的人脸检测方法和基于形状回归的人脸关键点检测方法。In this embodiment, the multi-task cascaded convolutional neural network (MTCNN) method based on joint learning can be used to perform face detection and facial key point detection simultaneously; alternatively, an LBP-feature-based face detection method and a shape-regression-based facial key point detection method can be used.
图像预处理是指将根据检测的人脸关键点在每张图片中的位置，进行人像对齐和剪切处理，从而得到尺寸归一化的场景人脸图像和证件人脸图像。其中，场景人脸图像是指对场景照片进行人脸检测、关键点定位和图像预处理后得到的人脸图像，证件人脸图像是指对证件照片进行人脸检测、关键点定位和图像预处理后得到的人脸图像。Image preprocessing refers to performing portrait alignment and cropping according to the positions of the detected facial key points in each picture, obtaining size-normalized scene face images and ID face images. A scene face image is the face image obtained by performing face detection, key point localization and image preprocessing on a scene photo; an ID face image is the face image obtained by performing the same operations on an ID photo.
S304,根据训练样本训练卷积神经网络模型,通过OHEM产生各训练样本对应的三元组元素;三元组元素包括参考样本、正样本和负样本。S304. Train a convolutional neural network model according to the training sample, and generate a triple element corresponding to each training sample by using OHEM; the triplet element includes a reference sample, a positive sample, and a negative sample.
三元组有两种组合方式:以证件照图像为参考样本时,正样本和负样本均为场景照图像;以场景照图像为参考样本时,正样本和负样本均为证件照图像。There are two combinations of triads: when the ID photo is used as the reference sample, the positive and negative samples are scene images; when the scene image is used as the reference sample, both the positive and negative samples are ID images.
具体地,以证件照为参考图像为例,从训练数据集中随机选一个人的证件照样本,该样本称为参考样本,然后再随机选取一个和参考样本属于同一人的场景照样本作为正样本,选取不属于同一人的场景照样本作为负样本,由此构成一个(参考样本、正样本、负样本)三元组。Specifically, taking the document as a reference image as an example, randomly selecting a person's certificate photo sample from the training data set, the sample is called a reference sample, and then randomly selecting a scene photo sample that belongs to the same person as the reference sample as a positive sample. Select a scene sample that does not belong to the same person as a negative sample, thereby forming a (reference sample, positive sample, negative sample) triplet.
即正样本与参考样本为同类样本，即属于同一人图像。负样本是参考样本的异类样本，即不属于同一人的图像。其中，三元组元素中的参考样本和正样本是训练样本中已标记的，负样本在卷积神经网络的训练过程中，采用OHEM(Online Hard Example Mining)策略在线构造三元组，即在网络每次迭代优化的过程中，利用当前网络对候选三元组进行前向计算，选择训练样本中与参考样本不属于同一用户，且余弦距离最近的图像作为负样本，从而得到各训练样本对应的三元组元素。That is, the positive sample and the reference sample are of the same class, i.e. images of the same person, while the negative sample is a different-class sample, i.e. an image of a different person. The reference sample and positive sample of a triplet are already labeled in the training samples; the negative sample is constructed online during training of the convolutional neural network using the OHEM (Online Hard Example Mining) strategy: in each iteration of optimization, the current network performs a forward pass over candidate triplets and selects, as the negative sample, the training image that does not belong to the same user as the reference sample and has the closest cosine distance, thereby obtaining the triplet elements for each training sample.
一个实施例中,根据训练样本训练卷积神经网络,并产生各训练样本对应的三元组元素的步骤,包括以下步骤S1和S2:In one embodiment, the step of training the convolutional neural network based on the training samples and generating the corresponding triple elements of each training sample comprises the following steps S1 and S2:
S1:随机选择一个图像作为参考样本,选择属于同一标签对象、与参考样本类别不同的图像作为正样本。S1: randomly select an image as a reference sample, and select an image belonging to the same label object and different from the reference sample category as a positive sample.
类别是指所属的图像类型,本实施例中,训练样本的类别包括场景人脸图像和证件人脸图像。因为人脸认证主要是证件照和场景照之间的对比,因此,参考样本和正样本应当属于不同的类别,若参考样本为场景人脸图像,则正样本为证件人脸图像;若参考样本为证件人脸图像,则正样本为场景人脸图像。The category refers to the type of image to which it belongs. In this embodiment, the category of the training sample includes a scene face image and a document face image. Because the face authentication is mainly the comparison between the document photo and the scene photo, the reference sample and the positive sample should belong to different categories. If the reference sample is the scene face image, the positive sample is the document face image; if the reference sample is the document For the face image, the positive sample is the scene face image.
S2:根据OHEM策略，利用当前训练的卷积神经网络模型提取特征之间的余弦距离，对于每一个参考样本，从其它不属于同一标签对象的图像中，选择距离最小、与参考样本属于不同类别的图像，作为该参考样本的负样本。S2: According to the OHEM strategy, the currently trained convolutional neural network model is used to compute the cosine distance between extracted features; for each reference sample, from images not labeled with the same object, the image with the smallest distance and of a different category from the reference sample is selected as the negative sample of that reference sample.
负样本从与参考样本不属于同一人的标签的人脸图像中选择，具体地，负样本在卷积神经网络的训练过程中，采用OHEM策略在线构造三元组，即在网络每次迭代优化的过程中，利用当前网络对候选三元组进行前向计算，选择训练样本中与参考样本不属于同一用户，且余弦距离最近、与参考样本不属于同一类别的图像作为负样本。即，负样本与参考样本的类别不同。可以认为，三元组中若以证件照为参考样本，则正样本和负样本均是场景照；反之若以场景照为参考样本，则正样本和负样本均是证件照。The negative sample is selected from face images whose label does not belong to the same person as the reference sample. Specifically, during training of the convolutional neural network, triplets are constructed online with the OHEM strategy: in each iteration of optimization, the current network performs a forward pass over candidate triplets and selects, as the negative sample, the training image that does not belong to the same user as the reference sample, has the closest cosine distance, and does not belong to the same category as the reference sample. That is, the negative sample differs in category from the reference sample: if an ID photo is the reference sample of a triplet, both the positive and negative samples are scene photos; conversely, if a scene photo is the reference sample, both the positive and negative samples are ID photos.
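The OHEM hard-negative selection can be sketched with NumPy as picking, for each reference sample, the most similar candidate of a different identity. The candidates are assumed to already be of the opposite category (scene photos when the anchors are ID photos, or vice versa), and the random features below are placeholders for network embeddings:

```python
import numpy as np

def cos_sim_matrix(A, B):
    """Pairwise cosine similarity between rows of A and rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def mine_hard_negatives(anchor_feats, anchor_ids, cand_feats, cand_ids):
    """For each anchor, pick the candidate with the highest cosine
    similarity among candidates of a different identity: the 'hardest'
    negative under the OHEM strategy."""
    sims = cos_sim_matrix(anchor_feats, cand_feats)
    same_person = anchor_ids[:, None] == cand_ids[None, :]
    sims[same_person] = -np.inf  # a positive must never be chosen as negative
    return np.argmax(sims, axis=1)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))      # stand-ins for anchor embeddings
anchor_ids = np.array([0, 1, 2, 3])
cands = rng.normal(size=(6, 8))        # stand-ins for candidate embeddings
cand_ids = np.array([0, 0, 1, 2, 3, 4])
neg_idx = mine_hard_negatives(anchors, anchor_ids, cands, cand_ids)
assert all(cand_ids[j] != anchor_ids[i] for i, j in enumerate(neg_idx))
```

Because the selection reruns against the current network on every iteration, the mined negatives track the model as it improves, as the text describes.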
S306，根据各训练样本的三元组元素，基于三元组损失函数的监督，训练卷积神经网络模型，该三元组损失函数以余弦距离作为度量方式，通过随机梯度下降算法来优化模型参数。S306: Train the convolutional neural network model under the supervision of a triplet loss function according to the triplet elements of each training sample; the triplet loss function uses the cosine distance as its metric, and the model parameters are optimized with a stochastic gradient descent algorithm.
人证核验终端通过比对用户证件芯片照与场景照是否一致来对用户身份进行验证，后台采集到的数据往往是单个人的样本只有两张图，即证件照与比对时刻抓拍到的场景照，而不同个体的数量却可以成千上万。这种类别数量较大而同类样例少的数据如果用基于分类的方法来进行训练，分类层参数会过于庞大而导致网络非常难以学习，因此考虑用度量学习的方法来解决。其中度量学习的典型的一般是用三元组损失(triplet loss)方法，通过构造图像三元组来学习一种有效的特征映射，在该映射下同类样本的特征距离小于异类样本的特征距离，从而达到正确比对的目的。A person-ID verification terminal verifies a user's identity by checking whether the photo stored in the ID chip matches the scene photo. In the data collected in the background, a single person typically has only two images, namely the ID photo and the scene photo captured at comparison time, while the number of distinct individuals can reach the thousands. If such data, with many classes but few samples per class, were trained with a classification-based method, the classification layer parameters would be so large that the network would be very difficult to learn, so a metric learning approach is considered instead. A typical metric learning approach is the triplet loss method, which constructs image triplets to learn an effective feature mapping under which the feature distance between same-class samples is smaller than that between different-class samples, thereby achieving correct comparison.
三元组损失(triplet loss)的目的就是通过学习，让参考样本和正样本的特征表达之间的距离尽可能小，而参考样本和负样本的特征表达之间的距离尽可能大，并且要让参考样本和正样本的特征表达之间的距离和参考样本和负样本的特征表达之间的距离之间有一个最小的间隔。The purpose of the triplet loss is to learn so that the distance between the feature representations of the reference sample and the positive sample is as small as possible, the distance between the feature representations of the reference sample and the negative sample is as large as possible, and there is a minimum margin between these two distances.
在另一个实施例中,三元组损失函数包括对同类样本的余弦距离的限定,以及对异类样本的余弦距离的限定。In another embodiment, the triplet loss function includes a definition of the cosine distance of a homogeneous sample and a definition of the cosine distance of the heterogeneous sample.
其中，同类样本是指参考样本和正样本，异类样本是指参考样本和负样本。同类样本的余弦距离是指参考样本和正样本的余弦距离，异类样本的余弦距离是指参考样本和负样本的余弦距离。Here, same-class samples refer to the reference sample and the positive sample, and different-class samples refer to the reference sample and the negative sample. The cosine distance of same-class samples is the cosine distance between the reference sample and the positive sample; the cosine distance of different-class samples is the cosine distance between the reference sample and the negative sample.
一方面，原始的triplet loss方法只是考虑了类间差距而没有考虑类内差距，如果类内分布不够聚敛，网络的泛化能力就会减弱，对场景适应性也会随之降低。另一方面，原始的triplet loss方法采用的是欧式距离来度量样本之间的相似度，实际上人脸模型部署后在特征比对环节，更多地会采用余弦距离来进行度量。欧氏距离衡量的是空间各点的绝对距离，跟各个点所在的位置坐标直接相关；而余弦距离衡量的是空间向量的夹角，更加体现在方向上的差异，而不是位置，从而更符合人脸特征空间的分布属性。On the one hand, the original triplet loss method only considers the inter-class gap and not the intra-class gap; if the intra-class distribution is not compact enough, the generalization ability of the network is weakened and its adaptability to new scenes decreases accordingly. On the other hand, the original triplet loss method uses the Euclidean distance to measure similarity between samples, whereas in practice, once a face model is deployed, the cosine distance is more commonly used in the feature comparison stage. The Euclidean distance measures the absolute distance between points in space and is directly tied to their position coordinates; the cosine distance measures the angle between vectors and reflects differences in direction rather than position, which better matches the distribution of the face feature space.
采用triplet loss方法，通过在线构造三元组数据输入网络，然后反向传播三元组的度量损失来进行迭代优化。每一个三元组包含三张图像，分别是一个参考样本，一个与参考样本同类的正样本，以及一个与参考样本异类的负样本，标记为(anchor,positive,negative)。原始triplet loss的基本思想是，通过度量学习使得参考样本与正样本之间的距离小于参考样本与负样本之间的距离，并且距离之差大于一个最小间隔参数α。因此原始的triplet loss损失函数如下：The triplet loss method constructs triplet data online as network input and then back-propagates the metric loss of the triplets for iterative optimization. Each triplet contains three images: a reference sample, a positive sample of the same class as the reference sample, and a negative sample of a different class, denoted (anchor, positive, negative). The basic idea of the original triplet loss is to use metric learning so that the distance between the reference sample and the positive sample is smaller than the distance between the reference sample and the negative sample, with the difference exceeding a minimum margin parameter α. The original triplet loss function is therefore as follows:
L = Σ_{i=1}^{N} [ ‖f(x_i^a) − f(x_i^p)‖_2^2 − ‖f(x_i^a) − f(x_i^n)‖_2^2 + α ]_+
其中，N是三元组数量，f(x_i^a)表示参考样本(anchor)的特征向量，f(x_i^p)表示同类正样本(positive)的特征向量，f(x_i^n)表示异类负样本(negative)的特征向量，‖·‖_2表示L2范式，即欧氏距离。[·]_+的含义如下：[z]_+ = max(z, 0)。Here N is the number of triplets, f(x_i^a) denotes the feature vector of the reference sample (anchor), f(x_i^p) the feature vector of a same-class positive sample, and f(x_i^n) the feature vector of a different-class negative sample; ‖·‖_2 denotes the L2 norm, i.e. the Euclidean distance; [·]_+ is defined as [z]_+ = max(z, 0).
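The original triplet loss above can be sketched in NumPy; the margin value and toy embeddings are illustrative, not values from the text:

```python
import numpy as np

def original_triplet_loss(a, p, n, alpha=0.2):
    """Sum over triplets of [||a - p||^2 - ||a - n||^2 + alpha]_+ .

    a, p, n are (N, d) arrays of anchor/positive/negative embeddings;
    alpha=0.2 is an illustrative margin.
    """
    d_ap = np.sum((a - p) ** 2, axis=1)  # squared Euclidean anchor-positive
    d_an = np.sum((a - n) ** 2, axis=1)  # squared Euclidean anchor-negative
    return float(np.sum(np.maximum(d_ap - d_an + alpha, 0.0)))

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])  # close to the anchor
n = np.array([[1.0, 0.0]])  # far from the anchor
print(original_triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```

Only triplets that violate the margin contribute to the loss, which is exactly the property the OHEM strategy exploits when mining hard negatives.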
从上式可看出，原始的triplet loss函数只限定了同类样本(anchor,positive)与异类样本(anchor,negative)之间的距离，即通过间隔参数α尽可能增大类间距离，而对类内距离未作任何限定，即对同类样本之间的距离未作任何约束。如果类内距离比较分散，方差过大，网络的泛化能力就会减弱，样本被错分的概率就会更大。图4为在类间间隔一致、类内方差较大情况下，样本错分的概率示意图，图5为在类间间隔一致、类内方差较小情况下，样本错分的概率示意图，如图4和图5所示，阴影部分表示样本错分的概率，在类间间隔一致、类内方差较大情况下，样本错分的概率明显大于类间间隔一致、类内方差较小情况下样本错分的概率。As can be seen from the above formula, the original triplet loss function only constrains the distance between same-class pairs (anchor, positive) and different-class pairs (anchor, negative): it increases the inter-class distance as much as possible through the margin parameter α, but places no constraint on the intra-class distance, i.e. no constraint on the distance between same-class samples. If the intra-class distances are scattered and the variance is too large, the generalization ability of the network is weakened and the probability of misclassifying samples increases. FIG. 4 illustrates the misclassification probability when the inter-class margins are the same and the intra-class variance is large; FIG. 5 illustrates it when the inter-class margins are the same and the intra-class variance is small. As shown in FIGS. 4 and 5, the shaded areas represent the probability of misclassification: with the same inter-class margin, the misclassification probability is clearly larger when the intra-class variance is large than when it is small.
针对上述问题，本发明提出改进的triplet loss方法，一方面保留了原始方法中对类间距离的限定，同时增加了对类内距离的约束项，使得类内距离尽可能聚敛。其loss函数表达式为：To address the above problems, the present invention proposes an improved triplet loss method that retains the original method's constraint on inter-class distance while adding a constraint term on intra-class distance, making each class as compact as possible. Its loss function is:
L = Σ_{i=1}^{N} { [ cos(f(x_i^a), f(x_i^n)) − cos(f(x_i^a), f(x_i^p)) + α_1 ]_+ + [ α_2 − cos(f(x_i^a), f(x_i^p)) ]_+ }
其中，cos(·)表示余弦距离，其计算方式为 cos(x, y) = (x · y)/(‖x‖·‖y‖)；N是三元组数量，f(x_i^a)表示参考样本的特征向量，f(x_i^p)表示同类正样本的特征向量，f(x_i^n)表示异类负样本的特征向量；[·]_+的含义如下：[z]_+ = max(z, 0)；α_1为类间间隔参数，α_2为类内间隔参数。Here cos(·) denotes the cosine distance, computed as cos(x, y) = (x · y)/(‖x‖·‖y‖); N is the number of triplets; f(x_i^a), f(x_i^p) and f(x_i^n) denote the feature vectors of the reference sample, the same-class positive sample and the different-class negative sample respectively; [·]_+ is defined as [z]_+ = max(z, 0); α_1 is the inter-class margin parameter and α_2 the intra-class margin parameter.
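The improved loss can be sketched in NumPy as an inter-class term with margin α_1 plus an intra-class term with margin α_2, both computed on cosine similarities. The margin defaults are picked from the ranges the text gives (0–0.2 and 0.8–1.0), and the toy vectors are illustrative:

```python
import numpy as np

def cos_rows(x, y):
    """Row-wise cosine similarity between two (N, d) arrays."""
    return np.sum(x * y, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))

def improved_triplet_loss(a, p, n, alpha1=0.1, alpha2=0.9):
    """Sum over triplets of the inter-class and intra-class terms."""
    cos_ap = cos_rows(a, p)
    cos_an = cos_rows(a, n)
    inter = np.maximum(cos_an - cos_ap + alpha1, 0.0)  # push negatives apart
    intra = np.maximum(alpha2 - cos_ap, 0.0)           # pull positives together
    return float(np.sum(inter + intra))

a = np.array([[1.0, 0.0]])
p_close = np.array([[1.0, 0.1]])  # nearly parallel to the anchor
p_far = np.array([[1.0, 1.0]])    # 45 degrees from the anchor
n = np.array([[0.0, 1.0]])        # orthogonal to the anchor
print(improved_triplet_loss(a, p_close, n))  # 0.0: both constraints met
print(improved_triplet_loss(a, p_far, n))    # > 0: intra-class term fires
```

The second case shows the new behavior: even though the negative is already well separated, a positive pair whose similarity falls below α_2 still incurs loss, which is what drives the intra-class compactness described in the text.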
Compared with the original triplet loss, the improved loss replaces the Euclidean distance with the cosine measure, which keeps the metric consistent between the training and deployment stages and improves the continuity of feature learning. The first term of the new loss plays the same role as the original triplet loss and enlarges the inter-class gap; the second term adds a distance constraint on same-class pairs (anchor, positive) to shrink the intra-class gap. α_1 is the inter-class margin parameter, with values in the range 0 to 0.2; α_2 is the intra-class margin parameter, with values in the range 0.8 to 1.0. It is worth noting that, because the cosine measure expresses the similarity between two samples, in the term [cos(f(x_i^a), f(x_i^n)) − cos(f(x_i^a), f(x_i^p)) + α_1]_+ only those triplets whose negative-pair similarity comes within α_1 of the positive-pair similarity actually participate in training.
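The improved loss above can be sketched in plain Python (an illustrative sketch only, not part of the claimed subject matter; the function names are assumptions, and the defaults α_1 = 0.2 and α_2 = 0.8 are taken from the parameter ranges stated above):

```python
import math

def cos_sim(x, y):
    # cosine similarity: (x . y) / (|x| * |y|)
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def improved_triplet_loss(triplets, alpha1=0.2, alpha2=0.8):
    # Sum, over all triplets, of the inter-class hinge term plus the
    # intra-class hinge term of the improved loss.
    loss = 0.0
    for anchor, positive, negative in triplets:
        ap = cos_sim(anchor, positive)   # same-class similarity
        an = cos_sim(anchor, negative)   # different-class similarity
        loss += max(an - ap + alpha1, 0.0)   # inter-class term: push negatives away
        loss += max(alpha2 - ap, 0.0)        # intra-class term: pull positives together
    return loss

# A triplet whose positive is already close and whose negative is far
# contributes nothing to the loss:
print(improved_triplet_loss([((1.0, 0.0), (1.0, 0.0), (0.0, 1.0))]))  # → 0.0
```

Only triplets whose hinge arguments are positive contribute, matching the remark above that merely "easy" triplets do not participate in training.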
The model is trained with the improved triplet loss function: the joint constraint of inter-class and intra-class losses drives the back-propagation optimization so that same-class samples lie as close together in feature space as possible while different-class samples lie as far apart as possible, sharpening the model's discriminative power and thereby improving the reliability of face verification.
S308: Input the validation-set data into the convolutional neural network; when the training termination condition is met, the trained convolutional neural network for face verification is obtained.
Specifically, 90% of the ID-scene image data pool is taken as the training set and the remaining 10% as the validation set. The improved triplet loss value is computed with the formula above and fed back into the convolutional neural network for iterative optimization. Meanwhile the model's performance on the validation set is monitored; when validation performance no longer rises, the model has reached convergence and the training phase terminates.
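The "stop when validation performance no longer rises" criterion can be sketched as a simple patience check (an illustrative sketch; the patent does not specify a patience window, so `patience=3` here is an assumption):

```python
def should_stop(val_accuracy_history, patience=3):
    # Stop once the last `patience` epochs have failed to beat the best
    # validation accuracy seen before them (assumed convergence criterion).
    if len(val_accuracy_history) <= patience:
        return False
    best_before = max(val_accuracy_history[:-patience])
    return max(val_accuracy_history[-patience:]) <= best_before
```

In practice the monitored quantity would be the verification accuracy (or equal-error rate) on the held-out 10% split.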
The face verification method above, on the one hand, adds a constraint on intra-class sample distances to the original triplet loss, shrinking the intra-class gap while enlarging the inter-class gap and improving the model's generalization ability; on the other hand, it changes the metric of the original triplet loss from Euclidean distance to cosine distance, keeping the training and deployment metrics consistent and improving the continuity of feature learning.
In another embodiment, the step of training the convolutional neural network further comprises: initializing with base-model parameters pre-trained on massive open-source face data, and adding a normalization layer and the improved triplet loss function layer after the feature output layer, to obtain the convolutional neural network to be trained.
Specifically, when deep learning is applied to ID-face matching, a conventional deep face recognition model trained on massive Internet face data suffers a sharp performance drop on ID-to-scene comparison in specific scenarios, while the sources of ID-scene data in such scenarios are limited; direct training often yields unsatisfactory results because of insufficient samples. There is therefore a strong need for a method that effectively performs extended training on small scenario-specific datasets, so as to raise the accuracy of the face recognition model in the target scenario and meet market demand.
Deep learning algorithms generally depend on massive training data. In ID-face matching, comparing an ID-card photo against a scene photo is a heterogeneous-sample matching problem, and conventional deep face recognition models trained on massive Internet face data degrade sharply on it. Since the sources of ID-scene data are limited (the same person's ID-card image and corresponding scene images are both required), little data is available for training, and direct training performs poorly due to insufficient samples. Model training for ID-face matching therefore usually adopts the idea of transfer learning: first train, on massive Internet face data, a base model that performs reliably on open-source test sets, then carry out a second, extended training stage on the limited ID-scene data, so that the model automatically learns feature representations of the specific modality and its performance improves. This process is shown in Figure 6.
During the second training stage, the whole network is initialized with the pre-trained base-model parameters, and an L2 normalization layer and the improved triplet loss layer are then added after the network's feature output layer. The structure of the convolutional neural network to be trained is shown in Figure 7.
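The appended L2 normalization layer simply rescales each feature vector to unit Euclidean length, after which the cosine similarity of two features reduces to their dot product. A minimal sketch (illustrative only; `eps` is an assumed guard against zero vectors):

```python
import math

def l2_normalize(v, eps=1e-12):
    # Rescale v to unit Euclidean norm; eps guards against division by zero.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

# After normalization the vector lies on the unit sphere:
a = l2_normalize([3.0, 4.0])   # → [0.6, 0.8]
```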
In one embodiment, a face verification method proceeds as shown in the flow diagram of Figure 8 and comprises three phases: data acquisition and preprocessing, training, and deployment.
In the data acquisition and preprocessing phase, the card-reader module of the ID verification terminal reads the photo stored in the ID chip and the front camera captures a live photo; after passing through the face detector, landmark detector, and face alignment and cropping modules, size-normalized ID face images and scene face images are obtained.
In the training phase, 90% of the ID-scene image data pool is taken as the training set and the remaining 10% as the validation set. Since ID verification is mainly a comparison between ID photos and scene photos, if the ID photo serves as the anchor of a triplet, the other two images are both scene photos; conversely, if a scene photo serves as the anchor, the other two images are both ID photos. Triplets are constructed online with an OHEM (online hard example mining) strategy: during each iteration of network optimization, the current network performs a forward pass over the candidate triplets and selects the valid triplets that satisfy the condition; the improved triplet loss value is computed with the formula above and fed back into the network for iterative optimization. Meanwhile the model's performance on the validation set is monitored; when validation performance no longer rises, the model has reached convergence and the training phase terminates.
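The OHEM selection step, which keeps for each anchor the negative whose features lie closest under the cosine metric, can be sketched as follows (an illustrative sketch; in practice the candidate features come from the current network's forward pass, and the function names are assumptions):

```python
import math

def cos_sim(x, y):
    # cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def hardest_negative(anchor, candidate_negatives):
    # The hardest negative is the different-subject sample most similar
    # to the anchor (highest cosine similarity = smallest cosine distance).
    return max(candidate_negatives, key=lambda f: cos_sim(anchor, f))
```

Mining the hardest negatives in this way ensures that the triplets fed to the loss are exactly those that still violate the margin and therefore carry gradient.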
In the deployment phase, the trained model is deployed to the ID verification terminal for use. Images captured by the device go through the same preprocessing as in the training phase; a forward pass through the network then yields a feature vector for each face image, the cosine distance between the two vectors gives the similarity of the two images, and a decision is made against a preset threshold: above the threshold, the two images belong to the same person; otherwise, to different persons.
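The deployment-stage decision rule reduces to thresholding the similarity of the two feature vectors (a sketch; the value `threshold=0.8` is an assumption for illustration, as the patent leaves the preset threshold unspecified):

```python
import math

def cos_sim(x, y):
    # cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def same_person(id_feature, scene_feature, threshold=0.8):
    # Similarity at or above the preset threshold -> same person,
    # otherwise different persons.
    return cos_sim(id_feature, scene_feature) >= threshold
```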
In the face verification method above, whereas the original triplet loss function only constrains the inter-class distance relations, the improved loss adds an intra-class distance constraint, so that during training the network enlarges the inter-class gap while shrinking the intra-class gap as much as possible, improving the network's generalization ability and hence the model's adaptability to the target scenario. In addition, cosine distance replaces the Euclidean metric of the original triplet loss, which better matches the distribution of the face feature space and keeps the training-stage and deployment-stage metrics consistent, making the comparison results more reliable.
In one embodiment, a face verification apparatus is provided, as shown in Figure 9, comprising: an image acquisition module 902, an image preprocessing module 904, a feature acquisition module 906, a calculation module 908, and an authentication module 910.
The image acquisition module 902 is configured to acquire, based on a face verification request, an ID photo and a scene photo of a person.
The image preprocessing module 904 is configured to perform face detection, landmark localization, and image preprocessing on the scene photo and the ID photo respectively, obtaining a scene face image corresponding to the scene photo and an ID face image corresponding to the ID photo.
The feature acquisition module 906 is configured to input the scene face image and the ID face image into a pre-trained convolutional neural network model for face verification, and to obtain, from the model's output, a first feature vector corresponding to the scene face image and a second feature vector corresponding to the ID face image; the convolutional neural network model is obtained through supervised training with a triplet loss function.
The calculation module 908 is configured to compute the cosine distance between the first feature vector and the second feature vector.
The authentication module 910 is configured to compare the cosine distance against a preset threshold and determine the face verification result from the comparison.
The face verification apparatus above performs face verification with a pre-trained convolutional neural network. Because the model is trained under the supervision of the improved triplet loss function, and the similarity between the scene face image and the ID face image is computed as the cosine distance between the first and second feature vectors, a quantity that measures the angle between vectors and reflects differences in direction rather than in position, the method better matches the distribution of the face feature space and improves the reliability of face verification.
As shown in Figure 9, in another embodiment the face verification apparatus further comprises: a sample acquisition module 912, a triplet acquisition module 914, a training module 916, and a verification module 918.
The sample acquisition module 912 is configured to acquire labeled training samples, each comprising one ID face image and at least one scene face image labeled as belonging to each labeled subject.
The triplet acquisition module 914 is configured to train the convolutional neural network model on the training samples and to generate, via OHEM, the triplet elements corresponding to each training sample; the triplet elements comprise a reference sample, a positive sample, and a negative sample.
Specifically, the triplet acquisition module 914 randomly selects an image as the reference sample and selects an image of the same labeled subject but of a different category from the reference sample as the positive sample; following the OHEM strategy, it uses the currently trained convolutional neural network model to extract cosine distances between features and, for each reference sample, selects from the face images not belonging to the same labeled subject the image with the smallest distance and of a different category from the reference sample, as that reference sample's negative sample.
Specifically, when an ID photo is the reference sample, both the positive and the negative samples are scene photos; when a scene photo is the reference sample, both the positive and the negative samples are ID photos.
The training module 916 is configured to train the convolutional neural network model under the supervision of the triplet loss function according to the triplet elements of each training sample; this triplet loss function uses cosine distance as its metric and optimizes the model parameters via a stochastic gradient descent algorithm.
Specifically, the improved triplet loss function comprises a constraint on the cosine distance between same-class samples as well as a constraint on the cosine distance between different-class samples.
The improved triplet loss function is:
L = Σ_{i=1}^{N} ( [cos(f(x_i^a), f(x_i^n)) − cos(f(x_i^a), f(x_i^p)) + α_1]_+ + [α_2 − cos(f(x_i^a), f(x_i^p))]_+ )
where cos(·,·) denotes the cosine measure, computed as cos(x, y) = (x · y) / (‖x‖ ‖y‖); N is the number of triplets; f(x_i^a) denotes the feature vector of the reference sample, f(x_i^p) the feature vector of the same-class positive sample, and f(x_i^n) the feature vector of the different-class negative sample; [x]_+ means max(x, 0), i.e. [x]_+ equals x when x > 0 and 0 otherwise; α_1 is the inter-class margin parameter and α_2 is the intra-class margin parameter.
The verification module 918 is configured to input the validation-set data into the convolutional neural network model and, when the training termination condition is met, to obtain the trained convolutional neural network model for face verification.
In another embodiment, the face verification apparatus further comprises a model initialization module 920, configured to initialize with base-model parameters pre-trained on massive open-source face data and to add a normalization layer and the triplet loss function layer after the feature output layer, obtaining the convolutional neural network to be trained. The face verification apparatus above, on the one hand, adds a constraint on intra-class sample distances to the original triplet loss, shrinking the intra-class gap while enlarging the inter-class gap and improving the model's generalization ability; on the other hand, it changes the metric of the original triplet loss from Euclidean distance to cosine distance, keeping the training and deployment metrics consistent and improving the continuity of feature learning.
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the face verification method of the above embodiments are implemented.
A storage medium stores a computer program which, when executed by a processor, implements the steps of the face verification method of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; nevertheless, any combination of these technical features that involves no contradiction shall be regarded as within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they shall not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make variations and improvements without departing from the concept of the present invention, and these all fall within the scope of protection of the present invention. The scope of protection of the present patent shall therefore be determined by the appended claims.

Claims (10)

  1. A face verification method based on triplet loss, comprising:
    acquiring, based on a face verification request, an ID photo and a scene photo of a person;
    performing face detection, landmark localization, and image preprocessing on the scene photo and the ID photo respectively, to obtain a scene face image corresponding to the scene photo and an ID face image corresponding to the ID photo;
    inputting the scene face image and the ID face image into a pre-trained convolutional neural network model for face verification, and obtaining, from the output of the convolutional neural network model, a first feature vector corresponding to the scene face image and a second feature vector corresponding to the ID face image, wherein the convolutional neural network model is obtained through supervised training with a triplet loss function;
    computing the cosine distance between the first feature vector and the second feature vector; and
    comparing the cosine distance against a preset threshold, and determining a face verification result according to the comparison.
  2. The method according to claim 1, further comprising:
    acquiring labeled training samples, each training sample comprising one ID face image and at least one scene face image labeled as belonging to each labeled subject;
    training a convolutional neural network model on the training samples, and generating, via OHEM, the triplet elements corresponding to each training sample, the triplet elements comprising a reference sample, a positive sample, and a negative sample;
    training the convolutional neural network model under the supervision of the triplet loss function according to the triplet elements of each training sample, the triplet loss function using cosine distance as its metric and optimizing model parameters via a stochastic gradient descent algorithm; and
    inputting validation-set data into the convolutional neural network model and, when a training termination condition is met, obtaining the trained convolutional neural network model for face verification.
  3. The method according to claim 2, wherein training the convolutional neural network model on the training samples and generating, via OHEM, the triplet elements corresponding to each training sample comprises:
    randomly selecting an image as the reference sample, and selecting an image of the same labeled subject but of a different category from the reference sample as the positive sample;
    according to the OHEM strategy, using the currently trained convolutional neural network model to extract cosine distances between features and, for each reference sample, selecting, from the images not belonging to the same labeled subject, the image with the smallest distance and of a different category from the reference sample, as the negative sample of that reference sample.
  4. The method according to claim 2, wherein the triplet loss function comprises a constraint on the cosine distance between same-class samples and a constraint on the cosine distance between different-class samples.
  5. The method according to claim 4, wherein the triplet loss function is:
    L = Σ_{i=1}^{N} ( [cos(f(x_i^a), f(x_i^n)) − cos(f(x_i^a), f(x_i^p)) + α_1]_+ + [α_2 − cos(f(x_i^a), f(x_i^p))]_+ )
    where cos(·,·) denotes the cosine measure, computed as cos(x, y) = (x · y) / (‖x‖ ‖y‖); N is the number of triplets; f(x_i^a) denotes the feature vector of the reference sample, f(x_i^p) the feature vector of the same-class positive sample, and f(x_i^n) the feature vector of the different-class negative sample; [x]_+ means max(x, 0), i.e. [x]_+ equals x when x > 0 and 0 otherwise; α_1 is the inter-class margin parameter and α_2 is the intra-class margin parameter.
  6. The method according to claim 2, further comprising: initializing with base-model parameters pre-trained on massive open-source face data, and adding a normalization layer and a triplet loss function layer after the feature output layer, to obtain the convolutional neural network model to be trained.
  7. A face verification apparatus based on triplet loss, comprising: an image acquisition module, an image preprocessing module, a feature acquisition module, a calculation module, and an authentication module;
    the image acquisition module being configured to acquire, based on a face verification request, an ID photo and a scene photo of a person;
    the image preprocessing module being configured to perform face detection, landmark localization, and image preprocessing on the scene photo and the ID photo respectively, to obtain a scene face image corresponding to the scene photo and an ID face image corresponding to the ID photo;
    the feature acquisition module being configured to input the scene face image and the ID face image into a pre-trained convolutional neural network model for face verification, and to obtain, from the output of the convolutional neural network model, a first feature vector corresponding to the scene face image and a second feature vector corresponding to the ID face image, wherein the convolutional neural network model is obtained through supervised training with a triplet loss function;
    the calculation module being configured to compute the cosine distance between the first feature vector and the second feature vector;
    the authentication module being configured to compare the cosine distance against a preset threshold and determine a face verification result according to the comparison.
  8. The apparatus according to claim 7, further comprising: a sample acquisition module, a triplet acquisition module, a training module, and a verification module;
    the sample acquisition module being configured to acquire labeled training samples, each training sample comprising one ID face image and at least one scene face image labeled as belonging to each labeled subject;
    the triplet acquisition module being configured to train a convolutional neural network model on the training samples and to generate, via OHEM, the triplet elements corresponding to each training sample, the triplet elements comprising a reference sample, a positive sample, and a negative sample;
    the training module being configured to train the convolutional neural network model under the supervision of the triplet loss function according to the triplet elements of each training sample, the triplet loss function using cosine distance as its metric and optimizing model parameters via a stochastic gradient descent algorithm;
    the verification module being configured to input validation-set data into the convolutional neural network model and, when a training termination condition is met, to obtain the trained convolutional neural network model for face verification.
  9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the triplet-loss-based face verification method according to any one of claims 1 to 6.
  10. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the triplet-loss-based face verification method according to any one of claims 1 to 6.
PCT/CN2018/109169 2017-12-26 2018-09-30 Face verification method and apparatus based on triplet loss, and computer device and storage medium WO2019128367A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711436879.4 2017-12-26
CN201711436879.4A CN108009528B (en) 2017-12-26 2017-12-26 Triple Loss-based face authentication method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2019128367A1 true WO2019128367A1 (en) 2019-07-04


Cited By (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414431A (en) * 2019-07-29 2019-11-05 广州像素数据技术股份有限公司 Face identification method and system based on elastic context relation loss function
CN110458233A (en) * 2019-08-13 2019-11-15 腾讯云计算(北京)有限责任公司 Combination grain object identification model training and recognition methods, device and storage medium
CN110516533A (en) * 2019-07-11 2019-11-29 同济大学 A kind of pedestrian based on depth measure discrimination method again
CN110555478A (en) * 2019-09-05 2019-12-10 东北大学 Fan multi-fault diagnosis method based on depth measurement network of difficult sample mining
CN110647880A (en) * 2019-08-12 2020-01-03 深圳市华付信息技术有限公司 Mobile terminal identity card image shielding judgment method
CN110647938A (en) * 2019-09-24 2020-01-03 北京市商汤科技开发有限公司 Image processing method and related device
CN110674637A (en) * 2019-09-06 2020-01-10 腾讯科技(深圳)有限公司 Character relation recognition model training method, device, equipment and medium
CN110705357A (en) * 2019-09-02 2020-01-17 深圳中兴网信科技有限公司 Face recognition method and face recognition device
CN110705393A (en) * 2019-09-17 2020-01-17 中国计量大学 Method for improving face recognition performance of community population
CN110796057A (en) * 2019-10-22 2020-02-14 上海交通大学 Pedestrian re-identification method and device and computer equipment
CN110852367A (en) * 2019-11-05 2020-02-28 上海联影智能医疗科技有限公司 Image classification method, computer device, and storage medium
CN110956098A (en) * 2019-11-13 2020-04-03 深圳和而泰家居在线网络科技有限公司 Image processing method and related equipment
CN111008550A (en) * 2019-09-06 2020-04-14 上海芯灵科技有限公司 Finger-vein identity authentication method based on multiple loss functions
CN111062430A (en) * 2019-12-12 2020-04-24 易诚高科(大连)科技有限公司 Pedestrian re-identification evaluation method based on probability density function
CN111079566A (en) * 2019-11-28 2020-04-28 深圳市信义科技有限公司 Large-scale face recognition model optimization system
CN111091089A (en) * 2019-12-12 2020-05-01 新华三大数据技术有限公司 Face image processing method and device, electronic equipment and storage medium
CN111126240A (en) * 2019-12-19 2020-05-08 西安工程大学 Three-channel feature fusion face recognition method
CN111126360A (en) * 2019-11-15 2020-05-08 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN111144240A (en) * 2019-12-12 2020-05-12 深圳数联天下智能科技有限公司 Image processing method and related equipment
CN111191563A (en) * 2019-12-26 2020-05-22 三盟科技股份有限公司 Face recognition method and system based on data sample and test data set training
CN111198964A (en) * 2020-01-10 2020-05-26 中国科学院自动化研究所 Image retrieval method and system
CN111209839A (en) * 2019-12-31 2020-05-29 上海涛润医疗科技有限公司 Face recognition method
CN111222411A (en) * 2019-11-28 2020-06-02 中国船舶重工集团公司第七一三研究所 Laser emission safe and rapid alarm method and device
CN111241925A (en) * 2019-12-30 2020-06-05 新大陆数字技术股份有限公司 Face quality evaluation method, system, electronic equipment and readable storage medium
CN111274946A (en) * 2020-01-19 2020-06-12 杭州涂鸦信息技术有限公司 Face recognition method, system and equipment
CN111368766A (en) * 2020-03-09 2020-07-03 云南安华防灾减灾科技有限责任公司 Cattle face detection and identification method based on deep learning
CN111414862A (en) * 2020-03-22 2020-07-14 西安电子科技大学 Expression recognition method based on neural network fusion key point angle change
CN111429414A (en) * 2020-03-18 2020-07-17 腾讯科技(深圳)有限公司 Artificial intelligence-based focus image sample determination method and related device
CN111507289A (en) * 2020-04-22 2020-08-07 上海眼控科技股份有限公司 Video matching method, computer device and storage medium
CN111539247A (en) * 2020-03-10 2020-08-14 西安电子科技大学 Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN111582107A (en) * 2020-04-28 2020-08-25 浙江大华技术股份有限公司 Training method and recognition method of target re-recognition model, electronic equipment and device
CN111626212A (en) * 2020-05-27 2020-09-04 腾讯科技(深圳)有限公司 Method and device for identifying object in picture, storage medium and electronic device
CN111639535A (en) * 2020-04-29 2020-09-08 深圳英飞拓智能技术有限公司 Face recognition method and device based on deep learning
CN111738157A (en) * 2020-06-23 2020-10-02 平安科技(深圳)有限公司 Method and device for constructing data set of facial action units and computer equipment
CN111988614A (en) * 2020-08-14 2020-11-24 深圳前海微众银行股份有限公司 Hash coding optimization method and device and readable storage medium
CN112052821A (en) * 2020-09-15 2020-12-08 浙江智慧视频安防创新中心有限公司 Fire fighting channel safety detection method, device, equipment and storage medium
CN112069993A (en) * 2020-09-04 2020-12-11 西安西图之光智能科技有限公司 Dense face detection method and system based on facial features mask constraint and storage medium
CN112084956A (en) * 2020-09-11 2020-12-15 上海交通大学烟台信息技术研究院 Special face crowd screening system based on small sample learning prototype network
CN112257738A (en) * 2020-07-31 2021-01-22 北京京东尚科信息技术有限公司 Training method and device of machine learning model and classification method and device of image
CN112287765A (en) * 2020-09-30 2021-01-29 新大陆数字技术股份有限公司 Face living body detection method, device and equipment and readable storage medium
CN112307968A (en) * 2020-10-30 2021-02-02 天地伟业技术有限公司 Face recognition feature compression method
CN112328786A (en) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 Text classification method and device based on BERT, computer equipment and storage medium
CN112580011A (en) * 2020-12-25 2021-03-30 华南理工大学 Portrait encryption and decryption system facing biological feature privacy protection
CN112733574A (en) * 2019-10-14 2021-04-30 中移(苏州)软件技术有限公司 Face recognition method and device and computer readable storage medium
CN112766237A (en) * 2021-03-12 2021-05-07 东北林业大学 Unsupervised pedestrian re-identification method based on cluster feature point clustering
CN112836629A (en) * 2021-02-01 2021-05-25 清华大学深圳国际研究生院 Image classification method
CN112836719A (en) * 2020-12-11 2021-05-25 南京富岛信息工程有限公司 Indicator diagram similarity detection method fusing binary classification and triplets
CN112836566A (en) * 2020-12-01 2021-05-25 北京智云视图科技有限公司 Multitask neural network face key point detection method for edge equipment
CN112861626A (en) * 2021-01-04 2021-05-28 西北工业大学 Fine-grained expression classification method based on small sample learning
CN112949780A (en) * 2020-04-21 2021-06-11 佳都科技集团股份有限公司 Feature model training method, device, equipment and storage medium
CN112966724A (en) * 2021-02-07 2021-06-15 惠州市博实结科技有限公司 Method and device for classifying image single categories
CN113157956A (en) * 2021-04-23 2021-07-23 雅马哈发动机(厦门)信息***有限公司 Picture searching method, system, mobile terminal and storage medium
CN113344031A (en) * 2021-05-13 2021-09-03 清华大学 Text classification method
CN113362096A (en) * 2020-03-04 2021-09-07 驰众信息技术(上海)有限公司 Frame advertisement image matching method based on deep learning
CN113435545A (en) * 2021-08-14 2021-09-24 北京达佳互联信息技术有限公司 Training method and device of image processing model
CN113469253A (en) * 2021-07-02 2021-10-01 河海大学 Electricity theft detection method based on a triplet twin network
CN113486804A (en) * 2021-07-07 2021-10-08 科大讯飞股份有限公司 Object identification method, device, equipment and storage medium
CN113569991A (en) * 2021-08-26 2021-10-29 深圳市捷顺科技实业股份有限公司 Person-certificate comparison model training method, computer equipment and computer storage medium
CN113642481A (en) * 2021-08-17 2021-11-12 百度在线网络技术(北京)有限公司 Recognition method, training method, device, electronic equipment and storage medium
CN113762019A (en) * 2021-01-22 2021-12-07 北京沃东天骏信息技术有限公司 Training method of feature extraction network, face recognition method and device
CN113780461A (en) * 2021-09-23 2021-12-10 中国人民解放军国防科技大学 Robust neural network training method based on feature matching
CN113807122A (en) * 2020-06-11 2021-12-17 阿里巴巴集团控股有限公司 Model training method, object recognition method and device, and storage medium
CN113887653A (en) * 2021-10-20 2022-01-04 西安交通大学 Positioning method and system for tightly-coupled weak supervised learning based on ternary network
CN114049479A (en) * 2021-11-10 2022-02-15 苏州魔视智能科技有限公司 Self-supervision fisheye camera image feature point extraction method and device and storage medium
GB2600922A (en) * 2020-11-05 2022-05-18 Thales Holdings Uk Plc One shot learning for identifying data items similar to a query data item
CN114663965A (en) * 2022-05-24 2022-06-24 之江实验室 Person-certificate comparison method and device based on two-stage alternating learning
CN114817888A (en) * 2022-06-27 2022-07-29 中国信息通信研究院 Certificate registering and issuing method, device and storage medium
CN114882558A (en) * 2022-04-29 2022-08-09 陕西师范大学 Learning scene real-time identity authentication method based on face recognition technology
CN114926445A (en) * 2022-05-31 2022-08-19 哈尔滨工业大学 Twin network-based small sample crop disease image identification method and system
CN116206355A (en) * 2023-04-25 2023-06-02 鹏城实验室 Face recognition model training, image registration and face recognition method and device
CN116959064A (en) * 2023-06-25 2023-10-27 上海腾桥信息技术有限公司 Certificate verification method and device, computer equipment and storage medium
CN116977461A (en) * 2023-06-30 2023-10-31 北京开普云信息科技有限公司 Portrait generation method, device, storage medium and equipment for specific scene

Families Citing this family (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009528B (en) * 2017-12-26 2020-04-07 广州广电运通金融电子股份有限公司 Triplet-loss-based face authentication method and device, computer equipment and storage medium
CN108922542B (en) * 2018-06-01 2023-04-28 平安科技(深圳)有限公司 Sample triplet acquisition method and device, computer equipment and storage medium
CN108921033A (en) * 2018-06-04 2018-11-30 北京京东金融科技控股有限公司 Face picture comparison method, device, medium and electronic equipment
CN110598840B (en) * 2018-06-13 2023-04-18 富士通株式会社 Knowledge migration method, information processing apparatus, and storage medium
CN109145704B (en) * 2018-06-14 2022-02-22 西安电子科技大学 Face portrait recognition method based on face attributes
CN108921952B (en) * 2018-06-15 2022-09-06 深圳大学 Object functionality prediction method, device, computer equipment and storage medium
CN108985198A (en) * 2018-07-02 2018-12-11 四川斐讯信息技术有限公司 Cosine distance calculation method based on big-data feature vectors
CN110738071A (en) * 2018-07-18 2020-01-31 浙江中正智能科技有限公司 face algorithm model training method based on deep learning and transfer learning
CN109145956B (en) * 2018-07-26 2021-12-14 上海慧子视听科技有限公司 Scoring method, scoring device, computer equipment and storage medium
CN108960342B (en) * 2018-08-01 2021-09-14 中国计量大学 Image similarity calculation method based on improved Soft-Max loss function
CN108960209B (en) * 2018-08-09 2023-07-21 腾讯科技(深圳)有限公司 Identity recognition method, identity recognition device and computer readable storage medium
CN109165589B (en) * 2018-08-14 2021-02-23 北京颂泽科技有限公司 Vehicle weight recognition method and device based on deep learning
CN109145991B (en) * 2018-08-24 2020-07-31 北京地平线机器人技术研发有限公司 Image group generation method, image group generation device and electronic equipment
CN109271877A (en) * 2018-08-24 2019-01-25 北京智芯原动科技有限公司 A kind of human figure identification method and device
CN110874602A (en) * 2018-08-30 2020-03-10 北京嘀嘀无限科技发展有限公司 Image identification method and device
CN109344740A (en) * 2018-09-12 2019-02-15 上海了物网络科技有限公司 Face identification system, method and computer readable storage medium
CN109359541A (en) * 2018-09-17 2019-02-19 南京邮电大学 Sketch face recognition method based on deep transfer learning
CN109543524A (en) * 2018-10-18 2019-03-29 同盾控股有限公司 Image recognition method and device
CN109214361A (en) * 2018-10-18 2019-01-15 康明飞(北京)科技有限公司 Face recognition method and device, and ticket verification method and device
CN109492583A (en) * 2018-11-09 2019-03-19 安徽大学 Vehicle re-identification method based on deep learning
CN109685106A (en) * 2018-11-19 2019-04-26 深圳博为教育科技有限公司 Image recognition method, face attendance method, device and system
CN109522850B (en) * 2018-11-22 2023-03-10 中山大学 Action similarity evaluation method based on small sample learning
CN109685121B (en) * 2018-12-11 2023-07-18 中国科学院苏州纳米技术与纳米仿生研究所 Training method of image retrieval model, image retrieval method and computer equipment
CN111325223B (en) * 2018-12-13 2023-10-24 中国电信股份有限公司 Training method and device for deep learning model and computer readable storage medium
CN109711443A (en) * 2018-12-14 2019-05-03 平安城市建设科技(深圳)有限公司 Neural-network-based floor plan recognition method, device, equipment and storage medium
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning
CN109657792A (en) * 2018-12-19 2019-04-19 北京世纪好未来教育科技有限公司 Method, apparatus and computer-readable medium for constructing a neural network
CN109711358B (en) * 2018-12-28 2020-09-04 北京远鉴信息技术有限公司 Neural network training method, face recognition system and storage medium
CN109871762B (en) * 2019-01-16 2023-08-08 平安科技(深圳)有限公司 Face recognition model evaluation method and device
CN111461152B (en) * 2019-01-21 2024-04-05 同方威视技术股份有限公司 Cargo detection method and device, electronic equipment and computer readable medium
CN109886186A (en) * 2019-02-18 2019-06-14 上海骏聿数码科技有限公司 Face recognition method and device
US10885385B2 (en) * 2019-03-19 2021-01-05 Sap Se Image search and training system
CN109948568A (en) * 2019-03-26 2019-06-28 东华大学 Embedded human face identifying system based on ARM microprocessor and deep learning
CN110147732A (en) * 2019-04-16 2019-08-20 平安科技(深圳)有限公司 Finger-vein recognition method, device, computer equipment and storage medium
CN111832364B (en) * 2019-04-22 2024-04-23 普天信息技术有限公司 Face recognition method and device
CN110147833B (en) * 2019-05-09 2021-10-12 北京迈格威科技有限公司 Portrait processing method, device, system and readable storage medium
CN110213660B (en) * 2019-05-27 2021-08-20 广州荔支网络技术有限公司 Program distribution method, system, computer device and storage medium
CN110674688B (en) * 2019-08-19 2023-10-31 深圳力维智联技术有限公司 Face recognition model acquisition method, system and medium for video monitoring scene
CN112580406A (en) * 2019-09-30 2021-03-30 北京中关村科金技术有限公司 Face comparison method and device and storage medium
CN111104846B (en) * 2019-10-16 2022-08-30 平安科技(深圳)有限公司 Data detection method and device, computer equipment and storage medium
CN110765933A (en) * 2019-10-22 2020-02-07 山西省信息产业技术研究院有限公司 Dynamic portrait sensing comparison method applied to driver identity authentication system
CN110929099B (en) * 2019-11-28 2023-07-21 杭州小影创新科技股份有限公司 Short video frame semantic extraction method and system based on multi-task learning
CN111062338B (en) * 2019-12-19 2023-11-17 厦门商集网络科技有限责任公司 License and portrait consistency comparison method and system
CN111178249A (en) * 2019-12-27 2020-05-19 杭州艾芯智能科技有限公司 Face comparison method and device, computer equipment and storage medium
CN111368644B (en) * 2020-02-14 2024-01-05 深圳市商汤科技有限公司 Image processing method, device, electronic equipment and storage medium
CN111401257B (en) * 2020-03-17 2022-10-04 天津理工大学 Face recognition method based on cosine loss under non-constraint condition
CN111401277A (en) * 2020-03-20 2020-07-10 深圳前海微众银行股份有限公司 Face recognition model updating method, device, equipment and medium
CN113538075A (en) * 2020-04-14 2021-10-22 阿里巴巴集团控股有限公司 Data processing method, model training method, device and equipment
CN111709313B (en) * 2020-05-27 2022-07-29 杭州电子科技大学 Pedestrian re-identification method based on local and channel combination characteristics
CN112492383A (en) * 2020-12-03 2021-03-12 珠海格力电器股份有限公司 Video frame generation method and device, storage medium and electronic equipment
CN113065495B (en) * 2021-04-13 2023-07-14 深圳技术大学 Image similarity calculation method, target object re-recognition method and system
CN113283359A (en) * 2021-06-02 2021-08-20 万达信息股份有限公司 Authentication method and system for handheld certificate photo and electronic equipment
CN113344875A (en) * 2021-06-07 2021-09-03 武汉象点科技有限公司 Abnormal image detection method based on self-supervision learning
CN113269155A (en) * 2021-06-28 2021-08-17 苏州市科远软件技术开发有限公司 End-to-end face recognition method, device, equipment and storage medium
CN113642468A (en) * 2021-08-16 2021-11-12 中国银行股份有限公司 Identity authentication method and device
CN113688793A (en) * 2021-09-22 2021-11-23 万章敏 Training method of face model and face recognition system
CN116188256A (en) * 2021-11-25 2023-05-30 北京字跳网络技术有限公司 Super-resolution image processing method, device, equipment and medium
CN114387457A (en) * 2021-12-27 2022-04-22 腾晖科技建筑智能(深圳)有限公司 Face intra-class interval optimization method based on parameter adjustment
DE102022132343A1 (en) * 2022-12-06 2024-06-06 Bundesdruckerei Gmbh Authentication device and method for authenticating a person using an identification document assigned to the person, as well as identity document and method for producing
CN116127298B (en) * 2023-02-22 2024-03-19 北京邮电大学 Small sample radio frequency fingerprint identification method based on triplet loss

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129216B1 (en) * 2013-07-15 2015-09-08 Xdroid Kft. System, method and apparatus for computer aided association of relevant images with text
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
CN107194341A (en) * 2017-05-16 2017-09-22 西安电子科技大学 Maxout multi-convolutional-neural-network fusion face recognition method and system
CN107423690A (en) * 2017-06-26 2017-12-01 广东工业大学 Face recognition method and device
CN108009528A (en) * 2017-12-26 2018-05-08 广州广电运通金融电子股份有限公司 Face authentication method, device, computer equipment and storage medium based on Triplet Loss

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AMOS, BRANDON: "OpenFace: A general-purpose face recognition library with mobile applications", CMU SCHOOL OF COMPUTER SCIENCE, TECH. REP., 30 June 2016 (2016-06-30), XP055378815 *
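The OpenFace technical report cited above, like the present application, trains a face embedding network with a triplet loss over (anchor, positive, negative) image triplets. For orientation, a minimal NumPy sketch of that loss follows; it is illustrative only, and the function and variable names are ours, not taken from the patent or the report.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    Pulls the anchor-positive squared distance below the
    anchor-negative squared distance by at least `margin`.
    """
    d_ap = np.sum((anchor - positive) ** 2)  # squared L2, same identity
    d_an = np.sum((anchor - negative) ** 2)  # squared L2, different identity
    return max(0.0, d_ap - d_an + margin)

# Toy unit-norm embeddings (real systems L2-normalize network outputs).
a = np.array([1.0, 0.0])
p = np.array([0.8, 0.6])   # same identity: close to the anchor
n = np.array([0.0, 1.0])   # different identity: far from the anchor
print(triplet_loss(a, p, n))  # → 0.0 (margin already satisfied)
```

Training minimizes this quantity summed over mined triplets, so only triplets that violate the margin contribute a nonzero gradient.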

Cited By (119)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516533A (en) * 2019-07-11 2019-11-29 同济大学 Pedestrian re-identification method based on deep metric learning
CN110414431B (en) * 2019-07-29 2022-12-27 广州像素数据技术股份有限公司 Face recognition method and system based on elastic context relation loss function
CN110414431A (en) * 2019-07-29 2019-11-05 广州像素数据技术股份有限公司 Face identification method and system based on elastic context relation loss function
CN110647880A (en) * 2019-08-12 2020-01-03 深圳市华付信息技术有限公司 Mobile-terminal ID card image occlusion judgment method
CN110458233A (en) * 2019-08-13 2019-11-15 腾讯云计算(北京)有限责任公司 Combination grain object identification model training and recognition methods, device and storage medium
CN110458233B (en) * 2019-08-13 2024-02-13 腾讯云计算(北京)有限责任公司 Mixed granularity object recognition model training and recognition method, device and storage medium
CN110705357A (en) * 2019-09-02 2020-01-17 深圳中兴网信科技有限公司 Face recognition method and face recognition device
CN110555478A (en) * 2019-09-05 2019-12-10 东北大学 Fan multi-fault diagnosis method based on depth measurement network of difficult sample mining
CN110555478B (en) * 2019-09-05 2023-02-03 东北大学 Fan multi-fault diagnosis method based on depth measurement network of difficult sample mining
CN110674637A (en) * 2019-09-06 2020-01-10 腾讯科技(深圳)有限公司 Character relation recognition model training method, device, equipment and medium
CN111008550A (en) * 2019-09-06 2020-04-14 上海芯灵科技有限公司 Identification method for finger vein authentication identity based on Multiple loss function
CN110705393A (en) * 2019-09-17 2020-01-17 中国计量大学 Method for improving face recognition performance of community population
CN110705393B (en) * 2019-09-17 2023-02-03 中国计量大学 Method for improving face recognition performance of community population
CN110647938A (en) * 2019-09-24 2020-01-03 北京市商汤科技开发有限公司 Image processing method and related device
CN110647938B (en) * 2019-09-24 2022-07-15 北京市商汤科技开发有限公司 Image processing method and related device
CN112733574A (en) * 2019-10-14 2021-04-30 中移(苏州)软件技术有限公司 Face recognition method and device and computer readable storage medium
CN112733574B (en) * 2019-10-14 2023-04-07 中移(苏州)软件技术有限公司 Face recognition method and device and computer readable storage medium
CN110796057A (en) * 2019-10-22 2020-02-14 上海交通大学 Pedestrian re-identification method and device and computer equipment
CN110852367B (en) * 2019-11-05 2023-10-31 上海联影智能医疗科技有限公司 Image classification method, computer device, and storage medium
CN110852367A (en) * 2019-11-05 2020-02-28 上海联影智能医疗科技有限公司 Image classification method, computer device, and storage medium
CN110956098A (en) * 2019-11-13 2020-04-03 深圳和而泰家居在线网络科技有限公司 Image processing method and related equipment
CN111126360B (en) * 2019-11-15 2023-03-24 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN111126360A (en) * 2019-11-15 2020-05-08 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN111079566A (en) * 2019-11-28 2020-04-28 深圳市信义科技有限公司 Large-scale face recognition model optimization system
CN111222411B (en) * 2019-11-28 2023-09-01 中国船舶重工集团公司第七一三研究所 Laser emission safety rapid alarm method and device
CN111222411A (en) * 2019-11-28 2020-06-02 中国船舶重工集团公司第七一三研究所 Laser emission safe and rapid alarm method and device
CN111079566B (en) * 2019-11-28 2023-05-02 深圳市信义科技有限公司 Large-scale face recognition model optimization system
CN111062430A (en) * 2019-12-12 2020-04-24 易诚高科(大连)科技有限公司 Pedestrian re-identification evaluation method based on probability density function
CN111091089B (en) * 2019-12-12 2022-07-29 新华三大数据技术有限公司 Face image processing method and device, electronic equipment and storage medium
CN111062430B (en) * 2019-12-12 2023-05-09 易诚高科(大连)科技有限公司 Pedestrian re-identification evaluation method based on probability density function
CN111091089A (en) * 2019-12-12 2020-05-01 新华三大数据技术有限公司 Face image processing method and device, electronic equipment and storage medium
CN111144240A (en) * 2019-12-12 2020-05-12 深圳数联天下智能科技有限公司 Image processing method and related equipment
CN111126240B (en) * 2019-12-19 2023-04-07 西安工程大学 Three-channel feature fusion face recognition method
CN111126240A (en) * 2019-12-19 2020-05-08 西安工程大学 Three-channel feature fusion face recognition method
CN111191563A (en) * 2019-12-26 2020-05-22 三盟科技股份有限公司 Face recognition method and system based on data sample and test data set training
CN111241925B (en) * 2019-12-30 2023-08-18 新大陆数字技术股份有限公司 Face quality assessment method, system, electronic equipment and readable storage medium
CN111241925A (en) * 2019-12-30 2020-06-05 新大陆数字技术股份有限公司 Face quality evaluation method, system, electronic equipment and readable storage medium
CN111209839B (en) * 2019-12-31 2023-05-23 上海涛润医疗科技有限公司 Face recognition method
CN111209839A (en) * 2019-12-31 2020-05-29 上海涛润医疗科技有限公司 Face recognition method
CN111198964A (en) * 2020-01-10 2020-05-26 中国科学院自动化研究所 Image retrieval method and system
CN111198964B (en) * 2020-01-10 2023-04-25 中国科学院自动化研究所 Image retrieval method and system
CN111274946A (en) * 2020-01-19 2020-06-12 杭州涂鸦信息技术有限公司 Face recognition method, system and equipment
CN111274946B (en) * 2020-01-19 2023-05-05 杭州涂鸦信息技术有限公司 Face recognition method, system and equipment
CN113362096A (en) * 2020-03-04 2021-09-07 驰众信息技术(上海)有限公司 Frame advertisement image matching method based on deep learning
CN111368766A (en) * 2020-03-09 2020-07-03 云南安华防灾减灾科技有限责任公司 Cattle face detection and identification method based on deep learning
CN111368766B (en) * 2020-03-09 2023-08-18 云南安华防灾减灾科技有限责任公司 Deep learning-based cow face detection and recognition method
CN111539247B (en) * 2020-03-10 2023-02-10 西安电子科技大学 Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN111539247A (en) * 2020-03-10 2020-08-14 西安电子科技大学 Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN111429414B (en) * 2020-03-18 2023-04-07 腾讯科技(深圳)有限公司 Artificial intelligence-based focus image sample determination method and related device
CN111429414A (en) * 2020-03-18 2020-07-17 腾讯科技(深圳)有限公司 Artificial intelligence-based focus image sample determination method and related device
CN111414862A (en) * 2020-03-22 2020-07-14 西安电子科技大学 Expression recognition method based on neural network fusion key point angle change
CN111414862B (en) * 2020-03-22 2023-03-24 西安电子科技大学 Expression recognition method based on neural network fusion key point angle change
CN112949780A (en) * 2020-04-21 2021-06-11 佳都科技集团股份有限公司 Feature model training method, device, equipment and storage medium
CN112949780B (en) * 2020-04-21 2022-09-20 佳都科技集团股份有限公司 Feature model training method, device, equipment and storage medium
CN111507289A (en) * 2020-04-22 2020-08-07 上海眼控科技股份有限公司 Video matching method, computer device and storage medium
CN111582107B (en) * 2020-04-28 2023-09-29 浙江大华技术股份有限公司 Training method and recognition method of target re-recognition model, electronic equipment and device
CN111582107A (en) * 2020-04-28 2020-08-25 浙江大华技术股份有限公司 Training method and recognition method of target re-recognition model, electronic equipment and device
CN111639535B (en) * 2020-04-29 2023-08-22 深圳英飞拓智能技术有限公司 Face recognition method and device based on deep learning
CN111639535A (en) * 2020-04-29 2020-09-08 深圳英飞拓智能技术有限公司 Face recognition method and device based on deep learning
CN111626212B (en) * 2020-05-27 2023-09-26 腾讯科技(深圳)有限公司 Method and device for identifying object in picture, storage medium and electronic device
CN111626212A (en) * 2020-05-27 2020-09-04 腾讯科技(深圳)有限公司 Method and device for identifying object in picture, storage medium and electronic device
CN113807122A (en) * 2020-06-11 2021-12-17 阿里巴巴集团控股有限公司 Model training method, object recognition method and device, and storage medium
CN111738157A (en) * 2020-06-23 2020-10-02 平安科技(深圳)有限公司 Method and device for constructing data set of facial action units and computer equipment
CN111738157B (en) * 2020-06-23 2023-07-21 平安科技(深圳)有限公司 Face action unit data set construction method and device and computer equipment
CN112257738A (en) * 2020-07-31 2021-01-22 北京京东尚科信息技术有限公司 Training method and device of machine learning model and classification method and device of image
CN111988614B (en) * 2020-08-14 2022-09-13 深圳前海微众银行股份有限公司 Hash coding optimization method and device and readable storage medium
CN111988614A (en) * 2020-08-14 2020-11-24 深圳前海微众银行股份有限公司 Hash coding optimization method and device and readable storage medium
CN112069993B (en) * 2020-09-04 2024-02-13 西安西图之光智能科技有限公司 Dense face detection method and system based on facial-feature mask constraint and storage medium
CN112069993A (en) * 2020-09-04 2020-12-11 西安西图之光智能科技有限公司 Dense face detection method and system based on facial features mask constraint and storage medium
CN112084956A (en) * 2020-09-11 2020-12-15 上海交通大学烟台信息技术研究院 Special face crowd screening system based on small sample learning prototype network
CN112052821A (en) * 2020-09-15 2020-12-08 浙江智慧视频安防创新中心有限公司 Fire fighting channel safety detection method, device, equipment and storage medium
CN112052821B (en) * 2020-09-15 2023-07-07 浙江智慧视频安防创新中心有限公司 Fire-fighting channel safety detection method, device, equipment and storage medium
CN112287765B (en) * 2020-09-30 2024-06-04 新大陆数字技术股份有限公司 Face living body detection method, device, equipment and readable storage medium
CN112287765A (en) * 2020-09-30 2021-01-29 新大陆数字技术股份有限公司 Face living body detection method, device and equipment and readable storage medium
CN112307968A (en) * 2020-10-30 2021-02-02 天地伟业技术有限公司 Face recognition feature compression method
CN112328786A (en) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 Text classification method and device based on BERT, computer equipment and storage medium
GB2600922B (en) * 2020-11-05 2024-04-10 Thales Holdings Uk Plc One shot learning for identifying data items similar to a query data item
GB2600922A (en) * 2020-11-05 2022-05-18 Thales Holdings Uk Plc One shot learning for identifying data items similar to a query data item
CN112836566A (en) * 2020-12-01 2021-05-25 北京智云视图科技有限公司 Multitask neural network face key point detection method for edge equipment
CN112836719B (en) * 2020-12-11 2024-01-05 南京富岛信息工程有限公司 Indicator diagram similarity detection method fusing binary classification and triplets
CN112836719A (en) * 2020-12-11 2021-05-25 南京富岛信息工程有限公司 Indicator diagram similarity detection method fusing binary classification and triplets
CN112580011B (en) * 2020-12-25 2022-05-24 华南理工大学 Portrait encryption and decryption system facing biological feature privacy protection
CN112580011A (en) * 2020-12-25 2021-03-30 华南理工大学 Portrait encryption and decryption system facing biological feature privacy protection
CN112861626B (en) * 2021-01-04 2024-03-08 西北工业大学 Fine granularity expression classification method based on small sample learning
CN112861626A (en) * 2021-01-04 2021-05-28 西北工业大学 Fine-grained expression classification method based on small sample learning
CN113762019A (en) * 2021-01-22 2021-12-07 北京沃东天骏信息技术有限公司 Training method of feature extraction network, face recognition method and device
CN113762019B (en) * 2021-01-22 2024-04-09 北京沃东天骏信息技术有限公司 Training method of feature extraction network, face recognition method and device
CN112836629B (en) * 2021-02-01 2024-03-08 清华大学深圳国际研究生院 Image classification method
CN112836629A (en) * 2021-02-01 2021-05-25 清华大学深圳国际研究生院 Image classification method
CN112966724A (en) * 2021-02-07 2021-06-15 惠州市博实结科技有限公司 Method and device for classifying image single categories
CN112966724B (en) * 2021-02-07 2024-04-09 惠州市博实结科技有限公司 Method and device for classifying image single categories
CN112766237A (en) * 2021-03-12 2021-05-07 东北林业大学 Unsupervised pedestrian re-identification method based on cluster feature point clustering
CN113157956A (en) * 2021-04-23 2021-07-23 雅马哈发动机(厦门)信息***有限公司 Picture searching method, system, mobile terminal and storage medium
CN113344031A (en) * 2021-05-13 2021-09-03 清华大学 Text classification method
CN113344031B (en) * 2021-05-13 2022-12-27 清华大学 Text classification method
CN113469253B (en) * 2021-07-02 2024-05-14 河海大学 Electricity theft detection method based on a triplet twin network
CN113469253A (en) * 2021-07-02 2021-10-01 河海大学 Electricity theft detection method based on a triplet twin network
CN113486804B (en) * 2021-07-07 2024-02-20 科大讯飞股份有限公司 Object identification method, device, equipment and storage medium
CN113486804A (en) * 2021-07-07 2021-10-08 科大讯飞股份有限公司 Object identification method, device, equipment and storage medium
CN113435545A (en) * 2021-08-14 2021-09-24 北京达佳互联信息技术有限公司 Training method and device of image processing model
CN113642481A (en) * 2021-08-17 2021-11-12 百度在线网络技术(北京)有限公司 Recognition method, training method, device, electronic equipment and storage medium
CN113569991A (en) * 2021-08-26 2021-10-29 深圳市捷顺科技实业股份有限公司 Person-certificate comparison model training method, computer equipment and computer storage medium
CN113569991B (en) * 2021-08-26 2024-05-28 深圳市捷顺科技实业股份有限公司 Person-certificate comparison model training method, computer equipment and computer storage medium
CN113780461A (en) * 2021-09-23 2021-12-10 中国人民解放军国防科技大学 Robust neural network training method based on feature matching
CN113887653B (en) * 2021-10-20 2024-02-06 西安交通大学 Positioning method and system for tight coupling weak supervision learning based on ternary network
CN113887653A (en) * 2021-10-20 2022-01-04 西安交通大学 Positioning method and system for tightly-coupled weak supervised learning based on ternary network
CN114049479A (en) * 2021-11-10 2022-02-15 苏州魔视智能科技有限公司 Self-supervision fisheye camera image feature point extraction method and device and storage medium
CN114882558B (en) * 2022-04-29 2024-02-23 陕西师范大学 Learning scene real-time identity authentication method based on face recognition technology
CN114882558A (en) * 2022-04-29 2022-08-09 陕西师范大学 Learning scene real-time identity authentication method based on face recognition technology
CN114663965A (en) * 2022-05-24 2022-06-24 之江实验室 Person-certificate comparison method and device based on two-stage alternating learning
CN114663965B (en) * 2022-05-24 2022-10-21 之江实验室 Person-certificate comparison method and device based on two-stage alternating learning
CN114926445B (en) * 2022-05-31 2024-03-26 哈尔滨工业大学 Small sample crop disease image identification method and system based on twin network
CN114926445A (en) * 2022-05-31 2022-08-19 哈尔滨工业大学 Twin network-based small sample crop disease image identification method and system
CN114817888A (en) * 2022-06-27 2022-07-29 中国信息通信研究院 Certificate registering and issuing method, device and storage medium
CN116206355A (en) * 2023-04-25 2023-06-02 鹏城实验室 Face recognition model training, image registration and face recognition method and device
CN116959064A (en) * 2023-06-25 2023-10-27 上海腾桥信息技术有限公司 Certificate verification method and device, computer equipment and storage medium
CN116959064B (en) * 2023-06-25 2024-04-26 上海腾桥信息技术有限公司 Certificate verification method and device, computer equipment and storage medium
CN116977461A (en) * 2023-06-30 2023-10-31 北京开普云信息科技有限公司 Portrait generation method, device, storage medium and equipment for specific scene
CN116977461B (en) * 2023-06-30 2024-03-08 北京开普云信息科技有限公司 Portrait generation method, device, storage medium and equipment for specific scene

Also Published As

Publication number Publication date
CN108009528A (en) 2018-05-08
CN108009528B (en) 2020-04-07

Similar Documents

Publication Publication Date Title
WO2019128367A1 (en) Face verification method and apparatus based on triplet loss, and computer device and storage medium
US10755084B2 (en) Face authentication to mitigate spoofing
CN106780906B (en) Person-ID unified recognition method and system based on deep convolutional neural networks
WO2019024636A1 (en) Identity authentication method, system and apparatus
WO2019120115A1 (en) Facial recognition method, apparatus, and computer apparatus
US10002286B1 (en) System and method for face recognition robust to multiple degradations
CN105740780B (en) Method and device for detecting living human face
US9189686B2 (en) Apparatus and method for iris image analysis
WO2018086543A1 (en) Living body identification method, identity authentication method, terminal, server and storage medium
CN105956572A (en) Living body face detection method based on convolutional neural network
WO2022206319A1 (en) Image processing method and apparatus, and device, storage medium and computer program product
CN105740779B (en) Method and device for detecting living human face
WO2016150240A1 (en) Identity authentication method and apparatus
Islam et al. Multibiometric human recognition using 3D ear and face features
CN113033519B (en) Living body detection method, estimation network processing method, device and computer equipment
WO2020088029A1 (en) Liveness detection method, storage medium, and electronic device
Peter et al. Improving ATM security via face recognition
CN112686191B (en) Living body anti-counterfeiting method, system, terminal and medium based on three-dimensional information of human face
Qin et al. Finger-vein image quality evaluation based on the representation of grayscale and binary image
Goud et al. Smart attendance notification system using SMTP with face recognition
Valehi et al. A graph matching algorithm for user authentication in data networks using image-based physical unclonable functions
Yuan et al. SALM: Smartphone-based identity authentication using lip motion characteristics
CN114373213A (en) Juvenile identity recognition method and device based on face recognition
TWI632509B (en) Face recognition apparatus and method thereof, method for increasing image recognition accuracy, and computer-readable storage medium
Hemasree et al. Facial Skin Texture and Distributed Dynamic Kernel Support Vector Machine (DDKSVM) Classifier for Age Estimation in Facial Wrinkles.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 18897492
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 18897492
Country of ref document: EP
Kind code of ref document: A1