WO2022127111A1 - Cross-modal face recognition method, apparatus, device and storage medium - Google Patents

Cross-modal face recognition method, apparatus, device and storage medium

Info

Publication number
WO2022127111A1
WO2022127111A1 PCT/CN2021/107933 CN2021107933W
Authority
WO
WIPO (PCT)
Prior art keywords
face
cross
modal
image sequence
face recognition
Prior art date
Application number
PCT/CN2021/107933
Other languages
English (en)
French (fr)
Inventor
陈碧辉
高通
钱贝贝
黄源浩
肖振中
Original Assignee
奥比中光科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 奥比中光科技集团股份有限公司 filed Critical 奥比中光科技集团股份有限公司
Publication of WO2022127111A1 publication Critical patent/WO2022127111A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Definitions

  • the present application belongs to the technical field of image processing, and in particular, relates to a cross-modal face recognition method, apparatus, device and storage medium.
  • the present application provides a cross-modal face recognition method, apparatus, device and storage medium, which can solve the problem of low recognition accuracy of face images obtained under cameras of different modalities.
  • the present application provides a cross-modal face recognition method, including:
  • the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
  • the preset cross-modal face recognition model is trained according to the visible light face preprocessing image sequence and the first classification loss function to obtain the first cross-modal face recognition model
  • the visible light preprocessing image sequence and the infrared face preprocessing image sequence are input into the first cross-modal face recognition model, and based on the first classification loss function the first cross-modal face recognition model is retrained to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimensionality reduction.
  • in some embodiments, before the acquiring of the first training sample set, the method further includes: acquiring the first preset number of visible light face image sequences, and performing pixel equalization processing on them to obtain the visible light face preprocessing image sequences;
  • in some embodiments, before the acquiring of the first training sample set, the method further includes: acquiring the second preset number of infrared face image sequences, and performing pixel equalization processing on them to obtain the infrared face preprocessing image sequences.
  • performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence includes:
  • converting the visible light face image sequence into a grayscale image, and normalizing the grayscale image to obtain the visible light face preprocessing image sequence.
  • performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence including:
  • the enhanced infrared face image sequence is normalized to obtain the infrared face preprocessing image sequence.
  • image contrast enhancement is performed on the infrared face image sequence to obtain an enhanced infrared face image sequence, including:
  • the histogram equalization process is performed on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
  • the present application provides a cross-modal face recognition device, including:
  • the acquisition module is used to collect the face image to be recognized
  • a recognition module for inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition
  • the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
  • the preset cross-modal face recognition model is trained according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model
  • the visible light preprocessing image sequence and the infrared face preprocessing image sequence are input into the first cross-modal face recognition model, and based on the first classification loss function the first cross-modal face recognition model is retrained to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimensionality reduction.
  • a first acquisition module configured to acquire the first preset number of visible light face image sequences
  • the first processing module is configured to perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
  • a second acquisition module configured to acquire the second preset number of infrared face image sequences
  • the second processing module is configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the first processing module includes:
  • a conversion unit for converting the visible light face image sequence into a grayscale image
  • the first processing unit is configured to perform normalization processing on the grayscale image to obtain the visible-light face preprocessing image sequence.
  • the second processing module includes:
  • an enhancement module configured to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence
  • the second processing unit is used for normalizing the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the enhancement module is specifically used for:
  • the histogram equalization process is performed on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
  • the present application provides a cross-modal face recognition device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method of the above-mentioned first aspect are implemented.
  • the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the steps of the method in the first aspect.
  • the present application provides a computer program product, wherein the computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method of the first aspect are implemented.
  • the cross-modal face recognition method of the first aspect above uses a cross-modal face recognition model trained from a visible light face preprocessing image sequence and an infrared face preprocessing image sequence to perform face recognition on the face image to be recognized, which can improve the accuracy of recognizing face images obtained under cameras of different modalities.
  • Fig. 1 is the realization flow chart of the cross-modal face recognition method provided by the embodiment of the present application.
  • Fig. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model.
  • FIG. 3 is a schematic diagram of a cross-modal face recognition device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a cross-modal face recognition device provided by an embodiment of the present application.
  • the term “if” may be contextually interpreted as “when”, “once”, “in response to determining” or “in response to detecting”.
  • similarly, the phrases “if it is determined” or “if the [described condition or event] is detected” may be interpreted, depending on the context, to mean “once it is determined”, “in response to the determination”, “once the [described condition or event] is detected” or “in response to detection of the [described condition or event]”.
  • references in this specification to "one embodiment” or “some embodiments” and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases “in one embodiment,” “in some embodiments,” “in other embodiments,” etc. in various places in this specification are not necessarily all referring to the same embodiment, but mean “one or more but not all embodiments” unless specifically emphasized otherwise.
  • the terms “comprising”, “including”, “having” and their variants mean “including but not limited to” unless specifically emphasized otherwise.
  • FIG. 1 is an implementation flowchart of the cross-modal face recognition method provided by the embodiment of the present application. The method can be performed by a cross-modal face recognition device, including but not limited to a self-service terminal, monitoring equipment, attendance equipment, a server, a robot, a wearable device, a mobile terminal, etc. Details are as follows:
  • S101 Collect a face image to be recognized. The face image to be recognized may be a face image collected in a visible light mode or an infrared mode.
  • a camera of a cross-modal face recognition device such as a camera of a mobile terminal or an attendance device, may collect a face image in a visible light mode, or collect a face image in an infrared mode.
  • S102 Input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition.
  • in some embodiments, a deep convolutional neural network is first pre-trained on face images in the visible light modality to obtain a pre-trained cross-modal image deep convolutional neural network, which provides prior knowledge for the subsequent cross-modal training; then, face images in the visible light modality and face images in the infrared modality are combined into a tuple training set according to preset rules, and the pre-trained cross-modal image deep convolutional neural network is fine-tuned on this set, iterating repeatedly until its performance no longer improves.
  • FIG. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model.
  • the training process of the pre-trained cross-modal face recognition model includes the following steps:
  • S201 Obtain a first training sample set, where the first training sample set includes a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences.
  • a color camera or a multispectral camera can be used to collect a visible-light face image sequence including a human face.
  • visible light face images contain rich texture features but are easily affected by ambient light. Therefore, in some optional implementations, the visible light face preprocessing image sequence is obtained by acquiring a first preset number of visible light face image sequences and performing pixel equalization processing on them.
  • performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence may include: segmenting the face region from the background region in the collected image containing the visible light face, to obtain the visible light face image.
  • the image detection model may be used to detect whether there is a face in the image to be processed.
  • if the output of the image detection model shows that there is no face in the image, face segmentation is unnecessary, and the face segmentation processing ends, reducing unnecessary workload.
  • if the output of the image detection model shows that there is a face in the image, the face can be further screened, that is, it is determined whether the image contains a face that meets preset conditions, for example, whether the face meets the requirements.
  • the requirements can be preset for the position and/or size of the face; for example, a face region whose size meets a preset size can be considered to meet the requirements.
  • if the face meets the preset conditions, follow-up processing is performed, such as image rotation correction and object segmentation on the image; if the face does not meet the preset conditions, the image is not segmented.
  • a series of visible light face images are processed as above to obtain a visible light face image sequence, and further preprocessing of the visible light face image sequence can obtain a visible light face preprocessed image sequence.
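The flow just described (detect a face, screen it against preset position/size conditions, then segment) can be sketched in a few lines. This is an illustrative outline only: `detect_faces`, `segment_face`, and the minimum size below are hypothetical stand-ins, not names or values from this application.

```python
# Hypothetical sketch of the face screening flow described above.
# `detect_faces` and `segment_face` stand in for a real detection model
# and segmentation routine; the minimum size is an assumed preset.
MIN_FACE_SIZE = (64, 64)

def screen_and_segment(image, detect_faces, segment_face, min_size=MIN_FACE_SIZE):
    """Return a segmented face image, or None if no qualifying face is found."""
    faces = detect_faces(image)            # list of (x, y, w, h) boxes
    if not faces:                          # no face: skip segmentation entirely
        return None
    for (x, y, w, h) in faces:             # screen each detected face
        if w >= min_size[0] and h >= min_size[1]:
            return segment_face(image, (x, y, w, h))
    return None                            # faces present, but none qualify
```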
  • preprocessing the visible light face image sequence may include: performing grayscale conversion and normalization processing on the visible light face image sequence to obtain a visible light face preprocessing image sequence.
  • exemplarily, the face images in the visible light face image sequence are converted into grayscale images through grayscale conversion.
  • grayscale conversion can be performed by a preset grayscale conversion formula, and the preset grayscale conversion formula can be expressed as:
  • Igray is the grayscale image output after grayscale conversion
  • R, G, and B are the RGB values corresponding to the image before grayscale conversion.
  • the converted grayscale image is then subjected to normalization processing; exemplarily, normalization is performed using a preset normalization formula.
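The preset conversion and normalization formulas themselves are not reproduced in this text. As an assumption, a common choice is the BT.601 weighted sum Igray = 0.299R + 0.587G + 0.114B for grayscale conversion, and min-max scaling to [0, 1] for normalization:

```python
import numpy as np

def to_gray(rgb):
    """Weighted grayscale conversion (assumed BT.601 weights; the
    application's preset formula is not given in this text)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def normalize(gray):
    """Min-max normalization to [0, 1] (one common preset formula)."""
    lo, hi = gray.min(), gray.max()
    return (gray - lo) / (hi - lo) if hi > lo else np.zeros_like(gray)
```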
  • S202 Train a preset cross-modal face recognition model according to the visible-light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model.
  • the cross-modal neural network can be pre-trained by using the visible light face preprocessing image sequence.
  • exemplarily, the visible light face preprocessing image sequence is divided into two non-overlapping parts: a training set and a validation set. The training set is used to train the preset cross-modal face recognition model, and the validation set is used to verify the trained model.
  • based on the first classification loss function, the neural network that minimizes the validation-set loss is continuously saved, so as to determine the first cross-modal neural network finally trained in this embodiment; this first cross-modal neural network is the first cross-modal face recognition model.
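The checkpointing strategy above (keep whichever network has minimized the validation loss so far) is generic and can be sketched independently of the network. `step` and `val_loss` below are hypothetical placeholders for one training iteration and the validation evaluation:

```python
def train_with_best_checkpoint(model, step, val_loss, epochs=50):
    """Run `epochs` training steps, returning the model state that
    achieved the lowest validation loss (the continuously saved network)."""
    best_loss, best_model = float("inf"), model
    for _ in range(epochs):
        model = step(model)                # one training iteration
        loss = val_loss(model)             # evaluate on the validation set
        if loss < best_loss:               # save only improving checkpoints
            best_loss, best_model = loss, model
    return best_model
```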
  • S203 Input the visible light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and based on the first classification loss function, retrain the first cross-modal face recognition model to obtain a second cross-modal face recognition model.
  • the infrared face preprocessing image sequence is obtained by preprocessing the infrared face image sequence.
  • exemplarily, the method includes: acquiring the second preset number of infrared face image sequences, and performing pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessing image sequences.
  • performing pixel equalization processing on the infrared face image sequence includes: performing image contrast enhancement and normalization processing on the infrared face image sequence.
  • histogram equalization may be performed on the infrared face image sequence to enhance image contrast.
  • histogram equalization is a method to enhance the image contrast by stretching the pixel intensity distribution range.
  • a logarithmic function and a power function can also be used to transform the infrared face image sequence to enhance the image contrast.
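Histogram equalization, the first contrast-enhancement option named above, stretches the intensity distribution via the cumulative histogram. A minimal NumPy sketch for 8-bit grayscale images (illustrative, not this application's implementation):

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization of an 8-bit grayscale image: remap each
    intensity through the (shifted, rescaled) cumulative distribution."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]              # first nonzero CDF value
    lut = np.round((cdf - cdf_min) / max(img.size - cdf_min, 1) * 255)
    return lut.clip(0, 255).astype(np.uint8)[img]
```

A low-contrast input, e.g. with values clustered in 100–103, is stretched to span the full 0–255 range.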
  • the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • the process of normalizing the infrared face image sequence is the same as the process of normalizing the visible light face image, and details are not repeated here.
  • the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers. Among them, there can be any number of convolutional layers before the fully connected layer.
  • exemplarily, the cross-modal face recognition model includes five convolutional blocks and fully connected layers; the first and second convolutional blocks each include two convolutional layers for feature extraction and one max pooling layer for dimensionality reduction, and the third to fifth convolutional blocks each include three convolutional layers for feature extraction and one max pooling layer for dimensionality reduction.
  • the feature map after each convolution operation is passed through a nonlinear activation function; the visible light preprocessing image sequence and the infrared preprocessing image sequence undergo convolution operations to extract feature values through the convolutional layers, and the face feature vector is then output through the fully connected layers.
  • exemplarily, the first convolutional block may include two convolutional layers with a 3 × 3 kernel, a 1 × 1 stride and 64 kernels each, plus one max pooling layer with a 2 × 2 kernel and a 2 × 2 stride;
  • the second convolutional block includes two convolutional layers with a 3 × 3 kernel, a 1 × 1 stride and 128 kernels each, plus one max pooling layer with a 2 × 2 kernel and a 2 × 2 stride;
  • the third convolutional block includes three convolutional layers with a 3 × 3 kernel, a 1 × 1 stride and 256 kernels each, plus one max pooling layer with a 2 × 2 kernel and a 2 × 2 stride;
  • the fourth convolutional block includes three convolutional layers with a 3 × 3 kernel, a 1 × 1 stride and 512 kernels each, plus one max pooling layer with a 2 × 2 kernel and a 2 × 2 stride;
  • the fifth convolutional block includes three convolutional layers with a 3 × 3 kernel, a 1 × 1 stride and 512 kernels each, plus one max pooling layer with a 2 × 2 kernel and a 2 × 2 stride; the two fully connected layers each have 4096 nodes.
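The block configuration above can be sanity-checked with a small shape calculation. The sketch below assumes a 224 × 224 input and 'same' padding for the 3 × 3 stride-1 convolutions; neither assumption is stated in the text.

```python
# Assumed: 224x224 input, 'same' padding for the 3x3 stride-1 convolutions.
# Each tuple is (number of 3x3 conv layers, number of kernels) for one block;
# every block ends with a 2x2 max pool of stride 2.
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def output_shape(size=224):
    channels = 0
    for _convs, kernels in BLOCKS:
        channels = kernels     # 'same' 3x3 stride-1 convs keep spatial size
        size //= 2             # the 2x2/stride-2 max pool halves it
    return size, channels

print(output_shape())  # -> (7, 512): a 7x7x512 feature map feeds the FC layers
```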
  • the convolution kernel and weight are randomly initialized, and the bias term is set to 0.
  • the stochastic gradient descent (SGD) algorithm is used to update the network parameters and optimize the gradient of the above-mentioned cross-modal neural network.
  • the training stops and the trained cross-modal neural network is saved.
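The overall two-stage schedule (pre-train on visible-light data, then retrain the same weights on visible-light plus infrared data under the same classification loss, updating parameters by SGD) can be illustrated on a toy logistic classifier. Everything below, including the synthetic "modalities", is a simplified stand-in for the deep network described here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_train(w, X, y, lr=0.1, epochs=200):
    """Minimize the logistic loss with plain stochastic gradient descent."""
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))   # sigmoid prediction
            w = w - lr * (p - y[i]) * X[i]        # per-sample gradient step
    return w

# Stage 1: pre-train on the "visible light" samples only.
X_vis = rng.normal(size=(100, 3))
y_vis = (X_vis[:, 0] > 0).astype(float)
w = sgd_train(np.zeros(3), X_vis, y_vis)

# Stage 2: retrain the same weights on visible + "infrared" samples together.
X_ir = rng.normal(scale=2.0, size=(100, 3))       # second "modality"
y_ir = (X_ir[:, 0] > 0).astype(float)
X_all = np.vstack([X_vis, X_ir])
y_all = np.concatenate([y_vis, y_ir])
w = sgd_train(w, X_all, y_all)
```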
  • the cross-modal face recognition method provided by the present application uses the cross-modal face recognition model trained with the visible light face preprocessing image sequence and the infrared face preprocessing image sequence to perform face recognition on the face image to be recognized, which can improve the recognition accuracy of face images obtained under cameras of different modalities.
  • FIG. 3 shows a structural block diagram of the cross-modal face recognition apparatus provided by the embodiment of the present application; for ease of description, only the parts relevant to this embodiment are shown.
  • FIG. 3 is a schematic diagram of a cross-modal face recognition apparatus provided by an embodiment of the present application.
  • the cross-modal face recognition device 300 includes:
  • the collection module 301 is used to collect the face image to be recognized
  • a recognition module 302 configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition
  • the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
  • the preset cross-modal face recognition model is trained according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model
  • the visible light preprocessing image sequence and the infrared face preprocessing image sequence are input into the first cross-modal face recognition model, and based on the first classification loss function the first cross-modal face recognition model is retrained to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimensionality reduction.
  • a first acquisition module configured to acquire the first preset number of visible light face image sequences
  • the first processing module is configured to perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
  • a second acquisition module configured to acquire the second preset number of infrared face image sequences
  • the second processing module is configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the first processing module includes:
  • a conversion unit for converting the visible light face image sequence into a grayscale image
  • the first processing unit is configured to perform normalization processing on the grayscale image to obtain the visible-light face preprocessing image sequence.
  • the second processing module includes:
  • an enhancement module configured to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence
  • the second processing unit is used for normalizing the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the enhancement module is specifically used for:
  • the histogram equalization process is performed on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
  • FIG. 4 is a schematic structural diagram of a cross-modal face recognition device provided by an embodiment of the present application.
  • the cross-modal face recognition device 4 of this embodiment includes: at least one processor 40 (only one is shown in FIG. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40; when the processor 40 executes the computer program 42, the steps in the method embodiment described in FIG. 1 above are implemented.
  • the cross-modal face recognition device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the cross-modal face recognition device 4 may include, but is not limited to, a processor 40 and a memory 41 .
  • FIG. 4 is only an example of the cross-modal face recognition device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components, and may also include, for example, input and output devices, network access devices, and the like.
  • the processor 40 may be a central processing unit (Central Processing Unit, CPU); it may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 41 may be an internal storage unit of the cross-modality face recognition device 4 , such as a hard disk or memory of the cross-modality face recognition device 4 .
  • in other embodiments, the memory 41 may also be an external storage device of the cross-modal face recognition device 4, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the cross-modal face recognition device 4.
  • the memory 41 may also include both an internal storage unit of the cross-modal face recognition device 4 and an external storage device.
  • the memory 41 is used to store an operating system, an application program, a boot loader (Boot Loader), data, and other programs, such as program codes of the computer program.
  • the memory 41 can also be used to temporarily store data that has been output or will be output.
  • An embodiment of the present application also provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor; when the processor executes the computer program, the steps in any of the foregoing method embodiments are implemented.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.
  • the embodiments of the present application provide a computer program product; when the computer program product runs on a cross-modal face recognition device, the cross-modal face recognition device, upon executing it, implements the steps in the above method embodiments.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments can be implemented by a computer program to instruct the relevant hardware.
  • the computer program can be stored in a computer-readable storage medium; when executed by a processor, the computer program may implement the steps of the various method embodiments above.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/electronic device, a recording medium, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disc.
  • in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • the disclosed apparatus/network device and method may be implemented in other manners.
  • the apparatus/network device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.


Abstract

一种跨模态人脸识别方法、装置、设备及存储介质,所述方法通过采用由可见光人脸预处理图像序列和红外光人脸预处理图像序列训练完成的跨模态人脸识别模型,对待识别的人脸图像进行人脸识别,能够提高对不同模态的摄像头下获取的人脸图像识别的准确率。

Description

跨模态人脸识别方法、装置、设备及存储介质 技术领域
本申请属于图像处理技术领域,尤其涉及一种跨模态人脸识别方法、装置、设备及存储介质。
背景技术
人脸识别的准确性受周围环境光照的影响较大。常见的人脸识别技术主要是针对不受环境光照影响的近红外摄像头拍摄的人脸图像进行识别。但是,在现实环境中经常会出现光照不均以及光照不佳的情况,这就需要针对在不同模态的摄像头下获取的图像进行识别,而目前的人脸识别技术无法对不同模态的摄像头下获取的图像进行准确性识别。因此,现有技术存在对不同模态的摄像头下获取的人脸图像识别准确率不高的问题。
发明内容
本申请提供了一种跨模态人脸识别方法、装置、设备及存储介质,能够解决对不同模态的摄像头下获取的人脸图像识别准确率不高的问题。
第一方面,本申请提供了一种跨模态人脸识别方法,包括:
采集待识别的人脸图像;
将所述待识别的人脸图像输入预先训练完成的跨模态人脸识别模型进行人脸识别;
其中,所述预先训练完成的跨模态人脸识别模型的训练过程包括:获取第一训练样本集,所述第一训练样本集包括第一预设数量的可见光人脸预处理图像序列和第二预设数量的红外人脸预处理图像序列;
根据所述可见光人脸预处理图像序列以及第一分类损失函数对预设跨模态人脸识别模型进行训练，得到第一跨模态人脸识别模型；
将所述可见光人脸预处理图像序列和所述红外人脸预处理图像序列输入所述第一跨模态人脸识别模型，并基于所述第一分类损失函数对所述第一跨模态人脸识别模型进行再次训练，得到第二跨模态人脸识别模型，所述第二跨模态人脸识别模型为所述预先训练完成的跨模态人脸识别模型。
在一可选的实现方式中,所述跨模态人脸识别模型包括预设数量的卷积层和全连接层;所述卷积层包括用于特征提取的卷积层和用于降维的最大池化层。
在一可选的实现方式中,在所述获取第一训练样本集之前,还包括:
获取所述第一预设数量的可见光人脸图像序列;
对所述可见光人脸图像序列进行像素均衡处理,得到所述可见光人脸预处理图像序列。
在一可选的实现方式中,在所述获取第一训练样本集之前,还包括:
获取所述第二预设数量的红外人脸图像序列;
对所述红外人脸图像序列进行像素均衡处理,得到所述红外人脸预处理图像序列。
在一可选的实现方式中,对所述可见光人脸图像序列进行像素均衡处理,得到所述可见光人脸预处理图像序列,包括:
将所述可见光人脸图像序列转换为灰度图像；
对所述灰度图像进行归一化处理，得到所述可见光人脸预处理图像序列。
在一可选的实现方式中,对所述红外人脸图像序列进行像素均衡处理,得到所述红外人脸预处理图像序列,包括:
对所述红外人脸图像序列进行图像对比度增强,得到增强后的红外人脸图像序列;
对增强后的红外人脸图像序列进行归一化处理,得到所述红外人脸预处理图像序列。
在一可选的实现方式中，对所述红外人脸图像序列进行图像对比度增强，得到增强后的红外人脸图像序列，包括：
对所述红外人脸图像序列进行直方图均衡化处理,增强所述红外人脸图像序列的图像对比度,得到增强后的红外人脸图像序列。
第二方面,本申请提供了跨模态人脸识别装置,包括:
采集模块,用于采集待识别的人脸图像;
识别模块,用于将所述待识别的人脸图像输入预先训练完成的跨模态人脸识别模型进行人脸识别;
其中,所述预先训练完成的跨模态人脸识别模型的训练过程包括:获取第一训练样本集,所述第一训练样本集包括第一预设数量的可见光人脸预处理图像序列和第二预设数量的红外人脸预处理图像序列;
根据所述可见光人脸预处理图像序列以及第一分类损失函数对预设跨模态人脸识别模型进行训练,得到第一跨模态人脸识别模型;
将所述可见光预处理图像序列和所述红外人脸预处理图像序列输入所述第一跨模态人脸识别模型,并基于所述第一分类损失函数对所述第一跨模态人脸识别模型进行再次训练,得到第二跨模态人脸识别模型,所述第二跨模态人脸识别模型为所述预先训练完成的跨模态人脸识别模型。
在一可选的实现方式中,所述跨模态人脸识别模型包括预设数量的卷积层和全连接层;所述卷积层包括用于特征提取的卷积层和用于降维的最大池化层。
在一可选的实现方式中,还包括:
第一获取模块,用于获取所述第一预设数量的可见光人脸图像序列;
第一处理模块,用于对所述可见光人脸图像序列进行像素均衡处理,得到所述可见光人脸预处理图像序列。
在一可选的实现方式中,还包括:
第二获取模块，用于获取所述第二预设数量的红外人脸图像序列；
第二处理模块,用于对所述红外人脸图像序列进行像素均衡处理,得到所述红外人脸预处理图像序列。
在一可选的实现方式中,所述第一处理模块,包括:
转换单元，用于将所述可见光人脸图像序列转换为灰度图像；
第一处理单元,用于对所述灰度图像进行归一化处理,得到所述可见光人脸预处理图像序列。
在一可选的实现方式中,所述第二处理模块,包括:
增强模块,用于对所述红外人脸图像序列进行图像对比度增强,得到增强后的红外人脸图像序列;
第二处理单元,用于对增强后的红外人脸图像序列进行归一化处理,得到所述红外人脸预处理图像序列。
在一可选的实现方式中,所述增强模块,具体用于:
对所述红外人脸图像序列进行直方图均衡化处理,增强所述红外人脸图像序列的图像对比度,得到增强后的红外人脸图像序列。
第三方面,本申请提供了一种跨模态人脸识别设备,上述跨模态人脸识别设备包括存储器、处理器以及存储在上述存储器中并可在上述处理器上运行的计算机程序,上述处理器执行上述计算机程序时实现如上述第一方面的方法的步骤。
第四方面,本申请提供了一种计算机可读存储介质,上述计算机可读存储介质存储有计算机程序,上述计算机程序被处理器执行时实现如上述第一方面的方法的步骤。
第五方面,本申请提供了一种计算机程序产品,上述计算机程序产品包括计算机程序,上述计算机程序被一个或多个处理器执行时实现如上述第一方面的方法的步骤。
上述第一方面的跨模态人脸识别方法,通过采用由可见光人脸预处理图像序列和红外光人脸预处理图像序列训练完成的跨模态人脸识别模型,对待识别的人脸图像进行人脸识别,能够提高对不同模态的摄像头下获取的人脸图像识别的准确率。
可以理解的是，上述第二方面至第五方面的有益效果可以参见上述第一方面中的相关描述，在此不再赘述。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的跨模态人脸识别方法的实现流程图;
图2是预先训练完成的跨模态人脸识别模型的训练过程示意图;
图3是本申请实施例提供的跨模态人脸识别装置的示意图;
图4是本申请实施例提供的跨模态人脸识别设备的结构示意图。
具体实施方式
以下描述中，为了说明而不是为了限定，提出了诸如特定系统结构、技术之类的具体细节，以便透彻理解本申请实施例。然而，本领域的技术人员应当清楚，在没有这些具体细节的其它实施例中也可以实现本申请。在其它情况中，省略对众所周知的系统、装置、电路以及方法的详细说明，以免不必要的细节妨碍本申请的描述。
应当理解,当在本申请说明书和所附权利要求书中使用时,术语“包括”指示所描述特征、整体、步骤、操作、元素和/或组件的存在,但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。
还应当理解,在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。
如在本申请说明书和所附权利要求书中所使用的那样，术语“如果”可以依据上下文被解释为“当...时”或“一旦”或“响应于确定”或“响应于检测到”。类似地，短语“如果确定”或“如果检测到[所描述条件或事件]”可以依据上下文被解释为意指“一旦确定”或“响应于确定”或“一旦检测到[所描述条件或事件]”或“响应于检测到[所描述条件或事件]”。
另外,在本申请说明书和所附权利要求书的描述中,术语“第一”、“第二”、“第三”等仅用于区分描述,而不能理解为指示或暗示相对重要性。
在本申请说明书中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。
下面结合具体实施例对本申请提供的跨模态人脸识别方法进行示例性的说明。如图1所示，图1是本申请实施例提供的跨模态人脸识别方法的实现流程图。本实施例可以由跨模态人脸识别设备执行，所述跨模态人脸识别设备包括但不限于自助终端、监控设备、考勤设备以及各种应用场景下的服务器、机器人、可穿戴设备或者移动终端等。详述如下：
S101,采集待识别的人脸图像。
在本申请的实施例中,所述待识别的人脸图像可以是可见光模态下或者红外模态下采集的人脸图像。示例性地,可以通过跨模态人脸识别设备的摄像机,例如移动终端或者考勤设备的摄像机采集可见光模态下的人脸图像,或者采集红外模态下的人脸图像。
S102，将所述待识别的人脸图像输入预先训练完成的跨模态人脸识别模型进行人脸识别。
在本申请的实施例中，所述预先训练完成的跨模态人脸识别模型的获取过程如下：先通过可见光模态下的人脸图像对深度卷积神经网络进行预训练，得到预训练的跨模态图像深度卷积神经网络，为跨模态图像深度卷积神经网络的训练提供先验知识；然后将可见光模态下的人脸图像和红外模态下的人脸图像按照预设规则构成二元组训练集，对预训练的跨模态图像深度卷积神经网络进行精调，反复迭代，直到该网络的性能不再提升，即得到最终的深度卷积神经网络模型。
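上述“将可见光模态与红外模态下的人脸图像按照预设规则构成二元组训练集”的步骤可用如下 Python 片段示意。其中配对规则（同下标配对）仅为示意性假设，说明书中的“预设规则”并未限定具体形式：

```python
def build_pairs(vis_images, ir_images):
    # 按预设规则将可见光与红外人脸图像构成二元组训练集
    # （此处以"同下标一一配对"作为示意规则，实际规则以具体实现为准）
    return list(zip(vis_images, ir_images))
```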
其中,图2是预先训练完成的跨模态人脸识别模型的训练过程示意图。如图2所示,所述预先训练完成的跨模态人脸识别模型的训练过程包括如下步骤:
S201,获取第一训练样本集,所述第一训练样本集包括第一预设数量的可见光人脸预处理图像序列和第二预设数量的红外人脸预处理图像序列。
需要说明的是,可使用彩色相机或多光谱相机采集包含人脸的可见光人脸图像序列。其中,可见光人脸图像包含丰富的纹理特征且容易受环境光影响。因此,在一些可选的实现方式中,通过获取第一预设数量的可见光人脸图像序列;并对所述可见光人脸图像序列进行像素均衡处理,得到所述可见光人脸预处理图像序列。
在一可选的实现方式中，对所述可见光人脸图像序列进行像素均衡处理，得到所述可见光人脸预处理图像序列，可以包括：将采集到的可见光图像中的人脸区域与背景区域进行分割，以获取可见光人脸图像。
在本申请的一些实施例中，在进行图像分割之前，可以首先通过图像检测模型检测待处理图像中是否有人脸，当图像检测模型的输出结果显示图像中没有人脸时，则不必进行人脸分割，同时结束人脸分割处理，以减少不必要的工作量。当图像检测模型的输出结果显示图像中有人脸时，还可以进一步进行人脸的筛选，即判断图像中是否存在符合预设条件的人脸，例如，可以是判断人脸是否符合要求，具体的，可以针对人脸的位置和/或大小来预先设定上述要求，如可以预先设定人脸区域大小满足预设大小时才认为符合要求。当人脸符合预设条件时，则执行后续处理，如进行图像的旋转校正、对图像进行对象分割处理，当人脸不符合预设条件时，则可以不对图像进行分割处理。
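上述“判断人脸区域大小是否满足预设大小”的筛选逻辑可用如下 Python 片段示意（阈值数值为示意性假设，说明书未给出具体数值）：

```python
def face_meets_requirement(box, min_w=64, min_h=64):
    # box 为 (x, y, w, h) 形式的人脸区域框；
    # 仅当宽、高均不小于预设大小时才认为符合要求（阈值为假设值）
    x, y, w, h = box
    return w >= min_w and h >= min_h
```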
在本申请的一些实施例中,将一系列可见光人脸图像作如上处理,即可获取可见光人脸图像序列,进一步对可见光人脸图像序列进行预处理可以得到可见光人脸预处理图像序列。可选地,对所述可见光人脸图像序列进行预处理,可以包括:对所述可见光人脸图像序列进行灰度转换和归一化处理,以得到可见光人脸预处理图像序列。
其中，将可见光人脸图像序列中的人脸图像转换为灰度图像。可选地，可通过预设灰度转换公式进行灰度转换，所述预设灰度转换公式可以表示为：
I_gray = 0.2989×R + 0.5870×G + 0.1140×B
其中，I_gray为灰度转换后输出的灰度图像像素值，R、G、B为灰度转换前图像对应的RGB通道值。
进一步将转换得到的灰度图像进行归一化处理,示例性地,通过预设的归一化处理公式进行归一化处理。
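上述灰度转换与归一化步骤可用如下 Python 片段示意。灰度转换按说明书给出的公式实现；说明书未给出具体归一化公式，此处假设采用线性缩放到 [0, 1] 区间的最小-最大归一化：

```python
import numpy as np

def to_gray(rgb):
    # 按预设灰度转换公式 I_gray = 0.2989*R + 0.5870*G + 0.1140*B 转换为灰度图
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def min_max_normalize(img):
    # 示意性的归一化处理：将像素值线性缩放到 [0, 1] 区间（具体公式为假设）
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)
```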
S202,根据所述可见光人脸预处理图像序列以及第一分类损失函数对预设跨模态人脸识别模型进行训练,得到第一跨模态人脸识别模型。
由于可见光模态下的人脸图像较复杂，因此可先使用可见光人脸预处理图像序列对跨模态神经网络进行预训练。在本申请的一个实施例中，将可见光人脸预处理图像序列分为训练集、验证集两部分，其中训练集与验证集不重合。使用训练集对所述预设跨模态人脸识别模型进行训练，并通过验证集对所述预设跨模态人脸识别模型的训练效果进行验证，同时构建第一分类损失函数；通过不断保存使验证集损失最小的神经网络，将其确定为本实施例最终训练得到的第一跨模态神经网络，该第一跨模态神经网络即为所述第一跨模态人脸识别模型。
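上述“不断保存使验证集损失最小的神经网络”的选型过程可用如下 Python 片段示意。其中 train_step、eval_loss 均为为说明而假设的函数名，与具体训练框架无关：

```python
def train_with_best_checkpoint(train_step, eval_loss, epochs):
    # 每轮训练后在验证集上计算损失，持续保存使验证集损失最小的模型参数快照
    best_loss, best_state = float("inf"), None
    state = None
    for _ in range(epochs):
        state = train_step(state)   # 一轮训练，返回更新后的模型参数
        loss = eval_loss(state)     # 在验证集上计算损失
        if loss < best_loss:
            best_loss, best_state = loss, state
    return best_state, best_loss
```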
S203，将所述可见光人脸预处理图像序列和所述红外人脸预处理图像序列输入所述第一跨模态人脸识别模型，并基于所述第一分类损失函数对所述第一跨模态人脸识别模型进行再次训练，得到第二跨模态人脸识别模型。
需要说明的是，所述红外人脸预处理图像序列为对红外人脸图像序列进行预处理得到。示例性地，在获取所述第一训练样本集之前，还包括：获取所述第二预设数量的红外人脸图像序列；对所述红外人脸图像序列进行像素均衡处理，得到所述红外人脸预处理图像序列。
其中,对所述红外人脸图像序列进行像素均衡处理包括:对所述红外人脸图像序列进行图像对比度增强以及归一化处理。在本申请的一些实施例中,可以对红外人脸图像序列进行直方图均衡化来增强图像对比度。其中,直方图均衡化是一种通过拉伸像素强度分布范围来增强图像对比度的方法。在其它一些实施例中,还可以对红外人脸图像序列采用对数函数和幂函数进行转换来增强图像对比度。
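其中，直方图均衡化可按如下 Python 片段示意性地实现（针对 8 位灰度图，采用经典的累计分布函数映射；实际实现也可直接使用图像库的对应功能）：

```python
import numpy as np

def hist_equalize(img):
    # 对 8 位灰度图做直方图均衡化：拉伸像素强度分布范围以增强对比度
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()            # 最小非零累计值
    total = img.size
    # 经典均衡化映射表：将累计分布线性映射到 [0, 255]
    lut = np.clip(np.round((cdf - cdf_min) / max(total - cdf_min, 1) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]
```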
此外,所述第二跨模态人脸识别模型为所述预先训练完成的跨模态人脸识别模型。
在本申请的实施例中，对红外人脸图像序列进行归一化处理的过程与对可见光人脸图像进行归一化处理的过程相同，在此不再赘述。
在本申请的一些实施例中，所述跨模态人脸识别模型包括预设数量的卷积层和全连接层，其中，在全连接层前可有任意层数的卷积层。示例性地，所述跨模态人脸识别模型包括五个卷积层和一个全连接层；其中，第一卷积层和第二卷积层均包括两个用于特征提取的卷积层和一个用于降维的最大池化层，第三卷积层至第五卷积层均包括三个用于特征提取的卷积层和一个用于降维的最大池化层，每一层运算操作后的特征图都经过非线性激活函数；可见光人脸预处理图像序列与红外人脸预处理图像序列经过卷积层进行卷积操作提取特征值，之后经过全连接层输出人脸特征向量。
示例性地，所述第一卷积层可以包括两个卷积核尺寸为3×3、步长为1×1、卷积核数量为64的卷积层和一个卷积核为2×2、步长为2×2的最大池化层；第二卷积层包括两个卷积核尺寸为3×3、步长为1×1、卷积核数量为128的卷积层和一个卷积核为2×2、步长为2×2的最大池化层；第三卷积层包括三个卷积核尺寸为3×3、步长为1×1、卷积核数量为256的卷积层和一个卷积核为2×2、步长为2×2的最大池化层；第四卷积层包括三个卷积核尺寸为3×3、步长为1×1、卷积核数量为512的卷积层和一个卷积核为2×2、步长为2×2的最大池化层；第五卷积层包括三个卷积核尺寸为3×3、步长为1×1、卷积核数量为512的卷积层和一个卷积核为2×2、步长为2×2的最大池化层；两个全连接层各有4096个节点。应该了解的是，上述跨模态神经网络可采用任意结构，上述例子不具有限制作用。
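按上述示例性结构（块内 3×3 卷积、步长 1，块末 2×2、步长 2 的最大池化），特征图尺寸的变化可用如下 Python 片段推算。其中假设 3×3 卷积采用保持尺寸的填充方式（说明书未明确给出填充方式，此为示意性假设）：

```python
def feature_map_size(h, w):
    # 五个卷积块的卷积核数量依次为 64、128、256、512、512
    channels = [64, 128, 256, 512, 512]
    for c in channels:
        # 块内 3x3、步长 1 的卷积假设保持高宽不变；
        # 块末 2x2、步长 2 的最大池化使高宽各减半
        h, w = h // 2, w // 2
    return h, w, channels[-1]
```

例如 224×224 的输入经五次池化后得到 7×7×512 的特征图。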
其中,在跨模态人脸识别模型的训练过程中,卷积核和权重进行随机初始化,偏置项置为0。采用随机梯度下降(SGD)算法对上述跨模态神经网络进行网络参数的更新和梯度的优化,当网络迭代次数达到预设值时,训练停止并保存训练好的跨模态神经网络。
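上述“卷积核和权重随机初始化、偏置项置 0，并采用 SGD 更新网络参数”的做法可用如下 Python 片段示意（学习率、初始化尺度等超参数为示意性假设）：

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(shape):
    # 卷积核/权重随机初始化，偏置项置为 0（与实施例所述一致；尺度 0.01 为假设）
    w = rng.standard_normal(shape) * 0.01
    b = np.zeros(shape[0])
    return w, b

def sgd_step(w, grad_w, lr=0.01):
    # 随机梯度下降（SGD）单步参数更新：w <- w - lr * grad_w
    return w - lr * grad_w
```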
通过上述实施例可知,本申请提供的跨模态人脸识别方法,通过采用由可见光人脸预处理图像序列和红外光人脸预处理图像序列训练完成的跨模态人脸识别模型,对待识别的人脸图像进行人脸识别,能够提高对不同模态的摄像头下获取的人脸图像识别的准确率。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
对应于上文实施例所述的跨模态人脸识别方法,图3示出了本申请实施例提供的跨模态人脸识别装置的结构框图,为了便于说明,仅示出了与本申请实施例相关的部分。
如图3所示,图3是本申请实施例提供的跨模态人脸识别装置的示意图。该跨模态人脸识别装置300包括:
采集模块301,用于采集待识别的人脸图像;
识别模块302,用于将所述待识别的人脸图像输入预先训练完成的跨模态人脸识别模型进行人脸识别;
其中,所述预先训练完成的跨模态人脸识别模型的训练过程包括:获取第一训练样本集,所述第一训练样本集包括第一预设数量的可见光人脸预处理图像序列和第二预设数量的红外人脸预处理图像序列;
根据所述可见光人脸预处理图像序列以及第一分类损失函数对预设跨模态人脸识别模型进行训练,得到第一跨模态人脸识别模型;
将所述可见光预处理图像序列和所述红外人脸预处理图像序列输入所述第一跨模态人脸识别模型,并基于所述第一分类损失函数对所述第一跨模态人脸识别模型进行再次训练,得到第二跨模态人脸识别模型,所述第二跨模态人脸识别模型为所述预先训练完成的跨模态人脸识别模型。
在一可选的实现方式中,所述跨模态人脸识别模型包括预设数量的卷积层和全连接层;所述卷积层包括用于特征提取的卷积层和用于降维的最大池化层。
在一可选的实现方式中,还包括:
第一获取模块,用于获取所述第一预设数量的可见光人脸图像序列;
第一处理模块,用于对所述可见光人脸图像序列进行像素均衡处理,得到所述可见光人脸预处理图像序列。
在一可选的实现方式中,还包括:
第二获取模块，用于获取所述第二预设数量的红外人脸图像序列；
第二处理模块,用于对所述红外人脸图像序列进行像素均衡处理,得到所述红外人脸预处理图像序列。
在一可选的实现方式中,所述第一处理模块,包括:
转换单元，用于将所述可见光人脸图像序列转换为灰度图像；
第一处理单元,用于对所述灰度图像进行归一化处理,得到所述可见光人脸预处理图像序列。
在一可选的实现方式中,所述第二处理模块,包括:
增强模块,用于对所述红外人脸图像序列进行图像对比度增强,得到增强后的红外人脸图像序列;
第二处理单元,用于对增强后的红外人脸图像序列进行归一化处理,得到所述红外人脸预处理图像序列。
在一可选的实现方式中,所述增强模块,具体用于:
对所述红外人脸图像序列进行直方图均衡化处理,增强所述红外人脸图像序列的图像对比度,得到增强后的红外人脸图像序列。
需要说明的是,上述装置/单元之间的信息交互、执行过程等内容,由于与本申请方法实施例基于同一构思,其具体功能及带来的技术效果,具体可参见方法实施例部分,此处不再赘述。
所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将所述装置的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中，上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。另外，各功能单元、模块的具体名称也只是为了便于相互区分，并不用于限制本申请的保护范围。上述系统中单元、模块的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。
图4是本申请实施例提供的跨模态人脸识别设备的结构示意图。如图4所示,该实施例的跨模态人脸识别设备4包括:至少一个处理器40(图4中仅示出一个)、存储器41以及存储在所述存储器41中并可在所述至少一个处理器40上运行的计算机程序42,所述处理器40执行所述计算机程序42时实现上述图1所述方法实施例中的步骤。
所述跨模态人脸识别设备4可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。该跨模态人脸识别设备4可包括，但不仅限于，处理器40、存储器41。本领域技术人员可以理解，图4仅仅是跨模态人脸识别设备4的举例，并不构成对跨模态人脸识别设备4的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件，例如还可以包括输入输出设备、网络接入设备等。
所称处理器40可以是中央处理单元(Central Processing Unit,CPU),该处理器40还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
所述存储器41在一些实施例中可以是所述跨模态人脸识别设备4的内部存储单元，例如跨模态人脸识别设备4的硬盘或内存。所述存储器41在另一些实施例中也可以是所述跨模态人脸识别设备4的外部存储设备，例如所述跨模态人脸识别设备4上配备的插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(Secure Digital,SD)卡，闪存卡(Flash Card)等。进一步地，所述存储器41还可以既包括所述跨模态人脸识别设备4的内部存储单元也包括外部存储设备。所述存储器41用于存储操作系统、应用程序、引导装载程序(BootLoader)、数据以及其他程序等，例如所述计算机程序的程序代码等。所述存储器41还可以用于暂时地存储已经输出或者将要输出的数据。
本申请实施例还提供了一种网络设备,该网络设备包括:至少一个处理器、存储器以及存储在所述存储器中并可在所述至少一个处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现上述任意各个方法实施例中的步骤。
本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序被处理器执行时实现上述各个方法实施例中的步骤。
本申请实施例提供了一种计算机程序产品，当计算机程序产品在跨模态人脸识别设备上运行时，使得跨模态人脸识别设备执行上述各个方法实施例中的步骤。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一计算机可读存储介质中,该计算机程序在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机程序包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质至少可以包括:能够将计算机程序代码携带到拍照装置/电子设备的任何实体或装置、记录介质、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质。例如U盘、移动硬盘、磁碟或者光盘等。在某些司法管辖区,根据立法和专利实践,计算机可读介质不可以是电载波信号和电信信号。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在本申请所提供的实施例中，应该理解到，所揭露的装置/网络设备和方法，可以通过其它的方式实现。例如，以上所描述的装置/网络设备实施例仅仅是示意性的，例如，所述模块或单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口，装置或单元的间接耦合或通讯连接，可以是电性，机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (10)

  1. 一种跨模态人脸识别方法,其特征在于,包括:
    采集待识别的人脸图像;
    将所述待识别的人脸图像输入预先训练完成的跨模态人脸识别模型进行人脸识别;
    其中,所述预先训练完成的跨模态人脸识别模型的训练过程包括:获取第一训练样本集,所述第一训练样本集包括第一预设数量的可见光人脸预处理图像序列和第二预设数量的红外人脸预处理图像序列;
    根据所述可见光人脸预处理图像序列以及第一分类损失函数对预设跨模态人脸识别模型进行训练,得到第一跨模态人脸识别模型;
    将所述可见光预处理图像序列和所述红外人脸预处理图像序列输入所述第一跨模态人脸识别模型,并基于所述第一分类损失函数对所述第一跨模态人脸识别模型进行再次训练,得到第二跨模态人脸识别模型,所述第二跨模态人脸识别模型为所述预先训练完成的跨模态人脸识别模型。
  2. 如权利要求1所述的方法,其特征在于,所述跨模态人脸识别模型包括预设数量的卷积层和全连接层;所述卷积层包括用于特征提取的卷积层和用于降维的最大池化层。
  3. 如权利要求1所述的方法,其特征在于,在所述获取第一训练样本集之前,还包括:
    获取所述第一预设数量的可见光人脸图像序列;
    对所述可见光人脸图像序列进行像素均衡处理,得到所述可见光人脸预处理图像序列。
  4. 如权利要求1所述的方法,其特征在于,在所述获取第一训练样本集之前,还包括:
    获取所述第二预设数量的红外人脸图像序列;
    对所述红外人脸图像序列进行像素均衡处理,得到所述红外人脸预处理图像序列。
  5. 如权利要求3所述的方法,其特征在于,对所述可见光人脸图像序列进行像素均衡处理,得到所述可见光人脸预处理图像序列,包括:
将所述可见光人脸图像序列转换为灰度图像；
    对所述灰度图像进行归一化处理,得到所述可见光人脸预处理图像序列。
  6. 如权利要求4所述的方法,其特征在于,对所述红外人脸图像序列进行像素均衡处理,得到所述红外人脸预处理图像序列,包括:
    对所述红外人脸图像序列进行图像对比度增强,得到增强后的红外人脸图像序列;
    对增强后的红外人脸图像序列进行归一化处理,得到所述红外人脸预处理图像序列。
  7. 如权利要求6所述的跨模态人脸识别方法,其特征在于,对所述红外人脸图像序列进行图像对比度增强,得到增强后的红外人脸图像序列,包括:
    对所述红外人脸图像序列进行直方图均衡化处理,增强所述红外人脸图像序列的图像对比度,得到增强后的红外人脸图像序列。
  8. 一种跨模态人脸识别装置,其特征在于,包括:
    采集模块,用于采集待识别的人脸图像;
    识别模块,用于将所述待识别的人脸图像输入预先训练完成的跨模态人脸识别模型进行人脸识别;
    其中,所述预先训练完成的跨模态人脸识别模型的训练过程包括:获取第一训练样本集,所述第一训练样本集包括第一预设数量的可见光人脸预处理图像序列和第二预设数量的红外人脸预处理图像序列;
    根据所述可见光人脸预处理图像序列以及第一分类损失函数对预设跨模态人脸识别模型进行训练,得到第一跨模态人脸识别模型;
将所述可见光人脸预处理图像序列和所述红外人脸预处理图像序列输入所述第一跨模态人脸识别模型，并基于所述第一分类损失函数对所述第一跨模态人脸识别模型进行再次训练，得到第二跨模态人脸识别模型，所述第二跨模态人脸识别模型为所述预先训练完成的跨模态人脸识别模型。
  9. 一种跨模态人脸识别设备,其特征在于,包括:存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如权利要求1至7任一项所述的方法。
  10. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至7任一项所述的方法。
PCT/CN2021/107933 2020-12-14 2021-07-22 跨模态人脸识别方法、装置、设备及存储介质 WO2022127111A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011467115.3A CN112507897A (zh) 2020-12-14 2020-12-14 跨模态人脸识别方法、装置、设备及存储介质
CN202011467115.3 2020-12-14

Publications (1)

Publication Number Publication Date
WO2022127111A1 true WO2022127111A1 (zh) 2022-06-23

Family

ID=74973029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107933 WO2022127111A1 (zh) 2020-12-14 2021-07-22 跨模态人脸识别方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN112507897A (zh)
WO (1) WO2022127111A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565215A (zh) * 2022-07-01 2023-01-03 北京瑞莱智慧科技有限公司 一种人脸识别算法切换方法、装置及存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507897A (zh) * 2020-12-14 2021-03-16 奥比中光科技集团股份有限公司 跨模态人脸识别方法、装置、设备及存储介质
CN113743379B (zh) * 2021-11-03 2022-07-12 杭州魔点科技有限公司 一种多模态特征的轻量活体识别方法、系统、装置和介质
CN115147679B (zh) * 2022-06-30 2023-11-14 北京百度网讯科技有限公司 多模态图像识别方法和装置、模型训练方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608450A (zh) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 基于深度卷积神经网络的异质人脸识别方法
CN108520220A (zh) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 模型生成方法和装置
US20190258885A1 (en) * 2018-02-19 2019-08-22 Avigilon Corporation Method and system for object classification using visible and invisible light images
CN112149635A (zh) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 跨模态人脸识别模型训练方法、装置、设备以及存储介质
CN112507897A (zh) * 2020-12-14 2021-03-16 奥比中光科技集团股份有限公司 跨模态人脸识别方法、装置、设备及存储介质


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Dian, WANG Hai-tao, JIANG Ying, CHEN Xing: "Research on Face Recognition Algorithm Based on Near Infrared and Visible Image Fusion of Lightweight Neural Network", Journal of Chinese Computer Systems, vol. 41, no. 4, 30 April 2020 (2020-04-30), XP055943347 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565215A (zh) * 2022-07-01 2023-01-03 北京瑞莱智慧科技有限公司 一种人脸识别算法切换方法、装置及存储介质
CN115565215B (zh) * 2022-07-01 2023-09-15 北京瑞莱智慧科技有限公司 一种人脸识别算法切换方法、装置及存储介质

Also Published As

Publication number Publication date
CN112507897A (zh) 2021-03-16

Similar Documents

Publication Publication Date Title
WO2022127112A1 (zh) 跨模态人脸识别方法、装置、设备及存储介质
WO2022127111A1 (zh) 跨模态人脸识别方法、装置、设备及存储介质
CN109117803B (zh) 人脸图像的聚类方法、装置、服务器及存储介质
Faraji et al. Face recognition under varying illuminations using logarithmic fractal dimension-based complete eight local directional patterns
CN111461165A (zh) 图像识别方法、识别模型的训练方法及相关装置、设备
WO2020143330A1 (zh) 一种人脸图像的捕捉方法、计算机可读存储介质及终端设备
WO2020192112A1 (zh) 人脸识别方法及装置
US20120027263A1 (en) Hand gesture detection
US20120027252A1 (en) Hand gesture detection
WO2023010758A1 (zh) 一种动作检测方法、装置、终端设备和存储介质
KR101912748B1 (ko) 확장성을 고려한 특징 기술자 생성 및 특징 기술자를 이용한 정합 장치 및 방법
CN110717497B (zh) 图像相似度匹配方法、装置及计算机可读存储介质
CN106650568B (zh) 一种人脸识别方法及装置
Malgheet et al. Iris recognition development techniques: a comprehensive review
WO2020143165A1 (zh) 一种翻拍图像的识别方法、系统及终端设备
WO2024077781A1 (zh) 基于卷积神经网络模型的图像识别方法、装置及终端设备
CN112464803A (zh) 图像比较方法和装置
CN113158869A (zh) 图像识别方法、装置、终端设备及计算机可读存储介质
CN111325709A (zh) 无线胶囊内窥镜图像检测***及检测方法
CN111400528A (zh) 一种图像压缩方法、装置、服务器及存储介质
Rehman Light microscopic iris classification using ensemble multi‐class support vector machine
WO2021027155A1 (zh) 基于指静脉图像的验证方法、装置、存储介质及计算机设备
CN108960246B (zh) 一种用于图像识别的二值化处理装置及方法
CN111126250A (zh) 一种基于ptgan的行人重识别方法及装置
CN113158773B (zh) 一种活体检测模型的训练方法及训练装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21905044

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21905044

Country of ref document: EP

Kind code of ref document: A1