WO2022252635A1 - Face positioning method, apparatus and device based on multi-task fusion, and storage medium - Google Patents

Face positioning method, apparatus and device based on multi-task fusion, and storage medium

Info

Publication number
WO2022252635A1
WO2022252635A1 (PCT/CN2022/072186)
Authority
WO
WIPO (PCT)
Prior art keywords
face
detection model
face detection
key point
training sample
Prior art date
Application number
PCT/CN2022/072186
Other languages
French (fr)
Chinese (zh)
Inventor
胡魁
戴磊
刘玉宇
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022252635A1 publication Critical patent/WO2022252635A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/165: Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques

Definitions

  • The present application relates to the technical field of face recognition, and in particular to a multi-task fusion face positioning method, apparatus, device, and storage medium.
  • In the prior art, a face tracking model and a face quality judgment model must both recognize the same picture; in some cases, face quality judgment even requires multiple models (such as an illumination model, a blur model, a pose judgment model, and an occlusion judgment model). This makes the entire face recognition process computationally expensive, causing serious latency problems and degrading the user experience.
  • This application provides a multi-task fusion face positioning method, apparatus, device, and storage medium, which can eliminate the recognition error caused by unbalanced face poses in a face recognition model, improving face recognition accuracy while maintaining recognition efficiency.
  • In a first aspect, the present application provides a multi-task fusion face positioning method, the method comprising:
  • fusing at least two models associated with face recognition to obtain a first face detection model, the first face detection model including the public network structure of the associated models, several output branches, and a loss function corresponding to each output branch;
  • training the first face detection model on a training sample set based on a preset loss weight and a full key point loss function, to obtain a second face detection model;
  • detecting a face to be recognized based on the second face detection model, to obtain a face positioning result and a face quality detection result for the face to be recognized.
  • In a second aspect, the present application also provides a multi-task fusion face positioning apparatus, including:
  • a first obtaining module, configured to fuse at least two models associated with face recognition to obtain a first face detection model, the first face detection model including the public network structure of the associated models, several output branches, and a loss function corresponding to each output branch;
  • a second obtaining module, configured to train the first face detection model on the training sample set based on the preset loss weight and the full key point loss function, to obtain the second face detection model;
  • the third obtaining module is configured to detect the face to be recognized based on the second face detection model, and obtain a face positioning result and a face quality detection result of the face to be recognized.
  • In a third aspect, the present application also provides a multi-task fusion face positioning device, including:
  • the memory is used to store a computer program;
  • the processor is configured to execute the computer program and, when executing it, implement the steps of the multi-task fusion face positioning method described in the first aspect.
  • In a fourth aspect, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the multi-task fusion face positioning method described in the first aspect.
  • the present application discloses a multi-task fusion face positioning method, device, equipment and storage medium.
  • After the first face detection model is obtained, it is trained on the training sample set based on a preset loss weight and a full key point loss function to obtain the second face detection model; the face to be recognized is then detected with the second face detection model to obtain a face positioning result and a face quality detection result. This eliminates the recognition error caused by unbalanced face poses in the face recognition model and improves face recognition accuracy while maintaining recognition efficiency.
  • Fig. 1 is an implementation flowchart of the multi-task fusion face positioning method provided by an embodiment of the present application;
  • Fig. 2 is a specific implementation flowchart of S101 in Fig. 1;
  • Fig. 3 is a specific implementation flowchart of S102 in Fig. 1;
  • Fig. 4 is a schematic structural diagram of a multi-task fusion face positioning apparatus provided by an embodiment of the present application;
  • Fig. 5 is a schematic structural block diagram of a multi-task fusion face positioning device provided by an embodiment of the present application.
  • Embodiments of the present application provide a multi-task fusion face positioning method, apparatus, device, and storage medium.
  • In the multi-task fusion face positioning method provided by the embodiments of the present application, after the first face detection model is obtained by fusing at least two models associated with face recognition, the first face detection model is trained on a training sample set based on a preset loss weight and a full key point loss function to obtain a second face detection model; the face to be recognized is then detected with the second face detection model to obtain a face positioning result and a face quality detection result. This eliminates the recognition error caused by unbalanced face poses and improves face recognition accuracy while maintaining recognition efficiency.
  • FIG. 1 is a schematic flowchart of a multi-task fusion face location method provided by an embodiment of the present application.
  • the multi-task fusion face positioning method can be implemented by a server or a terminal, and the server can be a single server or a server cluster.
  • the terminal may be a handheld terminal, a notebook computer, a wearable device, a robot, or the like.
  • FIG. 1 is an implementation flowchart of the multi-task fusion face positioning method provided by an embodiment of the present application. The method specifically includes steps S101 to S103, detailed as follows:
  • S101: Fuse at least two models associated with face recognition to obtain a first face detection model, where the first face detection model includes the public network structure of the associated models, several output branches, and a loss function corresponding to each output branch.
  • the at least two models associated with face recognition may be a face positioning model, a face quality detection model, and/or a face gesture recognition model.
  • The face positioning model is used to locate the position of a face in an image; the face quality detection model is used to detect whether the face is occluded and where the occlusion is; and the face pose recognition model is used to detect whether there is a relatively large pose at preset key points, such as closed eyes or an open mouth.
  • The fused model can directly perform multiple face recognition tasks at once, such as face positioning and face quality detection (occlusion or presence of a large pose), which effectively improves multi-task recognition efficiency.
  • a first face detection model with a common basic network and multiple output branches can be obtained.
  • FIG. 2 is a specific implementation flowchart of S101 in FIG. 1 . It can be seen from FIG. 2 that in this embodiment, S101 includes S1011 to S1013. The details are as follows:
  • The basic networks of the at least two models associated with face recognition may consist of identical or different convolutional layers. In this embodiment, the public network structure of the first face detection model is constructed by sharing the acquired model parameters of each basic network: the convolutional layers of each model associated with face recognition are merged in a shared manner to obtain the union of all convolutional layers, which forms the public network structure of the first face detection model.
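The shared-parameter merge described above can be sketched in Python. This is a hypothetical illustration, not the patented implementation: each backbone is reduced to a dictionary mapping layer names to parameters, and layers that appear in more than one backbone are kept once so their parameters are shared.

```python
def merge_backbones(backbones):
    """Union of the convolutional layers of several backbones.

    Layers appearing in more than one backbone are kept once, so their
    parameters are shared rather than duplicated (the public network
    structure of the fused model).
    """
    merged = {}
    for layers in backbones:
        for name, params in layers.items():
            merged.setdefault(name, params)  # first occurrence wins; later models share it
    return merged

# hypothetical toy backbones for a localization model and a quality model
loc_backbone = {"conv1": "w1", "conv2": "w2"}
quality_backbone = {"conv1": "w1", "conv3": "w3"}

public_structure = merge_backbones([loc_backbone, quality_backbone])
print(sorted(public_structure))  # ['conv1', 'conv2', 'conv3']
```

The shared layer `conv1` is stored once, so both fused models read the same parameters, which is the point of the union-style merge.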
  • The loss function of each model associated with face recognition can be a classification function preset during that model's training, for example an absolute value loss, a logarithmic loss, a squared loss, an exponential loss, a hinge loss, or a cross-entropy loss. Understandably, the loss functions of the models associated with face recognition may be the same or different, as mainly determined by the use of each model, and will not be repeated here.
  • The first face detection model thus includes a basic network structure and multiple output branches, where the basic network structure is the union of the convolutional layers of the models associated with face recognition, and the output branches are respectively the loss functions of those models.
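A minimal sketch of this architecture, assuming NumPy and purely illustrative layer shapes (the patent does not specify network dimensions): the public backbone is computed once per image, and each output branch reads the same shared feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def public_backbone(x, w):
    # union of the fused models' convolutional layers, reduced here
    # to a single linear layer with a ReLU for illustration
    return np.maximum(x @ w, 0.0)

# hypothetical shapes: 64 input features, 16-dim shared feature
x = rng.normal(size=(1, 64))
w_shared = rng.normal(size=(64, 16))
w_box = rng.normal(size=(16, 4))      # face positioning branch (x, y, w, h)
w_quality = rng.normal(size=(16, 1))  # face quality branch
w_pose = rng.normal(size=(16, 3))     # face pose branch (yaw, pitch, roll)

feat = public_backbone(x, w_shared)  # computed once, shared by all branches
box, quality, pose = feat @ w_box, feat @ w_quality, feat @ w_pose
print(box.shape, quality.shape, pose.shape)  # (1, 4) (1, 1) (1, 3)
```

Because the backbone forward pass is shared, adding a task costs only one small branch head rather than a whole extra model, which is the efficiency gain the fusion targets.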
  • The preset loss weights are the weights of the loss functions corresponding to the models associated with face recognition; each preset loss weight balances the proportion of its loss function when the losses of the first face detection model are combined, adjusting the value of each loss function according to its order of magnitude.
  • The preset loss weights can be adjusted during the training of the first face detection model and according to the different recognition requirements of different task scenarios. For example, when the first face recognition model is used in a scenario requiring high face pose accuracy, the weight of the corresponding loss function can be increased to improve the model's fitting ability in that scenario.
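The weighting scheme can be illustrated with a small sketch. The weight values here are invented for illustration (the patent discloses no concrete weights): each branch loss is scaled so that no single branch dominates the combined objective.

```python
def combined_loss(branch_losses, loss_weights):
    """Weighted sum of per-branch losses; the preset weights balance
    branches whose raw losses differ in order of magnitude."""
    assert branch_losses.keys() == loss_weights.keys()
    return sum(loss_weights[k] * branch_losses[k] for k in branch_losses)

# hypothetical raw losses: the quality loss is two orders of magnitude smaller,
# so it gets a larger preset weight to keep its gradient signal relevant
branch_losses = {"positioning": 2.4, "quality": 0.03, "pose": 0.7}
loss_weights = {"positioning": 1.0, "quality": 10.0, "pose": 1.0}
print(combined_loss(branch_losses, loss_weights))  # approximately 3.4
```

Raising a weight (e.g. the pose weight for a pose-critical scenario) shifts the optimizer's attention to that branch without changing the architecture.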
  • The full key point loss function can be expressed by a formula (reproduced only as an image in the original publication) in which the value of L_lmk indicates whether there is a large pose at the preset face key points: L_lmk = 1 indicates no large pose, and L_lmk = 0 indicates a large pose at the preset face key points; z is the recognized key point of the target face; y, p, r are the preset face key points (for example, eyes, mouth, nose); x is the variation range of the corresponding recognized target key point; and θ is the pose-angle variation corresponding to the face key points.
  • The full key point loss function is mainly used during the training of the first face detection model to adjust the confidence detection results for each preset face key point output by the model, such as the face contour, eyes, mouth, and nose.
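To illustrate the idea of down-weighting unreliable key points, here is a hedged sketch of a confidence-weighted key point loss. The patent's full key point loss is given only as an image in the original publication, so this is an assumed stand-in, not the patented formula.

```python
import numpy as np

def weighted_keypoint_loss(pred, target, confidence):
    """Confidence-weighted squared error over the preset face key points.

    `confidence` plays the role of the patent's face key point confidence
    label: points that are occluded or at a large pose angle get a lower
    weight, so they contribute less to the loss (an assumption, not the
    patented definition).
    """
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    conf = np.asarray(confidence, float)
    per_point = np.sum((pred - target) ** 2, axis=1)  # squared error per key point
    return float(np.sum(conf * per_point) / np.sum(conf))

# two key points: the second is half-occluded (confidence 0.5)
pred = [[0.0, 0.0], [1.0, 1.0]]
target = [[0.0, 1.0], [1.0, 1.0]]
print(weighted_keypoint_loss(pred, target, [1.0, 0.5]))  # 1.0 / 1.5 ≈ 0.667
```

Lowering a point's confidence toward zero removes its influence on the gradient, which matches the stated goal of reducing errors caused by occluded or strongly posed key points.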
  • FIG. 3 is a specific implementation flowchart of S102 in FIG. 1 . It can be seen from FIG. 3 that in this embodiment, S102 includes S1021 to S1022. The details are as follows:
  • Updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set may include: inputting each training sample in the training sample set into the first face detection model and updating the parameters of each output branch based on the training samples' preset data labels; then, based on the parameter update results of the output branches, updating the public network structure of the first face detection model in reverse.
  • Balancing the parameter update of the first face detection model based on the preset loss weight, and determining the degree of convergence of the updated model based on the full key point loss function, to obtain the second face detection model, may include: balancing the proportion of each output branch of the first face detection model during the parameter update based on the preset loss weight, adjusting each branch's parameters according to its order of magnitude; and, based on the full key point loss function, updating the model's detection results for face key point confidence until the first face detection model converges and stabilizes, thereby obtaining the second face detection model.
  • The convergence stability of the first face detection model is determined by the value of a preset face positioning loss coefficient θ'. Specifically, θ' is given by a formula (reproduced only as an image in the original publication) in which mask_i represents the probability value of locating the face; the value of L_lmk indicates whether the face key points have a large pose; θ is the pose-angle variation corresponding to the face key points; and y, p, r are the preset face key points (for example, eyes, mouth, nose).
  • The full key point loss function includes a face key point confidence label that affects the face pose. Updating the model's face key point confidence detection results based on this loss function until the model converges and stabilizes, to obtain the second face detection model, may include: updating the detection of face key point confidence based on the confidence label that affects the face pose, until the first face detection model converges and stabilizes, thereby obtaining the second face detection model.
  • The face key point confidence label that affects the face pose is related to the degree of occlusion of the face key points and to the size of the face pose angle. Updating the model's face key point confidence detection results based on this label until convergence may include: determining, from the confidence label, the occlusion degree of the face key points and the size of the face pose angle; updating the first face detection model's loss coefficient for face positioning according to that occlusion degree and pose angle; and updating the model's face key point confidence detection results according to the updated loss coefficient until the first face detection model converges and stabilizes, thereby obtaining the second face detection model.
  • After training, face positioning and face quality detection can be performed on the face to be recognized based on the second face detection model.
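Downstream use can be sketched as follows (a pure illustration; `toy_detector` and the quality scores are invented): because the trained model returns positioning and quality from one forward pass, the best frame for recognition can be chosen without running separate quality models per frame.

```python
def select_best_frame(frames, detector):
    """Run the fused detector once per frame and keep the frame whose
    face quality score is highest for downstream recognition."""
    return max(frames, key=lambda frame: detector(frame)["quality"])

def toy_detector(frame):
    # stand-in for the second face detection model: one call yields both
    # a positioning result and a quality result
    return {"box": (0, 0, 10, 10), "quality": frame["sharpness"]}

frames = [{"id": 1, "sharpness": 0.2},
          {"id": 2, "sharpness": 0.9},
          {"id": 3, "sharpness": 0.5}]
print(select_best_frame(frames, toy_detector)["id"])  # 2
```

This is the frame-selection step from the background section, done with a single model instead of a tracking model plus several quality models.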
  • In the multi-task fusion face positioning method provided in this embodiment, after the first face detection model is obtained by fusing at least two models associated with face recognition, the first face detection model is trained on the training sample set based on the preset loss weight and the full key point loss function to obtain the second face detection model; the face to be recognized is then detected with the second face detection model to obtain its face positioning result and face quality detection result. This eliminates the recognition error caused by unbalanced face poses and improves face recognition accuracy while maintaining recognition efficiency.
  • FIG. 4 is a schematic structural diagram of a multi-task fusion face positioning device provided by an embodiment of the present application.
  • The face positioning device is used to execute the steps of the multi-task fusion face positioning method shown in the embodiment of FIG. 1.
  • The multi-task fusion face positioning device can be a single server or a server cluster, or it can be a terminal such as a handheld terminal, a notebook computer, a wearable device, or a robot.
  • the multi-task fusion face positioning device 400 includes:
  • The first obtaining module 401 is configured to fuse at least two models associated with face recognition to obtain a first face detection model, the first face detection model including the public network structure of the associated models, several output branches, and a loss function corresponding to each output branch;
  • the second obtaining module 402 is configured to train the first face detection model on the training sample set based on the preset loss weight and the full key point loss function, to obtain the second face detection model;
  • the third obtaining module 403 is configured to detect the face to be recognized based on the second face detection model, and obtain a face positioning result and a face quality detection result of the face to be recognized.
  • the first obtaining module 401 includes:
  • a construction unit, configured to obtain the basic networks of at least two models associated with face recognition, share the acquired model parameters of each basic network, and construct the public network structure of the first face detection model;
  • an acquisition unit, configured to respectively acquire the loss functions of the at least two models associated with face recognition, and use each acquired loss function as an output branch of the first face detection model;
  • the first obtaining unit is configured to obtain the first face detection model based on the public network structure and each of the output branches.
  • the second obtaining module 402 includes:
  • An update unit configured to update the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set;
  • a determining unit, configured to balance the parameter update of the first face detection model based on the preset loss weight, and determine the degree of convergence of the updated model based on the full key point loss function, to obtain the second face detection model.
  • the update unit includes:
  • the first update subunit is configured to input each training sample in the training sample set into the first face detection model and, based on the preset data label of each training sample, update the parameters of each output branch of the first face detection model;
  • the second update subunit is configured to update the public network structure of the first face detection model in reverse, based on the parameter update results of each output branch of the first face detection model.
  • the determination unit includes:
  • the adjustment subunit is configured to balance the proportion of each output branch of the first face detection model during the parameter update based on the preset loss weight, adjusting each branch's parameters according to its order of magnitude;
  • the update subunit is configured to update the first face detection model's detection results for face key point confidence based on the full key point loss function, until the first face detection model converges and stabilizes, to obtain the second face detection model.
  • The full key point loss function includes a face key point confidence label that affects the face pose; the update subunit is specifically configured to use this label when updating the confidence detection results.
  • The face key point confidence label that affects the face pose is related to the degree of occlusion of the face key points and the size of the face pose angle; accordingly, the update subunit is specifically configured to update the first face detection model's loss coefficient for face positioning and, based on the updated coefficient, obtain the second face detection model.
  • The above multi-task fusion face positioning method can be implemented in the form of a computer program, and the computer program can be run on the device shown in FIG. 4.
  • FIG. 5 is a schematic structural block diagram of a multi-task fusion face locating device provided by an embodiment of the present application.
  • The multi-task fusion face positioning device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
  • Non-volatile storage media can store operating systems and computer programs.
  • the computer program includes program instructions.
  • When executed by the processor, the program instructions cause the processor to perform any multi-task fusion face positioning method.
  • the processor is used to provide computing and control capabilities and support the operation of the entire computer equipment.
  • the internal memory provides an environment for the operation of the computer program in the non-volatile storage medium.
  • the processor can execute any multi-task fusion face positioning method.
  • The network interface is used for network communication, such as sending assigned tasks.
  • Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of a partial structure related to the solution of this application, and does not constitute a limitation on the terminal to which the solution of this application is applied.
  • The specific multi-task fusion face positioning device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • the processor may be a central processing unit (Central Processing Unit, CPU), and the processor may also be other general processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the processor is used to run a computer program stored in the memory to implement the following steps:
  • fusing at least two models associated with face recognition to obtain a first face detection model, the first face detection model including the public network structure of the associated models, several output branches, and a loss function corresponding to each output branch;
  • training the first face detection model on a training sample set based on a preset loss weight and a full key point loss function, to obtain a second face detection model;
  • the face to be recognized is detected based on the second face detection model, and a face positioning result and a face quality detection result of the face to be recognized are obtained.
  • The fusion of at least two models associated with face recognition to obtain the first face detection model includes:
  • the first face detection model is thereby obtained.
  • Training the first face detection model on the training sample set based on the preset loss weight and the full key point loss function, to obtain the second face detection model, includes:
  • updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set, which includes:
  • balancing the parameter update of the first face detection model based on the preset loss weight, and determining the degree of convergence of the updated model based on the full key point loss function, to obtain the second face detection model, including:
  • the full key point loss function includes a face key point confidence label that affects the face pose;
  • the face key point confidence label that affects the face pose is related to the degree of occlusion of the face key points and the size of the face pose angle;
  • updating the detection results of the first face detection model for face key point confidence until the first face detection model converges and stabilizes, to obtain the second face detection model, includes:
  • updating the first face detection model's loss coefficient for face positioning;
  • obtaining the second face detection model.
  • Embodiments of the present application also provide a computer-readable storage medium storing a computer program; the computer program includes program instructions, and when the program instructions are executed by a processor, the processor implements the multi-task fusion face positioning method provided by the embodiments of the present application.
  • the computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or a memory of the computer device.
  • The computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device.


Abstract

The present application relates to the technical field of face recognition, and discloses a face positioning method, apparatus and device based on multi-task fusion, and a storage medium. The method comprises: after a first face detection model is obtained by fusing at least two models associated with face recognition, training the first face detection model according to a training sample set on the basis of a preset loss weight and a total key point loss function to obtain a second face detection model; and detecting, on the basis of the second face detection model, a face to be recognized to obtain a face positioning result and a face quality detection result for said face. The recognition error of a face recognition model caused by an unbalanced face posture can be solved, and the recognition efficiency can be ensured while the face recognition precision is improved.

Description

多任务融合的人脸定位方法、装置、设备及存储介质Face positioning method, device, equipment and storage medium for multi-task fusion
本申请要求本申请要求于2021年6月2日提交中国专利局、申请号为202110609385.1,发明名称为“多任务融合的人脸定位方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application submitted to the China Patent Office on June 2, 2021, with the application number 202110609385.1, and the title of the invention is "Multi-task fusion face positioning method, device, equipment and storage medium" , the entire contents of which are incorporated in this application by reference.
技术领域technical field
本申请涉及人脸识别技术领域,尤其涉及一种多任务融合的人脸定位方法、装置、设备及存储介质。The present application relates to the technical field of face recognition, and in particular to a multi-task fusion face positioning method, device, equipment and storage medium.
背景技术Background technique
发明人意识到,基于大数据的人脸识别技术,其识别性能主要依赖于采集的人脸数据质量,而由于人脸数据质量受众多因素的影响。因此,在进行人脸识别时,需要同时对跟踪的人脸进行质量判断,并选取质量较好的图像帧进行人脸识别。现有技术中就需要人脸跟踪模型与人脸质量判断模型对同一张图片进行识别,在有些情况下,进行人脸质量判断时甚至需要多个模型(例如光照模型,模糊模型,姿态判断模型,遮挡判断模型等),这就导致整个人脸识别过程的算力较低,产生严重延时问题,影响用户的体验效果。The inventor realized that the recognition performance of face recognition technology based on big data mainly depends on the quality of collected face data, and the quality of face data is affected by many factors. Therefore, when performing face recognition, it is necessary to judge the quality of the tracked face at the same time, and select an image frame with better quality for face recognition. In the prior art, the face tracking model and the face quality judgment model are required to recognize the same picture. In some cases, even multiple models (such as illumination model, blur model, attitude judgment model, etc.) are required for face quality judgment. , occlusion judgment model, etc.), which leads to low computing power in the entire face recognition process, causing serious delay problems and affecting user experience.
发明内容Contents of the invention
本申请提供了一种多任务融合的人脸定位方法、装置、设备及存储介质,能够解决人脸识别模型由于人脸姿态不均衡而导致的识别误差,提升人脸识别精度的同时能够保证识别效率。This application provides a multi-task fusion face positioning method, device, equipment and storage medium, which can solve the recognition error caused by the unbalanced face posture of the face recognition model, improve the accuracy of face recognition and ensure recognition efficiency.
In a first aspect, the present application provides a multi-task fusion face localization method, the method comprising:
fusing at least two models associated with face recognition to obtain a first face detection model, the first face detection model comprising a common network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
training the first face detection model on a training sample set based on preset loss weights and a full keypoint loss function to obtain a second face detection model; and
detecting a face to be recognized based on the second face detection model to obtain a face localization result and a face quality detection result for the face to be recognized.
In a second aspect, the present application further provides a multi-task fusion face localization apparatus, comprising:
a first obtaining module, configured to fuse at least two models associated with face recognition to obtain a first face detection model, the first face detection model comprising a common network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
a second obtaining module, configured to train the first face detection model on a training sample set based on preset loss weights and a full keypoint loss function to obtain a second face detection model; and
a third obtaining module, configured to detect a face to be recognized based on the second face detection model to obtain a face localization result and a face quality detection result for the face to be recognized.
In a third aspect, the present application further provides a multi-task fusion face localization device, comprising:
a memory and a processor;
the memory being configured to store a computer program; and
the processor being configured to execute the computer program and, when executing the computer program, implement the steps of the multi-task fusion face localization method according to the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the multi-task fusion face localization method according to the first aspect.
The present application discloses a multi-task fusion face localization method, apparatus, device, and storage medium. At least two models associated with face recognition are fused to obtain a first face detection model; the first face detection model is then trained on a training sample set based on preset loss weights and a full keypoint loss function to obtain a second face detection model; and a face to be recognized is detected based on the second face detection model to obtain a face localization result and a face quality detection result for that face. This eliminates recognition errors caused by unbalanced face poses, improving face recognition accuracy while maintaining recognition efficiency.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an implementation of the multi-task fusion face localization method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a specific implementation of S101 in FIG. 1;
FIG. 3 is a flowchart of a specific implementation of S102 in FIG. 1;
FIG. 4 is a schematic structural diagram of the multi-task fusion face localization apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural block diagram of the multi-task fusion face localization device provided by an embodiment of the present application.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The flowcharts shown in the drawings are merely illustrative; they need not include all of the content and operations/steps, nor must the steps be performed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual order of execution may vary according to the actual situation.
It should be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and encompasses any and all possible combinations of one or more of the associated listed items.
The embodiments of the present application provide a multi-task fusion face localization method, apparatus, device, and storage medium. In the method, at least two models associated with face recognition are fused to obtain a first face detection model; the first face detection model is trained on a training sample set based on preset loss weights and a full keypoint loss function to obtain a second face detection model; and a face to be recognized is detected based on the second face detection model to obtain a face localization result and a face quality detection result for that face. This eliminates recognition errors caused by unbalanced face poses, improving face recognition accuracy while maintaining recognition efficiency.
Some implementations of the present application are described in detail below with reference to the accompanying drawings. Where no conflict arises, the following embodiments and the features within them may be combined with one another.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the multi-task fusion face localization method provided by an embodiment of the present application. The method may be implemented by a server or a terminal; the server may be a single server or a server cluster, and the terminal may be a handheld terminal, a notebook computer, a wearable device, a robot, or the like.
As shown in FIG. 1, the method comprises steps S101 to S103, detailed as follows:
S101: fuse at least two models associated with face recognition to obtain a first face detection model, the first face detection model comprising a common network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches.
The at least two models associated with face recognition may be a face localization model, a face quality detection model, and/or a face pose recognition model, among others. Illustratively, the face localization model locates the position of a face in an image; the face quality detection model detects whether the face is occluded and, if so, where; and the face pose recognition model determines whether each preset face keypoint exhibits a large-amplitude pose, such as closed eyes or an open mouth. In this embodiment, by fusing at least two face-recognition-related models with different functions, the fused model can directly perform multi-task face recognition, for example carrying out face localization and face quality detection (occlusion or large-amplitude pose) simultaneously, which effectively improves the efficiency of multi-task recognition.
Specifically, fusing at least two models associated with face recognition yields a first face detection model with a common base network and multiple output branches.
Illustratively, as shown in FIG. 2, FIG. 2 is a flowchart of a specific implementation of S101 in FIG. 1. As can be seen from FIG. 2, in this embodiment S101 comprises S1011 to S1013, detailed as follows:
S1011: obtain the base networks of the at least two models associated with face recognition, share the obtained model parameters of each base network, and construct the common network structure of the first face detection model.
The base networks of the at least two models may each be composed of different or identical convolutional layers. In this embodiment, sharing the obtained model parameters of each base network to construct the common network structure of the first face detection model means merging the convolutional layers of the obtained face-recognition-related models through parameter sharing, producing the union of all convolutional layers as the common network structure of the first face detection model.
S1012: obtain the loss functions of the at least two models associated with face recognition, and use each obtained loss function as an output branch of the first face detection model.
The loss function of each model may be the classification loss preset during that model's training, for example an absolute-value loss, a log loss, a squared loss, an exponential loss, a hinge loss, or a cross-entropy loss. Understandably, the loss functions of the individual models may be the same or different; they are determined chiefly by each model's purpose, and are not described further here.
S1013: obtain the first face detection model based on the common network structure and each of the output branches.
In this embodiment, the first face detection model comprises a base network structure and multiple output branches, where the base network structure is the union of the convolutional layers of the face-recognition-related models, and the multiple output branches are the loss functions of those models.
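The fusion described in S1011 to S1013 can be pictured as one shared backbone feeding several task heads. The sketch below is a hypothetical minimal illustration in plain Python (the class, the toy "layers", and the example heads are assumptions for illustration, not the patent's actual implementation); it shows a common network structure whose computation is shared by every branch, with one output branch and one loss function per original model:

```python
class FusedFaceDetector:
    """Toy fused model: one shared backbone, one head per source model."""

    def __init__(self, backbone_layers, heads):
        # backbone_layers: list of callables applied in sequence (shared by all tasks)
        # heads: dict mapping task name -> (head_fn, loss_fn)
        self.backbone_layers = backbone_layers
        self.heads = heads

    def features(self, x):
        # Every task reuses the same shared computation (parameter sharing).
        for layer in self.backbone_layers:
            x = layer(x)
        return x

    def forward(self, x):
        feats = self.features(x)
        # One output branch per fused source model.
        return {name: head(feats) for name, (head, _) in self.heads.items()}

    def losses(self, x, targets):
        # Each branch keeps the loss function of the model it came from.
        outputs = self.forward(x)
        return {name: self.heads[name][1](outputs[name], targets[name])
                for name in self.heads}


# Hypothetical example: a "localization" branch and a "quality" branch
# sharing two stand-in backbone layers.
backbone = [lambda v: [2 * e for e in v],      # stand-in for one conv layer
            lambda v: [e + 1 for e in v]]      # stand-in for another conv layer
heads = {
    "localization": (lambda f: sum(f),                  # branch head
                     lambda pred, t: abs(pred - t)),    # absolute-value loss
    "quality":      (lambda f: max(f),
                     lambda pred, t: (pred - t) ** 2),  # squared loss
}
model = FusedFaceDetector(backbone, heads)
out = model.forward([1.0, 2.0])
```

Because both branches call the same `features`, the shared layers are evaluated once per image, which is the source of the efficiency gain over running two separate models.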
S102: train the first face detection model on the training sample set based on the preset loss weights and the full keypoint loss function, to obtain a second face detection model.
In one embodiment, the preset loss weights are the weights of the loss functions corresponding to the individual face-recognition-related models. Each preset loss weight is used to balance the share of its corresponding loss function in the fitting of the first face detection model, adjusting the value of each loss function according to its order of magnitude.
Further, the preset loss weights may be adjusted as training of the first face detection model progresses and according to the differing recognition requirements of different task scenarios. For example, when the first face detection model is used in a scenario with high demands on face pose accuracy, the weight of the corresponding loss function may be increased to improve the model's fitting ability in that scenario.
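The magnitude-balancing role of the loss weights can be sketched as follows. The patent does not give an explicit combination formula, so both functions below are assumptions: the weights are chosen so that each branch's loss contributes on the same order of magnitude before summation.

```python
def combine_losses(losses, weights):
    """Weighted sum of per-branch losses (assumed combination form)."""
    return sum(weights[name] * value for name, value in losses.items())


def magnitude_balancing_weights(losses):
    """Scale every branch loss up to the level of the largest one, so no
    branch dominates the fit merely because of its numeric scale.
    Assumes all losses are strictly positive."""
    largest = max(losses.values())
    return {name: largest / value for name, value in losses.items()}


# Hypothetical branch losses differing by orders of magnitude.
branch_losses = {"localization": 100.0, "quality": 1.0, "pose": 0.01}
w = magnitude_balancing_weights(branch_losses)
total = combine_losses(branch_losses, w)
```

With these weights each branch contributes 100.0, so the total is 300.0; in a pose-accuracy-critical scenario the "pose" weight could then be raised further, as the paragraph above describes.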
The full keypoint loss function can be expressed as:
[Equation for the full keypoint loss L_lmk, rendered as image PCTCN2022072186-appb-000001 in the original publication.]
Here, the value of L_lmk indicates whether a face keypoint exhibits a large-amplitude pose: a value of 1 means there is no large-amplitude pose, while a value of 0 means the preset face keypoint exhibits one. z is the recognized target face keypoint; y, p, and r are the preset face keypoints (for example, eyes, mouth, nose); x is the magnitude of change of the corresponding recognized target keypoint; and θ is the change in the pose angle corresponding to the face keypoint.
In one embodiment, the full keypoint loss function is mainly used, during training of the first face detection model, to adjust the model's output confidence for each preset face keypoint, such as the face contour, eyes, mouth, and nose.
Illustratively, as shown in FIG. 3, FIG. 3 is a flowchart of a specific implementation of S102 in FIG. 1. As can be seen from FIG. 3, in this embodiment S102 comprises S1021 to S1022, detailed as follows:
S1021: update the parameters of the first face detection model according to the preset data labels of the training samples in the training sample set.
In one embodiment, updating the parameters of the first face detection model according to the preset data labels of the training samples may comprise: inputting each training sample in the training sample set into the first face detection model and updating the parameters of each output branch of the first face detection model based on the preset data labels of the training samples; and back-propagating the parameter updates of the output branches to update the common network structure of the first face detection model.
S1022: balance the parameter updates of the first face detection model based on the preset loss weights, and determine the convergence of the parameter-updated first face detection model based on the full keypoint loss function, to obtain the second face detection model.
In one embodiment, this step may comprise: balancing, based on the preset loss weights, the share of each output branch of the first face detection model in the parameter update process, and adjusting the corresponding parameters according to the order of magnitude of each output branch; and updating, based on the full keypoint loss function, the first face detection model's detection results for face keypoint confidence until the first face detection model converges stably, thereby obtaining the second face detection model.
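The training loop of S1021 and S1022, weighted per-branch updates of a shared parameter followed by a convergence check, can be sketched on a toy scalar problem. Everything here (the gradient-descent rule, the quadratic branch losses, the tolerance) is an assumed illustration, not the patent's training procedure:

```python
def train_until_stable(theta, branch_grads, weights, lr=0.1, tol=1e-6, max_steps=1000):
    """Toy version of S1021-S1022: each output branch proposes a gradient,
    branch contributions are balanced by the preset loss weights, and
    training stops once the shared parameter stops changing (stability)."""
    for step in range(max_steps):
        # Weighted sum of per-branch gradients updates the shared parameter,
        # mirroring the back-propagated update of the common network structure.
        grad = sum(weights[name] * g(theta) for name, g in branch_grads.items())
        new_theta = theta - lr * grad
        if abs(new_theta - theta) < tol:  # convergence / stability check
            return new_theta, step
        theta = new_theta
    return theta, max_steps


# Two hypothetical branches with quadratic losses (theta-2)^2 and (theta-4)^2,
# whose gradients are 2*(theta-2) and 2*(theta-4).
grads = {"localization": lambda t: 2 * (t - 2.0),
         "quality":      lambda t: 2 * (t - 4.0)}
theta, steps = train_until_stable(0.0, grads, {"localization": 0.5, "quality": 0.5})
```

With equal weights the shared parameter settles at the compromise value 3.0; raising one branch's weight would pull the optimum toward that branch's target, which is the balancing behavior the loss weights provide.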
The convergence stability of the first face detection model is determined by the value of a preset face localization loss coefficient α′. Specifically, α′ can be expressed as:
[Equation for the face localization loss coefficient α′, rendered as image PCTCN2022072186-appb-000002 in the original publication.]
Here, α denotes the probability value of face localization; the value of L_lmk indicates whether a face keypoint exhibits a large-amplitude pose; θ is the change in the pose angle corresponding to the face keypoint; y, p, and r are the preset face keypoints (for example, eyes, mouth, nose); and mask_i denotes the probability value of locating the face.
The full keypoint loss function includes confidence labels for the face keypoints that influence face pose. Updating, based on the full keypoint loss function, the first face detection model's detection results for face keypoint confidence until the model converges stably, thereby obtaining the second face detection model, may comprise: updating those detection results based on the confidence labels of the pose-influencing face keypoints until the first face detection model converges stably, thereby obtaining the second face detection model.
Illustratively, the confidence labels of the pose-influencing face keypoints are related to the degree of occlusion of the keypoints and to the size of the face pose angle. In one embodiment, the updating may comprise: determining, based on the confidence labels of the pose-influencing face keypoints, the degree of occlusion of the keypoints and the size of the face pose angle; updating the first face detection model's face localization loss coefficient according to that degree of occlusion and pose angle; and updating, according to the updated loss coefficient, the model's detection results for face keypoint confidence until the first face detection model converges stably, thereby obtaining the second face detection model.
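The dependence of keypoint confidence on occlusion degree and pose angle can be sketched as below. The patent states only that these two factors determine the confidence label and, through it, the localization loss coefficient; the linear scoring rule and the averaging are assumptions made purely for illustration:

```python
def keypoint_confidence(occlusion, pose_angle_deg, max_angle=90.0):
    """Assumed scoring rule: confidence falls linearly with the occlusion
    degree (0 = fully visible, 1 = fully occluded) and with the normalized
    pose-angle change."""
    occ_term = 1.0 - min(max(occlusion, 0.0), 1.0)
    ang_term = 1.0 - min(abs(pose_angle_deg), max_angle) / max_angle
    return occ_term * ang_term


def localization_loss_coefficient(confidences):
    """Toy stand-in for updating the coefficient from keypoint confidences:
    average confidence, so heavily occluded or strongly rotated faces
    contribute less to the localization loss."""
    return sum(confidences) / len(confidences)


# Hypothetical keypoints: (occlusion degree, pose-angle change in degrees).
keypoints = [(0.0, 0.0), (0.5, 0.0), (0.0, 45.0)]
confs = [keypoint_confidence(o, a) for o, a in keypoints]
coeff = localization_loss_coefficient(confs)
```

An unoccluded frontal keypoint scores 1.0, while half occlusion or a 45° pose change each halve the score, so the resulting coefficient down-weights low-quality samples during the confidence update.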
S103: detect the face to be recognized based on the second face detection model, to obtain a face localization result and a face quality detection result for the face to be recognized.
Based on the second face detection model, both face localization and face quality detection can be performed on the face to be recognized. Correspondingly, face quality detection covers whether the face is occluded and, if so, which face keypoints correspond to the occluded region, as well as whether the face keypoints exhibit pose changes and, if so, the size of the corresponding pose angle.
As the above analysis shows, the multi-task fusion face localization method provided in this embodiment fuses at least two models associated with face recognition to obtain a first face detection model, trains the first face detection model on a training sample set based on preset loss weights and a full keypoint loss function to obtain a second face detection model, and detects the face to be recognized based on the second face detection model to obtain a face localization result and a face quality detection result. This eliminates recognition errors caused by unbalanced face poses, improving face recognition accuracy while maintaining recognition efficiency.
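The single-pass inference of S103 can be sketched as one forward computation whose branches simultaneously yield the localization result and the quality results. The branch names and the toy feature tuple below are hypothetical; the point is only that one evaluation serves every task, rather than running separate localization and quality models on the same image:

```python
def detect(image_feats, branches):
    """Single-pass multi-task inference (assumed structure): one pass over
    shared features produces every task's result at once."""
    return {name: head(image_feats) for name, head in branches.items()}


# Hypothetical branch heads operating on precomputed shared features
# (x, y, occlusion score, pose-angle change).
branches = {
    "localization": lambda f: {"box": (f[0], f[1], f[0] + 10, f[1] + 10)},
    "occlusion":    lambda f: f[2] > 0.5,  # True if the occlusion score is high
    "pose_angle":   lambda f: f[3],        # pose-angle change for the keypoints
}
result = detect((5, 7, 0.2, 30.0), branches)
```

A caller can then select only high-quality frames (unoccluded, small pose angle) for downstream recognition, which is the frame-selection step described in the background.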
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of the multi-task fusion face localization apparatus provided by an embodiment of the present application. The apparatus is configured to perform the steps of the multi-task fusion face localization method shown in the embodiment of FIG. 1. The apparatus may be a single server or a server cluster, or it may be a terminal such as a handheld terminal, a notebook computer, a wearable device, or a robot.
As shown in FIG. 4, the multi-task fusion face localization apparatus 400 comprises:
a first obtaining module 401, configured to fuse at least two models associated with face recognition to obtain a first face detection model, the first face detection model comprising a common network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
a second obtaining module 402, configured to train the first face detection model on a training sample set based on preset loss weights and a full keypoint loss function to obtain a second face detection model; and
a third obtaining module 403, configured to detect a face to be recognized based on the second face detection model to obtain a face localization result and a face quality detection result for the face to be recognized.
In one embodiment, the first obtaining module 401 comprises:
a construction unit, configured to obtain the base networks of the at least two models associated with face recognition, share the obtained model parameters of each base network, and construct the common network structure of the first face detection model;
an obtaining unit, configured to obtain the loss functions of the at least two models associated with face recognition and use each obtained loss function as an output branch of the first face detection model; and
a first obtaining unit, configured to obtain the first face detection model based on the common network structure and each of the output branches.
In one embodiment, the second obtaining module 402 comprises:
an updating unit, configured to update the parameters of the first face detection model according to the preset data labels of the training samples in the training sample set; and
a determining unit, configured to balance the parameter updates of the first face detection model based on the preset loss weights, and determine the convergence of the parameter-updated first face detection model based on the full keypoint loss function, to obtain the second face detection model.
In one embodiment, the updating unit comprises:
a first updating subunit, configured to input each training sample in the training sample set into the first face detection model and update the parameters of each output branch of the first face detection model based on the preset data labels of the training samples; and
a second updating subunit, configured to back-propagate the parameter update results of the output branches to update the common network structure of the first face detection model.
In one embodiment, the determining unit comprises:
an adjusting subunit, configured to balance, based on the preset loss weights, the share of each output branch of the first face detection model in the parameter update process, and adjust the corresponding parameters according to the order of magnitude of each output branch; and
an updating subunit, configured to update, based on the full keypoint loss function, the first face detection model's detection results for face keypoint confidence until the first face detection model converges stably, to obtain the second face detection model.
In one embodiment, the full keypoint loss function includes confidence labels for the face keypoints that influence face pose, and the updating subunit is specifically configured to:
update, based on the confidence labels of the pose-influencing face keypoints, the first face detection model's detection results for face keypoint confidence until the first face detection model converges stably, to obtain the second face detection model.
In one embodiment, the confidence labels of the pose-influencing face keypoints are related to the degree of occlusion of the keypoints and to the size of the face pose angle, and the updating subunit is specifically configured to:
determine, based on the confidence labels of the pose-influencing face keypoints, the degree of occlusion of the keypoints and the size of the face pose angle;
update the first face detection model's face localization loss coefficient according to the degree of occlusion of the keypoints and the size of the face pose angle; and
update, according to the updated face localization loss coefficient, the first face detection model's detection results for face keypoint confidence until the first face detection model converges stably, to obtain the second face detection model.
It should be noted that, as will be clearly understood by those skilled in the art, for convenience and brevity of description, the specific working processes of the face localization apparatus and modules described above may refer to the corresponding processes in the multi-task fusion face localization method embodiment described with reference to FIG. 1, and are not repeated here.
The above multi-task fusion face localization method may be implemented in the form of a computer program, and the computer program may run on the apparatus shown in FIG. 4.
请参阅图5,图5是本申请实施例提供的多任务融合的人脸定位设备的结构示意性框图。该多任务融合的人脸定位设备包括通过***总线连接的处理器、存储器和网络接口,其中,存储器可以包括非易失性存储介质和内存储器。Please refer to FIG. 5 . FIG. 5 is a schematic structural block diagram of a multi-task fusion face locating device provided by an embodiment of the present application. The multi-tasking fusion face positioning device includes a processor connected through a system bus, a memory and a network interface, wherein the memory may include a non-volatile storage medium and an internal memory.
非易失性存储介质可存储操作***和计算机程序。该计算机程序包括程序指令，该程序指令被执行时，可使得处理器执行任意一种多任务融合的人脸定位方法。The non-volatile storage medium can store an operating system and a computer program. The computer program includes program instructions which, when executed, cause the processor to perform any of the multi-task fusion face positioning methods.
处理器用于提供计算和控制能力,支撑整个计算机设备的运行。The processor is used to provide computing and control capabilities and support the operation of the entire computer equipment.
内存储器为非易失性存储介质中的计算机程序的运行提供环境，该计算机程序被处理器执行时，可使得处理器执行任意一种多任务融合的人脸定位方法。The internal memory provides an environment for running the computer program in the non-volatile storage medium. The computer program, when executed by the processor, causes the processor to perform any of the multi-task fusion face positioning methods.
该网络接口用于进行网络通信，如发送分配的任务等。本领域技术人员可以理解，图5中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的终端的限定，具体的多任务融合的人脸定位设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of part of the structure related to the solution of this application, and does not constitute a limitation on the terminal to which the solution of this application is applied. The specific multi-task fusion face positioning device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
应当理解的是,处理器可以是中央处理单元(Central Processing Unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。其中,通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。It should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), and the processor may also be other general processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Wherein, the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
其中，在一个实施例中，所述处理器用于运行存储在存储器中的计算机程序，以实现如下步骤：In one embodiment, the processor is configured to run the computer program stored in the memory to implement the following steps:
将至少两个与人脸识别相关联的模型进行融合，得到第一人脸检测模型，所述第一人脸检测模型包括所述相关联的模型的公共网络结构、若干个输出分支、每个所述输出分支各自对应的损失函数；Fuse at least two models associated with face recognition to obtain a first face detection model, where the first face detection model includes the public network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
基于预设的损失权重和全量关键点损失函数,对所述第一人脸检测模型根据训练样本集进行训练,得到第二人脸检测模型;Based on the preset loss weight and the full key point loss function, the first human face detection model is trained according to the training sample set to obtain the second human face detection model;
基于所述第二人脸检测模型对待识别人脸进行检测,得到对所述待识别人脸的人脸定位结果和人脸质量检测结果。The face to be recognized is detected based on the second face detection model, and a face positioning result and a face quality detection result of the face to be recognized are obtained.
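The three processor steps above can be summarized structurally. Everything in the following Python sketch (function names, dictionary layout, placeholder return values) is hypothetical scaffolding for illustration, not an implementation from this application.

```python
# A minimal, assumed sketch of the three steps: fuse the associated
# models into a first detector, train it into a second detector, then
# run detection in a single forward pass.

def fuse_models(associated_models):
    """Step 1: share one common backbone; keep one loss branch per task."""
    backbone = associated_models[0]["backbone"]              # shared by reference
    branches = {m["task"]: m["loss_fn"] for m in associated_models}
    return {"backbone": backbone, "branches": branches}

def train_fused(first_model, training_set, loss_weights):
    """Step 2: weighted multi-task training until convergence (stubbed here)."""
    first_model["epochs_seen"] = len(training_set)           # placeholder only
    return first_model                                       # the "second" model

def detect(second_model, face_image):
    """Step 3: one pass yields both localization and quality outputs."""
    return {"face_box": None, "keypoints": None, "quality_score": None}

models = [
    {"task": "localization", "backbone": ["conv1", "conv2"], "loss_fn": "smooth_l1"},
    {"task": "quality", "backbone": ["conv1", "conv2"], "loss_fn": "bce"},
]
second = train_fused(fuse_models(models), training_set=[1, 2, 3],
                     loss_weights={"localization": 1.0, "quality": 0.5})
result = detect(second, face_image=None)
```

The point of the sketch is the data flow: one fused model, one training pass, and a single detection call that returns both the positioning result and the quality result.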
在一实施例中,所述将至少两个与人脸识别相关联的模型进行融合,得到第一人脸检测模型,包括:In one embodiment, the fusion of at least two models associated with face recognition to obtain a first face detection model includes:
分别获取至少两个与人脸识别相关联的模型的基础网络,将获取的各个所述基础网络的模型参数进行共享,构建所述第一人脸检测模型的所述公共网络结构;Obtaining at least two basic networks of models associated with face recognition respectively, sharing the acquired model parameters of each of the basic networks, and constructing the public network structure of the first face detection model;
分别获取至少两个与人脸识别相关联的模型的损失函数,以获取的各个所述损失函数作为所述第一人脸检测模型的各个输出分支;Obtaining loss functions of at least two models associated with face recognition respectively, using each of the obtained loss functions as each output branch of the first face detection model;
基于所述公共网络结构和各个所述输出分支,得到所述第一人脸检测模型。Based on the public network structure and each of the output branches, the first human face detection model is obtained.
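The shared-backbone-plus-branches construction above can be illustrated with a toy model. The architecture below (one linear shared layer with a tanh nonlinearity, two linear heads) is an assumption chosen for brevity, not the application's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

class FusedModel:
    """Two task branches share one backbone layer (the 'public network')."""
    def __init__(self, in_dim, hidden, head_dims):
        # Shared parameters: a gradient from any branch updates them.
        self.W_shared = rng.normal(size=(in_dim, hidden)) * 0.1
        # One head per associated model; each head pairs with its own loss.
        self.heads = [rng.normal(size=(hidden, d)) * 0.1 for d in head_dims]

    def forward(self, x):
        h = np.tanh(x @ self.W_shared)           # common feature extraction
        return [h @ W for W in self.heads]       # one output per branch

# e.g. a 4-dim box branch and a 10-dim branch for five (x, y) keypoints
model = FusedModel(in_dim=8, hidden=4, head_dims=[4, 10])
outs = model.forward(rng.normal(size=(2, 8)))
```

Because `W_shared` appears on every branch's forward path, sharing it is what makes the fused model a single first face detection model rather than two independent ones.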
在一实施例中,所述基于预设的损失权重和全量关键点损失函数,对所述第一人脸检测模型根据训练样本集进行训练,得到第二人脸检测模型,包括:In one embodiment, the first face detection model is trained according to the training sample set based on the preset loss weight and the full key point loss function to obtain the second face detection model, including:
根据所述训练样本集中各个训练样本的预设数据标签,更新所述第一人脸检测模型的参数;Updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set;
基于预设的所述损失权重均衡所述第一人脸检测模型的参数更新，基于所述全量关键点损失函数，确定参数更新后的所述第一人脸检测模型的收敛度，得到所述第二人脸检测模型。Balance the parameter update of the first face detection model based on the preset loss weight, and determine, based on the full key point loss function, the degree of convergence of the first face detection model after the parameter update, so as to obtain the second face detection model.
在一实施例中,所述根据所述训练样本集中各个训练样本的预设数据标签,更新所述第一人脸检测模型的参数,包括:In one embodiment, updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set includes:
将所述训练样本集中的各个训练样本输入所述第一人脸检测模型,基于所述各个训练样本的预设数据标签,对所述第一人脸检测模型的各个输出分支进行参数更新;Input each training sample in the training sample set into the first face detection model, and update the parameters of each output branch of the first face detection model based on the preset data label of each training sample;
基于对所述第一人脸检测模型的各个输出分支的参数更新结果,反向更新所述第一人脸检测模型的所述公共网络结构。Based on the parameter update results of each output branch of the first face detection model, reversely update the public network structure of the first face detection model.
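The two-phase update above (each output branch updated from its own loss first, then the shared public network updated in reverse from the accumulated branch gradients) can be sketched with linear layers and MSE losses. The layer shapes, learning rate, and loss choices below are illustrative assumptions, not the application's actual training procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))                       # one batch of training samples
targets = [rng.normal(size=(16, 4)),               # e.g. a regression branch label
           rng.normal(size=(16, 2))]               # e.g. a quality branch label
lr = 0.05
params = {
    "shared": rng.normal(size=(8, 5)) * 0.1,       # public network structure
    "heads": [rng.normal(size=(5, 4)) * 0.1,       # one head per output branch
              rng.normal(size=(5, 2)) * 0.1],
}

def train_step():
    h = x @ params["shared"]                       # shared forward pass
    grad_h = np.zeros_like(h)
    total_loss = 0.0
    for i, (W_head, t) in enumerate(zip(params["heads"], targets)):
        err = h @ W_head - t                       # this branch's residual
        total_loss += float((err ** 2).mean())
        grad_h += err @ W_head.T                   # gradient flowing back to h
        params["heads"][i] = W_head - lr * (h.T @ err) / x.shape[0]  # branch first
    # ...then reverse-update the shared network from all branch gradients.
    params["shared"] = params["shared"] - lr * (x.T @ grad_h) / x.shape[0]
    return total_loss

losses = [train_step() for _ in range(60)]
```

The ordering mirrors the text: branch parameters are refreshed from their own losses, and only the accumulated branch gradients reach the shared parameters.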
在一实施例中，所述基于预设的所述损失权重均衡所述第一人脸检测模型的参数更新，基于所述全量关键点损失函数，确定参数更新后的所述第一人脸检测模型的收敛度，得到所述第二人脸检测模型，包括：In an embodiment, the balancing the parameter update of the first face detection model based on the preset loss weight, and determining, based on the full key point loss function, the degree of convergence of the first face detection model after the parameter update to obtain the second face detection model includes:
基于预设的所述损失权重均衡所述第一人脸检测模型的各个输出分支进行参数更新过程中的占比,根据各个输出分支的数量级调整各自对应的参数;Balancing the proportions of each output branch of the first face detection model in the parameter update process based on the preset loss weight, and adjusting the respective corresponding parameters according to the order of magnitude of each output branch;
基于所述全量关键点损失函数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。Based on the full key point loss function, update the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
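Balancing branches whose losses differ by orders of magnitude can be illustrated as follows. The inverse-magnitude weighting rule below is an assumed example, not the loss-weight scheme of this application.

```python
# Hypothetical illustration of loss-weight balancing: branch losses that
# differ by orders of magnitude (e.g. a pixel-space regression loss vs. a
# [0, 1] classification loss) are rescaled so no single branch dominates
# the shared-parameter update.

def balance_weights(branch_losses):
    # One weight per branch, inversely proportional to the loss magnitude.
    weights = {name: 1.0 / max(loss, 1e-8) for name, loss in branch_losses.items()}
    # Normalize so the weights sum to the number of branches.
    scale = len(weights) / sum(weights.values())
    return {name: v * scale for name, v in weights.items()}

raw = {"bbox_regression": 120.0, "keypoints": 8.5, "classification": 0.7}
w = balance_weights(raw)
weighted = {k: w[k] * raw[k] for k in raw}   # contributions are now equal
```

With this assumed rule, every branch contributes the same weighted magnitude to the update, which is one simple way to keep a large-magnitude branch from drowning out the others.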
在一实施例中,所述全量关键点损失函数包括对人脸姿态具有影响的人脸关键点置信度标签;In one embodiment, the full amount of key point loss function includes a face key point confidence label that has an impact on the face pose;
所述基于所述全量关键点损失函数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型，包括：The updating, based on the full key point loss function, the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes to obtain the second face detection model includes:
基于所述对人脸姿态具有影响的人脸关键点置信度标签，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。Based on the face key point confidence label that has an impact on the face pose, update the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
在一实施例中,所述对人脸姿态具有影响的人脸关键点置信度标签与人脸关键点的遮挡程度以及人脸姿态角大小相关;In one embodiment, the confidence label of the key point of the face that has an impact on the face pose is related to the degree of occlusion of the key points of the face and the size of the face pose angle;
所述基于所述对人脸姿态具有影响的人脸关键点置信度标签，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型，包括：The updating, based on the face key point confidence label that has an impact on the face pose, the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes to obtain the second face detection model includes:
基于所述对人脸姿态具有影响的人脸关键点置信度标签,确定人脸关键点的遮挡程度以及人脸姿态角大小;Determine the degree of occlusion of the key points of the face and the size of the face pose angle based on the confidence label of the key points of the face that has an impact on the pose of the face;
根据人脸关键点的遮挡程度以及人脸姿态角大小,更新所述第一人脸检测模型对人脸定位的损失系数;According to the degree of occlusion of the key points of the face and the size of the face pose angle, the loss coefficient of the first face detection model for face positioning is updated;
根据更新后的所述第一人脸检测模型对人脸定位的损失系数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。According to the updated loss coefficient of the first face detection model for face positioning, update the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
本申请的实施例中还提供一种计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，所述计算机程序中包括程序指令，所述处理器执行所述程序指令，实现本申请图1实施例提供的多任务融合的人脸定位方法的步骤。Embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the processor executes the program instructions to implement the steps of the multi-task fusion face positioning method provided in the embodiment of FIG. 1 of the present application.
其中,所述计算机可读存储介质可以是前述实施例所述的计算机设备的内部存储单元,例如所述计算机设备的硬盘或内存。所述计算机可读存储介质也可以是所述计算机设备的外部存储设备,例如所述计算机设备上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。Wherein, the computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or a memory of the computer device. The computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD ) card, flash memory card (Flash Card), etc.
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with the technical field can easily think of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. 一种多任务融合的人脸定位方法,其特征在于,所述方法包括:A face location method of multi-task fusion, characterized in that said method comprises:
    将至少两个与人脸识别相关联的模型进行融合，得到第一人脸检测模型，所述第一人脸检测模型包括所述相关联的模型的公共网络结构、若干个输出分支、每个所述输出分支各自对应的损失函数；Fusing at least two models associated with face recognition to obtain a first face detection model, where the first face detection model comprises the public network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
    基于预设的损失权重和全量关键点损失函数,对所述第一人脸检测模型根据训练样本集进行训练,得到第二人脸检测模型;Based on the preset loss weight and the full key point loss function, the first human face detection model is trained according to the training sample set to obtain the second human face detection model;
    基于所述第二人脸检测模型对待识别人脸进行检测,得到对所述待识别人脸的人脸定位结果和人脸质量检测结果。The face to be recognized is detected based on the second face detection model, and a face positioning result and a face quality detection result of the face to be recognized are obtained.
  2. 根据权利要求1所述的多任务融合的人脸定位方法,其特征在于,所述将至少两个与人脸识别相关联的模型进行融合,得到第一人脸检测模型,包括:The face location method of multi-task fusion according to claim 1, wherein said at least two models associated with face recognition are fused to obtain a first face detection model, comprising:
    分别获取至少两个与人脸识别相关联的模型的基础网络,将获取的各个所述基础网络的模型参数进行共享,构建所述第一人脸检测模型的所述公共网络结构;Obtaining at least two basic networks of models associated with face recognition respectively, sharing the acquired model parameters of each of the basic networks, and constructing the public network structure of the first face detection model;
    分别获取至少两个与人脸识别相关联的模型的损失函数,以获取的各个所述损失函数作为所述第一人脸检测模型的各个输出分支;Obtaining loss functions of at least two models associated with face recognition respectively, using each of the obtained loss functions as each output branch of the first face detection model;
    基于所述公共网络结构和各个所述输出分支,得到所述第一人脸检测模型。Based on the public network structure and each of the output branches, the first human face detection model is obtained.
  3. 根据权利要求1或2所述的多任务融合的人脸定位方法，其特征在于，所述基于预设的损失权重和全量关键点损失函数，对所述第一人脸检测模型根据训练样本集进行训练，得到第二人脸检测模型，包括：The multi-task fusion face positioning method according to claim 1 or 2, wherein the training the first face detection model according to the training sample set based on the preset loss weight and the full key point loss function to obtain the second face detection model comprises:
    根据所述训练样本集中各个训练样本的预设数据标签,更新所述第一人脸检测模型的参数;Updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set;
    基于预设的所述损失权重均衡所述第一人脸检测模型的参数更新，基于所述全量关键点损失函数，确定参数更新后的所述第一人脸检测模型的收敛度，得到所述第二人脸检测模型。Balancing the parameter update of the first face detection model based on the preset loss weight, and determining, based on the full key point loss function, the degree of convergence of the first face detection model after the parameter update, so as to obtain the second face detection model.
  4. 根据权利要求3所述的多任务融合的人脸定位方法,其特征在于,所述根据所述训练样本集中各个训练样本的预设数据标签,更新所述第一人脸检测模型的参数,包括:The face location method of multi-task fusion according to claim 3, wherein, according to the preset data labels of each training sample in the training sample set, updating the parameters of the first face detection model includes :
    将所述训练样本集中的各个训练样本输入所述第一人脸检测模型,基于所述各个训练样本的预设数据标签,对所述第一人脸检测模型的各个输出分支进行参数更新;Input each training sample in the training sample set into the first face detection model, and update the parameters of each output branch of the first face detection model based on the preset data label of each training sample;
    基于对所述第一人脸检测模型的各个输出分支的参数更新结果,反向更新所述第一人脸检测模型的所述公共网络结构。Based on the parameter update results of each output branch of the first face detection model, reversely update the public network structure of the first face detection model.
  5. 根据权利要求4所述的多任务融合的人脸定位方法，其特征在于，所述基于预设的所述损失权重均衡所述第一人脸检测模型的参数更新，基于所述全量关键点损失函数，确定参数更新后的所述第一人脸检测模型的收敛度，得到所述第二人脸检测模型，包括：The multi-task fusion face positioning method according to claim 4, wherein the balancing the parameter update of the first face detection model based on the preset loss weight, and determining, based on the full key point loss function, the degree of convergence of the first face detection model after the parameter update to obtain the second face detection model comprises:
    基于预设的所述损失权重均衡所述第一人脸检测模型的各个输出分支进行参数更新过程中的占比，根据各个输出分支的数量级调整各自对应的参数；Balancing, based on the preset loss weight, the proportion of each output branch of the first face detection model in the parameter update process, and adjusting the respective corresponding parameters according to the order of magnitude of each output branch;
    基于所述全量关键点损失函数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。Based on the full key point loss function, updating the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
  6. 根据权利要求5所述的多任务融合的人脸定位方法,其特征在于,所述全量关键点损失函数包括对人脸姿态具有影响的人脸关键点置信度标签;The face location method of multi-task fusion according to claim 5, wherein the full amount of key point loss function includes a face key point confidence label that has an impact on the face posture;
    所述基于所述全量关键点损失函数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型，包括：The updating, based on the full key point loss function, the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes to obtain the second face detection model comprises:
    基于所述对人脸姿态具有影响的人脸关键点置信度标签，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。Based on the face key point confidence label that has an impact on the face pose, updating the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
  7. 根据权利要求6所述的多任务融合的人脸定位方法，其特征在于，所述对人脸姿态具有影响的人脸关键点置信度标签与人脸关键点的遮挡程度以及人脸姿态角大小相关；The multi-task fusion face positioning method according to claim 6, wherein the face key point confidence label that has an impact on the face pose is related to the degree of occlusion of the face key points and the size of the face pose angle;
    所述基于所述对人脸姿态具有影响的人脸关键点置信度标签，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型，包括：The updating, based on the face key point confidence label that has an impact on the face pose, the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes to obtain the second face detection model comprises:
    基于所述对人脸姿态具有影响的人脸关键点置信度标签,确定人脸关键点的遮挡程度以及人脸姿态角大小;Determine the degree of occlusion of the key points of the face and the size of the face pose angle based on the confidence label of the key points of the face that has an impact on the pose of the face;
    根据人脸关键点的遮挡程度以及人脸姿态角大小,更新所述第一人脸检测模型对人脸定位的损失系数;According to the degree of occlusion of the key points of the face and the size of the face pose angle, the loss coefficient of the first face detection model for face positioning is updated;
    根据更新后的所述第一人脸检测模型对人脸定位的损失系数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。According to the updated loss coefficient of the first face detection model for face positioning, updating the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
  8. 一种多任务融合的人脸定位装置,其特征在于,包括:A multi-task fusion face positioning device is characterized in that it comprises:
    第一得到模块，用于将至少两个与人脸识别相关联的模型进行融合，得到第一人脸检测模型，所述第一人脸检测模型包括所述相关联的模型的公共网络结构、若干个输出分支、每个所述输出分支各自对应的损失函数；A first obtaining module, configured to fuse at least two models associated with face recognition to obtain a first face detection model, where the first face detection model comprises the public network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
    第二得到模块,用于基于预设的损失权重和全量关键点损失函数,对所述第一人脸检测模型根据训练样本集进行训练,得到第二人脸检测模型;The second obtaining module is used to train the first face detection model according to the training sample set based on the preset loss weight and the full key point loss function to obtain the second face detection model;
    第三得到模块,用于基于所述第二人脸检测模型对待识别人脸进行检测,得到对所述待识别人脸的人脸定位结果和人脸质量检测结果。The third obtaining module is configured to detect the face to be recognized based on the second face detection model, and obtain a face positioning result and a face quality detection result of the face to be recognized.
  9. 一种多任务融合的人脸定位设备,其特征在于,包括:A multi-task fusion face positioning device, characterized in that it includes:
    存储器和处理器;memory and processor;
    所述存储器用于存储计算机程序;The memory is used to store computer programs;
    所述处理器,用于执行所述计算机程序并在执行所述计算机程序时实现如下步骤:The processor is configured to execute the computer program and implement the following steps when executing the computer program:
    将至少两个与人脸识别相关联的模型进行融合，得到第一人脸检测模型，所述第一人脸检测模型包括所述相关联的模型的公共网络结构、若干个输出分支、每个所述输出分支各自对应的损失函数；Fusing at least two models associated with face recognition to obtain a first face detection model, where the first face detection model comprises the public network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
    基于预设的损失权重和全量关键点损失函数,对所述第一人脸检测模型根据训练样本集进行训练,得到第二人脸检测模型;Based on the preset loss weight and the full key point loss function, the first human face detection model is trained according to the training sample set to obtain the second human face detection model;
    基于所述第二人脸检测模型对待识别人脸进行检测,得到对所述待识别人脸的人脸定位结果和人脸质量检测结果。The face to be recognized is detected based on the second face detection model, and a face positioning result and a face quality detection result of the face to be recognized are obtained.
  10. 根据权利要求9所述的多任务融合的人脸定位设备，其特征在于，所述处理器执行所述将至少两个与人脸识别相关联的模型进行融合，得到第一人脸检测模型时，实现：The multi-task fusion face positioning device according to claim 9, wherein when executing the fusing at least two models associated with face recognition to obtain the first face detection model, the processor implements:
    分别获取至少两个与人脸识别相关联的模型的基础网络,将获取的各个所述基础网络的模型参数进行共享,构建所述第一人脸检测模型的所述公共网络结构;Obtaining at least two basic networks of models associated with face recognition respectively, sharing the acquired model parameters of each of the basic networks, and constructing the public network structure of the first face detection model;
    分别获取至少两个与人脸识别相关联的模型的损失函数,以获取的各个所述损失函数作为所述第一人脸检测模型的各个输出分支;Obtaining loss functions of at least two models associated with face recognition respectively, using each of the obtained loss functions as each output branch of the first face detection model;
    基于所述公共网络结构和各个所述输出分支,得到所述第一人脸检测模型。Based on the public network structure and each of the output branches, the first human face detection model is obtained.
  11. 根据权利要求9或10所述的多任务融合的人脸定位设备，其特征在于，所述处理器执行基于预设的损失权重和全量关键点损失函数，对所述第一人脸检测模型根据训练样本集进行训练，得到第二人脸检测模型时，实现：The multi-task fusion face positioning device according to claim 9 or 10, wherein when executing the training the first face detection model according to the training sample set based on the preset loss weight and the full key point loss function to obtain the second face detection model, the processor implements:
    根据所述训练样本集中各个训练样本的预设数据标签,更新所述第一人脸检测模型的参数;Updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set;
    基于预设的所述损失权重均衡所述第一人脸检测模型的参数更新，基于所述全量关键点损失函数，确定参数更新后的所述第一人脸检测模型的收敛度，得到所述第二人脸检测模型。Balancing the parameter update of the first face detection model based on the preset loss weight, and determining, based on the full key point loss function, the degree of convergence of the first face detection model after the parameter update, so as to obtain the second face detection model.
  12. 根据权利要求11所述的多任务融合的人脸定位设备，其特征在于，所述处理器执行根据所述训练样本集中各个训练样本的预设数据标签，更新所述第一人脸检测模型的参数时，实现：The multi-task fusion face positioning device according to claim 11, wherein when executing the updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set, the processor implements:
    将所述训练样本集中的各个训练样本输入所述第一人脸检测模型,基于所述各个训练样本的预设数据标签,对所述第一人脸检测模型的各个输出分支进行参数更新;Input each training sample in the training sample set into the first face detection model, and update the parameters of each output branch of the first face detection model based on the preset data label of each training sample;
    基于对所述第一人脸检测模型的各个输出分支的参数更新结果,反向更新所述第一人脸检测模型的所述公共网络结构。Based on the parameter update results of each output branch of the first face detection model, reversely update the public network structure of the first face detection model.
  13. 根据权利要求12所述的多任务融合的人脸定位设备，其特征在于，所述处理器执行基于预设的所述损失权重均衡所述第一人脸检测模型的参数更新，基于所述全量关键点损失函数，确定参数更新后的所述第一人脸检测模型的收敛度，得到所述第二人脸检测模型时，实现：The multi-task fusion face positioning device according to claim 12, wherein when executing the balancing the parameter update of the first face detection model based on the preset loss weight, and determining, based on the full key point loss function, the degree of convergence of the first face detection model after the parameter update to obtain the second face detection model, the processor implements:
    基于预设的所述损失权重均衡所述第一人脸检测模型的各个输出分支进行参数更新过程中的占比,根据各个输出分支的数量级调整各自对应的参数;Balancing the proportions of each output branch of the first face detection model in the parameter update process based on the preset loss weight, and adjusting the respective corresponding parameters according to the order of magnitude of each output branch;
    基于所述全量关键点损失函数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。Based on the full key point loss function, updating the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
  14. 根据权利要求13所述的多任务融合的人脸定位设备,其特征在于,所述全量关键点损失函数包括对人脸姿态具有影响的人脸关键点置信度标签;The multi-task fusion face positioning device according to claim 13, wherein the full key point loss function includes a face key point confidence label that has an impact on the face pose;
    所述处理器执行基于所述全量关键点损失函数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型时，实现：When executing the updating, based on the full key point loss function, the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes to obtain the second face detection model, the processor implements:
    基于所述对人脸姿态具有影响的人脸关键点置信度标签，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。Based on the face key point confidence label that has an impact on the face pose, updating the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
  15. 根据权利要求14所述的多任务融合的人脸定位设备，其特征在于，所述对人脸姿态具有影响的人脸关键点置信度标签与人脸关键点的遮挡程度以及人脸姿态角大小相关；The multi-task fusion face positioning device according to claim 14, wherein the face key point confidence label that has an impact on the face pose is related to the degree of occlusion of the face key points and the size of the face pose angle;
    所述处理器执行基于所述对人脸姿态具有影响的人脸关键点置信度标签，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型时，实现：When executing the updating, based on the face key point confidence label that has an impact on the face pose, the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes to obtain the second face detection model, the processor implements:
    基于所述对人脸姿态具有影响的人脸关键点置信度标签,确定人脸关键点的遮挡程度以及人脸姿态角大小;Determine the degree of occlusion of the key points of the face and the size of the face pose angle based on the confidence label of the key points of the face that has an impact on the pose of the face;
    根据人脸关键点的遮挡程度以及人脸姿态角大小,更新所述第一人脸检测模型对人脸定位的损失系数;According to the degree of occlusion of the key points of the face and the size of the face pose angle, the loss coefficient of the first face detection model for face positioning is updated;
    根据更新后的所述第一人脸检测模型对人脸定位的损失系数，更新所述第一人脸检测模型对人脸关键点置信度的检测结果，直至所述第一人脸检测模型收敛稳定，得到所述第二人脸检测模型。According to the updated loss coefficient of the first face detection model for face positioning, updating the detection result of the first face detection model on the confidence of the face key points until the first face detection model converges and stabilizes, so as to obtain the second face detection model.
  16. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时使所述处理器实现如下步骤:A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor implements the following steps:
    将至少两个与人脸识别相关联的模型进行融合，得到第一人脸检测模型，所述第一人脸检测模型包括所述相关联的模型的公共网络结构、若干个输出分支、每个所述输出分支各自对应的损失函数；Fusing at least two models associated with face recognition to obtain a first face detection model, where the first face detection model comprises the public network structure of the associated models, several output branches, and a loss function corresponding to each of the output branches;
    基于预设的损失权重和全量关键点损失函数,对所述第一人脸检测模型根据训练样本集进行训练,得到第二人脸检测模型;Based on the preset loss weight and the full key point loss function, the first human face detection model is trained according to the training sample set to obtain the second human face detection model;
    基于所述第二人脸检测模型对待识别人脸进行检测,得到对所述待识别人脸的人脸定位结果和人脸质量检测结果。The face to be recognized is detected based on the second face detection model, and a face positioning result and a face quality detection result of the face to be recognized are obtained.
  17. 根据权利要求16所述的存储介质，其特征在于，所述计算机程序被处理器执行实现所述将至少两个与人脸识别相关联的模型进行融合，得到第一人脸检测模型时，实现：The storage medium according to claim 16, wherein when the computer program is executed by the processor to implement the fusing at least two models associated with face recognition to obtain the first face detection model, the following is implemented:
    分别获取至少两个与人脸识别相关联的模型的基础网络,将获取的各个所述基础网络的模型参数进行共享,构建所述第一人脸检测模型的所述公共网络结构;Obtaining at least two basic networks of models associated with face recognition respectively, sharing the acquired model parameters of each of the basic networks, and constructing the public network structure of the first face detection model;
    分别获取至少两个与人脸识别相关联的模型的损失函数,以获取的各个所述损失函数作为所述第一人脸检测模型的各个输出分支;Obtaining loss functions of at least two models associated with face recognition respectively, using each of the obtained loss functions as each output branch of the first face detection model;
    基于所述公共网络结构和各个所述输出分支,得到所述第一人脸检测模型。Based on the public network structure and each of the output branches, the first human face detection model is obtained.
  18. 根据权利要求16或17所述的存储介质，其特征在于，所述计算机程序被处理器执行实现所述基于预设的损失权重和全量关键点损失函数，对所述第一人脸检测模型根据训练样本集进行训练，得到第二人脸检测模型时，实现：The storage medium according to claim 16 or 17, wherein when the computer program is executed by the processor to implement the training the first face detection model according to the training sample set based on the preset loss weight and the full key point loss function to obtain the second face detection model, the following is implemented:
    根据所述训练样本集中各个训练样本的预设数据标签,更新所述第一人脸检测模型的参数;Updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set;
    基于预设的所述损失权重均衡所述第一人脸检测模型的参数更新，基于所述全量关键点损失函数，确定参数更新后的所述第一人脸检测模型的收敛度，得到所述第二人脸检测模型。Balancing the parameter update of the first face detection model based on the preset loss weight, and determining, based on the full key point loss function, the degree of convergence of the first face detection model after the parameter update, so as to obtain the second face detection model.
  19. 根据权利要求18所述的存储介质，其特征在于，所述计算机程序被处理器执行实现所述根据所述训练样本集中各个训练样本的预设数据标签，更新所述第一人脸检测模型的参数时，实现：The storage medium according to claim 18, wherein when the computer program is executed by the processor to implement the updating the parameters of the first face detection model according to the preset data labels of each training sample in the training sample set, the following is implemented:
    将所述训练样本集中的各个训练样本输入所述第一人脸检测模型,基于所述各个训练样本的预设数据标签,对所述第一人脸检测模型的各个输出分支进行参数更新;Input each training sample in the training sample set into the first face detection model, and update the parameters of each output branch of the first face detection model based on the preset data label of each training sample;
    基于对所述第一人脸检测模型的各个输出分支的参数更新结果,反向更新所述第一人脸检测模型的所述公共网络结构。Based on the parameter update results of each output branch of the first face detection model, reversely update the public network structure of the first face detection model.
  20. The storage medium according to claim 19, wherein when the computer program is executed by the processor to balance the parameter updates of the first face detection model based on the preset loss weights and determine, based on the full key point loss function, the convergence of the first face detection model after the parameter updates, so as to obtain the second face detection model, the following is implemented:
    balancing, based on the preset loss weights, the proportion of each output branch of the first face detection model in the parameter update process, and adjusting the parameters corresponding to each output branch according to its order of magnitude;
    updating the detection results of the first face detection model for face key point confidence based on the full key point loss function, until the first face detection model converges and stabilizes, to obtain the second face detection model.
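Claim 20 combines two ingredients: preset loss weights that offset the different orders of magnitude of the branch losses, and a stopping test for "converges and stabilizes". A minimal sketch of both follows; the function names, tolerance, and example magnitudes are assumptions for illustration, not values from the patent:

```python
def weighted_total(branch_losses, loss_weights):
    # Preset weights rescale each branch so that no single branch (e.g. a
    # large box-regression loss vs. a small key-point-confidence loss)
    # dominates the parameter update.
    return sum(loss_weights[name] * loss
               for name, loss in branch_losses.items())

def has_converged(loss_history, tol=1e-4, window=3):
    # "Converged and stable": the weighted loss moved by less than `tol`
    # over each of the last `window` updates.
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) < tol for i in range(window))

# usage: a raw box loss near 50 and a key-point loss near 0.5 are brought
# onto the same scale by weights of 0.01 and 1.0
total = weighted_total({"bbox": 50.0, "keypoints": 0.5},
                       {"bbox": 0.01, "keypoints": 1.0})
```

Training would append each epoch's weighted loss to a history list and stop once `has_converged` returns true, yielding the second face detection model.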
PCT/CN2022/072186 2021-06-01 2022-01-14 Face positioning method, apparatus and device based on multi-task fusion, and storage medium WO2022252635A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110609385.1 2021-06-01
CN202110609385.1A CN113255539B (en) 2021-06-01 2021-06-01 Multi-task fusion face positioning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022252635A1 true WO2022252635A1 (en) 2022-12-08

Family

ID=77185716

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072186 WO2022252635A1 (en) 2021-06-01 2022-01-14 Face positioning method, apparatus and device based on multi-task fusion, and storage medium

Country Status (2)

Country Link
CN (1) CN113255539B (en)
WO (1) WO2022252635A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255539B (en) * 2021-06-01 2024-05-10 平安科技(深圳)有限公司 Multi-task fusion face positioning method, device, equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN111325108A (en) * 2020-01-22 2020-06-23 中能国际建筑投资集团有限公司 Multitask network model, using method, device and storage medium
CN111814706A (en) * 2020-07-14 2020-10-23 电子科技大学 Face recognition and attribute classification method based on multitask convolutional neural network
CN112580572A (en) * 2020-12-25 2021-03-30 深圳市优必选科技股份有限公司 Training method of multi-task recognition model, using method, equipment and storage medium
US20210158142A1 (en) * 2019-11-22 2021-05-27 Samsung Electronics Co., Ltd. Multi-task fusion neural network architecture
CN113255539A (en) * 2021-06-01 2021-08-13 平安科技(深圳)有限公司 Multi-task fusion face positioning method, device, equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN110751043B (en) * 2019-09-19 2023-08-22 平安科技(深圳)有限公司 Face recognition method and device based on face visibility and storage medium
CN111666873A (en) * 2020-06-05 2020-09-15 汪金玲 Training method, recognition method and system based on multitask deep learning network
CN111860259A (en) * 2020-07-10 2020-10-30 东莞正扬电子机械有限公司 Training and using method, device, equipment and medium of driving detection model
CN112232117A (en) * 2020-09-08 2021-01-15 深圳微步信息股份有限公司 Face recognition method, face recognition device and storage medium
CN112380923A (en) * 2020-10-26 2021-02-19 天津大学 Intelligent autonomous visual navigation and target detection method based on multiple tasks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RANJAN RAJEEV, PATEL VISHAL M., CHELLAPPA RAMA: "HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 41, no. 1, 1 January 2019 (2019-01-01), USA , pages 121 - 135, XP093011821, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2017.2781233 *

Also Published As

Publication number Publication date
CN113255539B (en) 2024-05-10
CN113255539A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
US10783364B2 (en) Method, apparatus and device for waking up voice interaction function based on gesture, and computer readable medium
JP2019535055A (en) Perform gesture-based operations
WO2021129527A1 (en) Sorting method and apparatus, device, and storage medium
US9256369B2 (en) Programmable memory controller
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
US9575822B2 (en) Tracking a relative arrival order of events being stored in multiple queues using a counter using most significant bit values
WO2022103575A1 (en) Techniques for modifying cluster computing environments
WO2022252635A1 (en) Face positioning method, apparatus and device based on multi-task fusion, and storage medium
WO2020168754A1 (en) Prediction model-based performance prediction method and device, and storage medium
JP2023508062A (en) Dialogue model training method, apparatus, computer equipment and program
CN110349161A (en) Image partition method, device, electronic equipment and storage medium
WO2022095640A1 (en) Method for reconstructing tree-shaped tissue in image, and device and storage medium
CN111966361A (en) Method, device and equipment for determining model to be deployed and storage medium thereof
CN111126347A (en) Human eye state recognition method and device, terminal and readable storage medium
EP3983950A1 (en) Neural network training in a distributed system
WO2022193640A1 (en) Robot calibration method and apparatus, and robot and storage medium
CN114359963A (en) Gesture recognition method and communication system
US10452134B2 (en) Automated peripheral device handoff based on eye tracking
US20220109617A1 (en) Latency determinations for human interface devices
CN114282587A (en) Data processing method and device, computer equipment and storage medium
US20230289600A1 (en) Model distillation training method, related apparatus and device, and readable storage medium
CN112200183A (en) Image processing method, device, equipment and computer readable medium
CN115880719A (en) Gesture depth information generation method, device, equipment and computer readable medium
CN114090158B (en) Display method, display device, electronic equipment and medium
US10353928B2 (en) Real-time clustering using multiple representatives from a cluster

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22814688

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22814688

Country of ref document: EP

Kind code of ref document: A1