WO2021068323A1 - Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number
WO2021068323A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
facial
task
facial motion
key point
Prior art date
Application number
PCT/CN2019/116615
Other languages
French (fr)
Chinese (zh)
Inventor
罗琳耀 (Luo Linyao)
徐国强 (Xu Guoqiang)
邱寒 (Qiu Han)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021068323A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The key point detection module 504 is also configured to obtain the preset template point coordinates.
  • The training module 508 is also configured to initialize the network parameters of the residual neural network.
  • In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • The processor of the computer device is used to provide computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, computer-readable instructions, and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium.
  • The database of the computer device is used to store data.
  • The network interface of the computer device is used to communicate with an external terminal through a network connection.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A multitask facial action recognition model training method, comprising: collecting a facial action recognition data set; performing face detection and face alignment on facial action images in the facial action recognition data set to obtain key point label images; detecting angles of faces in the key point label images according to a preset standard image to obtain a multitask label image comprising an angle label; and inputting the multitask label image into a preset residual neural network to perform multitask training on the residual neural network, wherein the trained residual neural network is used as a multitask facial action recognition model.

Description

Multi-task facial action recognition model training method, multi-task facial action recognition method and apparatus, computer device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 2019109690541, filed with the Chinese Patent Office on October 12, 2019 and entitled "Multi-task facial action recognition model training and multi-task facial action recognition method", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to a multi-task facial action recognition model training method, a multi-task facial action recognition method and apparatus, a computer device, and a storage medium.
Background
Face recognition is also called facial recognition. Facial action recognition refers to the ability to recognize specific facial movements and expressions; it is related to the shape, position, and geometric relationships of the facial features.
A traditional open-source recognition method uses a facial action recognition model to optimize a single classification objective. However, the inventors realized that a traditional facial action recognition model does not consider that other related target tasks may exist; it can only perform a single kind of detection and therefore lacks diversity.
Summary
According to various embodiments disclosed in this application, a multi-task facial action recognition model training method, a multi-task facial action recognition method and apparatus, a computer device, and a storage medium are provided.
A multi-task facial action recognition model training method includes:
collecting a facial action recognition data set;
performing face detection and face alignment on the facial action images in the facial action recognition data set to obtain key point label images;
detecting the angle of the face in the key point label images according to a preset standard image to obtain a multi-task label image including an angle label; and
inputting the multi-task label image into a preset residual neural network to perform multi-task training on the residual neural network, and using the trained residual neural network as a multi-task facial action recognition model.
A multi-task facial action recognition method includes:
acquiring a facial action image to be recognized; and
recognizing the facial action image to be recognized by using a multi-task facial action recognition model trained with any of the multi-task facial action recognition model training methods described above, to obtain a recognition result, the recognition result including an action label, a key point label, and an angle label.
A multi-task facial action recognition model training apparatus includes:
a collection module, configured to collect a facial action recognition data set;
a key point detection module, configured to perform face detection and face alignment on the facial action images in the facial action recognition data set to obtain key point label images;
an angle detection module, configured to detect the angle of the face in the key point label images according to a preset standard image to obtain a multi-task label image including an angle label; and
a training module, configured to input the multi-task label image into a preset residual neural network to perform multi-task training on the residual neural network, and to use the trained residual neural network as a multi-task facial action recognition model.
A multi-task facial action recognition apparatus includes:
an acquisition module, configured to acquire a facial action image to be recognized; and
a recognition module, configured to recognize the facial action image to be recognized by using a multi-task facial action recognition model trained with any of the multi-task facial action recognition model training methods described above, to obtain a recognition result, the recognition result including an action label, a key point label, and an angle label.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the processors, implement the steps of the multi-task facial action recognition model training method and the steps of the multi-task facial action recognition method provided in any embodiment of this application.
One or more non-volatile storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the multi-task facial action recognition model training method and the steps of the multi-task facial action recognition method provided in any embodiment of this application.
The details of one or more embodiments of this application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is an application scenario diagram of a multi-task facial action recognition model training method according to one or more embodiments.
FIG. 2 is a schematic flowchart of a multi-task facial action recognition model training method according to one or more embodiments.
FIG. 3 is a schematic flowchart of the step of detecting the angle of a face according to one or more embodiments.
FIG. 4 is a schematic flowchart of the step of collecting a facial action recognition data set according to one or more embodiments.
FIG. 5 is a block diagram of a multi-task facial action recognition model training apparatus according to one or more embodiments.
FIG. 6 is a block diagram of a computer device according to one or more embodiments.
Detailed Description
To make the technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The multi-task facial action recognition model training method provided in this application can be applied in the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. The server 104 receives a model training instruction sent by the terminal 102 and, in response, collects a facial action recognition data set; the server 104 performs face detection and face alignment on the facial action images in the data set to obtain key point label images; the server 104 detects the angle of the face in the key point label images according to a preset standard image to obtain multi-task label images including angle labels; the server 104 inputs the multi-task label images into a preset residual neural network to perform multi-task training on the network, and uses the trained residual neural network as a multi-task facial action recognition model. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device; the server 104 may be implemented by an independent server or by a server cluster composed of multiple servers.
In some embodiments, as shown in FIG. 2, a multi-task facial action recognition model training method is provided. Taking the method applied to the server in FIG. 1 as an example, the method includes the following steps:
Step S202: collect a facial action recognition data set.
The facial action recognition data set is a collection of multiple facial action images. The images may be collected manually in advance and stored in a database, or collected from an open-source database by a web crawler. It can be understood that each facial action image in the data set carries a facial action label, i.e., the facial action shown in the image has already been annotated.
Specifically, when a user needs to train a multi-task facial action recognition model, a model training instruction is issued to the server through an operating terminal. After receiving the instruction, the server responds by obtaining the pre-stored facial action recognition data set from the database, or by crawling the facial action recognition data set from open sources using the URL (Uniform Resource Locator) link carried in the instruction.
Step S204: perform face detection and face alignment on the facial action images in the facial action recognition data set to obtain key point label images.
A key point label image is a facial action image that includes the coordinates of the facial feature key points; that is, all facial feature key points in the image have been annotated with coordinates. Facial feature key points are parts such as the eyes, nose, and mouth. In other words, a key point label image is a facial action image in which the facial feature parts have all been annotated with coordinates. Different face detection algorithms yield different numbers of key points.
Specifically, after the facial action recognition data set is acquired, a face detection algorithm is used to detect the faces in the facial action images and obtain the facial feature key points. Then, face alignment is performed on the images containing the key points so that each key point is aligned with its corresponding facial part; the resulting images are the key point label images. Face detection algorithms include, but are not limited to, the face detection algorithm in the DLIB library and the MTCNN (multi-task convolutional neural network). The number of key points differs between face detection algorithms: the DLIB face detection algorithm outputs 68 key points, while MTCNN outputs 5 feature key points.
Step S206: detect the angle of the face in the key point label image according to a preset standard image to obtain a multi-task label image including an angle label.
Since the facial action images in the data set already carry facial action labels, a multi-task label image is a facial action image that includes a facial action label, a key point label, and an angle label. The angle label is the face angle obtained through face angle detection, i.e., the image has been annotated with the angle of the face.
Specifically, after the face detection algorithm produces the key point label image (the facial action image containing the facial feature key points), a face angle detection algorithm is further applied to the key point label image to obtain the angle label. The face angle is the angle by which the face in the image is rotated. Since the angle label is obtained by detecting the key point label image that already contains the facial action label and the key point label, the final image is a multi-task label image including the facial action label, the key point label, and the angle label.
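The patent does not specify the angle-detection algorithm itself, but one common way to estimate a face's in-plane (roll) angle from the key points of step S204 is to measure the line connecting the two eye key points against the horizontal eye line of a frontal standard image. A minimal sketch; the key point layout and the eye-line convention are illustrative assumptions, not the patent's stated method:

```python
import math

def roll_angle(left_eye, right_eye):
    """Estimate the in-plane rotation of a face, in degrees, from the line
    connecting the two eye key points. In a frontal standard image this
    line is horizontal, so its slope gives the roll angle."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# A face whose right eye sits 20 px lower than the left eye:
angle = roll_angle(left_eye=(30.0, 40.0), right_eye=(70.0, 60.0))
print(round(angle, 1))  # 26.6
```

Yaw and pitch estimation would additionally use the nose and mouth-corner key points against the standard image, but follow the same idea of comparing detected coordinates with the frontal template.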
Step S208: input the multi-task label image into a preset residual neural network to perform multi-task training on the residual neural network, and use the trained residual neural network as a multi-task facial action recognition model.
A residual neural network (ResNet) is a deep convolutional neural network that is easy to optimize and whose accuracy can be improved by adding considerable depth. Its internal residual modules use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks. The residual neural network in this embodiment is a modified ResNet50 model; it differs from the standard ResNet50 in that the last fully connected layer is replaced with a fully connected layer with 12 output channels. Because the multi-task facial action recognition model trained in this embodiment comprises 12 facial action recognition outputs, a fully connected layer with 12 output channels allows better classification.
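The skip connection the paragraph above refers to can be illustrated in a few lines of plain Python. The toy transform `f` stands in for the residual branch's convolutional layers and is an assumption for illustration, not part of the patent's ResNet50:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def residual_block(x, f):
    """y = ReLU(f(x) + x): the skip connection adds the input back onto the
    transformed output, so the block only has to learn a residual, and the
    identity path keeps gradients flowing through very deep stacks."""
    fx = f(x)
    return relu([a + b for a, b in zip(fx, x)])

# Even if the learned transform collapses to zero, the block still passes
# its (non-negative) input through unchanged, i.e. an identity mapping:
zero_transform = lambda v: [0.0] * len(v)
print(residual_block([1.0, 2.0, 3.0], zero_transform))  # [1.0, 2.0, 3.0]
```

This identity-fallback property is what makes adding depth to a ResNet comparatively safe: a redundant block can degenerate to a pass-through instead of degrading accuracy.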
Specifically, the multi-task label images are used as training samples for the model and are input in batches to the modified preset residual neural network, so that the network learns from the facial action labels, key point labels, and angle labels in the multi-task label images and thereby completes training. The trained residual neural network is used as the multi-task facial action recognition model.
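The patent does not give the training objective, but multi-task training of the kind described is typically driven by a weighted sum of per-task losses (action classification, key point regression, angle regression) back-propagated through the shared network. A hedged sketch; the weights are illustrative, not values from the application:

```python
def multitask_loss(action_loss, keypoint_loss, angle_loss,
                   w_action=1.0, w_keypoint=0.5, w_angle=0.5):
    """Combine the three per-task losses into the single scalar that is
    back-propagated through the shared residual network. The weights are
    illustrative; in practice they are tuned on validation data."""
    return (w_action * action_loss
            + w_keypoint * keypoint_loss
            + w_angle * angle_loss)

print(round(multitask_loss(0.8, 0.4, 0.2), 6))  # 1.1
```

Sharing one backbone across the three tasks is what lets a single forward pass emit the action label, key point label, and angle label together at inference time.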
In the above multi-task facial action recognition model training method, after the facial action recognition data set is collected, face detection and face alignment are performed on the facial action images in the data set to obtain the key point label images, ensuring that the images include key point labels. Then, the angle of the face in the key point label images is detected according to the preset standard image to obtain multi-task label images including angle labels, so that an angle label is added to each image that already includes a key point label. The preset residual neural network is then jointly trained on multiple tasks using the multi-task label images, and the trained network is used as the multi-task facial action recognition model. The facial action recognition model can thus perform multi-task facial action recognition simultaneously, which improves diversity.
In some embodiments, step S204 of performing face detection and face alignment on the facial action images in the facial action recognition data set to obtain the key point label images specifically includes: scaling the facial action images in the data set to construct an image pyramid; performing face detection on the image pyramid using a multi-task convolutional neural network to obtain facial action images containing the coordinates of the facial feature key points; and, based on the facial feature key point coordinates and preset template point coordinates, performing face alignment on the corresponding facial action images to obtain the key point label images.
An image pyramid is a pyramid constructed from images of different sizes: the bottom image is the largest and the top image is the smallest, i.e., each image is larger than the image in the layer above it and smaller than the image in the layer below it. The multi-task convolutional neural network (MTCNN) is a neural network used for face detection. MTCNN consists of three parts, a three-stage network structure of P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network). P-Net is basically a small fully convolutional network; R-Net is basically a convolutional neural network that, compared with P-Net, adds a fully connected layer, so R-Net screens its input candidates more strictly. O-Net is a more complex convolutional neural network with one more convolutional layer than R-Net. The difference between O-Net and R-Net is that this stage recognizes the facial regions with more supervision and also regresses the facial feature key points, finally outputting the facial action image including the facial feature key points. It can be understood that the facial action image output by MTCNN already includes annotated coordinate boxes, and the regions within the boxes are the annotated facial feature key points.
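The image pyramid described above can be sketched as repeated downscaling until the image is too small for the detector. The scale factor 0.709 and the 12-pixel minimum side are common MTCNN conventions, not values stated in this application:

```python
def build_pyramid(width, height, scale=0.709, min_size=12):
    """Repeatedly shrink the image dimensions by a fixed factor until the
    shorter side falls below the detector's minimum input size; the list
    of (w, h) levels, largest first, is the image pyramid fed to P-Net."""
    levels = []
    w, h = float(width), float(height)
    while min(w, h) >= min_size:
        levels.append((int(w), int(h)))
        w, h = w * scale, h * scale
    return levels

for w, h in build_pyramid(100, 80):
    print(w, h)  # 100 80, then 70 56, 50 40, 35 28, 25 20, 17 14
```

Running the detector over every level is what lets a fixed-size sliding window find both large and small faces in the same image.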
Specifically, the facial action images are scaled (reduced or enlarged) to obtain facial action images of different sizes, which are stacked in order of size from largest to smallest to form the image pyramid. The multi-task convolutional neural network then performs face detection on the image pyramid to obtain facial action images containing the facial feature key point coordinates; these can be understood as key point label images that have not yet been face-aligned. Further, the preset template point coordinates are acquired, and the facial feature key point coordinates in each facial action image are aligned to the template point coordinates to obtain the face-aligned key point label images. In this embodiment, the key point label images are obtained by neural network detection over the facial action images, so there is no need to annotate the key points manually, which saves human resources.
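The patent only states that the detected key points are aligned to preset template point coordinates. One minimal way to realize that (an assumption for illustration, not the patent's stated algorithm) is to solve a similarity transform from two of the correspondences, e.g. the eye pair, and apply it to every point; treating 2-D points as complex numbers turns this into a two-unknown linear solve:

```python
def align_to_template(points, template, i=0, j=1):
    """Solve z -> a*z + b (rotation + uniform scale + translation) that
    maps key points i and j exactly onto their template counterparts,
    then apply it to all points. Points are (x, y) tuples; complex
    arithmetic encodes the 2-D similarity transform."""
    p_i, p_j = complex(*points[i]), complex(*points[j])
    t_i, t_j = complex(*template[i]), complex(*template[j])
    a = (t_j - t_i) / (p_j - p_i)   # rotation and scale
    b = t_i - a * p_i               # translation
    return [((a * complex(x, y) + b).real, (a * complex(x, y) + b).imag)
            for x, y in points]

# A face detected upside down is rotated and shifted onto the template:
detected = [(70.0, 60.0), (30.0, 60.0), (50.0, 40.0)]
template = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0)]
print(align_to_template(detected, template))
```

A production pipeline would instead fit the transform to all five key points in a least-squares sense (e.g. the Umeyama method), which is more robust when individual key points are noisy.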
In some embodiments, performing face detection on the image pyramid using the multi-task convolutional neural network to obtain facial action images containing the facial feature key point coordinates specifically includes: performing feature extraction and bounding box calibration on the image pyramid using the multi-task convolutional neural network to obtain a first feature map; filtering the calibrated bounding boxes in the first feature map to obtain a second feature map; and obtaining, from the second feature map, a facial action image containing the facial feature key point coordinates.
具体地 Specifically, the P-Net in the multi-task convolutional neural network is used to perform preliminary feature extraction and bounding box calibration on the image pyramid to obtain a feature map including multiple calibrated bounding boxes. Bounding-box regression is performed on this feature map to adjust the boxes, and NMS (non-maximum suppression) is used to filter out most of the boxes, that is, to merge overlapping boxes, thereby obtaining the first feature map. The function of bounding-box regression is to fine-tune the boxes predicted by the network so that they approach the ground truth. NMS suppresses elements that are not local maxima; with this method, boxes with high overlap and relatively inaccurate calibration can be removed quickly. Further, after the facial action image passes through the P-Net, the output first feature map still contains many prediction windows. Therefore, the first feature map is input to the R-Net, which filters out most of the bounding boxes in the first feature map to determine candidate boxes. Similarly, bounding-box regression is further applied to the candidate boxes to adjust them, and NMS is used again, thereby obtaining a second feature map including only one bounding box. In other words, the R-Net is used to further refine the prediction results. Finally, the second feature map output by the R-Net is input to the O-Net, which performs further feature extraction on the second feature map including only one bounding box, and finally outputs a facial action image including the coordinates of five facial feature key points. The five facial feature key point coordinate regions are the left eye, the right eye, the nose, the left mouth corner, and the right mouth corner, respectively. In this embodiment, the facial feature image including the feature points is obtained through detection by the multi-task convolutional neural network, without the need to label the feature points manually.
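The NMS step used at each stage above can be sketched in a few lines. This is a minimal greedy illustration with boxes given as (x1, y1, x2, y2, score) tuples; the function name and the 0.5 overlap threshold are illustrative assumptions, not values fixed by the original disclosure.

```python
def nms(boxes, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, then
    drop any box whose IoU with a kept box exceeds iou_threshold.
    Each box is (x1, y1, x2, y2, score); returns the kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][4], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        remaining = []
        for j in order:
            # intersection rectangle of boxes i and j
            ix1 = max(boxes[i][0], boxes[j][0]); iy1 = max(boxes[i][1], boxes[j][1])
            ix2 = min(boxes[i][2], boxes[j][2]); iy2 = min(boxes[i][3], boxes[j][3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            area_i = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1])
            area_j = (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1])
            iou = inter / (area_i + area_j - inter)
            if iou <= iou_threshold:  # low overlap: survives this round
                remaining.append(j)
        order = remaining
    return keep
```

Applied after each of P-Net, R-Net and O-Net, this is what merges the heavily overlapping candidate boxes into a small set of predictions.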
在一些实施例中 In some embodiments, performing face alignment processing on the corresponding facial action image based on the facial feature key point coordinates and preset template point coordinates to obtain the key point label image specifically includes: acquiring the preset template point coordinates; calculating a similarity transformation matrix between the facial feature key point coordinates and the template point coordinates; and multiplying the similarity transformation matrix by the matrix of the corresponding facial action image, the resulting image being the key point label image.
预设的模板坐标点 The preset template point coordinates refer to the key point coordinates of a facial action image on which the key points have been defined in advance. The similarity transformation matrix refers to a matrix expressing the similarity relationship between the two sets of points.
具体地 Specifically, a facial action image whose key point coordinates have been defined in advance is acquired, and the marked key point coordinates are obtained from this facial action image as the template point coordinates. Using the least squares method, the similarity transformation matrix between the facial feature key point coordinates and the template point coordinates is calculated, and the similarity transformation matrix is multiplied by the matrix of the corresponding facial action image; the image corresponding to the resulting matrix is the key point label image. It can be understood that the image matrix of the facial action image corresponding to the facial feature key point coordinates is acquired, and the similarity transformation matrix is multiplied by this image matrix, that is, a matrix multiplication is performed. The multiplication yields a new image matrix, and converting this new image matrix into an image gives the key point label image.
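The least-squares similarity fit above can be sketched for 2-D key points as follows. This is a minimal illustration of one common closed-form solution; the function name and the (a, b, tx, ty) parameterization are assumptions for illustration, not the patent's notation.

```python
def estimate_similarity(src, dst):
    """Least-squares similarity transform (scale + rotation + translation)
    mapping 2-D points src onto dst. Returns (a, b, tx, ty) such that
    x' = a*x - b*y + tx and y' = b*x + a*y + ty."""
    n = len(src)
    sx = sum(p[0] for p in src); sy = sum(p[1] for p in src)
    su = sum(q[0] for q in dst); sv = sum(q[1] for q in dst)
    sxx = sum(x * x + y * y for x, y in src)
    sxu = sum(x * u + y * v for (x, y), (u, v) in zip(src, dst))
    sxv = sum(x * v - y * u for (x, y), (u, v) in zip(src, dst))
    # normal equations of the least-squares problem, solved in closed form
    denom = n * sxx - sx * sx - sy * sy
    a = (n * sxu - sx * su - sy * sv) / denom
    b = (n * sxv + sy * su - sx * sv) / denom
    tx = (su - a * sx + b * sy) / n
    ty = (sv - b * sx - a * sy) / n
    return a, b, tx, ty
```

In practice the resulting 2x3 matrix [[a, -b, tx], [b, a, ty]] would be applied to the whole image by a warp operation, which is the "multiplication with the image matrix" described in the text.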
在一些实施例中 In some embodiments, as shown in FIG. 3, detecting the angle of the face in the key point label image according to a preset standard image to obtain a multi-task label image including an angle label includes the following steps:
步骤S302 Step S302: Acquire the face key point coordinates of the preset standard image.
步骤S304 Step S304: Use the face detection model in the dlib library to perform face detection on the key point label image to obtain the face key point coordinates of the key point label image.
步骤S306 Step S306: Perform an angle calculation according to the face key point coordinates of the preset standard image and the face key point coordinates of the key point label image to obtain the rotation angle of the face in the key point label image.
步骤S308 Step S308: Determine an angle label according to the rotation angle to obtain a multi-task label image including the angle label.
具体地 Specifically, the preset standard image is a predefined face image containing 68 face key points. The predefined face image containing the 68 face key point coordinates is acquired, and the 68 face key point coordinates are obtained from it. At the same time, the face detection model in the dlib library is used to perform face detection on the obtained key point label image to obtain the face key point coordinates in the key point label image, that is, a total of 68 face key point coordinates. The solvePnP function in the OpenCV toolkit is used to perform an angle calculation on the face key point coordinates of the preset standard image and the face key point coordinates of the key point label image to obtain the rotation angle of the face in the key point label image. The obtained rotation is converted into the corresponding Euler angles, which give the angle label of the face angle; the obtained angle label is attached to the corresponding key point label image to obtain a multi-task label image including the angle label. The predefined face model containing 68 face key point coordinates can be understood as the 68 face key point coordinates of a standard face model without any rotation. The 68 key points include the left eye corner, the right eye corner, the tip of the nose, the left mouth corner, the right mouth corner, the jaw, and so on, 68 points in total. In this embodiment, the angle of the face in the image is detected using the face key point coordinates of the standard face model, which is faster and more efficient than manual measurement and labeling.
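The rotation-to-Euler conversion mentioned above can be sketched as follows. This is a minimal illustration assuming the common ZYX composition for the 3x3 rotation matrix (which solvePnP's rotation vector expands to, e.g. via cv2.Rodrigues); the function name, the axis ordering, and which angle corresponds to yaw, pitch or roll are assumptions that depend on the chosen camera coordinate convention.

```python
import math

def rotation_to_euler(R):
    """Decompose a 3x3 rotation matrix into Euler angles (degrees) about
    the x, y and z axes, assuming the composition R = Rz @ Ry @ Rx."""
    sy = math.sqrt(R[0][0] ** 2 + R[1][0] ** 2)
    if sy > 1e-6:
        rx = math.atan2(R[2][1], R[2][2])
        ry = math.atan2(-R[2][0], sy)
        rz = math.atan2(R[1][0], R[0][0])
    else:  # gimbal lock: ry is close to +/-90 degrees, rz is not recoverable
        rx = math.atan2(-R[1][2], R[1][1])
        ry = math.atan2(-R[2][0], sy)
        rz = 0.0
    return tuple(math.degrees(v) for v in (rx, ry, rz))
```

The three resulting angles are what the text refers to as the angle label attached to the key point label image.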
在一些实施例中 In some embodiments, inputting the multi-task label images into a preset residual neural network to perform multi-task training on the residual neural network, and using the trained residual neural network as the multi-task facial action recognition model, specifically includes: initializing the network parameters of the residual neural network; inputting the multi-task label images into the residual neural network in batches for forward propagation to obtain network output values; calculating a weighted loss value based on preset loss functions, weighting coefficients, and the network output values; performing back propagation according to the weighted loss value to obtain gradient values of the network parameters of the residual neural network; updating the network parameters of the residual neural network according to the gradient values; and returning to the step of inputting the multi-task label images into the residual neural network in batches for forward propagation until the weighted loss value no longer decreases, and using the trained residual neural network as the multi-task facial action recognition model.
多任务训练 Multi-task training means that multiple related tasks are trained and learned together, ensuring that the resulting model can recognize multiple tasks at the same time. In this embodiment, the multi-task facial action model can simultaneously perform detection and recognition for three tasks: facial action recognition, key point regression, and face angle prediction.
具体地 Specifically, the Xavier method, an effective neural network initialization method, is used to initialize the network parameters of each layer in the preset residual neural network. After the initial network parameters of the residual neural network are determined, the training image set is input into the residual neural network in batches, that is, the multi-task label images are fed to the network batch by batch; in this embodiment, the batch size is preferably 128. It can be understood that the multi-task label images are input, 128 images at a time, into the residual neural network whose parameters have been initialized, and the feature layers and classification layer of the residual neural network perform forward propagation on the input multi-task label images based on a preset learning rate to obtain the corresponding network output values. The learning rate is set in advance; it includes, but is not limited to, 0.001, 0.0001, and so on, and can be set according to the actual situation. It can be understood that both the feature layers and the classification layer of the residual neural network learn with the preset learning rate.
The residual neural network calculates the weighted loss value of this training pass according to the preset loss functions and weighting coefficients and the corresponding network output values, and performs back propagation based on the weighted loss value to obtain the gradient value of each network parameter; the network parameters are then updated according to the obtained gradient values. Next, the following batch of multi-task label images is input into the residual neural network with the updated network parameters, and the residual neural network again trains based on the preset learning rate. That is, when the second batch of multi-task label images is input, the residual neural network again performs forward propagation on the input images based on the learning rate, likewise obtains the corresponding network output values, calculates the weighted loss value, and then performs back propagation to update the network parameters once more. The above steps are repeated for iterative training until the weighted loss value no longer decreases. It can be understood that when the weighted loss value keeps changing, the network parameters of the neural network have not yet reached their optimal values, that is, further training is needed; when the weighted loss value no longer changes, the neural network has reached its optimum, and the residual neural network can be put into use as the multi-task facial action recognition model. In other words, after the second batch has been trained, if the weighted loss value has decreased compared with the first weighted loss value, the third batch of multi-task label images is input after the second parameter update, and so on until the weighted loss value no longer decreases.
It can be understood that the calculated weighted loss value tends toward 0; tending toward 0 means that the predicted values of the neural network approach the expected values, indicating that training of the neural network is complete.
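The stopping criterion above, iterating until the weighted loss no longer decreases, can be sketched as a small tracker. This is a minimal illustration; the class name, the patience of 3 checks, and the improvement tolerance are assumptions for illustration rather than values fixed by the original disclosure.

```python
class LossPlateauStopper:
    """Signal that training should stop once the weighted loss has not
    improved by at least min_delta for `patience` consecutive checks."""

    def __init__(self, patience=3, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss):
        """Record one weighted loss value; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0  # loss is still decreasing
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

A training loop would call stopper.step(weighted_loss) once per batch or epoch and break out of the iteration when it returns True.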
网络输出值 The network output values include predicted values and real labels. After the predicted values and the real labels are obtained, the loss functions can be used to compute the corresponding loss values. However, since this embodiment uses multi-task learning, the difference from general model training lies in the definition of the loss function: each subtask has its own loss function, so the loss function of the final model should be the weighted sum of the loss values of the subtasks. Suppose the loss functions of the facial action recognition task, the face angle prediction task, and the key point regression task in this embodiment are L_au, L_pose, and L_lm, respectively, and their weighting coefficients are λ_au, λ_pose, and λ_lm, respectively; then the loss function of the final model is:
L_total = λ_au · L_au + λ_pose · L_pose + λ_lm · L_lm
在本实施例中 In this embodiment, since the facial action recognition task is the main task and the face angle prediction task and key point regression are auxiliary tasks, λ_au is preferably set to 1, and λ_pose and λ_lm are each set to 0.5. Through multi-task joint training, the two auxiliary tasks have a certain correlation with the main task, and by setting the weighting coefficients they can serve as optimization targets together. Moreover, because the auxiliary tasks include geometric information such as position and angle, the model can learn the related information and thus improve its generalization ability, thereby improving the recognition accuracy of the main task.
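The weighted total loss can be written out directly; the sketch below uses the preferred coefficients from this embodiment (1, 0.5, 0.5) as defaults. The function name is illustrative, and in a real training loop each argument would itself come from a task-specific loss function.

```python
def total_loss(l_au, l_pose, l_lm, lam_au=1.0, lam_pose=0.5, lam_lm=0.5):
    """Weighted multi-task loss:
    L_total = lam_au * L_au + lam_pose * L_pose + lam_lm * L_lm.
    l_au is the main facial-action loss; l_pose and l_lm are the auxiliary
    face-angle and key-point regression losses."""
    return lam_au * l_au + lam_pose * l_pose + lam_lm * l_lm

print(total_loss(2.0, 1.0, 1.0))  # 1*2.0 + 0.5*1.0 + 0.5*1.0 = 3.0
```

Back propagation on this single scalar trains all three task heads jointly, which is what makes the auxiliary tasks act as shared regularizers for the main task.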
在一些实施例中 In some embodiments, as shown in FIG. 4, collecting the facial action recognition data set includes the following steps:
步骤S402 Step S402: Acquire the collected uniform resource locator.
步骤S404 Step S404: Crawl according to the uniform resource locator to obtain the facial action recognition data set.
统一资源定位符 A uniform resource locator (URL) is a concise identification of the location of, and access method for, a resource available on the Internet; it is the address of a standard resource on the Internet, and every file on the Internet has a unique URL.
具体地 Specifically, when the facial action recognition data set needs to be acquired, a crawler can obtain the corresponding facial action recognition data set through the uniform resource locator. The uniform resource locator may be pre-configured, or it may be received from a terminal.
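A minimal crawler helper for this step might first extract image links from a page before downloading them. The sketch below uses only the Python standard library to pull img sources out of HTML; the class name and URLs are illustrative assumptions, and a real crawler would additionally fetch each link (e.g. with urllib.request) and handle errors and robots rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageLinkExtractor(HTMLParser):
    """Collect absolute URLs of <img> tags found in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # resolve relative links against the page URL
                self.image_urls.append(urljoin(self.base_url, src))

# hypothetical page and URL, for illustration only
html = '<html><body><img src="/faces/001.jpg"></body></html>'
parser = ImageLinkExtractor("http://dataset.example/page")
parser.feed(html)
print(parser.image_urls)
```

Each collected URL would then be downloaded and saved as one sample of the facial action recognition data set.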
在一些实施例中 In some embodiments, after the multi-task facial action recognition model is obtained through training, it can be used for facial action recognition. Specifically, a facial action image to be recognized is acquired and input into the multi-task facial action recognition model. The model extracts features from the facial action image to be recognized and classifies the features to determine the facial action label, key point label, and angle label of the image. This can be understood as recognizing facial actions and expressions, such as opening the mouth and closing the eyes, as well as the key points and the rotation angle of the face in the image to be recognized.
应该理解的是 It should be understood that although the steps in the flowcharts of FIGS. 2-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time, but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of other steps, or of the sub-steps or stages of other steps.
在一些实施例中 In some embodiments, as shown in FIG. 5, a multi-task facial action recognition model training apparatus is provided, including a collection module 502, a key point detection module 504, an angle detection module 506, and a training module 508. Specifically:
The collection module 502 is configured to collect a facial action recognition data set.
The key point detection module 504 is configured to perform face detection and face alignment on the facial action images in the facial action recognition data set to obtain key point label images.
The angle detection module 506 is configured to detect the angle of the face in the key point label images according to a preset standard image to obtain multi-task label images including angle labels. And
the training module 508 is configured to input the multi-task label images into a preset residual neural network to perform multi-task training on the residual neural network, and to use the trained residual neural network as the multi-task facial action recognition model.
在一些实施例中 In some embodiments, the key point detection module 504 is further configured to scale the facial action images in the facial action recognition data set and construct an image pyramid;
use a multi-task convolutional neural network to perform face detection on the image pyramid to obtain facial action images containing facial feature key point coordinates; and
perform face alignment processing on the corresponding facial action images based on the facial feature key point coordinates and preset template point coordinates to obtain the key point label images.
在一些实施例中 In some embodiments, the key point detection module 504 is further configured to use the multi-task convolutional neural network to perform feature extraction and bounding box calibration on the image pyramid to obtain a first feature map;
filter the calibrated bounding boxes in the first feature map to obtain a second feature map; and
obtain facial action images containing the facial feature key point coordinates according to the second feature map.
在一些实施例中 In some embodiments, the key point detection module 504 is further configured to acquire preset template point coordinates;
calculate a similarity transformation matrix between the facial feature key point coordinates and the template point coordinates; and
multiply the similarity transformation matrix by the matrix of the corresponding facial action image, the resulting image being the key point label image.
在一些实施例中 In some embodiments, the angle detection module 506 is further configured to acquire the face key point coordinates of the preset standard image;
use the face detection model in the dlib library to perform face detection on the key point label images to obtain the face key point coordinates of the key point label images;
perform an angle calculation according to the face key point coordinates of the preset standard image and the face key point coordinates of the key point label images to obtain the rotation angle of the face in the key point label images; and
determine angle labels according to the rotation angles to obtain the multi-task label images including the angle labels.
在一些实施例中 In some embodiments, the training module 508 is further configured to initialize the network parameters of the residual neural network;
input the multi-task label images into the residual neural network in batches for forward propagation to obtain network output values, and calculate a weighted loss value based on preset loss functions, weighting coefficients, and the network output values;
perform back propagation according to the weighted loss value to obtain gradient values of the network parameters of the residual neural network;
update the network parameters of the residual neural network according to the gradient values; and
return to the step of inputting the multi-task label images into the residual neural network in batches for forward propagation until the weighted loss value no longer decreases, and use the trained residual neural network as the multi-task facial action recognition model.
在一些实施例中 In some embodiments, the collection module 502 is further configured to acquire a collected uniform resource locator; and
crawl according to the uniform resource locator to obtain the facial action recognition data set.
在一些实施例中 In some embodiments, a multi-task facial action recognition apparatus is provided, including an acquisition module and a recognition module. Specifically:
the acquisition module is configured to acquire a facial action image to be recognized; and
the recognition module is configured to use a multi-task facial action recognition model, trained by the multi-task facial action recognition model training method provided in any one of the above embodiments, to recognize the facial action image to be recognized to obtain a recognition result, the recognition result including an action label, a key point label, and an angle label.
关于多任务面部动作识别模型训练装置 For the specific limitations of the multi-task facial action recognition model training apparatus and the multi-task facial action recognition apparatus, reference may be made to the above limitations of the multi-task facial action recognition model training method and the multi-task facial action recognition method, which will not be repeated here. Each module in the above multi-task facial action recognition model training apparatus and multi-task facial action recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the above modules.
在一些实施例中 In some embodiments, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, a multi-task facial action recognition model training method and a multi-task facial action recognition method are implemented.
本领域技术人员可以理解 Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied. The specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
一种计算机设备 A computer device includes a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the processor, implement the steps of the multi-task facial action recognition model training method and the steps of the multi-task facial action recognition method provided in any one of the embodiments of the present application.
一个或多个存储介质 One or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the multi-task facial action recognition model training method and the steps of the multi-task facial action recognition method provided in any one of the embodiments of the present application.
本领域普通技术人员可以理解 A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
以上实施例的各技术特征 The technical features of the above embodiments can be combined arbitrarily. For the sake of concise description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should be considered within the scope of this specification.
以上所述实施例 The above embodiments only express several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the scope of the invention patent. It should be pointed out that, for those of ordinary skill in the art, several modifications and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this application patent shall be subject to the appended claims.

Claims (20)

  1. A multi-task facial motion recognition model training method, comprising:
    collecting a facial motion recognition data set;
    performing face detection and face alignment on facial motion images in the facial motion recognition data set to obtain key point label images;
    detecting an angle of a face in each key point label image according to a preset standard image to obtain multi-task label images comprising angle labels; and
    inputting the multi-task label images into a preset residual neural network to perform multi-task training on the residual neural network, and using the trained residual neural network as a multi-task facial motion recognition model.
  2. The method according to claim 1, wherein the performing face detection and face alignment on the facial motion images in the facial motion recognition data set to obtain key point label images comprises:
    scaling the facial motion images in the facial motion recognition data set and constructing an image pyramid;
    performing face detection on the image pyramid by using a multi-task convolutional neural network to obtain facial motion images containing facial feature key point coordinates; and
    performing face alignment processing on the corresponding facial motion images based on the facial feature key point coordinates and preset template point coordinates to obtain the key point label images.
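By way of illustration only (this sketch is not part of the claims), the image-pyramid construction recited in the claim above amounts to repeatedly downscaling each image until its shorter side falls below a minimum detection size. The scale factor of 0.709 (roughly 1/√2, halving the area each step) is a common MTCNN-style default assumed here, not a value fixed by the claims.

```python
import numpy as np

def build_image_pyramid(image, min_size=12, scale_factor=0.709):
    """Collect successively downscaled copies of `image` until the
    shorter side of the next level would fall below `min_size`.

    min_size=12 and scale_factor=0.709 are illustrative assumptions."""
    pyramid = []
    h, w = image.shape[:2]
    scale = 1.0
    while min(h, w) * scale >= min_size:
        new_h, new_w = int(h * scale), int(w * scale)
        # Nearest-neighbour resize via index sampling (a stand-in for a
        # proper interpolating resize such as cv2.resize).
        rows = (np.arange(new_h) * h / new_h).astype(int)
        cols = (np.arange(new_w) * w / new_w).astype(int)
        pyramid.append(image[rows][:, cols])
        scale *= scale_factor
    return pyramid
```

Each pyramid level is then fed to the detector, so that faces of different sizes all fall within the network's fixed receptive field at some scale.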
  3. The method according to claim 2, wherein the performing face detection on the image pyramid by using a multi-task convolutional neural network to obtain facial motion images containing facial feature key point coordinates comprises:
    performing feature extraction and bounding box calibration on the image pyramid by using the multi-task convolutional neural network to obtain a first feature map;
    filtering the calibrated bounding boxes in the first feature map to obtain a second feature map; and
    obtaining, according to the second feature map, facial motion images containing the facial feature key point coordinates.
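The filtering of calibrated bounding boxes recited above is, in MTCNN-style detectors, commonly implemented as greedy non-maximum suppression (NMS). The following sketch assumes that choice for illustration; the claims do not fix the filtering method.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    discard remaining boxes whose IoU with it exceeds `iou_threshold`,
    and repeat.  `boxes` is an (N, 4) array of [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with all remaining candidates.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep
```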
  4. The method according to claim 2, wherein the performing face alignment processing on the corresponding facial motion images based on the facial feature key point coordinates and the preset template point coordinates to obtain the key point label images comprises:
    obtaining the preset template point coordinates;
    computing a similarity transformation matrix between the facial feature key point coordinates and the template point coordinates; and
    multiplying the similarity transformation matrix by the matrix of the corresponding facial motion image, the resulting image being a key point label image.
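The similarity transformation (rotation, uniform scale, and translation) between the detected key points and the template points can be estimated in closed form; the classical Umeyama least-squares solution is sketched below as one illustrative formulation, which the claims do not fix.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform mapping `src` points onto
    `dst` points (the Umeyama solution).  Points are rows of (N, 2)
    arrays.  Returns a 2x3 matrix M with dst ≈ src @ M[:, :2].T + M[:, 2]."""
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)          # cross-covariance matrix
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt                            # optimal rotation
    scale = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = dst_mean - scale * R @ src_mean       # optimal translation
    return np.hstack([scale * R, t[:, None]])
```

In practice the resulting 2×3 matrix would be applied to the image with an image-warping routine (e.g. `cv2.warpAffine`) rather than a bare matrix product.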
  5. The method according to claim 1, wherein the detecting the angle of the face in the key point label image according to the preset standard image to obtain the multi-task label images comprising angle labels comprises:
    obtaining face key point coordinates of the preset standard image;
    performing face detection on the key point label image by using the face detection model in the dlib library to obtain face key point coordinates of the key point label image;
    performing angle calculation according to the face key point coordinates of the preset standard image and the face key point coordinates of the key point label image to obtain a rotation angle of the face in the key point label image; and
    determining an angle label according to the rotation angle to obtain a multi-task label image comprising the angle label.
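A minimal illustration of the angle calculation above, under the simplifying assumption (not fixed by the claims) that the in-plane rotation is estimated from the line joining the two eye centres in the standard image and in the key point label image:

```python
import math

def face_rotation_angle(std_landmarks, img_landmarks):
    """In-plane rotation, in degrees, of a face relative to the frontal
    reference, from the angle of the eye-to-eye line.  Each argument is
    a dict with 'left_eye' and 'right_eye' (x, y) points; both the
    dict layout and the two-point heuristic are illustrative assumptions."""
    def eye_angle(lm):
        (x1, y1), (x2, y2) = lm['left_eye'], lm['right_eye']
        return math.atan2(y2 - y1, x2 - x1)
    return math.degrees(eye_angle(img_landmarks) - eye_angle(std_landmarks))
```

The rotation angle (or a binned version of it) then serves directly as the angle label of the multi-task label image.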
  6. The method according to claim 1, wherein the inputting the multi-task label images into the preset residual neural network to perform multi-task training on the residual neural network and using the trained residual neural network as the multi-task facial motion recognition model comprises:
    initializing network parameters of the residual neural network;
    inputting the multi-task label images into the residual neural network in batches for forward propagation to obtain network output values;
    calculating a weighted loss value based on a preset loss function, weighting coefficients, and the network output values;
    performing back propagation according to the weighted loss value to obtain gradient values of the network parameters of the residual neural network;
    updating the network parameters of the residual neural network according to the gradient values; and
    returning to the step of inputting the multi-task label images into the residual neural network in batches for forward propagation until the weighted loss value no longer decreases, and using the trained residual neural network as the multi-task facial motion recognition model.
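The training loop recited above (forward propagation, weighted loss, back propagation, parameter update, repeated until the loss stops decreasing) can be illustrated on a toy model. The single linear layer with one output per task and the quadratic per-task loss below are simplifying assumptions standing in for the residual neural network and preset loss function of the claims.

```python
import numpy as np

def weighted_multitask_step(W, x, targets, weights, lr=0.1):
    """One forward/backward pass of a toy multi-task 'network': a
    shared linear layer W whose rows act as per-task heads, trained on
    a weighted sum of squared-error losses.

    Returns the updated parameters and the weighted loss value, so a
    caller can stop once the loss no longer decreases."""
    outputs = W @ x                                     # forward propagation
    errors = outputs - targets
    loss = float(np.sum(weights * errors ** 2))         # weighted loss value
    grad = 2 * (weights * errors)[:, None] * x[None, :] # back propagation: dL/dW
    return W - lr * grad, loss                          # gradient update
```

A caller would iterate this step over batches, monitoring the returned loss and stopping once it plateaus, mirroring the "until the weighted loss value no longer decreases" criterion.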
  7. The method according to claim 1, wherein the collecting the facial motion recognition data set comprises:
    obtaining collected uniform resource locators; and
    crawling according to the uniform resource locators to obtain the facial motion recognition data set.
  8. A multi-task facial motion recognition method, comprising:
    obtaining a facial motion image to be recognized; and
    recognizing the facial motion image to be recognized by using a multi-task facial motion recognition model trained by the multi-task facial motion recognition model training method according to any one of claims 1 to 7 to obtain a recognition result, the recognition result comprising an action label, a key point label, and an angle label.
  9. A multi-task facial motion recognition model training apparatus, comprising:
    a collection module, configured to collect a facial motion recognition data set;
    a key point detection module, configured to perform face detection and face alignment on facial motion images in the facial motion recognition data set to obtain key point label images;
    an angle detection module, configured to detect an angle of a face in each key point label image according to a preset standard image to obtain multi-task label images comprising angle labels; and
    a training module, configured to input the multi-task label images into a preset residual neural network to perform multi-task training on the residual neural network, and to use the trained residual neural network as a multi-task facial motion recognition model.
  10. A multi-task facial motion recognition apparatus, comprising:
    an obtaining module, configured to obtain a facial motion image to be recognized; and
    a recognition module, configured to recognize the facial motion image to be recognized by using a multi-task facial motion recognition model trained by the multi-task facial motion recognition model training method according to any one of the above, to obtain a recognition result, the recognition result comprising an action label, a key point label, and an angle label.
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    collecting a facial motion recognition data set;
    performing face detection and face alignment on facial motion images in the facial motion recognition data set to obtain key point label images;
    detecting an angle of a face in each key point label image according to a preset standard image to obtain multi-task label images comprising angle labels; and
    inputting the multi-task label images into a preset residual neural network to perform multi-task training on the residual neural network, and using the trained residual neural network as a multi-task facial motion recognition model.
  12. The computer device according to claim 11, wherein when executing the computer-readable instructions, the one or more processors further perform the following steps:
    scaling the facial motion images in the facial motion recognition data set and constructing an image pyramid;
    performing face detection on the image pyramid by using a multi-task convolutional neural network to obtain facial motion images containing facial feature key point coordinates; and
    performing face alignment processing on the corresponding facial motion images based on the facial feature key point coordinates and preset template point coordinates to obtain the key point label images.
  13. The computer device according to claim 12, wherein when executing the computer-readable instructions, the one or more processors further perform the following steps:
    performing feature extraction and bounding box calibration on the image pyramid by using the multi-task convolutional neural network to obtain a first feature map;
    filtering the calibrated bounding boxes in the first feature map to obtain a second feature map; and
    obtaining, according to the second feature map, facial motion images containing the facial feature key point coordinates.
  14. The computer device according to claim 12, wherein when executing the computer-readable instructions, the one or more processors further perform the following steps:
    obtaining the preset template point coordinates;
    computing a similarity transformation matrix between the facial feature key point coordinates and the template point coordinates; and
    multiplying the similarity transformation matrix by the matrix of the corresponding facial motion image, the resulting image being a key point label image.
  15. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    collecting a facial motion recognition data set;
    performing face detection and face alignment on facial motion images in the facial motion recognition data set to obtain key point label images;
    detecting an angle of a face in each key point label image according to a preset standard image to obtain multi-task label images comprising angle labels; and
    inputting the multi-task label images into a preset residual neural network to perform multi-task training on the residual neural network, and using the trained residual neural network as a multi-task facial motion recognition model.
  16. The storage media according to claim 15, wherein when the computer-readable instructions are executed by the one or more processors, the following steps are further performed:
    scaling the facial motion images in the facial motion recognition data set and constructing an image pyramid;
    performing face detection on the image pyramid by using a multi-task convolutional neural network to obtain facial motion images containing facial feature key point coordinates; and
    performing face alignment processing on the corresponding facial motion images based on the facial feature key point coordinates and preset template point coordinates to obtain the key point label images.
  17. The storage media according to claim 16, wherein when the computer-readable instructions are executed by the one or more processors, the following steps are further performed:
    performing feature extraction and bounding box calibration on the image pyramid by using the multi-task convolutional neural network to obtain a first feature map;
    filtering the calibrated bounding boxes in the first feature map to obtain a second feature map; and
    obtaining, according to the second feature map, facial motion images containing the facial feature key point coordinates.
  18. The storage media according to claim 16, wherein when the computer-readable instructions are executed by the one or more processors, the following steps are further performed:
    obtaining the preset template point coordinates;
    computing a similarity transformation matrix between the facial feature key point coordinates and the template point coordinates; and
    multiplying the similarity transformation matrix by the matrix of the corresponding facial motion image, the resulting image being a key point label image.
  19. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining a facial motion image to be recognized; and
    recognizing the facial motion image to be recognized by using a multi-task facial motion recognition model trained by the multi-task facial motion recognition model training method according to any one of the above, to obtain a recognition result, the recognition result comprising an action label, a key point label, and an angle label.
  20. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a facial motion image to be recognized; and
    recognizing the facial motion image to be recognized by using a multi-task facial motion recognition model trained by the multi-task facial motion recognition model training method according to any one of the above, to obtain a recognition result, the recognition result comprising an action label, a key point label, and an angle label.
PCT/CN2019/116615 2019-10-12 2019-11-08 Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium WO2021068323A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910969054.1A CN110889325B (en) 2019-10-12 2019-10-12 Multitasking facial motion recognition model training and multitasking facial motion recognition method
CN201910969054.1 2019-10-12

Publications (1)

Publication Number Publication Date
WO2021068323A1 true WO2021068323A1 (en) 2021-04-15

Family

ID=69746096

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116615 WO2021068323A1 (en) 2019-10-12 2019-11-08 Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110889325B (en)
WO (1) WO2021068323A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712460A (en) * 2020-12-09 2021-04-27 杭州妙绘科技有限公司 Portrait generation method and device, electronic equipment and medium
CN113221771A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Living body face recognition method, living body face recognition device, living body face recognition equipment, storage medium and program product
CN113239858A (en) * 2021-05-28 2021-08-10 西安建筑科技大学 Face detection model training method, face recognition method, terminal and storage medium
CN113313010A (en) * 2021-05-26 2021-08-27 广州织点智能科技有限公司 Face key point detection model training method, device and equipment
CN113449694A (en) * 2021-07-24 2021-09-28 福州大学 Android-based certificate compliance detection method and system
CN113591573A (en) * 2021-06-28 2021-11-02 北京百度网讯科技有限公司 Training and target detection method and device for multi-task learning deep network model
CN113610042A (en) * 2021-08-18 2021-11-05 睿云联(厦门)网络通讯技术有限公司 Face recognition living body detection method based on pre-training picture residual error
CN114581969A (en) * 2022-01-21 2022-06-03 厦门大学 Face position information-based standing and sitting motion detection method
CN115223220A (en) * 2022-06-23 2022-10-21 北京邮电大学 Face detection method based on key point regression
CN116895047A (en) * 2023-07-24 2023-10-17 北京全景优图科技有限公司 Rapid people flow monitoring method and system

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597884A (en) * 2020-04-03 2020-08-28 平安科技(深圳)有限公司 Facial action unit identification method and device, electronic equipment and storage medium
CN111639537A (en) * 2020-04-29 2020-09-08 深圳壹账通智能科技有限公司 Face action unit identification method and device, electronic equipment and storage medium
CN111626246B (en) * 2020-06-01 2022-07-15 浙江中正智能科技有限公司 Face alignment method under mask shielding
CN111882717A (en) * 2020-07-30 2020-11-03 缪加加 Intelligent grounding box with identity recognition function
CN112861926B (en) * 2021-01-18 2023-10-31 平安科技(深圳)有限公司 Coupled multi-task feature extraction method and device, electronic equipment and storage medium
CN113011279A (en) * 2021-02-26 2021-06-22 清华大学 Method and device for recognizing mucosa contact action, computer equipment and storage medium
US20220301298A1 (en) * 2021-03-17 2022-09-22 Google Llc Multi-task self-training for learning general representations
CN112926553B (en) * 2021-04-25 2021-08-13 北京芯盾时代科技有限公司 Training method and device for motion detection network
CN113743238A (en) * 2021-08-12 2021-12-03 浙江大华技术股份有限公司 Abnormal behavior detection method and device, electronic device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7697765B2 (en) * 2004-01-29 2010-04-13 Canon Kabushiki Kaisha Learning method and device for pattern recognition
CN103824049A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded neural network-based face key point detection method
CN105760836A (en) * 2016-02-17 2016-07-13 厦门美图之家科技有限公司 Multi-angle face alignment method based on deep learning and system thereof and photographing terminal
CN108197602A (en) * 2018-01-30 2018-06-22 厦门美图之家科技有限公司 A kind of convolutional neural networks generation method and expression recognition method
CN108830237A (en) * 2018-06-21 2018-11-16 北京师范大学 A kind of recognition methods of human face expression
CN109800648A (en) * 2018-12-18 2019-05-24 北京英索科技发展有限公司 Face datection recognition methods and device based on the correction of face key point
CN110163567A (en) * 2019-05-08 2019-08-23 长春师范大学 Classroom roll calling system based on multitask concatenated convolutional neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961006A (en) * 2019-01-30 2019-07-02 东华大学 A kind of low pixel multiple target Face datection and crucial independent positioning method and alignment schemes
CN110263673B (en) * 2019-05-31 2022-10-14 合肥工业大学 Facial expression recognition method and device, computer equipment and storage medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712460A (en) * 2020-12-09 2021-04-27 杭州妙绘科技有限公司 Portrait generation method and device, electronic equipment and medium
CN113221771B (en) * 2021-05-18 2023-08-04 北京百度网讯科技有限公司 Living body face recognition method, device, apparatus, storage medium and program product
CN113221771A (en) * 2021-05-18 2021-08-06 北京百度网讯科技有限公司 Living body face recognition method, living body face recognition device, living body face recognition equipment, storage medium and program product
CN113313010A (en) * 2021-05-26 2021-08-27 广州织点智能科技有限公司 Face key point detection model training method, device and equipment
CN113239858A (en) * 2021-05-28 2021-08-10 西安建筑科技大学 Face detection model training method, face recognition method, terminal and storage medium
CN113591573A (en) * 2021-06-28 2021-11-02 北京百度网讯科技有限公司 Training and target detection method and device for multi-task learning deep network model
CN113449694A (en) * 2021-07-24 2021-09-28 福州大学 Android-based certificate compliance detection method and system
CN113610042A (en) * 2021-08-18 2021-11-05 睿云联(厦门)网络通讯技术有限公司 Face recognition living body detection method based on pre-training picture residual error
CN113610042B (en) * 2021-08-18 2023-05-23 睿云联(厦门)网络通讯技术有限公司 Face recognition living body detection method based on pre-training picture residual error
CN114581969A (en) * 2022-01-21 2022-06-03 厦门大学 Face position information-based standing and sitting motion detection method
CN115223220A (en) * 2022-06-23 2022-10-21 北京邮电大学 Face detection method based on key point regression
CN115223220B (en) * 2022-06-23 2023-06-09 北京邮电大学 Face detection method based on key point regression
CN116895047A (en) * 2023-07-24 2023-10-17 北京全景优图科技有限公司 Rapid people flow monitoring method and system
CN116895047B (en) * 2023-07-24 2024-01-30 北京全景优图科技有限公司 Rapid people flow monitoring method and system

Also Published As

Publication number Publication date
CN110889325B (en) 2023-05-23
CN110889325A (en) 2020-03-17

Similar Documents

Publication Publication Date Title
WO2021068323A1 (en) Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium
US20210012198A1 (en) Method for training deep neural network and apparatus
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
US10572072B2 (en) Depth-based touch detection
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN110135406B (en) Image recognition method and device, computer equipment and storage medium
US20210182537A1 (en) Method and apparatus for detecting facial key points, computer device, and storage medium
WO2020182121A1 (en) Expression recognition method and related device
CN107808129B (en) Face multi-feature point positioning method based on single convolutional neural network
US20200074227A1 (en) Neural network-based action detection
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
WO2021068325A1 (en) Facial action recognition model training method, facial action recognition method and apparatus, computer device, and storage medium
WO2021238548A1 (en) Region recognition method, apparatus and device, and readable storage medium
WO2023284182A1 (en) Training method for recognizing moving target, method and device for recognizing moving target
WO2021047587A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
CN111680550B (en) Emotion information identification method and device, storage medium and computer equipment
US11562489B2 (en) Pixel-wise hand segmentation of multi-modal hand activity video dataset
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
WO2023083030A1 (en) Posture recognition method and related device
WO2023109361A1 (en) Video processing method and system, device, medium and product
AU2014253687A1 (en) System and method of tracking an object
CN111507288A (en) Image detection method, image detection device, computer equipment and storage medium
Zhou et al. MTCNet: Multi-task collaboration network for rotation-invariance face detection
CN114332927A (en) Classroom hand-raising behavior detection method, system, computer equipment and storage medium
Raj et al. An improved human activity recognition technique based on convolutional neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19948664

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19948664

Country of ref document: EP

Kind code of ref document: A1