WO2019100608A1

WO2019100608A1 - Video capturing device, face recognition method, system, and computer-readable storage medium

Info

Publication number: WO2019100608A1
Application number: PCT/CN2018/076140
Authority: WO
Inventors: 陈林
Original assignee: 平安科技（深圳）有限公司
Priority date: 2017-11-21
Filing date: 2018-02-10
Publication date: 2019-05-31
Also published as: CN108038422A; CN108038422B

Abstract

The present application relates to a video capturing device, a face recognition method, a system, and a computer-readable storage medium, the face recognition method comprising: carrying out face detection on video data to obtain a face image; tracking the face image, and acquiring a sequence of face images; performing image quality evaluation on the sequence of face images, and acquiring a preset number of top-ranking face images; carrying out feature point positioning on the preset number of top-ranking face images, and carrying out calibration on the basis of the positioned face images; inputting the calibrated face images into a deep neural network model which is generated by training in advance, and acquiring an outputted face feature vector; and sending the face feature vector to a server so as to carry out a comparison operation with face images in samples in a face image sample library. According to the present application, the calculation pressure of the server during face recognition may be reduced, and the network transmission pressure may be decreased.

Description

摄像装置、人脸识别的方法、***及计算机可读存储介质Camera device, method and system for face recognition, and computer readable storage medium

优先权申明Priority claim

本申请基于巴黎公约申明享有2017年11月21日递交的申请号为CN 201711166813.8、名称为“摄像装置、人脸识别的方法及计算机可读存储介质”中国专利申请的优先权，该中国专利申请的整体内容以参考的方式结合在本申请中。The present application is based on the priority of the Chinese Patent Application entitled "Camera device, face recognition method and computer readable storage medium", filed on November 21, 2017, with the application number CN 201711166813.8, which is filed on November 21, 2017. The overall content is incorporated herein by reference.

技术领域Technical field

本申请涉及图像处理技术领域，尤其涉及一种摄像装置、人脸识别的方法、***及计算机可读存储介质。The present application relates to the field of image processing technologies, and in particular, to a camera device, a method and system for recognizing a face, and a computer readable storage medium.

背景技术Background technique

目前，现有1：N动态人脸识别***一般是使用一台服务器连接一路或多路网络摄像机，服务器通过网络从摄像机收取视频数据，对视频数据进行人脸识别，这样的集中式分析方案对服务器的计算压力很大，尤其当摄像机数量较大时通常一台服务器无法满足需求，需要构建服务器阵列，且在功耗和散热方面都会有很高的要求；另外，由于视频数据需要从摄像机传送到服务器，对网络的压力也比较大，并且这种压力会随着摄像机分辨率和画质的提升而上升。At present, the existing 1:N dynamic face recognition system generally uses one server to connect one or more network cameras, and the server collects video data from the camera through the network, and performs face recognition on the video data, such a centralized analysis scheme The computing pressure of the server is very large, especially when the number of cameras is large, usually one server can not meet the demand, the server array needs to be built, and there are high requirements in terms of power consumption and heat dissipation; in addition, since the video data needs to be transmitted from the camera To the server, the pressure on the network is also relatively large, and this pressure will rise as the resolution and quality of the camera increase.

发明内容Summary of the invention

本申请的目的在于提供一种摄像装置、人脸识别的方法、***及计算机可读存储介质，旨在减轻服务器在人脸识别时的计算压力，降低网络传输压力。The purpose of the present application is to provide an image capturing apparatus, a method and system for recognizing a face, and a computer readable storage medium, which aim to alleviate the calculation pressure of the server in face recognition and reduce the network transmission pressure.

为实现上述目的，本申请提供一种摄像装置，所述摄像装置包括存储器及与所述存储器连接的处理器，所述存储器中存储有可在所述处理器上运行的处理***，所述处理***被所述处理器执行时实现如下步骤：To achieve the above object, the present application provides an image pickup apparatus including a memory and a processor connected to the memory, wherein the memory stores a processing system operable on the processor, the processing The system implements the following steps when executed by the processor:

检测步骤：对视频数据进行人脸检测，得到人脸图像；Detection step: performing face detection on the video data to obtain a face image;

追踪步骤：对人脸图像进行追踪，获取一序列的人脸图像；Tracking step: tracking the face image to obtain a sequence of face images;

图像质量评分步骤：对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；Image quality scoring step: performing image quality scoring on the sequence of face images, and obtaining a preset number of face images with the highest score;

特征点定位步骤：对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；Feature point positioning step: performing feature point positioning on a preset number of face images with a higher score, and correcting based on the positioned face image;

特征向量输出步骤：将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；Feature vector output step: inputting the corrected face image into a depth neural network model generated by pre-training, and acquiring the output face feature vector;

传输步骤：将人脸特征向量发送给服务器，以执行与人脸图像样本库中样本中的人脸图像进行比对运算的步骤。Transmission step: transmitting the face feature vector to the server to perform a step of performing a comparison operation with the face image in the sample in the face image sample library.

为实现上述目的，本申请还提供一种人脸识别的方法，所述人脸识别的方法包括：To achieve the above object, the present application further provides a method for face recognition, and the method for face recognition includes:

S1，对视频数据进行人脸检测，得到人脸图像；S1, performing face detection on the video data to obtain a face image;

S2，对人脸图像进行追踪，获取一序列的人脸图像；S2, tracking the face image to obtain a sequence of face images;

S3，对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；S3, performing image quality scoring on the sequence of face images, and obtaining a preset number of face images with a higher score;

S4，对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；S4, performing feature point positioning on the preset number of face images with the highest score, and correcting the image based on the positioned face image;

S5，将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；S5, the corrected face image is input into a depth neural network model generated by pre-training, and the output face feature vector is obtained;

S6，将人脸特征向量发送给服务器，以执行与人脸图像样本库中样本中的人脸图像进行比对运算的步骤。S6: Send the face feature vector to the server to perform a step of performing a comparison operation with the face image in the sample in the face image sample library.

实现上述目的，本申请还提供一种人脸识别的***，所述人脸识别的***包括：To achieve the above object, the present application further provides a system for face recognition, the system for face recognition comprising:

检测模块，用于对视频数据进行人脸检测，得到人脸图像；a detecting module, configured to perform face detection on the video data to obtain a face image;

追踪模块，用于对人脸图像进行追踪，获取一序列的人脸图像；a tracking module for tracking a face image to obtain a sequence of face images;

评分模块，用于对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；a scoring module, configured to perform image quality scoring on the sequence of face images, and obtain a preset number of face images with a higher score;

校正模块，用于对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；a correction module, configured to perform feature point positioning on a preset number of face images that are scored first, and perform correction based on the positioned face image;

输入模块，用于将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；An input module, configured to input the corrected face image into a depth neural network model generated by pre-training, and obtain an output face feature vector;

发送模块，用于将人脸特征向量发送给服务器，以触发与人脸图像样本库中样本中的人脸图像进行比对运算的操作。The sending module is configured to send the face feature vector to the server to trigger an operation of comparing the face image in the sample in the face image sample library.

本申请还提供一种计算机可读存储介质，所述计算机可读存储介质上存储有处理***，所述处理***被处理器执行时实现步骤：The application further provides a computer readable storage medium having a processing system stored thereon, the processing system being implemented by a processor to implement the steps:

本申请的有益效果是：本申请每一摄像装置的处理一路视频数据，摄像装置除了采集视频外，还可以对视频进行人脸检测、追踪、图像质量评分、特征点定位及输入深度神经网络模型中，得到人脸特征向量，最后仅传输人脸特征向量给服务器，在摄像装置的数量较多时，能够大大减轻服务器的计算压力，不需要构建服务器阵列，同时，可以较大程度地降低网络传输压力，且网络传输压力并不会随着摄像装置分辨率和画质的提升而上升。The beneficial effects of the present application are: the processing of one video data per camera device of the present application, in addition to collecting video, the camera device can perform face detection, tracking, image quality scoring, feature point localization and input depth neural network model on the video. In the middle, the face feature vector is obtained, and finally only the face feature vector is transmitted to the server. When the number of camera devices is large, the calculation pressure of the server can be greatly reduced, and the server array does not need to be built, and the network transmission can be reduced to a large extent. Pressure, and network transmission pressure does not rise with the resolution and image quality of the camera.

附图说明DRAWINGS

图1为本申请各个实施例一可选的应用环境示意图；1 is a schematic diagram of an optional application environment of each embodiment of the present application;

图2是图1中摄像装置一实施例的硬件架构的示意图；2 is a schematic diagram of a hardware architecture of an embodiment of the camera device of FIG. 1;

图3为本申请人脸识别的方法一实施例的流程示意图。FIG. 3 is a schematic flowchart diagram of an embodiment of a method for recognizing a face of an applicant.

具体实施方式Detailed ways

为了使本申请的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本申请进行进一步详细说明。应当理解，此处所描述的具体实施例仅用以解释本申请，并不用于限定本申请。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without departing from the inventive scope are the scope of the present application.

需要说明的是，在本申请中涉及“第一”、“第二”等的描述仅用于描述目的，而不能理解为指示或暗示其相对重要性或者隐含指明所指示的技术特征的数量。由此，限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。另外，各个实施例之间的技术方案可以相互结合，但是必须是以本领域普通技术人员能够实现为基础，当技术方案的结合出现相互矛盾或无法实现时应当认为这种技术方案的结合不存在，也不在本申请要求的保护范围之内。It should be noted that the descriptions of "first", "second" and the like in the present application are for the purpose of description only, and are not to be construed as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. . Thus, features defining "first" or "second" may include at least one of the features, either explicitly or implicitly. In addition, the technical solutions between the various embodiments may be combined with each other, but must be based on the realization of those skilled in the art, and when the combination of the technical solutions is contradictory or impossible to implement, it should be considered that the combination of the technical solutions does not exist. Nor is it within the scope of protection required by this application.

参阅图1所示，是本申请人脸识别的方法的较佳实施例的应用环境示意图。该应用环境示意图包括摄像装置1及服务器2。多个摄像装置1可以分别通过网络、近场通信技术等适合的技术与服务器2进行数据交互。Referring to FIG. 1 , it is a schematic diagram of an application environment of a preferred embodiment of the method for face recognition of the present applicant. The application environment diagram includes a camera device 1 and a server 2. The plurality of imaging apparatuses 1 can perform data interaction with the server 2 through suitable technologies such as a network and a near field communication technology, respectively.

所述服务器2可以是单个网络服务器、多个网络服务器组成的服务器组或者基于云计算的由大量主机或者网络服务器构成的云，其中云计算是分布式计算的一种，由一群松散耦合的计算机集组成的一个超级虚拟计算机。The server 2 may be a single network server, a server group composed of a plurality of network servers, or a cloud-based cloud composed of a large number of hosts or network servers, wherein the cloud computing is a kind of distributed computing, and is a group of loosely coupled computers. A set of super virtual computers.

所述摄像装置1是一种常见的包括摄像头的、可动态采集图像的电子产品，且能够按照事先设定或者存储的指令，自动进行数值计算和/或信息处理的设备。The camera device 1 is a common electronic product that includes a camera and can dynamically acquire images, and can automatically perform numerical calculation and/or information processing according to preset or stored instructions.

结合参阅图2，在本实施例中，摄像装置1可包括，但不仅限于，可通过***总线相互通信连接的存储器11、处理器12、网络接口13及摄像头14，存储器11存储有可在处理器12上运行的处理***。需要指出的是，图2仅示出了具有组件11-14的摄像装置1，但是应理解的是，并不要求实施所有示出的组件，可以替代的实施更多或者更少的组件。Referring to FIG. 2, in the embodiment, the image capturing apparatus 1 may include, but is not limited to, a memory 11, a processor 12, a network interface 13, and a camera 14 that are communicably connected to each other through a system bus. The memory 11 is stored and processed. The processing system running on the device 12. It is to be noted that FIG. 2 only shows the camera device 1 having the components 11-14, but it should be understood that not all illustrated components are required to be implemented, and more or fewer components may be implemented instead.

其中，每一摄像装置1均包括处理器(处理器为用于处理图像的nvidia tx2芯片)，nvidia tx2芯片可以通过usb或csi或网络接口连接于摄像装置1 上，用以运行处理***。摄像装置1与服务器2之间通过网络连接，服务器2中存储有人脸图像样本库。摄像装置1安装于特定场所(例如办公场所、监控区域)，对进入该特定场所的目标实时拍摄得到视频，处理器对视频进行处理得到人脸特征向量，然后仅仅将人脸特征向量通过网络发送给服务器2，由服务器2基于人脸图像样本库进行比对，实现人脸识别。Each camera device 1 includes a processor (the processor is an nvidia tx2 chip for processing images), and the nvidia tx2 chip can be connected to the camera device 1 via a usb or csi or a network interface to run the processing system. The imaging device 1 and the server 2 are connected by a network, and the server 2 stores a database of human face image samples. The camera device 1 is installed in a specific place (for example, an office place and a monitoring area), and a video is captured in real time for a target entering the specific place, and the processor processes the video to obtain a face feature vector, and then only sends the face feature vector through the network. To the server 2, the server 2 performs comparison based on the face image sample library to implement face recognition.

其中，存储器11包括内存及至少一种类型的可读存储介质。内存为摄像装置1的运行提供缓存；可读存储介质可为如闪存、硬盘、多媒体卡、卡型存储器(例如，SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等的非易失性存储介质。在一些实施例中，可读存储介质可以是摄像装置1的内部存储单元，例如该摄像装置1的硬盘；在另一些实施例中，该非易失性存储介质也可以是摄像装置1的外部存储设备，例如摄像装置1上配备的插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(Secure Digital,SD)卡，闪存卡(Flash Card)等。本实施例中，存储器11的可读存储介质通常用于存储安装于摄像装置1的操作***和各类应用软件，例如本申请一实施例中的处理***的程序代码等。此外，存储器11还可以用于暂时地存储已经输出或者将要输出的各类数据。The memory 11 includes a memory and at least one type of readable storage medium. The memory provides a cache for the operation of the camera device 1; the readable storage medium may be, for example, a flash memory, a hard disk, a multimedia card, a card type memory (eg, SD or DX memory, etc.), a random access memory (RAM), a static random access memory (SRAM). A non-volatile storage medium such as a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a programmable read only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, or the like. In some embodiments, the readable storage medium may be an internal storage unit of the camera 1, such as a hard disk of the camera 1; in other embodiments, the non-volatile storage medium may also be external to the camera 1. The storage device is, for example, a plug-in hard disk provided on the camera 1, a smart memory card (SMC), a Secure Digital (SD) card, a flash card, or the like. In this embodiment, the readable storage medium of the memory 11 is generally used to store an operating system installed in the image pickup apparatus 1 and various types of application software, such as program codes of the processing system in an embodiment of the present application. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.

本实施例中，所述处理器12用于运行所述存储器11中存储的程序代码或者处理数据，例如运行处理***等。In this embodiment, the processor 12 is configured to run program code or process data stored in the memory 11, such as running a processing system or the like.

所述网络接口13可包括无线网络接口或有线网络接口，该网络接口13通常用于在所述摄像装置1与其他电子设备之间建立通信连接。本实施例中，网络接口13主要用于将摄像装置1与服务器2相连，在摄像装置1与服务器2之间建立数据传输通道和通信连接。The network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the camera device 1 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the camera device 1 to the server 2, and establish a data transmission channel and a communication connection between the camera device 1 and the server 2.

所述处理***存储在存储器11中，包括至少一个存储在存储器11中的计算机可读指令，该至少一个计算机可读指令可被处理器器12执行，以实现本申请各实施例的方法；以及，该至少一个计算机可读指令依据其各部分所实现的功能不同，可被划为不同的逻辑模块。The processing system is stored in the memory 11 and includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement the methods of various embodiments of the present application; The at least one computer readable instruction can be classified into different logic modules depending on the functions implemented by its various parts.

在一实施例中，上述处理***被所述处理器12执行时实现如下步骤：In an embodiment, when the processing system is executed by the processor 12, the following steps are implemented:

对视频数据中的每一帧图像基于人脸的特征进行人脸检测，每一帧图像中可能有一个或多个人脸，或者没有人脸，在进行人脸检测后，可以从图像中提取得到人脸图像。Face detection is performed on each frame image in the video data based on the features of the face, and there may be one or more faces in each frame image, or no face, and after face detection, the image may be extracted from the image. Face image.

其中，人脸图像为仅包括人脸区域的图像(无其他背景)，人脸区域可大可小，对于远景拍摄的人脸图像，其人脸区域小，对于近景拍摄的人脸图像，其人脸区域大。人脸区域为包括人脸的最小区域，优选为包括人脸的矩形区域，当然也可以是其他形状的包括人脸的区域，例如圆形区域等，此处不做过多限定。The face image is an image including only a face region (no other background), and the face region can be large or small. For a face image taken in a distant scene, the face region is small, and for a close-up shot of a face image, The face area is large. The face area is a minimum area including a human face, and is preferably a rectangular area including a human face. Of course, it may be an area including a human face of other shapes, such as a circular area, and is not limited thereto.

本实施例中，在人脸追踪时，可以计算相邻两帧人脸图像的相似度，以实现人脸追踪。在一实施例中，可以根据相邻两帧人脸图像中的人脸区域中心点的X、Y坐标值计算人脸的相似度；在其他实施例中，可以根据相邻两帧人脸图像中的人脸区域中心点的X、Y坐标值，以及人脸区域的高度H、宽度W值，计算得到该相邻两帧人脸图像中人脸的相似度。基于相邻两帧人脸图像中人脸的相似度进行人脸追踪，得到同一人的一序列的人脸图像，对于人脸图像中出现两个或两个以上的人的，也可以分别得到各人对应的一序列的人脸图像。In this embodiment, when face tracking is performed, the similarity of the adjacent two frames of the face image can be calculated to implement face tracking. In an embodiment, the similarity of the face may be calculated according to the X and Y coordinate values of the center point of the face region in the adjacent two frames of the face image; in other embodiments, the face image of the adjacent two frames may be used. The X, Y coordinate values of the center point of the face region in the face region, and the height H and the width W value of the face region are calculated, and the similarity of the faces in the adjacent two frames of the face image is calculated. The face tracking is performed based on the similarity of the faces in the adjacent two frames of the face image, and a sequence of face images of the same person is obtained, and two or more persons appearing in the face image may also be respectively obtained. A sequence of face images corresponding to each person.

其中，对每一序列的人脸图像进行图像质量评分时，根据人脸图像中预定的点的梯度值及坐标值对该系列中的每张人脸图像的质量进行评分。Wherein, when the image quality of each sequence of face images is scored, the quality of each face image in the series is scored according to the gradient values and coordinate values of predetermined points in the face image.

其中，预定的点包括眼部点、鼻部点及嘴部点，预定的点的梯度值为平均梯度(meangradient)，平均梯度指人脸图像的预定的点的边界或影线两侧附近灰度有明显差异，即灰度变化率大，这种变化率的大小可用来表示图像清晰度，反映了预定的点微小细节反差变化的速率，即预定的点多维方向上密度变化的速率，表征人脸图像的相对清晰程度。预定的点的坐标值至少包括眼部点及鼻部点的x横坐标。Wherein, the predetermined points include an eye point, a nose point and a mouth point, the gradient value of the predetermined point is a mean gradient, and the average gradient refers to a boundary of a predetermined point of the face image or a gray near the sides of the shadow line. There is a significant difference in the degree of gray scale change, which can be used to indicate the sharpness of the image, reflecting the rate of change of the minute detail contrast of the predetermined point, that is, the rate of density change in the multi-dimensional direction of the predetermined point, and the characterization The relative clarity of the face image. The coordinate values of the predetermined points include at least the x-axis of the eye point and the nose point.

在对该系列中的每张人脸图像的质量进行评分的过程中，拍摄得到的人脸图像中，双眼之间距离越大、双眼中心点与鼻尖的x横坐标越接近，平均梯度值越大，图像的评分就越高，表示人脸图像为正脸图像的概率越大。In the process of scoring the quality of each face image in the series, in the captured face image, the greater the distance between the eyes, the closer the x-axis of the center point of the eyes to the tip of the nose, and the average gradient value is Large, the higher the score of the image, the greater the probability that the face image is a positive face image.

本实施例中，对每一序列的人脸图像，为了方便选出正脸的人脸图像，将评分结果降序排列，即人脸图像为正脸图像的排列在前，从排列的序列中选取评分靠前的预设数量的人脸图像，例如选取7张人脸图像。In this embodiment, for each sequence of face images, in order to conveniently select a face image of a positive face, the scoring results are arranged in descending order, that is, the face image is arranged in front of the face image, and the sequence is selected from the arranged sequence. The preset number of face images of the top score, for example, 7 face images are selected.

对于对评分靠前的预设数量的人脸图像进行特征点定位，特征点至少包括眼部特征点、嘴部特征点、鼻部特征点，基于特征点定位后的人脸图像进行校正。For feature points positioning on a preset number of face images, the feature points include at least an eye feature point, a mouth feature point, and a nose feature point, and are corrected based on the face image after the feature point is positioned.

本实施例中，将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，通过深度神经网络模型对其进行计算后输出每一人脸图像的人脸特征向量，然后摄像装置仅仅将人脸特征向量传输至服务器端进行1:N动态识别。In this embodiment, the corrected face image is input into a depth neural network model generated by pre-training, and is calculated by a deep neural network model, and then the face feature vector of each face image is output, and then the camera device only The face feature vector is transmitted to the server for 1:N dynamic recognition.

与现有技术相比，本实施例每一摄像装置处理一路视频数据，摄像装置除了采集视频外，还可以对视频进行人脸检测、追踪、图像质量评分、特征点定位及输入深度神经网络模型中，得到人脸特征向量，最后仅传输人脸特征向量给服务器，在摄像装置的数量较多时，能够大大减轻服务器的计算压力，不需要构建服务器阵列，同时，可以较大程度地降低网络传输压力，且网络传输压力并不会随着摄像装置分辨率和画质的提升而上升。Compared with the prior art, each camera device in the embodiment processes one channel of video data, and the camera device can perform face detection, tracking, image quality scoring, feature point location, and input depth neural network model in addition to video capture. In the middle, the face feature vector is obtained, and finally only the face feature vector is transmitted to the server. When the number of camera devices is large, the calculation pressure of the server can be greatly reduced, and the server array does not need to be built, and the network transmission can be reduced to a large extent. Pressure, and network transmission pressure does not rise with the resolution and image quality of the camera.

在一优选的实施例中，在上述图2的实施例的基础上，所述处理***被所述处理器12执行时，在人脸检测之前，还包括：In a preferred embodiment, on the basis of the foregoing embodiment of FIG. 2, when the processing system is executed by the processor 12, before the face detection, the method further includes:

分析视频数据是压缩的视频数据还是非压缩的视频数据；Analyzing whether the video data is compressed video data or uncompressed video data;

若是非压缩的视频数据，则将视频数据的格式转换为可进行人脸检测的格式；If it is uncompressed video data, the format of the video data is converted into a format capable of face detection;

若是压缩的视频数据，则对视频数据进行解码再将视频数据的格式转换为可进行人脸检测的格式。In the case of compressed video data, the video data is decoded and the format of the video data is converted into a format capable of face detection.

本实施例中，摄像装置在采集视频数据后可能会将其进行压缩，其中，在压缩时可以按照实时性将采集的视频数据进行非实时性压缩或者实时压缩，本实施例优选为实时压缩。另外可以根据实际情况将采集的视频数据进行有损压缩，压缩比率为预定的比率，优选为5:1。视频压缩的算法包括M-JPEG(Motion-Join Photographic Experts Group，运动图像逐帧压缩技术)、Mpeg(Moving Pictures Experts Group，动态图像专家组)、H.264、Wavelet(小波压缩)、JPEG 2000、AVS压缩等，经过上述的压缩算法得到压缩的输数据。在人脸检测之前，可以分析视频数据是否被压缩，具体地，可以分析其格式是否为压缩后的格式，如果是将其进一步处理，例如对于摄像头利用M-JPEG进行压缩后，其格式为YCrCB，则需要将YCrCB格式的视频数据转换为RGB格式，以便可以执行人脸检测。In this embodiment, the camera device may compress the video data after the video data is collected. The compressed video data may be non-real-time compressed or compressed in real time according to the real-time performance. This embodiment is preferably real-time compression. In addition, the captured video data may be subjected to lossy compression according to actual conditions, and the compression ratio is a predetermined ratio, preferably 5:1. Video compression algorithms include M-JPEG (Motion-Join Photographic Experts Group), Mpeg (Moving Pictures Experts Group), H.264, Wavelet (Wavelet Compression), JPEG 2000, AVS compression, etc., through the above compression algorithm to obtain compressed data. Before the face detection, it can be analyzed whether the video data is compressed. Specifically, it can be analyzed whether the format is a compressed format, and if it is further processed, for example, the camera is compressed by M-JPEG, and the format is YCrCB. , you need to convert the video data in YCrCB format to RGB format so that face detection can be performed.

在一优选的实施例中，在上述图2的实施例的基础上，所述追踪步骤，具体包括：In a preferred embodiment, based on the foregoing embodiment of FIG. 2, the tracking step specifically includes:

获取相邻两帧人脸图像中的人脸区域中心点的X、Y坐标值，以及人脸区域的高度H、宽度W值，根据相邻两帧人脸图像的X、Y坐标值，以及人高度H、宽度W值，计算得到该相邻两帧人脸图像中人脸的相似度；Obtaining the X and Y coordinate values of the center point of the face region in the adjacent two frames of the face image, and the height H and the width W value of the face region, according to the X and Y coordinate values of the adjacent two frames of the face image, and The height H and the width W value of the person are calculated, and the similarity of the face in the adjacent two frames of the face image is calculated;

基于相邻两帧人脸图像中人脸的相似度进行人脸追踪。Face tracking is performed based on the similarity of faces in adjacent two frames of face images.

其中，相似度计算步骤包括：The similarity calculation step includes:

所述S _i,j为相似度，所述w _x,w _y,w _w,w _h分别为相邻两帧人脸i、人脸j的x方向距离、y方向距离、宽度差异、高度差异的权重，w _x,w _y,w _w,w _h∈[0,1]，其中：

The S _i,j is a similarity, and the w _x , w _y , w _w , w _h are the x-direction distance, the y-direction distance, the width difference, and the height difference of the adjacent two frames of the face i and the face j, respectively. Weight, w _x , w _y , w _w , w _h ∈[0,1], where:

为人脸i和人脸j中心点之间x方向距离；

The distance between the face i and the center point of the face j in the x direction;

为人脸i和人脸j中心点之间y方向距离；

The distance between the face i and the center point of the face j in the y direction;

为人脸i和人脸j的宽度差异；

The difference in width between face i and face j;

为人脸i和人脸j的高度差异。

The difference in height between face i and face j.

当该相邻两帧人脸图像中人脸的相似度大于等于预设阈值时，判断该相邻两帧人脸图像中的人脸为同一人的人脸。When the similarity of the face in the adjacent two frames of the face image is greater than or equal to a preset threshold, the face in the adjacent two frames of the face image is determined to be the same person's face.

在一优选的实施例中，在上述图2的实施例的基础上，所述图像质量评分步骤，具体包括：根据人脸图像中预定的点的梯度值及坐标值对该系列中的每张人脸图像的质量进行评分。In a preferred embodiment, based on the foregoing embodiment of FIG. 2, the image quality scoring step specifically includes: each of the series according to a gradient value and a coordinate value of a predetermined point in the face image. The quality of the face image is scored.

其中，所述预定的点包括眼部点、鼻部点及嘴部点，所述梯度值为眼部点、鼻部点及嘴部点的平均梯度，所述眼部点包括左眼球点及右眼球点，所述嘴部点包括左嘴角点及右嘴角点，所述图像质量评分的公式为：The predetermined point includes an eye point, a nose point, and a mouth point, and the gradient value is an average gradient of an eye point, a nose point, and a mouth point, and the eye point includes a left eye point and The right eye point, the mouth point includes a left mouth corner point and a right mouth corner point, and the image quality score is calculated as:

p＝((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|；p=((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|;

其中，p为图像质量评分，x_LeftEye、x_RightEye表示左眼球、右眼球的X坐标，x_Nose表示鼻尖点的X坐标，grad为眼部点、鼻部点及嘴部点三者的平均梯度。Where p is the image quality score, x_LeftEye, x_RightEye represents the X coordinate of the left and right eyeballs, x_Nose represents the X coordinate of the tip of the nose, and grad is the average gradient of the eye point, nose point and mouth point.

本实施例选取人脸中的眼部点、鼻部点及嘴部点的坐标来对人脸图像的质量进行评分，能够客观、准确地对人脸图像进行评估，以便获取到评分高的人脸图像，方便后续进行校正等处理。In this embodiment, the coordinates of the eye point, the nose point, and the mouth point in the face are selected to score the quality of the face image, and the face image can be objectively and accurately evaluated to obtain a person with a high score. The face image is convenient for subsequent correction and other processing.

如图3所示，图3为本申请人脸识别的方法一实施例的流程示意图，该方法包括以下步骤：As shown in FIG. 3, FIG. 3 is a schematic flowchart of an embodiment of a method for recognizing a face of an applicant, where the method includes the following steps:

步骤S1，对视频数据进行人脸检测，得到人脸图像；Step S1, performing face detection on the video data to obtain a face image;

步骤S2，对人脸图像进行追踪，获取一序列的人脸图像；Step S2, tracking the face image to obtain a sequence of face images;

步骤S3，对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；Step S3, performing image quality scoring on the sequence of face images, and obtaining a preset number of face images with a higher score;

步骤S4，对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；Step S4: performing feature point positioning on the preset number of face images that are scored first, and correcting based on the positioned face image;

步骤S5，将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；Step S5, the corrected face image is input into the depth neural network model generated by the pre-training, and the output face feature vector is obtained;

步骤S6，将人脸特征向量发送给服务器，以执行与人脸图像样本库中样本中的人脸图像进行比对运算的步骤。In step S6, the face feature vector is sent to the server to perform a step of performing a comparison operation with the face image in the sample in the face image sample library.

在一优选的实施例中，在上述图3的实施例的基础上，所述步骤S1之前，还包括：In a preferred embodiment, on the basis of the foregoing embodiment of FIG. 3, before the step S1, the method further includes:

本实施例中，摄像装置在采集视频数据后可能会将其进行压缩，其中，在压缩时可以按照实时性将采集的视频数据进行非实时性压缩或者实时压缩，本实施例优选为实时压缩。另外可以根据实际情况将采集的视频数据进行有损压缩，压缩比率为预定的比率，优选为5:1。视频压缩的算法包括M-JPEG(Motion-Join Photographic Experts Group，运动图像逐帧压缩技术)、Mpeg(Moving Pictures Experts Group，动态图像专家组)、H.264、Wavelet(小波压缩)、JPEG 2000、AVS压缩等，经过上述的压缩算法得到压缩的视频数据。在人脸检测之前，可以分析视频数据是否被压缩，具体地，可以分析其格式是否为压缩后的格式，如果是将其进一步处理，例如对于摄像头利用M-JPEG进行压缩后，其格式为YCrCB，则需要将YCrCB格式的视频数据转换为RGB格式，以便可以执行人脸检测。In this embodiment, the camera device may compress the video data after the video data is collected. The compressed video data may be non-real-time compressed or compressed in real time according to the real-time performance. This embodiment is preferably real-time compression. In addition, the captured video data may be subjected to lossy compression according to actual conditions, and the compression ratio is a predetermined ratio, preferably 5:1. Video compression algorithms include M-JPEG (Motion-Join Photographic Experts Group), Mpeg (Moving Pictures Experts Group), H.264, Wavelet (Wavelet Compression), JPEG 2000, AVS compression, etc., obtains compressed video data through the above compression algorithm. Before the face detection, it can be analyzed whether the video data is compressed. Specifically, it can be analyzed whether the format is a compressed format, and if it is further processed, for example, the camera is compressed by M-JPEG, and the format is YCrCB. , you need to convert the video data in YCrCB format to RGB format so that face detection can be performed.

在一优选的实施例中，在上述图3的实施例的基础上，所述步骤S2，具体包括：In a preferred embodiment, based on the foregoing embodiment of FIG. 3, the step S2 specifically includes:

为人脸i和人脸j中心点之间x方向距离；

为人脸i和人脸j中心点之间y方向距离；

为人脸i和人脸j的宽度差异；

The difference in width between face i and face j;

为人脸i和人脸j的高度差异。

The difference in height between face i and face j.

在一优选的实施例中，在上述图3的实施例的基础上，所述步骤S3，具体包括：In a preferred embodiment, based on the foregoing embodiment of FIG. 3, the step S3 specifically includes:

根据人脸图像中预定的点的梯度值及坐标值对该系列中的每张人脸图像的质量进行评分。The quality of each face image in the series is scored according to the gradient values and coordinate values of the predetermined points in the face image.

本申请还提供一种计算机可读存储介质，所述计算机可读存储介质上存储有处理***，所述处理***被处理器执行时实现上述的人脸识别的方法的步骤。The present application also provides a computer readable storage medium having stored thereon a processing system, the processing system being executed by a processor to implement the steps of the method of face recognition described above.

上述本申请实施例序号仅仅为了描述，不代表实施例的优劣。The serial numbers of the embodiments of the present application are merely for the description, and do not represent the advantages and disadvantages of the embodiments.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，空调器，或者网络设备等)执行本申请各个实施例所述的方法。Through the description of the above embodiments, those skilled in the art can clearly understand that the foregoing embodiment method can be implemented by means of software plus a necessary general hardware platform, and of course, can also be through hardware, but in many cases, the former is better. Implementation. Based on such understanding, the technical solution of the present application, which is essential or contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, disk, The optical disc includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in various embodiments of the present application.

以上仅为本申请的优选实施例，并非因此限制本申请的专利范围，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本申请的专利保护范围内。The above is only a preferred embodiment of the present application, and is not intended to limit the scope of the patent application, and the equivalent structure or equivalent process transformations made by the specification and the drawings of the present application, or directly or indirectly applied to other related technical fields. The same is included in the scope of patent protection of this application.

Claims

一种摄像装置，其特征在于，所述摄像装置包括存储器及与所述存储器连接的处理器，所述存储器中存储有可在所述处理器上运行的处理***，所述处理***被所述处理器执行时实现如下步骤：An image pickup apparatus, comprising: a memory and a processor connected to the memory, wherein the memory stores a processing system operable on the processor, wherein the processing system is The processor implements the following steps when it executes:

检测步骤：对视频数据进行人脸检测，得到人脸图像；Detection step: performing face detection on the video data to obtain a face image;

追踪步骤：对人脸图像进行追踪，获取一序列的人脸图像；Tracking step: tracking the face image to obtain a sequence of face images;

图像质量评分步骤：对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；Image quality scoring step: performing image quality scoring on the sequence of face images, and obtaining a preset number of face images with the highest score;

特征点定位步骤：对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；Feature point positioning step: performing feature point positioning on a preset number of face images with a higher score, and correcting based on the positioned face image;

特征向量输出步骤：将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；Feature vector output step: inputting the corrected face image into a depth neural network model generated by pre-training, and acquiring the output face feature vector;

传输步骤：将人脸特征向量发送给服务器，以执行与人脸图像样本库中样本中的人脸图像进行比对运算的步骤。Transmission step: transmitting the face feature vector to the server to perform a step of performing a comparison operation with the face image in the sample in the face image sample library.
根据权利要求1所述的摄像装置，其特征在于，在所述检测步骤之前，所述处理***被所述处理器执行时，还实现如下步骤：The image pickup apparatus according to claim 1, wherein when said processing system is executed by said processor before said detecting step, the following steps are further implemented:

分析视频数据是压缩的视频数据还是非压缩的视频数据；Analyzing whether the video data is compressed video data or uncompressed video data;

若是非压缩的视频数据，则将视频数据的格式转换为可进行人脸检测的格式；If it is uncompressed video data, the format of the video data is converted into a format capable of face detection;

若是压缩的视频数据，则对视频数据进行解码再将视频数据的格式转换为可进行人脸检测的格式。In the case of compressed video data, the video data is decoded and the format of the video data is converted into a format capable of face detection.
根据权利要求1或2所述的摄像装置，其特征在于，所述追踪步骤，具体包括：The camera apparatus according to claim 1 or 2, wherein the tracking step specifically includes:

获取相邻两帧人脸图像中的人脸区域中心点的X、Y坐标值，以及人脸区域的高度H、宽度W值，根据相邻两帧人脸图像的X、Y坐标值，以及人高度H、宽度W值，计算得到该相邻两帧人脸图像中人脸的相似度；Obtaining the X and Y coordinate values of the center point of the face region in the adjacent two frames of the face image, and the height H and the width W value of the face region, according to the X and Y coordinate values of the adjacent two frames of the face image, and The height H and the width W value of the person are calculated, and the similarity of the face in the adjacent two frames of the face image is calculated;

基于相邻两帧人脸图像中人脸的相似度进行人脸追踪。Face tracking is performed based on the similarity of faces in adjacent two frames of face images.
根据权利要求3所述的摄像装置，其特征在于，所述图像质量评分步骤，具体包括：The image capturing device according to claim 3, wherein the image quality scoring step comprises:

根据人脸图像中预定的点的梯度值及坐标值对该系列中的每张人脸图像的质量进行评分。The quality of each face image in the series is scored according to the gradient values and coordinate values of the predetermined points in the face image.
根据权利要求4所述的摄像装置，其特征在于，所述预定的点包括眼部点、鼻部点及嘴部点，所述梯度值为眼部点、鼻部点及嘴部点的平均梯度，所述眼部点包括左眼球点及右眼球点，所述嘴部点包括左嘴角点及右嘴角点，所述图像质量评分步骤，进一步包括：The image pickup apparatus according to claim 4, wherein said predetermined point includes an eye point, a nose point, and a mouth point, and said gradient value is an average of an eye point, a nose point, and a mouth point. a gradient, the eye point includes a left eye point and a right eye point, the mouth point includes a left mouth corner point and a right mouth corner point, and the image quality scoring step further includes:

p＝((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|；p=((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|;

其中，p为图像质量评分，x_LeftEye、x_RightEye表示左眼球、右眼球的X坐标，x_Nose表示鼻尖点的X坐标，grad为眼部点、鼻部点及嘴部点三者的平均梯度。Where p is the image quality score, x_LeftEye, x_RightEye represents the X coordinate of the left and right eyeballs, x_Nose represents the X coordinate of the tip of the nose, and grad is the average gradient of the eye point, nose point and mouth point.
一种人脸识别的方法，其特征在于，所述人脸识别的方法包括：A method for face recognition, characterized in that the method for face recognition comprises:

S1，对视频数据进行人脸检测，得到人脸图像；S1, performing face detection on the video data to obtain a face image;

S2，对人脸图像进行追踪，获取一序列的人脸图像；S2, tracking the face image to obtain a sequence of face images;

S3，对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；S3, performing image quality scoring on the sequence of face images, and obtaining a preset number of face images with a higher score;

S4，对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；S4, performing feature point positioning on the preset number of face images with the highest score, and correcting the image based on the positioned face image;

S5，将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；S5, the corrected face image is input into a depth neural network model generated by pre-training, and the output face feature vector is obtained;

S6，将人脸特征向量发送给服务器，以执行与人脸图像样本库中样本中的人脸图像进行比对运算的步骤。S6: Send the face feature vector to the server to perform a step of performing a comparison operation with the face image in the sample in the face image sample library.
根据权利要求6所述的人脸识别的方法，其特征在于，所述步骤S1之前，还包括：The method for the face recognition according to claim 6, wherein before the step S1, the method further comprises:

分析视频数据是压缩的视频数据还是非压缩的视频数据；Analyzing whether the video data is compressed video data or uncompressed video data;

若是非压缩的视频数据，则将视频数据的格式转换为可进行人脸检测的格式；If it is uncompressed video data, the format of the video data is converted into a format capable of face detection;

若是压缩的视频数据，则对视频数据进行解码再将视频数据的格式转换为可进行人脸检测的格式。In the case of compressed video data, the video data is decoded and the format of the video data is converted into a format capable of face detection.
根据权利要求6或7所述的人脸识别的方法，其特征在于，所述步骤S2，具体包括：The method of the face recognition according to claim 6 or 7, wherein the step S2 comprises:

获取相邻两帧人脸图像中的人脸区域中心点的X、Y坐标值，以及人脸区域的高度H、宽度W值，根据相邻两帧人脸图像的X、Y坐标值，以及人高度H、宽度W值，计算得到该相邻两帧人脸图像中人脸的相似度；Obtaining the X and Y coordinate values of the center point of the face region in the adjacent two frames of the face image, and the height H and the width W value of the face region, according to the X and Y coordinate values of the adjacent two frames of the face image, and The height H and the width W value of the person are calculated, and the similarity of the face in the adjacent two frames of the face image is calculated;

基于相邻两帧人脸图像中人脸的相似度进行人脸追踪。Face tracking is performed based on the similarity of faces in adjacent two frames of face images.
根据权利要求8所述的人脸识别的方法，其特征在于，所述步骤S3，具体包括：The method for the face recognition according to claim 8, wherein the step S3 comprises:

根据人脸图像中预定的点的梯度值及坐标值对该系列中的每张人脸图像的质量进行评分。The quality of each face image in the series is scored according to the gradient values and coordinate values of the predetermined points in the face image.
根据权利要求9所述的人脸识别的方法，其特征在于，所述预定的点包括眼部点、鼻部点及嘴部点，所述梯度值为眼部点、鼻部点及嘴部点的平均梯度，所述眼部点包括左眼球点及右眼球点，所述嘴部点包括左嘴角点及右嘴角点，所述步骤S3，进一步包括：The method according to claim 9, wherein the predetermined point comprises an eye point, a nose point, and a mouth point, and the gradient value is an eye point, a nose point, and a mouth part. An average gradient of the points, the eye points include a left eye point and a right eye point, the mouth point including a left corner point and a right corner corner point, and the step S3 further includes:

p＝((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|；p=((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|;

其中，p为图像质量评分，x_LeftEye、x_RightEye表示左眼球、右眼球的X坐标，x_Nose表示鼻尖点的X坐标，grad为眼部点、鼻部点及嘴部点三者的平均梯度。Where p is the image quality score, x_LeftEye, x_RightEye represents the X coordinate of the left and right eyeballs, x_Nose represents the X coordinate of the tip of the nose, and grad is the average gradient of the eye point, nose point and mouth point.
一种人脸识别的***，其特征在于，所述人脸识别的***包括：A system for face recognition, characterized in that the system for face recognition comprises:

检测模块，用于对视频数据进行人脸检测，得到人脸图像；a detecting module, configured to perform face detection on the video data to obtain a face image;

追踪模块，用于对人脸图像进行追踪，获取一序列的人脸图像；a tracking module for tracking a face image to obtain a sequence of face images;

评分模块，用于对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；a scoring module, configured to perform image quality scoring on the sequence of face images, and obtain a preset number of face images with a higher score;

校正模块，用于对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；a correction module, configured to perform feature point positioning on a preset number of face images that are scored first, and perform correction based on the positioned face image;

输入模块，用于将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；An input module, configured to input the corrected face image into a depth neural network model generated by pre-training, and obtain an output face feature vector;

发送模块，用于将人脸特征向量发送给服务器，以触发与人脸图像样本库中样本中的人脸图像进行比对运算的操作。The sending module is configured to send the face feature vector to the server to trigger an operation of comparing the face image in the sample in the face image sample library.
根据权利要求11所述的人脸识别的***，其特征在于，还包括：The system for face recognition according to claim 11, further comprising:

分析模块，用于分析视频数据是压缩的视频数据还是非压缩的视频数据；An analysis module, configured to analyze whether the video data is compressed video data or uncompressed video data;

第一转换模块，用于若是非压缩的视频数据，则将视频数据的格式转换为可进行人脸检测的格式；a first conversion module, configured to convert the format of the video data into a format capable of face detection if the video data is uncompressed;

第二转换模块，用于若是压缩的视频数据，则对视频数据进行解码再将视频数据的格式转换为可进行人脸检测的格式。The second conversion module is configured to decode the video data and convert the format of the video data into a format capable of face detection if the compressed video data is used.
根据权利要求11或12所述的人脸识别的***，其特征在于，所述追踪模块，具体用于获取相邻两帧人脸图像中的人脸区域中心点的X、Y坐标值，以及人脸区域的高度H、宽度W值，根据相邻两帧人脸图像的X、Y坐标值，以及人高度H、宽度W值，计算得到该相邻两帧人脸图像中人脸的相似度；基于相邻两帧人脸图像中人脸的相似度进行人脸追踪。The system for recognizing a face according to claim 11 or 12, wherein the tracking module is specifically configured to acquire X and Y coordinate values of a center point of a face region in adjacent two frames of face images, and The height H and the width W of the face region are calculated according to the X and Y coordinate values of the adjacent two frames of the face image, and the human height H and the width W value, and the similarity of the faces in the adjacent two frames of the face image is calculated. Degree; face tracking based on the similarity of faces in two adjacent frames of face images.
根据权利要求13所述的人脸识别的***，其特征在于，所述评分模块，具体用于根据人脸图像中预定的点的梯度值及坐标值对该系列中的每张人脸图像的质量进行评分。The system for recognizing a face according to claim 13, wherein the scoring module is specifically configured to: according to a gradient value and a coordinate value of a predetermined point in the face image, each face image in the series Quality is scored.
根据权利要求14所述的人脸识别的***，其特征在于，所述预定的点包括眼部点、鼻部点及嘴部点，所述梯度值为眼部点、鼻部点及嘴部点的平均梯度，所述眼部点包括左眼球点及右眼球点，所述嘴部点包括左嘴角点及右嘴角点，所述追踪模块，进一步用于：The system for face recognition according to claim 14, wherein said predetermined points include eye points, nose points, and mouth points, and said gradient values are eye points, nose points, and mouth portions. An average gradient of points, the eye points including a left eye point and a right eye point, the mouth point including a left mouth corner point and a right mouth corner point, and the tracking module is further configured to:

p＝((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|；p=((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|;

其中，p为图像质量评分，x_LeftEye、x_RightEye表示左眼球、右眼球的X坐标，x_Nose表示鼻尖点的X坐标，grad为眼部点、鼻部点及嘴部点三者的平均梯度。Where p is the image quality score, x_LeftEye, x_RightEye represents the X coordinate of the left and right eyeballs, x_Nose represents the X coordinate of the tip of the nose, and grad is the average gradient of the eye point, nose point and mouth point.
一种计算机可读存储介质，其特征在于，所述计算机可读存储介质上存储有处理***，所述处理***被处理器执行时实现如下步骤：A computer readable storage medium, wherein the computer readable storage medium stores a processing system, and when the processing system is executed by the processor, the following steps are implemented:

检测步骤：对视频数据进行人脸检测，得到人脸图像；Detection step: performing face detection on the video data to obtain a face image;

追踪步骤：对人脸图像进行追踪，获取一序列的人脸图像；Tracking step: tracking the face image to obtain a sequence of face images;

图像质量评分步骤：对序列的人脸图像进行图像质量评分，获取评分靠前的预设数量的人脸图像；Image quality scoring step: performing image quality scoring on the sequence of face images, and obtaining a preset number of face images with the highest score;

特征点定位步骤：对评分靠前的预设数量的人脸图像进行特征点定位，基于定位后的人脸图像进行校正；Feature point positioning step: performing feature point positioning on a preset number of face images with a higher score, and correcting based on the positioned face image;

特征向量输出步骤：将校正后的人脸图像输入至预先训练生成的深度神经网络模型中，并获取输出的人脸特征向量；Feature vector output step: inputting the corrected face image into a depth neural network model generated by pre-training, and acquiring the output face feature vector;

传输步骤：将人脸特征向量发送给服务器，以执行与人脸图像样本库中样本中的人脸图像进行比对运算的步骤。Transmission step: transmitting the face feature vector to the server to perform a step of performing a comparison operation with the face image in the sample in the face image sample library.
根据权利要求16所述的计算机可读存储介质，其特征在于，在所述检测步骤之前，所述处理***被所述处理器执行时，还实现如下步骤：The computer readable storage medium according to claim 16, wherein, before said detecting step, said processing system is executed by said processor, further implementing the following steps:

分析视频数据是压缩的视频数据还是非压缩的视频数据；Analyzing whether the video data is compressed video data or uncompressed video data;

若是非压缩的视频数据，则将视频数据的格式转换为可进行人脸检测的格式；In the case of uncompressed video data, the format of the video data is converted into a format capable of face detection;

若是压缩的视频数据，则对视频数据进行解码再将视频数据的格式转换为可进行人脸检测的格式。In the case of compressed video data, the video data is decoded and the format of the video data is converted into a format capable of face detection.
根据权利要求16或17所述的计算机可读存储介质，其特征在于，所述追踪步骤，具体包括：The computer readable storage medium according to claim 16 or 17, wherein the step of tracking comprises:

获取相邻两帧人脸图像中的人脸区域中心点的X、Y坐标值，以及人脸区域的高度H、宽度W值，根据相邻两帧人脸图像的X、Y坐标值，以及人高度H、宽度W值，计算得到该相邻两帧人脸图像中人脸的相似度；Obtaining the X and Y coordinate values of the center point of the face region in the adjacent two frames of the face image, and the height H and the width W value of the face region, according to the X and Y coordinate values of the adjacent two frames of the face image, and The height H and the width W value of the person are calculated, and the similarity of the face in the adjacent two frames of the face image is calculated;

基于相邻两帧人脸图像中人脸的相似度进行人脸追踪。Face tracking is performed based on the similarity of faces in adjacent two frames of face images.
根据权利要求18所述的计算机可读存储介质，其特征在于，所述图像质量评分步骤，具体包括：The computer readable storage medium according to claim 18, wherein the image quality scoring step comprises:

根据人脸图像中预定的点的梯度值及坐标值对该系列中的每张人脸图像的质量进行评分。The quality of each face image in the series is scored according to the gradient values and coordinate values of the predetermined points in the face image.
根据权利要求19所述的计算机可读存储介质，其特征在于，所述预定的点包括眼部点、鼻部点及嘴部点，所述梯度值为眼部点、鼻部点及嘴部点的平均梯度，所述眼部点包括左眼球点及右眼球点，所述嘴部点包括左嘴角点及右嘴角点，所述图像质量评分步骤，进一步包括：The computer readable storage medium according to claim 19, wherein said predetermined point comprises an eye point, a nose point, and a mouth point, and said gradient value is an eye point, a nose point, and a mouth An average gradient of points, the eye points include a left eye point and a right eye point, the mouth point including a left mouth corner point and a right mouth corner point, and the image quality scoring step further includes:

p＝((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|；p=((x_LeftEye-x_RightEye)^2×grad)/|(x_LeftEye+x_RightEye)/2-x_Nose|;

其中，p为图像质量评分，x_LeftEye、x_RightEye表示左眼球、右眼球的X坐标，x_Nose表示鼻尖点的X坐标，grad为眼部点、鼻部点及嘴部点三者的平均梯度。Where p is the image quality score, x_LeftEye, x_RightEye represents the X coordinate of the left and right eyeballs, x_Nose represents the X coordinate of the tip of the nose, and grad is the average gradient of the eye point, nose point and mouth point.