WO2017096753A1 - Face key point tracking method, terminal, and non-volatile computer readable storage medium - Google Patents

Face key point tracking method, terminal, and non-volatile computer readable storage medium

Info

Publication number
WO2017096753A1
WO2017096753A1 (PCT application PCT/CN2016/081631)
Authority
WO
WIPO (PCT)
Prior art keywords
face
key point
face key
coordinate
image
Prior art date
Application number
PCT/CN2016/081631
Other languages
English (en)
French (fr)
Inventor
汪铖杰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2017096753A1 publication Critical patent/WO2017096753A1/zh
Priority to US15/715,398 priority Critical patent/US10452893B2/en
Priority to US16/567,940 priority patent/US11062123B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/167Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • the present invention relates to the field of image processing, and in particular, to a face key point tracking method and apparatus, a terminal, and a non-transitory computer readable storage medium.
  • Face tracking refers to the process of determining the motion trajectory and size changes of a face in a video file, a video stream, or an image sequence. Face tracking is of great significance in the fields of image analysis and image recognition.
  • the robustness and real-time performance of a face tracking algorithm are two indicators that are difficult to satisfy at the same time: as robustness improves, the complexity of the algorithm increases greatly, and, limited by finite computer processing power, the real-time performance of face tracking decreases.
  • a face key point tracking method includes:
  • the face key point coordinate positions of the previous frame of image are taken as the initial positions of the face key points of the next frame of image
  • a terminal comprising a memory and a processor, wherein the memory stores computer readable instructions, and when the instructions are executed by the processor, the processor performs the following steps:
  • the face key point coordinate positions of the previous frame of image are taken as the initial positions of the face key points of the next frame of image
  • One or more non-transitory computer readable storage media containing computer executable instructions that, when executed by one or more processors, cause the processor to perform the following steps:
  • the face key point coordinate positions of the previous frame of image are taken as the initial positions of the face key points of the next frame of image
  • FIG. 1 is a schematic diagram showing the internal structure of a terminal in an embodiment
  • FIG. 2 is a flow chart of a method for tracking a face key point in an embodiment
  • FIG. 3 is a flow chart showing specific steps of configuring an initial position of a face key point according to the face coordinate frame position in an embodiment
  • FIG. 4 is a schematic diagram of centering a face key point and a face coordinate frame position in one embodiment
  • FIG. 5 is a schematic diagram of zooming a face key point in an embodiment
  • FIG. 6 is a schematic diagram of obtaining the coordinate positions of the facial feature points
  • FIG. 7 is a structural block diagram of a face key point tracking device in an embodiment
  • FIG. 8 is a structural block diagram of a face key point tracking device in another embodiment.
  • FIG. 1 is a schematic diagram showing the internal structure of a terminal in an embodiment.
  • the terminal includes a processor, a storage medium, a memory, a network interface, an image capture device, a display screen, a speaker, and an input device that are connected via a system bus.
  • the storage medium of the terminal stores an operating system, and further includes a face key point tracking device, and the face key point tracking device is used to implement a face key point tracking method.
  • the processor is used to provide computing and control capabilities to support the operation of the entire terminal.
  • the memory in the terminal provides an environment for the running of the face key point tracking device in the storage medium, and the network interface is used for network communication with a server, such as sending a video file request to the server and receiving a video file returned by the server.
  • the image acquisition device of the terminal can capture an external image, such as a camera to capture an image.
  • the display screen can be a liquid crystal display or an electronic ink display screen; the input device can be a touch layer covering the display screen, a button, trackball, or touchpad provided on the terminal housing, or an external keyboard, touchpad, or mouse.
  • the terminal can be a cell phone, a tablet or a personal digital assistant.
  • FIG. 2 is a flow chart of a method for tracking a face key point in an embodiment. As shown in FIG. 2, a face key point tracking method can be run on the terminal in FIG. 1, including:
  • Step 202 Read a frame image in the video file.
  • the video file may be an online video file or a video file downloaded on the terminal.
  • Online video files can be read while being played.
  • the video files downloaded on the terminal can also be read while being played.
  • the video image is played one frame at a time, and each frame image can be captured for processing.
  • the image of the certain frame may be the first frame image of the video file, or may be other frame images.
  • Step 204 Detect a face position in the image of the one frame, and obtain a position of the face coordinate frame.
  • the step of detecting a face position in the image of the one frame and obtaining the position of the face coordinate frame includes: detecting a face position in a frame image by using a face detection technology, and acquiring a face coordinate frame position.
  • the face detection technology takes as input an image containing a face and can detect the rectangular coordinate frame position of the face.
  • Face detection technology mainly follows Robust Real-Time Face Detection. Face detection can be implemented with Haar-Like features and the Adaboost algorithm: Haar-Like features are used to represent the face, each Haar-Like feature is trained to obtain a weak classifier, and the Adaboost algorithm is used to select multiple weak classifiers that best represent the face to form a strong classifier; several strong classifiers are connected in series to form a cascade-structured classifier, that is, a face detector.
  • each Haar-Like feature considers the face image information of a reference block and one neighborhood block.
  • Face detection can also be implemented using Multi-scale Block based Local Binary Patterns (MBLBP) features and the Adaboost algorithm. The MBLBP feature, which can represent the face image information of a reference block and its eight neighborhood blocks, is used to represent the face; the MBLBP feature is calculated by comparing the average gray level of the reference block with the average gray level of each of the eight neighborhood blocks.
  • Multi-scale Structured Ordinal Features (MSOF) and the Adaboost algorithm can also be used to implement face detection. The MSOF feature, which can represent the face image information of a reference block and eight neighborhood blocks, represents the face; the distances from the eight neighborhood blocks to the reference block are adjustable, and the reference block and the eight neighborhood blocks need not be connected.
  • Face and non-face images can also be collected as a training sample set, and flexible block based local binary pattern (FBLBP) features of the face and non-face images are extracted to form an FBLBP feature set. Training on the FBLBP features with the GentleBoost algorithm yields a first classifier; each first-layer classifier includes several optimal second classifiers, and each optimal second classifier is obtained by GentleBoost training. The first classifier is a strong classifier and the second classifiers are weak classifiers; the weak classifiers are accumulated to obtain the strong classifier, and multiple layers of first classifiers are cascaded to form a face detector.
  • the face detector is used to detect the face position in the first frame image or another frame image, and the face coordinate frame position is obtained.
  • the coordinates of the face coordinate frame are given in a coordinate system whose origin is the upper-left corner of the terminal screen, with the horizontal direction as the X axis and the vertical direction as the Y axis; the coordinate system is not limited to this, and other custom ways of establishing it may be used.
  • Step 206 Configure an initial position of the face key point according to the face coordinate frame position.
  • configuring the initial position of the face key point according to the face coordinate frame position includes:
  • Step 302: Align the centers of the pre-stored face key points and the face coordinate frame position by translating the pre-stored face key points.
  • the pre-stored face key points have a center, and the face coordinate frame position also has a center; making the center of the pre-stored face key points coincide with the center of the face coordinate frame position is what is meant by center alignment.
  • Step 304 Scale the pre-stored face key point so that the pre-stored face key point size is consistent with the face coordinate frame size.
  • after the centers coincide, the face key points are scaled so that the face key point size is the same as the face coordinate frame size.
  • by translating and scaling the face key points so that they match the face coordinate frame position, the initial positions of the face key points of the frame of image are obtained; the amount of calculation is small and the operation is simple.
  • Step 208 Acquire a coordinate position of a face key point according to an initial position of the face key point.
  • the step of acquiring the coordinate position of the face key point according to the initial position of the face key point comprises: acquiring the face key point coordinate position according to the initial position of the face key point by using the face key point positioning technology.
  • the face key point positioning technology takes as input a face image and the initial positions of the face key points, and outputs the face key point coordinate positions.
  • the face key point coordinate positions refer to the two-dimensional coordinate values of a plurality of points.
  • face key point positioning builds on face detection to further locate the eyes, eyebrows, nose, mouth, contour, and so on of the face, mainly by using the information near each key point and the relationships among the key points.
  • Face key point localization techniques employ regression-based algorithms such as Face Alignment by Explicit Shape Regression (ESR). ESR uses a two-level boosted regressor, with 10 stages in the first level and 500 stages in the second level. In this two-level structure, each node of the first level is a cascade of 500 weak regressors, that is, one second-level regressor. Within the second-level regressor the features are kept unchanged, while across the first level the features change; at the first level, the input of each node is the output of the previous node.
  • A fern is used as the primitive regressor. A fern is a combination of F features and thresholds that divides the training samples into 2^F bins. Each bin b corresponds to an output y_b, e.g. y_b = (1 / (1 + β/|Ω_b|)) · (Σ_{i∈Ω_b} ŷ_i) / |Ω_b|, where β is an over-fitting coefficient and |Ω_b| is the number of training samples in the bin, so the final output is a linear combination over the training samples.
  • the shape-indexed feature is obtained by taking the pixel value at a position given by a key point position plus an offset and then computing the difference between two such pixel values.
  • local coordinates are used instead of a global coordinate system, which greatly enhances the robustness of the feature.
  • face key point localization can include the following steps (1), (2), and (3):
  • (1) a plurality of positioning results are obtained on the input face image using a plurality of trained positioning models.
  • Each positioning result includes multiple face keypoint locations.
  • the location of the face key points includes the position of the eyes, eyebrows, nose, mouth, and outline.
  • the positioning models A can be obtained by training on the training sets C (C_1 to C_K).
  • the face image samples of the training sets C_1 to C_K can be classified into different types according to factors such as expression, age, race, and identity; in this way, the positioning models A can be trained for the different types.
  • the average value S_0 of the positions of all key points in the training set C is computed and is called the average key point position. With |C| denoting the number of samples in training set C, the average key point position can be obtained as S_0 = (1/|C|) · Σ_i S_i.
  • for each face image sample I_i in the training set C, the average key point position S_0 is placed in the middle of the image, and the scale invariant feature transform (SIFT) feature at each key point position of S_0 is extracted; the extracted SIFT features are concatenated into a feature vector f_i, and a regression model is built over all samples such that f_i · A = S_i − S_0.
  • for an input face image to be localized, the average key point position S_0 is first placed in the middle of the input image, and the SIFT features at the key point positions of S_0 are extracted and concatenated to obtain the feature vector f.
  • the set of positioning results S, including K positioning results, can then be obtained as S = S_0 + f · A.
  • a Boost classifier can be trained for each key point, so that L classifiers h 1 , h 2 , ... h L can be obtained .
  • These L classifiers can form an evaluation model E.
  • when training the classifiers, image blocks in the face images of the training set C whose centers are close to a key point position (for example, within a first predetermined distance) may be used as positive samples, and image blocks whose centers are far from the key point position (for example, beyond a second predetermined distance) are used as negative samples to train the key point classifiers.
  • when evaluating a key point positioning result S_i, the image blocks of a predetermined size centered on the respective key point positions (x_j, y_j) are input to the corresponding key point classifiers h_j, each yielding a score h_j(x_j, y_j). The scores of all key point classifiers for this positioning result are thus obtained, and the average score of the positioning result is computed as (1/L) · Σ_{j=1..L} h_j(x_j, y_j).
  • the scores of each of the K positioning results S_1, S_2, ..., S_K can be obtained, and the optimal positioning result S*, that is, the positioning result with the highest score, is selected as the final localization result of the face key point positions.
  • when updating the evaluation model, the input image corresponding to the positioning result S* may be added to the training set C, a predetermined number of positive sample image blocks and negative sample image blocks are generated using the L key point positions of the positioning result S*, and the generated positive and negative sample image blocks are then used to train the classifiers h_1, h_2, ..., h_L of the L key points, thereby updating the evaluation model E.
  • the online AdaBoost method can be used to train the key point classifiers h_1, h_2, ..., h_L.
  • when updating the positioning models, the type of the positioning model corresponding to the positioning result S* is determined. Specifically, the online K-means method can be used to find the type to which S* belongs based on the SIFT feature vector f corresponding to S*. If S* is determined to belong to a class A_k among the currently available K positioning models, it is added to the training set C_k corresponding to A_k, and the positioning model A_k is retrained based on the aforementioned training method, thereby updating the positioning model A_k.
  • if it is determined that S* does not belong to any of the currently available K types of positioning models, a corresponding training set C_{K+1} is created.
  • when the number of samples in the new training set C_{K+1} exceeds a threshold, it is used to train a new positioning model A_{K+1}.
  • in this way, the original K positioning models can be extended to K+1 positioning models; after the positioning model is added, the number of positioning results increases from K to K+1.
  • F is used to represent the matrix composed of all sample feature vectors f of the sample images in training set C, where the i-th row of F is the feature vector of the i-th sample; S represents the matrix composed of the manually calibrated key point positions of all samples in training set C, where the i-th row of S is the key point positions of the i-th sample; and S_0 represents the matrix composed of the average key point positions of all samples in training set C, where the i-th row of S_0 is the average key point position of the i-th sample. The original positioning model A before the update satisfies F · A = S − S_0, which can be solved by least squares as A = (Fᵀ F)⁻¹ · Fᵀ · (S − S_0), with covariance matrices Cov_xx = Fᵀ F and Cov_xy = Fᵀ · (S − S_0). The element in the m-th row and n-th column of Cov_xx is Σ_i f_im · f_in, and that of Cov_xy is Σ_i f_im · (S_in − S_{0,in}).
  • f_im denotes the value of the m-th dimension of the feature vector of the i-th sample in the training set C; S_in denotes the value of the n-th dimension of the manually calibrated key point positions of the i-th sample in the training set C; and S_{0,in} denotes the value of the n-th dimension of the average key point position of the i-th sample.
  • when a new sample s* is added, the elements of the covariance matrices are updated as Cov_xx(m, n) ← Cov_xx(m, n) + f*_m · f*_n and Cov_xy(m, n) ← Cov_xy(m, n) + f*_m · (S*_n − S*_{0,n}), where f*_m is the value of the m-th dimension of the feature vector of the newly added sample, S*_n is the value of the n-th dimension of its manually calibrated key point positions, and S*_{0,n} is the value of the n-th dimension of its average key point position.
  • the above-mentioned face key point positioning technology is used to obtain the coordinate position of the face key point according to the initial position of the face key point.
  • Step 210 Read an adjacent next frame image in the video file.
  • the frame adjacent to the last processed frame of image in the video file is read.
  • Step 212 The coordinate position of the face key point of the image of the previous frame is taken as the initial position of the face key point of the adjacent next frame image.
  • Step 214 Acquire a face key point coordinate position of the next frame image according to an initial position of a face key point of the next frame image.
  • the step of acquiring the face key point coordinate positions of the next frame image according to the initial positions of the face key points of the next frame image comprises: using the face key point positioning technology to acquire the face key point coordinate positions of the next frame image according to the initial positions of the face key points of the next frame image.
  • in step 216, it is determined whether the video file has been processed completely; if so, the process ends, and if not, the process returns to step 210.
  • steps 210 to 214 are repeatedly executed until the application exits or the video file is processed completely.
  • the face key points include the facial feature points.
  • the facial feature points include the eyes, eyebrows, nose, mouth, and ears. Tracking only the facial feature points keeps the amount of calculation small, which can improve the tracking efficiency.
  • the above face key point tracking method configures the initial positions of the face key points from the face coordinate frame position, acquires the face key point coordinate positions according to those initial positions, reads the next frame image, takes the face key point coordinate positions of the previous frame image as the initial positions of the face key points of the next frame image, and acquires the face key point coordinate positions of the next frame image, thereby skipping face detector detection and improving the efficiency of face key point tracking.
  • because the data processing capability of a mobile terminal is limited, the above face key point tracking method can save a large amount of computation, allowing the mobile terminal to perform face tracking quickly and improving the efficiency of face key point tracking.
  • the above face key point tracking method may perform denoising processing on the read frame of image after reading a frame of image or the adjacent next frame of image in the video file.
  • denoising improves the clarity of the image and facilitates more accurate face tracking.
  • the read frame of image may be denoised using an average weighting method, that is, all pixels in the image are processed using average weighting.
  • The implementation of the face key point tracking method is described below with reference to a specific application scenario, taking the facial feature points as an example of the face key points.
  • a frame of image in the video file is read, the face position in the frame of image is detected, the face coordinate frame position 410 is obtained, and the center of the pre-stored face key points 420 is aligned with the center of the face coordinate frame position 410.
  • after the centers are aligned, the pre-stored face key points 420 are scaled so that the size of the face key points is the same as the size of the face coordinate frame, thereby obtaining the initial positions of the face key points.
  • the face key point coordinate positions are obtained according to the initial positions of the face key points, as shown by the cross marks "x" in FIG. 6. The adjacent next frame image in the video file is then read; the face key point coordinate positions of the previous frame image are taken as the initial positions of the face key points of the next frame image; and the face key point coordinate positions of the next frame image are acquired according to those initial positions.
  • FIG. 7 is a structural block diagram of a face key point tracking device in an embodiment. As shown in FIG. 7, a face key point tracking device runs on the terminal and includes a reading module 702, a detecting module 704, a configuration module 706, and an obtaining module 708, wherein:
  • the reading module 702 is configured to read a frame of images in the video file.
  • the video file may be an online video file or a video file downloaded on the terminal.
  • Online video files can be read while being played.
  • the video files downloaded on the terminal can also be read while being played.
  • the detecting module 704 is configured to detect a face position in the image of the frame and obtain a position of the face coordinate frame.
  • the detecting module 704 detects a face position in a frame image by using a face detection technology, and acquires a face coordinate frame position.
  • the face detection technology takes as input an image containing a face and can detect the rectangular coordinate frame position of the face.
  • the configuration module 706 is configured to configure an initial position of the face key according to the face coordinate frame position.
  • the configuration module 706 is further configured to: align the centers of the pre-stored face key points and the face coordinate frame position by translating the pre-stored face key points; and scale the pre-stored face key points so that the pre-stored face key point size is the same as the face coordinate frame size.
  • the pre-stored face key points have a center, and the face coordinate frame position also has a center; making the two centers coincide is what is meant by center alignment.
  • after the centers coincide, the face key points are scaled so that the face key point size is the same as the face coordinate frame size.
  • the obtaining module 708 is configured to obtain a face key point coordinate position according to the initial position of the face key point.
  • the obtaining module 708 is further configured to acquire a coordinate position of a face key point according to an initial position of the face key point by using a face key point positioning technology.
  • the face key point positioning technology takes as input a face image and the initial positions of the face key points, and outputs the face key point coordinate positions.
  • the face key point coordinate positions refer to the two-dimensional coordinate values of a plurality of points.
  • the reading module 702 is further configured to read an adjacent next frame image in the video file.
  • the frame adjacent to the last processed frame of image in the video file is read.
  • the configuration module 706 is further configured to use the face key point coordinate position of the previous frame image as the initial position of the face key point of the next frame image.
  • the obtaining module 708 is further configured to acquire a face key point coordinate position of the next frame image according to an initial position of the face key point of the next frame image.
  • the obtaining module 708 is further configured to acquire a face key point coordinate position of the next frame image according to an initial position of the face key point of the next frame image by using a face key point positioning technology.
  • the face key points include the facial feature points.
  • the facial feature points include the eyes, eyebrows, nose, mouth, and ears. Tracking only the facial feature points keeps the amount of calculation small, which can improve the tracking efficiency.
  • the above face key point tracking device configures the initial positions of the face key points from the face coordinate frame position, acquires the face key point coordinate positions according to those initial positions, reads the next frame image, takes the face key point coordinate positions of the previous frame image as the initial positions of the face key points of the next frame image, and acquires the face key point coordinate positions of the next frame image, thereby skipping face detector detection and improving the efficiency of face tracking.
  • FIG. 8 is a structural block diagram of a face key point tracking device in another embodiment.
  • a face key point tracking device runs on the terminal and includes a reading module 702, a detecting module 704, a configuration module 706, an obtaining module 708, and a denoising module 710, wherein:
  • the denoising module 710 is configured to perform denoising processing on the read frame of image after a frame of image or the adjacent next frame of image in the video file is read.
  • denoising improves the clarity of the image and facilitates more accurate face tracking.
  • the read frame of image may be denoised using an average weighting method, that is, all pixels in the image are processed using average weighting.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

A face key point tracking method includes: reading a frame of image in a video file; detecting the face position in the frame of image, obtaining the face coordinate frame position, and configuring the initial positions of the face key points; obtaining the face key point coordinate positions according to the initial positions of the face key points; reading the adjacent next frame of image in the video file; taking the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image; and obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.

Description

Face key point tracking method, terminal, and non-volatile computer readable storage medium
This application claims priority to Chinese Patent Application No. 201510922450.0, entitled "Face key point tracking method and apparatus" and filed with the Chinese Patent Office on December 11, 2015, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of image processing, and in particular to a face key point tracking method and apparatus, a terminal, and a non-volatile computer readable storage medium.
Background
Face tracking refers to the process of determining the motion trajectory and size changes of a face in a video file, a video stream, or an image sequence. Face tracking is of great significance in the fields of image analysis and image recognition. The robustness and real-time performance of a face tracking algorithm are two indicators that are difficult to satisfy at the same time: as robustness improves, the complexity of the algorithm increases greatly, and, limited by finite computer processing power, the real-time performance of face tracking inevitably decreases.
To achieve face tracking in a video file or video stream, face detection and face key point localization need to be performed on every frame, so the face detection algorithm consumes a large amount of time and the tracking efficiency is low.
Summary
Based on this, it is necessary to provide a face key point tracking method that can save time and improve face tracking efficiency.
In addition, it is also necessary to provide a terminal and a non-volatile computer readable storage medium that can save time and improve face tracking efficiency.
A face key point tracking method includes:
reading a frame of image in a video file;
detecting the face position in the frame of image, and obtaining the face coordinate frame position;
configuring the initial positions of the face key points according to the face coordinate frame position;
obtaining the face key point coordinate positions according to the initial positions of the face key points; and
repeatedly performing the following steps:
reading the adjacent next frame of image in the video file;
taking the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image; and
obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
A terminal includes a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, cause the processor to perform the following steps:
reading a frame of image in a video file;
detecting the face position in the frame of image, and obtaining the face coordinate frame position;
configuring the initial positions of the face key points according to the face coordinate frame position;
obtaining the face key point coordinate positions according to the initial positions of the face key points; and
repeatedly performing the following steps:
reading the adjacent next frame of image in the video file;
taking the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image; and
obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
One or more non-volatile computer readable storage media containing computer executable instructions which, when executed by one or more processors, cause the processors to perform the following steps:
reading a frame of image in a video file;
detecting the face position in the frame of image, and obtaining the face coordinate frame position;
configuring the initial positions of the face key points according to the face coordinate frame position;
obtaining the face key point coordinate positions according to the initial positions of the face key points; and
repeatedly performing the following steps:
reading the adjacent next frame of image in the video file;
taking the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image; and
obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
Details of one or more embodiments of the present invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present invention will become apparent from the specification, the accompanying drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the internal structure of a terminal in an embodiment;
FIG. 2 is a flowchart of a face key point tracking method in an embodiment;
FIG. 3 is a flowchart of the specific steps of configuring the initial positions of the face key points according to the face coordinate frame position in an embodiment;
FIG. 4 is a schematic diagram of aligning the face key points with the center of the face coordinate frame position in an embodiment;
FIG. 5 is a schematic diagram of scaling the face key points in an embodiment;
FIG. 6 is a schematic diagram of obtaining the coordinate positions of the facial feature points;
FIG. 7 is a structural block diagram of a face key point tracking apparatus in an embodiment;
FIG. 8 is a structural block diagram of a face key point tracking apparatus in another embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely used to explain the present invention and are not intended to limit it.
FIG. 1 is a schematic diagram of the internal structure of a terminal in an embodiment. As shown in FIG. 1, the terminal includes a processor, a storage medium, a memory, a network interface, an image capture device, a display screen, a speaker, and an input device that are connected through a system bus. The storage medium of the terminal stores an operating system and further includes a face key point tracking apparatus, which is used to implement a face key point tracking method. The processor is used to provide computing and control capabilities to support the running of the entire terminal. The memory in the terminal provides an environment for the running of the face key point tracking apparatus in the storage medium, and the network interface is used for network communication with a server, for example, sending a video file request to the server and receiving a video file returned by the server. The image capture device of the terminal can capture external images, for example, a camera capturing images. The display screen may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the terminal housing, or an external keyboard, touchpad, or mouse. The terminal may be a mobile phone, a tablet computer, or a personal digital assistant. A person skilled in the art may understand that the structure shown in FIG. 1 is merely a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the terminal to which the solution is applied; a specific terminal may include more or fewer components than shown in the figure, combine some components, or have a different component arrangement.
FIG. 2 is a flowchart of a face key point tracking method in an embodiment. As shown in FIG. 2, a face key point tracking method, which can run on the terminal in FIG. 1, includes:
Step 202: Read a frame of image in a video file.
Specifically, the video file may be an online video file or a video file downloaded onto the terminal. An online video file can be read while it is being played, and a video file downloaded onto the terminal can also be read while it is being played.
When a video file is played, the video images are played frame by frame, and each frame of image can be captured for processing. First, one frame of image in the video file is read for processing; this frame may be the first frame of the video file or another frame.
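By way of illustration only (Python/OpenCV is not part of the original specification), a minimal sketch of reading a video file frame by frame as in step 202 might look as follows; the file name and function name are assumptions:

```python
# Sketch: read a video file one frame at a time and hand each frame to the pipeline.
import cv2

def read_frames(video_path):
    cap = cv2.VideoCapture(video_path)   # local file; an online stream would need a URL the backend supports
    while True:
        ok, frame = cap.read()           # grab the next frame; ok is False at the end of the file
        if not ok:
            break
        yield frame                      # each frame can then be processed for tracking
    cap.release()

for frame in read_frames("input.mp4"):
    pass  # detect / track face key points here
```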
Step 204: Detect the face position in the frame of image, and obtain the face coordinate frame position.
In this embodiment, the step of detecting the face position in the frame of image and obtaining the face coordinate frame position includes: detecting the face position in the frame of image using a face detection technology, and obtaining the face coordinate frame position.
Specifically, the face detection technology takes as input an image containing a face and can detect the rectangular coordinate frame position of the face.
The face detection technology mainly follows Robust Real-Time Face Detection. Face detection can be implemented with Haar-Like features and the Adaboost algorithm: Haar-Like features are used to represent the face, each Haar-Like feature is trained to obtain a weak classifier, and the Adaboost algorithm is used to select multiple weak classifiers that best represent the face to form a strong classifier; several strong classifiers are connected in series to form a cascade-structured classifier, that is, a face detector. Each Haar-Like feature considers the face image information of a reference block and one neighborhood block.
Face detection can also be implemented using Multi-scale Block based Local Binary Patterns (MBLBP) features and the Adaboost algorithm. In this method, the MBLBP feature, which can represent the face image information of a reference block and its eight neighborhood blocks, is used to represent the face; the MBLBP feature is calculated by comparing the average gray level of the reference block with the average gray level of each of the eight neighborhood blocks.
Multi-scale Structured Ordinal Features (MSOF) and the Adaboost algorithm can also be used to implement face detection. In this method, the MSOF feature, which can represent the face image information of a reference block and eight neighborhood blocks, represents the face; the distances from the eight neighborhood blocks to the reference block are adjustable, and the reference block and the eight neighborhood blocks need not be connected.
Alternatively, face and non-face images can be collected as a training sample set, and flexible block based local binary pattern (FBLBP) features of the face and non-face images are extracted to form an FBLBP feature set. Training on the FBLBP features with the GentleBoost algorithm yields a first classifier; each first-layer classifier includes several optimal second classifiers, and each optimal second classifier is obtained by GentleBoost training. The first classifier is a strong classifier and the second classifiers are weak classifiers; the weak classifiers are accumulated to obtain the strong classifier, and multiple layers of first classifiers are cascaded to form a face detector. The face detector is used to detect the face position in the first frame image or another frame image, and the face coordinate frame position is obtained.
The coordinates of the face coordinate frame are given in a coordinate system whose origin is the upper-left corner of the terminal screen, with the horizontal direction as the X axis and the vertical direction as the Y axis; the coordinate system is not limited to this, and other custom ways of establishing it may be used.
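As a non-authoritative illustration of step 204, the following sketch uses OpenCV's pre-trained Haar/Adaboost cascade as a stand-in for the cascade detector described above and returns the face coordinate frame as (x, y, w, h) with the origin at the upper-left corner; the helper name and parameter values are assumptions:

```python
# Sketch: detect the face coordinate frame with a pre-trained Haar cascade.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_box(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    # keep the largest detection as the face coordinate frame
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    return int(x), int(y), int(w), int(h)
```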
Step 206: Configure the initial positions of the face key points according to the face coordinate frame position.
In an embodiment, as shown in FIG. 3, configuring the initial positions of the face key points according to the face coordinate frame position includes:
Step 302: Align the centers of the pre-stored face key points and the face coordinate frame position by translating the pre-stored face key points.
Specifically, the pre-stored face key points have a center and the face coordinate frame position also has a center; making the center of the pre-stored face key points coincide with the center of the face coordinate frame position is what is meant by center alignment.
Step 304: Scale the pre-stored face key points so that the size of the pre-stored face key points is consistent with the size of the face coordinate frame.
Specifically, after the center of the pre-stored face key points and the center of the face coordinate frame position coincide, the face key points are scaled so that the face key point size is the same as the face coordinate frame size.
By translating and scaling the face key points so that they match the face coordinate frame position, the initial positions of the face key points of the frame of image are obtained; the amount of calculation is small and the operation is simple.
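A minimal sketch of steps 302 and 304, assuming the pre-stored face key points are given as an (L, 2) array named mean_shape (an assumed name), might be:

```python
# Sketch: translate and scale the pre-stored key points so that their center and size
# match the detected face coordinate frame, yielding the initial key point positions.
import numpy as np

def init_keypoints(mean_shape, face_box):
    x, y, w, h = face_box
    pts = np.asarray(mean_shape, dtype=np.float64)

    # scale so the key point extent matches the face coordinate frame size
    span = pts.max(axis=0) - pts.min(axis=0)          # current width/height of the shape
    scale = np.array([w, h]) / np.maximum(span, 1e-6)
    pts = (pts - pts.mean(axis=0)) * scale

    # translate so the key point center coincides with the face frame center
    box_center = np.array([x + w / 2.0, y + h / 2.0])
    return pts + box_center
```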
Step 208: Obtain the face key point coordinate positions according to the initial positions of the face key points.
In this embodiment, the step of obtaining the face key point coordinate positions according to the initial positions of the face key points includes: using a face key point localization technology to obtain the face key point coordinate positions according to the initial positions of the face key points.
Specifically, the face key point localization technology takes as input a face image and the initial positions of the face key points, and obtains the face key point coordinate positions. The face key point coordinate positions refer to the two-dimensional coordinate values of a plurality of points.
Face key point localization builds on face detection to further locate the eyes, eyebrows, nose, mouth, contour, and so on of the face, mainly by using the information near each key point and the relationships among the key points. The face key point localization technology employs regression-based algorithms such as Face Alignment by Explicit Shape Regression (ESR). ESR uses a two-level boosted regressor, with 10 stages in the first level and 500 stages in the second level. In this two-level structure, each node of the first level is a cascade of 500 weak regressors, that is, one second-level regressor. Within the second-level regressor the features are kept unchanged, while across the first level the features change; at the first level, the input of each node is the output of the previous node.
A fern is used as the primitive regressor. A fern is a combination of F features and thresholds that divides the training samples into 2^F bins. Each bin b corresponds to an output y_b, namely:
y_b = (1 / (1 + β/|Ω_b|)) · (Σ_{i∈Ω_b} ŷ_i) / |Ω_b|
Here β is an over-fitting coefficient, |Ω_b| is the number of training samples in the current bin, and ŷ_i are the regression targets of the samples falling into the bin. In this way, the final output is a linear combination of all training samples. A shape-indexed feature is used: the pixel value at a position given by a key point position plus an offset is taken, and the difference between two such pixel values is computed to obtain the shape-indexed feature. This method uses local coordinates instead of a global coordinate system, which greatly enhances the robustness of the feature.
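For illustration, and under the assumption that the fern output takes the shrunken-mean form given above, the two ingredients of this paragraph could be sketched as follows; the function names are hypothetical:

```python
# Sketch: a shape-indexed pixel-difference feature, and the output of one fern bin.
import numpy as np

def shape_indexed_diff(image, keypoints, kp_a, off_a, kp_b, off_b):
    """Difference of two pixels, each indexed by a key point plus a local offset.
    Assumes a grayscale image and in-bounds coordinates."""
    pa = np.round(keypoints[kp_a] + off_a).astype(int)
    pb = np.round(keypoints[kp_b] + off_b).astype(int)
    return float(image[pa[1], pa[0]]) - float(image[pb[1], pb[0]])

def fern_bin_output(targets_in_bin, beta):
    """Bin output y_b: mean of the regression targets, shrunk by the coefficient beta."""
    n = len(targets_in_bin)
    if n == 0:
        return 0.0
    return np.mean(targets_in_bin, axis=0) / (1.0 + beta / n)
```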
In addition, face key point localization may include the following steps (1), (2), and (3):
(1) A plurality of positioning results are obtained on the input face image using a plurality of trained positioning models. Each positioning result includes a plurality of face key point positions. The face key point positions include the positions of the eyes, eyebrows, nose, mouth, and contour.
Suppose K positioning models A_1 to A_K are used, and the set of these K positioning models is denoted A. The input face image is aligned with the K positioning models, with (x, y) denoting the position of a pixel in the image, so that K positioning results are obtained, denoted S_1, S_2, ..., S_K. Each positioning result S contains L face key point positions, so S can be written as S = {x_1, y_1, x_2, y_2, ..., x_L, y_L}.
The positioning models A can be obtained by training on the training sets C (C_1 to C_K). Each training set C_K is a collection of a large number of face image samples, and each face image sample I_i in training set C_K is calibrated with L key point positions, that is, S_i = {x_i1, y_i1, x_i2, y_i2, ..., x_iL, y_iL}.
The face image samples of the training sets C_1 to C_K can be classified into different types according to factors such as expression, age, race, and identity; in this way, the positioning models A can be trained for the different types.
When training a positioning model A, the average value S_0 of the key point positions of all samples in the training set C is first computed and is called the average key point position. With |C| denoting the number of samples in training set C, the average key point position S_0 can be obtained by the following formula:
S_0 = (1/|C|) · Σ_i S_i    (formula 1)
For each face image sample I_i in training set C, the average key point position S_0 is placed in the middle of the image, and the scale invariant feature transform (SIFT) feature at each key point position of S_0 is extracted; the extracted SIFT features are concatenated into a feature vector f_i. In this way, a regression model can be built over all sample images of training set C such that
f_i · A = S_i − S_0    (formula 2)
For an input face image to be localized, the average key point position S_0 is first placed in the middle of the input image, and the SIFT features at the key point positions of S_0 are extracted and concatenated to obtain the feature vector f. The set of positioning results S, including K positioning results, can be obtained by the following equation:
S = S_0 + f · A    (formula 3)
In the above manner, a plurality of positioning results for the key point positions of the input image can be obtained from the plurality of trained positioning models.
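A minimal sketch of training and applying this linear positioning model, assuming F, S, and S0 are stacked row-wise (one sample per row) as in the model-update discussion below, might be:

```python
# Sketch: fit A by least squares so that F · A ≈ S − S0, then predict S = S0 + f · A.
import numpy as np

def train_positioning_model(F, S, S0):
    A, *_ = np.linalg.lstsq(F, S - S0, rcond=None)   # least-squares solution of F·A = S − S0
    return A

def predict_keypoints(f, A, s0_row):
    return s0_row + f @ A                            # positioning result for one input image
```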
(2) The obtained plurality of positioning results are evaluated, and the optimal positioning result is selected from them.
Each face image sample I_i in training set C is calibrated with the positions of L key points, S_i = {x_i1, y_i1, x_i2, y_i2, ..., x_iL, y_iL}. A Boost classifier can be trained for each key point, so that L classifiers h_1, h_2, ..., h_L are obtained. These L classifiers can form an evaluation model E.
When training the classifiers, image blocks in the face images of training set C whose centers are close to a key point position (for example, the distance between the center of the image block and the key point position is within a first predetermined distance) may be used as positive samples, and image blocks whose centers are far from the key point position (for example, the distance exceeds a second predetermined distance) are used as negative samples to train the key point classifiers.
When a key point positioning result S_i is evaluated, the image blocks of a predetermined size centered on the respective key point positions (x_j, y_j) are input to the corresponding key point classifiers h_j, each yielding a score h_j(x_j, y_j). The scores of all key point classifiers for this positioning result are thus obtained, and the average score of the positioning result is then computed:
(1/L) · Σ_{j=1..L} h_j(x_j, y_j)
The score of each of the K positioning results S_1, S_2, ..., S_K can be obtained in this way, and the optimal positioning result S*, that is, the positioning result with the highest score, is selected as the final localization result of the face key point positions.
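As an illustrative sketch of this evaluation step (not the patent's own implementation), assume each key point classifier exposes a score(image, x, y) method over an image block centered at (x, y):

```python
# Sketch: score each candidate positioning result and keep the one with the highest
# average key point classifier score.
import numpy as np

def score_result(image, keypoints, classifiers):
    scores = [h.score(image, x, y) for h, (x, y) in zip(classifiers, keypoints)]
    return float(np.mean(scores))              # average score of this positioning result

def select_best(image, candidate_results, classifiers):
    scored = [(score_result(image, S, classifiers), S) for S in candidate_results]
    best_score, best_S = max(scored, key=lambda t: t[0])
    return best_S, best_score                  # S*, later used to update the models
```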
(3) If the score of the obtained optimal positioning result S* exceeds a predetermined threshold T, the optimal positioning result is used to update the evaluation model and/or the positioning models.
Specifically, when updating the evaluation model, the input image corresponding to the positioning result S* may be added to the training set C, a predetermined number of positive sample image blocks and negative sample image blocks are generated using the L key point positions of the positioning result S*, and the generated positive and negative sample image blocks are then used to train the classifiers h_1, h_2, ..., h_L of the L key points, thereby updating the evaluation model E. For example, according to an embodiment of the present invention, the online AdaBoost method can be used to train the key point classifiers h_1, h_2, ..., h_L.
When updating the positioning models, if a new positioning result S* whose score exceeds the predetermined threshold is found, the type of the positioning model corresponding to S* is determined. Specifically, the online K-means method can be used to find the type to which S* belongs based on the SIFT feature vector f corresponding to the positioning result S*. If it is determined that S* belongs to a class A_k among the currently available K positioning models, it is added to the training set C_k corresponding to A_k, and the positioning model A_k is retrained based on the aforementioned method of training positioning models, thereby updating the positioning model A_k.
If it is determined that S* does not belong to any of the currently available K types of positioning models, a corresponding new training set C_{K+1} is created. When the number of samples in the new training set C_{K+1} exceeds a threshold, it is used to train a new positioning model A_{K+1}. In this way, the original K positioning models can be extended to K+1 positioning models; after the positioning model is added, the number of positioning results increases from K to K+1.
Let F denote the matrix composed of all sample feature vectors f of the sample images in training set C, where the i-th row of F is the feature vector of the i-th sample; let S denote the matrix composed of the manually calibrated key point positions of all samples in training set C, where the i-th row of S is the key point positions of the i-th sample; and let S_0 denote the matrix composed of the average key point positions of all samples in training set C, where the i-th row of S_0 is the average key point position of the i-th sample. The original positioning model A before the update then satisfies the following equation:
F · A = S − S_0
A can be solved by least squares:
A = (Fᵀ F)⁻¹ · Fᵀ · (S − S_0)
with the covariance matrices:
Cov_xx = Fᵀ F,  Cov_xy = Fᵀ · (S − S_0)
The elements in the m-th row and n-th column of Cov_xx and Cov_xy can be expressed as:
Cov_xx(m, n) = Σ_i f_im · f_in,  Cov_xy(m, n) = Σ_i f_im · (S_in − S_{0,in})
where f_im denotes the value of the m-th dimension of the feature vector of the i-th sample in training set C, S_in denotes the value of the n-th dimension of the manually calibrated key point positions of the i-th sample in training set C, and S_{0,in} denotes the value of the n-th dimension of the average key point position of the i-th sample.
When a new sample s* is added, the elements of the covariance matrices can be updated as in the following equations:
Cov_xx(m, n) ← Cov_xx(m, n) + f*_m · f*_n,  Cov_xy(m, n) ← Cov_xy(m, n) + f*_m · (S*_n − S*_{0,n})
where f*_m denotes the value of the m-th dimension of the feature vector of the newly added sample, S*_n denotes the value of the n-th dimension of the manually calibrated key point positions of the newly added sample, and S*_{0,n} denotes the value of the n-th dimension of the average key point position of the newly added sample.
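A sketch of this incremental update, assuming the covariance matrices are kept as NumPy arrays and the new sample's vectors are flattened in the same dimension order, might be:

```python
# Sketch: rank-1 update of Cov_xx = Fᵀ F and Cov_xy = Fᵀ (S − S0) for a new sample,
# then re-solve for the positioning model A without retraining from scratch.
import numpy as np

def update_and_resolve(cov_xx, cov_xy, f_new, s_new, s0_new, ridge=1e-6):
    cov_xx += np.outer(f_new, f_new)               # Cov_xx(m, n) += f*_m · f*_n
    cov_xy += np.outer(f_new, s_new - s0_new)      # Cov_xy(m, n) += f*_m · (S*_n − S0*_n)
    # A = Cov_xx⁻¹ · Cov_xy; the small ridge term is an assumption for numerical stability
    A = np.linalg.solve(cov_xx + ridge * np.eye(cov_xx.shape[0]), cov_xy)
    return cov_xx, cov_xy, A
```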
The above face key point localization technology is used to obtain the face key point coordinate positions according to the initial positions of the face key points.
Step 210: Read the adjacent next frame of image in the video file.
Specifically, the frame adjacent to the last processed frame of image in the video file is read.
Step 212: Take the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the adjacent next frame of image.
Step 214: Obtain the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
In this embodiment, the step of obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image includes: using the face key point localization technology to obtain the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
Step 216: Determine whether the video file has been processed completely; if so, the process ends, and if not, the process returns to step 210.
Specifically, steps 210 to 214 are repeatedly executed until the application exits or the video file is processed completely.
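Putting steps 202 to 216 together, a sketch of the overall tracking loop (reusing the frame-reading, detection, and initialization sketches above, with an assumed locate_keypoints helper standing in for the key point localization technology) might look like this:

```python
# Sketch: detect the face only for the first frame, then for every later frame reuse
# the previous frame's key point coordinates as the initial positions, skipping the
# face detector and running only key point localization.
def track_video(video_path, mean_shape, locate_keypoints):
    keypoints = None
    for frame in read_frames(video_path):            # see the frame-reading sketch above
        if keypoints is None:                        # first frame: full face detection
            box = detect_face_box(frame)
            if box is None:
                continue
            init = init_keypoints(mean_shape, box)
        else:                                        # later frames: previous result as init
            init = keypoints
        keypoints = locate_keypoints(frame, init)    # refine from the initial positions
        yield keypoints
```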
The face key points include the facial feature points. The facial feature points include the eyes, eyebrows, nose, mouth, and ears. Tracking the facial feature points keeps the amount of calculation small, which can improve the tracking efficiency.
In the above face key point tracking method, the initial positions of the face key points are configured from the face coordinate frame position, the face key point coordinate positions are then obtained according to those initial positions, the next frame of image is read, the face key point coordinate positions of the previous frame of image are used as the initial positions of the face key points of the next frame of image, and the face key point coordinate positions of the next frame of image are obtained; face detector detection is thereby skipped, which can improve the efficiency of face key point tracking.
In addition, because the data processing capability of a mobile terminal is limited, the above face key point tracking method saves a large amount of computation, allows the mobile terminal to perform face tracking quickly, and improves the efficiency of face key point tracking.
In an embodiment, in the above face key point tracking method, after a frame of image or the adjacent next frame of image in the video file is read, denoising processing may be performed on the read frame of image. Denoising improves the clarity of the image and facilitates more accurate face tracking.
Specifically, the read frame of image may be denoised using an average weighting method, that is, all pixels in the image are processed using average weighting.
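A one-line sketch of this average-weighting denoising with OpenCV's box (mean) filter, using an assumed 3×3 window, might be:

```python
# Sketch: replace every pixel by the equal-weight mean of its neighborhood.
import cv2

def denoise_mean(frame, ksize=3):
    return cv2.blur(frame, (ksize, ksize))   # equal (average) weights over the window
```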
The implementation of the face key point tracking method is described below with reference to a specific application scenario, taking the facial feature points as an example of the face key points. As shown in FIG. 4, a frame of image in the video file is read, the face position in the frame of image is detected, the face coordinate frame position 410 is obtained, and the center of the pre-stored face key points 420 is aligned with the center of the face coordinate frame position 410. As shown in FIG. 5, after the center of the pre-stored face key points 420 is aligned with the center of the face coordinate frame position 410, the pre-stored face key points 420 are scaled so that the size of the face key points is the same as the size of the face coordinate frame, thereby obtaining the initial positions of the face key points. As shown in FIG. 6, the face key point coordinate positions, that is, the coordinate positions of the facial feature points, are obtained according to the initial positions of the face key points, as shown by the cross marks "x" in FIG. 6. The adjacent next frame of image in the video file is then read; the face key point coordinate positions of the previous frame of image are taken as the initial positions of the face key points of the next frame of image; and the face key point coordinate positions of the next frame of image are obtained according to those initial positions.
FIG. 7 is a structural block diagram of a face key point tracking apparatus in an embodiment. As shown in FIG. 7, a face key point tracking apparatus runs on a terminal and includes a reading module 702, a detecting module 704, a configuration module 706, and an obtaining module 708, wherein:
The reading module 702 is configured to read a frame of image in a video file.
Specifically, the video file may be an online video file or a video file downloaded onto the terminal. An online video file can be read while it is being played, and a video file downloaded onto the terminal can also be read while it is being played.
The detecting module 704 is configured to detect the face position in the frame of image and obtain the face coordinate frame position.
In this embodiment, the detecting module 704 detects the face position in the frame of image using a face detection technology and obtains the face coordinate frame position.
Specifically, the face detection technology takes as input an image containing a face and can detect the rectangular coordinate frame position of the face.
The configuration module 706 is configured to configure the initial positions of the face key points according to the face coordinate frame position.
In this embodiment, the configuration module 706 is further configured to: align the centers of the pre-stored face key points and the face coordinate frame position by translating the pre-stored face key points; and scale the pre-stored face key points so that the size of the pre-stored face key points is consistent with the size of the face coordinate frame.
Specifically, the pre-stored face key points have a center and the face coordinate frame position also has a center; making the center of the pre-stored face key points coincide with the center of the face coordinate frame position is what is meant by center alignment. After the centers coincide, the face key points are scaled so that the face key point size is the same as the face coordinate frame size. By translating and scaling the face key points so that they match the face coordinate frame position, the initial positions of the face key points of the frame of image are obtained; the amount of calculation is small and the operation is simple.
The obtaining module 708 is configured to obtain the face key point coordinate positions according to the initial positions of the face key points.
In this embodiment, the obtaining module 708 is further configured to obtain the face key point coordinate positions according to the initial positions of the face key points using a face key point localization technology.
Specifically, the face key point localization technology takes as input a face image and the initial positions of the face key points, and obtains the face key point coordinate positions. The face key point coordinate positions refer to the two-dimensional coordinate values of a plurality of points.
The following process is repeatedly executed:
The reading module 702 is further configured to read the adjacent next frame of image in the video file.
Specifically, the frame adjacent to the last processed frame of image in the video file is read.
The configuration module 706 is further configured to take the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image.
The obtaining module 708 is further configured to obtain the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
In this embodiment, the obtaining module 708 is further configured to obtain the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image using the face key point localization technology.
This continues until the application exits or the video file is processed completely.
The face key points include the facial feature points. The facial feature points include the eyes, eyebrows, nose, mouth, and ears. Tracking the facial feature points keeps the amount of calculation small, which can improve the tracking efficiency.
In the above face key point tracking apparatus, the initial positions of the face key points are configured from the face coordinate frame position, the face key point coordinate positions are then obtained according to those initial positions, the next frame of image is read, the face key point coordinate positions of the previous frame of image are used as the initial positions of the face key points of the next frame of image, and the face key point coordinate positions of the next frame of image are obtained; face detector detection is thereby skipped, which can improve the efficiency of face key point tracking.
FIG. 8 is a structural block diagram of a face key point tracking apparatus in another embodiment. As shown in FIG. 8, a face key point tracking apparatus runs on a terminal and includes, in addition to a reading module 702, a detecting module 704, a configuration module 706, and an obtaining module 708, a denoising module 710, wherein:
The denoising module 710 is configured to perform denoising processing on the read frame of image after a frame of image or the adjacent next frame of image in the video file is read. Denoising improves the clarity of the image and facilitates more accurate face tracking.
Specifically, the read frame of image may be denoised using an average weighting method, that is, all pixels in the image are processed using average weighting.
A person of ordinary skill in the art may understand that all or some of the procedures of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer readable storage medium, and when the program is executed, the procedures of the embodiments of the above methods may be included. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
The above embodiments merely express several implementations of the present invention, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of the present invention. It should be noted that a person of ordinary skill in the art may further make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the appended claims.

Claims (18)

  1. A face key point tracking method, comprising:
    reading a frame of image in a video file;
    detecting the face position in the frame of image, and obtaining the face coordinate frame position;
    configuring the initial positions of the face key points according to the face coordinate frame position;
    obtaining the face key point coordinate positions according to the initial positions of the face key points; and
    repeatedly performing the following steps:
    reading the adjacent next frame of image in the video file;
    taking the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image; and
    obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
  2. The method according to claim 1, wherein the step of detecting the face position in the frame of image and obtaining the face coordinate frame position comprises:
    detecting the face position in the frame of image using a face detection technology, and obtaining the face coordinate frame position.
  3. The method according to claim 1, wherein the step of configuring the initial positions of the face key points according to the face coordinate frame position comprises:
    aligning the centers of pre-stored face key points and the face coordinate frame position by translating the pre-stored face key points; and
    scaling the pre-stored face key points so that the size of the pre-stored face key points is consistent with the size of the face coordinate frame.
  4. The method according to claim 1, wherein the step of obtaining the face key point coordinate positions according to the initial positions of the face key points comprises:
    obtaining the face key point coordinate positions according to the initial positions of the face key points using a face key point localization technology; and
    the step of obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image comprises:
    obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image using the face key point localization technology.
  5. The method according to claim 1, wherein the face key points comprise facial feature points.
  6. The method according to claim 1, further comprising:
    after reading a frame of image or the adjacent next frame of image in the video file, performing denoising processing on the read frame of image or adjacent next frame of image.
  7. A terminal, comprising a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    reading a frame of image in a video file;
    detecting the face position in the frame of image, and obtaining the face coordinate frame position;
    configuring the initial positions of the face key points according to the face coordinate frame position;
    obtaining the face key point coordinate positions according to the initial positions of the face key points; and
    repeatedly performing the following steps:
    reading the adjacent next frame of image in the video file;
    taking the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image; and
    obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
  8. The terminal according to claim 7, wherein the step of detecting the face position in the frame of image and obtaining the face coordinate frame position comprises:
    detecting the face position in the frame of image using a face detection technology, and obtaining the face coordinate frame position.
  9. The terminal according to claim 7, wherein the step of configuring the initial positions of the face key points according to the face coordinate frame position comprises:
    aligning the centers of pre-stored face key points and the face coordinate frame position by translating the pre-stored face key points; and
    scaling the pre-stored face key points so that the size of the pre-stored face key points is consistent with the size of the face coordinate frame.
  10. The terminal according to claim 7, wherein the step of obtaining the face key point coordinate positions according to the initial positions of the face key points comprises:
    obtaining the face key point coordinate positions according to the initial positions of the face key points using a face key point localization technology; and
    the step of obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image comprises:
    obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image using the face key point localization technology.
  11. The terminal according to claim 7, wherein the face key points comprise facial feature points.
  12. The terminal according to claim 7, wherein the steps further comprise:
    after reading a frame of image or the adjacent next frame of image in the video file, performing denoising processing on the read frame of image or adjacent next frame of image.
  13. One or more non-volatile computer readable storage media containing computer executable instructions which, when executed by one or more processors, cause the processors to perform the following steps:
    reading a frame of image in a video file;
    detecting the face position in the frame of image, and obtaining the face coordinate frame position;
    configuring the initial positions of the face key points according to the face coordinate frame position;
    obtaining the face key point coordinate positions according to the initial positions of the face key points; and
    repeatedly performing the following steps:
    reading the adjacent next frame of image in the video file;
    taking the face key point coordinate positions of the previous frame of image as the initial positions of the face key points of the next frame of image; and
    obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image.
  14. The non-volatile computer readable storage media according to claim 13, wherein the step of detecting the face position in the frame of image and obtaining the face coordinate frame position comprises:
    detecting the face position in the frame of image using a face detection technology, and obtaining the face coordinate frame position.
  15. The non-volatile computer readable storage media according to claim 13, wherein the step of configuring the initial positions of the face key points according to the face coordinate frame position comprises:
    aligning the centers of pre-stored face key points and the face coordinate frame position by translating the pre-stored face key points; and
    scaling the pre-stored face key points so that the size of the pre-stored face key points is consistent with the size of the face coordinate frame.
  16. The non-volatile computer readable storage media according to claim 13, wherein the step of obtaining the face key point coordinate positions according to the initial positions of the face key points comprises:
    obtaining the face key point coordinate positions according to the initial positions of the face key points using a face key point localization technology; and
    the step of obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image comprises:
    obtaining the face key point coordinate positions of the next frame of image according to the initial positions of the face key points of the next frame of image using the face key point localization technology.
  17. The non-volatile computer readable storage media according to claim 13, wherein the face key points comprise facial feature points.
  18. The non-volatile computer readable storage media according to claim 13, wherein the steps further comprise:
    after reading a frame of image or the adjacent next frame of image in the video file, performing denoising processing on the read frame of image or adjacent next frame of image.
PCT/CN2016/081631 2015-12-11 2016-05-11 Face key point tracking method, terminal, and non-volatile computer readable storage medium WO2017096753A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US15/715,398 US10452893B2 (en) 2015-12-11 2017-09-26 Method, terminal, and storage medium for tracking facial critical area
US16/567,940 US11062123B2 (en) 2015-12-11 2019-09-11 Method, terminal, and storage medium for tracking facial critical area

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510922450.0A CN106874826A (zh) 2015-12-11 2015-12-11 人脸关键点跟踪方法和装置
CN201510922450.0 2015-12-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/715,398 Continuation US10452893B2 (en) 2015-12-11 2017-09-26 Method, terminal, and storage medium for tracking facial critical area

Publications (1)

Publication Number Publication Date
WO2017096753A1 true WO2017096753A1 (zh) 2017-06-15

Family

ID=59013685

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/081631 WO2017096753A1 (zh) 2015-12-11 2016-05-11 Face key point tracking method, terminal, and non-volatile computer readable storage medium

Country Status (3)

Country Link
US (2) US10452893B2 (zh)
CN (1) CN106874826A (zh)
WO (1) WO2017096753A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020259118A1 (en) * 2019-06-28 2020-12-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and device for image processing, method and device for training object detection model
CN113657462A (zh) * 2021-07-28 2021-11-16 讯飞智元信息科技有限公司 Method for training a vehicle recognition model, vehicle recognition method, and computing device

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107924452B (zh) * 2015-06-26 2022-07-19 英特尔公司 Combined shape regression for face alignment in images
US10592729B2 (en) * 2016-01-21 2020-03-17 Samsung Electronics Co., Ltd. Face detection method and apparatus
CN106682598B (zh) * 2016-12-14 2021-02-19 华南理工大学 Multi-pose facial feature point detection method based on cascaded regression
CN108304758B (zh) * 2017-06-21 2020-08-25 腾讯科技(深圳)有限公司 Facial feature point tracking method and apparatus
CN108875506B (zh) * 2017-11-17 2022-01-07 北京旷视科技有限公司 Face shape point tracking method, apparatus and system, and storage medium
CN109508620A (zh) * 2018-08-01 2019-03-22 上海晨鱼网络科技有限公司 Augmented reality-based makeup method and system, electronic terminal, and storage medium
CN108960206B (zh) * 2018-08-07 2021-01-22 北京字节跳动网络技术有限公司 Video frame processing method and apparatus
CN109344742B (zh) 2018-09-14 2021-03-16 腾讯科技(深圳)有限公司 Feature point localization method and apparatus, storage medium, and computer device
CN109241921A (zh) * 2018-09-17 2019-01-18 北京字节跳动网络技术有限公司 Method and apparatus for detecting face key points
CN111199165B (zh) * 2018-10-31 2024-02-06 浙江宇视科技有限公司 Image processing method and apparatus
WO2020136795A1 (ja) * 2018-12-27 2020-07-02 日本電気株式会社 Information processing device, information processing method, and program
CN109871760B (zh) * 2019-01-15 2021-03-26 北京奇艺世纪科技有限公司 Face localization method and apparatus, terminal device, and storage medium
CN109858402B (zh) * 2019-01-16 2021-08-31 腾讯科技(深圳)有限公司 Image detection method and apparatus, terminal, and storage medium
CN109993067B (zh) * 2019-03-07 2022-01-28 北京旷视科技有限公司 Facial key point extraction method and apparatus, computer device, and storage medium
CN110211181B (zh) * 2019-05-15 2021-04-23 达闼机器人有限公司 Visual localization method and apparatus, storage medium, and electronic device
CN110544272B (zh) * 2019-09-06 2023-08-04 腾讯科技(深圳)有限公司 Face tracking method and apparatus, computer device, and storage medium
CN111242088B (zh) * 2020-01-22 2023-11-28 上海商汤临港智能科技有限公司 Target detection method and apparatus, electronic device, and storage medium
CN113409354A (zh) * 2020-03-16 2021-09-17 深圳云天励飞技术有限公司 Face tracking method and apparatus, and terminal device
CN111523467B (zh) * 2020-04-23 2023-08-08 北京百度网讯科技有限公司 Face tracking method and apparatus
CN111582207B (zh) * 2020-05-13 2023-08-15 北京市商汤科技开发有限公司 Image processing method and apparatus, electronic device, and storage medium
CN111695512B (zh) * 2020-06-12 2023-04-25 嘉应学院 Unattended cultural relic monitoring method and apparatus
CN112417985A (zh) * 2020-10-30 2021-02-26 杭州魔点科技有限公司 Facial feature point tracking method and system, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120321134A1 (en) * 2011-06-15 2012-12-20 Samsung Electronics Co., Ltd Face tracking method and device
CN103377367A (zh) * 2012-04-28 2013-10-30 中兴通讯股份有限公司 Method and device for acquiring facial images
CN103942542A (zh) * 2014-04-18 2014-07-23 重庆卓美华视光电有限公司 Human eye tracking method and device
CN104361332A (zh) * 2014-12-08 2015-02-18 重庆市科学技术研究院 Face and eye region localization method for fatigue driving detection

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7688988B2 (en) * 2004-06-17 2010-03-30 Fujifilm Corporation Particular image area partitioning apparatus and method, and program for causing computer to perform particular image area partitioning processing
US20050281464A1 (en) * 2004-06-17 2005-12-22 Fuji Photo Film Co., Ltd. Particular image area partitioning apparatus and method, and program for causing computer to perform particular image area partitioning processing
US7957555B2 (en) * 2006-02-08 2011-06-07 Fujifilm Corporation Method and apparatus for localizing an object part in digital image data by updating an initial position estimate based on a displacement of the object part
KR20080073933A (ko) * 2007-02-07 2008-08-12 삼성전자주식회사 Object tracking method and apparatus, and object pose information calculation method and apparatus
JP2009277027A (ja) * 2008-05-15 2009-11-26 Seiko Epson Corp Detection of organ regions corresponding to images of facial organs in an image
US9396539B2 (en) * 2010-04-02 2016-07-19 Nokia Technologies Oy Methods and apparatuses for face detection
CN102136062B (zh) * 2011-03-08 2013-04-17 西安交通大学 Face retrieval method based on multi-resolution LBP
US8983152B2 (en) * 2013-05-14 2015-03-17 Google Inc. Image masks for face-related selection and processing in images
CN104715227B (zh) * 2013-12-13 2020-04-03 北京三星通信技术研究有限公司 Method and apparatus for locating face key points
CN106295476B (zh) * 2015-05-29 2019-05-17 腾讯科技(深圳)有限公司 Face key point localization method and apparatus
US10778939B2 (en) * 2017-09-22 2020-09-15 Facebook, Inc. Media effects using predicted facial feature locations

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120321134A1 (en) * 2011-06-15 2012-12-20 Samsung Electronics Co., Ltd Face tracking method and device
CN103377367A (zh) * 2012-04-28 2013-10-30 中兴通讯股份有限公司 Method and device for acquiring facial images
CN103942542A (zh) * 2014-04-18 2014-07-23 重庆卓美华视光电有限公司 Human eye tracking method and device
CN104361332A (zh) * 2014-12-08 2015-02-18 重庆市科学技术研究院 Face and eye region localization method for fatigue driving detection

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020259118A1 (en) * 2019-06-28 2020-12-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and device for image processing, method and device for training object detection model
US11457138B2 (en) 2019-06-28 2022-09-27 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and device for image processing, method for training object detection model
CN113657462A (zh) * 2021-07-28 2021-11-16 讯飞智元信息科技有限公司 Method for training a vehicle recognition model, vehicle recognition method, and computing device

Also Published As

Publication number Publication date
US20200005022A1 (en) 2020-01-02
US20180018503A1 (en) 2018-01-18
US10452893B2 (en) 2019-10-22
US11062123B2 (en) 2021-07-13
CN106874826A (zh) 2017-06-20

Similar Documents

Publication Publication Date Title
WO2017096753A1 (zh) Face key point tracking method, terminal, and non-volatile computer readable storage medium
CN109977262B (zh) Method, apparatus, and processing device for obtaining candidate segments from a video
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
US10769496B2 (en) Logo detection
WO2021036059A1 (zh) Image conversion model training method, heterogeneous face recognition method, apparatus, and device
US8805018B2 (en) Method of detecting facial attributes
WO2018196396A1 (zh) Pedestrian re-identification method based on consistency-constrained feature learning
WO2021051545A1 (zh) Fall action determination method and apparatus based on a behavior recognition model, computer device, and storage medium
JP6112801B2 (ja) Image recognition apparatus and image recognition method
WO2022021029A1 (zh) Detection model training method and apparatus, detection model using method, and storage medium
CN111814620A (zh) Face image quality evaluation model establishment method, selection method, medium, and apparatus
Zakaria et al. Face detection using combination of Neural Network and Adaboost
Zhang et al. Weakly supervised human fixations prediction
Nag et al. A new unified method for detecting text from marathon runners and sports players in video (PR-D-19-01078R2)
WO2023109361A1 (zh) Method, system, device, medium, and product for video processing
WO2023123923A1 (zh) Person re-identification method, person re-identification apparatus, computer device, and medium
Du et al. Precise glasses detection algorithm for face with in-plane rotation
CN116091946A (zh) YOLOv5-based target detection method for UAV aerial images
CN116543261A (zh) Model training method for image recognition, image recognition method, device, and medium
JP2012048624A (ja) Learning device, method, and program
CN111666976A (zh) Feature fusion method and apparatus based on attribute information, and storage medium
Saabni Facial expression recognition using multi Radial Bases Function Networks and 2-D Gabor filters
WO2020244076A1 (zh) Face recognition method and apparatus, electronic device, and storage medium
Cho et al. A space-time graph optimization approach based on maximum cliques for action detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16871959

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29/10/2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16871959

Country of ref document: EP

Kind code of ref document: A1