CN111444875B - Face tracking method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111444875B
Authority
CN
China
Prior art keywords
face
video image
image
area
tracking
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010266059.0A
Other languages
Chinese (zh)
Other versions
CN111444875A (en)
Inventor
万成涛
谭泽汉
陈彦宇
马雅奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Application filed by Gree Electric Appliances Inc of Zhuhai and Zhuhai Lianyun Technology Co Ltd
Priority to CN202010266059.0A
Publication of CN111444875A
Application granted
Publication of CN111444875B
Legal status: Active
Anticipated expiration


Classifications

    • G06V 20/40: Scenes; scene-specific elements in video content
    • G06T 7/20: Image analysis; analysis of motion
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/167: Human faces; detection, localisation, normalisation using comparisons between temporally consecutive images
    • G06V 40/172: Human faces; classification, e.g. identification
    • G06T 2207/30201: Indexing scheme for image analysis; subject of image: human being, face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face tracking method, device, equipment and computer readable storage medium. The method comprises: detecting, through a preset human body sensor, whether a human body appears in a face detection area; if the human body sensor detects that a human body appears in the face detection area, detecting, in the real-time video stream corresponding to the face detection area, the video image in which a first face appears for the first time; and, once that video image is detected, tracking the face region of the first face in each next frame of video image according to its face region in the previous frame. When tracking a face, the embodiments of the invention determine the face region in the next frame from the face region in the previous frame, without cyclically traversing and running face detection on every pixel of every video image in the video stream. This effectively reduces the computation required for face tracking while still ensuring the tracking effect.

Description

Face tracking method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a face tracking method, device, apparatus, and computer readable storage medium.
Background
With the development of technology, face tracking has become a widely studied problem. It has great application potential in intelligent monitoring, robotics, human-computer interaction and the like, and serves as a foundational technology in fields such as video surveillance of public places, video conferencing, intelligent robots, identity recognition and image tracking. Face tracking means accurately tracking a face on the basis of face detection and accurate matching, while face detection aims to find the position of the region where a face is located in an image.
At present, the common practice for face tracking is to acquire a video stream through a camera, traverse every frame of the stream, and run face detection over the full pixel space of each frame. The face region in each frame is found by continuously running the face detection algorithm, which achieves the goal of tracking, but detecting the face region over the full pixel space of every video image makes the computation of the face tracking process very large.
Disclosure of Invention
The main purpose of the embodiments of the invention is to provide a face tracking method, device, equipment and computer readable storage medium, so as to solve the problem that existing face tracking methods, which detect the face region over the full pixel space, require a large amount of computation.
The embodiments of the invention solve this technical problem with the following technical scheme:
An embodiment of the invention provides a face tracking method, comprising: detecting, through a preset human body sensor, whether a human body appears in a face detection area; if the human body sensor detects that a human body appears in the face detection area, detecting, in the real-time video stream corresponding to the face detection area, the video image in which a first face appears for the first time, the first face being the face closest to the camera collecting the real-time video stream; and, after the video image in which the first face first appears is detected, tracking the face region of the first face in the next frame of video image according to its face region in the previous frame of video image.
After the video image in which the first face first appears is detected, the method further comprises: suspending detection of the video image in which the next first face first appears, and resuming that detection when the human body sensor again detects that a human body appears in the face detection area.
Detecting, in the real-time video stream corresponding to the face detection area, the video image in which the first face first appears includes: detecting that video image using a preset face detection algorithm based on geometric features.
Tracking the face region of the first face in the next frame of video image according to its face region in the previous frame of video image comprises: determining, within the face region of the first face in the previous frame, the dense feature weighting coefficient of the pixel-dense region corresponding to each gray value; and determining the face region of the first face in the next frame according to a preset Mean-shift (mean shift) algorithm and those dense feature weighting coefficients.
Determining, within the face region of the first face in the previous frame of video image, the dense feature weighting coefficient of the pixel-dense region corresponding to each gray value comprises: dividing the face region of the first face in the previous frame into a plurality of cells, each containing the same number of pixels; for each cell, if the number of pixels in the cell sharing a gray value is greater than a preset number threshold, determining the cell to be a dense cell corresponding to that gray value; determining the region formed by all dense cells corresponding to a gray value to be the pixel-dense region corresponding to that gray value; determining, for each gray value, the area of its pixel-dense region and the centrifugal distance of that region relative to the face region of the first face in the previous frame; and determining the dense feature weighting coefficient of each gray value's pixel-dense region from that area and centrifugal distance. When the face region of the first face in the next frame is determined according to the Mean-shift algorithm, the dense feature weighting coefficient corresponding to each gray value is used as a coefficient of the weight of the pixels having that gray value.
After the video image in which the first face first appears is detected, and before the face region of the first face in the next frame is tracked according to its face region in the previous frame, the method further comprises: extracting the image of the first face from the video image in which it first appears; performing face recognition processing on the image of the first face against pre-stored face images, to determine whether there is a stored face image matching it; and, if there is no matching face image, tracking the face region of the first face in the next frame of video image according to its face region in the previous frame.
Each time the face region of the first face has been tracked into a next frame of video image, the method further comprises: extracting the image of the first face from that next frame, and performing face recognition processing on the extracted image against the pre-stored face images.
An embodiment of the invention also provides a face tracking apparatus, comprising: a sensing module, configured to detect, through a preset human body sensor, whether a human body appears in the face detection area; a detection module, configured to detect, when the sensing module detects through the human body sensor that a human body appears in the face detection area, the video image in which a first face appears for the first time in the real-time video stream corresponding to the face detection area, the first face being the face closest to the camera collecting the real-time video stream; and a tracking module, configured to track, starting from the video image in which the first face first appears, the face region of the first face in each next frame of video image according to its face region in the previous frame.
The embodiment of the invention also provides a face tracking device, which comprises: a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, performs the steps of the face tracking method of any of the above.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium is stored with a face tracking program, and the face tracking program realizes the steps of the face tracking method when being executed by a processor.
The embodiment of the invention has the following beneficial effects:
In the embodiments of the invention, face detection is triggered by the human body sensor, and the first face is detected only when a human body appears in the face detection area, which avoids the heavy resource consumption of continuously running the face detection algorithm. When tracking a face, the face region in the next frame of video image is determined from the face region in the previous frame, without cyclically traversing and running face detection on every pixel of every video image in the video stream; this effectively reduces the computation required for face tracking while ensuring the tracking effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a face tracking method according to an embodiment of the invention;
FIG. 2 is a flowchart of steps for face tracking according to one embodiment of the present invention;
FIG. 3 is a flowchart of the steps for determining dense feature weighting coefficients according to one embodiment of the invention;
FIG. 4 is a flowchart illustrating steps for determining a face region according to one embodiment of the present invention;
FIG. 5 is a flowchart of the steps for face recognition according to one embodiment of the present invention;
FIG. 6 is a block diagram of a face tracking device according to an embodiment of the present invention;
Fig. 7 is a block diagram of a face tracking device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and the embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent.
According to an embodiment of the invention, a face tracking method is provided. Fig. 1 is a flowchart of a face tracking method according to an embodiment of the invention.
Step S110, detecting whether a human body appears in a human face detection area through a preset human body sensor; if yes, go to step S120; if not, step S110 is continued.
The human body sensor is configured to detect whether there is human activity in the face detection area. In particular, the human body sensor may be an infrared sensor.
The face detection area may be a single person space area or a multiple person space area.
For example: the face detection area is a security check area (single person space area) through which only one person passes.
Another example is: the face detection area is a face detection area (multi-person spatial area) that allows for co-location of multiple persons, such as: and (5) a face card punching area of the enterprise.
Step S120, if the human body sensor detects that a human body appears in the face detection area, detecting, in the real-time video stream corresponding to the face detection area, the video image in which the first face appears for the first time.
The first face refers to the face nearest to the camera that collects the real-time video stream.
The real-time video stream refers to a video stream collected by a camera in real time.
The video image in which the first face appears for the first time is a video image containing one or more faces that satisfies one of the following: in the continuously acquired real-time video stream, the face closest to the camera in this video image did not appear in the previous frame of video image; the face closest to the camera in this video image is not the face that was closest to the camera in the previous frame; or, when the human body sensor has just detected that a human body appears in the face detection area, this video image is the first frame collected by the camera in which one or more faces are detected (the face closest to the camera being the first face).
The video image in which the first face first appears is detected in the real-time video stream corresponding to the face detection area using a preset face detection algorithm. In the process of detecting the first face, the face region of the first face in the video image can be determined; the face region may be the region of the face detection frame that the face detection algorithm marks on the video image.
Before detecting the video image in which the first face first appears, histogram equalization may be performed on the video image to improve its contrast. Histogram equalization alleviates, to a certain extent, the problem of unclear video images caused by poor lighting conditions, and thus the failures to detect face regions that unclear images tend to cause.
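As a minimal illustration, this preprocessing can be a single OpenCV call per frame; the function name and the gray-conversion step are assumptions, since the text only names histogram equalization:

```python
import cv2

def preprocess_frame(frame_bgr):
    """Equalize the luminance histogram of a video frame to raise contrast
    before running face detection (helps under poor lighting)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray)  # spreads gray levels over the full 0-255 range
```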
Step S130, after the video image in which the first face appears for the first time is detected, tracking the face area of the first face in the video image of the next frame according to the face area of the first face in the video image of the previous frame.
The previous frame video image and the next frame video image are two adjacent frame video images, and the acquisition time of the previous frame video image is earlier than the acquisition time of the next frame video image.
The video image in which the first face first appears serves as the first frame of face tracking: the face region of the first face in the second frame of video image is tracked from its face region in the first frame, the face region in the third frame is tracked from that in the second frame, and so on.
Tracking the face region of the first face in the next frame of video image includes, but is not limited to, the following ways: tracking it with a Mean-shift (mean shift) algorithm; or tracking it with a dense-feature-weighted Mean-shift algorithm. Alternatively, a candidate region of the first face in the next frame can be determined from the difference between the previous and next frames, and the face region then detected within a preset range around the candidate region using a face detection algorithm; the candidate region lies inside the preset range and has a smaller area than it.
The Mean-shift algorithm is an iterative procedure: first, in the previous frame of video image, a point y_0 in the face region of the first face is chosen as the starting point. In the next frame there are many points around y_0; the offsets from y_0 to each of these points are summed and averaged, giving an average offset whose direction points toward where the surrounding points are densest (the average offset has both a magnitude and a direction). The point y_0 is moved by this average offset, the new position is taken as a new starting point, a new average offset is computed, and the process repeats.
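A minimal sketch of this iterative step on a plain 2-D point set (purely illustrative; the radius and convergence threshold are assumed values, and the embodiment applies the idea to weighted image features rather than raw points):

```python
import numpy as np

def mean_shift_point(points, y0, radius=20.0, max_iter=50, eps=1e-3):
    """Move a starting point y0 toward the densest nearby cluster of points,
    as in the iterative step described above: compute the average offset to
    the surrounding points, move, and repeat until the shift is negligible."""
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.norm(points - y, axis=1)
        nearby = points[d < radius]            # points around the current position
        if len(nearby) == 0:
            break
        shift = nearby.mean(axis=0) - y        # average offset (size and direction)
        y = y + shift                          # move to the new starting point
        if np.linalg.norm(shift) < eps:        # converged: density peak reached
            break
    return y
```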
In this embodiment, face detection is triggered by the human body sensor and the first face is detected only when a human body appears in the face detection area, avoiding the heavy resource consumption of continuously running the face detection algorithm. During tracking, the face region in the next frame of video image is determined from the face region in the previous frame, without cyclically traversing and running face detection on all pixels of every video image in the video stream; this effectively reduces the computation of face tracking while ensuring the tracking effect.
The face tracking method of this embodiment can be applied to small terminal equipment, for example portable terminal devices. The kinds of portable terminal devices include, but are not limited to, mobile phones and face card-punching machines.
Small terminal equipment has a small form factor, a low-power development board, and limited memory. For example: a small terminal of about 3.5 × 2.3 × 1 inches, with about 1T of computing power, 4G of RAM (random access memory), and 64G of storage for the SDK (Software Development Kit). For face detection, a method based on geometric (Haar) features can be adopted, which keeps detection fast and provides the precondition for subsequent real-time face tracking on the small terminal. In addition, to ensure the tracking effect, pixel-density features are applied in the tracking algorithm, so that not every pixel in the image needs to be examined; this greatly reduces the computation, allowing this embodiment to run well on small terminal equipment.
The face detection step of step S120 is further described below.
In this embodiment, preferably, a preset face detection algorithm based on geometric features is used to detect a video image in which a first face appears for the first time in a real-time video stream corresponding to the face detection area.
The geometric features may be HOG (Histogram of Oriented Gradients), LBP (Local Binary Patterns), Haar features, and the like.
For example: face detection can be performed with the Haar-feature-based cascade classifier (CascadeClassifier) in OpenCV (Open Source Computer Vision Library), which quickly yields a sequence of sampled face rectangles (face detection frames) in the video image; the region where the rectangles in this sequence highly overlap is taken as the face region of the first face.
Specifically, before face detection, a camera is set up to collect the real-time video stream of the face detection area, for example a camera mounted on a face card-punching machine. The camera may either collect the real-time video stream continuously, or be invoked only after the human body sensor detects that a human body appears in the face detection area. The camera can default to a normally-open state, similar to the working mode of a real-time surveillance camera; under heavy traffic, when people pass through the face detection area at short intervals, this avoids repeatedly starting and stopping the camera and prolongs its service life.
After the human body sensor detects that a human body appears in the face detection area, the face detection algorithm is started and begins detecting faces from the current frame captured by the camera. If faces are detected in a frame, the first face is determined from the number and positions of the detected faces: if the video image contains only one face, that face is directly taken as the first face; if it contains several faces, the face whose detection frame has the largest area is taken as the first face, or the face with the smallest scene depth (i.e., closest to the camera) among them is taken as the first face.
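A minimal sketch of this detection step with OpenCV's pretrained Haar cascade; the cascade file and the detectMultiScale parameter values are illustrative assumptions:

```python
import cv2

# Load OpenCV's pretrained frontal-face Haar cascade (path is illustrative).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_first_face(gray_frame):
    """Return the face region of the 'first face': when several faces are
    detected, pick the detection box with the largest area (closest face)."""
    faces = cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda r: r[2] * r[3])  # (x, y, w, h) with max w*h
```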
In this embodiment, detection of the first face may be suspended at preset intervals, i.e. the face detection algorithm is paused periodically; alternatively, detection of the next first face may be suspended each time a first face is detected.
Suspending detection of the first face at preset intervals means: every first face appearing in the real-time video stream within a preset time period is detected; face detection is paused when the period ends, and restarted when the human body sensor again detects that a human body appears in the face detection area. This way is applicable to multi-person spatial areas and avoids starting the face detection algorithm too frequently.
Suspending detection of the next first face whenever a first face is detected means: after the video image in which the first face first appears is detected, detection of the video image in which the next first face first appears is paused, and restarted when the human body sensor again detects that a human body appears in the face detection area. For example: the infrared sensor senses whether a human body appears in the face detection area; when a person approaches, the face detection algorithm is triggered, and face detection stops once the video image in which the first face first appears has been obtained. This way can be used in single-person spatial areas to avoid keeping the face detection algorithm running, as in the sketch below.
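The single-person control flow might be sketched as follows; `pir_sensor.detected()` and `track_face` are hypothetical placeholders for the infrared-sensor interface and the tracking hand-off, neither of which is specified in the text:

```python
import time
import cv2

def run_single_person_mode(pir_sensor, camera_index=0):
    """Run face detection only while the infrared sensor reports a person,
    stopping again once the first face has been found (single-person area)."""
    cap = cv2.VideoCapture(camera_index)       # camera stays normally open
    while True:
        if not pir_sensor.detected():          # no human activity: do nothing
            time.sleep(0.1)
            continue
        ok, frame = cap.read()                 # person present: look for a face
        if not ok:
            continue
        gray = cv2.equalizeHist(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        face = detect_first_face(gray)         # from the sketch above
        if face is not None:
            track_face(cap, face)              # hand off to Mean-shift tracking
            # detection stays paused until the sensor triggers again
```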
The following further describes the face tracking step of step S130.
As shown in fig. 2, a flowchart of steps for face tracking according to an embodiment of the present invention is shown.
In step S210, in the face region of the previous frame of video image, the dense feature weighting coefficient of the pixel dense region corresponding to each gray value is determined.
The dense feature weighting coefficient is used for increasing the weight of the pixel dense region in the face region.
Step S220, according to a preset Mean-shift algorithm and dense feature weighting coefficients of the pixel dense areas corresponding to each gray value, determining a face area of the first face in the video image of the next frame.
When the face region of the first face in the next frame of video image is determined according to the Mean-shift algorithm, the dense feature weighting coefficient corresponding to each gray value is used as a coefficient of the weight of the pixels having that gray value.
The step of determining the dense feature weighting coefficients is described further below.
FIG. 3 is a flowchart illustrating steps for determining dense feature weighting coefficients according to an embodiment of the present invention.
Step S310, dividing the face area of the first face in the previous frame of video image into a plurality of cells; wherein the number of pixels in each cell is the same.
For example: using a 5×5-pixel cell as the basic unit, the face region is partitioned into a segmentation map of many small cells, each containing 25 pixels.
Step S320, for each cell, if the number of pixels with the same gray value in the cell is greater than a preset number threshold, determining the cell as a dense cell corresponding to the gray value.
A dense cell is a cell in which the number of pixels sharing the same gray value is greater than a preset number threshold. The threshold may be an empirical value or obtained through experiment, for example 12 pixels.
Once a cell is determined to be a dense cell corresponding to a gray value, the cell may be marked with that gray value.
Since the face region has been segmented, determining the pixel-dense regions reduces to confirming the dense cells, searching the whole segmentation map cell by cell. For example: when a cell is examined, if some gray value occurs in more than 12 of its pixels (i.e., more than half of the 25), that gray value is considered dense in the cell, and the cell is marked with it. If no gray value occurs in more than 12 pixels, the cell is considered to contain no dense gray value.
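A sketch of steps S310 and S320 with the values named in the text (5×5 cells, threshold of 12 pixels); the dictionary layout of the result is an implementation choice:

```python
import numpy as np

def find_dense_cells(face_gray, cell=5, threshold=12):
    """Split the face region into cell x cell blocks and, for each block,
    record the gray value (if any) shared by more than `threshold` pixels.
    Returns {gray_value: [(row, col) of each dense cell]}."""
    h, w = face_gray.shape
    dense = {}
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            block = face_gray[r:r + cell, c:c + cell]
            values, counts = np.unique(block, return_counts=True)
            k = counts.argmax()                   # most frequent gray value
            if counts[k] > threshold:             # more than half of 25 pixels
                dense.setdefault(int(values[k]), []).append((r, c))
    return dense
```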
Step S330, determining the area formed by all the dense cells corresponding to each gray value as the pixel dense area corresponding to the gray value.
In the segmentation map of the video image, the dense cells marked with the same gray value are collected, and the region they form is determined to be the pixel-dense region corresponding to that gray value.
Step S340, determining, for each gray value, the area of its pixel-dense region and the centrifugal distance of that region relative to the face region of the first face in the previous frame of video image.
The area of a pixel-dense region may be measured by the number of pixels it contains. If the number of dense cells corresponding to a gray value u is N, the area of the pixel-dense region corresponding to u is s_u = N × A_0, where A_0 is the area of a single dense cell (its number of pixels).
The centrifugal distance of a pixel-dense region relative to the face region of the first face in the previous frame, called for short the centrifugal distance corresponding to the gray value, is the mean of the spatial Euclidean distances between the center of each dense cell in the region and the center of the face region.
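Continuing the sketch, the area s_u and centrifugal distance d_u of step S340 follow directly from the dense-cell map; coordinates and cell size are assumed as above:

```python
import numpy as np

def region_area_and_distance(dense_cells, face_shape, cell=5):
    """For each gray value u: area s_u = (number of dense cells) * cell^2, and
    centrifugal distance d_u = mean Euclidean distance from each dense cell's
    center to the center of the face region."""
    face_center = np.array([face_shape[0] / 2.0, face_shape[1] / 2.0])
    stats = {}
    for u, cells in dense_cells.items():
        centers = np.array([(r + cell / 2.0, c + cell / 2.0) for r, c in cells])
        s_u = len(cells) * cell * cell
        d_u = float(np.linalg.norm(centers - face_center, axis=1).mean())
        stats[u] = (s_u, d_u)
    return stats
```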
Step S350, determining a dense feature weighting coefficient corresponding to each gray value according to the area of the pixel dense region corresponding to each gray value and the centrifugal distance.
For the sake of clarity of the following description, the face area of the first face in the previous frame of video image is simply referred to as the target face area, and the face area of the first face in the next frame of video image is simply referred to as the tracking face area.
The dense feature weighting coefficient f_u corresponding to the gray value u is given by an expression (rendered only as an image in the source) over the following quantities: e, the natural base; d_0, the minimum non-zero centrifugal distance among all gray values in the target face region; d_u, the centrifugal distance corresponding to gray value u in the target face region; s_u, the area of the pixel-dense region corresponding to gray value u in the target face region; s, the area (pixel count) of the target face region; and num, the number of pixel-dense regions corresponding to gray value u in the target face region.
In this embodiment gray values range from 0 to 255, so dense feature weighting coefficients are determined for all 256 gray values. If a gray value has no dense cell, its pixel-dense region is empty and its dense feature weighting coefficient is 1.
In the target face region, a pixel-dense region with a larger area and a smaller centrifugal distance receives a larger dense feature weighting coefficient, and one with a smaller area and a larger centrifugal distance receives a smaller coefficient. This increases the differentiation between different pixel-dense regions and improves the accuracy of the subsequent Mean-shift algorithm.
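Since the exact expression is not reproduced in the source, the following sketch uses an illustrative stand-in for f_u that merely satisfies the stated behavior (it grows with the area s_u, shrinks as the centrifugal distance d_u grows, and defaults to 1 when a gray value has no dense region); the functional form is an assumption, not the patent's formula:

```python
import math

def dense_feature_coefficient(s_u, d_u, s, d0):
    """Illustrative stand-in for f_u (the patent's exact formula is not
    reproduced in the source text): larger area s_u and smaller centrifugal
    distance d_u yield a larger coefficient, matching the stated behavior."""
    if s_u == 0:                   # no dense cells for this gray value
        return 1.0
    return 1.0 + (s_u / s) * math.exp(d0 / d_u)  # assumed form, for illustration only
```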
The step of determining the face region of the first face in the video image of the subsequent frame is further described below. Fig. 4 is a flowchart illustrating a face region determination procedure according to an embodiment of the present invention.
Step S410, according to the positions and gray values of all pixel points in the target face area, calculating the probability density of each gray value in the target face area.
The probability density q_u of each gray value u in the target face region is calculated, and the set of these probability densities characterizes the target face region, forming the target feature model.
The target feature model q can be expressed as:

q = {q_u}, u = 0, 1, …, 255

$$q_u = C \sum_{i=1}^{n} k\left(\left\|x_i^{*}\right\|^{2}\right)\,\delta\left[\,b\left(x_i^{*}\right)-u\,\right]$$

where x_i^{*} is the spatial coordinate vector of the i-th pixel in the target face region relative to the center of the region; n is the number of pixels in the target face region; k(x) is a monotonically decreasing profile that assigns smaller weight to pixels far from the center (in this embodiment the Epanechnikov kernel may be used); b(x_i^{*}) gives the gray value of the pixel at x_i^{*}, so that δ[b(x_i^{*}) − u] takes the value 1 when that gray value is u and 0 otherwise; δ[x] is the Kronecker delta function; and C is the first normalization coefficient, defined as:

$$C = \left(\sum_{i=1}^{n} k\left(\left\|x_i^{*}\right\|^{2}\right)\right)^{-1}$$
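As a concrete illustration of the target model above, the sketch below computes all 256 values of q_u at once with an Epanechnikov profile k(x) = 1 − x for x ≤ 1; the vectorized histogram approach is an implementation choice, not taken from the patent:

```python
import numpy as np

def target_model(face_gray):
    """Compute q_u for u = 0..255 over the target face region (a uint8 gray
    image), weighting each pixel by an Epanechnikov profile k(x) = 1 - x of
    its normalized squared distance from the region center."""
    h, w = face_gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # normalized squared distance of each pixel from the region center
    r2 = ((ys - h / 2.0) / (h / 2.0)) ** 2 + ((xs - w / 2.0) / (w / 2.0)) ** 2
    k = np.clip(1.0 - r2, 0.0, None)          # Epanechnikov profile
    q = np.bincount(face_gray.ravel(), weights=k.ravel(), minlength=256)
    return q / k.sum()                        # normalization coefficient C
```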
Step S420, determining the probability density of each gray value in the tracking face region according to the positions and gray values of the pixel points in the tracking face region.
The probability density p_u(y_0) of each gray value u in the tracking face region is calculated, and the set of these probability densities characterizes the tracking face region, forming the tracking feature model.
The tracking feature model p(y_0) can be expressed as:

p(y_0) = {p_u(y_0)}, u = 0, 1, …, 255

$$p_u(y_0) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y_0 - x_i}{h}\right\|^{2}\right)\,\delta\left[\,b\left(x_i\right)-u\,\right]$$

where x_i is the i-th pixel in the tracking face region; y_0 is the center position of the tracking face region, a preset value which in this embodiment may be set to the center of the target face region; h is the width of the tracking face region, which may be set equal to the width of the target face region; n_h is the number of pixels in the tracking face region, which may be set equal to the number of pixels in the target face region; b(x_i) gives the gray value of pixel x_i, so that δ[b(x_i) − u] takes the value 1 when that gray value is u and 0 otherwise; and C_h is the second normalization coefficient, defined as:

$$C_h = \left(\sum_{i=1}^{n_h} k\left(\left\|\frac{y_0 - x_i}{h}\right\|^{2}\right)\right)^{-1}$$
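The candidate model p_u(y_0) can reuse the same weighted-histogram computation over a window of the next frame centered at y_0; this sketch assumes the window lies fully inside the frame:

```python
import numpy as np

def candidate_model(frame_gray, y0, size):
    """Compute p_u(y0) over a candidate window of the given (height, width)
    size centered at y0 = (row, col) in the next frame, using the same
    Epanechnikov-weighted histogram as target_model."""
    hh, ww = size
    r0, c0 = int(y0[0] - hh // 2), int(y0[1] - ww // 2)
    window = frame_gray[r0:r0 + hh, c0:c0 + ww]
    return target_model(window)   # identical weighting, so the code is reused
```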
Step S430, determining a new position to which the center position of the target face area moves according to the gray value of each pixel point in the target face area, the dense feature weighting coefficient of the pixel dense area corresponding to each gray value, the probability density of each gray value in the target face area and the probability density of each gray value in the tracking face area.
According to the new position and the size of the target face area, the tracking face area can be determined in the video image of the next frame.
Specifically, according to the Mean-shift tracking algorithm, the new position y to which the center of the target face region moves is calculated as:

$$y = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^{2}\right)}{\sum_{i=1}^{n_h} w_i\, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^{2}\right)}$$

where g(x) = −k′(x). The key in applying the Mean-shift algorithm to face tracking is the choice of the feature weight w_i. This embodiment redefines the feature weight w_i as:

$$w_i = \sum_{u=0}^{255} f_u \sqrt{\frac{q_u}{p_u(y_0)}}\, \delta\left[\,b\left(x_i\right)-u\,\right]$$

that is, the standard Mean-shift weight of each pixel is scaled by the dense feature weighting coefficient f_u of its gray value.
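A single weighted Mean-shift update might then look as follows. With the Epanechnikov profile, g(x) = −k′(x) is constant, so the update reduces to a weighted centroid of the window pixels; here f is the 256-element vector of dense feature weighting coefficients and q the target model (a sketch under those assumptions):

```python
import numpy as np

def mean_shift_step(frame_gray, y0, size, q, f):
    """One Mean-shift update: weight each window pixel by
    w_i = f[u] * sqrt(q[u] / p[u]) for its gray value u, then move the
    center y0 to the weighted centroid of the window."""
    hh, ww = size
    r0, c0 = int(y0[0] - hh // 2), int(y0[1] - ww // 2)
    window = frame_gray[r0:r0 + hh, c0:c0 + ww]
    p = candidate_model(frame_gray, y0, size)
    ratio = np.sqrt(np.divide(q, p, out=np.zeros(256), where=p > 0))
    w = (f * ratio)[window]                      # per-pixel weight w_i by gray value
    ys, xs = np.mgrid[0:hh, 0:ww]
    y_new = (np.array([(w * ys).sum(), (w * xs).sum()]) / w.sum()
             + np.array([r0, c0]))               # back to frame coordinates
    return y_new
```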
According to this embodiment, it is not necessary to scan every pixel of the video image: taking the position of the target face region as the center, the tracking face region can be found among the pixels above, below, left and right of that center in the next frame of video image.
The face tracking method of the embodiments of the invention can be applied to face recognition scenarios. Although face detection can find faces in images, people move and environments vary, so face pose and environmental factors adversely affect recognition results. For example: changes in the angle and expression of a face, changes in its appearance (such as a new hairstyle), and environmental factors such as illumination and occlusion make accurately recognizing a face under different conditions rather difficult, lowering the accuracy of recognition results. The face tracking method of the embodiments of the invention performs face recognition dynamically, so the face can be recognized under the best available pose and environmental conditions.
The following describes steps of face recognition based on the face tracking method of the present embodiment.
As shown in fig. 5, a flowchart of the steps for face recognition according to an embodiment of the present invention is shown.
Step S510, extracting the image of the first face from the video image in which the first face appears for the first time.
Step S520, performing face recognition processing on the image of the first example face according to the pre-stored face image.
Step S530, determining whether a face image matched with the image of the first face exists; if yes, go to step S540; if not, step S550 is performed.
Step S540, if it is determined that there is a face image matching the image of the first face, it is determined that the face recognition passes.
If face recognition is determined to pass, a recognition-success prompt may be issued.
For example: at face card-punching, the prompt may be "card punched successfully".
Step S550, if it is determined that there is no face image matching the image of the first face, the face area of the first face in the video image of the next frame is tracked according to the face area of the first face in the video image of the previous frame.
If it is determined that no stored face image matches the image of the first face, the cause may be the face's pose and/or environmental factors. This embodiment therefore starts face tracking once no match is found, and continues face recognition during tracking, so as to eliminate the adverse effects of pose and environment on recognition.
Further, each time the face region of the first face has been tracked into the next frame of video image, the image of the first face is extracted from that frame, and the flow returns to step S520 to perform face recognition on the newly extracted image against the pre-stored face images, as sketched below.
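Putting recognition and tracking together, the loop of steps S520 to S550 might be sketched as follows; `recognize` and `track_next_region` are hypothetical stand-ins for the matching step and the Mean-shift tracking step:

```python
def crop(frame, region):
    """Extract the face image given region = (row, col, height, width)."""
    r, c, h, w = region
    return frame[r:r + h, c:c + w]

def recognize_with_tracking(frames, first_region, recognize, track_next_region):
    """Try recognition on the first face's image; if no stored face image
    matches, keep tracking frame by frame and retry recognition each time,
    so a better pose or lighting can eventually yield a match."""
    region = first_region
    for frame in frames:
        if recognize(crop(frame, region)):           # matched a pre-stored image
            return True                              # face recognition passes
        region = track_next_region(frame, region)    # track into the next frame
    return False
```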
The embodiment of the invention also provides a face tracking device. The face tracking apparatus may be provided in a small terminal device.
Fig. 6 is a block diagram of a face tracking apparatus according to an embodiment of the present invention.
The face tracking device comprises: a sensing module 610, a detecting module 620 and a tracking module 630.
The sensing module 610 is configured to detect whether a human body appears in the face detection area through a preset human body sensor.
The detection module 620 is configured to detect, when the sensing module detects that a human body appears in the face detection area through the human body sensor, a video image in which a first face appears for the first time in a real-time video stream corresponding to the face detection area; the first face is the face closest to a camera for collecting the real-time video stream.
The tracking module 630 is configured to track, starting from the video image in which the first face appears for the first time, the face area of the first face in the next frame of video image according to the face area of the first face in the previous frame of video image.
The functions of the apparatus of this embodiment have been described in the method embodiments above, so any details not covered here can be found in the corresponding descriptions there and are not repeated.
The embodiment provides face tracking equipment. Fig. 7 is a block diagram of a face tracking apparatus according to an embodiment of the present invention.
In this embodiment, the face tracking apparatus includes, but is not limited to: a processor 710 and a memory 720.
The processor 710 is configured to execute a face tracking program stored in the memory 720 to implement the face tracking method described above.
Specifically, the processor 710 is configured to execute a face tracking program stored in the memory 720 so as to implement the following steps: detecting, through a preset human body sensor, whether a human body appears in a face detection area; if the human body sensor detects that a human body appears in the face detection area, detecting, in the real-time video stream corresponding to the face detection area, the video image in which a first face appears for the first time, the first face being the face closest to the camera collecting the real-time video stream; and, after the video image in which the first face first appears is detected, tracking the face region of the first face in the next frame of video image according to its face region in the previous frame of video image.
After the video image in which the first face first appears is detected, the method further comprises: suspending detection of the video image in which the next first face first appears, and resuming that detection when the human body sensor again detects that a human body appears in the face detection area.
Detecting, in the real-time video stream corresponding to the face detection area, the video image in which the first face first appears includes: detecting that video image using a preset face detection algorithm based on geometric features.
Tracking the face region of the first face in the next frame of video image according to its face region in the previous frame comprises: determining, within the face region of the first face in the previous frame, the dense feature weighting coefficient of the pixel-dense region corresponding to each gray value; and determining the face region of the first face in the next frame according to a preset Mean-shift (mean shift) algorithm and those dense feature weighting coefficients.
Determining, within the face region of the first face in the previous frame of video image, the dense feature weighting coefficient of the pixel-dense region corresponding to each gray value comprises: dividing the face region of the first face in the previous frame into a plurality of cells, each containing the same number of pixels; for each cell, if the number of pixels in the cell sharing a gray value is greater than a preset number threshold, determining the cell to be a dense cell corresponding to that gray value; determining the region formed by all dense cells corresponding to a gray value to be the pixel-dense region corresponding to that gray value; determining, for each gray value, the area of its pixel-dense region and the centrifugal distance of that region relative to the face region of the first face in the previous frame; and determining each dense feature weighting coefficient from that area and centrifugal distance. When the face region of the first face in the next frame is determined according to the Mean-shift algorithm, the dense feature weighting coefficient corresponding to each gray value is used as a coefficient of the weight of the pixels having that gray value.
After the video image in which the first face first appears is detected, and before the face region of the first face in the next frame is tracked according to its face region in the previous frame, the method further comprises: extracting the image of the first face from the video image in which it first appears; performing face recognition processing on the image of the first face against pre-stored face images, to determine whether there is a stored face image matching it; and, if there is no matching face image, tracking the face region of the first face in the next frame according to its face region in the previous frame. Each time the face region of the first face has been tracked into a next frame, the image of the first face is extracted from that frame and face recognition processing is performed on it against the pre-stored face images.
The embodiment of the invention also provides a computer storage medium. The computer-readable storage medium herein stores one or more programs. Wherein the storage medium may comprise volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, hard disk, or solid state disk; the memory may also comprise a combination of the above types of memories.
When one or more programs in a computer-readable storage medium are executed by one or more processors, the face tracking method described above is implemented.
Specifically, the processor is configured to execute a face tracking program stored in the memory so as to implement the following steps: detecting, through a preset human body sensor, whether a human body appears in a face detection area; if the human body sensor detects that a human body appears in the face detection area, detecting, in the real-time video stream corresponding to the face detection area, the video image in which a first face appears for the first time, the first face being the face closest to the camera collecting the real-time video stream; and, after the video image in which the first face first appears is detected, tracking the face region of the first face in the next frame of video image according to its face region in the previous frame of video image.
After the video image in which the first face first appears is detected, the method further comprises: suspending detection of the video image in which the next first face first appears, and resuming that detection when the human body sensor again detects that a human body appears in the face detection area.
Detecting, in the real-time video stream corresponding to the face detection area, the video image in which the first face first appears includes: detecting that video image using a preset face detection algorithm based on geometric features.
Tracking the face region of the first face in the next frame of video image according to its face region in the previous frame comprises: determining, within the face region of the first face in the previous frame, the dense feature weighting coefficient of the pixel-dense region corresponding to each gray value; and determining the face region of the first face in the next frame according to a preset Mean-shift (mean shift) algorithm and those dense feature weighting coefficients.
Determining, within the face region of the first face in the previous frame of video image, the dense feature weighting coefficient of the pixel-dense region corresponding to each gray value comprises: dividing the face region of the first face in the previous frame into a plurality of cells, each containing the same number of pixels; for each cell, if the number of pixels in the cell sharing a gray value is greater than a preset number threshold, determining the cell to be a dense cell corresponding to that gray value; determining the region formed by all dense cells corresponding to a gray value to be the pixel-dense region corresponding to that gray value; determining, for each gray value, the area of its pixel-dense region and the centrifugal distance of that region relative to the face region of the first face in the previous frame; and determining each dense feature weighting coefficient from that area and centrifugal distance. When the face region of the first face in the next frame is determined according to the Mean-shift algorithm, the dense feature weighting coefficient corresponding to each gray value is used as a coefficient of the weight of the pixels having that gray value.
After the video image in which the first face first appears is detected, and before the face region of the first face in the next frame is tracked according to its face region in the previous frame, the method further comprises: extracting the image of the first face from the video image in which it first appears; performing face recognition processing on the image of the first face against pre-stored face images, to determine whether there is a stored face image matching it; and, if there is no matching face image, tracking the face region of the first face in the next frame according to its face region in the previous frame. Each time the face region of the first face has been tracked into a next frame, the image of the first face is extracted from that frame and face recognition processing is performed on it against the pre-stored face images.
The above description is only an example of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (8)

1. A face tracking method, comprising:
Detecting whether a human body appears in a human face detection area through a preset human body sensor;
If the human body sensor detects that a human body appears in the face detection area, detecting, in a real-time video stream corresponding to the face detection area, the video image in which a first face appears for the first time, comprising the following steps: obtaining, with a cascade classifier based on geometric features, a sequence of sampled face rectangles (face detection frames) in the video image, and taking the region where the rectangles of the sequence highly overlap as the face region of the first face; the first face being the face closest to the camera collecting the real-time video stream;
Extracting the image of the first face from the video image in which the first face appears for the first time; performing face recognition processing on the image of the first face according to pre-stored face images, and determining whether there is a face image matching it; if there is no face image matching the image of the first face, tracking the face region of the first face in the next frame of video image according to its face region in the previous frame of video image; and, each time the face region of the first face has been tracked into a next frame of video image, extracting the image of the first face from that frame and performing face recognition processing on it according to the pre-stored face images;
And tracking the face area of the first face in the video image of the next frame according to the face area of the first face in the video image of the previous frame after the video image of the first face appearing for the first time is detected.
2. The method of claim 1, further comprising, after detecting the video image in which the first face appears for the first time:
suspending detection of the video image in which the next first face first appears, and resuming that detection when the human body sensor again detects that a human body appears in the face detection area.
3. The method according to claim 1, wherein detecting a video image in which a first face appears for the first time in a real-time video stream corresponding to the face detection area, comprises:
detecting the video image in which the first face first appears in the real-time video stream corresponding to the face detection area using a preset face detection algorithm based on geometric features.
4. The method of claim 1, wherein tracking the face region of the first face in the subsequent frame of video image based on the face region of the first face in the previous frame of video image comprises:
determining a dense-feature weighting coefficient for the pixel-dense region corresponding to each gray value in the face area of the first face in the previous frame of video image;
and determining the face area of the first face in the next frame of video image according to a preset mean-shift (Mean-shift) algorithm and the dense-feature weighting coefficients of the pixel-dense regions corresponding to the gray values.
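A minimal sketch of one mean-shift iteration using these weighting coefficients might look as follows. It simplifies the classical Mean-shift weight to just the per-gray-value coefficient, and coeffs is assumed to come from a routine like the one sketched after claim 5; all names are illustrative, not taken from the claims.

```python
import numpy as np

def mean_shift_step(gray, window, coeffs):
    """One iteration: re-centre `window` (x, y, w, h) on the centroid of the
    current patch, each pixel weighted by the dense-feature coefficient of
    its gray value. `gray` is a 2-D uint8 frame."""
    x, y, w, h = window
    patch = gray[y:y + h, x:x + w]
    # Per-pixel weights derived from the per-gray-value coefficients.
    weights = np.zeros(patch.shape, dtype=float)
    for value, coeff in coeffs.items():
        weights[patch == value] = coeff
    total = weights.sum()
    if total == 0.0:
        return window                  # nothing to shift towards
    ys, xs = np.mgrid[0:h, 0:w]
    cy = (weights * ys).sum() / total  # weighted centroid inside the patch
    cx = (weights * xs).sum() / total
    return (x + int(round(cx - w / 2.0)), y + int(round(cy - h / 2.0)), w, h)
```

Iterating this step until the window stops moving yields the face area of the first face in the next frame of video image.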
5. The method of claim 4, wherein determining the dense-feature weighting coefficient of the pixel-dense region corresponding to each gray value in the face area of the first face in the previous frame of video image comprises:
dividing the face area of the first face in the previous frame of video image into a plurality of cells, each cell containing the same number of pixel points;
for each cell, if the number of pixel points in the cell having the same gray value is greater than a preset number threshold, determining that cell as a dense cell corresponding to that gray value;
determining the region formed by all dense cells corresponding to each gray value as the pixel-dense region corresponding to that gray value;
determining, for the pixel-dense region corresponding to each gray value, the area of the region and its eccentric distance relative to the face area of the first face in the previous frame of video image;
and determining the dense-feature weighting coefficient of the pixel-dense region corresponding to each gray value according to the area and the eccentric distance of that region; wherein, when the face area of the first face in the next frame of video image is determined according to the Mean-shift algorithm, the dense-feature weighting coefficient corresponding to each gray value is used as a coefficient on the weight of pixel points having that gray value.
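For concreteness, one way to compute these coefficients is sketched below, under labeled assumptions: square cells with an equal number of pixel points, and the eccentric distance read as the distance from the dense region's centroid to the centre of the face area. The combination rule area / (1 + distance) is illustrative only; the claim fixes merely that the coefficient is determined from the area and the eccentric distance.

```python
import numpy as np

def dense_feature_weights(face_gray, cell=8, count_threshold=32):
    """face_gray: 2-D uint8 array holding the face area of the previous frame.
    Returns a dict mapping gray value -> dense-feature weighting coefficient."""
    h, w = face_gray.shape
    centre = np.array([(h - 1) / 2.0, (w - 1) / 2.0])  # centre of the face area
    dense_cells = {}                                    # gray value -> list of cell origins
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            block = face_gray[r:r + cell, c:c + cell]
            values, counts = np.unique(block, return_counts=True)
            for value, count in zip(values, counts):
                if count > count_threshold:             # dense cell for this gray value
                    dense_cells.setdefault(int(value), []).append((r, c))
    coeffs = {}
    for value, cells in dense_cells.items():
        # Area of the pixel-dense region formed by all dense cells of this value.
        area = len(cells) * cell * cell
        centers = np.array([[r + cell / 2.0, c + cell / 2.0] for r, c in cells])
        distance = np.linalg.norm(centers.mean(axis=0) - centre)  # eccentric distance
        coeffs[value] = area / (1.0 + distance)         # illustrative combination rule
    return coeffs
```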
6. A face tracking apparatus, comprising:
a sensing module, configured to detect, by a preset human body sensor, whether a human body appears in a face detection area;
a detection module, configured to: when the sensing module detects, by the human body sensor, that a human body appears in the face detection area, detect, in a real-time video stream corresponding to the face detection area, the video image in which a first face appears for the first time, which comprises: based on geometric features and using an intermediate classifier, obtaining a sequence of sampled rectangular face regions in the video image, and taking the region where the rectangular face regions in the sequence highly overlap as the face area of the first face, wherein the first face is the face closest to the camera that collects the real-time video stream; extract an image of the first face from the video image in which the first face appears for the first time; perform face recognition processing on the image of the first face according to pre-stored face images, and determine whether a pre-stored face image matches the image of the first face; if no face image matches the image of the first face, track the face area of the first face in the next frame of video image according to the face area of the first face in the previous frame of video image; and, each time the face area of the first face has been tracked in a next frame of video image, extract the image of the first face from that frame of video image and perform face recognition processing on the extracted image according to the pre-stored face images;
and a tracking module, configured to track, starting from the video image in which the first face appears for the first time, the face area of the first face in the next frame of video image according to the face area of the first face in the previous frame of video image.
7. A face tracking device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the face tracking method according to any one of claims 1 to 5.
8. A computer-readable storage medium, wherein a face tracking program is stored on the computer-readable storage medium, and the face tracking program, when executed by a processor, implements the steps of the face tracking method according to any one of claims 1 to 5.
CN202010266059.0A 2020-04-07 2020-04-07 Face tracking method, device, equipment and computer readable storage medium Active CN111444875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010266059.0A CN111444875B (en) 2020-04-07 2020-04-07 Face tracking method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111444875A CN111444875A (en) 2020-07-24
CN111444875B (en) 2024-05-03

Family

ID=71651066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010266059.0A Active CN111444875B (en) 2020-04-07 2020-04-07 Face tracking method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111444875B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070035A (en) * 2020-09-11 2020-12-11 China Unicom IoT Co., Ltd. Target tracking method and device based on video stream, and storage medium
CN112541472B (en) * 2020-12-23 2023-11-24 Beijing Baidu Netcom Science and Technology Co., Ltd. Target detection method and device, and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799871A (en) * 2012-07-13 2012-11-28 TCL Corporation Method for tracking and recognizing a face
CN105116920A (en) * 2015-07-07 2015-12-02 Baidu Online Network Technology (Beijing) Co., Ltd. Intelligent robot tracking method and apparatus based on artificial intelligence, and intelligent robot
CN106709932A (en) * 2015-11-12 2017-05-24 Alibaba Group Holding Limited Face position tracking method and device, and electronic equipment
CN107067411A (en) * 2017-01-03 2017-08-18 Jiangsu Huiyan Data Technology Co., Ltd. Mean-shift tracking method combining dense features
WO2020052319A1 (en) * 2018-09-14 2020-03-19 Tencent Technology (Shenzhen) Co., Ltd. Target tracking method, apparatus, medium, and device

Also Published As

Publication number Publication date
CN111444875A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
Zhang et al. Fast and robust occluded face detection in ATM surveillance
JP4216668B2 (en) Face detection / tracking system and method for detecting and tracking multiple faces in real time by combining video visual information
KR100851981B1 (en) Liveness detection method and apparatus in video image
US8855363B2 (en) Efficient method for tracking people
CN110008867A (en) Early warning method, device and storage medium based on abnormal person behavior
US9443137B2 (en) Apparatus and method for detecting body parts
Wu et al. Real-time background subtraction-based video surveillance of people by integrating local texture patterns
Wei et al. Face detection for image annotation
Jung et al. Eye detection under varying illumination using the retinex theory
Fiaz et al. Vision based human activity tracking using artificial neural networks
CN111145223A (en) Multi-camera personnel behavior trajectory recognition and analysis method
Liu et al. An ultra-fast human detection method for color-depth camera
CN111444875B (en) Face tracking method, device, equipment and computer readable storage medium
Zhang et al. Fast moving pedestrian detection based on motion segmentation and new motion features
CN111091057A (en) Information processing method and device and computer readable storage medium
Huang et al. Gait recognition based on Gabor wavelets and modified gait energy image for human identification
CN111444817B (en) Character image recognition method and device, electronic equipment and storage medium
Zhang et al. Mean-shift algorithm integrating with SURF for tracking
Chen et al. An invariant appearance model for gait recognition
CN111104857A (en) Identity recognition method and system based on gait energy image
Yogameena et al. People/vehicle classification by recurrent motion of skeleton features
Lin et al. Foreground object detection in highly dynamic scenes using saliency
Duan et al. Detection of hand-raising gestures based on body silhouette analysis
Nebehay et al. Evaluation of an online learning approach for robust object tracking
Mudjirahardjo et al. Head detection and tracking for an intelligent room

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant