CN113223057A - Face tracking method and device, electronic equipment and storage medium - Google Patents

Face tracking method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN113223057A
CN113223057A (application CN202110628739.7A)
Authority
CN
China
Prior art keywords
tracking
frame
face
target
current video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110628739.7A
Other languages
Chinese (zh)
Inventor
王顺利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110628739.7A
Publication of CN113223057A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection


Abstract

The embodiments of the present application provide a face tracking method and apparatus, an electronic device, and a storage medium. The face tracking method includes: when tracking faces in a video, if tracking of a target face is lost in any current video frame, acquiring the position information of a first tracking frame; amplifying the area framed by the first tracking frame according to a target magnification to obtain an amplified first tracking frame; determining, according to the position information of the amplified first tracking frame, the area in the current video frame corresponding to the amplified first tracking frame; intercepting the image in that area from the current video frame as the local image corresponding to the amplified first tracking frame; detecting whether a face exists in the local image; and if so, generating a second tracking frame at the position of the corresponding face in the current video frame as the face tracking result for the current video frame. The method and apparatus thus effectively re-track a target whose tracking failed during the tracking process.

Description

Face tracking method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for tracking a human face, an electronic device, and a storage medium.
Background
In order to render special effects on a face in a video, the face must first be detected and tracked in each video frame, and the special effect is then rendered on the detected and tracked face.
However, if the face moves rapidly, it appears blurred in the video frame, and existing tracking algorithms are prone to fail on blurred faces, so a blurred face may be lost during tracking. In that case, no special effect can be rendered on the lost blurred face, which inevitably degrades the face special-effect rendering.
Disclosure of Invention
An object of the embodiments of the present application is to provide a face tracking method and apparatus, an electronic device, and a storage medium, so as to effectively track a face whose tracking has failed during the tracking process. The specific technical solution is as follows:
in a first aspect, an embodiment of the present application provides a face tracking method, including:
when tracking faces in a video, if tracking of a target face is lost in any current video frame, acquiring position information of a first tracking frame; the position information of the first tracking frame is the position information, acquired in the course of tracking faces in the video, of the tracking frame of the target face in a video frame preceding the current video frame;
amplifying the area framed by the first tracking frame according to the target magnification to obtain an amplified first tracking frame;
determining an area corresponding to the amplified first tracking frame in the current video frame according to the position information of the amplified first tracking frame; intercepting the image in the area from the current video frame as a local image corresponding to the amplified first tracking frame;
detecting whether a human face exists in the local image;
if so, generating a second tracking frame at the position of the corresponding face in the current video frame as a face tracking result in the current video frame.
Optionally, a plurality of faces exist in each video frame of the video, the tracking frame of each face is marked with an identifier, and the identifiers of the tracking frames of faces belonging to the same person in different video frames are the same;
in this case, when tracking faces in the video, if tracking of a target face is lost in any current video frame, acquiring the position information of the first tracking frame includes:
when tracking faces in a video, if the target face corresponding to a tracking frame with a target identifier is lost in any current video frame, determining the position information of the tracking frame with the target identifier in the video frame preceding the current video frame as the position information of the first tracking frame.
Optionally, the amplifying the area framed by the first tracking frame according to the target magnification to obtain the amplified first tracking frame includes:
and acquiring a geometric center point of the first tracking frame, and amplifying the area framed by the first tracking frame by taking the geometric center point as a reference according to a target amplification factor to obtain the amplified first tracking frame.
Optionally, after generating a second tracking frame at the position of the corresponding face in the current video frame, the method further includes:
determining a face contour from the region of the current video frame indicated by the second tracking box;
generating a graphic frame containing the contour according to the determined position of the face contour, and using the graphic frame as a correction frame;
and replacing the second tracking frame with the correction frame to obtain the corrected tracking frame of the current video frame.
Optionally, determining a face contour from the region of the current video frame indicated by the second tracking frame includes:
identifying key points of each face from the area of the current video frame indicated by the second tracking frame;
and determining the face contour based on the positions of the identified face key points.
Optionally, the determining of the target magnification includes:
acquiring a preset magnification factor for amplifying the tracking frame as a target magnification factor;
alternatively,
determining a target shooting mode to which the video belongs during shooting, and obtaining a magnification factor corresponding to the target shooting mode from a preset mapping relation between the shooting mode and the magnification factor as a target magnification factor;
the target shooting mode is one of a plurality of shooting modes, the moving speeds of the human faces in the videos in different shooting modes are different, and the moving speed indicated by the shooting mode in the mapping relation is positively correlated with the magnification.
Optionally, determining a target shooting mode to which the video belongs when shooting comprises:
presenting an input interface through which a user specifies the shooting mode to which the video belongs when shot, and acquiring the target shooting mode specified through the input interface;
alternatively,
and acquiring a default shooting mode as a target shooting mode to which the video belongs during shooting.
In a second aspect, an embodiment of the present application provides a face tracking apparatus, including:
the tracking frame acquisition module is used for acquiring the position information of a first tracking frame if tracking of a target face is lost in any current video frame while tracking faces in a video; the position information of the first tracking frame is the position information, acquired in the course of tracking faces in the video, of the tracking frame of the target face in a video frame preceding the current video frame;
the tracking frame amplifying module is used for amplifying the area framed by the first tracking frame according to the target amplification factor to obtain an amplified first tracking frame;
an image determining module, configured to determine, according to the position information of the amplified first tracking frame, an area corresponding to the amplified first tracking frame in the current video frame; intercepting the image in the area from the current video frame as a local image corresponding to the amplified first tracking frame;
the face detection module is used for detecting whether a face exists in the local image;
and the tracking frame generating module is used for generating, when a face exists in the local image, a second tracking frame at the position of the corresponding face in the current video frame, as the face tracking result in the current video frame.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of the first aspect when executing the program stored in the memory.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps of the first aspect.
According to the solution provided by the embodiments of the present application, when the target face is lost in the current video frame, the position information of the first tracking frame, generated while tracking the face in a video frame preceding the current video frame, is used to obtain the local image corresponding to the amplified first tracking frame, and detection is performed on the local image to re-acquire the lost target face. Because detection on a local image is less affected by image blur, the detection is more stable on blurred images and yields an accurate detection frame; the solution therefore overcomes the difficulty existing tracking algorithms have with blurred faces, achieves effective tracking of faces whose tracking failed during the tracking process, and markedly improves face special-effect rendering.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a face tracking method according to an embodiment of the present application;
fig. 2 is a flowchart of another face tracking method according to an embodiment of the present application;
FIG. 3 is a flowchart of a process for tracking a human face by using the method of the embodiment of the present application;
FIG. 4 is a schematic structural diagram of a face tracking apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings of the embodiments of the present application.
In order to achieve the purpose of effectively tracking a target which fails to be tracked in the tracking process, the embodiment of the application provides a face tracking method, a face tracking device, an electronic device and a storage medium.
First, a face tracking method provided in an embodiment of the present application is described below.
The face tracking method provided by the embodiments of the present application can be applied to an electronic device. In specific applications, the electronic device may be a smartphone, a tablet computer, or the like. Specifically, the execution subject of the face tracking method may be a face tracking apparatus running on the electronic device. Moreover, the video to which the face tracking method is applied may be a real-time video, for example, video captured in real time through a camera while a user is live streaming, or video captured in real time during a video call; of course, it may also be a pre-acquired video, for example, a video saved locally on the electronic device or a video downloaded from a network.
In addition, after the tracking frame for face tracking is obtained for each video frame of the video by the face tracking method, predetermined processing may be performed based on the tracking frame, for example, performing special effect rendering on the target in the video frame based on the tracking frame of that video frame.
The face tracking method provided by the embodiment of the application can comprise the following steps:
when tracking faces in a video, if tracking of a target face is lost in any current video frame, acquiring position information of a first tracking frame; the position information of the first tracking frame is the position information, acquired in the course of tracking faces in the video, of the tracking frame of the target face in a video frame preceding the current video frame;
amplifying the area framed by the first tracking frame according to the target magnification to obtain an amplified first tracking frame;
determining an area corresponding to the amplified first tracking frame in the current video frame according to the position information of the amplified first tracking frame; intercepting the image in the area from the current video frame as a local image corresponding to the amplified first tracking frame;
detecting whether a human face exists in the local image;
if so, generating a second tracking frame at the position of the corresponding face in the current video frame as a face tracking result in the current video frame.
According to the solution provided by the embodiments of the present application, when the target face is lost in the current video frame, the position information of the first tracking frame, generated while tracking the face in a video frame preceding the current video frame, is used to obtain the local image corresponding to the amplified first tracking frame, and detection is performed on the local image to re-acquire the lost target face. Because detection on a local image is less affected by image blur, the detection is more stable on blurred images and yields an accurate detection frame; the solution therefore overcomes the difficulty existing tracking algorithms have with blurred faces, achieves effective tracking of faces whose tracking failed during the tracking process, and markedly improves face special-effect rendering.
A face tracking method provided in an embodiment of the present application is described below with reference to the accompanying drawings.
As shown in fig. 1, a face tracking method provided in an embodiment of the present application includes:
s101, when the face of a video is tracked, if a lost target face is tracked in any current video frame, the position information of a first tracking frame is obtained.
The position information of the first tracking frame is the position information of the tracking frame of the target face in the previous video frame in the video frame before the current video frame, which is acquired in the process of tracking the face of the video;
it should be noted that the existing tracking algorithm (such as the Lucas-Kanade optical flow method) has a small calculation amount, can better realize rigid motion tracking, but has an unsatisfactory tracking effect on a fast moving blurred face, and is easy to track and lose the face. Therefore, when a lost face is tracked, the face tracking loss can be determined at a high rate as a result of blurring of the face in a video frame caused by rapid motion of the face. In order to ensure that the face that fails to be tracked in the tracking process is effectively tracked, it is necessary to perform targeted processing.
The optical flow method computes the motion field of a moving object from the temporal and spatial variation of pixel intensities in an image sequence containing the object, and thereby tracks the object. Usually, optical flow is computed for a number of feature points in the image sequence, and the moving object is tracked and recognized by following these feature points. However, the optical flow method places relatively high requirements on the matching between frames of the image sequence; in some complex scenes the matching degree may be low, so some feature points are tracked incorrectly, and the tracking and recognition of the moving object may be wrong or fail. Since the optical flow method belongs to the prior art, it is only briefly introduced here.
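As a concrete, non-limiting illustration of the kind of tracker referred to above, the following sketch uses OpenCV's pyramidal Lucas-Kanade implementation; the library choice, function names, and parameter values are assumptions for illustration and are not prescribed by this application.

```python
import cv2

def track_points(prev_gray, curr_gray, prev_pts):
    """Track feature points from the previous frame into the current frame
    with pyramidal Lucas-Kanade optical flow (illustrative parameters).
    prev_pts: float32 array of shape (N, 1, 2), e.g. from cv2.goodFeaturesToTrack."""
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3)
    # Keep only points whose flow was found; when too few survive (as happens
    # on fast-moving, blurred faces), the target is treated as lost.
    found = status.reshape(-1) == 1
    return curr_pts[found], found
```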
It should be noted that, in the embodiments of the present application, the position information of the first tracking frame, acquired by tracking the target face in a video frame preceding the current video frame, is used to determine where the face may appear in the current video frame. For the several video frames preceding the current video frame, especially the frame adjacent to it, the time interval between frames is short, so the position of the target face changes little across these frames. Therefore, the position of the face in the current video frame can be approximated using the position information of the first tracking frame generated when the target face was tracked in a preceding video frame. The position information of the first tracking frame includes information such as the coordinates of the tracking frame, and it may be stored locally on the electronic device or uploaded to the cloud; in practice, it can be acquired conveniently as the actual situation requires.
It is understood that, for a video frame in which a plurality of faces exist, the position information of the corresponding first tracking frame may also be acquired in a manner similar to that described above.
For example, in one implementation, a plurality of faces exist in each video frame of the video, the tracking frame of each face is marked with an identifier, and the identifiers of tracking frames of faces belonging to the same person are the same across different video frames. In this case, when tracking faces in the video, if tracking of a target face is lost in any current video frame, the step of acquiring the position information of the first tracking frame may specifically include:
when tracking faces in a video, if the target face corresponding to a tracking frame with a target identifier is lost in any current video frame, determining the position information of the tracking frame with the target identifier in the video frame preceding the current video frame as the position information of the first tracking frame.
For example, when there are multiple faces in a video frame, the tracking frame corresponding to each face is marked with an identifier; say 5 tracking frames are generated for the 5 faces in the previous video frame and are marked 1, 2, 3, 4, and 5 respectively. If, during face tracking, only 3 faces are tracked in the current video frame, i.e., only tracking frames 2, 3, and 4 are generated while tracking frames 1 and 5 are not, this indicates that tracking of the 1st and 5th faces has failed; the position information of tracking frames 1 and 5 is then determined from the previous video frame as the position information of the first tracking frames, as the sketch below illustrates.
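The following minimal sketch shows one way to implement this identifier bookkeeping; the dict-based data layout is an assumption for illustration only.

```python
def find_lost_tracks(prev_boxes: dict, curr_boxes: dict) -> dict:
    """Given {identifier: tracking_frame} dicts for the previous and current
    video frames, return the first tracking frames of the lost faces."""
    lost_ids = set(prev_boxes) - set(curr_boxes)  # e.g. {1, 5} in the example above
    # The previous frame's box for each lost identifier becomes the
    # "first tracking frame" used by the subsequent local detection.
    return {i: prev_boxes[i] for i in lost_ids}
```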
S102, amplifying the area framed by the first tracking frame according to the target amplification factor to obtain the amplified first tracking frame.
It should be noted that, in the embodiments of the present application, a local image in which the target may exist is obtained through the first tracking frame and is then detected in order to reacquire the lost target face. To keep the target face in the determined partial image as complete as possible, and considering that the position of the target face in the current video frame is offset relative to its position in preceding video frames, the first tracking frame is enlarged before the partial image is determined.
There are various ways to enlarge the first tracking frame.
For example, in an implementation manner, the area framed by the first tracking frame may be enlarged by taking a geometric center point of the first tracking frame as a reference and according to a target magnification, so as to obtain an enlarged first tracking frame.
Illustratively, in another implementation, any point within the first tracking frame may be taken as the reference, and the area framed by the first tracking frame enlarged about it according to the target magnification, to obtain the enlarged first tracking frame.
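A minimal sketch of the geometric-center variant follows; the (x, y, w, h) box convention is an assumption for illustration.

```python
def enlarge_box(box, factor):
    """Enlarge a tracking frame about its geometric center by the target
    magnification, keeping the center fixed. box = (x, y, w, h) in pixels."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0        # geometric center point
    new_w, new_h = w * factor, h * factor
    return (cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h)
```

For example, enlarge_box((100, 100, 50, 50), 2.0) yields (75.0, 75.0, 100.0, 100.0): the same center, with both side lengths doubled.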
It should be noted that the target magnification may be determined in various ways.
For example, in one implementation, a preset magnification for zooming in the tracking frame may be directly obtained as the target magnification.
The preset magnification factor can be determined according to an accumulated experience value in the actual face tracking process.
The setting may be made according to empirical values, considering that in the usual case the position of the target face in the current video frame does not move too far relative to its position in the previous video frame. For example, the target magnification may be set so that the framed area is doubled.
For example, in another implementation manner, a target shooting mode to which the video belongs at the time of shooting is determined, and a magnification corresponding to the target shooting mode is obtained as a target magnification from a preset mapping relation between shooting modes and magnifications.
The target shooting mode is one of a plurality of shooting modes, the moving speeds of the human faces in the videos in different shooting modes are different, and the moving speed indicated by the shooting mode in the mapping relation is positively correlated with the magnification.
It will be appreciated that the speed of face movement varies from scene to scene, for example, shoppers on a pedestrian street versus athletes at a track-and-field event. In the embodiments of the present application, target shooting modes are set for different scenes, and an appropriate target magnification is set for each shooting mode, which can further improve the face tracking effect.
Determining the target shooting mode to which the video belongs when shot may include presenting an input interface through which the user specifies the shooting mode used when the video was captured, and acquiring the target shooting mode specified through that interface; it may also include acquiring a default shooting mode as the target shooting mode.
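A sketch of the mapping described above follows; the mode names and magnification values are invented purely for illustration, with faster expected face motion mapping to a larger magnification, matching the positive correlation stated above.

```python
# Assumed mapping between shooting modes and magnifications (illustrative).
SHOOTING_MODE_TO_MAGNIFICATION = {
    "portrait": 1.5,  # slow face motion, small offset between frames
    "street": 2.0,
    "sports": 3.0,    # fast face motion, large offset between frames
}

def target_magnification(mode: str, default: float = 2.0) -> float:
    """Look up the magnification for the target shooting mode, falling back
    to a preset default when the mode is unknown."""
    return SHOOTING_MODE_TO_MAGNIFICATION.get(mode, default)
```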
S103, determining an area corresponding to the amplified first tracking frame in the current video frame according to the position information of the amplified first tracking frame; and intercepting the image in the area from the current video frame to serve as a local image corresponding to the amplified first tracking frame.
It can be understood that after the corresponding region is determined by the information of the enlarged first tracking frame, the corresponding partial image can be intercepted from the current video frame.
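A minimal sketch of this cropping step, assuming the frame is an H x W x 3 NumPy array and the same (x, y, w, h) box convention as above:

```python
import numpy as np

def crop_local_image(frame: np.ndarray, box) -> np.ndarray:
    """Clamp the enlarged first tracking frame to the frame bounds, then
    intercept (crop) the corresponding local image from the current frame."""
    x, y, w, h = box
    frame_h, frame_w = frame.shape[:2]
    x0, y0 = max(int(x), 0), max(int(y), 0)
    x1, y1 = min(int(x + w), frame_w), min(int(y + h), frame_h)
    return frame[y0:y1, x0:x1].copy()  # may be empty if the box left the frame
```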
S104, detecting whether a human face exists in the local image;
Because detection on a local image is less affected by image blur, the detection is more stable on blurred images and yields an accurate detection frame; the local image can therefore be detected once it has been determined. If the detection does not find the target face, it can be considered that the target face does not actually exist in the local image, i.e., there is no face-loss problem. If the target face is detected, it can be tracked through the subsequent steps. In addition, because only a local image is detected, the computation required for detection is small and does not consume excessive computing resources.
The target detection algorithms used for detecting the local image belong to the prior art, such as R-CNN (Region-based Convolutional Neural Networks), Fast R-CNN, and the like, and are not described again here.
It should be noted that the accuracy of tracking the target face in a video frame can be ensured by raising the threshold of the tracking algorithm during tracking; meanwhile, when detection is performed after the target face has been lost, recall can be increased by lowering the threshold of the target detection algorithm when detecting the face in the local image. Combining the advantages of both improves the overall tracking quality of the face tracking method of the embodiments of the present application.
Further, to facilitate detection of the partial images, the determined partial images may be uniformly scaled to a fixed size, for example 80 × 80 or 60 × 60, prior to detection; the specific size may be determined according to the actual situation. A sketch of this resize-and-detect step follows.
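The sketch below combines the fixed-size scaling with a face detection pass; OpenCV's Haar-cascade detector is used only as a stand-in for the R-CNN-family detectors named above, and the low minNeighbors value illustrates the lowered detection threshold that favors recall.

```python
import cv2

def detect_face_in_local_image(local_img, size=(80, 80)):
    """Scale the local image to a fixed size, then detect faces in it.
    Returns detected boxes as (x, y, w, h) in resized-image coordinates;
    a real implementation would map them back to frame coordinates."""
    resized = cv2.resize(local_img, size)
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # minNeighbors is kept low to lower the detection threshold and raise recall.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=2)
```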
And S105, if so, generating a second tracking frame at the position of the face corresponding to the current video frame, wherein the second tracking frame is used as the face tracking result in the current video frame.
When the local image is detected through the steps to obtain the target face, the target can be tracked by generating the second tracking frame.
According to the solution provided by the embodiments of the present application, when the target face is lost in the current video frame, the position information of the first tracking frame, generated while tracking the face in a video frame preceding the current video frame, is used to obtain the local image corresponding to the amplified first tracking frame, and detection is performed on the local image to re-acquire the lost target face. Because detection on a local image is less affected by image blur, the detection is more stable on blurred images and yields an accurate detection frame; the solution therefore overcomes the difficulty existing tracking algorithms have with blurred faces, achieves effective tracking of faces whose tracking failed during the tracking process, and markedly improves face special-effect rendering.
When a target is tracked for a long time, drift occurs as tracking errors accumulate. In view of this, the embodiments of the present application provide another face tracking method, which effectively avoids drift during long-term tracking and keeps face tracking stable.
As shown in fig. 2, another face tracking method provided in the embodiment of the present application includes:
s201, when the face of the video is tracked, if the face of the lost target is tracked in any current video frame, the position information of the first tracking frame is obtained.
The position information of the first tracking frame is the position information of the tracking frame of the target face in the previous video frame in the video frame before the current video frame, which is acquired in the process of tracking the face of the video.
S202, amplifying the area framed by the first tracking frame according to the target amplification factor to obtain the amplified first tracking frame.
S203, determining an area corresponding to the amplified first tracking frame in the current video frame according to the position information of the amplified first tracking frame; and intercepting the image in the area from the current video frame to serve as a local image corresponding to the amplified first tracking frame.
And S204, detecting whether a human face exists in the local image.
S205, if yes, generating a second tracking frame at the position of the corresponding face in the current video frame.
In the embodiment of the present application, the steps S201 to S205 may be the same as those of the steps S101 to S105 in the above embodiment, and are not described herein again.
S206, determining a face contour from the area of the current video frame indicated by the second tracking frame.
It should be noted that, after the second tracking frame is generated, the area of the current video frame covered by the second tracking frame may be utilized to determine the contour of the target face in the area.
There are various ways to determine the contour of the target face from the area of the current video frame indicated by the second tracking frame.
For example, in one implementation, the key points of each face may be identified from the area of the current video frame indicated by the second tracking frame;
and determining the face contour based on the positions of the identified face key points.
In the embodiment of the present application, each key point of the target face may be identified by using a key point estimation algorithm, which belongs to the prior art and is not described herein again.
And S207, generating a graphic frame containing the contour according to the determined position of the human face contour, and using the graphic frame as a correction frame.
After the outline of the target face is determined, a correction frame for correcting the second tracking frame can be obtained based on the position of the outline.
And S208, replacing the second tracking frame with the correcting frame to obtain the corrected tracking frame of the current video frame.
In the embodiments of the present application, the second tracking frame obtained by local detection is updated with the correction frame obtained by key point estimation, which effectively mitigates the drift that occurs during long-term tracking. A sketch of this correction step follows.
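A minimal sketch of the correction step, assuming the key points arrive as an (N, 2) array from any landmark estimator: the bounding rectangle of the key points serves as the graphic frame containing the contour and replaces the second tracking frame.

```python
import numpy as np

def correction_box(keypoints: np.ndarray):
    """Return the (x, y, w, h) rectangle bounding the face key points,
    used as the correction frame that replaces the second tracking frame."""
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    x0, y0 = float(xs.min()), float(ys.min())
    return (x0, y0, float(xs.max()) - x0, float(ys.max()) - y0)
```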
According to the solution provided by the embodiments of the present application, when the target face is lost in the current video frame, the position information of the first tracking frame, generated while tracking the face in a video frame preceding the current video frame, is used to obtain the local image corresponding to the amplified first tracking frame, and detection is performed on the local image to re-acquire the lost target face. Because detection on a local image is less affected by image blur, the detection is more stable on blurred images and yields an accurate detection frame; the solution therefore overcomes the difficulty existing tracking algorithms have with blurred faces, achieves effective tracking of faces whose tracking failed during the tracking process, and markedly improves face special-effect rendering.
In the face tracking method described above, after the target face is lost, the determined local image is detected to obtain the second tracking frame, so that the lost target face is recovered and tracking of the target face is maintained; in addition, the contour of the target face is determined from the area of the current video frame indicated by the second tracking frame, and a correction frame is then generated to correct the second tracking frame, effectively achieving long-term, stable tracking of the target.
To facilitate understanding of the solutions of the foregoing embodiments, a specific implementation of face tracking using the face tracking method of the present application is briefly described below with reference to FIG. 3.
As shown in FIG. 3, when tracking target faces in the current video frame, there may be multiple target faces in the frame, so tracking may succeed for some of them and fail for the rest. Where tracking succeeds, a face tracking frame is obtained directly. For a target face whose tracking fails, the lost face can be tracked by local target detection according to the face tracking method of the above embodiments, yielding a second tracking frame, i.e., the face tracking frame in FIG. 3. After a face tracking frame is obtained, whether directly or by the method of the embodiments of the present application, it can be corrected using the target key points. Through this scheme, long-term, stable tracking of target faces is effectively achieved; the sketch below composes the pieces.
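The following end-to-end sketch composes the helpers sketched earlier into the per-frame flow of FIG. 3; every name is illustrative, and the mapping of detector output back to frame coordinates is elided for brevity.

```python
def track_frame(frame, prev_boxes, curr_boxes, magnification):
    """prev_boxes/curr_boxes: {identifier: (x, y, w, h)} from the tracker.
    Returns the tracking result for the current frame, with lost faces
    recovered by local detection where possible."""
    result = dict(curr_boxes)  # faces the tracker followed successfully
    for ident, first_box in find_lost_tracks(prev_boxes, curr_boxes).items():
        local = crop_local_image(frame, enlarge_box(first_box, magnification))
        if local.size and len(detect_face_in_local_image(local)):
            # Simplified: reuse the enlarged region as the second tracking
            # frame; a full implementation maps the detected box back into
            # frame coordinates and then corrects it with key points.
            result[ident] = first_box
    return result
```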
As shown in fig. 4, an embodiment of the present application further provides a face tracking apparatus, including:
a tracking frame acquisition module 401, configured to acquire the position information of a first tracking frame if tracking of a target face is lost in any current video frame while tracking faces in a video, where the position information of the first tracking frame is the position information, acquired in the course of tracking faces in the video, of the tracking frame of the target face in a video frame preceding the current video frame;
a tracking frame amplifying module 402, configured to amplify the area framed by the first tracking frame according to a target amplification factor, to obtain an amplified first tracking frame;
an image determining module 403, configured to determine, according to the position information of the enlarged first tracking frame, an area corresponding to the enlarged first tracking frame in the current video frame; intercepting the image in the area from the current video frame as a local image corresponding to the amplified first tracking frame;
a face detection module 404, configured to detect whether a face exists in the local image;
a tracking frame generating module 405, configured to generate, when a face is detected in the local image, a second tracking frame at the position of the corresponding face in the current video frame, as the face tracking result in the current video frame.
Optionally, a plurality of faces exist in each video frame of the video, the tracking frame of each face is marked with an identifier, and the identifiers of the tracking frames of faces belonging to the same person in different video frames are the same;
The tracking frame acquisition module is specifically configured to: when tracking faces in a video, if the target face corresponding to a tracking frame with a target identifier is lost in any current video frame, determine the position information of the tracking frame with the target identifier in the video frame preceding the current video frame as the position information of the first tracking frame.
Optionally, the tracking frame amplifying module is specifically configured to obtain a geometric center point of the first tracking frame, and amplify an area framed by the first tracking frame according to a target amplification factor with the geometric center point as a reference, so as to obtain an amplified first tracking frame.
Optionally, the apparatus further comprises a tracking frame rectification module;
the tracking frame correction module is used for determining a face contour from the area of the current video frame indicated by the second tracking frame;
generating a graphic frame containing the contour according to the determined position of the face contour, and using the graphic frame as a correction frame;
and replacing the second tracking frame with the correction frame to obtain the corrected tracking frame of the current video frame.
Optionally, the tracking frame rectification module is specifically configured to identify key points of each face from the area of the current video frame indicated by the second tracking frame;
and determining the face contour based on the positions of the identified face key points.
Optionally, the determining of the target magnification includes:
acquiring a preset magnification factor for amplifying the tracking frame as a target magnification factor;
or determining a target shooting mode to which the video belongs during shooting, and obtaining a magnification factor corresponding to the target shooting mode from a preset mapping relation between the shooting mode and the magnification factor as a target magnification factor;
the target shooting mode is one of a plurality of shooting modes, the moving speeds of the human faces in the videos in different shooting modes are different, and the moving speed indicated by the shooting mode in the mapping relation is positively correlated with the magnification.
Optionally, determining a target shooting mode to which the video belongs when shooting comprises:
presenting an input interface through which a user specifies the shooting mode to which the video belongs when shot, and acquiring the target shooting mode specified through the input interface;
or acquiring a default shooting mode as a target shooting mode to which the video belongs when shooting.
An embodiment of the present application further provides an electronic device, as shown in FIG. 5, including a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 communicate with one another through the communication bus 504,
a memory 503 for storing a computer program;
the processor 501 is configured to implement the face tracking method according to any of the above embodiments when executing the program stored in the memory 503.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the face tracking method in any of the above embodiments.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the face tracking method described in any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A face tracking method, comprising:
when tracking faces in a video, if tracking of a target face is lost in any current video frame, acquiring position information of a first tracking frame; the position information of the first tracking frame is the position information, acquired in the course of tracking faces in the video, of the tracking frame of the target face in a video frame preceding the current video frame;
amplifying the area framed by the first tracking frame according to the target magnification to obtain an amplified first tracking frame;
determining an area corresponding to the amplified first tracking frame in the current video frame according to the position information of the amplified first tracking frame; intercepting the image in the area from the current video frame as a local image corresponding to the amplified first tracking frame;
detecting whether a human face exists in the local image;
if so, generating a second tracking frame at the position of the corresponding face in the current video frame as a face tracking result in the current video frame.
2. The method according to claim 1, wherein a plurality of faces exist in each video frame of the video, the tracking frame of each face is marked with an identifier, and the identifiers of the tracking frames of faces belonging to the same person in different video frames are the same;
wherein, when tracking faces in the video, if tracking of a target face is lost in any current video frame, acquiring the position information of the first tracking frame includes:
when tracking faces in a video, if the target face corresponding to a tracking frame with a target identifier is lost in any current video frame, determining the position information of the tracking frame with the target identifier in the video frame preceding the current video frame as the position information of the first tracking frame.
3. The method according to claim 1, wherein the magnifying the area framed by the first tracking frame according to the target magnification to obtain a magnified first tracking frame comprises:
and acquiring a geometric center point of the first tracking frame, and amplifying the area framed by the first tracking frame by taking the geometric center point as a reference according to a target amplification factor to obtain the amplified first tracking frame.
4. The method according to any one of claims 1 to 3, wherein after generating a second tracking frame at a position of a corresponding face in the current video frame, the method further comprises:
determining a face contour from the region of the current video frame indicated by the second tracking box;
generating a graphic frame containing the contour according to the determined position of the face contour, and using the graphic frame as a correction frame;
and replacing the second tracking frame with the correction frame to obtain the corrected tracking frame of the current video frame.
5. The method of claim 4, wherein determining a face contour from the region of the current video frame indicated by the second tracking box comprises:
identifying key points of each face from the area of the current video frame indicated by the second tracking frame;
and determining the face contour based on the positions of the identified face key points.
6. The method according to any one of claims 1-3, wherein the target magnification is determined in a manner comprising:
acquiring a preset magnification factor for amplifying the tracking frame as a target magnification factor;
alternatively,
determining a target shooting mode to which the video belongs during shooting, and obtaining a magnification factor corresponding to the target shooting mode from a preset mapping relation between the shooting mode and the magnification factor as a target magnification factor;
the target shooting mode is one of a plurality of shooting modes, the moving speeds of the human faces in the videos in different shooting modes are different, and the moving speed indicated by the shooting mode in the mapping relation is positively correlated with the magnification.
7. The method of claim 6, wherein determining a target shooting mode to which the video belongs when shot comprises:
presenting an input interface through which a user specifies the shooting mode to which the video belongs when shot, and acquiring the target shooting mode specified through the input interface;
alternatively,
and acquiring a default shooting mode as a target shooting mode to which the video belongs during shooting.
8. A face tracking device, comprising:
the tracking frame acquisition module is configured to acquire the position information of a first tracking frame if tracking of a target face is lost in any current video frame while tracking faces in a video; the position information of the first tracking frame is the position information, acquired in the course of tracking faces in the video, of the tracking frame of the target face in a video frame preceding the current video frame;
the tracking frame amplifying module is used for amplifying the area framed by the first tracking frame according to the target amplification factor to obtain an amplified first tracking frame;
an image determining module, configured to determine, according to the position information of the amplified first tracking frame, an area corresponding to the amplified first tracking frame in the current video frame; intercepting the image in the area from the current video frame as a local image corresponding to the amplified first tracking frame;
the face detection module is used for detecting whether a face exists in the local image;
and the tracking frame generating module is configured to generate, when a face exists in the local image, a second tracking frame at the position of the corresponding face in the current video frame, as the face tracking result in the current video frame.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN202110628739.7A 2021-06-04 2021-06-04 Face tracking method and device, electronic equipment and storage medium Pending CN113223057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110628739.7A CN113223057A (en) 2021-06-04 2021-06-04 Face tracking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113223057A 2021-08-06

Family

ID=77083020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110628739.7A Pending CN113223057A (en) 2021-06-04 2021-06-04 Face tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113223057A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106558224A (en) * 2015-09-30 2017-04-05 徐贵力 A kind of traffic intelligent monitoring and managing method based on computer vision
CN106599836A (en) * 2016-12-13 2017-04-26 北京智慧眼科技股份有限公司 Multi-face tracking method and tracking system
CN109151439A (en) * 2018-09-28 2019-01-04 上海爱观视觉科技有限公司 A kind of the automatic tracing camera system and method for view-based access control model
CN109754383A (en) * 2017-11-08 2019-05-14 中移(杭州)信息技术有限公司 A kind of generation method and equipment of special efficacy video
CN110853076A (en) * 2019-11-08 2020-02-28 重庆市亿飞智联科技有限公司 Target tracking method, device, equipment and storage medium
CN110991280A (en) * 2019-11-20 2020-04-10 北京影谱科技股份有限公司 Video tracking method and device based on template matching and SURF
CN111476065A (en) * 2019-01-23 2020-07-31 北京奇虎科技有限公司 Target tracking method and device, computer equipment and storage medium
CN111627045A (en) * 2020-05-06 2020-09-04 佳都新太科技股份有限公司 Multi-pedestrian online tracking method, device and equipment under single lens and storage medium
CN112016353A (en) * 2019-05-30 2020-12-01 普天信息技术有限公司 Method and device for carrying out identity recognition on face image based on video
CN112750146A (en) * 2020-12-31 2021-05-04 浙江大华技术股份有限公司 Target object tracking method and device, storage medium and electronic equipment
CN112767448A (en) * 2021-01-25 2021-05-07 北京影谱科技股份有限公司 Automatic error recovery method in video tracking



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination