CN110008810B - Method, device, equipment and machine-readable storage medium for face image acquisition


Info

Publication number
CN110008810B
Authority
CN
China
Prior art keywords
face, current, frame, image, template library
Prior art date
Legal status
Active
Application number
CN201910019304.5A
Other languages
Chinese (zh)
Other versions
CN110008810A (en)
Inventor
Zheng Dandan (郑丹丹)
Current Assignee
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd
Priority to CN201910019304.5A
Publication of CN110008810A
Application granted
Publication of CN110008810B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a method, an apparatus, a device, and a machine-readable storage medium for acquiring a face image. The method includes: receiving a current frame, where the current frame is one frame in a frame sequence continuously captured by a camera without user interaction; performing face detection on the current frame; if m faces are detected in the current frame, determining the largest face among the m faces; and determining a current target face based on the largest face, so that an image of a final target face is output from a face template library at the end of the frame sequence, where the final target face is determined by considering the target faces of the respective frames of the frame sequence, and the face template library stores images of the different faces detected from the respective frames. Embodiments of the invention can complete face image acquisition without disturbing the user and without requiring the user's cooperation, thereby improving the user experience.

Description

Method, device, equipment and machine-readable storage medium for face image acquisition
Technical Field
The present invention relates to the field of information processing, and in particular, to a method, apparatus, device, and machine-readable storage medium for face image acquisition.
Background
At present, technologies based on face images are widely applied in fields such as public security, intelligent monitoring, identity authentication, and financial payment. In these technologies, the primary task is to acquire face images that meet the requirements. In general, to acquire a high-quality face image, not only must environmental factors such as illumination be considered, but the cooperation of the person being captured is also often required, for example, requiring that person to face the camera, hold a certain posture, and so on. However, many real-world scenarios can hardly meet such requirements, so how to effectively acquire face images has become one of the problems to be solved.
Disclosure of Invention
In view of the foregoing problems of the prior art, embodiments of the present invention provide a method, apparatus, device, and machine-readable storage medium for face image acquisition.
In one aspect, an embodiment of the present invention provides a method for acquiring a face image, including: receiving a current frame, wherein the current frame is one frame in a frame sequence, and the frame sequence is continuously shot through a camera under the condition of no user interaction; performing face detection on the current frame; if m faces are detected in the current frame, determining the largest face in the m faces, wherein m is a positive integer; determining a current target face based on the maximum face so as to output an image of a final target face from a face template library at the end of the frame sequence, wherein the final target face is determined by considering the target faces of the frames of the frame sequence, and the face template library is used for storing the images of the different faces detected from the frames.
In another aspect, an embodiment of the present invention provides an apparatus for acquiring a face image, including: a receiving unit, configured to receive a current frame, where the current frame is one frame in a frame sequence, and the frame sequence is continuously shot by a camera without user interaction; the detection unit is used for carrying out face detection on the current frame; a face determining unit configured to: if the detection unit detects m faces in the current frame, determining the largest face in the m faces, wherein m is a positive integer; the face determining unit is further configured to: determining a current target face based on the maximum face so as to output an image of a final target face from a face template library at the end of the frame sequence, wherein the final target face is determined by considering the target faces of the frames of the frame sequence, and the face template library is used for storing the images of the different faces detected from the frames.
In another aspect, embodiments of the present invention provide a computing device comprising: at least one processor; a memory in communication with the at least one processor, having stored thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to implement the above-described method.
In another aspect, embodiments of the present invention provide a machine-readable storage medium storing executable instructions that, when executed by a machine, cause the machine to perform the above-described method.
In embodiments of the invention, without user interaction, the camera continuously captures a frame sequence, face detection is performed on each frame in the sequence and a target face is determined for it, and the final target face is determined by considering the target faces of the respective frames, so that the image corresponding to the final target face is output from the face template library when the frame sequence ends. Face image acquisition is thus completed without disturbing the user and without requiring the user's cooperation, improving the user experience. In addition, by obtaining the image of the final target face from a continuously captured frame sequence, a face image of better quality can be selected over a relatively long period, which improves the effect of subsequent applications based on the face image.
Drawings
The above features, advantages and the manner of attaining them will be further described in greater detail by reference to the following description of the preferred embodiment and the accompanying drawings, wherein:
Fig. 1 is a schematic flow chart of a method for acquiring a face image according to an embodiment of the invention.
Fig. 2A is a schematic diagram of one example of an implementation for maintaining a face template library in accordance with an embodiment of the present invention.
Fig. 2B is a schematic flow chart of a manner for determining a current target face according to an embodiment of the invention.
Fig. 2C-2F are schematic diagrams of examples of scenes for determining a current target face according to an embodiment of the present invention.
Fig. 3 is a schematic block diagram of an apparatus for acquiring a face image according to an embodiment of the present invention.
Fig. 4 is a hardware block diagram of a computing device for acquiring face images according to an embodiment of the invention.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be appreciated that these embodiments are discussed only to enable those skilled in the art to better understand and practice the subject matter described herein and are not limiting on the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the invention. Various examples may omit, replace, or add various procedures or components as desired. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined.
In various technologies based on face images, the primary task is to acquire face images that meet the requirements. To this end, not only must environmental factors often be considered, but the cooperation of the person being captured is also required, such as requiring that person to face the camera and to hold a certain posture for a period of time.
For example, face-based payment techniques (also referred to as "face-brushing" payment) typically include a dedicated acquisition phase: a special interactive interface with a camera view is presented on the payment terminal, displaying the user's image so that the user perceives that a face image is being acquired; acquisition stops at a suitable moment, the user is informed that the currently acquired face image is the one to be used for comparison, and the acquired face image is then compared with the image on file to complete the payment process.
In this scenario, the entire acquisition phase needs to be completed within a controlled amount of time. If acquisition takes too long, the user experience deteriorates; if it is too fast, the user may not be ready, which easily leads to problems such as blurred images, poor posture, and improper illumination. It can be seen that how to effectively acquire face images is one of the problems to be solved.
In this regard, the embodiment of the invention provides a technical scheme for acquiring face images. The technical scheme of the embodiment of the invention can be applied to various face image-based technologies. For example, the technical scheme of the embodiment of the invention can be adopted in the face-brushing payment technology. The "face brushing" payment technique can be applied to various scenes requiring payment, such as restaurants, supermarkets, hospitals, shops, and the like. In particular, terminals for payment may be provided at the checkout stations at these locations. The terminal may be equipped with a camera for capturing images, such as a three-dimensional camera. Based on the image shot by the three-dimensional camera, the distance between the user and the camera can be calculated, so as to assist in face image acquisition.
In addition, in this scenario, the face image acquisition process of the embodiment of the present invention may be triggered by a purchase event; for example, acquisition may start when the terminal detects that scanning of merchandise has begun, and may end when confirmation of payment is detected. The whole process requires no interaction with the user, i.e., no special interactive interface needs to be presented on the terminal to ask the user to cooperate in acquiring the face image, so the user is not disturbed and the user experience is improved.
Embodiments of the invention may involve face tracking. Face tracking refers to detecting, in real time, the presence of a face in an input sequence of image frames and determining its motion trajectory and size changes (for brevity, image frames are referred to herein simply as "frames"). Multi-face tracking is the same process applied to multiple faces. One implementation of face tracking is described below with a specific example.
Suppose a face is detected in frame 1. A corresponding identifier, referred to herein as a Track ID for ease of description, may be assigned to that face. Next, in the frame following frame 1 (referred to herein as frame 2), face detection with a smaller network is performed in the vicinity of the face region detected in frame 1. If a face is successfully detected there, it is considered to be the same face as the one detected in frame 1 and is therefore given the same Track ID. In this case, the face from frame 1 is said to be tracked in frame 2.
If no face is detected in the vicinity of that face region, a full-image re-detection is performed in frame 2 using a larger network. Any face detected in frame 2 at this point is given a new Track ID. In this case, the face from frame 1 is not tracked in frame 2.
Similar principles can also be employed for tracking multiple faces.
In the multi-face tracking technology, face detection is required. Face detection may be implemented by various applicable algorithms, such as algorithms based on deep learning and key point localization.
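The patent text contains no source code, but the Track ID bookkeeping described above can be sketched as follows. This is a minimal illustration under assumed interfaces: detect_near and detect_full stand in for the smaller local network and the larger full-image network, and new_track_id for an ID generator; none of these names come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, w, h) of a face region

@dataclass
class TrackedFace:
    track_id: int
    bbox: BBox

def track_one(prev: TrackedFace,
              frame,
              detect_near: Callable[[object, BBox], Optional[BBox]],
              detect_full: Callable[[object], Optional[BBox]],
              new_track_id: Callable[[], int]) -> Optional[TrackedFace]:
    """Carry one face from the previous frame into the current frame."""
    # Step 1: look for the face near its previous region (smaller network).
    bbox = detect_near(frame, prev.bbox)
    if bbox is not None:
        # Same face: it inherits the previous Track ID ("tracked").
        return TrackedFace(prev.track_id, bbox)
    # Step 2: fall back to a full-image re-detection (larger network);
    # a face found here is treated as new and gets a fresh Track ID
    # ("not tracked").
    bbox = detect_full(frame)
    if bbox is not None:
        return TrackedFace(new_track_id(), bbox)
    return None
```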
The technical scheme will be described in detail below in connection with embodiments of the present invention.
Fig. 1 is a schematic flow chart of a method for acquiring a face image according to an embodiment of the invention.
As shown in fig. 1, in step 110, a current frame is received.
The current frame may be one of a sequence of frames that may be continuously taken by the camera without user interaction.
In step 120, face detection is performed on the current frame.
In step 130, if m faces are detected in the current frame, the largest face among the m faces is determined, where m is a positive integer.
In step 140, a current target face is determined based on the largest face so that an image of a final target face is output from a face template library at the end of the frame sequence, wherein the final target face may be determined by considering the target faces of the frames of the frame sequence, and the face template library may be used to store images of different faces detected from the frames.
In embodiments of the invention, without user interaction, the camera continuously captures a frame sequence, face detection is performed on each frame in the sequence, a target face is determined for each frame, and the final target face is determined by considering the target faces of the respective frames, so that the image corresponding to the final target face is output from the face template library when the frame sequence ends. Face image acquisition can therefore be completed without disturbing the user and without requiring the user's cooperation, which improves the user experience.
In addition, by obtaining the image of the final target face from a continuously captured frame sequence, a face image of better quality can be selected over a relatively long period, which improves the effect of subsequent applications based on the face image.
In the embodiment of the invention, the frame sequence can be obtained by continuously shooting by the camera, namely, a series of image frames are obtained. For example, in a "swipe" payment scenario, a camera may be started to capture an image frame upon detecting the start of scanning merchandise on the terminal for payment, and the capturing may be ended upon detecting the confirmation of payment on the terminal for payment. The photographing may be continued during the period from the start of scanning the goods to the confirmation of payment, thereby obtaining a frame sequence comprising one or more frames. When shooting is ended, the frame sequence ends accordingly. In the embodiment of the invention, the camera can be a three-dimensional camera or other cameras with similar functions.
As each frame (e.g., referred to herein as the current frame) is obtained, face detection may be performed for that frame. For example, a region of interest (Region of Interest, ROI) in which face detection is performed may be set in advance. The region of interest may be set according to actual needs or by methods known in the art. Furthermore, in some scenarios, many faces may appear in the region of interest. At this time, not all faces may need to be taken into account. In this way, a preliminary screening may be performed based on a certain condition, such as determining a detected face based on a distance between the face and the camera. For example, the m faces detected in the current frame may be faces having a distance from the camera of less than 700 mm. After faces are detected, the largest of these faces may be determined. The maximum face may refer to a face having the largest area in the current frame.
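As a sketch only, the pre-screening and largest-face selection might look as follows; the Face structure and its fields are assumptions introduced for illustration, with the 700 mm threshold taken from the example above:

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Face:
    track_id: int
    width: int               # bounding-box width in pixels
    height: int              # bounding-box height in pixels
    distance_mm: float       # physical distance between face and camera
    center_offset: float     # distance to the center of the imaging area
    image: Any = None        # cropped face image, kept in the template library
    quality: float = 0.0     # image quality score
    features: Any = None     # feature vector used for face comparison

def largest_face(faces: List[Face],
                 max_distance_mm: float = 700.0) -> Optional[Face]:
    """Pre-screen detections by camera distance, then return the face
    with the largest area in the current frame (None if none qualify)."""
    candidates = [f for f in faces if f.distance_mm < max_distance_mm]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.width * f.height)
```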
Thereafter, a target face may be determined based on the largest face. The target face refers to the face intended to be acquired. Accordingly, the current target face is the face intended to be acquired for the current frame.
In one embodiment, in step 140, the current target face may be determined in conjunction with whether the current frame is the first frame in the sequence of frames.
For example, if the current frame is the first frame in a sequence of frames, the maximum face may be determined to be the current target face.
If the current frame is not the first frame in the frame sequence, such as the middle frame or the last frame, the current target face may be determined based on the previous target face and the maximum face described above. The previous target face may refer to a target face of a frame previous to the current frame. It can be seen that, in this embodiment, the current target face is determined by combining the target face of the previous frame and the maximum face of the current frame, so that frequent switching of the faces to be collected can be avoided, and temporary interference can be avoided.
For example, in a "face-brushing" payment scenario, when an image is acquired during a commodity scanning process, a person changing (such as exchanging positions between users) during a commodity scanning process, a payment process, a commodity taking process, etc., a face queued later is acquired, etc., so that by determining a current target face by combining a target face of a previous frame and a maximum face of a current frame, such interference can be avoided, and thus, the target face currently intended to be acquired can be accurately determined.
In one embodiment, there may be a variety of situations with respect to determining the current target face based on the previous target face and the maximum face.
For example, if the m faces detected in the current frame do not include the previous target face, i.e., the previous target face is not tracked in the current frame, the maximum face determined for the current frame may be determined as the current target face.
If the m faces include a previous target face and the previous target face is the largest face of the current frame, the largest face may be determined to be the current target face. For example, if a previous target face is still present in the current frame and is the largest face, then the largest face may be considered to be the current target face.
If the m faces include the previous target face, but the previous target face is not the largest face of the current frame, the current target face may be determined based on the state information of the largest face and the previous target face in the current frame. Such status information may include, for example, information of location, physical distance from the camera, etc.
In the present embodiment, whether the m faces include the previous target face may be determined by comparing the Track ID of the previous target face with the Track IDs of the m faces, respectively. For example, if the Track ID of the previous target face is found among the m Track IDs of the m faces, the m faces are considered to include the previous target face, otherwise not included.
Whether the previous target face is the maximum face may be determined by comparing the Track ID of the previous target face with the Track ID of the maximum face of the current frame. For example, if the Track IDs of both are the same, then the previous target face is considered to be the largest face, otherwise not.
It can be seen that by determining the current target face by combining the previous target face of the previous frame and the detected face in the current frame, some temporary disturbances as described above can be avoided, so that the target face currently intended to be acquired is correctly determined.
In another embodiment, the previous target face is determined to be the current target face if the first distance is greater than the second distance, wherein the first distance represents a physical distance between the largest face of the current frame and the camera and the second distance represents a physical distance between the previous target face and the camera. For example, if the physical distance between the maximum face and the camera is greater than the physical distance between the previous target face and the camera for the current frame, i.e., the previous target face is closer to the camera, the current target face may be considered to be still the previous target face.
If the first distance is less than the second distance and the third distance is less than a predetermined value, the current target face is determined based on the distances of the maximum face and the previous target face from the center of the imaging area of the camera. The third distance may represent the physical distance between the previous target face and the maximum face. For example, if the largest face is closer to the camera but is also very close to the previous target face, the current target face may be determined by which of the two is closer to the center of the imaging region. The predetermined value mentioned here may be preset empirically in light of the actual situation; for example, it may be 80 mm.
If the third distance is greater than the predetermined value, the maximum face may be determined as the current target face. For example, if the physical distance between the previous target face and the maximum face is large, the maximum face is considered to be the current target face.
For example, the third distance may be obtained by a difference between the second distance and the first distance.
The first distance, the second distance and the third distance can be determined according to information acquired by the camera. For example, the physical distances may be determined by position information of the faces acquired by the three-dimensional camera.
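The patent does not specify how these distances are computed; one plausible estimate, assuming the three-dimensional camera provides a per-pixel depth map in millimetres, is the median depth inside the face's bounding box:

```python
import numpy as np

def face_distance_mm(depth_map: np.ndarray, bbox) -> float:
    """Estimate the physical face-to-camera distance as the median
    depth inside the face's bounding box; pixels with no depth reading
    (value 0) are ignored."""
    x, y, w, h = bbox
    region = depth_map[y:y + h, x:x + w]
    valid = region[region > 0]
    return float(np.median(valid)) if valid.size else float("inf")
```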
In another embodiment, the maximum face is determined to be the current target face if the maximum face is closer to the center of the imaging region than the previous target face. If the previous target face is closer to the center of the imaging region than the maximum face, the previous target face is determined to be the current target face.
It can be seen that by determining the current target face based on the comparison between the previous target face of the previous frame and the maximum face of the current frame, some unnecessary interference, such as user position movement during acquisition, etc., can be avoided, so that the target face currently intended to be acquired can be correctly determined.
As a specific implementation, each face may have a corresponding Track ID, so in determining the current target face, the Track ID of the current target face is actually determined based on the Track ID of the previous target face and the Track ID of the largest face.
As can be seen from the above procedure, the target face of each frame is actually determined based on the maximum face of the frame and the target face of the frame immediately preceding the frame, except for the first frame, so that the final target face is determined based on the maximum face of the last frame and the target face of the frame immediately preceding the last frame, after the frame-by-frame processing.
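The per-frame decision just described can be condensed into the following sketch, which reuses the Face objects and largest_face helper from the earlier sketch; the 80 mm threshold is the example value given above, and the code is an illustration rather than the patented implementation:

```python
from typing import List, Optional

def decide_current_target(prev_id: Optional[int],
                          faces: List[Face],
                          near_mm: float = 80.0) -> int:
    """Return the Track ID of the current target face; faces is the
    non-empty list of m detections, prev_id the previous frame's target
    Track ID (None when the current frame is the first frame)."""
    biggest = largest_face(faces)
    if prev_id is None:
        return biggest.track_id              # first frame
    prev = next((f for f in faces if f.track_id == prev_id), None)
    if prev is None:
        return biggest.track_id              # previous target not tracked
    if prev.track_id == biggest.track_id:
        return biggest.track_id              # previous target is still largest
    first = biggest.distance_mm              # largest face <-> camera
    second = prev.distance_mm                # previous target <-> camera
    third = abs(second - first)              # between the two faces
    if first > second:
        return prev.track_id                 # previous target is closer
    if third < near_mm:
        # Both faces are about equally close: prefer the one nearer
        # the center of the camera's imaging area.
        winner = min((biggest, prev), key=lambda f: f.center_offset)
        return winner.track_id
    return biggest.track_id                  # clearly closer, well separated
```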
In the embodiment of the invention, in order to output the image of the final target face, a face template library for storing the face image needs to be maintained in the whole acquisition process. In an embodiment of the invention, a face template library may be used to store images of different faces detected in individual frames of a sequence of frames.
In one embodiment, if m faces are detected in the current frame, it may be determined whether to update the face template library with m current images of the m faces in the current frame.
Specifically, if the current frame is the first frame in the frame sequence, m current images may be stored in the face template library.
For example, m current images and the Track IDs of the respective m faces represented by the m current images may be stored in the face template library, and the quality information of the respective m current images may also be stored.
If the current frame is not the first frame in the frame sequence, the face template library may be searched to determine whether m existing images corresponding to the m faces already exist, and whether to update the face template library may then be decided based on the search result. For example, each Track ID stored in the face template library may be matched against the Track IDs of the m faces to determine whether existing images of those faces are already in the library.
For example, assuming that an existing image of a first face of m faces already exists in the face template library, a current image of the first face may be compared in quality with the existing image of the first face, and whether to update the face template library may be determined based on the quality comparison result.
That is, for faces for which images already exist in the face template library, the current image and the existing image may be compared in quality. If the quality of the current image is better, e.g., the quality of the current image is higher than the quality of the existing image by a certain proportion (e.g., 30%), the existing image in the face template library may be replaced with the current image. If the quality of the existing image is better, the existing image may not be replaced.
Therefore, the face image with the best quality can be stored in the face template library, so that the output final target face image is also the best quality, and the subsequent application effect based on the face image can be improved.
For another example, suppose there is no existing image of a second face of the m faces in the face template library, and at least one specific face recorded in the library is not among the m faces, that is, the at least one specific face is not tracked in the current frame and its tracking is lost (for example, because the person lowered their head or was occluded), while the second face is a newly detected face in the current frame. In this case, the second face may be compared with the at least one specific face, and whether to update the face template library is determined based on the comparison result.
If the comparison between the second face and a fourth face among the at least one specific face succeeds, the second face and the fourth face can be considered the same face. For example, the comparison may be performed on the feature information of the second face and the feature information of the at least one specific face, using any suitable algorithm in the art. If the comparison score is above a certain predetermined value, the comparison may be considered successful. The predetermined value may be set empirically in light of the actual situation.
And under the condition that the comparison of the second face and the fourth face is successful, comparing the quality of the current image of the second face with the quality of the existing image of the fourth face. If the quality of the current image of the second face is better, the existing image of the fourth face in the face template library can be replaced by the current image of the second face.
It can be appreciated that since the second face is a newly detected face in the current frame, the second face has a Track ID different from the Track ID of the fourth face. Then, in the case that the comparison of the second face and the fourth face is successful and the quality of the current image of the second face is better, the existing image of the fourth face in the face template library may be replaced with the current image of the second face, and the Track ID of the second face may be modified to the Track ID of the fourth face.
If the comparison of the second face and at least one specific face fails, the second face is considered to be a newly added face and is not a lost face, so that the current image of the second face can be stored in a face template library.
Further, if a newly added third face is included in the m faces as compared to the previous face, the current image of the third face may be stored in the face template library.
Therefore, by using the maintenance mode of the face template library, the interference in the face tracking process can be avoided, so that all the face images possibly appearing in the acquisition stage can be correctly maintained, and the quality of the face images in the face template library can be ensured to be optimal.
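A sketch of these maintenance rules follows; again the interfaces are assumed (compare_score stands in for whatever face-comparison algorithm is used), and the 30% quality margin and the comparison threshold of 72 simply follow the example values in the text:

```python
from typing import Callable, Dict, List

def update_template_library(library: Dict[int, Face],
                            detections: List[Face],
                            compare_score: Callable[[object, object], float],
                            quality_margin: float = 1.3,
                            match_threshold: float = 72.0) -> None:
    """library maps Track ID -> the best image seen so far for that face;
    detections are the m faces of the current frame."""
    current_ids = {d.track_id for d in detections}
    # Faces recorded in the library but not tracked in this frame.
    lost_ids = [tid for tid in library if tid not in current_ids]
    for face in detections:
        stored = library.get(face.track_id)
        if stored is not None:
            # Case (1): same Track ID; replace only if the current image
            # beats the existing one by the required quality margin.
            if face.quality > stored.quality * quality_margin:
                library[face.track_id] = face
            continue
        # New Track ID: try to re-identify it as one of the lost faces.
        best_id, best = None, match_threshold
        for tid in lost_ids:
            score = compare_score(face.features, library[tid].features)
            if score > best:
                best_id, best = tid, score
        if best_id is not None:
            # Case (2): the same person re-appeared under a new Track ID;
            # keep the old Track ID, and keep whichever image is better.
            if face.quality > library[best_id].quality:
                face.track_id = best_id
                library[best_id] = face
        else:
            # Comparison failed for all lost faces: genuinely new face.
            library[face.track_id] = face
```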
As described above, based on the maximum face of the last frame and the target face of the last frame, the final target face can be determined. For example, the Track ID of the final target face may be determined. Thus, at the end of the frame sequence, the face image corresponding to the Track ID, namely the image of the final target face, can be searched from the face template library based on the Track ID of the final target face.
In one embodiment, an image of the final target face may be presented on an associated display screen at the end of the frame sequence.
In another embodiment, track IDs of target faces of respective frames in the sequence of frames may be recorded, and images of respective target faces found from the face template library may be presented on an associated display screen, so that a target face image satisfying the requirements may be selected as a final target face image by the user.
Embodiments of the present invention will be described below with reference to specific examples. It should be understood that these examples are only intended to aid one skilled in the art in better understanding the embodiments of the present invention and are not intended to limit the scope of the embodiments of the present invention.
Fig. 2A is a schematic diagram of one example of an implementation for maintaining a face template library in accordance with an embodiment of the present invention.
In the example of fig. 2A, it is assumed that the face template library currently stores images of 4 faces. Each face may have a corresponding Track ID, assuming 1, 2, 3 and 4, respectively. In addition, the face template library may store image quality information of the 4 faces. For example, as shown in fig. 2A, the image quality of a face with Track ID 1 is 30, the image quality of a face with Track ID 2 is 60, the image quality of a face with Track ID 3 is 25, and the image quality of a face with Track ID 4 is 55. For simplicity of illustration, the corresponding face image is not shown in fig. 2A.
A specific example of two cases of updating the face template library is shown in fig. 2A.
Case (1):
if the current image of a face in the current frame is higher than the existing image quality of a face with the same Track ID in the face template library by a certain proportion (for example, 30%), the image corresponding to the Track ID in the face template library can be updated, that is, the existing image is replaced by the current image.
For example, in fig. 2A, assuming that the quality of the current image of the face with Track ID 2 is 65, which is better than the quality 60 of the existing image in the face template library, the existing image is replaced with the current image.
Case (2):
if faces with Track IDs 2 and 4 (denoted as missing IDs in fig. 2A) are not tracked in the current frame, but a face with Track ID 5 (denoted as new ID in fig. 2A) is newly detected, the face with Track ID 5 can be compared with faces with Track IDs 2 and 4, respectively. Assuming that the comparison between the face with Track ID 5 and the face with Track ID 4 is successful, for example, the comparison score is greater than 72, the quality of the current image of the face with Track ID 5 and the quality of the existing image of the face with Track ID 4 can be compared. If the quality of the current image of the face with the Track ID of 5 is better, the existing image of the face with the Track ID of 4 can be replaced by the current image of the face with the Track ID of 5, and the Track ID corresponding to the image is modified from 5 to 4.
Therefore, by means of the face template library maintenance method in the embodiment of the invention, all the face images possibly appearing in the acquisition stage can be correctly maintained.
Fig. 2B is a schematic flow chart of a manner for determining a current target face according to an embodiment of the invention.
In the example of fig. 2B, it is assumed that the maximum face of the current frame has been determined, and thus the flow of fig. 2B occurs after the maximum face of the current frame is determined.
In step 201B, it is determined whether the m faces detected in the current frame include a previous target face (i.e., a target face of a frame previous to the current frame).
If the m faces do not include a previous target face, then in step 207B, the largest face of the current frame is determined to be the current target face.
If the m faces include the previous target face, in step 202B, it is determined whether the previous target face is the maximum face.
If the previous target face is the largest face, then in step 207B, the largest face is determined to be the current target face.
If the previous target face is not the maximum face, the current target face is determined based on the first distance, the second distance, and/or the third distance. The first distance may represent a physical distance between the maximum face and the camera, the second distance may represent a physical distance between the previous target face and the camera, and the third distance may represent a physical distance between the previous target face and the maximum face.
In step 203B, it is determined whether the first distance is greater than the second distance. If the first distance is greater than the second distance, then in step 208B, the previous target face is determined to be the current target face.
In step 204B, it is determined whether the first distance is less than the second distance and the third distance is less than a predetermined value.
If the first distance is less than the second distance and the third distance is less than a predetermined value, a current target face is determined based on which of the maximum face and the previous target face is closer to the center of the imaging area of the camera. In step 206B, it is determined whether the largest face is closer to the center of the imaging region than the previous target face. If not, then in step 208B, the previous target face is determined to be the current target face. If so, in step 209B, the maximum face is determined as the current target face.
In step 205B, it is determined whether the third distance is greater than a predetermined value. If so, in step 209B, the maximum face is determined as the current target face.
For ease of understanding, the following description is provided in connection with the examples of FIGS. 2C-2F.
In the example of fig. 2C and 2D, it is assumed that m faces detected by the camera are 201C to 204C, and that the face 201C is the largest face in the current frame and the face 202C is the previous target face. As shown in fig. 2C and 2D, the first distance may represent the physical distance between the maximum face 201C and the camera, and the second distance may represent the physical distance between the previous target face 202C and the camera. The third distance may represent a physical distance between the previous target face 202C and the maximum face 201C. At this time, the first distance is smaller than the second distance. In addition, in the example of fig. 2C and 2D, assuming that the third distance is smaller than the predetermined value, it is necessary at this time to determine the current target face based on which of the maximum face and the previous target face is closer to the center of the imaging area of the camera.
For example, in the example of fig. 2C, assuming that the maximum face 201C is closer to the center of the imaging region, the maximum face 201C may be determined as the current target face. Whereas in the example of fig. 2D, the previous target face 202C is closer to the center of the imaging region, the previous target face 202C is determined to be the current target face.
In the example of fig. 2E, it is assumed that the m faces detected by the camera are 201E to 204E, that the face 201E is the previous target face, and that the face 202E is the largest face of the current frame. As shown in fig. 2E, the first distance may represent the physical distance between the maximum face 202E and the camera, and the second distance may represent the physical distance between the previous target face 201E and the camera. In this example, the first distance is greater than the second distance, so the previous target face 201E may be determined to be the current target face.
In the example of fig. 2F, it is assumed that m faces detected by the camera are 201F to 204F, and that the face 201F is the largest face in the current frame and the face 202F is the previous target face. As shown in fig. 2F, the first distance may represent the physical distance between the maximum face 201F and the camera, and the second distance may represent the physical distance between the previous target face 202F and the camera. The third distance may represent a physical distance between the previous target face 202F and the maximum face 201F. In the example of fig. 2F, it is assumed that the third distance is greater than a predetermined value, at which time the maximum face 201F may be determined as the current target face.
It can be seen that, in this embodiment, the current target face is determined by combining the target face of the previous frame and the maximum face of the current frame, so that some interference in the acquisition process can be effectively avoided, and frequent switching of the target face is avoided.
Fig. 3 is a schematic block diagram of an apparatus for acquiring a face image according to an embodiment of the present invention.
As shown in fig. 3, the apparatus 300 may include a receiving unit 310, a detecting unit 320, and a face determining unit 330.
The receiving unit 310 receives the current frame. The current frame is one of a sequence of frames that is continuously captured by the camera without user interaction. The detection unit 320 performs face detection on the current frame. If the detection unit 320 detects m faces in the current frame, the face determination unit 330 determines the largest face among the m faces, where m is a positive integer. The face determination unit 330 also determines the current target face based on the maximum face so as to output an image of the final target face from the face template library at the end of the frame sequence. The final target face is determined by considering the target face of each frame of the sequence of frames and a face template library is used to store images of the different faces detected from each frame.
In embodiments of the invention, without user interaction, the camera continuously captures a frame sequence, face detection is performed on each frame in the sequence and a target face is determined for it, and the final target face is determined by considering the target faces of the respective frames, so that the image corresponding to the final target face is output from the face template library when the frame sequence ends. Face image acquisition is thus completed without disturbing the user and without requiring the user's cooperation, improving the user experience. In addition, by obtaining the image of the final target face from a continuously captured frame sequence, a face image of better quality can be selected over a relatively long period, which improves the effect of subsequent applications based on the face image.
In one embodiment, if the current frame is the first frame in the frame sequence, the face determination unit 330 may determine the maximum face as the current target face. The face determination unit 330 may determine the current target face based on a previous target face and a maximum face if the current frame is not the first frame in the frame sequence, wherein the previous target face is the target face of the previous frame of the current frame.
In another embodiment, if the m faces do not include the previous target face, the face determination unit 330 may determine the maximum face as the current target face.
The face determination unit 330 may determine the maximum face as the current target face if the m faces include the previous target face and the previous target face is the maximum face.
The face determination unit 330 may determine the current target face based on state information of the maximum face and the previous target face in the current frame if the m faces include the previous target face and the previous target face is not the maximum face.
In another embodiment, the face determination unit 330 may determine the previous target face as the current target face if the first distance is greater than the second distance, wherein the first distance represents a physical distance between the maximum face and the camera and the second distance represents a physical distance between the previous target face and the camera.
The face determination unit 330 may determine the current target face based on the distances of the maximum face and the previous target face each from the center of the imaging region of the camera if the first distance is smaller than the second distance and the third distance is smaller than a predetermined value, wherein the third distance represents the physical distance between the previous target face and the maximum face.
The face determination unit 330 may determine the maximum face as the current target face if the third distance is greater than a predetermined value.
In another embodiment, if the maximum face is closer to the center of the imaging region than the previous target face, the face determination unit 330 may determine the maximum face as the current target face.
The face determination unit 330 may determine the previous target face as the current target face if the previous target face is closer to the center of the imaging region than the maximum face.
In another embodiment, the apparatus 300 may further include a template determination unit 340. The template determination unit 340 may determine whether to update the face template library with m current images of m faces in the current frame.
In another embodiment, if the current frame is the first frame in the frame sequence, the template determination unit 340 may store m current images into the face template library.
If the current frame is not the first frame in the frame sequence, the template determination unit 340 may retrieve whether m existing images corresponding to m faces already exist in the face template library, and determine whether to update the face template library with the m current images based on the retrieval result.
In another embodiment, if the m faces include the first face and there is already an existing image of the first face in the face template library, the template determination unit 340 may perform quality comparison of the current image of the first face with the existing image of the first face and determine whether to update the face template library based on the quality comparison result.
If the m faces include the second face but there is no existing image of the second face in the face template library and there is an existing image of at least one specific face in the face template library but the m faces do not include the at least one specific face, the template determination unit 340 may compare the second face with the at least one specific face and determine whether to update the face template library based on the comparison result.
The template determination unit 340 may store the current image of the third face into the face template library if the m faces include the third face and there already exist existing images of faces other than the third face among the m faces in the face template library.
In another embodiment, if the quality of the current image of the first face is better than the quality of the existing image of the first face, the template determination unit 340 may replace the existing image of the first face in the face template library with the current image of the first face.
The template determination unit 340 may not replace the existing image of the first face in the face template library with the current image of the first face if the quality of the existing image of the first face is better than the quality of the current image of the first face.
In another embodiment, if the second face is successfully compared with the fourth face of the at least one specific face and the quality of the current image of the second face is better than the quality of the existing image of the fourth face, the template determination unit 340 may replace the existing image of the fourth face in the face template library with the current image of the second face.
The template determination unit 340 may not replace the existing image of the fourth face in the face template library with the current image of the second face if the comparison of the second face with the fourth face is successful and the quality of the existing image of the fourth face is better than the quality of the current image of the second face.
The template determination unit 340 may store the current image of the second face in the face template library if the comparison of the second face with the at least one specific face fails.
The respective units of the apparatus 300 may perform the corresponding steps in the method embodiments of fig. 1 to 2B, and thus, for brevity of description, specific operations and functions of the respective units of the apparatus 300 are not described herein.
The apparatus 300 may be implemented in hardware, software, or a combination of hardware and software. For example, when implemented in software, the apparatus 300 may be formed by the processor of the device in which it resides reading corresponding executable instructions from storage (e.g., non-volatile memory) into memory and executing them.
Fig. 4 is a hardware block diagram of a computing device for acquiring face images according to an embodiment of the invention.
As shown in fig. 4, the computing device 400 may include at least one processor 410, a memory 420, an internal memory 430, and a communication interface 440, which are connected together via a bus 450. The at least one processor 410 executes at least one executable instruction (i.e., the elements described above as being implemented in software) stored or encoded in the memory 420.
In one embodiment, the executable instructions stored in memory 420, when executed by at least one processor 410, cause computing device 400 to implement the various operations and functions described above in connection with fig. 1-3. For brevity of description, a detailed description is omitted herein.
Computing device 400 may be implemented in any suitable form known in the art including, for example, but not limited to, a desktop computer, a laptop computer, a smart phone, a tablet computer, a consumer electronic device, a wearable smart device, and the like.
The embodiment of the invention also provides a machine-readable storage medium. The machine-readable storage medium may store executable instructions that, when executed by a machine, cause the machine to perform the specific processes of the method embodiments described above with reference to fig. 1-2B.
For example, machine-readable storage media may include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), hard disks, flash memory, and the like.
It is to be understood that not all steps or elements of the above-described processes and apparatus structures are required, and that some steps or elements may be omitted according to actual needs. The order of execution of the steps is not fixed and may be determined as desired. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by multiple physical entities respectively, or may be implemented jointly by some components in multiple independent devices.
The previous description is provided to enable any person skilled in the art to make or use embodiments of the present invention. Various modifications to the embodiments of the invention will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

1. A method for acquiring a face image, comprising:
receiving a current frame, wherein the current frame is one frame in a frame sequence, and the frame sequence is continuously shot through a camera under the condition of no user interaction;
performing face detection on the current frame;
if m faces are detected in the current frame, determining the largest face in the m faces, wherein m is a positive integer;
determining a current target face for the current frame based on the maximum face and based on whether the current frame is a first frame in the sequence of frames, so as to output an image of a final target face from a face template library at the end of the sequence of frames, wherein the final target face is determined by considering the target faces of the frames of the sequence of frames in a frame-by-frame manner, the final target face being the target face determined for the last frame in the sequence of frames, the face template library being used to store images of different faces detected from the frames;
wherein the determining the current target face includes:
if the current frame is the first frame in the frame sequence, determining the maximum face as the current target face;
If the current frame is not the first frame in the sequence of frames, determining the current target face based on a previous target face and the maximum face according to at least one of the following factors, wherein the previous target face is the target face of the previous frame to the current frame: whether the m faces include the previous target face, whether the previous target face is the maximum face, and state information of the previous target face in the current frame.
2. The method of claim 1, wherein the determining the current target face based on the previous target face and the maximum face comprises:
if the m faces do not include the previous target face, determining the maximum face as the current target face;
determining the maximum face as the current target face if the m faces include the previous target face and the previous target face is the maximum face;
if the m faces include the previous target face and the previous target face is not the maximum face, determining the current target face based on state information of the maximum face and the previous target face in the current frame.
3. The method of claim 2, wherein the determining the current target face based on the state information of the maximum face and the previous target face in the current frame comprises:
determining the previous target face as the current target face if a first distance is greater than a second distance, wherein the first distance represents a physical distance between the maximum face and the camera and the second distance represents a physical distance between the previous target face and the camera;
determining the current target face based on the distance of each of the maximum face and the previous target face from the center of the imaging area of the camera if the first distance is less than the second distance and a third distance is less than a predetermined value, wherein the third distance represents a physical distance between the previous target face and the maximum face;
and if the third distance is larger than the preset value, determining the maximum face as the current target face.
4. A method according to claim 3, wherein said determining the current target face based on the distance of each of the maximum face and the previous target face from the center of the imaging area of the camera comprises:
Determining the maximum face as the current target face if the maximum face is closer to the center of the imaging region than the previous target face;
and if the previous target face is closer to the center of the imaging area than the maximum face, determining the previous target face as the current target face.
5. The method according to any one of claims 1 to 4, further comprising:
and determining whether to update the face template library by using m current images of the m faces in the current frame.
6. The method of claim 5, wherein the determining whether to update the face template library with m current images of the m faces in the current frame comprises:
if the current frame is the first frame in the frame sequence, storing the m current images in the face template library;
if the current frame is not the first frame in the frame sequence, searching the face template library for m existing images corresponding to the m faces, and determining whether to update the face template library with the m current images based on the search result.
7. The method of claim 6, wherein the determining whether to update the face template library with the m current images based on the search result comprises:
if the m faces include a first face and an existing image of the first face already exists in the face template library, comparing the quality of the current image of the first face with that of the existing image of the first face, and determining whether to update the face template library based on the quality comparison result;
if the m faces include a second face for which no existing image exists in the face template library, and the face template library contains an existing image of at least one specific face that is not among the m faces, comparing the second face with the at least one specific face, and determining whether to update the face template library based on the comparison result;
if the m faces include a third face for which no existing image exists in the face template library, and the face template library contains existing images only of the faces among the m faces other than the third face, storing the current image of the third face in the face template library.
8. The method of claim 7, wherein the determining whether to update the face template library based on the quality comparison result comprises:
if the quality of the current image of the first face is better than that of the existing image of the first face, replacing the existing image of the first face in the face template library with the current image of the first face;
and if the quality of the existing image of the first face is better than that of the current image of the first face, not replacing the existing image of the first face in the face template library with the current image of the first face.
9. The method of claim 7, wherein the determining whether to update the face template library based on the comparison result comprises:
if the comparison of the second face with a fourth face among the at least one specific face is successful and the quality of the current image of the second face is better than that of the existing image of the fourth face, replacing the existing image of the fourth face in the face template library with the current image of the second face;
if the comparison of the second face with the fourth face is successful but the quality of the existing image of the fourth face is better than that of the current image of the second face, not replacing the existing image of the fourth face in the face template library with the current image of the second face;
and if the comparison of the second face with the at least one specific face fails, storing the current image of the second face in the face template library.
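Claims 5 to 9 jointly describe how the face template library is maintained. The following is a minimal sketch of that update policy; the dictionary layout, the scalar quality score, and the faces_match stand-in are assumptions made for illustration, since the claims leave both the comparison method and the storage format open.

```python
from typing import Dict, Tuple

# Library and detections are modelled as {face_id: (image, quality_score)};
# both the identifier scheme and the scalar quality score are assumptions.
Entry = Tuple[object, float]

def faces_match(candidate_id: str, library_id: str) -> bool:
    """Stand-in for the claims' face comparison ("comparison successful").
    A real system would compare feature embeddings; here the tracker
    identity simply doubles as the match key."""
    return candidate_id == library_id

def update_library(library: Dict[str, Entry],
                   detections: Dict[str, Entry],
                   is_first_frame: bool) -> None:
    if is_first_frame:                        # claim 6: store all m current images
        library.update(detections)
        return
    # "Specific faces": present in the library but absent from the current frame.
    absent = [fid for fid in library if fid not in detections]
    for fid, (img, quality) in detections.items():
        if fid in library:
            # Claim 8 (first-face case): keep whichever image has higher quality.
            if quality > library[fid][1]:
                library[fid] = (img, quality)
        else:
            match = next((a for a in absent if faces_match(fid, a)), None)
            if match is None:
                # Claim 9 (comparison failed) or claim 7 third-face case: store it.
                library[fid] = (img, quality)
            elif quality > library[match][1]:
                # Claim 9: matched a specific face and the new image is better.
                del library[match]
                library[fid] = (img, quality)
                absent.remove(match)
            # Matched but the existing image is better: leave the library unchanged.
```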
10. An apparatus for acquiring a face image, comprising:
a receiving unit configured to receive a current frame, wherein the current frame is one frame in a frame sequence, and the frame sequence is continuously shot by a camera without user interaction;
a detection unit configured to perform face detection on the current frame;
a face determining unit configured to: if the detection unit detects m faces in the current frame, determine the maximum face among the m faces, wherein m is a positive integer;
the face determining unit is further configured to: determine a current target face based on the maximum face and on whether the current frame is the first frame in the frame sequence, so that an image of a final target face is output from a face template library at the end of the frame sequence, wherein the final target face is determined by considering the target faces of the frames of the frame sequence frame by frame, the final target face is the target face determined for the last frame in the frame sequence, and the face template library is used to store images of the different faces detected from the frames;
wherein, when determining the current target face, the face determining unit is configured to:
if the current frame is the first frame in the frame sequence, determine the maximum face as the current target face;
if the current frame is not the first frame in the frame sequence, determine the current target face based on a previous target face and the maximum face according to at least one of the following factors, the previous target face being the target face of the frame preceding the current frame: whether the m faces include the previous target face, whether the previous target face is the maximum face, and state information of the previous target face in the current frame.
11. The apparatus according to claim 10, wherein, when determining the current target face based on the previous target face and the maximum face, the face determining unit is configured to:
if the m faces do not include the previous target face, determine the maximum face as the current target face;
if the m faces include the previous target face and the previous target face is the maximum face, determine the maximum face as the current target face;
if the m faces include the previous target face and the previous target face is not the maximum face, determine the current target face based on the state information of the maximum face and the previous target face in the current frame.
12. The apparatus according to claim 11, wherein, when determining the current target face based on the state information of the maximum face and the previous target face in the current frame, the face determining unit is configured to:
determine the previous target face as the current target face if a first distance is greater than a second distance, wherein the first distance represents the physical distance between the maximum face and the camera, and the second distance represents the physical distance between the previous target face and the camera;
determine the current target face based on the distance of each of the maximum face and the previous target face from the center of the imaging area of the camera if the first distance is less than the second distance and a third distance is less than a predetermined value, wherein the third distance represents the physical distance between the previous target face and the maximum face;
and if the third distance is greater than the predetermined value, determine the maximum face as the current target face.
13. The apparatus according to claim 12, wherein, when determining the current target face based on the distance of each of the maximum face and the previous target face from the center of the imaging area of the camera, the face determining unit is configured to:
determine the maximum face as the current target face if the maximum face is closer to the center of the imaging area than the previous target face;
and if the previous target face is closer to the center of the imaging area than the maximum face, determine the previous target face as the current target face.
14. The apparatus according to any one of claims 10 to 13, further comprising:
a template determining unit configured to determine whether to update the face template library using m current images of the m faces in the current frame.
15. The apparatus according to claim 14, wherein, when determining whether to update the face template library with the m current images of the m faces in the current frame, the template determining unit is configured to:
if the current frame is the first frame in the frame sequence, store the m current images in the face template library;
if the current frame is not the first frame in the frame sequence, search the face template library for m existing images corresponding to the m faces, and determine whether to update the face template library with the m current images based on the search result.
16. The apparatus according to claim 15, wherein, when determining whether to update the face template library with the m current images based on the search result, the template determining unit is configured to:
if the m faces include a first face and an existing image of the first face already exists in the face template library, compare the quality of the current image of the first face with that of the existing image of the first face, and determine whether to update the face template library based on the quality comparison result;
if the m faces include a second face for which no existing image exists in the face template library, and the face template library contains an existing image of at least one specific face that is not among the m faces, compare the second face with the at least one specific face, and determine whether to update the face template library based on the comparison result;
if the m faces include a third face for which no existing image exists in the face template library, and the face template library contains existing images only of the faces among the m faces other than the third face, store the current image of the third face in the face template library.
17. The apparatus according to claim 16, wherein, when determining whether to update the face template library based on the quality comparison result, the template determining unit is configured to:
if the quality of the current image of the first face is better than that of the existing image of the first face, replace the existing image of the first face in the face template library with the current image of the first face;
and if the quality of the existing image of the first face is better than that of the current image of the first face, retain the existing image of the first face in the face template library instead of replacing it with the current image of the first face.
18. The apparatus according to claim 16, wherein, when determining whether to update the face template library based on the comparison result, the template determining unit is configured to:
if the comparison of the second face with a fourth face among the at least one specific face is successful and the quality of the current image of the second face is better than that of the existing image of the fourth face, replace the existing image of the fourth face in the face template library with the current image of the second face;
if the comparison of the second face with the fourth face is successful but the quality of the existing image of the fourth face is better than that of the current image of the second face, retain the existing image of the fourth face in the face template library instead of replacing it with the current image of the second face;
and if the comparison of the second face with the at least one specific face fails, store the current image of the second face in the face template library.
19. A computing device, comprising:
at least one processor;
a memory in communication with the at least one processor, the memory storing executable instructions that, when executed by the at least one processor, cause the at least one processor to implement the method of any one of claims 1 to 9.
20. A machine-readable storage medium storing executable instructions that, when executed by a machine, cause the machine to perform the method of any one of claims 1 to 9.
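For orientation, the two sketches above can be combined into a single acquisition loop in the spirit of method claims 1 to 9; detect_faces and the quality scoring are placeholders, as the claims prescribe neither a face detector nor a quality metric.

```python
def acquire_final_target_image(frames, image_center):
    """Illustrative driver: track the target face frame by frame and return
    the stored image of the final target face once the frame sequence ends.
    Uses select_target and update_library from the sketches above."""
    library = {}        # face_id -> (image, quality)
    target = None
    for i, frame in enumerate(frames):
        faces = detect_faces(frame)          # placeholder for any face detector
        if not faces:
            continue                         # no face in this frame; move on
        target = select_target(faces, target, image_center)
        # Crude quality proxy: larger faces score higher (assumption).
        detections = {f.face_id: (frame, f.area) for f in faces}
        update_library(library, detections, is_first_frame=(i == 0))
    # The final target face is the target of the last frame that had a face.
    if target is None:
        return None
    entry = library.get(target.face_id)
    return entry[0] if entry else None

def detect_faces(frame):
    """Placeholder detector: frames are assumed to be dicts carrying a
    pre-computed list of Face records under the key "faces"."""
    return frame.get("faces", [])
```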
CN201910019304.5A 2019-01-09 2019-01-09 Method, device, equipment and machine-readable storage medium for face image acquisition Active CN110008810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910019304.5A CN110008810B (en) 2019-01-09 2019-01-09 Method, device, equipment and machine-readable storage medium for face image acquisition

Publications (2)

Publication Number Publication Date
CN110008810A (en) 2019-07-12
CN110008810B (en) 2023-07-14

Family

ID=67165333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910019304.5A Active CN110008810B (en) 2019-01-09 2019-01-09 Method, device, equipment and machine-readable storage medium for face image acquisition

Country Status (1)

Country Link
CN (1) CN110008810B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443620A (en) * 2019-08-06 2019-11-12 中国工商银行股份有限公司 Face-scanning payment method and device
CN115035578A (en) * 2022-06-20 2022-09-09 支付宝(杭州)信息技术有限公司 Payment method, device and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764053A (en) * 2018-04-28 2018-11-06 Oppo广东移动通信有限公司 Image processing method, device, computer readable storage medium and electronic equipment
CN108711054B (en) * 2018-04-28 2020-02-11 Oppo广东移动通信有限公司 Image processing method, image processing device, computer-readable storage medium and electronic equipment
CN109146500A (en) * 2018-09-10 2019-01-04 深圳市宇墨科技有限公司 Unmanned supermarket payment method and related apparatus

Also Published As

Publication number Publication date
CN110008810A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
Mehta et al. Single-shot multi-person 3D pose estimation from monocular RGB
US20200184059A1 (en) Face unlocking method and apparatus, and storage medium
US11107225B2 (en) Object recognition device and computer readable storage medium
US10417773B2 (en) Method and apparatus for detecting object in moving image and storage medium storing program thereof
US8064653B2 (en) Method and system of person identification by facial image
JP4650669B2 (en) Motion recognition device
WO2021159609A1 (en) Video lag identification method and apparatus, and terminal device
Venkatesh et al. Efficient object-based video inpainting
JP6185517B2 (en) Image monitoring device
US10074029B2 (en) Image processing system, image processing method, and storage medium for correcting color
CN106603968B (en) Information processing apparatus and information processing method
JP4373840B2 (en) Moving object tracking method, moving object tracking program and recording medium thereof, and moving object tracking apparatus
CN112037245B (en) Method and system for determining similarity of tracked targets
JP6589321B2 (en) System, search method and program
CN110008810B (en) Method, device, equipment and machine-readable storage medium for face image acquisition
JP6941966B2 (en) Person authentication device
CN112364825B (en) Method, apparatus and computer-readable storage medium for face recognition
CN113302907B (en) Shooting method, shooting device, shooting equipment and computer readable storage medium
JP2021520015A (en) Image processing methods, devices, terminal equipment, servers and systems
CN111783714A (en) Coercion face recognition method, device, equipment and storage medium
US8891833B2 (en) Image processing apparatus and image processing method
US9286707B1 (en) Removing transient objects to synthesize an unobstructed image
CN110557556A (en) Multi-object shooting method and device
Huang et al. Learnable Descriptive Convolutional Network for Face Anti-Spoofing.
Hu et al. Patch-based face recognition from video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40010670

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20201014

Address after: British territory

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: British territory

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201014

Address after: British territory

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant