CN111738193A - Face snapshot method and face snapshot system - Google Patents

Face snapshot method and face snapshot system

Info

Publication number
CN111738193A
CN111738193A
Authority
CN
China
Prior art keywords
face
image
neural network
coordinate information
quality evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010610528.6A
Other languages
Chinese (zh)
Inventor
徐朋飞
唐剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Goke Microelectronics Co Ltd
Original Assignee
Hunan Goke Microelectronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Goke Microelectronics Co Ltd filed Critical Hunan Goke Microelectronics Co Ltd
Priority to CN202010610528.6A priority Critical patent/CN111738193A/en
Publication of CN111738193A publication Critical patent/CN111738193A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face snapshot method and a face snapshot system, and relates to the technical field of computer vision. The method includes: preprocessing an acquired image to obtain an image to be detected; performing face detection on the image to be detected according to a pre-trained face detection model to obtain a face area image and the coordinate information of the face regions in it, where the face detection model comprises a plurality of layers of neural networks, and the number of output channels configured in each convolution layer of each neural network and the number of groups used for grouped convolution satisfy preset conditions; performing face quality evaluation on the face regions in the face area image to obtain their quality evaluation information; performing target tracking according to the coordinate information of the face regions and adding face regions belonging to the same target to the queue corresponding to that target; and storing the face regions in the queue whose quality evaluation information is greater than a set threshold. The method effectively improves detection speed while maintaining detection accuracy, achieving a good detection effect.

Description

Face snapshot method and face snapshot system
Technical Field
The invention relates to the technical field of computer vision, in particular to a face snapshot method and a face snapshot system.
Background
The face snapshot system can serve as the front-end processing stage for other technologies, such as face-recognition schemes and face-snapshot-based people-counting schemes for public places; both need to capture the same face sequence for the later recognition and counting processes. The accuracy and speed of the face snapshot system therefore directly affect the processing effect of the next stage.
In current face snapshot systems based on edge computing, the timeliness requirements on the detection, tracking and evaluation algorithms are high; faces far from the camera are easily missed, which in turn causes fluctuation in the tracking module. To address the missed detection of small faces in face snapshot, two methods are common at present. The first is to change the acquisition device so that distant small faces occupy more pixels in the picture, for example by using a telephoto camera. The second is to increase the resolution of the algorithm input and perform detection on the high-resolution image. However, the first method requires a dedicated acquisition device and tightly couples the algorithm to that device, while the second sharply increases the number of floating-point operations (FLOPs) performed by the CNN (Convolutional Neural Network), which demands more computational power and slows down computation.
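The cost explosion from the second approach follows directly from how convolution FLOPs scale with input area. A rough back-of-envelope sketch (the layer sizes below are hypothetical; real totals depend on stride, padding, and the full network):

```python
def conv_flops(h, w, c_in, c_out, k):
    """Multiply-accumulate count of one stride-1, same-padded k x k conv layer."""
    return h * w * c_in * c_out * k * k

# Doubling the input resolution quadruples the per-layer cost.
low = conv_flops(256, 256, 3, 32, 3)
high = conv_flops(512, 512, 3, 32, 3)
assert high == 4 * low
```

Because every convolutional layer pays this quadratic price, raising the input resolution without redesigning the network is quickly infeasible on edge devices.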
Disclosure of Invention
In view of this, the present invention aims to provide a face snapshot method and a face snapshot system, which not only effectively improve the detection speed but also maintain detection accuracy at the same time, thereby achieving a better detection effect.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
in a first aspect, an embodiment of the present invention provides a face snapshot method, where the method includes:
preprocessing the acquired image to obtain an image to be detected;
carrying out face detection on the image to be detected according to a pre-trained face detection model to obtain a face area image and coordinate information of a face area in the face area image; the face detection model comprises a plurality of layers of neural networks, and the number of output channels configured in each convolution layer of each layer of neural network and the number of groups used when that convolution layer performs a grouped convolution operation satisfy preset conditions;
performing face quality evaluation on a face region in the face region image to obtain quality evaluation information of the face region;
tracking a target according to the coordinate information of the face area, and adding the face areas belonging to the same target into a queue corresponding to the target;
and storing the face area of which the quality evaluation information is greater than a set threshold value in the queue.
In an optional embodiment, the multilayer neural network includes a first neural network, a second neural network, and a third neural network, where a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the first neural network is a first preset value, a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the second neural network is a second preset value, and a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the third neural network is a third preset value.
In an optional embodiment, the step of performing face quality evaluation on the face region in the face region map to obtain quality evaluation information of the face region includes:
performing face quality evaluation on a face region in the face region image according to a pre-trained face quality evaluation model to obtain quality evaluation information of the face region; the human face quality evaluation model is obtained by training by using a pre-established human face quality evaluation standard.
In an optional embodiment, the step of tracking the target according to the coordinate information of the face area and adding the face areas belonging to the same target into the queue corresponding to the target includes:
filtering a historical face region detected in a previous frame of image of the image to be detected by using a filter to predict coordinate information of the historical face region appearing in the image to be detected so as to obtain predicted coordinate information;
judging whether the predicted coordinate information is matched with the coordinate information of the face area;
and if so, adding the face area into a queue where the historical face area is located.
In an optional embodiment, the step of determining whether the predicted coordinate information matches the coordinate information of the face region includes:
calculating an intersection ratio according to the predicted coordinate information and the coordinate information of the face area;
and if the intersection ratio is larger than a preset value, judging that the predicted coordinate information is matched with the coordinate information of the face area.
In a second aspect, an embodiment of the present invention provides a face snapshot system, including an image preprocessing module, a neural network accelerator, and a central processing unit, where the image preprocessing module is electrically connected to the neural network accelerator, and the neural network accelerator is electrically connected to the central processing unit;
the image preprocessing module is used for preprocessing the acquired image to obtain an image to be detected and inputting the image to be detected into the neural network accelerator;
the neural network accelerator is used for carrying out face detection on the image to be detected according to a pre-trained face detection model to obtain a face area image and coordinate information of a face area in the face area image; the face detection model comprises a plurality of layers of neural networks, and the number of output channels configured in each convolution layer of each layer of neural network and the number of groups used when that convolution layer performs a grouped convolution operation satisfy preset conditions;
the neural network accelerator is also used for carrying out face quality evaluation on the face area in the face area image to obtain quality evaluation information of the face area, and inputting the coordinate information of the face area and the quality evaluation information of the face area into the central processing unit;
the central processing unit is used for tracking a target according to the coordinate information of the face area, adding the face areas belonging to the same target into a queue corresponding to the target, and storing the face areas with the quality evaluation information larger than a set threshold value in the queue.
In an optional embodiment, the multilayer neural network includes a first neural network, a second neural network, and a third neural network, where a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the first neural network is a first preset value, a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the second neural network is a second preset value, and a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the third neural network is a third preset value.
In an optional embodiment, the neural network accelerator is configured to perform face quality estimation on a face region in the face region map according to a pre-trained face quality estimation model to obtain quality estimation information of the face region; the human face quality evaluation model is obtained by training by using a pre-established human face quality evaluation standard.
In an optional implementation manner, the central processing unit is configured to perform filtering processing on a historical face region detected in a previous frame of image of the image to be detected by using a filter, so as to predict coordinate information of the historical face region appearing in the image to be detected, further obtain predicted coordinate information, and add the face region to a queue where the historical face region is located when it is determined that the predicted coordinate information matches the coordinate information of the face region.
In an optional embodiment, the central processing unit is configured to calculate an intersection ratio according to the predicted coordinate information and the coordinate information of the face region, and when the intersection ratio is greater than a preset value, determine that the predicted coordinate information matches the coordinate information of the face region.
In the face snapshot method and the face snapshot system provided by the embodiments of the invention, the acquired image is preprocessed to obtain an image to be detected, and face detection is then performed on the image to be detected according to a pre-trained face detection model to obtain a face area image and the coordinate information of the face regions in it. The face detection model comprises a plurality of layers of neural networks, and the number of output channels configured in each convolution layer of each neural network and the number of groups used when that convolution layer performs a grouped convolution operation satisfy preset conditions. After the face area image and the coordinate information of the face regions are obtained, face quality evaluation is performed on the face regions in the face area image to obtain their quality evaluation information; target tracking is performed according to the coordinate information, face regions belonging to the same target are added to the queue corresponding to that target, and finally the face regions in the queue whose quality evaluation information is greater than a set threshold are stored. Because the number of output channels of each convolution layer and the number of groups used for grouped convolution satisfy the preset conditions, each convolution layer can find an appropriate number of groups when performing grouped convolution, which effectively improves detection speed while maintaining detection accuracy, achieving a better detection effect.
In addition, since face snapshot serves as the front-end processing stage of many other technologies, screening out and storing only the face regions whose quality evaluation information exceeds the set threshold effectively reduces storage space and bandwidth usage, facilitating the applications and processing of subsequent stages.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flow chart illustrating a face snapshot method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating the sub-steps of step S104 in FIG. 1;
fig. 3 shows a block diagram of a face snapshot system according to an embodiment of the present invention.
Reference numerals: 200-face snapshot system; 210-image preprocessing module; 220-neural network accelerator; 230-central processing unit.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the process of implementing the technical solution of the embodiments of the invention, the inventors found through research that, among prior-art solutions to missed detection of small faces, increasing the resolution improves the recall rate and detects small faces well, but it steeply increases the algorithm's FLOPs and reduces computation speed. To guarantee speed, the backbone (neural network model) must be redesigned to reduce convolution computation. If conventional convolutions are replaced by depthwise convolutions, as in a MobileNet-style network, a large number of 1×1 convolutions result, and too many grouped convolutions cause heavy DMA (Direct Memory Access) data transfers, reducing the utilization rate of the MAC (multiply-accumulate) module; yet removing grouped convolution greatly increases the computational load of the network.
In addition, current face quality evaluation schemes generally follow one of two approaches: traditional face quality evaluation algorithms, or CNN-based ones. The former are less robust than the latter, so most current methods are CNN-based face quality evaluation algorithms; however, the lack of a public face quality evaluation benchmark leaves their effect mediocre.
Based on the research into the above defects, the embodiments of the invention provide a face snapshot method and a face snapshot system. For the problem of missed detection of small faces, the number of output channels of each convolution layer of each neural network of the face detection model and the number of groups used when that layer performs a grouped convolution operation are set to meet preset conditions, so that each convolution layer can find an appropriate number of groups for grouped convolution; this effectively improves detection speed while maintaining detection accuracy, achieving a better detection effect. For the problem that the absence of a public face quality evaluation benchmark leads to poor model performance, a benchmark is established in advance and a regression CNN is trained on it, yielding a face quality evaluation model with better robustness. The face snapshot method and the face snapshot system provided by the embodiments of the invention are described in detail below.
Fig. 1 is a schematic flow chart of a face snapshot method according to an embodiment of the present invention. It should be noted that the face snapshot method provided in the embodiment of the present invention is not limited by fig. 1 and the following specific sequence, and it should be understood that, in other embodiments, the sequence of some steps in the face snapshot method provided in the embodiment of the present invention may be interchanged according to actual needs, or some steps in the face snapshot method may also be omitted or deleted. The specific process shown in FIG. 1 will be described in detail below.
And S101, preprocessing the acquired image to obtain an image to be detected.
In this embodiment, the images may be acquired by a camera; after the images are acquired, each frame is normalized so that the resulting image to be detected is substantially consistent with the training-set data.
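The normalization step can be sketched as follows. This is a minimal illustration assuming per-channel mean/std statistics taken from the training set; the patent does not specify the exact statistics used:

```python
def normalize(pixels, mean, std):
    """Per-channel (x - mean) / std so inference inputs match training statistics.

    `pixels` is a list of (r, g, b) tuples; `mean` and `std` are assumed
    training-set statistics, one value per channel.
    """
    return [[(p[c] - mean[c]) / std[c] for c in range(3)] for p in pixels]

# A mid-gray pixel maps to zero under matching statistics.
out = normalize([(128, 128, 128)], (128, 128, 128), (64, 64, 64))
```

In practice this runs per frame in the image preprocessing module before the frame is handed to the detector.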
Step S102, carrying out face detection on an image to be detected according to a pre-trained face detection model to obtain a face area image and coordinate information of a face area in the face area image; the face detection model comprises a plurality of layers of neural networks, and the number of output channels configured in each convolution layer of each layer of neural network and the number of groups used when that convolution layer performs a grouped convolution operation satisfy preset conditions.
In this embodiment, when the face detection model performs face detection on the image to be detected, a number of candidate boxes are generated on the image, each with a score indicating the probability that it contains a face; the candidate boxes whose probability exceeds a certain threshold are kept as face regions, yielding the face area image and the coordinate information of the face regions in it.
In this embodiment, the face detection model is built on a CNN. Because too many grouped convolutions bring heavy DMA data transfers and thereby reduce the utilization of the MAC module, while removing grouped convolution greatly increases the computational load of the network, this embodiment determines, through multiple experiments, an appropriate number of groups (group) for each convolutional layer of each neural network in the face detection model, so that the number of output channels (channels) and the number of groups configured in each convolution layer satisfy preset conditions. This effectively improves the utilization rate of the MAC module and achieves a balance (tradeoff) between detection speed and accuracy.
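The saving that grouping buys can be illustrated by counting weights. The sketch below uses a hypothetical 128-channel 3×3 layer (not from the patent) and only shows why a channels-to-groups ratio trades computation against the per-group DMA overhead described above:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k conv layer; grouping divides the count by `groups`."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * k * k * c_out

# Hypothetical 128-channel 3x3 layer: a channels-to-groups ratio of 4
# (i.e. groups = 128 // 4 = 32) cuts the weight count by a factor of 32,
# at the price of 32 separate per-group data movements.
full = conv_params(128, 128, 3)               # standard convolution
grouped = conv_params(128, 128, 3, 128 // 4)  # grouped convolution
```

Picking the ratio per layer, as the embodiment does, is what lets each layer sit at its own sweet spot between compute and DMA traffic.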
As an embodiment, the multi-layer neural network includes a first neural network, a second neural network, and a third neural network, where a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the first neural network is a first preset value, a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the second neural network is a second preset value, and a ratio of the number of output channels and the number of packets corresponding to each convolutional layer of the third neural network is a third preset value.
It can be understood that the first preset value, the second preset value and the third preset value are obtained through multiple experiments, and the face detection model is set according to the first preset value, the second preset value and the third preset value, so that the calculated amount can be reduced, the detection precision can be improved, and the tradeoff of the detection speed and the detection precision can be realized.
In one example, the three neural networks are stage1 (the first neural network), stage2 (the second neural network), and stage3 (the third neural network). The feature map output by stage1 is downsampled by a factor of 8, i.e., its size is 1/8 of the input image size; the size of the feature map output by stage2 is 1/16 of the input image; and the size of the feature map output by stage3 is 1/32 of the input image. stage1 consists of 6 convolutional layers whose channels-to-groups ratio is 4 (the first preset value); stage2 also consists of 6 convolutional layers whose channels-to-groups ratio is 8 (the second preset value); and stage3 consists of 2 convolutional layers whose channels-to-groups ratio is 32 (the third preset value).
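The stage layout described above can be captured in a small configuration table. This is an illustrative sketch; the square 640-pixel input size is an assumption for the example, not a value from the patent:

```python
def backbone_config(input_size):
    """Three-stage layout from the example above, for an assumed square input."""
    stages = [
        # (name, conv layers, channels-to-groups ratio, downsample factor)
        ("stage1", 6, 4, 8),
        ("stage2", 6, 8, 16),
        ("stage3", 2, 32, 32),
    ]
    return {name: {"layers": n, "channels_per_group": r,
                   "feature_map": input_size // ds}
            for name, n, r, ds in stages}

cfg = backbone_config(640)  # stage1 then emits an 80x80 feature map
```

The three output scales (1/8, 1/16, 1/32) give the detector feature maps at several resolutions, which is the usual way small and large faces are covered at once.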
Step S103, carrying out face quality evaluation on the face area in the face area image to obtain the quality evaluation information of the face area.
In this embodiment, after a face region is detected, its quality needs to be evaluated; that is, the detected face region is scored to obtain quality evaluation information. From this information it can be judged whether the quality of the face region is good, whether the region should be stored, and whether it can be passed to the next processing stage. For example, a blurred face or a face with a large tilt angle receives a low score and can be judged unsuitable for storage or for the next stage of processing.
In this embodiment, the quality evaluation information may be understood as a score obtained by the face area, and a value range of the score may be set according to an actual requirement, for example, between 0 and 5.
Optionally, step S103 specifically includes: carrying out face quality evaluation on a face area in a face area image according to a pre-trained face quality evaluation model to obtain quality evaluation information of the face area; the human face quality evaluation model is obtained by training by utilizing a pre-established human face quality evaluation standard.
In this embodiment, the face quality evaluation model is obtained by training a regression CNN on a pre-established benchmark. The resulting model is more robust when performing face quality evaluation and can output more accurate, high-quality faces for later face recognition or other purposes, effectively solving the prior-art problem that the absence of a public face quality evaluation benchmark leads to poor model performance.
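The patent does not disclose the regression head, but one plausible way to map an unbounded CNN regression output onto the 0-to-5 score range mentioned earlier is a scaled sigmoid. The helper below is a hypothetical sketch, not the patent's implementation:

```python
import math

def quality_score(logit, score_max=5.0):
    """Squash an unbounded regression output into [0, score_max]."""
    return score_max / (1.0 + math.exp(-logit))

# A neutral logit of 0 lands in the middle of the score range.
mid = quality_score(0.0)  # 2.5
```

Any monotone bounded mapping would do; the point is only that downstream thresholding (step S105) needs scores on a fixed, comparable scale.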
And step S104, tracking the target according to the coordinate information of the face area, and adding the face areas belonging to the same target into a queue corresponding to the target.
In this embodiment, tracking may be achieved by predicting with a filter and matching the predictions against the detection results of the face detection model; compared with conventional tracking algorithms such as the optical-flow method and KCF (Kernelized Correlation Filter), this effectively increases the tracking speed.
Alternatively, as shown in fig. 2, the step S104 may include the following sub-steps:
and a substep S1041 of filtering the historical face region detected in the previous frame of image of the image to be detected by using a filter to predict the coordinate information of the historical face region appearing in the image to be detected, so as to obtain predicted coordinate information.
In this embodiment, a Kalman filter may be used to predict the position of a previously detected historical face region, thereby obtaining the predicted coordinate information.
Step S1042, determine whether the predicted coordinate information matches the coordinate information of the face region.
In this embodiment, the intersection ratio may be calculated according to the predicted coordinate information and the coordinate information of the face region, and if the intersection ratio is greater than a preset value, it is determined that the predicted coordinate information matches the coordinate information of the face region.
The Intersection over Union (IoU) is the ratio between the intersection and the union of the rectangle corresponding to the predicted coordinate information and the rectangle corresponding to the coordinate information of the face region. To compute it, the intersection of the two rectangles is calculated first; the union is then obtained by subtracting the intersection from the sum of the two rectangles' areas. When the computed IoU is greater than the preset value, the match is judged successful.
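The IoU computation just described can be written directly (corner-coordinate boxes assumed for the example):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter  # sum of areas minus the intersection
    return inter / union if union else 0.0
```

Identical boxes give 1.0 and disjoint boxes give 0.0, so a single preset value in between serves as the match threshold.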
And in the substep S1043, if the matching is performed, adding the face region into a queue where the historical face region is located.
In this embodiment, the face regions in the same queue share the same ID. When the coordinate information of a face region is successfully matched with some predicted coordinate information, the face region is added to the queue of the historical face region corresponding to that predicted coordinate information. When the coordinate information of a face region matches none of the predicted coordinate information, the face region differs from the face regions in all current queues and is given a new ID. This ensures that the face of the same target keeps the same ID across the frame sequence; for example, if the ID corresponding to a face region in the previous frame image is 3, then after the tracking calculation the ID corresponding to the matching face region in the current frame and subsequent frames is still 3.
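The ID-assignment logic can be sketched as a greedy match between predicted track boxes and current detections. Both `assign_ids` and its data layout are illustrative assumptions, not the patent's exact scheme:

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def assign_ids(predictions, detections, next_id, iou_thresh=0.5):
    """Give each detection the ID of the best-overlapping predicted track,
    or a fresh ID when nothing overlaps above the threshold.

    `predictions` maps track ID -> predicted box; `detections` is a list
    of boxes from the current frame.
    """
    assigned, used = [], set()
    for det in detections:
        best_id, best_iou = None, iou_thresh
        for tid, pred in predictions.items():
            score = iou(det, pred)
            if tid not in used and score > best_iou:
                best_id, best_iou = tid, score
        if best_id is None:
            best_id, next_id = next_id, next_id + 1  # unmatched: new track
        else:
            used.add(best_id)  # each track claims at most one detection
        assigned.append(best_id)
    return assigned, next_id
```

A detection that overlaps track 3's prediction keeps ID 3; one that overlaps nothing opens a new queue, matching the behavior described above.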
And step S105, storing the face area of which the quality evaluation information is greater than the set threshold value in the queue.
In this embodiment, for each queue composed of face regions with the same ID, all face regions in the queue may be sorted according to their quality evaluation information, and the face regions whose quality evaluation information is greater than the set threshold are then selected for storage and used in the next-stage computation, such as face recognition, effectively reducing storage space and bandwidth usage.
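The per-queue screening step amounts to a filter and a sort. A minimal sketch, with an assumed `score` field holding the quality evaluation information:

```python
def select_snapshots(queue, threshold):
    """Keep only the faces scored above the threshold, best first."""
    kept = [f for f in queue if f["score"] > threshold]
    return sorted(kept, key=lambda f: f["score"], reverse=True)

# One track's queue; only the two sharp snapshots survive a threshold of 3.0.
track_queue = [{"id": 3, "score": 1.0},
               {"id": 3, "score": 4.8},
               {"id": 3, "score": 4.2}]
best = select_snapshots(track_queue, 3.0)
```

Storing only this filtered subset per track ID is what yields the storage and bandwidth savings claimed above.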
It should be noted that, in practical applications, in order to improve the face snapshot speed, face detection and target tracking may be implemented in a single network, which is not limited in the present application.
In summary, in the face snapshot method provided in the embodiment of the present invention, by setting the number of output channels of each convolution layer of each neural network of the face detection model together with the number of groups used by that convolution layer when performing the grouped convolution operation, a suitable number of groups can be found for different convolution layers. This not only effectively improves the detection speed but also preserves the detection accuracy, achieving a better detection effect. Screening out and storing only the face regions whose quality evaluation information is greater than the set threshold effectively reduces the storage space and the bandwidth consumed by transmission, and facilitates application and processing in subsequent stages. In addition, by pre-establishing an evaluation benchmark and then training a regression CNN on it, a more robust model can be obtained, so that more accurate, high-quality faces are output for later face recognition or other purposes.
Referring to fig. 3, a block diagram of a face snapshot system 200 according to an embodiment of the present invention is shown. It should be noted that the basic principle and the technical effects of the face snapshot system 200 provided in the present embodiment are the same as those of the above embodiments; for brevity, for any part not mentioned in the present embodiment, reference may be made to the corresponding contents of the above embodiments. The face snapshot system 200 comprises an image preprocessing module 210, a neural network accelerator 220 and a central processing unit 230, wherein the image preprocessing module 210 is electrically connected with the neural network accelerator 220, and the neural network accelerator 220 is electrically connected with the central processing unit 230.
The image preprocessing module 210 is configured to preprocess the acquired image to obtain an image to be detected, and input the image to be detected into the neural network accelerator 220.
It is understood that the above step S101 can be implemented on the image preprocessing module 210.
The neural network accelerator 220 is configured to perform face detection on the image to be detected according to a pre-trained face detection model, so as to obtain a face area map and coordinate information of a face area in the face area map. The face detection model comprises a multilayer neural network, and the number of output channels set in each convolution layer of each neural network, together with the number of groups used by that convolution layer when performing the grouped convolution operation, satisfies preset conditions.
Optionally, the multilayer neural network includes a first neural network, a second neural network, and a third neural network, where the ratio of the number of output channels to the number of groups for each convolutional layer of the first neural network is a first preset value, the ratio of the number of output channels to the number of groups for each convolutional layer of the second neural network is a second preset value, and the ratio of the number of output channels to the number of groups for each convolutional layer of the third neural network is a third preset value.
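Why the channels-to-groups ratio matters can be illustrated with the standard cost formulas for grouped convolution: increasing the number of groups divides the per-layer parameter and multiply-accumulate counts. The function name and the specific layer sizes below are illustrative, not from the patent:

```python
def grouped_conv_cost(in_ch, out_ch, k, out_h, out_w, groups):
    """Parameter and multiply-accumulate (MAC) counts for a k x k grouped
    convolution; in_ch and out_ch must both be divisible by groups."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    # Each output channel only sees in_ch / groups input channels.
    params = (in_ch // groups) * k * k * out_ch
    # One MAC per weight per output spatial location.
    macs = params * out_h * out_w
    return params, macs
```

For a 3x3 layer with 64 input and 64 output channels, moving from `groups=1` to `groups=4` cuts both counts by 4x while keeping the output-channels-to-groups ratio (here 64/4 = 16) constant, which is the kind of fixed "preset value" the patent assigns per subnetwork.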
The neural network accelerator 220 is further configured to perform face quality evaluation on the face region in the face region map to obtain quality evaluation information of the face region, and input the coordinate information of the face region and the quality evaluation information of the face region into the central processing unit 230.
Optionally, the neural network accelerator 220 is specifically configured to perform face quality evaluation on a face region in the face region map according to a pre-trained face quality evaluation model, so as to obtain quality evaluation information of the face region; the face quality evaluation model is obtained by training with a pre-established face quality evaluation benchmark.
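The patent's quality model is a trained regression CNN; as a stand-in to show what "quality evaluation information" can look like numerically, the toy function below combines hypothetical per-factor scores (sharpness, frontal pose, brightness) into one scalar. The factor names and weights are assumptions for illustration only, not the patent's benchmark:

```python
def face_quality_score(sharpness, frontalness, brightness,
                       weights=(0.5, 0.3, 0.2)):
    """Toy scalar quality score: weighted sum of per-factor scores,
    each normalized to [0, 1]. A real system would regress this score
    with a CNN trained on a labeled quality benchmark."""
    factors = (sharpness, frontalness, brightness)
    assert all(0.0 <= f <= 1.0 for f in factors)
    return sum(w * f for w, f in zip(weights, factors))
```

Whatever produces the score, downstream steps only need a single comparable number per face region so that queues can be sorted and thresholded.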
It is understood that the above steps S102 and S103 can be implemented on the neural network accelerator 220.
The central processing unit 230 is configured to perform target tracking according to the coordinate information of the face area, add the face areas belonging to the same target into a queue corresponding to the target, and store the face areas in the queue whose quality evaluation information is greater than a set threshold.
Optionally, the central processing unit 230 is specifically configured to: perform filtering processing, using a filter, on a historical face region detected in a previous frame image of the image to be detected, so as to predict the coordinate information at which the historical face region appears in the image to be detected and thereby obtain predicted coordinate information; and add the face region to the queue where the historical face region is located when it is determined that the predicted coordinate information matches the coordinate information of the face region.
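The patent does not specify which filter performs the prediction (a Kalman filter is typical in face tracking). As a deliberately simplified stand-in, a constant-velocity extrapolation from the last two observed boxes captures the idea of predicting where a historical face region will appear in the next frame:

```python
def predict_next_box(prev_box, curr_box):
    """Constant-velocity prediction of the next-frame box from the last
    two observed boxes, each given as (x1, y1, x2, y2). This is a
    simplified sketch, not the patent's filter."""
    # Each coordinate moves by the same delta it moved last frame.
    return tuple(2 * c - p for c, p in zip(curr_box, prev_box))
```

The predicted box is then matched against the detections of the current frame (e.g. by IOU against a preset value) to decide which queue, if any, each detected face joins.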
Optionally, the central processing unit 230 is configured to calculate an intersection over union according to the predicted coordinate information and the coordinate information of the face region, and when the intersection over union is greater than a preset value, determine that the predicted coordinate information matches the coordinate information of the face region.
It is understood that the above steps S104 and S105 can be implemented on the central processor 230.
In summary, in the face snapshot system provided in the embodiment of the present invention, by setting the number of output channels of each convolution layer of each neural network of the face detection model together with the number of groups used by that convolution layer when performing the grouped convolution operation, a suitable number of groups can be found for different convolution layers, so that the MAC units on the neural network accelerator are fully utilized. This not only effectively improves the detection speed but also preserves the detection accuracy, achieving a better detection effect and improving the speed of the whole system. The central processing unit screens out and stores only the face regions whose quality evaluation information is greater than the set threshold, which effectively reduces the storage space and the bandwidth consumed by transmission, and facilitates application and processing in subsequent stages. In addition, by pre-establishing an evaluation benchmark and then training a regression CNN on it, a more robust model can be obtained, so that more accurate, high-quality faces are output for later face recognition or other purposes.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A face snapshot method, the method comprising:
preprocessing the acquired image to obtain an image to be detected;
carrying out face detection on the image to be detected according to a pre-trained face detection model to obtain a face area image and coordinate information of a face area in the face area image; the face detection model comprises a multilayer neural network, and the number of output channels set in each convolution layer of each neural network and the number of groups used by the convolution layer when performing a grouped convolution operation satisfy preset conditions;
performing face quality evaluation on a face region in the face region image to obtain quality evaluation information of the face region;
tracking a target according to the coordinate information of the face area, and adding the face areas belonging to the same target into a queue corresponding to the target;
and storing the face area of which the quality evaluation information is greater than a set threshold value in the queue.
2. The method of claim 1, wherein the multilayer neural network comprises a first neural network, a second neural network and a third neural network, wherein a ratio of the number of output channels to the number of groups for each convolutional layer of the first neural network is a first preset value, a ratio of the number of output channels to the number of groups for each convolutional layer of the second neural network is a second preset value, and a ratio of the number of output channels to the number of groups for each convolutional layer of the third neural network is a third preset value.
3. The method according to claim 1, wherein the step of performing face quality assessment on the face region in the face region map to obtain quality assessment information of the face region comprises:
performing face quality evaluation on a face region in the face region image according to a pre-trained face quality evaluation model to obtain quality evaluation information of the face region; the face quality evaluation model is obtained by training with a pre-established face quality evaluation standard.
4. The method according to claim 1, wherein the step of tracking the target according to the coordinate information of the face area and adding the face areas belonging to the same target into the queue corresponding to the target comprises:
filtering a historical face region detected in a previous frame of image of the image to be detected by using a filter to predict coordinate information of the historical face region appearing in the image to be detected so as to obtain predicted coordinate information;
judging whether the predicted coordinate information is matched with the coordinate information of the face area;
and if so, adding the face area into a queue where the historical face area is located.
5. The method of claim 4, wherein the step of determining whether the predicted coordinate information matches the coordinate information of the face region comprises:
calculating an intersection ratio according to the predicted coordinate information and the coordinate information of the face area;
and if the intersection ratio is larger than a preset value, judging that the predicted coordinate information is matched with the coordinate information of the face area.
6. A face snapshot system is characterized by comprising an image preprocessing module, a neural network accelerator and a central processing unit, wherein the image preprocessing module is electrically connected with the neural network accelerator, and the neural network accelerator is electrically connected with the central processing unit;
the image preprocessing module is used for preprocessing the acquired image to obtain an image to be detected and inputting the image to be detected into the neural network accelerator;
the neural network accelerator is used for carrying out face detection on the image to be detected according to a pre-trained face detection model to obtain a face area image and coordinate information of a face area in the face area image; the face detection model comprises a multilayer neural network, and the number of output channels set in each convolution layer of each neural network and the number of groups used by the convolution layer when performing a grouped convolution operation satisfy preset conditions;
the neural network accelerator is also used for carrying out face quality evaluation on the face area in the face area image to obtain quality evaluation information of the face area, and inputting the coordinate information of the face area and the quality evaluation information of the face area into the central processing unit;
the central processing unit is used for tracking a target according to the coordinate information of the face area, adding the face areas belonging to the same target into a queue corresponding to the target, and storing the face areas with the quality evaluation information larger than a set threshold value in the queue.
7. The system of claim 6, wherein the multilayer neural network comprises a first neural network, a second neural network and a third neural network, wherein a ratio of the number of output channels to the number of groups for each convolutional layer of the first neural network is a first preset value, a ratio of the number of output channels to the number of groups for each convolutional layer of the second neural network is a second preset value, and a ratio of the number of output channels to the number of groups for each convolutional layer of the third neural network is a third preset value.
8. The system according to claim 6, wherein the neural network accelerator is configured to perform face quality evaluation on a face region in the face region map according to a pre-trained face quality evaluation model, so as to obtain quality evaluation information of the face region; the face quality evaluation model is obtained by training with a pre-established face quality evaluation standard.
9. The system according to claim 6, wherein said central processing unit is configured to perform filtering processing on a historical face region detected in a previous frame of image of said image to be detected by using a filter to predict coordinate information of said historical face region appearing in said image to be detected, so as to obtain predicted coordinate information, and add said face region to a queue where said historical face region is located when it is determined that said predicted coordinate information matches said coordinate information of said face region.
10. The system according to claim 9, wherein the central processing unit is configured to calculate an intersection ratio according to the predicted coordinate information and the coordinate information of the face region, and determine that the predicted coordinate information matches the coordinate information of the face region when the intersection ratio is greater than a preset value.
CN202010610528.6A 2020-06-29 2020-06-29 Face snapshot method and face snapshot system Pending CN111738193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010610528.6A CN111738193A (en) 2020-06-29 2020-06-29 Face snapshot method and face snapshot system


Publications (1)

Publication Number Publication Date
CN111738193A true CN111738193A (en) 2020-10-02

Family

ID=72653646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010610528.6A Pending CN111738193A (en) 2020-06-29 2020-06-29 Face snapshot method and face snapshot system

Country Status (1)

Country Link
CN (1) CN111738193A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465792A (en) * 2020-12-04 2021-03-09 北京华捷艾米科技有限公司 Human face quality evaluation method and related device
CN117611516A (en) * 2023-09-04 2024-02-27 北京智芯微电子科技有限公司 Image quality evaluation, face recognition, label generation and determination methods and devices

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930261A (en) * 2012-12-05 2013-02-13 上海市电力公司 Face snapshot recognition method
US20180032794A1 (en) * 2016-07-29 2018-02-01 UBTECH Robotics Corp. Face detecting and tracking method and device and method and system for controlling rotation of robot head
CN107679490A (en) * 2017-09-29 2018-02-09 百度在线网络技术(北京)有限公司 Method and apparatus for detection image quality
CN108363946A (en) * 2017-12-29 2018-08-03 成都通甲优博科技有限责任公司 Face tracking system and method based on unmanned plane
CN109145771A (en) * 2018-08-01 2019-01-04 武汉普利商用机器有限公司 A kind of face snap method and device
CN109190532A (en) * 2018-08-21 2019-01-11 北京深瞐科技有限公司 It is a kind of based on cloud side fusion face identification method, apparatus and system
CN109978961A (en) * 2019-03-15 2019-07-05 湖南国科微电子股份有限公司 A kind of pattern colour side removing method, device and electronic equipment
CN110084130A (en) * 2019-04-03 2019-08-02 深圳鲲云信息科技有限公司 Face screening technique, device, equipment and storage medium based on multiple target tracking
CN110647893A (en) * 2019-09-20 2020-01-03 北京地平线机器人技术研发有限公司 Target object identification method, device, storage medium and equipment
CN111277746A (en) * 2018-12-05 2020-06-12 杭州海康威视***技术有限公司 Indoor face snapshot method and system



Similar Documents

Publication Publication Date Title
CN110826519B (en) Face shielding detection method and device, computer equipment and storage medium
CN108268869B (en) Target detection method, device and system
CN111027504A (en) Face key point detection method, device, equipment and storage medium
KR101939936B1 (en) Method and apparatus for recognizing fingerprints
CN109145771B (en) Face snapshot method and device
CN102096805B (en) Apparatus and method for registering plurality of facial images for face recognition
JP5235691B2 (en) Information processing apparatus and information processing method
CN105631418A (en) People counting method and device
CN111738193A (en) Face snapshot method and face snapshot system
CN109934077B (en) Image identification method and electronic equipment
CN110147708B (en) Image data processing method and related device
CN113313626A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN112597995B (en) License plate detection model training method, device, equipment and medium
WO2015031350A1 (en) Systems and methods for memory utilization for object detection
CN111050027B (en) Lens distortion compensation method, device, equipment and storage medium
CN111767752B (en) Two-dimensional code identification method and device
CN106780599A (en) A kind of circular recognition methods and system based on Hough changes
CN110490170A (en) A kind of face candidate frame extracting method
CN112907206B (en) Business auditing method, device and equipment based on video object identification
CN115700758A (en) Sperm activity detection method, device, equipment and storage medium
CN114998438A (en) Target detection method and device and machine-readable storage medium
CN115829911A (en) Method, apparatus and computer storage medium for detecting imaging consistency of a system
CN114118271A (en) Image determination method, image determination device, storage medium and electronic device
CN114399432A (en) Target identification method, device, equipment, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination