CN113836980A - Face recognition method, electronic device and storage medium - Google Patents

Publication number
CN113836980A
CN113836980A (application CN202010587883.6A)
Authority
CN
China
Prior art keywords
face
features
images
frames
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010587883.6A
Other languages
Chinese (zh)
Inventor
丁肇臻
侯春华
申光
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN202010587883.6A priority Critical patent/CN113836980A/en
Priority to BR112022026549A priority patent/BR112022026549A2/en
Priority to PCT/CN2021/098156 priority patent/WO2021259033A1/en
Publication of CN113836980A publication Critical patent/CN113836980A/en
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods involving reference images or patches
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence


Abstract

The invention discloses a face recognition method, an electronic device and a storage medium. The method comprises: extracting multiple frames of face images containing a target face from a video stream; extracting face features from the multiple frames of face images respectively to obtain first face features; performing feature enhancement on the first face features and fusing the enhanced first face features to obtain a second face feature; and comparing the second face feature with pre-stored third face features to determine a face recognition result. Because the technical scheme provided by the embodiment of the invention performs face recognition based on features extracted from multiple frames of face images, it solves the problem in traditional methods that a recognition result based on the features of a single image is strongly affected by noise interference. The embodiment of the invention also performs feature enhancement and fusion on the first face features extracted from the multiple frames, further improving the success rate and reliability of face recognition.

Description

Face recognition method, electronic device and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a face recognition method, an electronic device, and a storage medium.
Background
At present, face recognition is widely used in application scenarios such as security monitoring, criminal apprehension, and people-flow statistical analysis. In practical applications, however, face recognition is easily disturbed by various external noises, such as: face deflection; large-angle side faces; motion blur and out-of-focus blur; occlusion of the face (e.g., masks and sunglasses); low illumination intensity and contrast; and blocking artifacts introduced by the video codec during transmission. This noise greatly reduces the accuracy of face recognition, thereby limiting the development of face recognition applications.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the invention provides a face recognition method, electronic equipment and a storage medium, which can reduce the influence of noise interference on the face recognition precision, thereby improving the success rate of face recognition.
In one aspect, an embodiment of the present invention provides a face recognition method, including:
extracting a plurality of frames of face images containing a target face from the video stream;
respectively extracting the face features of a plurality of frames of face images to obtain first face features;
performing feature enhancement on the first face features, and fusing the enhanced first face features to obtain second face features;
and comparing the second face features with third face features stored in advance to determine a face recognition result.
In another aspect, an embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the face recognition method described above are implemented.
In still another aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the face recognition method as described above.
The embodiment of the invention comprises the following steps: extracting multiple frames of face images containing a target face from a video stream; extracting face features from the multiple frames of face images respectively to obtain first face features; performing feature enhancement on the first face features and fusing the enhanced first face features to obtain a second face feature; and comparing the second face feature with pre-stored third face features to determine a face recognition result. Because face recognition is performed on features extracted from multiple frames of face images, the face feature samples are richer and more diverse, the features complement one another, and more usable information is available during recognition; this solves the problem in traditional methods that recognition based on the features of a single image is strongly affected by noise interference. The embodiment of the invention also performs feature enhancement and fusion on the first face features extracted from the multiple frames, compensating the face features and further improving the success rate and reliability of face recognition.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and together with the description serve to explain the principles of the invention, not to limit it.
Fig. 1 is a flowchart of a face recognition method according to an embodiment of the present invention;
FIG. 2 is a sub-flowchart of step S100 in FIG. 1;
FIG. 3 is a sub-flowchart of step S110 in FIG. 2;
FIG. 4 is a sub-flowchart of step S130 in FIG. 2;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be understood that, in the description of the embodiments of the present invention, "a plurality" (or "multiple") means two or more; terms such as "greater than", "less than" and "exceeding" are understood as excluding the stated number, while terms such as "above", "below" and "within" are understood as including it. If descriptions such as "first" and "second" are used, they serve only to distinguish technical features and are not intended to indicate or imply relative importance, the number of the indicated technical features, or their precedence.
Fig. 1 shows a flowchart of a face recognition method according to an embodiment of the present invention. As shown in fig. 1, the method includes, but is not limited to, the following steps S100 to S400.
Step S100, extracting a plurality of frames of face images containing the target face from the video stream.
In specific implementation, video acquisition can be completed through the front-end camera, and then subsequent processing is performed on a video stream output by the camera to obtain a multi-frame face image containing a target face. In step S100 of the embodiment of the present invention, a plurality of frames of face images including a target face are extracted from a video stream, and may be implemented by steps S110 to S130 shown in fig. 2.
Step S110, extracting a plurality of frames of first face images including the target face from the video stream.
Illustratively, step S110 may be specifically realized by steps S111 and S112 as shown in fig. 3.
Step S111, carrying out face detection on the video stream, and acquiring face position information of a target face in a current frame of the video stream.
For example, a face detection network such as the multi-task cascaded convolutional network (MTCNN) or RetinaFace may be used to obtain the position information of the target face in the current video frame. The position information may include face key point positions and face bounding box information.
And step S112, carrying out face track tracking according to the face position information, and extracting a plurality of frames of first face images containing the target face from the video stream.
Illustratively, the position of the target face in the next video frame can be predicted from the position information of the target face in the current frame, acquired during face detection, thereby realizing face trajectory tracking. By tracking the target face trajectory, images of the target face can be cropped from multiple video frames of the video stream, yielding a series of face track images containing the target face; this series of face track images is used as the multiple frames of first face images.
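The tracking idea described above can be sketched as follows. This is a minimal illustration, not the patent's actual tracker: the constant-velocity prediction model and the nearest-detection association rule are assumptions, since the text does not specify a predictor.

```python
import numpy as np

def predict_next(prev_center, curr_center):
    """Constant-velocity prediction of the face center in the next frame."""
    prev = np.asarray(prev_center, float)
    curr = np.asarray(curr_center, float)
    return curr + (curr - prev)

def associate(predicted, detections, max_dist=50.0):
    """Return the index of the detection closest to the prediction,
    or None if no detection is within max_dist pixels."""
    if not detections:
        return None
    dists = [np.linalg.norm(np.asarray(d, float) - predicted) for d in detections]
    i = int(np.argmin(dists))
    return i if dists[i] <= max_dist else None
```

A face moving right at 10 px/frame is predicted 10 px further right, and the detection nearest that prediction is joined to the track.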
In some embodiments, the face keypoint location information may specifically include a plurality of contour point location information. Wherein the plurality of contour point position information may include left eye position information, right eye position information, nose position information, left mouth angle position information, and right mouth angle position information.
Correspondingly, as shown in fig. 3, step S110 may further include step S113, calibrating an angle of the target face in the first face image according to the position information of the plurality of contour points.
For example, because the target object may be moving in the acquired video stream, the target face may be tilted in some of the first face images that make up the series of face track images obtained by face track tracking. The target face in a first face image can therefore be calibrated using the contour point position information. Specifically, the position information of the plurality of contour points may be input into a face calibration algorithm, and the face calibration algorithm performs tilt correction on the target face in the first face image.
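One common way to realize such tilt calibration from contour points (an assumption here, since the patent names no specific algorithm) is to rotate the points about the eye midpoint so the line through the two eye centers becomes horizontal:

```python
import numpy as np

def roll_angle_deg(left_eye, right_eye):
    """In-plane tilt (roll) of the face, in degrees, from the two eye centers."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return float(np.degrees(np.arctan2(dy, dx)))

def level_eyes(points, left_eye, right_eye):
    """Rotate key points about the eye midpoint so the eye line is horizontal."""
    theta = -np.radians(roll_angle_deg(left_eye, right_eye))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    mid = (np.asarray(left_eye, float) + np.asarray(right_eye, float)) / 2
    return (np.asarray(points, float) - mid) @ rot.T + mid
```

The same rotation would be applied to the image pixels in practice; only the key-point geometry is shown here.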
And step S120, respectively carrying out face quality analysis processing on the multiple frames of first face images to obtain face prior information of each frame of first face image.
The embodiment of the invention adopts a lightweight face quality evaluation algorithm to carry out face quality analysis processing on each frame of first face image, and obtains face prior information corresponding to each frame of first face image.
Specifically, the first face image of each frame may be subjected to face quality evaluation in multiple dimensions, and the obtained face prior information includes multiple different types of index parameters.
Illustratively, the index parameters may include three types: a blur degree parameter, a deflection angle parameter, and a resolution parameter. In a specific implementation, a lightweight face feature extraction model may be used to obtain the modulus (norm) of the feature vector of a first face image, and the blur degree parameter is determined from this modulus; generally, the larger the feature modulus, the lower the blur. The first face image may be binarized using local binary patterns (LBP) to output a face symmetry index, from which the deflection angle parameter is determined; for example, a symmetry index of 1 indicates a frontal face with a deflection angle of 0. The inter-pupil distance may be determined from the left-eye and right-eye position information obtained during face detection, and the resolution parameter is determined from the inter-pupil distance; generally, the larger the inter-pupil distance, the higher the resolution, and the smaller the inter-pupil distance, the lower the resolution.
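The three indices can be sketched as below. The exact formulas are not given in the text, so each mapping to [0, 1] is an assumption; only the monotonic relationships (larger norm means less blur, symmetry 1 means frontal, larger pupil distance means higher resolution) follow the description.

```python
import numpy as np

def blur_score(feature_vec):
    """Larger feature-vector modulus -> lower blur (higher score)."""
    norm = np.linalg.norm(feature_vec)
    return norm / (1.0 + norm)

def deflection_score(symmetry_index):
    """A symmetry index of 1 indicates a frontal face (deflection angle 0)."""
    return float(np.clip(symmetry_index, 0.0, 1.0))

def resolution_score(left_eye, right_eye, full_score_pupil_dist=60.0):
    """Larger inter-pupil distance -> higher resolution score (capped at 1)."""
    d = np.linalg.norm(np.asarray(right_eye, float) - np.asarray(left_eye, float))
    return min(d / full_score_pupil_dist, 1.0)
```

The `full_score_pupil_dist` of 60 pixels is an illustrative cutoff, not a value from the patent.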
It should be understood that the embodiment of the present invention is not limited to the three index parameters, and may also include other different types of index parameters, or replace any one or more of the three index parameters with other different types of index parameters, and the embodiment of the present invention is not limited thereto.
In the embodiment of the present invention, multi-dimensional quality evaluation is performed on each frame of first face image through several different types of index parameters, which reflect the strength of the facial detail features in different dimensions. In the foregoing example, a lightweight face quality scoring method examines face quality in three dimensions: blur degree, face deflection angle, and resolution. The obtained face prior information is used, on the one hand, to subsequently select high-quality images for face feature extraction, ensuring that the extracted features are rich and diverse; on the other hand, it is used in the subsequent feature enhancement step to improve the generalization of the face features.
And step S130, selecting a plurality of frames of second face images from the plurality of frames of first face images according to the face prior information of each frame of first face image. Specifically, step S130 may include steps S131 to S133 as shown in fig. 4.
Step S131, a plurality of index parameters are linearly weighted to obtain a global quality score, and a first preset number of primary selection images are obtained from a plurality of frames of first face images according to the global quality score.
Illustratively, the global quality score may be obtained by performing linear weighting calculation on a plurality of index parameters included in the face prior information in step S120. And comprehensive quality evaluation can be carried out on each frame of first face image through the global quality score, ranking is carried out on each frame of first face image according to the global quality score, and the first face image with the top ranking is selected as the primary selection image. And the number of the primary selection images to be acquired can be determined by presetting the first preset number value.
As an example, the first preset number may be a percentage value, such as 30%. Suppose 100 frames of first face images are extracted from the video stream in step S110 and the face prior information of each frame is acquired according to step S120. Linear weighting is then performed on the index parameters contained in the face prior information to obtain a global quality score for each frame. The 100 first face images are ranked from high to low by global quality score, and the top 30 are taken as the primary selection images.
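The primary selection step can be sketched as follows. The weight values are placeholders; the description only says the weighting is linear (and, in scenario one, that the coefficients are obtained by regression).

```python
def global_score(indices, weights=(0.4, 0.3, 0.3)):
    """indices: (blur, deflection, resolution) scores for one frame."""
    return sum(w * x for w, x in zip(weights, indices))

def primary_select(frames, ratio=0.30):
    """frames: list of (frame_id, indices) pairs.
    Return the ids of the top-`ratio` frames by global quality score."""
    ranked = sorted(frames, key=lambda f: global_score(f[1]), reverse=True)
    k = max(1, int(len(ranked) * ratio))
    return [fid for fid, _ in ranked[:k]]
```

With 100 frames and a ratio of 30%, this keeps the 30 highest-scoring frames, matching the example above.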
Step S132, the primary selection images with the first preset number are arranged and combined to obtain a plurality of primary selection image combinations, wherein each primary selection image combination comprises the primary selection images with the second preset number.
Continuing with the foregoing example, the second preset number may be set according to the number of second face images to be finally acquired, for example 3. The 30 primary selection images can thus be combined to obtain C(30, 3) primary selection image combinations, where each combination comprises 3 primary selection images.
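Since order within a combination does not matter, this is a choice of 3 images out of 30, giving C(30, 3) = 4060 candidate combinations:

```python
from itertools import combinations

primary = [f"P{i}" for i in range(1, 31)]   # the 30 primary selection images
combos = list(combinations(primary, 3))      # every 3-image combination
print(len(combos))                           # 4060
```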
Step S133, obtaining the image distinguishing degree parameter of each primary selection image combination according to the index parameters, and selecting the final selection image combination from the primary selection image combinations according to the image distinguishing degree parameter.
The image distinguishing degree parameter characterizes the difference in facial detail features among the multiple primary selection images contained in a combination. Generally, provided the face quality is assured, the greater the distinguishing degree of the facial detail features, the more usable information the image combination contains; the face features extracted in this way exhibit strong generalization and are better suited to a face recognition system in an open scene.
The image distinguishing degree parameter of the primary selection image combination can determine the image distinguishing degree parameter of the current primary selection image combination by calculating the cumulative distance between every two images in the combination in multiple dimensions.
Continuing with the foregoing example, assume the current primary selection image combination is T1, containing three images numbered P1, P2 and P3. The pairwise distances of P1-P2, P1-P3 and P2-P3 are calculated in each dimension. For example: the distances of P1 and P2 in the three dimensions of blur degree, face deflection angle and resolution are S1(P1P2), S2(P1P2) and S3(P1P2); the distances of P1 and P3 in the same three dimensions are S1(P1P3), S2(P1P3) and S3(P1P3); and the distances of P2 and P3 are S1(P2P3), S2(P2P3) and S3(P2P3). The image distinguishing degree parameter of the primary selection image combination T1 is then:
S(T1) = S1(P1P2) + S2(P1P2) + S3(P1P2) + S1(P1P3) + S2(P1P3) + S3(P1P3) + S1(P2P3) + S2(P2P3) + S3(P2P3).
According to the image distinguishing degree parameter calculated for each primary selection image combination, the combination with the maximum parameter is selected as the final selection image combination.
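The distinguishing-degree computation in the example above can be sketched as follows. Using the absolute difference as the per-dimension distance is an assumption; the text only says a cumulative pairwise distance over the dimensions is used.

```python
from itertools import combinations

def distinguishing_degree(combo, indices):
    """combo: tuple of image ids; indices[id] = (blur, deflection, resolution).
    Sum, over every image pair, of the per-dimension absolute distances."""
    total = 0.0
    for a, b in combinations(combo, 2):
        total += sum(abs(x - y) for x, y in zip(indices[a], indices[b]))
    return total

def final_select(combos, indices):
    """Pick the combination with the largest distinguishing degree."""
    return max(combos, key=lambda c: distinguishing_degree(c, indices))
```

Two identical images contribute zero to the score, so combinations of diverse frames win, which is exactly the selection behavior described above.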
And step S134, combining the images contained in the final selected image as a second face image.
And when the final image combination is selected, taking the image contained in the final image combination as a second face image. For example, the combination T1 is selected as the final selected image combination, and the images P1, P2, and P3 included in the combination T1 are used as the second face image.
And step S200, respectively extracting the face features of the multiple frames of face images to obtain first face features.
For example, the face image in step S200 may be a second face image of multiple frames obtained in step S134.
Illustratively, a neural network may be used to extract face features from the multiple frames of face images respectively to obtain the first face features, where each extracted first face feature is a multi-dimensional face vector. The neural network may employ a face feature extraction backbone such as ResNet-152, outputting a set of 256-dimensional deep face features. These features encode the original face image information before any feature enhancement.
And step S300, performing feature enhancement on the first face features, and fusing the enhanced first face features to obtain second face features.
Illustratively, a deep convolutional neural network performs a dot product (element-wise multiplication) between the first face features and the face prior information to obtain the enhanced first face features. The face prior information is obtained by the face quality analysis processing of the face images in step S120.
Continuing with the foregoing example, the 256-dimensional depth face features extracted from the second face image by the Resnet152 algorithm and the face prior information (the blur degree parameter, the face deflection angle parameter, and the resolution parameter) corresponding to the second face image are input into the deep convolutional neural network, and the depth face features and the face prior information are subjected to the dot product operation by the deep convolutional neural network, so that the face features are enhanced by the face prior information output by the face quality evaluation algorithm.
Unlike traditional image-level enhancement methods, such as image deblurring, super-resolution, etc., embodiments of the present invention employ a feature-level enhancement method. Compared with an image level enhancement method, the method adopting feature level enhancement has the advantages that the processing object is a group of multi-dimensional face vectors, the calculation amount is small, and therefore the processing efficiency can be greatly improved.
Illustratively, the deep convolutional neural network used for feature enhancement may be two fully connected layers in series, trained on a face feature extraction data set to obtain a feature enhancement module that compensates the original features. The three quality indices output by the face quality scoring module reflect the strength of the face image in the dimensions of blur degree, deflection angle and resolution, and these indices control how the feature enhancement module enhances the original features through the dot product operation.
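A numpy sketch of this feature-level enhancement and the subsequent fusion follows. The weights are random stand-ins (in the patent they are trained), and the ReLU/sigmoid choices are assumptions; the structure (three quality indices, two fully connected layers, element-wise gating of a 256-dimensional feature, average pooling) follows the description.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 256
# Two fully connected layers in series: 3 quality indices -> 64 -> 256 gate.
W1, b1 = rng.normal(size=(3, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, FEAT_DIM)) * 0.1, np.zeros(FEAT_DIM)

def enhance(feature, prior):
    """feature: (256,) raw face feature; prior: (blur, deflection, resolution)."""
    h = np.maximum(np.asarray(prior, float) @ W1 + b1, 0.0)   # FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))               # FC + sigmoid
    return feature * gate                                     # element-wise (dot) product

def fuse(enhanced_features):
    """Average-pool the enhanced per-frame features into one second face feature."""
    return np.mean(enhanced_features, axis=0)
```

Because the gate lies in (0, 1) per dimension, the prior information rescales each feature dimension rather than replacing it, which is the compensation behavior described above.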
And after the enhancement processing of the first face features is finished, fusing the enhanced first face features to obtain second face features.
Specifically, the enhanced first face features may be fused through an average pooling operation to obtain second face features.
And step S400, comparing the second face features with third face features stored in advance, and determining a face recognition result.
Specifically, the second face feature may be compared with the pre-stored third face features by computing the Euclidean distance between them, thereby determining the face recognition result.
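The comparison step can be sketched as a nearest-neighbor search over a gallery of stored third features; the threshold value below is illustrative, not from the patent.

```python
import numpy as np

def match(second_feature, gallery, threshold=0.8):
    """gallery: dict of person_id -> stored third face feature.
    Return the best-matching id if its Euclidean distance is below
    the threshold, otherwise None (no recognition)."""
    second = np.asarray(second_feature, float)
    dists = {pid: float(np.linalg.norm(second - np.asarray(g, float)))
             for pid, g in gallery.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < threshold else None
```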
The face recognition method provided by the embodiment of the invention is further exemplified by combining a specific application scenario.
Scene one: face monitoring scene at night in smart city
In the national smart city informatization construction, intelligent security monitoring systems play an important role. Traditional face recognition monitoring systems perform well on sunny days with good illumination. At night, however, recognition accuracy often drops sharply: night scenes are complex, brightness is low, fill-light equipment ages, camera angles are poorly configured, and temperature, rain and snow interfere. Building an urban nighttime monitoring system is of great significance for monitoring wanted fugitives and other persons of interest. Against this background, this example describes a face recognition monitoring system in an urban nighttime surveillance scenario; applying the face recognition method provided by the embodiment of the present invention to such a system may specifically include the following steps:
step S501, collecting a face image set of escapers, social idlers and key monitoring persons. These face images are typically front-facing, high-definition pictures, and thus do not require additional image processing. And coding the face images by adopting a face feature extraction algorithm, and warehousing to form a base library set.
Step S502, acquiring a monitoring video acquired by monitoring equipment in a monitoring area at night, wherein the monitoring area can be a fixed area such as a cell, a street and the like. The monitoring video can be transmitted by adopting an online video stream, and can also be stored to the local by adopting an offline mode. The video stream information is transmitted to a data processing module at the back end to be prepared for video image analysis.
Step S503, carrying out face detection and track tracking on the video information collected by each monitoring device to obtain a group of face track images containing the target face.
Step S504, score each face image in the track in the three dimensions of blur degree, deflection angle and resolution using a lightweight face quality evaluation algorithm. A global quality score over the three indices is output at the same time; this score is a linear weighting of the three indices, with weighting coefficients obtained by regression. The global quality score gives the overall quality of the face image, while the three indices reflect the strength of the facial detail features in different dimensions.
And step S505, selecting a plurality of images with relatively high quality and large face detail feature discrimination as a face candidate set from the face track images according to the global quality score and the indexes of the three dimensions.
Step S506, extracting the face features of the images of the face candidate set, enhancing the extracted face features by adopting a feature level enhancement method, performing average pooling operation on the enhanced face features, fusing all the face features in the face candidate set, and outputting the face features which are finally used for performing subsequent comparative matching.
Step S507, compare the face features output in step S506 with the face features stored in the base library by calculating the Euclidean distance between them; when the distance is smaller than a certain threshold, the captured face is considered to match the identity of a fugitive or person of interest stored in the base library. The terminal equipment is then signaled to display the recognition result on the display device.
Scene two: monitoring and analyzing scene of personnel activity track in crowd-dense environment
In the intelligent upgrading of large public places, using personnel activity track information to count residence time, crowd density and crowd flow direction has considerable economic value and social significance. For example, counting movement tracks at a subway transfer center and analyzing the flow of people makes it possible to allocate evacuation channels reasonably and improve transfer efficiency. As another example, in a large shopping mall, analyzing residence time and crowd flow direction provides an important reference for arranging exhibition positions and sales areas. When people move through such spaces, the acquisition range of a single video capture device is limited, so face images of one person usually come from multiple devices; to obtain a correct track path, the face features must generalize well enough to be matched across devices. In a dense-crowd scene, however, faces are prone to occlusion, side views and motion blur, and this noise seriously undermines the stability of the face features. Against this background, this example describes a system for monitoring and analyzing personnel tracks in a crowd-dense scene; applying the face recognition method provided by the embodiment of the present invention to such a system may specifically include the following steps:
step S601, obtaining a monitoring video of each monitoring device in a certain public place within a period of time, where the place may be a public area such as a mall, a subway transfer center, an airport, and the like. And giving corresponding ID numbers, such as ID1, ID2, … … and IDN, to the monitoring video collected by each monitoring device. The video data collected by the monitoring devices are transmitted to the background for video image analysis.
Step S602, processing the video stream collected by each ID monitoring device by adopting a face detection and face tracking method to obtain a face track image set corresponding to the ID.
And step S603, scoring each face image in the face track image set corresponding to the ID in three dimensions, namely blur degree, deflection angle and resolution, using a lightweight face quality evaluation algorithm. A global quality score combining the three indices is also output.
And step S604, generating a face candidate set corresponding to the ID according to the global quality score and the indexes of the three dimensions.
Step S605, extracting, enhancing and fusing the face features of the images contained in the face candidate set corresponding to the ID, and outputting the face features corresponding to the ID;
in step S606, each ID contains a certain number of face features, the number of which indicates how many people were captured by the monitoring device during the period. Euclidean distances are computed between every pair of face features across ID1, ID2, … … and IDN; when a distance is smaller than a certain threshold, the identity matching is considered successful, and the trajectory information of the person under the different ID monitoring devices is associated;
and step S607, storing the personnel track in a database in a time axis mode, or displaying the personnel track on an interface for the retrieval or use of an operator.
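The cross-camera identity association of step S606 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the feature dimension and the `threshold` value are assumptions, and a real system would use the fused, enhanced features produced in step S605.

```python
import numpy as np

def match_identities(features_by_id, threshold=1.0):
    """Pairwise Euclidean matching of fused face features across camera
    IDs (step S606). Returns matched pairs ((id_a, idx_a), (id_b, idx_b))
    whose distance is below the threshold, i.e. the same person observed
    by two different monitoring devices."""
    matches = []
    ids = sorted(features_by_id)
    for i, id_a in enumerate(ids):
        for id_b in ids[i + 1:]:
            for a, fa in enumerate(features_by_id[id_a]):
                for b, fb in enumerate(features_by_id[id_b]):
                    dist = np.linalg.norm(np.asarray(fa) - np.asarray(fb))
                    if dist < threshold:
                        matches.append(((id_a, a), (id_b, b)))
    return matches
```

Matched pairs can then be ordered along a time axis to form the per-person trajectory stored in step S607.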
For the situation that the traditional scheme is easily disturbed by various kinds of noise in an open monitoring scene, the scheme provided by the embodiment of the invention heavily optimizes the data processing unit, thereby greatly improving the overall performance of the face recognition monitoring system. Specifically:
the recognition precision of the system is greatly improved: in a typical monitoring scene, a single face image captured by a face detection algorithm is often disturbed by noise, and the facial detail features often carry various defects. For example, in a given monitoring area, when the distance between an object and the acquisition device is large, the acquired face image is often affected by out-of-focus blur, and when the distance is small, the face image is often subject to motion blur. The face recognition method provided by the invention fuses features extracted from multiple face images along one motion trajectory of the same object, effectively avoiding the information loss that may occur when a single face image is used as in the prior art. In addition, prior information about the face image in several dimensions is used for face feature enhancement, so the finally obtained face features generalize well. Through feature enhancement, even some highly degraded face images, such as faces with strongly uneven illumination ("yin-yang" faces), large deflection angles, or scarf and mask occlusion, can retain considerable feature generalization.
The operating efficiency of the system is greatly improved: on the premise of ensuring high recognition precision, the face recognition method provided by the invention follows a lightweight design principle, using lightweight deep convolutional neural networks in the face quality evaluation module, the face feature fusion module and the face feature enhancement module. For example, the face quality evaluation algorithm computes only three dimensions, blur degree, deflection angle and resolution, so the system's resources are not excessively occupied, and the real-time requirement of the face recognition monitoring system can be better met.
Fig. 5 illustrates an electronic device 70 provided by an embodiment of the present invention. As shown in fig. 5, the electronic device 70 includes, but is not limited to:
a memory 72 for storing programs;
a processor 71 for executing the program stored in the memory 72, wherein when the processor 71 executes the program stored in the memory 72, the processor 71 is configured to execute the above-mentioned face recognition method.
The processor 71 and the memory 72 may be connected by a bus or other means.
The memory 72 is a non-transitory computer readable storage medium, and can be used to store a non-transitory software program and a non-transitory computer executable program, such as the face recognition method described in the embodiment of the present invention. The processor 71 implements the face recognition method described above by running non-transitory software programs and instructions stored in the memory 72.
The memory 72 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to use of the face recognition method described above. Further, the memory 72 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 72 may optionally include memory located remotely from the processor 71, and these remote memories may be connected to the processor 71 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Non-transitory software programs and instructions necessary to implement the above-described face recognition method are stored in the memory 72 and, when executed by the one or more processors 71, perform the above-described face recognition method, e.g., performing method steps S100 to S400 depicted in fig. 1, method steps S110 to S130 depicted in fig. 2, method steps S111 to S113 depicted in fig. 3, and method steps S131 to S134 depicted in fig. 4.
The embodiment of the invention also provides a storage medium, which stores computer executable instructions, and the computer executable instructions are used for executing the face recognition method.
In one embodiment, the storage medium stores computer-executable instructions, which are executed by one or more control processors 71, for example, by one processor 71 in the electronic device 70, and which cause the one or more processors 71 to perform the above-mentioned face recognition method, for example, the method steps S100 to S400 described in fig. 1, the method steps S110 to S130 described in fig. 2, the method steps S111 to S113 described in fig. 3, and the method steps S131 to S134 described in fig. 4.
The above described embodiments are merely illustrative, wherein elements illustrated as separate components may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the invention defined by the claims.

Claims (12)

1. A face recognition method, comprising:
extracting a plurality of frames of face images containing a target face from the video stream;
respectively extracting the face features of a plurality of frames of face images to obtain first face features;
performing feature enhancement on the first face features, and fusing the enhanced first face features to obtain second face features;
and comparing the second face features with third face features stored in advance to determine a face recognition result.
2. The method of claim 1, wherein the extracting a plurality of frames of face images including a target face from a video stream comprises:
extracting a plurality of frames of first face images containing the target face from a video stream;
respectively carrying out face quality analysis processing on a plurality of frames of the first face images to obtain face prior information of each frame of the first face images;
selecting a plurality of frames of second face images from the plurality of frames of first face images according to the face prior information of each frame of first face images;
the face feature extraction is respectively carried out on the face images of multiple frames to obtain a first face feature, and the method comprises the following steps:
and respectively extracting the face features of the plurality of frames of second face images to obtain first face features.
3. The face recognition method of claim 2, wherein the face prior information comprises a plurality of different types of index parameters;
selecting a plurality of frames of second face images from the plurality of frames of first face images according to the face prior information, wherein the selecting comprises the following steps:
linearly weighting the index parameters to obtain a global quality score, and acquiring a first preset number of primary selection images from multiple frames of the first face image according to the global quality score;
arranging and combining the primary selection images of the first preset number to obtain a plurality of primary selection image combinations, wherein each primary selection image combination comprises a second preset number of primary selection images;
acquiring an image distinguishing degree parameter of each primary selection image combination according to the index parameters, and selecting a final selection image combination from the primary selection image combinations according to the image distinguishing degree parameter;
and taking the primary selection images contained in the final selection image combination as the second face images.
4. The face recognition method of claim 3, wherein the index parameters include a blur degree parameter, a deflection angle parameter, and a resolution parameter.
5. The method according to claim 2, wherein the extracting the plurality of frames of the first face image including the target face from the video stream comprises:
performing face detection on the video stream to acquire face position information of the target face in the current frame of the video stream;
and tracking the face track according to the face position information, and extracting a plurality of frames of first face images containing the target face from the video stream.
6. The face recognition method according to claim 5, wherein the face position information includes a plurality of contour point position information;
the extracting the multiple frames of first face images containing the target face from the video stream further comprises:
and calibrating the angle of the target face in the first face image according to the position information of the contour points.
7. The method according to claim 1, wherein the extracting the face features of the plurality of frames of face images respectively to obtain the first face feature comprises:
respectively extracting the face features of a plurality of frames of face images by using a neural network to obtain the first face feature; wherein the extracted first face features comprise multi-dimensional face vectors.
8. The method according to claim 7, wherein the performing feature enhancement on the first facial features comprises:
performing dot product operation on the first face features and face prior information by using a deep convolutional neural network to obtain the enhanced first face features; the face prior information is obtained by performing face quality analysis processing on the face image.
9. The method according to claim 1 or 7, wherein the fusing the enhanced first face features to obtain second face features comprises:
and fusing the enhanced first face features through an average pooling operation to obtain the second face features.
10. The method of claim 1, wherein comparing the second face feature with a third face feature stored in advance to determine a face recognition result comprises:
and comparing the second face features with third face features stored in advance by using a Euclidean distance algorithm to determine a face recognition result.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1-10 when executing the program.
12. A computer-readable storage medium, characterized in that a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
CN202010587883.6A 2020-06-24 2020-06-24 Face recognition method, electronic device and storage medium Pending CN113836980A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010587883.6A CN113836980A (en) 2020-06-24 2020-06-24 Face recognition method, electronic device and storage medium
BR112022026549A BR112022026549A2 (en) 2020-06-24 2021-06-03 FACE RECOGNITION METHOD, ELECTRONIC DEVICE AND COMPUTER READABLE STORAGE MEDIA
PCT/CN2021/098156 WO2021259033A1 (en) 2020-06-24 2021-06-03 Facial recognition method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010587883.6A CN113836980A (en) 2020-06-24 2020-06-24 Face recognition method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113836980A true CN113836980A (en) 2021-12-24

Family

ID=78964520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010587883.6A Pending CN113836980A (en) 2020-06-24 2020-06-24 Face recognition method, electronic device and storage medium

Country Status (3)

Country Link
CN (1) CN113836980A (en)
BR (1) BR112022026549A2 (en)
WO (1) WO2021259033A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419824A (en) * 2021-12-29 2022-04-29 厦门熙重电子科技有限公司 Face track system applied to campus interior and periphery

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514432B (en) * 2012-06-25 2017-09-01 诺基亚技术有限公司 Face feature extraction method, equipment and computer program product
CN103116756B (en) * 2013-01-23 2016-07-27 北京工商大学 A kind of persona face detection method and device
CN104008370B (en) * 2014-05-19 2017-06-13 清华大学 A kind of video face identification method
CN105184249B (en) * 2015-08-28 2017-07-18 百度在线网络技术(北京)有限公司 Method and apparatus for face image processing
CN109948489A (en) * 2019-03-09 2019-06-28 闽南理工学院 A kind of face identification system and method based on the fusion of video multiframe face characteristic


Also Published As

Publication number Publication date
WO2021259033A1 (en) 2021-12-30
BR112022026549A2 (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN111460926B (en) Video pedestrian detection method fusing multi-target tracking clues
CN103971386B (en) A kind of foreground detection method under dynamic background scene
CN112132119B (en) Passenger flow statistical method and device, electronic equipment and storage medium
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
CN109919977B (en) Video motion person tracking and identity recognition method based on time characteristics
CN109782902A (en) A kind of operation indicating method and glasses
CN111079655A (en) Method for recognizing human body behaviors in video based on fusion neural network
WO2021073311A1 (en) Image recognition method and apparatus, computer-readable storage medium and chip
CN112207821B (en) Target searching method of visual robot and robot
CN112836625A (en) Face living body detection method and device and electronic equipment
CN110569706A (en) Deep integration target tracking algorithm based on time and space network
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
CN112101195A (en) Crowd density estimation method and device, computer equipment and storage medium
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113688804B (en) Multi-angle video-based action identification method and related equipment
CN116503725A (en) Real-time detection method and device for infrared weak and small target
Xu et al. Feature extraction algorithm of basketball trajectory based on the background difference method
Li et al. Real-time detection and counting of wheat ears based on improved YOLOv7
CN113836980A (en) Face recognition method, electronic device and storage medium
CN116824641A (en) Gesture classification method, device, equipment and computer storage medium
CN116958872A (en) Intelligent auxiliary training method and system for badminton
CN113379787B (en) Target tracking method based on 3D convolution twin neural network and template updating
CN113824989B (en) Video processing method, device and computer readable storage medium
US20230076241A1 (en) Object detection systems and methods including an object detection model using a tailored training dataset
CN112989911A (en) Pedestrian re-identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination