CN112215179A

CN112215179A - In-vehicle face recognition method, device, apparatus and storage medium

Info

Publication number: CN112215179A
Application number: CN202011119803.0A
Authority: CN
Inventors: 吴晓东
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2020-10-19
Filing date: 2020-10-19
Publication date: 2021-01-12
Anticipated expiration: 2040-10-19
Also published as: CN112215179B

Abstract

The invention relates to artificial intelligence, and provides a method, equipment and a device for identifying a human face in a vehicle and a computer readable storage medium, wherein the method comprises the following steps: processing the human face image in the car into an image to be identified; extracting the face features of the image to be recognized to obtain an original feature matrix; sequentially carrying out up-sampling on the original characteristic matrix for at least three times, and carrying out characteristic weighted summation on the up-sampling matrix obtained after each up-sampling and the original characteristic matrix to sequentially obtain corresponding characteristic matrices; carrying out convolution operation processing on the characteristic matrix to obtain a corresponding characteristic diagram; detecting and identifying the face of the feature map to obtain a face prediction frame and a class confidence of the face prediction frame; and carrying out duplication removal processing on the face prediction frame, and screening out the optimal face prediction frame as a recognition result. The invention can solve the problems of low accuracy and recall rate of human face recognition in the automobile in the prior art under difficult scenes such as haze, rainy days, nights and sheltering.

Description

In-vehicle face recognition method, device, apparatus and storage medium

Technical Field

The present invention relates to artificial intelligence, and more particularly, to a method, device, and apparatus for in-vehicle face recognition and a computer readable storage medium.

Background

Face recognition is an important image processing technical means in a new era, and has wide application in the field of traffic, such as seizing of escapers, entrance security inspection and the like.

The deep learning method based on YOLOv3 becomes one of the popular face recognition algorithms in the industry at present due to the high detection speed. The face recognition algorithm based on the YOLOv3 can achieve real-time detection in simple scenes such as sunny days, daytime and no shielding, and has the effect of higher accuracy, but the accuracy and recall rate are lower in difficult scenes such as haze, rainy days, nighttime and shielding.

Disclosure of Invention

Based on the problems in the prior art, the invention provides a method, a device and a computer readable storage medium for recognizing a human face in a vehicle, and the method, the device and the medium are mainly used for sequentially performing up-sampling on an original feature matrix obtained by extraction for at least three times, performing feature weighted summation on the up-sampled matrix obtained after each up-sampling and the original feature matrix, performing convolution operation respectively to obtain at least three feature maps with different sizes, performing human face detection and recognition on the at least three feature maps with different sizes respectively through a preset target detection frame to obtain a predicted human face frame, and performing de-duplication on the predicted human face frame to obtain an optimal human face frame as a recognition result. The problem of among the prior art in haze, rainy day, night, have difficult scenes such as sheltering from, the low scheduling of rate of accuracy and recall of in-car face identification is solved.

In a first aspect, to achieve the above object, the present invention provides an in-vehicle human face recognition method, including:

processing the acquired in-car face image into an image with a preset size to obtain an image to be identified;

extracting the face features of the image to be recognized through a feature extraction network to obtain an original feature matrix;

sequentially carrying out at least three times of upsampling on the original feature matrix, wherein each time of upsampling is based on the original feature matrix obtained by the previous upsampling, and carrying out feature weighted summation on the upsampling matrix obtained after each time of upsampling and the original feature matrix to sequentially obtain corresponding feature matrices;

performing convolution operation processing on the feature matrix to respectively obtain corresponding feature maps;

carrying out face detection and recognition on the feature map through a preset target detection frame to obtain a face prediction frame and a class confidence coefficient of the face prediction frame;

and according to the class confidence of the face prediction frame, carrying out duplication elimination processing on the face prediction frame, and screening out the optimal face prediction frame from the face prediction frame as a recognition result.

In a second aspect, to achieve the above object, the present invention further provides an in-vehicle human face recognition apparatus, including:

the image size processing unit is used for processing the acquired in-vehicle face image into an image with a preset size to obtain an image to be identified;

the characteristic extraction unit is used for extracting the face characteristic of the image to be recognized through a characteristic extraction network to obtain an original characteristic matrix;

the weighted summation processing unit is used for sequentially carrying out at least three times of upsampling on the original characteristic matrix, wherein each time of upsampling is based on the original characteristic matrix obtained by the previous time of upsampling, and carrying out characteristic weighted summation on the upsampled matrix obtained after each time of upsampling and the original characteristic matrix to sequentially obtain a corresponding characteristic matrix;

the convolution processing unit is used for carrying out convolution operation processing on the characteristic matrix to respectively obtain corresponding characteristic graphs;

the detection and recognition unit is used for carrying out face detection and recognition on the feature map through a preset target detection frame to obtain a face prediction frame and a class confidence coefficient of the face prediction frame;

and the face prediction frame duplicate removal unit is used for carrying out duplicate removal processing on the face prediction frame according to the class confidence of the face prediction frame and screening out the optimal face prediction frame from the face prediction frame as a recognition result.

In a third aspect, to achieve the above object, the present invention further provides an electronic device, including: the system comprises a memory and a processor, wherein the memory stores an in-vehicle face recognition program, and the in-vehicle face recognition program realizes any step in the in-vehicle face recognition method when being executed by the processor.

In a fourth aspect, to achieve the above object, the present invention further provides a computer-readable storage medium, in which an in-vehicle face recognition program is stored, and when being executed by a processor, the in-vehicle face recognition program implements any of the steps of the in-vehicle face recognition method described above.

The method, the device and the computer readable storage medium for recognizing the human face in the car sequentially perform at least three times of upsampling on an extracted original characteristic matrix, perform characteristic weighted summation on the upsampled matrix obtained after each upsampling and the original characteristic matrix, perform convolution operation respectively to obtain at least three characteristic images with different sizes, perform human face detection and recognition on the at least three characteristic images with different sizes respectively through a preset target detection frame to obtain a human face prediction frame, and perform de-duplication on the human face prediction frame to obtain an optimal human face prediction frame as a recognition result. The original feature matrix is subjected to weighting summation for three times, so that the information loss is effectively reduced, the integrity of extracted features is improved, and the overall accuracy and recall rate of in-vehicle face recognition are improved; the optimal face prediction frame is obtained as a recognition result by removing the duplicate of the face prediction frame, so that the feature expression capability under difficult scenes such as haze, rainy days, nights, shelters in a vehicle and the like is obviously enhanced, and the overall accuracy and the recall rate of face recognition in the vehicle are further improved.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the in-vehicle face recognition method of the present invention;

FIG. 2 is a schematic diagram of an application environment of the in-vehicle face recognition method according to the preferred embodiment of the present invention;

fig. 3 is a block diagram illustrating a preferred embodiment of the in-vehicle facial recognition procedure of fig. 2.

Fig. 4 is a system logic diagram corresponding to the in-vehicle human face recognition method of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a method for identifying a human face in a vehicle. Referring to fig. 1, a flow chart of a preferred embodiment of the in-vehicle face recognition method of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.

In this embodiment, the in-vehicle face recognition method includes: step S110-step S160.

And step S110, processing the acquired in-vehicle face image into an image with a preset size to obtain an image to be identified.

Specifically, through picture size processing, the in-vehicle face image is processed to meet the requirement of the model for the image size, and the subsequent feature extraction network is convenient to extract the in-vehicle face image features. The size of the image to be recognized can be set according to actual needs, for example: 512 by 512 or 1024 by 1024, etc.

And step S120, extracting the face features of the image to be recognized through a feature extraction network to obtain an original feature matrix.

Specifically, the extraction speed is higher by extracting the face features of the image to be recognized through the feature extraction network, and the feature extraction network for realizing the feature extraction is realized by continuously performing convolution calculation or multilayer convolution calculation on the image to be recognized, so that a multilayer original feature matrix is obtained.

As a preferred aspect of the present invention, the feature extraction network includes: the system comprises an input layer for acquiring an image to be recognized, a convolution layer for performing convolution operation processing on the image to be recognized of the input layer, a pooling layer for performing downsampling processing on a first face feature map matrix output by the convolution layer, a full-connection layer for performing full-connection processing on a second face feature map matrix output by the pooling layer, a global average pooling layer for performing average value calculation on pixels of a face feature map output by the full-connection layer, and an output layer for outputting an original feature matrix obtained by the global average pooling layer.

Specifically, the feature extraction network is preferably a CSPRESNEXt50 network, and the CSPRESNEXt50 network enhances the feature expression capability in difficult scenes such as haze, rainy days, nights and shelters in the vehicle, so that the overall accuracy and recall rate of face recognition in the vehicle are improved. The cspraesenext 50 network is a relatively advanced image classification network, and the specific network structure is that the image to be identified processed into a fixed size is input through an input layer, and convolution calculation processing is performed through a convolution layer, wherein the number of convolution layers can be determined according to actual conditions, for example, 3, 4, 6, namely the number of convolution layers (execution times); the size of the convolution kernel may also be set according to actual needs, for example, 7 × 7, 3 × 3, etc.; performing convolution processing through a plurality of convolution modules, such as: conv _ block _1, Conv _ block _2, Conv _ block _3 and Conv _ block _4 have the same structure, but different channels (128, 256, 512 and 1024 respectively) have the specific structure of 1 × 1 convolution +3 × 3 convolution +1 × 1 convolution; and compressing the first face feature map matrix output by the convolutional layer through the pooling layer to improve the feature extraction speed, carrying out full-connection processing on the second face feature map matrix output by the pooling layer through the full-connection layer, carrying out average value calculation on pixels of the face feature maps output by the full-connection layer to finally obtain an original feature matrix, and outputting the original feature matrix through the output layer.

And step S130, sequentially carrying out at least three times of upsampling on the original feature matrix, wherein each time of upsampling is based on the original feature matrix obtained by the previous time of upsampling, and carrying out feature weighted summation on the upsampling matrix obtained after each time of upsampling and the original feature matrix to sequentially obtain the corresponding feature matrix.

Specifically, the original feature matrix is subjected to at least three times of upsampling, preferably three times of upsampling, and of course, the upsampling can be performed more than three times, such as four times, five times, six times, and the like, and after each time of upsampling, the matrix is expanded, and then the expanded feature matrix and the original feature matrix with the same size in the feature extraction network are subjected to feature weighted summation to sequentially obtain corresponding feature matrices.

As a preferred scheme of the present invention, sequentially performing up-sampling on an original feature matrix three times, where each up-sampling is based on an original feature matrix obtained by the previous up-sampling, and performing feature weighted summation on an up-sampling matrix obtained by each up-sampling and the original feature matrix includes:

sequentially carrying out gradient disappearance prevention treatment and first up-sampling on the original feature matrix to obtain an expanded feature matrix;

carrying out first feature weighted summation on the expanded feature matrix and the original feature matrix according to a first preset weight parameter to obtain a first feature matrix;

sequentially carrying out gradient disappearance prevention treatment and second up-sampling on the first feature matrix to obtain an expanded first feature matrix;

carrying out second feature weighted summation on the expanded first feature matrix and the original feature matrix according to a second preset weight parameter to obtain a second feature matrix;

sequentially carrying out gradient disappearance prevention treatment and third up-sampling on the second feature matrix to obtain an expanded second feature matrix;

and carrying out third feature weighted summation on the expanded second feature matrix and the original feature matrix according to a third preset weight parameter to obtain a third feature matrix.

Specifically, the upsampling is used to amplify the original feature matrix, so as to reduce information loss. Such as: the size of the upsampled input matrix is 13 × 256, and then the size of the matrix after the upsampling operation is 26 × 256 (that is, the width and the height of the matrix can be amplified by 2 times), when the feature matrix expanded after the upsampling is subjected to feature weighted summation with the original feature matrix of a certain layer in the middle of the feature extraction network, the size of the matrix needs to be kept consistent, otherwise, the weighted summation operation cannot be performed, for example: assuming that the matrix size (i.e., one of the inputs of Sum) after the first upsampling is 26 × 128, and assuming that the output matrix sizes of the 120 th layer, the 130 th layer and the 140 th layer in the feature extraction network are 13 × 128, 26 × 128 and 26 × 256, respectively, the other input of the weighted summation can only be the output matrix of the 130 th layer (i.e., the matrix sizes must be the same (26 × 128 all) in the feature weighted summation process), but cannot be the feature map matrices of the 120 th layer and the 140 th layer, wherein the first preset weight parameter, the second preset weight parameter and the third preset weight parameter can be set according to actual conditions, and the three preset parameters can be the same or different.

Step S140, carrying out convolution operation processing on the feature matrixes to respectively obtain corresponding feature maps.

Specifically, convolution operation is performed on each feature matrix respectively to obtain corresponding feature maps, and the feature maps corresponding to each feature matrix are different in size.

As a preferred embodiment of the present invention, the performing convolution operation processing on the feature matrix to obtain corresponding feature maps includes:

sequentially carrying out gradient disappearance prevention processing and first convolution operation on the first feature matrix to obtain a first feature map;

sequentially carrying out gradient disappearance prevention processing and second convolution operation on the second feature matrix to obtain a second feature map;

sequentially carrying out gradient disappearance prevention processing and third convolution operation on the third feature matrix to obtain a third feature map;

wherein the gradient disappearance prevention process includes: convolution calculation processing, batch normalization processing and activation function processing.

Specifically, in order to prevent gradient disappearance before performing convolution operation on the first feature matrix, the second feature matrix and the third feature matrix, gradient disappearance prevention processing, namely 3 × 3 convolution + batch normalization + Mish activation function processing, is performed, and convolution calculation is performed on each processed feature matrix to obtain a first feature map, a second feature map and a first feature map respectively. Wherein the gradient disappearance prevention processing is performed as many times as necessary.

Through convolution calculation processing, batch normalization processing and Mish activation function processing, the training speed of the feature matrix of each stage is faster, and the phenomenon of gradient disappearance is prevented.

And S150, carrying out face detection and recognition on the feature map through a preset target detection frame to obtain a face prediction frame and the class confidence of the face prediction frame.

Specifically, a preset target detection frame (anchor box) is a mode of target detection and identification, the preset target detection frame can be used for carrying out face detection and identification on an image to be identified, namely a feature map in the embodiment of the invention, the detected result is a face prediction frame, and the probability of the prediction frame being the face is the class confidence of the face prediction frame.

As a preferred embodiment of the present invention, before the preset target detection box is stored in the block chain and the feature map is subjected to face detection and recognition by the preset target detection box to obtain the face prediction box and the class confidence of the face prediction box, the method further includes:

acquiring human face sample data;

randomly acquiring a specified number of points from the face sample data as face initial sample points;

clustering the face sample data by adopting a clustering algorithm to obtain the clusters with the specified number;

and calculating the coordinates of the central point of each cluster as a preset target detection frame.

Specifically, in the clustering process, the distance function that can be used is: the IOU is I/U, wherein I represents the intersection area of the two marked frames, and U represents the union area of the two marked frames; the clustering algorithm is preferably a kmeans algorithm.

As a preferred scheme of the present invention, the obtaining of the class confidence of the face prediction frame and the face prediction frame by performing face detection and recognition on the feature map by using the preset target detection frame includes:

sliding a preset target detection frame on the feature map, and acquiring a coordinate of a central point of the preset target detection frame on the feature map as a first coordinate;

calculating a predicted coordinate of the feature map according to the first coordinate and the original coordinate of the preset target detection frame;

obtaining a face prediction frame according to the prediction coordinates of the feature map;

and calculating the class confidence of the face prediction box through a binary classification algorithm.

Specifically, the first feature map is taken as an example, assuming that the size of the first feature map is 16 × 16, the number of preset target detection frames (anchor box) allocated to the first feature map is 3, the 3 preset target detection frames are respectively slid in the 16 × 256 grids of the first feature map, and coordinates of the detection frames (i.e., x, y, w, h, (x, y) indicate coordinates of a center point of the detection frame on the first feature map, and (w, h) indicate width and height of the detection frame) and a category (i.e., whether the detection frame is a human face) are predicted for each grid.

The predicted coordinates and width and height values are relative to the coordinates and width and height values of the current 3 preset target detection boxes. Assuming that the width and height of 1 preset target detection frame obtained by clustering with the kmeans algorithm in advance is (2,3), assuming that the preset target detection frame currently slides to the 2 nd grid on the first feature map (i.e. the matrix of y1, 16), the coordinates of the preset anchor box are (1,0), the width and height are (2,3), then assuming that the original coordinates of the preset target detection frame predicted by the prediction model are (0.3,0.6), the width and height are (2.1,1.3), then the coordinates of the prediction frame obtained based on the preset target detection frame on the first feature map are (1+0.3,0+1.6), (1.3,1.6), and the width and height are (2 ^ e 2.1,3 ^1.3) ^ 16.3,11.0), i.e. the predicted coordinates of the first feature map. In addition, similar operations are also performed on the second feature map and the third feature map of 2 scales, and finally face detection (namely, coordinates) and face recognition (namely, classification) are realized, and certainly, the detection model can also be directly used for predicting to directly obtain the result, namely, the probability (fraction between 0 and 1) that the predicted coordinates are the face is directly predicted, if the probability is greater than the preset probability, if the probability is greater than 0.5, the face is obtained, otherwise, the face is not obtained.

And mapping the detection (namely coordinates) of the human face obtained by predicting the three characteristic images through the preset target detection frame to the human face image in the vehicle to finally obtain the human face prediction frame and the class confidence of the human face prediction frame.

And step S160, carrying out duplication elimination processing on the face prediction frame according to the class confidence of the face prediction frame, and screening out the optimal face prediction frame from the face prediction frame as a recognition result.

Specifically, the face prediction frames obtained by mapping and the class confidence of each face prediction frame are subjected to duplication elimination processing, so that an optimal face prediction frame is obtained, and the optimal face prediction frame is used as a recognition result.

As a preferred scheme of the present invention, the removing duplication processing of the face prediction frame according to the class confidence of the face prediction frame, and the screening of the optimal face prediction frame from the face prediction frame as the recognition result includes:

acquiring a face prediction frame with the highest class confidence coefficient of the face prediction frames as the face prediction frame with the highest confidence coefficient, and the rest of the face prediction frames are the rest face prediction frames;

and calculating the intersection ratio of the residual face prediction frame and the face prediction frame with the highest confidence coefficient through an intersection ratio formula, wherein,

the intersection ratio formula is as follows: the IOU is the intersection ratio of the residual face prediction frame and the face prediction frame with the highest confidence coefficient, I is the intersection area of the residual face prediction frame and the face prediction frame with the highest confidence coefficient, and U is the union area of the residual face prediction frame and the face prediction frame with the highest confidence coefficient;

updating the confidence coefficients of the rest face prediction frames according to a preset confidence coefficient updating formula according to the intersection ratio; wherein, the preset confidence updating formula is as follows:

the IOU is the intersection ratio of the residual face prediction frame and the face prediction frame with the highest confidence coefficient; the alpha is an attenuation coefficient; the IOU_thresholdA preset cross-over ratio threshold; score is the confidence coefficient of the updated residual face prediction frame;

and deleting the residual face prediction frames with the confidence degrees lower than the preset confidence degree threshold after the residual face prediction frames are updated, so as to obtain the optimal face prediction frame.

Specifically, it is assumed that there is actually only one face in one picture, but in the process of performing face detection and recognition, 4 face detection frames may be detected, and 3 of the 4 face detection frames are repeated with the real face frame (i.e., IOU >0.5), so that the above steps need to be performed again to remove the 3 repeated frames, and finally, an optimal face prediction frame is obtained as a recognition result.

The invention provides an in-vehicle face recognition method, which is applied to an electronic device 1. Fig. 2 is a schematic diagram of an application environment of the method for recognizing a human face in a vehicle according to the preferred embodiment of the present invention.

In the present embodiment, the electronic device 1 may be a terminal device having an arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.

The electronic device 1 includes: a processor 12, a memory 11, a network interface 13, and a communication bus 14.

The memory 11 includes at least one type of readable storage medium. At least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, and a card-type memory 11. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be an external memory 11 of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device 1.

In the present embodiment, the readable storage medium of the memory 11 is generally used for storing the in-vehicle face recognition program 10 and the like installed in the electronic device 1. The memory 11 may also be used to temporarily store data that has been output or is to be output.

The processor 12 may be, in some embodiments, a Central Processing Unit (CPU), microprocessor or other data Processing chip for executing program codes stored in the memory 11 or Processing data, such as executing the in-vehicle facial recognition program 10.

The network interface 13 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.

The communication bus 14 is used to realize connection communication between these components.

In the device embodiment shown in fig. 2, the memory 11, which is a kind of computer storage medium, may include therein an operating system and an in-vehicle face recognition program 10; the processor 12, when executing the in-vehicle face recognition program 10 stored in the memory 11, implements the following steps:

step S110, processing the acquired in-car face image into an image with a preset size to obtain an image to be identified;

step S120, extracting the face features of the image to be recognized through a feature extraction network to obtain an original feature matrix;

step S130, at least three times of upsampling are carried out on the original feature matrix in sequence, each time of upsampling is based on the original feature matrix obtained by the previous time of upsampling, feature weighted summation is carried out on the upsampling matrix obtained after each time of upsampling and the original feature matrix, and corresponding feature matrices are obtained in sequence;

step S140, carrying out convolution operation processing on the feature matrix to respectively obtain corresponding feature maps;

step S150, carrying out face detection and recognition on the feature map through a preset target detection frame to obtain a face prediction frame and a class confidence coefficient of the face prediction frame;

In other embodiments, the in-vehicle facial recognition program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to implement the present invention.

The modules referred to herein are referred to as a series of computer program instruction segments capable of performing specified functions. Referring to fig. 3, a block diagram of a preferred embodiment of the in-vehicle facial recognition process 10 of fig. 2 is shown. The in-vehicle face recognition program 10 may be divided into: an image size processing module 110, a feature extraction module 120, a weighted sum processing module 130, a convolution processing module 140, a detection recognition module 150, and a face prediction frame deduplication module 160.

The functions or operation steps implemented by the modules 110 and 160 are similar to those described above, and are not detailed here, for example, where:

the image size processing module 110 is configured to process the acquired in-vehicle face image into an image with a preset size, so as to obtain an image to be identified;

the feature extraction module 120 is configured to perform face feature extraction on the image to be recognized through a feature extraction network to obtain an original feature matrix;

the weighted sum processing module 130 is configured to perform at least three times of upsampling on the original feature matrix in sequence, where each upsampling is based on the original feature matrix obtained by the previous upsampling, and perform feature weighted sum on the upsampled matrix obtained by each upsampling and the original feature matrix to obtain corresponding feature matrices in sequence;

a convolution processing module 140, configured to perform convolution operation processing on the feature matrix to obtain corresponding feature maps respectively;

the detection and recognition module 150 is used for carrying out face detection and recognition on the feature map through a preset target detection frame to obtain a face prediction frame and a class confidence of the face prediction frame; it is emphasized that the preset target detection box is stored in the block chain;

and the face prediction frame duplication elimination module 160 is used for carrying out duplication elimination processing on the face prediction frame according to the class confidence of the face prediction frame and screening the optimal face prediction frame from the face prediction frame as a recognition result.

As shown in fig. 4, in addition, in correspondence with the above method, an embodiment of the present invention further proposes an in-vehicle face recognition apparatus 400, including: the image size processing unit 410, the feature extraction unit 420, the weighted sum processing unit 430, the convolution processing unit 440, the detection recognition unit 450 and the face prediction frame deduplication unit 460 are in one-to-one correspondence with the steps of the in-vehicle face recognition method in the embodiment.

The image size processing unit 410 is configured to process the acquired in-vehicle face image into an image with a preset size, so as to obtain an image to be identified;

the feature extraction unit 420 is configured to perform face feature extraction on the image to be recognized through a feature extraction network to obtain an original feature matrix;

the weighted sum processing unit 430 is configured to perform at least three times of upsampling on the original feature matrix in sequence, where each upsampling processing is based on the original feature matrix obtained by the previous upsampling, and perform feature weighted sum on the upsampled matrix obtained after each upsampling and the original feature matrix to obtain corresponding feature matrices in sequence;

a convolution processing unit 440, configured to perform convolution operation processing on the feature matrices to obtain corresponding feature maps respectively;

the detection and recognition unit 450 is configured to perform face detection and recognition on the feature map through a preset target detection frame to obtain a face prediction frame and a class confidence of the face prediction frame, where it needs to be emphasized that the preset target detection frame is stored in the block chain;

and the face prediction frame duplicate removal unit 460 is configured to perform duplicate removal processing on the face prediction frame according to the class confidence of the face prediction frame, and screen an optimal face prediction frame from the face prediction frame as a recognition result.

Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an in-vehicle face recognition program is stored in the computer-readable storage medium, and when executed by a processor, the in-vehicle face recognition program implements the following operations:

sequentially carrying out at least three times of upsampling on the original characteristic matrix, wherein each time of upsampling processing is based on the original characteristic matrix obtained by the previous upsampling, and carrying out characteristic weighted summation on the upsampling matrix obtained after each time of upsampling and the original characteristic matrix to sequentially obtain corresponding characteristic matrices;

carrying out convolution operation processing on the feature matrixes to respectively obtain corresponding feature maps;

and according to the class confidence of the face prediction frame, carrying out duplication removal processing on the face prediction frame, and screening out the optimal face prediction frame from the face prediction frame as a recognition result.

The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the specific implementation of the in-vehicle face recognition method and the electronic device, and will not be described herein again.

The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for recognizing a human face in a vehicle is applied to an electronic device, and is characterized by comprising the following steps:

2. The in-vehicle face recognition method according to claim 1, wherein the feature extraction network includes:

the image recognition system comprises an input layer for acquiring the image to be recognized, a convolution layer for performing convolution operation processing on the image to be recognized of the input layer, a pooling layer for performing down-sampling processing on a first face characteristic diagram matrix output by the convolution layer, a full-connection layer for performing full-connection processing on a second face characteristic diagram matrix output by the pooling layer, a global average pooling layer for performing average value calculation on pixels of a face characteristic diagram output by the full-connection layer and an output layer for outputting an original characteristic matrix obtained by the global average pooling layer.

3. The in-vehicle face recognition method according to claim 1, wherein the up-sampling the original feature matrix three times in sequence, each up-sampling is based on the original feature matrix obtained by the previous up-sampling, and the up-sampling matrix obtained by each up-sampling and the original feature matrix are subjected to feature weighted summation to obtain the corresponding feature matrix in sequence, and the method comprises:

4. The in-vehicle face recognition method according to claim 3, wherein the performing convolution operation processing on the feature matrix to obtain corresponding feature maps respectively includes:

5. The in-vehicle face recognition method according to claim 1, wherein the preset target detection box is stored in a block chain, and before the performing face detection and recognition on the feature map through the preset target detection box to obtain class confidences of a face prediction box and a face prediction box, the method further comprises:

acquiring human face sample data;

and calculating the coordinates of the central point of each cluster as the preset target detection frame.

6. The in-vehicle face recognition method according to claim 1, wherein the obtaining of the class confidence of the face prediction frame and the face prediction frame by performing face detection and recognition on the feature map through a preset target detection frame comprises:

sliding the preset target detection frame on the feature map, and acquiring a coordinate of a central point of the preset target detection frame on the feature map as a first coordinate;

obtaining the face prediction frame according to the prediction coordinates of the feature map;

7. The in-vehicle face recognition method according to claim 1, wherein the performing de-duplication processing on the face prediction frames according to the class confidence of the face prediction frames, and the screening out an optimal face prediction frame from the face prediction frames as a recognition result comprises:

acquiring a face prediction frame with the highest class confidence of the face prediction frames as the face prediction frame with the highest confidence, and the rest of the face prediction frames are the rest face prediction frames;

calculating the intersection ratio of the residual face prediction frame and the face prediction frame with the highest confidence coefficient through an intersection ratio formula, wherein,

updating the confidence coefficients of the residual face prediction frames according to a preset confidence coefficient updating formula according to the intersection ratio; wherein the preset confidence updating formula is as follows:

wherein the IOU is the intersection ratio of the residual face prediction frame and the face prediction frame with the highest confidence coefficient; the alpha is an attenuation coefficient; the IOU_thresholdA preset cross-over ratio threshold; score is the confidence coefficient of the updated residual face prediction frame;

8. An in-vehicle face recognition device, the device comprising:

9. An electronic device, comprising: memory in which an in-vehicle face recognition program is stored, a processor, the in-vehicle face recognition program realizing the steps of the in-vehicle face recognition method according to any one of claims 1 to 7 when executed by the processor.

10. A computer-readable storage medium, in which an in-vehicle facial recognition program is stored, which when executed by a processor, implements the steps of the in-vehicle facial recognition method according to any one of claims 1 to 7.