CN111898454A - Weight binarization neural network and transfer learning human eye state detection method and device - Google Patents

Weight binarization neural network and transfer learning human eye state detection method and device

Info

Publication number
CN111898454A
CN111898454A
Authority
CN
China
Prior art keywords: neural network, human eye, network model, predicted, level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010624577.5A
Other languages
Chinese (zh)
Inventor
刘振焘
吴敏
曹卫华
蒋承汕
李锶涵
郝曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences filed Critical China University of Geosciences
Priority to CN202010624577.5A
Publication of CN111898454A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18: Eye characteristics, e.g. of the iris
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a weight binarization neural network and a transfer learning method and device for human eye state detection. The method comprises the following steps: collecting human eye images and preprocessing them; constructing a weight binarization convolutional neural network model for human eye positioning and predicting the binocular coordinates; constructing a bounding box centered on the binocular coordinates; constructing a weight binarization convolutional neural network model for human eye detection, and training it through transfer learning on a face database and a human eye database; and taking the bounding box as the input of the trained weight binarization human eye detection model to complete human eye state detection. The beneficial effects of the invention are as follows: the method reduces or even overcomes the influence on human eye recognition of head-pose uncertainty, ambient illumination, interference under complex background conditions and occlusion, and improves the robustness of human eye recognition.

Description

Weight binarization neural network and transfer learning human eye state detection method and device
Technical Field
The invention relates to the field of image processing, and in particular to a weight binarization neural network and a transfer learning method and device for human eye state detection.
Background
Current methods for detecting the state of the human eye fall roughly into two categories: feature analysis and pattern classification. Feature-analysis methods rely mainly on geometric features of the eye, such as the iris, pupil, eyelid shape or the aspect ratio of the eye, to distinguish open from closed states, or they judge the proportion of white pixels in the eye image. Such methods depend on accurate eye positioning and are susceptible to environmental interference, which leads to misjudgment. Pattern-classification methods first extract shape or texture features of the eye region, such as local binary pattern features, histogram-of-oriented-gradients features, Haar features and Gabor wavelet features, and then train a classifier, for example a support vector machine, an AdaBoost classifier or a neural network, to learn a classification rule automatically and judge the open or closed state of the eye.
Each class of methods has its advantages, but in practical applications both are susceptible to interference from factors such as illumination, facial pose and image sharpness.
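As a concrete illustration of the feature-analysis approach described above, the widely used eye-aspect-ratio criterion can be computed from a handful of eye landmarks. The sketch below is not taken from the patent; the six-landmark layout and the coordinate values are assumptions chosen purely for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def eye_aspect_ratio(landmarks):
    """Eye aspect ratio from six (x, y) eye landmarks ordered: left corner,
    upper-left, upper-right, right corner, lower-right, lower-left
    (a common convention; the exact layout is an assumption)."""
    p = list(landmarks)
    vertical = dist(p[1], p[5]) + dist(p[2], p[4])   # two lid-to-lid distances
    horizontal = dist(p[0], p[3])                    # corner-to-corner width
    return vertical / (2.0 * horizontal)


# An open eye yields a larger ratio than a nearly closed one.
open_eye = [(0, 0), (1, 2), (3, 2), (4, 0), (3, -2), (1, -2)]
closed_eye = [(0, 0), (1, 0.2), (3, 0.2), (4, 0), (3, -0.2), (1, -0.2)]
print(eye_aspect_ratio(open_eye))                                 # 1.0
print(eye_aspect_ratio(closed_eye) < eye_aspect_ratio(open_eye))  # True
```

Because the ratio collapses when the landmarks themselves are mislocated, this criterion inherits the sensitivity to positioning errors noted above.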
Disclosure of Invention
In view of the above, the invention provides a weight binarization neural network and transfer learning method for human eye state detection, which greatly reduces or even overcomes head-pose uncertainty, the influence of ambient illumination, interference under complex background conditions, the influence of occlusion and similar problems. The method provided by the invention is highly robust and adapts well to environmental changes; it comprises the following steps:
s101: collecting RGB images of the human face by using a camera;
s102: preprocessing the face RGB image to obtain a preprocessed face image, constructing a weight binarization convolutional neural network model for human eye positioning, and training the weight binarization convolutional neural network model for human eye positioning by using a face database; the weight binarization convolutional neural network model for human eye positioning comprises four levels; the first level comprises three convolutional neural networks, F1, LE1 and RE1; the second level comprises five convolutional neural networks, F2, LE21, LE22, RE21 and RE22; the third level comprises three convolutional neural networks, F3, LE3 and RE3; the fourth level comprises two convolutional neural networks, LE4 and RE4;
s103: the preprocessed human face image is used as the input of the weight binarization convolutional neural network model for human eye positioning, the output of the weight binarization convolutional neural network model for human eye positioning is the final prediction coordinate of human eyes, and human eye positioning is completed;
s104: with the final predicted coordinates of the human eyes as the center, constructing a cutting frame to cut the human eye area to obtain a finally extracted human eye image;
s105: constructing a weight binarization cascaded convolutional neural network model for human eye state detection, wherein the weight binarization cascaded convolutional neural network model for human eye state detection comprises six convolutional layers, two pooling layers and two full-connection layers;
s106: sequentially training a weight binarization cascade convolution neural network model for human eye state detection by using a human face database and a human eye state database to obtain a trained weight binarization cascade convolution neural network model for human eye state detection;
s107: and inputting the finally extracted human eye image in the step S104 into the trained cascaded convolutional neural network model with the weight binarization for human eye state detection to obtain the final state of the human eye.
Further, in step S102, the preprocessing of the face RGB image to obtain the preprocessed face image specifically comprises: performing gray-level transformation on the face RGB image to obtain a face gray-level image; and cropping the face gray-level image to obtain the left-face image and the right-face image respectively.
Further, step S103 specifically includes:
s201: inputting the face gray level image to F1 to obtain F1 predicted binocular coordinates; inputting the left face image into LE1, and obtaining left eye coordinates predicted by LE 1; inputting the right face image into RE1 to obtain the right eye coordinate predicted by RE 1;
s202: correspondingly adding the binocular coordinate predicted by F1, the left eye coordinate predicted by LE1 and the right eye coordinate predicted by RE1, and dividing by 2 to obtain the binocular coordinate finally predicted by the first level of the weight binarization convolution neural network model for human eye positioning;
s203: presetting a bounding box by taking the finally predicted binocular coordinate of the first level of the weight binarization convolutional neural network model for human eye positioning as the center, and taking the bounding box as the input of F2 of the first level of the weight binarization convolutional neural network model for human eye positioning to obtain the predicted binocular coordinate of F2; presetting a bounding box by taking the left-eye coordinate predicted by LE1 as a center, and taking the bounding box as the input of LE21 and LE22 of a second level of the weight binarization convolutional neural network model for human eye positioning to obtain the left-eye coordinate predicted by LE21 and LE 22; presetting a boundary box by taking the right-eye coordinate predicted by the RE1 as a center, and taking the boundary box as the input of RE21 and RE22 of a second level of the weight binarization convolutional neural network model for human eye positioning to obtain the right-eye coordinate predicted by the RE21 and RE 22;
s204: correspondingly adding the binocular coordinates predicted by the F2, the left eye coordinates predicted by the LE21 and the LE22 and the right eye coordinates predicted by the RE21 and the RE22, and dividing by 3 to obtain the binocular coordinates finally predicted by the second level of the weight binarization convolutional neural network model for human eye positioning;
s205: constructing a bounding box centered on the binocular coordinates finally predicted by the second level, and taking the bounding box as the input of the third-level F3 to obtain the binocular coordinates predicted by F3; averaging the left-eye coordinates predicted by LE21 and LE22 (their sum divided by 2), constructing a bounding box centered on the average as the input of the third-level LE3, and obtaining the left-eye coordinates predicted by LE3; averaging the right-eye coordinates predicted by RE21 and RE22 (their sum divided by 2), constructing a bounding box centered on the average as the input of the third-level RE3, and obtaining the right-eye coordinates predicted by RE3;
s206: correspondingly adding the binocular coordinates predicted by the third-level F3, the left-eye coordinates predicted by the third-level LE3 and the right-eye coordinates predicted by the third-level RE3, and dividing by 2 to obtain the binocular coordinates finally predicted by the third level;
s207: constructing a bounding box by taking the left eye coordinate in the binocular coordinate finally predicted by the third level as the center, wherein the bounding box is used as the input of the fourth level LE4 to obtain the left eye coordinate predicted by the fourth level LE 4; taking the right-eye coordinate in the binocular coordinate finally predicted by the third level as the center, constructing a bounding box as the input of the RE4 of the fourth level, and obtaining the right-eye coordinate predicted by the RE4 of the fourth level; the left eye coordinate predicted by the fourth level LE4 and the right eye coordinate predicted by the fourth level RE4 jointly form the final predicted coordinate of the human eye output by the weight binarization convolution neural network model of the human eye positioning.
Further, in step S105, a cascaded convolutional neural network model for weight binarization for human eye state detection is constructed, specifically: the weight binarization cascade convolution neural network model for human eye state detection comprises two cascade convolution neural network models which are respectively a main weight binarization cascade convolution neural network model and a secondary weight binarization cascade convolution neural network model; the structure of the cascade convolution neural network model for the primary weight binarization is the same as that of the cascade convolution neural network model for the secondary weight binarization.
Further, step S106 specifically includes:
s301: pre-training the secondary weight binarization cascade convolution neural network model by using a face image database with a large number of samples to obtain secondary weight binarization cascade convolution neural network model initial parameters;
s302: transmitting the initial parameters of the secondary weight binarization cascaded convolutional neural network model to the primary weight binarization cascaded convolutional neural network model through transfer learning to obtain a primary weight binarization cascaded convolutional neural network model with the initial parameters;
s303: retraining the primary weight binarization cascaded convolutional neural network model with the initial parameters by using an image database annotated with human eye states to obtain the trained primary weight binarization cascaded convolutional neural network model; the trained primary weight binarization cascaded convolutional neural network model is the trained weight binarization cascaded convolutional neural network model for human eye state detection.
A storage device stores instructions and data for implementing the weight binarization neural network and transfer learning human eye state detection method.
A human eye state detection device based on a weight binarization convolutional neural network and transfer learning comprises a processor and a storage device; the processor loads and executes the instructions and data in the storage device to implement the weight binarization neural network and transfer learning human eye state detection method.
The beneficial effects provided by the invention are as follows: the method reduces or even overcomes the influence on human eye recognition caused by uncertainty of head posture, external environment illumination, interference under complex background conditions and shielding, and improves the robustness of human eye recognition.
Drawings
FIG. 1 is a schematic flow chart of the weight binarization neural network and the transfer learning human eye state detection method of the present invention;
FIG. 2 is a schematic diagram of a weight binarization convolution neural network model structure for human eye positioning according to the present invention;
FIG. 3 is a schematic diagram of a weight binarization convolutional neural network structure for human eye state detection according to the present invention;
FIG. 4 is a schematic diagram of the weight binarization convolutional neural network training process of the present invention;
FIG. 5 is a hardware device operational diagram of an embodiment of the present invention;
FIG. 6 is a schematic diagram of a histogram comparison of accuracy of a conventional human eye detection method and a human eye detection method of the present invention;
FIG. 7 is a table comparing the accuracy of conventional human eye detection methods with that of the human eye detection method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a method for detecting a weight binarization neural network and a transfer learning human eye state, including the following steps:
s101: collecting RGB images of the human face by using a camera;
in this embodiment, a conventional camera acquires the face image at a frame rate of about 30 frames per second, and the image output format is RGB;
s102: preprocessing the face RGB image to obtain a preprocessed face image, constructing a weight binarization convolutional neural network model for human eye positioning, and training the weight binarization convolutional neural network model for human eye positioning by using a face database; the weight binarization convolutional neural network model for human eye positioning comprises four levels; the first level comprises three convolutional neural networks, F1, LE1 and RE1; the second level comprises five convolutional neural networks, F2, LE21, LE22, RE21 and RE22; the third level comprises three convolutional neural networks, F3, LE3 and RE3; the fourth level comprises two convolutional neural networks, LE4 and RE4;
referring to fig. 2, fig. 2 is a schematic structural diagram of a weight binarization convolutional neural network model for human eye positioning according to the present invention;
in this embodiment, the database used to train the weight binarization convolutional neural network model for human eye positioning is the Labeled Faces in the Wild (LFW) database;
in this embodiment, the weights are binarized; specifically, each weight is limited to 1 or -1;
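A minimal sketch of the constraint just described: each real-valued weight is replaced by its sign, so only the values 1 and -1 remain. Pure-Python lists stand in for weight tensors here; sending exact zeros to +1 is an implementation choice, not specified by the patent.

```python
def binarize_weights(weights):
    """Map every real-valued weight in a 2-D weight matrix to +1 or -1.
    Zeros are sent to +1 here (an implementation choice)."""
    return [[1.0 if w >= 0 else -1.0 for w in row] for row in weights]


w = [[0.37, -1.20],
     [0.00, -0.05]]
print(binarize_weights(w))  # [[1.0, -1.0], [1.0, -1.0]]
```

In common binarized-network training schemes (e.g. BinaryConnect), full-precision weights are retained for the gradient update and binarized only in the forward and backward passes; the patent does not detail its training rule, so that part is omitted here.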
s103: the preprocessed human face image is used as the input of the weight binarization convolutional neural network model for human eye positioning, the output of the weight binarization convolutional neural network model for human eye positioning is the final prediction coordinate of human eyes, and human eye positioning is completed;
s104: with the final predicted coordinates of the human eyes as the center, constructing a cutting frame to cut the human eye area to obtain a finally extracted human eye image;
s105: constructing a weight binarization cascaded convolutional neural network model for human eye state detection; for the specific structure refer to fig. 3, a schematic structural diagram of the weight binarization convolutional neural network for human eye state detection;
s106: sequentially training a weight binarization cascade convolution neural network model for human eye state detection by using a human face database and a human eye state database to obtain a trained weight binarization cascade convolution neural network model for human eye state detection;
in the embodiment, a face database adopted by a weight binarization cascade convolution neural network model for training eye state detection is from a fer2013 face expression database; the data samples of the human eye state database are data samples combined by a CEW database and a ZJU database;
s107: and inputting the finally extracted human eye image in the step S104 into the trained cascaded convolutional neural network model with the weight binarization for human eye state detection to obtain the final state of the human eye.
In step S102, the preprocessing of the face RGB image to obtain the preprocessed face image specifically comprises: performing gray-level transformation on the face RGB image to obtain a face gray-level image; and cropping the face gray-level image to obtain the left-face image and the right-face image respectively.
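The preprocessing step above can be sketched as follows. The BT.601 luma weights and the equal left/right split are assumptions for illustration; the patent only specifies a gray-level transformation followed by cropping into left-face and right-face images.

```python
def preprocess_face(rgb_image):
    """Convert an RGB pixel grid (rows of (r, g, b) tuples) to gray levels,
    then split the gray image into left and right halves."""
    # Gray-level transform: ITU-R BT.601 luma weights (a common choice,
    # assumed here; the patent does not specify the transform).
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]
    mid = len(gray[0]) // 2
    left = [row[:mid] for row in gray]    # left-face image
    right = [row[mid:] for row in gray]   # right-face image
    return gray, left, right


img = [[(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)],
       [(255, 255, 255)] * 4]
gray, left, right = preprocess_face(img)
print(len(left[0]), len(right[0]))  # 2 2
```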
Step S103 specifically includes:
s201: inputting the face gray level image to F1 to obtain F1 predicted binocular coordinates; inputting the left face image into LE1, and obtaining left eye coordinates predicted by LE 1; inputting the right face image into RE1 to obtain the right eye coordinate predicted by RE 1;
s202: correspondingly adding the binocular coordinate predicted by F1, the left eye coordinate predicted by LE1 and the right eye coordinate predicted by RE1, and dividing by 2 to obtain the binocular coordinate finally predicted by the first level of the weight binarization convolution neural network model for human eye positioning;
s203: presetting a bounding box by taking the finally predicted binocular coordinate of the first level of the weight binarization convolutional neural network model for human eye positioning as the center, and taking the bounding box as the input of F2 of the first level of the weight binarization convolutional neural network model for human eye positioning to obtain the predicted binocular coordinate of F2; presetting a bounding box by taking the left-eye coordinate predicted by LE1 as a center, and taking the bounding box as the input of LE21 and LE22 of a second level of the weight binarization convolutional neural network model for human eye positioning to obtain the left-eye coordinate predicted by LE21 and LE 22; presetting a boundary box by taking the right-eye coordinate predicted by the RE1 as a center, and taking the boundary box as the input of RE21 and RE22 of a second level of the weight binarization convolutional neural network model for human eye positioning to obtain the right-eye coordinate predicted by the RE21 and RE 22;
s204: correspondingly adding the binocular coordinates predicted by the F2, the left eye coordinates predicted by the LE21 and the LE22 and the right eye coordinates predicted by the RE21 and the RE22, and dividing by 3 to obtain the binocular coordinates finally predicted by the second level of the weight binarization convolutional neural network model for human eye positioning;
s205: constructing a bounding box centered on the binocular coordinates finally predicted by the second level, and taking the bounding box as the input of the third-level F3 to obtain the binocular coordinates predicted by F3; averaging the left-eye coordinates predicted by LE21 and LE22 (their sum divided by 2), constructing a bounding box centered on the average as the input of the third-level LE3, and obtaining the left-eye coordinates predicted by LE3; averaging the right-eye coordinates predicted by RE21 and RE22 (their sum divided by 2), constructing a bounding box centered on the average as the input of the third-level RE3, and obtaining the right-eye coordinates predicted by RE3;
s206: correspondingly adding the binocular coordinates predicted by the third-level F3, the left-eye coordinates predicted by the third-level LE3 and the right-eye coordinates predicted by the third-level RE3, and dividing by 2 to obtain the binocular coordinates finally predicted by the third level;
s207: constructing a bounding box by taking the left eye coordinate in the binocular coordinate finally predicted by the third level as the center, wherein the bounding box is used as the input of the fourth level LE4 to obtain the left eye coordinate predicted by the fourth level LE 4; taking the right-eye coordinate in the binocular coordinate finally predicted by the third level as the center, constructing a bounding box as the input of the RE4 of the fourth level, and obtaining the right-eye coordinate predicted by the RE4 of the fourth level; the left eye coordinate predicted by the fourth level LE4 and the right eye coordinate predicted by the fourth level RE4 jointly form the final predicted coordinate of the human eye output by the weight binarization convolution neural network model of the human eye positioning.
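The fusion rule used at each level in steps S202, S204 and S206 is an element-wise average of coordinate predictions: divide by 2 when two networks vote on an eye, by 3 when three do. A minimal sketch, with coordinate values chosen purely for illustration:

```python
def average_predictions(preds):
    """Element-wise mean of several (x, y) coordinate predictions."""
    n = len(preds)
    return (sum(p[0] for p in preds) / n, sum(p[1] for p in preds) / n)


# Level 1, left eye (S202): fuse the whole-face network F1 with the
# left-face network LE1 -- two predictions, so divide by 2.
f1_left, le1 = (30.0, 42.0), (32.0, 40.0)
print(average_predictions([f1_left, le1]))  # (31.0, 41.0)

# Level 2, left eye (S204): fuse F2, LE21 and LE22 -- divide by 3.
f2_left, le21, le22 = (31.0, 41.0), (30.5, 41.5), (31.5, 40.5)
print(average_predictions([f2_left, le21, le22]))  # (31.0, 41.0)
```

Each fused coordinate then becomes the center of the bounding box cropped for the next, finer level of the cascade.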
In step S105, a cascaded convolutional neural network model for weight binarization for human eye state detection is constructed, specifically: the weight binarization cascade convolution neural network model for human eye state detection comprises two cascade convolution neural network models which are respectively a main weight binarization cascade convolution neural network model and a secondary weight binarization cascade convolution neural network model; the structure of the cascade convolutional neural network model for the primary weight binarization and the structure of the cascade convolutional neural network model for the secondary weight binarization are the same, namely the structure shown in fig. 3.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a training process of a weight binarization convolutional neural network according to the present invention;
step S106 specifically includes:
s301: pre-training the secondary weight binarization cascade convolution neural network model by using a face image database with a large number of samples to obtain secondary weight binarization cascade convolution neural network model initial parameters;
s302: transmitting the initial parameters of the secondary weight binarization cascaded convolutional neural network model to the primary weight binarization cascaded convolutional neural network model through transfer learning to obtain a primary weight binarization cascaded convolutional neural network model with the initial parameters;
s303: retraining the primary weight binarization cascaded convolutional neural network model with the initial parameters by using an image database annotated with human eye states to obtain the trained primary weight binarization cascaded convolutional neural network model; the trained primary weight binarization cascaded convolutional neural network model is the trained weight binarization cascaded convolutional neural network model for human eye state detection.
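Steps S301 to S303 amount to copying pretrained parameters into a structurally identical model and then fine-tuning it. A dictionary-based sketch (the layer names and parameter values are illustrative assumptions, not from the patent):

```python
def transfer_parameters(source, target, layer_names):
    """Copy pretrained parameters for the named layers from a source model
    into a target model with the same structure (S302). Lists are copied
    so later fine-tuning of the target does not mutate the source."""
    for name in layer_names:
        target[name] = list(source[name])
    return target


# S301: parameters obtained by pretraining the secondary model on faces.
secondary = {"conv1": [0.5, -0.3], "fc1": [0.1, 0.9]}
# S302: transfer them to the primary model as its initial parameters.
primary = transfer_parameters(secondary, {"conv1": None, "fc1": None},
                              ["conv1", "fc1"])
print(primary["conv1"])  # [0.5, -0.3]
# S303: the primary model would now be retrained on eye-state labels.
```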
Referring to fig. 5, fig. 5 is a schematic diagram of a hardware device according to an embodiment of the present invention, where the hardware device specifically includes: a human eye state detection device 401, a processor 402 and a storage device 403 based on weight binary convolution neural network and transfer learning.
The human eye state detection device 401 based on the weight binarization convolutional neural network and transfer learning implements the weight binarization neural network and transfer learning human eye state detection method.
The processor 402: the processor 402 loads and executes the instructions and data in the storage device 403 to implement the weight binarization neural network and the transfer learning human eye state detection method.
The storage device 403: the storage device 403 stores instructions and data; the storage device 403 is used to implement the weight binarization neural network and the transfer learning human eye state detection method.
Please refer to fig. 6 and 7. FIG. 6 compares the human eye state detection accuracy of feature-extraction methods on the CEW database in an embodiment of the present invention; FIG. 7 compares the accuracy of human eye state detection methods on the ZJU database.
In fig. 6, Open denotes the recognition accuracy for the open-eye state, Closed the accuracy for the closed-eye state, and Average the mean accuracy over both states. Gabor, LBP, HOG and MultiHPOG are traditional feature-extraction methods, while Our Method is the human eye detection method based on the weight binarization convolutional neural network and transfer learning provided by the invention. As the figure shows, LBP and MultiHPOG detect the eye state relatively well, Gabor performs worst, and the accuracy of the proposed method is clearly higher than that of the traditional methods.
The Method column in fig. 7 lists several methods previously proposed for human eye state detection, where Our Method is the weight binarization neural network and transfer learning human eye state detection method proposed by the present invention. The Accuracy column gives each method's human eye state detection accuracy on the ZJU database.
The invention comprehensively considers that, in addition to the influence of a cluttered image background on human eye positioning and state classification, facial features such as eyebrows and lips can also hinder eye positioning and open/closed-state classification. Conventional methods such as cascade classifiers readily make wrong judgments in these cases. Through fine tuning of the parameters of the six convolutional layers, two pooling layers, two fully connected layers and all constituent layers, the invention overcomes the misjudgments that conventional methods are prone to under such conditions.
The invention comprehensively considers training efficiency and sample size in model training. Compared with conventional convolutional neural network methods, it overcomes problems such as overly long training times and insufficient training samples through transfer learning and weight binarization, so that time cost is reduced and recognition efficiency is improved while high-accuracy human eye state recognition is achieved.
The invention comprehensively considers the difficulty of popularization, does not need to wear any physical measuring equipment, does not influence the normal behavior of the detected person, has good universality and can be popularized to the aspects of production operation fatigue detection, automobile driving fatigue detection, aircraft driving attention detection and the like.
The invention thus provides a human eye state detection method based on a weight binarization convolutional neural network and transfer learning. The binarized convolutional neural network in the method can effectively extract human eye state features, and binarization not only reduces the storage footprint of the model but also accelerates computation. Transfer learning applies knowledge learned in a source domain to a target domain; that is, trained model parameters are transferred to a new model to assist its training, thereby improving the training efficiency of the new model.
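The storage saving just claimed follows from replacing each full-precision weight with a single bit. Assuming 32-bit floating-point weights for the full-precision model (the patent does not state the precision, so this is an assumption), the reduction factor works out as:

```python
def binarized_storage_ratio(num_weights, float_bits=32):
    """Ratio of full-precision weight storage (float_bits per weight)
    to binarized storage (1 bit per weight)."""
    return (num_weights * float_bits) / (num_weights * 1)


# With 32-bit floats this is the commonly cited ~32x reduction.
print(binarized_storage_ratio(1_000_000))  # 32.0
```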
The beneficial effects of implementing the invention are as follows: the method reduces or even eliminates the influence on human eye recognition of uncertain head posture, external ambient illumination, interference under complex background conditions, and occlusion, thereby improving the robustness of human eye recognition.
The features of the above-described embodiments of the invention may be combined with each other provided no conflict arises.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A human eye state detection method based on a weight binarization neural network and transfer learning, characterized in that the method specifically comprises the following steps:
s101: collecting RGB images of the human face by using a camera;
S102: preprocessing the face RGB image to obtain a preprocessed face image, constructing a weight binarization convolutional neural network model for human eye positioning, and training it with a face database; the weight binarization convolutional neural network model for human eye positioning comprises four levels: the first level comprises three convolutional neural networks, F1, LE1, and RE1; the second level comprises five convolutional neural networks, F2, LE21, LE22, RE21, and RE22; the third level comprises three convolutional neural networks, F3, LE3, and RE3; the fourth level comprises two convolutional neural networks, LE4 and RE4;
s103: the preprocessed human face image is used as the input of the weight binarization convolutional neural network model for human eye positioning, the output of the weight binarization convolutional neural network model for human eye positioning is the final prediction coordinate of human eyes, and human eye positioning is completed;
S104: constructing a cropping box centered on the final predicted coordinates of the human eyes, and cropping the eye region to obtain the finally extracted human eye image;
s105: constructing a weight binarization cascaded convolutional neural network model for human eye state detection, wherein the weight binarization cascaded convolutional neural network model for human eye state detection comprises six convolutional layers, two pooling layers and two full-connection layers;
s106: sequentially training a weight binarization cascade convolution neural network model for human eye state detection by using a human face database and a human eye state database to obtain a trained weight binarization cascade convolution neural network model for human eye state detection;
s107: and inputting the finally extracted human eye image in the step S104 into the trained cascaded convolutional neural network model with the weight binarization for human eye state detection to obtain the final state of the human eye.
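Step S104 above (cropping an eye region around a predicted coordinate) can be sketched as follows. The helper name `crop_around` is hypothetical, and the clamping behavior at image borders is an assumption, since the claim does not specify how out-of-bounds boxes are handled.

```python
import numpy as np

def crop_around(image, center, size):
    """Crop a square patch of side `size` centered on a predicted eye
    coordinate (x, y), clamping the box so it stays inside the image."""
    h, w = image.shape[:2]
    x, y = center
    half = size // 2
    x0 = max(0, min(x - half, w - size))   # left edge, clamped to image
    y0 = max(0, min(y - half, h - size))   # top edge, clamped to image
    return image[y0:y0 + size, x0:x0 + size]
```

The clamped patch always has the full `size x size` shape expected by the downstream state-detection network, even when the predicted eye lies near an image border.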
2. The weight binarization neural network and transfer learning human eye state detection method as claimed in claim 1, characterized in that: in step S102, preprocessing the face RGB image to obtain the preprocessed face image specifically comprises: performing gray-level transformation on the face RGB image to obtain a face gray-level image; and cropping the face gray-level image to obtain a left face image and a right face image of the face respectively.
3. The method for detecting the state of the human eye through weight binarization neural network and transfer learning as claimed in claim 2, wherein: step S103 specifically includes:
S201: inputting the face gray-level image into F1 to obtain the binocular coordinates predicted by F1; inputting the left face image into LE1 to obtain the left-eye coordinates predicted by LE1; inputting the right face image into RE1 to obtain the right-eye coordinates predicted by RE1;
S202: correspondingly adding the binocular coordinates predicted by F1, the left-eye coordinates predicted by LE1, and the right-eye coordinates predicted by RE1, and dividing by 2 (each eye receives exactly two predictions: one from F1 and one from its dedicated network), obtaining the binocular coordinates finally predicted by the first level of the weight binarization convolutional neural network model for human eye positioning;
S203: presetting a bounding box centered on the binocular coordinates finally predicted by the first level of the weight binarization convolutional neural network model for human eye positioning, and taking it as the input of the second-level F2 to obtain the binocular coordinates predicted by F2; presetting a bounding box centered on the left-eye coordinates predicted by LE1, and taking it as the input of the second-level LE21 and LE22 to obtain the left-eye coordinates predicted by LE21 and LE22; presetting a bounding box centered on the right-eye coordinates predicted by RE1, and taking it as the input of the second-level RE21 and RE22 to obtain the right-eye coordinates predicted by RE21 and RE22;
s204: correspondingly adding the binocular coordinates predicted by the F2, the left eye coordinates predicted by the LE21 and the LE22 and the right eye coordinates predicted by the RE21 and the RE22, and dividing by 3 to obtain the binocular coordinates finally predicted by the second level of the weight binarization convolutional neural network model for human eye positioning;
S205: constructing a bounding box centered on the binocular coordinates finally predicted by the second level, and taking it as the input of the third-level F3 to obtain the binocular coordinates predicted by F3; summing the left-eye coordinates predicted by LE21 and LE22 and dividing by 2, constructing a bounding box centered on the result as the input of the third-level LE3 to obtain the left-eye coordinates predicted by LE3; summing the right-eye coordinates predicted by RE21 and RE22 and dividing by 2, constructing a bounding box centered on the result as the input of the third-level RE3 to obtain the right-eye coordinates predicted by RE3;
S206: correspondingly adding the binocular coordinates predicted by the third-level F3, the left-eye coordinates predicted by the third-level LE3, and the right-eye coordinates predicted by the third-level RE3, and dividing by 2 to obtain the binocular coordinates finally predicted by the third level;
s207: constructing a bounding box by taking the left eye coordinate in the binocular coordinate finally predicted by the third level as the center, wherein the bounding box is used as the input of the fourth level LE4 to obtain the left eye coordinate predicted by the fourth level LE 4; taking the right-eye coordinate in the binocular coordinate finally predicted by the third level as the center, constructing a bounding box as the input of the RE4 of the fourth level, and obtaining the right-eye coordinate predicted by the RE4 of the fourth level; the left eye coordinate predicted by the fourth level LE4 and the right eye coordinate predicted by the fourth level RE4 jointly form the final predicted coordinate of the human eye output by the weight binarization convolution neural network model of the human eye positioning.
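The per-level fusion in steps S202, S204, and S206 is an element-wise average of the coordinate predictions produced for an eye at that level: the corresponding coordinates are summed and divided by the number of predictors. A minimal sketch, with a hypothetical `fuse_predictions` helper:

```python
import numpy as np

def fuse_predictions(preds):
    """Element-wise average of several (x, y) coordinate predictions
    made for the same eye by networks at one cascade level."""
    return np.mean(np.asarray(preds, dtype=float), axis=0)
```

At the first level each eye has two predictors (F1 plus LE1 or RE1), so the divisor is 2; at the second level there are three (F2 plus LE21/LE22 or RE21/RE22), so it is 3, matching the divisors stated in the claim.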
4. The method for detecting the state of the human eye through weight binarization neural network and transfer learning as claimed in claim 1, wherein: in step S105, a cascaded convolutional neural network model for weight binarization for human eye state detection is constructed, specifically: the weight binarization cascade convolution neural network model for human eye state detection comprises two cascade convolution neural network models which are respectively a main weight binarization cascade convolution neural network model and a secondary weight binarization cascade convolution neural network model; the structure of the cascade convolution neural network model for the primary weight binarization is the same as that of the cascade convolution neural network model for the secondary weight binarization.
5. The method for detecting the state of the human eye through the weight binarization neural network and the transfer learning as claimed in claim 4, wherein: step S106 specifically includes:
s301: pre-training the secondary weight binarization cascade convolution neural network model by using a face image database with a large number of samples to obtain secondary weight binarization cascade convolution neural network model initial parameters;
s302: transmitting the initial parameters of the secondary weight binarization cascaded convolutional neural network model to the primary weight binarization cascaded convolutional neural network model through transfer learning to obtain a primary weight binarization cascaded convolutional neural network model with the initial parameters;
s303: retraining the cascade convolution neural network model with the primary weight binaryzation of the initial parameters by using an image database marked with the human eye state to obtain a trained cascade convolution neural network model with the primary weight binaryzation; the trained cascaded convolutional neural network model with the main weight binaryzation is the trained cascaded convolutional neural network model with the weight binaryzation for detecting the human eye state.
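The parameter transfer in step S302 amounts to copying the pre-trained secondary model's weights into the identically structured primary model before fine-tuning. A minimal sketch, representing each model as a dict of named weight arrays; this representation and the name/shape matching rule are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def transfer_parameters(pretrained, target):
    """Copy weights from a pre-trained model into a same-architecture
    target model; only layers whose name and shape both match are
    copied. Returns the list of transferred layer names."""
    transferred = []
    for name, w in pretrained.items():
        if name in target and target[name].shape == w.shape:
            target[name] = w.copy()
            transferred.append(name)
    return transferred
```

After the copy, the primary model starts from the secondary model's parameters instead of a random initialization, which is what shortens its training on the smaller eye-state database.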
6. A storage device, characterized by: the storage device stores instructions and data for realizing the weight binarization neural network and the transfer learning human eye state detection method as claimed in any one of claims 1-5.
7. A human eye state detection device based on a weight binarization convolutional neural network and transfer learning, characterized by comprising: a processor and a storage device; the processor loads and executes the instructions and data in the storage device to realize the weight binarization neural network and transfer learning human eye state detection method as claimed in any one of claims 1 to 5.
CN202010624577.5A 2020-07-02 2020-07-02 Weight binarization neural network and transfer learning human eye state detection method and device Pending CN111898454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010624577.5A CN111898454A (en) 2020-07-02 2020-07-02 Weight binarization neural network and transfer learning human eye state detection method and device


Publications (1)

Publication Number Publication Date
CN111898454A true CN111898454A (en) 2020-11-06

Family

ID=73191782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010624577.5A Pending CN111898454A (en) 2020-07-02 2020-07-02 Weight binarization neural network and transfer learning human eye state detection method and device

Country Status (1)

Country Link
CN (1) CN111898454A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329752A (en) * 2021-01-06 2021-02-05 Tencent Technology (Shenzhen) Co., Ltd. Training method of human eye image processing model, image processing method and device
CN112818938A (en) * 2021-03-03 2021-05-18 Changchun University of Science and Technology Face recognition algorithm and face recognition device adaptive to illumination interference environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748858A (en) * 2017-06-15 2018-03-02 South China University of Technology A multi-pose eye localization method based on cascaded convolutional neural networks
CN108614999A (en) * 2018-04-16 2018-10-02 Guizhou University Eye open/closed state detection method based on deep learning
CN110738071A (en) * 2018-07-18 2020-01-31 Zhejiang Zhongzheng Intelligent Technology Co., Ltd. Face algorithm model training method based on deep learning and transfer learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHEN-TAO LIU et al.: "Eye localization based on weight binarization cascade convolution neural network", Neurocomputing *


Similar Documents

Publication Publication Date Title
CN106599883B (en) CNN-based multilayer image semantic face recognition method
WO2020108362A1 (en) Body posture detection method, apparatus and device, and storage medium
Lajevardi et al. Higher order orthogonal moments for invariant facial expression recognition
CN108614999B (en) Eye opening and closing state detection method based on deep learning
CN111652317B (en) Super-parameter image segmentation method based on Bayes deep learning
CN109767422A (en) Pipe detection recognition methods, storage medium and robot based on deep learning
CN112837344B (en) Target tracking method for generating twin network based on condition countermeasure
CN110781829A (en) Light-weight deep learning intelligent business hall face recognition method
CN108830237B (en) Facial expression recognition method
Sajanraj et al. Indian sign language numeral recognition using region of interest convolutional neural network
KR102132407B1 (en) Method and apparatus for estimating human emotion based on adaptive image recognition using incremental deep learning
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN114092793B (en) End-to-end biological target detection method suitable for complex underwater environment
CN110728185A (en) Detection method for judging existence of handheld mobile phone conversation behavior of driver
CN110889397A (en) Visual relation segmentation method taking human as main body
CN111898454A (en) Weight binarization neural network and transfer learning human eye state detection method and device
CN109815887B (en) Multi-agent cooperation-based face image classification method under complex illumination
CN116071575A (en) Multi-mode data fusion-based student classroom abnormal behavior detection method and detection system
García et al. Pollen grains contour analysis on verification approach
CN111898473B (en) Driver state real-time monitoring method based on deep learning
CN111553202B (en) Training method, detection method and device for neural network for living body detection
CN114038035A (en) Artificial intelligence recognition device based on big data
Karim et al. Bangla Sign Language Recognition using YOLOv5
CN111353353A (en) Cross-posture face recognition method and device
Venkatesan et al. Advanced classification using genetic algorithm and image segmentation for Improved FD

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination