CN112364831B - Face recognition method and online education system

Info

Publication number
CN112364831B
Authority
CN
China
Prior art keywords
feature
layer
loss function
convolutional layer
face
Prior art date
Legal status
Active
Application number
CN202011384120.8A
Other languages
Chinese (zh)
Other versions
CN112364831A (en)
Inventor
姜培生
Current Assignee
Beijing wisdom Rongsheng Technology Co.,Ltd.
Original Assignee
Beijing Wisdom Rongsheng Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Wisdom Rongsheng Technology Co ltd
Priority to CN202011384120.8A
Publication of CN112364831A
Application granted
Publication of CN112364831B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/164 - Detection; Localisation; Normalisation using holistic features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 - Services
    • G06Q50/20 - Education
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation

Abstract

The invention discloses a face recognition method and an online education system. The method comprises the following steps: acquiring a face image; inputting the face image into a pre-trained face recognition model, which outputs a face feature vector; and identifying user identity information based on the face feature vector. The lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; and the full connection layer takes the adjustment feature information as output. The resulting face feature vector is more discriminative and distinguishes facial features more accurately, thereby improving the accuracy of identifying user identity information based on the face feature vector.

Description

Face recognition method and online education system
Technical Field
The invention relates to the technical field of computers, in particular to a face recognition method and an online education system.
Background
With advances in science and technology, face recognition technology is widely applied in fields such as finance and education, for example for identity recognition in online education.
At present, face recognition mainly identifies a user based on a feature vector of the face. However, a feature vector formed from features extracted directly from a face image cannot accurately represent the effective information of the user's face. The accuracy of face recognition that directly extracts the face feature vector from the face image is therefore relatively low. In scenes with strict identity-verification requirements in particular, low face recognition accuracy can pose a serious threat to a user's personal and property safety.
Disclosure of Invention
The invention aims to provide a face recognition method and an online education system to solve the above technical problems.
In a first aspect, an embodiment of the present invention provides a face recognition method, where the method includes:
acquiring a face image;
inputting the face image into a pre-trained face recognition model, and outputting a face feature vector by the pre-trained face recognition model;
identifying user identity information based on the face feature vector;
the pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full-connection layer;
the lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information;
the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; and the full connection layer takes the adjustment feature information as output.
Optionally, adjusting the first feature information based on the second feature information to obtain the adjustment feature information includes:
obtaining the relative entropy of the second feature information and the first feature information;
obtaining the quotient of the second feature information and the first feature information;
taking the product of the quotient and the relative entropy as an adjustment factor;
and adding the adjustment factor to the first feature information to obtain the adjustment feature information.
Optionally, the adjustment feature information is a feature vector of a group of face images.
Optionally, identifying user identity information based on the face feature vector includes:
matching the face feature vector with a feature vector stored in a large database, wherein the large database comprises a plurality of feature vectors;
and taking the user information corresponding to the feature vector which is successfully matched as the user identity information which needs to be identified.
Optionally, matching the face feature vector with a feature vector stored in a large database includes:
calculating the cross entropy of the face feature vector and each feature vector;
and taking the feature vector corresponding to the minimum value of the cross entropy as the feature vector successfully matched.
In a second aspect, an embodiment of the present invention provides an online education system, including:
the face acquisition module is used for acquiring a face image of the user and sending the face image to the face recognition module;
the face recognition module is used for inputting the face image into a pre-trained face recognition model, which outputs a face feature vector; identifying user identity information based on the face feature vector, and sending the user identity information to a course management module; the pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full connection layer; the lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; the full connection layer takes the adjustment feature information as output;
and the course management module is used for acquiring the course to be learned by the user according to the identity information of the user and recommending the course to be learned to the user.
Optionally, adjusting the first feature information based on the second feature information to obtain the adjustment feature information includes:
obtaining the relative entropy of the second feature information and the first feature information;
obtaining the quotient of the second feature information and the first feature information;
taking the product of the quotient and the relative entropy as an adjustment factor;
and adding the adjustment factor to the first feature information to obtain the adjustment feature information.
Optionally, the adjustment feature information is a feature vector of a group of face images.
Optionally, identifying user identity information based on the face feature vector includes:
matching the face feature vector with a feature vector stored in a large database, wherein the large database comprises a plurality of feature vectors;
and taking the user information corresponding to the feature vector which is successfully matched as the user identity information which needs to be identified.
Optionally, matching the face feature vector with a feature vector stored in a large database includes:
calculating the cross entropy of the face feature vector and each feature vector;
and taking the feature vector corresponding to the minimum value of the cross entropy as the feature vector successfully matched.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a face recognition method and an online education system, wherein the method comprises the following steps: acquiring a face image; inputting the face image into a pre-trained face recognition model, and outputting a face feature vector by the pre-trained face recognition model; identifying user identity information based on the face feature vector; the pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full-connection layer; the lower convolution layer carries out down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer carries out upper sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first characteristic information based on second characteristic information to obtain adjusting characteristic information; and the full connection layer takes the adjustment characteristic information as output. The lower convolution layer carries out down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer carries out upper sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first characteristic information based on second characteristic information to obtain adjusting characteristic information; the full-connection layer takes the adjustment feature information as output, the output face feature vector is obtained by adjusting the extracted results of the up-sampling feature extraction and the down-sampling feature extraction, and the feature vector is obtained by combining the two extraction results, so that the finally output feature vector contains more features of the face, the obtained face feature vector has stronger resolution, the face features can be distinguished more accurately, and the accuracy of identifying the user identity information based on the face feature vector is improved.
Drawings
Fig. 1 is a flowchart of a face recognition method according to an embodiment of the present invention.
Fig. 2 is a schematic block structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
Examples
The embodiment of the invention provides a face recognition method, which can be applied to scenes needing face verification, such as financial systems, medical systems, online education systems, financial payment systems and the like, and can be particularly applied to interactive robots, artificial intelligence systems and the like. As shown in fig. 1, the method includes:
s101: and acquiring a human face image. The face image is acquired through a camera device, and is particularly acquired through the shooting of a suspected head.
S102: and inputting the face image into a pre-trained face recognition model, and outputting a face characteristic vector by the pre-trained face recognition model.
S103: and identifying user identity information based on the face feature vector. It should be noted that the user may be a student, a teacher, a doctor, a bank client, a bank worker, or the like.
The pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full connection layer. The lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; and the full connection layer takes the adjustment feature information as output. That is, the output face feature vector is the adjustment feature information, obtained by adjusting the first feature information based on the second feature information.
With this scheme, the output face feature vector is obtained by combining and adjusting the results of both the down-sampling and the up-sampling feature extraction, so the final feature vector contains more features of the face. The resulting face feature vector is more discriminative and distinguishes facial features more accurately, improving the accuracy of identifying user identity information based on the face feature vector.
The lower convolutional layer includes a first convolutional layer, a second convolutional layer, and a third convolutional layer, and the upper convolutional layer includes a fourth convolutional layer, a fifth convolutional layer, and a sixth convolutional layer.
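By way of illustration, the following is a minimal PyTorch sketch of this six-convolutional-layer structure. All channel counts, kernel sizes and strides are assumptions chosen for the example (the description does not specify them); in particular, the shapes are picked so that each up-sampling feature matches the down-sampling feature it is paired with in the loss terms described below.

    import torch
    import torch.nn as nn

    class FaceBackboneSketch(nn.Module):
        # Lower convolution layer: first, second and third convolutional layers
        # (down-sampling). Upper convolution layer: fourth, fifth and sixth
        # convolutional layers (up-sampling). The adjusting layer and the full
        # connection layer are sketched separately later in this description.
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
            self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
            self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
            self.conv4 = nn.ConvTranspose2d(128, 128, kernel_size=3, stride=1, padding=1)
            self.conv5 = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)
            self.conv6 = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
            self.act = nn.ReLU()

        def forward(self, x):
            d1 = self.act(self.conv1(x))   # first down-sampling feature
            d2 = self.act(self.conv2(d1))  # second down-sampling feature
            d3 = self.act(self.conv3(d2))  # third down-sampling feature
            u1 = self.act(self.conv4(d3))  # first up-sampling feature (same shape as d3)
            u2 = self.act(self.conv5(u1))  # second up-sampling feature (same shape as d2)
            u3 = self.conv6(u2)            # third up-sampling feature (same shape as d1)
            return (d1, d2, d3), (u1, u2, u3)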
The training mode of the face recognition model is as follows:
obtaining a training sample, wherein the training sample comprises a labeled face image and an unlabeled face image;
inputting the training sample into a first convolution layer, and performing downsampling feature extraction on the training sample by the first convolution layer to obtain a first downsampling feature;
the second convolution layer carries out downsampling feature extraction on the first downsampling feature to obtain a second downsampling feature;
the third convolution layer carries out down-sampling feature extraction on the second down-sampling feature to obtain a third down-sampling feature;
the full connection layer carries out supervised learning on the third down-sampling feature corresponding to the labeled face image, a Loss function when the full connection layer carries out the supervised learning on the third down-sampling feature corresponding to the labeled face image is a first supervised Loss function, and the first supervised Loss function can adopt an Angular-Softmax Loss function;
unsupervised learning is carried out on the unlabeled face images through the first, second, third, fourth, fifth and sixth convolutional layers, with an unsupervised loss function as the loss function for this unsupervised learning;
when the fully connected layer finishes supervised learning on the third down-sampling features corresponding to the labeled face images and the first convolution layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer and the sixth convolution layer finish unsupervised learning on the unlabeled face images, determining that the face recognition model training is finished, wherein the loss function of the face recognition model is determined according to the following formula:
Loss = γ*LossA + (1-γ)*LossB;
where Loss is the loss function of the face recognition model, LossA is the converged first supervised loss function, LossB is the converged unsupervised loss function, γ is a control coefficient, and λ ∈ (0.5, 1) (λ is used in the second supervised loss below).
When the first supervised loss function converges, the supervised learning of the full connection layer on the third down-sampling features corresponding to the labeled face images is finished; when the unsupervised loss function converges, the unsupervised learning of the first to sixth convolutional layers on the unlabeled face images is finished.
The method for performing unsupervised learning on the unmarked face image through the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer comprises the following steps:
the fourth convolution layer performs up-sampling feature extraction on a third down-sampling feature corresponding to the unmarked face image to obtain a first up-sampling feature;
the fifth convolution layer performs up-sampling feature extraction on the first up-sampling feature to obtain a second up-sampling feature;
the sixth convolution layer performs upsampling feature extraction on the second upsampling feature to obtain a third upsampling feature;
controlling the third downsampling feature and the first upsampling feature by a first loss function;
controlling a second downsampling feature and a second upsampling feature by a second loss function;
controlling the first down-sampling feature and the third up-sampling feature by a third loss function;
and if the first loss function, the second loss function and the third loss function have all converged, it is determined that the unsupervised learning of the first to sixth convolutional layers on the unlabeled face images is finished.
The converged unsupervised loss function equals the sum of the converged first, second and third loss functions, i.e. LossB = Loss1 + Loss2 + Loss3, where Loss1, Loss2 and Loss3 are respectively the first, second and third loss functions; each of them may adopt a Smooth L1 loss function.
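As a sketch, assuming the backbone example above (whose paired features share shapes), this unsupervised loss can be computed as follows; F.smooth_l1_loss is PyTorch's built-in Smooth L1 loss.

    import torch.nn.functional as F

    def unsupervised_loss(downs, ups):
        d1, d2, d3 = downs
        u1, u2, u3 = ups
        loss1 = F.smooth_l1_loss(u1, d3)  # first loss: third down-sampling vs first up-sampling feature
        loss2 = F.smooth_l1_loss(u2, d2)  # second loss: second down-sampling vs second up-sampling feature
        loss3 = F.smooth_l1_loss(u3, d1)  # third loss: first down-sampling vs third up-sampling feature
        return loss1 + loss2 + loss3      # LossB = Loss1 + Loss2 + Loss3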
When the first loss function, the second loss function and the third loss function are all converged, determining the output of the third convolution layer corresponding to the labeled face image as a retraining supervised feature; the training method of the face recognition model further comprises the following steps:
after the unsupervised learning of the unmarked face image is finished by the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer, the fully-connected layer conducts supervised learning on the retrained supervised feature, the second supervised loss function is adopted by the fully-connected layer for the supervised learning of the retrained supervised feature, and the second supervised loss function is equal to the weighted sum of the first supervised loss function, the first loss function, the second loss function and the third loss function, and is specifically calculated according to the following formula:
Loss1A=λ*LossA+(1-λ)/3*Loss2+(1-λ)/3*Loss3+(1-λ)/3*Loss1;
where Loss1A represents a second supervised Loss function.
When the full connection layer finishes the supervised learning of the retrained supervised feature based on the second supervised loss function, the loss function of the face recognition model is determined according to the following formula:
Loss = γ*Loss1A + (1-γ)*LossB.
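The two weighted sums above reduce to a few lines of code. The sketch below takes the Angular-Softmax term LossA as a given scalar (its implementation is not detailed in this description); the default values of λ and γ are arbitrary illustrative choices.

    def second_supervised_loss(loss_a, loss1, loss2, loss3, lam=0.7):
        # Loss1A = λ*LossA + (1-λ)/3*Loss1 + (1-λ)/3*Loss2 + (1-λ)/3*Loss3
        return lam * loss_a + (1 - lam) / 3 * (loss1 + loss2 + loss3)

    def model_loss(loss_1a, loss_b, gamma=0.8):
        # Loss = γ*Loss1A + (1-γ)*LossB
        return gamma * loss_1a + (1 - gamma) * loss_b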
Training the face recognition model in this way makes it possible to train a highly accurate face recognition model with less training data. The scheme adopts an Angular-Softmax Loss function, an improvement on the basic cross-entropy loss, and introduces a self-encoding structure for unsupervised learning. Because the self-encoding structure needs no data labeling, labeling effort is greatly reduced while the amount of usable input data is increased. Meanwhile, because the self-encoding structure learns features on top of the backbone feature vector, it improves the distinguishability of the embedding features during back-propagation and strengthens the model's ability to discriminate between similar but different faces. This structure can therefore greatly improve the accuracy of 1:N face recognition, where N is a positive integer greater than 2.
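Combining the sketches above, a single first-phase training step could be assembled as follows. The angular_softmax_loss call and the fc_head module are hypothetical stand-ins for the Angular-Softmax supervised head, whose implementation this description does not give.

    def train_step(backbone, fc_head, labeled, labels, unlabeled, gamma=0.8):
        # Supervised branch: the full connection layer learns on the third
        # down-sampling feature of the labeled images (first supervised loss).
        (_, _, d3_labeled), _ = backbone(labeled)
        logits = fc_head(d3_labeled.flatten(1))
        loss_a = angular_softmax_loss(logits, labels)  # hypothetical stand-in

        # Unsupervised branch: the six convolutional layers learn on the
        # unlabeled images via the three reconstruction losses (LossB).
        downs, ups = backbone(unlabeled)
        loss_b = unsupervised_loss(downs, ups)

        # Loss = γ*LossA + (1-γ)*LossB
        return gamma * loss_a + (1 - gamma) * loss_b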
Optionally, adjusting the first feature information based on the second feature information to obtain the adjustment feature information includes:
obtaining the relative entropy of the second feature information and the first feature information;
obtaining the quotient of the second feature information and the first feature information;
taking the product of the quotient and the relative entropy as an adjustment factor;
and adding the adjustment factor to the first feature information to obtain the adjustment feature information.
In this way, the first feature information can be fine-tuned according to the second feature information, improving how accurately the adjustment feature information expresses the face information.
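A minimal sketch of the four adjustment steps, in PyTorch. The description does not say how the relative entropy is computed over raw feature maps, so normalizing both features with a softmax first, adding a small epsilon to the quotient's denominator, and assuming the two features share the same shape are all assumptions of this example.

    import torch
    import torch.nn.functional as F

    def adjust(first, second, eps=1e-8):
        # Relative entropy KL(second || first), one scalar per sample in the batch.
        p = F.softmax(second.flatten(1), dim=1)
        q = F.softmax(first.flatten(1), dim=1)
        rel_entropy = (p * (p / q).log()).sum(dim=1)
        # Quotient of the second and the first feature information (elementwise).
        quotient = second / (first + eps)
        # Adjustment factor: product of the quotient and the relative entropy.
        factor = quotient * rel_entropy.view(-1, *([1] * (first.dim() - 1)))
        # Adjustment feature information: first feature plus the adjustment factor.
        return first + factor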
Optionally, the adjustment feature information is a feature vector of a group of face images.
Optionally, identifying user identity information based on the face feature vector includes:
matching the face feature vector with a feature vector stored in a large database, wherein the large database comprises a plurality of feature vectors;
and taking the user information corresponding to the feature vector which is successfully matched as the user identity information which needs to be identified.
Optionally, matching the face feature vector with a feature vector stored in a large database includes:
calculating the cross entropy of the face feature vector and each feature vector;
and taking the feature vector corresponding to the minimum value of the cross entropy as the feature vector successfully matched.
Matching the face feature vector with a feature vector stored in the large database specifically means: if the included angle between the face feature vector and the stored feature vector is less than 10 degrees and the variance between them is less than 1, the match is determined to be successful.
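A sketch of this matching step follows. Interpreting the feature vectors as distributions via softmax for the cross-entropy comparison, and reading the "variance between" the two vectors as the variance of their elementwise difference, are assumptions of this example.

    import math
    import torch
    import torch.nn.functional as F

    def match(query, database, max_angle_deg=10.0, max_var=1.0):
        # Return the index of the stored vector with minimum cross entropy that
        # also satisfies the angle (< 10 degrees) and variance (< 1) thresholds,
        # or None if no stored vector matches.
        q = F.softmax(query, dim=0)
        best_idx, best_ce = None, float("inf")
        for i, vec in enumerate(database):
            p = F.softmax(vec, dim=0)
            ce = -(q * p.log()).sum().item()  # cross entropy between the two vectors
            cos = F.cosine_similarity(query, vec, dim=0).clamp(-1.0, 1.0)
            angle = math.degrees(math.acos(cos.item()))  # included angle in degrees
            var = (query - vec).var().item()
            if angle < max_angle_deg and var < max_var and ce < best_ce:
                best_idx, best_ce = i, ce
        return best_idx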
The embodiment of the present application further provides an execution subject for executing the above steps, and the execution subject may be an online education system. The system comprises:
the face acquisition module is used for acquiring a face image of the user and sending the face image to the face recognition module;
the face recognition module is used for inputting the face image into a pre-trained face recognition model, which outputs a face feature vector; identifying user identity information based on the face feature vector, and sending the user identity information to a course management module; the pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full connection layer; the lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; the full connection layer takes the adjustment feature information as output;
and the course management module is used for acquiring the course to be learned by the user according to the identity information of the user and recommending the course to be learned to the user.
Optionally, adjusting the first feature information based on the second feature information to obtain the adjustment feature information includes:
obtaining the relative entropy of the second feature information and the first feature information;
obtaining the quotient of the second feature information and the first feature information;
taking the product of the quotient and the relative entropy as an adjustment factor;
and adding the adjustment factor to the first feature information to obtain the adjustment feature information.
Optionally, the adjustment feature information is a feature vector of a group of face images.
Optionally, identifying user identity information based on the face feature vector includes:
matching the face feature vector with a feature vector stored in a large database, wherein the large database comprises a plurality of feature vectors;
and taking the user information corresponding to the feature vector which is successfully matched as the user identity information which needs to be identified.
Optionally, matching the face feature vector with a feature vector stored in a large database includes:
calculating the cross entropy of the face feature vector and each feature vector;
and taking the feature vector corresponding to the minimum value of the cross entropy as the feature vector successfully matched.
With this scheme, the model is augmented with a self-encoding structure. During training the self-encoding structure does not require class labels for a sample X; it compresses the input X into a compressed feature vector (embedding), i.e. the essence of the original data. Used in a face recognition model, this feature vector could replace the face feature extraction result; however, to make the cosine distance between feature vectors of the same face smaller and that between different faces larger, the application adds a supervision module with loss LossA. When labeled samples are trained with LossA, feature vectors within a class are pulled closer to the feature vector of the network backbone (BackBone) and those outside the class are pushed farther away, while self-encoding of unlabeled samples keeps the feature vector obtained by the backbone close to the original data. The two reinforce each other, so the extracted feature vector gains discriminative power.
With regard to the system in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The embodiment of the present application further provides an electronic device, which may be an interactive robot. As shown in fig. 2, the electronic device at least includes a data interface 501 and a processor 502. The processor 502 exchanges data with the storage system 600 through the data interface 501; specifically, the processor 502 exchanges data with a memory block in the storage system 600 through the data interface 501.
In order to illustrate the data interaction between the processor 502 and the storage system 600, as a possible implementation, the processor 502 executes the following steps when executing the above-mentioned face recognition method: acquiring a face image; inputting the face image into a pre-trained face recognition model, which outputs a face feature vector; identifying user identity information based on the face feature vector; the pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full connection layer; the lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; and the full connection layer takes the adjustment feature information as output.
Optionally, as shown in fig. 2, the electronic device further includes a storage system 600. Similarly, the processor 502 interacts with the memory blocks in the memory system 600 through the data interface 501.
Optionally, the electronic device further comprises a memory 504, a computer program stored on the memory 504 and executable on the processor 502, the processor 502 implementing the steps of any one of the face recognition methods described above when executing the program.
The storage system 600 may be the memory 504, or may be different from the memory 504, or the storage system 600 may be a partial storage partition of the memory 504, or the memory 504 may be a certain storage block in the storage system 600.
Where in fig. 2 a bus architecture (represented by bus 500) is shown, bus 500 may include any number of interconnected buses and bridges, and bus 500 links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. The bus 500 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The processor 502 is responsible for managing the bus 500 and general processing, and the memory 504 may be used for storing data used by the processor 502 in performing operations.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above-mentioned face recognition methods.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in an apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (8)

1. A face recognition method, comprising:
acquiring a face image;
inputting the face image into a pre-trained face recognition model, and outputting a face feature vector by the pre-trained face recognition model;
identifying user identity information based on the face feature vector;
the pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full-connection layer;
the lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information;
the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; the full connection layer takes the adjustment feature information as output;
the lower convolutional layer comprises a first convolutional layer, a second convolutional layer and a third convolutional layer, and the upper convolutional layer comprises a fourth convolutional layer, a fifth convolutional layer and a sixth convolutional layer;
the training mode of the face recognition model is as follows:
obtaining a training sample, wherein the training sample comprises a labeled face image and an unlabeled face image;
inputting the training sample into a first convolution layer, and performing downsampling feature extraction on the training sample by the first convolution layer to obtain a first downsampling feature;
the second convolution layer carries out downsampling feature extraction on the first downsampling feature to obtain a second downsampling feature;
the third convolution layer carries out down-sampling feature extraction on the second down-sampling feature to obtain a third down-sampling feature;
the full-connection layer performs supervised learning on the third down-sampling features corresponding to the labeled face images, and a first supervised loss function is adopted as a loss function when the full-connection layer performs supervised learning on the third down-sampling features corresponding to the labeled face images;
performing unsupervised learning on the unmarked face image through the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer; the first convolution layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer and the sixth convolution layer adopt unsupervised loss functions for unsupervised learning of the unmarked face images;
when the fully connected layer finishes supervised learning on the third down-sampling features corresponding to the labeled face images and the first convolution layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer and the sixth convolution layer finish unsupervised learning on the unlabeled face images, determining that the face recognition model training is finished, wherein the loss function of the face recognition model is determined according to the following formula:
Loss = γ*LossA + (1-γ)*LossB;
wherein Loss is the loss function of the face recognition model, LossA is the converged first supervised loss function, LossB is the converged unsupervised loss function, γ is a control coefficient, and λ ∈ (0.5, 1);
when the first supervised loss function is converged, the full connection layer finishes supervised learning on the third downsampling features corresponding to the labeled face image; when the unsupervised loss function is converged, the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer perform unsupervised learning on the unmarked face image, and the unsupervised learning is finished; the method for performing unsupervised learning on the unmarked face image through the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer comprises the following steps:
the fourth convolution layer performs up-sampling feature extraction on a third down-sampling feature corresponding to the unmarked face image to obtain a first up-sampling feature;
the fifth convolution layer performs up-sampling feature extraction on the first up-sampling feature to obtain a second up-sampling feature;
the sixth convolution layer performs upsampling feature extraction on the second upsampling feature to obtain a third upsampling feature;
controlling the third downsampling feature and the first upsampling feature by a first loss function;
controlling a second downsampling feature and a second upsampling feature by a second loss function;
controlling the first down-sampling feature and the third up-sampling feature by a third loss function;
if the first loss function, the second loss function and the third loss function are converged, determining that the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer perform unsupervised learning on the unmarked face image, and finishing the unsupervised learning;
the converged unsupervised loss function is equal to the sum of the converged first, second and third loss functions, i.e.:
LossB = Loss1 + Loss2 + Loss3, where Loss1, Loss2 and Loss3 are respectively the first loss function, the second loss function and the third loss function, each of which may adopt a Smooth L1 loss function;
when the first loss function, the second loss function and the third loss function are all converged, determining the output of the third convolution layer corresponding to the labeled face image as a retraining supervised feature;
after the unsupervised learning of the unmarked face image is finished by the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer, the fully-connected layer conducts supervised learning on the retrained supervised feature, the second supervised loss function is adopted by the fully-connected layer for the supervised learning of the retrained supervised feature, and the second supervised loss function is equal to the weighted sum of the first supervised loss function, the first loss function, the second loss function and the third loss function, and is specifically calculated according to the following formula:
Loss1A=λ*LossA+(1-λ)/3*Loss2+(1-λ)/3*Loss3+(1-λ)/3*Loss1;
wherein Loss1A represents a second supervised Loss function;
when the fully connected layer finishes the supervised learning of the retrained supervised sample based on the second supervised loss function, the loss function of the face recognition model is determined according to the following formula:
Loss = γ*Loss1A + (1-γ)*LossB.
2. the method of claim 1, wherein the adjusted feature information is a feature vector of a set of face images.
3. The method of claim 1, wherein identifying user identity information based on the facial feature vector comprises:
matching the face feature vector with a feature vector stored in a large database, wherein the large database comprises a plurality of feature vectors;
and taking the user information corresponding to the feature vector which is successfully matched as the user identity information which needs to be identified.
4. The method of claim 3, wherein matching the face feature vector to a feature vector stored in a large database comprises:
calculating the cross entropy of the face feature vector and each feature vector;
and taking the feature vector corresponding to the minimum value of the cross entropy as the feature vector successfully matched.
5. An online education system, characterized in that the system comprises:
the face acquisition module is used for acquiring a face image of a user and sending the face image to the face recognition module;
the face recognition module is used for inputting the face image into a pre-trained face recognition model, which outputs a face feature vector; identifying user identity information based on the face feature vector, and sending the user identity information to a course management module; the pre-trained face recognition model comprises an upper convolution layer, a lower convolution layer, an adjusting layer and a full connection layer; the lower convolution layer performs down-sampling feature extraction on the face image to extract first feature information; the upper convolution layer performs up-sampling feature extraction on the first feature information to obtain second feature information; the adjusting layer adjusts the first feature information based on the second feature information to obtain adjustment feature information; the full connection layer takes the adjustment feature information as output;
the course management module is used for acquiring courses to be learned by the user according to the identity information of the user and recommending the courses to be learned to the user;
the lower convolutional layer comprises a first convolutional layer, a second convolutional layer and a third convolutional layer, and the upper convolutional layer comprises a fourth convolutional layer, a fifth convolutional layer and a sixth convolutional layer;
the training mode of the face recognition model is as follows:
obtaining a training sample, wherein the training sample comprises a labeled face image and an unlabeled face image;
inputting the training sample into a first convolution layer, and performing downsampling feature extraction on the training sample by the first convolution layer to obtain a first downsampling feature;
the second convolution layer carries out downsampling feature extraction on the first downsampling feature to obtain a second downsampling feature;
the third convolution layer carries out down-sampling feature extraction on the second down-sampling feature to obtain a third down-sampling feature;
the full-connection layer performs supervised learning on the third down-sampling features corresponding to the labeled face images, and a first supervised loss function is adopted as a loss function when the full-connection layer performs supervised learning on the third down-sampling features corresponding to the labeled face images;
performing unsupervised learning on the unmarked face image through the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer; the first convolution layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer and the sixth convolution layer adopt unsupervised loss functions for unsupervised learning of the unmarked face images;
when the fully connected layer finishes supervised learning on the third down-sampling features corresponding to the labeled face images and the first convolution layer, the second convolution layer, the third convolution layer, the fourth convolution layer, the fifth convolution layer and the sixth convolution layer finish unsupervised learning on the unlabeled face images, determining that the face recognition model training is finished, wherein the loss function of the face recognition model is determined according to the following formula:
Loss = γ*LossA + (1-γ)*LossB;
wherein Loss is the loss function of the face recognition model, LossA is the converged first supervised loss function, LossB is the converged unsupervised loss function, γ is a control coefficient, and λ ∈ (0.5, 1);
when the first supervised loss function is converged, the full connection layer finishes supervised learning on the third downsampling features corresponding to the labeled face image; when the unsupervised loss function is converged, the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer perform unsupervised learning on the unmarked face image, and the unsupervised learning is finished; the method for performing unsupervised learning on the unmarked face image through the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer comprises the following steps:
the fourth convolution layer performs up-sampling feature extraction on a third down-sampling feature corresponding to the unmarked face image to obtain a first up-sampling feature;
the fifth convolution layer performs up-sampling feature extraction on the first up-sampling feature to obtain a second up-sampling feature;
the sixth convolution layer performs upsampling feature extraction on the second upsampling feature to obtain a third upsampling feature;
controlling the third downsampling feature and the first upsampling feature by a first loss function;
controlling a second downsampling feature and a second upsampling feature by a second loss function;
controlling the first down-sampling feature and the third up-sampling feature by a third loss function;
if the first loss function, the second loss function and the third loss function are converged, determining that the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer perform unsupervised learning on the unmarked face image, and finishing the unsupervised learning;
the converged unsupervised loss function is equal to the sum of the converged first, second and third loss functions, i.e.:
LossB = Loss1 + Loss2 + Loss3, where Loss1, Loss2 and Loss3 are respectively the first loss function, the second loss function and the third loss function, each of which may adopt a Smooth L1 loss function;
when the first loss function, the second loss function and the third loss function are all converged, determining the output of the third convolution layer corresponding to the labeled face image as a retraining supervised feature;
after the unsupervised learning of the unmarked face image is finished by the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer and the sixth convolutional layer, the fully-connected layer conducts supervised learning on the retrained supervised feature, the second supervised loss function is adopted by the fully-connected layer for the supervised learning of the retrained supervised feature, and the second supervised loss function is equal to the weighted sum of the first supervised loss function, the first loss function, the second loss function and the third loss function, and is specifically calculated according to the following formula:
Loss1A=λ*LossA+(1-λ)/3*Loss2+(1-λ)/3*Loss3+(1-λ)/3*Loss1;
wherein Loss1A represents a second supervised Loss function;
when the fully connected layer finishes the supervised learning of the retrained supervised sample based on the second supervised loss function, the loss function of the face recognition model is determined according to the following formula:
Loss = γ*Loss1A + (1-γ)*LossB.
6. the system of claim 5, wherein the adjusted feature information is a feature vector of a set of face images.
7. The system of claim 5, wherein identifying user identity information based on the facial feature vector comprises:
matching the face feature vector with a feature vector stored in a large database, wherein the large database comprises a plurality of feature vectors;
and taking the user information corresponding to the feature vector which is successfully matched as the user identity information which needs to be identified.
8. The system of claim 7, wherein matching the face feature vector to a feature vector stored in a large database comprises:
calculating the cross entropy of the face feature vector and each feature vector;
and taking the feature vector corresponding to the minimum value of the cross entropy as the feature vector successfully matched.
CN202011384120.8A 2020-11-30 2020-11-30 Face recognition method and online education system Active CN112364831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011384120.8A CN112364831B (en) 2020-11-30 2020-11-30 Face recognition method and online education system

Publications (2)

Publication Number Publication Date
CN112364831A (en) 2021-02-12
CN112364831B (en) 2022-02-25

Family

ID=74536923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011384120.8A Active CN112364831B (en) 2020-11-30 2020-11-30 Face recognition method and online education system

Country Status (1)

Country Link
CN (1) CN112364831B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220130
Address after: 100020 No. 2806, floor 24, building 12, yard 93, Jianguo Road, Chaoyang District, Beijing
Applicant after: Beijing wisdom Rongsheng Technology Co.,Ltd.
Address before: 556400 No.1 jiangjunshi Road, Qingzhen vocational education East District, Guiyang City, Guizhou Province
Applicant before: Jiang Peisheng
GR01 Patent grant