CN116758617B - Campus student check-in method and campus check-in system under low-illuminance scene - Google Patents

Campus student check-in method and campus check-in system under low-illuminance scene

Info

Publication number
CN116758617B
CN116758617B (application CN202311027739.7A)
Authority
CN
China
Prior art keywords
low
data set
face
image
campus
Prior art date
Legal status
Active
Application number
CN202311027739.7A
Other languages
Chinese (zh)
Other versions
CN116758617A (en)
Inventor
肖芸
李武
云贵全
Current Assignee
Sichuan Information Technology College
Original Assignee
Sichuan Information Technology College
Priority date
Filing date
Publication date
Application filed by Sichuan Information Technology College filed Critical Sichuan Information Technology College
Priority to CN202311027739.7A priority Critical patent/CN116758617B/en
Publication of CN116758617A publication Critical patent/CN116758617A/en
Application granted granted Critical
Publication of CN116758617B publication Critical patent/CN116758617B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0499 - Feedforward networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/088 - Non-supervised learning, e.g. competitive learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a campus student check-in method and a campus check-in system for low-illuminance scenes. The check-in method comprises the following steps: performing low-illuminance enhancement on the campus face image by using a self-calibration technique; creating a face check-in encoder with a Transformer neural network model and encoding the image; performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by using a gradient reversal technique; creating a decoder with a Transformer neural network model and implementing a low-illuminance detection head; and training and saving the model with an unsupervised domain adaptation technique to obtain the face check-in result in the low-illuminance scene. The check-in system comprises a processing unit, an encoding unit, an alignment unit and an identification unit that together implement the check-in method. Through unsupervised domain adaptation, the application can effectively detect objects in low-illuminance images and significantly reduce the model's dependence on labeled samples, and the overall attributes of the image can be aligned to reduce feature bias.

Description

Campus student check-in method and campus check-in system under low-illuminance scene
Technical Field
The application relates to the technical field of face recognition check-in, and in particular to a campus student check-in method and a campus check-in system for low-illuminance scenes.
Background
Face recognition check-in is a basic task of computer vision, widely applied in industrial scenarios such as attendance check-in, automatic driving and scene understanding. Low-light environments are an unavoidable part of everyday activity, yet they pose a significant challenge to computer vision. Images captured at night or in fog typically suffer from low contrast, low brightness, noise and blur caused by insufficient light. Such images directly degrade the performance of existing face check-in models, resulting in significant detection errors. Despite major breakthroughs in the face recognition field, existing research has focused on well-lit images rather than dim ones. A campus face check-in method suited to low-illuminance images is therefore very important for applying artificial intelligence on campus. Current face recognition systems for low-illuminance scenes fall mainly into three categories. (1) Detection methods based on image enhancement: to obtain reliable detection under adverse conditions such as night or cloudy weather, the low-light image is first pre-processed to improve brightness and contrast, and detection is then performed on the enhanced image. (2) End-to-end detection methods: these build a detection model with supervised learning and require large amounts of annotated training data that are expensive and time-consuming to collect. (3) Unsupervised domain-adaptive detection methods: a labeled data set serves as the source domain and an unlabeled data set as the target domain, helping the model learn domain-invariant feature representations at the domain or class level. When the low-illuminance data set has no or few labels, such methods can transfer features learned from normal-illumination data to low-illuminance image detection.
Disclosure of Invention
The application aims to solve at least one of the following technical problems in the prior art:
(1) Low-illuminance scenes cannot be handled. In existing campus face check-in systems, images captured in fog or at night exhibit low contrast, low brightness, noise and blur caused by insufficient light. Such images directly degrade the performance of existing object detection models, resulting in significant detection errors.
(2) Global image feature mismatch. With a contrastive loss, the distributions of normally illuminated source-domain images and low-illuminance target-domain images may not match exactly at the global image level, because the two domains have different scene layouts and object combinations.
(3) Local image feature mismatch. Forcing local features, such as the texture and color of the normally illuminated source-domain image and the low-illuminance target-domain image, to match perfectly may fail because the class-level semantics of the two images deviate. Moreover, most current approaches use only local or only global feature alignment.
Therefore, the first aspect of the application provides a campus student check-in method for low-illuminance scenes.
The second aspect of the application provides a campus check-in system.
The application provides a campus student check-in method for low-illuminance scenes, which comprises the following steps:
S1, performing low-illuminance enhancement on a campus face image by using a self-calibration technique;
S2, creating a face check-in encoder by using a Transformer neural network model and encoding the image;
S3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by using a gradient reversal technique;
S4, creating a decoder by using a Transformer neural network model and implementing a low-illuminance detection head;
S5, training and saving the model by using an unsupervised domain adaptation technique to obtain the face check-in result in the low-illuminance scene.
According to the above technical scheme, the campus student check-in method for low-illuminance scenes can further have the following additional technical features:
in the above technical solution, step S1 includes:
s11, establishing a low-illuminance face recognition data set, extracting at least part of data in the low-illuminance face recognition data set as a source domain data set, and taking the rest data in the low-illuminance face recognition data set as a target domain data set;
s12, performing low-light image enhancement on the images in the source domain data set by using a homomorphic filtering model, learning the illuminance relation between the low-illuminance images and the expected clear images, performing illuminance estimation while enhancing the images, acquiring enhanced output brightness by removing the estimated illuminance, and establishing an illuminance learning relation according to a homomorphic filtering theory;
s13, self-calibrating is conducted on the relationship of the contrast degree; firstly, defining a self-calibration module to enable each stage in the low-light image enhancement process to converge to the same state; defining the input of each previous stage as low light observation, bridging the input of each stage; then, introducing a self-calibration map and adding it to low-light observations, presenting an illuminance difference between the input in each stage and the first stage; finally, a self-calibration model is formed.
In the above technical solution, step S1 further includes:
s14, training a self-calibration model; unsupervised learning is employed to enhance the network learning capability, where fidelity is defined as the total loss of self-calibration model.
In the above technical solution, step S2 includes:
s21, creating a face sign-in encoder; the encoder is a bidirectional encoding structure based on a converter neural network, and comprises a multi-head attention and a feedforward neural network;
s22, dividing an image in the source domain data set into a plurality of image blocks;
s23, inputting each image block into a face sign-in encoder to perform vector calculation, obtaining a multi-head attention vector of a face image, and creating a multi-head attention moment array to calculate the attention score of the multi-head attention vector of the face image;
s24, carrying out normalization processing on the attention score to obtain the attention score after normalization processing;
s25, inputting the attention score subjected to normalization processing into a feedforward neural network FFN for linear transformation to obtain a shallow face image feature vector of the low-illumination face image.
In the above technical solution, in step S3, performing the multi-scale local feature alignment includes:
S31, sending the shallow face image feature vectors of the low-illumination face image into the gradient reversal layer GRL and using an adversarial learning strategy to reduce the loss of the multi-scale local feature alignment module during forward propagation; during backward propagation, the gradient reversal layer GRL multiplies the incoming error by a negative scalar to increase the loss of the multi-scale local feature alignment module, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s32, feeding the characteristic diagram generated in the S31 into a plurality of convolution layers with different channel sizes;
s33, inputting the feature map processed in the S32 into a corresponding domain classification layer, and training a loss function of a multi-scale local feature alignment module by using a least square method, wherein the loss function of the multi-scale local feature alignment module comprises local feature alignment loss of a source domain data set and local feature alignment loss of a target domain data set.
In the above technical solution, in step S3, the multi-scale global feature alignment includes:
S34, sending the output vectors after multi-scale local feature alignment into the gradient reversal layer GRL and using an adversarial learning strategy, wherein the gradient reversal layer GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and maximizes it during backward propagation by multiplying the incoming error by a negative scalar, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s35, feeding the characteristic diagram generated in the S34 into a plurality of convolution layers with different channel sizes;
s36, inputting the feature map processed in the S35 into a domain classification layer, so that a domain classifier cannot distinguish whether the features come from a source domain data set or a target domain data set, and training a loss function of a multi-scale global feature alignment module by using a least square method, wherein the loss function of the multi-scale global feature alignment module comprises global feature alignment loss of the source domain data set and global feature alignment loss of the target domain data set.
In the above technical solution, step S4 includes:
s41, creating a face sign-in decoder; the decoder is a bidirectional coding structure based on a converter neural network; the decoder includes a multi-headed attention and feed-forward neural network;
s42, the feature vector obtained in the step S3 is sent to a decoder;
s43, traversing the decoder to realize a face detection head network;
s44, creating a face sign-in detection head network, and obtaining a weight vector and a deviation term by training a fully-connected neural network; and calculating face recognition effect loss estimation by using a cross entropy loss function according to the prediction result of the detection network.
In the above technical solution, step S43 includes:
s431, traversing each decoder layer in turn to obtain an attention score;
s432, creating a multi-head attention matrix, wherein the multi-head attention moment matrix comprises a query matrix, a key matrix and a value matrix;
s433, calculating attention scores according to the multi-head attention matrix;
s434, carrying out normalization processing on the attention score to obtain a normalized attention score;
s435, inputting the attention score after normalization processing into a feedforward neural network, and outputting an image semantic vector through the feedforward neural network.
In the above technical solution, step S5 includes:
s51, performing self-adaptive training in an unsupervised domain; firstly, training a model by using a source domain data set, and storing the model and setting the model as a source domain after training; then, the learned knowledge of the model is migrated to a target domain data set and set as a target domain; finally, testing the model by using the target domain data set;
s52, recognizing a face sign-in; and (5) classifying and regressing the output vector of the model in the step (S51) by using a fully connected neural network to obtain a detection result of face recognition.
The application also provides a campus check-in system that adopts the method of any one of the above technical solutions to realize campus student check-in under low-illuminance scenes, comprising:
the processing unit is used for enhancing low-illuminance of the campus face image by using a self-calibration technology, and a campus face image data preprocessing module is established;
the encoding unit is used for creating a face sign-in encoder by using the converter neural network model, encoding the image and creating a campus face encoding unit;
the alignment unit is used for carrying out multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology, and establishing a campus face alignment unit;
and the identification unit is used for creating a decoder by using the converter neural network model, realizing a low-light detection head, training and storing the model by using an unsupervised domain self-adaptive technology, and obtaining a face sign-in result in a low-light scene.
In summary, owing to the above technical features, the application has the following beneficial effects:
(1) The present application improves a generic object detection network by using a normal illumination image as a source domain and a low illumination image as a target domain. The application can effectively detect the object in the low-illumination image through unsupervised self-adaptation, and obviously reduce the dependence of the model on the sample.
(2) The present application develops a new domain-adaptive multi-scale local feature alignment module and a multi-scale global feature alignment module. Multi-scale local feature alignment is performed on the feature map to align the receptive fields in the feature map, thereby reducing low-level feature bias; multi-scale global (image-level) feature alignment is performed on the feature map to align the overall attributes of the image, thereby reducing feature deviations in background, scene, target layout and the like.
(3) Based on comprehensive evaluation and comparison with the current method, the method provided by the application has the advantages that the performance is improved in the low-illumination campus face check-in, and the method provided by the application has good generalization capability.
Additional aspects and advantages of the application will be set forth in part in the description which follows, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
fig. 1 is a flow chart of a campus student check-in method in a low-light scene according to an embodiment of the present application;
fig. 2 is a block diagram of a campus check-in system according to an embodiment of the present application.
The correspondence between the reference numerals and the component names in fig. 1 to 2 is:
210: processing unit; 220: encoding unit; 230: alignment unit; 240: identification unit.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced otherwise than as described herein, and therefore the scope of the present application is not limited to the specific embodiments disclosed below.
A campus student check-in method and a campus check-in system in a low-light scene according to some embodiments of the present application are described below with reference to fig. 1 to 2.
Some embodiments of the application provide a campus student check-in method in a low-light-intensity scene.
As shown in fig. 1, a first embodiment of the present application provides a campus student check-in method under a low-light scene, including the following steps:
s1, performing low-illuminance enhancement on a campus face image by using a self-calibration technology; specifically, step S1 includes:
s11, establishing a low-illuminance face recognition data set, extracting at least part of data in the low-illuminance face recognition data set as a source domain data set, and taking the rest data in the low-illuminance face recognition data set as a target domain data set; in a specific embodiment, the first 80% of the extracted low-light face recognition dataset is set as the source domain dataset X S Then, the remaining 20% of the low-illuminance face recognition data set is extracted and set as the target domain data set X T The method comprises the steps of carrying out a first treatment on the surface of the Finally, the target domain data set X T The tag in (a) is deleted.
S12, using the homomorphic filtering model to perform low-light image enhancement on the images in the source domain data set X_S, and learning the illuminance relationship between the low-illuminance image and the expected clear image:

y = a ⊗ c

where a is the expected clear image, y is the low-illuminance image, and c is the adjustable illuminance. Then, illuminance estimation is carried out while the image is enhanced; the enhanced output brightness is obtained by removing the estimated illuminance, and the illuminance learning relationship is established according to homomorphic filtering theory, introducing a parameter θ to learn the illuminance mapping. The illuminance learning relationship F is:

F(x_t): u_t = H_θ(x_t), x_{t+1} = x_t + u_t, x_0 = y

where u_t denotes the residual term of the t-th stage (t = 0, …, T-1), T is the total number of stages, x_t denotes the illuminance at the t-th stage, and y is the low-illuminance image. A weight-sharing mechanism is employed, i.e., the same architecture H and weights θ are used in each stage.
S13, performing self-calibration on the illuminance relationship. First, a self-calibration module is defined so that each stage of the low-light image enhancement process converges to the same state, and the input of each stage is bridged by expressing it in terms of the low-light observation. Then, self-calibration maps s and v are introduced and added to the low-light observation, expressing the illuminance difference between the input of each stage and that of the first stage:

z_t = y ⊘ x_t,  s_t = K_ϑ(z_t),  v_t = y + s_t  (t ≥ 1)

where z_t is the clear-image estimate fed to each stage, v_t is the converted input of each stage, s_t is the self-calibration map of each stage, and K_ϑ is an introduced parameterized operator with learnable parameters; this yields the self-calibration model.
Finally, the illuminance learning relationship of the self-calibration model, i.e., the basic unit converted to the t-th stage (t ≥ 1), can be written as:

F(v_t): u_t = H_θ(v_t), x_{t+1} = v_t + u_t, with v_0 = y.
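The stage unit and the self-calibration operator can be sketched in PyTorch as follows (a minimal reading of the formulas above; the convolutional architectures of H_θ and K_ϑ are illustrative assumptions, not the patent's actual networks):

```python
import torch
import torch.nn as nn

class IlluminationStage(nn.Module):
    """One weight-shared stage: u_t = H_theta(x_t), x_{t+1} = x_t + u_t."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.h = nn.Sequential(  # H_theta, shared across all stages
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        return x + self.h(x)  # residual illuminance update

class SelfCalibration(nn.Module):
    """K_theta: z_t = y / x_t, s_t = K(z_t), v_t = y + s_t."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.k = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1))

    def forward(self, y, x):
        z = y / x.clamp(min=1e-4)  # clear-image estimate z_t
        return y + self.k(z)       # converted input v_t
```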
in some embodiments, step S1 further comprises:
s14, training a self-calibration model; in view of the inaccuracy of existing training methods, unsupervised learning is employed to enhance the ability of the network to learn, where the total loss of the model is defined as L f ,L f Representing fidelity, then there are:
where d is the predicted illuminance result, e t-1 And outputting the result of the calibration.
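Under the same reading, the fidelity loss pairs each stage's predicted illuminance with the previous stage's calibrated output (a sketch; the squared-error form is an assumption consistent with the definition above):

```python
import torch

def fidelity_loss(pred_illum, calib_out):
    """L_f = sum_t || d_t - e_{t-1} ||^2; `pred_illum` holds d_t for
    t = 1..T and `calib_out` holds the matching e_{t-1} values."""
    return sum(torch.mean((d - e) ** 2) for d, e in zip(pred_illum, calib_out))
```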
S2, creating a face check-in encoder by using the Transformer neural network model and encoding the image; specifically, step S2 includes:
S21, creating the face check-in encoder; the encoders are bidirectional encoding structures based on the Transformer neural network, in one particular embodiment 12 in number, each comprising a multi-head attention module and a feed-forward neural network;
s22, dividing the image in the source domain data set Xs into a plurality of image blocks;
s23, inputting each image block into a face sign-in encoder to perform vector calculation, obtaining a multi-head attention vector of a face image, and creating a multi-head attention moment array to calculate the attention score of the multi-head attention vector of the face image; the calculation method comprises the following steps:
multihead = Concat(head 1 , …, head h )×W Q
wherein multihead represents image multi-head attention, concat () represents image attention connection function, head i Represents the i-th image attention, h represents the vector size, W Q Representing the weight vector, Q representing the query matrix, softmax (.) representing the normalization function, K T Representing the transposed key matrix, V representing the value matrix; d, d k Representing the dimensions of the key matrix.
S24, normalizing the attention scores to obtain the normalized attention scores:

M = U + Sublayer(U)

where M denotes the attention score vector after the residual connection, U denotes the face image attention, and Sublayer(·) denotes the residual sub-layer transformation.
S25, inputting the normalized attention score vector M into the feed-forward neural network FFN for linear transformation to obtain the shallow face image feature vector of the low-illumination face image. Specifically:

FFN(M) = max(0, M·W_1 + b_1) × W_2 + b_2

where max(0, ·) denotes the ReLU activation function of the neurons, W_1 denotes the layer-1 weights, W_2 denotes the layer-2 weights, b_1 denotes the parameters to be learned of the first layer, and b_2 denotes the parameters to be learned of the second layer.
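Steps S23 to S25 together form one standard Transformer encoder block; a minimal PyTorch sketch follows (the 768-dimensional width is taken from the detection-head description below, and the layer-normalization placement is an assumption):

```python
import torch
import torch.nn as nn

class CheckInEncoderLayer(nn.Module):
    """Multi-head attention -> residual + norm -> FFN, as in S23-S25."""
    def __init__(self, dim=768, heads=12, ffn_dim=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(  # FFN(M) = max(0, M W1 + b1) W2 + b2
            nn.Linear(dim, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, u):  # u: (batch, patches, dim)
        m = self.norm1(u + self.attn(u, u, u, need_weights=False)[0])  # M = U + Sublayer(U)
        return self.norm2(m + self.ffn(m))  # shallow face feature vector
```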
S3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by using the gradient reversal technique; specifically, in step S3, performing the multi-scale local feature alignment includes:
S31, sending the shallow face image feature vectors of the low-illumination face image into the gradient reversal layer GRL and using an adversarial learning strategy: during forward propagation, the loss of the multi-scale local feature alignment module is minimized; during backward propagation, the gradient reversal layer GRL multiplies the incoming error by a negative scalar to maximize the loss of the multi-scale local feature alignment module, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s32, feeding the feature map generated in the S31 into a plurality of convolution layers with different channel sizes so as to improve domain invariance of features obtained by a feature extraction network;
s33, inputting the feature map processed in the S32 into a corresponding domain classification layer DC, and training a loss function of a multi-scale local feature alignment module by using a least square method, wherein the loss function of the multi-scale local feature alignment moduleIncluding local feature alignment loss of a source domain data set and local feature alignment loss of a target domain data setLoss of function. The calculation method comprises the following steps:
wherein the method comprises the steps ofAnd->Is the local feature alignment loss in the source domain and the target domain,/->Is the i-th image of the source field input, < >>Is the i-th image of the target field input, < >>Is the j-th multiscale local feature extractor, < ->Is the output of the jth multi-scale domain classifier layer, W and H are the width and height of the feature map, W and H are the parameters representing the values of the width and height of the feature map, and>and->The total number of normally illuminated images and the total number of low illuminated images, respectively.
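A sketch of the gradient reversal layer and the least-squares local alignment loss described above (PyTorch is assumed; the 1×1-convolution domain classifier in the usage comment is an illustrative choice, not the patent's architecture):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by a
    negative scalar (-lamb) in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb=1.0):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None

def local_alignment_loss(d_src, d_tgt):
    """Least-squares loss over the W x H domain-classifier maps:
    source outputs pushed toward 0, target outputs toward 1."""
    return (d_src ** 2).mean() + ((1.0 - d_tgt) ** 2).mean()

# usage: d_map = torch.sigmoid(dc_conv(GradReverse.apply(feat, 1.0)))
# with dc_conv = torch.nn.Conv2d(256, 1, kernel_size=1) as the domain classifier DC
```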
In step S3, the multi-scale global feature alignment includes:
S34, sending the output vectors after multi-scale local feature alignment into the gradient reversal layer GRL and using an adversarial learning strategy, wherein the gradient reversal layer GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and maximizes it during backward propagation by multiplying the incoming error by a negative scalar, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s35, feeding the characteristic diagram generated in the S34 into a plurality of convolution layers with different channel sizes;
s36, inputting the feature map processed in the S35 into a field classification layer DC, so that the field classifier cannot distinguish whether the features come from a source field data set or a target field data set, and the field invariance of the generated network is improved; training a loss function of a multi-scale global feature alignment module by using a least square methodIncluding global feature alignment loss for the source domain dataset and global feature alignment loss for the target domain dataset:
wherein,is the i-th image of the source field input, < >>Is the i-th image of the target field input, < >>Is the j-th multi-scale global feature extractor, < >>Is the output of the j-th multi-scale domain classification layer,/->And->The total number of normally illuminated images and the total number of low illuminated images, respectively.
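The global branch is analogous but pools each classifier map to a single image-level prediction; a short sketch under the same assumptions as above:

```python
import torch

def global_alignment_loss(map_src, map_tgt):
    """Pool each W x H map to one image-level domain score, then apply
    the least-squares alignment loss (source -> 0, target -> 1)."""
    d_src = map_src.mean(dim=(2, 3))  # global average pooling per image
    d_tgt = map_tgt.mean(dim=(2, 3))
    return (d_src ** 2).mean() + ((1.0 - d_tgt) ** 2).mean()
```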
S4, creating a decoder by using the Transformer neural network model and implementing the low-illuminance detection head; specifically, step S4 includes:
S41, creating a face check-in decoder; the decoder is a bidirectional coding structure based on the Transformer neural network; in a specific embodiment, the number of decoders is 12, each comprising a multi-head attention module and a feed-forward neural network;
s42, the feature vector obtained in the step S3 is sent to a decoder;
s43, traversing the decoder to realize a face detection head network; the method specifically comprises the following steps:
s431, traversing each decoder layer in turn to obtain an attention score;
s432, creating a multi-head attention matrix, wherein the multi-head attention moment matrix comprises a query matrix Q, a key matrix K and a value matrix V;
s433, calculating attention scores according to the multi-head attention matrix;
s434, carrying out normalization processing on the attention score to obtain a normalized attention score;
s435, inputting the attention score after normalization processing into a feedforward neural network, and outputting an image semantic vector through the feedforward neural network.
S44, creating the face check-in detection head network, and obtaining the weight vector W and the bias term b by training a fully connected neural network. For the input vector g of the first layer of the fully connected neural network, the number of input neurons is 768 and the number of output neurons is 2. According to the prediction results of the detection network, the face recognition effect loss estimate is calculated with a cross-entropy loss function. The forward propagation function P of the fully connected network and the loss function L_f1 used in this calculation are:

P = f(W·g + b)

L_f1 = (1/S) Σ_{i=1..S} L_i,  with L_i = -[ y_i·log(p_i) + (1 - y_i)·log(1 - p_i) ]

where f is the network activation function; g is a face sample; S is the number of samples; L_i is the loss of the i-th sample; y_i is the label of sample i (1 for a positive sample, 0 for a negative one); p_i is the probability that sample i is predicted positive; and N is the number of categories (here N = 2).
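The 768-to-2 head and its cross-entropy loss can be sketched as follows (PyTorch assumed; `nn.CrossEntropyLoss` reduces to the binary form above for two classes):

```python
import torch
import torch.nn as nn

head = nn.Linear(768, 2)             # 768 input neurons, 2 output neurons
criterion = nn.CrossEntropyLoss()    # L_i = -log p_{i, y_i}

g = torch.randn(16, 768)             # a batch of 16 face feature vectors
labels = torch.randint(0, 2, (16,))  # 1 = positive sample, 0 = negative
loss = criterion(head(g), labels)    # face recognition effect loss estimate
```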
S5, training and saving the model by using the unsupervised domain adaptation technique to obtain the face check-in result in the low-illuminance scene. Specifically, step S5 includes:
s51, performing self-adaptive training in an unsupervised domain; firstly, training a model by using a source domain data set, and storing the model and setting the model as a source domain after training; then, the learned knowledge of the model is migrated to a target domain data set and set as a target domain; finally, testing the model by using the target domain data set; the formula of the unsupervised domain adaptive training is as follows:
wherein,for final delivery of the modelGo out vector (I),>for knowledge learned in the source domain dataset, < > for>For knowledge learned on the target domain dataset.
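The overall train-on-source, align, test-on-target flow can be sketched as below (the `detection_loss` and `alignment_loss` methods are hypothetical placeholders standing in for the supervised and adversarial losses defined earlier, not the patent's code):

```python
import torch

def train_uda(model, src_loader, tgt_loader, epochs=10, lr=1e-4, grl_lambda=1.0):
    """Supervised training on the labeled source domain X_S plus adversarial
    feature alignment against the unlabeled target domain X_T."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (xs, ys), (xt, _) in zip(src_loader, tgt_loader):
            det = model.detection_loss(xs, ys)                # labels only on X_S
            align = model.alignment_loss(xs, xt, grl_lambda)  # local + global, via GRL
            loss = det + align
            opt.zero_grad()
            loss.backward()
            opt.step()
    torch.save(model.state_dict(), "checkin_source_model.pt")  # saved source model
```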
S52, face check-in recognition; classifying and regressing the output vector of the model from step S51 with a fully connected neural network to obtain the detection result of face recognition. The face recognition classification Y is computed as:

Y = f(K·X + h)

where f(·) is the network activation function, K denotes the weight matrix, X is the semantic representation vector of the face image, and h is a model parameter to be learned.
The second embodiment of the present application provides a campus check-in system, as shown in fig. 2, which implements campus student check-in under a low-illuminance scene by the method described in the above embodiment, and includes:
the processing unit 210, which performs low-illuminance enhancement on the campus face image by using the self-calibration technique and establishes a campus face image data preprocessing module;
the encoding unit 220, which creates a face check-in encoder with the Transformer neural network model, encodes the image, and creates a campus face encoding unit;
the alignment unit 230, which performs multi-scale local feature alignment and multi-scale global feature alignment on the encoder by using the gradient reversal technique and establishes a campus face alignment unit;
the recognition unit 240, which creates a decoder with the Transformer neural network model, implements the low-illuminance detection head, trains and saves the model with the unsupervised domain adaptation technique, and obtains the face check-in result in the low-illuminance scene.
In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (5)

1. A campus student check-in method for low-illuminance scenes, characterized by comprising the following steps:
S1, performing low-illuminance enhancement on a campus face image by using a self-calibration technique;
S2, creating a face check-in encoder by using a Transformer neural network model and encoding the image;
S3, performing multi-scale local feature alignment and multi-scale global feature alignment on the encoder by using a gradient reversal technique;
S4, creating a decoder by using a Transformer neural network model and implementing a low-illuminance detection head;
S5, training and saving the model by using an unsupervised domain adaptation technique to obtain the face check-in result in the low-illuminance scene;
wherein, step S1 includes:
s11, establishing a low-illuminance face recognition data set, extracting at least part of data in the low-illuminance face recognition data set as a source domain data set, and taking the rest data in the low-illuminance face recognition data set as a target domain data set;
s12, performing low-light image enhancement on the images in the source domain data set by using a homomorphic filtering model, learning the illuminance relation between the low-illuminance images and the expected clear images, performing illuminance estimation while enhancing the images, acquiring enhanced output brightness by removing the estimated illuminance, and establishing an illuminance learning relation according to a homomorphic filtering theory;
s13, self-calibrating is conducted on the relationship of the contrast degree; firstly, defining a self-calibration module to enable each stage in the low-light image enhancement process to converge to the same state; defining the input of each previous stage as low light observation, bridging the input of each stage; then, introducing a self-calibration map and adding it to low-light observations, presenting an illuminance difference between the input in each stage and the first stage; finally, forming a self-calibration model;
s14, training a self-calibration model; enhancing the network learning capability by adopting unsupervised learning, wherein the fidelity is defined as the total loss of the self-calibration model;
the step S2 comprises the following steps:
s21, creating a face sign-in encoder; the encoder is a bidirectional encoding structure based on a converter neural network, and comprises a multi-head attention and a feedforward neural network;
s22, dividing an image in the source domain data set into a plurality of image blocks;
s23, inputting each image block into a face sign-in encoder to perform vector calculation, obtaining a multi-head attention vector of a face image, and creating a multi-head attention moment array to calculate the attention score of the multi-head attention vector of the face image;
s24, carrying out normalization processing on the attention score to obtain the attention score after normalization processing;
s25, inputting the attention score subjected to normalization processing into a feedforward neural network FFN for linear transformation to obtain a shallow face image feature vector of the low-illumination face image;
the step S4 includes:
s41, creating a face sign-in decoder; the decoder is a bidirectional coding structure based on a converter neural network; the decoder includes a multi-headed attention and feed-forward neural network;
s42, the feature vector obtained in the step S3 is sent to a decoder;
s43, traversing the decoder to realize a face detection head network;
s44, creating a face sign-in detection head network, and obtaining a weight vector and a deviation term by training a fully-connected neural network; calculating face recognition effect loss estimation by using a cross entropy loss function according to the prediction result of the detection network;
step S43 includes:
s431, traversing each decoder layer in turn to obtain an attention score;
s432, creating a multi-head attention matrix, wherein the multi-head attention moment matrix comprises a query matrix, a key matrix and a value matrix;
s433, calculating attention scores according to the multi-head attention matrix;
s434, carrying out normalization processing on the attention score to obtain a normalized attention score;
s435, inputting the attention score after normalization processing into a feedforward neural network, and outputting an image semantic vector through the feedforward neural network.
2. The campus student check-in method for low-illuminance scenes as claimed in claim 1, wherein in step S3, performing the multi-scale local feature alignment comprises:
S31, sending the shallow face image feature vectors of the low-illumination face image into the gradient reversal layer GRL and using an adversarial learning strategy to reduce the loss of the multi-scale local feature alignment module during forward propagation; during backward propagation, the gradient reversal layer GRL multiplies the incoming error by a negative scalar to increase the loss of the multi-scale local feature alignment module, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
s32, feeding the characteristic diagram generated in the S31 into a plurality of convolution layers with different channel sizes;
s33, inputting the feature map processed in the S32 into a corresponding domain classification layer, and training a loss function of a multi-scale local feature alignment module by using a least square method, wherein the loss function of the multi-scale local feature alignment module comprises local feature alignment loss of a source domain data set and local feature alignment loss of a target domain data set.
3. The campus student check-in method for low-illuminance scenes as claimed in claim 2, wherein in step S3, the multi-scale global feature alignment comprises:
S34, sending the output vectors after multi-scale local feature alignment into the gradient reversal layer GRL and using an adversarial learning strategy, wherein the gradient reversal layer GRL minimizes the loss of the multi-scale global feature alignment module during forward propagation and maximizes it during backward propagation by multiplying the incoming error by a negative scalar, thereby reducing the low-level feature differences between the source domain data set and the target domain data set;
S35, feeding the feature map generated in S34 into a plurality of convolution layers with different channel sizes;
s36, inputting the feature map processed in the S35 into a domain classification layer, so that a domain classifier cannot distinguish whether the features come from a source domain data set or a target domain data set, and training a loss function of a multi-scale global feature alignment module by using a least square method, wherein the loss function of the multi-scale global feature alignment module comprises global feature alignment loss of the source domain data set and global feature alignment loss of the target domain data set.
4. The campus student check-in method for low-illuminance scenes as claimed in claim 3, wherein step S5 comprises:
S51, performing unsupervised domain-adaptive training; firstly, training a model with the source domain data set, then saving the trained model and setting it as the source domain; next, migrating the knowledge learned by the model to the target domain data set, which is set as the target domain; finally, testing the model with the target domain data set;
S52, face check-in recognition; classifying and regressing the output vector of the model from step S51 with a fully connected neural network to obtain the detection result of face recognition.
5. A campus check-in system, characterized in that the method of any one of claims 1 to 4 is used to realize campus student check-in under a low-illuminance scene, the system comprising:
the processing unit is used for enhancing low-illuminance of the campus face image by using a self-calibration technology, and a campus face image data preprocessing module is established;
the encoding unit is used for creating a face sign-in encoder by using the converter neural network model, encoding the image and creating a campus face encoding unit;
the alignment unit is used for carrying out multi-scale local feature alignment and multi-scale global feature alignment on the encoder by utilizing a gradient inversion technology, and establishing a campus face alignment unit;
and the identification unit is used for creating a decoder by using the converter neural network model, realizing a low-light detection head, training and storing the model by using an unsupervised domain self-adaptive technology, and obtaining a face sign-in result in a low-light scene.
CN202311027739.7A 2023-08-16 2023-08-16 Campus student check-in method and campus check-in system under low-illuminance scene Active CN116758617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311027739.7A CN116758617B (en) 2023-08-16 2023-08-16 Campus student check-in method and campus check-in system under low-illuminance scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311027739.7A CN116758617B (en) 2023-08-16 2023-08-16 Campus student check-in method and campus check-in system under low-illuminance scene

Publications (2)

Publication Number Publication Date
CN116758617A CN116758617A (en) 2023-09-15
CN116758617B true CN116758617B (en) 2023-11-10

Family

ID=87953595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311027739.7A Active CN116758617B (en) 2023-08-16 2023-08-16 Campus student check-in method and campus check-in system under low-illuminance scene

Country Status (1)

Country Link
CN (1) CN116758617B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807740A (en) * 2019-09-17 2020-02-18 北京大学 Image enhancement method and system for window image of monitoring scene
CN113052210A (en) * 2021-03-11 2021-06-29 北京工业大学 Fast low-illumination target detection method based on convolutional neural network
CN113111947A (en) * 2021-04-16 2021-07-13 北京沃东天骏信息技术有限公司 Image processing method, apparatus and computer-readable storage medium
CN113269903A (en) * 2021-05-24 2021-08-17 上海应用技术大学 Face recognition class attendance system
CN113902915A (en) * 2021-10-12 2022-01-07 江苏大学 Semantic segmentation method and system based on low-illumination complex road scene
CN114998145A (en) * 2022-06-07 2022-09-02 湖南大学 Low-illumination image enhancement method based on multi-scale and context learning network
CN115861101A (en) * 2022-11-29 2023-03-28 福州大学 Low-illumination image enhancement method based on depth separable convolution
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism
WO2023092386A1 (en) * 2021-11-25 2023-06-01 中国科学院深圳先进技术研究院 Image processing method, terminal device, and computer readable storage medium
CN116580243A (en) * 2023-05-24 2023-08-11 北京理工大学 Cross-domain remote sensing scene classification method for mask image modeling guide domain adaptation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10997690B2 (en) * 2019-01-18 2021-05-04 Ramot At Tel-Aviv University Ltd. Method and system for end-to-end image processing

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807740A (en) * 2019-09-17 2020-02-18 北京大学 Image enhancement method and system for window image of monitoring scene
CN113052210A (en) * 2021-03-11 2021-06-29 北京工业大学 Fast low-illumination target detection method based on convolutional neural network
CN113111947A (en) * 2021-04-16 2021-07-13 北京沃东天骏信息技术有限公司 Image processing method, apparatus and computer-readable storage medium
CN113269903A (en) * 2021-05-24 2021-08-17 上海应用技术大学 Face recognition class attendance system
CN113902915A (en) * 2021-10-12 2022-01-07 江苏大学 Semantic segmentation method and system based on low-illumination complex road scene
WO2023092386A1 (en) * 2021-11-25 2023-06-01 中国科学院深圳先进技术研究院 Image processing method, terminal device, and computer readable storage medium
CN114998145A (en) * 2022-06-07 2022-09-02 湖南大学 Low-illumination image enhancement method based on multi-scale and context learning network
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism
CN115861101A (en) * 2022-11-29 2023-03-28 福州大学 Low-illumination image enhancement method based on depth separable convolution
CN116580243A (en) * 2023-05-24 2023-08-11 北京理工大学 Cross-domain remote sensing scene classification method for mask image modeling guide domain adaptation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Low-Light Image Enhancement Combined with Attention Map and U-Net Network; Weiji He et al.; 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE); 397-401 *
Multi-Scale Feature Guided Low-Light Image Enhancement; Lanqing Guo et al.; 2021 IEEE International Conference on Image Processing (ICIP); 554-558 *
Research on Key Technologies of Image Enhancement and Recognition under Low-Light Conditions; Liang Jinxiu; China Doctoral Dissertations Full-text Database (Information Science and Technology), No. 01; I138-97 *
Research on Low-Illuminance Object Detection Methods Based on Attention Mechanism and Domain Adaptation; Xiao Yun; China Master's Theses Full-text Database (Information Science and Technology), No. 02; I138-3196 *

Also Published As

Publication number Publication date
CN116758617A (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN110414462B (en) Unsupervised cross-domain pedestrian re-identification method and system
CN105447473B (en) A kind of any attitude facial expression recognizing method based on PCANet-CNN
CN108985268B (en) Inductive radar high-resolution range profile identification method based on deep migration learning
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
Komorowski et al. Minkloc++: lidar and monocular image fusion for place recognition
CN114492574A (en) Pseudo label loss unsupervised countermeasure domain adaptive picture classification method based on Gaussian uniform mixing model
CN112381788B (en) Part surface defect increment detection method based on double-branch matching network
CN111931814B (en) Unsupervised countering domain adaptation method based on intra-class structure tightening constraint
CN109492610B (en) Pedestrian re-identification method and device and readable storage medium
CN113743544A (en) Cross-modal neural network construction method, pedestrian retrieval method and system
CN111242870B (en) Low-light image enhancement method based on deep learning knowledge distillation technology
CN114283325A (en) Underwater target identification method based on knowledge distillation
CN116486408A (en) Cross-domain semantic segmentation method and device for remote sensing image
CN116758617B (en) Campus student check-in method and campus check-in system under low-illuminance scene
CN117372853A (en) Underwater target detection algorithm based on image enhancement and attention mechanism
CN116246338B (en) Behavior recognition method based on graph convolution and transducer composite neural network
CN116958700A (en) Image classification method based on prompt engineering and contrast learning
Schenkel et al. Domain adaptation for semantic segmentation using convolutional neural networks
CN116797821A (en) Generalized zero sample image classification method based on fusion visual information
CN115761268A (en) Pole tower key part defect identification method based on local texture enhancement network
CN117523626A (en) Pseudo RGB-D face recognition method
Xia et al. Multi-RPN Fusion-Based Sparse PCA-CNN Approach to Object Detection and Recognition for Robot-Aided Visual System
CN116895002B (en) Multi-graph contrast learning-based method and system for detecting adaptive targets from domain
CN117523549B (en) Three-dimensional point cloud object identification method based on deep and wide knowledge distillation
CN116129198B (en) Multi-domain tire pattern image classification method, system, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant