CN114220136A - Method, system and device for training user behavior recognition model and face recognition - Google Patents

Method, system and device for training user behavior recognition model and face recognition

Info

Publication number
CN114220136A
CN114220136A (application CN202111286929.1A)
Authority
CN
China
Prior art keywords
data
model
domain data
recognized
frequency domain
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111286929.1A
Other languages
Chinese (zh)
Inventor
王易木
于鲲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202111286929.1A
Publication of CN114220136A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method for a user behavior recognition model, which comprises: acquiring data of an unlabeled time sequence sensor as original data; preprocessing the original data to obtain time domain data, frequency domain data and statistic data, inputting these into a model, and pre-training the model to obtain a pre-training model; acquiring time domain data, frequency domain data and statistic data of a labeled time sequence sensor; and inputting these data into the pre-training model respectively for retraining to obtain the user behavior recognition model. Because the sensor data are processed by the model's attention mechanism, feature extraction avoids the long-term dependence problem of traditional neural networks, whose sequential nature cannot capture long-distance features. The model's interaction attention mechanism further allows the multi-dimensional features of the time sequence sensor data to interact, improving the model's understanding of the data and, in turn, the accuracy of behavior category recognition.

Description

Method, system and device for training user behavior recognition model and face recognition
Technical Field
The present application relates to the field of facial recognition technology, and in particular, to a training method for a user behavior recognition model, a method for facial recognition, a system for facial recognition, a training apparatus for a user behavior recognition model, an apparatus for facial recognition, an electronic device, and a computer storage medium.
Background
In recent years, with the development of deep learning techniques, face recognition systems have been widely deployed in fields such as finance, security, transportation, and education. This widespread adoption has exposed a number of security issues, for example the privacy, transmission, and storage security of facial data, as well as various prosthesis (spoofing) attacks on face recognition systems. In particular, a prosthesis of a legitimate user's face can be produced from media such as electronic photographs, printed photographs, or recorded video, and then used to attack the face recognition system. Such prosthesis attacks place higher demands on the security of face recognition, which forces the technology to improve further.
Existing face recognition technology mostly uses various optical sensing cameras to extract the essential features that distinguish real faces from fake ones, so as to better counter the various presentation-attack means that threaten a face recognition system. The sensors used in current face recognition methods mainly include visible light cameras, near-infrared cameras, depth cameras, thermal cameras, and multispectral cameras, with which certain human physiological information, facial texture information, geometric shape information, and other key features for face recognition can be captured or enhanced.
However, many mobile devices do not carry multiple image sensors, so using multi-view image sensors for anti-spoofing detection is impractical, which affects the accuracy of face recognition. The acquisition and processing of depth sensor data also have limitations, which further hinders improving the accuracy of facial recognition.
Therefore, how to improve the accuracy of face recognition becomes an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiments of the application provide a training method for a user behavior recognition model, aiming to solve the prior-art problem of how to improve the accuracy of face recognition. The embodiments also provide a method for facial recognition, a system for facial recognition, a training apparatus for a user behavior recognition model, an apparatus for facial recognition, an electronic device, and a computer storage medium.
An embodiment of the application provides a training method for a user behavior recognition model, which comprises the following steps:
acquiring data of an unlabeled time sequence sensor as original data;
preprocessing the original data to obtain time domain data, frequency domain data and statistic data;
inputting the time domain data, the frequency domain data and the statistic data into a model respectively, and pre-training to obtain a pre-training model;
acquiring time domain data of a labeled time sequence sensor, and obtaining the corresponding frequency domain data and statistic data;
and inputting the time domain data, the frequency domain data and the statistic data of the labeled time sequence sensor into the pre-training model respectively, and retraining to obtain the user behavior recognition model.
Optionally, the inputting the time domain data, the frequency domain data, and the statistic data into a model respectively, and performing pre-training to obtain a pre-training model, includes:
extracting time domain characteristics, frequency domain characteristics and statistic characteristics corresponding to the time domain data, the frequency domain data and the statistic data;
and aggregating the time domain features, the frequency domain features and the statistic features to be used as training data.
Optionally, after extracting time domain features, frequency domain features, and statistic features corresponding to the time domain data, the frequency domain data, and the statistic data, the method further includes:
obtaining the similar characteristics and the different characteristics among the time domain characteristics, the frequency domain characteristics and the statistic characteristics;
and aggregating the similar features and the different features to be used as training data.
Optionally, the inputting the time domain data, the frequency domain data, and the statistic data into a model respectively, and performing pre-training to obtain a pre-training model, includes:
and inputting the time domain data, the frequency domain data and the statistic data to different processing units of the model in parallel.
Optionally, the model is a transformer model, and the processing unit is an encoder of the transformer.
Optionally, before the time domain data, the frequency domain data, and the statistic data are respectively input to the model, the method further includes: preprocessing time domain data, frequency domain data and statistic data;
correspondingly, before the time domain data, the frequency domain data and the statistic data of the labeled time sequence sensor are respectively input to the pre-training model, the method further comprises the following steps: and preprocessing the time domain data, the frequency domain data and the statistic data of the labeled time sequence sensor.
Optionally, the data of the unlabeled time sequence sensor and the time domain data of the labeled time sequence sensor each include timestamp information.
An embodiment of the present application further provides a method for face recognition, including:
acquiring face information of an object to be recognized through an image sensor, wherein the face information at least comprises motion information of a face or a part of the face;
inputting the face information into an image recognition model, and obtaining a first recognition result including identity information of an object to be recognized and whether the object is a living body;
while the face information of the object to be recognized is being acquired, monitoring the object to be recognized through a time sequence sensor, and obtaining monitoring data of the time sequence sensor;
inputting the monitoring data into the user behavior recognition model generated by the training method of the user behavior recognition model, and obtaining a second recognition result of the object to be recognized, wherein the second recognition result comprises the behavior category of the object to be recognized;
and obtaining, according to the first recognition result and the second recognition result, a third recognition result comprising the identity information of the object to be recognized and whether the object to be recognized is a living body.
An embodiment of the present application further provides a system for face recognition, including: an image sensor, an image recognition model, a time sequence sensor, a user behavior recognition model and a decision model;
the image sensor is used for acquiring the face information of an object to be recognized and sending the face information of the object to be recognized to the image recognition model; the face information includes at least motion information of a face or a part of a face;
the image recognition model is used for receiving the face information of the object to be recognized, obtaining a first recognition result including the identity information of the object to be recognized and whether the object to be recognized is a living body according to the face information of the object to be recognized, and sending the first recognition result including the identity information of the object to be recognized and whether the object to be recognized is a living body to the decision model;
the time sequence sensor is used for monitoring the object to be recognized while its facial information is being acquired, obtaining monitoring data of the time sequence sensor, and inputting the monitoring data into the user behavior recognition model;
the user behavior recognition model is used for receiving monitoring data of the time sequence sensor, obtaining a second recognition result comprising the behavior category of the object to be recognized according to the monitoring data, and sending the second recognition result to the decision model;
the decision model is used for receiving the first recognition result and the second recognition result, and obtaining, according to the first recognition result and the second recognition result, a third recognition result comprising the identity information of the object to be recognized and whether the object to be recognized is a living body.
The embodiment of the present application further provides a training apparatus for a user behavior recognition model, including:
the unlabeled data acquisition unit is used for acquiring data of an unlabeled time sequence sensor as original data;
the preprocessing unit is used for preprocessing the original data to obtain time domain data, frequency domain data and statistic data;
the pre-training unit is used for respectively inputting the time domain data, the frequency domain data and the statistic data into a model for pre-training to obtain a pre-training model;
the system comprises a tagged data acquisition unit, a time domain data acquisition unit and a statistical data acquisition unit, wherein the tagged data acquisition unit is used for acquiring time domain data of a tagged time sequence sensor and acquiring corresponding frequency domain data and statistical data;
and the retraining unit is used for respectively inputting the time domain data, the frequency domain data and the statistic data of the timing sequence sensor with the label into the pre-training model for retraining to obtain the user behavior recognition model.
An embodiment of the present application further provides an apparatus for face recognition, including:
a face information acquisition unit configured to acquire, by an image sensor, face information of an object to be recognized, the face information including at least motion information of a face or a part of a face;
a first recognition result acquisition unit configured to input the face information into an image recognition model, and acquire a first recognition result including identity information of an object to be recognized and whether the object is a living body;
the monitoring data acquisition unit is used for monitoring the object to be recognized through a time sequence sensor while the facial information of the object to be recognized is being acquired, and obtaining the monitoring data of the time sequence sensor;
a second recognition result obtaining unit, configured to input the monitoring data into the user behavior recognition model generated by the user behavior recognition model training method, and obtain a second recognition result including a behavior category of the object to be recognized;
and the recognition result determining unit is used for obtaining, according to the first recognition result and the second recognition result, a third recognition result comprising the identity information of the object to be recognized and whether the object to be recognized is a living body.
An embodiment of the present application further provides an electronic device, where the electronic device includes: a processor; a memory for storing a computer program for execution by the processor to perform the above described method.
An embodiment of the present application further provides a computer storage medium, where a computer program is stored, and the computer program is executed by a processor to perform the method described above.
Compared with the prior art, the method has the following advantages:
the embodiment of the application provides a training method of a user behavior recognition model, which comprises the following steps: acquiring data of a label-free time sequence sensor as original data; preprocessing the original data to obtain time domain data, frequency domain data and statistic data; respectively inputting the time domain data, the frequency domain data and the statistic data into a model, and performing pre-training to obtain a pre-training model; acquiring time domain data of the time sequence sensor with the label, and acquiring corresponding frequency domain data and statistic data; and respectively inputting the time domain data, the frequency domain data and the statistic data of the timing sequence sensor with the label to the pre-training model, and performing retraining to obtain the user behavior recognition model. According to the embodiment of the application, the data of the time sequence sensor is input into the model, the multi-head attention mechanism based on the model can be used for solving the problems that the long-term dependence of a traditional neural network, the sequence attribute of the traditional neural network cannot capture long-distance features and the like when the features of the data of the time sequence sensor are extracted, and meanwhile, the extraction time of the features of the data of the time sequence sensor is shortened. In addition, the interaction attention mechanism based on the model can enable the multi-dimensional features of the data of the time sequence sensor to realize interaction, namely multi-dimensional feature extraction, and the time domain features, the frequency domain features and the statistic features are fused, so that not only can the priori knowledge be fused, but also the multi-dimensional complementarity features can be automatically extracted through the model, the obtained features can be more comprehensively expressed, the understanding accuracy of the model on the data of the time sequence sensor is further improved, and the accuracy of behavior category identification is further improved.
An embodiment of the application provides a method for face recognition: acquiring, through an image sensor, face information of an object to be recognized, the face information including at least motion information of the face or a part of the face; inputting the face information into an image recognition model and obtaining a first recognition result comprising the identity information of the object to be recognized and whether it is a living body; while the face information is being acquired, monitoring the object to be recognized through a time sequence sensor and obtaining monitoring data of the time sequence sensor; inputting the monitoring data into a user behavior recognition model and obtaining a second recognition result comprising the behavior category of the object to be recognized; and obtaining, according to the first recognition result and the second recognition result, a third recognition result comprising the identity information of the object to be recognized and whether the object to be recognized is a living body. The monitoring data of the time sequence sensor during a user's normal authentication differ markedly from the sensor data observed when a prosthesis attacks the system; by having the user behavior recognition model output a second recognition result from the monitoring data and combining it with the first recognition result, the accuracy of facial recognition is improved.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in the present application.
Fig. 2 is a flowchart of a training method for a user behavior recognition model according to a first embodiment of the present application.
Fig. 3 is a schematic diagram of a user behavior recognition model according to a first embodiment of the present application.
Fig. 4 is a flowchart of a method for face recognition according to a second embodiment of the present application.
Fig. 5 is a schematic diagram of a system for face recognition according to a third embodiment of the present application.
Fig. 6 is a schematic diagram of a training apparatus for a user behavior recognition model according to a fourth embodiment of the present application.
Fig. 7 is a schematic diagram of an apparatus for face recognition according to a fifth embodiment of the present application.
Fig. 8 is a schematic view of an electronic device according to a sixth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments of the present application. The embodiments can, however, be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit and scope of the embodiments; therefore, the application is not limited to the specific embodiments disclosed below.
To help those skilled in the art better understand the solution of the present application, a specific application scenario of an embodiment of the method for face recognition provided by the present application is described in detail below. Fig. 1 is a schematic diagram of this application scenario.
The application scenario is one in which a user unlocks a terminal. The terminal may be a mobile phone, a computer, a tablet computer, or various other devices. When the user needs to unlock the terminal, the user holds the terminal in a suitable position, and the terminal's image sensor photographs the user's face and obtains the user's face information, which includes at least motion information of the face or of a part of the face. Motion information of the face refers to whole-face motions the user performs as instructed by the terminal, i.e., information about actions performed with the face, such as facial expressions or head-shaking. Motion information of a part of the face refers to partial-face motions performed in response to the terminal's instructions, such as blinking or mouth-opening.
After the user's face information is obtained, a first recognition result corresponding to it, comprising the identity information of the object to be recognized and whether it is a living body, may be obtained through an image recognition model. The identity information represents the identity of the object to be recognized, e.g., "user XX". Note that the first recognition result can determine whether the object to be recognized is a living body under the corresponding detection conditions. For example, against a still picture: when the terminal instructs the object to be recognized to shake its head, smile, and so on, a picture cannot complete the requested actions, so the first recognition result can determine whether the object to be recognized is a living body.
In this scenario, while the terminal's image sensor acquires the user's facial information, the terminal's time sequence sensor also monitors the session, and the corresponding monitoring data of the time sequence sensor are acquired. These monitoring data include at least gyroscope data and linear accelerometer data, recorded as the user's hand holds and moves the terminal. For example, when the user photographs the face with the terminal to acquire face information, the user's hand shakes, so the monitoring data of the time sequence sensor reflect the terminal's movement.
After the monitoring data of the time sequence sensor are obtained, they can be input into the user behavior recognition model to obtain a second recognition result comprising the behavior category of the object to be recognized. The second recognition result complements the first: it may supplement the first recognition result, or assist the image recognition model in further determining whether the object to be recognized is a living body. For example, during verification against a recorded video showing motions such as head-shaking or smiling, the first recognition result alone cannot correctly determine whether the object to be recognized is a living body. However, the behavior category determined by the second recognition result bears a specific, regular relationship to the user's actual motions such as head-shaking and smiling, so the identity information of the object to be recognized and a judgment of whether it is a living body can be obtained together. That is, the identity information of the object to be recognized and the judgment of whether it is a living body are obtained from the first recognition result and the second recognition result, and the resulting face recognition result is then sent to the terminal.
The user behavior recognition model is trained as follows: acquiring data of an unlabeled time sequence sensor as original data; preprocessing the original data to obtain time domain data, frequency domain data and statistic data; and inputting these into a model respectively and pre-training to obtain a pre-training model. Then, time domain data of a labeled time sequence sensor are acquired along with the corresponding frequency domain data and statistic data, and the labeled data are input into the pre-training model respectively for retraining, yielding the user behavior recognition model. Further details of the training appear in the subsequent embodiments.
The monitoring data of the time sequence sensor are input into the user behavior recognition model. The multi-head attention mechanism of the model's transformer encoding layers overcomes, during feature extraction, the long-term dependence problem of traditional neural networks, whose sequential nature cannot capture long-distance features, and it also shortens feature-extraction time. In addition, the model's interaction attention mechanism lets the multi-dimensional features of the monitoring data interact, improving the model's understanding of the monitoring data.
In summary, facial information of the user is obtained through the image sensor and a first recognition result, comprising the identity information of the object to be recognized and whether it is a living body, is obtained correspondingly; monitoring data of the time sequence sensor are obtained while the facial information is being captured; the monitoring data are input into the user behavior recognition model to obtain a second recognition result; and the identity information of the object to be recognized and the judgment of whether it is a living body are obtained from the first and second recognition results. Because the time sequence sensor data during a user's normal authentication differ markedly from the sensor data observed when a prosthesis attacks the system, combining the second recognition result output by the user behavior recognition model with the first recognition result improves the accuracy of facial recognition.
It should be noted that the specific limitation made on the application scenario of the method for face recognition in the embodiment of the present application is only one embodiment of the application scenario of the method for face recognition provided in the present application, and the application scenario embodiment is provided to facilitate understanding of the method for face recognition provided in the present application, and is not used to limit the method for face recognition provided in the present application. The face recognition in the embodiment of the present application also has other application scenarios, which will not be described in detail here.
Corresponding to the above scenario, a first embodiment of the present application provides a training method for a user behavior recognition model, as shown in fig. 2, fig. 2 is a flowchart of the training method for the user behavior recognition model provided in the first embodiment of the present application. The method comprises the following steps:
in step S201, data of the unlabeled time-series sensor is acquired as raw data.
In this step, the data of the unlabeled time sequence sensor may be obtained from a database or from the sensor's historical data; there are many ways to obtain them, which this embodiment does not limit. The data of the unlabeled time sequence sensor include timestamp information, which tells the model when each datum was recorded. Because unlabeled time sequence sensor data are plentiful, the model can be trained on large-scale data, which effectively prevents the overfitting that a small data volume would cause.
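For illustration only (this sketch is not part of the patent), the raw unlabeled stream could be sliced into timestamped windows roughly as follows; the window length, stride, and record layout are assumptions:

```python
# A minimal sketch of assembling unlabeled time sequence sensor windows.
# Window length, stride, and field names are assumptions, not taken
# from the patent.
import numpy as np

def make_windows(samples: np.ndarray, timestamps: np.ndarray,
                 window: int = 128, stride: int = 64):
    """Slice a (T, C) sensor stream into overlapping unlabeled windows,
    keeping each window's timestamps so the model knows input order."""
    out = []
    for start in range(0, len(samples) - window + 1, stride):
        out.append({
            "data": samples[start:start + window],          # (window, C)
            "timestamps": timestamps[start:start + window],
        })
    return out
```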
Step S202, preprocessing the original data to obtain time domain data, frequency domain data and statistic data.
After the data of the unlabeled time sequence sensor is obtained, the data of the unlabeled time sequence sensor can be used as raw data, and the raw data is preprocessed to obtain frequency domain data and statistic data.
Preprocessing the original data to obtain frequency domain data comprises transforming the original data into the corresponding frequency domain representation, for example by Fourier transform or wavelet transform.
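As a hedged sketch of this transform step (assuming windowed data shaped (T, C) and choosing an FFT; a wavelet transform would serve equally):

```python
# Frequency domain preprocessing: magnitude spectrum of each channel.
import numpy as np

def to_frequency_domain(window: np.ndarray) -> np.ndarray:
    """window: (T, C) time domain samples -> (T//2 + 1, C) magnitudes."""
    spectrum = np.fft.rfft(window, axis=0)  # real FFT along the time axis
    return np.abs(spectrum)
```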
Preprocessing the original data to obtain statistic data comprises extracting statistics from the original data according to preset expert experience, including second-order statistics such as extreme values, means and variances, as well as higher-order statistics. Compared with existing algorithms that characterize the sensor using time domain data alone, this application combines frequency domain data and statistic data: the frequency domain data describe the frequency and amplitude of the signal from another angle, while the statistic data incorporate expert experience, extracting features that carry prior knowledge through the computation of statistics, so the sensor's behavior characteristics are described along more dimensions.
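A sketch of such a statistic extractor follows; the exact feature set (min/max, mean, variance, skewness, kurtosis) is an assumption built on the patent's examples of second-order and higher-order statistics:

```python
# Statistic data: per-channel second-order and higher-order statistics.
import numpy as np
from scipy.stats import skew, kurtosis  # higher-order statistics

def statistic_features(window: np.ndarray) -> np.ndarray:
    """window: (T, C) -> concatenated feature vector of shape (6 * C,)."""
    feats = [
        window.min(axis=0), window.max(axis=0),          # extreme values
        window.mean(axis=0), window.var(axis=0),         # second-order
        skew(window, axis=0), kurtosis(window, axis=0),  # higher-order
    ]
    return np.concatenate(feats)
```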
Step S203, respectively inputting the time domain data, the frequency domain data and the statistic data to a model, and pre-training to obtain a pre-training model.
In this step, the model may be a transformer model. After the original data are preprocessed into frequency domain data and statistic data, the time domain data, frequency domain data and statistic data themselves need to be preprocessed, for example normalized and embedded into one-dimensional tokens through a linear mapping, with the preprocessing result used as the input of the transformer model. The transformer model includes an encoding component and a decoding component; the encoding component may include a plurality of parallel processing units, i.e., a plurality of encoders, and each encoder may include multiple stacked transformer blocks. In this embodiment, inputting the time domain data, the frequency domain data and the statistic data into the model respectively means inputting the preprocessing results of the three kinds of data in parallel to three different encoders of the model.
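A minimal PyTorch sketch of one such domain encoder is shown below, with a plain transformer encoder standing in for the patent's multi-domain fusion blocks; the model dimension, head count, layer count, and six-channel input (e.g., 3-axis gyroscope plus 3-axis accelerometer) are illustrative assumptions:

```python
# One encoder per domain: normalize, linearly embed into tokens, then
# run stacked transformer layers. Instantiate three of these in parallel.
import torch
import torch.nn as nn

class DomainEncoder(nn.Module):
    def __init__(self, in_dim: int, d_model: int = 128, n_layers: int = 6):
        super().__init__()
        self.norm = nn.LayerNorm(in_dim)
        self.embed = nn.Linear(in_dim, d_model)  # linear mapping to tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, in_dim) -> (batch, seq, d_model)
        return self.encoder(self.embed(self.norm(x)))

# three parallel encoders, one per domain (input dims are assumptions)
time_enc, freq_enc, stat_enc = DomainEncoder(6), DomainEncoder(6), DomainEncoder(6)
```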
Specifically, as shown in fig. 3, the preprocessed time domain data, frequency domain data and statistic data are input to encoder 1, encoder 2 and encoder 3 respectively. Each of the three encoders may include N stacked transformer blocks, for example 6 layers, and each transformer block may include a multi-domain fusion attention layer and a feedforward neural network layer. In each encoder, data vectors derived from the time domain data, the frequency domain data and the statistic data are assigned as Q (query), K (key) and V (value) vectors; after passing through the stacked multi-domain fusion attention layers and feedforward neural network layers, a higher-dimensional feature representation containing the difference features of the three domains is output. Across the three encoders, the Q, K and V roles assigned to the time domain, frequency domain and statistic data differ; for example, at the first encoder the time domain data may be defined as the V vector, the frequency domain data as the Q vector and the statistic data as the K vector, so that the three kinds of domain data take different roles and features are extracted by the three encoders respectively.
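The following sketch shows one block of such a fusion layer, with Q, K and V drawn from different domains; the residual-free layout and the particular role assignment are assumptions:

```python
# One multi-domain fusion block: attention whose query, key and value
# come from three different domains, followed by a feedforward layer.
import torch.nn as nn

class MultiDomainFusionAttention(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, q_dom, k_dom, v_dom):
        # e.g. q_dom = frequency tokens, k_dom = statistic tokens,
        # v_dom = time tokens, mirroring the first-encoder example above
        fused, _ = self.attn(query=q_dom, key=k_dom, value=v_dom)
        return self.ffn(fused)
```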
After the higher-dimensional features are obtained, the feature representations of the encoders are fused and input to a decoder, which comprises multiple multi-head attention layers and a feedforward neural network layer. In the decoder, a certain number of elements can be randomly masked and the model made to predict the masked elements, thereby training the model's predictive capability.
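A hedged sketch of that masked-prediction objective (the mask ratio, zero-corruption, and MSE loss are all assumptions in line with common masked pretraining):

```python
# Self-supervised objective: mask random fused tokens and reconstruct them.
import torch
import torch.nn.functional as F

def masked_reconstruction_loss(decoder, tokens: torch.Tensor,
                               mask_ratio: float = 0.15) -> torch.Tensor:
    """tokens: (batch, seq, d_model) fused encoder features."""
    mask = torch.rand(tokens.shape[:2], device=tokens.device) < mask_ratio
    corrupted = tokens.clone()
    corrupted[mask] = 0.0                 # hide the selected elements
    predicted = decoder(corrupted)        # decoder predicts every position
    return F.mse_loss(predicted[mask], tokens[mask])
```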
It should be noted that unlabeled time sequence sensor data are abundant; self-supervised pre-training on such large-scale unlabeled data lets the model capture the hidden information of large-scale data and prevents overfitting. In addition, the essence of multi-head attention is the computation of several independent self-attention mechanisms whose outputs are finally concatenated, which acts as an ensemble and also prevents overfitting to a certain extent.
In this embodiment, in the above manner, the time domain data, the frequency domain data, and the statistic data are respectively input to a model for pre-training, so as to obtain a pre-training model, including: extracting time domain characteristics, frequency domain characteristics and statistic characteristics corresponding to the time domain data, the frequency domain data and the statistic data; and aggregating the time domain features, the frequency domain features and the statistic features to be used as training data. In this embodiment, after extracting the time domain features, the frequency domain features, and the statistic features corresponding to the time domain data, the frequency domain data, and the statistic data, the method further includes: obtaining the similar characteristics and the different characteristics among the time domain characteristics, the frequency domain characteristics and the statistic characteristics; and aggregating the similar features and the different features to be used as training data.
In this embodiment, the time domain data, frequency domain data and statistic data are processed through the transformer attention mechanism: their mutually similar features are calculated, and their mutual difference features are further acquired. This addresses the inter-dimension interaction that direct fusion of multi-dimensional features would neglect, improves the model's understanding and generalization of the semantics of time sequence sensor data, and improves the model's classification capability.
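One plausible realization of this similar/different decomposition is sketched below; treating the cross-attention output as the shared component and the residual as the difference component is an assumption, since the patent fixes no formula:

```python
# Aggregate "similar" (shared) and "different" (residual) features
# across domains as training data.
import torch

def interaction_features(time_f, freq_f, stat_f, fusion_block):
    # shared component: what the frequency/statistic views agree on with
    # the time view (fusion_block is a MultiDomainFusionAttention above)
    similar = fusion_block(q_dom=time_f, k_dom=freq_f, v_dom=stat_f)
    different = time_f - similar          # what stays domain-specific
    return torch.cat([similar, different], dim=-1)
```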
Step S204, acquiring time domain data of the time sequence sensor with the label, and acquiring corresponding frequency domain data and statistic data.
In this step, the time domain data of the labeled time sequence sensor may be obtained from a database or from the sensor's historical data; there are many ways to obtain them, which this embodiment does not limit. The time domain data of the labeled time sequence sensor include timestamp information, which tells the model when each datum was recorded and in what order.
After the time domain data of the time sequence sensor with the label is obtained, the time domain data of the time sequence sensor with the label can be preprocessed to obtain frequency domain data and statistic data.
Preprocessing the time domain data of the labeled time sequence sensor to obtain frequency domain data comprises transforming the time domain data into the corresponding frequency domain representation, for example by Fourier transform or wavelet transform.
Preprocessing the time domain data of the labeled time sequence sensor to obtain statistic data comprises extracting statistics from the time domain data according to preset expert experience, including second-order statistics such as extreme values, means and variances, as well as higher-order statistics. As in the pre-training stage, combining frequency domain data and statistic data describes the sensor's behavior characteristics along more dimensions than time domain data alone.
Step S205, inputting the time domain data, the frequency domain data and the statistic data of the labeled time sequence sensor into the pre-training model respectively, and retraining to obtain the user behavior recognition model.
After the time domain data of the labeled time sequence sensor have been preprocessed into frequency domain data and statistic data, the time domain, frequency domain and statistic data are themselves preprocessed, for example normalized and embedded into one-dimensional tokens through a linear mapping, and the preprocessing result is used as the input of the transformer model. As before, the transformer model includes an encoding component and a decoding component; the encoding component may include several parallel processing units, i.e., several encoders, each of which may include multiple stacked transformer blocks. In this embodiment, inputting the time domain data, the frequency domain data and the statistic data into the pre-training model respectively means inputting the preprocessing results of the three kinds of data in parallel to three different encoders of the pre-training model.
Specifically, as shown in fig. 3, the preprocessed time domain data, frequency domain data and statistic data are input to encoder 1, encoder 2 and encoder 3 respectively. The structure and the Q/K/V role assignment of these encoders are as described for the pre-training stage: each encoder may include N stacked transformer blocks (for example 6 layers) of a multi-domain fusion attention layer and a feedforward neural network layer, and at each encoder the time domain, frequency domain and statistic data are assigned different Q, K and V roles (for example, at the first encoder the time domain data as the V vector, the frequency domain data as the Q vector, and the statistic data as the K vector), so that features are extracted by the three encoders respectively.
After the higher-dimensional features are obtained, the feature representations of the encoders are fused and input to a decoder, which comprises multiple multi-head attention layers and a feedforward neural network layer. In the decoder, a certain number of elements can be randomly masked and the model made to predict the masked elements, thereby training the model's predictive capability.
It should be noted that the essence of multi-head attention is the computation of several independent self-attention mechanisms whose outputs are finally concatenated, which acts as an ensemble and also prevents overfitting to a certain extent.
In this embodiment, inputting the time domain data, the frequency domain data and the statistic data into the pre-training model respectively and retraining to obtain the user behavior recognition model includes: extracting the time domain features, frequency domain features and statistic features corresponding to the three kinds of data, and aggregating them as training data. After extracting these features, the method further includes obtaining the mutually similar features and the mutually different features among the time domain, frequency domain and statistic features, and aggregating the similar and different features as training data.
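A minimal sketch of this retraining stage, reusing the pre-trained backbone and adding a classification head, is given below; the pooling, head shape, and class count are assumptions:

```python
# Fine-tuning: pre-trained encoders + a behavior classification head.
import torch
import torch.nn as nn

class BehaviorClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int = 128,
                 n_classes: int = 10):
        super().__init__()
        self.backbone = backbone              # pre-trained, reused
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)              # (batch, seq, d_model)
        return self.head(feats.mean(dim=1))   # pool tokens, classify

# usage sketch: train on labeled windows with cross-entropy
# loss = nn.CrossEntropyLoss()(model(x_labeled), y_behavior)
```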
In this embodiment, the time domain data, frequency domain data and statistic data are processed through the transformer attention mechanism: their mutually similar features are calculated, and their mutual difference features are further acquired. This addresses the inter-dimension interaction that direct fusion of multi-dimensional features would neglect, improves the model's understanding and generalization of the semantics of time sequence sensor data, and improves the model's classification capability.
The first embodiment of the present application thus provides a training method for a user behavior recognition model: acquiring data of an unlabeled time sequence sensor as original data; preprocessing the original data to obtain time domain data, frequency domain data and statistic data; inputting these into a model respectively and pre-training to obtain a pre-training model; acquiring time domain data of a labeled time sequence sensor together with the corresponding frequency domain data and statistic data; and inputting the labeled data into the pre-training model respectively and retraining to obtain the user behavior recognition model. By feeding time sequence sensor data into the model, the model's multi-head attention mechanism overcomes, during feature extraction, the long-term dependence problem of traditional neural networks, whose sequential nature cannot capture long-distance features, and it shortens feature-extraction time. In addition, the model's interaction attention mechanism lets the multi-dimensional features of the data interact: time domain, frequency domain and statistic features are fused, so that prior knowledge is incorporated and complementary multi-dimensional features are extracted automatically. The resulting features are more comprehensively expressive, improving the model's understanding of the time sequence sensor data and, in turn, the accuracy of behavior category recognition.
A second embodiment of the present application provides a method for face recognition, as shown in fig. 4, and fig. 4 is a flowchart of a method for face recognition provided in the second embodiment of the present application. The method comprises the following steps:
in step S301, face information of an object to be recognized is acquired by an image sensor, the face information including at least motion information of a face or a part of a face.
In this step, the image sensor is disposed in a terminal, which may be a mobile phone, a computer, a tablet computer, or other device. The terminal's image sensor photographs the face of the object to be recognized and obtains its face information, which includes at least motion information of the face or a part of the face. Motion information of the face refers to whole-face motions the object to be recognized performs as instructed by the terminal, i.e., information about actions performed with the face, such as facial expressions or head-shaking. Motion information of a part of the face refers to partial-face motions performed in response to the terminal's instructions, such as blinking or mouth-opening.
Step S302, inputting the face information into an image recognition model, and obtaining a first recognition result comprising the identity information of the object to be recognized and whether it is a living body.
After the face information is obtained, a first recognition result corresponding to it, comprising the identity information of the object to be recognized and whether it is a living body, may be obtained through the image recognition model. The identity information represents the identity of the object to be recognized, e.g., "user XX". The first recognition result can determine whether the object to be recognized is a living body under the corresponding detection conditions. For example, against a still picture: when the terminal instructs the object to be recognized to shake its head, smile, and so on, a picture cannot complete the requested actions, so the first recognition result can determine whether the object to be recognized is a living body.
Step S303, while the face information of the object to be recognized is being acquired, monitoring the object to be recognized through a time sequence sensor and acquiring monitoring data of the time sequence sensor.
While the terminal's image sensor acquires the face information of the object to be recognized, the terminal's time sequence sensor also monitors the session, and the corresponding monitoring data are acquired. The monitoring data of the time sequence sensor include at least gyroscope data and linear accelerometer data, recorded as the object to be recognized holds and moves the terminal. For example, when the object to be recognized photographs its face with the terminal to acquire face information, its hand shakes, so the monitoring data of the time sequence sensor reflect the terminal's movement.
Step S304, inputting the monitoring data into a user behavior recognition model to obtain a second recognition result comprising the behavior category of the object to be recognized.
After the monitoring data of the time sequence sensor are obtained, they can be input into the user behavior recognition model to obtain a second recognition result comprising the behavior category of the object to be recognized. The second recognition result complements the first: it may supplement the first recognition result, or assist the image recognition model in further determining whether the object to be recognized is a living body. For example, during verification against a recorded video showing motions such as head-shaking or smiling, the first recognition result alone cannot correctly determine whether the object to be recognized is a living body; however, the behavior category determined by the second recognition result bears a specific, regular relationship to the user's actual motions such as head-shaking and smiling, so whether the object to be recognized is a living body can be correctly determined, as in step S305.
Step S305, obtaining, according to the first recognition result and the second recognition result, a third recognition result comprising the identity information of the object to be recognized and whether the object to be recognized is a living body.
In this step, the first recognition result and the second recognition result are combined, so that the identity information of the object to be recognized and whether the object to be recognized is a living body can be correspondingly obtained, and the recognition accuracy is improved.
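One plausible decision rule is sketched below; the patent leaves the fusion policy to the decision model, so requiring both results to agree on liveness is an assumption:

```python
# Combining the first (image) and second (behavior) recognition results
# into the third recognition result.
from dataclasses import dataclass

@dataclass
class RecognitionResult:
    identity: str
    is_live: bool

def decide(first: RecognitionResult, behavior_consistent: bool) -> RecognitionResult:
    """Identity comes from the image path; liveness holds only if the
    behavior category is also consistent with a live user."""
    return RecognitionResult(identity=first.identity,
                             is_live=first.is_live and behavior_consistent)
```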
The second embodiment of the present application thus provides a method for face recognition: acquiring, by an image sensor, face information of an object to be recognized, the face information including at least motion information of the face or a part of the face; inputting the face information into an image recognition model and obtaining a first recognition result comprising the identity information of the object to be recognized and whether it is a living body; while the face information is being acquired, monitoring the object to be recognized through a time sequence sensor and obtaining monitoring data of the time sequence sensor; inputting the monitoring data into a user behavior recognition model and obtaining a second recognition result comprising the behavior category of the object to be recognized; and obtaining, according to the first and second recognition results, a third recognition result comprising the identity information of the object to be recognized and whether it is a living body. Because the monitoring data of the time sequence sensor during a user's normal authentication differ markedly from the sensor data observed when a prosthesis attacks the system, combining the second recognition result output by the user behavior recognition model with the first recognition result improves the accuracy of facial recognition.
A third embodiment of the present application provides a system for face recognition, as shown in fig. 5; fig. 5 is a schematic diagram of the system for face recognition provided in the third embodiment of the present application.

The system 400 for face recognition comprises: an image sensor 401, a time sequence sensor 402, a decision model 403, a user behavior recognition model 404, and an image recognition model 405.
The image sensor 401 is configured to acquire face information of an object to be recognized and send it to the image recognition model 405. The face information includes at least motion information of the face or a part of the face. Specifically, the image sensor 401 captures the face of the object to be recognized and obtains its face information. Motion information of the face refers to motion information of the whole face performed by the object to be recognized according to instructions, i.e., information about actions the face performs, such as facial expressions or head-shaking actions. Motion information of a part of the face refers to motion information of part of the face performed according to instructions, such as blinking or mouth opening.
The image recognition model 405 is configured to receive the face information of the object to be recognized, obtain from it a first recognition result including the identity information of the object to be recognized and whether it is a living body, and send that first recognition result to the decision model 403.

After the face information of the object to be recognized is obtained, the image recognition model 405 can produce the first recognition result corresponding to the user's face information. The identity information of the object to be recognized characterizes its identity, such as "user XX". It should be noted that the first recognition result can determine whether the object to be recognized is a living body under the corresponding detection condition. For example, suppose the detection condition targets a still-photo attack: when the terminal instructs the object to be recognized to perform an action such as shaking the head or smiling, a photo presented to the camera cannot complete the action, so the first recognition result can correctly determine whether the object to be recognized is a living body.
The time sequence sensor 402 is configured to monitor the object to be recognized while its face information is being acquired, obtain the monitoring data of the time sequence sensor 402, and input that monitoring data into the user behavior recognition model 404. Specifically, while the image sensor 401 acquires the face information of the object to be recognized, the time sequence sensor 402 of the system also monitors the object to be recognized and correspondingly obtains monitoring data. The monitoring data of the time sequence sensor 402 includes at least gyroscope monitoring data and linear accelerometer monitoring data; it is the data monitored as the terminal moves under the control of the object to be recognized. For example, when the object to be recognized photographs the face with the terminal to acquire face information, the hand holding the terminal shakes, so the monitoring data of the time sequence sensor 402 is the data monitored while the terminal moves. After the monitoring data is obtained, it is input into the user behavior recognition model 404.
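As a concrete illustration of the monitoring data described above, the sketch below assembles gyroscope and linear-accelerometer readings into one window; the sampling rate, window length, and channel layout are assumptions, since the patent does not specify them.

```python
import numpy as np

SAMPLE_RATE_HZ = 100   # assumed sensor sampling rate
WINDOW_SECONDS = 3     # assumed capture window while the face is photographed

def collect_window(gyro: np.ndarray, accel: np.ndarray) -> np.ndarray:
    """Stack gyroscope (N, 3) and linear-accelerometer (N, 3) readings into
    a single (n, 6) monitoring window fed to the behavior model."""
    n = SAMPLE_RATE_HZ * WINDOW_SECONDS
    window = np.concatenate([gyro[:n], accel[:n]], axis=1)  # (n, 6)
    return window.astype(np.float32)
```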
The user behavior recognition model 404 is configured to receive the monitoring data of the time sequence sensor 402, obtain from it a second recognition result including the behavior category of the object to be recognized, and send the second recognition result to the decision model 403. As noted above, the second recognition result is matched against the first recognition result: it may supplement the first recognition result, or assist the image recognition model in further determining whether the object to be recognized is a living body. For example, when an attacker performs verification by playing back a pre-recorded video of head shaking or smiling, the first recognition result alone cannot correctly determine whether the object to be recognized is a living body, but the behavior category determined by the second recognition result bears a specific relationship and rule to genuine user actions such as head shaking and smiling, so liveness can still be correctly determined.
The decision model 403 is configured to receive the first recognition result and the second recognition result, obtain from them the identity information of the object to be recognized, and determine a third recognition result of whether the object to be recognized is a living body.
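Read together, components 401-405 form a simple pipeline. A minimal sketch of that wiring follows; the class and method names are hypothetical placeholders, and each model is treated as an opaque callable — only the data flow mirrors the system described above.

```python
# Hypothetical wiring of system 400; names and callable interfaces are
# placeholders, not APIs taken from the patent.

class FaceRecognitionSystem:
    def __init__(self, image_model, behavior_model, decision_model):
        self.image_model = image_model        # image recognition model 405
        self.behavior_model = behavior_model  # user behavior recognition model 404
        self.decision_model = decision_model  # decision model 403

    def verify(self, face_frames, sensor_window):
        first = self.image_model(face_frames)        # identity + liveness
        second = self.behavior_model(sensor_window)  # behavior category
        return self.decision_model(first, second)    # third recognition result
```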
Corresponding to the training method of the user behavior recognition model provided in the first embodiment of the present application, a fourth embodiment of the present application correspondingly provides a training apparatus of the user behavior recognition model. Since the device embodiment is substantially similar to the first embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the first embodiment for relevant points. The device embodiments described below are merely illustrative.
Please refer to fig. 6, which is a schematic diagram of a training apparatus for a user behavior recognition model according to the fourth embodiment of the present application. The training apparatus of the user behavior recognition model comprises: a label-free data acquisition unit 501, configured to acquire data of a label-free time sequence sensor as raw data; a preprocessing unit 502, configured to preprocess the raw data to obtain time domain data, frequency domain data, and statistic data; a pre-training unit 503, configured to input the time domain data, the frequency domain data, and the statistic data into a model respectively and perform pre-training to obtain a pre-training model; a labeled data acquisition unit 504, configured to acquire time domain data of the labeled time sequence sensor and obtain the corresponding frequency domain data and statistic data; and a retraining unit 505, configured to input the time domain data, the frequency domain data, and the statistic data of the labeled time sequence sensor into the pre-training model respectively and perform retraining to obtain the user behavior recognition model.
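To make the preprocessing unit 502 concrete, here is a minimal sketch of deriving frequency domain and statistic data from a raw time domain window; the patent does not enumerate the statistics, so the set below (mean, standard deviation, min, max) is an assumption.

```python
import numpy as np

def preprocess(raw: np.ndarray):
    """raw: (timesteps, channels) window of time-series sensor readings."""
    time_domain = raw
    freq_domain = np.abs(np.fft.rfft(raw, axis=0))  # per-channel magnitude spectrum
    statistics = np.stack([
        raw.mean(axis=0),
        raw.std(axis=0),
        raw.min(axis=0),
        raw.max(axis=0),
    ])                                              # (4, channels) statistic data
    return time_domain, freq_domain, statistics
```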
Optionally, the pre-training unit 503 is configured to extract time domain features, frequency domain features, and statistic features corresponding to the time domain data, the frequency domain data, and the statistic data; and aggregating the time domain features, the frequency domain features and the statistic features to be used as training data.
Optionally, the pre-training unit 503 is further configured to obtain similar features and difference features among the time domain features, the frequency domain features, and the statistic features; and aggregating the similar features and the different features to be used as training data.
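The patent leaves the operators behind the "similar features" and "difference features" unspecified; one simple reading is element-wise interaction across the three branches, sketched below with hypothetical operator choices (product for shared information, absolute differences for branch-specific information).

```python
import torch

def interact(t_feat: torch.Tensor, f_feat: torch.Tensor, s_feat: torch.Tensor):
    """Each input: (batch, dim) features already projected to a shared size.
    The product/difference operators are assumptions, not the patent's method."""
    similar = t_feat * f_feat * s_feat                               # shared information
    difference = torch.abs(t_feat - f_feat) + torch.abs(f_feat - s_feat)
    return torch.cat([similar, difference], dim=-1)                  # aggregated training feature
```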
Optionally, the pre-training unit 503 is configured to input the time domain data, the frequency domain data, and the statistic data to different processing units of the model in parallel.
Optionally, the model is a Transformer model, and the processing unit is an encoder of the Transformer.
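A minimal PyTorch sketch of this parallel arrangement is given below: one Transformer encoder per data view, matching the "different processing units in parallel" description. Depth, width, and head count are illustrative assumptions.

```python
import torch.nn as nn

class ThreeBranchEncoder(nn.Module):
    """Three parallel Transformer encoders, one per input view; sizes assumed."""
    def __init__(self, dim: int = 64, heads: int = 4, layers: int = 2):
        super().__init__()
        def branch():
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=layers)
        self.time_enc, self.freq_enc, self.stat_enc = branch(), branch(), branch()

    def forward(self, t, f, s):  # each: (batch, seq_len, dim)
        return self.time_enc(t), self.freq_enc(f), self.stat_enc(s)
```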
Optionally, the apparatus further includes: a first normalization processing unit, configured to preprocess the time domain data, the frequency domain data, and the statistic data before they are respectively input into the model.

Optionally, the apparatus further includes: a second normalization processing unit, configured to preprocess the time domain data, the frequency domain data, and the statistic data of the labeled time sequence sensor before they are respectively input into the pre-training model.
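The normalization these two units perform is not detailed in the text; a per-channel z-score, as sketched below, is a common choice and is assumed here.

```python
import numpy as np

def normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score each channel of a (timesteps, channels) array before it is
    fed to the model (first unit) or the pre-training model (second unit)."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)
```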
Optionally, the data of the untagged time-series sensor and the time-domain data of the tagged time-series sensor respectively include time stamp information.
A fifth embodiment of the present application correspondingly provides an apparatus for face recognition, corresponding to the method for face recognition provided by the second embodiment of the present application. Since the apparatus embodiment is substantially similar to the second embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the second embodiment for relevant points. The device embodiments described below are merely illustrative.
Please refer to fig. 7, which is a diagram illustrating an apparatus for face recognition according to the fifth embodiment of the present application. The apparatus for face recognition includes: a face information acquisition unit 601, configured to acquire, by an image sensor, face information of an object to be recognized, the face information including at least motion information of the face or a part of the face; a first recognition result acquisition unit 602, configured to input the face information into an image recognition model and obtain a first recognition result including the identity information of the object to be recognized and whether it is a living body; a monitoring data acquisition unit 603, configured to monitor the object to be recognized by a time sequence sensor while the face information is being acquired, obtaining the monitoring data of the time sequence sensor; a second recognition result acquisition unit 604, configured to input the monitoring data into a user behavior recognition model and obtain a second recognition result including the behavior category of the object to be recognized; and a recognition result determining unit 605, configured to obtain, according to the first recognition result and the second recognition result, the identity information of the object to be recognized and determine a third recognition result of whether the object to be recognized is a living body.
A sixth embodiment of the present application, corresponding to the training method of the user behavior recognition model of the first embodiment and the method for face recognition of the second embodiment, further provides an electronic device. As shown in fig. 8, which is a schematic view of the electronic device provided in the sixth embodiment of the present application, the electronic device includes: a processor 701; and a memory 702 for storing a computer program to be executed by the processor 701 to perform the training method of the user behavior recognition model of the first embodiment and the method for face recognition of the second embodiment.
Corresponding to the training method of the user behavior recognition model of the first embodiment and the method for face recognition of the second embodiment of the present application, a seventh embodiment of the present application also provides a computer storage medium storing a computer program executed by a processor to perform the training method of the user behavior recognition model of the first embodiment and the method for face recognition of the second embodiment.
Although the present application has been disclosed above with reference to preferred embodiments, they are not intended to limit it. Those skilled in the art can make possible variations and modifications without departing from the spirit and scope of the present application; therefore, the protection scope of the present application should be determined by the scope defined by the claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (13)

1. A training method of a user behavior recognition model is characterized by comprising the following steps:
acquiring data of a label-free time sequence sensor as original data;
preprocessing the original data to obtain time domain data, frequency domain data and statistic data;
respectively inputting the time domain data, the frequency domain data and the statistic data into a model, and performing pre-training to obtain a pre-training model;
acquiring time domain data of the time sequence sensor with the label, and acquiring corresponding frequency domain data and statistic data;
and respectively inputting the time domain data, the frequency domain data and the statistic data of the time sequence sensor with the label into the pre-training model, and performing retraining to obtain the user behavior recognition model.
2. The method for training the user behavior recognition model according to claim 1, wherein the inputting of the time domain data, the frequency domain data and the statistic data into the model respectively and the performing of pre-training comprise:
extracting time domain characteristics, frequency domain characteristics and statistic characteristics corresponding to the time domain data, the frequency domain data and the statistic data;
and aggregating the time domain features, the frequency domain features and the statistic features to be used as training data.
3. The method for training the user behavior recognition model according to claim 2, wherein after extracting the time domain features, the frequency domain features and the statistic features corresponding to the time domain data, the frequency domain data and the statistic data, the method further comprises:
obtaining the similar characteristics and the different characteristics among the time domain characteristics, the frequency domain characteristics and the statistic characteristics;
and aggregating the similar features and the different features to be used as training data.
4. The method for training the user behavior recognition model according to any one of claims 1 to 3, wherein the inputting of the time domain data, the frequency domain data and the statistic data into the model respectively and the performing of pre-training comprise:
and inputting the time domain data, the frequency domain data and the statistic data to different processing units of the model in parallel.
5. The method of claim 4, wherein the model is a Transformer model, and the processing unit is an encoder of the Transformer.
6. The method for training a user behavior recognition model according to claim 1, wherein before inputting the time domain data, the frequency domain data, and the statistic data into the model, respectively, further comprises: preprocessing time domain data, frequency domain data and statistic data;
correspondingly, before the time domain data, the frequency domain data and the statistic data of the labeled time sequence sensor are respectively input to the pre-training model, the method further comprises the following steps: and preprocessing the time domain data, the frequency domain data and the statistic data of the labeled time sequence sensor.
7. The method of claim 1, wherein the data of the label-free time sequence sensor and the time domain data of the time sequence sensor with the label each include time stamp information.
8. A method for facial recognition, comprising:
acquiring face information of an object to be recognized through an image sensor, wherein the face information at least comprises motion information of a face or a part of the face;
inputting the face information into an image recognition model, and obtaining a first recognition result including identity information of an object to be recognized and whether the object is a living body;
while the face information of the object to be recognized is acquired, monitoring the object to be recognized by a time sequence sensor to obtain monitoring data of the time sequence sensor;
inputting the monitoring data into the user behavior recognition model generated by the training method of the user behavior recognition model according to any one of claims 1 to 7, and obtaining a second recognition result including the behavior category of the object to be recognized;
and obtaining, according to the first recognition result and the second recognition result, the identity information of the object to be recognized, and determining a third recognition result of whether the object to be recognized is a living body.
9. A system for facial recognition, comprising: the system comprises an image sensor, an image recognition model, a time sequence sensor, a decision model and a user behavior recognition model;
the image sensor is used for acquiring the face information of an object to be recognized and sending the face information of the object to be recognized to the image recognition model; the face information includes at least motion information of a face or a part of a face;
the image recognition model is used for receiving the face information of the object to be recognized, obtaining a first recognition result including the identity information of the object to be recognized and whether the object to be recognized is a living body according to the face information of the object to be recognized, and sending the first recognition result including the identity information of the object to be recognized and whether the object to be recognized is a living body to the decision model;
the time sequence sensor is used for monitoring the object to be recognized while the face information of the object to be recognized is acquired, obtaining monitoring data of the time sequence sensor, and inputting the monitoring data of the time sequence sensor into the user behavior recognition model;
the user behavior recognition model is used for receiving monitoring data of the time sequence sensor, obtaining a second recognition result comprising the behavior category of the object to be recognized according to the monitoring data, and sending the second recognition result to the decision model;
and the decision model is used for receiving the first recognition result and the second recognition result, obtaining the identity information of the object to be recognized according to the first recognition result and the second recognition result, and determining a third recognition result of whether the object to be recognized is a living body.
10. An apparatus for training a user behavior recognition model, comprising:
the tag-free data acquisition unit is used for acquiring data of the tag-free time sequence sensor as original data;
the preprocessing unit is used for preprocessing the original data to obtain time domain data, frequency domain data and statistic data;
the pre-training unit is used for respectively inputting the time domain data, the frequency domain data and the statistic data into a model for pre-training to obtain a pre-training model;
the tagged data acquisition unit is used for acquiring time domain data of the time sequence sensor with the label and obtaining corresponding frequency domain data and statistic data;
and the retraining unit is used for respectively inputting the time domain data, the frequency domain data and the statistic data of the time sequence sensor with the label into the pre-training model for retraining to obtain the user behavior recognition model.
11. An apparatus for facial recognition, comprising:
a face information acquisition unit configured to acquire, by an image sensor, face information of an object to be recognized, the face information including at least motion information of a face or a part of a face;
a first recognition result acquisition unit configured to input the face information into an image recognition model, and acquire a first recognition result including identity information of an object to be recognized and whether the object is a living body;
the monitoring data acquisition unit is used for monitoring the object to be recognized through a time sequence sensor while the face information of the object to be recognized is acquired, to obtain monitoring data of the time sequence sensor;
a second recognition result obtaining unit, configured to input the monitoring data into the user behavior recognition model generated by the training method of the user behavior recognition model according to any one of claims 1 to 7, and obtain a second recognition result including a behavior category of the object to be recognized;
and the recognition result determining unit is used for obtaining the identity information of the object to be recognized according to the first recognition result and the second recognition result, and determining a third recognition result of whether the object to be recognized is a living body.
12. An electronic device, characterized in that the electronic device comprises: a processor; a memory for storing a computer program to be executed by the processor to perform the method of any one of claims 1 to 8.
13. A computer storage medium, characterized in that it stores a computer program that is executed by a processor to perform the method of any one of claims 1 to 8.
CN202111286929.1A 2021-11-02 2021-11-02 Method, system and device for training user behavior recognition model and face recognition Pending CN114220136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111286929.1A CN114220136A (en) 2021-11-02 2021-11-02 Method, system and device for training user behavior recognition model and face recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111286929.1A CN114220136A (en) 2021-11-02 2021-11-02 Method, system and device for training user behavior recognition model and face recognition

Publications (1)

Publication Number Publication Date
CN114220136A true CN114220136A (en) 2022-03-22

Family

ID=80696451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111286929.1A Pending CN114220136A (en) 2021-11-02 2021-11-02 Method, system and device for training user behavior recognition model and face recognition

Country Status (1)

Country Link
CN (1) CN114220136A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147921A (en) * 2022-06-08 2022-10-04 南京信息技术研究院 Key area target abnormal behavior detection and positioning method based on multi-domain information fusion
CN115147921B (en) * 2022-06-08 2024-04-30 南京信息技术研究院 Multi-domain information fusion-based key region target abnormal behavior detection and positioning method
CN115222955B (en) * 2022-06-13 2023-02-28 北京医准智能科技有限公司 Training method and device of image matching model, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination