CN114627518A - Data processing method, data processing device, computer readable storage medium and processor


Info

Publication number: CN114627518A
Application number: CN202011467170.2A
Authority: CN (China)
Prior art keywords: living body, feature map, picture, target, level semantic
Other languages: Chinese (zh)
Inventors: 何斌, 商磊, 孙佰贵, 李�昊
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by: Alibaba Group Holding Ltd
Priority to: CN202011467170.2A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data processing method, a data processing device, a computer readable storage medium and a processor. The method includes the following steps: acquiring a sample picture, wherein the sample picture is any one of a plurality of images captured by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and inputting the target feature map into a fully connected layer of a neural network model for training to obtain a living body recognition model, wherein the living body recognition model is used to extract texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used to perform living body recognition on the verification image. The invention solves the technical problem that living body recognition results are inaccurate when living body recognition is performed.

Description

Data processing method, data processing device, computer readable storage medium and processor
Technical Field
The invention relates to the field of living body recognition (liveness detection), and in particular to a data processing method, a data processing device, a computer readable storage medium and a processor.
Background
At present, living body recognition is an important means of verifying an individual's identity, but devices that perform living body recognition all face challenges from non-living attack samples. Lawbreakers can attack a recognition system by holding up photos or video clips of the relevant person, thereby disguising that person's identity to the face recognition system. Non-living attacks are therefore a major security risk for existing face recognition systems, and there is a technical problem that living body recognition results are inaccurate when living body recognition is performed.
In view of the above technical problem that living body recognition results are inaccurate when living body recognition is performed, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, a computer readable storage medium and a processor, which are used for at least solving the technical problem that a living body identification result is inaccurate when living body identification is carried out.
According to an aspect of an embodiment of the present invention, there is provided a data processing method. The method may include the following steps: acquiring a sample picture, wherein the sample picture is any one of a plurality of images captured by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and inputting the target feature map into a fully connected layer of a neural network model for training to obtain a living body recognition model, wherein the living body recognition model is used to extract texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used to perform living body recognition on the verification image.
According to an aspect of an embodiment of the present invention, there is provided a method of identifying a living body. The method may include the following steps: acquiring a target picture to be subjected to living body recognition, wherein the target picture is any one of a plurality of images of a target object captured by a camera; calling a living body recognition model to extract texture features and high-level semantic features of the target picture, wherein the living body recognition model is generated by training with a target feature map, and the target feature map is formed by splicing a texture feature map extracted from a sample picture with a high-level semantic feature map; and predicting whether the target object is a living body based on the texture features and high-level semantic features of the target picture.
According to an aspect of an embodiment of the present invention, there is provided a data processing method. The method may include the following steps: a display interface of an attendance system receives an attendance request, wherein the attendance request is used to capture a target picture of a target object, and the target picture is any one of a plurality of images of the target object captured by a camera; the attendance system receives, based on the attendance request, the fed-back texture features and high-level semantic features of the target picture, wherein the texture features and high-level semantic features of the target picture are extracted by a living body recognition model, the living body recognition model is generated by training with a target feature map, and the target feature map is formed by splicing a texture feature map extracted from a sample picture with a high-level semantic feature map; and the attendance system displays a living body recognition result on the display interface, wherein whether the target object is a living body is predicted based on the texture features and high-level semantic features of the target picture.
According to an aspect of an embodiment of the present invention, there is provided another data processing method. The method may include the following steps: displaying a living body authentication interface on a payment system, and displaying, in the living body authentication interface, a target picture to be subjected to living body recognition, wherein the target picture is any one of a plurality of images of a target object captured by a camera, and the target object is located in an authentication area of the living body authentication interface; the payment system outputs a verification instruction, wherein the verification instruction is used to instruct the target object displayed in the target picture to perform a predetermined action indicated by the verification instruction; the payment system acquires the texture features and high-level semantic features of the target picture based on the verification instruction, wherein the texture features and high-level semantic features of the target picture are extracted by a living body recognition model, the living body recognition model is generated by training with a target feature map, and the target feature map is formed by splicing a texture feature map extracted from a sample picture with a high-level semantic feature map; the payment system displays a living body recognition result on the living body authentication interface, wherein whether the target object is a living body is predicted based on the texture features and high-level semantic features of the target picture; and the payment system performs a payment operation when the target object is confirmed to be a living body.
According to another aspect of the embodiments of the present invention, there is also provided a data processing apparatus. The apparatus may include: a first acquisition unit configured to acquire a sample picture, wherein the sample picture is any one of a plurality of images captured by a camera; a first extraction unit configured to extract a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and an input unit configured to input the target feature map into a fully connected layer of a neural network model for training to obtain a living body recognition model, wherein the living body recognition model is used to extract texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used to perform living body recognition on the verification image.
According to another aspect of the embodiments of the present invention, there is also provided an identification apparatus of a living body. The apparatus may include: a second acquisition unit configured to acquire a target picture to be subjected to living body recognition, wherein the target picture is any one of a plurality of images of a target object captured by a camera; a second extraction unit configured to call a living body recognition model to extract texture features and high-level semantic features of the target picture, wherein the living body recognition model is generated by training with a target feature map, and the target feature map is formed by splicing a texture feature map extracted from a sample picture with a high-level semantic feature map; and a prediction unit configured to predict whether the target object is a living body based on the texture features and high-level semantic features of the target picture.
According to another aspect of the embodiments of the present invention, there is provided another data processing apparatus. The apparatus may include: a first receiving unit configured to receive an attendance request through a display interface of an attendance system, wherein the attendance request is used to capture a target picture of a target object, and the target picture is any one of a plurality of images of the target object captured by a camera; a second receiving unit configured to receive, through the attendance system based on the attendance request, the fed-back texture features and high-level semantic features of the target picture, wherein the texture features and high-level semantic features of the target picture are extracted by a living body recognition model, the living body recognition model is generated by training with a target feature map, and the target feature map is formed by splicing a texture feature map extracted from a sample picture with a high-level semantic feature map; and a first display unit configured to display the living body recognition result on the display interface through the attendance system, wherein whether the target object is a living body is predicted based on the texture features and high-level semantic features of the target picture.
According to another aspect of the embodiments of the present invention, there is also provided another identification apparatus of a living body. The apparatus may include: a second display unit configured to display a living body authentication interface through a payment system and to display, in the living body authentication interface, a target picture to be subjected to living body recognition, wherein the target picture is any one of a plurality of images of a target object captured by a camera, and the target object is located in an authentication area of the living body authentication interface; an output unit configured to output a verification instruction through the payment system, wherein the verification instruction is used to instruct the target object displayed in the target picture to perform a predetermined action indicated by the verification instruction; a third acquisition unit configured to acquire, through the payment system based on the verification instruction, the texture features and high-level semantic features of the target picture, wherein the texture features and high-level semantic features of the target picture are extracted by a living body recognition model, the living body recognition model is generated by training with a target feature map, and the target feature map is formed by splicing a texture feature map extracted from a sample picture with a high-level semantic feature map; a third display unit configured to display the living body recognition result on the living body authentication interface through the payment system, wherein whether the target object is a living body is predicted based on the texture features and high-level semantic features of the target picture; and an execution unit configured to perform a payment operation when the payment system confirms that the target object is a living body.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium. The computer readable storage medium includes a stored program, wherein the program, when executed by a processor, controls an apparatus in which the computer readable storage medium is located to perform a data processing method of an embodiment of the present invention.
According to another aspect of the embodiments of the present invention, there is also provided a processor. The processor is configured to run a program which, when executed, performs the data processing method of the embodiments of the present invention.
According to another aspect of the embodiment of the invention, there is also provided a data processing system. The system may include: a processor; and a memory coupled to the processor and configured to provide the processor with instructions for the following processing steps: acquiring a sample picture, wherein the sample picture is any one of a plurality of images captured by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and inputting the target feature map into a fully connected layer of a neural network model for training to obtain a living body recognition model, wherein the living body recognition model is used to extract texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used to perform living body recognition on the verification image.
In the embodiments of the invention, a sample picture is acquired, wherein the sample picture is any one of a plurality of images captured by a camera; a texture feature map and a high-level semantic feature map of the sample picture are extracted, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and the target feature map is input into a fully connected layer of a neural network model for training to obtain a living body recognition model, wherein the living body recognition model is used to extract texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used to perform living body recognition on the verification image. That is to say, the texture feature map and high-level semantic feature map of the sample picture are extracted by the silent living body recognition method and spliced to obtain the target feature map; the living body recognition model is obtained by training with the target feature map; and the living body recognition model is then used to perform living body recognition on an image to be recognized, so that non-living attack samples are rejected. This achieves the purpose of filtering non-living attack samples, solves the technical problem of inaccurate living body recognition results during living body recognition, and achieves the technical effect of improving the accuracy of living body recognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing a data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of data processing according to an embodiment of the present invention;
fig. 3 is a flowchart of an identification method of a living body according to an embodiment of the present invention;
FIG. 4 is a flow diagram of another data processing method according to an embodiment of the invention;
FIG. 5 is a flow diagram of another data processing method according to an embodiment of the invention;
FIG. 6A is a schematic diagram of a living body identification system according to an embodiment of the present invention;
FIG. 6B is a schematic illustration of a scenario for live recognition according to an embodiment of the present invention;
FIG. 6C is a schematic illustration of another live recognition scenario according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 8 is a schematic view of an identification apparatus of a living body according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of another data processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another data processing apparatus according to an embodiment of the present invention;
fig. 11 is a block diagram of a computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms appearing in the description of the embodiments of the present application are explained as follows:
Face living body recognition: judging whether a face image captured by a camera is real face data or forged attack face data (including printed photos, posters, A4 paper, screen attacks from mobile phones, monitors and pads, and forgeries such as 3D masks);
Living body: the face object directly captured by the camera is real face data;
Non-living attack: the face object captured by the camera is forged attack face data (including printed photos, posters, A4 paper, screen attacks from mobile phones, monitors and pads, and forgeries such as 3D masks);
Attribute information: for a living body, contains facial features and facial expression information (nose tip, smile, etc.); for a non-living attack, contains the type of attack (printed photo, screen attack, etc.) and the lighting conditions (low light, backlight, etc.);
Geometric information: for a living body, contains a depth map of the face; for a non-living attack, contains a reflection map of the paper, screen or mask.
Example 1
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a data processing method. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order than here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 is a block diagram of the hardware structure of a computer terminal (or mobile device) for implementing a data processing method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the data processing method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104, that is, implementing the data processing method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that, in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In the operating environment shown in fig. 1, the present application provides a data processing method as shown in fig. 2. It should be noted that the data processing method of this embodiment may be executed by the mobile terminal of the embodiment shown in fig. 1.
Fig. 2 is a flow chart of a data processing method according to an embodiment of the present invention. As shown in fig. 2, the data processing method may include the steps of:
step S202, a sample picture is obtained, wherein the sample picture is any one picture in a plurality of images collected by the camera.
In the technical solution provided by step S202 of the present invention, a plurality of images may be captured by a camera; for example, the plurality of images are an image sequence continuously captured by the camera over a period of time, and any one of the plurality of images is determined as the sample picture. That is, this embodiment performs silent living body recognition by relying on a single picture captured by the camera. The sample pictures may include living samples and non-living attack samples.
Step S204, extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map.
In the technical solution provided by step S204 of the present invention, after the sample picture is obtained, the texture feature map and the high-level semantic feature map of the sample picture may be extracted.
In this embodiment, extracting the texture feature map of the sample picture also requires training. Texture information of the local details of the sample picture may first be computed to form a feature expression, and the features are then pooled multiple times so that the receptive field of the convolutional neurons keeps growing, forming texture features at different scales. These may be bottom-level texture features, from which the texture feature map is generated.
In this embodiment, the high-level semantic information of the sample picture may be fully extracted through a cascade of multiple residual modules, and the high-level semantic feature map is generated from this high-level semantic information and output.
Since the local details of the sample picture provide significant guidance for the binary classification between living samples and non-living attacks, this embodiment can splice the texture feature map and the high-level semantic feature map after they are generated: for example, the channel dimension of the two feature maps is determined, and they are spliced along that channel dimension to obtain the target feature map.
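For illustration only, the channel-wise splicing can be written in PyTorch as below; this is a minimal sketch assuming the two branches produce feature maps with the same spatial resolution (the tensor shapes are placeholders, not values taken from this disclosure):

```python
import torch

# Placeholder feature maps: (batch, channels, height, width).
# Shapes are illustrative; in practice the two maps must share the
# same spatial size (e.g., via pooling or interpolation).
texture_map = torch.randn(8, 128, 32, 32)   # bottom-level texture features
semantic_map = torch.randn(8, 256, 32, 32)  # high-level semantic features

# Splice (concatenate) along the channel dimension (dim=1)
# to obtain the target feature map.
target_map = torch.cat([texture_map, semantic_map], dim=1)
print(target_map.shape)  # torch.Size([8, 384, 32, 32])
```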
Step S206, inputting the target feature map into a fully connected layer of the neural network model for training to obtain a living body recognition model, wherein the living body recognition model is used to extract texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used to perform living body recognition on the verification image.
In the technical solution provided by step S206 of the present invention, after the texture feature map and the high-level semantic feature map are spliced to obtain the target feature map, the target feature map may be input into a fully connected (FC) layer of the neural network model, and the living body recognition model is obtained through training.
In this embodiment, the neural network model includes a fully connected layer. The obtained target feature map may be input into the fully connected layer for training to obtain the living body recognition model, and the texture features and high-level semantic features of a verification image to be subjected to living body recognition are then extracted by the living body recognition model; the texture features and high-level semantic features can further be used to perform living body recognition on the verification image, for example to predict the non-living attack type. Because the living body recognition model of this embodiment is trained with a target feature map spliced from the texture feature map and the high-level semantic feature map, it avoids learning features that do not generalize and that discriminate samples poorly.
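A minimal sketch of such a classification head follows: the spliced target feature map is pooled, flattened, and passed through a fully connected layer to produce live/non-live logits. The channel count and the two-class output are illustrative assumptions; the disclosure does not fix the layer sizes:

```python
import torch
from torch import nn

class LivenessHead(nn.Module):
    """FC classifier over the spliced target feature map (illustrative)."""
    def __init__(self, in_channels: int = 384, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # collapse spatial dimensions
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, target_map: torch.Tensor) -> torch.Tensor:
        x = self.pool(target_map).flatten(1)  # (batch, in_channels)
        return self.fc(x)                     # (batch, num_classes) logits

head = LivenessHead()
logits = head(torch.randn(8, 384, 32, 32))
print(logits.shape)  # torch.Size([8, 2])
```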
Through the above steps S202 to S206, a sample picture is acquired, wherein the sample picture is any one of a plurality of images captured by a camera; a texture feature map and a high-level semantic feature map of the sample picture are extracted, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and the target feature map is input into a fully connected layer of the neural network model for training to obtain a living body recognition model, wherein the living body recognition model is used to extract texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used to perform living body recognition on the verification image. That is to say, in this embodiment the texture feature map and the high-level semantic feature map of the sample picture are extracted by the silent living body recognition method and spliced to obtain the target feature map; the living body recognition model is obtained by training with the target feature map; and the living body recognition model is then used to perform living body recognition on an image to be recognized, so that non-living attack samples are rejected. This achieves the purpose of filtering non-living attack samples, solves the technical problem of inaccurate living body recognition results during living body recognition, and achieves the technical effect of improving the accuracy of living body recognition.
The above-described method of this embodiment is further described below.
As an optional implementation manner, in step S204, extracting a texture feature map of the sample picture includes: determining a region picture where a target object is located in the sample picture; identifying texture information in the region picture; and performing pooling operation on the texture information in the region picture by adopting a central differential convolution network model to generate a texture feature map.
In this embodiment, when extracting the texture feature map of the sample picture, the region picture where the target object is located in the sample picture may first be determined. For that region picture, the texture information in it is identified; this is texture information of local details of the sample picture, from which a feature expression is formed. A central difference convolution network model (central difference conv, CD conv) is then used to pool the texture information in the region picture, for example three times, so that the receptive field of the convolutional neurons keeps growing, forming texture features at different scales, from which the texture feature map is generated. This embodiment does not specifically limit the number of central difference convolution layers, and the contribution ratio of the central-difference (residual) term can be adjusted.
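The sketch below follows the commonly published central difference convolution formulation (a vanilla convolution minus a theta-weighted central-difference term) together with the three pooling stages described above; the theta value, channel widths, and input size are illustrative assumptions rather than parameters stated in this disclosure:

```python
import torch
from torch import nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    """Central difference convolution: a vanilla conv minus a theta-weighted
    central-difference term (theta=0 reduces to a plain convolution)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # Summing the kernel over its spatial window and applying it as a
        # 1x1 convolution yields the central-difference term.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return out - self.theta * F.conv2d(x, kernel_sum)

class TextureExtractor(nn.Module):
    """Illustrative bottom-level texture branch: CD convs with three pooling
    stages that progressively grow the receptive field, as described above."""
    def __init__(self):
        super().__init__()
        self.stages = nn.Sequential(
            CDConv2d(3, 64), nn.ReLU(), nn.MaxPool2d(2),
            CDConv2d(64, 128), nn.ReLU(), nn.MaxPool2d(2),
            CDConv2d(128, 128), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, x):
        return self.stages(x)  # texture feature map

texture_map = TextureExtractor()(torch.randn(1, 3, 256, 256))
print(texture_map.shape)  # torch.Size([1, 128, 32, 32])
```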
As an optional implementation, extracting a texture feature map and a high-level semantic feature map of a sample picture includes: analyzing a sample picture by adopting an extraction network formed by cascading a plurality of residual modules in a pre-training model, and extracting high-level semantic information from the sample picture; and constructing a high-level semantic feature map based on the high-level semantic information.
In this embodiment, the overall network structure of the pre-trained model may be consistent with that of a residual network (ResNet-18) and may be initialized with the parameters of a ResNet-18 pre-trained on the ImageNet dataset. The pre-trained model of this embodiment includes a plurality of residual modules, which are cascaded to form an extraction network; the sample picture can be analyzed through this extraction network to fully extract its high-level semantic information, and the high-level semantic feature map is then constructed from that information and output.
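One way to realize such a cascade of residual modules is to reuse the convolutional stages of an ImageNet-pretrained ResNet-18, as sketched below; cutting the network before its average-pool and classification head is an assumption for illustration (torchvision 0.13 or later is assumed for the weights argument):

```python
import torch
from torch import nn
from torchvision import models

# ImageNet-pretrained ResNet-18; drop the average-pool and FC head so the
# cascaded residual stages act as the high-level semantic extractor.
backbone = models.resnet18(weights="IMAGENET1K_V1")
semantic_extractor = nn.Sequential(*list(backbone.children())[:-2])

semantic_map = semantic_extractor(torch.randn(1, 3, 256, 256))
print(semantic_map.shape)  # torch.Size([1, 512, 8, 8])
```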
As an optional implementation, the high-level semantic information includes at least one of: face information and illumination information of the sample picture. The high-level semantic feature map is constructed based on the high-level semantic information as follows: training the face information in the sample picture to generate attribute information of the face in the sample picture; training the light information in the sample picture to generate illumination information of the face in the sample picture; and constructing the high-level semantic feature map based on the attribute information and the illumination information of the face in the sample picture.
In this embodiment, the high-level semantic information may include the face information and illumination information of the target object in the sample picture, which belong to the attribute information of living samples and non-living attack samples. The face information may be used to generate the attribute information of the face in the sample picture; this attribute information indicates a face attribute and is a kind of class label. The illumination information may describe the lighting conditions of the environment, is used to generate the illumination information of the face in the sample picture, and may be the lighting condition at the camera when the sample picture was captured as a non-living attack sample. When constructing the high-level semantic feature map from the high-level semantic information, the face information in the sample picture can be trained to generate the attribute information of the face, and the light information in the sample picture can be trained to generate the illumination information of the face. After the attribute information and illumination information of the face in the sample picture are obtained, the high-level semantic feature map can be constructed from them; the texture feature map and high-level semantic feature map are spliced to obtain the target feature map; the living body recognition model is trained with the target feature map; and living body recognition is performed on an image to be recognized by the living body recognition model. This avoids inaccurate living body recognition results caused by insufficient consideration of the face information and illumination information captured for different faces.
Optionally, in this embodiment, considering that the face information and illumination information of the sample picture belong to high-level semantic information, the constructed high-level semantic feature map may be used directly as the input of the fully connected layer of the neural network model to further predict the attribute information and illumination information of the face.
Optionally, in this embodiment, cross entropy loss may be adopted for the living body recognition, non-living attack sample classification, and illumination classification of the sample picture, with the corresponding weights set to 1, 0.1, and 0.01, respectively. Since even the same face may have multiple face attributes, for the classification of face attributes this embodiment may use a binary cross entropy loss function (BCE loss), whose weight may be set to 1.
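The weighted multi-task objective described above can be sketched as follows; only the loss weights (1, 0.1, 0.01, and 1) come from the text, while the numbers of attack types, illumination classes, and face attributes are placeholders:

```python
import torch
from torch import nn

ce = nn.CrossEntropyLoss()
bce = nn.BCEWithLogitsLoss()  # multi-label face attributes

batch = 8
live_logits = torch.randn(batch, 2)    # live vs. non-live
attack_logits = torch.randn(batch, 5)  # attack type (placeholder: 5 types)
light_logits = torch.randn(batch, 4)   # illumination class (placeholder: 4)
attr_logits = torch.randn(batch, 10)   # face attributes (placeholder: 10)

live_y = torch.randint(0, 2, (batch,))
attack_y = torch.randint(0, 5, (batch,))
light_y = torch.randint(0, 4, (batch,))
attr_y = torch.randint(0, 2, (batch, 10)).float()

loss = (1.0 * ce(live_logits, live_y)
        + 0.1 * ce(attack_logits, attack_y)
        + 0.01 * ce(light_logits, light_y)
        + 1.0 * bce(attr_logits, attr_y))
```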
As an optional implementation, after extracting the texture feature map and the high-level semantic feature map of the sample picture in step S204, the method further includes: obtaining a depth estimate and a reflection estimate of the sample picture based on the high-level semantic feature map, wherein MSE loss is adopted as the loss function for the estimation functions of the depth estimation and the reflection estimation.
In this embodiment, living body recognition of the sample picture is treated as a binary classification problem, and the depth estimate and reflection estimate both come from the high-level semantic feature map. Therefore, after the texture feature map and high-level semantic feature map of the sample picture are extracted, this embodiment can obtain the depth estimate and reflection estimate of the sample picture from the high-level semantic feature map through the central difference convolution network model, where the depth estimate may be a depth-map estimate for a living sample, and the reflection estimate may be a reflection-map estimate for a non-living attack sample.
Optionally, for the estimation functions of the depth estimation and reflection estimation, MSE loss may be used as the loss function, and the weight of this loss function may be set to 0.1.
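Continuing the loss sketch above, the depth and reflection regression terms would then be added with weight 0.1; the map sizes are placeholders:

```python
import torch
from torch import nn

mse = nn.MSELoss()

depth_pred = torch.randn(8, 1, 32, 32)    # predicted depth map (placeholder size)
depth_gt = torch.randn(8, 1, 32, 32)      # depth supervision for live samples
reflect_pred = torch.randn(8, 3, 32, 32)  # predicted reflection map
reflect_gt = torch.randn(8, 3, 32, 32)    # reflection supervision for attacks

aux_loss = 0.1 * (mse(depth_pred, depth_gt) + mse(reflect_pred, reflect_gt))
```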
As an alternative implementation, after obtaining the living body recognition model in step S206, the method further includes: acquiring a verification picture to be subjected to living body recognition; extracting bottom-level texture features and high-level semantic features from the verification picture using the living body recognition model; and predicting whether the target object in the verification picture is a living body based on the texture features and semantic features.
In this embodiment, after the living body recognition model is trained, it may be used to perform living body recognition on an image to be recognized. A verification picture to be subjected to living body recognition can be acquired, for example through an image capture device. After the verification picture is acquired, the living body recognition model can be used to extract bottom-level texture features and high-level semantic features from it. Optionally, when predicting different attributes, this embodiment may specifically design a bottom-level texture feature extraction network and a high-level semantic feature extraction network, and may further improve accuracy through post-processing. The living body recognition model includes the bottom-level texture feature extraction network (also called the bottom-level texture feature extractor) and the high-level semantic feature extraction network (also called the high-level semantic feature extractor): bottom-level texture features are extracted from the verification picture through the former, high-level semantic features through the latter, and whether the target object in the verification picture is a living body is predicted based on these texture and semantic features.
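At verification time, the flow might look like the sketch below; the 0.5 threshold, the live-class index, and the stand-in model are illustrative assumptions, since the disclosure does not specify them:

```python
import torch
from torch import nn

def predict_live(model: nn.Module, picture: torch.Tensor) -> bool:
    """Run the liveness model on one verification picture and threshold the
    live-class probability (class index 1 and threshold 0.5 are assumptions)."""
    model.eval()
    with torch.no_grad():
        logits = model(picture.unsqueeze(0))        # (1, 2) live/non-live logits
        p_live = logits.softmax(dim=1)[0, 1].item()
    return p_live > 0.5

# Stand-in model: any module mapping a (3, H, W) image to 2 logits.
toy_model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 2))
print(predict_live(toy_model, torch.randn(3, 256, 256)))
```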
As an optional implementation, in a case that the target object in the verification picture is a non-living body, the type of the verification picture is determined based on the texture features and semantic features extracted from the verification picture, where the type of the verification picture includes at least one of: a flat photograph, a stereoscopic mask, and an image shown on a display device.
In this embodiment, when the target object in the verification picture is a non-living body, that is, when the verification picture is determined to be a non-living attack sample, the type of the verification picture (i.e., the type of non-living attack) may be determined based on the texture features and semantic features extracted from it; it may be a flat photograph, a stereoscopic mask, or an image shown on a display device.
Optionally, considering that the reflection map has a negative effect on prediction for printed non-living attack samples, when the training picture is a printed attack sample (e.g., a photo or A4 paper) this embodiment can cut off the gradient propagated from the reflection map, and can perform moire detection on the verification picture to further exclude false negative samples, thereby improving the accuracy of living body recognition.
Optionally, this embodiment may crop the face of the target object in the sample picture to obtain an image block within the face that contains only skin color, compute third-order color moments on that image block, and cluster using the computed result as a feature. Alternatively, this embodiment may divide the non-living sample pictures into two classes, those containing moire patterns and those not, and train a moire recognition network using the living samples and the moire-containing non-living attack samples as the training set; a verification picture predicted as a living body can then be further checked for moire through this network, so as to further exclude false negative samples and improve the accuracy of living body recognition.
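The third-order color moments mentioned above can be computed per channel as the mean, the standard deviation, and the cube root of the third central moment, giving a 9-dimensional feature per skin patch for clustering; a minimal sketch under the assumption of an RGB patch stored as a float array (the clustering step itself is omitted):

```python
import numpy as np

def color_moments(patch: np.ndarray) -> np.ndarray:
    """First three color moments per channel of an (H, W, 3) float patch:
    mean, standard deviation, and signed cube root of the third central
    moment, concatenated into a 9-dimensional feature vector."""
    pixels = patch.reshape(-1, 3)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    third = ((pixels - mean) ** 3).mean(axis=0)
    skew = np.cbrt(third)  # cube root keeps the sign of the third moment
    return np.concatenate([mean, std, skew])

feature = color_moments(np.random.rand(64, 64, 3))
print(feature.shape)  # (9,)
```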
The embodiment of the invention also provides a living body identification method based on the test process of the living body identification model.
Fig. 3 is a flowchart of a method of identifying a living body according to an embodiment of the present invention. As shown in fig. 3, the method may include the steps of:
step S302, a target picture to be subjected to living body identification is obtained, wherein the target picture is any one of a plurality of images of a target object acquired by a camera.
In the technical solution provided by step S302 of the present invention, a plurality of images of the target object may be captured by the camera, and any one of them is determined as the target picture to be subjected to living body recognition; that is, this embodiment performs silent living body recognition by relying on a single picture captured by the camera.
Step S304, calling a living body recognition model to extract texture features and high-level semantic features of the target picture.
In the technical solution provided in step S304 of the present invention, after a target picture to be subjected to living body recognition is acquired, a living body recognition model may be called, and texture features and high-level semantic features of the target picture are extracted through the living body recognition model, where the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map.
In this embodiment, the living body recognition model is a model trained in advance for recognizing a living body in a target picture to be subjected to living body recognition. Optionally, in this embodiment, texture information of the local details of the sample picture is first computed to form a feature expression; the features are then pooled multiple times so that the receptive field of the convolutional neurons keeps growing, forming texture features at different scales, from which the texture feature map of the sample picture is generated. The high-level semantic information of the sample picture can be fully extracted through a cascade of multiple residual modules, and the high-level semantic feature map of the sample picture is generated from that information. After the texture feature map and high-level semantic feature map of the sample picture are generated, they can be spliced to obtain the target feature map of the sample picture, which can be input into the fully connected layer of the neural network model for training to obtain the living body recognition model; the living body recognition model is then called to extract the texture features and high-level semantic features of the target picture.
Step S306, predicting whether the target object is a living body based on the texture features and high-level semantic features of the target picture.
In the technical solution provided in step S306 of the present invention, after the texture features and high-level semantic features of the target picture are obtained, whether the target object is a living body can be predicted from them: the living body recognition model learns from the texture and semantic features to predict whether the target object in the target picture is a living body.
The embodiment of the invention also provides another data processing method from the attendance application scene.
Fig. 4 is a flow chart of another data processing method according to an embodiment of the present invention. As shown in fig. 4, the method may include the steps of:
step S402, a display interface of the attendance system receives an attendance request, wherein the attendance request is used for capturing a target picture of a target object, and the target picture is any one of a plurality of images of the target object acquired by a camera.
In the technical solution provided in step S402 of the present invention, the attendance system has a display interface, and this embodiment may receive an attendance request on the display interface, where the attendance request may be triggered by a target object to be checked for attendance. The attendance request is used to capture a target picture of the target object, and any one of a plurality of images of the target object captured by the camera may be determined as the target picture to be subjected to living body recognition; that is, this embodiment performs silent living body recognition by relying on a single picture captured by the camera.
Step S404, the attendance system receives the fed-back texture features and high-level semantic features of the target picture based on the attendance request.
In the technical solution provided by step S404 of the present invention, a living body recognition model is used to extract texture features and high-level semantic features of a target picture, wherein the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map.
In this embodiment, after the display interface of the attendance system receives an attendance request, the attendance system may respond to the attendance request, and extract texture features and high-level semantic features of a target picture using a pre-trained living body recognition model, where the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map and a high-level semantic feature map recognized from a sample picture.
Step S406, the attendance system displays the living body recognition result on the display interface, wherein whether the target object is a living body is predicted based on the texture features and high-level semantic features of the target picture.
In the technical solution provided in step S406 of the present invention, after the attendance system receives the fed-back texture features and high-level semantic features of the target picture based on the attendance request, it can display the living body recognition result on the display interface.
In this embodiment, the living body recognition model may be used to learn from the texture features and semantic features received by the attendance system, so as to predict whether the target object in the target picture is a living body; the living body recognition result is then displayed on the display interface of the attendance system, thereby achieving the purpose of face-based attendance checking.
The embodiment of the invention also provides another data processing method from the payment application scene.
Fig. 5 is a flow chart of another data processing method according to an embodiment of the present invention. As shown in fig. 5, the method may include the steps of:
and step S502, displaying a living body authentication interface on the payment system, and displaying a target picture to be subjected to living body identification in the living body authentication interface.
In the technical solution provided by step S502 of the present invention, the target picture is any one of a plurality of images of the target object acquired by the camera, and the target object is located in the authentication area of the living body authentication interface.
In this embodiment, a living body authentication interface is displayed on the payment system, and the living body authentication interface is an interface for performing living body authentication on the target object.
Step S504, the payment system outputs a verification instruction, where the verification instruction is used to instruct the target object displayed in the target picture to execute a predetermined action instructed by the verification instruction.
In the technical solution provided in step S504 above, after the target picture to be subjected to living body recognition is displayed in the living body authentication interface, the payment system may output a verification instruction. The verification instruction may be used to instruct the target object displayed in the target picture to perform a predetermined action, such as raising the head, lowering the head, or turning the head, so that the action of the target object meets a standard; the instruction may be given as voice, text, and the like, which is not limited here.
Step S506, the payment system acquires the texture feature and the high-level semantic feature of the target picture based on the verification instruction.
In the technical solution provided by step S506 of the present invention, after the payment system outputs the verification instruction, the payment system obtains the texture feature and the high-level semantic feature of the target picture based on the verification instruction, wherein the texture feature and the high-level semantic feature of the target picture are extracted by using a living body recognition model, wherein the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing the texture feature map and the high-level semantic feature map recognized from the sample picture.
In this embodiment, the payment system may extract the texture features and high-level semantic features of the target picture using a pre-trained living body recognition model in response to the verification instruction. The living body recognition model is a pre-trained model used for performing living body recognition on a target picture to be recognized. Optionally, in this embodiment, texture information of the local details of the sample picture is first computed to form a feature expression, and the features then undergo several pooling operations that continuously enlarge the receptive field of the convolution neurons, forming texture features at different scales, from which the texture feature map of the sample picture is generated. The high-level semantic information of the sample picture can be fully extracted through a structure in which a plurality of residual modules are cascaded, and the high-level semantic feature map of the sample picture is generated from this high-level semantic information. After the texture feature map and the high-level semantic feature map of the sample picture are generated, they can be spliced to obtain the target feature map of the sample picture, the target feature map can be input to the fully connected layer of a neural network model for training to obtain the living body recognition model, and the living body recognition model is then used to extract the texture features and high-level semantic features of the target picture.
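The two-branch structure just described can be summarized in a short PyTorch-style sketch. This is a minimal illustration, not the patented implementation: the branch modules are placeholders, and pooling both feature maps to 1x1 before splicing is an assumption made here so the channel-wise concatenation is well defined.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LivenessModel(nn.Module):
    """Texture branch + semantic branch -> spliced target feature -> FC."""
    def __init__(self, texture_branch: nn.Module, semantic_branch: nn.Module,
                 fused_dim: int):
        super().__init__()
        self.texture_branch = texture_branch    # produces feature map 1
        self.semantic_branch = semantic_branch  # produces feature map 2
        self.fc = nn.Linear(fused_dim, 2)       # living vs. non-living

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.texture_branch(x)
        s = self.semantic_branch(x)
        # Pool each map to 1x1 so the two branches align spatially
        # (an assumption of this sketch), then splice along channels.
        t = F.adaptive_avg_pool2d(t, 1)
        s = F.adaptive_avg_pool2d(s, 1)
        fused = torch.cat([t, s], dim=1).flatten(1)  # the "target feature map"
        return self.fc(fused)
```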
Step S508, the payment system displays the living body identification result on the living body authentication interface, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture.
In the technical solution provided by the foregoing step S508 of the present invention, after the payment system acquires the texture feature and the high-level semantic feature of the target picture, the living body identification model may be used to learn the texture feature and the semantic feature, so as to predict whether the target object in the target picture is a living body, and further display a living body identification result of whether the target object is a living body on the living body authentication interface.
In step S510, the payment system performs a payment operation in a case where the target object is confirmed to be a living body.
In the technical solution provided by step S510 of the present invention, if the living body authentication interface shows that the target object is a living body, it is determined that the target object is operating the payment system in person, so the payment operation is performed and the purpose of face-brushing payment is achieved.
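For orientation, the S502–S510 flow can be strung together as below. All helper names (ui.show_liveness_interface, camera.capture, model.extract_features, and so on) are hypothetical placeholders invented for this sketch; the disclosure does not define such an API.

```python
def face_payment_flow(model, camera, ui):
    # S502: display the living body authentication interface and the target picture
    ui.show_liveness_interface()
    # S504: instruct the target object to perform a predetermined action
    ui.issue_instruction("Please turn your head")
    # The target picture is one frame among those captured by the camera
    frame = camera.capture()
    # S506: extract texture and high-level semantic features with the model
    texture, semantic = model.extract_features(frame)
    # S508: predict and display whether the target object is a living body
    is_live = model.predict_live(texture, semantic)
    ui.show_result(is_live)
    # S510: pay only when the target object is confirmed to be a living body
    if is_live:
        ui.execute_payment()
    else:
        ui.reject("non-living attack sample detected")
```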
In the related art, a lawbreaker can pass face recognition by holding up a photo or a video clip of the relevant person, thereby disguising himself with that person's identity. Silent living body identification can be adopted to perform living body identification, but during silent living body identification the recognition algorithm extracts living body features with errors, so the living body identification result suffers from the technical problem of being inaccurate.
In this embodiment, however, the living body identification model may be used to determine whether the object in the picture acquired by the current camera is a living body sample or a non-living body attack sample, and non-living body attack samples are rejected from identification. This achieves the purpose of filtering non-living body attack samples, solves the technical problem of inaccurate living body identification results, achieves the technical effect of improving the accuracy of living body identification, and avoids the potential safety hazard that non-living body attack samples pose to face recognition.
Example 2
The technical solution of the present invention will be described below by way of example with reference to preferred embodiments.
Terminal equipment for face recognition gradually enters the lives of people and becomes an important means for distinguishing personal identities, such as face-brushing payment, face attendance and the like. However, these terminal devices all face challenges from non-living attack samples, and a lawless person can attack the face recognition system by holding a photo or video clip of the relevant person, thereby achieving the purpose of disguising the identity of the person through the face recognition system. Therefore, non-live attacks are a great safety hazard in face recognition.
Living body recognition can be divided into time-series living body recognition and silent living body recognition according to the data format used. Time-series living body recognition relies on a continuous image sequence acquired by the camera over a period of time to judge whether the object in the sequence is a living body; silent living body recognition relies on only a single picture acquired by the camera to judge whether the object in the picture is a living body.
In the related art, there are two methods for silent living body identification. One treats the living body identification task as a binary classification task, acquires image features through a series of convolutional neural network operations (such as convolution, pooling, and skip connections), and predicts through fully connected layers whether a given face is a living body sample or a non-living body attack sample. The disadvantage of this scheme is that the learned features generalize poorly and their ability to discriminate between samples is insufficient. The other introduces methods such as blood flow, depth map, and LBP map estimation, converts the living body identification task into a supervised problem with an explicit target, assists it with convolution operators sensitive to local textures, and finally judges whether a given picture is a non-living body attack sample by evaluating the estimated depth map. The disadvantage of this method is that it does not fully consider the cameras and lighting conditions under which different faces are acquired, nor the different attack categories and attributes of the faces.
In practical application, compared with time-series living body recognition, silent living body recognition responds faster and uses fewer model parameters because it processes a single picture. This embodiment can use silent living body identification to judge whether the sample collected by the current camera is a living body sample or a non-living body attack sample and reject non-living body attack samples from identification, thereby filtering non-living body samples, solving the technical problem of inaccurate living body identification results, and removing the potential safety hazard that attack samples bring to the face recognition system.
The above-described method of this example is further illustrated below.
Fig. 6A is a schematic diagram of a living body identification system according to an embodiment of the present invention. As shown in fig. 6A, a sample picture is input; after the sample picture enters the whole system, module 1 calculates texture information of the local details of the sample picture by using CD conv (central difference convolution) to form a feature expression, and 3 pooling operations (pool) continuously enlarge the receptive field of the convolution neurons to form texture features at different scales, yielding feature map 1. Optionally, the 3 pooling operations are inserted at intervals of 3 CD convs (i.e., each pooling operation follows a group of 3 CD convs), where each CD conv includes 1 convolution layer (conv).
It should be noted that the residual ratio of the CD Conv may be adjusted, and the number of CD Convs in module 1 may also be adjusted.
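The CD conv operator is commonly implemented as a vanilla convolution minus a weighted response of the kernel's spatial sum, with a scalar theta acting as the adjustable residual ratio mentioned above. The sketch below follows that published formulation; the channel width, ReLU activations, and max pooling used in the module-1 stack are illustrative assumptions, not details fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    """Central difference convolution; theta is the adjustable residual ratio."""
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)
        if self.theta == 0:
            return out  # degenerates to a vanilla convolution
        # A 1x1 convolution with each kernel's spatial sum gives the response
        # of the kernel to the central value; subtracting it emphasizes
        # local-gradient (texture) cues.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        center = F.conv2d(x, kernel_sum)
        return out - self.theta * center

def make_texture_branch(in_ch: int = 3, width: int = 64) -> nn.Sequential:
    """Module 1: groups of 3 CD convs, each group followed by one of the
    3 pooling operations that enlarge the receptive field."""
    layers, ch = [], in_ch
    for _ in range(3):
        for _ in range(3):
            layers += [CDConv2d(ch, width), nn.ReLU()]
            ch = width
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)
```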
In this embodiment, the overall network structure of module 2 is consistent with ResNet-18, and module 2 may be initialized with the parameters of a ResNet-18 pre-trained on ImageNet. Module 2 may be composed of 1 convolutional layer, 1 pooling layer, a cascade of 4 residual modules, and 1 pooling layer. The cascade of 4 residual modules may be a cascade of the residual modules Base Block1, Base Block2, Base Block2, and Base Block2, where Base Block1 is composed of 4 convs, and Base Block2 is composed of two convs for upsampling and two convs for downsampling. Module 2 fully extracts the high-level semantic information of a given sample picture and outputs the high-level semantic feature map, that is, feature map 2.
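Because module 2 keeps the ResNet-18 topology and ImageNet initialization, one low-effort way to sketch it (assuming PyTorch with torchvision; the exact Base Block composition described above is not reproduced) is to reuse the library backbone without its classification head:

```python
import torch
import torch.nn as nn
import torchvision

# ResNet-18 pre-trained on ImageNet; dropping the final fc layer leaves
# conv1 + pooling, the cascade of 4 residual stages, and the last pooling,
# so the module outputs the high-level semantic features (feature map 2).
resnet = torchvision.models.resnet18(pretrained=True)
semantic_branch = nn.Sequential(*list(resnet.children())[:-1])

x = torch.randn(1, 3, 224, 224)    # one face crop
feature_map2 = semantic_branch(x)  # shape: (1, 512, 1, 1)
```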
Considering that the local details of the sample picture provide significant guidance for classifying non-living attacks and for living body recognition, this embodiment splices feature map 1 and feature map 2 along the channel direction to obtain the target feature map, uses the spliced target feature map as the input of the subsequent fully connected layer (FC), and trains the living body recognition model to predict the non-living body attack type and to perform living body recognition.
Optionally, the embodiment may perform binary living body classification on the spliced target feature map through one FC, and perform non-living body attack type classification through another FC. Optionally, the binary living body classification of this embodiment is a true/false prediction, and the non-living body attack type classification may predict an 11-dimensional one-hot vector indicating the category of the non-living attack.
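Concretely, the two heads are just two linear layers over the flattened target feature; the dimension 576 below is an arbitrary illustrative value, not one given by the text.

```python
import torch
import torch.nn as nn

fused_dim = 576                       # illustrative size of the spliced feature

fc_live = nn.Linear(fused_dim, 2)     # binary living body classification
fc_attack = nn.Linear(fused_dim, 11)  # 11 non-living attack categories

fused = torch.randn(4, fused_dim)     # a batch of spliced target features
live_logits = fc_live(fused)          # trained against true/false labels
attack_logits = fc_attack(fused)      # trained against 11-dim one-hot labels
```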
Optionally, the embodiment performs a CD conv operation on feature map 2 to obtain a depth estimate for living samples and a reflection estimate for non-living attack samples. In this embodiment, the reflection estimate corresponds to different loss-calculation modes for different classes of attack samples; optionally, for attack samples of the A4-paper and photo types, the reflection estimation loss may not be back-propagated.
In this embodiment, considering that face attributes and the illumination of the environment belong to high-level semantic information, the embodiment may directly use feature map 2 to classify face attributes through one FC and classify the illumination of the environment through another FC. For the depth map and reflection map estimation, the embodiment can adopt MSE loss as the loss function, with the weight of the MSE loss set to 0.1; for binary living body classification, non-living body attack type classification, and illumination classification, the embodiment can adopt cross-entropy loss, with the corresponding weights set to 1, 0.1, and 0.01, respectively; considering that the same face may have multiple face attributes, the embodiment may adopt BCE loss for face attribute classification, with the weight set to 1. Considering that the reflection map has a negative effect on the prediction of printed attack samples, this embodiment can cut off the gradient propagation from the reflection map when processing printed attack samples (photo, A4 paper). By performing moire detection on sample pictures predicted to be living bodies, samples wrongly accepted as living bodies are further eliminated, improving the accuracy of living body identification.
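Combining the stated weights gives the following loss sketch; the dictionary keys, tensor shapes, and the print-attack flag are illustrative names invented for this example only.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
ce = nn.CrossEntropyLoss()
bce = nn.BCEWithLogitsLoss()

def total_loss(out: dict, tgt: dict, is_print_attack: bool) -> torch.Tensor:
    loss = 0.1 * mse(out["depth"], tgt["depth"])          # depth map, MSE, weight 0.1
    if not is_print_attack:
        # cut off gradient flow from the reflection map for photo / A4-paper samples
        loss = loss + 0.1 * mse(out["reflection"], tgt["reflection"])
    loss = loss + 1.0  * ce(out["live"], tgt["live"])     # binary liveness, weight 1
    loss = loss + 0.1  * ce(out["attack"], tgt["attack"]) # attack type, weight 0.1
    loss = loss + 0.01 * ce(out["illum"], tgt["illum"])   # illumination, weight 0.01
    loss = loss + 1.0  * bce(out["attr"], tgt["attr"])    # multi-label face attributes
    return loss
```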
Optionally, the embodiment may crop the face portion of a screen-shot image to obtain an image block containing only the skin color of the face, perform a third-order color moment calculation on the image block, use the result as a feature for clustering so as to divide screen-shot non-living samples into two types (containing moire and not containing moire), and train a moire recognition network with real samples and moire-containing attack samples as the training set, so that moire detection can be performed on sample pictures predicted to be living bodies through the moire recognition network.
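Third-order color moments are commonly taken to be the per-channel mean, standard deviation, and cube root of the third central moment; the NumPy sketch below makes that assumption explicit.

```python
import numpy as np

def third_order_color_moments(patch: np.ndarray) -> np.ndarray:
    """patch: (H, W, 3) skin-only crop of the face region.
    Returns a 9-dim feature (mean, std, cbrt of the third central moment
    per channel) that can be fed to the clustering step."""
    feats = []
    for c in range(3):
        ch = patch[..., c].astype(np.float64).ravel()
        mean = ch.mean()
        std = ch.std()
        third = np.cbrt(((ch - mean) ** 3).mean())
        feats.extend([mean, std, third])
    return np.asarray(feats)
```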
Fig. 6B is a schematic diagram of a living body recognition scenario according to an embodiment of the present invention. As shown in fig. 6B, a sample picture is input to the computing device; any one of the plurality of images may be determined as the sample picture. After the sample picture is obtained, the texture feature map and the high-level semantic feature map of the sample picture can be extracted and spliced to obtain the target feature map. In this embodiment, the obtained target feature map can be input to the fully connected layer for training to obtain the living body recognition model, the texture features and high-level semantic features of the target picture are extracted through the living body recognition model, whether the target object of the target picture is a living body can be predicted based on these features, and the living body recognition result is then output to the computing device for display.
Fig. 6C is a schematic diagram of another living body recognition scenario according to an embodiment of the invention. As shown in fig. 6C, a target picture to be subjected to living body recognition may be loaded into the display interface, where the target picture is any one of the multiple images of the target object captured by the camera. Then, in response to a feature extraction operation performed on the display interface, a living body recognition model is called to extract the texture features and high-level semantic features of the target picture, where the living body recognition model is generated by training on a target feature map formed by splicing the texture feature map and the high-level semantic feature map recognized from the sample picture. Whether the target object of the target picture is a living body is predicted based on the texture features and high-level semantic features of the target picture, and the living body recognition result is displayed on the display interface.
With this scheme, the embodiment can make full use of the attribute information of living samples and non-living attack samples, handling different category labels (such as face attributes and non-living attack attributes) with different high-level semantic information (ResNet-18) and different bottom-level texture information (CD Conv).
In addition, this embodiment treats living body identification as a binary classification problem and, while introducing depth estimation and reflection estimation, fully mines the attributes of the face, the categories of non-living attack samples, and the illumination information at acquisition time. A bottom-layer texture feature extraction network and a high-layer semantic feature extraction network can be designed specifically for predicting different attributes, and post-processing further improves accuracy; in particular, the post-processing step of moire detection further improves the accuracy of living body identification, solving the technical problem of inaccurate living body identification results.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 3
According to an embodiment of the present invention, there is also provided a data processing apparatus for implementing the data processing method shown in fig. 2.
Fig. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention. As shown in fig. 7, the data processing apparatus 70 may include: a first acquisition unit 71, a first extraction unit 72, and an input unit 73.
The first obtaining unit 71 is configured to obtain a sample picture, where the sample picture is any one of a plurality of images collected by a camera.
The first extraction unit 72 is configured to extract a texture feature map and a high-level semantic feature map of the sample picture, where the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map.
And the input unit 73 is used for inputting the target feature map to the full-connection layer of the neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-layer semantic features of the verification image, and the texture features and the high-layer semantic features are used for carrying out living body identification on the verification image.
It should be noted here that the first acquiring unit 71, the first extracting unit 72, and the input unit 73 correspond to steps S202 to S206 of embodiment 1, respectively; the three units implement the same examples and application scenarios as their corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above units may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
According to an embodiment of the present invention, there is also provided an identification apparatus of a living body for implementing the identification method of a living body shown in fig. 3 described above.
Fig. 8 is a schematic view of an identification apparatus of a living body according to an embodiment of the present invention. As shown in fig. 8, the living body identification apparatus 80 may include: a second acquisition unit 81, a second extraction unit 82, and a prediction unit 83.
The second obtaining unit 81 is configured to obtain a target picture to be subjected to living body recognition, where the target picture is any one of a plurality of images of a target object acquired by a camera.
And a second extraction unit 82, configured to invoke a living body recognition model to extract texture features and high-level semantic features of the target picture, where the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing the texture feature map and the high-level semantic feature map recognized from the sample picture.
And a prediction unit 83 for predicting whether the target object is a living body based on the texture feature and the high-level semantic feature of the target picture.
It should be noted here that the second acquiring unit 81, the second extracting unit 82, and the prediction unit 83 correspond to steps S302 to S306 of embodiment 1, respectively; the three units implement the same examples and application scenarios as their corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above units may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
According to an embodiment of the present invention, there is also provided a data processing apparatus for implementing the data processing method shown in fig. 4.
FIG. 9 is a schematic diagram of another data processing apparatus according to an embodiment of the present invention. As shown in fig. 9, the data processing apparatus 90 may include: a first receiving unit 91, a second receiving unit 92, and a first display unit 93.
The first receiving unit 91 is configured to receive an attendance request through a display interface of the attendance system, where the attendance request is used to capture a target picture of a target object, and the target picture is any one of multiple images of the target object acquired by a camera.
And the second receiving unit 92 is configured to receive the texture features and the high-level semantic features of the fed-back target picture based on the attendance request through the attendance system, wherein the texture features and the high-level semantic features of the target picture are extracted by using a living body recognition model, the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from the sample picture and the high-level semantic feature map.
And the first display unit 93 is configured to display a living body identification result on a display interface through the attendance system, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture.
It should be noted that the first receiving unit 91, the second receiving unit 92, and the first display unit 93 correspond to steps S402 to S406 of embodiment 1, respectively; the three units implement the same examples and application scenarios as their corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above units may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
According to an embodiment of the present invention, there is also provided a data processing apparatus for implementing the data processing method shown in fig. 4.
FIG. 10 is a schematic diagram of another data processing apparatus according to an embodiment of the present invention. As shown in fig. 10, the data processing apparatus 100 may include: a second display unit 101, an output unit 102, a third acquisition unit 103, a third display unit 104, and an execution unit 105.
The second display unit 101 is configured to display a living body authentication interface through the payment system, and display a target picture to be subjected to living body identification in the living body authentication interface, where the target picture is any one of a plurality of images of a target object acquired by the camera, and the target object is located in an authentication area of the living body authentication interface.
An output unit 102, configured to output, by the payment system, a verification instruction, where the verification instruction is used to instruct a target object displayed in the target picture to perform a predetermined action instructed by the verification instruction.
And a third obtaining unit 103, configured to obtain texture features and high-level semantic features of the target picture based on the verification instruction through the payment system, where the texture features and the high-level semantic features of the target picture are extracted by using a living body recognition model, and the living body recognition model is generated by training a target feature map, where the target feature map is formed by splicing a texture feature map and a high-level semantic feature map that are recognized from the sample picture.
And a third display unit 104 for displaying the living body recognition result on the living body authentication interface through the payment system, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture.
An execution unit 105 for executing a payment operation in a case where the target object is confirmed to be a living body by the payment system.
It should be noted here that the second display unit 101, the output unit 102, the third obtaining unit 103, the third display unit 104, and the execution unit 105 correspond to steps S502 to S510 of embodiment 1, respectively; the five units implement the same examples and application scenarios as their corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above units may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the living body recognition apparatus of this embodiment, a silent living body recognition approach is adopted: the texture feature map and the high-level semantic feature map of the sample picture are extracted and spliced to obtain the target feature map, the living body recognition model is trained on the target feature map, and the living body recognition model is then used to perform living body recognition on the image to be recognized, so that non-living body attack samples are rejected from recognition. This achieves the purpose of filtering non-living body attack samples, solves the technical problem of inaccurate living body identification results, and achieves the technical effect of improving the accuracy of living body identification.
Example 4
Embodiments of the present invention may provide a living body identification system, which may include a computer terminal, which may be any one of computer terminal devices in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the data processing method of the application program: acquiring a sample picture, wherein the sample picture is any one of a plurality of images acquired by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and inputting the target feature map into a full-connection layer of the neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-level semantic features of the verification image, and the texture features and the high-level semantic features are used for carrying out living body identification on the verification image.
Alternatively, fig. 11 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 11, the computer terminal a may include: one or more processors 112 (only one shown), a memory 114, and a transmission device 116.
The transmission device is used for transmitting the sample picture; the memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the data processing method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the data processing method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, which may be connected to the computer terminal a via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a sample picture, wherein the sample picture is any one of a plurality of images acquired by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and inputting the target feature map into a full-connection layer of the neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-level semantic features of the verification image, and the texture features and the high-level semantic features are used for carrying out living body identification on the verification image.
Optionally, the processor may further execute the program code of the following steps: determining a region picture where a target object is located in the sample picture; identifying texture information in the region picture; and performing pooling operation on the texture information in the region picture by adopting a central differential convolution network model to generate a texture feature map.
Optionally, the processor may further execute the program code of the following steps: analyzing a sample picture by adopting an extraction network formed by cascading a plurality of residual modules in a pre-training model, and extracting high-level semantic information from the sample picture; and constructing a high-level semantic feature map based on the high-level semantic information.
Optionally, the processor may further execute the program code of the following steps: training face information in the sample picture, and generating attribute information of the face in the sample picture; training light information in the sample picture to generate illumination information of the face in the sample picture; and constructing a high-level semantic feature map based on the attribute information and the illumination information of the face in the sample picture.
Optionally, the processor may further execute the program code of the following steps: after extracting the texture feature map and the high-level semantic feature map of the sample picture, obtaining a depth estimate and a reflection estimate of the sample picture based on the high-level semantic feature map, wherein MSE loss is adopted as the loss function for the estimation of the depth estimate and the reflection estimate.
Optionally, the processor may further execute the program code of the following steps: after a living body identification model is obtained through training, a verification picture to be subjected to living body identification is obtained; extracting texture features at a bottom layer and semantic features at a high layer from the verification picture by adopting a living body recognition model; and predicting whether the target object in the verification picture is a living body or not based on the texture feature and the semantic feature.
Optionally, the processor may further execute the program code of the following steps: determining the type of the verification picture based on the texture features and semantic features extracted from the verification picture under the condition that the target object in the verification picture is a non-living body, wherein the type of the verification picture comprises at least one of the following types: a planar photograph, a photograph of a stereoscopic mask, a presentation image in a display device.
As an alternative example, the processor may also call the information and application stored in the memory through the transmission device to perform the following steps: acquiring a target picture to be subjected to living body identification, wherein the target picture is any one of a plurality of images of a target object acquired by a camera; calling a living body recognition model to extract texture features and high-level semantic features of a target picture, wherein the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map; and predicting whether the target object is a living body or not based on the texture feature and the high-level semantic feature of the target picture.
As an alternative example, the processor may also call the information and application stored in the memory through the transmission device to perform the following steps: the method comprises the steps that an attendance checking request is received by a display interface of an attendance checking system, wherein the attendance checking request is used for capturing a target picture of a target object, and the target picture is any one picture in a plurality of images of the target object acquired by a camera; the attendance checking system receives the texture features and the high-level semantic features of the fed back target picture based on the attendance checking request, wherein the texture features and the high-level semantic features of the target picture are extracted by adopting a living body recognition model, the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and the high-level semantic feature map; the attendance system displays a living body identification result on a display interface, wherein whether the target object is a living body or not is predicted based on the texture feature and the high-level semantic feature of the target picture.
As an alternative example, the processor may also call the information and application stored in the memory through the transmission device to perform the following steps: displaying a living body authentication interface on a payment system, and displaying a target picture to be subjected to living body identification in the living body authentication interface, wherein the target picture is any one of a plurality of images of a target object acquired by a camera, and the target object is positioned in an authentication area of the living body authentication interface; the payment system outputs a verification instruction, wherein the verification instruction is used for instructing a target object displayed in the target picture to execute a predetermined action instructed by the verification instruction; the payment system acquires texture features and high-level semantic features of a target picture based on a verification instruction, wherein the texture features and the high-level semantic features of the target picture are extracted by adopting a living body recognition model, the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map; the payment system displays a living body identification result on a living body authentication interface, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture; the payment system executes a payment operation when confirming that the target object is a living body.
The embodiment of the invention provides a data processing scheme: acquiring a sample picture, wherein the sample picture is any one of a plurality of images acquired by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and inputting the target feature map into a fully connected layer of a neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-level semantic features of a verification image, and those features are used for performing living body identification on the verification image. That is, this embodiment extracts the texture feature map and the high-level semantic feature map of the sample picture using a silent living body recognition method and splices them to obtain the target feature map, trains the living body recognition model on the target feature map, and then uses the living body recognition model to perform living body recognition on the image to be recognized, so that non-living body attack samples are rejected from recognition. This achieves the purpose of filtering non-living body attack samples, solves the technical problem of inaccurate living body identification results, and achieves the technical effect of improving the accuracy of living body identification.
It can be understood by those skilled in the art that the structure shown in fig. 11 is only an illustration, and the computer terminal a may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 11 is not intended to limit the structure of the computer terminal a. For example, the computer terminal a may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 11, or have a different configuration than shown in fig. 11.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 5
Optionally, in this embodiment, the computer-readable storage medium may be located in any one of a group of computer terminals in a computer network, or in any one of a group of mobile terminals.
Optionally, in this embodiment, the computer readable storage medium is configured to store program code for performing the following steps: acquiring a sample picture, wherein the sample picture is any one of a plurality of images acquired by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map; and inputting the target feature map into a full-connection layer of the neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-level semantic features of the verification image, and the texture features and the high-level semantic features are used for carrying out living body identification on the verification image.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: determining a region picture where a target object is located in the sample picture; identifying texture information in the region picture; and performing pooling operation on the texture information in the region picture by adopting a central differential convolution network model to generate a texture feature map.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: analyzing a sample picture by adopting an extraction network formed by cascading a plurality of residual modules in a pre-training model, and extracting high-level semantic information from the sample picture; and constructing a high-level semantic feature map based on the high-level semantic information.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: training face information in the sample picture, and generating attribute information of the face in the sample picture; training light information in the sample picture to generate illumination information of the face in the sample picture; and constructing a high-level semantic feature map based on the attribute information and the illumination information of the face in the sample picture.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: after extracting the texture feature map and the high-level semantic feature map of the sample picture, obtaining a depth estimate and a reflection estimate of the sample picture based on the high-level semantic feature map, wherein MSE loss is adopted as the loss function for the estimation of the depth estimate and the reflection estimate.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: after a living body identification model is obtained through training, a verification picture to be subjected to living body identification is obtained; extracting texture features at a bottom layer and semantic features at a high layer from the verification picture by adopting a living body recognition model; and predicting whether the target object in the verification picture is a living body or not based on the texture feature and the semantic feature.
Optionally, the computer readable storage medium is further arranged to store program code for performing the steps of: determining the type of the verification picture based on the texture features and semantic features extracted from the verification picture in a case that the target object in the verification picture is a non-living body, wherein the type of the verification picture comprises at least one of the following types: a planar photograph, a photograph of a stereoscopic mask, a presentation image in a display device.
As an alternative example, the computer readable storage medium is further arranged to store program code for performing the steps of: acquiring a target picture to be subjected to living body identification, wherein the target picture is any one of a plurality of images of a target object acquired by a camera; calling a living body recognition model to extract texture features and high-level semantic features of a target picture, wherein the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map; and predicting whether the target object is a living body or not based on the texture feature and the high-level semantic feature of the target picture.
As an alternative example, the computer readable storage medium is further arranged to store program code for performing the steps of: the method comprises the steps that an attendance checking request is received by a display interface of an attendance checking system, wherein the attendance checking request is used for capturing a target picture of a target object, and the target picture is any one picture in a plurality of images of the target object acquired by a camera; the attendance checking system receives the texture features and the high-level semantic features of the fed-back target picture based on an attendance checking request, wherein the texture features and the high-level semantic features of the target picture are extracted by adopting a living body recognition model, the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map; the attendance system displays the living body identification result on a display interface, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture.
As an alternative example, the computer readable storage medium is further arranged to store program code for performing the steps of: displaying a living body authentication interface on a payment system, and displaying a target picture to be subjected to living body identification in the living body authentication interface, wherein the target picture is any one of a plurality of images of a target object acquired by a camera, and the target object is positioned in an authentication area of the living body authentication interface; the payment system outputs a verification instruction, wherein the verification instruction is used for instructing a target object displayed in the target picture to execute a predetermined action indicated by the verification instruction; the payment system acquires texture features and high-level semantic features of a target picture based on a verification instruction, wherein the texture features and the high-level semantic features of the target picture are extracted by adopting a living body recognition model, the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and the high-level semantic feature map; the payment system displays a living body identification result on a living body authentication interface, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture; the payment system executes a payment operation when confirming that the target object is a living body.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (17)

1. A data processing method, comprising:
acquiring a sample picture, wherein the sample picture is any one picture in a plurality of images acquired by a camera;
extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map;
inputting the target feature map into a full-connection layer of a neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-level semantic features of a verification image, and the texture features and the high-level semantic features are used for carrying out living body identification on the verification image.
2. The method of claim 1, wherein extracting the texture feature map of the sample picture comprises:
determining an area picture where a target object is located in the sample picture;
identifying texture information in the region picture;
and performing pooling operation on the texture information in the region picture by adopting a central differential convolution network model to generate the texture feature map.
3. The method of claim 1, wherein extracting the high-level semantic feature map of the sample picture comprises:
analyzing the sample picture by adopting an extraction network formed by cascading a plurality of residual modules in a pre-training model, and extracting high-level semantic information from the sample picture;
and constructing the high-level semantic feature map based on the high-level semantic information.
4. The method of claim 3, wherein the high-level semantic information comprises at least one of face information and illumination information of the sample picture, and wherein constructing the high-level semantic feature map based on the high-level semantic information comprises:
training face information in the sample picture, and generating attribute information of the face in the sample picture;
training light information in the sample picture to generate illumination information of the face in the sample picture;
and constructing the high-level semantic feature map based on the attribute information and the illumination information of the face in the sample picture.
5. The method according to claim 3, wherein after extracting the texture feature map and the high-level semantic feature map of the sample picture, the method further comprises:
and obtaining a depth estimate and a reflection estimate of the sample picture based on the high-level semantic feature map, wherein MSE loss is adopted as a loss function for the estimation of the depth estimate and the reflection estimate.
6. The method of any one of claims 1 to 5, wherein after obtaining the living body identification model, the method further comprises:
acquiring a verification picture to be subjected to living body identification;
extracting texture features at the bottom layer and semantic features at the high layer from the verification picture by adopting the living body recognition model;
predicting whether a target object in the verification picture is a living body based on the texture feature and the semantic feature.
7. The method according to claim 6, wherein, in a case that the target object in the verification picture is a non-living body, the method further comprises determining a type of the verification picture based on the texture features and semantic features extracted from the verification picture, wherein the type of the verification picture comprises at least one of: a planar photograph, a photograph of a stereoscopic mask, a presentation image in a display device.
8. A method of identifying a living body, comprising:
acquiring a target picture to be subjected to living body identification, wherein the target picture is any one of a plurality of images of a target object acquired by a camera;
calling a living body recognition model to extract texture features and high-level semantic features of the target picture, wherein the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map;
and predicting whether the target object is a living body or not based on the texture feature and the high-level semantic feature of the target picture.
9. A data processing method, comprising:
the method comprises the steps that an attendance checking request is received by a display interface of an attendance checking system, wherein the attendance checking request is used for capturing a target picture of a target object, and the target picture is any one of a plurality of images of the target object acquired by a camera;
the attendance checking system receives, based on the attendance checking request, the fed-back texture features and high-level semantic features of the target picture, wherein the texture features and the high-level semantic features of the target picture are extracted by adopting a living body recognition model, the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map;
the attendance system displays a living body identification result on the display interface, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture.
10. A data processing method, comprising:
a payment system displays a living body authentication interface, and displays, in the living body authentication interface, a target picture to be subjected to living body identification, wherein the target picture is any one of a plurality of images of a target object acquired by a camera, and the target object is located in an authentication area of the living body authentication interface;
the payment system outputs a verification instruction, wherein the verification instruction is used for instructing the target object displayed in the target picture to perform a predetermined action;
the payment system acquires, based on the verification instruction, texture features and high-level semantic features of the target picture, wherein the texture features and high-level semantic features of the target picture are extracted by a living body recognition model, the living body recognition model is generated by training on a target feature map, and the target feature map is formed by concatenating a texture feature map and a high-level semantic feature map extracted from a sample picture;
the payment system displays a living body identification result on the living body authentication interface, wherein whether the target object is a living body is predicted based on the texture features and high-level semantic features of the target picture;
and the payment system performs a payment operation in the case that the target object is confirmed to be a living body.
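Purely as a sketch of the interactive flow in claim 10 (and, minus the action instruction, claim 9): the camera, interface, and payment objects are hypothetical placeholders, and predict_living_body refers to the illustrative helper after claim 8 above:

```python
def payment_liveness_flow(camera, model, interface, instruction="please blink twice"):
    # Display the verification instruction so the target object performs
    # the predetermined action required by claim 10.
    interface.show_instruction(instruction)
    target_picture = camera.capture()  # one of several frames of the target object
    is_live = predict_living_body(model, target_picture)
    interface.show_result("living body" if is_live else "non-living object")
    if is_live:
        interface.execute_payment()  # payment proceeds only for a confirmed living body
    return is_live
```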
11. A data processing apparatus, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a sample picture, and the sample picture is any one of a plurality of images acquired by a camera;
the first extraction unit is used for extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are spliced to obtain a target feature map;
and the input unit is used for inputting the target feature map into a full connection layer of a neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-level semantic features of a verification image, and the texture features and the high-level semantic features are used for carrying out living body identification on the verification image.
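To make the structure recited in claim 11 concrete, here is a minimal, hypothetical sketch of a two-branch network in which a texture feature map and a high-level semantic feature map are concatenated into a target feature map and passed to a fully connected layer. The branch depths, channel counts, and input size are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class LivenessNet(nn.Module):
    """Illustrative two-branch liveness model: texture branch + semantic branch,
    concatenated and fed to a fully connected layer."""
    def __init__(self):
        super().__init__()
        # Shallow branch: fine-grained texture cues (print grain, moire patterns).
        self.texture_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))
        # Deeper branch: high-level semantic cues (face structure, context).
        self.semantic_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))
        # Fully connected layer over the concatenated feature maps.
        self.fc = nn.Linear((32 + 64) * 4 * 4, 1)

    def forward(self, x):
        t = self.texture_branch(x).flatten(1)
        s = self.semantic_branch(x).flatten(1)
        target_feature_map = torch.cat([t, s], dim=1)  # the "target feature map"
        return self.fc(target_feature_map)             # liveness logit

model = LivenessNet()
logit = model(torch.randn(2, 3, 112, 112))  # two placeholder sample pictures
```

One plausible reading of why the two maps are fused: the shallow branch preserves fine texture evidence that deep layers tend to wash out, while the deeper branch supplies the semantic context needed to reject stereoscopic masks and screen replays.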
12. An apparatus for recognizing a living body, comprising:
the second acquisition unit is used for acquiring a target picture to be subjected to living body identification, wherein the target picture is any one of a plurality of images of a target object acquired by a camera;
the second extraction unit is used for calling a living body recognition model to extract texture features and high-level semantic features of the target picture, wherein the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map and a high-level semantic feature map which are recognized from a sample picture;
and the prediction unit is used for predicting whether the target object is a living body or not based on the texture feature and the high-level semantic feature of the target picture.
13. A data processing apparatus, comprising:
the system comprises a first receiving unit, a second receiving unit and a third receiving unit, wherein the first receiving unit is used for receiving an attendance checking request through a display interface of an attendance checking system, the attendance checking request is used for capturing a target picture of a target object, and the target picture is any one of a plurality of images of the target object acquired by a camera;
the second receiving unit is used for receiving the fed-back texture features and high-level semantic features of the target picture through the attendance checking system based on the attendance checking request, wherein the texture features and the high-level semantic features of the target picture are extracted by adopting a living body recognition model, the living body recognition model is generated by training a target feature map, and the target feature map is formed by splicing a texture feature map recognized from a sample picture and a high-level semantic feature map;
the first display unit is used for displaying a living body identification result on the display interface through the attendance system, wherein whether the target object is a living body is predicted based on the texture feature and the high-level semantic feature of the target picture.
14. A data processing apparatus, comprising:
the second display unit is used for displaying a living body authentication interface through a payment system and displaying a target picture to be subjected to living body identification in the living body authentication interface, wherein the target picture is any one of a plurality of images of a target object acquired by a camera, and the target object is positioned in an authentication area of the living body authentication interface;
an output unit, configured to output, by the payment system, a verification instruction, where the verification instruction is used to instruct a target object displayed in the target picture to perform a predetermined action instructed by the verification instruction;
a third obtaining unit, configured to obtain, by the payment system, texture features and high-level semantic features of the target picture based on the verification instruction, where the texture features and the high-level semantic features of the target picture are extracted by using a living body recognition model, where the living body recognition model is generated by training a target feature map, where the target feature map is formed by splicing a texture feature map and a high-level semantic feature map recognized from a sample picture;
a third display unit, configured to display a living body identification result on the living body authentication interface through the payment system, wherein whether the target object is a living body is predicted based on a texture feature and a high-level semantic feature of the target picture;
an execution unit configured to execute a payment operation in a case where it is confirmed by the payment system that the target object is a living body.
15. A computer-readable storage medium comprising a stored program, wherein the program, when executed by a processor, controls a device in which the computer-readable storage medium is located to perform the method according to any one of claims 1 to 10.
16. A processor configured to run a program, wherein the program, when run by the processor, performs the method according to any one of claims 1 to 10.
17. A data processing system, comprising:
a processor;
a memory, coupled to the processor, for providing the processor with instructions for the following processing steps: acquiring a sample picture, wherein the sample picture is any one of a plurality of images acquired by a camera; extracting a texture feature map and a high-level semantic feature map of the sample picture, wherein the texture feature map and the high-level semantic feature map are concatenated to obtain a target feature map; and inputting the target feature map into a fully connected layer of a neural network model for training to obtain a living body identification model, wherein the living body identification model is used for extracting texture features and high-level semantic features of a verification image, and the texture features and high-level semantic features are used for performing living body identification on the verification image.
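Again for illustration only, a minimal training loop consistent with the processing steps recited above, reusing the hypothetical LivenessNet sketch shown after claim 11; the sample pictures and labels are random placeholders:

```python
import torch
import torch.nn as nn

model = LivenessNet()  # hypothetical two-branch network sketched earlier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

for step in range(100):
    sample_pictures = torch.randn(8, 3, 112, 112)   # placeholder camera frames
    labels = torch.randint(0, 2, (8, 1)).float()    # 1 = living body, 0 = spoof
    loss = criterion(model(sample_pictures), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```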
CN202011467170.2A (filed 2020-12-14) Data processing method, data processing device, computer readable storage medium and processor, CN114627518A (en), status: Pending

Priority Applications (1)

Application Number CN202011467170.2A, Priority Date 2020-12-14, Filing Date 2020-12-14, Title: Data processing method, data processing device, computer readable storage medium and processor

Publications (1)

Publication Number CN114627518A, Publication Date 2022-06-14

Family ID: 81897220

Country Status (1): CN, CN114627518A (en)
Similar Documents

US10936919B2 (en) Method and apparatus for detecting human face
CN109858371B (en) Face recognition method and device
CN110766033B (en) Image processing method, image processing device, electronic equipment and storage medium
CN108197618B (en) Method and device for generating human face detection model
CN110569808A (en) Living body detection method and device and computer equipment
CN108229376B (en) Method and device for detecting blinking
CN111914812B (en) Image processing model training method, device, equipment and storage medium
CN112052186B (en) Target detection method, device, equipment and storage medium
CN108399409A (en) Image classification method, device and terminal
CN111242097A (en) Face recognition method and device, computer readable medium and electronic equipment
CN110798703A (en) Method and device for detecting illegal video content and storage medium
CN111523413A (en) Method and device for generating face image
CN108229375B (en) Method and device for detecting face image
CN108121943B (en) Image-based distinguishing method and device and computing equipment
CN108108711B (en) Face control method, electronic device and storage medium
CN115100472B (en) Training method and device for display object recognition model and electronic equipment
CN111738083B (en) Training method and device for face recognition model
CN111241873A (en) Image reproduction detection method, training method of model thereof, payment method and payment device
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN114170468A (en) Text recognition method, storage medium and computer terminal
CN111738199A (en) Image information verification method, image information verification device, image information verification computing device and medium
CN115049675A (en) Generation area determination and light spot generation method, apparatus, medium, and program product
CN110688878B (en) Living body identification detection method, living body identification detection device, living body identification detection medium, and electronic device
CN113743160A Method, apparatus and storage medium for living body detection

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination