CN114783029A - Face information identification method, device, storage medium and neural network model - Google Patents

Face information identification method, device, storage medium and neural network model

Info

Publication number
CN114783029A
Authority
CN
China
Prior art keywords
face
information
neural network
network model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210456894.XA
Other languages
Chinese (zh)
Inventor
吴汉俊
魏玉蓉
苏云强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunell Technology Corp
Original Assignee
Sunell Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunell Technology Corp filed Critical Sunell Technology Corp
Priority to CN202210456894.XA priority Critical patent/CN114783029A/en
Publication of CN114783029A publication Critical patent/CN114783029A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of face recognition and detection, and in particular relates to a face information recognition method, device, storage medium, and neural network model. The method is applied to a neural network model that comprises a first channel group and a second channel group, and comprises the following steps: acquiring an image to be recognized; recognizing face position information in the image to be recognized; determining face feature information according to the face position information; and outputting the face position information through the first channel group and the face feature information through the second channel group. By increasing the number of channel groups in the neural network model, the face position information and the face feature information are output simultaneously on different channel groups through a single input and inference pass of the same neural network model. The technical solution provided by the embodiments of the application avoids wasted resources while preserving accuracy, saving device resources and performance.

Description

Face information identification method, device, storage medium and neural network model
Technical Field
The application belongs to the technical field of face recognition and detection, and in particular relates to a face information recognition method, device, storage medium, and neural network model.
Background
The Yolov5 neural network model is a general-purpose object detection model. The Yolov5-face recognition detection model is based on the Yolov5 neural network model with a newly added regression branch for face key points; after an electronic device inputs an image to be recognized into the Yolov5-face recognition detection model, the model outputs face position information.
When the electronic device needs to detect both face position information and face three-dimensional angle information in an image to be recognized, another neural network for recognizing the three-dimensional angle of a face, such as Hopenet, is generally used on top of the Yolov5-face recognition detection model. Illustratively, after acquiring an image to be recognized, the electronic device first inputs it into the Yolov5-face recognition detection model to obtain face position information, then crops the target face from the image based on that position information, and inputs the cropped target face image into the neural network for recognizing the three-dimensional angle of the face to obtain the face's three-dimensional angle information. Therefore, to recognize face position information and face three-dimensional angle information, the electronic device must deploy two different neural network models, and the three-dimensional angle information is obtained only after two inputs, which increases the inference time and performance consumption of the electronic device.
Disclosure of Invention
In view of this, embodiments of the present application provide a face information recognition method, device, storage medium, and neural network model, so as to solve the problem in the prior art that face recognition requires deploying two different neural network models on an electronic device and performing two input passes to obtain face position information and face three-dimensional angle information, which increases the inference time and performance consumption of the electronic device.
A first aspect of the embodiments of the present application provides a face information recognition method, applied to a neural network model that includes a first channel group and a second channel group. The method includes: acquiring an image to be recognized; recognizing face position information in the image to be recognized; determining face feature information according to the face position information; and outputting the face position information through the first channel group and the face feature information through the second channel group.
With reference to the first aspect, in a first possible implementation manner of the first aspect, recognizing face position information in the image to be recognized includes: down-sampling the image to be recognized to obtain first target images at different down-sampling multiples; performing feature fusion on the first target images to obtain a second target image; and recognizing face position information in the second target image.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, determining face feature information according to the face position information includes: cropping the target face from the image to be recognized according to the face position information to obtain a cropped target face image; and recognizing face feature information in the target face image.
With reference to the first aspect, in a third possible implementation manner of the first aspect, before outputting the face position information through the first channel group and the face feature information through the second channel group, the method further includes: decoding the face position information and the face feature information.
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, the face feature information is three-dimensional angle information of a face.
A second aspect of embodiments of the present application provides a neural network model comprising a first channel group and a second channel group, the neural network model being configured to: acquiring an image to be identified; identifying face position information in an image to be identified; determining face feature information according to the face position information; face position information is output through the first channel group, and face feature information is output through the second channel group.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the number of output channels of the second channel group is determined according to the face feature information.
With reference to the second aspect, in a second possible implementation manner of the second aspect, the neural network model is trained by: modifying the number of output channels at the output end of the neural network model so that it includes a first channel group for recognizing face position information and a second channel group for recognizing face feature information; calculating loss parameters of face information in the neural network model through a loss function; and training the neural network model using the loss parameters and the label values of the face information in training images, obtaining a neural network model capable of recognizing face position information and face feature information.
A third aspect of the embodiments of the present application provides a face information recognition device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method according to any one of the first aspect.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the method according to any one of the first aspects.
Compared with the prior art, the embodiments of the present application have the following beneficial effects: the embodiments provide a face information recognition method, device, storage medium, and neural network model in which the number of channel groups in the neural network model is increased, so that face position information and face feature information are output simultaneously on different channel groups through a single input and inference pass of the same neural network model. The technical solution provided by the embodiments avoids wasted resources while preserving accuracy, saving device resources and performance.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of a face information recognition method according to an embodiment of the present application;
fig. 2 is a schematic diagram of an output process of an image to be recognized before modifying the number of output channels according to an embodiment of the present application;
fig. 3 is a schematic diagram of an output process of an image to be recognized after the number of output channels is modified according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a neural network model configuration flow provided by an embodiment of the present application;
fig. 5 is a schematic diagram of a face information recognition device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
An electronic device usually uses a neural network model such as the Yolov5-face recognition detection model to detect face position information in an image to be recognized, where the face position information includes the face score, the region where the face is located, face key points (landmarks), and the like. The face key points comprise the positions of the left eye, the right eye, the nose, and the left and right mouth corners.
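For illustration only, the face position information described above can be gathered into a structure like the following sketch; the field names are assumptions made for this example, not identifiers from the patent or from Yolov5-face.

```python
# Illustrative only: a minimal container for the face position information
# described above (score, bounding box, five landmarks). Field names are
# assumptions for this sketch, not identifiers from the patent.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FacePosition:
    score: float                             # face confidence score
    box: Tuple[float, float, float, float]   # (x, y, w, h): top-left corner plus width/height
    landmarks: List[Tuple[float, float]]     # five (x, y) points: left eye, right eye,
                                             # nose, left mouth corner, right mouth corner
```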
However, in some scenarios (for example, three-dimensionally modeling a human face according to an image to be recognized), the electronic device needs to recognize not only the position information of the human face in the image to be recognized, but also three-dimensional angle information of the human face, which includes angle information of a pitch angle (pitch), a yaw angle (yaw), and a roll angle (roll).
In other scenarios (for example, filtering images to be recognized according to whether the face in them is frontal), the electronic device needs to recognize not only face position information in the image to be recognized but also the three-dimensional angle information of the face, and determines from that angle information whether the face in the image is frontal or in profile, and the angle of a profile face. Images to be recognized are then filtered out according to the profile angle of the face in the picture.
To achieve this, a method generally adopted at present is to introduce, on top of the Yolov5-face recognition detection model, another neural network model for recognizing three-dimensional face angle information, such as a pose estimation model (Hopenet). Therefore, when three-dimensional face angle information needs to be detected by an electronic device, the image to be recognized must first be input into the Yolov5-face recognition detection model to obtain face position information; the target face in the image is then cropped based on that position information, and the cropped target face picture is input into the neural network for detecting the three-dimensional angle of the face to obtain the three-dimensional angle information. This way of acquiring three-dimensional face angle information requires two inputs: the target picture to be detected and the cropped target face picture. After the face position information is produced, it must be copied and backed up in memory from the output of the Yolov5-face recognition detection model; the target face is then cropped based on it, and the cropped picture is fed into the pose network for inference and decoding to obtain the three-dimensional angle information of the face.
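The conventional two-stage pipeline described above can be sketched roughly as follows; detect_faces, crop, and estimate_pose are hypothetical placeholders standing in for the Yolov5-face model, the cropping step, and a Hopenet-style pose network, not real APIs.

```python
# A minimal sketch of the conventional two-model pipeline described above.
# detect_faces(), crop(), and estimate_pose() are hypothetical placeholders
# for the Yolov5-face model, the cropping step, and a Hopenet-style network.
def two_stage_pipeline(image, detect_faces, crop, estimate_pose):
    results = []
    faces = detect_faces(image)            # first inference: face position information
    for face in faces:                     # one extra copy + inference per detected face
        face_crop = crop(image, face.box)  # cut the target face out of the image
        pitch, yaw, roll = estimate_pose(face_crop)  # second inference: 3D angles
        results.append((face, (pitch, yaw, roll)))
    return results
```

Note how the second model runs once per detected face, which is exactly the per-face copying and inference overhead described in the next paragraph.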
Meanwhile, if multiple faces exist in the same image to be recognized, the position information of each face must be copied separately and input into the neural network for detecting the three-dimensional angle of the face, which prolongs the inference time of the neural network and increases the performance consumption of the electronic device.
Based on this, the embodiments of the application provide a neural network model that allows an image to be recognized to be input once and, after recognition by the neural network model, outputs both the face position information and the face feature information simultaneously.
In this embodiment, the electronic device may be a smart phone, a tablet computer, a computer running an operating system, a smart hardware device, or the like; this embodiment does not particularly limit it.
In some embodiments, the electronic device may recognize the face information in the picture to be recognized using a neural network model built into the electronic device, which is first described below by way of example.
The present embodiment also provides a neural network model, which includes a first channel group and a second channel group, and is configured to: acquiring an image to be identified; identifying face position information in an image to be identified; determining face feature information according to the face position information; and outputting the face position information through the first channel group, and outputting the face feature information through the second channel group.
It should be understood that the neural network model provided in this embodiment may be obtained by training an initial neural network model on training data for recognizing face position information and face feature information, or by configuration training based on an existing initial neural network model that can already recognize face position information. Taking the latter as an example, the configuration training process of the neural network model provided by the embodiments of the present application is explained below. As shown in FIG. 4, the configuration training process specifically includes the following steps S11 to S13.
In some embodiments, the neural network model built into the electronic device is the Yolov5-face recognition detection model. As a convolutional neural network model, it mainly comprises a backbone network part (Backbone), a network part for producing the network output content (Head), and a network part for aggregating feature maps from different stages (Neck).
The Backbone part processes the image to be recognized with a convolutional neural network, generating deep feature maps that abstract and extract the image features.
The Head part uses the extracted features to make predictions on output image features of different sizes, generating bounding boxes and predicting category information.
The Neck part sits between the Backbone and the Head and fuses the features extracted by the backbone network.
The following explains the configuration training process of the neural network model provided in this embodiment in detail based on the Yolov5-face recognition detection model with the above structure.
S11, the electronic device modifies the number of output channels at the output end of the initial neural network model, so that the initial neural network model includes a first channel group for recognizing face position information and a second channel group for recognizing face feature information.
In some embodiments, the number of output channels of the second channel group is determined by the facial feature information. For example, when the face feature information is three-dimensional angle (pitch, yaw, roll) information of a face, three channels for outputting the pitch, yaw, and roll need to be added to the neural network model, where the three channels are a second channel group.
Illustratively, for the initial neural network model, the electronic device processes an image to be recognized with a pixel size of 608 × 608 through the three network parts of the initial neural network model and obtains, at the output end of the Head part, three feature images with shapes 76 × 76 × (4+1+1+10) × 3, 38 × 38 × (4+1+1+10) × 3, and 19 × 19 × (4+1+1+10) × 3, as shown in fig. 2. Here 76 × 76, 38 × 38, and 19 × 19 are the pixel sizes obtained by down-sampling the image to be recognized at different multiples. (4+1+1+10) is the total number of output channels at the output end of the Head part: 4 is the number of output channels for the 4 position coordinates (x, y, w, h), one channel per coordinate, where x and y are the coordinates of the upper-left corner of the face position in the anchor frame on the image coordinate system, and w and h are the width and height of the anchor-frame rectangle. The first 1 in (4+1+1+10) is the target score, i.e., whether a target is present: 1 if present and 0 if not, output through one channel. The second 1 is the face class score: given that a target exists, its class is 1 if it is a face and 0 if not, also output through one channel. 10 refers to the positions of the five key points (left eye, right eye, nose, and left and right mouth corners) in the face position information; each key point contains an (x, y) pair, so the five key points are output through ten channels. The final 3 means that each layer of output feature image has 3 groups of anchor frames at different scales.
The electronic device modifies the number of output channels of the Head part in the neural network model, i.e., the (4+1+1+10) at the output end of the Head part in the above embodiment. When the three-dimensional angle information of the face in the image to be recognized needs to be calculated, the number of output channels is modified to (4+1+1+10+3), adding 3 target parameters used to calculate the three-dimensional angle information of the face.
In some embodiments, after the electronic device completes the modification of the number of output channels, it again uses the 608 × 608 image to be recognized as input; after passing through the three network parts of the neural network model, three feature images of sizes 76 × 76 × (4+1+1+10+3) × 3, 38 × 38 × (4+1+1+10+3) × 3, and 19 × 19 × (4+1+1+10+3) × 3 are obtained at the output end of the Head part, as shown in fig. 3. The meanings of 4+1+1+10 and of the final 3 are as above; the improvement is the added +3, the 3 target parameters for calculating the three-dimensional angle information of the human face.
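As a minimal sketch of the channel modification in S11, assuming a Yolov5-style Detect head whose per-scale output is produced by a 1 × 1 convolution (an assumption about the implementation, not something stated in the patent):

```python
# A sketch, assuming a Yolov5-style Detect head whose per-scale output is a
# 1x1 convolution; widening it from (4+1+1+10)*3 to (4+1+1+10+3)*3 channels
# adds the three pitch/yaw/roll outputs per anchor. Variable names are ours.
import torch.nn as nn

NUM_ANCHORS = 3              # 3 anchor frames per grid cell and scale
BASE_CH = 4 + 1 + 1 + 10     # box (x, y, w, h) + target score + face class + 5 landmarks
POSE_CH = 3                  # added channels: pitch, yaw, roll

def widen_head(in_channels: int) -> nn.Conv2d:
    # Output layer of one scale after the modification in S11.
    return nn.Conv2d(in_channels, (BASE_CH + POSE_CH) * NUM_ANCHORS, kernel_size=1)
```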
In this embodiment, the electronic device thus completes the modification of the number of output channels in the initial neural network model and adds the target parameters for calculating the face three-dimensional angle information, obtaining the modified neural network model.
S12, the electronic device calculates the loss parameters of the face information in the neural network model through a loss function.
In some embodiments, the electronic device calculates the loss parameter of the face information by the following mean-absolute-error formula:

$$\operatorname{loss}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - y_i \rvert$$

where i indexes the faces in the images to be recognized and n is the total number of faces across all images to be recognized; loss(x, y) is the mean absolute error between the predicted values and the label values, x_i is the predicted value of the face information, and y_i is its label value. The predicted value is the detection value of the face information calculated by the neural network model, and the label value is the true value of the face information in the image to be recognized. This loss calculation gives the deviation between the predicted values and the label values.
Illustratively, before the number of output channels at the output end is modified, an image to be recognized with a pixel size of 608 × 608 passes through the three network parts of the neural network model and yields three feature images of 76 × 76 × (4+1+1+10) × 3, 38 × 38 × (4+1+1+10) × 3, and 19 × 19 × (4+1+1+10) × 3 at the output end of the Head part; the loss parameter Loss of the face information is then calculated as the face position loss lbox, the target loss lobj, the face classification loss lcls, and the key-point regression loss lmark, that is, Loss = lbox + lobj + lcls + lmark. After the number of output channels is modified, three feature images of 76 × 76 × (4+1+1+10+3) × 3, 38 × 38 × (4+1+1+10+3) × 3, and 19 × 19 × (4+1+1+10+3) × 3 are obtained at the Head output, i.e., with the 3 target parameters added for calculating the three-dimensional face angle information; when the electronic device now calculates the loss parameters of the face information, it calculates the face position loss lbox, the target loss lobj, the face classification loss lcls, the key-point regression loss lmark, and the three-dimensional angle information loss lpose, that is, Loss = lbox + lobj + lcls + lmark + lpose.
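A minimal sketch of this loss computation, assuming mean absolute error for the added pose branch as in the formula above; the box, target, classification, and key-point losses are left as stand-ins, and the unweighted sum follows the text:

```python
# A sketch of the loss terms named above, assuming mean absolute error for the
# added pose branch; the box/target/class/landmark losses are stand-ins, and
# the unweighted sum Loss = lbox + lobj + lcls + lmark + lpose follows the text.
import torch

def mae_loss(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    # loss(x, y) = (1/n) * sum_i |x_i - y_i|
    return (pred - label).abs().mean()

def total_loss(lbox, lobj, lcls, lmark, pose_pred, pose_label):
    lpose = mae_loss(pose_pred, pose_label)    # new term for pitch/yaw/roll
    return lbox + lobj + lcls + lmark + lpose
```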
S13, the electronic device trains the neural network model using the loss parameters and the label values of the face information in the training images, obtaining a neural network model capable of recognizing face position information and face feature information.
In some embodiments, the electronic device first calibrates the training images to obtain the label values of the face information in them, that is, the true values of the face position information and face feature information in the training images. The calibrated training images and the calculated face information loss parameters are then input into the neural network model for training, yielding a neural network model adapted to the loss parameters and training images.
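A minimal, assumed training step for S13 might look as follows; model and compute_losses are hypothetical placeholders rather than APIs from the patent:

```python
# An assumed training step for S13: calibrated label values plus the loss
# above drive a standard gradient update. model and compute_losses are
# hypothetical placeholders, not identifiers from the patent.
def train_step(model, optimizer, images, labels, compute_losses):
    optimizer.zero_grad()
    outputs = model(images)  # three feature maps with the widened channel count
    lbox, lobj, lcls, lmark, lpose = compute_losses(outputs, labels)
    loss = lbox + lobj + lcls + lmark + lpose
    loss.backward()
    optimizer.step()
    return loss.item()
```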
Based on the above neural network model, this embodiment provides a face information recognition method. Fig. 1 shows a flowchart of the face information recognition method provided by the present application; as shown in fig. 1, the method includes the following steps S1 to S5.
S1, the electronic device acquires the image to be recognized.
In this embodiment, the image to be recognized is a two-dimensional image including a target face, which may be a black-and-white picture or a color picture.
In some embodiments, the image to be recognized may be obtained by taking a picture with a camera built into the electronic device; the camera may be any camera capable of capturing a two-dimensional picture, such as a monocular camera, a color camera, or a black-and-white camera.
In other embodiments, the image to be recognized may be acquired by another image acquisition device, and then the acquired image to be recognized is sent to the electronic device for acquisition. The image capture device may be a cell phone, camera, watch, or other wearable device.
S2, the electronic device recognizes the face position information in the image to be recognized through the neural network model.
In some embodiments, the electronic device first performs feature extraction on the image to be recognized through the Backbone part: the image to be recognized is down-sampled into feature images at different multiples, the first target images. Down-sampling progressively shrinks the image to be recognized by preset multiples. Taking an image to be recognized of pixel size 608 × 608 as an example, feature images are output at down-sampling multiples of 8, 16, and 32: the output feature image is 76 × 76 pixels at multiple 8, 38 × 38 at multiple 16, and 19 × 19 at multiple 32. On each feature image, three anchor frames of different sizes are generated for the feature objects present in the image. Each anchor frame is used to determine the position information of the feature object within it; for example, when the feature object in an anchor frame is a face, the subsequent recognition process obtains the face position information based on that anchor frame.
Multi-scale image fusion is then performed on the down-sampled feature images of different multiples by the Neck network layer, obtaining the second target image. During fusion, the high-level features of deep convolutions (the features in the feature image at down-sampling multiple 32) are fused with the low-level features of shallow convolutions (the features in the feature image at down-sampling multiple 8); after fusion, the features are output separately per down-sampling multiple.
Finally, the fused second target image is input into the Head output network, which performs classification and regression analysis, including detecting whether a face exists in the output image and where it is, and outputting the face position information based on the position of the anchor frame.
S3, the electronic device determines the face feature information from the face position information within the neural network model.
In some embodiments, within the Yolov5-face recognition detection model, the electronic device crops the target face from the image to be recognized based on the recognized face position information, removing the background from the face image and keeping only the face part, to obtain the cropped target face image. The neural network model then outputs the face feature information of the target face in the cropped target face picture.
The crop shape may be a square, a rectangle, an ellipse, and so on.
S4, the electronic device decodes the face position information and the face feature information within the neural network model.
In some embodiments, the face position information and face feature information are obtained from the anchor-frame position coordinates in the second target image, which is a reduced version of the image to be recognized. Therefore, to determine their true values in the image to be recognized, the face position information and face feature information must be decoded: the values calculated from the anchor-frame coordinates in the second target image are multiplied by the reduction multiple (the down-sampling multiple) of the corresponding picture in the second target image, yielding the face position information and face feature information in the image to be recognized.
For example, when the electronic device uses the neural network model to recognize the position coordinates of the left eye among the face position information as (x, y) in the second target image at down-sampling multiple 8, the decoded position coordinates of the left eye in the image to be recognized are (8x, 8y).
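A sketch of this decoding step, which simply scales coordinates predicted on a down-sampled feature map back by that map's stride (the down-sampling multiple):

```python
# A sketch of the decoding step: coordinates predicted on a down-sampled
# feature map are scaled back by that map's stride (down-sampling multiple).
def decode_point(x: float, y: float, stride: int) -> tuple:
    # e.g. (x, y) on the stride-8 (76x76) map maps to (8x, 8y) in the image
    return (x * stride, y * stride)
```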
S5, the electronic device outputs the face position information through the first channel group of the neural network model and the face feature information through the second channel group.
In a neural network model built in an electronic device, a first channel group for outputting face position information and a second channel group for outputting face feature information are configured.
Illustratively, the face position information, i.e., the face score in the image to be recognized, the region where the face is located, and the five key points (the positions of the left eye, right eye, nose, and left and right mouth corners), is output through the first channel group of the neural network model, and the face feature information, e.g., the three-dimensional angle information of the face (pitch, yaw, roll), is output through the second channel group of the neural network model.
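A sketch of reading the two channel groups out of a single output tensor, assuming a layout of [batch, anchors, grid_h, grid_w, 4+1+1+10+3] after reshaping (the layout is our assumption for this example):

```python
# A sketch of splitting one output tensor into the two channel groups, assuming
# the last dimension is laid out as 4+1+1+10+3 = 19 channels after reshaping.
import torch

def split_channel_groups(pred: torch.Tensor):
    position_info = pred[..., :16]   # first group: box(4) + target(1) + class(1) + landmarks(10)
    pose_info = pred[..., 16:19]     # second group: pitch, yaw, roll
    return position_info, pose_info
```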
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 5 is a schematic diagram of a face information recognition device according to an embodiment of the present application. As shown in fig. 5, the face information recognition device 4 of this embodiment includes: a processor 40, a memory 41, and a computer program 42, such as a face information recognition program, stored in the memory 41 and executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps in the above embodiments of the face information recognition method, or alternatively, the functions of the modules/units in the above device embodiments.
Illustratively, the computer program 42 may be partitioned into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 42 in the face information recognition device 4.
The face information recognition device 4 may be a tablet computer, desktop computer, notebook, palmtop computer, cloud server, or other computing device, and may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will appreciate that fig. 5 is merely an example of the face information recognition device 4 and does not constitute a limitation of it; it may include more or fewer components than shown, combine some components, or use different components; for example, the device may also include input-output devices, network access devices, buses, and so on.
The Processor 40 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the face information recognition device 4, such as its hard disk or memory. The memory 41 may also be an external storage device of the face information recognition device 4, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card provided on the device. Further, the memory 41 may include both an internal storage unit and an external storage device of the face information recognition device 4. The memory 41 is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and other division manners may be available in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module/unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the methods described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic diskette, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. The face information identification method is characterized by being applied to a neural network model, wherein the neural network model comprises a first channel group and a second channel group; the method comprises the following steps:
acquiring an image to be identified;
identifying face position information in the image to be identified;
determining face feature information according to the face position information;
and outputting the face position information through the first channel group, and outputting the face feature information through the second channel group.
2. The method according to claim 1, wherein the identifying face position information in the image to be identified comprises:
down-sampling the image to be identified to obtain first target images with different down-sampling multiples;
performing feature fusion on the first target images with different down-sampling multiples to obtain a second target image;
and identifying face position information in the second target image.
3. The method of claim 1, wherein determining face feature information according to the face location information comprises:
cutting a target face in the image to be recognized according to the face position information to obtain a cut target face image;
and identifying the face feature information in the target face image.
4. The method of claim 1, wherein before outputting the face location information via the first channel group and outputting the face feature information via the second channel group, the method further comprises:
and decoding the face position information and the face feature information.
5. The method according to claim 1, wherein the face feature information is face three-dimensional angle information.
6. A neural network model, the neural network model comprising a first channel group and a second channel group, the neural network model configured to:
acquiring an image to be identified;
identifying face position information in the image to be identified;
determining face feature information according to the face position information;
and outputting the face position information through the first channel group, and outputting the face feature information through the second channel group.
7. The neural network model of claim 6, wherein the number of output channels of the second channel group is determined according to the face feature information.
8. The neural network model of claim 6, wherein the neural network model is trained by:
modifying the number of output channels at the output end of a neural network model to enable the neural network model to comprise a first channel group for identifying face position information and a second channel group for identifying face feature information;
calculating loss parameters of face information in the neural network model through a loss function;
and training the neural network model by adopting the loss parameters and the label values of the face information in training images to obtain a neural network model capable of identifying face position information and face feature information.
9. A face information recognition apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202210456894.XA 2022-04-28 2022-04-28 Face information identification method, device, storage medium and neural network model Pending CN114783029A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210456894.XA CN114783029A (en) 2022-04-28 2022-04-28 Face information identification method, device, storage medium and neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210456894.XA CN114783029A (en) 2022-04-28 2022-04-28 Face information identification method, device, storage medium and neural network model

Publications (1)

Publication Number Publication Date
CN114783029A (en) 2022-07-22

Family ID: 82432203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210456894.XA Pending CN114783029A (en) 2022-04-28 2022-04-28 Face information identification method, device, storage medium and neural network model

Country Status (1)

Country Link
CN (1) CN114783029A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination