CN113052064A - Attention detection method based on face orientation, facial expression and pupil tracking - Google Patents

Attention detection method based on face orientation, facial expression and pupil tracking

Info

Publication number: CN113052064A
Application number: CN202110310469.5A
Authority: CN (China)
Prior art keywords: pupil, face, image, parameter, face image
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN113052064B (en)
Inventors: 姜文强, 汪明浩, 刘川贺
Current/Original Assignee: Beijing Seektruth Data Technology Service Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Priority/Filing date: 2021-03-23 (priority to CN202110310469.5A)
Application filed by Beijing Seektruth Data Technology Service Co ltd
Publication of CN113052064A (application); application granted; publication of CN113052064B (grant)

Classifications

    All classifications fall under G (Physics), G06 (Computing; calculating or counting), within G06T (image data processing or generation) and G06V (image or video recognition or understanding):
    • G06V40/165 Human faces: detection; localisation; normalisation using facial parts and geometric relationships
    • G06V40/174 Facial expression recognition
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T3/60 Rotation of whole images or parts thereof
    • G06T5/20 Image enhancement or restoration using local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/20028 Bilateral filtering
    • G06T2207/30201 Subject of image: human face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Ophthalmology & Optometry (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the application provides an attention detection method based on face orientation, facial expression and pupil tracking, relating to the technical field of risk assessment. The method comprises: extracting a face image of an evaluated person; determining keypoint coordinates in the face image; determining the face orientation angle of the evaluated person from the keypoint coordinates, a camera intrinsic parameter matrix and camera distortion parameters; extracting an eye region image of the evaluated person; determining the eye pupil position of the evaluated person; determining a pupil deflection parameter based on the eye region image and the eye pupil position; determining, from the pixel values of the face image, a first expression parameter characterizing the positive/negative degree and a second expression parameter characterizing the wakefulness/drowsiness degree; and determining an attention parameter corresponding to the evaluated person based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter. The method provided by the embodiment of the application can accurately evaluate the attention of the evaluated person.

Description

Attention detection method based on face orientation, facial expression and pupil tracking
Technical Field
The present document relates to the technical field of risk assessment, and in particular to an attention detection method based on face orientation, facial expression and pupil tracking.
Background
Attention detection is widely used in many fields, such as driver attention detection in road traffic, student attention detection in teaching, and attention detection for identifying deceptive behavior in criminal investigation.
Currently, most attention detection analyzes only the face orientation of the detected person. However, face orientation can only suggest the possibility that the detected person is paying attention to something; it often cannot accurately reflect the attention of the monitored person.
Therefore, how to provide an effective solution that improves the accuracy of attention detection has become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an attention detection method based on face orientation, facial expression and pupil tracking, aiming to solve the problem of low attention detection accuracy in the prior art.
In order to solve the above technical problem, the embodiment of the present application is implemented as follows:
the embodiment of the application provides an attention detection method based on face orientation, facial expression and pupil tracking, which comprises the following steps:
acquiring a target image frame containing the head of an evaluated person;
extracting a face image of the evaluated person from the target image frame;
normalizing the pixel values of the face image;
feeding the pixel values of the normalized face image into a pre-trained face keypoint detection model to obtain a plurality of keypoint coordinates in the face image, wherein the plurality of keypoint coordinates comprise eye contour coordinates;
determining the face orientation angle of the evaluated person according to the plurality of keypoint coordinates, a preset camera intrinsic parameter matrix and a preset camera distortion parameter;
extracting an eye region image of the evaluated person from the face image based on the eye contour coordinates;
determining the eye pupil position of the evaluated person according to the eye region image;
determining a pupil deflection parameter of the evaluated person based on the eye region image and the eye pupil position;
feeding the pixel values of the normalized face image into a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the evaluated person, wherein the first expression parameter characterizes the positive/negative degree of the evaluated person and the second expression parameter characterizes the wakefulness/drowsiness degree of the evaluated person;
determining an attention parameter corresponding to the evaluated person based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter, the attention parameter characterizing the degree of attention concentration of the evaluated person.
Optionally, the target image frame is in RGB format, and extracting the face image of the evaluated person from the target image frame includes:
converting the channel order of the target image frame to RGB;
scaling the channel-converted target image frame to a first specified size;
normalizing the target image frame scaled to the first specified size;
feeding the matrix formed by the pixel values of the normalized target image frame into a pre-trained face detection model to obtain face boundary point coordinates;
and extracting the face image from the target image frame based on the face boundary point coordinates.
Optionally, the face boundary point coordinates include first boundary point coordinates and second boundary point coordinates, and extracting the face image from the target image frame based on the face boundary point coordinates includes:
extracting a rectangular face image from the target image frame with the first boundary point coordinates and the second boundary point coordinates as diagonal corners.
Optionally, the method further includes:
scaling the face image to a second specified size;
wherein normalizing the pixel values of the face image includes:
normalizing the pixel values of the face image scaled to the second specified size.
Optionally, determining the face orientation angle of the evaluated person according to the plurality of keypoint coordinates, the preset camera intrinsic parameter matrix and the preset camera distortion parameter includes:
determining a plurality of head three-dimensional keypoint reference coordinates in one-to-one correspondence with the keypoint coordinates, according to the keypoint coordinates, the preset camera intrinsic parameter matrix and the preset camera distortion parameter;
determining a rotation vector and a translation vector of the camera according to the preset camera intrinsic parameter matrix, the preset camera distortion parameter, a target keypoint coordinate and a target head three-dimensional keypoint reference coordinate;
converting the rotation vector into a rotation matrix;
stitching the rotation matrix and the translation vector to obtain a pose matrix;
and decomposing the pose matrix to obtain the face orientation angle of the evaluated person;
wherein the target keypoint coordinate is one of the keypoint coordinates, and the target head three-dimensional keypoint reference coordinate is the head three-dimensional keypoint reference coordinate corresponding to the target keypoint coordinate.
Optionally, determining the eye pupil position of the evaluated person according to the eye region image includes:
determining, among the face images corresponding to the target image frame, first face images whose pupil proportion lies within a preset proportion range, the pupil proportion being the ratio of the eye pupil area to the eye area;
when the number of first face images reaches a preset number, computing the mean of the pupil proportions of the first face images;
finding the target pupil proportion closest to the mean among the pupil proportions of the first face images;
selecting the first face image corresponding to the target pupil proportion as the target face image;
and taking the center of the eye pupil region in the target face image as the pupil position.
Optionally, determining the first face images whose pupil proportion lies within the preset proportion range includes:
computing a circumscribed rectangular region from the eye contour coordinates in the face image corresponding to the target image frame;
expanding the circumscribed rectangular region outward by a specified number of pixels;
applying bilateral filtering followed by an erosion operation to the expanded region to obtain an eroded image;
binarizing the eroded image to obtain a binarized image;
shrinking the binarized image inward by the specified number of pixels to obtain a shrunk image;
computing the proportion of non-zero pixel values in the shrunk image to obtain the pupil proportion of the face image;
and taking face images whose pupil proportion lies within the preset proportion range as first face images.
The technical solution adopted in the embodiment of the application can achieve the following beneficial effects:
the face orientation angle of the evaluated person, the pupil deflection parameter of the evaluated person, a first expression parameter characterizing the positive/negative degree of the evaluated person and a second expression parameter characterizing the wakefulness/drowsiness degree of the evaluated person are determined, and the attention parameter corresponding to the evaluated person is then determined based on these four quantities. In this way, attention detection jointly considers attention-related parameters along different dimensions, so the degree of attention concentration of the evaluated person can be evaluated accurately and robustly.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure in any way. In the drawings:
fig. 1 is a flowchart illustrating an attention detection method based on facial orientation, facial expression and pupil tracking according to an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an attention detection device based on facial orientation, facial expression and pupil tracking according to an embodiment of the present disclosure.
Detailed Description
In order to make the purpose, technical solutions and advantages of this document more clear, the technical solutions of this document will be clearly and completely described below with reference to specific embodiments of this document and corresponding drawings. It is to be understood that the embodiments described are only a few embodiments of this document, and not all embodiments. All other embodiments obtained by a person skilled in the art without making creative efforts based on the embodiments in this document belong to the protection scope of this document.
In order to ensure accuracy of attention assessment, the embodiment of the application provides an attention detection method based on facial orientation, facial expression and pupil tracking, which can accurately assess the attention concentration degree and has high robustness.
The following describes in detail an attention detection method based on face orientation, facial expression and pupil tracking provided in the embodiments of the present application.
The attention detection method based on face orientation, facial expression and pupil tracking provided by the embodiment of the application can be applied to a user terminal, and the user terminal can be, but is not limited to, a personal computer, a smart phone, a tablet computer, a laptop portable computer, a personal digital assistant and the like.
It should be understood that the execution subject described above does not limit the embodiments of the present application.
Alternatively, the flow of the attention detection method based on face orientation, facial expression and pupil tracking is shown in fig. 1, and may include the following steps:
in step S101, a target image frame containing the head portrait of the person to be evaluated is acquired.
The target image frame may be a video acquired by a camera or a picture taken by the camera, and the target image frame may include one or more frames of images, which is not specifically limited in this embodiment of the application.
And step S102, extracting the face image of the evaluated person from the target image frame.
If the target image frame is a frame image, the face image of the person to be evaluated can be directly extracted according to the target image frame. If the target image frame includes multiple frames of images, when the face image of the person to be evaluated is extracted from the target image frame, each frame of the multiple frames of images may be extracted to obtain multiple face images corresponding to the multiple frames of images one to one, or one frame of the multiple frames of images may be extracted to obtain one face image, or one frame of the multiple frames of images may be selected at intervals of a certain number of frames to be extracted to obtain multiple face images, which is not specifically limited in the embodiment of the present application.
The format of the target image frame is not limited. Taking the target image frame as an image in RGB format as an example, extracting the face image of the evaluated person from the target image frame may include the following steps.
step S1021, sequentially converting image channels of the target image frame into RGB.
In step S1022, the target image frame after the image channel sequence is converted into RGB is scaled to the first designated size.
The first specified size may be determined according to a face detection model subsequently used to determine coordinates of face boundary points, for example, if the input to the face detection model is a 300 × 300 matrix, the first specified size may be 300 × 300 pixels in size.
In step S1023, the target image frame scaled to the first specified size is normalized.
Specifically, in the target image frame scaled to the first specified size, the pixel value of each pixel point is subtracted by 127.5 and then divided by 127.5, so that the pixel values are distributed in the range of [ -1,1 ]. For use in subsequent operations.
In step S1024, the matrix formed by the pixel values of all pixels of the normalized target image frame is fed into a pre-trained face detection model to obtain the face boundary point coordinates.
In the embodiment of the application, a face detection model for face detection is established in advance. After normalization, the pixel values of the normalized target image frame are used as the entries of a matrix, and the matrix is fed into the pre-trained face detection model to obtain the face boundary point coordinates.
For example, if the normalized target image frame has a size of 300 × 300 pixels, the constructed matrix is a 300 × 300 matrix.
In step S1025, the face image is extracted from the target image frame based on the face boundary point coordinates.
In the embodiment of the application, the face boundary point coordinates comprise first boundary point coordinates and second boundary point coordinates; when extracting the face image, a rectangular face image can be extracted from the target image frame with the first and second boundary point coordinates as diagonal corners.
For example, if the first boundary point coordinates are (X_min, Y_min) and the second boundary point coordinates are (X_max, Y_max), the rectangular region of the target image frame whose abscissa ranges from X_min to X_max and whose ordinate ranges from Y_min to Y_max can be taken as the face image.
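As an illustration, steps S1021 to S1025 can be sketched with OpenCV as below. This is a minimal sketch under stated assumptions: face_detector is a hypothetical stand-in for the pre-trained face detection model, and is assumed to return the two boundary points already mapped back to the coordinates of the original frame (a real model would output coordinates relative to the 300 × 300 input, and the rescaling is omitted here).

```python
import cv2
import numpy as np

def extract_face(frame_bgr, face_detector):
    # S1021: convert the channel order to RGB (OpenCV reads frames as BGR).
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    # S1022: scale to the first specified size (300 x 300 assumed here).
    resized = cv2.resize(rgb, (300, 300))
    # S1023: normalize pixel values into [-1, 1].
    normalized = (resized.astype(np.float32) - 127.5) / 127.5
    # S1024: run the face detection model on the normalized matrix;
    # the two-boundary-point output format is an assumption.
    (x_min, y_min), (x_max, y_max) = face_detector(normalized)
    # S1025: crop the rectangle whose diagonal joins the two boundary points.
    return frame_bgr[y_min:y_max, x_min:x_max]
```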
In step S103, the pixel values of the face image are normalized.
Specifically, the value of each pixel in the face image may be divided by 256, so that the pixel values of the normalized face image are distributed in the interval [0, 1], ready for the subsequent computation of the keypoint coordinates in the face image.
Further, before the pixel values of the face image are normalized, the face image may be scaled to a second specified size, which may be determined by the face keypoint detection model subsequently used to compute the keypoint coordinates. For example, if the input of the face keypoint detection model is a 112 × 112 matrix, the second specified size may be 112 × 112 pixels.
In step S104, the pixel values of the normalized face image are fed into a pre-trained face keypoint detection model to obtain a plurality of keypoint coordinates in the face image.
In the embodiment of the application, a face keypoint detection model for computing face keypoint coordinates is trained in advance; the face keypoints may be, but are not limited to, regions of the face such as the eyes, ears and nose.
When computing the face keypoint coordinates, the pixel values of the normalized face image are used as the entries of a matrix, and the matrix is fed into the pre-trained face keypoint detection model to obtain the plurality of keypoint coordinates in the face image.
The plurality of keypoint coordinates include eye contour coordinates.
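A minimal sketch of steps S103 and S104, reusing the cv2/numpy imports from the sketch above; keypoint_model is a hypothetical stand-in for the pre-trained face keypoint detection model, assumed to take a 112 × 112 input and return an (N, 2) array of coordinates:

```python
def detect_keypoints(face_img, keypoint_model):
    # Scale to the second specified size before normalization.
    face = cv2.resize(face_img, (112, 112))
    # Normalize pixel values into [0, 1] by dividing by 256.
    normalized = face.astype(np.float32) / 256.0
    # Run the keypoint model; the output is assumed to include
    # the eye contour coordinates among the N keypoints.
    return keypoint_model(normalized)
```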
In step S105, the face orientation angle of the evaluated person is determined according to the plurality of keypoint coordinates, the preset camera intrinsic parameter matrix and the preset camera distortion parameter.
The preset camera intrinsic parameter matrix is the intrinsic parameter matrix of the camera that captured the target image frame, and the preset camera distortion parameter is the distortion parameter of that camera. Both are preset, and may be set differently for cameras from different manufacturers.
In the embodiment of the application, determining the face orientation angle of the evaluated person may include the following steps.
In step S1051, a plurality of head three-dimensional keypoint reference coordinates in one-to-one correspondence with the plurality of keypoint coordinates are determined according to the keypoint coordinates, the preset camera intrinsic parameter matrix and the preset camera distortion parameter.
In the embodiment of the application, the head three-dimensional keypoint reference coordinates can be determined with OpenCV, which is prior art and is not described in detail here.
In step S1052, the rotation vector and the translation vector of the camera are determined according to the preset camera intrinsic parameter matrix, the preset camera distortion parameter, the target keypoint coordinate and the target head three-dimensional keypoint reference coordinate.
The target keypoint coordinate is one of the keypoint coordinates, and the target head three-dimensional keypoint reference coordinate is the head three-dimensional keypoint reference coordinate corresponding to the target keypoint coordinate.
Specifically, the function solvePnP can take the target keypoint coordinates and the target head three-dimensional keypoint reference coordinates and, given the preset camera intrinsic parameter matrix and the preset camera distortion parameter, solve for the rotation vector and the translation vector of the camera.
In step S1053, the rotation vector is converted into a rotation matrix.
The conversion can use the function Rodrigues, which is not described in detail in the embodiment of the application.
In step S1054, the rotation matrix and the translation vector are stitched to obtain a pose matrix.
For example, if the rotation matrix is a 3 × 3 matrix and the translation vector is a 3-dimensional vector, the stitched pose matrix is a 3 × 4 matrix.
In step S1055, the pose matrix is decomposed to obtain the face orientation angle of the evaluated person.
The face orientation angle comprises a pitch angle, a yaw angle and a roll angle.
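A sketch of steps S1052 to S1055 with OpenCV. The 2D image points, 3D reference points and camera parameters are assumed to be available as float arrays from the preceding steps; feeding the 3 × 4 pose matrix to decomposeProjectionMatrix to read off the Euler angles is a common approximation, not necessarily the patent's exact decomposition:

```python
def face_orientation(image_pts, model_pts, camera_matrix, dist_coeffs):
    # S1052: solve for the camera rotation and translation vectors.
    _, rvec, tvec = cv2.solvePnP(model_pts, image_pts, camera_matrix, dist_coeffs)
    # S1053: convert the rotation vector into a 3x3 rotation matrix.
    rot_mat, _ = cv2.Rodrigues(rvec)
    # S1054: stitch the 3x3 rotation matrix and the 3-d translation
    # vector into a 3x4 pose matrix.
    pose_mat = np.hstack((rot_mat, tvec))
    # S1055: decompose the pose matrix; the last output of
    # decomposeProjectionMatrix is the Euler angles (pitch, yaw, roll).
    euler = cv2.decomposeProjectionMatrix(pose_mat)[-1]
    pitch, yaw, roll = euler.flatten()
    return pitch, yaw, roll
```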
In step S106, the eye region image of the evaluated person is extracted from the face image based on the eye contour coordinates.
In the embodiment of the application, a coordinate index can be set for the coordinates of each pixel in the face image; after the eye contour coordinates are obtained, a rough cut according to the coordinate indices of the eye contour coordinates yields the eye region image of the evaluated person, as sketched below. The coordinate indices of the eye contour coordinates comprise those of the left-eye contour and those of the right-eye contour, so the obtained eye region image comprises a left-eye region image and a right-eye region image.
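For illustration, the rough cut can be sketched as a bounding-box crop around each eye's contour points (a minimal sketch; the per-eye contour-point arrays are assumed to come from the keypoint model):

```python
def crop_eye_regions(face_img, left_eye_pts, right_eye_pts):
    def crop(points):
        # Circumscribed rectangle of the eye contour coordinates.
        x, y, w, h = cv2.boundingRect(np.asarray(points, dtype=np.int32))
        return face_img[y:y + h, x:x + w]
    return crop(left_eye_pts), crop(right_eye_pts)
```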
In step S107, the eye pupil position of the evaluated person is determined from the eye region image.
In the embodiment of the application, determining the eye pupil position of the evaluated person may include the following steps.
In step S1071, among the face images corresponding to the target image frame, the first face images whose pupil proportion lies within a preset proportion range are determined.
The pupil proportion is the ratio of the eye pupil area to the eye area.
Specifically, a face image corresponding to one frame can be selected from the target image frame every fixed number of frames, and the corresponding circumscribed rectangular region obtained from the eye contour coordinates of that face image.
For example, if the fixed number of frames is 5, the 5th frame of the target image frame may be selected first, and the corresponding circumscribed rectangular region obtained from the eye contour coordinates in the face image corresponding to the 5th frame.
Then the obtained circumscribed rectangular region is expanded outward by a specified number of pixels, for example 5 pixels. Bilateral filtering followed by an erosion operation is applied to the expanded region to obtain an eroded image. The eroded image is then binarized to obtain a binarized image. The binarized image is shrunk inward by the same number of pixels as the outward expansion to obtain a shrunk image. The proportion of non-zero pixel values in the shrunk image is computed as the pupil proportion of the face image. If the pupil proportion lies within the preset proportion range, the face image is taken as a first face image; otherwise, another frame is selected from the target image frame after the fixed number of frames (e.g. 5 frames) and its pupil proportion is computed.
The preset proportion range can be determined from the typical ratio of the pupil to the eye area. For example, if the ratio of the pupil to the eye area of an ordinary person is 0.46 to 0.50, the preset proportion range may be 0.46 to 0.50.
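A sketch of the pupil-proportion computation with OpenCV. The bilateral-filter and threshold parameters below are illustrative assumptions (the text does not specify them), and the eye rectangle is assumed to lie away from the image border so the padding is never clipped:

```python
def pupil_proportion(face_img, eye_pts, pad=5):
    # Circumscribed rectangle of the eye contour, expanded outward by `pad`.
    x, y, w, h = cv2.boundingRect(np.asarray(eye_pts, dtype=np.int32))
    roi = face_img[y - pad:y + h + pad, x - pad:x + w + pad]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    # Bilateral filtering followed by an erosion operation.
    smoothed = cv2.bilateralFilter(gray, 9, 75, 75)
    eroded = cv2.erode(smoothed, np.ones((3, 3), np.uint8))
    # Binarize so that the dark pupil maps to non-zero pixels.
    _, binary = cv2.threshold(eroded, 50, 255, cv2.THRESH_BINARY_INV)
    # Shrink back inward by the same number of pixels.
    inner = binary[pad:-pad, pad:-pad]
    # Pupil proportion: fraction of non-zero pixel values.
    return cv2.countNonZero(inner) / inner.size
```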
In step S1072, when the number of first face images reaches a preset number, the mean of the pupil proportions of the first face images is computed.
After each first face image is determined, whether the number of first face images has reached the preset number can be checked; if not, frames continue to be selected from the target image frame at the fixed interval and their pupil proportions computed, until the preset number is reached. The preset number can be set according to the actual situation.
When the number of first face images reaches the preset number, the mean of the pupil proportions of the first face images is computed.
In step S1073, the target pupil proportion closest to the mean is found among the pupil proportions of the first face images.
Each first face image corresponds to one pupil proportion; after the mean of these pupil proportions is obtained, the pupil proportion closest to the mean can be found among them and taken as the target pupil proportion.
In step S1074, the first face image corresponding to the target pupil proportion is selected as the target face image.
In step S1075, the center of the eye pupil region in the target face image is taken as the pupil position.
In step S108, the pupil deflection parameter of the evaluated person is determined based on the eye region image and the eye pupil position.
The pupil deflection parameter may be the distance by which the pupil deflects to the left/right, or the proportion by which the pupil deflects to the left/right, etc.; this is not specifically limited in the embodiment of the application.
Specifically, the distances from the two eye corners to the pupil can be computed from the keypoint coordinates of the eye region and the coordinates of the pupil position, and the pupil deflection parameter of the evaluated person then determined from these two corner-to-pupil distances.
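One plausible reading of this step, consistent with the scoring formulas below (which score a value near 1 highest), is the ratio of the two corner-to-pupil distances. The following is only a sketch of that assumption, not the patent's definitive formula:

```python
def pupil_deflection(inner_corner, outer_corner, pupil):
    # Ratio of corner-to-pupil distances; a centered pupil gives a value near 1.
    d1 = np.linalg.norm(np.subtract(pupil, inner_corner))
    d2 = np.linalg.norm(np.subtract(pupil, outer_corner))
    return d1 / d2
```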
In step S109, the pixel values of the normalized face image are fed into a pre-trained facial emotion recognition model to obtain the first expression parameter and the second expression parameter corresponding to the evaluated person.
In the embodiment of the application, a facial emotion recognition model for facial emotion recognition is trained in advance. After the pixel values of the face image are normalized, they are used as the entries of a matrix, and the matrix is fed into the pre-trained facial emotion recognition model to obtain the first and second expression parameters corresponding to the evaluated person. The first expression parameter characterizes the positive/negative degree of the evaluated person, and the second expression parameter characterizes the wakefulness/drowsiness degree of the evaluated person.
In step S110, the attention parameter corresponding to the evaluated person is determined based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter.
The attention parameter characterizes the degree of attention concentration of the evaluated person.
The face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter each reflect, to some degree, whether the evaluated person is paying attention. Therefore, when determining the attention parameter, a score may be assigned to each of these four quantities, and a final score for the evaluated person determined from the sum of the four scores, so that the degree of attention concentration of the evaluated person is obtained from the final score.
Specifically, the total score of the face orientation angle may be set to 100, and the score corresponding to the face orientation angle expressed as SS = 100 − a × pitch − b × yaw − c × roll, where pitch, yaw and roll denote the degrees of the pitch, yaw and roll angles of the face orientation. When the absolute value of pitch is at most 15, a = 0.8; when it exceeds 15, a = 1.5. When the absolute value of yaw is at most 15, b = 0.8; when it exceeds 15, b = 1.5. When the absolute value of roll is at most 15, c = 0.8; when it exceeds 15, c = 1.5.
The score corresponding to the pupil deflection parameter may be the minimum of the scores for the left eye and the right eye. The left-eye score may be expressed as ELS = 100 − abs(1 − LR) × 50, where LR denotes the pupil deflection parameter of the left eye; the right-eye score as ERS = 100 − abs(1 − RR) × 50, where RR denotes the pupil deflection parameter of the right eye. The score corresponding to the pupil deflection parameter is denoted EMS, with EMS = min(ELS, ERS).
The first expression parameter, denoted V, characterizes the positive/negative degree of the evaluated person: a value greater than 0 indicates a positive state, and a value less than 0 a negative state. The second expression parameter, denoted A, characterizes the wakefulness/drowsiness degree: a value greater than 0 indicates a wakeful state, and a value less than 0 a drowsy state. The score corresponding to the first expression parameter may be expressed as SV = 10 × (1 + V), and the score corresponding to the second expression parameter as SA = 10 × (1 + A).
The final score of the evaluated person is FS = (SS + EMS + SA + SV)/200. The final score FS characterizes the degree of attention concentration of the evaluated person: the larger FS is, the more concentrated the evaluated person's attention.
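The scoring can be condensed into a short sketch. One point is hedged: since the text ties the coefficients a, b and c to the absolute angle values, absolute values are also assumed inside SS; the expression parameters v and a are assumed to lie in [-1, 1] so that their scores stay within [0, 20]:

```python
def attention_score(pitch, yaw, roll, lr, rr, v, a):
    def coeff(angle):
        # 0.8 when |angle| <= 15 degrees, 1.5 otherwise.
        return 0.8 if abs(angle) <= 15 else 1.5
    # Face-orientation score, out of a total of 100.
    ss = 100 - coeff(pitch) * abs(pitch) - coeff(yaw) * abs(yaw) - coeff(roll) * abs(roll)
    # Pupil-deflection score: minimum of the left- and right-eye scores.
    els = 100 - abs(1 - lr) * 50
    ers = 100 - abs(1 - rr) * 50
    ems = min(els, ers)
    # Expression scores from the positive/negative parameter v and the
    # wakefulness/drowsiness parameter a.
    sv = 10 * (1 + v)
    sa = 10 * (1 + a)
    # Final attention score: larger means more concentrated attention.
    return (ss + ems + sa + sv) / 200
```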
According to the attention detection method based on face orientation, facial expression and pupil tracking provided by the embodiment of the application, the face orientation angle of the evaluated person, the pupil deflection parameter of the evaluated person, a first expression parameter characterizing the positive/negative degree of the evaluated person and a second expression parameter characterizing the wakefulness/drowsiness degree of the evaluated person are determined, and the attention parameter corresponding to the evaluated person is then determined based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter. In this way, attention detection jointly considers attention-related parameters along different dimensions, so the degree of attention concentration of the evaluated person can be evaluated accurately and robustly.
Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 2, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface and a memory. The memory may include volatile memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 2, but this does not indicate only one bus or one type of bus.
And the memory is used for storing programs. In particular, the program may include program code comprising computer operating instructions. The memory may include both memory and non-volatile storage and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the memory and runs it, forming, at the logical level, an attention detection device based on face orientation, facial expression and pupil tracking. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
acquiring a target image frame containing the head of an evaluated person;
extracting a face image of the evaluated person from the target image frame;
normalizing the pixel values of the face image;
feeding the pixel values of the normalized face image into a pre-trained face keypoint detection model to obtain a plurality of keypoint coordinates in the face image, wherein the plurality of keypoint coordinates comprise eye contour coordinates;
determining the face orientation angle of the evaluated person according to the plurality of keypoint coordinates, a preset camera intrinsic parameter matrix and a preset camera distortion parameter;
extracting an eye region image of the evaluated person from the face image based on the eye contour coordinates;
determining the eye pupil position of the evaluated person according to the eye region image;
determining a pupil deflection parameter of the evaluated person based on the eye region image and the eye pupil position;
feeding the pixel values of the normalized face image into a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the evaluated person, wherein the first expression parameter characterizes the positive/negative degree of the evaluated person and the second expression parameter characterizes the wakefulness/drowsiness degree of the evaluated person;
determining an attention parameter corresponding to the evaluated person based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter, the attention parameter characterizing the degree of attention concentration of the evaluated person.
The method performed by the attention detection device based on face orientation, facial expression and pupil tracking disclosed in the embodiment of fig. 2 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps and logic blocks disclosed in one or more embodiments of the present application may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or any other conventional processor. The steps of a method disclosed in connection with one or more embodiments of the present application may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further perform the method shown in fig. 1, and implement the functions of the attention detection apparatus based on facial orientation, facial expression and pupil tracking in the embodiment shown in fig. 1, which are not described herein again.
Of course, besides the software implementation, the electronic device of the present application does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
Embodiments of the present application also provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in fig. 1, and specifically to perform:
acquiring a target image frame containing the head of an evaluated person;
extracting a face image of the evaluated person from the target image frame;
normalizing the pixel values of the face image;
feeding the pixel values of the normalized face image into a pre-trained face keypoint detection model to obtain a plurality of keypoint coordinates in the face image, wherein the plurality of keypoint coordinates comprise eye contour coordinates;
determining the face orientation angle of the evaluated person according to the plurality of keypoint coordinates, a preset camera intrinsic parameter matrix and a preset camera distortion parameter;
extracting an eye region image of the evaluated person from the face image based on the eye contour coordinates;
determining the eye pupil position of the evaluated person according to the eye region image;
determining a pupil deflection parameter of the evaluated person based on the eye region image and the eye pupil position;
feeding the pixel values of the normalized face image into a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the evaluated person, wherein the first expression parameter characterizes the positive/negative degree of the evaluated person and the second expression parameter characterizes the wakefulness/drowsiness degree of the evaluated person;
determining an attention parameter corresponding to the evaluated person based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter, the attention parameter characterizing the degree of attention concentration of the evaluated person.
Fig. 3 is a schematic structural diagram of an attention detection device based on facial orientation, facial expression and pupil tracking according to an embodiment of the present application. Referring to fig. 3, in one software implementation, the provided attention detection device based on facial orientation, facial expression and pupil tracking may include:
the acquisition module is used for acquiring a target image frame containing the head portrait of the evaluated person;
the first extraction module is used for extracting a face image of the evaluated person from the target image frame;
the normalization module is used for carrying out normalization processing on the pixel values of the face image;
the first operation module is used for performing operation by taking the pixel value of each pixel point of the face image after normalization processing as the input of a face key point detection model trained in advance to obtain a plurality of key point coordinates in the face image, wherein the plurality of key point coordinates comprise eye contour coordinates;
the first determining module is used for determining the face orientation angle of the evaluated person according to the plurality of key point coordinates, a preset camera internal parameter matrix and a preset camera distortion parameter;
the second extraction module is used for extracting the eye region image of the evaluated person from the face image based on the eye contour coordinates;
the second determination module is used for determining the eye pupil position of the evaluated person according to the eye region image;
a third determination module, configured to determine a pupil deflection parameter of the evaluated person based on the eye region image and the eye pupil position;
the second operation module is used for performing operation by taking the pixel value of each pixel point of the face image after normalization processing as the input of a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the evaluated person, wherein the first expression parameter represents the positive/negative degree of the evaluated person, and the second expression parameter represents the waking degree/drowsiness degree of the evaluated person;
a fourth determining module, configured to determine an attention parameter corresponding to the evaluated person based on the face orientation angle, the pupil deflection parameter, the first expression parameter, and the second expression parameter, where the attention parameter represents a degree of attention concentration of the evaluated person.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
In short, the above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (7)

1. An attention detection method based on face orientation, facial expression and pupil tracking, comprising:
acquiring a target image frame containing the head of an evaluated person;
extracting a face image of the evaluated person from the target image frame;
normalizing the pixel values of the face image;
feeding the pixel values of the normalized face image into a pre-trained face keypoint detection model to obtain a plurality of keypoint coordinates in the face image, wherein the plurality of keypoint coordinates comprise eye contour coordinates;
determining the face orientation angle of the evaluated person according to the plurality of keypoint coordinates, a preset camera intrinsic parameter matrix and a preset camera distortion parameter;
extracting an eye region image of the evaluated person from the face image based on the eye contour coordinates;
determining the eye pupil position of the evaluated person according to the eye region image;
determining a pupil deflection parameter of the evaluated person based on the eye region image and the eye pupil position;
feeding the pixel values of the normalized face image into a pre-trained facial emotion recognition model to obtain a first expression parameter and a second expression parameter corresponding to the evaluated person, wherein the first expression parameter characterizes the positive/negative degree of the evaluated person and the second expression parameter characterizes the wakefulness/drowsiness degree of the evaluated person;
and determining an attention parameter corresponding to the evaluated person based on the face orientation angle, the pupil deflection parameter, the first expression parameter and the second expression parameter, the attention parameter characterizing the degree of attention concentration of the evaluated person.
2. The method according to claim 1, wherein the target image frame is in RGB format, and extracting the face image of the evaluated person from the target image frame comprises:
converting the channel order of the target image frame to RGB;
scaling the channel-converted target image frame to a first specified size;
normalizing the target image frame scaled to the first specified size;
feeding the matrix formed by the pixel values of the normalized target image frame into a pre-trained face detection model to obtain face boundary point coordinates;
and extracting the face image from the target image frame based on the face boundary point coordinates.
3. The method according to claim 2, wherein the face boundary point coordinates comprise first boundary point coordinates and second boundary point coordinates, and extracting the face image from the target image frame based on the face boundary point coordinates comprises:
extracting a rectangular face image from the target image frame with the first boundary point coordinates and the second boundary point coordinates as diagonal corners.
4. The method according to claim 1, further comprising:
scaling the face image to a second specified size;
wherein normalizing the pixel values of the face image comprises:
normalizing the pixel values of the face image scaled to the second specified size.
5. The method of claim 1, wherein determining the face orientation angle of the evaluated person from the plurality of keypoint coordinates, the preset camera intrinsic parameter matrix and the preset camera distortion parameter comprises:
determining a plurality of three-dimensional head keypoint reference coordinates in one-to-one correspondence with the keypoint coordinates according to the plurality of keypoint coordinates, the preset camera intrinsic parameter matrix and the preset camera distortion parameter;
determining a rotation vector and a translation vector of the camera according to the preset camera intrinsic parameter matrix, the preset camera distortion parameter, the target keypoint coordinates and the target three-dimensional head keypoint reference coordinates;
converting the rotation vector into a rotation matrix;
concatenating the rotation matrix and the translation vector to obtain a pose matrix;
decomposing the pose matrix to obtain the face orientation angle of the evaluated person;
wherein the target keypoint coordinates are one of the plurality of keypoint coordinates, and the target three-dimensional head keypoint reference coordinates are the three-dimensional head keypoint reference coordinates corresponding to the target keypoint coordinates.
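
Claim 5 maps directly onto OpenCV's PnP machinery: `solvePnP` yields the rotation and translation vectors, `Rodrigues` converts the rotation vector into a matrix, and `decomposeProjectionMatrix` recovers Euler angles (in degrees) from the concatenated 3x4 pose matrix. The sketch below assumes the three-dimensional head reference coordinates are given, e.g. from a generic head model; how the patent derives them is a matter for the specification.

```python
import cv2
import numpy as np

def face_orientation(image_pts, model_pts, camera_matrix, dist_coeffs):
    """Recover (pitch, yaw, roll) as in claim 5 via OpenCV's PnP solver.

    `image_pts`: detected 2D keypoints (N x 2); `model_pts`: matching 3D
    head reference coordinates (N x 3), assumed given in this sketch.
    """
    ok, rvec, tvec = cv2.solvePnP(model_pts.astype(np.float64),
                                  image_pts.astype(np.float64),
                                  camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("solvePnP failed")
    rot_mat, _ = cv2.Rodrigues(rvec)       # rotation vector -> rotation matrix
    pose_mat = np.hstack((rot_mat, tvec))  # concatenate into 3x4 pose matrix
    # Decompose the pose matrix; the last return value is the Euler angles.
    euler = cv2.decomposeProjectionMatrix(pose_mat)[-1]
    pitch, yaw, roll = euler.flatten()
    return pitch, yaw, roll
```
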
6. The method of claim 1, wherein determining the eye pupil position of the evaluated person from the eye region image comprises:
determining first face images whose pupil ratio falls within a preset ratio range among the face images corresponding to the target image frames, wherein the pupil ratio is the ratio of the eye pupil area to the eye area;
when the number of first face images reaches a preset number, computing the mean of the pupil ratios of the first face images;
finding, among the pupil ratios of the first face images, the target pupil ratio closest to the mean;
selecting the first face image corresponding to the target pupil ratio as the target face image;
taking the center of the eye pupil area in the target face image as the pupil position.
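
An illustrative reading of the frame-selection logic in claim 6; the ratio range and the minimum frame count below are hypothetical values, as the claim only calls them "preset".

```python
import numpy as np

def pick_target_face(face_images, pupil_ratios,
                     ratio_range=(0.05, 0.35), min_count=10):
    """Select the target face image per claim 6 (bounds are assumptions).

    Keeps only frames whose pupil ratio is plausible, then returns the
    face image whose ratio is closest to the mean of the kept frames.
    """
    lo, hi = ratio_range
    kept = [(img, r) for img, r in zip(face_images, pupil_ratios)
            if lo <= r <= hi]                       # the "first face images"
    if len(kept) < min_count:
        return None  # preset number of first face images not yet reached
    ratios = np.array([r for _, r in kept])
    mean = ratios.mean()
    idx = int(np.argmin(np.abs(ratios - mean)))     # target pupil ratio
    return kept[idx][0]  # target face image; pupil centre is read from it
```
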
7. The method of claim 6, wherein determining the first face images whose pupil ratio falls within the preset ratio range among the face images corresponding to the target image frames comprises:
computing a circumscribed rectangular area from the eye contour coordinates in the face image corresponding to a target image frame;
expanding the circumscribed rectangular area outward by a specified number of pixels;
applying bilateral filtering followed by an erosion operation to the expanded rectangular area to obtain an eroded image;
binarizing the eroded image to obtain a binarized image;
shrinking the binarized image inward by the specified number of pixels to obtain a shrunk image;
calculating the proportion of non-zero pixel values in the shrunk image to obtain the pupil ratio of the face image;
taking a face image whose pupil ratio falls within the preset ratio range as a first face image.
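
A sketch of the pupil-ratio computation in claim 7 using OpenCV. The bilateral-filter parameters, the binarization threshold, and the five-pixel margin are assumptions; the claim fixes only the order of operations (expand, bilateral filter, erode, binarize, shrink, count non-zero pixels). The inverted threshold reflects the assumption that the dark pupil should become non-zero in the binarized image.

```python
import cv2
import numpy as np

def pupil_ratio(gray_face, eye_contour, margin=5):
    """Compute the pupil ratio per claim 7 (parameter values assumed).

    `gray_face`: grayscale face image; `eye_contour`: (N x 2) eye contour
    coordinates; `margin`: the "specified number of pixels" of the claim.
    """
    pts = np.asarray(eye_contour, dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)              # circumscribed rectangle
    # Expand the rectangle outward by the specified number of pixels.
    x0, y0 = max(x - margin, 0), max(y - margin, 0)
    roi = gray_face[y0:y + h + margin, x0:x + w + margin]
    # Bilateral filtering, then erosion.
    smoothed = cv2.bilateralFilter(roi, d=9, sigmaColor=75, sigmaSpace=75)
    eroded = cv2.erode(smoothed, np.ones((3, 3), np.uint8), iterations=1)
    # Binarize so the dark pupil becomes non-zero (inverted threshold).
    _, binary = cv2.threshold(eroded, 50, 255, cv2.THRESH_BINARY_INV)
    # Shrink inward by the same margin to discard border artefacts.
    inner = binary[margin:-margin or None, margin:-margin or None]
    if inner.size == 0:
        return 0.0
    return float(cv2.countNonZero(inner)) / inner.size
```
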
CN202110310469.5A 2021-03-23 2021-03-23 Attention detection method based on face orientation, facial expression and pupil tracking Active CN113052064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310469.5A CN113052064B (en) 2021-03-23 2021-03-23 Attention detection method based on face orientation, facial expression and pupil tracking

Publications (2)

Publication Number Publication Date
CN113052064A (en) 2021-06-29
CN113052064B CN113052064B (en) 2024-04-02

Family

ID=76514596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310469.5A Active CN113052064B (en) 2021-03-23 2021-03-23 Attention detection method based on face orientation, facial expression and pupil tracking

Country Status (1)

Country Link
CN (1) CN113052064B (en)


Patent Citations (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000259833A (en) * 1999-03-08 2000-09-22 Toshiba Corp Face image processor and processing method therefor
JP2002282210A (en) * 2001-03-27 2002-10-02 Japan Science & Technology Corp Method and apparatus for detecting visual axis
JP2007268164A (en) * 2006-03-31 2007-10-18 National Univ Corp Shizuoka Univ Visual axis movement detecting method and apparatus
US8401248B1 (en) * 2008-12-30 2013-03-19 Videomining Corporation Method and system for measuring emotional and attentional response to dynamic digital media content
JP2010282339A (en) * 2009-06-03 2010-12-16 Seiko Epson Corp Image processor for correcting position of pupil in eye, image processing method, image processing program and printer
CN101593352A (en) * 2009-06-12 2009-12-02 浙江大学 Driving safety monitoring system based on face orientation and visual focus
KR20110102073A (en) * 2010-03-10 2011-09-16 홍익대학교 산학협력단 Method for detecting liveness for face recognition system
US20120169895A1 (en) * 2010-03-24 2012-07-05 Industrial Technology Research Institute Method and apparatus for capturing facial expressions
US20120147328A1 (en) * 2010-12-13 2012-06-14 Microsoft Corporation 3d gaze tracker
JP2012186821A (en) * 2012-04-18 2012-09-27 Toshiba Corp Face image processing device, face image processing method, electronic still camera, digital image processing device and digital image processing method
US9775512B1 (en) * 2014-03-19 2017-10-03 Christopher W. Tyler Binocular eye tracking from video frame sequences
WO2015154882A1 (en) * 2014-04-11 2015-10-15 The Eye Tribe Aps Systems and methods of eye tracking calibration
CN104013414A (en) * 2014-04-30 2014-09-03 南京车锐信息科技有限公司 Driver fatigue detecting system based on smart mobile phone
KR20160036375A (en) * 2014-09-25 2016-04-04 백석대학교산학협력단 Fast Eye Detection Method Using Block Contrast and Symmetry in Mobile Device
WO2016143759A1 (en) * 2015-03-06 2016-09-15 株式会社 脳機能研究所 Emotion estimating device and emotion estimating method
WO2017107957A1 (en) * 2015-12-22 2017-06-29 中兴通讯股份有限公司 Human face image retrieval method and apparatus
EP3225154A2 (en) * 2016-03-31 2017-10-04 Fujitsu Limited Gaze detection apparatus and gaze detection method
CN106228293A (en) * 2016-07-18 2016-12-14 重庆中科云丛科技有限公司 teaching evaluation method and system
WO2018033143A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Video image processing method, apparatus and electronic device
WO2018142388A1 (en) * 2017-02-05 2018-08-09 Bioeye Ltd. A method for pupil detection for cognitive monitoring, analysis, and biofeedback-based treatment and training
US20180336399A1 (en) * 2017-05-16 2018-11-22 Apple Inc. Attention Detection
CN107909057A (en) * 2017-11-30 2018-04-13 广东欧珀移动通信有限公司 Image processing method, device, electronic equipment and computer-readable recording medium
WO2019184125A1 (en) * 2018-03-30 2019-10-03 平安科技(深圳)有限公司 Micro-expression-based risk identification method and device, equipment and medium
CN108921061A (en) * 2018-06-20 2018-11-30 腾讯科技(深圳)有限公司 A kind of expression recognition method, device and equipment
WO2020034902A1 (en) * 2018-08-11 2020-02-20 昆山美卓智能科技有限公司 Smart desk having status monitoring function, monitoring system server, and monitoring method
CN109344693A (en) * 2018-08-13 2019-02-15 华南理工大学 A kind of face multizone fusion expression recognition method based on deep learning
WO2020042345A1 (en) * 2018-08-28 2020-03-05 初速度(苏州)科技有限公司 Method and system for acquiring line-of-sight direction of human eyes by means of single camera
CN109446880A (en) * 2018-09-05 2019-03-08 广州维纳斯家居股份有限公司 Intelligent subscriber participation evaluation method, device, intelligent elevated table and storage medium
CN109271914A (en) * 2018-09-07 2019-01-25 百度在线网络技术(北京)有限公司 Detect method, apparatus, storage medium and the terminal device of sight drop point
CN109446892A (en) * 2018-09-14 2019-03-08 杭州宇泛智能科技有限公司 Human eye notice positioning method and system based on deep neural network
WO2020062960A1 (en) * 2018-09-29 2020-04-02 北京市商汤科技开发有限公司 Neural network training method and apparatus, gaze tracking method and apparatus, and electronic device
CN110969061A (en) * 2018-09-29 2020-04-07 北京市商汤科技开发有限公司 Neural network training method, neural network training device, visual line detection method, visual line detection device and electronic equipment
CN109902553A (en) * 2019-01-03 2019-06-18 杭州电子科技大学 A kind of face alignment method of the multi-angle based on facial pixel difference
CN109902630A (en) * 2019-03-01 2019-06-18 上海像我信息科技有限公司 A kind of attention judgment method, device, system, equipment and storage medium
WO2020186883A1 (en) * 2019-03-18 2020-09-24 北京市商汤科技开发有限公司 Methods, devices and apparatuses for gaze area detection and neural network training
US20200349337A1 (en) * 2019-05-01 2020-11-05 Accenture Global Solutions Limited Emotion sensing artificial intelligence
CN110147744A (en) * 2019-05-09 2019-08-20 腾讯科技(深圳)有限公司 A kind of quality of human face image appraisal procedure, device and terminal
WO2020231401A1 (en) * 2019-05-13 2020-11-19 Huawei Technologies Co., Ltd. A neural network for head pose and gaze estimation using photorealistic synthetic data
CN110363133A (en) * 2019-07-10 2019-10-22 广州市百果园信息技术有限公司 A kind of method, apparatus, equipment and the storage medium of line-of-sight detection and video processing
WO2021047185A1 (en) * 2019-09-12 2021-03-18 深圳壹账通智能科技有限公司 Monitoring method and apparatus based on facial recognition, and storage medium and computer device
CN110705500A (en) * 2019-10-12 2020-01-17 深圳创新奇智科技有限公司 Attention detection method and system for personnel working image based on deep learning
KR102198360B1 (en) * 2019-10-24 2021-01-04 광운대학교 산학협력단 Eye tracking system and method based on face images
CN111046744A (en) * 2019-11-21 2020-04-21 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
CN111507592A (en) * 2020-04-08 2020-08-07 山东大学 Evaluation method for active modification behaviors of prisoners
CN111563417A (en) * 2020-04-13 2020-08-21 华南理工大学 Pyramid structure convolutional neural network-based facial expression recognition method
CN111680546A (en) * 2020-04-26 2020-09-18 北京三快在线科技有限公司 Attention detection method, attention detection device, electronic equipment and storage medium
CN111933275A (en) * 2020-07-17 2020-11-13 兰州大学 Depression evaluation system based on eye movement and facial expression
CN111862287A (en) * 2020-07-20 2020-10-30 广州市百果园信息技术有限公司 Eye texture image generation method, texture mapping method, device and electronic equipment
CN111914783A (en) * 2020-08-10 2020-11-10 深圳市视美泰技术股份有限公司 Method and device for determining human face deflection angle, computer equipment and medium
CN112149641A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Method, device, equipment and storage medium for monitoring driving state
CN112329699A (en) * 2020-11-19 2021-02-05 北京中科虹星科技有限公司 Method for positioning human eye fixation point with pixel-level precision
CN112446322A (en) * 2020-11-24 2021-03-05 杭州网易云音乐科技有限公司 Eyeball feature detection method, device, equipment and computer-readable storage medium
CN112381061A (en) * 2020-12-04 2021-02-19 中国科学院大学 Facial expression recognition method and system
CN112464865A (en) * 2020-12-08 2021-03-09 北京理工大学 Facial expression recognition method based on pixel and geometric mixed features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG, LIYING: "Attention decrease detection based on video analysis in e-learning", Transactions on Edutainment XIV, 31 December 2018 (2018-12-31), pages 166-179 *
WU, HUITING: "Research on Student Online Learning Engagement Based on Multi-dimensional Information Fusion", China Doctoral Dissertations Full-text Database, Social Sciences II, no. 1, 15 January 2021 (2021-01-15), pages 127-22 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283978A (en) * 2021-05-06 2021-08-20 北京思图场景数据科技服务有限公司 Financial risk assessment method based on biological basis, behavior characteristics and business characteristics
CN113283978B (en) * 2021-05-06 2024-05-10 北京思图场景数据科技服务有限公司 Financial risk assessment method based on biological basis, behavioral characteristics and business characteristics
CN113838086A (en) * 2021-08-23 2021-12-24 广东电网有限责任公司 Attention assessment test method, attention assessment test device, electronic equipment and storage medium
CN113838086B (en) * 2021-08-23 2024-03-22 广东电网有限责任公司 Attention assessment test method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113052064B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN113052064B (en) Attention detection method based on face orientation, facial expression and pupil tracking
CN111597884A (en) Facial action unit identification method and device, electronic equipment and storage medium
US11875599B2 (en) Method and device for detecting blurriness of human face in image and computer-readable storage medium
CN110781770B (en) Living body detection method, device and equipment based on face recognition
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN112348778B (en) Object identification method, device, terminal equipment and storage medium
CN115631112B (en) Building contour correction method and device based on deep learning
CN112418135A (en) Human behavior recognition method and device, computer equipment and readable storage medium
CN112991349B (en) Image processing method, device, equipment and storage medium
CN113536003A (en) Feature extraction model training method, image retrieval method, device and equipment
CN110232381B (en) License plate segmentation method, license plate segmentation device, computer equipment and computer readable storage medium
CN111339884A (en) Image recognition method and related equipment and device
CN113129298B (en) Method for identifying definition of text image
EP2998928B1 (en) Apparatus and method for extracting high watermark image from continuously photographed images
JP7121132B2 (en) Image processing method, apparatus and electronic equipment
WO2020244076A1 (en) Face recognition method and apparatus, and electronic device and storage medium
WO2024001617A1 (en) Method and apparatus for identifying behavior of playing with mobile phone
CN114255493A (en) Image detection method, face detection device, face detection equipment and storage medium
CN115841672A (en) Character detection and identification method, device and equipment
US20230368576A1 (en) Image processing apparatus, image processing method, and non-transitory storage medium
CN110348377B (en) Fingerprint identification method and equipment
CN114219938A (en) Region-of-interest acquisition method
JP2022546880A (en) Object association method and device, system, electronic device, storage medium and computer program
CN113139629A (en) Font identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant