CN115909400A - Identification method for using mobile phone behaviors in low-resolution monitoring scene - Google Patents


Info

Publication number
CN115909400A
CN115909400A (application CN202211431361.2A)
Authority
CN
China
Prior art keywords
mobile phone, low, time sequence, acquiring, frame
Prior art date
Legal status (assumed, not a legal conclusion)
Pending
Application number
CN202211431361.2A
Other languages
Chinese (zh)
Inventor
张兆元
丁东
王旭初
郭金马
姜志祥
刘焕洲
裴彦杰
王炜
闫晓蔚
潘婷婷
李征征
宋彦丽
王东东
Current Assignee
Beijing Shenqi Technology Co ltd
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Shenqi Technology Co ltd
Beijing Institute of Computer Technology and Applications
Priority date
Filing date
Publication date
Application filed by Beijing Shenqi Technology Co ltd, Beijing Institute of Computer Technology and Applications filed Critical Beijing Shenqi Technology Co ltd
Priority to CN202211431361.2A priority Critical patent/CN115909400A/en
Publication of CN115909400A publication Critical patent/CN115909400A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 — Reducing energy consumption in communication networks
    • Y02D 30/70 — Reducing energy consumption in wireless communication networks

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for identifying mobile-phone-use behaviors in a low-resolution monitoring scene, comprising the following steps: acquiring a low-resolution video image, parsing it frame by frame, and identifying the wrist-joint and head positions in the parsed image with an AlphaPose posture recognition model; obtaining, with the YOLOv7 target detection model, a first probability that a mobile phone is being held from the phone ROI regions and their confidences; and, on the basis of the first probability and the wrist-joint and head positions, identifying with an LSTM-based time-sequence judgment model whether the current frame contains an abnormal phone-use condition, where such a condition denotes any of several prohibited phone-use behaviors in the current frame. The invention realizes identification of abnormal phone-use behavior in low-resolution monitoring scenes and improves the applicability of intelligent recognition technology at low resolution.

Description

Identification method for using mobile phone behaviors in low-resolution monitoring scene
Technical Field
The invention relates to the technical field of intelligent recognition, and in particular to a method for identifying mobile-phone-use behaviors in a low-resolution monitoring scene.
Background
Conventional video analysis imposes requirements on camera distance and image definition; once those limits are exceeded, matching and recognition can no longer be performed. Moreover, a mobile phone is small, is partly occluded by the hand that holds it, and presents an even smaller profile when its side faces the surveillance camera, so it is especially difficult to recognize in low-resolution images.
Existing approaches to recognizing phone-use behavior in low-resolution images mainly fall into the following categories:
(1) Deep-learning-based image super-resolution: neural networks such as SRGAN and ClassSR enlarge the image resolution, increasing the pixel area of the detection target and thus the probability of correct detection. The drawbacks of this approach are heavy computation, high hardware requirements and long runtime.
(2) Traditional image super-resolution: interpolation, sparse-representation-based and example-based methods are used to super-resolve the image, likewise increasing the pixel area of the detection target and the probability of correct detection. The drawback is that the super-resolved result is poor, inferior to deep-learning-based super-resolution.
(3) SVM-based classification of image feature values: features are extracted from each image patch of the relevant body part — color features, LBP features, Gabor filter features and Schmid filter features — and an SVM classifies each patch. This method requires the classification target to be located in advance, which is difficult to do in a low-resolution image.
Both the deep-learning and traditional super-resolution approaches share a further drawback: the super-resolved image has high resolution, so phone-use detection performed on it takes several times longer, making the real-time requirement of monitoring hard to meet.
Considering that traditional super-resolution performs poorly, that deep-learning super-resolution is computationally heavy, that either way the enlarged picture multiplies detection time and strains real-time video monitoring, and that SVM-based feature-value classification can judge whether a phone appears in a patch but cannot reliably detect and extract candidate patches from a low-resolution image, a method for identifying phone-use behavior in low-resolution monitoring scenes is urgently needed to solve these problems in the prior art.
Disclosure of Invention
To improve the accuracy of recognizing phone-use behavior in low-resolution monitoring scenes, where the phone may be occluded, the invention aims to provide the present identification method, which greatly improves accurate matching and recognition of low-resolution video targets through a cascaded time-sequence scheme, helping to raise both video utilization and recognition accuracy.
In order to achieve the technical purpose, the invention provides an identification method for using mobile phone behaviors in a low-resolution monitoring scene, which comprises the following steps:
acquiring a low-resolution video image, analyzing the low-resolution video image frame by frame, and identifying the positions of a wrist joint and a head in the analyzed low-resolution video image through an AlphaPose posture identification model;
based on a target detection model YOLOv7, acquiring a first probability of holding the mobile phone by acquiring a mobile phone ROI area and confidence thereof;
and, on the basis of the first probability and according to the wrist-joint and head positions, identifying with an LSTM-based time-sequence judgment model whether the current frame contains an abnormal phone-use condition, where an abnormal condition denotes any of several prohibited phone-use behaviors in the current frame.
Preferably, in the process of analyzing the low-resolution video image frame by frame, the analyzed color coding format is converted into an RGB format, and the picture size is scaled to 640 × 480, wherein, during the picture scaling, the aspect ratio is maintained until the scaled length or width is exactly equal to the target size, and when the picture pixel is smaller than the target image pixel, the three RGB channels of the blank part of the target size are respectively filled with 0.
Preferably, in the process of obtaining the positions of the wrist joint and the head, obtaining an ROI region of the person of the analyzed low-resolution video image and key point data based on an alphapos gesture recognition model trained by an MSCOCO data set, wherein the alphapos gesture recognition model trained by the MSCOCO data set is used for detecting the key point data of the person;
and (3) adopting an AlignedReiD pedestrian re-identification method, extracting features of the detected human body image by using a deep convolutional neural network, and identifying the positions of the wrist joint and the head by taking the Euclidean distance of the features as a measurement basis of the similarity of the two pictures.
Preferably, in the process of obtaining the key point data, the key point data is composed of a nose, a left eye, a right eye, a left shoulder, a right shoulder, a left elbow joint, a right elbow joint, a left wrist, a right wrist, a left hip, a right hip, a left knee, a right knee, a left ankle and a right ankle.
Preferably, in acquiring the first probability, a first time representing a head-lowering action is acquired based on the head key-point coordinates of the head position;
a second time representing a hand-raising action is acquired based on the hand key-point coordinates of the wrist joints;
based on the continuity of the image frames corresponding to the first and/or second time, an image sequence corresponding to a second probability of phone-use behavior is obtained, and the phone ROI regions and confidences of that sequence are obtained through the YOLOv7 target detection model to yield the first probability.
Preferably, in the process of obtaining the mobile phone ROI area and the confidence coefficient thereof, the central point of each mobile phone ROI area is obtained;
and acquiring the Euclidean distance from the left wrist or the right wrist closest to the central point, if the Euclidean distance is more than 2 times of the length of the ROI rectangular frame of the mobile phone, determining that the mobile phone is not held, and otherwise, generating confidence coefficient for representing the behavior of holding the mobile phone.
Preferably, in identifying whether the current frame contains abnormal phone use, a time sequence carrying the confidence of the time-sequence feature is generated from the time-sequence feature of the image sequence corresponding to the confidence and serves as input to the time-sequence judgment model;
the abnormal phone-use condition is obtained from the time-sequence features corresponding to the model's output images, where abnormal use in the output images is judged by obtaining the time interval between adjacent output images and comparing it with a set threshold.
Preferably, in the process of generating the time sequence feature, a time sequence is constructed according to the time sequence feature and a summation of confidence degrees of the mobile phone appearing in the current frame, where the summation of confidence degrees is represented as:
$$S_i = \sum_{k=1}^{n} C_k$$
where $S_i$ is the sum of the confidences of all mobile phones appearing in the $i$-th frame, $n$ is the total number of phones in the current frame, and $C_k$ is the confidence of the $k$-th phone.
Preferably, in the process of inputting the time sequence as the time sequence judgment model, a hand and head key point coordinate matrix is constructed according to the head key point coordinate and the hand key point coordinate of the image sequence;
and constructing an input matrix used as the input of the time sequence judgment model by transposing the hand and head key point coordinate matrixes based on the time sequence.
Preferably, in the process of constructing the input matrix, the input matrix is represented as:
$$\mathrm{Input} = \begin{bmatrix} P_i^T & P_{i+1}^T & \cdots & P_{i+N-1}^T \\ S_i & S_{i+1} & \cdots & S_{i+N-1} \end{bmatrix}$$
where $P_i^T$ is the transpose of the $i$-th frame hand-and-head key-point coordinate matrix $P_i$, $S_i$ is the sum of all phone confidences appearing in the $i$-th frame, and $\mathrm{Input}$ denotes the input matrix.
The invention discloses the following technical effects:
the invention realizes the identification of abnormal mobile phone behaviors in the low-resolution monitoring scene, and improves the applicability of the intelligent identification technology in the low-resolution scene.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
As shown in fig. 1, the present invention provides a method for identifying a behavior of a mobile phone used in a low-resolution monitoring scenario, which includes the following steps:
acquiring a low-resolution video image, analyzing the low-resolution video image frame by frame, and identifying the positions of a wrist joint and a head in the analyzed low-resolution video image through an AlphaPose posture identification model;
based on a target detection model YOLOv7, acquiring a first probability of the mobile phone being held by acquiring a mobile phone ROI area and a confidence coefficient thereof;
and identifying whether the current frame has abnormal mobile phone use conditions or not through an LSTM-based time sequence judgment model according to the positions of the wrist joint and the head on the basis of the first probability, wherein the abnormal mobile phone use conditions are used for representing a plurality of behaviors that the current frame forbids to use the mobile phone.
The term low resolution in the present invention indicates that the displayed picture or image appears distorted or unclear to the human eye; resolutions below a certain pixel count can be classed as low resolution. A resolution at which the human eye cannot recognize, or cannot accurately recognize, an image may likewise be regarded as low resolution; the term is intended to cover cases where a given feature in the current picture or image cannot be recognized, or cannot be accurately recognized, by the human eye.
Further preferably, in the process of analyzing the low-resolution video image frame by frame, the color coding format analyzed by the invention is converted into an RGB format, and the picture size is scaled to 640 × 480, wherein during the picture scaling, the aspect ratio is maintained until the scaled length or width is exactly equal to the target size, and when the picture pixel is smaller than the target image pixel, three RGB channels of the blank part of the target size are respectively filled with 0.
Further preferably, in the process of acquiring the positions of the wrist joints and the head, based on an alphaPose gesture recognition model trained by an MSCOCO data set, the method acquires an ROI area of a person of an analyzed low-resolution video image and key point data, wherein the alphaPose gesture recognition model trained by the MSCOCO data set is used for detecting the key point data of the person;
and (3) adopting an AlignedReID pedestrian re-identification method, extracting features of the detected human body image by using a deep convolution neural network, and identifying the positions of the wrist joint and the head by taking the Euclidean distance of the features as a measurement basis of the similarity of the two pictures.
Further preferably, in the process of obtaining the key point data, the key point data mentioned in the present invention is composed of a nose, a left eye, a right eye, a left shoulder, a right shoulder, a left elbow joint, a right elbow joint, a left wrist, a right wrist, a left hip, a right hip, a left knee, a right knee, a left ankle, and a right ankle.
Further preferably, the present invention acquires a first time for representing a lowering motion based on the head key point coordinates of the head position in acquiring the first probability;
based on the hand key point coordinates of the wrist joint, the invention acquires a second time for expressing the hand-lifting action;
based on the continuity of each frame of image corresponding to the first time and/or the second time, an image sequence corresponding to a second probability of using the mobile phone behavior is obtained, and a mobile phone ROI area and a confidence coefficient of the image sequence are obtained through a target detection model YOLOv7 to obtain a first probability.
Further preferably, in the process of acquiring the mobile phone ROI area and the confidence thereof, the center point of each mobile phone ROI area is acquired;
according to the method, the Euclidean distance from the left wrist or the right wrist closest to the central point is obtained, if the Euclidean distance is more than 2 times of the length of the ROI rectangular frame of the mobile phone, the mobile phone is considered to be not held, and otherwise, confidence coefficient for representing the behavior of holding the mobile phone is generated.
Further preferably, in the process of identifying whether the mobile phone is used abnormally in the current frame, the time sequence with the confidence of the time sequence characteristic is generated according to the time sequence characteristic of the image sequence corresponding to the confidence as the input of the time sequence judgment model;
and acquiring the abnormal use condition of the mobile phone according to the time sequence characteristics corresponding to the output images of the time sequence judging module, wherein the abnormal use condition of the mobile phone in the output images is judged by acquiring the time interval between the adjacent output images and comparing the time interval with a set threshold value.
Further preferably, in the process of generating the time sequence feature, the time sequence is constructed according to the time sequence feature and the summation of the confidence degrees of the mobile phone appearing in the current frame, wherein the summation of the confidence degrees is represented as:
$$S_i = \sum_{k=1}^{n} C_k$$
where $S_i$ is the sum of the confidences of all mobile phones appearing in the $i$-th frame, $n$ is the total number of phones in the current frame, and $C_k$ is the confidence of the $k$-th phone.
Further preferably, in the process of inputting the time sequence as the time sequence judgment model, the hand and head key point coordinate matrix is constructed according to the head key point coordinate and the hand key point coordinate of the image sequence;
based on the time sequence, an input matrix used as the input of the time sequence judgment model is constructed through the transposition of the hand and head key point coordinate matrix.
Further preferably, in the process of constructing the input matrix, the input matrix mentioned in the present invention is represented as:
$$\mathrm{Input} = \begin{bmatrix} P_i^T & P_{i+1}^T & \cdots & P_{i+N-1}^T \\ S_i & S_{i+1} & \cdots & S_{i+N-1} \end{bmatrix}$$
where $P_i^T$ is the transpose of the $i$-th frame hand-and-head key-point coordinate matrix $P_i$, $S_i$ is the sum of all phone confidences appearing in the $i$-th frame, and $\mathrm{Input}$ denotes the input matrix.
The invention also discloses an identification system for realizing the identification method by using the mobile phone behavior in the low-resolution monitoring scene, which comprises the following steps:
the data acquisition and processing module is used for acquiring the low-resolution video image to analyze the low-resolution video image frame by frame, and identifying the positions of the wrist joint and the head in the analyzed low-resolution video image through an AlphaPose posture identification model;
the first identification module is used for acquiring a first probability of holding the mobile phone by acquiring a mobile phone ROI (region of interest) region and a confidence coefficient thereof based on a target detection model YOLOv 7;
and the second identification module is used for identifying whether the current frame has abnormal mobile phone use conditions or not through the time sequence judgment model based on the LSTM according to the positions of the wrist joint and the head on the basis of the first probability, wherein the abnormal mobile phone use conditions are used for representing a plurality of behaviors that the current frame forbids to use the mobile phone.
The invention also discloses a computer program and a removable storage device. The computer program implements the method for identifying phone-use behavior in a low-resolution monitoring scene, forming executable software that is embedded in intelligent equipment and assists in monitoring abnormal phone use in such scenes; the removable storage device carries the identification system for phone-use behavior in a low-resolution monitoring scene, exchanges data with the existing equipment in that scene, and judges whether abnormal phone use exists in the scene.
The invention provides a method for identifying mobile phone behaviors used in a low-resolution monitoring scene, which specifically comprises the following technical processes:
(1) Acquiring a video, analyzing and preprocessing:
(11) Acquire and decode, via OpenCV, the H.264-encoded real-time video streamed from the camera over RTSP.
(12) Convert the decoded color format from BGR to RGB and scale the picture to 640 × 480. When scaling, preserve the aspect ratio until the scaled length or width exactly equals the target size; where the scaled picture is smaller than the target, fill the blank portion's three RGB channels with 0.
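As a sketch of the scaling in step (12), the aspect-preserving geometry can be computed as follows (the function name and return convention are illustrative, not from the patent; only the 640 × 480 target and zero padding are specified there):

```python
def letterbox_geometry(src_w, src_h, dst_w=640, dst_h=480):
    """Compute the aspect-preserving scaled size and the zero-padding
    needed to fit a src_w x src_h frame into a dst_w x dst_h canvas."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Blank area, to be filled with 0 in each of the three RGB channels
    pad_w, pad_h = dst_w - new_w, dst_h - new_h
    return new_w, new_h, pad_w, pad_h
```

For example, a 1920 × 1080 surveillance frame scales to 640 × 360, leaving 120 rows of zero padding to reach 640 × 480.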
(2) Recognizing the wrist joint and head position of each frame in the video sequence by utilizing a gesture recognition model:
(21) Joint-point identification by the posture recognition model: an AlphaPose posture recognition model trained on the MSCOCO data set detects 17 human key points, in order: nose, left eye, right eye, left shoulder, right shoulder, left elbow joint, right elbow joint, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and right ankle. The processed RGB picture is input to the AlphaPose model to obtain each person's ROI region and 17 key-point data items.
(22) Determining the association of the front posture and the rear posture by adopting a pedestrian re-recognition method: and (3) extracting features of the detected human body image by using a deep convolution neural network by adopting an AlignedReID pedestrian re-identification method, and taking the Euclidean distance of the features as a measurement basis of the similarity of the two pictures.
(23) Partitioning hand and head key points: head key points are numbered 0 to 4, and hand key points are numbered 10 and 11. The coordinates of the head and hand key points in a single picture are collected into the coordinate matrix P = [kp_0_x, kp_0_y, kp_1_x, kp_1_y, kp_2_x, kp_2_y, kp_3_x, kp_3_y, kp_4_x, kp_4_y, kp_10_x, kp_10_y, kp_11_x, kp_11_y], where kp_i_x and kp_i_y are the horizontal and vertical coordinates of the i-th key point. If a key point is not detected in the picture, its coordinates are set to (-1, -1).
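Assembling the 14-element matrix P of step (23) might look like the following sketch (the dict-based input format is an assumption; the key-point indices and the (-1, -1) fill follow the patent):

```python
HEAD_IDS = (0, 1, 2, 3, 4)   # head key points, per the patent's numbering
HAND_IDS = (10, 11)          # hand (wrist) key points, per the patent's numbering

def build_p_vector(keypoints):
    """keypoints: mapping key-point index -> (x, y) for one person.
    Undetected key points are filled with (-1, -1)."""
    p = []
    for idx in HEAD_IDS + HAND_IDS:
        x, y = keypoints.get(idx, (-1, -1))
        p.extend([x, y])
    return p  # 7 key points x 2 coordinates = 14 values
```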
(24) Dividing time-series segments of actions:
and (4) considering the association before and after the action, and integrating the actions of the hand and the head of the same person in a time sequence to judge whether the action of connecting or disconnecting the mobile phone exists. And setting the time for raising and lowering the hands to be M seconds, and setting the frame number of the camera per second to be F, wherein the time sequence length N = M × F. Generally, M =2, F =25. For continuous video, the coordinates of the key points of the hand and the head of each frame of the picture of the same person are detected, if the current frame does not belong to any time series and the picture of the current frame detects the key points of the hand and the head, and is set as the starting frame Pi of a new time series, the time series T = [ P ] =ofthe action is obtained i T ,P i+1 T ,P i+2 T ,…P i+N T ]And person ID, wherein P i T Is the ith frame hand and head key point coordinate matrix P i The transposing of (1).
(3) Detecting the probability of the mobile phone in each frame of image by using a target detection model:
(31) Phone detection with the target detection model: using the official YOLOv7 model, set it to detect mobile phones only, i.e. set the class parameter to 67, the confidence threshold conf-thres to 0.2, and the intersection-over-union threshold iou-thres to 0.4. The model takes the preprocessed RGB pictures from step (12) and outputs the ROI region and confidence of every phone in the picture.
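Restricting raw detections to phones as in step (31) can be sketched like this (the tuple layout (x1, y1, x2, y2, conf, cls) is an assumption about the detector's output; class 67 and the 0.2 threshold are from the patent):

```python
PHONE_CLASS = 67      # "cell phone" index in the COCO label set used by YOLOv7
CONF_THRES = 0.2

def phone_detections(detections):
    """detections: iterable of (x1, y1, x2, y2, conf, cls) tuples.
    Keeps only phone boxes at or above the confidence threshold."""
    return [d for d in detections
            if d[5] == PHONE_CLASS and d[4] >= CONF_THRES]
```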
(32) Excluding phones that are not held: compute the Euclidean distance between the center point of each phone ROI region and the nearest left or right wrist detected by AlphaPose; if the distance exceeds 2 times the length of the phone ROI rectangle, the phone is considered not held; otherwise the confidence C of a held phone is obtained.
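Step (32)'s held/not-held test can be sketched as follows (interpreting the "length" of the ROI rectangle as its longer side is my assumption; the 2× threshold is from the patent):

```python
import math

def held_phone_confidence(phone_box, conf, wrists):
    """phone_box: (x1, y1, x2, y2); wrists: list of (x, y) wrist points.
    Returns conf if the phone is considered held, else 0.0."""
    x1, y1, x2, y2 = phone_box
    center = ((x1 + x2) / 2, (y1 + y2) / 2)
    box_len = max(x2 - x1, y2 - y1)   # assumption: "length" = longer side
    if not wrists:
        return 0.0
    nearest = min(math.dist(center, w) for w in wrists)
    return conf if nearest <= 2 * box_len else 0.0
```

Summing the values this returns over all detected phones in a frame gives the per-frame confidence sum used in step (33).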
(33) Calculating the confidence sum of all mobile phones appearing in the current frame:
$$S_i = \sum_{k=1}^{n} C_k$$
where $S_i$ is the sum of the confidences of all mobile phones appearing in the $i$-th frame, $n$ is the total number of phones in the current frame, and $C_k$ is the confidence of the $k$-th phone.
(4) Inputting the wrist joint, the head position and the probability of holding the mobile phone into an LSTM-based time sequence judgment model to judge whether the mobile phone is illegally used:
(41) The time series of actions and confidence for each person ID is used as input to the LSTM model:
$$\mathrm{Input} = \begin{bmatrix} P_i^T & P_{i+1}^T & \cdots & P_{i+N-1}^T \\ S_i & S_{i+1} & \cdots & S_{i+N-1} \end{bmatrix}$$
where $P_i^T$ is the transpose of the $i$-th frame hand-and-head key-point coordinate matrix $P_i$ and $S_i$ is the sum of all phone confidences appearing in the $i$-th frame. The matrix size of Input is (15, 50);
(42) Inference with the trained LSTM model: with step length time = 30, the model produces an Output of size (1, 21) that judges whether a phone-answering action exists: it outputs 1 if the action exists, and 0 otherwise.
(43) Deduplicating phone-use judgments to reduce repeated alarms: grouped by person ID, all Output values for an ID are recorded in video-frame order as a sequence L. If the p-th element of L is L_p = 1, the person is considered to be using a phone at the corresponding moment. If the ID's sequence L already contains a result of 1 within the preceding 3 seconds, the action is treated as a continuing call and no new alarm is raised; otherwise it is treated as a new violation and an alarm is raised. Each person ID is judged independently, so a person currently using a phone is not missed simply because another person is also using one.
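The 3-second deduplication of step (43) can be sketched per person ID (the window and the F = 25 frame rate are from the patent; the function itself is illustrative):

```python
def new_alarm_frames(output_seq, fps=25, window_s=3):
    """output_seq: per-frame 0/1 judgment outputs for one person ID.
    Returns the frame indices that raise a *new* alarm; a positive within
    window_s seconds of the previous positive counts as a continuing call."""
    alarms, last_positive = [], None
    for p, flag in enumerate(output_seq):
        if flag == 1:
            if last_positive is None or p - last_positive > window_s * fps:
                alarms.append(p)
            last_positive = p
    return alarms
```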
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method for identifying mobile phone use behaviors in a low-resolution monitoring scene, characterized by comprising the following steps:
acquiring a low-resolution video image and analyzing it frame by frame, and identifying the positions of the wrist joints and the head in the analyzed low-resolution video image through an AlphaPose posture recognition model;
based on a target detection model YOLOv7, acquiring a first probability of holding the mobile phone by acquiring a mobile phone ROI area and confidence thereof;
and identifying, based on the first probability and according to the wrist joint and head positions, whether an abnormal mobile phone use condition exists in the current frame through an LSTM-based time sequence judgment model, wherein the abnormal mobile phone use condition is used for representing a plurality of prohibited mobile phone use behaviors in the current frame.
2. The method for recognizing the behavior of the mobile phone in the low-resolution monitoring scene as claimed in claim 1, wherein:
in the process of analyzing the low-resolution video image frame by frame, the analyzed color coding format is converted into an RGB format, and the picture is scaled to a size of 640 × 480; during scaling, the aspect ratio is maintained so that the scaled length or width exactly equals the target size, and when the scaled picture is smaller than the target image, the three RGB channels of the blank part of the target size are each filled with 0.
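The aspect-ratio-preserving scaling with zero padding described in claim 2 can be sketched as follows. This is an illustrative NumPy-only implementation (nearest-neighbour resampling keeps it dependency-free); the function name and signature are hypothetical:

```python
import numpy as np

def letterbox(img, target_w=640, target_h=480):
    """Scale an RGB image (H, W, 3) to fit target_w x target_h while keeping
    the aspect ratio, then zero-fill the remaining area of the canvas."""
    h, w = img.shape[:2]
    scale = min(target_w / w, target_h / h)  # one side lands exactly on target
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    # Nearest-neighbour resampling via index maps (illustrative only).
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # All three RGB channels of the blank region are filled with 0.
    canvas = np.zeros((target_h, target_w, 3), dtype=img.dtype)
    canvas[:new_h, :new_w] = resized
    return canvas
```

In practice a production pipeline would use a library resize (e.g. bilinear), but the padding behaviour matches the claim: exactly one scaled dimension equals the target, and the remainder is zero-filled.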
3. The method for recognizing the behavior of the mobile phone in the low-resolution monitoring scene as claimed in claim 2, wherein:
in the process of obtaining the positions of the wrist joints and the head, obtaining the person ROI (region of interest) region and key point data of the analyzed low-resolution video image based on the AlphaPose posture recognition model trained on the MSCOCO data set, wherein the AlphaPose posture recognition model trained on the MSCOCO data set is used for detecting key point data of persons;
and adopting an AlignedReID pedestrian re-identification method, extracting features of the detected human body image by using a deep convolutional neural network, and identifying the positions of the wrist joints and the head by taking the Euclidean distance between the features as the measurement basis of the similarity of two pictures.
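The Euclidean-distance similarity measure used in the AlignedReID step can be illustrated as follows. This is a minimal sketch only: the deep convolutional feature extractor itself is omitted, and L2-normalising the feature vectors before comparison is an assumption, not something stated in the claim:

```python
import numpy as np

def feature_distance(f1, f2):
    """Euclidean distance between two appearance feature vectors after
    L2 normalisation; a smaller distance means the two person crops are
    more likely the same identity."""
    f1 = f1 / np.linalg.norm(f1)
    f2 = f2 / np.linalg.norm(f2)
    return float(np.linalg.norm(f1 - f2))
```

A tracker would compare a new detection's feature against stored per-ID features and assign the ID with the smallest distance (below some threshold).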
4. The method for recognizing the behavior of the mobile phone in the low-resolution monitoring scene as claimed in claim 3, wherein:
in the process of obtaining the key point data, the key point data is composed of a nose, a left eye, a right eye, a left shoulder, a right shoulder, a left elbow joint, a right elbow joint, a left wrist, a right wrist, a left hip, a right hip, a left knee, a right knee, a left ankle and a right ankle.
5. The method of claim 4, wherein the method comprises the following steps:
in the process of acquiring the first probability, acquiring a first time representing a head-lowering action based on the head key point coordinates of the head position;
acquiring a second time representing a hand-raising action based on the hand key point coordinates of the wrist joints;
and based on the continuity of the image frames corresponding to the first time and/or the second time, acquiring an image sequence corresponding to a second probability of mobile phone use behavior, and acquiring the mobile phone ROI area of the image sequence and its confidence through the target detection model YOLOv7, so as to acquire the first probability.
6. The method for recognizing the behavior of the mobile phone in the low-resolution monitoring scene as claimed in claim 5, wherein:
in the process of obtaining the mobile phone ROI area and the confidence coefficient thereof, obtaining the central point of each mobile phone ROI area;
and acquiring the Euclidean distance from the central point to the nearest left or right wrist; if the Euclidean distance is greater than 2 times the length of the mobile phone ROI rectangular frame, it is determined that the mobile phone is not being held; otherwise, the confidence representing the behavior of holding the mobile phone is generated.
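The wrist-to-phone distance test of claim 6 can be sketched as follows. This is an illustrative implementation; interpreting "the length of the ROI rectangular frame" as the longer side of the detection box is an assumption, and the names are hypothetical:

```python
import math

def holds_phone(phone_box, left_wrist, right_wrist):
    """phone_box = (x1, y1, x2, y2) of a detected phone ROI; wrists are
    (x, y) key points. The phone is judged as not held when the nearer
    wrist is farther than 2x the box length from the box centre."""
    cx = (phone_box[0] + phone_box[2]) / 2
    cy = (phone_box[1] + phone_box[3]) / 2
    # Assumption: "length" of the ROI frame = the longer side of the box.
    length = max(phone_box[2] - phone_box[0], phone_box[3] - phone_box[1])
    d = min(math.dist((cx, cy), left_wrist), math.dist((cx, cy), right_wrist))
    return d <= 2 * length
```

Only boxes that pass this test contribute a holding-behavior confidence; distant phones (e.g. lying on a desk across the room) are filtered out.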
7. The method of claim 6, wherein the method comprises the following steps:
in the process of identifying whether the mobile phone is used abnormally in the current frame, generating a time sequence with the confidence coefficient of the time sequence characteristic according to the time sequence characteristic of the image sequence corresponding to the confidence coefficient, and using the time sequence as the input of the time sequence judgment model;
and acquiring the abnormal mobile phone use condition according to the time sequence features corresponding to the images output by the time sequence judgment model, wherein the time interval between adjacent output images is acquired and compared with a set threshold value to judge whether the abnormal mobile phone use condition exists in the output images.
8. The method of claim 7, wherein the method comprises the following steps:
in the process of generating the time sequence feature, the time sequence is constructed from the sum of the confidence degrees of the mobile phones appearing in each frame, arranged according to the time sequence feature, wherein the sum of the confidence degrees is expressed as:
S_i = Σ_{k=1}^{n} C_k
wherein S_i is the sum of the confidence degrees of all mobile phones appearing in the i-th frame, n is the total number of mobile phones in the current frame, and C_k is the confidence degree of the k-th mobile phone.
9. The method of claim 8, wherein the method comprises the following steps:
in the process of taking the time sequence as the input of the time sequence judgment model, constructing a hand and head key point coordinate matrix according to the head key point coordinates and the hand key point coordinates of the image sequence;
and constructing an input matrix used as the input of the time sequence judgment model by transposing the hand and head key point coordinate matrix based on the time sequence.
10. The method of claim 9, wherein the method comprises the following steps:
in the process of constructing an input matrix, the input matrix is represented as:
Input = [P_1^T S_1; P_2^T S_2; … ; P_i^T S_i; …]
wherein P_i^T is the transpose of the i-th frame hand and head key point coordinate matrix P_i, S_i is the sum of the confidence degrees of all mobile phones appearing in the i-th frame, and Input represents the input matrix.
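The construction of the input matrix from per-frame rows [P_i^T, S_i] can be sketched as follows. This is an illustrative implementation; flattening each transposed key point matrix into a row vector is an assumption about the exact layout, which the claim does not fully specify:

```python
import numpy as np

def build_input(keypoint_mats, conf_sums):
    """Stack per-frame rows [P_i^T, S_i] in time order to form the input
    matrix for the LSTM-based time sequence judgment model.
    keypoint_mats: list of 2 x K arrays (x/y rows for K hand-and-head
    key points per frame); conf_sums: list of per-frame values S_i."""
    rows = []
    for P, S in zip(keypoint_mats, conf_sums):
        # Transpose P_i, flatten to a row, and append the frame's S_i.
        rows.append(np.concatenate([P.T.ravel(), [S]]))
    return np.stack(rows)  # shape: (num_frames, 2*K + 1)
```

Each row thus combines the spatial posture of one frame with the phone-detection evidence S_i for that frame, giving the time sequence model both cues at once.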
CN202211431361.2A 2022-11-14 2022-11-14 Identification method for using mobile phone behaviors in low-resolution monitoring scene Pending CN115909400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211431361.2A CN115909400A (en) 2022-11-14 2022-11-14 Identification method for using mobile phone behaviors in low-resolution monitoring scene

Publications (1)

Publication Number Publication Date
CN115909400A (en)

Family

ID=86475970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211431361.2A Pending CN115909400A (en) 2022-11-14 2022-11-14 Identification method for using mobile phone behaviors in low-resolution monitoring scene

Country Status (1)

Country Link
CN (1) CN115909400A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351648A (en) * 2023-10-08 2024-01-05 海南大学 Driver fatigue monitoring and early warning method and system


Similar Documents

Publication Publication Date Title
CN110110601B (en) Video pedestrian re-recognition method and device based on multi-time space attention model
CN107403142B (en) A kind of detection method of micro- expression
CN111091109B (en) Method, system and equipment for predicting age and gender based on face image
US20220180534A1 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
CN110838119B (en) Human face image quality evaluation method, computer device and computer readable storage medium
CN111191486A (en) Drowning behavior recognition method, monitoring camera and monitoring system
WO2019114145A1 (en) Head count detection method and device in surveillance video
CN111582129A (en) Real-time monitoring and alarming method and device for working state of shield machine driver
CN111767823A (en) Sleeping post detection method, device, system and storage medium
CN110827432B (en) Class attendance checking method and system based on face recognition
CN110096945B (en) Indoor monitoring video key frame real-time extraction method based on machine learning
CN109325472B (en) Face living body detection method based on depth information
CN111222380A (en) Living body detection method and device and recognition model training method thereof
CN115601807A (en) Face recognition method suitable for online examination system and working method thereof
CN115909400A (en) Identification method for using mobile phone behaviors in low-resolution monitoring scene
CN111950457A (en) Oil field safety production image identification method and system
CN115273150A (en) Novel identification method and system for wearing safety helmet based on human body posture estimation
CN114170686A (en) Elbow bending behavior detection method based on human body key points
CN113221812A (en) Training method of face key point detection model and face key point detection method
CN109035686B (en) Loss prevention alarm method and device
CN114639168B (en) Method and system for recognizing running gesture
CN111127355A (en) Method for finely complementing defective light flow graph and application thereof
CN112435240B (en) Deep vision mobile phone detection system for workers to illegally use mobile phones
US20220207261A1 (en) Method and apparatus for detecting associated objects
CN115546825A (en) Automatic monitoring method for safety inspection normalization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination