CN112381068B - Method and system for detecting 'playing mobile phone' of person - Google Patents

Method and system for detecting 'playing mobile phone' of person

Info

Publication number
CN112381068B
CN112381068B (application CN202011563792.5A)
Authority
CN
China
Prior art keywords
mobile phone
person
model
video
relation
Prior art date
Legal status
Active
Application number
CN202011563792.5A
Other languages
Chinese (zh)
Other versions
CN112381068A (en)
Inventor
You Ren (游忍)
Shao Yanhua (邵延华)
Liu Minghua (刘明华)
Current Assignee
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN202011563792.5A
Publication of CN112381068A
Application granted
Publication of CN112381068B

Classifications

    • G06V 20/40 Scenes; scene-specific elements in video content
    • G06N 20/00 Machine learning
    • G06N 3/045 Neural networks; combinations of networks
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Neural network learning methods
    • G06V 10/40 Extraction of image or video features
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a method for detecting a person 'playing a mobile phone', which comprises the following steps: acquiring a video signal in the current environment to obtain a video to be detected and training samples; if no person or no mobile phone is detected in the video, judging that no one is playing a mobile phone; if both persons and mobile phones are detected in the video, extracting the features of each person and each mobile phone with a feature extraction model; inputting the features of each person and each mobile phone into a feature-relation judgment model, and computing the relation features between each person and each mobile phone; inputting the features of each person and each mobile phone, together with the relation features between them, into a judgment model, and judging whether each person is playing a mobile phone at the current moment; and processing the judgment result. The method combines human joint coordinates, mobile-phone size coordinates, the action-intention and spatial relations between the mobile phone and the human body, deep learning methods, and a time-sequence model to finally judge whether a person in the environment is playing a mobile phone, greatly improving detection accuracy.

Description

Method and system for detecting 'playing mobile phone' of person
Technical Field
The invention relates to the field of computer vision and video analysis, in particular to a method and a system for detecting that a person is 'playing a mobile phone'.
Background Art
With the rapid development of information technology, mobile phones have become ubiquitous, and people's dependence on them has grown increasingly serious. In real scenes, accidents caused by 'playing a mobile phone' are frequent. For example, a driver takes his hands off the steering wheel to use a mobile phone and causes a traffic accident; a pedestrian crossing the road collides with a vehicle because he is looking at his mobile phone. Some special industries, such as railway departments, manage their employees in a quasi-military fashion and need real-time warnings of violations, including detecting through cameras whether employees are playing mobile phones. Schools likewise need to monitor classroom discipline and detect whether students are using mobile phones. In the existing literature and patents, work on detecting a person 'playing a mobile phone' is scarce. The mainstream computer-vision approaches mainly judge the regions of the mobile phone and the hand, or define hand-crafted rules. For example, patent publication No. CN 110674728A discloses a method, apparatus, server and storage medium for detecting 'playing a mobile phone' based on video image recognition: exploiting the changing relationship between the hand and the mobile phone while the phone is in use, it detects the change of the hand and the colour change of the mobile phone within a set period. Because it relies only on these simple cues, its robustness is low in the complex scenes of practical applications. The invention with publication No. CN 111191576A discloses a personnel-behaviour target detection model construction method, an intelligent analysis method and a system: for phone-playing behaviour it mainly crops the mobile-phone region, judges the brightness of the phone screen, and counts qualifying frames. It judges by hand-crafted rules and lacks human-like intelligent judgment, so its robustness is low, its applicable scenes are limited, and it is difficult to meet diverse practical requirements. With the development of technologies such as deep learning, methods such as human pose estimation, object detection, gaze estimation and time-sequence models can judge far more accurately whether a person is playing a mobile phone.
At present, prior-art methods for detecting a person 'playing a mobile phone' suffer from a scarcity of related algorithms and low detection accuracy.
Disclosure of Invention
The invention aims to overcome the above defects of the background art and provides a method and a system for detecting a person 'playing a mobile phone', which can solve the technical problem of low detection accuracy in the prior art.
In order to achieve the above technical effects, the invention adopts the following technical scheme:
A method for detecting a person 'playing a mobile phone', the method comprising the following steps:
Step S1, acquiring a video signal in the current environment to obtain a video to be detected and training samples;
Step S2, detecting all persons and mobile phones in the video;
Step S3, if no person or no mobile phone is detected in the video, judging that no one is playing a mobile phone;
Step S4, if both persons and mobile phones are detected in the video, extracting the features of each person and each mobile phone with a feature extraction model;
Step S5, inputting the features of each person and each mobile phone into a feature-relation judgment model, and computing the relation features between each person and each mobile phone;
Step S6, inputting the features of each person and each mobile phone, together with the relation features between them, into a judgment model, and judging whether each person is playing a mobile phone at the current moment;
Step S7, processing the detection result.
Further, step S2 at least comprises detecting all persons and mobile phones in the current frame with a computer vision algorithm.
Further, the features of each person and each mobile phone in step S4 at least comprise:
a. the two-dimensional and three-dimensional human joint coordinates of each person;
b. the two-dimensional and three-dimensional size coordinates of each mobile phone;
c. the visual features of each person and each mobile phone.
Further, the visual features include, but are not limited to, features extracted by traditional machine learning algorithms or by deep learning.
Further, before step S4, the method further comprises:
a. constructing a human key point model and a 3D object detection model;
b. training the human key point model and the 3D object detection model with the training samples to obtain the feature extraction model.
Further, the human key point model is an OpenPose model, used to compute the two-dimensional and three-dimensional joint coordinates of each person; the 3D object detection model is a CenterNet model, used to compute the two-dimensional and three-dimensional size coordinates of each mobile phone.
Further, the relation features between each person and each mobile phone in step S5 at least comprise:
an action-intention relation: the person holds the mobile phone, does not hold the mobile phone, looks at the mobile phone, or does not look at the mobile phone;
a spatial relation: in front of, behind, to the left of, to the right of, above, or below;
the action-intention relation and the spatial relation are combined to obtain the relation features.
Further, before step S5, the method further comprises:
constructing a deep learning model;
extracting the features of each person and each mobile phone from the training samples with the feature extraction model, and training the deep learning model with these features to obtain the final feature-relation judgment model.
Further, the deep learning model is specifically a sight-line (gaze) estimation model.
Further, in step S6, inputting the features of each person and each mobile phone, together with the relation features between them, into the judgment model and judging whether each person is playing a mobile phone at the current moment comprises:
for the video at the current moment, acquiring the features of each person and each mobile phone, and the relation features between them, in the video over a period of time before the current moment and at the current moment;
inputting all these features into the judgment model, and judging whether each person is playing a mobile phone at the current moment.
Further, before step S6, the method further comprises:
a. constructing a time-sequence model;
b. extracting the features of each person and each mobile phone, and the relation features between them, from the training samples with the feature extraction model and the feature-relation judgment model, and training the time-sequence model with them to obtain the final judgment model.
Further, the time-sequence model is an LSTM model.
Further, the processing of the result in step S7 specifically comprises, according to the application scenario, storing the detection result, storing picture or video evidence that a person is 'playing a mobile phone', sending an alarm, and the like.
Meanwhile, the invention also discloses a system for detecting a person 'playing a mobile phone', which comprises:
a video signal acquisition module, used for acquiring video signals in the current environment to obtain a video to be detected and training samples;
a person and mobile phone detection module, used for detecting all persons and mobile phones in the video;
a feature extraction module, used for training the feature extraction model and the feature-relation judgment model; if persons and mobile phones are detected in the video, the features of each person and each mobile phone are extracted with the feature extraction model, and the relation features between each person and each mobile phone are obtained with the feature-relation judgment model;
a judgment module, used for training a time-sequence model and judging whether each person is playing a mobile phone at the current moment with the features of each person and each mobile phone, and the relation features between them, in each video frame over a period of time before the current moment and at the current moment;
a feature storage module, used for storing the features and relation features of persons and mobile phones obtained during algorithm operation;
a state output module, used for outputting the state of each person: 'playing mobile phone' or 'not playing mobile phone'.
Further, the system for detecting a person 'playing a mobile phone' further comprises an alarm module: if someone is playing a mobile phone, the system sends an alarm.
Compared with the prior art, the invention has the following beneficial effects: the method combines human joint coordinates, mobile-phone size coordinates, the action-intention and spatial relations between the mobile phone and the human body, deep learning methods, and a time-sequence model to finally judge whether a person in the environment is playing a mobile phone, greatly improving detection accuracy.
Drawings
Fig. 1 is a flowchart illustrating a method for detecting a person playing a mobile phone according to an embodiment of the present invention.
Fig. 2 is a flowchart of training a feature extraction model according to a first embodiment of the present invention.
Fig. 3 is a flowchart of training a feature relation determination model according to an embodiment of the present invention.
Fig. 4 is a flowchart of a decision model training process according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a system for detecting a person playing a mobile phone according to a second embodiment of the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments hereinafter.
Example one
As shown in fig. 1, a method for detecting a person playing a mobile phone specifically includes the following steps:
and step S1, acquiring the video signal in the current environment to obtain the video to be detected and the training sample.
Specifically, during model training, a large number of videos are collected through the camera. The two-dimensional and three-dimensional joint coordinates of each person are annotated; the two-dimensional and three-dimensional size coordinates of each mobile phone are annotated; the action-intention relation between each person and each mobile phone is annotated: holds the mobile phone, does not hold the mobile phone, looks at the mobile phone, or does not look at the mobile phone; the spatial relation between each person and each mobile phone is annotated: in front of, behind, to the left of, to the right of, above, or below; and whether each person is playing a mobile phone is annotated. The training samples are obtained after annotation is completed. In actual deployment, videos in the application scene are collected through the camera to obtain the video to be detected.
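For illustration only, a minimal annotation schema consistent with the labels described above might look like the following Python sketch; all class and field names are hypothetical and are not prescribed by the patent.

```python
from dataclasses import dataclass, field
from typing import List

# Hedged sketch of one annotation record per video frame. The label vocabularies
# mirror the four action-intention labels and six spatial relations in the text;
# everything else (field names, shapes) is an illustrative assumption.
ACTION_INTENTS = ["holds_phone", "not_holds_phone", "looks_at_phone", "not_looks_at_phone"]
SPATIAL_RELATIONS = ["front", "back", "left", "right", "above", "below"]

@dataclass
class PersonLabel:
    joints_2d: List[List[float]]  # [K, 2] image-plane joint coordinates
    joints_3d: List[List[float]]  # [K, 3] joint coordinates in 3D space

@dataclass
class PhoneLabel:
    box_2d: List[float]  # [x1, y1, x2, y2] two-dimensional size coordinates
    box_3d: List[float]  # e.g. [x, y, z, w, h, d] three-dimensional size coordinates

@dataclass
class PairLabel:
    person_id: int
    phone_id: int
    action_intents: List[str]  # subset of ACTION_INTENTS for this person-phone pair
    spatial_relation: str      # one of SPATIAL_RELATIONS
    playing_phone: bool        # per-person ground truth for the final judgment

@dataclass
class FrameLabel:
    persons: List[PersonLabel] = field(default_factory=list)
    phones: List[PhoneLabel] = field(default_factory=list)
    pairs: List[PairLabel] = field(default_factory=list)
```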
Step S2, detecting all persons and mobile phones in the video.
Specifically, all persons and mobile phones in the video to be detected are detected with the Faster R-CNN algorithm.
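The patent does not fix an implementation of this detection step; below is a minimal sketch using torchvision's COCO-pretrained Faster R-CNN, exploiting the fact that COCO's label set already contains both 'person' (label 1) and 'cell phone' (label 77). The score threshold is an assumption.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Hedged sketch: detect all persons and phones in one frame (step S2) and apply
# the step-S3 early exit. COCO 91-category indexing: person = 1, cell phone = 77.
PERSON, CELL_PHONE = 1, 77

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_persons_and_phones(frame, score_thresh=0.7):
    """frame: float tensor [3, H, W] with values in [0, 1]."""
    with torch.no_grad():
        out = model([frame])[0]
    keep = out["scores"] >= score_thresh
    boxes, labels = out["boxes"][keep], out["labels"][keep]
    return boxes[labels == PERSON], boxes[labels == CELL_PHONE]

def nobody_playing(frame):
    # Step S3: with no person or no phone in view, no one can be playing a phone.
    persons, phones = detect_persons_and_phones(frame)
    return len(persons) == 0 or len(phones) == 0
```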
Step S3, if no person or no mobile phone is detected in the video, judging that no one is playing a mobile phone.
Step S4, if both persons and mobile phones are detected in the video, extracting the features of each person and each mobile phone with the feature extraction model.
The features of each person and each mobile phone comprise: a. the two-dimensional and three-dimensional human joint coordinates of each person; b. the two-dimensional and three-dimensional size coordinates of each mobile phone; c. the visual features of each person and each mobile phone. The visual features include, but are not limited to, features extracted by traditional machine learning algorithms or by deep learning.
In this embodiment, the implementation is as follows: the two-dimensional and three-dimensional joint coordinates of each person are computed with an OpenPose model; the two-dimensional and three-dimensional size coordinates of each mobile phone are computed with a CenterNet model; meanwhile, the region corresponding to each person and each mobile phone is cropped from the last convolution layer of the CenterNet model to obtain the visual features of each person and each mobile phone.
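A hedged sketch of the visual-feature cropping just described, using ROI-Align to pool a fixed-size feature for each detected region from a backbone feature map (standing in for CenterNet's last convolution layer); the feature-map stride and output size are assumptions.

```python
import torch
from torchvision.ops import roi_align

# Hedged sketch: pool one visual feature vector per person/phone box from a
# backbone feature map of shape [1, C, H/stride, W/stride]. The patent does not
# give tensor shapes, so stride and output size here are illustrative.
def crop_visual_features(feature_map, boxes_xyxy, stride=4, out_size=7):
    # roi_align expects [N, 5] rois: (batch_index, x1, y1, x2, y2).
    rois = torch.cat([torch.zeros(len(boxes_xyxy), 1), boxes_xyxy.float()], dim=1)
    feats = roi_align(feature_map, rois, output_size=out_size,
                      spatial_scale=1.0 / stride, aligned=True)
    return feats.flatten(start_dim=1)  # [N, C * out_size * out_size]
```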
The feature extraction models OpenPose and CenterNet are generated in advance; as shown in fig. 2, the specific implementation and training steps are as follows:
a. constructing a human key point model and a 3D object detection model;
b. training the human key point model and the 3D object detection model with the training samples to obtain the feature extraction model.
In this embodiment, the human key point model is an OpenPose model, used to compute the two-dimensional and three-dimensional joint coordinates of each person; the 3D object detection model is a CenterNet model, used to compute the two-dimensional and three-dimensional size coordinates of each mobile phone.
In this embodiment, the OpenPose model is trained with the training samples annotated with two-dimensional and three-dimensional human joint coordinates, and the CenterNet model is trained with the data annotated with two-dimensional and three-dimensional mobile-phone size coordinates. Finally, the OpenPose and CenterNet models are combined to obtain the feature extraction model.
Step S5, inputting the features of each person and each mobile phone into the feature-relation judgment model, and computing the relation features between each person and each mobile phone.
Specifically, the features of each person and each mobile phone obtained by the feature extraction model are input into the feature-relation judgment model to obtain the relation features between each person and each mobile phone.
The relation features comprise an action-intention relation: the person holds the mobile phone, does not hold the mobile phone, looks at the mobile phone, or does not look at the mobile phone; and a spatial relation: in front of, behind, to the left of, to the right of, above, or below. The action-intention relation and the spatial relation are combined to obtain the relation features.
In this embodiment, the implementation is as follows: the OpenPose and CenterNet models are used to compute the two- and three-dimensional joint coordinates, size coordinates and visual features of each person and each mobile phone; all these features are then input into the feature-relation judgment model to obtain the action-intention relation and the spatial relation between each person and each mobile phone, which are finally combined into the relation features.
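A minimal sketch of one possible feature-relation judgment model consistent with this description: an MLP that consumes one person's features concatenated with one phone's features and scores the four action-intention labels and six spatial relations, whose concatenated outputs serve as the relation feature. All dimensions are assumptions.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Hedged sketch: (person features, phone features) -> relation features.
    Feature sizes are illustrative assumptions, not values from the patent."""
    def __init__(self, person_dim=1024, phone_dim=1024, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(person_dim + phone_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.intent = nn.Linear(hidden, 4)   # holds / not holds / looks at / not looks at
        self.spatial = nn.Linear(hidden, 6)  # front / back / left / right / above / below

    def forward(self, person_feat, phone_feat):
        h = self.trunk(torch.cat([person_feat, phone_feat], dim=-1))
        # The relation feature combines the action-intention and spatial scores.
        return torch.cat([self.intent(h), self.spatial(h)], dim=-1)  # [N, 10]
```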
The feature-relation judgment model is generated in advance; as shown in fig. 3, in this embodiment its implementation and training steps are as follows:
a. constructing a deep learning model;
b. extracting the features of each person and each mobile phone from the training samples with the feature extraction model, and training the deep-learning-based sight-line (gaze) estimation model with these features to obtain the final feature-relation judgment model.
In this embodiment, the deep learning model is specifically a sight-line (gaze) estimation model.
In this embodiment, the implementation is as follows: the features of each person and each mobile phone are extracted from the training samples with the feature extraction models OpenPose and CenterNet; the constructed sight-line estimation model is trained with these features and with the samples annotated with the action-intention and spatial relations between person and mobile phone; the sight-line estimation model is additionally trained on the MPIIGaze data set to obtain the final feature-relation judgment model.
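Once a gaze direction has been estimated, the 'looks at the mobile phone / does not look at the mobile phone' label can be approximated geometrically by testing whether the phone centre falls inside a cone around the gaze ray; a hedged sketch follows, where the 15-degree threshold is an illustrative assumption, not a value from the patent.

```python
import numpy as np

def looks_at_phone(eye_pos, gaze_dir, phone_center, max_angle_deg=15.0):
    """Hedged sketch: 'sees the phone' iff the angle between the estimated gaze
    direction and the eye-to-phone vector is small. All inputs are 3D numpy
    vectors; the angular threshold is an assumption."""
    to_phone = phone_center - eye_pos
    cos_angle = np.dot(gaze_dir, to_phone) / (
        np.linalg.norm(gaze_dir) * np.linalg.norm(to_phone) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))) <= max_angle_deg
```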
Step S6, inputting the features of each person and each mobile phone, together with the relation features between them, into the judgment model, and judging whether each person is playing a mobile phone at the current moment. The specific steps are as follows:
a. for the video at the current moment, acquiring the features of each person and each mobile phone, and the relation features between them, in the video over a period of time before the current moment and at the current moment;
b. inputting all these features into the judgment model, and judging whether each person is playing a mobile phone at the current moment.
As shown in fig. 4, in this embodiment the implementation and training steps of the judgment model are as follows:
a. constructing a time-sequence model, specifically an LSTM model;
b. extracting the features of each person and each mobile phone, and the relation features between them, from the training samples with the feature extraction model and the feature-relation judgment model, and training the LSTM model with them to obtain the final judgment model.
In this embodiment, the specific implementation is as follows: the features of each person and each mobile phone, and the relation features between them, are extracted from the training samples with the feature extraction models OpenPose and CenterNet and the sight-line estimation model, and the constructed LSTM model is trained with 10 frames as one input sample to obtain the final judgment model.
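A minimal sketch of such a temporal judgment model: an LSTM that consumes the concatenated per-frame features of one person-phone pair over a 10-frame window and outputs a 'playing mobile phone' probability, trained with binary cross-entropy. The feature dimension is an assumption standing for the concatenated person, phone and relation features.

```python
import torch
import torch.nn as nn

class PlayingPhoneLSTM(nn.Module):
    """Hedged sketch of the 10-frame judgment model; feat_dim is an assumed
    placeholder for the concatenated per-frame person/phone/relation features."""
    def __init__(self, feat_dim=2058, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):  # x: [batch, 10, feat_dim]
        _, (h_n, _) = self.lstm(x)
        return torch.sigmoid(self.head(h_n[-1]))  # [batch, 1] probability

# Training sketch: binary cross-entropy on 10-frame feature windows.
model = PlayingPhoneLSTM()
loss_fn = nn.BCELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

window = torch.randn(4, 10, 2058)             # placeholder feature batch
target = torch.randint(0, 2, (4, 1)).float()  # placeholder labels
opt.zero_grad()
loss = loss_fn(model(window), target)
loss.backward()
opt.step()
```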
Step S7, processing the result, which specifically comprises, according to the application scenario, storing the detection result, storing picture or video evidence that a person is 'playing a mobile phone', sending an alarm, and the like.
Example two
Fig. 5 is a schematic structural diagram of a system for detecting a person 'playing a mobile phone' according to the second embodiment of the present invention. The system comprises: a video signal acquisition module, a person and mobile phone detection module, a feature extraction module, a judgment module, a feature storage module, a state output module, and an alarm module.
The video signal acquisition module 201 is used for acquiring video signals in the current environment to obtain the video to be detected and the training samples.
In this embodiment, the implementation is as follows: a suitable camera is selected, and a hardware scheme for video acquisition is designed. During model training, a large number of videos are collected through the camera. The two-dimensional and three-dimensional joint coordinates of each person are annotated; the two-dimensional and three-dimensional size coordinates of each mobile phone are annotated; the action-intention relation between each person and each mobile phone is annotated: holds the mobile phone, does not hold the mobile phone, looks at the mobile phone, or does not look at the mobile phone; the spatial relation between each person and each mobile phone is annotated: in front of, behind, to the left of, to the right of, above, or below; and whether each person is playing a mobile phone is annotated. The training samples are obtained after annotation is completed. In actual deployment, videos in the application scene are collected through the camera to obtain the video to be detected.
The person and mobile phone detection module 202 is used for detecting all persons and mobile phones in the video.
In this embodiment, all persons and mobile phones in the video to be detected are detected with the Faster R-CNN algorithm.
The feature extraction module 203 is used for training the feature extraction model and the feature-relation judgment model; if persons and mobile phones are detected in the video, the features of each person and each mobile phone are extracted with the feature extraction model, and the relation features between each person and each mobile phone are obtained with the feature-relation judgment model.
In this embodiment, the implementation is as follows: the human key point model OpenPose and the 3D object detection model CenterNet are trained with the training samples to obtain the feature extraction model. The features of each person and each mobile phone are then extracted from the training samples with the feature extraction model and used to train a deep-learning-based sight-line estimation model, which is additionally trained on the MPIIGaze data set to obtain the final sight-line estimation model, i.e. the feature-relation judgment model. In actual deployment, pictures are input into the OpenPose and CenterNet models to obtain the features of persons and mobile phones, and these features are input into the sight-line estimation model to obtain the relation features between each person and each mobile phone.
The judgment module 204 is used for training a time-sequence model; for the video at the current moment, whether each person is playing a mobile phone at the current moment is judged with the features of each person and each mobile phone, and the relation features between them, in each video frame over a period of time before the current moment and at the current moment.
The specific steps are as follows:
a. for the video at the current moment, acquiring the features of each person and each mobile phone, and the relation features between them, in the video over a period of time before the current moment and at the current moment;
b. inputting all these features into the judgment model, and judging whether each person is playing a mobile phone at the current moment.
In this embodiment, the implementation is as follows: the features of each person and each mobile phone, and the relation features between them, are extracted from the training samples with the feature extraction model and the feature-relation judgment model, and an LSTM model is trained with 10 frames as input to obtain the final judgment model.
The feature storage module 205 is used for storing the features and relation features of persons and mobile phones obtained during algorithm operation.
The state output module 206 is used for outputting the state of each person: 'playing mobile phone' or 'not playing mobile phone'.
The alarm module 207 is used for sending an alarm if someone is playing a mobile phone.
In summary, the method and system for detecting a person 'playing a mobile phone' provided by the invention have the following beneficial effects: they combine human joint coordinates, mobile-phone size coordinates, the action-intention and spatial relations between the mobile phone and the human body, deep learning methods, and a time-sequence model to finally judge whether a person in the environment is playing a mobile phone, greatly improving detection accuracy.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a program instructing the related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (14)

1. A method for detecting a person 'playing a mobile phone', the method comprising the following steps:
S1, acquiring a video signal in the current environment to obtain a video to be detected and training samples;
S2, detecting all persons and mobile phones in the video;
S3, if no person or no mobile phone is detected in the video, judging that no one is playing a mobile phone;
S4, if both persons and mobile phones are detected in the video, extracting the features of each person and each mobile phone with a feature extraction model, the features at least comprising:
the two-dimensional and three-dimensional human joint coordinates of each person;
the two-dimensional and three-dimensional size coordinates of each mobile phone;
the visual features of each person and each mobile phone;
S5, inputting the features of each person and each mobile phone into a feature-relation judgment model, and computing the relation features between each person and each mobile phone;
S6, inputting the features of each person and each mobile phone, together with the relation features between them, into a judgment model, and judging whether each person is playing a mobile phone at the current moment;
S7, processing the result.
2. The method as claimed in claim 1, wherein step S2 at least comprises detecting all persons and mobile phones in the current frame with a computer vision algorithm.
3. The method as claimed in claim 1, wherein the visual features include, but are not limited to, features extracted by traditional machine learning algorithms or by deep learning.
4. The method for detecting a person 'playing a mobile phone' as claimed in claim 1, wherein before step S4 the method further comprises:
constructing a human key point model and a 3D object detection model;
training the human key point model and the 3D object detection model with the training samples to obtain the feature extraction model.
5. The method for detecting a person 'playing a mobile phone' as claimed in claim 4, wherein:
the human key point model is an OpenPose model, used to compute the two-dimensional and three-dimensional joint coordinates of each person;
the 3D object detection model is a CenterNet model, used to compute the two-dimensional and three-dimensional size coordinates of each mobile phone.
6. The method as claimed in claim 1, wherein the relation features between each person and each mobile phone in step S5 at least comprise:
an action-intention relation: the person holds the mobile phone, does not hold the mobile phone, looks at the mobile phone, or does not look at the mobile phone;
a spatial relation: in front of, behind, to the left of, to the right of, above, or below;
the action-intention relation and the spatial relation are combined to obtain the relation features.
7. The method for detecting a person 'playing a mobile phone' as claimed in claim 1, wherein before step S5 the method further comprises:
constructing a deep learning model;
extracting the features of each person and each mobile phone from the training samples with the feature extraction model, and training the deep learning model with these features to obtain the final feature-relation judgment model.
8. The method as claimed in claim 7, wherein the deep learning model is a sight-line (gaze) estimation model.
9. The method as claimed in claim 1, wherein the step S6 of inputting the features of each person and each mobile phone, together with the relation features between them, into the judgment model and judging whether each person is playing a mobile phone at the current moment comprises:
for the video at the current moment, acquiring the features of each person and each mobile phone, and the relation features between them, in the video over a period of time before the current moment and at the current moment;
inputting all these features into the judgment model, and judging whether each person is playing a mobile phone at the current moment.
10. The method for detecting a person 'playing a mobile phone' as claimed in claim 1, wherein before step S6 the method further comprises:
a. constructing a time-sequence model;
b. extracting the features of each person and each mobile phone, and the relation features between them, from the training samples with the feature extraction model and the feature-relation judgment model, and training the time-sequence model with them to obtain the final judgment model.
11. The method as claimed in claim 10, wherein the time-sequence model is an LSTM model.
12. The method as claimed in claim 1, wherein step S7 processes the result in one or more of the following ways:
storing the detection result;
storing picture or video evidence that a person is 'playing a mobile phone';
sending an alarm.
13. A system for detecting a person 'playing a mobile phone' according to the method of any one of claims 1-12, comprising:
a video signal acquisition module, used for acquiring video signals in the current environment to obtain a video to be detected and training samples;
a person and mobile phone detection module, used for detecting all persons and mobile phones in the video;
a feature extraction module, used for training the feature extraction model and the feature-relation judgment model; if persons and mobile phones are detected in the video, the features of each person and each mobile phone are extracted with the feature extraction model, and the relation features between each person and each mobile phone are obtained with the feature-relation judgment model;
a judgment module, used for training a time-sequence model and judging whether each person is playing a mobile phone at the current moment with the features of each person and each mobile phone, and the relation features between them, in each video frame over a period of time before the current moment and at the current moment;
a feature storage module, used for storing the features and relation features of persons and mobile phones obtained during algorithm operation;
a state output module, used for outputting the state of each person: 'playing mobile phone' or 'not playing mobile phone'.
14. The system for detecting a person 'playing a mobile phone' as claimed in claim 13, further comprising an alarm module for generating an alarm signal if the output state is 'playing mobile phone'.
CN202011563792.5A 2020-12-25 2020-12-25 Method and system for detecting 'playing mobile phone' of person Active CN112381068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011563792.5A CN112381068B (en) 2020-12-25 2020-12-25 Method and system for detecting 'playing mobile phone' of person


Publications (2)

Publication Number Publication Date
CN112381068A CN112381068A (en) 2021-02-19
CN112381068B (en) 2022-05-31

Family

ID=74590855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011563792.5A Active CN112381068B (en) 2020-12-25 2020-12-25 Method and system for detecting 'playing mobile phone' of person

Country Status (1)

Country Link
CN (1) CN112381068B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408379A (en) * 2021-06-04 2021-09-17 开放智能机器(上海)有限公司 Mobile phone candid behavior monitoring method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171152A (en) * 2017-12-26 2018-06-15 深圳大学 Deep learning human eye sight estimation method, equipment, system and readable storage medium storing program for executing
CN108596064A (en) * 2018-04-13 2018-09-28 长安大学 Driver based on Multi-information acquisition bows operating handset behavioral value method
CN108846332A (en) * 2018-05-30 2018-11-20 西南交通大学 A kind of railway drivers Activity recognition method based on CLSTA
CN109614939A (en) * 2018-12-13 2019-04-12 四川长虹电器股份有限公司 " playing mobile phone " behavioral value recognition methods based on human body attitude estimation
CN109871764A (en) * 2019-01-16 2019-06-11 深兰科技(上海)有限公司 A kind of abnormal behaviour recognition methods, device and storage medium
CN110059662A (en) * 2019-04-26 2019-07-26 山东大学 A kind of deep video Activity recognition method and system
CN112001347A (en) * 2020-08-31 2020-11-27 重庆科技学院 Motion recognition method based on human skeleton shape and detection target

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101954192B1 (en) * 2012-11-15 2019-03-05 엘지전자 주식회사 Array camera, Moblie terminal, and method for operating the same
JP6261199B2 (en) * 2013-06-21 2018-01-17 キヤノン株式会社 Information processing apparatus, information processing method, and computer program
US20180122185A1 (en) * 2016-10-31 2018-05-03 Kenneth L. Miller Player Tracking Card Reader With Interface For Cell Phone In Place Of Player Tracking Card
CN109120791B (en) * 2018-08-31 2021-01-01 湖南人文科技学院 Method for warning and protecting cervical vertebra through smart phone
CN110287906A (en) * 2019-06-26 2019-09-27 四川长虹电器股份有限公司 Method and system based on image/video detection people " playing mobile phone "

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171152A (en) * 2017-12-26 2018-06-15 深圳大学 Deep learning human eye sight estimation method, equipment, system and readable storage medium storing program for executing
CN108596064A (en) * 2018-04-13 2018-09-28 长安大学 Driver based on Multi-information acquisition bows operating handset behavioral value method
CN108846332A (en) * 2018-05-30 2018-11-20 西南交通大学 A kind of railway drivers Activity recognition method based on CLSTA
CN109614939A (en) * 2018-12-13 2019-04-12 四川长虹电器股份有限公司 " playing mobile phone " behavioral value recognition methods based on human body attitude estimation
CN109871764A (en) * 2019-01-16 2019-06-11 深兰科技(上海)有限公司 A kind of abnormal behaviour recognition methods, device and storage medium
CN110059662A (en) * 2019-04-26 2019-07-26 山东大学 A kind of deep video Activity recognition method and system
CN112001347A (en) * 2020-08-31 2020-11-27 重庆科技学院 Motion recognition method based on human skeleton shape and detection target

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Research on Student Classroom Behavior Detection Algorithm Based on Faster R-CNN"; Tan Bin et al.; Modern Computer (Professional Edition); 2018-11-30 (No. 33); pp. 45-47 *
"Surveillance Video Analysis Method Combining Object Detection and Human Pose Estimation Algorithms"; Li Bin'ai et al.; Electronic Technology & Software Engineering; 2020-04-30 (No. 7); pp. 143-147 *

Also Published As

Publication number Publication date
CN112381068A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
Martin et al. Drive&act: A multi-modal dataset for fine-grained driver behavior recognition in autonomous vehicles
CN105260712B (en) A kind of vehicle front pedestrian detection method and system
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
CN111860274B (en) Traffic police command gesture recognition method based on head orientation and upper half skeleton characteristics
CN109875568A (en) A kind of head pose detection method for fatigue driving detection
CN109740424A (en) Traffic violations recognition methods and Related product
CN105426827A (en) Living body verification method, device and system
CN110298300A (en) A method of detection vehicle violation crimping
CN109766755A (en) Face identification method and Related product
CN108038866A (en) A kind of moving target detecting method based on Vibe and disparity map Background difference
CN106778650A (en) Scene adaptive pedestrian detection method and system based on polymorphic type information fusion
CN110348463A (en) The method and apparatus of vehicle for identification
CN110245563A (en) Refitted car recognition methods and Related product
CN110728199A (en) Intelligent driving test car practice system and method based on MR
CN105117096A (en) Image identification based anti-tracking method and apparatus
CN110147731A (en) Vehicle type recognition method and Related product
CN111540171B (en) Fatigue driving early warning system, corresponding early warning method and construction method
CN112381068B (en) Method and system for detecting 'playing mobile phone' of person
CN112215093A (en) Method and device for evaluating vehicle driving ability level
CN106611165B (en) A kind of automotive window detection method and device based on correlation filtering and color-match
CN116935361A (en) Deep learning-based driver distraction behavior detection method
CN114529979A (en) Human body posture identification system, human body posture identification method and non-transitory computer readable storage medium
CN115797970B (en) Dense pedestrian target detection method and system based on YOLOv5 model
CN115147817B (en) Driver distraction behavior recognition method of instance perception network guided by gestures
Kumar et al. Traffic sign and drowsiness detection using open-cv

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant