CN112580584A - Method, device and system for detecting standing behavior and storage medium - Google Patents

Method, device and system for detecting standing behavior and storage medium

Info

Publication number
CN112580584A
Authority
CN
China
Prior art keywords
target
frame
behavior
standing
video image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011583301.3A
Other languages
Chinese (zh)
Inventor
王程
章勇
毛晓蛟
曹李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202011583301.3A
Publication of CN112580584A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method, a device, a system and a storage medium for detecting standing behavior. The method comprises the following steps: acquiring continuous video frames; performing target detection on the continuous video frames through a pre-trained target detection network model to obtain the target position corresponding to each target object in each video frame and the sitting/standing state at that position; and, after the acquired continuous video frames reach a preset number, determining whether a target object has performed a standing or sitting behavior according to the change of the sitting/standing state at its target position. The method and the device can accurately locate where the standing behavior of multiple target objects occurs in complex environments such as classrooms, solving the prior-art problems of inaccurate detection results and low detection accuracy for multi-target standing behavior in such environments.

Description

Method, device and system for detecting standing behavior and storage medium
Technical Field
The application relates to a method, a device, a system and a storage medium for detecting standing behavior, and belongs to the technical field of image detection.
Background
With the spread of the internet, education is no longer confined to the classroom: more and more high-quality courses are distributed online, and applications such as automatic lecture recording and broadcasting, remote teaching and classroom behavior analysis are becoming common, with computer technology as the key enabler. Recording classroom events vividly requires classroom behavior analysis, in which tracking student behavior is particularly important because it captures the interaction between students and teachers during teaching; in particular, when a student stands up to speak, the position of the standing student should be detected automatically and a close-up snapshot taken.
At present, detection of students' standing behavior relies on traditional tracking and detection methods such as the optical flow method, the inter-frame difference method and background subtraction, but these methods cannot cope effectively with a complex classroom environment.
In a classroom, the camera mounting angle, the room size, the interior layout and the lighting conditions are all complex; in addition, people gather densely, human activity is frequent, and postures vary widely. Traditional detection methods therefore cannot reliably detect when a standing behavior occurs: for example, a student moving an upper limb may be misdetected as standing up, and the position where the standing behavior occurs cannot be located accurately.
Disclosure of Invention
The application provides a method, a device, a system and a storage medium for detecting standing behavior, which solve the prior-art problems of inaccurate detection results and low detection accuracy for the standing behavior of multiple target objects in complex environments such as classrooms.
The application provides the following technical scheme:
a first aspect of an embodiment of the present application provides a method for detecting a standing behavior, where the method includes:
acquiring continuous video frames;
performing target detection on the continuous video frames through a pre-trained target detection network model to obtain the target position corresponding to each target object in each video frame and the sitting/standing state at that position;
and, after the acquired continuous video frames reach a preset number, determining whether a target object has performed a standing or sitting behavior according to the change of the sitting/standing state at its target position.
In a second aspect of embodiments of the present application, there is provided a standing behavior detection apparatus, including:
an image acquisition module, used to acquire continuous video frames;
a target detection module, used to perform target detection on the continuous video frames through a pre-trained target detection network model to obtain the target position corresponding to each target object in each video frame and the sitting/standing state at that position;
and a behavior judgment module, used to determine, after the acquired continuous video frames reach a preset number, whether a target object has performed a standing or sitting behavior according to the change of the sitting/standing state at its target position.
In a third aspect of the embodiments of the present application, a system for detecting a standing behavior is provided, where the system includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the computer program is loaded and executed by the processor to implement the steps of the method for detecting a standing behavior according to the first aspect of the embodiments of the present application.
In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, where a computer program is stored, and the computer program is used, when being executed by a processor, to implement the steps of the standing behavior detection method according to the first aspect of the embodiments of the present application.
The beneficial effects of the application are as follows: the standing behavior detection method performs target detection with a pre-trained target detection model to obtain the target frame position of each target object, determines the sitting/standing state change of a target object across consecutive video frames by matching the target frame information between adjacent frames, and thereby judges whether a standing behavior occurs. This realizes standing behavior detection in complex environments, accurately locates where the standing behavior of multiple target objects occurs in complex environments such as classrooms, and achieves high detection accuracy.
The foregoing is only an overview of the technical solutions of the present application. To make those solutions clearer and implementable according to this description, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a network architecture diagram of a standing behavior monitoring system provided by one embodiment of the present application;
FIG. 2 is a flow chart of a method for detection of a standing behavior provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an object detection network model according to an embodiment of the present application;
FIG. 4 is a flow diagram of a training target detection network model provided by one embodiment of the present application;
FIG. 5 is a flow chart of a method for detection of a standing behavior provided by an embodiment of the present application;
FIG. 6 is a block diagram of a standing behavior detection apparatus provided in one embodiment of the present application;
FIG. 7 is a block diagram of a standing behavior detection system according to one embodiment of the present application.
Detailed Description
The following detailed description of embodiments of the present application will be described in conjunction with the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
Fig. 1 is a schematic diagram of the network architecture of a standing behavior monitoring system according to an embodiment of the present application. As shown in fig. 1, the architecture includes a camera assembly 1 and a standing behavior detection component 2 connected over a network. The standing behavior detection component 2 may be a desktop computer, a notebook computer, a server or the like, and holds a pre-trained target detection network model. The camera assembly 1 collects video images of the target objects to be detected.
Based on the framework shown in fig. 1, the steps of detecting the standing behavior of the target object are as follows:
step 1, the camera assembly 1 collects video images of a target object and sends continuous frame video images containing the target object to the standing behavior detection assembly 2.
And 2, the standing behavior detection component 2 performs target detection on each frame of received video image according to a pre-trained target detection network model to obtain a target position corresponding to a target object in each frame of video image and a standing state corresponding to the target position.
The sitting-up state includes standing up and sitting up, and the target position refers to a position where the target object is located, for example, the standing up behavior of students in a classroom is detected, and then the target object is all students in the classroom.
And 3, the standing behavior detection component 2 determines whether the target object stands or sits according to the standing state change at the target position after the obtained continuous frame video images reach the preset frame number.
And 4, when the standing behavior detection component 2 determines that the current target object has the standing behavior, sending the position coordinates of the target frame corresponding to the current target object to the camera component 1.
And step 5, switching the close-up shot to the target object by the camera assembly 1 according to the received position coordinates.
Further, when the target objects at a plurality of target positions are simultaneously in standing up, the shot is switched to the panoramic picture.
The monitoring system can be applied to monitoring the standing-up behavior of students in a classroom, and the embodiment of the application is described in detail by taking the detection of the standing-up behavior of the students in the classroom as an example:
Fig. 2 is a flowchart of a standing behavior detection method according to an embodiment of the present application. This embodiment applies the method to the standing behavior monitoring system shown in fig. 1 to detect students' standing behavior in a classroom, with the standing behavior detection component 2 as the execution subject of each step. The method comprises at least the following steps:
s201, acquiring continuous frame video images.
Video images are captured by the camera assembly, and the captured video images may contain 1 or more target objects.
For example, when the detection method is applied in a classroom environment for detecting the standing up behavior of a student, the target object is the student, and the acquired video image may include the states of all students in the classroom, that is, include a plurality of target objects.
This embodiment requires that a video image of each frame be acquired in succession.
S202, carrying out target detection on the continuous frame video images through a pre-trained target detection network model to obtain a target position corresponding to a target object in each frame video image and a sitting-up state corresponding to the target position.
Specifically, the sitting/standing state indicates whether a target object is standing or sitting; here the target objects are the students in the classroom, and the target position is the position where a target object is located.
In this embodiment, the pre-trained target detection network model detects the sitting/standing state of each target object in every video frame, that is, whether the target object at each target position is standing or sitting.
Optionally, the target detection network model of this embodiment includes a feature extraction layer, a feature fusion layer and a target prediction layer. The feature extraction layer extracts image features of each input video frame at different scales; the feature fusion layer fuses the different-scale image features of the same frame to obtain the fused image features; and the target prediction layer predicts, from the fused features, the target frames of each video frame and their labels, where a target frame indicates the target position of a target object and its label indicates whether the sitting/standing state at that position is standing or sitting. A target frame is a rectangle, which may be the smallest bounding rectangle of the target object.
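As a concrete illustration of the per-frame output just described, the following minimal Python sketch models one predicted target frame; the class and field names are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One target frame predicted in a single video frame: the smallest
    bounding rectangle of a target object plus its sitting/standing label."""
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom
    label: str  # "standing" or "sitting"

    @property
    def lower_left(self) -> tuple:
        # Lower-left corner, used below to match target frames across frames.
        return (self.x1, self.y2)
```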
S203, after the acquired continuous video frames reach the preset number, determine whether a target object has performed a standing or sitting behavior according to the change of the sitting/standing state at its target position.
Specifically, the continuous video frames are input in order into the pre-trained target detection network model of this embodiment, and each target frame in the current video frame is matched against the target frames in the previous video frame, as follows:
The target frames detected in the first video frame are cached together with their labels. Each target frame in the second video frame is then matched against them, yielding, for every target frame in the second frame, the matching target frame in the first frame and the labels of both (each label indicating standing or sitting). Likewise, each target frame in the third video frame is matched against the target frames in the second video frame, and so on. Matched target frames indicate the target position of the same target object in consecutive video frames.
In this way, once the input continuous video frames reach the preset number, whether the corresponding target object has performed a standing or sitting behavior is judged from the change of the labels of the target frames indicating that object's target position over those frames.
For example, after the matching operation, target frame A1 of target object A in the first video frame matches target frame A2 in the second video frame; target frame A3 in the third video frame matches target frame A2 in the second video frame; target frame A4 in the fourth video frame matches target frame A3 in the third video frame; and target frame A5 in the fifth video frame matches target frame A4 in the fourth video frame. Target frames A1 to A5 thus all indicate the same target object A from the first to the fifth video frame, and so on.
Assume the label of target frame A1 indicates the sitting state. If, among five consecutive input video frames, the labels of the target frames corresponding to target object A indicate the standing state in three frames, for example the labels of target frames A3, A4 and A5 (or A2, A4 and A5, etc.) all indicate standing, target object A is judged to have performed a standing behavior.
After target object A is judged to have stood up, if among another five consecutive video frames the labels of the target frames indicating target object A show the sitting state in three frames, the standing behavior is judged finished and target object A is judged to have performed a sitting behavior.
The preset frame counts above are merely illustrative; other values may be chosen as needed, e.g. n standing (or sitting) indications among m consecutive video frames, where m > n and n/m exceeds a preset ratio. In one embodiment the preset ratio is 60%; other values such as 55%, 70% or 80% may be set according to the specific situation. The values of m and n are not limited in this embodiment.
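The n-of-m decision rule just described can be sketched as follows; the function name and label strings are illustrative assumptions.

```python
def state_changed(labels, new_state, n=3, m=5):
    """Return True if at least n of the last m per-frame labels for one
    tracked target position indicate new_state (n/m above the preset ratio)."""
    window = labels[-m:]
    return len(window) == m and window.count(new_state) >= n

# Usage: a target whose recent label history shows 3 standing labels
# within 5 consecutive frames is judged to have stood up.
history = ["sitting", "sitting", "standing", "sitting", "standing", "standing"]
assert state_changed(history, "standing")
```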
Optionally, matching each target frame detected in the current video frame against each target frame detected in the previous video frame comprises:
acquiring the lower-left corner coordinates of each target frame in the current and the previous video frame;
calculating the Euclidean distances between the lower-left corner coordinates of each target frame in the current video frame and those of each target frame in the previous video frame;
and determining, for each target frame in the current video frame, the target frame in the previous video frame with the shortest Euclidean distance to it as its matching target frame.
For detecting students' standing behavior in a classroom, the target frame covers the area from the desktop to the top of the head. During a standing behavior, the lower-left corner of the visible part of the body stays still or is only slightly disturbed, so the lower-left corner coordinates can serve as the key information for matching the same target position across consecutive video frames. The application therefore obtains the lower-left corner coordinates of the target frames in the preceding and following video frames and computes the Euclidean distance between two target frames from those coordinates.
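A minimal sketch of this nearest-lower-left-corner matching, reusing the Detection sketch above. Greedy one-directional matching is assumed here, since the patent does not specify how conflicts between candidates are resolved.

```python
import math

def match_boxes(prev_dets, curr_dets):
    """For each target frame in the current frame, find the target frame in
    the previous frame whose lower-left corner is nearest in Euclidean
    distance. Returns (curr_index, prev_index) pairs; prev_index is None
    when the previous frame had no detections."""
    pairs = []
    for i, cur in enumerate(curr_dets):
        cx, cy = cur.lower_left
        best, best_d = None, float("inf")
        for j, prev in enumerate(prev_dets):
            px, py = prev.lower_left
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best, best_d = j, d
        pairs.append((i, best))
    return pairs
```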
Fig. 4 is a flowchart of training the FCOS network model according to an embodiment of the present application. Optionally, as shown in fig. 4, the training steps are as follows:
s401, collecting a training data set.
The training data set comprises video images of the target objects, annotation frames, the label of each annotation frame, and the label category of each pixel sample.
Taking detection of students' standing behavior in a classroom environment as an example, the collected training data set comprises video images of students in ordinary classes, captured in everyday classrooms of various specifications and sizes.
The annotation frame covers the area from the desktop up to the top of the student's head; if more than 3/4 of the student's body is occluded, no annotation is made.
The whole standing process of a student is decomposed: the annotation frames of the consecutive frames where the standing behavior begins are labeled sitting, those of the consecutive frames where the standing behavior is close to completion are labeled standing, and the intermediate frames are ignored and excluded from the final loss computation.
As for the label category of each pixel sample: if the pixel lies inside an annotation frame, its category matches that frame's label; otherwise it is a background point and labeled 0.
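The pixel labeling rule above amounts to a point-in-box test; a small sketch, with the annotation format assumed for illustration:

```python
def pixel_label(px, py, annotations):
    """Label category of one pixel sample: the label of the annotation
    frame (x1, y1, x2, y2) containing it, else 0 for background."""
    for (x1, y1, x2, y2), label in annotations:
        if x1 <= px <= x2 and y1 <= py <= y2:
            return label
    return 0
```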
S402: and (4) preprocessing data.
Since students standing up account for only a small fraction of the students in a classroom, data augmentation must first be applied to address the class imbalance. Data augmentation is well known in the art and is not described further here.
And S403, constructing a network model.
The target detection network model constructed in this embodiment is an FCOS (Fully Convolutional One-Stage object detection) network model based on an anchor-free algorithm. Fig. 3 shows its structure, which is as follows:
The feature extraction layer uses the mobile neural network MobileNetV2 as the backbone, whose three convolutional stages (C3, C4 and C5) extract image features of the input video frame at different scales.
The feature fusion layer is built as an FPN (Feature Pyramid Network) and fuses image features of different scales; it constructs pyramid levels P3 to P7, each with its own detection head (Head).
The target prediction layer is constructed by adopting three prediction branches, wherein the first prediction branch is a category branch and is used for predicting the label of a target frame; the second prediction branch is a regression branch and is used for predicting the position of the target frame; the third branch is a quality branch used to predict the quality of the target box.
Finally, the confidence of a target frame output by the FCOS network model is the probability from the classification branch multiplied by the output of the quality branch.
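The three prediction branches and the confidence product can be sketched in PyTorch as follows. This is a simplified single-level head with an assumed channel width; the MobileNetV2 backbone and the P3-P7 pyramid described above are omitted.

```python
import torch
import torch.nn as nn

class FCOSHead(nn.Module):
    """Three-branch detection head: classification (standing/sitting),
    box regression (l, t, r, b) and quality (center-ness)."""
    def __init__(self, channels=96, num_classes=2):
        super().__init__()
        self.cls = nn.Conv2d(channels, num_classes, 3, padding=1)
        self.reg = nn.Conv2d(channels, 4, 3, padding=1)
        self.quality = nn.Conv2d(channels, 1, 3, padding=1)
        self.scale = nn.Parameter(torch.ones(1))  # learnable Scale factor

    def forward(self, x):
        cls_prob = self.cls(x).sigmoid()
        ltrb = (self.reg(x) * self.scale).exp()   # positive side distances
        quality = self.quality(x).sigmoid()
        # Final confidence: classification probability times quality output.
        confidence = cls_prob * quality
        return confidence, ltrb
```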
And S404, training a network model.
And inputting the training data set into the FCOS network model for training to obtain and store the trained FCOS network model.
Training uses a fixed scale: first, the sizes of the feature maps at the different scales output by the feature fusion layer are computed; then each feature-map pixel is mapped to its coordinates in the original image. If those coordinates fall inside an annotation frame, the pixel takes the label of that frame; otherwise the pixel is background and labeled 0.
To improve label quality, when selecting positive and negative samples, only pixels falling in the central area of an annotation frame are matched as positive samples and given the frame's label.
The regression branch computes, from each pixel's coordinates in the original image at every scale, the distances to the top, bottom, left and right sides of its annotation frame; a learnable scaling factor Scale alleviates the learning difficulty caused by sharing the head (Head) across different scales.
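A small sketch of the two computations described here: mapping a feature-map cell back to original-image coordinates, and the l, t, r, b regression targets. The half-stride offset is the common FCOS convention and is assumed here.

```python
def feature_to_image(fx, fy, stride):
    """Map feature-map cell (fx, fy) at the given stride to the center of
    its corresponding cell in the original image."""
    return stride * fx + stride // 2, stride * fy + stride // 2

def regression_targets(px, py, box):
    """Distances from original-image point (px, py) to the left, top,
    right and bottom sides of its annotation frame (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return px - x1, py - y1, x2 - px, y2 - py
```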
The center-ness quality branch scores how accurately the model localizes and uses a cross-entropy loss function; the classification branch uses the Focal Loss to handle the remaining class imbalance and hard/easy sample imbalance; and the regression branch uses the DIoU Loss, which further improves localization accuracy.
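The named losses might look as follows; these are the textbook formulations of Focal Loss and DIoU Loss, not code from the patent (the center-ness branch would use an analogous binary cross-entropy).

```python
import torch
import torch.nn.functional as F

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal Loss for the classification branch; p holds predicted
    probabilities in [0, 1], target holds 0/1 ground-truth labels."""
    ce = F.binary_cross_entropy(p, target, reduction="none")
    pt = p * target + (1 - p) * (1 - target)       # prob of the true class
    w = alpha * target + (1 - alpha) * (1 - target)
    return (w * (1 - pt) ** gamma * ce).mean()

def diou_loss(pred, gt):
    """DIoU Loss for the regression branch: 1 - IoU plus the squared center
    distance over the squared diagonal of the smallest enclosing box.
    pred and gt are (N, 4) tensors of (x1, y1, x2, y2) boxes."""
    ix1, iy1 = torch.max(pred[:, 0], gt[:, 0]), torch.max(pred[:, 1], gt[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], gt[:, 2]), torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + 1e-7)
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_g = (gt[:, :2] + gt[:, 2:]) / 2
    rho2 = ((center_p - center_g) ** 2).sum(dim=1)
    ex1, ey1 = torch.min(pred[:, 0], gt[:, 0]), torch.min(pred[:, 1], gt[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], gt[:, 2]), torch.max(pred[:, 3], gt[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-7
    return (1 - iou + rho2 / c2).mean()
```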
In summary, the standing behavior detection method provided by this embodiment performs target detection with a deep network model, determines the sitting/standing state change of each target object across consecutive video frames by matching target frame information between adjacent frames, and thereby judges whether a standing behavior occurs, realizing standing behavior detection in complex environments with high accuracy.
Fig. 5 is a flowchart of a standing behavior detection method according to an embodiment of the present application, applied to the standing behavior monitoring system shown in fig. 1 to detect students' standing behavior in a classroom; here the camera assembly 1 is taken as the execution subject of each step. The method comprises at least the following steps:
s501, collecting continuous frame video images of students in a classroom.
And the continuous frame video images are used for inputting a pre-trained target detection network model to realize the position detection and the corresponding sitting-up state detection of the target object.
And S502, when the standing behavior is judged to occur according to the sitting-up state of the target object, receiving the target position coordinates of the target object in which the standing behavior occurs.
S503: and judging whether only one target position coordinate is received, if so, executing step S505, and if not, executing step S504.
And S504, switching the close-up shot to the corresponding target position.
And S505, switching the shot to the panoramic picture.
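Steps S503 to S505 reduce to a small switching rule; the camera methods below are a hypothetical API, since the patent does not define the camera interface.

```python
def switch_lens(camera, standing_positions):
    """Close-up on a single standing target's position; panoramic view when
    several targets stand up at once (called only with >= 1 position)."""
    if len(standing_positions) == 1:
        camera.close_up(standing_positions[0])
    else:
        camera.panorama()
```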
With the above technical solution, shot switching is performed according to the position coordinates, allowing the user to pinpoint the target object performing the standing behavior even in a complex environment.
In other embodiments, when the position coordinates of a plurality of target frames are received, that is, a plurality of target objects are detected standing up, an existing voice/speaker recognition algorithm identifies the target object currently speaking, and the camera's close-up shot is switched to the target position of the speaking target object among the standing ones.
For details of the method of this embodiment, please refer to the above method embodiment, which is not described herein again.
Fig. 6 is a block diagram of a standing behavior detection apparatus according to an embodiment of the present application; this embodiment takes its application to the standing behavior detection component 2 in the monitoring system shown in fig. 1 as an example. The apparatus comprises at least the following modules:
the image obtaining module 601 is configured to obtain consecutive frame video images.
A target detection module 602, configured to perform target detection on the consecutive frames of video images through a pre-trained target detection network model, so as to obtain a target position corresponding to a target object in each frame of video image and a sitting state at the target position.
A behavior determining module 603, configured to determine whether the target object performs a standing behavior or a sitting behavior according to a standing state change at a target position in a preset continuous frame video image.
Further, the target detection module 602 includes:
a feature extraction unit, used to extract image features of each video frame at different scales;
a feature fusion unit, used to fuse, for each video frame, the image features of the different scales to obtain that frame's fused image features;
and a prediction unit, used to predict the target frames of each video frame and their corresponding labels from the fused image features;
wherein a target frame indicates a target position, and its label indicates whether the sitting/standing state at that position is standing or sitting.
Further, the behavior judgment module 603 includes:
a matching operation unit, used to match each target frame detected in the current video frame against each target frame detected in the previous video frame to obtain, for every target frame in the current frame, its matching target frame in the previous frame, the matched target frames indicating the target position of the same target object in the current and previous video frames;
and a position judging unit, used to judge whether the corresponding target object has performed a standing or sitting behavior according to the change of the labels of the target frames of the same target object's position over the preset consecutive video frames.
Further, the matching operation unit matches each target frame detected in the current video frame against each target frame detected in the previous video frame by:
acquiring the lower-left corner coordinates of each target frame in the current and the previous video frame;
calculating the Euclidean distances between the lower-left corner coordinates of each target frame in the current video frame and those of each target frame in the previous video frame;
and determining, for each target frame in the current video frame, the target frame in the previous video frame with the shortest Euclidean distance to it as its matching target frame.
Further, the position judging unit judges whether the corresponding target object has performed a standing or sitting behavior according to the change of the labels of the target frames indicating the same target object's position over the preset number of consecutive video frames, specifically:
if the label of the target frame indicating a target object's position starts from the sitting state and n frames among m preset consecutive video frames indicate the standing state, the target object indicated by that target frame is judged to have performed a standing behavior;
while the standing behavior is maintained, if n frames among m preset consecutive video frames indicate that the target frame is in the sitting state, the standing behavior is judged finished and the target object indicated by the target frame is judged to have performed a sitting behavior; where m > n and n/m exceeds a preset ratio.
Further, the standing behavior detection apparatus also includes a position output module, configured to:
output, if the current target object is determined to have performed a standing behavior, the position coordinates of the target frame corresponding to it;
wherein the position coordinates of the target frame are used to indicate the close-up switching position.
For relevant details reference is made to the above-described method embodiments.
It should be noted that: in the above embodiment, when the standing behavior detection device performs the standing behavior detection, only the division of the functional modules is taken as an example, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the standing behavior detection device is divided into different functional modules to complete all or part of the above-described functions. In addition, the embodiment of the standing behavior detection apparatus and the embodiment of the standing behavior detection method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 7 is a block diagram of a standing behavior detection system according to an embodiment of the present application. The behavior detection system of this embodiment may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. It comprises at least a processor 701 and a memory 702; a computer program stored in the memory 702 can run on the processor 701, and when the processor 701 executes it, the steps of the standing behavior detection method embodiments above are implemented, such as the steps of the method shown in fig. 2. Alternatively, when executing the computer program, the processor 701 may implement the functions of the modules in the standing behavior detection apparatus embodiment.
Illustratively, the computer program may be partitioned into one or more modules, which are stored in the memory and executed by the processor to implement the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which describe the execution of the computer program in the standing behavior detection system. For example, the computer program may be divided into an image acquisition module, a target detection module and a behavior judgment module, whose specific functions are as follows:
the image acquisition module, used to acquire continuous video frames;
the target detection module, used to perform target detection on the continuous video frames through the pre-trained target detection network model to obtain the target position corresponding to each target object in each video frame and the sitting/standing state at that position;
and the behavior judgment module, used to determine, after the acquired continuous video frames reach the preset number, whether a standing or sitting behavior occurs at a target position according to the change of the sitting/standing state there.
The processor may include one or more processing cores, for example a 4-core or 6-core processor. It may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) or PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor: the main processor, also called the Central Processing Unit (CPU), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which renders and draws the content to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for machine-learning computations. The processor is the control center of the standing behavior detection system and connects all parts of the system through various interfaces and lines.
The memory may store the computer programs and/or modules, and the processor implements the various functions of the standing behavior detection apparatus by running or executing them and calling the data stored in the memory. The memory may mainly include a program storage area, which stores the operating system and the application programs required by at least one function (such as sound or image playback), and a data storage area, which stores data created according to use (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory as well as non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash card, at least one magnetic disk storage device, or other non-volatile solid-state storage device.
It will be understood by those skilled in the art that the standing behavior detection system described in this embodiment is only an example and does not limit the system; in other embodiments it may include more or fewer components, combine some components, or include different components. For example, it may further include input/output devices, network access devices, buses and the like. The processor, memory and peripheral interface may be connected by buses or signal lines, and each peripheral may be connected to the peripheral interface via a bus, signal line or circuit board. Peripherals include, but are not limited to, radio-frequency circuits, touch display screens, audio circuits, power supplies and the like.
Of course, the standing behavior detection system may also include fewer or more components; this embodiment is not limited thereto.
Optionally, the present application further provides a computer-readable storage medium, which stores a computer program, which when executed by a processor is configured to implement the steps of the above-mentioned standing behavior detection method.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the steps of the above-mentioned embodiment of the standing behavior detection method.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations are described; however, any combination without contradiction should be considered within the scope of this specification.
The above-mentioned embodiments express only several embodiments of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A standing behavior detection method, the method comprising:
acquiring continuous video frames;
performing target detection on the continuous video frames through a pre-trained target detection network model to obtain the target position corresponding to each target object in each video frame and the sitting/standing state at that position;
and, after the acquired continuous video frames reach a preset number, determining whether a target object has performed a standing or sitting behavior according to the change of the sitting/standing state at its target position.
2. The method according to claim 1, wherein the target detection network model is an FCOS network model based on an anchor-free algorithm, and performing target detection on the continuous video frames through the pre-trained target detection network model comprises:
extracting image features of each video frame at different scales;
fusing, for each video frame, the image features of the different scales to obtain that frame's fused image features;
predicting the target frames of each video frame and their corresponding labels from the fused image features;
wherein a target frame indicates the target position, and its label indicates whether the sitting/standing state at that position is standing or sitting.
3. The method according to claim 2, wherein determining, after the acquired continuous video frames reach the preset number, whether a standing or sitting behavior occurs at the target position according to the change of the sitting/standing state at that position comprises:
matching each target frame detected in the current video frame against each target frame detected in the previous video frame to obtain, for every target frame in the current frame, its matching target frame in the previous frame, wherein matched target frames indicate the target position of the same target object in the current and previous video frames;
and judging whether the corresponding target object has performed a standing or sitting behavior according to the change of the labels of the target frames indicating that object's target position over the preset number of consecutive video frames.
4. The method according to claim 3, wherein matching each target frame detected in the current video frame against each target frame detected in the previous video frame comprises:
acquiring the lower-left corner coordinates of each target frame in the current and the previous video frame;
calculating the Euclidean distances between the lower-left corner coordinates of each target frame in the current video frame and those of each target frame in the previous video frame;
and determining, for each target frame in the current video frame, the target frame in the previous video frame with the shortest Euclidean distance to it as its matching target frame.
5. The method according to claim 3, wherein judging whether the corresponding target object has performed a standing or sitting behavior according to the change of the labels of the target frames indicating the same target object's position over the preset number of consecutive video frames comprises:
if the label of the target frame indicating a target object's position starts from the sitting state and n frames among m preset consecutive video frames indicate the standing state, judging that the target object indicated by that target frame has performed a standing behavior;
and, while the standing behavior is maintained, if n frames among m preset consecutive video frames indicate that the target frame is in the sitting state, judging that the standing behavior is finished and that the target object indicated by the target frame has performed a sitting behavior, where m > n and n/m exceeds a preset ratio.
6. The method according to claim 2, wherein after determining whether the target object has performed a standing or sitting behavior, the method further comprises:
if the current target object is determined to have performed a standing behavior, outputting the position coordinates of the target frame corresponding to it;
wherein the position coordinates of the target frame are used to indicate the close-up switching position.
7. A standing behavior detection apparatus, the apparatus comprising:
an image acquisition module, used to acquire continuous video frames;
a target detection module, used to perform target detection on the continuous video frames through a pre-trained target detection network model to obtain the target position corresponding to each target object in each video frame and the sitting/standing state at that position;
and a behavior judgment module, used to determine, after the acquired continuous video frames reach a preset number, whether a target object has performed a standing or sitting behavior according to the change of the sitting/standing state at its target position.
8. The apparatus of claim 7, wherein the behavior determination module comprises:
the matching operation unit is used for performing information matching operation on each target frame detected in the current frame video image and each target frame detected in the previous frame video image respectively to obtain target frames matched with each target frame in the current frame video image in the previous frame video image respectively; the matched target frame indicates the target position corresponding to the same target object in the current frame video image and the previous frame video image;
and the position judging unit is used for judging whether the corresponding target object has a standing behavior or a sitting behavior according to the change of the target frame label indicating the same target position in the preset continuous frame video images.
9. A system for detection of standing behavior, the system comprising a processor and a memory, the memory having stored therein a computer program, characterized in that the computer program is loaded and executed by the processor to carry out the steps of the method for detection of standing behavior according to any of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the storage medium has stored therein a program for implementing the steps of the standing behavior detection method according to any one of claims 1 to 6 when executed by a processor.
CN202011583301.3A 2020-12-28 2020-12-28 Method, device and system for detecting standing behavior and storage medium Pending CN112580584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011583301.3A CN112580584A (en) 2020-12-28 2020-12-28 Method, device and system for detecting standing behavior and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011583301.3A CN112580584A (en) 2020-12-28 2020-12-28 Method, device and system for detecting standing behavior and storage medium

Publications (1)

Publication Number Publication Date
CN112580584A 2021-03-30

Family

ID=75140433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011583301.3A Pending CN112580584A (en) 2020-12-28 2020-12-28 Method, device and system for detecting standing behavior and storage medium

Country Status (1)

Country Link
CN (1) CN112580584A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392776A (en) * 2021-06-17 2021-09-14 深圳市千隼科技有限公司 Seat leaving behavior detection method and storage device combining seat information and machine vision
CN114222065A (en) * 2021-12-20 2022-03-22 北京奕斯伟计算技术有限公司 Image processing method, image processing apparatus, electronic device, storage medium, and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780565A (en) * 2016-11-15 2017-05-31 天津大学 A kind of many students based on light stream and k means clusters rise and sit detection method
CN110728696A (en) * 2019-09-06 2020-01-24 天津大学 Student standing detection method of recording and broadcasting system based on background modeling and optical flow method
CN110738109A (en) * 2019-09-10 2020-01-31 浙江大华技术股份有限公司 Method, device and computer storage medium for detecting user standing
CN111104816A (en) * 2018-10-25 2020-05-05 杭州海康威视数字技术股份有限公司 Target object posture recognition method and device and camera

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780565A (en) * 2016-11-15 2017-05-31 天津大学 A kind of many students based on light stream and k means clusters rise and sit detection method
CN111104816A (en) * 2018-10-25 2020-05-05 杭州海康威视数字技术股份有限公司 Target object posture recognition method and device and camera
CN110728696A (en) * 2019-09-06 2020-01-24 天津大学 Student standing detection method of recording and broadcasting system based on background modeling and optical flow method
CN110738109A (en) * 2019-09-10 2020-01-31 浙江大华技术股份有限公司 Method, device and computer storage medium for detecting user standing

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392776A (en) * 2021-06-17 2021-09-14 深圳市千隼科技有限公司 Seat leaving behavior detection method and storage device combining seat information and machine vision
CN113392776B (en) * 2021-06-17 2022-07-12 深圳日海物联技术有限公司 Seat leaving behavior detection method and storage device combining seat information and machine vision
CN114222065A (en) * 2021-12-20 2022-03-22 北京奕斯伟计算技术有限公司 Image processing method, image processing apparatus, electronic device, storage medium, and program product
CN114222065B (en) * 2021-12-20 2024-03-08 北京奕斯伟计算技术股份有限公司 Image processing method, image processing apparatus, electronic device, storage medium, and program product

Similar Documents

Publication Publication Date Title
US11538246B2 (en) Method and apparatus for training feature extraction model, computer device, and computer-readable storage medium
CN110610510B (en) Target tracking method and device, electronic equipment and storage medium
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN106874826A (en) Face key point-tracking method and device
CN109345553B (en) Palm and key point detection method and device thereof, and terminal equipment
CN109740668B (en) Deep model training method and device, electronic equipment and storage medium
CN112115894B (en) Training method and device of hand key point detection model and electronic equipment
CN111612822B (en) Object tracking method, device, computer equipment and storage medium
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN112580584A (en) Method, device and system for detecting standing behavior and storage medium
CN115471662B (en) Training method, recognition method, device and storage medium for semantic segmentation model
CN113159200B (en) Object analysis method, device and storage medium
CN103105924A (en) Man-machine interaction method and device
CN112001944A (en) Classroom teaching quality evaluation data acquisition method, computer equipment and medium
CN110969045A (en) Behavior detection method and device, electronic equipment and storage medium
CN115907507B (en) Student class behavior detection and learning analysis method combined with class scene
CN109584262A (en) Cloud detection method of optic, device and electronic equipment based on remote sensing image
CN115187772A (en) Training method, device and equipment of target detection network and target detection method, device and equipment
CN116863286A (en) Double-flow target detection method and model building method thereof
CN113947613B (en) Target area detection method, device, equipment and storage medium
CN111589138A (en) Action prediction method, device, equipment and storage medium
CN111538852A (en) Multimedia resource processing method, device, storage medium and equipment
CN112153320B (en) Method and device for measuring size of article, electronic equipment and storage medium
CN111652168B (en) Group detection method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210330