CN111353452A - Behavior recognition method, behavior recognition device, behavior recognition medium and behavior recognition equipment based on RGB (red, green and blue) images

Info

Publication number
CN111353452A
Authority
CN
China
Prior art keywords
image
behavior recognition
frame
rgb
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010151359.4A
Other languages
Chinese (zh)
Inventor
熊德智
陈向群
胡军华
柳青
刘小平
杨茂涛
黄瑞
温和
欧阳黎
陈浩
曾文伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Hunan Electric Power Co Ltd
Metering Center of State Grid Hunan Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Hunan Electric Power Co Ltd
Metering Center of State Grid Hunan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Hunan Electric Power Co Ltd, Metering Center of State Grid Hunan Electric Power Co Ltd
Priority to CN202010151359.4A
Publication of CN111353452A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06398 Performance of employee with respect to a job function
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior recognition method, device, medium and equipment based on RGB images. It belongs to the technical field of behavior recognition and addresses the technical problem that service settings currently lack intelligent recognition and analysis of compliance with behavior standards. The method comprises the following steps: 1) preprocessing the RGB image, segmenting the region occupied by a worker, and capturing or tracking the target; 2) extracting image feature parameters and feeding them into a recurrent neural network to obtain a mapping from the image feature parameters to high-dimensional vectors; 3) on the basis of the high-dimensional vectors of the video frames, establishing a classifier model that maps the high-dimensional vectors to the final irregular behavior categories, and training the classifier model; 4) acquiring RGB images from the surveillance video and recognizing the behavior of service personnel in the power supply business hall with the trained classifier model. The invention offers intelligent recognition and analysis of service personnel behavior, high recognition accuracy, and improved working efficiency and service level.

Description

Behavior recognition method, behavior recognition device, behavior recognition medium and behavior recognition equipment based on RGB (red, green and blue) images
Technical Field
The invention mainly relates to the technical field of behavior recognition, in particular to a behavior recognition method, a behavior recognition device, a behavior recognition medium and behavior recognition equipment based on RGB images.
Background
The power supply business hall is the most important service window of a power supply enterprise and performs the important social functions of communicating, presenting and promoting the enterprise's image. As the enterprise's front-line window, it represents the image of the power supply enterprise. Customers who come to the business hall to handle electricity-related business deal first with its service staff. The service skills of the staff, and the manner in which they treat customers, therefore often determine how customers perceive the service level of the enterprise. Lax or listless behavior by some staff, such as playing with mobile phones during working hours, sleeping, or a poor attitude, can leave customers with an extremely bad impression. Moreover, microblogs and other self-media are widely used in the information age, and if a dissatisfied customer posts such information on the internet, the enterprise can easily suffer damage to its image and substantial economic loss. At present, power supply business hall services have a well-developed system of standards, but the standards are often not fully implemented and are hard to supervise; on-site inspection by the competent department alone cannot provide effective supervision and control. It is therefore necessary to study intelligent recognition, analysis and early warning of compliance with business hall behavior standards, and to explore and establish demonstration projects.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a behavior recognition method, device, medium and equipment based on RGB images that are simple to operate, achieve high recognition accuracy, and improve working efficiency and service level.
To solve the above technical problem, the technical solution provided by the invention is as follows:
A behavior recognition method based on RGB images, characterized by comprising the following steps:
1) preprocessing an RGB image in a video frame, segmenting the region occupied by a worker, and capturing or tracking the target;
2) extracting image feature parameters from the preprocessed RGB image and feeding them into a recurrent neural network to obtain a mapping from the image feature parameters to high-dimensional vectors;
3) on the basis of the high-dimensional vectors of the video frames, establishing a final classifier model that maps the high-dimensional vectors to the final irregular behavior categories, and training the classifier model;
4) acquiring RGB images from the surveillance video and recognizing the behavior of service personnel in the power supply business hall with the trained classifier model.
As a further improvement of the above technical solution, step 3) specifically comprises:
3.1) each unit of video frames used for feature extraction in step 2) is called a segment, and the high-dimensional vector output for each segment is recorded as a segment action score (SAS); for a video containing T frames, an SAS feature sequence of the same length is finally obtained;
3.2) the feature sequence of length T is used as the input of the SSAD model; the SSAD model is a network composed entirely of temporal convolutions and mainly comprises three kinds of convolutional layers: base layers, anchor frame layers and prediction layers, wherein the base layers shorten the feature sequence and enlarge the receptive field of each position in it;
3.3) the anchor frame layers in the SSAD model continue to reduce the length of the feature sequence, and each position in the feature sequence output by an anchor frame layer is associated with anchor frame instances of multiple scales;
3.4) the coordinate offsets, overlap confidence and classification result corresponding to each anchor frame instance are obtained through the prediction layers;
3.5) through several layers of feature sequences with progressively decreasing temporal length, the SSAD model obtains action instance predictions at each temporal scale, from small to large, and the final classifier model is established.
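Step 3.3) can be illustrated with a minimal Python sketch. The source does not state the anchor scales, so the base scale, the scale ratios and the normalised (center, width) representation below are assumptions chosen only for illustration, and `anchors_for_layer` is a hypothetical helper rather than part of the disclosed method.

```python
def anchors_for_layer(seq_len, base_scale, ratios):
    """Associate every position of a feature sequence with multi-scale anchors.

    Each anchor is a (center, width) pair in normalised video time [0, 1].
    base_scale and ratios are illustrative assumptions, not values from the source.
    """
    anchors = []
    for pos in range(seq_len):
        center = (pos + 0.5) / seq_len      # middle of the cell covered by this position
        for ratio in ratios:
            anchors.append((center, base_scale * ratio))
    return anchors


# A shorter feature sequence would be paired with a larger base scale,
# so that deeper layers cover longer actions.
layer_anchors = anchors_for_layer(seq_len=4, base_scale=0.25, ratios=(0.5, 1.0, 2.0))
```

With 4 positions and 3 ratios the layer yields 12 anchor instances.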
As a further improvement of the above technical solution, step 3) further comprises training the classifier model:
the obtained anchor frames are corrected using the coordinate offsets, and each corrected anchor frame is matched against the labelled instances to determine whether the anchor frame instance is a positive or a negative sample; the SSAD model is trained with a loss function comprising a classification loss $L_{class}$, an overlap confidence regression loss $L_{over}$, a boundary regression loss $L_{loc}$ and a regularization term $L_2(\Theta)$:
$$L = L_{class} + \alpha \cdot L_{over} + \beta \cdot L_{loc} + \lambda \cdot L_2(\Theta)$$
where $\alpha$, $\beta$ and $\lambda$ are coefficients;
during testing, the obtained anchor frame instances are corrected using the coordinate offsets, and the final classification result of each anchor frame instance is then obtained.
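The matching and loss combination described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how anchor frames are matched to labelled instances, so temporal intersection-over-union with a threshold of 0.5 is assumed here, the regularization term is assumed to be a sum of squared parameters, and all function names and numeric values are hypothetical.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two temporal intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def is_positive(anchor, labelled_instances, iou_threshold=0.5):
    """Assumed matching rule: positive if the best IoU exceeds the threshold."""
    best = max((temporal_iou(anchor, g) for g in labelled_instances), default=0.0)
    return best > iou_threshold


def total_loss(l_class, l_over, l_loc, params, alpha, beta, lam):
    """L = L_class + alpha * L_over + beta * L_loc + lambda * L_2(Theta)."""
    l2 = sum(p * p for p in params)  # assumed form of the regularization term
    return l_class + alpha * l_over + beta * l_loc + lam * l2


# Hypothetical values, for illustration only.
positive = is_positive((0.0, 4.0), [(0.0, 5.0)])
loss = total_loss(1.0, 2.0, 3.0, params=[1.0, 2.0], alpha=0.5, beta=0.5, lam=0.1)
```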
As a further improvement of the above technical solution, in step 4), after all predicted action instances of a video have been obtained, a non-maximum suppression (NMS) algorithm is used to remove overlapping duplicate predictions, yielding the final temporal action detection result.
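A minimal sketch of greedy non-maximum suppression over temporal predictions is given below. The (start, end, score) representation and the overlap threshold of 0.5 are assumptions for illustration; the source does not specify them.

```python
def _iou(a, b):
    """Temporal IoU of two predictions; only start and end are used."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(predictions, iou_threshold=0.5):
    """Keep the highest-scoring prediction among each group of overlapping ones.

    predictions: list of (start, end, score) tuples.
    """
    kept = []
    for cand in sorted(predictions, key=lambda p: p[2], reverse=True):
        # A candidate survives only if it does not overlap a kept prediction too much.
        if all(_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


# Hypothetical predictions: the second one heavily overlaps the first.
result = temporal_nms([(0, 10, 0.9), (1, 11, 0.8), (20, 30, 0.7)])
```

Here the second prediction is suppressed because its overlap with the first (9/11) exceeds the threshold.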
As a further improvement of the above technical solution, in step 2), the image feature parameters in the RGB image are extracted by a C3D model; the C3D model comprises 8 convolution operations and 5 pooling operations; all convolution kernels are 3 × 3 × 3 in size with a stride of 1 × 1 × 1; the pooling kernels are 2 × 2 × 2 in size with a stride of 2 × 2 × 2, except for the first pooling, whose size and stride are both 1 × 2 × 2 so that the temporal length is not reduced too early; finally, after two fully connected layers, a 4096-dimensional high-dimensional vector is obtained.
As a further improvement of the above technical solution, in step 1), preprocessing the video frame specifically comprises: segmenting the region occupied by a worker with a background extraction algorithm, using a voting algorithm and connected-domain computation to locate the target region, and capturing or tracking the target, so as to finally obtain an image containing only a single target; the motion region in the image is extracted either by subtracting the pixel values of two adjacent frames, or of two frames several frames apart, in the video stream and thresholding the difference image, or by computing the difference between the currently acquired image frame and a background image to obtain a grayscale image of the target motion region and thresholding that grayscale image, the background image being updated according to the currently acquired image frame.
As a further improvement of the above technical solution, in step 1), preprocessing the video frame further comprises calibrating the specific start frame and end frame of an irregular behavior for recognition, the specific process being: extracting a feature sequence of the video frames; generating several proposals of different sizes at each position in the video with a sliding-window mechanism; training an action classifier and a ranker on the proposals to classify and rank them; and fine-tuning the action boundaries in temporal action detection with the CDC algorithm to make them more accurate.
The invention also discloses a behavior recognition device based on RGB images, comprising:
a preprocessing unit for preprocessing an RGB image in a video frame, segmenting the region occupied by a worker, and capturing or tracking the target;
a feature extraction module for extracting image feature parameters from the preprocessed RGB image and feeding them into a recurrent neural network to obtain a mapping from the image feature parameters to high-dimensional vectors;
a classifier model establishing and training module for establishing, on the basis of the high-dimensional vectors of the video frames, a final classifier model that maps the high-dimensional vectors to the final irregular behavior categories, and for training the classifier model;
and a behavior recognition module for acquiring RGB images from the surveillance video and recognizing the behavior of service personnel in the power supply business hall with the trained classifier model.
The invention further discloses a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the RGB image-based behavior recognition method described above.
The invention also discloses a computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the computer program, when executed by the processor, performs the steps of the RGB image-based behavior recognition method described above.
Compared with the prior art, the invention has the advantages that:
(1) The behavior recognition method based on RGB images uses RGB-image behavior recognition technology to extract features and feeds them into a recurrent neural network to obtain a mapping from the image feature parameters to high-dimensional vectors; on the basis of the high-dimensional vectors of the video frames, a final classifier model is established that maps the high-dimensional vectors to the final irregular behavior categories, and the classifier model is trained; RGB images are then acquired from the surveillance video, and the behavior of service personnel in the power supply business hall is recognized with the trained classifier model, which is simple to operate and achieves high recognition accuracy; with this method the administrative department no longer needs to make frequent on-site inspections and can instead check the working condition of service personnel through the surveillance information, which greatly improves efficiency; and personalized training can be provided according to the service level and shortcomings of the staff of different business halls, based on the business hall surveillance information.
(2) The method extracts the motion region with a frame difference method or a background difference method, which is simple to operate and not easily affected by ambient light; the background difference method performs motion segmentation of a static scene: specifically, the currently acquired image frame is differenced with a background image to obtain a grayscale image of the target motion region, the grayscale image is thresholded to extract the motion region, and the background image is updated according to the currently acquired image frame to avoid the influence of changes in ambient illumination; alternatively, different algorithms are applied separately to the surveillance video frames, and operations such as a voting algorithm and connected-domain computation for locating the target region further improve segmentation accuracy, finally yielding an image containing only a single target; combining the models further improves performance.
(3) The method extracts a feature sequence of the video frames, generates several proposals of different sizes at each position in the video with a sliding-window mechanism, trains an action classifier and a ranker on the proposals to classify and rank them, and fine-tunes the action boundaries in temporal action detection with the CDC algorithm to make them more accurate.
Drawings
FIG. 1 is a flow chart of an embodiment of the method of the present invention.
Fig. 2a is a schematic diagram of a single frame 2D convolution.
Fig. 2b is a schematic diagram of a 2D convolution of multiple frames.
Fig. 2c is a schematic diagram of the 3D convolution.
Fig. 3 is a schematic diagram of a C3D network.
FIG. 4 is a schematic diagram of the structure of the SSAD model.
Detailed Description
The invention is further described below with reference to the figures and the specific embodiments of the description.
As shown in fig. 1, the behavior recognition method based on RGB images of this embodiment is applied to behavior recognition of service personnel in a power supply business hall, and specifically includes the following steps:
1) preprocessing an RGB image in a video frame, segmenting the region occupied by a worker, and capturing or tracking the target;
2) extracting image feature parameters from the preprocessed RGB image and feeding them into a recurrent neural network to obtain a mapping from the image feature parameters to high-dimensional vectors;
3) on the basis of the high-dimensional vectors of the video frames, establishing a classifier model that maps the high-dimensional vectors to the final irregular behavior categories, and training the classifier model;
4) acquiring RGB images from the surveillance video and recognizing the behavior of service personnel in the power supply business hall with the trained classifier model.
The behavior recognition method based on RGB images uses RGB-image behavior recognition technology to extract features and feeds them into a recurrent neural network to obtain a mapping from the image feature parameters to high-dimensional vectors; on the basis of the high-dimensional vectors of the video frames, a final classifier model is established that maps the high-dimensional vectors to the final irregular behavior categories, and the classifier model is trained; RGB images are then acquired from the surveillance video, and the behavior of service personnel in the power supply business hall is recognized with the trained classifier model, which is simple to operate and achieves high recognition accuracy; with this method the administrative department no longer needs to make frequent on-site inspections and can instead check the working condition of service personnel through the surveillance information, which greatly improves efficiency; and personalized training can be provided according to the service level and shortcomings of the staff of different business halls, based on the business hall surveillance information.
In this embodiment, since the surveillance video often contains many people, preprocessing the video frame specifically comprises: segmenting the region occupied by each worker with a background extraction algorithm, using a voting algorithm and connected-domain computation to locate the target region, and capturing or tracking the target, finally obtaining an image containing only a single target and laying the foundation for subsequent classification and for behavior analysis and understanding.
Specifically, the background extraction algorithm (or object detection algorithm) may be an optical flow method, a frame difference method, a background difference method, ViBe, or the like. The frame difference method (inter-frame difference method) subtracts the pixel values of two images that are adjacent, or several frames apart, in the video stream and thresholds the difference image to extract the motion region. If the two subtracted frames are the k-th and (k+1)-th frames, denoted $f_k(x, y)$ and $f_{k+1}(x, y)$, the binarization threshold of the difference image is $T$, and the difference image is denoted $D(x, y)$, the inter-frame difference method is given by:
$$D(x, y) = \begin{cases} 1, & |f_{k+1}(x, y) - f_k(x, y)| > T \\ 0, & \text{otherwise} \end{cases}$$
The algorithm is simple and not easily affected by ambient light.
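The inter-frame difference can be sketched in a few lines of Python. The frames are represented as nested lists of gray values, and the frame contents and threshold below are hypothetical values chosen only to show the thresholding.

```python
def frame_difference(frame_k, frame_k1, threshold):
    """Binary motion mask: 1 where |f_{k+1}(x, y) - f_k(x, y)| > threshold, else 0."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_k, row_k1)]
        for row_k, row_k1 in zip(frame_k, frame_k1)
    ]


# Two hypothetical 3x3 grayscale frames; only the center pixel changes strongly.
f_k = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
f_k1 = [[10, 12, 10], [10, 90, 10], [10, 10, 10]]
mask = frame_difference(f_k, f_k1, threshold=25)
```

The small change of 2 gray levels stays below the threshold, so only the center pixel is marked as motion.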
The background difference method performs motion segmentation of a static scene. Specifically, the currently acquired image frame is differenced with a background image to obtain a grayscale image of the target motion region, the grayscale image is thresholded to extract the motion region, and the background image is updated according to the currently acquired image frame so as to avoid the influence of changes in ambient illumination. Background difference methods vary in their foreground detection, background maintenance and post-processing. If $I_t$ and $B_t$ are the current frame and the background frame respectively, and $T$ is the foreground gray threshold, one possible procedure is:
the average of the first several frames is taken as the initial background image $B_t$;
the gray values of the current frame and the background image are subtracted and the absolute value is taken, i.e. $|I_t(x, y) - B_t(x, y)|$;
for a pixel $(x, y)$ of the current frame, if $|I_t(x, y) - B_t(x, y)| > T$, the pixel is a foreground point;
morphological operations (erosion, dilation, opening, closing and the like) are applied to the foreground pixel map;
the background image is updated with the current frame image. The method is simple and overcomes the influence of ambient light to a certain extent.
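The procedure above can be sketched as follows, with the morphological step omitted for brevity. The source says only that the background is updated with the current frame; the running-average update and its rate are assumptions made here for illustration, and the function names are hypothetical.

```python
def mean_background(frames):
    """Initial background: per-pixel average of the given frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold):
    """1 where |I_t(x, y) - B_t(x, y)| > threshold, else 0."""
    return [[1 if abs(i - b) > threshold else 0 for i, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def update_background(background, frame, rate=0.05):
    """Assumed update rule: blend the current frame into the background."""
    return [[(1 - rate) * b + rate * i for i, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


# Hypothetical 1x2 frames, for illustration only.
background = mean_background([[[10, 10]], [[20, 10]]])
mask = foreground_mask([[100, 12]], background, threshold=30)
background = update_background(background, [[100, 12]], rate=0.5)
```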
ViBe is an algorithm for pixel-level video background modeling, or foreground detection, that uses little memory. It differs from other algorithms mainly in its background model update strategy: the sample to be replaced is selected at random, and a neighboring pixel is selected at random for updating. When the model governing pixel change cannot be determined, this random update strategy can, to some extent, simulate the uncertainty of pixel change. ViBe stores a sample set for every pixel; the values stored in the sample set are past values of that pixel and values of its neighboring pixels. Each new pixel value in subsequent frames is compared with the historical values in the sample set to decide whether it is a background point. In the model, the background consists of stationary or very slowly moving objects, and the foreground consists of objects that move relative to the background. Background extraction can therefore be regarded as a classification problem: while traversing the pixels, each pixel is determined to be a foreground point or a background point. In the ViBe model, the sample set stored for each pixel generally has 20 samples. For a new frame, when a pixel's value is close to the sampled values in that pixel's sample set, the pixel can be judged to be a background point.
Expressed as formulas:
$v(x, y)$: the current pixel value at pixel $(x, y)$;
$M(x, y) = \{v_1(x, y), v_2(x, y), \ldots, v_N(x, y)\}$: the background sample set of pixel $(x, y)$, of size $N$;
$R$: the tolerance range above and below a value;
$v(x, y)$ is subtracted from every sample value in $M(x, y)$, and the number of differences falling within $\pm R$ is $N_b$; if $N_b$ is greater than a given threshold $min$, the current pixel value is similar to several values in the pixel's historical samples, and $(x, y)$ is considered a background point.
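The background test just described can be sketched for a single pixel. The sample values, the range R and the threshold used below are hypothetical, and the comparison follows the text literally (strictly greater than the threshold).

```python
def is_background(value, samples, radius, min_matches):
    """ViBe-style test for one pixel.

    Counts the samples whose difference from the current value lies within
    +/- radius (N_b) and reports background when N_b exceeds min_matches.
    """
    n_b = sum(1 for s in samples if abs(value - s) <= radius)
    return n_b > min_matches


# Hypothetical sample set for one pixel (a real model would hold about 20 values).
samples = [100, 102, 98, 150, 30]
close_pixel = is_background(101, samples, radius=5, min_matches=2)
far_pixel = is_background(200, samples, radius=5, min_matches=2)
```

The value 101 lies within the range of three samples and is classed as background; the value 200 matches none and is classed as foreground.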
Initialization is the process of establishing the background model. A typical detection algorithm must learn from a video sequence of a certain length, which impairs real-time detection, and when the video picture changes abruptly the background model takes a long time to relearn. ViBe instead takes the first frame of the video as the background model and, for each pixel in that frame, randomly samples several surrounding pixels to fill the pixel's sample set, so that the sample set contains the spatio-temporal distribution information of the pixel.
In formulas, writing $x$ for a pixel position and $y$ for one of its neighbors, $M^0(x)$ is the sample set of pixel $x$ in the initial background model, $N_G(x)$ is the set of neighboring points of $x$, and $v^0$ denotes pixel values in the initial image, so that:
$$M^0(x) = \{\, v^0(y) \mid y \in N_G(x) \,\}, \quad t = 0$$
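A minimal sketch of this initialization is given below. The neighborhood is assumed to be the 3 × 3 window around each pixel, clamped at the image border, which the source does not specify; the fixed random seed is used only to make the example repeatable.

```python
import random


def init_vibe_model(first_frame, n_samples=20, rng=None):
    """Fill each pixel's sample set with values drawn at random from its neighborhood.

    The 3x3 neighborhood with border clamping is an assumption for illustration.
    """
    if rng is None:
        rng = random.Random(0)
    height, width = len(first_frame), len(first_frame[0])
    model = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = []
            for _ in range(n_samples):
                ny = min(max(y + rng.choice((-1, 0, 1)), 0), height - 1)
                nx = min(max(x + rng.choice((-1, 0, 1)), 0), width - 1)
                samples.append(first_frame[ny][nx])
            row.append(samples)
        model.append(row)
    return model


# Hypothetical 2x2 first frame.
model = init_vibe_model([[1, 2], [3, 4]], n_samples=5)
```

Because the model is built from a single frame, detection can begin from the second frame.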
Of course, the different algorithms above can each be applied to the surveillance video frames, with operations such as a voting algorithm and connected-domain computation for locating the target region used to further improve segmentation accuracy, finally yielding an image containing only a single target. Combining models further improves performance: for example, the high-dimensional feature vectors finally generated are combined by averaging, weighted averaging, taking the element-wise maximum or concatenation into a composite feature vector that is fed to the classifier. In practice, parameter tuning techniques are also applied to further improve model training efficiency.
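Both kinds of combination can be sketched briefly. The source does not define the voting rule, so a pixel-wise majority vote over the masks produced by the different algorithms is assumed here; the function names and sample values are hypothetical.

```python
def majority_vote(masks):
    """Pixel-wise majority vote over binary masks of equal size (assumed rule)."""
    n = len(masks)
    return [[1 if 2 * sum(pixels) > n else 0 for pixels in zip(*rows)]
            for rows in zip(*masks)]


def fuse(vectors, mode="mean", weights=None):
    """Combine feature vectors by mean, weighted mean, element-wise max or concatenation."""
    if mode == "concat":
        return [v for vec in vectors for v in vec]
    columns = list(zip(*vectors))
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    if mode == "weighted":
        total = sum(weights)
        return [sum(w * v for w, v in zip(weights, c)) / total for c in columns]
    if mode == "max":
        return [max(c) for c in columns]
    raise ValueError("unknown mode: " + mode)


# Hypothetical masks from three algorithms and feature vectors from two models.
voted = majority_vote([[[1, 0]], [[1, 1]], [[0, 0]]])
a, b = [1.0, 4.0], [3.0, 2.0]
fused_mean = fuse([a, b], "mean")
fused_weighted = fuse([a, b], "weighted", weights=[3, 1])
```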
In this embodiment, recognizing an irregular behavior requires calibrating its specific start frame and end frame: a feature sequence of the video frames is extracted; several proposals of different sizes are generated at each position in the video with a sliding-window mechanism; an action classifier and a ranker are trained on the proposals to classify and rank them; and the action boundaries in temporal action detection are fine-tuned with the CDC algorithm to make them more accurate.
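The sliding-window generation of proposals can be sketched as follows. The window sizes and the overlap between consecutive windows are not given in the source; the values below, and the function name, are assumptions for illustration.

```python
def sliding_window_proposals(num_frames, window_sizes, stride_ratio=0.5):
    """Generate (start, end) frame intervals of several sizes across a video.

    end is exclusive. The stride is a fraction of the window size (assumed).
    """
    proposals = []
    for size in window_sizes:
        stride = max(1, int(size * stride_ratio))
        for start in range(0, num_frames - size + 1, stride):
            proposals.append((start, start + size))
    return proposals


# Hypothetical 8-frame video with windows of 4 and 8 frames.
proposals = sliding_window_proposals(num_frames=8, window_sizes=(4, 8))
```

Each proposal would then be scored by the classifier and ranker before boundary refinement.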
In this embodiment, a C3D model is used to extract features, and the output of the fully connected layer is fed to the subsequent classifier. Convolutional neural networks (CNNs) have been widely used in computer vision in recent years for tasks such as classification, detection and segmentation. These tasks are typically performed on images using two-dimensional convolution (i.e. the convolution kernel is two-dimensional). For video analysis problems, two-dimensional convolution cannot capture temporal information well, which is why three-dimensional convolution was proposed. The C3D model was proposed as a general-purpose network that can be used for behavior recognition, scene recognition, video similarity analysis and other fields.
As shown in fig. 2a and fig. 2b, when 2D convolution is applied to a single-channel image or a multi-channel image (where the multiple channels may be the 3 color channels of one picture or several stacked pictures, i.e. a short video segment), the output for one filter is a two-dimensional feature map, and the information across the channels is completely compressed. In contrast, the output of the 3D convolution in fig. 2c is still a 3D feature map. The value at position (x, y, z) of the jth feature map in the ith layer is given by the following formula:
[Formula: figure BDA0002402552710000071]
where Ri is the size of the 3D convolution kernel in the temporal dimension, and the quantity shown in [figure BDA0002402552710000072] is the value at position (p, q, r) of the convolution kernel connected to the mth feature map of the previous layer. Consider a video segment input of size c × l × h × w, where c is the number of image channels (typically 3), l is the length of the video sequence, and h and w are the height and width of the video, respectively. Applying a 3D convolution with kernel size 3 × 3 × 3, stride 1, edge padding and K filters gives an output of size K × l × h × w, after which pooling is performed.
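The formula itself is only available as an image. As a hedged reconstruction, the standard 3D convolution expression that matches the symbols described above can be written as follows. The activation function f, the bias b_ij, the spatial kernel sizes P_i and Q_i, and the weight symbol w are notational assumptions, not the patent's confirmed notation.

```latex
v_{ij}^{xyz} = f\!\left( b_{ij}
  + \sum_{m} \sum_{p=0}^{P_i-1} \sum_{q=0}^{Q_i-1} \sum_{r=0}^{R_i-1}
    w_{ijm}^{pqr} \, v_{(i-1)m}^{(x+p)(y+q)(z+r)} \right)
```

Here v_{ij}^{xyz} is the value at position (x, y, z) of the jth feature map in the ith layer, the sum over m runs over the feature maps of the previous layer, and w_{ijm}^{pqr} is the kernel value at position (p, q, r) connected to the mth of those feature maps.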
A C3D-type network is shown in fig. 3; it has 8 convolution operations and 5 pooling operations. The convolution kernels are all 3 × 3 × 3 with a stride of 1 × 1 × 1. The number below each layer name is its number of convolution kernels. The pooling kernels are 2 × 2 × 2 with a stride of 2 × 2 × 2, except for the first pooling layer, whose kernel size and stride are both 1 × 2 × 2; this avoids shortening the temporal dimension too early. After two fully connected layers, the network finally outputs a 4096-dimensional high-dimensional feature vector.
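The effect of the five pooling stages on the clip size can be traced with a small sketch. It assumes padded stride-1 convolutions that preserve size, a first pooling kernel of 1 × 2 × 2 (temporal axis untouched), the remaining kernels of 2 × 2 × 2, and no padding in pooling. The 16 × 112 × 112 clip used in the usage note is an illustrative input, not a value from the source.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (temporal length, height, width)

# Pooling kernels in network order; the stride equals the kernel size.
POOL_KERNELS: List[Shape] = [
    (1, 2, 2),  # first pooling: keep the temporal length
    (2, 2, 2),
    (2, 2, 2),
    (2, 2, 2),
    (2, 2, 2),
]


def pool_shape(shape: Shape, kernel: Shape) -> Shape:
    """Output size of non-overlapping pooling without padding (floor division)."""
    return tuple(dim // k for dim, k in zip(shape, kernel))


def trace_shapes(clip: Shape) -> List[Shape]:
    """Sizes after each pooling stage; the convolutions are assumed size-preserving."""
    shapes = []
    current = clip
    for kernel in POOL_KERNELS:
        current = pool_shape(current, kernel)
        shapes.append(current)
    return shapes
```

For example, `trace_shapes((16, 112, 112))` shows the temporal length staying at 16 after the first pooling and halving only from the second stage on, which is the behavior the text attributes to the first pooling layer.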
In this embodiment, in step 3), the classifier model uses softmax and a multi-class support vector machine (multi-class SVM) to establish a mapping from the high-dimensional vector to the final category. The specific construction process is as follows:
3.1) each unit of video frames used for feature extraction in step 2) is called a segment, and the high-dimensional vector output for each segment is recorded as its segment action score (SAS); for a video containing T frames, an SAS feature sequence of equal length is finally obtained;
3.2) the feature sequence of length T is used as the input of the SSAD model; the SSAD model is a network composed entirely of temporal convolutions and mainly comprises three kinds of convolution layers: base layers, anchor layers and prediction layers; the base layers shorten the feature sequence and enlarge the receptive field of each position in it;
3.3) the anchor layers of the SSAD model continue to shorten the feature sequence, and each position in the feature sequence output by an anchor layer is associated with anchor instances of multiple scales;
3.4) the prediction layer outputs the coordinate offsets, overlap confidence and classification result corresponding to each anchor instance;
3.5) through several layers of feature sequences of progressively decreasing temporal length, the SSAD model obtains action instance predictions at each time scale from small to large, which establishes the final classifier model.
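Step 3.3) can be illustrated by generating default anchor instances for one anchor-layer output. This sketch assumes a normalized time axis and a hypothetical list of scale ratios; the source does not give concrete scales, and the function name is introduced here for illustration only.

```python
from typing import List, Tuple


def anchor_instances(seq_len: int,
                     scale_ratios: List[float]) -> List[Tuple[float, float]]:
    """Default (center, width) anchors in units normalized to the video length."""
    base_width = 1.0 / seq_len  # each position covers 1/seq_len of the video
    anchors = []
    for i in range(seq_len):
        center = (i + 0.5) / seq_len
        for ratio in scale_ratios:
            # One anchor per scale ratio at every position.
            anchors.append((center, base_width * ratio))
    return anchors
```

Calling this for progressively shorter sequences (for example lengths 16, 8 and 4, chosen here only as an example) gives progressively wider anchors, which is how the shrinking feature sequences cover action instances from small to large time scales.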
In this embodiment, step 3) further includes training the classifier model:
the obtained anchor instances are corrected using the coordinate offsets, and the corrected anchor instances are matched against the labeled instances to determine whether each anchor instance is a positive or a negative sample. The SSAD model is trained using a loss function comprising the classification loss $L_{class}$, the overlap confidence regression loss $L_{over}$, the boundary regression loss $L_{loc}$ and the regularization term $L_2(\Theta)$:
$L = L_{class} + \alpha \cdot L_{over} + \beta \cdot L_{loc} + \lambda \cdot L_2(\Theta)$
where α, β and λ are coefficients;
during testing, the obtained anchor instances are likewise corrected using the coordinate offsets, and the final classification result of each anchor instance is then obtained.
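The matching step and the combined loss can be sketched as below. The temporal IoU criterion, the 0.5 matching threshold, the default coefficient values and the sum-of-squares form of the regularization term are assumptions made for illustration; the source only states that the four terms are combined with coefficients α, β and λ.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) on a normalized time axis


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def is_positive(anchor: Segment, labels: Sequence[Segment],
                threshold: float = 0.5) -> bool:
    """An anchor is a positive sample if it overlaps some label enough."""
    return any(temporal_iou(anchor, label) > threshold for label in labels)


def total_loss(l_class: float, l_over: float, l_loc: float,
               params: Sequence[float], alpha: float = 1.0,
               beta: float = 1.0, lam: float = 1e-4) -> float:
    """L = L_class + alpha * L_over + beta * L_loc + lam * L2(params)."""
    l2 = sum(p * p for p in params)  # assumed form of the regularization term
    return l_class + alpha * l_over + beta * l_loc + lam * l2
```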
In this embodiment, in step 4), after all predicted action instances for a video have been obtained, a non-maximum suppression (NMS) algorithm removes duplicate overlapping predictions to give the final temporal action detection result.
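A minimal sketch of non-maximum suppression on temporal predictions follows. The greedy, score-sorted variant and the 0.5 IoU threshold are assumptions; the source does not state which variant or threshold is used.

```python
from typing import List, Tuple

Prediction = Tuple[float, float, float]  # (start, end, score)


def _iou(a: Prediction, b: Prediction) -> float:
    """Temporal intersection over union, ignoring the scores."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(preds: List[Prediction],
                 iou_threshold: float = 0.5) -> List[Prediction]:
    """Keep the highest-scoring prediction among heavily overlapping ones."""
    kept: List[Prediction] = []
    for cand in sorted(preds, key=lambda p: p[2], reverse=True):
        # Accept a candidate only if it does not overlap a kept one too much.
        if all(_iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept
```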
In this embodiment, the staff of the power supply business hall fall mainly into two categories, leaders and service staff; the two roles share common behavior specifications, and each also has its own specific ones. The following table lists the main non-standard behaviors for the two roles. For each role, a six-class classifier is trained, covering 5 non-standard behaviors plus normal behavior, as shown in Table 1:
Table 1:
[Table 1: figure BDA0002402552710000081]
The categories of non-standard behavior are defined according to the service specification manual of the power supply business hall; representative categories are selected for model training, and a grade is specified for each category. The statistics for each service person are reported to a manager at regular intervals; a program calculates a service standard coefficient from each person's frequency of each non-standard behavior and the grade of that behavior, using a set formula, and issues an early warning if the coefficient exceeds a set threshold. In addition, the non-standard behaviors of service personnel are analyzed in the cloud, their frequency of occurrence and proportion are counted, a training classroom is established, and training courses with corresponding weights are assigned according to each service person's non-standard behavior statistics; demonstration projects are also established, so that personalized training is achieved.
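The source does not disclose the formula for the service standard coefficient. The sketch below therefore uses a hypothetical grade-weighted sum of behavior counts, purely to show how frequencies, grades, a threshold and training-course weights could fit together. All names, the behavior labels and the formula itself are assumptions.

```python
from typing import Dict


def service_coefficient(counts: Dict[str, int],
                        grades: Dict[str, float]) -> float:
    """Hypothetical coefficient: sum of (occurrences x grade) per behavior."""
    return sum(n * grades[behavior] for behavior, n in counts.items())


def needs_warning(coefficient: float, threshold: float) -> bool:
    """Early warning when the coefficient exceeds the set threshold."""
    return coefficient > threshold


def training_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each behavior in a person's total, used as a course weight."""
    total = sum(counts.values())
    if total == 0:
        return {behavior: 0.0 for behavior in counts}
    return {behavior: n / total for behavior, n in counts.items()}
```

With counts `{"behavior_a": 3, "behavior_b": 1}` and grades `{"behavior_a": 1.0, "behavior_b": 2.0}`, the hypothetical coefficient is 5.0 and three quarters of the training weight goes to the first behavior.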
The invention also discloses a behavior recognition device based on RGB images, comprising:
a preprocessing unit, configured to preprocess the RGB images in video frames, segment the region where a staff member is located, and capture or track the target;
a feature extraction module, configured to extract image feature parameters from the preprocessed RGB images and send them into a recurrent neural network to obtain the mapping between the image feature parameters and high-dimensional vectors;
a classifier model building and training module, configured to build a final classifier model on the basis of the high-dimensional vectors of the video frames, establish a mapping from the high-dimensional vectors to the final non-standard behavior category, and train the classifier model;
and a behavior recognition module, configured to acquire the RGB images in the surveillance video information and recognize the behaviors of service personnel in the power supply business hall based on the trained classifier model.
Specifically, monitoring is carried out by depth cameras arranged in the four directions of the hall and at a 45-degree offset in front of the counter service personnel, so that hall and counter service personnel are monitored in real time. The actions of the service personnel are detected and learned using face recognition and action start/end frame detection; the learning results are compared with a cloud-based library of non-standard action features; and information such as each service person's non-standard action features and early-warning level is recorded and stored in the cloud.
The behavior recognition device based on the RGB image is used for executing the behavior recognition method, has the advantages of the method, and is simple in structure and convenient to operate.
The invention further discloses a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the RGB image-based behavior recognition method as described above.
The invention also discloses a computer device comprising a memory and a processor, wherein the memory is stored with a computer program, and the computer program executes the steps of the behavior recognition method based on the RGB image when being executed by the processor.
All or part of the flow of the methods of the embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and the like. The memory may be used to store computer programs and/or modules, and the processor performs various functions by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory. The memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (10)

1. A behavior recognition method based on RGB images is characterized by comprising the following steps:
1) preprocessing an RGB image in a video frame, segmenting the region of a worker, and capturing or tracking a target;
2) extracting image characteristic parameters in the preprocessed RGB image, and sending the image characteristic parameters into a recurrent neural network to obtain the mapping between the image characteristic parameters and high-dimensional vectors;
3) on the basis of obtaining the high-dimensional vector of the video frame, establishing a classifier model, establishing mapping from the high-dimensional vector to the final irregular behavior category, and training the classifier model;
4) acquiring the RGB images in the surveillance video information, and recognizing the behaviors of service personnel in the power supply business hall based on the trained classifier model.
2. The RGB image-based behavior recognition method according to claim 1, wherein the step 3) is specifically:
3.1) each unit of video frames used for feature extraction in step 2) is called a segment, and the high-dimensional vector output for each segment is recorded as its segment action score (SAS); for a video containing T frames, an SAS feature sequence of equal length is finally obtained;
3.2) the feature sequence of length T is used as the input of the SSAD model; the SSAD model is a network composed of temporal convolutions and comprises three kinds of convolution layers: base layers, anchor layers and prediction layers; the base layers shorten the feature sequence and enlarge the receptive field of each position in it;
3.3) the anchor layers of the SSAD model continue to shorten the feature sequence, and each position in the feature sequence output by an anchor layer is associated with anchor instances of multiple scales;
3.4) the prediction layer outputs the coordinate offsets, overlap confidence and classification result corresponding to each anchor instance;
3.5) through several layers of feature sequences of progressively decreasing temporal length, the SSAD model obtains action instance predictions at each time scale from small to large, which establishes the final classifier model.
3. The RGB image-based behavior recognition method as claimed in claim 2, wherein step 3) further comprises training the classifier model:
the obtained anchor instances are corrected using the coordinate offsets, and the corrected anchor instances are matched against the labeled instances to determine whether each anchor instance is a positive or a negative sample; the SSAD model is trained using a loss function comprising the classification loss $L_{class}$, the overlap confidence regression loss $L_{over}$, the boundary regression loss $L_{loc}$ and the regularization term $L_2(\Theta)$:
$L = L_{class} + \alpha \cdot L_{over} + \beta \cdot L_{loc} + \lambda \cdot L_2(\Theta)$
where α, β and λ are coefficients;
during testing, the obtained anchor instances are likewise corrected using the coordinate offsets, and the final classification result of each anchor instance is then obtained.
4. The method as claimed in claim 3, wherein in step 4), after all predicted action instances for a video have been obtained, a non-maximum suppression algorithm removes duplicate overlapping predictions to give the final temporal action detection result.
5. The behavior recognition method based on RGB images as claimed in any one of claims 1-4, wherein in step 2), the image feature parameters in the RGB images are extracted by a C3D model; the C3D model includes 8 convolution operations and 5 pooling operations; the convolution kernels are all 3 × 3 × 3 with a stride of 1 × 1 × 1; the pooling kernels are 2 × 2 × 2 with a stride of 2 × 2 × 2, except for the first pooling, whose kernel size and stride are both 1 × 2 × 2, so as not to shorten the temporal dimension too early; finally, after two fully connected layers, a 4096-dimensional high-dimensional vector is obtained.
6. The behavior recognition method based on RGB images as claimed in any one of claims 1-4, wherein in step 1), the preprocessing of the video frames specifically comprises: segmenting the region where a staff member is located using a background extraction algorithm, locating the target region using a voting algorithm and connected-component computation, capturing or tracking the target, and finally obtaining an image containing only a single target; wherein the motion region in the image is extracted by subtracting the pixel values of two adjacent frames, or of two images several frames apart in the video stream, and thresholding the difference image; or a difference operation is performed between the currently acquired image frame and a background image to obtain a grayscale image of the target motion region, and the grayscale image is thresholded to extract the motion region, the background image being updated according to the currently acquired image frame.
7. The behavior recognition method based on RGB images as claimed in any one of claims 1-4, wherein in step 1), the preprocessing of the video frames further comprises locating the specific start frame and end frame for recognizing non-standard behavior, the specific process being as follows: a feature sequence is extracted from the video frames; a sliding-window mechanism generates multiple proposals of different sizes at each position in the video; an action classifier and a ranker are then trained to classify and rank the proposals; and the CDC algorithm is used to fine-tune the action boundaries in temporal action detection so that they are more accurate.
8. A behavior recognition device based on RGB images, characterized by comprising:
a preprocessing unit, configured to preprocess the RGB images in video frames, segment the region where a staff member is located, and capture or track the target;
a feature extraction module, configured to extract image feature parameters from the preprocessed RGB images and send them into a recurrent neural network to obtain the mapping between the image feature parameters and high-dimensional vectors;
a classifier model building and training module, configured to build a final classifier model on the basis of the high-dimensional vectors of the video frames, establish a mapping from the high-dimensional vectors to the final non-standard behavior category, and train the classifier model;
and a behavior recognition module, configured to acquire the RGB images in the surveillance video information and recognize the behaviors of service personnel in the power supply business hall based on the trained classifier model.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the RGB image-based behavior recognition method according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the computer program, when executed by the processor, performs the steps of the RGB image-based behavior recognition method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010151359.4A CN111353452A (en) 2020-03-06 2020-03-06 Behavior recognition method, behavior recognition device, behavior recognition medium and behavior recognition equipment based on RGB (red, green and blue) images

Publications (1)

Publication Number Publication Date
CN111353452A true CN111353452A (en) 2020-06-30

Family

ID=71194324

Country Status (1)

Country Link
CN (1) CN111353452A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103699A (en) * 2011-01-27 2011-06-22 华中科技大学 Method for detecting boll opening of cotton based on image detection
CN103150579A (en) * 2013-02-25 2013-06-12 东华大学 Abnormal human behavior detecting method based on video sequence
CN106846358A (en) * 2017-01-13 2017-06-13 西北工业大学深圳研究院 Segmentation of Multi-target and tracking based on the ballot of dense track
CN108764148A (en) * 2018-05-30 2018-11-06 东北大学 Multizone real-time action detection method based on monitor video
CN110108914A (en) * 2019-05-21 2019-08-09 国网湖南省电力有限公司 One kind is opposed electricity-stealing intelligent decision making method, system, equipment and medium
CN110738106A (en) * 2019-09-05 2020-01-31 天津大学 optical remote sensing image ship detection method based on FPGA
CN110619651A (en) * 2019-09-09 2019-12-27 博云视觉(北京)科技有限公司 Driving road segmentation method based on monitoring video

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
TIANWEI LIN ET AL.: "Single Shot Temporal Action Detection", 《HTTP://EXPORT.ARXIV.ORG/ABS/1710.06236V1》, 17 October 2017 (2017-10-17), pages 1 - 9 *
Z. SHOU ET AL.: "CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》, 9 November 2017 (2017-11-09), pages 1 - 10 *
付维娜 等: "《拓扑同构与视频目标跟踪》", 31 May 2018, 西安电子科技大学出版社, pages: 7 - 11 *
孙水发 等: "《视频前景检测及其在水电工程监测中的应用》", 31 December 2014, 国防工业出版社, pages: 69 - 71 *
林阳 等: "利用多种投票策略的水表读数字符分割与识别", 《科学技术与工程》, vol. 17, no. 10, pages 50 - 57 *
鲍蕊 等: "综合聚类和上下文特征的高光谱影像分类", 《武汉大学学报》, vol. 42, no. 7, pages 890 - 896 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434629A (en) * 2020-07-21 2021-03-02 新加坡依图有限责任公司(私有) Online time sequence action detection method and equipment
CN112036744A (en) * 2020-08-28 2020-12-04 深圳前海微众银行股份有限公司 Method and device for determining working condition
CN112016538A (en) * 2020-10-29 2020-12-01 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer equipment and storage medium
CN112749666A (en) * 2021-01-15 2021-05-04 百果园技术(新加坡)有限公司 Training and motion recognition method of motion recognition model and related device
CN112749666B (en) * 2021-01-15 2024-06-04 百果园技术(新加坡)有限公司 Training and action recognition method of action recognition model and related device
CN113076813A (en) * 2021-03-12 2021-07-06 首都医科大学宣武医院 Mask face feature recognition model training method and device
CN113076813B (en) * 2021-03-12 2024-04-12 首都医科大学宣武医院 Training method and device for mask face feature recognition model
CN113378004A (en) * 2021-06-03 2021-09-10 中国农业大学 FANet-based farmer working behavior identification method, device, equipment and medium
CN113723230A (en) * 2021-08-17 2021-11-30 山东科技大学 Process model extraction method for extracting field procedural video by business process
CN114283492A (en) * 2021-10-28 2022-04-05 平安银行股份有限公司 Employee behavior-based work saturation analysis method, device, equipment and medium
CN114283492B (en) * 2021-10-28 2024-04-26 平安银行股份有限公司 Staff behavior-based work saturation analysis method, device, equipment and medium
CN115470986A (en) * 2022-09-14 2022-12-13 北京工业大学 Behavior monitoring and preventing system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200630