CN112200006A

CN112200006A - Human body attribute detection and identification method under community monitoring scene

Info

Publication number: CN112200006A
Application number: CN202010966064.2A
Authority: CN
Inventors: 徐亮; 张卫山; 孙浩云; 尹广楹; 张大千; 管洪清
Original assignee: Qingdao Sui Zhi Information Technologies Co ltd
Current assignee: Qingdao Sui Zhi Information Technologies Co ltd
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2021-01-08

Abstract

The invention relates to the technical fields of video processing, artificial intelligence and deep learning, and in particular discloses a human body attribute detection and identification method for community monitoring scenes. The invention combines a color identification mechanism with the characteristics of the target residents, which effectively improves the accuracy of target detection and identification.

Description

Human body attribute detection and identification method under community monitoring scene

Technical Field

The invention relates to the technical field of video processing, artificial intelligence and deep learning, in particular to a human body attribute detection and identification method in a community monitoring scene.

Background

In recent years, as data volumes and computing power have grown, and especially with the large-scale use of GPU computation, deep learning has gradually established a dominant position in computer vision. In many fields, including image classification, image segmentation, image recognition and speech recognition, deep neural networks currently achieve the best results. Among deep neural network architectures, the convolutional neural network is particularly prominent: it is trained on large-scale data and uses operations such as weight sharing, pooling and dropout to reduce the amount of computation and improve the generalization ability of the model.

Semantic segmentation is one of the most central and fundamental tasks in computer vision, with important applications in autonomous driving, medical imaging, geographic remote sensing, robot navigation and similar fields. Its goal is to accurately classify every pixel of an input image. In recent years, deep learning has shown excellent performance on this dense labeling problem. However, recent semantic segmentation methods based on convolutional neural networks focus mainly on how to better fuse the features of a single input image, and pay little attention to making the segmentation result finer by enhancing the features of the image. The fully convolutional network (FCN) is a representative application of deep learning to image segmentation. It accepts input images of any size, so training and test images need not share the same size, and it is more efficient because it avoids the repeated storage and repeated convolution computation caused by processing pixel blocks.

Human body attributes are one of the important targets in a community monitoring scene, and their accurate detection is decisive for subsequent target identification and auxiliary search. They are difficult to detect accurately because clothing varies widely, comes in many kinds and is highly uncertain, and they are difficult to recognize in real environments under the influence of factors such as weather, occlusion, viewing angle and posture. Human body attributes serve as auxiliary information for identifying community residents, and locating a resident's movement path is of great practical significance. For example, when a stranger enters the community, property management can identify, locate and track that person by matching the described human body attributes against the people in the surveillance video, thereby safeguarding the safety and stability of the community.

In view of this, a method that improves the accuracy of human body attribute detection and identification in community monitoring scenes is needed to solve the above problems.

Disclosure of Invention

In order to solve the above technical problems, an object of the present invention is to provide a method for detecting and identifying human body attributes in a community environment. A new image enhancement network is proposed: image enhancement improves the visual quality and sharpness of the original picture, which helps the segmentation network achieve a better segmentation result. An FCN (fully convolutional network) then performs semantic segmentation and divides the image into semantic features, and a color identification mechanism is combined with these features to detect and identify the attributes of a person's clothing. The method greatly improves the accuracy and efficiency of detecting and identifying the human body attributes of community residents.

In order to solve the technical problems, the technical scheme of the invention is as follows:

A human body attribute detection and identification method under a community monitoring scene comprises the following steps:

step 1: acquiring a video stream in a community monitoring area, and decoding and separating the video stream into image data;

step 2: training an image enhancement network Img-EN model to obtain optimal parameters;

step 3: adopting a trained Img-EN model to perform enhancement processing on the image data based on a histogram equalization method;

step 4: inputting the enhanced image into a semantic segmentation network (FCN) for training to obtain optimal parameters;

step 5: enhancing the image to be detected by using the Img-EN model with the optimal parameters, and performing feature segmentation on the processed image by using the FCN model with the optimal parameters;

step 6: carrying out human body attribute detection and identification on the segmented semantic features by combining a color identification mechanism;

step 7: adopting a GPU scheduling strategy to perform GPU scheduling;

step 8: sending the final result to the terminal.

Preferably, acquiring the video stream in the monitored area in step 1 includes: installing high-definition cameras or video acquisition devices at various places in the community, selecting an area to be monitored, obtaining all video streams in that area, and decoding the video streams to separate out image data.

Preferably, in step 2, the invention uses histogram equalization to implement the image enhancement network Img-EN. Equalizing the histogram of the image adjusts its local contrast, which allows an overexposed or underexposed image to show more detail. The Img-EN network consists of a single convolution layer whose kernel parameters are generated dynamically from the input image. This structure models the histogram equalization algorithm and produces a series of enhanced images that enrich the available semantic features.
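For orientation, classical global histogram equalization can be sketched as below. This is only an illustration of the algorithm that Img-EN is said to model, not the Img-EN network itself (the single convolution layer with dynamically generated kernels is not reproduced). The function name `equalize_histogram` and the list-of-rows image representation are assumptions made for this sketch.

```python
def equalize_histogram(image, levels=256):
    """Globally equalize a grayscale image given as rows of ints in [0, levels)."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Build the cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in image]

    # Lookup table that spreads the occupied levels over the full range.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[value] for value in row] for row in image]


# An underexposed 3x3 patch whose values are squeezed into 50..54.
dark = [[50, 50, 51], [51, 52, 52], [53, 53, 54]]
bright = equalize_histogram(dark)
```

After equalization the narrow band 50..54 is stretched across the full 0..255 range, which is the contrast effect the paragraph above describes for under- or overexposed images.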

Preferably, in step 4, the FCN used by the invention is the fully convolutional network (full English name: Fully Convolutional Networks). As the semantic segmentation model, the FCN receives an input image and uses deconvolution to upsample the feature map of the last convolution layer. The finest-grained variant, FCN-8s, is selected for the upsampling, so that the feature map is restored to the size of the input image; a prediction can thus be produced for every pixel while the spatial information of the original input image is preserved. Finally, pixel-by-pixel classification is performed on the upsampled feature map. Training process: training first runs with default parameters; based on intermediate training results, the initial weights, learning rate and number of iterations are adjusted repeatedly until the image enhancement network achieves the preset enhancement effect at the preset efficiency.
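The upsample-then-classify idea can be illustrated with a minimal sketch. It replaces the learned deconvolution of FCN-8s with fixed nearest-neighbor replication by a factor of 8 and omits the network entirely, so it shows only how coarse per-class score maps become a full-resolution label map. The function names and the toy scores are assumptions for this sketch.

```python
def upsample_nearest(score_map, factor=8):
    """Repeat every cell of a 2-D score map `factor` times along both axes."""
    out = []
    for row in score_map:
        wide = [value for value in row for _ in range(factor)]
        out.extend([wide[:] for _ in range(factor)])
    return out


def classify_pixels(class_scores, factor=8):
    """Turn one coarse score map per class into a full-resolution label map."""
    upsampled = [upsample_nearest(m, factor) for m in class_scores]
    height, width = len(upsampled[0]), len(upsampled[0][0])
    # Each pixel receives the index of the class with the highest score.
    return [
        [max(range(len(upsampled)), key=lambda c: upsampled[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Two classes (0 = background, 1 = person) on a 1x2 coarse grid.
coarse = [
    [[0.9, 0.2]],  # background scores
    [[0.1, 0.8]],  # person scores
]
labels = classify_pixels(coarse, factor=8)
```

Here a 1x2 coarse grid becomes an 8x16 label map whose left half is background and whose right half is person. A trained FCN would learn its upsampling filters rather than replicate cells.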

Preferably, in step 6, the invention uses a TCS230 sensor with color filters to implement the color identification mechanism. The mechanism performs signal processing on the input picture and identifies the color component values of the image to be detected, so as to identify the color of the clothing. The semantic features segmented in step 5 are combined with the color identification mechanism, which improves the accuracy of human body attribute identification.
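A purely software analogue of reading color component values for a segmented clothing region might look like the following sketch. It does not model the TCS230 hardware. Instead, it assumes the segmentation step yields a binary mask and names the region by the nearest entry in a small palette. `PALETTE`, `mean_color` and `nearest_color_name` are hypothetical names introduced only for illustration.

```python
# Hypothetical reference palette; a real system would use a calibrated one.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def mean_color(image, mask):
    """Average the RGB components of the pixels where the mask is truthy."""
    sums = [0, 0, 0]
    count = 0
    for image_row, mask_row in zip(image, mask):
        for pixel, selected in zip(image_row, mask_row):
            if selected:
                count += 1
                for i in range(3):
                    sums[i] += pixel[i]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return tuple(s / count for s in sums)


def nearest_color_name(rgb, palette=PALETTE):
    """Return the palette name with the smallest squared RGB distance."""
    return min(palette, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, palette[name])))


image = [[(200, 30, 20), (10, 10, 240)], [(220, 10, 40), (250, 250, 250)]]
coat_mask = [[1, 0], [1, 0]]  # assumed output of the segmentation step
coat_color = nearest_color_name(mean_color(image, coat_mask))
```

Restricting the average to the masked region is what ties the color reading to a specific semantic feature, such as a coat, rather than to the whole frame.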

Preferably, the step of sending the final result to the platform terminal comprises: calculating the coordinates of the pedestrian area, acquiring the current timestamp, and sending them to the platform terminal.
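One way to picture this reporting step is the sketch below, which derives a bounding box from a pedestrian mask and packages it with a timestamp. The message layout, the field names and the use of JSON are assumptions; the source does not specify a transport or format.

```python
import json
import time


def pedestrian_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the truthy cells in a 2-D mask, or None."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def build_result_message(mask, attributes, now=None):
    """Serialize one detection result; the field names are illustrative only."""
    return json.dumps({
        "bbox": pedestrian_bbox(mask),
        "attributes": attributes,
        "timestamp": time.time() if now is None else now,
    })


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
message = build_result_message(mask, {"coat_color": "red"}, now=1600000000.0)
```

Passing `now` explicitly keeps the example deterministic; in live use the current time would be taken when the result is produced.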

By adopting the technical scheme, the human body attribute detection and identification method under the community monitoring scene has the following beneficial effects:

(1) Image enhancement and semantic segmentation are combined: the image enhancement network Img-EN enhances the features of the image, which makes the segmentation result finer.

(2) The invention designs a color identification mechanism using color filters, which further identifies the color of a person's clothing attributes and improves the accuracy of human body attribute detection.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a general flowchart of a human body attribute detection and identification method in a community environment according to the present invention;

FIG. 2 is a structural diagram of an image enhancement network Img-EN implemented by histogram equalization in the present invention;

FIG. 3 is a diagram of a color recognition mechanism implemented using a color sensor according to the present invention;

FIG. 4 is a diagram of a GPU resource scheduling strategy in a GPU processor cluster according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a human body attribute detection and identification method for community monitoring environments, which can identify in real time the attributes of pedestrians on roads within a community, including the types and colors of clothes, backpacks and the like. Image enhancement yields a feature image better suited to network segmentation, and segmenting that image yields a series of semantic features, from which detection and identification are performed.

As shown in FIG. 1, the method for detecting and identifying human body attributes in a community environment of the invention comprises the following basic steps: step 1: acquiring a video stream in a community monitoring area, and decoding and separating the video stream into image data; step 2: training an image enhancement network Img-EN model to obtain optimal parameters; step 3: adopting a trained Img-EN model to perform enhancement processing on the image data based on a histogram equalization method; step 4: inputting the enhanced image into a semantic segmentation network (FCN) for training to obtain optimal parameters; step 5: enhancing the image to be detected by using the Img-EN model with the optimal parameters, and performing feature segmentation on the processed image by using the FCN model with the optimal parameters; step 6: carrying out human body attribute detection and identification on the segmented semantic features by combining a color identification mechanism; step 7: adopting a GPU scheduling strategy to perform GPU scheduling; step 8: sending the final result to the terminal.

The following describes in detail a method for detecting and identifying human body attributes in a community environment:

As shown in FIG. 1, high-definition cameras or video acquisition devices are installed in the community, an area to be monitored is selected, and all video streams in that area are acquired. The video streams from the monitoring equipment are decoded and the image data is separated out. The trained Img-EN network then enhances the target image to obtain a series of rich, usable semantic features, and the trained FCN performs semantic segmentation on the enhanced image to obtain a series of semantic features related to human body attributes. These features are combined with the color identification mechanism to detect and identify the target. GPU usage in the GPU processor cluster is monitored in real time, and a suitable scheduling strategy schedules the GPUs in real time. Finally, the coordinates of the pedestrian area are calculated, the current timestamp is acquired, and both are sent to the platform terminal. The method combines image enhancement with semantic segmentation: an image enhancement network Img-EN and a semantic segmentation network are trained, and the trained Img-EN model enhances the original image based on histogram equalization, highlighting relevant features and weakening irrelevant background, which improves the segmentation ability of the semantic segmentation network. The image enhancement network thus produces an image that is easier to segment, and the semantic segmentation network performs standard semantic segmentation on it to obtain a series of semantic features, from which human body attributes are detected and identified. Combining the clothing features of target residents with the color identification mechanism effectively improves the accuracy of target detection and identification.

It can be appreciated that in step 2, the invention implements the image enhancement network Img-EN using histogram equalization. Equalizing the histogram of the image adjusts its local contrast, which allows an overexposed or underexposed image to show more detail. The network structure is shown in FIG. 2: the Img-EN network consists of a single convolution layer whose kernel parameters are generated dynamically from the input image. This structure models the histogram equalization algorithm and produces a series of enhanced images that enrich the available semantic features. In step 4, the FCN used by the invention is the fully convolutional network (full English name: Fully Convolutional Networks). As the semantic segmentation model, the FCN receives an input image and uses deconvolution to upsample the feature map of the last convolution layer. The finest-grained variant, FCN-8s, is selected for the upsampling, so that the feature map is restored to the size of the input image; a prediction can thus be produced for every pixel while the spatial information of the original input image is preserved. Finally, pixel-by-pixel classification is performed on the upsampled feature map. Training process: training first runs with default parameters; based on intermediate training results, the initial weights, learning rate and number of iterations are adjusted repeatedly until the image enhancement network achieves the preset enhancement effect at the preset efficiency.

As shown in FIG. 4, the GPU resource scheduling layer monitors current GPU resource usage in real time according to the scheduling strategy. Before the GPU processor cluster assigns a task, it checks whether the load on the current GPU is too high; if so, it consults the GPU usage list and the GPU computing capability list and reselects a GPU to receive the task.
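The reselection logic described above can be sketched as follows. The source does not state how "too high" is judged or how the two lists are combined, so the 0.8 utilization threshold and the spare-capacity score (capability times unused fraction) are assumptions chosen only to make the idea concrete.

```python
def select_gpu(current, usage, capacity, threshold=0.8):
    """Pick the GPU that should receive the next task.

    usage[i] is the utilization of GPU i in [0, 1]; capacity[i] is a relative
    compute score. Both the threshold and the scoring rule are assumptions.
    """
    if usage[current] <= threshold:
        # The current GPU is not overloaded, so it keeps the task.
        return current
    # Otherwise prefer the GPU with the most spare compute.
    spare = [capacity[i] * (1.0 - usage[i]) for i in range(len(usage))]
    return max(range(len(usage)), key=lambda i: spare[i])


usage = [0.95, 0.40, 0.10]     # GPU usage list
capacity = [10.0, 10.0, 4.0]   # GPU computing capability list
chosen = select_gpu(current=0, usage=usage, capacity=capacity)
```

In this example GPU 0 is overloaded, and GPU 1 is chosen over the idler but weaker GPU 2 because it has more spare compute. That is one plausible reading of consulting both the usage list and the capability list.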

In the human body attribute detection and identification method under the community monitoring scene, image enhancement and semantic segmentation are combined so that enhancing the features of the image makes the segmentation result finer, and the color identification mechanism further improves the speed and accuracy of human body attribute detection and identification. Multiple cameras provide comprehensive coverage of the community environment, which supplies the basic conditions for human body attribute detection. The invention promotes the further development of smart communities.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

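1. A human body attribute detection and identification method under a community monitoring scene, characterized by comprising the following steps:

step 1: acquiring a video stream in a community monitoring area, and decoding and separating the video stream into image data;

step 2: training an image enhancement network Img-EN model to obtain optimal parameters;

step 3: adopting a trained Img-EN model to perform enhancement processing on the image data based on a histogram equalization method;

step 4: inputting the enhanced image into a semantic segmentation network (FCN) for training to obtain optimal parameters;

step 5: enhancing the image to be detected by using the Img-EN model with the optimal parameters, and performing feature segmentation on the processed image by using the FCN model with the optimal parameters;

step 6: carrying out human body attribute detection and identification on the segmented semantic features by combining a color identification mechanism;

step 7: adopting a GPU scheduling strategy to perform GPU scheduling;

step 8: sending the final result to the terminal.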

2. The method for detecting and identifying human body attributes under the community monitoring scene according to claim 1, characterized in that: in step 1, high-definition cameras or video acquisition devices are installed at various places in the community, an area to be monitored is selected, all video streams in the area are obtained, and the video streams are decoded to separate out image data.

3. The method for detecting and identifying human body attributes under the community monitoring scene according to claim 1, characterized in that: in step 2, the histogram of the image is equalized to adjust the local image contrast, and histogram equalization enables an overexposed or underexposed image to display more detail; the Img-EN network consists of a single convolution layer whose kernel parameters are generated dynamically from the input image; the histogram equalization algorithm produces a series of enhanced images to enrich the available semantic features.

4. The method for detecting and identifying human body attributes under the community monitoring scene according to claim 1, characterized in that: in step 4, the FCN serves as the semantic segmentation model, receives an input image, and uses deconvolution to upsample the feature map of the last convolution layer; the finest-grained variant, FCN-8s, is selected for the upsampling, so that the feature map is restored to the size of the input image, thereby generating a prediction for each pixel while preserving the spatial information of the original input image; finally, pixel-by-pixel classification is performed on the upsampled feature map; training process: training first runs with default parameters, and based on intermediate training results, the initial weights, learning rate and number of iterations are adjusted repeatedly until the image enhancement network achieves the preset enhancement effect at the preset efficiency.

5. The method for detecting and identifying human body attributes under the community monitoring scene according to claim 1, characterized in that: in step 6, a TCS230 sensor with color filters is used to implement the color identification mechanism, which performs signal processing on the input picture and identifies the color component values of the image to be detected, so as to identify the color of the clothing; the semantic features segmented in step 5 are combined with the color identification mechanism, which improves the accuracy of human body attribute identification.

6. The method for detecting and identifying human body attributes under the community monitoring scene according to claim 1, characterized in that: step 8 further comprises calculating the coordinates of the pedestrian area, acquiring the current timestamp, and sending them to the platform terminal.