CN111898449A - Pedestrian attribute identification method and system based on monitoring video

Pedestrian attribute identification method and system based on monitoring video

Info

Publication number
CN111898449A
CN111898449A (application CN202010614464.7A)
Authority
CN
China
Prior art keywords
color cast
neural network
sample
image frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010614464.7A
Other languages
Chinese (zh)
Other versions
CN111898449B (en)
Inventor
贾川民 (Jia Chuanmin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202010614464.7A
Publication of CN111898449A
Application granted
Publication of CN111898449B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a pedestrian attribute identification method and system based on surveillance video. The method comprises the following steps: extracting an original image frame from a surveillance video stream captured by a surveillance camera, wherein the image content of the original image frame contains a pedestrian; detecting whether the original image frame has a color cast; if the original image frame has a color cast, inputting the original image frame into a pre-trained color cast reduction neural network model, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model; and performing pedestrian attribute identification based on the corrected image frame. The technical scheme of the application can solve the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames captured under neon lighting at night exhibit color cast.

Description

Pedestrian attribute identification method and system based on monitoring video
Technical Field
The application relates to the technical field of image processing, in particular to a pedestrian attribute identification method and system based on a monitoring video.
Background
As a safeguard of public security, video surveillance systems are widely used in daily life: cameras and other video capture devices can be seen everywhere in public places such as banks, shopping malls, supermarkets, hotels, street corners, intersections and toll stations. Installing these systems has greatly improved public safety; they monitor and record the behavior of lawbreakers in real time and provide public security organs with a large amount of reliable evidence for solving cases.
With the development of technology and the improvement of living standards, automatic pedestrian attribute identification in video surveillance is required so that pedestrians can be further identified, analyzed and tracked; pedestrian attributes include skin color, clothing color, hair color and the like. Pedestrian attribute identification works on the captured surveillance video. Video shot under sunlight, bright lighting and similar conditions has only a slight color cast, so existing pedestrian attribute identification techniques can identify attributes accurately. Surveillance video shot under neon lighting at night, however, often has a severe color cast, and with the wide use of colored lighting such as multicolor neon lamps, light strips and light boxes, the cast frequently shifts between blue, red and other hues and varies in degree. This makes quantitative, automatic color cast correction of the surveillance video difficult, which in turn prevents pedestrian attributes such as skin color, clothing color and hair color from being identified automatically and accurately.
Disclosure of Invention
The application aims to provide a pedestrian attribute identification method and system based on a surveillance video.
In a first aspect, the application provides a pedestrian attribute identification method based on surveillance video, which comprises the following steps:
extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
detecting whether the original image frame has color cast;
if the original image frame has color cast, inputting the original image frame into a color cast reduction neural network model trained in advance, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and performing pedestrian attribute identification based on the corrected image frame.
In some embodiments of the first aspect of the present application, before inputting the original image frame into a pre-trained color cast reduction neural network model and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model, the method further includes:
acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
In some embodiments of the first aspect of the present application, the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
In some embodiments of the first aspect of the present application, the detecting whether the original image frame has color cast includes:
carrying out scene recognition on the original image frame, and extracting a scene image in the original image frame according to a recognition result;
inputting the scene image into a pre-trained color cast detection neural network model, and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In some embodiments of the first aspect of the present application, before inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, the method further includes:
obtaining a plurality of color cast detection image samples, wherein the color cast detection image samples comprise color cast negative samples and color cast-free positive samples;
carrying out scene identification on each color cast detection image sample, and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and training a color cast detection neural network model by taking the scene sample as input and the scene sample with or without color cast as output to obtain the trained color cast detection neural network model.
In some embodiments of the first aspect of the present application, the color cast detection neural network model is implemented using a BP neural network or a CNN.
In some embodiments of the first aspect of the present application, the performing scene recognition on the original image frame and extracting a scene image in the original image frame according to a recognition result includes:
detecting the pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method, and determining the region where the pedestrians are located;
and extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
In some embodiments of the first aspect of the present application, the performing scene recognition on original image frames and extracting a scene image from the original image frames according to a recognition result includes:
detecting a salient region in the original image frame by adopting a saliency detection algorithm, wherein the salient region is a region where a pedestrian is located;
extracting non-significant regions except the significant region in the original image frame as a scene image.
In some embodiments of the first aspect of the present application, the performing pedestrian attribute identification according to the original image frame after color cast reduction includes:
identifying at least one of the following attributes of the pedestrians contained in the original image frame after the color cast reduction: skin tone, hair color, and clothing color.
A second aspect of the present application provides a pedestrian attribute identification system based on deep learning, including:
the original image frame extraction module is used for extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
the color cast detection module is used for detecting whether the original image frame has color cast;
the color cast reduction module is used for inputting the original image frame into a pre-trained color cast reduction neural network model if the original image frame has color cast, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and the pedestrian attribute identification module is used for identifying the pedestrian attribute based on the corrected image frame.
In some embodiments of the second aspect of the present application, the system further comprises:
the image sample group acquisition module is used for acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and the color cast reduction model training module is used for training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
In some embodiments of the second aspect of the present application, the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
In some embodiments of the second aspect of the present application, the color cast detection module comprises:
the scene extraction unit is used for carrying out scene identification on the original image frame and extracting a scene image in the original image frame according to an identification result;
and the scene detection unit is used for inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In some embodiments of the second aspect of the present application, the color cast detection module further comprises:
a color cast sample acquisition unit, configured to acquire a plurality of color cast detection image samples, where the color cast detection image samples include a color cast negative sample and a color cast-free positive sample;
the sample scene extraction unit is used for carrying out scene identification on each color cast detection image sample and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
the color cast state determining unit is used for determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and the color cast detection model training unit is used for training the color cast detection neural network model by taking the scene sample as input and taking the presence or absence of color cast of the scene sample as output to obtain the trained color cast detection neural network model.
In some embodiments of the second aspect of the present application, the color cast detection neural network model is implemented using a BP neural network or a CNN.
In some embodiments of the second aspect of the present application, the scene extraction unit includes:
the pedestrian region determining subunit is used for detecting pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method and determining a region where the pedestrians are located;
and the scene extraction subunit is used for extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
In some embodiments of the second aspect of the present application, the scene extraction unit includes:
a salient region determining subunit, configured to detect a salient region in the original image frame by using a saliency detection algorithm, where the salient region is a region where a pedestrian is located;
and the non-significant region extracting subunit is used for extracting non-significant regions except the significant region in the original image frame as the scene image.
In some embodiments of the second aspect of the present application, the pedestrian attribute identification module is configured to identify at least one of the following attributes of the pedestrian contained in the original image frame after color cast reduction: skin tone, hair color, and clothing color.
Compared with the prior art, the surveillance-video-based pedestrian attribute identification method provided by the application extracts an original image frame from a surveillance video stream captured by a surveillance camera, detects whether the original image frame has a color cast, and, if it does, inputs the original image frame into a pre-trained color cast reduction neural network model, which outputs a corresponding corrected image frame; pedestrian attribute identification is then performed on the corrected image frame. Because the color cast reduction neural network model is trained in advance, any original image frame found to have a color cast can be restored by the model into a corrected, cast-free image frame, and pedestrian attributes can then be identified accurately from that corrected frame. On this basis, the method solves the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames exhibit color cast under neon lighting at night.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 illustrates a flow chart of a surveillance video based pedestrian attribute identification method provided by some embodiments of the present application;
fig. 2 illustrates a schematic diagram of a deep learning based pedestrian attribute identification system provided by some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
In addition, the terms "first" and "second", etc. are used to distinguish different objects, rather than to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The embodiments of the present application provide a pedestrian attribute identification method and system based on a monitoring video, which are described below by way of example with reference to the embodiments and the accompanying drawings.
Referring to fig. 1, which shows a flowchart of a surveillance video-based pedestrian attribute identification method according to some embodiments of the present application, as shown in fig. 1, the surveillance video-based pedestrian attribute identification method may include the following steps:
step S101: original image frames are extracted from a surveillance video stream captured by a surveillance camera, wherein image content of the original image frames contains pedestrians.
Step S102: and detecting whether the original image frame has color cast.
Step S103: and if the original image frame has color cast, inputting the original image frame into a pre-trained color cast reduction neural network model, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model.
Step S104: and performing pedestrian attribute identification based on the corrected image frame.
Compared with the prior art, the surveillance-video-based pedestrian attribute identification method provided by the application extracts an original image frame from a surveillance video stream captured by a surveillance camera, detects whether the original image frame has a color cast, and, if it does, inputs the original image frame into a pre-trained color cast reduction neural network model, which outputs a corresponding corrected image frame; pedestrian attribute identification is then performed on the corrected image frame. Because the color cast reduction neural network model is trained in advance, any original image frame found to have a color cast can be restored by the model into a corrected, cast-free image frame, and pedestrian attributes can then be identified accurately from that corrected frame. On this basis, the method solves the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames exhibit color cast under neon lighting at night.
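To make steps S101 to S104 concrete, the following is a minimal Python/OpenCV sketch of the pipeline. The `detector`, `corrector` and `attribute_model` callables stand in for the trained networks described in the embodiments below; they are assumptions of this sketch, not components specified by the application.

```python
import cv2

def process_stream(stream_url, detector, corrector, attribute_model, frame_stride=25):
    """Sketch of steps S101-S104. The three model arguments are assumed to be
    pre-trained callables: detector(frame) -> bool (color cast present?),
    corrector(frame) -> corrected frame, attribute_model(frame) -> attributes."""
    cap = cv2.VideoCapture(stream_url)        # open the surveillance video stream (S101)
    results, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:           # sample original image frames periodically
            if detector(frame):               # S102: detect color cast
                frame = corrector(frame)      # S103: color cast reduction -> corrected frame
            results.append(attribute_model(frame))  # S104: pedestrian attribute identification
        idx += 1
    cap.release()
    return results
```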
In some modifications of the embodiments of the present application, before inputting the original image frame into a pre-trained color cast reduction neural network model and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model, the method further includes:
acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
For example, a user may correct the color cast of a first image sample using image editing software (here "color cast correction" has the same meaning as "color cast reduction") to obtain a second image sample without color cast. Training the color cast reduction neural network model with the first image sample as input and the second image sample as the target output gives the model the ability to correct color cast automatically, and after training the model can perform automatic color cast correction on original image frames that exhibit color cast.
On the basis of any implementation manner of the present application, in some specific implementations the color cast reduction neural network model is implemented with a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN); these networks are capable of generating an image from an input image and can therefore achieve the purpose of the embodiments of the present application.
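As an illustration of the paired training just described, the sketch below fits a small fully convolutional generator with an L1 reconstruction loss on (color cast, cast-free) image pairs; for the GAN variants named above, an adversarial discriminator term would be added to this loss. The architecture, hyperparameters and the `paired_dataset` interface are assumptions of the sketch, not the application's prescribed design.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Illustrative generator: maps a color-cast frame to a corrected frame.
# A real implementation might use a U-Net or one of the GAN generators above.
generator = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),   # output image in [0, 1]
)

def train_reduction_model(paired_dataset, epochs=10, lr=1e-4):
    """paired_dataset is assumed to yield (first_sample, second_sample) tensor
    pairs in [0, 1]: the color-cast input and its manually corrected target."""
    loader = DataLoader(paired_dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(generator.parameters(), lr=lr)
    l1 = nn.L1Loss()   # pixel-wise reconstruction loss; a GAN adds an adversarial term
    for _ in range(epochs):
        for cast_img, clean_img in loader:
            optimizer.zero_grad()
            loss = l1(generator(cast_img), clean_img)
            loss.backward()
            optimizer.step()
    return generator
```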
In some variations of embodiments of the present application, the detecting whether the original image frame has color cast includes:
carrying out scene recognition on the original image frame, and extracting a scene image in the original image frame according to a recognition result;
inputting the scene image into a pre-trained color cast detection neural network model, and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In a surveillance video the scene remains essentially unchanged or changes very little, whereas dynamic objects such as pedestrians entering and leaving the picture interfere with color cast detection on the original image frame and make the detection result uncertain. Performing color cast detection on the extracted scene image rather than on the whole frame therefore removes this interference.
In addition to the above embodiments, in some modified embodiments, before inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, the method further includes:
obtaining a plurality of color cast detection image samples, wherein the color cast detection image samples comprise color cast negative samples and color cast-free positive samples;
carrying out scene identification on each color cast detection image sample, and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and training a color cast detection neural network model by taking the scene sample as input and the scene sample with or without color cast as output to obtain the trained color cast detection neural network model.
In this embodiment, positive and negative samples are obtained, scene recognition is performed on them, and scene samples are extracted; the model is then trained on these scene samples. This yields a color cast detection neural network model that can judge efficiently and accurately whether a scene image has color cast. Because the model is trained on scene samples, the influence of dynamic objects in the image samples is eliminated, and the trained model achieves high detection accuracy.
It should be noted that the color cast detection neural network model is essentially a binary classification network for images; therefore any binary image classification network provided by the prior art, for example a BP neural network or a CNN, that can achieve the purpose of the embodiments of the present application falls within the scope of the present application.
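For concreteness, a binary classifier of the kind referred to here could be as small as the following PyTorch sketch; the layer sizes are assumptions of the sketch. It would be trained with binary cross-entropy (`nn.BCELoss`) on the cast-free positive and color-cast negative scene samples described above, with consistent 0/1 labels.

```python
import torch
import torch.nn as nn

class ColorCastDetector(nn.Module):
    """Minimal binary classifier: scene image in, color cast probability out.
    Layer sizes are illustrative; any image classifier (BP network, CNN) fits."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global average pooling
        )
        self.classifier = nn.Linear(32, 1)      # single logit: cast vs. no cast

    def forward(self, x):                       # x: (N, 3, H, W) scene images
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(h))   # probability of color cast
```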
On the basis of the foregoing embodiment, in some variations, the performing scene recognition on the original image frame and extracting a scene image in the original image frame according to a recognition result includes:
detecting the pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method, and determining the region where the pedestrians are located;
and extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
The inter-frame difference method and the optical flow field method are well suited to images whose background (i.e., the scene) is static and whose foreground (i.e., the pedestrians) changes dynamically, so the region where the pedestrians are located can be identified more accurately.
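As a rough sketch of the inter-frame difference variant (OpenCV; the threshold and dilation settings are assumptions): consecutive frames are differenced, moving pixels are taken as the pedestrian region, and the remaining static area is kept as the scene image.

```python
import cv2
import numpy as np

def extract_scene_by_frame_difference(prev_frame, frame, diff_thresh=25):
    """Mask out moving regions (pedestrians) found by inter-frame differencing
    and keep the remaining static area of the frame as the scene image."""
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g0, g1)                            # inter-frame difference
    _, motion = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    motion = cv2.dilate(motion, np.ones((15, 15), np.uint8))  # cover the whole pedestrian region
    scene = frame.copy()
    scene[motion > 0] = 0      # blank the pedestrian region; what remains is the scene image
    return scene
```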
In some variations of the embodiments of the present application, the performing scene recognition on an original image frame and extracting a scene image in the original image frame according to a recognition result includes:
detecting a salient region in the original image frame by adopting a saliency detection algorithm, wherein the salient region is a region where a pedestrian is located;
extracting non-significant regions except the significant region in the original image frame as a scene image.
Saliency detection algorithms are efficient and impose a low computational load on the system, so adopting this embodiment can effectively improve running efficiency and reduce the system load.
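A possible realization with OpenCV's spectral residual static saliency detector (available in opencv-contrib) is sketched below; treating the salient region as the pedestrian region is this embodiment's assumption, and the threshold value is an assumption of the sketch.

```python
import cv2

def extract_scene_by_saliency(frame, saliency_thresh=0.5):
    """Detect the salient region (assumed to be the pedestrian) with the
    spectral residual detector and keep the non-salient rest as the scene."""
    detector = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = detector.computeSaliency(frame)    # float map in [0, 1]
    if not ok:
        return frame             # fall back to the whole frame
    scene = frame.copy()
    scene[saliency_map >= saliency_thresh] = 0            # remove salient (pedestrian) region
    return scene
```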
In some variations of the embodiments of the present application, the performing pedestrian attribute recognition according to the original image frame after color cast reduction includes:
identifying at least one of the following attributes of the pedestrians contained in the original image frame after the color cast reduction: skin tone, hair color, and clothing color.
This step can be implemented directly with, or by adapting, any pedestrian attribute identification method disclosed in the prior art; the application does not limit the specific implementation of this step.
Considering that skin color, hair color and clothing color are easily unrecognizable or misrecognized under neon and similar lighting, the above embodiments allow these pedestrian attributes to be identified accurately after automatic color cast correction, solving the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames exhibit color cast under neon lighting at night.
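Since the application leaves the recognizer itself open, here is one toy illustration of a color attribute estimate on the corrected frame: the dominant hue inside a pedestrian bounding box is mapped to a coarse clothing color name. The `person_box` argument, the torso crop and the hue bins are all assumptions; a deployed system would use a learned attribute classifier instead.

```python
import cv2
import numpy as np

def dominant_clothing_color(corrected_frame, person_box):
    """Toy estimate of clothing color on the color-corrected frame. person_box
    is a hypothetical (x, y, w, h) pedestrian box from any person detector."""
    x, y, w, h = person_box
    torso = corrected_frame[y + h // 4 : y + 3 * h // 4, x : x + w]   # rough torso crop
    hsv = cv2.cvtColor(torso, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [180], [0, 180])            # hue histogram
    hue = int(np.argmax(hist))                                        # dominant hue bin
    for bound, name in [(15, "red"), (45, "yellow"), (90, "green"), (135, "blue"), (181, "red")]:
        if hue < bound:                       # coarse bins over OpenCV's 0-179 hue range
            return name
```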
In the embodiment, a pedestrian attribute identification method based on a surveillance video is provided, and correspondingly, the application also provides a pedestrian attribute identification system based on deep learning. The pedestrian attribute identification system based on deep learning provided by the embodiment of the application can implement the pedestrian attribute identification method based on the monitoring video, and can be realized through software, hardware or a combination of software and hardware. For example, the deep learning based pedestrian attribute identification system may comprise integrated or separate functional modules or units to perform the corresponding steps of the above methods. Referring to fig. 2, a schematic diagram of a deep learning based pedestrian attribute identification system according to some embodiments of the present application is shown. Since the system embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The system embodiments described below are merely illustrative.
As shown in fig. 2, the deep learning based pedestrian attribute identification system 10 may include:
a second aspect of the present application provides a pedestrian attribute recognition system 10 based on deep learning, including:
an original image frame extracting module 101, configured to extract an original image frame from a surveillance video stream captured by a surveillance camera, where image content of the original image frame includes a pedestrian;
a color cast detection module 102, configured to detect whether the original image frame has color cast;
the color cast reduction module 103 is configured to, if the original image frame has color cast, input the original image frame into a color cast reduction neural network model trained in advance, and output a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and a pedestrian attribute identification module 104, configured to perform pedestrian attribute identification based on the corrected image frame.
In some variations of the embodiments of the present application, the system further includes:
the image sample group acquisition module is used for acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and the color cast reduction model training module is used for training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
In some variations of the embodiments of the present application, the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
In some variations of the embodiments of the present application, the color cast detection module 102 includes:
the scene extraction unit is used for carrying out scene identification on the original image frame and extracting a scene image in the original image frame according to an identification result;
and the scene detection unit is used for inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In some variations of the embodiments of the present application, the color cast detection module 102 further includes:
a color cast sample acquisition unit, configured to acquire a plurality of color cast detection image samples, where the color cast detection image samples include a color cast negative sample and a color cast-free positive sample;
the sample scene extraction unit is used for carrying out scene identification on each color cast detection image sample and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
the color cast state determining unit is used for determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and the color cast detection model training unit is used for training the color cast detection neural network model by taking the scene sample as input and taking the presence or absence of color cast of the scene sample as output to obtain the trained color cast detection neural network model.
In some variations of the embodiments of the present application, the color cast detection neural network model is implemented using a BP neural network or a CNN.
In some variations of embodiments of the present application, the scene extraction unit includes:
the pedestrian region determining subunit is used for detecting pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method and determining a region where the pedestrians are located;
and the scene extraction subunit is used for extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
In some variations of embodiments of the present application, the scene extraction unit includes:
a salient region determining subunit, configured to detect a salient region in the original image frame by using a saliency detection algorithm, where the salient region is a region where a pedestrian is located;
and the non-significant region extracting subunit is used for extracting non-significant regions except the significant region in the original image frame as the scene image.
In some variations of the embodiments of the present application, the pedestrian attribute identification module 104 is configured to identify at least one of the following attributes of the pedestrian contained in the original image frame after color cast reduction: skin tone, hair color, and clothing color.
The pedestrian attribute identification system 10 based on deep learning provided by the embodiment of the present application and the pedestrian attribute identification method based on surveillance video provided by the foregoing embodiment of the present application have the same beneficial effects based on the same inventive concept.
It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present disclosure, and the present disclosure should be construed as being covered by the claims and the specification.

Claims (10)

1. A pedestrian attribute identification method based on a surveillance video is characterized by comprising the following steps:
extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
detecting whether the original image frame has color cast;
if the original image frame has color cast, inputting the original image frame into a color cast reduction neural network model trained in advance, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and performing pedestrian attribute identification based on the corrected image frame.
2. The method of claim 1, wherein before inputting the original image frames into a pre-trained color cast reduction neural network model and outputting corrected image frames corresponding to the original image frames by using the color cast reduction neural network model, the method further comprises:
acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
3. The method of claim 2, wherein the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
4. The method of claim 1, wherein the detecting whether the original image frame has color cast comprises:
carrying out scene recognition on the original image frame, and extracting a scene image in the original image frame according to a recognition result;
inputting the scene image into a pre-trained color cast detection neural network model, and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
5. The method of claim 4, wherein before inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, the method further comprises:
obtaining a plurality of color cast detection image samples, wherein the color cast detection image samples comprise color cast negative samples and color cast-free positive samples;
carrying out scene identification on each color cast detection image sample, and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and training a color cast detection neural network model by taking the scene sample as input and the scene sample with or without color cast as output to obtain the trained color cast detection neural network model.
6. A pedestrian attribute recognition system based on deep learning, comprising:
the original image frame extraction module is used for extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
the color cast detection module is used for detecting whether the original image frame has color cast;
the color cast reduction module is used for inputting the original image frame into a pre-trained color cast reduction neural network model if the original image frame has color cast, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and the pedestrian attribute identification module is used for identifying the pedestrian attribute based on the corrected image frame.
7. The system of claim 6, further comprising:
the image sample group acquisition module is used for acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and the color cast reduction model training module is used for training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
8. The system of claim 6, wherein the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
9. The system of claim 6, wherein the color cast detection module comprises:
the scene extraction unit is used for carrying out scene identification on the original image frame and extracting a scene image in the original image frame according to an identification result;
and the scene detection unit is used for inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
10. The system of claim 6, wherein the color cast detection module further comprises:
a color cast sample acquisition unit, configured to acquire a plurality of color cast detection image samples, where the color cast detection image samples include a color cast negative sample and a color cast-free positive sample;
the sample scene extraction unit is used for carrying out scene identification on each color cast detection image sample and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
the color cast state determining unit is used for determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and the color cast detection model training unit is used for training the color cast detection neural network model by taking the scene sample as input and taking the presence or absence of color cast of the scene sample as output to obtain the trained color cast detection neural network model.
CN202010614464.7A 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video Active CN111898449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010614464.7A CN111898449B (en) 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010614464.7A CN111898449B (en) 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video

Publications (2)

Publication Number Publication Date
CN111898449A 2020-11-06
CN111898449B (en) 2023-04-18

Family

ID=73206516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010614464.7A Active CN111898449B (en) 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video

Country Status (1)

Country Link
CN (1) CN111898449B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102013005A (en) * 2009-09-07 2011-04-13 泉州市铁通电子设备有限公司 Face detection method under polarized colored light based on local dynamic threshold color balance
CN103065334A (en) * 2013-01-31 2013-04-24 金陵科技学院 Color cast detection and correction method and device based on HSV (Hue, Saturation, Value) color space
CN106412547A (en) * 2016-08-29 2017-02-15 厦门美图之家科技有限公司 Image white balance method and device based on convolutional neural network, and computing device
CN108364270A (en) * 2018-05-22 2018-08-03 北京理工大学 Method and device for restoring the colors of color cast images
CN109523485A (en) * 2018-11-19 2019-03-26 Oppo广东移动通信有限公司 Image color correction method, device, storage medium and mobile terminal
CN109726669A (en) * 2018-12-26 2019-05-07 浙江捷尚视觉科技股份有限公司 Adversarial-network-based method for generating pedestrian re-identification data under different illumination conditions
CN111292251A (en) * 2019-03-14 2020-06-16 展讯通信(上海)有限公司 Image color cast correction method, device and computer storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ESUBE BEKELE et al.: "The Deeper, the Better: Analysis of Person Attributes Recognition", arXiv *
JIA Chuanmin (贾川民) et al.: "Neural-network-based image and video coding", Special Topic: Intelligent Communication Technology and Applications *
MA Chengqian (马成前) et al.: "Machine-learning-based image color cast detection", Computer Applications and Software *

Also Published As

Publication number Publication date
CN111898449B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
Jung Efficient background subtraction and shadow removal for monochromatic video sequences
CN110796098B (en) Method, device, equipment and storage medium for training and auditing content auditing model
CN109815936B (en) Target object analysis method and device, computer equipment and storage medium
CN108446681B (en) Pedestrian analysis method, device, terminal and storage medium
Ibrahim et al. Speed Detection Camera System using Image Processing Techniques on Video Streams
CN110874878B (en) Pedestrian analysis method, device, terminal and storage medium
CN105279487A (en) Beauty tool screening method and system
CN111222450B (en) Model training and live broadcast processing method, device, equipment and storage medium
CN105451029A (en) Video image processing method and device
WO2022213540A1 (en) Object detecting, attribute identifying and tracking method and system
CN111898448B (en) Pedestrian attribute identification method and system based on deep learning
CN112989950A (en) Violent video recognition system oriented to multi-mode feature semantic correlation features
CN112184771A (en) Community personnel trajectory tracking method and device
US11455785B2 (en) System and method for use in object detection from video stream
CN108769521B (en) Photographing method, mobile terminal and computer readable storage medium
CN112749696B (en) Text detection method and device
CN111898449B (en) Pedestrian attribute identification method and system based on monitoring video
Satwashil et al. Integrated natural scene text localization and recognition
CN115376033A (en) Information generation method and device
CN111008601A (en) Fighting detection method based on video
CN109977891A (en) A kind of object detection and recognition method neural network based
Lin et al. Face detection based on skin color segmentation and SVM classification
Low et al. Frame Based Object Detection--An Application for Traffic Monitoring
TWI777689B (en) Method of object identification and temperature measurement
CN108764126A (en) A kind of embedded living body faces tracking system

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant