CN111898449A - Pedestrian attribute identification method and system based on monitoring video

Pedestrian attribute identification method and system based on monitoring video

Info

Publication number
CN111898449A
CN111898449A (application CN202010614464.7A)
Authority
CN
China
Prior art keywords
color cast
neural network
sample
image frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010614464.7A
Other languages
Chinese (zh)
Other versions
CN111898449B (en)
Inventor
贾川民 (Jia Chuanmin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202010614464.7A
Publication of CN111898449A
Application granted
Publication of CN111898449B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a pedestrian attribute identification method and system based on surveillance video. The method comprises the following steps: extracting an original image frame from a surveillance video stream captured by a surveillance camera, wherein the image content of the original image frame contains a pedestrian; detecting whether the original image frame has a color cast; if the original image frame has a color cast, inputting the original image frame into a pre-trained color cast reduction neural network model, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model; and performing pedestrian attribute identification based on the corrected image frame. The technical scheme of the application can solve the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames captured under neon lighting at night exhibit color cast.

Description

Pedestrian attribute identification method and system based on monitoring video
Technical Field
The application relates to the technical field of image processing, in particular to a pedestrian attribute identification method and system based on a monitoring video.
Background
As a safeguard of public security, video surveillance systems are widely used in daily life: cameras and other video capture devices can be seen everywhere in public places such as banks, shopping malls, supermarkets, hotels, street corners, intersections and toll stations. Installing these systems has greatly improved public safety; they monitor and record the behavior of lawbreakers in real time and provide public security organs with a large amount of reliable evidence for solving cases.
With the development of technology and the improvement of living standards, automatic pedestrian attribute identification in video surveillance is required so that pedestrians can be further identified, analyzed and tracked; pedestrian attributes include skin color, clothing color, hair color and the like. Pedestrian attribute identification works on the captured surveillance video. Video shot under sunlight, bright lighting and similar conditions has only a slight color cast, so existing pedestrian attribute identification techniques can identify attributes accurately. Surveillance video shot under neon lighting at night, however, often has a severe color cast, and with the wide use of colored lighting such as multicolor neon lamps, light strips and light boxes, the cast frequently shifts between blue, red and other hues and varies in degree. This makes quantitative, automatic color cast correction of the surveillance video difficult, which in turn prevents pedestrian attributes such as skin color, clothing color and hair color from being identified automatically and accurately.
Disclosure of Invention
The application aims to provide a pedestrian attribute identification method and system based on a surveillance video.
In a first aspect, the application provides a pedestrian attribute identification method based on surveillance video, which comprises the following steps:
extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
detecting whether the original image frame has color cast;
if the original image frame has color cast, inputting the original image frame into a color cast reduction neural network model trained in advance, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and performing pedestrian attribute identification based on the corrected image frame.
In some embodiments of the first aspect of the present application, before inputting the original image frame into a pre-trained color cast reduction neural network model and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model, the method further includes:
acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
In some embodiments of the first aspect of the present application, the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
In some embodiments of the first aspect of the present application, the detecting whether the original image frame has color cast includes:
carrying out scene recognition on the original image frame, and extracting a scene image in the original image frame according to a recognition result;
inputting the scene image into a pre-trained color cast detection neural network model, and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In some embodiments of the first aspect of the present application, before inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, the method further includes:
obtaining a plurality of color cast detection image samples, wherein the color cast detection image samples comprise color cast negative samples and color cast-free positive samples;
carrying out scene identification on each color cast detection image sample, and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and training a color cast detection neural network model by taking the scene sample as input and the scene sample with or without color cast as output to obtain the trained color cast detection neural network model.
In some embodiments of the first aspect of the present application, the color cast detection neural network model is implemented using a BP neural network or a CNN.
In some embodiments of the first aspect of the present application, the performing scene recognition on the original image frame and extracting a scene image in the original image frame according to a recognition result includes:
detecting the pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method, and determining the region where the pedestrians are located;
and extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
In some embodiments of the first aspect of the present application, the performing scene recognition on original image frames and extracting a scene image from the original image frames according to a recognition result includes:
detecting a salient region in the original image frame by adopting a saliency detection algorithm, wherein the salient region is a region where a pedestrian is located;
extracting non-significant regions except the significant region in the original image frame as a scene image.
In some embodiments of the first aspect of the present application, the performing pedestrian attribute identification according to the original image frame after color cast reduction includes:
identifying at least one of the following attributes of the pedestrians contained in the original image frame after the color cast reduction: skin tone, hair color, and clothing color.
A second aspect of the present application provides a pedestrian attribute identification system based on deep learning, including:
the original image frame extraction module is used for extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
the color cast detection module is used for detecting whether the original image frame has color cast;
the color cast reduction module is used for inputting the original image frame into a pre-trained color cast reduction neural network model if the original image frame has color cast, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and the pedestrian attribute identification module is used for identifying the pedestrian attribute based on the corrected image frame.
In some embodiments of the second aspect of the present application, the system further comprises:
the image sample group acquisition module is used for acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and the color cast reduction model training module is used for training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
In some embodiments of the second aspect of the present application, the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
In some embodiments of the second aspect of the present application, the color cast detection module comprises:
the scene extraction unit is used for carrying out scene identification on the original image frame and extracting a scene image in the original image frame according to an identification result;
and the scene detection unit is used for inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In some embodiments of the second aspect of the present application, the color cast detection module further comprises:
a color cast sample acquisition unit, configured to acquire a plurality of color cast detection image samples, where the color cast detection image samples include a color cast negative sample and a color cast-free positive sample;
the sample scene extraction unit is used for carrying out scene identification on each color cast detection image sample and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
the color cast state determining unit is used for determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and the color cast detection model training unit is used for training the color cast detection neural network model by taking the scene sample as input and taking the presence or absence of color cast of the scene sample as output to obtain the trained color cast detection neural network model.
In some embodiments of the second aspect of the present application, the color cast detection neural network model is implemented using a BP neural network or a CNN.
In some embodiments of the second aspect of the present application, the scene extraction unit includes:
the pedestrian region determining subunit is used for detecting pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method and determining a region where the pedestrians are located;
and the scene extraction subunit is used for extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
In some embodiments of the second aspect of the present application, the scene extraction unit includes:
a salient region determining subunit, configured to detect a salient region in the original image frame by using a saliency detection algorithm, where the salient region is a region where a pedestrian is located;
and the non-significant region extracting subunit is used for extracting non-significant regions except the significant region in the original image frame as the scene image.
In some embodiments of the second aspect of the present application, the pedestrian attribute identification module is configured to identify at least one of the following attributes of the pedestrian contained in the original image frame after color cast reduction: skin tone, hair color, and clothing color.
Compared with the prior art, the surveillance-video-based pedestrian attribute identification method provided by the application extracts an original image frame from a surveillance video stream captured by a surveillance camera, detects whether the original image frame has a color cast, and, if it does, inputs the original image frame into a pre-trained color cast reduction neural network model, which outputs a corresponding corrected image frame; pedestrian attribute identification is then performed on the corrected image frame. Because the color cast reduction neural network model is trained in advance, any original image frame found to have a color cast can be restored by the model into a corrected, cast-free image frame, and pedestrian attributes can then be identified accurately from that corrected frame. On this basis, the method solves the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames exhibit color cast under neon lighting at night.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 illustrates a flow chart of a surveillance video based pedestrian attribute identification method provided by some embodiments of the present application;
fig. 2 illustrates a schematic diagram of a deep learning based pedestrian attribute identification system provided by some embodiments of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.
In addition, the terms "first" and "second", etc. are used to distinguish different objects, rather than to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The embodiments of the present application provide a pedestrian attribute identification method and system based on a monitoring video, which are described below by way of example with reference to the embodiments and the accompanying drawings.
Referring to fig. 1, which shows a flowchart of a surveillance video-based pedestrian attribute identification method according to some embodiments of the present application, as shown in fig. 1, the surveillance video-based pedestrian attribute identification method may include the following steps:
step S101: original image frames are extracted from a surveillance video stream captured by a surveillance camera, wherein image content of the original image frames contains pedestrians.
Step S102: and detecting whether the original image frame has color cast.
Step S103: and if the original image frame has color cast, inputting the original image frame into a pre-trained color cast reduction neural network model, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model.
Step S104: and performing pedestrian attribute identification based on the corrected image frame.
Compared with the prior art, the surveillance-video-based pedestrian attribute identification method provided by the application extracts an original image frame from a surveillance video stream captured by a surveillance camera, detects whether the original image frame has a color cast, and, if it does, inputs the original image frame into a pre-trained color cast reduction neural network model, which outputs a corresponding corrected image frame; pedestrian attribute identification is then performed on the corrected image frame. Because the color cast reduction neural network model is trained in advance, any original image frame found to have a color cast can be restored by the model into a corrected, cast-free image frame, and pedestrian attributes can then be identified accurately from that corrected frame. On this basis, the method solves the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames exhibit color cast under neon lighting at night.
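To make steps S101 to S104 concrete, the following is a minimal Python/OpenCV sketch of the pipeline. The `detector`, `corrector` and `attribute_model` callables stand in for the trained networks described in the embodiments below; they are assumptions of this sketch, not components specified by the application.

```python
import cv2

def process_stream(stream_url, detector, corrector, attribute_model, frame_stride=25):
    """Sketch of steps S101-S104. The three model arguments are assumed to be
    pre-trained callables: detector(frame) -> bool (color cast present?),
    corrector(frame) -> corrected frame, attribute_model(frame) -> attributes."""
    cap = cv2.VideoCapture(stream_url)        # open the surveillance video stream (S101)
    results, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:           # sample original image frames periodically
            if detector(frame):               # S102: detect color cast
                frame = corrector(frame)      # S103: color cast reduction -> corrected frame
            results.append(attribute_model(frame))  # S104: pedestrian attribute identification
        idx += 1
    cap.release()
    return results
```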
In some modifications of the embodiments of the present application, before inputting the original image frame into a pre-trained color cast reduction neural network model and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model, the method further includes:
acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
For example, a user may correct the color cast of a first image sample using image editing software (here "color cast correction" has the same meaning as "color cast reduction") to obtain a second image sample without color cast. Training the color cast reduction neural network model with the first image sample as input and the second image sample as the target output gives the model the ability to correct color cast automatically, and after training the model can perform automatic color cast correction on original image frames that exhibit color cast.
On the basis of any implementation manner of the present application, in some specific implementations the color cast reduction neural network model is implemented with a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN); these networks are capable of generating an image from an input image and can therefore achieve the purpose of the embodiments of the present application.
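As an illustration of the paired training just described, the sketch below fits a small fully convolutional generator with an L1 reconstruction loss on (color cast, cast-free) image pairs; for the GAN variants named above, an adversarial discriminator term would be added to this loss. The architecture, hyperparameters and the `paired_dataset` interface are assumptions of the sketch, not the application's prescribed design.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Illustrative generator: maps a color-cast frame to a corrected frame.
# A real implementation might use a U-Net or one of the GAN generators above.
generator = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),   # output image in [0, 1]
)

def train_reduction_model(paired_dataset, epochs=10, lr=1e-4):
    """paired_dataset is assumed to yield (first_sample, second_sample) tensor
    pairs in [0, 1]: the color-cast input and its manually corrected target."""
    loader = DataLoader(paired_dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(generator.parameters(), lr=lr)
    l1 = nn.L1Loss()   # pixel-wise reconstruction loss; a GAN adds an adversarial term
    for _ in range(epochs):
        for cast_img, clean_img in loader:
            optimizer.zero_grad()
            loss = l1(generator(cast_img), clean_img)
            loss.backward()
            optimizer.step()
    return generator
```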
In some variations of embodiments of the present application, the detecting whether the original image frame has color cast includes:
carrying out scene recognition on the original image frame, and extracting a scene image in the original image frame according to a recognition result;
inputting the scene image into a pre-trained color cast detection neural network model, and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In a surveillance video the scene remains essentially unchanged or changes very little, whereas dynamic objects such as pedestrians entering and leaving the picture interfere with color cast detection on the original image frame and make the detection result uncertain. Performing color cast detection on the extracted scene image rather than on the whole frame therefore removes this interference.
In addition to the above embodiments, in some modified embodiments, before inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, the method further includes:
obtaining a plurality of color cast detection image samples, wherein the color cast detection image samples comprise color cast negative samples and color cast-free positive samples;
carrying out scene identification on each color cast detection image sample, and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and training a color cast detection neural network model by taking the scene sample as input and the scene sample with or without color cast as output to obtain the trained color cast detection neural network model.
In this embodiment, positive and negative samples are obtained, scene recognition is performed on them, and scene samples are extracted; the model is then trained on these scene samples. This yields a color cast detection neural network model that can judge efficiently and accurately whether a scene image has color cast. Because the model is trained on scene samples, the influence of dynamic objects in the image samples is eliminated, and the trained model achieves high detection accuracy.
It should be noted that the color cast detection neural network model is essentially a binary classification network for images; therefore any binary image classification network provided by the prior art, for example a BP neural network or a CNN, that can achieve the purpose of the embodiments of the present application falls within the scope of the present application.
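For concreteness, a binary classifier of the kind referred to here could be as small as the following PyTorch sketch; the layer sizes are assumptions of the sketch. It would be trained with binary cross-entropy (`nn.BCELoss`) on the cast-free positive and color-cast negative scene samples described above, with consistent 0/1 labels.

```python
import torch
import torch.nn as nn

class ColorCastDetector(nn.Module):
    """Minimal binary classifier: scene image in, color cast probability out.
    Layer sizes are illustrative; any image classifier (BP network, CNN) fits."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global average pooling
        )
        self.classifier = nn.Linear(32, 1)      # single logit: cast vs. no cast

    def forward(self, x):                       # x: (N, 3, H, W) scene images
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(h))   # probability of color cast
```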
On the basis of the foregoing embodiment, in some variations, the performing scene recognition on the original image frame and extracting a scene image in the original image frame according to a recognition result includes:
detecting the pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method, and determining the region where the pedestrians are located;
and extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
The inter-frame difference method and the optical flow field method are well suited to images whose background (i.e., the scene) is static and whose foreground (i.e., the pedestrians) changes dynamically, so the region where the pedestrians are located can be identified more accurately.
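As a rough sketch of the inter-frame difference variant (OpenCV; the threshold and dilation settings are assumptions): consecutive frames are differenced, moving pixels are taken as the pedestrian region, and the remaining static area is kept as the scene image.

```python
import cv2
import numpy as np

def extract_scene_by_frame_difference(prev_frame, frame, diff_thresh=25):
    """Mask out moving regions (pedestrians) found by inter-frame differencing
    and keep the remaining static area of the frame as the scene image."""
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g0, g1)                            # inter-frame difference
    _, motion = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    motion = cv2.dilate(motion, np.ones((15, 15), np.uint8))  # cover the whole pedestrian region
    scene = frame.copy()
    scene[motion > 0] = 0      # blank the pedestrian region; what remains is the scene image
    return scene
```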
In some variations of the embodiments of the present application, the performing scene recognition on an original image frame and extracting a scene image in the original image frame according to a recognition result includes:
detecting a salient region in the original image frame by adopting a saliency detection algorithm, wherein the salient region is a region where a pedestrian is located;
extracting non-significant regions except the significant region in the original image frame as a scene image.
Saliency detection algorithms are efficient and impose a low computational load on the system, so adopting this embodiment can effectively improve running efficiency and reduce the system load.
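A possible realization with OpenCV's spectral residual static saliency detector (available in opencv-contrib) is sketched below; treating the salient region as the pedestrian region is this embodiment's assumption, and the threshold value is an assumption of the sketch.

```python
import cv2

def extract_scene_by_saliency(frame, saliency_thresh=0.5):
    """Detect the salient region (assumed to be the pedestrian) with the
    spectral residual detector and keep the non-salient rest as the scene."""
    detector = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = detector.computeSaliency(frame)    # float map in [0, 1]
    if not ok:
        return frame             # fall back to the whole frame
    scene = frame.copy()
    scene[saliency_map >= saliency_thresh] = 0            # remove salient (pedestrian) region
    return scene
```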
In some variations of the embodiments of the present application, the performing pedestrian attribute recognition according to the original image frame after color cast reduction includes:
identifying at least one of the following attributes of the pedestrians contained in the original image frame after the color cast reduction: skin tone, hair color, and clothing color.
This step can be implemented directly with, or by adapting, any pedestrian attribute identification method disclosed in the prior art; the application does not limit the specific implementation of this step.
Considering that skin color, hair color and clothing color are easily unrecognizable or misrecognized under neon and similar lighting, the above embodiments allow these pedestrian attributes to be identified accurately after automatic color cast correction, solving the problem that pedestrian attributes cannot be identified, or are identified incorrectly, because surveillance video frames exhibit color cast under neon lighting at night.
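Since the application leaves the recognizer itself open, here is one toy illustration of a color attribute estimate on the corrected frame: the dominant hue inside a pedestrian bounding box is mapped to a coarse clothing color name. The `person_box` argument, the torso crop and the hue bins are all assumptions; a deployed system would use a learned attribute classifier instead.

```python
import cv2
import numpy as np

def dominant_clothing_color(corrected_frame, person_box):
    """Toy estimate of clothing color on the color-corrected frame. person_box
    is a hypothetical (x, y, w, h) pedestrian box from any person detector."""
    x, y, w, h = person_box
    torso = corrected_frame[y + h // 4 : y + 3 * h // 4, x : x + w]   # rough torso crop
    hsv = cv2.cvtColor(torso, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [180], [0, 180])            # hue histogram
    hue = int(np.argmax(hist))                                        # dominant hue bin
    for bound, name in [(15, "red"), (45, "yellow"), (90, "green"), (135, "blue"), (181, "red")]:
        if hue < bound:                       # coarse bins over OpenCV's 0-179 hue range
            return name
```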
In the embodiment, a pedestrian attribute identification method based on a surveillance video is provided, and correspondingly, the application also provides a pedestrian attribute identification system based on deep learning. The pedestrian attribute identification system based on deep learning provided by the embodiment of the application can implement the pedestrian attribute identification method based on the monitoring video, and can be realized through software, hardware or a combination of software and hardware. For example, the deep learning based pedestrian attribute identification system may comprise integrated or separate functional modules or units to perform the corresponding steps of the above methods. Referring to fig. 2, a schematic diagram of a deep learning based pedestrian attribute identification system according to some embodiments of the present application is shown. Since the system embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The system embodiments described below are merely illustrative.
As shown in fig. 2, the deep learning based pedestrian attribute identification system 10 may include:
a second aspect of the present application provides a pedestrian attribute recognition system 10 based on deep learning, including:
an original image frame extracting module 101, configured to extract an original image frame from a surveillance video stream captured by a surveillance camera, where image content of the original image frame includes a pedestrian;
a color cast detection module 102, configured to detect whether the original image frame has color cast;
the color cast reduction module 103 is configured to, if the original image frame has color cast, input the original image frame into a color cast reduction neural network model trained in advance, and output a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and a pedestrian attribute identification module 104, configured to perform pedestrian attribute identification based on the corrected image frame.
In some variations of the embodiments of the present application, the system further includes:
the image sample group acquisition module is used for acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and the color cast reduction model training module is used for training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
In some variations of the embodiments of the present application, the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
In some variations of the embodiments of the present application, the color cast detection module 102 includes:
the scene extraction unit is used for carrying out scene identification on the original image frame and extracting a scene image in the original image frame according to an identification result;
and the scene detection unit is used for inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
In some variations of the embodiments of the present application, the color cast detection module 102 further includes:
a color cast sample acquisition unit, configured to acquire a plurality of color cast detection image samples, where the color cast detection image samples include a color cast negative sample and a color cast-free positive sample;
the sample scene extraction unit is used for carrying out scene identification on each color cast detection image sample and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
the color cast state determining unit is used for determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and the color cast detection model training unit is used for training the color cast detection neural network model by taking the scene sample as input and taking the presence or absence of color cast of the scene sample as output to obtain the trained color cast detection neural network model.
In some variations of the embodiments of the present application, the color cast detection neural network model is implemented using a BP neural network or a CNN.
In some variations of embodiments of the present application, the scene extraction unit includes:
the pedestrian region determining subunit is used for detecting pedestrians in the original image frame by adopting an interframe difference method or an optical flow field method and determining a region where the pedestrians are located;
and the scene extraction subunit is used for extracting other areas except the area where the pedestrian is located in the original image frame as scene images.
In some variations of embodiments of the present application, the scene extraction unit includes:
a salient region determining subunit, configured to detect a salient region in the original image frame by using a saliency detection algorithm, where the salient region is a region where a pedestrian is located;
and the non-significant region extracting subunit is used for extracting non-significant regions except the significant region in the original image frame as the scene image.
In some variations of the embodiments of the present application, the pedestrian attribute identification module 104 is configured to identify at least one of the following attributes of the pedestrian contained in the original image frame after color cast reduction: skin tone, hair color, and clothing color.
The pedestrian attribute identification system 10 based on deep learning provided by the embodiment of the present application and the pedestrian attribute identification method based on surveillance video provided by the foregoing embodiment of the present application have the same beneficial effects based on the same inventive concept.
It should be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present disclosure, and the present disclosure should be construed as being covered by the claims and the specification.

Claims (10)

1. A pedestrian attribute identification method based on a surveillance video is characterized by comprising the following steps:
extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
detecting whether the original image frame has color cast;
if the original image frame has color cast, inputting the original image frame into a color cast reduction neural network model trained in advance, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and performing pedestrian attribute identification based on the corrected image frame.
2. The method of claim 1, wherein before inputting the original image frames into a pre-trained color cast reduction neural network model and outputting corrected image frames corresponding to the original image frames by using the color cast reduction neural network model, the method further comprises:
acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
3. The method of claim 2, wherein the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
4. The method of claim 1, wherein the detecting whether the original image frame has color cast comprises:
carrying out scene recognition on the original image frame, and extracting a scene image in the original image frame according to a recognition result;
inputting the scene image into a pre-trained color cast detection neural network model, and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
5. The method of claim 4, wherein before inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, the method further comprises:
obtaining a plurality of color cast detection image samples, wherein the color cast detection image samples comprise color cast negative samples and color cast-free positive samples;
carrying out scene identification on each color cast detection image sample, and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and training a color cast detection neural network model by taking the scene sample as input and the scene sample with or without color cast as output to obtain the trained color cast detection neural network model.
6. A pedestrian attribute recognition system based on deep learning, comprising:
the original image frame extraction module is used for extracting an original image frame from a monitoring video stream shot by a monitoring camera, wherein the image content of the original image frame contains pedestrians;
the color cast detection module is used for detecting whether the original image frame has color cast;
the color cast reduction module is used for inputting the original image frame into a pre-trained color cast reduction neural network model if the original image frame has color cast, and outputting a corrected image frame corresponding to the original image frame by using the color cast reduction neural network model;
and the pedestrian attribute identification module is used for identifying the pedestrian attribute based on the corrected image frame.
7. The system of claim 6, further comprising:
the image sample group acquisition module is used for acquiring a plurality of image sample groups, wherein each image sample group comprises a first image sample with color cast and a second image sample without color cast, and the first image sample and the second image sample have the same image content;
and the color cast reduction model training module is used for training a color cast reduction neural network model on the plurality of image sample groups, with the first image sample as input and the second image sample as output, to obtain the trained color cast reduction neural network model.
8. The system of claim 6, wherein the color cast reduction neural network model is implemented using a generative adversarial network (GAN), a deep convolutional GAN (DCGAN), a coupled GAN (CoGAN), or a self-attention GAN (SAGAN).
9. The system of claim 6, wherein the color cast detection module comprises:
the scene extraction unit is used for carrying out scene identification on the original image frame and extracting a scene image in the original image frame according to an identification result;
and the scene detection unit is used for inputting the scene image into a pre-trained color cast detection neural network model and outputting a color cast detection result by using the color cast detection neural network model, wherein the color cast detection result indicates either color cast or no color cast.
10. The system of claim 6, wherein the color cast detection module further comprises:
a color cast sample acquisition unit, configured to acquire a plurality of color cast detection image samples, where the color cast detection image samples include a color cast negative sample and a color cast-free positive sample;
the sample scene extraction unit is used for carrying out scene identification on each color cast detection image sample and extracting a scene image in each color cast detection image sample as a scene sample according to an identification result;
the color cast state determining unit is used for determining whether the scene sample corresponding to the color cast detection image sample has color cast according to whether the color cast detection image sample has color cast;
and the color cast detection model training unit is used for training the color cast detection neural network model by taking the scene sample as input and taking the presence or absence of color cast of the scene sample as output to obtain the trained color cast detection neural network model.
CN202010614464.7A 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video Active CN111898449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010614464.7A CN111898449B (en) 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010614464.7A CN111898449B (en) 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video

Publications (2)

Publication Number Publication Date
CN111898449A 2020-11-06
CN111898449B (en) 2023-04-18

Family

ID=73206516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010614464.7A Active CN111898449B (en) 2020-06-30 2020-06-30 Pedestrian attribute identification method and system based on monitoring video

Country Status (1)

Country Link
CN (1) CN111898449B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102013005A (en) * 2009-09-07 2011-04-13 泉州市铁通电子设备有限公司 Face detection method under polarized colored light based on local dynamic threshold color balance
CN103065334A (en) * 2013-01-31 2013-04-24 金陵科技学院 Color cast detection and correction method and device based on HSV (Hue, Saturation, Value) color space
CN106412547A (en) * 2016-08-29 2017-02-15 厦门美图之家科技有限公司 Image white balance method and device based on convolutional neural network, and computing device
CN108364270A (en) * 2018-05-22 2018-08-03 北京理工大学 Method and device for restoring the colors of color cast images
CN109523485A (en) * 2018-11-19 2019-03-26 Oppo广东移动通信有限公司 Image color correction method, device, storage medium and mobile terminal
CN109726669A (en) * 2018-12-26 2019-05-07 浙江捷尚视觉科技股份有限公司 Adversarial-network-based method for generating pedestrian re-identification data under different illumination conditions
CN111292251A (en) * 2019-03-14 2020-06-16 展讯通信(上海)有限公司 Image color cast correction method, device and computer storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ESUBE BEKELE et al.: "The Deeper, the Better: Analysis of Person Attributes Recognition", arXiv *
JIA Chuanmin (贾川民) et al.: "Neural-network-based image and video coding", Special Topic: Intelligent Communication Technology and Applications *
MA Chengqian (马成前) et al.: "Machine-learning-based image color cast detection", Computer Applications and Software *

Also Published As

Publication number Publication date
CN111898449B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
Jung Efficient background subtraction and shadow removal for monochromatic video sequences
CN110796098B (en) Method, device, equipment and storage medium for training and auditing content auditing model
CN109815936B (en) Target object analysis method and device, computer equipment and storage medium
CN108446681B (en) Pedestrian analysis method, device, terminal and storage medium
Ibrahim et al. Speed Detection Camera System using Image Processing Techniques on Video Streams
CN110874878B (en) Pedestrian analysis method, device, terminal and storage medium
CN105279487A (en) Beauty tool screening method and system
CN111222450B (en) Model training and live broadcast processing method, device, equipment and storage medium
CN105451029A (en) Video image processing method and device
WO2022213540A1 (en) Object detecting, attribute identifying and tracking method and system
CN111898448B (en) Pedestrian attribute identification method and system based on deep learning
CN112989950A (en) Violent video recognition system oriented to multi-mode feature semantic correlation features
CN112184771A (en) Community personnel trajectory tracking method and device
US11455785B2 (en) System and method for use in object detection from video stream
CN108769521B (en) Photographing method, mobile terminal and computer readable storage medium
CN112749696B (en) Text detection method and device
CN111898449B (en) Pedestrian attribute identification method and system based on monitoring video
Satwashil et al. Integrated natural scene text localization and recognition
CN115376033A (en) Information generation method and device
CN111008601A (en) Fighting detection method based on video
CN109977891A (en) A kind of object detection and recognition method neural network based
Lin et al. Face detection based on skin color segmentation and SVM classification
Low et al. Frame Based Object Detection--An Application for Traffic Monitoring
TWI777689B (en) Method of object identification and temperature measurement
CN108764126A (en) A kind of embedded living body faces tracking system

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant