CN113192009B - Crowd counting method and system based on global context convolutional network


Info

Publication number: CN113192009B
Authority: CN (China)
Prior art keywords: low-level feature, high-level feature, feature map, global context
Legal status: Active (granted)
Application number: CN202110382645.6A
Other languages: Chinese (zh)
Other versions: CN113192009A
Inventors: 康春萌, 孟琛, 盛星, 吕蕾
Current and original assignee: Shandong Normal University
Application filed by Shandong Normal University; priority to CN202110382645.6A (filed 2021-04-09)
Publication of CN113192009A: 2021-07-30
Application granted; publication of CN113192009B: 2022-09-02

Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06F 18/253 — Pattern recognition; fusion techniques of extracted features
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06V 10/40 — Extraction of image or video features
    • G06V 20/53 — Scene-specific elements; recognition of crowd images, e.g. recognition of crowd congestion
    • G06T 2207/20081 — Special algorithmic details; training, learning
    • G06T 2207/20084 — Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/20221 — Image combination; image fusion, image merging
    • G06T 2207/30242 — Subject of image; counting objects in image


Abstract

The invention provides a crowd counting method and system based on a global context convolutional network. The method extracts a low-level feature map and a high-level feature map from the image to be counted; extracts multi-scale features from each to obtain feature maps carrying multi-scale information; aggregates the global context features to every pixel by capturing spatial and channel information, yielding feature maps that carry context information and capture the long-range dependencies between pixels, so that the feature maps contain richer information; and obtains a crowd density map through upsampling and feature fusion, thereby improving crowd counting accuracy.

Description

Crowd counting method and system based on global context convolutional network
Technical Field
The invention belongs to the field of deep learning and computer vision, and particularly relates to a crowd counting method and system based on a global context convolutional network.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In recent years, crowd counting has attracted sustained interest in computer vision owing to its wide application in public safety, city planning, traffic control, and the like. The goal of crowd counting is to accurately estimate the number of people in a still image or video frame. Because of factors such as the camera's shooting angle and the varying distances between different people in the crowd and the camera, captured images suffer from scale variation, severe occlusion, and irrelevant background, which greatly degrade the accuracy of crowd counting algorithms.
At present, CNN-based methods have become the mainstream of crowd counting research; their network architectures fall mainly into single-column and multi-column designs. A single-column architecture is typically a single multi-layer convolutional neural network; its structure is simple, but it lacks detail and spatial information. A multi-column architecture generally adopts a multi-scale or multi-column structure to capture richer feature information, but the structure is complex, the computational cost is high, and most such methods do not fully exploit context and scale information. For this reason, some recent crowd counting methods introduce strategies such as dilated (atrous) convolution, pyramid networks, and attention models to improve the existing architectures, yet scale variation and severe occlusion remain major challenges.
Disclosure of Invention
The invention aims to solve the above problems and provides a crowd counting method and system based on a global context convolutional network.
According to some embodiments, the invention adopts the following technical scheme:
a crowd counting method based on a global context convolutional network comprises the following steps:
acquiring a crowd image to be counted;
extracting a low-level feature map and a high-level feature map of the crowd image;
carrying out scale perception on the low-level feature map and the high-level feature map to obtain an enhanced low-level feature map and an enhanced high-level feature map;
sequentially carrying out context modeling and feature transformation on the enhanced low-level feature map and the enhanced high-level feature map, extracting global context features, and obtaining, through feature fusion, the low-level and high-level feature maps fused with global context information;
determining a density map according to the low-level and high-level feature maps fused with global context information;
carrying out crowd counting according to the density map.
As a further limitation, the specific steps of performing scale perception on the low-level feature map and the high-level feature map to obtain an enhanced low-level feature map and an enhanced high-level feature map include:
compressing the channels of the low-level feature map and the high-level feature map through four convolution operations to obtain compressed feature maps;
extracting multi-scale feature maps from the compressed low-level and high-level feature maps through dilated convolutions with different dilation rates;
and splicing the extracted multi-scale feature maps according to a channel splicing method to obtain an enhanced low-level feature map and an enhanced high-level feature map.
As a further limitation, the specific steps of the context modeling are as follows:
convolving the feature map with the linear transformation matrix, and normalizing the attention weights through a softmax function to obtain normalized attention weights;
and applying a reshape operation to the feature map, then matrix-multiplying it with the normalized attention weights to obtain the initial global context feature.
As a further limitation, the specific steps of feature transformation include:
firstly, the initial global context feature is convolved with a linear transformation matrix; LayerNorm and ReLU operations are then performed in sequence; finally, the feature transformation is completed through a 1 × 1 convolution to obtain the global context feature.
By way of further limitation, the feature fusion aggregates the global context features to each position of the enhanced low-level and high-level feature maps through a broadcast element-wise addition, so that each position obtains global context information, yielding the low-level and high-level feature maps fused with global context information.
As a further limitation, the specific steps of determining the density map according to the low-level and high-level feature maps fused with global context information are as follows:
performing an upsampling operation on the high-level feature map fused with global context information;
and concatenating (channel splicing) the upsampled high-level feature map with the low-level feature map fused with global context information, followed by a convolution operation, to obtain the density map.
As a further limitation, the population counting according to the density map specifically includes: the predicted population is obtained by integrating and summing the density maps.
A population counting system based on a global context convolutional network, comprising:
the image acquisition module is used for acquiring a crowd image to be counted;
the feature extraction module is used for extracting a low-level feature map and a high-level feature map of the crowd image;
the scale perception module is used for performing scale perception on the low-level and high-level feature maps to obtain an enhanced low-level feature map and an enhanced high-level feature map;
the global context module is used for sequentially carrying out context modeling and feature transformation on the enhanced low-level and high-level feature maps, extracting global context features, and obtaining, through feature fusion, the low-level and high-level feature maps fused with global context information;
the density map determining module is used for determining a density map according to the low-level and high-level feature maps fused with global context information;
and the people counting module is used for counting people according to the density map.
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute a global context convolutional network based crowd counting method.
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the crowd counting method based on the global context convolutional network.
Compared with the prior art, the invention has the beneficial effects that:
compared with the standard convolution, the method has the advantages that the hole convolution is used, so that the method has a larger receptive field when the feature graph is subjected to convolution operation, contains more local context information and reduces the calculation complexity.
The invention uses the hole convolution with different expansion rates to form a proportional pyramid type network, and compared with the traditional convolution operation with convolution kernels of different sizes, the invention has simpler structure and smaller complexity while extracting the multi-proportion information of the characteristic diagram.
The invention extracts the global context characteristics of the low-level characteristic diagram and the high-level characteristic diagram, obtains the low-level characteristic diagram and the high-level characteristic diagram which are blended with the global context information through characteristic fusion, and captures the dependency relationship between channels, thereby ensuring that each position of the image can obtain the global context information, obtaining the remote dependency relationship between pixels and ensuring that the characteristic diagram contains richer information.
The global context module used by the invention belongs to a lightweight computing module, so that the model has less resource consumption and higher computing efficiency.
Drawings
The accompanying drawings, which constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention rather than limit it.
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of the present invention;
FIG. 3 is a scale-aware schematic of the present invention;
FIG. 4 is a diagram illustrating the global context information extraction and fusion principle of the present invention.
Detailed Description
the invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; it should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
In one or more embodiments, a crowd counting method based on a global context convolutional network is provided. The method uses a scale perception module and a global context module to process the extracted low-level features and high-level features respectively, so as to capture rich scale information and context information and ultimately predict the density map more accurately.
As shown in FIG. 1 and FIG. 2, a crowd counting method based on a global context convolutional network includes the following specific steps:
step 1, acquiring a crowd image to be counted;
step 2, extracting a low-level feature map and a high-level feature map of the crowd image, i.e., extracting image features: feature maps carrying low-level features and high-level features are extracted from the image to be counted, respectively;
the first five layers (convolutional blocks) of the VGG-16 network are adopted to extract the low-level and high-level feature maps of the crowd image: the feature map output by the third layer serves as the low-level feature map, and the feature maps output by the fourth and fifth layers serve as the high-level feature maps; given an image I, the output feature map of VGG-16 can be expressed as:
f_v = F_vgg(I)
Through the VGG-16 backbone network, the feature information of the image is preliminarily extracted.
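To make this concrete, here is a minimal PyTorch sketch of such a VGG-16 front end. The mapping of the "first five layers" onto slices of torchvision's vgg16.features, and the use of ImageNet pre-trained weights, are assumptions rather than details given in the patent.

```python
import torch
import torchvision

class VGGFrontEnd(torch.nn.Module):
    """First five VGG-16 blocks: conv3_3 -> low-level, conv4_3/conv5_3 -> high-level."""
    def __init__(self):
        super().__init__()
        features = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
        self.block1_3 = features[:16]   # conv1_1 ... conv3_3 (+ReLU), two max-pools
        self.block4 = features[16:23]   # pool3, conv4_1 ... conv4_3 (+ReLU)
        self.block5 = features[23:30]   # pool4, conv5_1 ... conv5_3 (+ReLU)

    def forward(self, img):
        low = self.block1_3(img)        # low-level feature map, 256 ch, 1/4 resolution
        high4 = self.block4(low)        # high-level feature map, 512 ch, 1/8 resolution
        high5 = self.block5(high4)      # high-level feature map, 512 ch, 1/16 resolution
        return low, high4, high5

# usage: low, high4, high5 = VGGFrontEnd()(torch.randn(1, 3, 512, 512))
```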
And step 3, performing scale perception on the low-level and high-level feature maps respectively and extracting their multi-scale information to obtain an enhanced low-level feature map and an enhanced high-level feature map; that is, multi-scale features of the low-level and high-level feature maps are extracted to obtain feature maps carrying multi-scale information. As shown in FIG. 3, specifically:
firstly, compressing channels for a low-level feature map and a high-level feature map through four 1 × 1 convolution operations to obtain compressed feature maps;
then, the compressed low-level and high-level feature maps are convolved by four dilated convolutions with different dilation rates (d in FIG. 3 denotes the dilation rate, which is 1, 2, 3, and 4, respectively) to extract multi-scale feature maps;
and finally, splicing the extracted multi-scale feature maps according to a channel splicing method to obtain an enhanced low-level feature map and an enhanced high-level feature map.
Through scale perception, multi-scale information of the image is extracted and the extracted low-level and high-level feature maps are enhanced.
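A minimal PyTorch sketch of such a scale perception module follows. The 1 × 1 channel compression, the four dilation rates 1-4, and the channel concatenation follow the text; the branch width (one quarter of the input channels) and the ReLU placements are assumptions.

```python
import torch
import torch.nn as nn

class ScaleAwareModule(nn.Module):
    """Four parallel branches: 1x1 compression then 3x3 dilated conv, d in {1,2,3,4}."""
    def __init__(self, in_ch):
        super().__init__()
        branch_ch = in_ch // 4          # assumed compressed width per branch
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, kernel_size=1),           # compress channels
                nn.ReLU(inplace=True),
                nn.Conv2d(branch_ch, branch_ch, kernel_size=3,
                          padding=d, dilation=d),                     # dilated conv, rate d
                nn.ReLU(inplace=True),
            )
            for d in (1, 2, 3, 4)
        ])

    def forward(self, x):
        # channel-wise concatenation of the four multi-scale branches
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```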
And step 4, extracting the global context information of the enhanced low-level and high-level feature maps to obtain low-level and high-level feature maps fused with global context information. That is, global context information is aggregated: global context features are extracted from the feature maps carrying multi-scale information and aggregated to each pixel by capturing spatial and channel information, so that the global context is modeled more effectively and feature maps carrying context information are obtained. As shown in FIG. 4, this specifically includes three parts: (1) context modeling, (2) feature transformation, and (3) feature fusion.
(1) Context modeling: first, the feature map X is convolved with W_k, a linear transformation matrix implemented as a 1 × 1 convolution; a softmax operation then normalizes the attention weights. Meanwhile, a reshape operation is applied to the feature map X, which is matrix-multiplied (the ⊗ in FIG. 4) with the normalized attention weights to obtain the global context feature. Here, the feature map X is an enhanced low-level or high-level feature map.
(2) Feature transformation: the global context feature is convolved with W_v1, which, as shown in FIG. 4, is a linear transformation matrix implemented as a 1 × 1 convolution; LayerNorm and ReLU operations then follow in sequence, which improves performance and eases network optimization; finally, a 1 × 1 convolution (W_v2) completes the feature transformation. This bottleneck reduces the parameter count from C·C to 2·C·C/r, where r is the dimension reduction ratio, C/r is the hidden representation dimension, and r is generally set to 16. The feature transformation captures the dependencies between channels, yielding the importance of each channel, and is expressed as:

δ(·) = W_v2 ReLU(LN(W_v1(·)))
(3) Feature fusion: after context modeling and feature transformation, the global context features are aggregated to each position of the original feature map X through a broadcast element-wise addition, so that each position i of the original feature map obtains global context information. The low-level and high-level feature maps fused with global context information can be expressed as:

z_i = x_i + δ( Σ_{j=1}^{N_p} α_j x_j ),  with  α_j = exp(W_k x_j) / Σ_{m=1}^{N_p} exp(W_k x_m)

where the input enhanced low-level or high-level feature map is X ∈ R^{C×W×H}, C is the number of channels, N_p = W × H is the number of positions in the feature map X, and α_j is the global attention weight.
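The three parts above can be sketched in PyTorch as follows, consistent with the formula z_i = x_i + δ(Σ_j α_j x_j) and with r = 16; the exact tensor bookkeeping is an assumption.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Context modeling, feature transformation, and broadcast element-wise fusion."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.w_k = nn.Conv2d(channels, 1, kernel_size=1)        # attention logits W_k
        self.transform = nn.Sequential(                         # delta = Wv2 ReLU(LN(Wv1 .))
            nn.Conv2d(channels, channels // r, kernel_size=1),  # W_v1, bottleneck C -> C/r
            nn.LayerNorm([channels // r, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, kernel_size=1),  # W_v2, back to C
        )

    def forward(self, x):
        n, c, h, w = x.shape
        # (1) context modeling: softmax over all N_p = H*W positions -> alpha_j
        attn = torch.softmax(self.w_k(x).view(n, 1, h * w), dim=2)
        feat = x.view(n, c, h * w)                              # reshape X
        # sum_j alpha_j x_j -> one C-dimensional global context vector per image
        context = torch.matmul(feat, attn.transpose(1, 2)).view(n, c, 1, 1)
        # (2) feature transformation, (3) broadcast element-wise addition
        return x + self.transform(context)
```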
And step 5, performing an upsampling operation on the high-level feature maps fused with global context information so that they have the same size as the low-level feature map fused with global context information. That is, the context-information feature maps derived from the high-level features are upsampled to the same size as the one derived from the low-level features; specifically:
the high-level feature map fused with the global context feature corresponding to the feature map of the fourth layer of VGG-16Net is subjected to up-sampling multiplied by 2 operation, and the high-level feature map fused with the global context feature corresponding to the feature map of the fifth layer is subjected to up-sampling multiplied by 4 operation, so that the obtained feature map and the low-level feature map fused with the global context feature corresponding to the feature map of the third layer can be the same in size.
And step 6, the three feature maps are fused together by channel concatenation; that is, the upsampled high-level feature maps and the low-level feature map fused with global context information are concatenated along the channel dimension, and the predicted density map is obtained through a 1 × 1 convolution.
And step 7, finally, the density map is integrated (summed) to obtain the predicted number of people; that is, the predicted crowd count of the image is obtained by integrating the predicted crowd density map.
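A minimal PyTorch sketch of steps 5 to 7 follows. The ×2/×4 upsampling, channel concatenation, 1 × 1 convolution, and summation follow the text; the use of bilinear interpolation and the channel widths (256/512/512, matching VGG-16's third to fifth blocks) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DensityHead(nn.Module):
    """Upsample high-level maps, concatenate with the low-level map, predict density."""
    def __init__(self, low_ch=256, high4_ch=512, high5_ch=512):
        super().__init__()
        self.to_density = nn.Conv2d(low_ch + high4_ch + high5_ch, 1, kernel_size=1)

    def forward(self, low, high4, high5):
        high4 = F.interpolate(high4, scale_factor=2, mode="bilinear", align_corners=False)
        high5 = F.interpolate(high5, scale_factor=4, mode="bilinear", align_corners=False)
        fused = torch.cat([low, high4, high5], dim=1)   # channel splicing
        return self.to_density(fused)                   # predicted density map

def crowd_count(density_map: torch.Tensor) -> torch.Tensor:
    # integrating (summing) the density map yields the predicted head count per image
    return density_map.sum(dim=(1, 2, 3))
```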
Example 2
The embodiment provides a crowd counting system based on a global context convolutional network, which comprises:
the image acquisition module is used for acquiring a crowd image to be counted;
the feature extraction module is used for extracting a low-level feature map and a high-level feature map of the crowd image; the feature extraction module adopts the first five layers (convolutional blocks) of the VGG-16 network, taking the feature map output by the third layer as the low-level feature map and the feature maps output by the fourth and fifth layers as the high-level feature maps. Given an image I, the output feature map of VGG-16 can be expressed as:
f_v = F_vgg(I)
the feature information of the image is preliminarily extracted through the VGG-16 backbone network;
the scale perception module is used for performing scale perception on the low-level and high-level feature maps to obtain an enhanced low-level feature map and an enhanced high-level feature map; the scale perception module extracts multi-scale information of the image and enhances the extracted low-level and high-level feature maps;
the global context module is used for sequentially carrying out context modeling and feature transformation on the enhanced low-level and high-level feature maps, extracting global context features, and obtaining, through feature fusion, the low-level and high-level feature maps fused with global context information;
the global context module may be represented as:
Figure BDA0003013560510000101
input characteristic diagram X ∈ R C×W×H C represents the number of channels, order
Figure BDA0003013560510000111
N P For the number of positions of the feature map, i.e. WXH, use
Figure BDA0003013560510000112
Weight representing global attention, δ (·) W v2 RuLU(LN(W v1 (·))) represents a feature transformation;
the density map determining module is used for determining a density map according to the low-level and high-level feature maps fused with global context information;
and the people counting module is used for counting people according to the density map.
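As an illustration of how these modules might be wired together, the sketch below composes the hypothetical classes from the sketches in Example 1 (VGGFrontEnd, ScaleAwareModule, GlobalContextBlock, DensityHead); it shows the data flow of the system, not the patent's actual implementation.

```python
import torch
import torch.nn as nn

class GlobalContextCrowdCounter(nn.Module):
    """Backbone -> scale perception -> global context -> density map, per the modules above."""
    def __init__(self):
        super().__init__()
        self.backbone = VGGFrontEnd()                             # feature extraction module
        self.scale = nn.ModuleList(
            [ScaleAwareModule(c) for c in (256, 512, 512)])       # scale perception module
        self.context = nn.ModuleList(
            [GlobalContextBlock(c) for c in (256, 512, 512)])     # global context module
        self.head = DensityHead(256, 512, 512)                    # density map determining module

    def forward(self, img):
        maps = list(self.backbone(img))                           # low, high4, high5
        maps = [s(m) for s, m in zip(self.scale, maps)]
        maps = [g(m) for g, m in zip(self.context, maps)]
        density = self.head(*maps)
        return density, density.sum(dim=(1, 2, 3))                # density map and crowd count
```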
Example 3
The present embodiment provides a computer-readable storage medium, in which a plurality of instructions are stored, the instructions being adapted to be loaded by a processor of a terminal device to execute the crowd counting method based on the global context convolutional network.
Example 4
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the crowd counting method based on the global context convolutional network.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made on the basis of the technical solution of the present invention without inventive effort, and such modifications and variations remain within the protection scope of the present invention.

Claims (8)

1. A crowd counting method based on a global context convolutional network is characterized in that: the method comprises the following steps:
acquiring a crowd image to be counted;
extracting a low-level feature map and a high-level feature map of the crowd image;
carrying out scale perception on the low-level feature map and the high-level feature map to obtain an enhanced low-level feature map and an enhanced high-level feature map;
sequentially carrying out context modeling and feature transformation on the enhanced low-level feature map and the enhanced high-level feature map, extracting global context features, and obtaining, through feature fusion, the low-level and high-level feature maps fused with global context information; the feature fusion aggregates the global context features to each position of the enhanced low-level and high-level feature maps through a broadcast element-wise addition, so that each position obtains global context information, yielding the low-level and high-level feature maps fused with global context information;
determining a density map according to the low-level and high-level feature maps fused with global context information; the specific steps are: performing an upsampling operation on the high-level feature map fused with global context information; concatenating the upsampled high-level feature map with the low-level feature map fused with global context information and applying a convolution operation to obtain the density map;
carrying out crowd counting according to the density map.
2. The crowd counting method based on the global context convolutional network as claimed in claim 1, wherein: the specific steps of performing scale perception on the low-level feature map and the high-level feature map to obtain the enhanced low-level feature map and the enhanced high-level feature map comprise:
compressing channels by four convolution operations on the low-level feature map and the high-level feature map to obtain compressed feature maps;
extracting multi-scale feature maps from the compressed low-level and high-level feature maps through four dilated convolutions with different dilation rates;
and splicing the extracted multi-scale feature maps according to a channel splicing method to obtain an enhanced low-level feature map and an enhanced high-level feature map.
3. The crowd counting method based on the global context convolutional network as claimed in claim 1, wherein: the specific steps of the context modeling are as follows:
convolving the feature map with the linear transformation matrix, and normalizing the attention weights through a softmax function to obtain normalized attention weights;
and applying a reshape operation to the feature map, then matrix-multiplying it with the normalized attention weights to obtain the initial global context feature.
4. The crowd counting method based on the global context convolutional network as claimed in claim 3, wherein: the specific steps of the feature transformation include:
firstly, the initial global context feature is convolved with a linear transformation matrix; LayerNorm and ReLU operations are then performed in sequence; finally, the feature transformation is completed through a 1 × 1 convolution to obtain the global context feature.
5. The crowd counting method based on the global context convolutional network as claimed in claim 1, wherein: the crowd counting according to the density map specifically comprises: the predicted number of people is obtained by integrating and summing the density map.
6. A crowd counting system based on a global context convolutional network is characterized in that: the method comprises the following steps:
the image acquisition module is used for acquiring a crowd image to be counted;
the feature extraction module is used for extracting a low-level feature map and a high-level feature map of the crowd image;
the scale perception module is used for performing scale perception on the low-level feature map and the high-level feature map to obtain an enhanced low-level feature map and an enhanced high-level feature map;
the global context module is used for sequentially carrying out context modeling and feature transformation on the enhanced low-level feature map and the enhanced high-level feature map, extracting global context features, and obtaining, through feature fusion, the low-level and high-level feature maps fused with global context information; the feature fusion aggregates the global context features to each position of the enhanced low-level and high-level feature maps through a broadcast element-wise addition, so that each position obtains global context information, yielding the low-level and high-level feature maps fused with global context information;
the density map determining module is used for determining a density map according to the low-level and high-level feature maps fused with global context information; the specific steps are: performing an upsampling operation on the high-level feature map fused with global context information; concatenating the upsampled high-level feature map with the low-level feature map fused with global context information and applying a convolution operation to obtain the density map;
and the people counting module is used for counting people according to the density map.
7. A computer-readable storage medium, characterized in that: a plurality of instructions are stored therein, the instructions being adapted to be loaded by a processor of a terminal device to perform the crowd counting method based on the global context convolutional network as claimed in any one of claims 1 to 5.
8. A terminal device, characterized in that: it comprises a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium stores a plurality of instructions, the instructions being adapted to be loaded by the processor to perform the crowd counting method based on the global context convolutional network as claimed in any one of claims 1 to 5.
CN202110382645.6A (priority date 2021-04-09, filing date 2021-04-09) — Crowd counting method and system based on global context convolutional network — Active — CN113192009B (en)

Priority Applications (1)

CN202110382645.6A — priority/filing date 2021-04-09 — Crowd counting method and system based on global context convolutional network (CN113192009B)

Applications Claiming Priority (1)

CN202110382645.6A — priority/filing date 2021-04-09 — Crowd counting method and system based on global context convolutional network (CN113192009B)

Publications (2)

Publication Number — Publication Date
CN113192009A (en) — 2021-07-30
CN113192009B (en) — 2022-09-02

Family ID: 76975232

Family Applications (1)

CN202110382645.6A (Active) — Crowd counting method and system based on global context convolutional network

Country Status (1)

CN: CN113192009B (en)

Citations (1)

* Cited by examiner, † Cited by third party

CN109271960A * — priority 2018-10-08, published 2019-01-25 — 燕山大学 (Yanshan University) — Crowd counting method based on convolutional neural networks

Family Cites Families (5)

US9946952B2 * — priority 2013-06-25, published 2018-04-17 — University of Central Florida Research Foundation, Inc. — Multi-source, multi-scale counting in dense crowd images
CN111626237A * — priority 2020-05-29, published 2020-09-04 — 中国民航大学 (Civil Aviation University of China) — Crowd counting method and system based on enhanced multi-scale perception network
CN112132023B * — priority 2020-09-22, published 2024-05-17 — 上海应用技术大学 (Shanghai Institute of Technology) — Crowd counting method based on multi-scale context enhancement network
CN112541459A * — priority 2020-12-21, published 2021-03-23 — 山东师范大学 (Shandong Normal University) — Crowd counting method and system based on multi-scale perception attention network
CN112580545B * — priority 2020-12-24, published 2022-07-29 — 山东师范大学 (Shandong Normal University) — Crowd counting method and system based on multi-scale adaptive context network


Non-Patent Citations (3)

Liu W et al., "Context-aware crowd counting," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2019, pp. 5099-5108. *
Wang L et al., "Multi-level feature fusion network for crowd counting," IET Computer Vision, Feb. 2021, pp. 60-72. *
左静 (Zuo Jing), "一种多尺度融合的深度人群计数算法" [A deep crowd counting algorithm with multi-scale fusion], 激光与光电子学进展 (Laser & Optoelectronics Progress), June 2020, pp. 1-13. *

Also Published As

CN113192009A (en) — published 2021-07-30

Similar Documents

Publication — Title
CN109271933B (en) Method for estimating three-dimensional human body posture based on video stream
CN111784602B (en) Method for generating countermeasure network for image restoration
CN111445418B (en) Image defogging processing method and device and computer equipment
CN108510451B (en) Method for reconstructing license plate based on double-layer convolutional neural network
CN109919032B (en) Video abnormal behavior detection method based on motion prediction
CN108288270B (en) Target detection method based on channel pruning and full convolution deep learning
CN109509149A (en) A kind of super resolution ratio reconstruction method based on binary channels convolutional network Fusion Features
CN109993269B (en) Single image crowd counting method based on attention mechanism
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN110659664B (en) SSD-based high-precision small object identification method
CN111062395B (en) Real-time video semantic segmentation method
CN112541459A (en) Crowd counting method and system based on multi-scale perception attention network
CN112766123B (en) Crowd counting method and system based on criss-cross attention network
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN112308087A (en) Integrated imaging identification system and method based on dynamic vision sensor
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN116485646A (en) Micro-attention-based light-weight image super-resolution reconstruction method and device
CN111126185A (en) Deep learning vehicle target identification method for road intersection scene
CN113192084A (en) Machine vision-based highway slope micro-displacement deformation monitoring method
CN116778346B (en) Pipeline identification method and system based on improved self-attention mechanism
CN111951260B (en) Partial feature fusion based convolutional neural network real-time target counting system and method
CN113192009B (en) Crowd counting method and system based on global context convolutional network
CN116740547A (en) Digital twinning-based substation target detection method, system, equipment and medium
CN115170803A (en) E-SOLO-based city street view example segmentation method

Legal Events

Code — Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant