CN112966600A - Adaptive multi-scale context aggregation method for crowded crowd counting - Google Patents


Info

Publication number
CN112966600A
CN112966600A
Authority
CN
China
Prior art keywords: scale, context, resolution, scale context, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110242403.7A
Other languages
Chinese (zh)
Other versions
CN112966600B (en)
Inventor
赵怀林
梁兰军
张亚妮
周方波
Current Assignee
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shanghai Institute of Technology filed Critical Shanghai Institute of Technology
Priority to CN202110242403.7A priority Critical patent/CN112966600B/en
Publication of CN112966600A publication Critical patent/CN112966600A/en
Application granted granted Critical
Publication of CN112966600B publication Critical patent/CN112966600B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention provides an adaptive multi-scale context aggregation method for crowd counting, comprising the following steps: input a sample picture into a backbone network and extract a feature map whose size is j times the resolution of the input image; feed the extracted feature map into several cascaded multi-scale context aggregation modules, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; apply a convolution layer to the generated multi-scale context features to produce a density map; and integrate (sum) the density map to obtain the predicted number of people. The invention effectively extracts multi-scale information and addresses the non-uniformity of head sizes; by adaptively selecting and aggregating useful context information through a channel attention mechanism it avoids information redundancy, achieves more accurate density estimation in crowded scenes, and is more robust.

Description

Adaptive multi-scale context aggregation method for crowded crowd counting
Technical Field
The invention relates to the technical field of data processing, in particular to an adaptive multi-scale context aggregation method for crowd counting.
Background
People counting is a basic task of computer-vision-based crowd analysis, which aims to automatically estimate the degree of crowding.
However, in crowded scenes the task faces challenging factors such as severe occlusion, scale variation, and diverse crowd distributions. In extremely crowded scenes in particular, the visual similarity between the foreground crowd and background objects, together with the scale variation of human heads, makes estimating the congestion difficult.
Networks that directly aggregate context features of different scales already exist, but not all of these features are useful for the final crowd count, and the information redundancy caused by direct aggregation degrades the performance of the counting network.
Disclosure of Invention
In view of the deficiencies in the prior art, it is an object of the present invention to provide an adaptive multi-scale context aggregation method for crowd counting.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, comprising the following steps:
Step 1: input the sample picture into a backbone network and extract a feature map whose size is j times the resolution of the input image;
Step 2: feed the extracted feature map into several cascaded multi-scale context aggregation modules, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; an up-sampling layer follows each multi-scale context aggregation module and converts the multi-scale context features into a higher-resolution feature map;
Step 3: apply a convolution layer to the generated multi-scale context features to generate a density map;
Step 4: compute a loss function between the generated density map and the ground-truth density map, and optimize the network parameters;
Step 5: integrate (sum) the generated density map to obtain the predicted number of people.
Optionally, the step 4 includes:
generating a ground-truth density map of the crowd by Gaussian-kernel convolution from the picture annotated with head mark points, where the density map is computed as:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $F(x)$ denotes the ground-truth density map, $x_i$ the pixel position of the $i$-th head, $G_\sigma$ a Gaussian kernel with standard deviation $\sigma$, $\delta(\cdot)$ the Dirac delta function, $N$ the total number of people in the picture, and $x$ a pixel position in the picture.
Optionally, the step 2 includes:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates them with large-scale context features; the multi-scale context aggregation module comprises several parallel branches of dilated (atrous) convolutions with different dilation rates;
let $X_i^j \in \mathbb{R}^{jW \times jH \times C}$ denote the feature map extracted by the dilated convolution of the $i$-th scale, where $i$ is the dilation rate of the convolution kernel, $j$ indicates that the resolution is $j$ times that of the input image, $r$ is the reduction rate of the backbone network, $W \times H$ is the resolution of the image, $C$ is the number of channels, and $\mathbb{R}^{jW \times jH \times C}$ is the set of all feature maps at $j$-times resolution;
inputting the feature maps extracted by the dilated convolutions into a channel attention module, wherein the channel attention module uses a selection function $f$ to adaptively select the useful context feature information in $X_i^j$ and outputs a feature map $Y_j \in \mathbb{R}^{jW \times jH \times C}$ that aggregates the context information, where $Y_j$ is defined as:

$$Y_j = f\big(\cdots f\big(f(X_1^j) \oplus X_2^j\big) \oplus \cdots\big) \oplus X_n^j$$

where $Y_j$ is the feature map at $j$-times resolution produced by the aggregation module, $\oplus$ denotes element-wise summation, and $X_1^j, X_2^j, X_3^j, \ldots, X_n^j$ are the feature maps extracted at the 1st, 2nd, 3rd, ..., $n$-th scales, each at $j$ times the resolution of the input picture.
Optionally, adaptively selecting the useful context feature information in $X_i^j$ with the selection function $f$ includes:
pooling each context feature through a global spatial average-pooling layer and outputting the feature information $F_{avg}(X_i^j)$;
processing the feature information $F_{avg}$ with a bottleneck structure consisting of two fully connected layers, and normalizing the output to $(0, 1)$ with a sigmoid function, where the adaptive output coefficient is computed as:

$$m_i = \sigma\big(W_2\,\mathrm{ReLU}(W_1\,F_{avg}(X_i^j))\big)$$

where $W_1$ and $W_2$ are the weight matrices of the two fully connected layers, the first fully connected layer is followed by a ReLU function and the second by a sigmoid function $\sigma$, and $F_{avg}(X_i^j)$ denotes the output of $X_i^j$ after average pooling;
adding a residual connection between the input and the output of the channel attention mechanism, resulting in the selection function:

$$f(X_i^j) = X_i^j \oplus \big(m_i \otimes X_i^j\big)$$

where $f(X_i^j)$ is the output of the $i$-th channel attention module, $X_i^j$ is the feature map extracted by the dilated convolution of the $i$-th scale, and $m_i$ is the adaptive coefficient of the $i$-th channel attention module.
Compared with the prior art, the invention has the following beneficial effects:
the self-adaptive multi-scale context aggregation method for counting crowds effectively extracts multi-scale information, solves the problem of nonuniform head sizes, avoids information redundancy through self-adaptive selection and aggregation of useful context information through a channel attention mechanism, can realize more accurate density estimation in crowded scenes, and has higher robustness.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic diagram of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, used for crowd density estimation in crowded scenes. The method proceeds as follows: a picture is input, feature information is first extracted by a backbone network, and the extracted feature map is fed into several cascaded multi-scale context aggregation modules. Each module first extracts multi-scale information with convolution kernels of different dilation rates, then adaptively selects channel context feature information through a channel attention mechanism and aggregates it. After each multi-scale context aggregation module, the feature map is converted into a higher-resolution feature map by up-sampling; finally an estimated density map is output through a 1×1 convolution kernel, and the predicted number of people is obtained by integral summation. By using several convolution kernels with different dilation rates, the method effectively extracts multi-scale information and addresses the non-uniformity of head sizes; by adaptively selecting and aggregating useful context information through a channel attention mechanism it avoids information redundancy, achieves more accurate density estimation in crowded scenes, and is more robust.
Fig. 1 is a schematic diagram of a principle of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention, as shown in fig. 1, the method may include the following steps:
step S1: and inputting the sample picture into a backbone network, and extracting a feature map with the size being i times of the resolution of the original image.
Step S2: and inputting the extracted feature maps into a plurality of self-adaptive multi-scale context aggregation modules in a cascading mode, extracting and self-adaptively aggregating multi-scale context information, wherein an up-sampling layer is arranged behind each module and used for converting the multi-scale context features into feature maps with higher resolution.
Step S3: and performing 1 × 1 convolutional layer processing on the generated multi-scale context features to generate a density map.
Step S4: calculating a loss function between the generated density map and the true density map, and optimizing network parameters;
step S5: and integrating and summing the density maps to obtain the predicted number of people.
In this embodiment, the ground-truth density map of the crowd is generated by Gaussian-kernel convolution from the picture annotated with head mark points. Denoting the pixel position of a head by $x_i$ and the Gaussian kernel by $G_\sigma$, the ground-truth density map can be expressed as:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $F(x)$ denotes the ground-truth density map, $x_i$ the pixel position of the $i$-th head, $G_\sigma$ a Gaussian kernel with standard deviation $\sigma$, $\delta(\cdot)$ the Dirac delta function, $N$ the total number of people in the picture, and $x$ a pixel position in the picture.
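A minimal NumPy sketch of this ground-truth generation (the kernel size `ksize` and `sigma` below are illustrative choices, not values fixed by the patent): place a unit impulse at each annotated head position and convolve with a normalized Gaussian kernel.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian kernel G_sigma (sums to 1)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def ground_truth_density(head_points, height, width, sigma=4.0, ksize=15):
    """F(x) = sum_i delta(x - x_i) * G_sigma(x): one unit of mass per head,
    spread by the Gaussian kernel."""
    density = np.zeros((height, width), dtype=np.float64)
    kernel = gaussian_kernel(ksize, sigma)
    half = ksize // 2
    for (row, col) in head_points:
        # Clip the kernel at the image border; mass falling outside is dropped.
        r0, r1 = max(0, row - half), min(height, row + half + 1)
        c0, c1 = max(0, col - half), min(width, col + half + 1)
        kr0, kc0 = r0 - (row - half), c0 - (col - half)
        density[r0:r1, c0:c1] += kernel[kr0:kr0 + (r1 - r0), kc0:kc0 + (c1 - c0)]
    return density
```

Summing (integrating) the density map of interior heads recovers the annotated count, which is exactly how the predicted number of people is read off the estimated map in step S5.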
Specifically, the adaptive multi-scale context aggregation module of step S2 is shown in fig. 1. It adaptively selects reliable small-scale context features and aggregates them with large-scale context features, as follows:
The multi-scale context aggregation module comprises several parallel branches of dilated convolutions with different dilation rates. Let $X_i^j \in \mathbb{R}^{jW \times jH \times C}$ denote the feature map extracted by the dilated convolution of the $i$-th scale, where $i$ is the dilation rate of the convolution kernel, $j$ indicates that the resolution is $j$ times that of the input image, $r$ is the reduction rate of the backbone network, $W \times H$ is the resolution of the image, and $C$ is the number of channels. The feature maps extracted by the dilated convolutions are then input into a channel attention module (CA), which uses a selection function $f$ to adaptively select the useful context feature information in $X_i^j$ and finally outputs a feature map $Y_j \in \mathbb{R}^{jW \times jH \times C}$ aggregating the context information, defined as:

$$Y_j = f\big(\cdots f\big(f(X_1^j) \oplus X_2^j\big) \oplus \cdots\big) \oplus X_n^j$$

where $Y_j$ is the feature map at $j$-times resolution produced by the aggregation module, $\oplus$ denotes element-wise summation, and $X_1^j, X_2^j, X_3^j, \ldots, X_n^j$ are the feature maps extracted at the 1st through $n$-th scales, each at $j$ times the resolution of the input picture.
Illustratively, the selection function $f$ uses a channel attention mechanism to aggregate the multi-scale context information, as follows:
Each feature is first passed through a global spatial average-pooling layer (denoted $F_{avg}$); the pooled features are then processed by a bottleneck structure consisting of two fully connected layers, and the output is finally normalized to $(0, 1)$ by a sigmoid function. The adaptive output coefficient can be expressed as:

$$m_i = \sigma\big(W_2\,\mathrm{ReLU}(W_1\,F_{avg}(X_i^j))\big)$$

where $W_1$ and $W_2$ are the weight matrices of the two fully connected layers, the first fully connected layer is followed by a ReLU function and the second by a sigmoid function $\sigma$, and $F_{avg}(X_i^j)$ denotes the output of $X_i^j$ after average pooling.
Furthermore, for better optimization, a residual connection is added between the input and the output of the channel attention mechanism, and the final selection function is defined as:

$$f(X_i^j) = X_i^j \oplus \big(m_i \otimes X_i^j\big)$$
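A minimal NumPy sketch of the selection function $f$ (bias terms are omitted and the bottleneck width is chosen arbitrarily; both are assumptions, not details from the patent):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def select(x, w1, w2):
    """Selection function f: channel attention with a residual connection.

    x  : feature map of shape (H, W, C)
    w1 : (C, C_hidden) weights of the first fully connected layer (+ ReLU)
    w2 : (C_hidden, C) weights of the second fully connected layer (+ sigmoid)
    Returns f(x) = x + m * x, with per-channel coefficients m in (0, 1).
    """
    f_avg = x.mean(axis=(0, 1))             # global spatial average pooling -> (C,)
    hidden = np.maximum(w1.T @ f_avg, 0.0)  # first FC layer + ReLU
    m = sigmoid(w2.T @ hidden)              # second FC layer + sigmoid -> (C,)
    return x + m * x                        # residual connection, broadcast over H, W
```

Because $m$ stays in $(0, 1)$, the residual form keeps each output channel between $X$ and $2X$, so a poorly chosen attention weight can attenuate but never erase a scale's contribution.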
compared with the existing counting method, the method adopts convolution with different void rates to extract multi-scale information, self-adaptive selection and aggregation of the multi-scale context information are performed through a channel attention mechanism, good performance is shown in a crowd scene, and the crowd counting precision is improved.
The technical solution of the present invention will now be described in more detail with a specific example. Given the pixel values and the head annotations of a picture, the ground-truth density map of the picture is obtained by Gaussian convolution and can be expressed as:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $x_i$ denotes the pixel position of a head, $x$ any pixel position, $G_\sigma$ the Gaussian kernel, $\delta(\cdot)$ the Dirac delta function, $\sigma$ the standard deviation, and $N$ the total number of people in the picture.
A multi-scale context aggregation network then learns the complex nonlinear mapping from the input image to the estimated crowd density map. The details are as follows:
The first ten layers of VGG-16 are selected as the backbone network; a picture is input into the backbone and feature information is extracted, the resulting feature map being 1/8 the size of the input image.
The extracted feature map is convolved with a 3×3 kernel, and the feature information is then sent to the multi-scale context aggregation module. Features at different scales are first extracted by several dilated-convolution branches with different dilation rates; the feature at each scale is denoted $X_i^j$, with $n$ scales in total.
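The exact dilation rates used per branch are figure details not reproduced in this text. As an illustration of why different rates yield different scales, a 3×3 kernel at dilation rate $i$ covers an effective $(2i+1) \times (2i+1)$ window with the same nine parameters:

```python
import numpy as np

def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between kernel taps, as dilated convolution does."""
    k = kernel.shape[0]
    size = k + (k - 1) * (rate - 1)  # effective kernel size
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::rate, ::rate] = kernel     # original taps, spread apart
    return out

# A 3x3 kernel at rates 1, 2, 3 covers 3x3, 5x5, 7x7 receptive fields,
# one branch per head scale, at no extra parameter cost.
for rate in (1, 2, 3):
    assert dilate_kernel(np.ones((3, 3)), rate).shape[0] == 3 + 2 * (rate - 1)
```

This is the standard equivalence behind atrous convolution; the branch outputs all keep the same spatial resolution, which is what allows the element-wise aggregation that follows.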
$X_i^j$ is fed into the channel attention module, which adaptively aggregates the multi-scale context information. Context information is first extracted by a global spatial average-pooling layer; the pooled features are then processed by a bottleneck structure of two fully connected layers, and the output is finally normalized to $(0, 1)$ by a sigmoid function. The adaptive output coefficient can be expressed as:

$$m_i = \sigma\big(W_2\,\mathrm{ReLU}(W_1\,F_{avg}(X_i^j))\big)$$

Finally, a residual connection is applied directly between the input and the output of the channel attention mechanism, giving:

$$f(X_1^j) = X_1^j \oplus \big(m_1 \otimes X_1^j\big)$$

The multi-scale context feature $f(X_1^j)$ selected by the attention mechanism is summed pixel by pixel with the 2nd-scale information $X_2^j$, which can be expressed as:

$$S_2^j = f(X_1^j) \oplus X_2^j$$

The resulting $S_2^j$ is again sent through the channel attention mechanism to adaptively select context information and summed pixel-wise with the 3rd-scale feature, and so on; the feature map that finally aggregates the multi-scale context information is:

$$Y_j = f\big(\cdots f\big(f(X_1^j) \oplus X_2^j\big) \oplus \cdots\big) \oplus X_n^j$$
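The select-then-sum recursion above can be sketched in NumPy as follows (for brevity a single set of attention weights is shared across all steps, whereas the network may well use a separate attention module per step; shapes and weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_select(x, w1, w2):
    """f(x) = x + m * x, with m = sigmoid(W2 ReLU(W1 GAP(x))) per channel."""
    m = sigmoid(w2.T @ np.maximum(w1.T @ x.mean(axis=(0, 1)), 0.0))
    return x + m * x

def aggregate(scales, w1, w2):
    """Y = f(...f(f(X1) + X2) + ... + X_{n-1}) + Xn : recursively select and
    sum from the smallest scale up to the largest."""
    acc = channel_select(scales[0], w1, w2)    # f(X1)
    for x in scales[1:-1]:
        acc = channel_select(acc + x, w1, w2)  # f(previous sum + next scale)
    return acc + scales[-1]                    # final element-wise sum with Xn
```

Note the asymmetry the patent describes: only the smaller scales pass through the attention gate, while the largest-scale feature $X_n^j$ is added unmodified at the end.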
after multi-scale context information is extracted by the multi-scale context aggregation module, the multi-scale context information is converted into a feature map with higher resolution by up-sampling. And then sending the data to a multi-scale context aggregation module for feature extraction in the same mode, passing through three multi-scale context aggregation modules all the time, finally outputting an estimated density map through a 1 x 1 convolution kernel, and calculating a loss function L (theta):
Figure BDA00029619043700000610
wherein F (I)i(ii) a θ) is a density map of the output of the network, FiThe method is characterized in that the method is a real density graph, theta is a parameter required to be optimized by a network, and the network continuously optimizes the parameter theta through a gradient descent method to find a parameter value which enables a loss function to be minimum.
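A minimal sketch of this pixel-wise Euclidean loss over a batch of $N$ images:

```python
import numpy as np

def density_loss(pred_maps, gt_maps):
    """L(theta) = (1 / 2N) * sum_i ||F(I_i; theta) - F_i||_2^2."""
    n = len(pred_maps)
    # Squared L2 distance between each predicted and ground-truth density map.
    return sum(np.sum((p - g) ** 2) for p, g in zip(pred_maps, gt_maps)) / (2.0 * n)
```

In a training loop the gradient of this loss with respect to $\theta$ would be obtained by backpropagation through the network producing `pred_maps`.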
It should be noted that, the steps in the adaptive multi-scale context aggregation method for counting crowds provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the adaptive multi-scale context aggregation system for counting crowds, and those skilled in the art may refer to the technical scheme of the system to implement the step flow of the method, that is, the embodiment in the system may be understood as a preferred embodiment for implementing the method, and details are not repeated here.
Those skilled in the art will appreciate that, besides implementing the system and its various devices purely as computer-readable program code, the same functions can be achieved by implementing them in hardware, such as logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. The system and its various devices can therefore be regarded as hardware components; the devices they contain for realizing the various functions can likewise be regarded as structures within those hardware components, or as structures belonging to both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (4)

1. An adaptive multi-scale context aggregation method for crowd counting, comprising:
Step 1: input the sample picture into a backbone network and extract a feature map whose size is j times the resolution of the input image;
Step 2: feed the extracted feature map into several cascaded multi-scale context aggregation modules, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; an up-sampling layer follows each multi-scale context aggregation module and converts the multi-scale context features into a higher-resolution feature map;
Step 3: apply a convolution layer to the generated multi-scale context features to generate a density map;
Step 4: compute a loss function between the generated density map and the ground-truth density map, and optimize the network parameters;
Step 5: integrate (sum) the generated density map to obtain the predicted number of people.
2. The adaptive multi-scale context aggregation method for crowd counting according to claim 1, wherein the step 4 comprises:
generating a ground-truth density map of the crowd by Gaussian-kernel convolution from the picture annotated with head mark points, where the density map is computed as:

$$F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_\sigma(x)$$

where $F(x)$ denotes the ground-truth density map, $x_i$ the pixel position of the $i$-th head, $G_\sigma$ a Gaussian kernel with standard deviation $\sigma$, $\delta(\cdot)$ the Dirac delta function, $N$ the total number of people in the picture, and $x$ a pixel position in the picture.
3. The adaptive multi-scale context aggregation method for crowd counting according to claim 1, wherein the step 2 comprises:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates them with large-scale context features; the multi-scale context aggregation module comprises several parallel branches of dilated convolutions with different dilation rates;
let $X_i^j \in \mathbb{R}^{jW \times jH \times C}$ denote the feature map extracted by the dilated convolution of the $i$-th scale, where $i$ is the dilation rate of the convolution kernel, $j$ indicates that the resolution is $j$ times that of the input image, $r$ is the reduction rate of the backbone network, $W \times H$ is the resolution of the image, $C$ is the number of channels, and $\mathbb{R}^{jW \times jH \times C}$ is the set of all feature maps at $j$-times resolution;
inputting the feature maps extracted by the dilated convolutions into a channel attention module, wherein the channel attention module uses a selection function $f$ to adaptively select the useful context feature information in $X_i^j$ and outputs a feature map $Y_j \in \mathbb{R}^{jW \times jH \times C}$ that aggregates the context information, where $Y_j$ is defined as:

$$Y_j = f\big(\cdots f\big(f(X_1^j) \oplus X_2^j\big) \oplus \cdots\big) \oplus X_n^j$$

where $Y_j$ is the feature map at $j$-times resolution produced by the aggregation module, $\oplus$ denotes element-wise summation, and $X_1^j, X_2^j, X_3^j, \ldots, X_n^j$ are the feature maps extracted at the 1st through $n$-th scales, each at $j$ times the resolution of the input picture.
4. The adaptive multi-scale context aggregation method for crowd counting according to claim 3, wherein adaptively selecting the useful context feature information in $X_i^j$ with the selection function $f$ comprises:
pooling each context feature through a global spatial average-pooling layer and outputting the feature information $F_{avg}(X_i^j)$;
processing the feature information $F_{avg}$ with a bottleneck structure consisting of two fully connected layers, and normalizing the output to $(0, 1)$ with a sigmoid function, the adaptive output coefficient being computed as:

$$m_i = \sigma\big(W_2\,\mathrm{ReLU}(W_1\,F_{avg}(X_i^j))\big)$$

where $W_1$ and $W_2$ are the weight matrices of the two fully connected layers, the first fully connected layer being followed by a ReLU function and the second by a sigmoid function $\sigma$, and $F_{avg}(X_i^j)$ denotes the output of $X_i^j$ after average pooling;
adding a residual connection between the input and the output of the channel attention mechanism, resulting in the selection function:

$$f(X_i^j) = X_i^j \oplus \big(m_i \otimes X_i^j\big)$$

where $f(X_i^j)$ is the output of the $i$-th channel attention module, $X_i^j$ is the feature map extracted by the dilated convolution of the $i$-th scale, and $m_i$ is the adaptive coefficient of the $i$-th channel attention module.
CN202110242403.7A 2021-03-04 2021-03-04 Self-adaptive multi-scale context aggregation method for crowded population counting Active CN112966600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110242403.7A CN112966600B (en) 2021-03-04 2021-03-04 Self-adaptive multi-scale context aggregation method for crowded population counting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110242403.7A CN112966600B (en) 2021-03-04 2021-03-04 Self-adaptive multi-scale context aggregation method for crowded population counting

Publications (2)

Publication Number Publication Date
CN112966600A true CN112966600A (en) 2021-06-15
CN112966600B CN112966600B (en) 2024-04-16

Family

ID=76277443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110242403.7A Active CN112966600B (en) 2021-03-04 2021-03-04 Self-adaptive multi-scale context aggregation method for crowded population counting

Country Status (1)

Country Link
CN (1) CN112966600B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263849A (en) * 2019-06-19 2019-09-20 合肥工业大学 A kind of crowd density estimation method based on multiple dimensioned attention mechanism
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network
WO2020169043A1 (en) * 2019-02-21 2020-08-27 苏州大学 Dense crowd counting method, apparatus and device, and storage medium
CN111709290A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Crowd counting method based on coding and decoding-jumping connection scale pyramid network
CN112132023A (en) * 2020-09-22 2020-12-25 上海应用技术大学 Crowd counting method based on multi-scale context enhanced network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈朋; 汤一平; 王丽冉; 何霞: "Crowd density estimation with multi-level feature fusion" (多层次特征融合的人群密度估计), Journal of Image and Graphics (中国图象图形学报), no. 08 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120233A (en) * 2021-11-29 2022-03-01 上海应用技术大学 Training method of lightweight pyramid hole convolution aggregation network for crowd counting
CN114120233B (en) * 2021-11-29 2024-04-16 上海应用技术大学 Training method of lightweight pyramid cavity convolution aggregation network for crowd counting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant