CN112966600A - Adaptive multi-scale context aggregation method for crowded crowd counting - Google Patents
Adaptive multi-scale context aggregation method for crowded crowd counting
- Publication number
- CN112966600A CN112966600A CN202110242403.7A CN202110242403A CN112966600A CN 112966600 A CN112966600 A CN 112966600A CN 202110242403 A CN202110242403 A CN 202110242403A CN 112966600 A CN112966600 A CN 112966600A
- Authority
- CN
- China
- Prior art keywords
- scale
- context
- resolution
- scale context
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides an adaptive multi-scale context aggregation method for crowd counting, comprising the following steps: inputting a sample picture into a backbone network and extracting a feature map whose size is j times the resolution of the input image; feeding the extracted feature map into a plurality of cascaded multi-scale context aggregation modules, which extract and adaptively aggregate multi-scale context information to obtain multi-scale context features; processing the generated multi-scale context features with a convolution layer to generate a density map; and integrating and summing the density map to obtain the predicted number of people. The invention effectively extracts multi-scale information, solving the problem of non-uniform head sizes; it adaptively selects and aggregates useful context information through a channel attention mechanism, avoiding information redundancy; and it achieves more accurate density estimation and higher robustness in crowded scenes.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to an adaptive multi-scale context aggregation method for crowd counting.
Background
Crowd counting is a basic task of computer-vision-based crowd analysis, which aims to automatically estimate how crowded a scene is.
In crowd scenes, however, the task faces challenging factors such as severe occlusion, scale change, and diverse crowd distributions. In very crowded scenes in particular, estimating crowd density becomes difficult because the foreground crowd is visually similar to background objects and head scale varies widely.
Existing networks directly aggregate context features of different scales, but not all of these features are useful for the final count; the information redundancy caused by direct aggregation degrades counting performance.
Disclosure of Invention
In view of the deficiencies in the prior art, it is an object of the present invention to provide an adaptive multi-scale context aggregation method for crowd counting.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, which comprises the following steps:
step 1: inputting the sample picture into a backbone network, and extracting a feature map whose size is j times the resolution of the input image;
step 2: inputting the extracted feature map into a plurality of multi-scale context aggregation modules in a cascading manner, extracting and adaptively aggregating multi-scale context information to obtain multi-scale context features; an up-sampling layer is arranged behind each multi-scale context aggregation module to convert the multi-scale context features into a feature map of higher resolution;
step 3: performing convolution layer processing on the generated multi-scale context features to generate a density map;
step 4: calculating a loss function between the generated density map and the ground-truth density map, and optimizing the network parameters;
step 5: integrating and summing the generated density map to obtain the predicted number of people.
Optionally, the step 4 includes:
generating a ground-truth density map of the crowd through Gaussian kernel convolution according to the picture with head mark points, wherein the density map is calculated as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

wherein F(x) represents the ground-truth density map, x_i the pixel point of the i-th head, G_σ a Gaussian kernel, δ(·) the Dirac delta function, σ the standard deviation, N the total number of people in the picture, and x a pixel point of the picture.
Optionally, the step 2 includes:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates them with large-scale context features; the multi-scale context aggregation module comprises a plurality of dilated-convolution branches with different dilation rates;

let X_i^j ∈ R^(jW×jH×C) denote the feature map extracted by the dilated convolution of the i-th scale, where i represents the dilation rate of the convolution kernel, j indicates that the resolution is j times that of the input image, r represents the reduction rate of the backbone network, W×H represents the resolution of the image, C represents the number of channels, and R^(jW×jH×C) represents the set of all feature maps of j-times resolution;

inputting the feature maps extracted by the dilated convolutions into a channel attention module, wherein the channel attention module adaptively selects useful context feature information with a selection function f and outputs a feature map Y_j ∈ R^(jW×jH×C) aggregating the context information, defined as:

Y_j = f( … f( f(X_1^j) ⊕ X_2^j ) ⊕ X_3^j … ) ⊕ X_n^j

wherein Y_j represents the j-times-resolution feature map extracted by the aggregation module, ⊕ denotes element-by-element summation, and X_1^j, X_2^j, X_3^j, …, X_n^j represent the extracted feature maps of the 1st, 2nd, 3rd, …, n-th scales, each with resolution j times that of the input picture.
Optionally, the adaptively selecting, with a selection function f, the useful context feature information comprises:

performing pooling on each context feature through a global spatial average pooling layer, and outputting the feature information F_avg(X_i^j);

processing the feature information F_avg(X_i^j) with a bottleneck structure consisting of two fully connected layers, and normalizing the output features to (0, 1) through a sigmoid function, the adaptive output coefficient being calculated as:

a_i = Sigmoid( W_2 · ReLU( W_1 · F_avg(X_i^j) ) )

wherein W_1 and W_2 respectively represent the weight matrices of the two fully connected layers, the first fully connected layer being followed by a ReLU function and the second fully connected layer by a sigmoid function, and F_avg(X_i^j) represents the output after global average pooling;

adding a residual connection between the input and the output of the channel attention mechanism yields the selection function, defined as:

f(X_i^j) = X_i^j ⊕ ( a_i ⊗ X_i^j )

wherein f(X_i^j) represents the output of the i-th channel attention module, X_i^j the feature map extracted by the dilated convolution of the i-th scale, a_i the adaptive coefficient of the i-th channel attention module, and ⊗ channel-wise multiplication.
Compared with the prior art, the invention has the following beneficial effects:
the self-adaptive multi-scale context aggregation method for counting crowds effectively extracts multi-scale information, solves the problem of nonuniform head sizes, avoids information redundancy through self-adaptive selection and aggregation of useful context information through a channel attention mechanism, can realize more accurate density estimation in crowded scenes, and has higher robustness.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic diagram of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention; all such changes and modifications fall within the scope of the present invention.
The invention provides an adaptive multi-scale context aggregation method for crowd counting, used for crowd density estimation in crowded scenes. The method mainly comprises the following steps: a picture is input, feature information is first extracted through a backbone network, and the extracted feature map is then fed into a plurality of cascaded multi-scale context aggregation modules. Each module first extracts multi-scale information using convolution kernels with different dilation rates, then adaptively selects context feature information per channel through a channel attention mechanism and aggregates it. After each multi-scale context aggregation module, the feature map is converted into one of higher resolution by up-sampling; finally an estimated density map is output through a 1×1 convolution kernel, and the predicted number of people is obtained by integral summation. The method effectively extracts multi-scale information through several convolution kernels with different dilation rates, solving the problem of non-uniform head sizes; by adaptively selecting and aggregating useful context information through a channel attention mechanism it avoids information redundancy, achieves more accurate density estimation in crowded scenes, and has higher robustness.
Fig. 1 is a schematic diagram of a principle of an adaptive multi-scale context aggregation method for crowd counting according to an embodiment of the present invention, as shown in fig. 1, the method may include the following steps:
step S1: inputting the sample picture into a backbone network, and extracting a feature map whose size is j times the resolution of the input image.
Step S2: and inputting the extracted feature maps into a plurality of self-adaptive multi-scale context aggregation modules in a cascading mode, extracting and self-adaptively aggregating multi-scale context information, wherein an up-sampling layer is arranged behind each module and used for converting the multi-scale context features into feature maps with higher resolution.
Step S3: and performing 1 × 1 convolutional layer processing on the generated multi-scale context features to generate a density map.
Step S4: calculating a loss function between the generated density map and the true density map, and optimizing network parameters;
step S5: and integrating and summing the density maps to obtain the predicted number of people.
In this embodiment, a ground-truth density map of the crowd is generated through Gaussian kernel convolution according to the picture with head mark points. Denoting the pixel point of a head as x_i and the Gaussian kernel as G_σ, the ground-truth density map can be expressed as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

wherein F(x) represents the ground-truth density map, x_i the pixel point of the i-th head, G_σ the Gaussian kernel, δ(·) the Dirac delta function, σ the standard deviation, N the total number of people in the picture, and x a pixel point of the picture.
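As a concrete illustration, this ground-truth construction can be sketched in a few lines of NumPy/SciPy. The `sigma` value, image size, and head coordinates below are arbitrary illustrative choices, not values fixed by the patent:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ground_truth_density(shape, head_points, sigma=4.0):
    """Place a Dirac delta at each annotated head pixel, then convolve
    with a Gaussian kernel G_sigma; the result integrates to the count N."""
    density = np.zeros(shape, dtype=np.float64)
    for row, col in head_points:
        density[row, col] += 1.0
    # scipy's default 'reflect' boundary mode preserves the total mass
    return gaussian_filter(density, sigma)

heads = [(20, 30), (40, 50), (60, 10)]   # toy head annotations
dmap = ground_truth_density((100, 100), heads)
print(round(dmap.sum()))  # → 3, the number of annotated heads
```

Summing the map recovers the head count, which is exactly the integral-summation step used at prediction time.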
Specifically, the adaptive multi-scale context aggregation module of step S2 is shown in fig. 1; it adaptively selects reliable small-scale context features and aggregates them with large-scale context features. The specific operation is as follows:

the multi-scale context aggregation module comprises a plurality of dilated-convolution branches with different dilation rates; let X_i^j ∈ R^(jW×jH×C) denote the feature map extracted by the dilated convolution of the i-th scale, where i represents the dilation rate of the convolution kernel, j indicates that the resolution is j times that of the input image, r represents the reduction rate of the backbone network, W×H represents the resolution of the image, C the number of channels, and R^(jW×jH×C) the set of all feature maps of j-times resolution. The extracted feature maps are then input into a channel attention module (CA), which adaptively selects useful context feature information with a selection function f and finally outputs a feature map Y_j ∈ R^(jW×jH×C) aggregating the context information, defined as:

Y_j = f( … f( f(X_1^j) ⊕ X_2^j ) ⊕ X_3^j … ) ⊕ X_n^j

wherein Y_j represents the j-times-resolution feature map extracted by the aggregation module, ⊕ denotes element-by-element summation, and X_1^j, …, X_n^j represent the extracted feature maps of the 1st to n-th scales.
Illustratively, the selection function f adopts a channel attention mechanism to aggregate the multi-scale context information, operating as follows:

each feature first passes through a global spatial average pooling layer (denoted F_avg); the pooled features are then processed by a bottleneck structure consisting of two fully connected layers, and the output features are finally normalized to (0, 1) through a sigmoid function. The adaptive output coefficient can be expressed as:

a_i = Sigmoid( W_2 · ReLU( W_1 · F_avg(X_i^j) ) )

wherein W_1 and W_2 respectively represent the weight matrices of the two fully connected layers, the first being followed by a ReLU function and the second by a sigmoid function, and F_avg(X_i^j) represents the output after global average pooling.

Furthermore, for better optimization, a residual connection is added between the input and the output of the channel attention mechanism, and the final selection function is defined as:

f(X_i^j) = X_i^j ⊕ ( a_i ⊗ X_i^j )
Compared with existing counting methods, the method extracts multi-scale information with convolutions of different dilation rates and adaptively selects and aggregates the multi-scale context information through a channel attention mechanism; it performs well in crowded scenes and improves crowd counting accuracy.
The technical solution of the present invention will now be described in more detail with reference to a specific example. When the pixel values and labels of a picture are known, the ground-truth density map corresponding to the picture is obtained through Gaussian convolution and can be represented as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

wherein x_i represents a pixel point with a head, x a pixel point of the picture, G_σ the Gaussian kernel, δ(·) the Dirac delta function, σ the standard deviation, and N the total number of people in the picture.
Then, learning a complex nonlinear mapping from the input image to the crowd estimation density map through a multi-scale context aggregation network, wherein the specific details are as follows:
the front ten layers of VGG-16 are selected as a backbone network, pictures are input into the backbone network, feature information is extracted, and the size of a feature map is 1\8 of the size of an input image.
The extracted feature map is convolved with a 3×3 convolution kernel, and the feature information is then sent to the multi-scale context aggregation module. Features of different scales are first extracted through a plurality of dilated-convolution branches with different dilation rates; the feature of the i-th scale is recorded as X_i^j, giving n scales in total.
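The dilated-convolution branches can be illustrated with a minimal single-channel NumPy sketch. The real network uses learned multi-channel kernels; the averaging kernel, feature size, and dilation rates here are illustrative assumptions only:

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Single-channel 'same'-padded 2-D cross-correlation with dilation `rate`.

    A k x k kernel with dilation r covers a (k-1)*r + 1 receptive field,
    which is how the module sees multiple scales with the same number of
    parameters per branch."""
    k = kernel.shape[0]
    pad = (k - 1) * rate // 2
    xp = np.pad(x, pad)
    out = np.zeros(x.shape, dtype=np.float64)
    for u in range(k):
        for v in range(k):
            out += kernel[u, v] * xp[u * rate : u * rate + x.shape[0],
                                     v * rate : v * rate + x.shape[1]]
    return out

feat = np.random.rand(32, 32)
kernel = np.full((3, 3), 1.0 / 9.0)  # toy 3x3 averaging kernel
branches = [dilated_conv2d(feat, kernel, r) for r in (1, 2, 3)]  # one branch per rate
print([b.shape for b in branches])  # [(32, 32), (32, 32), (32, 32)]
```

All branches keep the input resolution, so they can later be summed element by element.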
The features X_i^j are sent to the attention module, which adaptively aggregates the multi-scale context information. The context information is first extracted through a global spatial average pooling layer; the features are then processed by a bottleneck structure formed by two fully connected layers, and the output is finally normalized to (0, 1) through a sigmoid function. The adaptive output coefficient can be expressed as:

a_i = Sigmoid( W_2 · ReLU( W_1 · F_avg(X_i^j) ) )

Finally, a residual connection is applied directly between the input and the output of the channel attention mechanism, and the final output result is:

f(X_i^j) = X_i^j ⊕ ( a_i ⊗ X_i^j )
will be provided withMulti-scale contextual feature information selected by attention mechanismAnd 2 nd scale informationThe pixel-by-pixel summation is performed, which can be expressed as:
extract the obtainedThe feature information is sent to a channel attention mechanism to self-adaptively select context information, pixel summation is carried out on the feature information and the feature information of the 3 rd scale, and the like, and finally the feature mapping which aggregates the multi-scale context information is obtained:
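This cascade can be sketched in a few lines; the `select` gate below is a toy scalar stand-in for the channel-attention selection function f, and the shapes are illustrative:

```python
import numpy as np

def aggregate_multiscale(scale_feats, select):
    """Cascaded aggregation: adaptively select from the running
    accumulation, then sum pixel-wise with the next (larger) scale."""
    z = scale_feats[0]
    for x_next in scale_feats[1:]:
        z = select(z) + x_next   # Z_k = f(Z_{k-1}) + X_k
    return z

scale_feats = [np.full((4, 4), float(v)) for v in (1, 2, 3)]
agg = aggregate_multiscale(scale_feats, select=lambda z: 0.5 * z)  # toy gate
print(agg[0, 0])  # 0.5*(0.5*1 + 2) + 3 = 4.25
```

The toy gate halves its input; the real module would apply the channel attention of the previous sketch instead.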
after multi-scale context information is extracted by the multi-scale context aggregation module, the multi-scale context information is converted into a feature map with higher resolution by up-sampling. And then sending the data to a multi-scale context aggregation module for feature extraction in the same mode, passing through three multi-scale context aggregation modules all the time, finally outputting an estimated density map through a 1 x 1 convolution kernel, and calculating a loss function L (theta):
wherein F (I)i(ii) a θ) is a density map of the output of the network, FiThe method is characterized in that the method is a real density graph, theta is a parameter required to be optimized by a network, and the network continuously optimizes the parameter theta through a gradient descent method to find a parameter value which enables a loss function to be minimum.
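The training objective is a plain Euclidean loss between predicted and ground-truth density maps and can be sketched directly; the batch of maps below consists of toy values:

```python
import numpy as np

def density_loss(pred_maps, true_maps):
    """Euclidean loss L(theta) = 1/(2M) * sum_i ||F(I_i; theta) - F_i||^2,
    where M is the number of training images."""
    m = len(pred_maps)
    return sum(np.sum((p - t) ** 2) for p, t in zip(pred_maps, true_maps)) / (2.0 * m)

pred = [np.ones((4, 4)), np.zeros((4, 4))]   # toy predicted density maps
gt = [np.zeros((4, 4)), np.zeros((4, 4))]    # toy ground-truth maps
print(density_loss(pred, gt))  # 16 / (2*2) = 4.0
```

In training, this scalar would be minimized over θ by gradient descent, as the description states.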
It should be noted that, the steps in the adaptive multi-scale context aggregation method for counting crowds provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the adaptive multi-scale context aggregation system for counting crowds, and those skilled in the art may refer to the technical scheme of the system to implement the step flow of the method, that is, the embodiment in the system may be understood as a preferred embodiment for implementing the method, and details are not repeated here.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices provided by the present invention in purely computer readable program code means, the method steps can be fully programmed to implement the same functions by implementing the system and its various devices in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices thereof provided by the present invention can be regarded as a hardware component, and the devices included in the system and various devices thereof for realizing various functions can also be regarded as structures in the hardware component; means for performing the functions may also be regarded as structures within both software modules and hardware components for performing the methods.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (4)
1. An adaptive multi-scale context aggregation method for crowd counting, comprising:
step 1: inputting the sample picture into a backbone network, and extracting a characteristic diagram with the size being j times of the resolution of the input image;
step 2: inputting the extracted feature graph into a plurality of multi-scale context aggregation modules in a cascading mode, extracting and adaptively aggregating multi-scale context information to obtain multi-scale context features; an up-sampling layer is arranged behind each multi-scale context aggregation module and used for converting the multi-scale context features into a feature map with higher resolution;
and step 3: performing convolution layer processing on the generated multi-scale context characteristics to generate a density graph;
and 4, step 4: calculating a loss function between the generated density map and the true density map, and optimizing network parameters;
and 5: and performing integral summation on the generated density map to obtain the predicted number of people.
2. The adaptive multi-scale context aggregation method for crowd counting according to claim 1, wherein the step 4 comprises:
generating a ground-truth density map of the crowd through Gaussian kernel convolution according to the picture with head mark points, wherein the density map is calculated as:

F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_σ(x)

wherein F(x) represents the ground-truth density map, x_i the pixel point of the i-th head, G_σ a Gaussian kernel, δ(·) the Dirac delta function, σ the standard deviation, N the total number of people in the picture, and x a pixel point of the picture.
3. The adaptive multi-scale context aggregation method for crowd counting according to claim 1, wherein the step 2 comprises:
the multi-scale context aggregation module adaptively selects small-scale context features and aggregates them with large-scale context features; the multi-scale context aggregation module comprises a plurality of dilated-convolution branches with different dilation rates;

let X_i^j ∈ R^(jW×jH×C) denote the feature map extracted by the dilated convolution of the i-th scale, where i represents the dilation rate of the convolution kernel, j indicates that the resolution is j times that of the input image, r represents the reduction rate of the backbone network, W×H represents the resolution of the image, C represents the number of channels, and R^(jW×jH×C) represents the set of all feature maps of j-times resolution;

inputting the feature maps extracted by the dilated convolutions into a channel attention module, wherein the channel attention module adaptively selects useful context feature information with a selection function f and outputs a feature map Y_j ∈ R^(jW×jH×C) aggregating the context information, defined as:

Y_j = f( … f( f(X_1^j) ⊕ X_2^j ) ⊕ X_3^j … ) ⊕ X_n^j

wherein Y_j represents the j-times-resolution feature map extracted by the aggregation module, ⊕ denotes element-by-element summation, and X_1^j, X_2^j, X_3^j, …, X_n^j represent the extracted feature maps of the 1st, 2nd, 3rd, …, n-th scales, each with resolution j times that of the input picture.
4. The adaptive multi-scale context aggregation method for crowd counting according to claim 3, wherein the adaptively selecting, with a selection function f, the useful context feature information comprises:
performing pooling on each context feature through a global spatial average pooling layer, and outputting the feature information F_avg(X_i^j);
processing F_avg(X_i^j) with a bottleneck structure consisting of two fully connected layers, and normalizing the output features to (0, 1) through a sigmoid function, the adaptive output coefficient being calculated as:

a_i = Sigmoid( W_2 · ReLU( W_1 · F_avg(X_i^j) ) )

wherein W_1 and W_2 respectively represent the weight matrices of the two fully connected layers, the first fully connected layer being followed by a ReLU function and the second fully connected layer by a sigmoid function, and F_avg(X_i^j) represents the output after global average pooling;
adding a residual connection between the input and the output of the channel attention mechanism yields the selection function, defined as:

f(X_i^j) = X_i^j ⊕ ( a_i ⊗ X_i^j )
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110242403.7A CN112966600B (en) | 2021-03-04 | 2021-03-04 | Self-adaptive multi-scale context aggregation method for crowded population counting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112966600A true CN112966600A (en) | 2021-06-15 |
CN112966600B CN112966600B (en) | 2024-04-16 |
Family
ID=76277443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110242403.7A Active CN112966600B (en) | 2021-03-04 | 2021-03-04 | Self-adaptive multi-scale context aggregation method for crowded population counting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966600B (en) |
- 2021-03-04: CN202110242403.7A (CN) — patent CN112966600B, status: Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020169043A1 (en) * | 2019-02-21 | 2020-08-27 | 苏州大学 | Dense crowd counting method, apparatus and device, and storage medium |
CN110263849A (en) * | 2019-06-19 | 2019-09-20 | 合肥工业大学 | A kind of crowd density estimation method based on multiple dimensioned attention mechanism |
CN111242036A (en) * | 2020-01-14 | 2020-06-05 | 西安建筑科技大学 | Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network |
CN111709290A (en) * | 2020-05-18 | 2020-09-25 | 杭州电子科技大学 | Crowd counting method based on coding and decoding-jumping connection scale pyramid network |
CN112132023A (en) * | 2020-09-22 | 2020-12-25 | 上海应用技术大学 | Crowd counting method based on multi-scale context enhanced network |
Non-Patent Citations (1)
Title |
---|
CHEN Peng; TANG Yiping; WANG Liran; HE Xia: "Crowd density estimation with multi-level feature fusion", Journal of Image and Graphics, no. 08 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114120233A (en) * | 2021-11-29 | 2022-03-01 | 上海应用技术大学 | Training method of lightweight pyramid hole convolution aggregation network for crowd counting |
CN114120233B (en) * | 2021-11-29 | 2024-04-16 | 上海应用技术大学 | Training method of lightweight pyramid cavity convolution aggregation network for crowd counting |
Also Published As
Publication number | Publication date |
---|---|
CN112966600B (en) | 2024-04-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |