CN111488834B - Crowd counting method based on multi-level feature fusion - Google Patents

Crowd counting method based on multi-level feature fusion

Info

Publication number
CN111488834B
CN111488834B
Authority
CN
China
Prior art keywords
crowd
convolution
layer
feature
density map
Prior art date
Legal status
Active
Application number
CN202010284030.5A
Other languages
Chinese (zh)
Other versions
CN111488834A (en)
Inventor
霍占强
路斌
宋素玲
雒芬
乔应旭
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Technology
Priority to CN202010284030.5A
Publication of CN111488834A
Application granted
Publication of CN111488834B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 - Recognition of crowd images, e.g. recognition of crowd congestion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a crowd counting method based on multi-level feature fusion, which comprises the following steps: preprocessing the obtained crowd images and generating corresponding crowd density maps from the labeling information; constructing a crowd counting network with multi-level feature fusion; initializing the network weight parameters; inputting the preprocessed crowd images and crowd density maps into the network to complete forward propagation; calculating the loss between the forward propagation result and the true density map and updating the model parameters; iterating forward propagation and parameter updating for the specified number of times; and obtaining the crowd density map and the estimated number of people. The method provided by the invention can overcome the problem of crowd scale variation in the crowd counting task and makes crowd counting more accurate.

Description

Crowd counting method based on multi-level feature fusion
Technical Field
The invention relates to the field of image crowd counting and deep learning, in particular to a crowd counting method based on deep learning.
Background
Crowd counting is an important problem in the fields of image processing and computer vision. Its goal is to automatically generate a crowd density map from a crowd image and to estimate the number of people in the scene. Crowd counting is widely applied in traffic scheduling, safety prevention and control, urban management, and other fields.
Traditional crowd counting methods require complex preprocessing of crowd images and manually designed, hand-extracted human-body features; the features must be re-extracted when the scene changes, so adaptability is poor. In recent years, the successful application of convolutional neural networks has brought a significant breakthrough to the crowd counting task. Zhang et al. [1] proposed a convolutional neural network model suitable for crowd counting that is trained end to end, without foreground segmentation or hand-crafted feature extraction, and that obtains high-level features after multiple convolution layers, improving cross-scene counting performance. However, crowd scale differs greatly between crowded scenes, and within a single image the density and distribution of the crowd vary because people stand at different distances from the camera, so the accuracy of this method is low in scenes with large differences in crowd scale.
To address crowd scale variation, existing research has focused mainly on extracting features at several different scales to reduce the influence of scale change. Zhang et al. [2] proposed a multi-branch convolutional neural network in which each branch consists of convolution kernels of a different size, so that the different branches extract features at different scales. Cao et al. [3] proposed a scale-aware network that addresses scale variation with feature extraction modules composed of convolution kernels of different sizes. These methods handle crowd scale variation by extracting features at different scales with convolution kernels of different sizes. However, the variation of crowd scale within an image is continuous, whereas convolution kernels of different sizes can only extract crowd features at a few discrete scales and ignore crowds at the remaining scales. The problem of scale differences between crowds in different scenes is therefore not completely solved.
References:
1. C. Zhang, H. Li, X. Wang, and X. Yang. Cross-Scene Crowd Counting via Deep Convolutional Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 833-841.
2. Y. Zhang, D. Zhou, S. Chen, et al. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 589-597.
3. X. Cao, Z. Wang, Y. Zhao, and F. Su. Scale Aggregation Network for Accurate and Efficient Crowd Counting. European Conference on Computer Vision, 2018, 734-750.
Disclosure of the Invention
To solve the problem of crowd scale differences across different scenes in the prior art, the invention provides a crowd counting method based on multi-level feature fusion. It mainly comprises the following steps:
step S1: preprocessing the obtained crowd images, and generating corresponding crowd density maps by using labeling information;
step S2: constructing a crowd counting network with multi-level feature fusion;
step S3: initializing network weight parameters;
step S4: inputting the crowd images and crowd density maps preprocessed in step S1 into the network to complete forward propagation;
step S5: calculating the loss between the forward propagation result of step S4 and the true density map, and updating the model parameters;
step S6: iterating steps S4 and S5 for the specified number of times;
step S7: obtaining the crowd density map and the estimated number of people.
Compared with current methods that address crowd scale variation with multi-branch networks and convolution kernels of multiple sizes, the invention provides a method based on multi-level feature fusion. In the VGG16 feature extractor contained in the network, the shallow output features carry the spatial and texture information of the crowd, while the high-level output features carry its semantic information: shallow features describe where the crowd is located, and high-level features provide the specific details of the crowd. By fusing the low-level features with the high-level features, the method effectively handles crowd scale variation and overcomes the drawback of multi-branch, multi-size-kernel methods, which can only extract crowd features at discrete scales. Compared with existing methods, the proposed method is therefore more accurate.
Drawings
Fig. 1 is a flow chart of a crowd counting method based on multi-level feature fusion according to the invention.
Fig. 2 is a diagram of a crowd counting network based on multi-level feature fusion according to the present invention.
Fig. 3 is a block diagram of a channel domain attention module of a crowd counting network based on multi-level feature fusion according to the invention.
Detailed Description
Fig. 1 is a flowchart of the crowd counting method based on multi-level feature fusion according to the invention. The method mainly comprises the following steps: preprocessing the obtained crowd images and generating corresponding crowd density maps from the labeling information; constructing a crowd counting network with multi-level feature fusion; initializing the network weight parameters; inputting the preprocessed crowd images and crowd density maps into the network to complete forward propagation; calculating the loss between the forward propagation result and the true density map and updating the model parameters; iterating forward propagation and parameter updating for the specified number of times; and obtaining the crowd density map and the estimated number of people. The implementation details of each step are as follows:
step S1: preprocessing the obtained crowd images, and generating corresponding crowd density maps by using labeling information, wherein the specific mode is as follows:
step S11: the collected crowd images are subjected to centering treatment in a specific mode that the average value corresponding to the channels is subtracted from elements on three channels R, G and B of the images, and then the elements are divided by the standard deviation corresponding to the channels, wherein the average value corresponding to the channels R, G and B is (0.485,0.456,0.406), and the corresponding standard deviation is (0.229,0.224,0.225).
Step S12: A position matrix is generated from the provided labeling information. Specifically, a matrix of zeros with the same resolution as the corresponding image is created, and the element at the position given by each coordinate in the labeling information is set to 1.
Step S13: The centered crowd image and the corresponding position matrix are randomly cropped into image blocks and matrices of fixed size; in the specific embodiment of the invention, the crop size is 400 × 400.
Step S14: The corresponding crowd density map is generated by convolving the position matrix with a Gaussian kernel. Specifically, two one-dimensional Gaussian kernels with μ = 15 and σ = 4 are generated; one of them is transposed and multiplied with the other to obtain a two-dimensional Gaussian convolution kernel, which is then convolved with the elements of value 1 in the position matrix to generate the crowd density map.
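A sketch of steps S12 and S14, under the assumptions that μ = 15 denotes the length of the one-dimensional Gaussian kernel, that the annotations are (x, y) head coordinates, and that the kernel is normalized to sum to 1 (the patent does not state the normalization); the function name is illustrative:

```python
import numpy as np
from scipy.signal import convolve2d

def density_map(points, height, width, ksize=15, sigma=4.0):
    """points: iterable of (x, y) head coordinates from the annotation file."""
    # Step S12: position matrix with a 1 at every annotated head position.
    pos = np.zeros((height, width), dtype=np.float32)
    for x, y in points:
        if 0 <= int(y) < height and 0 <= int(x) < width:
            pos[int(y), int(x)] = 1.0
    # Step S14: 2-D Gaussian kernel as the product of a 1-D kernel and its transpose.
    ax = np.arange(ksize) - (ksize - 1) / 2.0
    g1d = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    g1d /= g1d.sum()                 # normalization is an assumption; it makes each
    g2d = np.outer(g1d, g1d)         # person contribute exactly 1 to the map's integral
    # Convolve the position matrix with the 2-D Gaussian kernel.
    return convolve2d(pos, g2d, mode="same")
```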
Step S15: The density map generated in step S14 is downsampled to a resolution of 200 × 200. Specifically, the density map is convolved with a 2 × 2 kernel whose parameters are all 1, using a stride of 2.
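A sketch of this downsampling step; a 2 × 2 all-ones kernel with stride 2 sums each 2 × 2 block, so halving the resolution preserves the total count carried by the map (the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def downsample_density(density):
    """density: (H, W) array or tensor; returns an (H/2, W/2) tensor."""
    d = torch.as_tensor(density, dtype=torch.float32).reshape(1, 1, *density.shape)
    kernel = torch.ones(1, 1, 2, 2)              # 2x2 kernel with all parameters equal to 1
    return F.conv2d(d, kernel, stride=2)[0, 0]   # stride-2 convolution (step S15)
```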
Step S2: the crowd counting network with multi-level feature fusion is constructed, as shown in fig. 2, in the following specific manner:
step S21: a VGG16 network is built that does not contain a fully connected layer.
Step S22: The channel-domain attention module is built, as shown in fig. 3. Specifically, a global average pooling layer is built that pools the input feature X into a 1 × 1 × C feature; two fully connected layers with C/4 and C neurons, respectively, are added after the pooling layer; a Sigmoid activation layer is built after the two fully connected layers; and the output of the activation layer is multiplied element-wise with the input feature X to obtain the output of the channel-domain attention module.
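A sketch of the channel-domain attention module in PyTorch; the ReLU between the two fully connected layers is an assumption, since the patent only specifies the pooling layer, the two fully connected layers of widths C/4 and C, and the Sigmoid:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel-domain attention module of step S22 (an SE-style block)."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # pools X into a 1 x 1 x C feature
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),                   # activation between the FC layers is an assumption
            nn.Linear(channels // 4, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (N, C, H, W)
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                                 # element-wise multiplication with the input X
```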
Step S23: The output features X50 and X40 of the fifth and fourth layers of the VGG16 network constructed in step S21 are fused. Specifically, the fifth-layer output feature X50 is upsampled (the magnification factor of the upsampling layer is 2), the upsampled feature and the fourth-layer output feature X40 are concatenated along the channel dimension, the concatenated feature is input into a channel-domain attention module, and the output of the attention module is input into a convolution block consisting of two 3 × 3 convolution layers with 256 channels, giving the output feature X41 of the convolution block.
Step S24: The output features X40 and X30 of the fourth and third layers of the VGG16 network constructed in step S21 and the feature X41 obtained in step S23 are fused. Specifically, feature X40 is upsampled and the result is concatenated with feature X30 along the channel dimension; the concatenated feature is input into a convolution block consisting of two 3 × 3 convolution layers with 128 channels, giving feature X31. Feature X41 is upsampled to obtain feature X32; features X31 and X32 are concatenated along the channel dimension, the concatenated feature is input into a channel-domain attention module, and the output of the attention module is input into a convolution block consisting of two 3 × 3 convolution layers with 128 channels, giving the output feature X33 of the convolution block.
Step S25: The output features X30 and X20 of the third and second layers of the VGG16 network constructed in step S21 and the features X31 and X33 obtained in step S24 are fused. Specifically, feature X30 is upsampled and the result is concatenated with feature X20 along the channel dimension; the concatenated feature is input into a convolution block consisting of two 3 × 3 convolution layers with 64 channels, giving feature X21. Feature X31 is upsampled to obtain feature X22; features X21 and X22 are concatenated along the channel dimension and input into a convolution block consisting of two 3 × 3 convolution layers with 64 channels, giving the output feature X23. Feature X33 is upsampled to obtain feature X24; features X23 and X24 are concatenated along the channel dimension, the concatenated feature is input into a channel-domain attention module, the output of the attention module is input into a convolution block consisting of two 3 × 3 convolution layers with 64 channels and one 3 × 3 convolution layer with 32 channels, and the output of that convolution block is input into a 1 × 1 convolution layer, completing the construction of the crowd counting network with multi-level feature fusion.
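A sketch of the whole network of steps S21 to S25 in PyTorch is given below. The mapping of the "second to fifth layer outputs" X20, X30, X40, X50 onto torchvision's VGG16 stages (the outputs of conv2_2, conv3_3, conv4_3, and conv5_3), the bilinear upsampling mode, and the ReLU activations inside the convolution blocks are assumptions; the channel counts follow from the concatenations described above, and the compact ChannelAttention class repeats the module sketched after step S22 so that the block is self-contained:

```python
import torch
import torch.nn as nn
from torchvision import models

class ChannelAttention(nn.Module):            # same SE-style module as in step S22
    def __init__(self, c):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(c, c // 4), nn.ReLU(inplace=True),
                                nn.Linear(c // 4, c), nn.Sigmoid())

    def forward(self, x):
        n, c, _, _ = x.shape
        return x * self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)

def conv_block(in_ch, out_ch, extra_32=False):
    """Two 3x3 convolutions with out_ch channels (plus an optional 3x3/32 layer, step S25)."""
    layers = [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
              nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
    if extra_32:
        layers += [nn.Conv2d(out_ch, 32, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class MultiLevelFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
        # Assumed mapping of the "layer 2-5 outputs" onto torchvision VGG16 stages.
        self.stage2 = vgg[:9]     # X20: 128 channels, 1/2 resolution
        self.stage3 = vgg[9:16]   # X30: 256 channels, 1/4 resolution
        self.stage4 = vgg[16:23]  # X40: 512 channels, 1/8 resolution
        self.stage5 = vgg[23:30]  # X50: 512 channels, 1/16 resolution
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.att54 = ChannelAttention(1024)                 # step S23: 512 + 512 channels
        self.conv41 = conv_block(1024, 256)
        self.conv31 = conv_block(768, 128)                  # step S24: 512 + 256 channels
        self.att33 = ChannelAttention(384)                  # 128 + 256 channels
        self.conv33 = conv_block(384, 128)
        self.conv21 = conv_block(384, 64)                   # step S25: 256 + 128 channels
        self.conv23 = conv_block(192, 64)                   # 64 + 128 channels
        self.att_out = ChannelAttention(192)                # 64 + 128 channels
        self.conv_out = conv_block(192, 64, extra_32=True)
        self.head = nn.Conv2d(32, 1, 1)                     # final 1x1 convolution

    def forward(self, x):                                   # x: (N, 3, 400, 400)
        x20 = self.stage2(x)
        x30 = self.stage3(x20)
        x40 = self.stage4(x30)
        x50 = self.stage5(x40)
        # Step S23: fuse X50 and X40.
        x41 = self.conv41(self.att54(torch.cat([self.up(x50), x40], dim=1)))
        # Step S24: fuse X40, X30 and X41.
        x31 = self.conv31(torch.cat([self.up(x40), x30], dim=1))
        x33 = self.conv33(self.att33(torch.cat([x31, self.up(x41)], dim=1)))
        # Step S25: fuse X30, X20, X31 and X33, then predict the density map.
        x21 = self.conv21(torch.cat([self.up(x30), x20], dim=1))
        x23 = self.conv23(torch.cat([x21, self.up(x31)], dim=1))
        out = self.conv_out(self.att_out(torch.cat([x23, self.up(x33)], dim=1)))
        return self.head(out)                               # (N, 1, 200, 200)
```

Under this assumed mapping, a 400 × 400 input yields X20 to X50 at resolutions 200, 100, 50, and 25, so every ×2 upsampling matches the next feature map and the predicted density map comes out at 200 × 200, the same size as the ground truth produced in step S15.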
Step S3: The network weight parameters are initialized. Specifically, for the crowd counting network obtained in step S2, the feature extractor VGG16 is initialized with the ImageNet classification weights of VGG16 without the fully connected layers, and the other convolution layers and the fully connected layers are initialized from a normal distribution with μ = 0 and σ = 0.01.
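A sketch of this initialization for the network sketched above; the "stage" attribute prefix and the zero bias initialization are assumptions, and the VGG16 stages already carry their ImageNet weights because they were built from the pretrained model:

```python
import torch.nn as nn

def init_non_backbone(module):
    """Normal-distribution initialization (mu = 0, sigma = 0.01) of step S3."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        if module.bias is not None:
            nn.init.zeros_(module.bias)       # bias handling is an assumption

# model = MultiLevelFusionNet()               # hypothetical network from the sketch above
# for name, m in model.named_modules():
#     if not name.startswith("stage"):        # keep the pretrained VGG16 stages untouched
#         init_non_backbone(m)
```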
Step S4: The crowd images and crowd density maps preprocessed in step S1 are input into the network to complete forward propagation.
Step S5: The loss between the forward propagation result of step S4 and the true density map input to the network is calculated, and the model parameters are updated, in the following specific manner:
step S51, calculating the mean square error loss L of the forward propagation result and the true density map MSE The specific mode is as follows:
Figure BDA0002447808570000061
where N represents the number of samples of the input data that are propagated forward once, n=8 in the present invention,
Figure BDA0002447808570000062
density map representing the current ith data forward propagation calculation,/for>
Figure BDA0002447808570000063
Representing the true density map of the current ith data.
Step S52: The model parameters are updated from the loss L_MSE calculated in step S51 using stochastic gradient descent.
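A sketch of one training iteration covering steps S4, S51, and S52 with batch size N = 8. nn.MSELoss averages over all pixels rather than summing per-sample squared norms, which differs from the formula above only by a constant factor; the learning rate and momentum are illustrative, since the patent does not state them:

```python
import torch
import torch.nn as nn

def train_step(model, images, gt_density, optimizer, criterion):
    """images: (8, 3, 400, 400); gt_density: (8, 1, 200, 200) ground-truth maps."""
    pred = model(images)                 # step S4: forward propagation
    loss = criterion(pred, gt_density)   # step S51: mean squared error loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                     # step S52: stochastic gradient descent update
    return loss.item()

# criterion = nn.MSELoss()
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
```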
Step S6: Steps S4 and S5 are iterated for the specified number of times; in the specific embodiment, the number of iterations is 50.
Step S7: The crowd density map is obtained and the estimated number of people is computed. Specifically, all pixels of the crowd density map computed by the model are summed to obtain the number of people contained in the crowd image.
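A sketch of step S7; because each annotated person contributes (approximately) one unit to the integral of the density map, summing all pixels of the predicted map gives the estimated head count (the function name is illustrative):

```python
import torch

def estimate_count(model, image):
    """image: a preprocessed (3, H, W) tensor; returns the estimated number of people."""
    model.eval()
    with torch.no_grad():
        density = model(image.unsqueeze(0))   # (1, 1, H/2, W/2) predicted density map
    return float(density.sum())               # step S7: sum of all density-map pixels
```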
Compared with current methods that address crowd scale variation with multi-branch networks and convolution kernels of multiple sizes, the invention provides a method based on multi-level feature fusion. In the VGG16 feature extractor contained in the network, the shallow output features carry the spatial and texture information of the crowd, while the high-level output features carry its semantic information: shallow features describe where the crowd is located, and high-level features provide the specific details of the crowd. By fusing the low-level features with the high-level features, the method effectively handles crowd scale variation and overcomes the drawback of multi-branch, multi-size-kernel methods, which can only extract crowd features at discrete scales. Compared with existing methods, the proposed method is therefore more accurate.

Claims (1)

1. A crowd counting method based on multi-level feature fusion, characterized by comprising the following steps:
step S1: preprocessing the obtained crowd images, and generating corresponding crowd density maps by using labeling information, wherein the specific mode is as follows:
step S11: centering the collected crowd images; specifically, the elements on the three channels R, G, and B of the image are reduced by the mean of the corresponding channel and then divided by the standard deviation of the corresponding channel, the means of channels R, G, and B being (0.485, 0.456, 0.406) and the corresponding standard deviations being (0.229, 0.224, 0.225);
step S12: generating a position matrix from the provided labeling information, by creating a matrix of zeros with the same resolution as the corresponding image and setting the element at the position given by each coordinate in the labeling information to 1;
step S13: randomly cropping the centered crowd image and the corresponding position matrix into image blocks and matrices of fixed size, the crop size being 400 × 400 in a specific embodiment;
step S14: generating the corresponding crowd density map by convolving the position matrix with a Gaussian kernel; specifically, two one-dimensional Gaussian kernels with μ = 15 and σ = 4 are generated, one of them is transposed and multiplied with the other to obtain a two-dimensional Gaussian convolution kernel, and this two-dimensional kernel is convolved with the elements of value 1 in the position matrix to generate the crowd density map;
step S15: downsampling the density map generated in step S14 to a resolution of 200 × 200; specifically, the density map is convolved with a 2 × 2 kernel whose parameters are all 1, using a stride of 2;
step S2: the crowd counting network with multi-level feature fusion is constructed in the following specific mode:
step S21: building a VGG16 network that does not contain fully connected layers;
step S22: building the channel-domain attention module; specifically, a global average pooling layer is built that pools the input feature X into a 1 × 1 × C feature, two fully connected layers with C/4 and C neurons, respectively, are added after the pooling layer, a Sigmoid activation layer is built after the two fully connected layers, and the output of the activation layer is multiplied element-wise with the input feature X to obtain the output of the channel-domain attention module;
step S23: fusing the output features X50 and X40 of the fifth and fourth layers of the VGG16 network constructed in step S21; specifically, the fifth-layer output feature X50 is upsampled (the magnification factor of the upsampling layer is 2), the upsampled feature and the fourth-layer output feature X40 are concatenated along the channel dimension, the concatenated feature is input into a channel-domain attention module, and the output of the attention module is input into a convolution block consisting of two 3 × 3 convolution layers with 256 channels, giving the output feature X41 of the convolution block;
step S24: fusing the output features X40 and X30 of the fourth and third layers of the VGG16 network constructed in step S21 with the feature X41 obtained in step S23; specifically, feature X40 is upsampled and the result is concatenated with feature X30 along the channel dimension, and the concatenated feature is input into a convolution block consisting of two 3 × 3 convolution layers with 128 channels, giving feature X31; feature X41 is upsampled to obtain feature X32; features X31 and X32 are concatenated along the channel dimension, the concatenated feature is input into a channel-domain attention module, and the output of the attention module is input into a convolution block consisting of two 3 × 3 convolution layers with 128 channels, giving the output feature X33 of the convolution block;
step S25: fusing the output features X30 and X20 of the third and second layers of the VGG16 network constructed in step S21 with the features X31 and X33 obtained in step S24; specifically, feature X30 is upsampled and the result is concatenated with feature X20 along the channel dimension, and the concatenated feature is input into a convolution block consisting of two 3 × 3 convolution layers with 64 channels, giving feature X21; feature X31 is upsampled to obtain feature X22; features X21 and X22 are concatenated along the channel dimension and input into a convolution block consisting of two 3 × 3 convolution layers with 64 channels, giving the output feature X23; feature X33 is upsampled to obtain feature X24; features X23 and X24 are concatenated along the channel dimension, the concatenated feature is input into a channel-domain attention module, the output of the attention module is input into a convolution block consisting of two 3 × 3 convolution layers with 64 channels and one 3 × 3 convolution layer with 32 channels, and the output of that convolution block is input into a 1 × 1 convolution layer, completing the construction of the crowd counting network with multi-level feature fusion;
step S3: initializing the network weight parameters; specifically, for the crowd counting network obtained in step S2, the feature extractor VGG16 is initialized with the ImageNet classification weights of VGG16 without the fully connected layers, and the other convolution layers and fully connected layers are initialized from a normal distribution with μ = 0 and σ = 0.01;
step S4: inputting the crowd images and crowd density maps preprocessed in step S1 into the network to complete forward propagation;
step S5: calculating the loss between the forward propagation result of step S4 and the true density map input to the network, and updating the model parameters, in the following specific manner:
step S51: calculating the mean squared error loss L_MSE between the forward propagation result and the true density map as

L_{MSE} = \frac{1}{2N} \sum_{i=1}^{N} \left\| \hat{D}_i - D_i \right\|_2^2

where N is the number of samples in one forward pass, N = 8, \hat{D}_i denotes the density map computed by the forward propagation for the i-th sample, and D_i denotes the true density map of the i-th sample;
step S52: updating the model parameters from the loss L_MSE calculated in step S51 using stochastic gradient descent;
step S6: iterating steps S4 and S5 for the specified number of times, the number of iterations being 50;
step S7: obtaining the crowd density map and the estimated number of people; specifically, all pixels of the crowd density map computed by the model are summed to obtain the number of people contained in the crowd image.
CN202010284030.5A 2020-04-13 2020-04-13 Crowd counting method based on multi-level feature fusion Active CN111488834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010284030.5A CN111488834B (en) 2020-04-13 2020-04-13 Crowd counting method based on multi-level feature fusion

Publications (2)

Publication Number Publication Date
CN111488834A CN111488834A (en) 2020-08-04
CN111488834B (en) 2023-07-04

Family

ID=71792806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010284030.5A Active CN111488834B (en) 2020-04-13 2020-04-13 Crowd counting method based on multi-level feature fusion

Country Status (1)

Country Link
CN (1) CN111488834B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801340B (en) * 2020-12-16 2024-04-26 北京交通大学 Crowd density prediction method based on multi-level city information unit portraits

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301387A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of image Dense crowd method of counting based on deep learning
CN109271960A (en) * 2018-10-08 2019-01-25 燕山大学 A kind of demographic method based on convolutional neural networks
CN109598220A (en) * 2018-11-26 2019-04-09 山东大学 A kind of demographic method based on the polynary multiple dimensioned convolution of input
CN109903339A (en) * 2019-03-26 2019-06-18 南京邮电大学 A kind of video group personage's position finding and detection method based on multidimensional fusion feature
CN110705344A (en) * 2019-08-21 2020-01-17 中山大学 Crowd counting model based on deep learning and implementation method thereof

Also Published As

Publication number Publication date
CN111488834A (en) 2020-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant