CN111815665A - Single image crowd counting method based on depth information and scale perception information - Google Patents

Single image crowd counting method based on depth information and scale perception information

Info

Publication number
CN111815665A
Authority
CN
China
Prior art keywords
density
map
density map
scale
depth
Prior art date
Legal status
Granted
Application number
CN202010662406.1A
Other languages
Chinese (zh)
Other versions
CN111815665B (en)
Inventor
田玲
朱大勇
张栗粽
罗光春
邬丹丹
董文琦
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010662406.1A
Publication of CN111815665A
Application granted
Publication of CN111815665B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/136 (Image analysis: segmentation; edge detection involving thresholding)
    • G06N 3/045 (Neural networks: architecture; combinations of networks)
    • G06N 3/08 (Neural networks: learning methods)
    • G06T 7/50 (Image analysis: depth or shape recovery)
    • G06T 2207/10004 (Image acquisition modality: still image; photographic image)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to computer vision technology and discloses a single image crowd counting method based on depth information and scale perception information, which improves prediction capability and reduces computational complexity. The method comprises the following steps: S1, perform Gaussian mapping on the head-center coordinate data corresponding to each input sample picture to generate preliminary ground-truth density maps, and correct them based on depth information obtained by a depth estimation algorithm to obtain the final ground-truth density map; S2, predict the crowd density map of the input sample picture with a density estimation network to generate a predicted density map, compute the loss error between the predicted density map and the ground-truth density map, adjust the network parameters by gradient back-propagation, and generate a density prediction model through iteration; S3, when counting the crowd in a single image, generate the predicted density map of the image with the density prediction model and compute the total number of people in the image from it.

Description

Single image crowd counting method based on depth information and scale perception information
Technical Field
The invention relates to a computer vision technology, in particular to a single image crowd counting method based on depth information and scale perception information.
Background
Crowd counting aims to output the crowd density map corresponding to an input picture after the picture has been processed by a network model; the per-pixel people-count values of the density map are then summed to obtain the total head count. The task is challenging due to occlusion, viewpoint changes, variation in crowd scale, and diverse crowd distributions.
Early methods located each pedestrian in the crowd with an object detector and reported the number of detections as the count. However, these methods train classifiers on hand-crafted features and perform poorly in highly crowded scenes. To count crowds in complex scenes, later work generates a crowd density map with a convolutional neural network and improves counting performance by capturing scale variation.
In 2016, Zhang et al. proposed the MCNN algorithm to cope with scale variation. MCNN consists of three branch networks, each sampling features with receptive fields of a different size. A given picture is processed by the three branches, the results are fused along the channel dimension, and the final density map is obtained by a 1×1 convolution. Because the design covers only three convolution scales, each branch can serve only one density level; real scenes exhibit continuous density variation and uneven crowd distribution that cannot be strictly assigned to a single category, so the effectiveness of the MCNN algorithm is limited by the number of branches.
In 2018, Cao et al. proposed the SANet algorithm to improve the scale-aware structure, integrating scale information with Inception structures: each convolutional layer applies multiple convolution kernels, fuses the partial results, and shares information fully from the bottom layer to the top. The network comprises four Inception structures, each followed by a transposed convolution for scale restoration, so that the generated density map has the same size as the input picture and pixel-level supervision is possible. In crowd counting scenes, however, pedestrians far from the camera appear as small targets; such small targets are numerous in the images and are the main subject of investigation. Although Inception structures can integrate multi-scale information, features become highly abstracted as the network propagates forward and the detail features of small targets are lost, reducing prediction accuracy for them. In addition, scale restoration with transposed convolution has high computational complexity, and within a comparable range of training batches the method shows no clear performance advantage.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a single image crowd counting method based on depth information and scale perception information that improves prediction capability and reduces computational complexity.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a single image crowd counting method based on depth information and scale perception information comprises the following steps:
S1, perform Gaussian mapping on the head-center coordinate data corresponding to the input sample picture to generate preliminary ground-truth density maps, and correct them based on depth information obtained by a depth estimation algorithm to obtain the final ground-truth density map;
S2, predict the crowd density map of the input sample picture with a density estimation network to generate a predicted density map, compute the loss error between the predicted density map and the ground-truth density map, adjust the network parameters by gradient back-propagation, and generate a density prediction model through iteration;
S3, when counting the crowd in a single image, generate the predicted density map of the image with the density prediction model and compute the total number of people in the image from it.
As a further optimization, step S1 specifically includes:
S11, map each head coordinate point in the sample picture label data with a Gaussian kernel of fixed size, and superpose the mapped values over all positions of the image to form a preliminary ground-truth density map F1(x);
S12, map each head coordinate point in the sample picture label data with a geometry-adaptive Gaussian kernel, and superpose the mapped values over all positions of the image to form a preliminary ground-truth density map F2(x);
S13, extract the depth information at each pixel position of the input sample picture with a monocular depth estimation algorithm to form a depth estimation map Depth(x);
S14, determine the final ground-truth density map with a threshold segmentation algorithm based on the information of the depth estimation map Depth(x):
M(i,j) = F1(i,j), if Depth(i,j) < λ
M(i,j) = F2(i,j), if Depth(i,j) ≥ λ
where λ is the preset segmentation threshold, F1(i,j) and F2(i,j) are the values at coordinate (i,j) in the preliminary ground-truth density maps F1(x) and F2(x), Depth(i,j) is the depth value at coordinate (i,j) in Depth(x), and M(i,j) is the value at coordinate (i,j) in the final ground-truth density map.
As a further optimization, in step S2 the density estimation network comprises a basic feature extraction module, a multi-scale capture module, and a scale transfer module; the basic feature extraction module extracts low-level features of the picture, such as texture; the multi-scale capture module further extracts picture features, fusing multi-scale information while preserving the detail features of small targets; and the scale transfer module restores the scale of the feature map, raising it to the size of the input picture.
As a further optimization, the basic feature extraction module consists of the convolutional layers of the VGG16 network up to conv4_3; the multi-scale capture module uses four densely connected layers, each extracting features with a 3×3 convolution kernel and keeping the feature-map resolution unchanged through edge padding, with the convolution growth rate set to 256; and the scale transfer module uses sub-pixel convolution to restore the scale of the feature map, raising its resolution to the size of the input picture.
As a further optimization, in step S2 computing the loss error from the predicted density map and the ground-truth density map specifically comprises:
measuring the error between the predicted density map and the ground-truth density map with the Euclidean distance as loss function, expressed as
L(θ) = (1/2N) · Σ_{i=1}^{N} ||F(X_i; θ) − M(X_i)||₂²
where F(X_i; θ) is the predicted density map output by the network, θ denotes the learnable parameters of the network, X_i is the i-th input picture, M(X_i) is the ground-truth density map of the i-th picture, and N is the number of training pictures.
The invention has the beneficial effects that:
(1) More accurate supervision information:
The invention uses depth information to guide the generation of the ground-truth density map; the resulting map is more accurate than one generated by a single conventional mapping. Using it to guide network training brings the predicted density map closer to the ground truth.
(2) A wide range of scale variation can be captured:
The invention builds a multi-scale capture module suited to the scene using dense connections, fusing multi-scale information while retaining more detail features of small targets, which improves the network's prediction of multi-scale targets.
(3) Scale restoration at low computational complexity:
The invention performs scale restoration with a sub-pixel convolution module, avoiding both the loss of image-specific detail caused by bilinear-interpolation upsampling and the computational cost of transposed-convolution upsampling.
Drawings
FIG. 1 is a flow chart of the crowd counting algorithm based on depth information and scale perception information according to the invention;
FIG. 2 is a diagram of the ground-truth density map generation process;
FIG. 3 is a diagram of the process by which the density estimation network generates the predicted density map for crowd counting.
Detailed Description
The invention aims to provide a single image crowd counting method based on depth information and scale perception information that improves prediction capability and reduces computational complexity. The core idea is as follows: (1) Train a prediction model: first generate preliminary ground-truth density maps, then correct them based on depth information obtained by a depth estimation algorithm to obtain the final ground-truth density map; this ground-truth density map supervises, pixel by pixel, the predicted density map generated by the density estimation network; the network parameters are adjusted by gradient back-propagation according to the error between the ground-truth and predicted density maps, and the final prediction model is produced through iteration. (2) Use the trained prediction model to predict the density map of an input picture and compute the total number of people in it.
In the invention, the ground-truth density map is generated neither by a single fixed Gaussian kernel mapping nor by a single geometry-adaptive Gaussian kernel mapping. Analysing how crowd pictures arise: targets close to the camera appear large, with large distances between them, while targets farther from the camera are affected by perspective, appearing smaller and closer together. In view of this, the depth information of the picture is introduced to guide ground-truth density map generation, yielding a more accurate density map to supervise the generation of the predicted density map.
When producing the predicted density map, a densely connected structure is used, fully retaining small-target detail features while fusing multi-scale features and thereby overcoming the loss of such details in existing multi-scale methods. To raise the resolution of the prediction map, spatial dimensions are filled in from channel information, making full use of the image's own information; this avoids both the hand-crafted bias introduced by the linear-interpolation upsampling of existing methods and the computational complexity of transposed convolution.
In a specific implementation, as shown in fig. 1, a crowd counting algorithm flow based on depth information and scale perception information in the present invention includes the following steps:
s1: obtaining a truth density map of an input sample picture:
in order to obtain a truth-value density map label of an input sample picture, Gaussian mapping needs to be performed on head center coordinate data corresponding to the input picture to generate a preliminary truth-value density map. And then correcting the preliminary true density map based on the depth information obtained by the depth estimation algorithm, wherein the obtained true density icon is used for a predicted density map generated by the point-to-point supervised density estimation network.
Here, two Gaussian mapping methods are used: a fixed Gaussian kernel function and a geometry-adaptive Gaussian kernel function. The preliminary ground-truth density maps generated by the two methods are fused using depth information to produce the final ground-truth density map, as shown in fig. 2.
S11, fixed Gaussian kernel mapping:
Let the coordinate of a head annotation point be x_i and let δ(x − x_i) denote a unit impulse at that position. A picture containing N annotated heads can then be represented as
H(x) = Σ_{i=1}^{N} δ(x − x_i)
and the corresponding crowd density map as F(x) = H(x) * G_σ(x), where G_σ(x) is a Gaussian kernel whose value increases as the coordinate approaches the center point and σ determines the size of the region the kernel acts on. This density function assumes that the annotated head points x_i are distributed independently in image space; in reality, owing to perspective distortion, the regions covered by different samples correspond to regions of different size in three-dimensional space.
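As an illustration, a minimal NumPy/SciPy sketch of this fixed-kernel mapping follows; the head list, image shape, and the kernel width sigma = 15 are assumed placeholders, not values fixed by the text above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixed_kernel_density(shape, heads, sigma=15.0):
    """Build H(x) as unit impulses at the head points, then blur with a
    fixed-size Gaussian kernel: F(x) = H(x) * G_sigma(x)."""
    H = np.zeros(shape, dtype=np.float32)
    for x, y in heads:                       # head annotations as (col, row)
        H[int(y), int(x)] += 1.0
    return gaussian_filter(H, sigma=sigma)   # integral stays close to the head count
```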
S12, geometry-adaptive Gaussian kernel mapping:
Here the kernel parameter is determined by the average distance from each person to neighbouring targets. For each head annotation x_i in the picture, denote the distances to its m nearest neighbouring heads as {d_1^i, d_2^i, …, d_m^i}, with average distance
d̄^i = (1/m) Σ_{j=1}^{m} d_j^i
The density map generated by this method can then be expressed as
F(x) = Σ_{i=1}^{N} δ(x − x_i) * G_{σ_i}(x), with σ_i = β · d̄^i
where β is a hyperparameter.
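A corresponding sketch of the geometry-adaptive mapping, assuming scipy's KDTree for the nearest-neighbour search; m = 3 and beta = 0.3 are illustrative values commonly used with this kernel, not values stated above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import KDTree

def adaptive_kernel_density(shape, heads, m=3, beta=0.3):
    """Blur each head point with its own kernel width
    sigma_i = beta * (average distance to its m nearest neighbours).
    Assumes the picture contains more than m annotated heads."""
    heads = np.asarray(heads, dtype=np.float32)
    tree = KDTree(heads)
    # k = m + 1 because the nearest hit of each point is the point itself
    dists, _ = tree.query(heads, k=m + 1)
    density = np.zeros(shape, dtype=np.float32)
    for (x, y), d in zip(heads, dists):
        sigma_i = beta * float(d[1:].mean())   # average neighbour distance
        impulse = np.zeros(shape, dtype=np.float32)
        impulse[int(y), int(x)] = 1.0
        density += gaussian_filter(impulse, sigma=sigma_i)
    return density
```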
S13, extraction of the depth estimation map:
The method computes the depth map corresponding to the input sample picture with a monocular depth estimation algorithm; based on this depth map, a threshold segmentation algorithm corrects the Gaussian mapping value at each position of the picture and fuses the information of the two density maps into the final density map. Specifically, the monodepth algorithm may be used to estimate the depth information of the input sample picture: passing the picture through the monodepth model yields a grayscale map in which each pixel value represents the distance from the camera to the object surface.
S14, fusing the density maps generated in S11 and S12 using the depth information:
Suppose the input picture is X ∈ R^{h×h×c}, where h denotes the picture size and c the number of channels. Mapping the input picture with the fixed Gaussian kernel function yields the preliminary ground-truth density map F1(x); mapping it with the geometry-adaptive Gaussian kernel function yields the preliminary ground-truth density map F2(x); and processing it with the monodepth model yields the depth map Depth(x). The two preliminary ground-truth density maps are then fused by segmenting them at the preset depth threshold λ:
M(i,j) = F1(i,j), if Depth(i,j) < λ
M(i,j) = F2(i,j), if Depth(i,j) ≥ λ
where F1(i,j) and F2(i,j) are the values at coordinate (i,j) in F1(x) and F2(x), Depth(i,j) is the depth value at coordinate (i,j) in Depth(x), and M(i,j) is the value at coordinate (i,j) in the final ground-truth density map.
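The fusion itself is then a per-pixel selection gated by the depth map. The sketch below assumes the near/far direction argued earlier (fixed kernel F1 for near, sparse regions; adaptive kernel F2 for far, dense regions), a reading which the text does not state explicitly:

```python
import numpy as np

def fuse_by_depth(F1, F2, depth, threshold):
    """M(i,j) = F1(i,j) where Depth(i,j) < threshold (near the camera),
       otherwise F2(i,j) (far from the camera)."""
    return np.where(depth < threshold, F1, F2)
```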
S2, obtaining the predicted density map (density estimation map) of the input sample picture from the density estimation network:
The density estimation network employed in the invention consists of three main components: a basic feature extraction module, a multi-scale capture module, and a scale transfer module. The basic feature extraction module mainly extracts low-level features of the picture, such as texture; the multi-scale capture module further extracts features, fusing multi-scale information while preserving the detail features of small targets; and the scale transfer module restores the scale of the feature map, raising it to the size of the input picture.
S21, basic feature extraction module:
The model may reuse layers of a pre-trained VGG network. Taking pictures of size 256×256 as input and analysing the convolutional layers of VGG16, the receptive field of the conv4_3 layer reaches 172, far beyond the scale of large targets; in the scenes considered, large targets span less than half of the picture. The basic feature extraction module adopted by the invention therefore consists of the convolutional layers of VGG16 up to conv4_3.
S22, multi-scale capture module:
To retain the detail features of small targets in the current scene, the feature information output by the basic feature extraction module is passed backward layer by layer through the multi-scale module, avoiding the performance bottleneck caused by the loss of detail information in existing methods. Unlike the skip connections of ResNet, dense connections ensure the greatest degree of information sharing between layers. Receptive-field analysis shows that four densely connected layers give a receptive-field range sufficient to extract semantic information for targets of all sizes. So that the module can extract enough context while avoiding an excessive growth rate, each layer extracts features with a 3×3 convolution kernel, keeps the feature-map resolution unchanged through edge padding, and sets the convolution growth rate to 256. Since the backbone outputs 512 channels, the features are first converted to a 256-channel feature map before entering the multi-scale capture module.
S23, scale transfer module:
This module raises the resolution of the feature map with sub-pixel convolution. Because the basic feature extraction part downsamples the picture 8× through three pooling operations, the feature map is 1/8 of the input size, and the multi-scale capture module keeps that size unchanged by edge padding while concatenating the multi-layer feature maps along channels. Scale restoration therefore requires 8× upsampling of the feature map. Since the number of low-resolution feature maps entering a sub-pixel convolution must equal the square of the upsampling factor, a 1×1 convolution is added after the multi-scale capture module to adjust the number of channels to 8² = 64. Finally, the channel features are rearranged to fill in the spatial dimensions of the feature map.
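Assembling the three modules, a minimal PyTorch sketch of the described network follows. The use of torchvision's pre-trained VGG16, the ReLU placement, the exact dense-block wiring, and the final non-negativity ReLU are assumptions consistent with the description, not a verbatim reproduction of the patented model:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class DensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        # basic feature extraction: VGG16 layers up to conv4_3
        # (three poolings -> 1/8 resolution, 512 output channels)
        self.frontend = nn.Sequential(*list(vgg16(pretrained=True).features)[:23])
        self.reduce = nn.Conv2d(512, 256, kernel_size=1)   # 512 -> 256 channels
        # multi-scale capture: four densely connected 3x3 layers, growth rate 256
        self.dense = nn.ModuleList(
            nn.Conv2d(256 * (i + 1), 256, kernel_size=3, padding=1)
            for i in range(4)
        )
        # scale transfer: 1x1 conv to 8**2 = 64 channels, then sub-pixel shuffle
        self.to_subpixel = nn.Conv2d(256, 64, kernel_size=1)
        self.shuffle = nn.PixelShuffle(8)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [self.relu(self.reduce(self.frontend(x)))]
        for conv in self.dense:              # each layer sees all previous outputs
            feats.append(self.relu(conv(torch.cat(feats, dim=1))))
        out = self.shuffle(self.to_subpixel(feats[-1]))
        return self.relu(out)                # (N, 1, H, W) non-negative density map
```

PixelShuffle rearranges the 64 channels into an 8×8 spatial block at each position, so the single-channel output density map matches the input resolution without transposed convolution or interpolation.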
The process of generating the predicted density map with this network is shown in fig. 3: the input sample picture first passes through the basic feature extraction part, then enters the multi-scale capture module to fuse scale information, and finally undergoes scale restoration to produce the predicted density map.
After the predicted density map of an input sample picture has been generated, the loss error is computed against the ground-truth density map, the network parameters are adjusted by gradient back-propagation, and the density prediction model is produced through iteration.
During training, the crowd density estimation network is trained with the Euclidean distance as loss function; the Euclidean loss measures the pixel-level estimation error and is expressed as
L(θ) = (1/2N) · Σ_{i=1}^{N} ||F(X_i; θ) − M(X_i)||₂²
where F(X_i; θ) is the density estimation map output by the network, θ denotes the learnable parameters of the network, X_i is the i-th input picture, M(X_i) is the ground-truth density map of the i-th picture, and N is the number of training pictures.
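As a sketch, this loss for a batch of PyTorch tensors of shape (N, 1, H, W):

```python
def euclidean_loss(pred, gt):
    """L(theta) = 1/(2N) * sum_i ||F(X_i; theta) - M(X_i)||_2^2"""
    n = pred.shape[0]
    return ((pred - gt) ** 2).sum() / (2 * n)
```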
Input of the density prediction model: the training crowd pictures X_i with their head-annotation label data.
Output: the predicted density map F(X_i; θ).
The training process is as follows:
1. Data preprocessing: obtain the preliminary ground-truth density maps F1(X_i) and F2(X_i) of each picture and its depth map information Depth(X_i); determine the depth segmentation threshold; based on the depth information, obtain the final ground-truth density map M(X_i).
2. Initialize the model parameters, then train the model until it converges: load pictures in batches; extract basic features and update the feature map, F_i^1 ∈ R^{h1×h1×c1} ← X_i ∈ R^{h×h×c}; channel-transform the feature map, F_i^1 ∈ R^{h1×h1×c2} ← F_i^1 ∈ R^{h1×h1×c1}; perform multi-scale capture and update the feature map, F_i^2 ← F_i^1; channel-transform the feature map again to the sub-pixel channel count; restore the scale to obtain the prediction map F(X_i; θ); compute the loss between M(X_i) and F(X_i; θ); and update the model parameters.
For model-parameter initialization, apart from the pre-trained VGG part that participates in training, the convolution kernel parameters of the remaining parts are initialized from a Gaussian distribution with standard deviation 0.01. The model is optimized with the Adam algorithm instead of the conventional stochastic gradient descent algorithm, and a fixed learning rate of 1e-5 is set so that the model converges quickly.
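Under those settings, one epoch of the training loop could look as follows (a sketch; `model`, `loader`, and `euclidean_loss` refer to the illustrative pieces above and are assumed to be defined):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # fixed learning rate
for images, gt_density in loader:          # batches of (X_i, M(X_i))
    optimizer.zero_grad()
    loss = euclidean_loss(model(images), gt_density)
    loss.backward()                        # gradient back-propagation
    optimizer.step()                       # update the network parameters
```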
Once a stable density prediction model has been trained, it can generate the predicted density map for any input image in practical applications; the total number of people in the image is then obtained by summing over the pixels of the density map, which is a routine computation not repeated here.
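Inference per step S3 thus reduces to a forward pass plus a sum over the predicted density map (a sketch using the illustrative `DensityNet` above):

```python
import torch

@torch.no_grad()
def count_people(model, image):
    """image: float tensor of shape (1, 3, H, W); returns the estimated count."""
    model.eval()
    density = model(image)            # predicted density map
    return density.sum().item()       # total people = sum of per-pixel densities
```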

Claims (5)

1. A single image crowd counting method based on depth information and scale perception information, characterized by comprising the following steps:
S1, perform Gaussian mapping on the head-center coordinate data corresponding to the input sample picture to generate preliminary ground-truth density maps, and correct them based on depth information obtained by a depth estimation algorithm to obtain the final ground-truth density map;
S2, predict the crowd density map of the input sample picture with a density estimation network to generate a predicted density map, compute the loss error between the predicted density map and the ground-truth density map, adjust the network parameters by gradient back-propagation, and generate a density prediction model through iteration;
S3, when counting the crowd in a single image, generate the predicted density map of the image with the density prediction model and compute the total number of people in the image from it.
2. The method of claim 1, characterized in that step S1 specifically comprises:
S11, map each head coordinate point in the sample picture label data with a Gaussian kernel of fixed size, and superpose the mapped values over all positions of the image to form a preliminary ground-truth density map F1(x);
S12, map each head coordinate point in the sample picture label data with a geometry-adaptive Gaussian kernel, and superpose the mapped values over all positions of the image to form a preliminary ground-truth density map F2(x);
S13, extract the depth information at each pixel position of the input sample picture with a monocular depth estimation algorithm to form a depth estimation map Depth(x);
S14, determine the final ground-truth density map with a threshold segmentation algorithm based on the information of the depth estimation map Depth(x):
M(i,j) = F1(i,j), if Depth(i,j) < λ
M(i,j) = F2(i,j), if Depth(i,j) ≥ λ
where λ is the preset segmentation threshold, F1(i,j) and F2(i,j) are the values at coordinate (i,j) in the preliminary ground-truth density maps F1(x) and F2(x), Depth(i,j) is the depth value at coordinate (i,j) in Depth(x), and M(i,j) is the value at coordinate (i,j) in the final ground-truth density map.
3. The method of claim 1, wherein in step S2 the density estimation network comprises a basic feature extraction module, a multi-scale capture module, and a scale transfer module; the basic feature extraction module extracts low-level features of the picture, such as texture; the multi-scale capture module further extracts picture features, fusing multi-scale information while preserving the detail features of small targets; and the scale transfer module restores the scale of the feature map, raising it to the size of the input picture.
4. The single image crowd counting method based on depth information and scale perception information of claim 3, wherein the basic feature extraction module consists of the convolutional layers of the VGG16 network up to conv4_3; the multi-scale capture module uses four densely connected layers, each extracting features with a 3×3 convolution kernel and keeping the feature-map resolution unchanged through edge padding, with the convolution growth rate set to 256; and the scale transfer module uses sub-pixel convolution to restore the scale of the feature map, raising its resolution to the size of the input picture.
5. The single image crowd counting method based on depth information and scale perception information of any one of claims 1 to 4, wherein in step S2 computing the loss error from the predicted density map and the ground-truth density map specifically comprises:
measuring the error between the predicted density map and the ground-truth density map with the Euclidean distance as loss function, expressed as
L(θ) = (1/2N) · Σ_{i=1}^{N} ||F(X_i; θ) − M(X_i)||₂²
where F(X_i; θ) is the predicted density map output by the network, θ denotes the learnable parameters of the network, X_i is the i-th input picture, M(X_i) is the ground-truth density map of the i-th picture, and N is the number of training pictures.
CN202010662406.1A 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information Active CN111815665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010662406.1A CN111815665B (en) 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010662406.1A CN111815665B (en) 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information

Publications (2)

Publication Number Publication Date
CN111815665A true CN111815665A (en) 2020-10-23
CN111815665B CN111815665B (en) 2023-02-17

Family

ID=72841731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010662406.1A Active CN111815665B (en) 2020-07-10 2020-07-10 Single image crowd counting method based on depth information and scale perception information

Country Status (1)

Country Link
CN (1) CN111815665B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767451A (en) * 2021-02-01 2021-05-07 福州大学 Crowd distribution prediction method and system based on double-current convolutional neural network
CN112861718A (en) * 2021-02-08 2021-05-28 暨南大学 Lightweight feature fusion crowd counting method and system
CN113436239A (en) * 2021-05-18 2021-09-24 中国地质大学(武汉) Monocular image three-dimensional target detection method based on depth information estimation
CN113688747A (en) * 2021-08-27 2021-11-23 国网浙江省电力有限公司双创中心 Method, system, device and storage medium for detecting personnel target in image
CN113807274A (en) * 2021-09-23 2021-12-17 山东建筑大学 Crowd counting method and system based on image inverse perspective transformation
CN113869285A (en) * 2021-12-01 2021-12-31 四川博创汇前沿科技有限公司 Crowd density estimation device, method and storage medium
CN114926409A (en) * 2022-04-29 2022-08-19 贵州航天云网科技有限公司 Intelligent industrial component data acquisition method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130142406A1 (en) * 2011-12-05 2013-06-06 Illinois Tool Works Inc. Method and apparatus for prescription medication verification
CN106295557A (en) * 2016-08-05 2017-01-04 浙江大华技术股份有限公司 A kind of method and device of crowd density estimation
CN107301387A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of image Dense crowd method of counting based on deep learning
CN107862261A (en) * 2017-10-25 2018-03-30 天津大学 Image people counting method based on multiple dimensioned convolutional neural networks
CN109145708A (en) * 2018-06-22 2019-01-04 南京大学 A kind of people flow rate statistical method based on the fusion of RGB and D information
WO2019084854A1 (en) * 2017-11-01 2019-05-09 Nokia Technologies Oy Depth-aware object counting
CN109858424A (en) * 2019-01-25 2019-06-07 佳都新太科技股份有限公司 Crowd density statistical method, device, electronic equipment and storage medium
CN110765817A (en) * 2018-07-26 2020-02-07 株式会社日立制作所 Method, device and equipment for selecting crowd counting model and storage medium thereof
CN111126177A (en) * 2019-12-05 2020-05-08 杭州飞步科技有限公司 People counting method and device
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130142406A1 (en) * 2011-12-05 2013-06-06 Illinois Tool Works Inc. Method and apparatus for prescription medication verification
CN106295557A (en) * 2016-08-05 2017-01-04 浙江大华技术股份有限公司 A kind of method and device of crowd density estimation
CN107301387A (en) * 2017-06-16 2017-10-27 华南理工大学 A kind of image Dense crowd method of counting based on deep learning
CN107862261A (en) * 2017-10-25 2018-03-30 天津大学 Image people counting method based on multiple dimensioned convolutional neural networks
WO2019084854A1 (en) * 2017-11-01 2019-05-09 Nokia Technologies Oy Depth-aware object counting
CN109145708A (en) * 2018-06-22 2019-01-04 南京大学 A kind of people flow rate statistical method based on the fusion of RGB and D information
CN110765817A (en) * 2018-07-26 2020-02-07 株式会社日立制作所 Method, device and equipment for selecting crowd counting model and storage medium thereof
CN109858424A (en) * 2019-01-25 2019-06-07 佳都新太科技股份有限公司 Crowd density statistical method, device, electronic equipment and storage medium
CN111126177A (en) * 2019-12-05 2020-05-08 杭州飞步科技有限公司 People counting method and device
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DESEN ZHOU et al.: "Cascaded Multi-Task Learning of Head Segmentation and Density Regression for RGBD Crowd Counting", IEEE ACCESS *
CHEN PENG et al.: "Crowd density estimation with multi-level feature fusion" (多层次特征融合的人群密度估计), Journal of Image and Graphics (中国图象图形学报) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767451A (en) * 2021-02-01 2021-05-07 福州大学 Crowd distribution prediction method and system based on double-current convolutional neural network
CN112767451B (en) * 2021-02-01 2022-09-06 福州大学 Crowd distribution prediction method and system based on double-current convolutional neural network
CN112861718A (en) * 2021-02-08 2021-05-28 暨南大学 Lightweight feature fusion crowd counting method and system
CN113436239A (en) * 2021-05-18 2021-09-24 中国地质大学(武汉) Monocular image three-dimensional target detection method based on depth information estimation
CN113688747A (en) * 2021-08-27 2021-11-23 国网浙江省电力有限公司双创中心 Method, system, device and storage medium for detecting personnel target in image
CN113688747B (en) * 2021-08-27 2024-04-09 国网浙江省电力有限公司双创中心 Method, system, device and storage medium for detecting personnel target in image
CN113807274A (en) * 2021-09-23 2021-12-17 山东建筑大学 Crowd counting method and system based on image inverse perspective transformation
CN113807274B (en) * 2021-09-23 2023-07-04 山东建筑大学 Crowd counting method and system based on image anti-perspective transformation
CN113869285A (en) * 2021-12-01 2021-12-31 四川博创汇前沿科技有限公司 Crowd density estimation device, method and storage medium
CN113869285B (en) * 2021-12-01 2022-03-04 四川博创汇前沿科技有限公司 Crowd density estimation device, method and storage medium
CN114926409A (en) * 2022-04-29 2022-08-19 贵州航天云网科技有限公司 Intelligent industrial component data acquisition method
CN114926409B (en) * 2022-04-29 2024-05-28 贵州航天云网科技有限公司 Intelligent industrial component data acquisition method

Also Published As

Publication number Publication date
CN111815665B (en) 2023-02-17

Similar Documents

Publication Publication Date Title
CN111815665B (en) Single image crowd counting method based on depth information and scale perception information
CN107154023B (en) Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN111325794B (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN109886121B (en) Human face key point positioning method for shielding robustness
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN113052835B (en) Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN111259945B (en) Binocular parallax estimation method introducing attention map
CN114863573B (en) Category-level 6D attitude estimation method based on monocular RGB-D image
CN112862792B (en) Wheat powdery mildew spore segmentation method for small sample image dataset
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN113065546A (en) Target pose estimation method and system based on attention mechanism and Hough voting
CN111639571B (en) Video action recognition method based on contour convolution neural network
CN112465021B (en) Pose track estimation method based on image frame interpolation method
CN113610087B (en) Priori super-resolution-based image small target detection method and storage medium
CN111768415A (en) Image instance segmentation method without quantization pooling
CN114429555A (en) Image density matching method, system, equipment and storage medium from coarse to fine
CN114898284B (en) Crowd counting method based on feature pyramid local difference attention mechanism
WO2022052782A1 (en) Image processing method and related device
CN112084952B (en) Video point location tracking method based on self-supervision training
CN111414931A (en) Multi-branch multi-scale small target detection method based on image depth

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant