CN113887473A - Improved normalized deformable convolution population counting method - Google Patents


Info

Publication number
CN113887473A
CN113887473A
Authority
CN
China
Prior art keywords
normalized
deformable
sampling point
offset
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111204377.5A
Other languages
Chinese (zh)
Other versions
CN113887473B (en)
Inventor
吕伟刚
仲芯
覃静
张树刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202111204377.5A
Publication of CN113887473A
Application granted
Publication of CN113887473B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods


Abstract

The invention belongs to the technical field of computer vision, and particularly relates to a method for estimating the number of people in a crowd using a convolutional neural network. A crowd counting method based on an improved normalized deformable convolution comprises: constructing a normalized deformable convolutional neural network; and constraining the positions of the feature-map sampling points of the input image with the normalized deformable convolutional neural network to obtain accurate head features. The invention provides a normalized deformable convolution (NDConv), which limits the offsets of the sampling points to a certain extent, can obtain the information in the effective sampling area without adding extra computation, and improves the accuracy of the crowd counts predicted by the neural network.

Description

Improved normalized deformable convolution population counting method
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method for estimating the number of people by using a convolutional neural network.
Background
With the rapid growth of the world population and the rapid progress of urbanization, the likelihood of crowd congestion has increased dramatically. In some cases, stampede events at large-scale activities can endanger public safety, and a prerequisite for preventing a public safety crisis caused by an excessively large crowd is to monitor the number of people and keep it within a certain range. Several studies on crowd counting[1]-[5] implement this idea by estimating the number of people in public areas such as campuses, shopping malls, and train stations. Moreover, advances in remote intelligent camera equipment[6]-[8] provide a good hardware foundation for research related to crowd counting. Thus, crowd counting has gained wide attention in the field of computer vision.
Dense crowds are characterized by irregular shapes, high noise, partial occlusion, and the like, so there is still considerable room for improvement both in counting accurately and in constructing data sets covering different scenes and crowd sizes. At present, crowd counting algorithms can be divided into three categories: detection-based methods[9]-[10], feature-based regression methods[11]-[12], and convolutional neural network-based methods[13]-[15]. Among these, the convolutional neural network-based approaches perform better in terms of computational accuracy, training efficiency, and robustness. Although research on crowd counting with convolutional neural networks has made great progress, the inability of deep models to adapt to geometric changes in scale remains an obstacle to improving counting precision. Methods based on multi-scale feature fusion[16]-[17] alleviate this problem, but they have limitations. First, the network parameters and computation time increase significantly, and training efficiency drops greatly. Furthermore, these methods do not effectively solve the geometric transformation problem of convolution.
To address the above problems, Microsoft Research Asia proposed deformable convolution[18] to enhance the ability of convolutional neural networks to model geometric transformations; it can model continuously varying scale features to handle the crowd counting problem. They later improved deformable convolution by adding an extra convolutional layer and weighting the offsets for more accurate feature extraction, proposing Deformable ConvNets v2[19], which achieves more advanced performance. However, since neither version of deformable convolution can control the offsets of the sampling points, it is difficult to sample crowd features directly and uniformly, so there is still much room for improving the performance of deformable convolution. Based on the foregoing discussion, the present invention recognizes that it is difficult for a deformable convolutional network to sample uniformly and collect rich information: because the offsets are unconstrained, the shape of the sampling area is no longer a regular pattern, so the head features of the crowd cannot be sampled uniformly and the number of people in the crowd cannot be predicted accurately.
Disclosure of Invention
The invention aims to provide a crowd counting method based on an improved normalized deformable convolution. By adding a normalized deformable loss to the deformable convolution, it makes the sampling points uniformly distributed and controls their offsets, thereby obtaining more complete head feature information and finally achieving a significant performance improvement.
To achieve this purpose, the invention adopts the following technical scheme: a crowd counting method based on an improved normalized deformable convolution, comprising: constructing a normalized deformable convolutional neural network; and constraining the positions of the feature-map sampling points of the input image with the normalized deformable convolutional neural network to obtain accurate head features.
Further, the normalized deformable convolutional neural network mainly consists of a modified VGG-16 network; five dilated convolution layers and a final normalized deformable convolution layer are arranged before the pooling layer of the modified VGG-16 network. The normalized deformable convolution uses the following loss function:

$$\mathcal{L} = \mathcal{L}_{den} + \lambda\,\mathcal{L}_{ND}$$

where $\mathcal{L}$ denotes the total training loss, $\mathcal{L}_{den}$ is the density loss, $\mathcal{L}_{ND}$ is the normalized deformable loss, and $\lambda$ is a regularization coefficient with value range (0, 1).
Further, the step of calculating the normalized deformable loss comprises:

(1) Constraining the positions of the center sampling point E, the horizontal sampling points D, F, the vertical sampling points B, H, and the diagonal sampling points A, C, G, I of the feature map obtained by convolution.

For the center sampling point E, the loss is:

$$\mathcal{L}_E = |\Delta E_x| + |\Delta E_y|$$

where $\Delta E_x$, $\Delta E_y$ denote the horizontal and vertical offsets of the center sampling point E relative to the pre-shift sampling point e.

For the horizontal sampling points, the loss is:

$$\mathcal{L}_{DF} = |\Delta D_y| + |\Delta F_y| + |\Delta D_x + \Delta F_x|$$

where $\Delta D_x$, $\Delta D_y$ denote the horizontal and vertical offsets of the horizontal sampling point D relative to the pre-shift sampling point d, and $\Delta F_x$, $\Delta F_y$ those of the horizontal sampling point F relative to the pre-shift sampling point f.

For the vertical sampling points, the loss is:

$$\mathcal{L}_{BH} = |\Delta B_x| + |\Delta H_x| + |\Delta B_y + \Delta H_y|$$

where $\Delta B_x$, $\Delta B_y$ denote the offsets of the vertical sampling point B relative to the pre-shift sampling point b, and $\Delta H_x$, $\Delta H_y$ those of the vertical sampling point H relative to the pre-shift sampling point h.

For the diagonal sampling points, the losses are:

$$\mathcal{L}_A = \left\|(a + \Delta A) + (e + \Delta E) - (b + \Delta B) - (d + \Delta D)\right\|_1$$

$$\mathcal{L}_C = \left\|(c + \Delta C) + (e + \Delta E) - (b + \Delta B) - (f + \Delta F)\right\|_1$$

$$\mathcal{L}_G = \left\|(g + \Delta G) + (e + \Delta E) - (h + \Delta H) - (d + \Delta D)\right\|_1$$

$$\mathcal{L}_I = \left\|(i + \Delta I) + (e + \Delta E) - (h + \Delta H) - (f + \Delta F)\right\|_1$$

where a, b, c, d, e, f, g, h and i denote the coordinates of the sampling points before shifting.

(2) Calculating the normalized deformable loss:

$$\mathcal{L}_{ND} = \mathcal{L}_E + \mathcal{L}_{DF} + \mathcal{L}_{BH} + \mathcal{L}_A + \mathcal{L}_C + \mathcal{L}_G + \mathcal{L}_I$$
further, the density loss formula is as follows:
Figure BDA0003306248840000037
wherein, YiIs a density map of the number of persons, P (I)i(ii) a Φ) is the density map of the estimated population, and N is the batch size.
Further, batch normalization is applied to the front-end convolutional layers of the improved VGG-16 network.
Compared with the prior art, the invention has the beneficial effects that:
(1) A normalized deformable convolution (NDConv) is provided, which limits the offsets of the sampling points to a certain extent, can obtain all the information in the effective sampling area, and adds no extra computation; (2) compared with existing methods, the normalized deformable convolution (NDConv) achieves better performance on multiple crowd counting data sets.
Drawings
Fig. 1 (a) shows the offsets of the shifted sampling points relative to the pre-shift sampling points obtained with a conventional deformable convolution; (b) shows the offsets of the shifted sampling points relative to the pre-shift sampling points obtained with the normalized deformable convolution;
FIG. 2 is a schematic structural diagram of a normalized deformable convolutional neural network proposed in the present invention;
Fig. 3 shows some original images of the OUC_Crowd data set proposed by the present invention, together with visualizations of their annotation points.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and specific examples. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The embodiment of the invention provides an improved normalized deformable convolution-based crowd counting method, which mainly comprises the following steps:
firstly, constructing a normalized deformable convolution neural network.
The normalized deformable convolutional neural network constructed by the invention mainly consists of an improved CSRNet network[20]. The original front-end network model of CSRNet is VGG-16; batch normalization layers (BatchNorm) are inserted between adjacent convolutional layers of the VGG-16 network, and the resulting model is named the VGG-16-BN network.
Before the VGG-16-BN pooling layer, six convolution layers are added: the first five are dilated convolutions and the last is the normalized deformable convolution. This forms the normalized deformable convolutional neural network, denoted NDConv; the network structure is shown in Fig. 2.
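As a concrete illustration, the six added layers could be specified as follows. This is only a sketch: the channel widths are an assumption modeled on CSRNet's back-end and are not stated in the text above.

```python
# Hypothetical specification of the six added back-end layers:
# five dilated 3x3 convolutions followed by one normalized deformable
# convolution. Channel widths follow CSRNet's back-end and are an
# assumption, not taken from this document.
BACKEND_SPEC = [
    # (layer_type, out_channels, kernel_size, dilation)
    ("dilated_conv", 512, 3, 2),
    ("dilated_conv", 512, 3, 2),
    ("dilated_conv", 512, 3, 2),
    ("dilated_conv", 256, 3, 2),
    ("dilated_conv", 128, 3, 2),
    ("normalized_deformable_conv", 64, 3, 1),  # final layer: NDConv
]
```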
Second, the constructed normalized deformable convolutional neural network (NDConv) is used to constrain the offsets of the feature-map sampling points of the input image, obtaining more accurate head features and thereby improving crowd counting accuracy. Specifically:
(1) With a 3 × 3 convolution, 9 shifted sampling points are obtained for each convolution, namely a center point E, horizontal points D and F, vertical points B and H, and diagonal points A, C, G and I.
Fig. 1 (a) shows the offsets of the shifted sampling points relative to the pre-shift sampling points obtained with a conventional deformable convolution; (b) shows those obtained with the normalized deformable convolution network. The shifted sampling points are denoted by the capital letters A-I and drawn as dots in Fig. 1 (a) and (b); the pre-shift sampling points are denoted by the lowercase letters a-i and drawn as triangles; the length of each arrow indicates the magnitude of the offset.
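Since the shifted sampling points generally fall at fractional coordinates, deformable convolution reads feature values by bilinear interpolation, a detail from the deformable convolution literature[18] rather than from the text above; a minimal plain-Python sketch of that sampling step:

```python
def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of lists of floats) at fractional
    coordinates (y, x) using bilinear interpolation, as deformable
    convolution does for each shifted sampling point."""
    h, w = len(feature), len(feature[0])
    # Clamp to the valid range so the four neighbours exist.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Weighted average of the four surrounding pixels.
    return ((1 - dy) * (1 - dx) * feature[y0][x0]
            + (1 - dy) * dx * feature[y0][x1]
            + dy * (1 - dx) * feature[y1][x0]
            + dy * dx * feature[y1][x1])
```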
(2) Normalized constraints are applied to the sampling points in the horizontal and vertical directions so that they lie on the coordinate axes as far as possible, i.e. B, D, F and H stay as close to the axes as possible. The center sampling point E is kept as close to the origin e as possible. For the diagonal sampling points A, C, G and I, the parallelogram principle is used, so that A, C, G and I each form a parallelogram with the other points. These four parallelograms are: EBAD, EBCF, EHIF, EHGD.
The coordinates of a-i are respectively: a(-r,-r), b(0,-r), c(r,-r), d(-r,0), e(0,0), f(r,0), g(-r,r), h(0,r), i(r,r).
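The base grid and the parallelogram relations used below (e.g. a = b + d - e) can be written out directly; a small sketch, with x to the right and y downward:

```python
def base_grid(r):
    """3x3 sampling grid of a convolution with dilation r, keyed by
    the letters used in the text (x to the right, y downward)."""
    return {
        "a": (-r, -r), "b": (0, -r), "c": (r, -r),
        "d": (-r, 0),  "e": (0, 0),  "f": (r, 0),
        "g": (-r, r),  "h": (0, r),  "i": (r, r),
    }

def vec_add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def vec_sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

# Each corner equals the sum of its two edge neighbours minus the
# centre, e.g. a = b + d - e; this is the parallelogram relation that
# the diagonal loss below enforces for the shifted points.
```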
The offset of point A relative to point a is denoted ΔA and expressed as (ΔA_x, ΔA_y); the coordinates of point A are then given by equation (1):

$$A_x = -r + \Delta A_x,\qquad A_y = -r + \Delta A_y \tag{1}$$
the offset of point E from sample point E is denoted as deltae,expressed as: (Δ E)x,ΔEy);
The offset of point B from sample point B is denoted Δ B and is expressed as: (Delta B)x,ΔBy);
The offset of point C from sample point C is denoted Δ C and is expressed as: (Δ C)x,ΔCy);
The offset of point D from sample point D is denoted as Δ D and is represented as: (Δ D)x,ΔDy);
The offset of point F from sample point F is denoted as Δ F and is expressed as: (Δ F)x,ΔFy);
The offset of point G from sample point G is denoted Δ G and is expressed as: (Δ G)x,ΔGy);
The offset of point H from sample point H is denoted Δ H and is expressed as: (Δ H)x,ΔHy);
The offset of point I from sample point I is denoted as Δ I and is expressed as: (Delta I)x,ΔIy);
Thus, the normalized deformable loss (NDLoss) comprises the following parts:
For sampling point E, the position of point E is kept unchanged as far as possible; the loss is given by equation (2):

$$\mathcal{L}_E = |\Delta E_x| + |\Delta E_y| \tag{2}$$
For the two horizontal sampling points D and F, D and F are kept on the horizontal axis as far as possible (i.e., the vertical component of the offset is driven toward 0), while both points are kept at the same distance from the origin as they move toward or away from it; the loss is given by equation (3):

$$\mathcal{L}_{DF} = |\Delta D_y| + |\Delta F_y| + |\Delta D_x + \Delta F_x| \tag{3}$$
For the two vertical sampling points B and H, B and H are kept on the vertical axis as far as possible (i.e., the horizontal component of the offset is driven toward 0), while both points are kept at the same distance from the origin as they move toward or away from it; the loss is given by equation (4):

$$\mathcal{L}_{BH} = |\Delta B_x| + |\Delta H_x| + |\Delta B_y + \Delta H_y| \tag{4}$$
For the four diagonal sampling points A, C, G, I, each shifted diagonal point is made to form a parallelogram with the surrounding points, as shown by the dotted lines; the loss is given by equation (5):

$$\mathcal{L}_A = \left\|(a + \Delta A) + (e + \Delta E) - (b + \Delta B) - (d + \Delta D)\right\|_1$$
$$\mathcal{L}_C = \left\|(c + \Delta C) + (e + \Delta E) - (b + \Delta B) - (f + \Delta F)\right\|_1$$
$$\mathcal{L}_G = \left\|(g + \Delta G) + (e + \Delta E) - (h + \Delta H) - (d + \Delta D)\right\|_1$$
$$\mathcal{L}_I = \left\|(i + \Delta I) + (e + \Delta E) - (h + \Delta H) - (f + \Delta F)\right\|_1 \tag{5}$$

In equation (5), a, b, c, d, e, f, g, h and i denote the coordinates of the corresponding pre-shift points, i.e. a = (-r,-r), b = (0,-r), c = (r,-r), d = (-r,0), e = (0,0), f = (r,0), g = (-r,r), h = (0,r), i = (r,r); since a + e - b - d = 0 (and likewise for the other three corners), each loss reduces to a function of the offsets alone.
Thus, the normalized deformable convolution (NDConv) loss is given by equation (6):

$$\mathcal{L}_{ND} = \mathcal{L}_E + \mathcal{L}_{DF} + \mathcal{L}_{BH} + \mathcal{L}_A + \mathcal{L}_C + \mathcal{L}_G + \mathcal{L}_I \tag{6}$$
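Equations (2) through (6) can be collected into one plain function of the nine offset pairs. This is an illustrative sketch of the loss terms as described above; the original equations survive only as image placeholders, so an unweighted sum of the components is assumed:

```python
def nd_loss(off):
    """Normalized deformable loss, following eqs. (2)-(6).
    `off` maps each point name 'A'..'I' to its offset (dx, dy)."""
    bx, by = off["B"]
    dx, dy = off["D"]
    ex, ey = off["E"]
    fx, fy = off["F"]
    hx, hy = off["H"]
    # Eq. (2): keep the centre point E at the origin.
    l_e = abs(ex) + abs(ey)
    # Eq. (3): D, F stay on the horizontal axis, symmetric about the origin.
    l_df = abs(dy) + abs(fy) + abs(dx + fx)
    # Eq. (4): B, H stay on the vertical axis, symmetric about the origin.
    l_bh = abs(bx) + abs(hx) + abs(by + hy)
    # Eq. (5): each diagonal corner closes a parallelogram with its two
    # neighbours and the centre; the base grid already satisfies
    # a + e - b - d = 0, so only the offsets remain.
    def parallelogram(p, q, s):
        return (abs(p[0] + ex - q[0] - s[0])
                + abs(p[1] + ey - q[1] - s[1]))
    l_diag = (parallelogram(off["A"], off["B"], off["D"])
              + parallelogram(off["C"], off["B"], off["F"])
              + parallelogram(off["G"], off["H"], off["D"])
              + parallelogram(off["I"], off["H"], off["F"]))
    # Eq. (6): sum of all components.
    return l_e + l_df + l_bh + l_diag
```

With all offsets zero the loss vanishes, which matches the intent of the constraints: an unshifted regular grid incurs no penalty.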
The density estimation loss is given by equation (7):

$$\mathcal{L}_{den} = \frac{1}{2N}\sum_{i=1}^{N}\left\|P(I_i;\Phi) - Y_i\right\|_2^2 \tag{7}$$

where $Y_i$ is the ground-truth crowd density map, $P(I_i;\Phi)$ is the estimated crowd density map, and N is the batch size.
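Equation (7) is a standard pixel-wise Euclidean loss between predicted and ground-truth density maps; a minimal sketch in plain Python (each map is a 2D list of floats):

```python
def density_loss(pred_maps, gt_maps):
    """Eq. (7): L_den = 1/(2N) * sum_i ||P(I_i; Phi) - Y_i||_2^2,
    where N is the batch size."""
    n = len(pred_maps)
    total = 0.0
    for pred, gt in zip(pred_maps, gt_maps):
        for row_p, row_g in zip(pred, gt):
            for p, g in zip(row_p, row_g):
                total += (p - g) ** 2  # squared pixel-wise difference
    return total / (2.0 * n)
```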
The final loss is shown in equation (8):
Figure BDA0003306248840000065
wherein, λ is a regularization coefficient, and the value range is (0, 1).
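Combining the density term and the normalized deformable term into the final training loss is then a one-line weighted sum; 0.001 is the λ value used in the experiments below:

```python
def total_loss(l_den, l_nd, lam=0.001):
    """Eq. (8): L = L_den + lambda * L_ND.
    lam must lie in (0, 1); 0.001 is the experimental setting."""
    assert 0.0 < lam < 1.0, "regularization coefficient must lie in (0, 1)"
    return l_den + lam * l_nd
```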
To examine the performance of the proposed normalized deformable convolutional neural network and method, four data sets were used for training and validation: ShanghaiTech[5], UCF-QNRF[21], UCF_CC_50[22], and the OUC_Crowd data set proposed by the present invention. The details of the experiments are described below:
1. data set and implementation details
1) Data set
The difference in performance between the normalized deformable convolution crowd counting method of the present invention and other existing crowd count prediction methods was evaluated on three data sets commonly used for crowd counting, ShanghaiTech, UCF-QNRF and UCF_CC_50, as well as on the OUC_Crowd data set proposed by the present invention.
ShanghaiTech: the ShanghaiTech data set consists of Part A and Part B. Part A contains 482 images; unlike Part A, Part B contains many more high-resolution images, with crowd counts ranging from 9 to 578 and 716 crowd images in total.
UCF_QNRF: compared with the ShanghaiTech data set, UCF_QNRF is a crowd counting data set with a larger number of images, containing 1535 high-resolution images and 1.25 million head annotation points. In this data set, 1201 images of extremely crowded scenes are used for training and the rest for testing; the maximum crowd count in a single image is 12865.
UCF_CC_50: this data set contains 50 images. Although the number of images is small, the count per image varies greatly, from 94 to 4543. Following existing research[22], the data set is divided into five subsets and 5-fold cross-validation is performed.
OUC_Crowd: OUC_Crowd is a data set constructed by the invention. It consists of 529 crowd images shot in different campus scenes and is a data set with sparse crowds and variable scenes; there are 379 training images, the rest being test images, and the minimum count in an image is 1. The data set comprises crowd images of different indoor and outdoor scenes such as campus roads, playgrounds, classrooms and gymnasiums. To ensure annotation accuracy and reduce the interference of annotation errors with the experimental results, after the heads in each image were annotated manually, a second check was performed by visualizing the annotation points to remove images with missed annotations. As shown in Fig. 3, the first row shows original images and the second row the manually annotated images, with the annotation points drawn as white dots.
2) Evaluation index
Following reported studies[23]-[25], the present invention adopts the mean absolute error (MAE) and the root mean square error (RMSE) as evaluation indexes, defined in equations (9) and (10):

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|C_i - \hat{C}_i\right| \tag{9}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_i - \hat{C}_i\right)^2} \tag{10}$$

where N is the number of images, and $C_i$ and $\hat{C}_i$ are the true and predicted crowd counts, respectively.
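Computed over per-image counts, the two evaluation indexes can be sketched as:

```python
import math

def mae(true_counts, pred_counts):
    """Eq. (9): mean absolute error over N test images."""
    n = len(true_counts)
    return sum(abs(c - p) for c, p in zip(true_counts, pred_counts)) / n

def rmse(true_counts, pred_counts):
    """Eq. (10): root mean square error over N test images."""
    n = len(true_counts)
    return math.sqrt(
        sum((c - p) ** 2 for c, p in zip(true_counts, pred_counts)) / n
    )
```

RMSE penalizes large per-image errors more heavily than MAE, which is why both are reported together in the tables below.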
3) Implementation details
The original front-end network model of CSRNet[20] is VGG-16[26]. To further improve the accuracy of network training, batch normalization is applied to the convolutional layers of VGG-16, producing the VGG-16-BN model. Next, the CSRNet[20] backbone built on VGG-16-BN is modified: six dilated convolution layers are added before the VGG-16-BN pooling layer, and the last dilated convolution is replaced by a deformable convolution. After these changes, the new network becomes the performance baseline of the experiments.
Replacing the last of the six dilated convolutions with the normalized deformable convolution yields the network denoted NDConv, i.e. the normalized deformable convolutional neural network constructed by the invention.
During training, the Adam optimizer[27] is used with a learning rate of 1e-4; all images are resized to 400 × 400 pixels, the batch size is set to 4, and the regularization parameter λ is set to 0.001. Evaluation begins at the 100th epoch and the training process ends at the 400th epoch.
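The stated hyperparameters can be collected in one place (the key names here are arbitrary, chosen only for this sketch):

```python
# Training configuration as stated in the text; optimizer per [27].
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "input_size": (400, 400),   # all images resized to 400x400 pixels
    "batch_size": 4,
    "lambda": 0.001,            # regularization coefficient in eq. (8)
    "eval_start_epoch": 100,
    "total_epochs": 400,
}
```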
2. Evaluation and comparison
The experimental results on the ShanghaiTech A and B data sets are shown in Table 1. Compared with existing methods, the proposed method performs best, demonstrating the advantage of constraining the offsets in the deformable convolution. For ShanghaiTech A, the method achieves an optimal mean absolute error of 61.4, a 5.5% performance improvement over the baseline CSRNet. The evaluation on ShanghaiTech B did not meet expectations: where the existing methods CFF[28] and SPN[29] reach a mean absolute error of 7.2, the proposed method obtains 7.8; nevertheless, compared with CSRNet, the normalized deformable convolution improves performance by 13.3%.
Table 1: comparison of normalized Deformable convolution and results of other prior art methods in ShanghaiTechA and ShanghaiTechB
Figure BDA0003306248840000081
Figure BDA0003306248840000091
The comparison between the method of the present invention and other existing methods on the UCF_QNRF and UCF_CC_50 data sets is recorded in Table 2. Compared with the baseline CSRNet, the method achieves a significant gain in mean absolute error: on UCF_QNRF and UCF_CC_50 it obtains mean absolute errors of 91.2 and 167.2 respectively, corresponding to 4.5% and 4.2% performance improvements. Notably, the method further reduces the mean absolute error on top of CSRNet's already strong performance relative to other existing methods, which demonstrates the effectiveness of the proposed normalized deformable convolution in constraining the offsets.
Table 2: comparison of experimental results of normalized deformable convolution and other existing methods at UCF _ QNRF and UCF _ CC _50
Figure BDA0003306248840000092
3. Ablation experiment
The experiments first show the influence of the number of deformable convolution layers in the network on the training results, to justify the parameter selection used in the experiments. They then verify the significance of constraining the offsets and the effectiveness of the normalized deformable loss in comparison with a hard constraint.
(1) Replacing the original network's dilated convolutions with deformable and normalized deformable convolutions: influence of the number of replaced layers on network performance

The network structure of CSRNet[20] has six dilated convolution layers; these are replaced one by one, from back to front, with deformable convolutions, and then each deformable convolution is replaced with a normalized deformable convolution. The performance of the network is shown in Table 3. As the number of replaced convolution layers increases, the performance of both the baseline CSRNet and the normalized deformable convolution decreases: the mean absolute errors of the normalized deformable convolution range from 167.2 to 184.6, and those of CSRNet from 172.9 to 187.4. As can be seen from Table 3, the best experimental results are obtained by replacing only the last dilated convolution layer.
Table 3: influence of number of layers of deformable convolution on experimental results
Figure BDA0003306248840000101
(2) Comparison with a hard constraint

To verify the effectiveness of the proposed normalized deformable loss (NDLoss), the x-component of the offsets of the points on the y-axis is simply removed; this is referred to as the hard constraint. The performance of the hard constraint is then compared with that of the normalized deformable loss (NDLoss), which achieves a better result, as shown in Table 4. The hard constraint yields a mean absolute error of 96.1, no significant improvement over the baseline CSRNet. The comparison demonstrates that the normalized deformable loss (NDLoss) proposed by the present invention is superior to a hard constraint on the existing deformable convolution.
Table 4: comparison of hard constraint and NDConv Experimental results
Figure BDA0003306248840000111
The final experimental results are shown in Table 5: both the baseline (CSRNet) and the proposed NDConv reach a good mean absolute error of 15.3, and the results did not change after repeated training and testing. This result indicates that there is still considerable room for improving the normalized deformable convolution NDConv on data sets with sparse crowds and variable scenes.
Table 5: experimental results on the OUC _ Crowd data set
Figure BDA0003306248840000112
The invention provides a normalized deformable convolution, NDConv, in which the new normalized deformable loss (NDLoss) plays a key role. Without increasing the amount of computation, it limits the offsets of the sampling points in the deformable convolution, making the sampling points more uniform and the information sampling more complete. The normalized deformable convolution (NDConv) has the advantages of fast and accurate information acquisition and small information loss. The experiments take a crowd counting network as an example, and the results show that the normalized deformable convolution (NDConv) can effectively improve network performance and the accuracy of crowd count prediction.
Reference documents:
[1]Wang Q,Gao J,Lin W,et al.NWPU-Crowd:A Large-Scale Benchmark for Crowd Counting and Localization[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2020:3013269.
[2]Mazzeo P L,Contino R,Spagnolo P,et al.MH-MetroNet—A Multi-Head CNN for Passenger-Crowd Attendance Estimation[J].Journal of Imaging,2020,6(7):62.
[3]Sindagi V A,Patel V M.Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs[C]//2017IEEE International Conference on Computer Vision(ICCV),2017,206:1879-1888.
[4]Zan S,Yi X,Ni B,et al.Crowd Counting via Adversarial Cross-Scale Consistency Pursuit[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR),2018:00550.
[5]Zhang Y,Zhou D,Chen S,et al.Single-Image Crowd Counting via Multi-Column Convolutional Neural Network[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR),2016,70:589-597.
[6]Feris,R.S,et al.Large-Scale Vehicle Detection,Indexing,and Search in Urban Surveillance Videos[J].Multimedia,IEEE Transactions on,2012,14(1):28-42.
[7]Wang G,Li B,Zhang Y,et al.Background Modeling and Referencing for Moving Cameras-Captured Surveillance Video Coding in HEVC[J].IEEE Transactions on Multimedia,2018:2921-2934.

Claims (5)

1. A population counting method based on improved normalized deformable convolution, characterized by comprising the following steps: constructing a normalized deformable convolutional neural network; and constraining the positions of the feature-map sampling points of the input image using the normalized deformable convolutional neural network to obtain accurate human-head features.
2. The population counting method based on improved normalized deformable convolution of claim 1, wherein the normalized deformable convolutional neural network is mainly composed of an improved VGG-16 network; five dilated convolution layers and a final normalized deformable convolution layer are arranged before the pooling layer of the improved VGG-16 network; the normalized deformable convolution constrains the training loss with the following loss function:

L = L_den + λ·L_nd

where L denotes the total training loss, L_den is the density loss, L_nd is the normalized deformable loss, and λ is a regularization coefficient with value range (0, 1).
3. The population counting method based on improved normalized deformable convolution of claim 2, wherein the normalized deformable loss is calculated by the following steps:
(1) constraining the positions of the center sampling point E, the horizontal sampling points D and F, the vertical sampling points B and H, and the diagonal sampling points A, C, G and I of the feature map obtained by convolution:
for the center sampling point E, the loss formula is:
[formula image: loss term for the center sampling point E]
where ΔE_x and ΔE_y denote the offsets of the center sampling point E in the horizontal and vertical directions relative to the sampling point E before offset;
for the horizontal sampling points, the loss formula is:
[formula image: loss term for the horizontal sampling points D and F]
where ΔD_x and ΔD_y denote the offsets of the horizontal sampling point D in the horizontal and vertical directions relative to the sampling point D before offset, and ΔF_x and ΔF_y denote the offsets of the horizontal sampling point F in the horizontal and vertical directions relative to the sampling point F before offset;
for the vertical sampling points, the loss formula is:
[formula image: loss term for the vertical sampling points B and H]
where ΔB_x and ΔB_y denote the offsets of the vertical sampling point B in the horizontal and vertical directions relative to the sampling point B before offset, and ΔH_x and ΔH_y denote the offsets of the vertical sampling point H in the horizontal and vertical directions relative to the sampling point H before offset;
for the diagonal sampling points, the loss formulas are:
[formula images: loss terms for the diagonal sampling points A, C, G and I]
where a, b, c, d, e, f, g, h and i denote the coordinates of the respective sampling points before offset;
(2) calculating the normalized deformable loss:
[formula image: normalized deformable loss combining the above loss terms]
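As a purely hypothetical sketch of steps (1)–(2) in claim 3: the exact per-point formulas are equation images in the source, so this example assumes a simple L1 penalty on each learned offset of the 3×3 sampling grid A–I. The grouping of points mirrors the claim; the penalty form itself is an assumption:

```python
def normalized_deformable_loss(offsets):
    """offsets: dict mapping each sampling point 'A'..'I' of the
    3x3 deformable kernel to its (dx, dy) offset relative to the
    pre-offset position. The groups follow claim 3: center E,
    horizontal D/F, vertical B/H, diagonal A/C/G/I."""
    groups = {
        "center": ("E",),
        "horizontal": ("D", "F"),
        "vertical": ("B", "H"),
        "diagonal": ("A", "C", "G", "I"),
    }
    total = 0.0
    for points in groups.values():
        for p in points:
            dx, dy = offsets[p]
            total += abs(dx) + abs(dy)  # assumed L1 penalty per point
    return total
```

With zero offsets the sampling grid is regular and the assumed penalty vanishes; any drift away from the regular 3×3 layout increases it.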
4. The population counting method based on improved normalized deformable convolution of claim 2, wherein the density loss formula is:

L_den = (1/(2N)) Σ_{i=1}^{N} ‖P(I_i; Φ) − Y_i‖²

where Y_i is the ground-truth crowd density map, P(I_i; Φ) is the estimated crowd density map for image I_i with network parameters Φ, and N is the batch size.
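The density loss in claim 4 is defined over ground-truth maps Y_i, estimated maps P(I_i; Φ), and batch size N; a minimal NumPy sketch, assuming the pixel-wise L2 form that is standard in density-map crowd counting:

```python
import numpy as np

def density_loss(pred_maps, gt_maps):
    """Assumed standard form: (1 / 2N) times the sum of squared
    pixel-wise differences between the estimated density maps
    P(I_i; Phi) and the ground-truth maps Y_i over a batch of N."""
    pred = np.asarray(pred_maps, dtype=float)
    gt = np.asarray(gt_maps, dtype=float)
    n = pred.shape[0]  # batch size N
    return float(np.sum((pred - gt) ** 2) / (2 * n))
```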
5. The population counting method based on improved normalized deformable convolution according to any one of claims 2-4, wherein a batch normalization operation is applied to the front-end convolution layers of the improved VGG-16 network.
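The batch normalization of claim 5 can be illustrated with a minimal sketch; the learnable scale/shift parameters of full batch normalization are omitted, and the (N, C, H, W) feature-map layout is an assumption:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch of feature maps to zero mean and unit
    variance per channel; x has shape (N, C, H, W). The learnable
    gamma/beta parameters of full batch normalization are omitted."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```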

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111204377.5A CN113887473B (en) 2021-10-15 2021-10-15 Normalized deformable convolution crowd counting method based on improvement


Publications (2)

Publication Number Publication Date
CN113887473A true CN113887473A (en) 2022-01-04
CN113887473B CN113887473B (en) 2024-04-26

Family

ID=79003080


Country Status (1)

Country Link
CN (1) CN113887473B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214443A (en) * 2018-08-24 2019-01-15 北京第视频科学技术研究院有限公司 Car license recognition model training method, licence plate recognition method, device and equipment
US20190130575A1 (en) * 2017-10-30 2019-05-02 Beijing Curacloud Technology Co., Ltd. Systems and methods for image segmentation using a scalable and compact convolutional neural network
CN111242036A (en) * 2020-01-14 2020-06-05 西安建筑科技大学 Crowd counting method based on encoding-decoding structure multi-scale convolutional neural network
RU2742701C1 (en) * 2020-06-18 2021-02-09 Самсунг Электроникс Ко., Лтд. Method for interactive segmentation of object on image and electronic computing device for realizing said object
CN112381723A (en) * 2020-09-21 2021-02-19 清华大学 Light-weight and high-efficiency single image smog removing method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yan Fangfang; Wu Qin: "Crowd counting algorithm based on a multi-channel fusion grouped convolutional neural network", Journal of Chinese Computer Systems (小型微型计算机系统), no. 10, 15 October 2020 (2020-10-15) *
Liu Peng; Du Jiazhi; Lv Weigang; Dou Mingwu: "An improved k-nearest-neighbor classifier for imbalanced data sets", Journal of Northeastern University (Natural Science) (东北大学学报(自然科学版)), no. 007, 31 December 2019 (2019-12-31) *
Wu Haohao; Wang Fangshi: "Application of multi-scale dilated convolution in image classification", Computer Science (计算机科学), no. 1, 15 June 2020 (2020-06-15) *


Similar Documents

Publication Publication Date Title
CN111723693B (en) Crowd counting method based on small sample learning
CN105741252B (en) Video image grade reconstruction method based on rarefaction representation and dictionary learning
CN109670462B (en) Continue tracking across panorama based on the aircraft of location information
CN111915484A (en) Reference image guiding super-resolution method based on dense matching and self-adaptive fusion
CN110490913B (en) Image matching method based on feature description operator of corner and single line segment grouping
CN113837147B (en) Transform-based false video detection method
CN109034065B (en) Indoor scene object extraction method based on point cloud
CN111860651B (en) Monocular vision-based semi-dense map construction method for mobile robot
CN110443279B (en) Unmanned aerial vehicle image vehicle detection method based on lightweight neural network
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
CN107154017A (en) A kind of image split-joint method based on SIFT feature Point matching
CN107609571A (en) A kind of adaptive target tracking method based on LARK features
Shi et al. Exploiting multi-scale parallel self-attention and local variation via dual-branch transformer-CNN structure for face super-resolution
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
Guo et al. Deep illumination-enhanced face super-resolution network for low-light images
Wei et al. MSPNET: Multi-supervised parallel network for crowd counting
Chen et al. Robust face super-resolution via position relation model based on global face context
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
Wu et al. Cranet: cascade residual attention network for crowd counting
CN106934395B (en) Rigid body target tracking method adopting combination of SURF (speeded Up robust features) and color features
CN117115359A (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN113887473B (en) Normalized deformable convolution crowd counting method based on improvement
CN113792746B (en) Yolo V3-based ground penetrating radar image target detection method
Liu et al. [Retracted] Mean Shift Fusion Color Histogram Algorithm for Nonrigid Complex Target Tracking in Sports Video
CN115222959A (en) Lightweight convolutional network and Transformer combined human body key point detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant