CN116704432A - Multi-modal feature migration crowd counting method and device based on distribution uncertainty

Multi-modal feature migration crowd counting method and device based on distribution uncertainty

Info

Publication number
CN116704432A
Authority
CN
China
Prior art keywords
crowd
counting
mode
modal
migration
Prior art date
2023-05-25
Legal status
Pending
Application number
CN202310594811.8A
Other languages
Chinese (zh)
Inventor
曹亚如
朱鹏飞
曹兵
孙一铭
胡清华
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
2023-05-25
Filing date
2023-05-25
Publication date
2023-09-05
Application filed by Tianjin University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/809 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G06V 10/811 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, the classifiers operating on different input data, e.g. multi-modal recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a multi-modal dynamic feature migration crowd counting method and device based on distribution uncertainty. The method comprises the following steps: a multi-modal dynamic feature migration counting network uses a ResNet101 residual network as its backbone and extracts multi-modal features with a dual-stream network structure, generating crowd probability density maps; during multi-modal feature extraction, inter-modal feature information is exchanged through a channel-level dynamic interaction mechanism based on distribution uncertainty; the probability density maps output by the dual-stream feature extraction network are obtained, and the multi-modal outputs are adaptively fused by a decision-level adaptive fusion module; after the fused multi-modal crowd probability density map is obtained, the density map is integrated to obtain the crowd count. The device comprises a processor and a memory. The invention avoids the disadvantage of counting from a single-modality image under poor imaging conditions and makes better use of multi-modal features to improve crowd counting performance.

Description

Multi-modal feature migration crowd counting method and device based on distribution uncertainty
Technical Field
The invention relates to the fields of multi-modal image processing and crowd counting, and in particular to a multi-modal dynamic feature migration crowd counting method and device based on distribution uncertainty.
Background
The crowd counting task in computer vision aims to estimate the number of people in a scene by statistically analyzing crowd density, supporting applications such as real-time pedestrian-flow monitoring, traffic control, and public-area flow analysis. Controlling crowd density is increasingly important for daily life and production, and an efficient crowd counting method can address this problem effectively. In recent years the task has attracted wide attention from computer vision researchers, and counting methods have evolved from detection- and regression-based approaches to estimating the head count from a generated pixel-level crowd density map.
Most previous scene analysis work is based on visible-light data. However, visible images suffer from illumination variation and poor night-time imaging conditions; working only from the visible-light viewpoint can make it difficult to perceive the crowd accurately because of the limited field of view and occlusion in visible-light pictures. Thermal infrared images captured by a thermal camera effectively avoid the shortcomings of visible-light-only data, while visible data in turn compensate for highlight interference in thermal imagery. RGB-T (visible-thermal) image pairs therefore have complementary advantages and can perceive scenes effectively both day and night. Multi-modal fusion has already been shown to benefit image analysis with RGB-T data, but existing methods cannot dynamically exploit the strengths of the RGB and T modalities, nor flexibly complement the two modalities to obtain a better counting result.
Multi-modal learning has received growing attention in computer vision, and multi-modal fusion has been shown to achieve feature complementation by effectively exploiting the advantages of different modalities. The most critical issue in multi-modal fusion is how to achieve optimal information complementation while preserving the specificity of each modality. Most methods adopt early fusion of the inputs or late fusion of the extracted features, and therefore can neither mine deeper feature information in the individual modalities nor perform dynamic decision fusion tailored to each image pair. How to better exploit RGB and T information in multi-modal fusion has been the subject of much recent work.
Disclosure of Invention
The invention provides a multi-modal dynamic feature migration crowd counting method and device based on distribution uncertainty. A multi-modal interaction mechanism based on distribution uncertainty is designed so that dynamic multi-modal interaction in the channel dimension takes place during feature extraction, with bidirectional information migration improving the feature extraction of each modality. An adaptive decision-level fusion strategy is also designed so that the trained model overcomes the influence of unpredictable imaging conditions, generating a more reliable density map and improving crowd counting performance. The scheme is described in detail below:
In a first aspect, a multi-modal dynamic feature migration crowd counting method based on distribution uncertainty comprises:
the multi-modal dynamic feature migration counting network using a ResNet101 residual network as its backbone and extracting multi-modal features with a dual-stream network structure, generating crowd probability density maps;
during multi-modal feature extraction, exchanging inter-modal feature information through a channel-level dynamic interaction mechanism based on distribution uncertainty;
obtaining the probability density maps output by the dual-stream feature extraction network and adaptively fusing the multi-modal outputs through a decision-level adaptive fusion module; and
after the fused multi-modal crowd probability density map is obtained, integrating the density map to obtain the crowd count.
In a second aspect, a multi-modal dynamic feature migration crowd counting device based on distribution uncertainty comprises a processor and a memory, the memory storing program instructions; the processor calls the program instructions stored in the memory to cause the device to perform the method steps of any implementation of the first aspect.
In a third aspect, a computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method steps of any implementation of the first aspect.
The technical scheme provided by the invention has the following beneficial effects:
1. The invention provides a multi-modal feature fusion crowd counting network that realizes interactive migration between features of different modalities, improving the counting result of each modality; the multi-modal crowd counting method avoids the disadvantage of counting from a single-modality image under poor imaging conditions and makes better use of multi-modal features to improve crowd counting performance.
2. The invention designs a multi-modal interaction mechanism based on distribution uncertainty; using a simple module design and the parameters already present in the network structure, it provides a more robust and flexible channel-level criterion for selecting interactions, guiding the interactive migration between modalities, and markedly improves the multi-modal interactive counting result.
3. The invention provides an adaptive decision-level fusion strategy that combines the valid results of the two modalities to the greatest extent and helps obtain more reliable results under unpredictable multi-modal imaging conditions; its feasibility was experimentally verified on the dual-light datasets RGBT-CC and Drone-RGBT.
Drawings
FIG. 1 is a schematic diagram of the overall network;
FIG. 2 is a flow chart of a multi-modal dynamic feature migration crowd counting method based on distribution uncertainty;
fig. 3 is a schematic structural diagram of a multi-modal dynamic feature migration crowd counting device based on distribution uncertainty.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
To solve the technical problems described in the background, an embodiment of the invention provides a multi-modal feature fusion crowd counting network that realizes interactive migration between features of different modalities, improving both the per-modality and the combined counting results. The network avoids the disadvantage of counting from a single-modality image under poor imaging conditions and makes better use of multi-modal features to improve crowd counting performance.
Example 1
The embodiment of the invention provides a multi-modal dynamic feature migration crowd counting method based on distribution uncertainty, comprising the following steps:
101: the multi-modal dynamic feature migration counting network uses a ResNet101 residual network as its backbone and extracts multi-modal features with a dual-stream network structure, generating a crowd probability density map for each modality;
Specifically, the dual-stream feature extraction network adds dilated (hole) convolution layers as the back-end on top of the ResNet101 base architecture and outputs a density map through a final regression layer; the same pre-trained model (ResNet101) is used to initialize both streams of the counting network.
102: during multi-modal feature extraction, inter-modal feature information is exchanged through a channel-level dynamic interaction mechanism based on distribution uncertainty;
After each layer (unit) of the ResNet101 network, the output features undergo one round of modal interaction through the channel dynamic migration module. In this module, an interaction-object selection sub-module first selects the channels to interact according to the distribution difference between corresponding channels of the two modalities; an interaction-direction judgment sub-module then decides the direction of the inter-modal interaction according to the difference between each candidate channel's distribution and the overall distribution within its modality, so that the modalities cooperatively improve the feature extraction of each single branch.
103: the probability density maps output by the dual-stream feature extraction network are obtained, and the multi-modal outputs are adaptively fused by a decision-level adaptive fusion module;
The probability density maps output by the dual-stream network are fed into a gating network, which generates a pixel-level weight ω_i for each modality; the multi-modal density maps are weighted with these weights to obtain the fused output, i.e., the final crowd probability density map. Since the weights ω_i change with the features fed into the gating network, adaptive fusion of the multi-modal features is achieved.
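A minimal sketch of the decision-level adaptive fusion follows, under the assumption that the gating network is a small convolutional network whose two-channel output is normalized per pixel with a softmax; the patent does not disclose the gating architecture.

```python
# Sketch of the decision-level adaptive fusion (gating architecture is assumed).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 2, 1))  # two pixel-level weight maps

    def forward(self, d_rgb, d_t):
        # Pixel-level weights omega_i, normalized to sum to 1 at every pixel.
        w = torch.softmax(self.gate(torch.cat([d_rgb, d_t], dim=1)), dim=1)
        return w[:, 0:1] * d_rgb + w[:, 1:2] * d_t  # fused density map
```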
104: after the fused multi-modal crowd probability density map is obtained, the density map is integrated to obtain the crowd count. Furthermore, during training, the MSE loss between the predicted probability density map and the ground-truth probability density map is computed and used to update the model parameters.
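Counting and the training loss can then be expressed directly on the density maps, as in this sketch:

```python
# Head count as the discrete integral of the density map; MSE training loss.
import torch
import torch.nn.functional as F

def crowd_count(density_map):              # density_map: (N, 1, H, W)
    return density_map.sum(dim=(1, 2, 3))  # one count per image

def density_loss(pred, gt):
    return F.mse_loss(pred, gt)            # pixel-wise MSE between density maps
```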
In summary, the multi-modal dynamic feature migration crowd counting method formed by the feature extraction module, the channel dynamic migration module, and the decision-level adaptive fusion module realizes interactive migration between features of different modalities, improving both the per-modality and the combined counting results; it avoids the disadvantage of counting from a single-modality image under poor imaging conditions and makes better use of multi-modal features to improve crowd counting performance.
Example 2
The scheme of Example 1 is further described below in conjunction with specific examples and calculation formulas:
1. data preparation
The embodiment of the invention verifies the effectiveness of the proposed method on RGBT-CC and Drone-RGBT data sets.
RGBT-CC is a large-scale RGB-T crowd counting dataset covering various social scenes (malls, streets, train stations, etc.), with 2030 pairs of manually annotated RGB-T images. The samples include both bright and dark scenes and contain 138,389 pedestrian annotations in total, averaging 68 people per image. 1030 pairs are used for training, 200 for validation, and 800 for testing. The ground truth of the dataset is generated with a geometry-adaptive Gaussian kernel.
The Drone-RGBT dataset is a multi-modal dataset collected by drone and the first drone-view crowd counting dataset combining RGB and T data. It contains 3600 image pairs taken at different locations, with an average of 48.8 people per image. Compared with other crowd counting datasets, it is more diverse in illumination, viewing angle, and background. Before use, data augmentation is applied to the Drone-RGBT dataset, and annotation accuracy is improved by manually supplementing the annotations. 2285 pairs are used for training and 1395 for testing. The ground truth of the dataset is generated with a geometry-adaptive Gaussian kernel.
2. Multi-modal dynamic feature migration crowd counting network structure
The multi-modal dynamic feature migration crowd counting network of this embodiment, shown in FIG. 1, comprises three modules: a two-branch feature extraction network, a channel dynamic migration module, and a decision-level adaptive fusion module.
The counting network uses the common convolutional neural network ResNet101 as the front-end architecture, adds dilated convolution layers on top of it as the back-end, directly outputs a density map at 1/8 of the input size through a final regression layer, and obtains the final density map by weighted fusion of the outputs of the two branches.
To better exploit the information of the RGB and T modalities, a channel dynamic migration module is added after Conv2_3, Conv3_4, Conv4_23, and Conv5_3 of the two-branch backbone counting network. In this module, the interaction-object selection sub-module selects the channels to interact according to the distribution difference between corresponding channels of the two modalities, and the interaction-direction judgment sub-module then decides the direction of the inter-modal interaction from the difference between each candidate channel's distribution and the overall distribution within its modality. After the dilated convolution layers, the outputs of the two branches are fused by a weighting module: a gating network assigns the two modality-specific weights according to the corresponding input data, and the feature maps output by the two counting branch regression layers are weighted and fused into the final density map.
The interaction-object selection sub-module is as follows:
After the BN layer in the network normalizes the features, the learnable channel-wise parameters γ and β establish a corresponding distribution N(β, γ²) for each channel; these per-channel distributions are the basic objects for computing distribution differences. The distance between the two distributions represented by corresponding visible-light and infrared channels is measured as

W(P_1, P_2) = \left( \inf_{\pi \in \Pi(P_1, P_2)} \mathbb{E}_{(x,y) \sim \pi} \| x - y \|^2 \right)^{1/2}

where P_1 and P_2 are the corresponding channel distributions of the two modalities and \Pi(P_1, P_2) is the set of all possible joint distributions combining P_1 and P_2 (i.e., the Wasserstein distance between the two distributions). The n channel pairs with the largest distribution difference (n is a hyper-parameter) are selected as the objects of modal interaction.
The interaction-direction judgment sub-module is as follows:
After the interaction-object selection sub-module has chosen the channel objects to exchange, this sub-module judges and selects the direction of channel-level information interaction between the modalities. In each modality, the product-of-experts (PoE) distribution of all channels is taken as the modality's average distribution [1]. For Gaussian channel distributions the PoE takes the standard product-of-Gaussians form

P_{avg} = \mathcal{N}(\mu_{avg}, \sigma_{avg}^2), \quad \sigma_{avg}^2 = \Big( \sum_k \sigma_k^{-2} \Big)^{-1}, \quad \mu_{avg} = \sigma_{avg}^2 \sum_k \mu_k \sigma_k^{-2}

where \mu_k and \sigma_k are the mean and standard deviation of the k-th channel's normal distribution.
The distance between a candidate channel's distribution P and the modality's average distribution P_{avg} is measured in the same way:

W(P, P_{avg}) = \left( \inf_{\pi \in \Pi(P, P_{avg})} \mathbb{E}_{(x,y) \sim \pi} \| x - y \|^2 \right)^{1/2}

where \Pi(P, P_{avg}) is the set of all possible joint distributions combining P and P_{avg}. The two distances (one per modality) are compared: the modality whose channel differs more from its own average distribution acts as the receiver during the interaction, and the corresponding channel of the other modality acts as the donor.
The decision-level adaptive fusion module is as follows:
The multi-modal feature extraction results are fed into a gating network, which assigns the two modality-specific weights \omega_{RGB} and \omega_{T} as functions of \hat{F}_{RGB} and \hat{F}_{T}, the outputs of the back-end networks of the two feature extraction streams.
The feature maps output by the two counting branch regression layers are then fused by weighting:

F = \omega_{RGB} F_{RGB} + \omega_{T} F_{T}

where F_{RGB} and F_{T} are the feature maps output by the two counting branch regression layers. The multi-modal outputs are thus weighted at the pixel level, adaptively fusing the information of the different modalities into the final density map.
3. Evaluation metrics and protocol
To obtain a per-pixel density for every location in the image, the density map estimation task is performed while preserving the spatial information of the crowd distribution. The method uses the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) to evaluate performance:

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \big| C_i - \hat{C}_i \big|, \qquad \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \big( C_i - \hat{C}_i \big)^2 }

where N is the number of images, C_i is the estimated count for the i-th test image, and \hat{C}_i is the ground-truth count of the corresponding annotation.
The level-l Grid Average Mean absolute Error (GAME) is calculated as

\mathrm{GAME}(l) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{4^l} \big| C_i^j - \hat{C}_i^j \big|

where C_i^j and \hat{C}_i^j are the estimated and ground-truth counts of the j-th region of the i-th image. GAME evaluates performance over different regions: a given image is divided into 4^l non-overlapping regions and the count error of each region is measured separately; GAME(0) is equivalent to MAE.
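The metrics can be computed directly from estimated and ground-truth counts and density maps; the following sketch implements MAE, RMSE, and the level-l GAME as defined above (per-image GAME values are averaged over the test set externally).

```python
# MAE, RMSE, and level-l GAME as defined above.
import numpy as np

def mae_rmse(pred_counts, gt_counts):
    err = np.asarray(pred_counts) - np.asarray(gt_counts)
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())

def game(pred_map, gt_map, level):
    # pred_map, gt_map: (H, W) density maps of one image; GAME(0) == per-image MAE.
    s = 2 ** level                     # the image is split into s*s = 4**level regions
    h, w = pred_map.shape
    total = 0.0
    for i in range(s):
        for j in range(s):
            rs, re = i * h // s, (i + 1) * h // s
            cs, ce = j * w // s, (j + 1) * w // s
            total += abs(pred_map[rs:re, cs:ce].sum() - gt_map[rs:re, cs:ce].sum())
    return total
```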
4. Implementation details
1. Data augmentation: owing to limited computational resources, a strategy of randomly flipping and cropping the training images is adopted to increase the diversity of the training data; images larger than 680×640 are first resized to fit within 680×640. Annotation accuracy is further improved by manually supplementing the annotations.
2. Model optimization:
The batch size N is set to 4 during training. The network is then trained with the Adam optimization algorithm, using a learning rate of 10^{-6} for the first 10 epochs and a learning rate of 10^{-5} for the following 20 epochs.
The ground-truth density maps are generated with the geometry-adaptive Gaussian kernel method.
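A sketch of geometry-adaptive kernel generation in the usual MCNN-style form the text appears to reference: each annotated head point is blurred with a Gaussian whose sigma is proportional to the mean distance to its k nearest annotated neighbours. The values k=3, beta=0.3, and the lone-annotation fallback sigma are conventional assumptions.

```python
# Geometry-adaptive Gaussian-kernel ground-truth density map (assumptions above).
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import cKDTree

def adaptive_density_map(points, shape, k=3, beta=0.3):
    density = np.zeros(shape, dtype=np.float32)
    if len(points) == 0:
        return density
    tree = cKDTree(points)
    for i, (x, y) in enumerate(points):
        delta = np.zeros(shape, dtype=np.float32)
        delta[min(int(y), shape[0] - 1), min(int(x), shape[1] - 1)] = 1.0
        if len(points) > 1:
            dists, _ = tree.query(points[i], k=min(k + 1, len(points)))
            sigma = beta * float(np.mean(dists[1:]))  # skip the distance to itself
        else:
            sigma = 15.0                               # fallback for a lone annotation
        density += gaussian_filter(delta, sigma)       # the map sums to the head count
    return density
```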
3. Crowd count:
the density map of the two-branch fusion network is obtained through the crowd counting network in the method, the density map represents the probability of the occurrence of people in the image, the density map not only contains the spatial distribution characteristic information, but also contains the people number characteristic information, and the people number corresponding to the image or the image module is obtained through accumulating the integral of the pixel density values in the crowd density map. Network model training uses mean square error to measure the difference between the estimated density map and the true density map.
The embodiment of the invention has the following three key innovations:
1. A multi-modal feature fusion crowd counting network
Technical effect: the network realizes interactive migration between features of different modalities, improving the counting result of each modality. The multi-modal feature fusion crowd counting network avoids the disadvantage of counting from a single-modality image under poor imaging conditions and makes better use of multi-modal features to improve crowd counting performance.
2. A multi-modal interaction mechanism based on distribution uncertainty
Technical effect: using a simple module design and the parameters already present in the network structure, the mechanism provides a more robust and flexible channel-level criterion for selecting interactions, guiding the interactive migration between modalities; it markedly improves the multi-modal interactive counting result.
3. An adaptive decision-level fusion strategy
Technical effect: the strategy combines the valid results of the two modalities to the greatest extent and helps obtain more reliable results under unpredictable multi-modal imaging conditions. Correlation experiments were performed on the dual-light datasets RGBT-CC and Drone-RGBT.
In summary, the embodiment of the invention provides a multi-modal dynamic feature migration crowd counting method based on distribution uncertainty. A multi-modal interaction mechanism based on distribution uncertainty is designed so that dynamic channel-dimension interaction and sharing between modalities take place during feature extraction, with bidirectional information migration improving the feature extraction of each modality; a decision-level adaptive fusion strategy is designed so that the trained model overcomes the influence of unpredictable imaging conditions and generates a more reliable density map.
Example 3
The method provided by the embodiment of the invention is compared with several crowd counting methods on the RGBT-CC and Drone-RGBT datasets. For the single-modality crowd counting methods MCNN, SANet, CSRNet, BL, SASNet, and MAN, the early fusion of RGB and T is used as the network input; comparison is also made with the multi-modal methods CSRNet+IADM and BL+IADM.
The experimental results on the RGBT-CC dataset are shown in Table 1: the indicators GAME(0), GAME(1), GAME(2), GAME(3), and RMSE are 13.12, 17.55, 22.11, 27.15, and 22.34 respectively, all superior to the compared methods. Compared with the single-modality crowd counting methods, the method of this embodiment obtains better evaluation results. Compared with the multi-modal crowd counting methods, the improvements reach 4.82 on the GAME metric and 8.57 and 5.84 on the RMSE metric. This shows that the method can dynamically combine the advantages of the multi-modal data and better exploit the complementary characteristics of the visible-light and thermal infrared images.
The experimental results on the Drone-RGBT dataset are shown in Table 2: the indicators GAME(0), GAME(1), GAME(2), GAME(3), and RMSE are 9.14, 10.49, 12.92, 14.69, and 14.56 respectively, all superior to the compared methods. Compared with the single-modality crowd counting methods, the method of this embodiment obtains better evaluation results. Compared with the multi-modal crowd counting methods, the improvements reach 1.65 on the GAME metric and 2.68 and 1.29 on the RMSE metric. This demonstrates that the method can be applied to RGB-T datasets over a variety of scenes and viewpoints, further verifying its effectiveness.
The ablation results are shown in Table 3. They report the test performance of three variants of the method on the RGBT-CC dataset: DFTNet (w/o all), DFTNet (w/o fus), and DFTNet (w/o tran). DFTNet (w/o fus) denotes the network without decision-level adaptive fusion, where the two sub-networks are late-fused with equal weights. DFTNet (w/o tran) denotes the network without the channel dynamic migration module. DFTNet (w/o all) denotes the network with neither the channel dynamic migration module nor the decision-level adaptive fusion module. All variants were trained on the training set and tested on the test set, with identical training steps, hyper-parameters, and evaluation protocol. As Table 3 shows, the full method achieves better results than all variants, verifying that the channel dynamic interaction module and the decision-level adaptive fusion module each markedly improve density map estimation.
TABLE 1: Comparison results on the RGBT-CC dataset
TABLE 2: Comparison results on the Drone-RGBT dataset

Method        GAME(0)  GAME(1)  GAME(2)  GAME(3)  RMSE
MCNN          15.56    16.68    18.33    19.91    22.49
SANet         16.42    17.52    19.62    21.81    22.57
CSRNet        11.68    13.73    16.39    18.83    16.69
CSRNet+IADM   10.79    12.90    15.33    17.19    17.24
BL            11.14    13.74    17.57    22.54    16.56
BL+IADM       10.47    12.25    14.77    18.57    16.44
SASNet        12.86    14.12    17.07    18.89    18.96
MAN           11.35    13.56    16.21    17.98    17.33
DFTNet         9.14    10.49    12.92    14.69    14.56
TABLE 3: Ablation results on the RGBT-CC dataset
Example 4
A multi-modal dynamic feature migration crowd counting device based on distribution uncertainty is shown in FIG. 3. The device comprises a processor and a memory; the memory stores program instructions, and the processor calls the program instructions stored in the memory to cause the device to perform the following method steps of Embodiment 1:
the multi-modal dynamic feature migration counting network uses a ResNet101 residual network as its backbone and extracts multi-modal features with a dual-stream network structure, generating crowd probability density maps;
during multi-modal feature extraction, inter-modal feature information is exchanged through a channel-level dynamic interaction mechanism based on distribution uncertainty;
the probability density maps output by the dual-stream feature extraction network are obtained, and the multi-modal outputs are adaptively fused by a decision-level adaptive fusion module;
after the fused multi-modal crowd probability density map is obtained, the density map is integrated to obtain the crowd count.
The method further comprises: during training, computing the MSE loss between the predicted probability density map and the ground-truth probability density map to update the model parameters.
Further, the multi-modal dynamic feature migration crowd counting network comprises a two-branch feature extraction network, a channel dynamic migration module, and a decision-level adaptive fusion module.
The channel dynamic migration module operates as follows:
an interaction-object selection sub-module selects the channels to interact according to the distribution difference between corresponding channels of the two modalities; an interaction-direction judgment sub-module then decides the direction of the inter-modal interaction according to the difference between each candidate channel's distribution and the overall distribution within its modality.
Further, the interaction-object selection sub-module computes

W(P_1, P_2) = \left( \inf_{\pi \in \Pi(P_1, P_2)} \mathbb{E}_{(x,y) \sim \pi} \| x - y \|^2 \right)^{1/2}

where P_1 and P_2 are the corresponding channel distributions of the two modalities and \Pi(P_1, P_2) is the set of all possible joint distributions combining them; the n channel pairs with the largest distribution difference are selected as the objects of modal interaction.
Further, the interaction-direction judgment sub-module computes

W(P, P_{avg}) = \left( \inf_{\pi \in \Pi(P, P_{avg})} \mathbb{E}_{(x,y) \sim \pi} \| x - y \|^2 \right)^{1/2}

where \Pi(P, P_{avg}) is the set of all possible joint distributions combining P and P_{avg}.
The decision-level adaptive fusion module operates as follows:
the multi-modal feature extraction results are fed into a gating network, which assigns the two modality-specific weights \omega_{RGB} and \omega_{T}, where \hat{F}_{RGB} and \hat{F}_{T} are the outputs of the back-end networks of the two feature extraction streams;
the feature maps output by the two counting branch regression layers are then fused by weighting:

F = \omega_{RGB} F_{RGB} + \omega_{T} F_{T}

where F_{RGB} and F_{T} are the feature maps output by the two counting branch regression layers.
It should be noted that the device description of this embodiment corresponds to the method description of Embodiment 1 and is not repeated here.
The processor 1 and the memory 2 may be any devices with computing capability, such as a computer, a single-chip microcomputer, or a microcontroller; their specific types are not limited here and are chosen according to the needs of the practical application.
Data signals are transmitted between the memory 2 and the processor 1 via the bus 3, which is not described in detail in the embodiment of the present invention.
Based on the same inventive concept, the embodiment of the present invention also provides a computer readable storage medium, where the storage medium includes a stored program, and when the program runs, the device where the storage medium is controlled to execute the method steps in the above embodiment.
The computer readable storage medium includes, but is not limited to, flash memory, hard disk, solid state disk, and the like.
It should be noted that the readable storage medium descriptions in the above embodiments correspond to the method descriptions in the embodiments, and the embodiments of the present invention are not described herein.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part.
The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. The usable medium may be a magnetic medium or a semiconductor medium, or the like.
The embodiment of the invention does not limit the types of the other devices involved, as long as the devices can perform the functions described.
Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Reference:
[1] Huang X, Mallya A, Wang T C, et al. Multimodal Conditional Image Synthesis with Product-of-Experts GANs[J]. 2021.
the embodiment of the invention does not limit the types of other devices except the types of the devices, so long as the devices can complete the functions.
Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (9)

1. A multi-modal dynamic feature migration crowd counting method based on distribution uncertainty, the method comprising:
the multi-modal dynamic feature migration counting network using a ResNet101 residual network as its backbone and extracting multi-modal features with a dual-stream network structure, generating crowd probability density maps;
during multi-modal feature extraction, exchanging inter-modal feature information through a channel-level dynamic interaction mechanism based on distribution uncertainty;
obtaining the probability density maps output by the dual-stream feature extraction network and adaptively fusing the multi-modal outputs through a decision-level adaptive fusion module; and
after the fused multi-modal crowd probability density map is obtained, integrating the density map to obtain the crowd count.
2. The multi-modal dynamic feature migration crowd counting method based on distribution uncertainty of claim 1, further comprising: during training, computing the MSE loss between the predicted probability density map and the ground-truth probability density map to update the model parameters.
3. The multi-modal dynamic feature migration crowd counting method based on distribution uncertainty of claim 1, wherein the multi-modal dynamic feature migration crowd counting network comprises a two-branch feature extraction network, a channel dynamic migration module, and a decision-level adaptive fusion module.
4. The multi-modal dynamic feature migration crowd counting method based on distribution uncertainty of claim 3, wherein the channel dynamic migration module: selects, through an interaction-object selection sub-module, the channels to interact according to the distribution difference between corresponding channels of the two modalities; and decides, through an interaction-direction judgment sub-module, the direction of the inter-modal interaction according to the difference between each candidate channel's distribution and the overall distribution within its modality.
5. The multi-modal dynamic feature migration crowd counting method based on distribution uncertainty of claim 4, wherein the interaction-object selection sub-module computes

W(P_1, P_2) = \left( \inf_{\pi \in \Pi(P_1, P_2)} \mathbb{E}_{(x,y) \sim \pi} \| x - y \|^2 \right)^{1/2}

wherein P_1 and P_2 are the corresponding channel distributions of the two modalities and \Pi(P_1, P_2) is the set of all possible joint distributions combining them, and the n channel pairs with the largest distribution difference are selected as the objects of modal interaction.
6. The multi-modal dynamic feature migration crowd counting method based on distribution uncertainty of claim 4, wherein the interaction-direction judgment sub-module computes

W(P, P_{avg}) = \left( \inf_{\pi \in \Pi(P, P_{avg})} \mathbb{E}_{(x,y) \sim \pi} \| x - y \|^2 \right)^{1/2}

wherein \Pi(P, P_{avg}) is the set of all possible joint distributions combining P and P_{avg}.
7. The multi-modal dynamic feature migration crowd counting method based on distribution uncertainty of claim 3, wherein the decision-level adaptive fusion module: feeds the multi-modal feature extraction results into a gating network, which assigns the two modality-specific weights \omega_{RGB} and \omega_{T}, wherein \hat{F}_{RGB} and \hat{F}_{T} are the outputs of the back-end networks of the two feature extraction streams; and performs weighted fusion of the feature maps output by the two counting branch regression layers:

F = \omega_{RGB} F_{RGB} + \omega_{T} F_{T}

wherein F_{RGB} and F_{T} are the feature maps output by the two counting branch regression layers.
8. A multi-modal dynamic feature migration crowd counting device based on distribution uncertainty, the device comprising: a processor and a memory, the memory storing program instructions, wherein the processor calls the program instructions stored in the memory to cause the device to perform the method steps of any one of claims 1-7.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method steps of any of claims 1-7.
CN202310594811.8A 2023-05-25 2023-05-25 Multi-modal feature migration crowd counting method and device based on distribution uncertainty Pending CN116704432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310594811.8A CN116704432A (en) Multi-modal feature migration crowd counting method and device based on distribution uncertainty

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310594811.8A CN116704432A (en) Multi-modal feature migration crowd counting method and device based on distribution uncertainty

Publications (1)

Publication Number Publication Date
CN116704432A 2023-09-05

Family

ID=87844351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310594811.8A Pending CN116704432A (en) Multi-modal feature migration crowd counting method and device based on distribution uncertainty

Country Status (1)

Country Link
CN (1) CN116704432A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118135495A (en) * 2024-05-06 2024-06-04 临沂大学 Method, apparatus, electronic device, storage medium and computer program for crowd counting


Similar Documents

Publication Publication Date Title
WO2020253416A1 (en) Object detection method and device, and computer storage medium
CN111402130B (en) Data processing method and data processing device
Pang et al. Visual haze removal by a unified generative adversarial network
CN104424634B (en) Object tracking method and device
Zhao et al. Dd-cyclegan: Unpaired image dehazing via double-discriminator cycle-consistent generative adversarial network
CN112949508B (en) Model training method, pedestrian detection method, electronic device, and readable storage medium
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
Wang et al. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection
WO2022041830A1 (en) Pedestrian re-identification method and device
CN111414953B (en) Point cloud classification method and device
Li et al. Robust deep neural networks for road extraction from remote sensing images
CN116704432A (en) Multi-modal feature migration crowd counting method and device based on distribution uncertainty
Ren et al. Infrared small target detection via region super resolution generative adversarial network
CN110135428B (en) Image segmentation processing method and device
Niu et al. Boundary-aware RGBD salient object detection with cross-modal feature sampling
CN111914938A (en) Image attribute classification and identification method based on full convolution two-branch network
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
CN117671597B (en) Method for constructing mouse detection model and mouse detection method and device
Wang et al. SLMS-SSD: Improving the balance of semantic and spatial information in object detection
CN113610016A (en) Training method, system, equipment and storage medium of video frame feature extraction model
Duan [Retracted] Deep Learning‐Based Multitarget Motion Shadow Rejection and Accurate Tracking for Sports Video
Zhou et al. Ship target detection in optical remote sensing images based on multiscale feature enhancement
Huang Object extraction of tennis video based on deep learning
CN114022516A (en) Bimodal visual tracking method based on high rank characteristics and position attention
CN114693986A (en) Training method of active learning model, image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination