CN115331171A - Crowd counting method and system based on depth information and significance information - Google Patents
Crowd counting method and system based on depth information and significance information
- Publication number
- CN115331171A CN115331171A CN202210992920.0A CN202210992920A CN115331171A CN 115331171 A CN115331171 A CN 115331171A CN 202210992920 A CN202210992920 A CN 202210992920A CN 115331171 A CN115331171 A CN 115331171A
- Authority
- CN
- China
- Prior art keywords
- information
- crowd
- significance
- prediction
- depth information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30204—Marker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30242—Counting objects in image
Abstract
The invention discloses a crowd counting method and system based on depth information and saliency information, comprising the following steps: collecting a crowd sample image of a designated area; inputting the collected crowd sample image into a trained density map prediction model based on saliency information and depth information; and outputting the total number of people in the crowd sample image. The method introduces crowd saliency information into the crowd counting field: head annotation points are treated as human eye attention points, Gaussian blur is used to generate visual saliency labels for crowd counting, and a deep learning network is trained and tested to obtain visual saliency information that assists the training of crowd counting. By combining visual saliency information with depth information to assist crowd counting, the depth information can be corrected by the saliency information, the interference caused by regions without crowd information is reduced, and the counting accuracy is improved.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a crowd counting method based on depth information and saliency information.
Background
The task of dense crowd counting is to estimate the number of people contained in an image or video. With the growth of the global population and the increase of human social activities, large crowds often gather in public places such as transportation hubs and entertainment venues, posing great hidden dangers to public safety. Dense crowd counting is widely applied in video surveillance, traffic control and metropolitan safety, and researchers in many countries have carried out extensive research on it. Crowd counting methods can also be extended to similar tasks in other fields, such as cell counting in microscopic medical images, vehicle estimation in traffic congestion, and large-scale biological sample surveys.
Traditional crowd counting methods can be mainly divided into detection-based and regression-based methods; as crowd density increases, both struggle with severe occlusion among people. Because deep learning models have strong feature extraction capability, research on deep-learning-based crowd counting has achieved many excellent results. The current mainstream approach is to predict a density map from the original image with a convolutional neural network and compute the number of people from the density map.
Wang et al. first introduced the convolutional neural network (CNN) into the crowd counting field and proposed an end-to-end CNN regression model suitable for dense crowd scenes. The model improves the AlexNet network, replacing the final fully connected layer with a single-neuron layer to directly predict the crowd count. Its disadvantage is that it cannot capture the spatial distribution of people in the scene, and it performs poorly in dense crowds or complex scenes. Zhang et al. proposed the multi-column convolutional neural network MCNN for crowd counting; each branch network adopts convolution kernels of a different size to extract feature information of targets at different scales, reducing counting errors caused by target size variation due to perspective changes. Although multi-branch counting networks achieve better counting results, their higher model complexity brings new problems: many parameters, difficult training, and structural redundancy. To this end, Li et al. proposed CSRNet, a dilated convolutional neural network model suitable for dense crowd counting. Instead of the widely used multi-branch structure, CSRNet uses a VGG16 network with the fully connected layers removed as the front end and a 6-layer dilated convolutional network as the back end, forming a single-column counting network that greatly reduces the parameter count and the training difficulty. Meanwhile, dilated convolution enlarges the receptive field while keeping the resolution of the input image, preserving more image detail, so the generated crowd density maps are of higher quality.
To address the large variation in target size caused by different distances between the camera and the crowd, researchers have introduced auxiliary information to assist counting. Shi et al. combined perspective information with crowd counting to improve accuracy; perspective information reflects the depth difference across the whole image and is somewhat similar to a depth map. Xu et al. used image depth information to segment the scene into far-view and near-view regions, then applied different mechanisms (density-map-based and detection-based) to estimate the counts of the two regions and summed them to obtain the total. Yang et al. used a pre-trained depth branch to provide depth information for crowd counting; depth information reflects crowd density to a certain extent and implies scale change information, but ignoring the influence of depth information outside the crowd regions can adversely affect the counting result.
Disclosure of Invention
The invention aims to provide a crowd counting method and system based on depth information and saliency information, which assist crowd counting by combining visual saliency information with depth information, correct the depth information using the saliency information, reduce interference caused by regions without crowd information, and improve the counting accuracy.
To achieve this purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a crowd counting method based on depth information and saliency information, comprising:
collecting a crowd sample image of a designated area;
inputting the collected crowd sample image into a trained density map prediction model based on saliency information and depth information;
and outputting the total number of people in the crowd sample image.
Further, the density map prediction model is constructed by the following method:
carrying out depth prediction on an input crowd sample image by using an image depth information prediction network to obtain image depth information;
inputting the input crowd sample image, its corresponding predicted saliency information and the depth information together into a crowd density map prediction network, correcting the depth information by using the saliency information, and using the corrected depth information to guide the training of the density map prediction network to generate a density map prediction model.
Further, the saliency information is generated by predicting a saliency map of the input crowd sample image with a saliency prediction model, and the saliency prediction model is constructed by the following method:
carrying out Gaussian blur on the head annotation data corresponding to the input crowd sample image to generate a ground-truth saliency map;
predicting saliency information of the input crowd sample image by using a visual saliency prediction network to generate a predicted saliency map;
and calculating a loss function from the predicted saliency map and the ground-truth saliency map, adjusting network parameters through gradient back propagation, and generating a saliency prediction model through iteration.
Further, the Gaussian blur of the head annotation data corresponding to the input crowd sample image is performed with a Gaussian kernel function with a standard deviation of 19.
Further, the density map prediction network training comprises:
for an input crowd sample image R with corresponding depth map D and saliency map S, at the l-th layer of the encoder, let R_l, D_l and S_l denote the output feature maps of the preceding convolution layers of the encoder; the depth features are corrected by the saliency features of the corresponding layer as follows:

V_l = sigmoid(Φ_s(S_l))

D_l = V_l ⊙ D_l

where Φ_s denotes a 1 × 1 convolution layer, V_l is the weight for encoder layer l computed by the sigmoid function, and ⊙ denotes element-wise multiplication; applying the weight V_l to D_l highlights the crowd regions and reduces the influence of depth information from non-crowd regions;

the corrected depth information D_l is then used to weight R_l as

R_l = R_l ⊙ D_l

where ⊙ denotes element-wise multiplication;

then R_l, D_l and S_l are input to the subsequent network.
Further, the density map prediction network includes: an encoding module, a depth correction and embedding module, an enhanced multi-scale module and a decoding module;
the encoding module is used for extracting multi-level features of the input image;
the depth correction and embedding module is used for correcting and fusing the depth information;
the enhanced multi-scale module is used for extracting and fusing multi-scale comprehensive features;
the decoding module is used for outputting a predicted density map of the same size as the input image.
Further, the encoding module is a pre-trained VGG16 network; the enhanced multi-scale module comprises multi-branch 3 × 3 convolutions with different dilation rates, and the dilated convolutions provide a larger receptive field than ordinary convolution operations; the decoding module is a 7-layer dilated convolutional network used for outputting a predicted density map of the same size as the input image.
Further, outputting the total number of people in the crowd sample image comprises:
generating a predicted density map of the crowd sample image by using the density prediction model, and summing all pixels of the predicted density map to obtain the total number of people in the image.
In a second aspect, the present invention also provides a people counting system based on depth information and saliency information, comprising a processor and a storage medium;
the storage medium is to store instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method introduces crowd saliency information into the crowd counting field, treats head annotation points as human eye attention points, generates visual saliency labels for crowd counting by Gaussian blur, trains and tests with a deep learning network to obtain visual saliency information for crowd counting, and uses it to assist the training of crowd counting;
(2) The method assists crowd counting by combining visual saliency information with depth information; the depth information can be corrected by the saliency information, the interference caused by regions without crowd information is reduced, and the counting accuracy is improved.
Drawings
Fig. 1 is a flowchart of a crowd counting method based on depth information and saliency information according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an overall network architecture of population counting according to an embodiment;
fig. 3 is a schematic network structure diagram of an enhanced multi-scale module according to an embodiment.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are only used for more clearly illustrating the technical solutions of the present invention, and the protection scope of the present invention is not limited thereby.
Example 1
As shown in fig. 1 to 3, a method for counting people based on depth information and saliency information includes: collecting a crowd sample image of a designated area; inputting the collected crowd sample images into a trained density map prediction model based on significance information and depth information; and outputting the total number of people in the crowd sample image.
In this embodiment, for the acquired crowd sample image, a crowd counting method based on depth information and saliency information is adopted, and an application process thereof is shown in fig. 1, and specifically involves the following steps:
step 1) carrying out Gaussian blur on head annotation data corresponding to the input crowd sample image to generate a truth significance map.
This embodiment takes inspiration from the human eye attention prediction dataset SALICON, which contains 20,000 images selected from the Microsoft COCO dataset and is so far the largest dataset in the field of eye-fixation prediction. That dataset did not record eye movements with an eye tracker; instead, annotators on the Amazon crowdsourcing platform clicked with a mouse on the positions they attended to, and all preprocessed mouse-click samples of the same image were Gaussian-blurred to generate a ground-truth saliency map. For a crowd counting dataset, this is similar to the process of head annotation. Accordingly, Gaussian blur with a Gaussian kernel of standard deviation 19 is applied to the head annotation data corresponding to the input sample image to generate the ground-truth saliency map.
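The ground-truth saliency generation described above can be sketched as follows: place an impulse at every head annotation point, then convolve with a Gaussian of standard deviation 19. The function and argument names are illustrative, not from the patent, and a real implementation would use an optimized separable or FFT-based blur rather than this direct convolution:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a normalized 2-D Gaussian kernel of the given odd size."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def saliency_ground_truth(shape, head_points, sigma=19.0):
    """Ground-truth saliency map: Gaussian blur of head annotation impulses.

    shape: (H, W) of the image; head_points: list of (row, col) head centers.
    """
    h, w = shape
    impulse = np.zeros((h, w), dtype=np.float64)
    for r, c in head_points:
        impulse[int(r), int(c)] += 1.0
    size = int(6 * sigma) | 1           # odd kernel covering about +-3 sigma
    kernel = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(impulse, pad)
    out = np.zeros_like(impulse)
    for r in range(h):                  # direct (slow) 2-D convolution
        for c in range(w):
            out[r, c] = np.sum(padded[r:r + size, c:c + size] * kernel)
    return out
```

For a single head far from the image border, the blurred map integrates to 1 and peaks at the annotated point, which is the behavior the density/saliency labels rely on.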
Step 2) Predict saliency information of the input crowd sample image with a visual saliency prediction network to generate a predicted saliency map; calculate a loss function from the predicted saliency map and the ground-truth saliency map, adjust network parameters through gradient back propagation, and iterate to obtain a saliency prediction model; this model is then used to predict the saliency map of an input sample image, producing the predicted saliency information.
Step 3) Carry out depth prediction on the input sample image using an image depth information prediction network to obtain image depth information.
In this embodiment, a pre-trained depth information prediction network model performs depth prediction on the input sample image; the predicted depth map adapts well to various scene layouts and reflects the varying distance from different positions to the camera.
Step 4) Input the sample image, its corresponding predicted saliency information and the depth information together into the crowd density map prediction network; correct the depth information using the saliency information, and use the corrected depth information to guide the training of the density map prediction network to generate a density map prediction model.
In this embodiment, the overall structure of the crowd density map prediction network is shown in fig. 2. The input sample image and its corresponding predicted saliency information are input into the crowd density map prediction network together with the depth information. For the input sample image R with corresponding depth map D and saliency map S, at the l-th layer of the encoder let R_l, D_l and S_l denote the output feature maps of the preceding convolution layers of the encoder. The depth features are corrected by the saliency features of the corresponding layer as follows:

V_l = sigmoid(Φ_s(S_l))

D_l = V_l ⊙ D_l

where Φ_s denotes a 1 × 1 convolution layer, V_l is the weight for encoder layer l computed by the sigmoid function, and ⊙ denotes element-wise multiplication. Applying the weight V_l to D_l highlights the crowd regions and suppresses the influence of depth information from non-crowd regions.

The corrected depth information D_l is then used to weight R_l as

R_l = R_l ⊙ D_l

where ⊙ denotes element-wise multiplication.

R_l, D_l and S_l are then input to the subsequent network.
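The saliency-gated depth correction above can be sketched in NumPy for a single encoder layer. The function name and the explicit `w_1x1`/`b_1x1` parameters are illustrative (in the network these are the learned weights of the 1 × 1 convolution Φ_s), and in practice this would be a few lines inside a deep learning framework:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_correction(r_l, d_l, s_l, w_1x1, b_1x1):
    """Saliency-gated depth correction for one encoder layer.

    r_l, d_l, s_l: feature maps of shape (C, H, W) for image, depth, saliency.
    w_1x1, b_1x1: weights (C_out, C) and bias (C_out,) of the 1x1 conv Phi_s.
    """
    # A 1x1 convolution is a per-pixel linear map across channels.
    phi = np.tensordot(w_1x1, s_l, axes=([1], [0])) + b_1x1[:, None, None]
    v_l = sigmoid(phi)           # V_l = sigmoid(Phi_s(S_l)), gate in (0, 1)
    d_corr = v_l * d_l           # D_l = V_l (.) D_l, element-wise
    r_weighted = r_l * d_corr    # R_l = R_l (.) D_l, element-wise
    return r_weighted, d_corr, v_l
```

Because the gate V_l lies in (0, 1), the corrected depth can only attenuate non-crowd regions; it never amplifies depth responses beyond their original values.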
To deal with scale variation of the crowd, most previous works adopt a multi-column architecture; for example, MCNN uses three columns of sub-networks to extract features at different scales, but the scale diversity of the features is limited by the number of columns. To address this, an enhanced multi-scale module is proposed, borrowing the architectural idea of Inception and further using dilated convolution to perform scale enhancement on the input feature map, as shown in fig. 3.
Dilated convolution provides a larger receptive field than an ordinary convolution and can capture the area around boundaries and richer context information. The head size of a person in a crowd-scene image always varies greatly, and a single receptive field cannot adapt to this scale variation; therefore 3 × 3 convolutions with dilation rates d of 1, 2 and 4 are used in parallel to capture features, better adapting to the diverse crowd distributions in crowd scenes.
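The effect of the three dilation rates can be checked with the standard formula for the spatial extent of a dilated kernel, k + (k − 1)(d − 1); the function name here is illustrative:

```python
def effective_kernel_size(k, d):
    """Effective spatial extent of a k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)

# The three parallel branches of the enhanced multi-scale module use
# 3 x 3 kernels with dilation rates 1, 2 and 4.
for d in (1, 2, 4):
    print("dilation", d, "-> extent", effective_kernel_size(3, d))
```

So the three branches see 3 × 3, 5 × 5 and 9 × 9 neighborhoods with the same number of parameters per kernel, which is what lets the module cover several head scales at once.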
Finally, the feature map is input into the decoder module, a 7-layer dilated convolutional network that uses its larger receptive field to extract deeper important information and outputs a predicted density map of the same size as the input image. The difference between the predicted density map and the label is measured by the Euclidean distance, as shown in formula (1):

L(Θ) = (1 / 2N) Σ_{i=1}^{N} ||F(X_i; Θ) − D(X_i)||²    (1)

In formula (1), X_i is the input image, F(X_i; Θ) is the density map estimated by the network with parameters Θ, D(X_i) is the ground-truth density map, and N is the number of training samples. Network parameters are adjusted through gradient back propagation, and the density prediction network is trained iteratively.
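The Euclidean training loss of formula (1) can be sketched directly; `euclidean_loss` is an illustrative name, and in a framework this would be an MSE-style criterion over density maps:

```python
import numpy as np

def euclidean_loss(pred_maps, gt_maps):
    """Formula (1): squared Euclidean distance between predicted and
    ground-truth density maps, averaged over the N training samples."""
    n = len(pred_maps)
    total = 0.0
    for f, d in zip(pred_maps, gt_maps):
        diff = np.asarray(f, dtype=np.float64) - np.asarray(d, dtype=np.float64)
        total += np.sum(diff ** 2)
    return total / (2.0 * n)
```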
Step 5) To count the crowd in a single image, generate the predicted density map of the image with the density prediction model, and sum all pixels of the predicted density map to obtain the total number of people in the image.
After a stable density prediction model is trained, the model generates a predicted density map for an input image; the total count is then obtained by summing the pixel values of the density map.
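The final counting step is a plain summation over the density map; the rounding to a whole person is an illustrative choice, not specified by the patent:

```python
import numpy as np

def count_from_density(density_map):
    """Total crowd count = sum of all pixel values of the predicted
    density map (rounded here to the nearest whole person)."""
    return int(round(float(np.sum(density_map))))
```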
Example 2
Based on the crowd counting method based on the depth information and the significance information described in embodiment 1, the embodiment provides a crowd counting system based on the depth information and the significance information, which comprises a processor and a storage medium; the storage medium is used for storing instructions; the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment 1.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and technical principles of the described embodiments, and such modifications and variations should also be considered as within the scope of the present invention.
Claims (9)
1. A crowd counting method based on depth information and significance information is characterized by comprising the following steps:
collecting a crowd sample image of a designated area;
inputting the collected crowd sample images into a trained density map prediction model based on significance information and depth information;
and outputting the total number of people in the crowd sample image.
2. The population counting method based on depth information and significance information according to claim 1, wherein the density map prediction model is constructed by the following method:
carrying out depth prediction on an input crowd sample image by using an image depth information prediction network to obtain image depth information;
the input crowd sample image, the corresponding prediction significance information and the depth information are input into a crowd density map prediction network together, the depth information is corrected by using the significance information, and the corrected depth information is used for guiding the density map prediction network to train so as to generate a density map prediction model.
3. The method of claim 2, wherein the saliency information is generated by prediction of a saliency map of an input crowd sample image using a saliency prediction model, the saliency prediction model being constructed by:
carrying out Gaussian blur on head annotation data corresponding to the input crowd sample image to generate a truth value significance map;
predicting significance information of the input crowd sample image by using a visual significance prediction network to generate a prediction significance map;
and calculating a loss function according to the prediction significance map and the truth value significance map, adjusting network parameters through gradient back propagation, and generating a significance prediction model through iteration.
4. The method according to claim 3, wherein the Gaussian blur of the head labeling data corresponding to the input human sample image is performed by using a Gaussian kernel function with a standard deviation of 19.
5. The population counting method based on depth information and significance information according to claim 2, wherein the density map prediction network training comprises:
for an input crowd sample image R with corresponding depth map D and saliency map S, at the l-th layer of the encoder, let R_l, D_l and S_l denote the output feature maps of the preceding convolution layers of the encoder; the depth features are corrected by the saliency features of the corresponding layer as follows:

V_l = sigmoid(Φ_s(S_l))

D_l = V_l ⊙ D_l

where Φ_s denotes a 1 × 1 convolution layer, V_l is the weight for encoder layer l, and ⊙ denotes element-wise multiplication;

the corrected depth information D_l is then used to weight R_l as

R_l = R_l ⊙ D_l

where ⊙ denotes element-wise multiplication;

then R_l, D_l and S_l are input to the subsequent network.
6. The crowd counting method based on depth information and significance information according to claim 5, wherein the density map prediction network comprises: the device comprises a coding module, a depth correction and embedding module, an enhanced multi-scale module and a decoding module;
the encoding module is used for extracting multi-level features of an input image;
the depth correction and embedding module is used for correcting and fusing depth information;
the enhanced multi-scale module is used for extracting and fusing multi-scale comprehensive features;
the decoding module is used for outputting a prediction density map with the same size as the input image.
7. The people counting method based on depth information and saliency information of claim 6, characterized in that said encoder module is a pre-trained VGG16 network; the enhanced multi-scale module comprises a multi-branch 3 × 3 convolution with different expansion rates; the decoding module is a 7-layer expansion convolution network and is used for outputting a prediction density map with the same size as the input image.
8. The method of claim 1, wherein outputting the total number of people in the crowd sample image comprises:
generating a predicted density map of the crowd sample image by using the density prediction model, and summing all pixels of the predicted density map to obtain the total number of people in the image.
9. A crowd counting system based on depth information and saliency information, characterized by: comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210992920.0A CN115331171A (en) | 2022-08-18 | 2022-08-18 | Crowd counting method and system based on depth information and significance information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115331171A true CN115331171A (en) | 2022-11-11 |
Family
ID=83925597
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210992920.0A Pending CN115331171A (en) | 2022-08-18 | 2022-08-18 | Crowd counting method and system based on depth information and significance information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115331171A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117456449A (en) * | 2023-10-13 | 2024-01-26 | 南通大学 | Efficient cross-modal crowd counting method based on specific information |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276316B (en) | Human body key point detection method based on deep learning | |
CN113065558B (en) | Lightweight small target detection method combined with attention mechanism | |
Ying et al. | Multi-attention object detection model in remote sensing images based on multi-scale | |
CN112446342B (en) | Key frame recognition model training method, recognition method and device | |
WO2019136591A1 (en) | Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network | |
Wang et al. | FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection | |
CN113012172A (en) | AS-UNet-based medical image segmentation method and system | |
CN111027505B (en) | Hierarchical multi-target tracking method based on significance detection | |
CN112150493A (en) | Semantic guidance-based screen area detection method in natural scene | |
CN112633220B (en) | Human body posture estimation method based on bidirectional serialization modeling | |
CN108830170B (en) | End-to-end target tracking method based on layered feature representation | |
CN113192124B (en) | Image target positioning method based on twin network | |
CN110827320B (en) | Target tracking method and device based on time sequence prediction | |
CN114140469B (en) | Depth layered image semantic segmentation method based on multi-layer attention | |
CN111507184B (en) | Human body posture detection method based on parallel cavity convolution and body structure constraint | |
Zhu et al. | A multi-scale and multi-level feature aggregation network for crowd counting | |
CN116524062A (en) | Diffusion model-based 2D human body posture estimation method | |
CN116596966A (en) | Segmentation and tracking method based on attention and feature fusion | |
CN115331171A (en) | Crowd counting method and system based on depth information and significance information | |
CN117351414A (en) | Crowd density estimation method based on deep neural network | |
CN114596515A (en) | Target object detection method and device, electronic equipment and storage medium | |
CN116778187A (en) | Salient target detection method based on light field refocusing data enhancement | |
CN112053386B (en) | Target tracking method based on depth convolution characteristic self-adaptive integration | |
CN115965905A (en) | Crowd counting method and system based on multi-scale fusion convolutional network | |
Yao et al. | MLP-based Efficient Convolutional Neural Network for Lane Detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||