CN115331171A - Crowd counting method and system based on depth information and significance information - Google Patents

Crowd counting method and system based on depth information and significance information

Info

Publication number
CN115331171A
CN115331171A (application CN202210992920.0A)
Authority
CN
China
Prior art keywords
information
crowd
significance
prediction
depth information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210992920.0A
Other languages
Chinese (zh)
Inventor
崔子冠
苏航
唐贵进
干宗良
刘峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210992920.0A priority Critical patent/CN115331171A/en
Publication of CN115331171A publication Critical patent/CN115331171A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a crowd counting method and system based on depth information and significance information, comprising the following steps: collecting a crowd sample image of a designated area; inputting the collected crowd sample image into a trained density map prediction model based on significance information and depth information; and outputting the total number of people in the crowd sample image. The method introduces crowd significance information into the crowd counting field: taking the head annotation points as human-eye attention points, it generates visual saliency labels for crowd counting by Gaussian blur, then trains and tests a deep learning network to obtain the visual saliency information of the crowd image, which assists the training of the crowd counter. By combining visual saliency information with depth information to assist crowd counting, the depth information can be corrected by the saliency information, the interference caused by regions without crowd information is reduced, and the counting performance is improved.

Description

Crowd counting method and system based on depth information and significance information
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a crowd counting method based on depth information and significance information.
Background
The task of dense crowd counting is to estimate the number of people contained in an image or video. With the growth of the global population and the increase in social activity, large crowds often gather in public places such as transportation hubs and entertainment venues, posing serious hidden dangers to public safety. Dense crowd counting is therefore widely applied in video surveillance, traffic control and metropolitan safety, and researchers worldwide have studied it extensively. Crowd counting methods can also be generalized to similar tasks in other fields, such as estimating cell numbers in microscopic medical images, estimating vehicles in traffic congestion, and large-scale biological sample surveys.
Traditional crowd counting methods can be broadly divided into detection-based and regression-based approaches; as crowd density increases, both struggle with severe occlusion between people. Because deep learning models have strong feature extraction capability, research on deep-learning-based crowd counting has produced many excellent results. The current mainstream approach is to predict a density map of the original image with a convolutional neural network and to compute the head count from the density map.
Wang et al. first introduced the convolutional neural network (CNN) into the crowd counting field and proposed an end-to-end CNN regression model suited to dense crowd scenes. The model modifies the AlexNet network, replacing the final fully connected layer with a single-neuron layer that directly predicts the crowd count. Its drawbacks are that it cannot report the spatial distribution of people in the scene, and it performs poorly in dense crowds or complex scenes. Zhang et al. proposed the multi-column convolutional neural network MCNN for crowd counting: each branch of the multi-branch deep network uses convolution kernels of a different size to extract features of targets at different scales, reducing the counting errors caused by the target-size variation that perspective changes induce. Although multi-branch counting networks achieve better counting performance, their higher model complexity brings new problems: many parameters, difficult training, and structural redundancy. To this end, Li et al. proposed CSRNet, a dilated convolutional neural network model suited to dense crowd counting. CSRNet abandons the previously popular multi-branch structure: a VGG16 network with its fully connected layers removed serves as the front end, and a 6-layer dilated convolutional network serves as the back end, forming a single-column counting network that greatly reduces the parameter count and the training difficulty. Meanwhile, dilated convolution enlarges the receptive field while preserving the resolution of the input image, retaining more image detail, so the generated crowd density maps are of higher quality.
To address the large target-size variation caused by the varying distance between the camera and the crowd, attention has turned to introducing auxiliary information to assist crowd counting. Shi et al. combine perspective information with crowd counting to improve accuracy; perspective information reflects the depth variation across the whole image and is somewhat similar to a depth map. Xu et al. use the depth information of the image to segment the scene into far-field and near-field regions, then apply different mechanisms (density-map-based and detection-based) to estimate the counts of the two regions and sum them to obtain the total. Yang et al. use a pre-trained depth branch to provide depth information for crowd counting; depth information reflects crowd density to some extent and implies scale-change information, but this work ignores the problem that depth information outside the crowd regions can distort the counting result.
Disclosure of Invention
The invention aims to provide a crowd counting method and system based on depth information and significance information, which assist crowd counting by combining visual saliency information with depth information: the saliency information is used to correct the depth information, reducing the interference caused by regions without crowd information and improving the counting performance.
To achieve this aim, the invention is realized by the following technical scheme:
in a first aspect, the present invention provides a people counting method based on depth information and saliency information, comprising:
collecting a crowd sample image of a designated area;
inputting the collected crowd sample images into a trained density map prediction model based on significance information and depth information;
and outputting the total number of people in the crowd sample image.
Further, the density map prediction model is constructed by the following method:
carrying out depth prediction on an input crowd sample image by using an image depth information prediction network to obtain image depth information;
the input crowd sample image, the corresponding prediction significance information and the depth information are input into a crowd density map prediction network together, the depth information is corrected by using the significance information, and the corrected depth information is used for guiding the density map prediction network to train so as to generate a density map prediction model.
Further, the significance information is generated by predicting a significance map of the input human population sample image by using a significance prediction model, and the significance prediction model is constructed by the following method:
carrying out Gaussian blur on head annotation data corresponding to the input crowd sample image to generate a truth value saliency map;
predicting significance information of the input crowd sample image by using a visual significance prediction network to generate a predicted significance map;
and calculating a loss function according to the prediction significance map and the truth value significance map, adjusting network parameters through gradient back propagation, and generating a significance prediction model through iteration.
Further, the Gaussian blurring of the head annotation data corresponding to the input crowd sample image is performed with a Gaussian kernel function with a standard deviation of 19.
Further, the density map prediction network training comprises:
for an input crowd sample image R with corresponding depth map D and saliency map S, let R_l, D_l and S_l denote the feature maps output by the preceding convolutional layers at the l-th layer of the encoder; the depth features are corrected by the saliency features of the corresponding layer as follows:
V_l = sigmoid(Φ_s(S_l))
D_l = V_l ⊙ D_l
wherein Φ_s represents a 1×1 convolutional layer, V_l is the weight for encoder layer l computed by the sigmoid function, and ⊙ denotes element-wise multiplication; applying the weight V_l to D_l highlights the crowd regions and reduces the influence of the depth information of non-crowd regions;
the corrected depth information D_l is then used to weight R_l as
R_l = R_l ⊙ D_l
wherein ⊙ denotes element-wise multiplication;
R_l, D_l and S_l are then fed into the subsequent network.
Further, the density prediction network comprises: an encoding module, a depth correction and embedding module, an enhanced multi-scale module and a decoding module;
the encoding module is used for extracting multi-level features of the input image;
the depth correction and embedding module is used for correcting and fusing depth information;
the enhanced multi-scale module is used for extracting and fusing multi-scale comprehensive features;
the decoding module is used for outputting a prediction density map with the same size as the input image.
Further, the encoding module is a pre-trained VGG16 network; the enhanced multi-scale module comprises multi-branch 3 × 3 convolutions with different dilation rates, the dilated convolutions providing a larger receptive field than ordinary convolution operations; the decoding module is a 7-layer dilated convolutional network used to output a prediction density map of the same size as the input image.
Further, outputting the total number of people in the crowd sample image comprises:
generating a predicted density map of the crowd sample image with the density prediction model, and summing all pixel points of the predicted density map to obtain the total number of people in the image.
In a second aspect, the present invention also provides a people counting system based on depth information and saliency information, comprising a processor and a storage medium;
the storage medium is to store instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method introduces the crowd significance information into the crowd counting field, takes the head marking point as the eye attention point, generates a visual significance label of the crowd counting by Gaussian blur, performs training test by using a deep learning network, obtains the visual significance information of the crowd counting, and assists in the training of the crowd counting;
(2) According to the method, the mode of combining the visual saliency information and the depth information is utilized to assist people counting, the depth information can be corrected by utilizing the saliency information, interference caused by a region without the crowd information is reduced, and the counting effect is improved.
Drawings
Fig. 1 is a flowchart of a crowd counting method based on depth information and saliency information according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an overall network architecture of population counting according to an embodiment;
fig. 3 is a schematic network structure diagram of an enhanced multi-scale module according to an embodiment.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are only used for more clearly illustrating the technical solutions of the present invention, and the protection scope of the present invention is not limited thereby.
Example 1
As shown in fig. 1 to 3, a method for counting people based on depth information and saliency information includes: collecting a crowd sample image of a designated area; inputting the collected crowd sample images into a trained density map prediction model based on significance information and depth information; and outputting the total number of people in the crowd sample image.
In this embodiment, for the acquired crowd sample image, a crowd counting method based on depth information and saliency information is adopted; its application process is shown in fig. 1 and specifically involves the following steps:
Step 1) Gaussian-blur the head annotation data corresponding to the input crowd sample image to generate a truth saliency map.
In the present embodiment, the SALICON human-eye attention (fixation) prediction dataset is used as a reference; it contains 20,000 images selected from the Microsoft COCO dataset and is to date the largest dataset in the field of image eye-fixation detection. This dataset did not record eye-movement data with an eye tracker; instead, annotators on an Amazon crowdsourcing annotation platform clicked with the mouse on the positions that attracted their attention. For a crowd counting dataset, this is similar to the process of head annotation. All preprocessed mouse-click samples of the same image are then Gaussian-blurred to generate a truth saliency map. Accordingly, the head annotation data corresponding to the input sample image is blurred with a Gaussian kernel function with standard deviation 19 to generate the truth saliency map.
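The truth-saliency-map generation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the impulse-then-blur formulation, the function names, and the 3σ kernel radius are our assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # normalized 2-D Gaussian kernel; radius defaults to 3*sigma (assumption)
    if radius is None:
        radius = int(3 * sigma)
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def truth_saliency_map(shape, head_points, sigma=19.0):
    # place a unit impulse at each head annotation point, then blur with
    # a Gaussian of standard deviation 19, as stated in the patent
    h, w = shape
    sal = np.zeros((h, w))
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    for (x, y) in head_points:
        x, y = int(round(x)), int(round(y))
        # clip the kernel window at the image border
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        sal[y0:y1, x0:x1] += k[(y0 - y + r):(y1 - y + r),
                               (x0 - x + r):(x1 - x + r)]
    return sal
```

Because each kernel is normalized to unit mass, the sum of the truth map equals the number of annotated heads whenever the kernels fit fully inside the image.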
Step 2) Predict the significance information of the input crowd sample image with a visual saliency prediction network to generate a predicted saliency map; calculate a loss function from the predicted saliency map and the truth saliency map; adjust the network parameters by gradient back-propagation; and obtain a saliency prediction model by iteration. This model is then used to predict the saliency map of the input sample image, producing the predicted significance information.
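The predict → loss → back-propagate → iterate loop of step 2) can be illustrated on a toy problem. Here a single scalar weight stands in for the saliency network, trained by gradient descent on a Euclidean loss; all names and values are illustrative, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-in for the saliency network: one scalar weight w applied per pixel
X = rng.standard_normal((32, 16))   # 32 tiny "images" of 16 pixels each
w_true = 0.7                        # the mapping the "network" should learn
Y = w_true * X                      # truth saliency maps

w = 0.0                             # network parameter to be learned
lr = 0.1
for _ in range(100):                # iterate: predict, differentiate loss, step
    pred = w * X                    # predicted saliency maps
    grad = ((pred - Y) * X).mean()  # gradient of the Euclidean loss w.r.t. w
    w -= lr * grad                  # back-propagation update
```

After the loop, `w` has converged to the truth value 0.7; a real saliency network repeats the same cycle over millions of parameters.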
Step 3) Perform depth prediction on the input sample image with an image depth information prediction network to obtain the image depth information.
In the implementation of the invention, a pre-trained depth information prediction network model performs depth prediction on the input sample image; the predicted depth maps adapt well to various scene layouts and reflect the varying distance from each position to the camera.
Step 4) Input the sample image and its corresponding predicted significance information, together with the depth information, into the crowd density map prediction network; correct the depth information with the significance information; and use the corrected depth information to guide the training of the density map prediction network, generating the density map prediction model.
In this embodiment, the overall structure of the crowd density map prediction network is shown in fig. 2. The input sample image and its corresponding predicted significance information are input into the crowd density map prediction network together with the depth information. For the input sample image R with corresponding depth map D and saliency map S, let R_l, D_l and S_l denote the feature maps output by the preceding convolutional layers at the l-th layer of the encoder. The depth features are corrected by the saliency features of the corresponding layer as follows:
V_l = sigmoid(Φ_s(S_l))
D_l = V_l ⊙ D_l
wherein Φ_s represents a 1×1 convolutional layer, V_l is the weight for encoder layer l computed by the sigmoid function, and ⊙ denotes element-wise multiplication; applying the weight V_l to D_l highlights the crowd regions and reduces the influence of the depth information of non-crowd regions.
The corrected depth information D_l is then used to weight R_l as
R_l = R_l ⊙ D_l
wherein ⊙ denotes element-wise multiplication.
R_l, D_l and S_l are then fed into the subsequent network.
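Numerically, the correction is two element-wise products gated by a sigmoid. A single-channel NumPy sketch (the 1×1 convolution Φ_s collapses here to a per-pixel scalar weight `phi_s`; the function name and this simplification are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_correct_and_embed(R_l, D_l, S_l, phi_s=1.0):
    # V_l = sigmoid(Phi_s(S_l)); D_l = V_l ⊙ D_l; R_l = R_l ⊙ D_l
    V_l = sigmoid(phi_s * S_l)   # weight map derived from saliency features
    D_corr = V_l * D_l           # suppress depth response outside crowd regions
    R_weighted = R_l * D_corr    # embed corrected depth into the image features
    return R_weighted, D_corr
```

Where the saliency feature is strongly positive (crowd), V_l → 1 and the depth passes through; where it is strongly negative (no crowd), V_l → 0 and the depth's influence is removed, which is exactly the correction effect described above.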
To deal with the scale variation of crowds, most previous works adopt a multi-column network architecture; for example, MCNN uses three columns of sub-networks to extract features at different scales, but the scale diversity of the features is limited by the number of columns. To solve this problem, an enhanced multi-scale module is proposed that draws on the architectural idea of Inception and further exploits dilated convolution to perform scale enhancement on the input feature map, as shown in fig. 3.
Dilated convolution provides a larger receptive field than ordinary convolution and can capture the region around a boundary and richer context information. The head regions of people in crowd-scene images vary greatly in size, and a single receptive field cannot adapt to this variation; 3 × 3 convolutions with dilation rates d of 1, 2 and 4 are therefore used to capture features, better adapting to the diverse crowd distributions in crowd scenes.
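The receptive-field gain of dilation, and a minimal single-channel dilated convolution, can be sketched as follows (pure NumPy; `dilated_conv2d` is our illustration of the operation, not the patent's implementation):

```python
import numpy as np

def effective_kernel_size(k, d):
    # a k×k convolution with dilation rate d spans d*(k-1)+1 pixels per side
    return d * (k - 1) + 1

def dilated_conv2d(x, w, d):
    # 'valid' dilated convolution of one channel with one k×k filter
    k = w.shape[0]
    eff = effective_kernel_size(k, d)
    oh, ow = x.shape[0] - eff + 1, x.shape[1] - eff + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # sample the input with stride d inside the effective window
            out[i, j] = (x[i:i + eff:d, j:j + eff:d] * w).sum()
    return out
```

For the 3 × 3 branches with d = 1, 2 and 4, the effective kernel sizes are 3, 5 and 9 pixels, so the three branches respond to small, medium and large head scales respectively while each still uses only nine weights.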
Finally, the feature maps are input into the decoder module, a 7-layer dilated convolutional network that exploits a larger receptive field to extract deeper important information and outputs a predicted density map of the same size as the input image. The difference between the predicted density map and the label is measured by the Euclidean distance, as in formula (1):
L(Θ) = (1/2N) Σ_{i=1}^{N} || F(X_i; Θ) − D(X_i) ||_2^2        (1)
In formula (1), X_i is the input image, F(X_i) is the estimated density map, D(X_i) is the true density map, Θ denotes the network parameters, and N is the number of training samples. The network parameters are adjusted by gradient back-propagation, and the density prediction network is trained iteratively.
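The Euclidean loss over the N training samples is a plain averaged squared distance. A NumPy cross-check, assuming the 1/(2N) normalization commonly used by density-map counting networks (`density_loss` is our name):

```python
import numpy as np

def density_loss(preds, truths):
    # L = (1/2N) * sum_i || F(X_i) - D(X_i) ||_2^2 over N samples
    n = len(preds)
    return sum(float(((p - t) ** 2).sum())
               for p, t in zip(preds, truths)) / (2.0 * n)
```

For example, one all-ones 2×2 prediction against an all-zeros truth plus one perfect prediction gives L = (4 + 0) / (2·2) = 1.0.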
Step 5) To count the crowd in a single image, generate the predicted density map of the image with the density prediction model, and sum all pixel points of the predicted density map to obtain the total number of people in the image.
After a stable density prediction model is trained, the model is used to generate a predicted density map for an input image; the total number of people in the image is then obtained by summing the pixel points of the density map.
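The final count-by-summation step is a one-liner: since each head contributes unit mass to the density map, the pixel sum is the head count.

```python
import numpy as np

def total_count(density_map):
    # total people = sum over all pixels of the predicted density map
    return float(np.asarray(density_map).sum())
```

For instance, a map containing two unit-mass head responses sums to 2.0 regardless of how the mass is spread spatially.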
Example 2
Based on the crowd counting method based on the depth information and the significance information described in embodiment 1, the embodiment provides a crowd counting system based on the depth information and the significance information, which comprises a processor and a storage medium; the storage medium is used for storing instructions; the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment 1.
While embodiments of the present invention have been described, the foregoing description is exemplary rather than exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and technical principles of the described embodiments, and such modifications and variations should also be considered within the protection scope of the present invention.

Claims (9)

1. A crowd counting method based on depth information and significance information is characterized by comprising the following steps:
collecting a crowd sample image of a designated area;
inputting the collected crowd sample images into a trained density map prediction model based on significance information and depth information;
and outputting the total number of people in the crowd sample image.
2. The population counting method based on depth information and significance information according to claim 1, wherein the density map prediction model is constructed by the following method:
carrying out depth prediction on an input crowd sample image by using an image depth information prediction network to obtain image depth information;
the input crowd sample image, the corresponding prediction significance information and the depth information are input into a crowd density map prediction network together, the depth information is corrected by using the significance information, and the corrected depth information is used for guiding the density map prediction network to train so as to generate a density map prediction model.
3. The method of claim 2, wherein the saliency information is generated by prediction of a saliency map of an input crowd sample image using a saliency prediction model, the saliency prediction model being constructed by:
carrying out Gaussian blur on head annotation data corresponding to the input crowd sample image to generate a truth value significance map;
predicting significance information of the input crowd sample image by using a visual significance prediction network to generate a prediction significance map;
and calculating a loss function according to the prediction significance map and the truth value significance map, adjusting network parameters through gradient back propagation, and generating a significance prediction model through iteration.
4. The method according to claim 3, wherein the Gaussian blurring of the head annotation data corresponding to the input crowd sample image is performed with a Gaussian kernel function having a standard deviation of 19.
5. The population counting method based on depth information and significance information according to claim 2, wherein the density map prediction network training comprises:
for the input crowd sample image R with corresponding depth map D and saliency map S, let R_l, D_l and S_l denote the feature maps output by the preceding convolutional layers at the l-th layer of the encoder; the depth features are corrected with the saliency features of the corresponding layer as follows:
V_l = sigmoid(Φ_s(S_l))
D_l = V_l ⊙ D_l
wherein Φ_s represents a 1×1 convolutional layer, V_l is the weight for encoder layer l, and ⊙ denotes element-level multiplication;
the corrected depth information D_l is used to weight R_l as
R_l = R_l ⊙ D_l
wherein ⊙ denotes element-level multiplication;
R_l, D_l and S_l are then input into the subsequent network.
6. The crowd counting method based on depth information and significance information according to claim 5, wherein the density map prediction network comprises: an encoding module, a depth correction and embedding module, an enhanced multi-scale module and a decoding module;
the encoding module is used for extracting multi-level features of an input image;
the depth correction and embedding module is used for correcting and fusing depth information;
the enhanced multi-scale module is used for extracting and fusing multi-scale comprehensive features;
the decoding module is used for outputting a prediction density map with the same size as the input image.
7. The crowd counting method based on depth information and significance information according to claim 6, wherein the encoding module is a pre-trained VGG16 network; the enhanced multi-scale module comprises multi-branch 3 × 3 convolutions with different dilation rates; and the decoding module is a 7-layer dilated convolutional network used to output a prediction density map of the same size as the input image.
8. The method of claim 1, wherein outputting the total number of people in the crowd sample image comprises:
generating a predicted density map of the crowd sample image with the density prediction model, and summing all pixel points of the predicted density map to obtain the total number of people in the image.
9. A crowd counting system based on depth information and saliency information, characterized by: comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of any one of claims 1 to 8.
CN202210992920.0A 2022-08-18 2022-08-18 Crowd counting method and system based on depth information and significance information Pending CN115331171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210992920.0A CN115331171A (en) 2022-08-18 2022-08-18 Crowd counting method and system based on depth information and significance information

Publications (1)

Publication Number Publication Date
CN115331171A true CN115331171A (en) 2022-11-11

Family

ID=83925597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210992920.0A Pending CN115331171A (en) 2022-08-18 2022-08-18 Crowd counting method and system based on depth information and significance information

Country Status (1)

Country Link
CN (1) CN115331171A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456449A (en) * 2023-10-13 2024-01-26 南通大学 Efficient cross-modal crowd counting method based on specific information


Similar Documents

Publication Publication Date Title
CN110276316B (en) Human body key point detection method based on deep learning
CN113065558B (en) Lightweight small target detection method combined with attention mechanism
Ying et al. Multi-attention object detection model in remote sensing images based on multi-scale
CN112446342B (en) Key frame recognition model training method, recognition method and device
WO2019136591A1 (en) Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network
Wang et al. FE-YOLOv5: Feature enhancement network based on YOLOv5 for small object detection
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN111027505B (en) Hierarchical multi-target tracking method based on significance detection
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN112633220B (en) Human body posture estimation method based on bidirectional serialization modeling
CN108830170B (en) End-to-end target tracking method based on layered feature representation
CN113192124B (en) Image target positioning method based on twin network
CN110827320B (en) Target tracking method and device based on time sequence prediction
CN114140469B (en) Depth layered image semantic segmentation method based on multi-layer attention
CN111507184B (en) Human body posture detection method based on parallel cavity convolution and body structure constraint
Zhu et al. A multi-scale and multi-level feature aggregation network for crowd counting
CN116524062A (en) Diffusion model-based 2D human body posture estimation method
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN115331171A (en) Crowd counting method and system based on depth information and significance information
CN117351414A (en) Crowd density estimation method based on deep neural network
CN114596515A (en) Target object detection method and device, electronic equipment and storage medium
CN116778187A (en) Salient target detection method based on light field refocusing data enhancement
CN112053386B (en) Target tracking method based on depth convolution characteristic self-adaptive integration
CN115965905A (en) Crowd counting method and system based on multi-scale fusion convolutional network
Yao et al. MLP-based Efficient Convolutional Neural Network for Lane Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination