CN113177546A - Target detection method based on sparse attention module - Google Patents

Target detection method based on sparse attention module

Info

Publication number
CN113177546A
CN113177546A
Authority
CN
China
Prior art keywords
sparse
input
feature
attention
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110484922.4A
Other languages
Chinese (zh)
Inventor
Chen Chunlin (陈春霖)
Ling Qiang (凌强)
Li Feng (李峰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202110484922.4A
Publication of CN113177546A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method based on a sparse attention module, which comprises the following steps. Step 1: inputting a convolution feature map into the sparse attention module. Step 2: performing a sparse position sampling operation on the feature map input in step 1, searching for the set of positions of the most expressive sparse features. Step 3: applying a convolution transformation to the feature map input in step 1, sampling it with the sparse position set to obtain sparse key-value feature pairs, and then computing the attention matrix between the query features and each key feature. Step 4: computing a weighted sum of the value features according to the attention matrix to obtain the attention-fused features, adding them to the input feature map through a shortcut connection, and outputting the feature map enhanced by the sparse attention module.

Description

Target detection method based on sparse attention module
Technical Field
The invention relates to the fields of digital image processing, target detection and deep learning, and in particular to a target detection method based on a sparse attention module.
Background
Object detection is a fundamental computer vision task, and over the last few years many advanced object detection methods have been built on convolutional networks. Under the local linear weighting operation of a conventional convolutional layer, however, it is difficult to capture global context information effectively.
Some recent work has therefore focused on fusing context information through more flexible network computing architectures. In the prior art, deformable convolution layers dynamically adjust the sampling positions of the convolution kernel; since these sampling positions can reach distant locations in image space, long-range dependencies can be modeled more effectively and context information about the surroundings can be extracted. Other scholars proposed the non-local module to model long-range dependencies; by aggregating context information from any two locations of the input feature map, it successfully applied the self-attention mechanism to visual tasks such as video classification, object detection and keypoint detection. Here, each location is associated with all other locations in the image feature space through a dense attention map, and for a given location the context information is aggregated through a weighted sum over all features. Without additional modification, non-local modules can improve the performance of existing networks in various image tasks (e.g., video classification, object detection and keypoint detection).
Although non-local networks perform excellently, they introduce considerable extra computation and GPU memory footprint, because a non-local operation requires a dense attention map describing the relationship between every pair of locations of the input feature map. For example, given an input feature map with spatial resolution H × W, a non-local operation computes an attention map of size (HW × HW); as the spatial resolution of the input grows, the attention map grows quadratically (doubling H and W quadruples HW and enlarges the attention map sixteen-fold), so the required computation and GPU memory are high. This is particularly acute for object detection: to detect objects of all scales in the input image, the input resolution is typically large, so the convolution feature maps in the network typically have high resolution. In practical applications, a non-local based detection network therefore incurs high computational complexity and a very large GPU memory cost. This memory-unfriendly computing mechanism limits the application of such non-local networks.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a target detection method based on a sparse attention module, which captures long-range dependencies in image space and improves the context information extraction capability of the model. For a given input feature map, the relationship between the query and key elements is modeled by dynamically selecting a set of sparse point locations found by searching for local response peaks in a heatmap. Using the obtained sparse point locations, the sparse attention module models long-range dependencies well and greatly improves target detection performance; moreover, the module is very lightweight, with the extra GPU memory and computation it introduces amounting to less than 2% of those of a conventional non-local module. Such a sparse attention module can be easily inserted into various target detection frameworks, yielding significant improvements in detection results at almost negligible computational and memory overhead.
The sparse attention module improves the expressive power of the detection network's feature extraction and the model's ability to extract context information. The proposed module can be easily inserted into a generic detection framework, yielding a better balance of speed and accuracy.
The technical scheme of the invention is as follows: a target detection method based on a sparse attention module, specifically comprising the following steps:
Step 1: inputting the convolution feature map into the sparse attention module;
Step 2: performing a sparse position sampling operation on the convolution feature map input in step 1, searching for the set of positions of the most expressive sparse features;
Step 3: applying a convolution transformation to the convolution feature map input in step 1, sampling it with the sparse position set to obtain sparse key-value feature pairs, and then computing the attention matrix between the query features and each key feature;
Step 4: computing a weighted sum of the value features according to the attention matrix to obtain the attention-fused features, adding them to the input feature map through a shortcut connection, and outputting the feature map enhanced by the sparse attention module.
The operation of the sparse attention module is mathematically represented as:
Z = X + W_z(Y)
Y = softmax(Q^T s(Q)) s(V)
where Q = θ(X) and V = g(X) are each obtained by passing the input features X through a 1×1 convolution layer, and W_z(·) denotes a 1×1 convolution layer; s(·) ∈ R^(C'×N) is a sparse sampling operation that samples the N most representative features from the given HW feature vectors; Z is the output feature and Y is the attention-fused feature.
Further, the sparse sampling operation of step 2 includes: in the sparse position search block, a channel-wise mean operation is applied to the input features X, generating a feature response heatmap H_p ∈ R^(H×W); H_p is a matrix representing the input feature response. The key locations are then obtained by searching for the local peaks of these responses:
P = { i | x_i = max_{j∈Ω_i} x_j }
where i is the spatial index of a feature, ranging from 1 to HW, and Ω_i is a window centered at position i. If x_i is the maximum over its neighboring pixels, then i is the position of a local peak in the response heatmap. All local peak positions constitute a set P describing the locations of the most valuable features in the input feature map.
Further, step 3 includes: given an input convolution feature map X ∈ R^(C×H×W), where H is the image height, W is the image width, and C is the number of convolution channels, the feature X is first passed through two 1×1 convolution layers θ(·) and g(·) to obtain two feature maps Q and V, thereby reducing the number of convolution channels from C to C'.
Further, step 4 specifically includes: the attention-fused feature is expressed mathematically as:
Y = softmax(Q^T s(Q)) s(V)
where Q = θ(X) and V = g(X) are each obtained by passing the input features X through a 1×1 convolution layer; s(·) ∈ R^(C'×N) is a sparse sampling operation that samples the N most representative features from the given HW feature vectors.
The module can be easily inserted into any existing target detection architecture through a residual connection, without destroying the feature extraction capability of the pre-trained detection network, and can be expressed as:
Z = X + W_z(Y)
where X is the original input feature, Z is the final output feature of the attention module, and W_z(·) denotes a 1×1 convolution layer. When the module is embedded into a detection network, the weight of W_z(·) is initialized to zero in order to preserve the feature extraction performance of the pre-trained base network.
The method further comprises step 5: inserting the sparse attention module into the backbone of a generic target detection network to construct a new detection network.
Advantageous effects
(1) The invention improves the context information extraction capability
It is well known that context information about surrounding objects is very beneficial for object detection, especially for the identification and localization of targets. Information about distant targets in image space can aid recognition of the current target, so such context information can be captured by strengthening the long-range dependencies of image space. The sparse attention module provided by the invention effectively captures long-range dependencies in image space and improves the model's context extraction capability. The module selects the most representative locations for long-range dependency modeling by searching for local peaks in the input feature response heatmap. It can be conveniently inserted into a generic target detection framework and stably improves detection accuracy.
(2) The invention reduces the time consumption, memory occupation and parameter quantity of model processing
The improvement provided by the invention is an optimization of the dense attention module and is very simple and effective. For a given input feature map, the relationship between the query and key elements is modeled by dynamically selecting a set of sparse point locations found by searching for local response peaks in a heatmap. Using the obtained sparse point locations, the proposed sparse attention module models long-range dependencies well and greatly improves target detection performance. The module is very lightweight: the extra GPU memory and computation it introduces are less than 2% of those of a conventional non-local module, and its parameter count is also greatly reduced, giving it high value for practical industrial use.
Drawings
FIG. 1: the sparse attention module;
FIG. 2: the sparse sampling operation;
FIG. 3: application of the sparse attention module in a generic detection network;
FIG. 4: an example of detection results.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
According to an embodiment of the present invention, a target detection method based on a sparse attention module is provided, specifically comprising the following steps:
Step 1: inputting the convolution feature map into the sparse attention module;
Step 2: performing a sparse position sampling operation on the convolution feature map input in step 1, searching for the set of positions of the most expressive sparse features;
Step 3: applying a convolution transformation to the convolution feature map input in step 1, sampling it with the sparse position set to obtain sparse key-value feature pairs, and then computing the attention matrix between the query features and each key feature;
Step 4: computing a weighted sum of the value features according to the attention matrix to obtain the attention-fused features, adding them to the input feature map through a shortcut connection, and outputting the feature map enhanced by the sparse attention module.
According to an embodiment of the present invention, the processing flow of the sparse attention module is shown in FIG. 1. Given an input feature X ∈ R^(C×H×W), where H is the image height, W is the image width, and C is the number of convolution channels, the feature X is first passed through two 1×1 convolution layers θ(·) and g(·) to obtain two feature maps Q and V, thereby reducing the number of convolution channels from C to C'. To accommodate the matrix multiplication of the attention fusion operation, the last two dimensions of Q and V are each flattened, yielding matrices in R^(C'×HW). To improve computational efficiency, in one embodiment of the present invention C' may be chosen as C/r to reduce the number of channels of both features, where r is the reduction rate. To best balance model inference speed and detection accuracy, r = 4 in practice.
The operation of the sparse attention module may be mathematically expressed as:
Z = X + W_z(Y)
Y = softmax(Q^T s(Q)) s(V)
where Q = θ(X) and V = g(X) are each obtained by passing the input features X through a 1×1 convolution layer, and W_z(·) denotes a 1×1 convolution layer; s(·) ∈ R^(C'×N) is a sparse sampling operation that samples the N most representative features from the given HW feature vectors; Z is the output and Y is the attention-fused feature.
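For concreteness, the following is a minimal PyTorch-style sketch of the attention fusion Y = softmax(Q^T s(Q)) s(V). It is an illustrative reading of the formulas above rather than the patented implementation; the class name, the reduction argument and the sparse index tensor idx (assumed to come from the position search described below) are all hypothetical.

import torch
import torch.nn as nn

class SparseAttentionFusion(nn.Module):
    """Sketch of Y = softmax(Q^T s(Q)) s(V); illustrative, not the patented code."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        c_red = channels // reduction                            # C' = C / r, r = 4 in practice
        self.theta = nn.Conv2d(channels, c_red, kernel_size=1)   # theta(.), produces Q
        self.g = nn.Conv2d(channels, c_red, kernel_size=1)       # g(.), produces V

    def forward(self, x: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); idx: (B, N) flat indices of the sparse position set
        b, _, h, w = x.shape
        q = self.theta(x).flatten(2)                             # (B, C', HW)
        v = self.g(x).flatten(2)                                 # (B, C', HW)
        gather_idx = idx.unsqueeze(1).expand(-1, q.size(1), -1)  # (B, C', N)
        k = q.gather(2, gather_idx)                              # s(Q): sampled key features
        sv = v.gather(2, gather_idx)                             # s(V): sampled value features
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)      # (B, HW, N) attention matrix
        y = (attn @ sv.transpose(1, 2)).transpose(1, 2)          # (B, C', HW)
        return y.reshape(b, -1, h, w)                            # attention-fused feature Y

Note that the keys are gathered from Q itself rather than produced by a separate convolution, reflecting the feature sharing mechanism discussed below, and that the attention matrix has size (HW × N) rather than (HW × HW).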
Compared with a conventional dense attention module, the sparse attention module proposed by the present invention has two main differences: (1) it introduces a sparse sampling operation s(·) to sample N elements as the key and value elements, which correspond to K and V in a conventional attention module; (2) it extracts the key elements K from the feature map Q, so that the key elements share the same input feature as the query elements, whereas a conventional attention module extracts a new feature K using a separate convolution transform. This feature sharing mechanism between query and key elements causes almost no degradation in detection performance while greatly reducing the parameters and computational load of the entire module.
The two differences described above make the proposed sparse attention module more lightweight and very economical in GPU memory. It can be easily inserted into any existing target detection architecture through a residual connection, without destroying the feature extraction capability of the pre-trained detection network, and can be expressed as:
Z = X + W_z(Y)
where X is the original input feature and W_z(·) denotes a 1×1 convolution layer. To preserve the feature extraction performance of the pre-trained base network when the module is embedded into a detection network, the weight of W_z(·) is usually initialized to zero.
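For concreteness, a sketch of this residual insertion follows, reusing the SparseAttentionFusion sketch above together with a sparse_positions helper that is sketched after the sampling description below; both names are hypothetical, and this is one plausible reading rather than the patented code.

import torch.nn as nn

class SparseAttentionBlock(nn.Module):
    """Sketch of Z = X + W_z(Y); W_z starts at zero, so the block is an
    identity mapping when first embedded into a pre-trained network."""

    def __init__(self, channels: int, reduction: int = 4, rho: float = 0.01):
        super().__init__()
        self.rho = rho
        self.fusion = SparseAttentionFusion(channels, reduction)
        self.w_z = nn.Conv2d(channels // reduction, channels, kernel_size=1)
        nn.init.zeros_(self.w_z.weight)       # zero init: Z = X at the start of training
        nn.init.zeros_(self.w_z.bias)

    def forward(self, x):
        idx = sparse_positions(x, self.rho)   # sparse position set P_N (sketched below)
        y = self.fusion(x, idx)               # attention-fused feature Y with C' channels
        return x + self.w_z(y)                # Z = X + W_z(Y), back to C channels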
The process of the sparse sampling operation according to one embodiment of the present invention is illustrated in FIG. 2. The input features X ∈ R^(C×H×W) are first fed into a sparse position search block, which searches for the N positions P_N that are most representative of the input features. P_N is then used to sample the features at the corresponding N positions of Q or V.
In the sparse position search block, a channel-wise mean operation is applied to the input features X, generating a feature response heatmap H_p ∈ R^(H×W). H_p is a matrix representing the input feature response. In H_p, positions containing more valuable information respond more strongly, while positions containing less important information produce weaker responses. Positions of the feature response heatmap with very strong responses typically correspond to salient object edges. These edges provide the most valuable cues about object locations, which helps to accurately regress the bounding box of each object. The key locations can then be found by searching for the local peaks of these responses:
P = { i | x_i = max_{j∈Ω_i} x_j }
where i is the spatial index of a feature, ranging from 1 to HW, and Ω_i is a 3×3 window centered at position i. If x_i is the maximum over its neighboring pixels, then i is the position of a local peak in the response heatmap. All local peak positions constitute a set P describing the locations of the most valuable features in the input feature map.
To improve the efficiency of the sparse attention module, the local peaks are sorted by their response strength, and only the N peaks with the strongest responses are kept for sparse sampling. The resulting sparse position set is denoted P_N, with N = ρHW, where ρ is set to 0.01 in our experiments.
By sampling the key and value elements only at the sparse position set P_N, the sparse attention module reduces the attention map from (HW × HW) to (HW × ρHW), making it far more efficient in memory and computation than a conventional non-local module.
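A minimal sketch of this sparse position search, assuming a PyTorch-style implementation (the helper name sparse_positions is hypothetical): the channel-wise mean produces H_p, a 3×3 max-pooling comparison marks the local peaks, and a top-k selection keeps the N = ρHW strongest of them.

import torch
import torch.nn.functional as F

def sparse_positions(x: torch.Tensor, rho: float = 0.01) -> torch.Tensor:
    """Return flat indices of the N = rho*H*W strongest local peaks of H_p."""
    b, c, h, w = x.shape
    heat = x.mean(dim=1, keepdim=True)                 # H_p: channel-wise mean, (B, 1, H, W)
    local_max = F.max_pool2d(heat, kernel_size=3, stride=1, padding=1)
    peak_mask = heat == local_max                      # x_i equals the max of its 3x3 window
    scores = torch.where(peak_mask, heat, torch.full_like(heat, float("-inf")))
    n = max(1, int(rho * h * w))                       # N = rho * H * W strongest peaks
    _, idx = scores.flatten(2).topk(n, dim=-1)         # (B, 1, N) flat spatial indices
    return idx.squeeze(1)                              # P_N as indices into the HW positions

If an image yields fewer than N local peaks, the top-k here would also return some non-peak positions; a practical implementation would clamp N to the number of peaks actually found.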
According to an embodiment of the present invention, a target detection network may also be constructed based on the sparse attention module proposed by the present invention.
The sparse attention module proposed by the present invention is a practical, general-purpose module that can be naturally plugged into existing detection networks. ResNet is currently widely used in various detection frameworks, such as Faster R-CNN and RetinaNet, so we insert the proposed sparse attention block into the residual modules of ResNet for feature enhancement.
As shown in fig. 3, the sparse attention module of the present invention is applied to a general detection network, where SA Block is the sparse attention module proposed in the present invention.
In general, a conventional non-local module is used only before the last residual module in the c4 stage of ResNet, given the constraints of computational load and GPU memory consumption. The sparse attention module provided by the invention, however, is lighter and occupies less memory, so multiple such modules can be freely inserted into the backbone network.
Generally, in the whole detection framework, the sparse attention module is added after the middle 3 × 3 convolution layer of the residual module, as shown in fig. 3.
In particular, the invention inserts sparse attention modules at the c3, c4 and c5 stages of ResNet. When a pre-trained model is used to initialize the base network, an additional sparse attention module may destroy the feature extraction capability of the pre-trained network, preventing full use of the pre-trained weights. Therefore, the sparse attention module should produce zero output at the start of training to effectively preserve the feature extraction capability of the pre-trained network; we thus initialize the weights of its last 1×1 convolution layer to 0.
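As an illustrative sketch of this wiring, assuming torchvision's ResNet-50 (whose layer2 to layer4 correspond to stages c3 to c5) and the SparseAttentionBlock sketched earlier; this is one plausible arrangement, not the patented implementation.

import torch.nn as nn
from torchvision.models import resnet50

def insert_sparse_attention(model: nn.Module) -> nn.Module:
    # append an SA block right after the middle 3x3 convolution of every
    # bottleneck block in stages c3-c5 (torchvision's layer2, layer3, layer4)
    for stage in (model.layer2, model.layer3, model.layer4):
        for block in stage:
            mid_channels = block.conv2.out_channels
            block.conv2 = nn.Sequential(block.conv2, SparseAttentionBlock(mid_channels))
    return model

model = insert_sparse_attention(resnet50(weights="IMAGENET1K_V1"))

Because W_z starts at zero, the modified network reproduces the pre-trained backbone exactly at the first forward pass, matching the zero-output initialization described above.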
According to an embodiment of the present invention, after the detection network is trained, it can be used in practical application scenarios to detect objects of interest in input pictures. An example of detection results is shown in FIG. 4.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to be protected, provided they do not depart from the spirit and scope of the present invention as defined by the appended claims.

Claims (6)

1. A target detection method based on a sparse attention module, characterized by specifically comprising the following steps:
Step 1: inputting the convolution feature map into the sparse attention module;
Step 2: performing a sparse position sampling operation on the convolution feature map input in step 1, searching for the set of positions of the most expressive sparse features;
Step 3: applying a convolution transformation to the convolution feature map input in step 1, sampling it with the sparse position set to obtain sparse key-value feature pairs, and then computing the attention matrix between the query features and each key feature;
Step 4: computing a weighted sum of the value features according to the attention matrix to obtain the attention-fused features, adding them to the input feature map through a shortcut connection, and outputting the feature map enhanced by the sparse attention module.
2. The sparse attention module based object detection method of claim 1,
wherein the operation of the sparse attention module is mathematically represented as:
Z = X + W_z(Y)
Y = softmax(Q^T s(Q)) s(V)
where Q = θ(X) and V = g(X) are each obtained by passing the input features X through a 1×1 convolution layer, and W_z(·) denotes a 1×1 convolution layer; s(·) ∈ R^(C'×N) is a sparse sampling operation that samples the N most representative features from the given HW feature vectors; Z is the output feature and Y is the attention-fused feature.
3. The sparse attention module based object detection method of claim 1,
wherein the sparse sampling operation of step 2 comprises:
in the sparse position search block, a channel-wise mean operation is applied to the input features X, generating a feature response heatmap H_p ∈ R^(H×W); H_p is a matrix representing the input feature response; the key locations are then obtained by searching for the local peaks of these responses:
P = { i | x_i = max_{j∈Ω_i} x_j }
where i is the spatial index of a feature, ranging from 1 to HW, and Ω_i is a window centered at position i; if x_i is the maximum over its neighboring pixels, then i is the position of a local peak in the response heatmap; all local peak positions constitute a set P describing the locations of the most valuable features in the input feature map.
4. The sparse attention module based object detection method of claim 1,
wherein step 3 further comprises: given an input convolution feature map X ∈ R^(C×H×W), where H is the image height, W is the image width, and C is the number of convolution channels, the feature X is first passed through two 1×1 convolution layers θ(·) and g(·) to obtain two feature maps Q and V, thereby reducing the number of convolution channels from C to C'.
5. The sparse attention module based object detection method of claim 1,
wherein step 4 specifically comprises: the attention-fused feature is expressed mathematically as:
Y = softmax(Q^T s(Q)) s(V)
where Q = θ(X) and V = g(X) are each obtained by passing the input features X through a 1×1 convolution layer; s(·) ∈ R^(C'×N) is a sparse sampling operation that samples the N most representative features from the given HW feature vectors;
the module is inserted into any existing target detection architecture through a residual connection, without destroying the feature extraction capability of the pre-trained detection network, and is expressed as:
Z = X + W_z(Y)
where X is the original input feature, Z is the final output feature of the attention module, and W_z(·) denotes a 1×1 convolution layer; when the module is embedded into a detection network, the weight of W_z(·) is initialized to zero in order to preserve the feature extraction performance of the pre-trained base network.
6. The sparse attention module based object detection method of claim 1, further comprising:
step 5: inserting the sparse attention module into the backbone of a generic target detection network to construct a new detection network.
CN202110484922.4A 2021-04-30 2021-04-30 Target detection method based on sparse attention module Pending CN113177546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110484922.4A CN113177546A (en) 2021-04-30 2021-04-30 Target detection method based on sparse attention module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110484922.4A CN113177546A (en) 2021-04-30 2021-04-30 Target detection method based on sparse attention module

Publications (1)

Publication Number Publication Date
CN113177546A (en) 2021-07-27

Family

ID=76925880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110484922.4A Pending CN113177546A (en) 2021-04-30 2021-04-30 Target detection method based on sparse attention module

Country Status (1)

Country Link
CN (1) CN113177546A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567997A (en) * 2012-01-04 2012-07-11 西安电子科技大学 Target detection method based on sparse representation and visual cortex attention mechanism
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
US20200272823A1 (en) * 2017-11-14 2020-08-27 Google Llc Weakly-Supervised Action Localization by Sparse Temporal Pooling Network
CN109492232A (en) * 2018-10-22 2019-03-19 内蒙古工业大学 A kind of illiteracy Chinese machine translation method of the enhancing semantic feature information based on Transformer
CN109977774A (en) * 2019-02-25 2019-07-05 中国科学技术大学 A kind of fast target detection method based on adaptive convolution
CN110738247A (en) * 2019-09-30 2020-01-31 中国科学院大学 fine-grained image classification method based on selective sparse sampling
CN111444913A (en) * 2020-03-22 2020-07-24 华南理工大学 License plate real-time detection method based on edge-guided sparse attention mechanism
CN111833246A (en) * 2020-06-02 2020-10-27 天津大学 Single-frame image super-resolution method based on attention cascade network
CN111931795A (en) * 2020-09-25 2020-11-13 湖南大学 Multi-modal emotion recognition method and system based on subspace sparse feature fusion
CN112464097A (en) * 2020-12-07 2021-03-09 广东工业大学 Multi-auxiliary-domain information fusion cross-domain recommendation method and system

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
ADITYA CHATTOPADHAY et al.: "Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks", 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) *
BIAO ZHANG et al.: "Sparse Attention with Linear Units", arXiv *
REWON CHILD et al.: "Generating Long Sequences with Sparse Transformers", arXiv *
SHUN-YAO SHIH et al.: "Temporal Pattern Attention for Multivariate Time Series Forecasting", arXiv *
XIZHOU ZHU et al.: "Deformable DETR: Deformable Transformers for End-to-End Object Detection", arXiv *
YAO DING et al.: "Selective Sparse Sampling for Fine-Grained Image Recognition", 2019 IEEE/CVF International Conference on Computer Vision (ICCV) *
YUE CAO et al.: "GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond", 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) *
ZHANG Ruyi: "Research on video object detection algorithms based on deep learning", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019169A (en) * 2022-05-31 2022-09-06 海南大学 Single-stage water surface small target detection method and device

Similar Documents

Publication Publication Date Title
CN111639692B (en) Shadow detection method based on attention mechanism
Song et al. Monocular depth estimation using laplacian pyramid-based depth residuals
CN112001914A (en) Depth image completion method and device
Akey Sungheetha Classification of remote sensing image scenes using double feature extraction hybrid deep learning approach
CN115497005A (en) YOLOV4 remote sensing target detection method integrating feature transfer and attention mechanism
Heo et al. Monocular depth estimation using whole strip masking and reliability-based refinement
CN110162657B (en) Image retrieval method and system based on high-level semantic features and color features
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN113554032B (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN116758130A (en) Monocular depth prediction method based on multipath feature extraction and multi-scale feature fusion
CN110633640A (en) Method for identifying complex scene by optimizing PointNet
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
CN111899203A (en) Real image generation method based on label graph under unsupervised training and storage medium
CN115713632A (en) Feature extraction method and device based on multi-scale attention mechanism
Patil et al. Semantic segmentation of satellite images using modified U-Net
Ni et al. Category-level assignment for cross-domain semantic segmentation in remote sensing images
CN112800932B (en) Method for detecting remarkable ship target in offshore background and electronic equipment
CN112906800B (en) Image group self-adaptive collaborative saliency detection method
CN108154522B (en) Target tracking system
CN113177546A (en) Target detection method based on sparse attention module
Singh et al. Wavelet based histogram of oriented gradients feature descriptors for classification of partially occluded objects
Pang et al. PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
Mujtaba et al. Automatic solar panel detection from high-resolution orthoimagery using deep learning segmentation networks
Pal et al. MAML-SR: Self-adaptive super-resolution networks via multi-scale optimized attention-aware meta-learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20210727)