CN117891964A - Cross-modal image retrieval method based on feature aggregation - Google Patents

Cross-modal image retrieval method based on feature aggregation

Info

Publication number
CN117891964A
Authority
CN
China
Prior art keywords
footprint
image
gray
dust
layer
Prior art date
Legal status
Pending
Application number
CN202410059094.3A
Other languages
Chinese (zh)
Inventor
张艳 (Zhang Yan)
吴红英 (Wu Hongying)
王年 (Wang Nian)
汪思彤 (Wang Sitong)
严毅 (Yan Yi)
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University
Priority to CN202410059094.3A
Publication of CN117891964A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-modal image retrieval method based on feature aggregation, which comprises the following steps: processing the acquired footprint images with a CPU; feeding the footprint data set into a multi-stage feature aggregation network for optimization and loading the gray-scale footprint images of the search library; acquiring a dust footprint image to be queried; calculating the similarity between the dust footprint image to be queried and the gray-scale footprint images in the search library; and outputting the personal information associated with the gray-scale footprint image in the search library that is most similar to the dust footprint image to be queried. The invention relates to the field of image processing, and in particular to a cross-modal image retrieval method based on feature aggregation, which effectively reduces the modal difference between dust footprints and gray-scale footprints and improves the accuracy of cross-modal footprint image retrieval.

Description

Cross-modal image retrieval method based on feature aggregation
Technical Field
The invention relates to the field of image processing, and in particular to a cross-modal image retrieval method based on feature aggregation.
Background
At present, gray-scale footprint databases are established for criminal suspects: dust footprints left at a scene are compared against the footprint database, and the result depends heavily on expert experience. Manual comparison also consumes considerable human resources and time; its efficiency is low and its practical effect is unsatisfactory. Therefore, to improve comparison efficiency and accuracy, a cross-modal image retrieval method between dust footprints and gray-scale footprints is needed.
However, the dust footprint and gray-scale footprint modalities exhibit a large modal difference, while footprint images of different subjects within the same modality can look similar. In addition, the cluttered background of dust footprints hinders the extraction of highly representative features. These factors make cross-modal retrieval of footprint images challenging, so extracting highly representative features of the same subject across modalities is of great help to footprint retrieval.
Disclosure of Invention
(I) Technical problems to be solved
In view of the deficiencies of the prior art, the invention provides a cross-modal image retrieval method based on feature aggregation, which effectively reduces the modal difference between dust footprints and gray-scale footprints and improves the accuracy of cross-modal footprint image retrieval.
(II) Technical scheme
In order to achieve the above purpose, the invention is realized by the following technical scheme: a cross-modal image retrieval method based on feature aggregation comprises the following steps:
S1: processing the acquired footprint images with a CPU;
S2: feeding the footprint data set into a multi-stage feature aggregation network for optimization, and loading the gray-scale footprint images of the search library;
S3: acquiring a dust footprint image to be queried;
S4: calculating the similarity between the dust footprint image to be queried and the gray-scale footprint images in the search library;
S5: outputting the personal information associated with the gray-scale footprint image in the search library that is most similar to the dust footprint image to be queried;
wherein step S2 comprises the following steps:
First, preprocess the acquired footprint images: barefoot dust footprint images are collected by photographing footprints in the field environment, and barefoot gray-scale footprint images are collected with an optical sensor; image segmentation is applied to the dust footprints to separate the bare footprint from the background, yielding a dust footprint data set suitable for extracting highly representative features; all footprint images are resized to 512×512, and data augmentation is applied to the gray-scale footprint data set;
Second, train the network model: the footprint images of seven of the nine subjects are input into the network model as training samples, the footprint images of the remaining two subjects serve as test samples, and a search library is built from the gray-scale footprint images;
The specific steps for training the network model in the second step are as follows:
(1) Construct a hybrid attention module: ResNet is adopted as the backbone network, and the inputs of the hybrid attention module are the low-level feature map before a backbone layer and the high-level feature map after that layer. First, the low-level and high-level feature maps are each fed into a 1×1 convolution layer, and the outputs of these two convolution layers are combined by matrix multiplication and a softmax function to compute a channel similarity matrix. The low-level feature map is then passed through another 1×1 convolution layer and multiplied with the channel similarity matrix to enhance the channel feature representation. Finally, a 1×1 convolution layer converts the feature to the size of the original high-level feature map, and the two are added to obtain the output; the spatial feature representation is enhanced by a similar operation on the low-level feature map;
(2) Construct the feature aggregation module: hybrid attention modules are fused between the layers of the backbone network to form the feature aggregation module, and the Layer4 stage is not used. One hybrid attention module is fused after Layer1 and two after Layer2; the first two modules take the feature maps before and after the corresponding layer as their low-level and high-level inputs, while the last fused hybrid attention module takes the original feature before Layer1 as its low-level input and the output of the preceding hybrid attention module as its high-level input;
(3) Construct a partial attention module: a partial attention module focusing on fine-grained part features is added after the backbone network. The features after Layer3 are divided into 3 non-overlapping parts by an adaptive average pooling function and fed into three 1×1 convolution layers. The outputs of the first two convolution layers are combined by matrix multiplication and normalized with a softmax activation function to obtain a weight matrix; the output of the third convolution layer, a fine-grained part feature, is weighted and summed with this weight matrix to obtain attention-enhanced part features. In parallel, the input features pass through a global average pooling layer and a batch normalization layer to obtain a feature vector, and the two features are added to obtain the output feature.
Preferably, in step S3, the trained network model and a metric function are used to calculate the Manhattan distance between the dust footprint image to be queried and each gray-scale footprint image in the search library, and this distance measures their similarity.
Preferably, when the footprint images are acquired in the first step, each subject contributes 42 barefoot dust footprint images and 6 barefoot gray-scale footprint images.
Preferably, the augmentation of the gray-scale footprint data in the first step includes horizontal flipping, clockwise rotation by 10°, and counterclockwise rotation by 10°.
(III) Beneficial effects
The invention provides a cross-modal image retrieval method based on feature aggregation, which has the following beneficial effects:
The invention extracts highly representative features from barefoot dust footprint images and barefoot gray-scale footprint images, and uses deep learning to make cross-modal footprint retrieval intelligent. The method extracts features and computes similarity at low cost, addressing problems such as the cross-modal difference between footprint images. Compared with manual retrieval, it improves comparison efficiency and accuracy to a certain extent and effectively realizes cross-modal image retrieval between dust footprints and gray-scale footprints. The invention is of positive significance for dust footprint comparison and identification by means of artificial intelligence.
Drawings
FIG. 1 is a flow chart of an image retrieval method of the present invention;
FIG. 2 is a framework diagram of network optimization and cross-modal retrieval in accordance with the present invention;
FIG. 3 is a block diagram of a feature aggregation module of the present invention;
FIG. 4 is a framework diagram of the hybrid attention module of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to FIGS. 1-4, the present invention provides a technical solution:
As shown in FIG. 2, network optimization and cross-modal retrieval are completed through the following steps:
First, preprocess the acquired footprint images: barefoot dust footprint images are collected by photographing footprints in the field environment, and barefoot gray-scale footprint images are collected with an optical sensor, each subject contributing 42 barefoot dust footprint images and 6 barefoot gray-scale footprint images. Image segmentation is applied to the dust footprints to separate the bare footprint from the background, yielding a dust footprint data set suitable for extracting highly representative features. All footprint images are resized to 512×512, and the gray-scale footprint data set is augmented by horizontal flipping, clockwise rotation by 10°, and counterclockwise rotation by 10°, as sketched below;
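A minimal preprocessing sketch in PyTorch/torchvision, assuming offline augmentation; the patent specifies only the 512×512 resize, the horizontal flip, and the ±10° rotations, so everything else here (names, tensor conversion) is illustrative:

```python
from torchvision import transforms
from torchvision.transforms import functional as TF

# Applied to every footprint image of both modalities.
base_transform = transforms.Compose([
    transforms.Resize((512, 512)),   # unify all footprint images to 512x512
    transforms.ToTensor(),
])

def augment_gray(img):
    """Expand one gray-scale footprint into the four variants named above.
    Note: a positive angle in torchvision rotates counter-clockwise."""
    return [img,
            TF.hflip(img),           # horizontal flip
            TF.rotate(img, -10),     # clockwise rotation by 10 degrees
            TF.rotate(img, 10)]      # counterclockwise rotation by 10 degrees
```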
Second, train the network model: the footprint images of seven of the nine subjects are input into the network model as training samples, the footprint images of the remaining two subjects serve as test samples, and a search library is built from the gray-scale footprint images. At test time, the dust footprint image to be queried is input into the network, which extracts query features and compares them with the features in the search library; the similarity is determined by the Manhattan distance, a smaller distance indicating greater similarity, and the prediction accuracy is finally reported.
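The patent does not state the training objective. Purely as a hedged illustration, the following sketch pairs an identity classification loss with a triplet loss, a common combination in cross-modal retrieval networks; `model`, `classifier`, and the batch layout are assumptions, not the authors' method:

```python
import torch
import torch.nn as nn

def train_step(model, classifier, optimizer, images, labels):
    """One hypothetical optimization step. Assumes `model` ends in a head that
    yields one embedding per image, and that the loader packs each batch as
    equal thirds of anchors, positives, and negatives."""
    feats = model(images)                              # (B, D) embeddings
    id_loss = nn.functional.cross_entropy(classifier(feats), labels)
    anchor, positive, negative = feats.chunk(3)
    tri_loss = nn.functional.triplet_margin_loss(anchor, positive, negative,
                                                 margin=0.3)
    loss = id_loss + tri_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```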
As shown in FIG. 4, the specific steps for training the network model are as follows:
(1) Construct a hybrid attention module: ResNet is adopted as the backbone network, and the inputs of the hybrid attention module are the low-level feature map before a backbone layer and the high-level feature map after that layer. First, the low-level and high-level feature maps are each fed into a 1×1 convolution layer, and the outputs of these two convolution layers are combined by matrix multiplication and a softmax function to compute a channel similarity matrix. The low-level feature map is then passed through another 1×1 convolution layer and multiplied with the channel similarity matrix to enhance the channel feature representation. Finally, a 1×1 convolution layer converts the feature to the size of the original high-level feature map, and the two are added to obtain the output; the spatial feature representation is enhanced by a similar operation on the low-level feature map (see the HybridAttention sketch after step (3));
(2) Construct the feature aggregation module: as shown in FIG. 3, hybrid attention modules are fused between the layers of the backbone network to form the feature aggregation module, and the Layer4 stage is not used. One hybrid attention module is added after Layer1, taking the feature maps before and after Layer1 as its inputs; two hybrid attention modules are added after Layer2, the first taking the feature maps before and after Layer2, while the last fused module takes the original feature before Layer1 as its low-level input and the output of the preceding hybrid attention module as its high-level input (see the wiring sketch after step (3));
(3) Construct a partial attention module: a partial attention module focusing on fine-grained part features is added after the backbone network. The features after Layer3 are divided into 3 non-overlapping parts by an adaptive average pooling function and fed into three 1×1 convolution layers. The outputs of the first two convolution layers are combined by matrix multiplication and normalized with a softmax activation function to obtain a weight matrix; the output of the third convolution layer, a fine-grained part feature, is weighted and summed with this weight matrix to obtain attention-enhanced part features. In parallel, the input features pass through a global average pooling layer and a batch normalization layer to obtain a feature vector, and the two features are added to obtain the output feature (a sketch also follows below);
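The three modules above can be illustrated with short PyTorch sketches. These are minimal readings of the text rather than the authors' code: channel counts assume a ResNet-50 backbone, and all class, function, and parameter names (HybridAttention, mid_ch, and so on) are illustrative. First, the hybrid attention module of step (1):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttention(nn.Module):
    """Channel branch of the hybrid attention module: a channel-similarity
    matrix built from the low- and high-level maps re-weights the low-level
    features, and the result is added back to the high-level map. The spatial
    branch (omitted) would mirror this with similarity over spatial positions."""
    def __init__(self, low_ch, high_ch, mid_ch=64):
        super().__init__()
        self.q = nn.Conv2d(low_ch, mid_ch, 1)     # 1x1 conv on the low-level map
        self.k = nn.Conv2d(high_ch, mid_ch, 1)    # 1x1 conv on the high-level map
        self.v = nn.Conv2d(low_ch, mid_ch, 1)     # 1x1 conv producing the values
        self.out = nn.Conv2d(mid_ch, high_ch, 1)  # restore the high-level size

    def forward(self, low, high):
        b, _, h, w = high.shape
        low = F.interpolate(low, size=(h, w), mode='bilinear', align_corners=False)
        q = self.q(low).flatten(2)                                    # (B, C, N)
        k = self.k(high).flatten(2)                                   # (B, C, N)
        sim = torch.softmax(torch.bmm(q, k.transpose(1, 2)), dim=-1)  # (B, C, C)
        v = self.v(low).flatten(2)
        enhanced = torch.bmm(sim, v).view(b, -1, h, w)
        return self.out(enhanced) + high   # residual addition to the high-level map
```

Next, one possible wiring for step (2), reusing HybridAttention (the interpolation inside the module absorbs the differing spatial sizes between stages):

```python
class FeatureAggregationNet(nn.Module):
    """ResNet stages Layer1-Layer3 (Layer4 unused) with one hybrid attention
    module after Layer1 and two after Layer2; the last module takes the
    pre-Layer1 feature as its low-level input."""
    def __init__(self, backbone):
        super().__init__()
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2, self.layer3 = (
            backbone.layer1, backbone.layer2, backbone.layer3)
        self.ha1 = HybridAttention(64, 256)    # maps before/after Layer1
        self.ha2 = HybridAttention(256, 512)   # maps before/after Layer2
        self.ha3 = HybridAttention(64, 512)    # pre-Layer1 map vs. ha2 output

    def forward(self, x):
        x0 = self.stem(x)                      # original feature before Layer1
        x1 = self.ha1(x0, self.layer1(x0))
        x2 = self.ha2(x1, self.layer2(x1))
        x2 = self.ha3(x0, x2)                  # last module fuses the earliest feature
        return self.layer3(x2)
```

Finally, one plausible reading of the partial attention head of step (3); the exact tensor layout is not specified, so the 3 parts are taken here as horizontal stripes of the Layer3 map:

```python
class PartialAttention(nn.Module):
    def __init__(self, ch=1024, parts=3):      # ch = ResNet-50 Layer3 output channels
        super().__init__()
        self.parts = parts
        self.q = nn.Conv2d(ch, ch, 1)          # the three 1x1 convolution layers
        self.k = nn.Conv2d(ch, ch, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.bn = nn.BatchNorm1d(ch)

    def forward(self, x):
        stripes = F.adaptive_avg_pool2d(x, (self.parts, 1))  # 3 non-overlapping parts
        q = self.q(stripes).flatten(2)                       # (B, C, 3)
        k = self.k(stripes).flatten(2)
        v = self.v(stripes).flatten(2)
        w = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (B, 3, 3) weights
        part_feat = torch.bmm(v, w).mean(dim=2)              # weighted sum -> (B, C)
        global_feat = self.bn(F.adaptive_avg_pool2d(x, 1).flatten(1))  # GAP + BN branch
        return part_feat + global_feat                       # sum gives the output feature
```

As a usage check, FeatureAggregationNet(torchvision.models.resnet50(weights=None)) applied to a 1×3×512×512 input yields a 1×1024×32×32 map, which PartialAttention reduces to a 1×1024 embedding.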
Third, cross-modal footprint retrieval: to find the subject in the footprint search library that matches a footprint image of unknown identity, the query footprint image is first provided to the trained network model. The Manhattan distance between the query image and each image in the search library is then calculated with the model's metric function, and this distance serves as the basis for measuring the similarity between the image to be detected and the images in the search library. Finally, the personal information of the closest match is returned to the user as the output, which helps find possible matching subjects in footprint comparison scenarios.
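A minimal sketch of this retrieval step, assuming the trained network yields one feature vector per image (all names are illustrative):

```python
import torch

def retrieve(query_feat, gallery_feats, gallery_ids):
    """Rank the gray-scale gallery by Manhattan (L1) distance to the query
    dust-footprint feature; the smallest distance is the best match."""
    # query_feat: (D,), gallery_feats: (N, D)
    dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats, p=1).squeeze(0)
    best = int(torch.argmin(dists))
    return gallery_ids[best], float(dists[best])   # matched person and its distance
```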
Based on the above steps, as shown in FIG. 1, the cross-modal image retrieval method based on feature aggregation specifically comprises the following operation steps:
S1: processing the acquired footprint images with a CPU;
S2: feeding the footprint data set into a multi-stage feature aggregation network for optimization, and loading the gray-scale footprint images of the search library;
S3: acquiring a dust footprint image to be queried;
S4: calculating the similarity between the dust footprint image to be queried and the gray-scale footprint images in the search library;
S5: outputting the personal information associated with the gray-scale footprint image in the search library that is most similar to the dust footprint image to be queried.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A cross-modal image retrieval method based on feature aggregation, characterized by comprising the following steps:
S1: processing the acquired footprint images with a CPU;
S2: feeding the footprint data set into a multi-stage feature aggregation network for optimization, and loading the gray-scale footprint images of the search library;
S3: acquiring a dust footprint image to be queried;
S4: calculating the similarity between the dust footprint image to be queried and the gray-scale footprint images in the search library;
S5: outputting the personal information associated with the gray-scale footprint image in the search library that is most similar to the dust footprint image to be queried;
wherein step S2 comprises the following steps:
First, preprocess the acquired footprint images: barefoot dust footprint images are collected by photographing footprints in the field environment, and barefoot gray-scale footprint images are collected with an optical sensor; image segmentation is applied to the dust footprints to separate the bare footprint from the background, yielding a dust footprint data set suitable for extracting highly representative features; all footprint images are resized to 512×512, and data augmentation is applied to the gray-scale footprint data set;
Second, train the network model: the footprint images of seven of the nine subjects are input into the network model as training samples, the footprint images of the remaining two subjects serve as test samples, and a search library is built from the gray-scale footprint images;
The specific steps for training the network model in the second step are as follows:
(1) Construct a hybrid attention module: ResNet is adopted as the backbone network, and the inputs of the hybrid attention module are the low-level feature map before a backbone layer and the high-level feature map after that layer. First, the low-level and high-level feature maps are each fed into a 1×1 convolution layer, and the outputs of these two convolution layers are combined by matrix multiplication and a softmax function to compute a channel similarity matrix. The low-level feature map is then passed through another 1×1 convolution layer and multiplied with the channel similarity matrix to enhance the channel feature representation. Finally, a 1×1 convolution layer converts the feature to the size of the original high-level feature map, and the two are added to obtain the output; the spatial feature representation is enhanced by a similar operation on the low-level feature map;
(2) Construct the feature aggregation module: hybrid attention modules are fused between the layers of the backbone network to form the feature aggregation module, and the Layer4 stage is not used. One hybrid attention module is fused after Layer1 and two after Layer2; the first two modules take the feature maps before and after the corresponding layer as their low-level and high-level inputs, while the last fused hybrid attention module takes the original feature before Layer1 as its low-level input and the output of the preceding hybrid attention module as its high-level input;
(3) Construct a partial attention module: a partial attention module focusing on fine-grained part features is added after the backbone network. The features after Layer3 are divided into 3 non-overlapping parts by an adaptive average pooling function and fed into three 1×1 convolution layers. The outputs of the first two convolution layers are combined by matrix multiplication and normalized with a softmax activation function to obtain a weight matrix; the output of the third convolution layer, a fine-grained part feature, is weighted and summed with this weight matrix to obtain attention-enhanced part features. In parallel, the input features pass through a global average pooling layer and a batch normalization layer to obtain a feature vector, and the two features are added to obtain the output feature.
2. The cross-modal image retrieval method based on feature aggregation as claimed in claim 1, wherein: in step S3, the cross-modal retrieval function provides the footprint image to be queried to the trained network model, calculates the Manhattan distance between the dust footprint image to be queried and each gray-scale footprint image in the search library using the trained network model and a metric function, and uses this distance to measure their similarity.
3. The cross-modal image retrieval method based on feature aggregation as claimed in claim 1, wherein: when the footprint images are acquired in the first step, each subject contains 42 barefoot dust footprint images and 6 barefoot gray-scale footprint images.
4. The cross-modal image retrieval method based on feature aggregation as claimed in claim 1, wherein: the augmentation of the gray-scale footprint data in the first step includes horizontal flipping, clockwise rotation by 10°, and counterclockwise rotation by 10°.
CN202410059094.3A 2024-01-16 2024-01-16 Cross-modal image retrieval method based on feature aggregation Pending CN117891964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410059094.3A CN117891964A (en) 2024-01-16 2024-01-16 Cross-modal image retrieval method based on feature aggregation


Publications (1)

Publication Number Publication Date
CN117891964A 2024-04-16

Family

ID=90646972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410059094.3A Pending CN117891964A (en) 2024-01-16 2024-01-16 Cross-modal image retrieval method based on feature aggregation

Country Status (1)

Country Link
CN (1) CN117891964A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948165A (en) * 2019-04-24 2019-06-28 吉林大学 Fine granularity feeling polarities prediction technique based on mixing attention network
CN111444889A (en) * 2020-04-30 2020-07-24 南京大学 Fine-grained action detection method of convolutional neural network based on multi-stage condition influence
CN112257601A (en) * 2020-10-22 2021-01-22 福州大学 Fine-grained vehicle identification method based on data enhancement network of weak supervised learning
CN113868449A (en) * 2021-09-22 2021-12-31 西安理工大学 Image retrieval method based on fusion of multi-scale features and spatial attention mechanism
CN115331141A (en) * 2022-08-03 2022-11-11 天津大学 High-altitude smoke and fire detection method based on improved YOLO v5
WO2023273290A1 (en) * 2021-06-29 2023-01-05 山东建筑大学 Object image re-identification method based on multi-feature information capture and correlation analysis



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination