CN111737512B - Silk cultural relic image retrieval method based on depth feature region fusion - Google Patents
Silk cultural relic image retrieval method based on depth feature region fusion
- Publication number
- CN111737512B CN202010498104.5A
- Authority
- CN
- China
- Prior art keywords
- cultural relic
- target
- silk cultural
- retrieval
- silk
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/54—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Library & Information Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a silk cultural relic image retrieval method based on depth feature region fusion, characterized by comprising the following steps: classifying and learning silk cultural relic images through deep-learning global feature extraction; selecting the activation region corresponding to a given category of silk cultural relic image through neural network visualization, thereby localizing the retrieval target; fusing the features related to the target region through regional feature fusion to serve as the local descriptor of the target; and selecting the silk cultural relic image whose feature distance to the user's query picture is smallest as the retrieval result. Because the retrieval target in a silk cultural relic image usually occupies only a small portion of the image, the invention combines depth feature extraction with candidate retrieval regions to accurately localize the retrieval target and extract its fine-grained features, thereby improving silk cultural relic image retrieval performance and achieving small-target retrieval of silk cultural relic images.
Description
Technical Field
The invention relates to a retrieval method for silk cultural relic images, in particular to a retrieval method based on depth feature extraction and fine-grained region fusion, and belongs to the field of information technology.
Background
As a widely used medium, silk cultural relic image information resources have witnessed rapid development and dissemination. A silk cultural relic retrieval method based on depth feature extraction can effectively manage the rapidly growing silk cultural relic image datasets and present traditional silk cultural relics to a large number of users in digital form over the network.
Current silk cultural relic retrieval methods based on depth feature extraction mainly rely on global features: the output of the fully connected layer of a deep network is used as the feature descriptor, preserving the overall semantic information of the image. Such global methods mostly target image-classification-style retrieval tasks, and their feature extraction likewise relies on the global fully-connected-layer output. However, because a convolutional neural network mainly encodes global spatial information, the resulting features lack invariance to geometric transformations such as scale, rotation and translation and to changes in spatial layout, which limits their robustness for highly variable image retrieval. Moreover, in silk images the retrieval target occupies only a small portion of the whole image, so for this small-target retrieval problem global features can neither effectively represent the small target nor accurately localize its region.
Disclosure of Invention
The technical problem to be solved by the invention is that existing silk cultural relic retrieval methods cannot achieve small-target retrieval and localization.
In order to solve this technical problem, the invention provides a silk cultural relic image retrieval method based on depth feature region fusion, characterized by comprising the following steps:
step 1, classifying and learning silk cultural relic images by adopting a deep learning global feature extraction mode, and classifying all silk cultural relic images into different categories;
step 2, selecting an activation area corresponding to the silk cultural relic image of a certain category determined in the step 1 by adopting a neural network visualization mode, and further realizing retrieval target positioning, wherein the method comprises the following steps:
step 201, fusing the feature maps of the silk cultural relic images of the specific category determined in step 1 by using the Grad-CAM method to obtain a Grad-CAM map;
step 202, performing global average pooling on the Grad-CAM map of each category, namely taking the mean of the Grad-CAM map as its score, and retaining the Grad-CAM maps whose score exceeds a threshold, which indicates that they contain a target of the current category;
step 203, locating the specific position of the target of the corresponding category from the contour of the retained Grad-CAM map, thereby achieving target localization;
and 3, fusing the features related to the target region in a region feature fusion mode to be used as the local descriptor of the target, wherein the method comprises the following steps:
step 301, locating the detected target to obtain the convolution result within its localized region, an H × W × D tensor feature map, where H, W and D denote the height, width and number of channels of the feature map respectively;
step 302, adopting the Regional Maximum Activation of Convolutions (R-MAC) strategy: regarding the H × W × D tensor feature map as D descriptors of dimension H × W, and performing local average pooling or maximum pooling on the D descriptors to obtain a D-dimensional feature representing the target;
and step 4, obtaining the user's query picture, extracting its features with the methods of steps 2 and 3, calculating the Euclidean distance in the local feature space between the query features and the features of each category of silk cultural relic image, and selecting the category of silk cultural relic image whose features are closest to the query as the retrieval result.
Preferably, in step 1, during classification learning, the pre-trained model is fine-tuned for classification on the target data by means of transfer learning.
Preferably, in step 302, if one picture contains multiple targets, the D-dimensional features of the different targets are concatenated as the output through regional feature fusion.
Because the retrieval target in a silk cultural relic image usually occupies only a small portion of the image, the invention combines depth feature extraction with candidate retrieval regions to accurately localize the retrieval target and extract its fine-grained features, thereby improving silk cultural relic image retrieval performance and achieving small-target retrieval of silk cultural relic images.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The invention is improved on the basis of the existing method for extracting the global features based on deep learning so as to realize the small target retrieval and positioning of the silk cultural relic image.
The invention provides a silk cultural relic image retrieval method based on depth feature region fusion, which comprises the following steps:
step 1, classifying and learning the silk cultural relic images through deep-learning global feature extraction, thereby retaining the global classification information of the features. During classification learning, a pre-trained model (such as VGGNet or ResNet) is fine-tuned for classification on the target data by means of transfer learning, so that the feature maps of the fine-tuned CNN contain classification information. This classification information amounts only to coarse-grained learning; the subsequent fine-grained learning requires further fine-tuning of the network.
And 2, selecting an activation area corresponding to the silk cultural relic image of a certain category determined in the step 1 by adopting a neural network visualization mode, and further realizing retrieval target positioning.
The step 2 comprises the following steps:
Step 201, fuse the feature maps of the silk cultural relic images of the specific category determined in step 1 using the Grad-CAM (Gradient-weighted Class Activation Mapping) method to obtain a Grad-CAM map, so as to visualize the target region. The core idea is a weighted fusion of the feature maps of a chosen convolutional layer to visualize the objects of a specific class.
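As an illustration, the weighted fusion at the heart of Grad-CAM can be sketched in a few lines of numpy. This is a minimal sketch, not the patent's implementation: the function name, array shapes and random inputs are placeholders.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Weighted fusion of a conv layer's feature maps into a Grad-CAM heat map.

    feature_maps: (D, H, W) activations of the chosen convolutional layer.
    gradients:    (D, H, W) gradients of the class score w.r.t. those activations.
    """
    # Channel weights: global average of each channel's gradient.
    weights = gradients.mean(axis=(1, 2))                     # shape (D,)
    # Weighted sum over channels, then ReLU to keep class-positive regions.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    return cam

# Toy inputs standing in for a real network's activations and gradients.
rng = np.random.default_rng(0)
fmap = rng.random((8, 7, 7))
grad = rng.random((8, 7, 7))
cam = grad_cam(fmap, grad)
print(cam.shape)  # (7, 7)
```

In a real pipeline the activations and gradients would come from a backward pass through the fine-tuned CNN; here they are random arrays so the fusion step itself can be seen in isolation.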
Step 202, perform Global Average Pooling on the Grad-CAM map of each category, namely take the mean of the Grad-CAM map as its score (a vote); Grad-CAM maps whose score exceeds a threshold are retained, indicating that they contain a target of the current category.
Step 203, locate the specific position of the target of the corresponding category from the contour of the retained Grad-CAM map, achieving retrieval-target localization.
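Steps 202 and 203 amount to scoring each Grad-CAM map by its global mean and then reading a bounding box off the activated region. A minimal numpy sketch follows; the threshold values are illustrative assumptions, not values specified by the patent.

```python
import numpy as np

def locate_target(cam, score_threshold=0.1, mask_level=0.5):
    """Score a Grad-CAM map by its global mean (step 202) and, if the score
    passes the threshold, return the bounding box of the activated region
    (step 203). Returns None when the map does not contain the target."""
    if cam.max() > 0:
        cam = cam / cam.max()                      # normalise to [0, 1]
    score = cam.mean()                             # global average pooling
    if score < score_threshold:
        return None                                # no target of this category
    ys, xs = np.nonzero(cam >= mask_level)         # activated pixels
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

cam = np.zeros((8, 8))
cam[2:5, 3:6] = 1.0                                # a small activated patch
print(locate_target(cam))  # (3, 2, 5, 4)
```

The patent speaks of the contour of the retained Grad-CAM map; this sketch approximates that with a simple threshold mask plus axis-aligned bounding box, which is one common way to realize the step.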
And 3, fusing the features related to the target region in a region feature fusion mode to be used as the local descriptor of the target.
The step 3 comprises the following steps:
Step 301, locate the detected target and take the convolution result within its localized region, which is an H × W × D tensor feature map, where H, W and D denote the height, width and number of channels of the feature map respectively.
Step 302, to convert the tensor feature map into a feature vector representing the target, adopt the Regional Maximum Activation of Convolutions (R-MAC) strategy and regard the H × W × D tensor feature map as D descriptors of dimension H × W. Local average pooling or maximum pooling is then applied to the D descriptors to obtain a D-dimensional feature representing the target.
Step 303, if one picture contains multiple targets, the D-dimensional features of the different targets can be concatenated through regional feature fusion to form the output.
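Steps 302 and 303 can be sketched as follows: the R-MAC-style pooling collapses each of the D channel maps to one value, and the descriptors of multiple targets are simply concatenated. This is a numpy sketch under the patent's H × W × D layout; function names are invented for illustration.

```python
import numpy as np

def pool_region_features(region, mode="max"):
    """Step 302: collapse an H x W x D tensor feature map into a D-dimensional
    descriptor by pooling each of the D channel maps to a single value."""
    pool = region.max if mode == "max" else region.mean
    return pool(axis=(0, 1))                       # shape (D,)

def fuse_targets(regions):
    """Step 303: concatenate the D-dim descriptors of several located targets."""
    return np.concatenate([pool_region_features(r) for r in regions])

region = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # H=2, W=3, D=4
desc = pool_region_features(region)
print(desc.shape)                                  # (4,)
print(fuse_targets([region, region]).shape)        # (8,)
```

Max pooling keeps the strongest activation per channel, which is what makes the descriptor robust when the target fills only part of the localized region; average pooling is the softer alternative the patent also permits.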
Step 4, obtain the user's query picture, extract its features with the methods of steps 2 and 3, calculate the Euclidean distance in the local feature space between the query features and the features of each category of silk cultural relic image, and select the category of silk cultural relic image whose features are closest to the query as the retrieval result.
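Step 4 reduces to a nearest-neighbour search under the Euclidean metric. A toy numpy sketch is given below; the category names and descriptor values are invented purely for illustration.

```python
import numpy as np

def retrieve(query_desc, gallery):
    """Return the gallery category whose descriptor has the smallest
    Euclidean distance to the query descriptor (the patent's step 4)."""
    return min(gallery, key=lambda k: np.linalg.norm(gallery[k] - query_desc))

# Hypothetical 3-dim descriptors for three silk categories.
gallery = {
    "brocade": np.array([1.0, 0.0, 0.0]),
    "embroidery": np.array([0.0, 1.0, 0.0]),
    "damask": np.array([0.0, 0.0, 1.0]),
}
query = np.array([0.9, 0.1, 0.0])
print(retrieve(query, gallery))  # brocade
```

In practice the descriptors would be the D-dimensional (or concatenated) R-MAC features from step 3, and the gallery would be indexed offline for the whole silk cultural relic image collection.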
Claims (3)
1. A silk cultural relic image retrieval method based on depth feature region fusion is characterized by comprising the following steps:
step 1, classifying and learning silk cultural relic images by adopting a deep learning global feature extraction mode, and classifying all silk cultural relic images into different categories;
step 2, selecting an activation area corresponding to each type of silk cultural relic image determined in the step 1 by adopting a neural network visualization mode, and further realizing target positioning, wherein the method comprises the following steps:
step 201, fusing the feature maps of the silk cultural relic images of each category determined in step 1 by using the Grad-CAM method to obtain a Grad-CAM map;
step 202, performing global average pooling on the Grad-CAM map of each category, namely taking the mean of the Grad-CAM map as its score, and retaining the Grad-CAM maps whose score exceeds a threshold, which indicates that they contain a target of the current category;
step 203, locating the specific position of the target of the corresponding category from the contour of the retained Grad-CAM map, thereby achieving target localization;
and 3, fusing the features related to the target region in a region feature fusion mode to be used as the local descriptor of the target, wherein the method comprises the following steps:
step 301, locating the detected target to obtain the convolution result within its localized region, an H × W × D tensor feature map, where H, W and D denote the height, width and number of channels of the feature map respectively;
step 302, adopting the Regional Maximum Activation of Convolutions (R-MAC) strategy: regarding the H × W × D tensor feature map as D descriptors of dimension H × W, and performing local average pooling or maximum pooling on the D descriptors to obtain a D-dimensional feature representing the target;
and step 4, obtaining the user's query picture, extracting its features with the methods of steps 2 and 3, calculating the Euclidean distance in the local feature space between the query features and the features of each category of silk cultural relic image, and selecting the category of silk cultural relic image whose features are closest to the query as the retrieval result.
2. The silk cultural relic image retrieval method based on depth feature region fusion according to claim 1, wherein in step 1, during classification learning, the pre-trained model is fine-tuned for classification on the target data by means of transfer learning.
3. The silk cultural relic image retrieval method based on depth feature region fusion according to claim 1, wherein in step 302, if a picture contains multiple targets, the D-dimensional features of the different targets are concatenated as the output through regional feature fusion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010498104.5A CN111737512B (en) | 2020-06-04 | 2020-06-04 | Silk cultural relic image retrieval method based on depth feature region fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111737512A CN111737512A (en) | 2020-10-02 |
CN111737512B (en) | 2021-11-12 |
Family
ID=72649012
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010498104.5A Active CN111737512B (en) | 2020-06-04 | 2020-06-04 | Silk cultural relic image retrieval method based on depth feature region fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111737512B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112837299B (en) * | 2021-02-09 | 2024-02-27 | 浙江工业大学 | Textile image fingerprint retrieval method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3166049A1 (en) * | 2015-11-03 | 2017-05-10 | Baidu USA LLC | Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering |
CN109272011A (en) * | 2018-07-31 | 2019-01-25 | 东华大学 | Multitask depth representing learning method towards image of clothing classification |
CN110334746A (en) * | 2019-06-12 | 2019-10-15 | 腾讯科技(深圳)有限公司 | A kind of image detecting method and device |
CN110688511A (en) * | 2019-08-15 | 2020-01-14 | 深圳久凌软件技术有限公司 | Fine-grained image retrieval method and device, computer equipment and storage medium |
CN110825899A (en) * | 2019-09-18 | 2020-02-21 | 武汉纺织大学 | Clothing image retrieval method integrating color features and residual network depth features |
CN111104538A (en) * | 2019-12-06 | 2020-05-05 | 深圳久凌软件技术有限公司 | Fine-grained vehicle image retrieval method and device based on multi-scale constraint |
CN111159456A (en) * | 2019-12-30 | 2020-05-15 | 云南大学 | Multi-scale clothing retrieval method and system based on deep learning and traditional features |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909565B (en) * | 2018-09-14 | 2023-06-16 | 阿里巴巴集团控股有限公司 | Image recognition and pedestrian re-recognition method and device, electronic and storage equipment |
US11170257B2 (en) * | 2018-10-15 | 2021-11-09 | Ancestry.Com Operations Inc. | Image captioning with weakly-supervised attention penalty |
CN111177446B (en) * | 2019-12-12 | 2023-04-25 | 苏州科技大学 | Method for searching footprint image |
CN111177376B (en) * | 2019-12-17 | 2023-08-15 | 东华大学 | Chinese text classification method based on BERT and CNN hierarchical connection |
CN111104539A (en) * | 2019-12-20 | 2020-05-05 | 湖南千视通信息科技有限公司 | Fine-grained vehicle image retrieval method, device and equipment |
- 2020-06-04: CN application CN202010498104.5A, patent CN111737512B (en), status Active
Non-Patent Citations (2)
Title |
---|
Clothes Keypoints Detection with Cascaded Pyramid Network; Li Chao, et al.; Journal of Donghua University; 2020-03-31; Vol. 37, No. 3, pp. 232-236 *
Research progress on feature extraction and retrieval of fabric images based on convolutional neural networks; Sun Jie, et al.; Journal of Textile Research; December 2019; Vol. 40, No. 12, pp. 1345-1353 *
Also Published As
Publication number | Publication date |
---|---|
CN111737512A (en) | 2020-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109949317B (en) | Semi-supervised image example segmentation method based on gradual confrontation learning | |
Chen et al. | Improved saliency detection in RGB-D images using two-phase depth estimation and selective deep fusion | |
CN109598241B (en) | Satellite image marine ship identification method based on Faster R-CNN | |
JP6440303B2 (en) | Object recognition device, object recognition method, and program | |
Yap et al. | A comparative study of mobile-based landmark recognition techniques | |
US9025863B2 (en) | Depth camera system with machine learning for recognition of patches within a structured light pattern | |
Yan et al. | TrAdaBoost based on improved particle swarm optimization for cross-domain scene classification with limited samples | |
CN107291936A (en) | The hypergraph hashing image retrieval of a kind of view-based access control model feature and sign label realizes that Lung neoplasm sign knows method for distinguishing | |
CN110363071A (en) | A kind of sea ice detection method cooperateing with Active Learning and transductive SVM | |
Chen et al. | Integrated content and context analysis for mobile landmark recognition | |
Qian et al. | On combining social media and spatial technology for POI cognition and image localization | |
CN102867192B (en) | A kind of Scene Semantics moving method propagated based on supervision geodesic line | |
CN113159043A (en) | Feature point matching method and system based on semantic information | |
Liao et al. | Tag features for geo-aware image classification | |
CN114332921A (en) | Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network | |
JP4926266B2 (en) | Learning data creation device, learning data creation method and program | |
CN111737512B (en) | Silk cultural relic image retrieval method based on depth feature region fusion | |
Chen et al. | Human motion target posture detection algorithm using semi-supervised learning in internet of things | |
CN112446431A (en) | Feature point extraction and matching method, network, device and computer storage medium | |
Chen et al. | Correlation filter tracking via distractor-aware learning and multi-anchor detection | |
CN107578003A (en) | A kind of remote sensing images transfer learning method based on GEOGRAPHICAL INDICATION image | |
Liao et al. | Multi-scale saliency features fusion model for person re-identification | |
CN111144466B (en) | Image sample self-adaptive depth measurement learning method | |
CN116994034A (en) | Small target detection algorithm based on feature pyramid | |
Li et al. | A Sparse Feature Matching Model Using a Transformer towards Large‐View Indoor Visual Localization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |