CN115965818A - Small sample image classification method based on similarity feature fusion - Google Patents

Small sample image classification method based on similarity feature fusion

Info

Publication number: CN115965818A (application CN202310032701.2A)
Authority: CN (China)
Prior art keywords: sample, representation, image, text, feature
Legal status: Pending
Other languages: Chinese (zh)
Inventors: He Xiangnan (何向南), Wang Shuo (王硕), Lu Jinda (卢金达), Hao Yanbin (郝艳宾)
Current and original assignee: University of Science and Technology of China (USTC)
Application CN202310032701.2A filed 2023-01-10 by University of Science and Technology of China (USTC); published 2023-04-14 as CN115965818A

Classifications

    • Image Analysis (AREA)

Abstract

The invention discloses a small sample image classification method based on similarity feature fusion, which comprises the following steps: step 1, feature extraction of the input image; step 2, extraction of the text-end similarity relations; step 3, extraction of the inter-sample similarity relations; step 4, feature fusion based on text similarity; step 5, feature fusion based on sample similarity; step 6, multi-stage feature fusion; step 7, model training and testing. Based on the similarities between samples and between categories, the method fuses the features of the input small sample images with the natural image features of the base categories, which enriches the diversity of the small sample image features, perfects the category expression of the small sample images, improves the response capability of the classifier to small sample images, and thereby improves the accuracy of small sample image classification.

Description

Small sample image classification method based on similarity feature fusion
Technical Field
The invention belongs to the field of image classification, and particularly relates to a small sample image classification method based on similarity feature fusion.
Background
In recent years, convolutional neural networks (CNNs) have shown strong performance on a wide range of visual tasks, including image classification and segmentation, but they rely on large-scale labeled data for training, and labeling data at that scale requires considerable manpower and material cost, which limits their application scenarios. To address this problem, the task of small sample learning (few-shot learning, FSL) has been proposed. It aims to classify test samples from a limited number of training samples.
Currently, a pre-training approach is often adopted in the small sample learning (FSL) task. It uses a feature extractor (backbone) pre-trained on the base classes to directly extract sample features of the support classes, and uses the features of the support samples to train a classifier. Training a robust feature extractor (backbone) can effectively improve the performance of a small sample learning (FSL) model; however, designing, training, and validating a feature extractor from scratch is time-consuming and expensive. Moreover, because the base classes and the support classes are disjoint, a feature extractor (backbone) pre-trained on the base classes tends to focus on the texture and structure information of the base class samples it has learned and to ignore the details of the support samples, which leads to poor classification performance.
To address this insufficient classification performance on a small number of support samples, data-generation-based approaches synthesize new samples from the current support samples to assist the optimization of the classifier, but they ignore the difference between the base classes and the support classes and introduce extra noise during data generation, which may mislead the classifier.
Based on the above analysis, how to reduce the deviation in feature representations introduced by the differences between base categories and support categories, and between base samples and support samples, so as to improve the response capability of the classifier to the support categories, is an urgent problem for small sample learning.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a small sample image classification method based on similarity feature fusion, which can improve the accuracy of small sample image classification by directly modeling the similarity between support samples and base samples and between support categories and base categories.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a small sample image classification method based on similarity feature fusion, which is characterized by comprising the following steps of:
step 1, feature extraction of an input image:
step 1.1, acquiring a natural image set, inputting the natural image set into a pre-trained CNN model for feature extraction to obtain feature representation of a natural image and a basic category set thereof, and recording the feature representation and the basic category set as
Figure BDA0004047883950000011
Wherein it is present>
Figure BDA0004047883950000012
Represents a feature representation of the i-th natural image, and->
Figure BDA0004047883950000021
d represents the dimension of the feature representation>
Figure BDA0004047883950000022
Represents the base class to which the i-th natural image belongs, and->
Figure BDA0004047883950000023
C base Set of base classes, | C, representing a set of natural images base I denotes natureNumber of base classes of image set, N base Representing the number of natural images in each base category; />
Step 1.2, another image sample set is obtained and input into the pre-trained CNN model for feature extraction, and feature representation and support category sets of the image samples are obtained and recorded as
Figure BDA0004047883950000024
Wherein it is present>
Figure BDA0004047883950000025
Represents a feature representation of a jth image sample, and +>
Figure BDA0004047883950000026
Figure BDA0004047883950000027
Represents the support class to which the jth image sample belongs, and->
Figure BDA0004047883950000028
C novel Represents a set of support classes for the image sample and satisfies C novel ∩C base =φ,|C novel I represents the number of supported classes of an image sample, N novel Representing the number of image samples in each support category;
step 2: extracting the similarity relation of the text ends:
step 2.1, extracting a basic category set C by using a pre-trained word embedding model base Vector representation of text information of each basic category
Figure BDA0004047883950000029
Wherein it is present>
Figure BDA00040478839500000210
Vector representation representing the text information of the kth base class>
Figure BDA00040478839500000211
t represents the dimension of the vector representation;
step 2.2, extracting a support category set C by using the pre-trained word embedding model novel Vector representation of text information of each support category
Figure BDA00040478839500000212
Wherein it is present>
Figure BDA00040478839500000213
Vector representation of the text information representing the s-th support category, and->
Figure BDA00040478839500000214
Step 2.3, calculating the vector representation of the text information of the s-th support category by using the formula (1)
Figure BDA00040478839500000215
With a vector representation of the ith base category text information->
Figure BDA00040478839500000216
Is greater than or equal to>
Figure BDA00040478839500000217
And the similarity relation between the text end of the s-th support category and the text end of one basic category is used as the similarity relation between the text end of the s-th support category and the text end of one basic category, so that a text end similarity relation vector between the s-th support category and all the basic categories is obtained>
Figure BDA00040478839500000218
Figure BDA00040478839500000219
In the formula (1), the reaction mixture is,
Figure BDA00040478839500000220
represents->
Figure BDA00040478839500000221
And/or>
Figure BDA00040478839500000222
Is greater than or equal to>
Figure BDA00040478839500000223
And/or>
Figure BDA00040478839500000224
Respectively represent->
Figure BDA00040478839500000225
And/or>
Figure BDA00040478839500000226
The L2 paradigm of (1);
Step 3, extracting the inter-sample similarity relations:

calculating with equation (2) the cosine distance $r_{j,i}^{I}$ between the feature representation $x_j^{novel}$ of the j-th image sample and the feature representation $x_i^{base}$ of the i-th natural image, and taking it as the similarity between the j-th image sample and one natural image, thereby obtaining the sample similarity relation vector between the j-th image sample and all natural images, $R^{I}(j)=[r_{j,1}^{I},\ldots,r_{j,|C_{base}|\times N_{base}}^{I}]$:

$$r_{j,i}^{I}=\frac{\langle x_j^{novel},\,x_i^{base}\rangle}{\|x_j^{novel}\|_2\cdot\|x_i^{base}\|_2} \tag{2}$$

In equation (2), $\langle x_j^{novel},\,x_i^{base}\rangle$ denotes the inner product of $x_j^{novel}$ and $x_i^{base}$, and $\|x_j^{novel}\|_2$ and $\|x_i^{base}\|_2$ denote the L2 norms of $x_j^{novel}$ and $x_i^{base}$, respectively;
Step 4, feature fusion based on text similarity, generating the fused feature $\tilde{x}_j^{T}$;

Step 5, feature fusion based on sample similarity, generating the fused feature $\tilde{x}_j^{I}$;

Step 6, multi-stage feature fusion, generating the fused feature $\tilde{x}_j^{M}$;

Step 7, model training and testing:

Step 7.1, extracting the feature representations of the images for the base sample set and the support set with the feature extraction module; the feature fusion based on text similarity, the feature fusion based on sample similarity, and the multi-stage feature fusion together constitute the similarity feature fusion module; for the feature representation $x_j^{novel}$ of each support sample, performing feature fusion according to the selected feature fusion mode to obtain the fused sample $\tilde{x}_j^{novel}$;

Step 7.2, constructing the loss function $L$ with equation (3):

$$L=L_{CE}\big(\gamma(\tilde{x}_j^{novel}),\,y_j^{novel}\big) \tag{3}$$

In equation (3), $L_{CE}$ denotes the cross-entropy loss, $\gamma$ denotes the classifier, $\lambda$ is the harmonic factor used during feature fusion, and $y_j^{novel}$ denotes the category of the support sample, which is consistent with the category of the fused sample $\tilde{x}_j^{novel}$;

Step 7.3, training the classifier $\gamma$ with a gradient descent algorithm, calculating the loss function $L$ to update the parameters of the classifier $\gamma$, and stopping when the number of training iterations reaches the set number, obtaining the trained classifier $\gamma^{*}$ for predicting the category of a new image sample.
The small sample image classification method based on similarity feature fusion is further characterized in that step 4 comprises:

Step 4.1, for the feature representation $x_j^{novel}$ of the j-th image sample, denoting the vector representation in $V_{novel}$ of the text information corresponding to its support category as $v^{novel}(j)$, and extracting the text similarity relation $R^{T}(j)$ between $v^{novel}(j)$ and all base categories in the base category set $C_{base}$;

Step 4.2, selecting from the text similarity relation $R^{T}(j)$ of the feature representation $x_j^{novel}$ of the j-th image sample the base categories corresponding to the $\beta$ closest distances, and taking the feature representations of all natural images in these $\beta$ base categories as the text-end candidate set $D_{textual}=\{x_r^{textual}\}$, wherein $x_r^{textual}$ denotes the feature representation of the r-th natural image in the text-end candidate set $D_{textual}$ and serves as a candidate feature representation;

Step 4.3, generating a text-end random vector $V_T\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V_T\sim U(0,1)$, defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$, and constructing the text-end mask vector $M_T\in\mathbb{R}^d$ from the random vector $V_T$ and the hyper-parameter $\alpha$ with equation (4):

$$m_{Tt}=\begin{cases}1, & v_{Tt}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{4}$$

In equation (4), $v_{Tt}$ denotes the t-th random value of the text-end random vector $V_T$, and $m_{Tt}$ denotes the t-th mask value of $M_T$;

Step 4.4, according to the candidate feature representation $x_r^{textual}$ and the text-end mask vector $M_T$, performing feature fusion with equation (5) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{T}$:

$$\tilde{x}_j^{T}=M_T\odot x_j^{novel}+(\mathbf{1}-M_T)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_r^{textual}\big) \tag{5}$$

In equation (5), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution.
The step 5 comprises:

Step 5.1, for the feature representation $x_j^{novel}$ of the j-th image sample, extracting the inter-sample similarity relation $R^{I}(j)$ between $x_j^{novel}$ and the feature representations of all natural images in the base set $D_{base}$;

Step 5.2, selecting from the inter-sample similarity relation $R^{I}(j)$ of the current sample $x_j^{novel}$ the feature representations of the $\gamma$ closest natural images as the sample-end candidate set $D_{instance}=\{x_r^{instance}\}$, wherein $x_r^{instance}$ denotes the feature representation of the r-th natural image in the sample-end candidate set $D_{instance}$ and serves as a candidate feature representation;

Step 5.3, generating a sample-end random vector $V_I\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V_I\sim U(0,1)$, defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$, and constructing the sample-end mask vector $M_I\in\mathbb{R}^d$ from the random vector $V_I$ and the hyper-parameter $\alpha$ with equation (6):

$$m_{Ik}=\begin{cases}1, & v_{Ik}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{6}$$

In equation (6), $v_{Ik}$ denotes the k-th random value of the sample-end random vector $V_I$, and $m_{Ik}$ denotes the k-th mask value of $M_I$;

Step 5.4, according to the candidate feature representation $x_r^{instance}$ and the sample-end mask vector $M_I$, performing feature fusion with equation (7) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{I}$:

$$\tilde{x}_j^{I}=M_I\odot x_j^{novel}+(\mathbf{1}-M_I)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_r^{instance}\big) \tag{7}$$

In equation (7), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution.
The step 6 comprises:

Step 6.1, for the feature representation $x_j^{novel}$ of the j-th image sample, denoting the vector representation in $V_{novel}$ of the text information corresponding to its support category as $v^{novel}(j)$, extracting the text similarity relation $R^{T}(j)$ between $v^{novel}(j)$ and all base categories in the base category set $C_{base}$, and extracting the inter-sample similarity relation $R^{I}(j)$ between $x_j^{novel}$ and the feature representations of all natural images in the base set $D_{base}$;

Step 6.2, selecting from the text similarity relation $R^{T}(j)$ of the feature representation $x_j^{novel}$ of the j-th image sample the base categories corresponding to the $\beta$ closest distances, and taking the feature representations of all natural images in these $\beta$ base categories as the text-end candidate set $D_{textual}=\{x_r^{textual}\}$, wherein $x_r^{textual}$ denotes the feature representation of the r-th natural image in the text-end candidate set $D_{textual}$;

Step 6.3, selecting from the text-end candidate set $D_{textual}$, according to the inter-sample similarity relation $R^{I}(j)$, the $\gamma$ closest base image samples as the candidate set $D_{candidate}=\{x_f^{candidate}\}$, wherein $x_f^{candidate}$ denotes the feature representation of the f-th natural image in the candidate set $D_{candidate}$ and serves as the candidate feature representation for feature fusion;

Step 6.4, generating a random vector $V\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V\sim U(0,1)$, defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$, and constructing the mask vector $M\in\mathbb{R}^d$ from the random vector $V$ and the hyper-parameter $\alpha$ with equation (8):

$$m_{t}=\begin{cases}1, & v_{t}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{8}$$

Step 6.5, according to the candidate feature representation $x_f^{candidate}$ and the mask vector $M$, performing feature fusion with equation (9) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{M}$:

$$\tilde{x}_j^{M}=M\odot x_j^{novel}+(\mathbf{1}-M)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_f^{candidate}\big) \tag{9}$$

In equation (9), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution.
The invention also relates to an electronic device comprising a memory and a processor, wherein the memory stores a program that supports the processor in executing the small sample image classification method, and the processor is configured to execute the program stored in the memory.
The invention also relates to a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the small sample image classification method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention designs a small sample image classification method based on similarity feature fusion. By directly modeling the similarity between support samples and base samples and between support categories and base categories, it addresses the information loss, and the insufficient attention to the details of support features, that arise when a feature extractor pre-trained on the base classes is used to extract the features of support-class samples.
2. The invention simultaneously uses the similarities between base categories and support categories and between base samples and support samples to generate new samples that are more discriminative, more representative, and more expressive. Compared with traditional data-generation-based methods, it reduces the deviation and noise introduced during data generation, fully considers the difference between base categories and support categories, better assists the training of the classifier, and improves the classification accuracy of the small sample classification method.
3. Compared with traditional schemes based on training a feature extractor, training the classifier directly on generated support features is simpler and more efficient; it greatly reduces the time cost and the expensive computation cost of training a feature extractor, while compensating for the semantic bias caused by category differences and improving the classification accuracy.
Drawings
FIG. 1 is a flowchart of a small sample image classification method based on similarity feature fusion according to the present invention;
FIG. 2 is a schematic diagram of inter-sample similarity relationship extraction according to the present invention;
FIG. 3 is a diagram illustrating text-end similarity relationship extraction according to the present invention;
FIG. 4 is a schematic diagram of a feature fusion method of the present invention.
Detailed Description
In this embodiment, a small sample classification method based on similarity feature fusion directly models the similarity between support samples and base samples and between support categories and base categories, generates new samples based on these similarities, perfects the description of the support samples, and assists the optimization of the classifier, thereby reducing the semantic bias caused by category differences and improving the accuracy of the small sample image classification method. Specifically, as shown in FIG. 1, the method comprises the following steps:
step 1, performing feature extraction on an input image:
before similarity relation extraction, image samples from a natural image set and another image set are first converted into feature representations through a CNN model pre-trained on the natural image set.
Step 1.1, acquiring a natural image set, inputting the natural image set into a pre-trained CNN model for feature extraction to obtain feature representation of a natural image and a basic category set thereof, and recording the feature representation and the basic category set as
Figure BDA0004047883950000071
Wherein it is present>
Figure BDA0004047883950000072
Represents a feature representation of the i-th natural image, and->
Figure BDA0004047883950000073
d represents the dimension of the characteristic representation, and->
Figure BDA0004047883950000074
Represents the base class to which the i-th natural image belongs, and->
Figure BDA0004047883950000075
C base A set of base classes, | C, representing a set of natural images base I represents the number of basic categories of the natural image set, N base Representing the number of natural images in each base category;
step 1.2, another image sample set is obtained and input into a pre-trained CNN model for feature extraction, and feature representation and support category set of the image sample are obtained and recorded as
Figure BDA0004047883950000076
Wherein it is present>
Figure BDA0004047883950000077
Represents a characteristic representation of the jth image sample, and->
Figure BDA0004047883950000078
Figure BDA0004047883950000079
Represents the support class to which the jth image sample belongs, and
Figure BDA00040478839500000710
C novel represents a set of support classes for the image sample and satisfies C novel ∩C base =φ,|C novel I denotes the number of supported classes of the image sample, N novel Representing the number of image samples in each support category;
step 2: extracting the similarity relation of the text ends:
In order to implement feature fusion based on category text similarity, the similarity relation between the text information of each support category and the text information of all base categories needs to be extracted. First, the semantic labels of the base categories and the support categories are converted into vector representations by a pre-trained word embedding method; then the cosine distance between each support category's vector representation and the vector representation of each base category is calculated as the text-end similarity relation.

Step 2.1, using a pre-trained word embedding model to extract the vector representation of the text information of each base category in the base category set $C_{base}$, recorded as $V_{base}=\{v_k^{base}\}_{k=1}^{|C_{base}|}$, wherein $v_k^{base}$ denotes the vector representation of the text information of the k-th base category, $v_k^{base}\in\mathbb{R}^t$, and $t$ denotes the dimension of the vector representation;

Step 2.2, using the pre-trained word embedding model to extract the vector representation of the text information of each support category in the support category set $C_{novel}$, recorded as $V_{novel}=\{v_s^{novel}\}_{s=1}^{|C_{novel}|}$, wherein $v_s^{novel}$ denotes the vector representation of the text information of the s-th support category, and $v_s^{novel}\in\mathbb{R}^t$;

Step 2.3, calculating with equation (1) the cosine distance $r_{s,k}^{T}$ between the vector representation $v_s^{novel}$ of the text information of the s-th support category and the vector representation $v_k^{base}$ of the text information of the k-th base category, and taking it as the text-end similarity between the s-th support category and one base category, thereby obtaining the text-end similarity relation vector between the s-th support category and all base categories, $R^{T}(s)=[r_{s,1}^{T},\ldots,r_{s,|C_{base}|}^{T}]$:

$$r_{s,k}^{T}=\frac{\langle v_s^{novel},\,v_k^{base}\rangle}{\|v_s^{novel}\|_2\cdot\|v_k^{base}\|_2} \tag{1}$$

In equation (1), $\langle v_s^{novel},\,v_k^{base}\rangle$ denotes the inner product of $v_s^{novel}$ and $v_k^{base}$, and $\|v_s^{novel}\|_2$ and $\|v_k^{base}\|_2$ denote the L2 norms of $v_s^{novel}$ and $v_k^{base}$, respectively;
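The text-end similarity of equation (1) reduces to a normalized inner product, as in the following sketch (assumptions: the word embeddings are given as precomputed tensors; any pre-trained word embedding model, e.g. word2vec or GloVe, could supply them, as the text only requires that one exists).

```python
# Sketch of equation (1): cosine similarity between support-category and
# base-category text embeddings (inputs assumed to be precomputed tensors).
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(v_novel, v_base):
    # v_novel: (|C_novel|, t), v_base: (|C_base|, t).
    # Row s of the result is the text-end relation vector R^T(s).
    v_novel = F.normalize(v_novel, dim=1)  # divide each row by its L2 norm
    v_base = F.normalize(v_base, dim=1)
    return v_novel @ v_base.t()            # inner products of unit vectors
```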
Step 3: extracting the inter-sample similarity relations:

In order to realize similarity feature fusion between samples, similarity relations need to be extracted between the image samples of each support category and all natural image samples; for each support-category image sample, the cosine distance between its feature representation and the feature representations of all natural image samples is calculated as the inter-sample similarity relation.

Step 3.1, calculating with equation (2) the cosine distance $r_{j,i}^{I}$ between the feature representation $x_j^{novel}$ of the j-th image sample and the feature representation $x_i^{base}$ of the i-th natural image, and taking it as the similarity between the j-th image sample and one natural image, thereby obtaining the sample similarity relation vector between the j-th image sample and all natural images, $R^{I}(j)=[r_{j,1}^{I},\ldots,r_{j,|C_{base}|\times N_{base}}^{I}]$:

$$r_{j,i}^{I}=\frac{\langle x_j^{novel},\,x_i^{base}\rangle}{\|x_j^{novel}\|_2\cdot\|x_i^{base}\|_2} \tag{2}$$

In equation (2), $\langle x_j^{novel},\,x_i^{base}\rangle$ denotes the inner product of $x_j^{novel}$ and $x_i^{base}$, and $\|x_j^{novel}\|_2$ and $\|x_i^{base}\|_2$ denote the L2 norms of $x_j^{novel}$ and $x_i^{base}$, respectively;
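Equation (2) is the same computation applied to the image features rather than to the text embeddings, so the sketch above can be reused directly (x_novel and x_base are assumed to be the feature matrices produced in step 1):

```python
# Sketch of equation (2): row j of R_I is the sample relation vector R^I(j).
R_I = cosine_similarity_matrix(x_novel, x_base)  # (|C_novel|*N_novel, |C_base|*N_base)
```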
Step 4: feature fusion based on text similarity:

Step 4.1, for the feature representation $x_j^{novel}$ of the j-th image sample, denoting the vector representation in $V_{novel}$ of the text information corresponding to its support category as $v^{novel}(j)$, and extracting the text similarity relation $R^{T}(j)$ between $v^{novel}(j)$ and all base categories in the base category set $C_{base}$;

Step 4.2, as shown in FIG. 2, selecting from the text similarity relation $R^{T}(j)$ of the feature representation $x_j^{novel}$ of the j-th image sample the base categories corresponding to the $\beta$ closest distances, and taking the feature representations of all natural images in these $\beta$ base categories as the text-end candidate set $D_{textual}=\{x_r^{textual}\}$, wherein $x_r^{textual}$ denotes the feature representation of the r-th natural image in the text-end candidate set $D_{textual}$ and serves as a candidate feature representation;

Step 4.3, generating a text-end random vector $V_T\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V_T\sim U(0,1)$, and defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$; in this example, $\alpha=0.7$; according to the random vector $V_T$ and the hyper-parameter $\alpha$, constructing the text-end mask vector $M_T\in\mathbb{R}^d$ with equation (3):

$$m_{Tt}=\begin{cases}1, & v_{Tt}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{3}$$

In equation (3), $v_{Tt}$ denotes the t-th random value of the text-end random vector $V_T$, and $m_{Tt}$ denotes the t-th mask value of $M_T$;

Step 4.4, according to the candidate feature representation $x_r^{textual}$ and the text-end mask vector $M_T$, performing feature fusion with equation (4) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{T}$:

$$\tilde{x}_j^{T}=M_T\odot x_j^{novel}+(\mathbf{1}-M_T)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_r^{textual}\big) \tag{4}$$

In equation (4), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution;
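A sketch of steps 4.2 to 4.4 follows. Two points are assumptions rather than statements of the patent: the masked-mixup form used for equation (4) is the reconstruction given above, and the candidate x_r is drawn uniformly from D_textual, which the text leaves unspecified.

```python
# Sketch of steps 4.2-4.4 (assumptions: equation (4) is the masked-mixup
# reconstruction above; the candidate is drawn uniformly from D_textual).
import torch

def fuse_text_end(x_j, r_t_j, cat_ids, x_base, beta_cats=2, alpha=0.7):
    # Step 4.2: images of the beta_cats base categories closest in text space.
    top_cats = torch.topk(r_t_j, beta_cats).indices
    d_textual = x_base[torch.isin(cat_ids, top_cats)]
    x_r = d_textual[torch.randint(len(d_textual), (1,)).item()]

    # Step 4.3 (equation (3)): mask M_T from V_T ~ U(0,1) and alpha.
    m_t = (torch.rand_like(x_j) < alpha).float()

    # Step 4.4 (equation (4)): harmonic factor lambda ~ Beta(2,2), then fuse.
    lam = torch.distributions.Beta(2.0, 2.0).sample()
    return m_t * x_j + (1 - m_t) * (lam * x_j + (1 - lam) * x_r)
```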
Step 5: feature fusion based on sample similarity:

Step 5.1, for the feature representation $x_j^{novel}$ of the j-th image sample, extracting the inter-sample similarity relation $R^{I}(j)$ between $x_j^{novel}$ and the feature representations of all natural images in the base set $D_{base}$;

Step 5.2, as shown in FIG. 3, selecting from the inter-sample similarity relation $R^{I}(j)$ of the current sample $x_j^{novel}$ the feature representations of the $\gamma$ closest natural images as the sample-end candidate set $D_{instance}=\{x_r^{instance}\}$, wherein $x_r^{instance}$ denotes the feature representation of the r-th natural image in the sample-end candidate set $D_{instance}$ and serves as a candidate feature representation; in this example, $\gamma=512$;

Step 5.3, generating a sample-end random vector $V_I\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V_I\sim U(0,1)$, and defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$; in this example, $\alpha=0.7$; according to the random vector $V_I$ and the hyper-parameter $\alpha$, constructing the sample-end mask vector $M_I\in\mathbb{R}^d$ with equation (5):

$$m_{Ik}=\begin{cases}1, & v_{Ik}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{5}$$

In equation (5), $v_{Ik}$ denotes the k-th random value of the sample-end random vector $V_I$, and $m_{Ik}$ denotes the k-th mask value of $M_I$;

Step 5.4, according to the candidate feature representation $x_r^{instance}$ and the sample-end mask vector $M_I$, performing feature fusion with equation (6) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{I}$:

$$\tilde{x}_j^{I}=M_I\odot x_j^{novel}+(\mathbf{1}-M_I)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_r^{instance}\big) \tag{6}$$

In equation (6), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution;
Step 6: multi-stage feature fusion:

Step 6.1, for the feature representation $x_j^{novel}$ of the j-th image sample, denoting the vector representation in $V_{novel}$ of the text information corresponding to its support category as $v^{novel}(j)$, extracting the text similarity relation $R^{T}(j)$ between $v^{novel}(j)$ and all base categories in the base category set $C_{base}$, and extracting the inter-sample similarity relation $R^{I}(j)$ between $x_j^{novel}$ and the feature representations of all natural images in the base set $D_{base}$;

Step 6.2, selecting from the text similarity relation $R^{T}(j)$ of the feature representation $x_j^{novel}$ of the j-th image sample the base categories corresponding to the $\beta$ closest distances, and taking the feature representations of all natural images in these $\beta$ base categories as the text-end candidate set $D_{textual}=\{x_r^{textual}\}$, wherein $x_r^{textual}$ denotes the feature representation of the r-th natural image in the text-end candidate set $D_{textual}$ and serves as a candidate feature representation; in this example, $\beta=2$;

Step 6.3, selecting from the text-end candidate set $D_{textual}$, according to the inter-sample similarity relation $R^{I}(j)$, the $\gamma$ closest base image samples as the candidate set $D_{candidate}=\{x_f^{candidate}\}$, wherein $x_f^{candidate}$ denotes the feature representation of the f-th natural image in the candidate set $D_{candidate}$ and serves as the candidate feature representation for feature fusion; in this example, $\gamma=512$;

Step 6.4, as shown in FIG. 4, generating a random vector $V\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V\sim U(0,1)$, and defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$; in this example, $\alpha=0.7$; according to the random vector $V$ and the hyper-parameter $\alpha$, constructing the mask vector $M\in\mathbb{R}^d$ with equation (7):

$$m_{t}=\begin{cases}1, & v_{t}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{7}$$

Step 6.5, according to the candidate feature representation $x_f^{candidate}$ and the mask vector $M$, performing feature fusion with equation (8) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{M}$:

$$\tilde{x}_j^{M}=M\odot x_j^{novel}+(\mathbf{1}-M)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_f^{candidate}\big) \tag{8}$$

In equation (8), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution;
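What distinguishes step 6 from steps 4 and 5 is the two-stage candidate selection, which the following sketch isolates (assumption: "closest" in the second stage is measured by the cosine relation of equation (2), restricted to the text-selected pool; the fusion itself reuses the masked form of equation (8)).

```python
# Sketch of the two-stage selection in steps 6.2-6.3 (assumption: the second
# stage ranks the text-selected pool by the cosine relation of equation (2)).
import torch
import torch.nn.functional as F

def multi_stage_candidates(x_j, r_t_j, cat_ids, x_base, beta_cats=2, gamma=512):
    # Stage 1 (step 6.2): restrict to the beta_cats closest base categories.
    top_cats = torch.topk(r_t_j, beta_cats).indices
    d_textual = x_base[torch.isin(cat_ids, top_cats)]
    # Stage 2 (step 6.3): keep the gamma samples closest in feature space.
    sims = F.normalize(d_textual, dim=1) @ F.normalize(x_j, dim=0)
    keep = torch.topk(sims, min(gamma, len(d_textual))).indices
    return d_textual[keep]  # D_candidate
```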
Step 7: model training and testing:

Step 7.1, extracting the feature representations of the images for the base sample set and the support set with the feature extraction module; the feature fusion based on text similarity, the feature fusion based on sample similarity, and the multi-stage feature fusion together constitute the similarity feature fusion module; for the feature representation $x_j^{novel}$ of each support sample, performing feature fusion according to the selected feature fusion mode to obtain the fused sample $\tilde{x}_j^{novel}$;

Step 7.2, constructing the loss function $L$ with equation (9):

$$L=L_{CE}\big(\gamma(\tilde{x}_j^{novel}),\,y_j^{novel}\big) \tag{9}$$

In equation (9), $L_{CE}$ denotes the cross-entropy loss, $\gamma$ denotes the classifier, $\lambda$ is the harmonic factor used during feature fusion, and $y_j^{novel}$ denotes the category of the support sample, which is consistent with the category of the fused sample $\tilde{x}_j^{novel}$;
Step 7.3, training the classifier $\gamma$ with a gradient descent algorithm, calculating the loss function $L$ to update the parameters of the classifier $\gamma$, and stopping when the number of training iterations reaches the set number, obtaining the trained classifier $\gamma^{*}$ for predicting the category of a new image sample.
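Step 7 amounts to ordinary supervised training of the classifier on the fused samples. A minimal sketch follows (assumptions: the classifier is realized as a single linear layer, and num_classes, the learning rate, and the epoch count are illustrative values, not values fixed by the patent):

```python
# Sketch of step 7 (assumptions: a linear layer stands in for the classifier
# gamma; num_classes, lr and epochs are illustrative, not fixed by the patent).
import torch
import torch.nn as nn

def train_classifier(fused_loader, d=512, num_classes=5, epochs=100):
    clf = nn.Linear(d, num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.01)
    ce = nn.CrossEntropyLoss()                  # L_CE in equation (9)
    for _ in range(epochs):                     # stop at the set iteration count
        for x_fused, y in fused_loader:         # fused samples with support labels
            opt.zero_grad()
            ce(clf(x_fused), y).backward()      # loss L of equation (9)
            opt.step()
    return clf                                  # trained classifier gamma*
```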
In this embodiment, an electronic device comprises a memory and a processor, wherein the memory stores a program that supports the processor in executing the above small sample classification method, and the processor is configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program which, when executed by a processor, performs the steps of the above small sample classification method.

Claims (6)

1. A small sample image classification method based on similarity feature fusion is characterized by comprising the following steps:
step 1, feature extraction of an input image:
step 1.1, acquiring a natural image set and inputting the natural image set into a pre-trained CNN model for feature extraction to obtain feature representation and a basic category set of the natural image, and recording the feature representation and the basic category set as
$D_{base}=\{(x_i^{base},\,y_i^{base})\}_{i=1}^{|C_{base}|\times N_{base}}$, wherein $x_i^{base}$ denotes the feature representation of the i-th natural image, $x_i^{base}\in\mathbb{R}^d$, $d$ denotes the dimension of the feature representation, $y_i^{base}$ denotes the base category to which the i-th natural image belongs, $y_i^{base}\in C_{base}$, $C_{base}$ denotes the set of base categories of the natural image set, $|C_{base}|$ denotes the number of base categories of the natural image set, and $N_{base}$ denotes the number of natural images in each base category;

step 1.2, acquiring another image sample set and inputting it into the pre-trained CNN model for feature extraction, obtaining the feature representations of the image samples and their support category set, recorded as $D_{novel}=\{(x_j^{novel},\,y_j^{novel})\}_{j=1}^{|C_{novel}|\times N_{novel}}$, wherein $x_j^{novel}$ denotes the feature representation of the j-th image sample, $x_j^{novel}\in\mathbb{R}^d$, $y_j^{novel}$ denotes the support category to which the j-th image sample belongs, $y_j^{novel}\in C_{novel}$, $C_{novel}$ denotes the set of support categories of the image sample set and satisfies $C_{novel}\cap C_{base}=\varnothing$, $|C_{novel}|$ denotes the number of support categories of the image sample set, and $N_{novel}$ denotes the number of image samples in each support category;

step 2, extracting the text-end similarity relations:

step 2.1, using a pre-trained word embedding model to extract the vector representation of the text information of each base category in the base category set $C_{base}$, recorded as $V_{base}=\{v_k^{base}\}_{k=1}^{|C_{base}|}$, wherein $v_k^{base}$ denotes the vector representation of the text information of the k-th base category, $v_k^{base}\in\mathbb{R}^t$, and $t$ denotes the dimension of the vector representation;

step 2.2, using the pre-trained word embedding model to extract the vector representation of the text information of each support category in the support category set $C_{novel}$, recorded as $V_{novel}=\{v_s^{novel}\}_{s=1}^{|C_{novel}|}$, wherein $v_s^{novel}$ denotes the vector representation of the text information of the s-th support category, and $v_s^{novel}\in\mathbb{R}^t$;

step 2.3, calculating with equation (1) the cosine distance $r_{s,k}^{T}$ between the vector representation $v_s^{novel}$ of the text information of the s-th support category and the vector representation $v_k^{base}$ of the text information of the k-th base category, and taking it as the text-end similarity between the s-th support category and one base category, thereby obtaining the text-end similarity relation vector between the s-th support category and all base categories, $R^{T}(s)=[r_{s,1}^{T},\ldots,r_{s,|C_{base}|}^{T}]$:

$$r_{s,k}^{T}=\frac{\langle v_s^{novel},\,v_k^{base}\rangle}{\|v_s^{novel}\|_2\cdot\|v_k^{base}\|_2} \tag{1}$$

In equation (1), $\langle v_s^{novel},\,v_k^{base}\rangle$ denotes the inner product of $v_s^{novel}$ and $v_k^{base}$, and $\|v_s^{novel}\|_2$ and $\|v_k^{base}\|_2$ denote the L2 norms of $v_s^{novel}$ and $v_k^{base}$, respectively;
step 3, extracting the inter-sample similarity relations:

calculating with equation (2) the cosine distance $r_{j,i}^{I}$ between the feature representation $x_j^{novel}$ of the j-th image sample and the feature representation $x_i^{base}$ of the i-th natural image, and taking it as the similarity between the j-th image sample and one natural image, thereby obtaining the sample similarity relation vector between the j-th image sample and all natural images, $R^{I}(j)=[r_{j,1}^{I},\ldots,r_{j,|C_{base}|\times N_{base}}^{I}]$:

$$r_{j,i}^{I}=\frac{\langle x_j^{novel},\,x_i^{base}\rangle}{\|x_j^{novel}\|_2\cdot\|x_i^{base}\|_2} \tag{2}$$

In equation (2), $\langle x_j^{novel},\,x_i^{base}\rangle$ denotes the inner product of $x_j^{novel}$ and $x_i^{base}$, and $\|x_j^{novel}\|_2$ and $\|x_i^{base}\|_2$ denote the L2 norms of $x_j^{novel}$ and $x_i^{base}$, respectively;
step 4, feature fusion based on text similarity, generating the fused feature $\tilde{x}_j^{T}$;

step 5, feature fusion based on sample similarity, generating the fused feature $\tilde{x}_j^{I}$;

step 6, multi-stage feature fusion, generating the fused feature $\tilde{x}_j^{M}$;

step 7, model training and testing:

step 7.1, extracting the feature representations of the images for the base sample set and the support set with the feature extraction module; the feature fusion based on text similarity, the feature fusion based on sample similarity, and the multi-stage feature fusion together constitute the similarity feature fusion module; for the feature representation $x_j^{novel}$ of each support sample, performing feature fusion according to the selected feature fusion mode to obtain the fused sample $\tilde{x}_j^{novel}$;
step 7.2, constructing the loss function $L$ with equation (3):

$$L=L_{CE}\big(\gamma(\tilde{x}_j^{novel}),\,y_j^{novel}\big) \tag{3}$$

In equation (3), $L_{CE}$ denotes the cross-entropy loss, $\gamma$ denotes the classifier, $\lambda$ is the harmonic factor used during feature fusion, and $y_j^{novel}$ denotes the category of the support sample, which is consistent with the category of the fused sample $\tilde{x}_j^{novel}$;
step 7.3, training the classifier $\gamma$ with a gradient descent algorithm, calculating the loss function $L$ to update the parameters of the classifier $\gamma$, and stopping when the number of training iterations reaches the set number, obtaining the trained classifier $\gamma^{*}$ for predicting the category of a new image sample.
2. The method for classifying small sample images based on similarity feature fusion according to claim 1, wherein step 4 comprises:

step 4.1, for the feature representation $x_j^{novel}$ of the j-th image sample, denoting the vector representation in $V_{novel}$ of the text information corresponding to its support category as $v^{novel}(j)$, and extracting the text similarity relation $R^{T}(j)$ between $v^{novel}(j)$ and all base categories in the base category set $C_{base}$;

step 4.2, selecting from the text similarity relation $R^{T}(j)$ of the feature representation $x_j^{novel}$ of the j-th image sample the base categories corresponding to the $\beta$ closest distances, and taking the feature representations of all natural images in these $\beta$ base categories as the text-end candidate set $D_{textual}=\{x_r^{textual}\}$, wherein $x_r^{textual}$ denotes the feature representation of the r-th natural image in the text-end candidate set $D_{textual}$ and serves as a candidate feature representation;

step 4.3, generating a text-end random vector $V_T\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V_T\sim U(0,1)$, defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$, and constructing the text-end mask vector $M_T\in\mathbb{R}^d$ from the random vector $V_T$ and the hyper-parameter $\alpha$ with equation (4):

$$m_{Tt}=\begin{cases}1, & v_{Tt}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{4}$$

In equation (4), $v_{Tt}$ denotes the t-th random value of the text-end random vector $V_T$, and $m_{Tt}$ denotes the t-th mask value of $M_T$;

step 4.4, according to the candidate feature representation $x_r^{textual}$ and the text-end mask vector $M_T$, performing feature fusion with equation (5) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{T}$:

$$\tilde{x}_j^{T}=M_T\odot x_j^{novel}+(\mathbf{1}-M_T)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_r^{textual}\big) \tag{5}$$

In equation (5), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution.
3. The method for classifying small sample images based on similarity feature fusion according to claim 2, wherein step 5 comprises:

step 5.1, for the feature representation $x_j^{novel}$ of the j-th image sample, extracting the inter-sample similarity relation $R^{I}(j)$ between $x_j^{novel}$ and the feature representations of all natural images in the base set $D_{base}$;

step 5.2, selecting from the inter-sample similarity relation $R^{I}(j)$ of the current sample $x_j^{novel}$ the feature representations of the $\gamma$ closest natural images as the sample-end candidate set $D_{instance}=\{x_r^{instance}\}$, wherein $x_r^{instance}$ denotes the feature representation of the r-th natural image in the sample-end candidate set $D_{instance}$ and serves as a candidate feature representation;

step 5.3, generating a sample-end random vector $V_I\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V_I\sim U(0,1)$, defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$, and constructing the sample-end mask vector $M_I\in\mathbb{R}^d$ from the random vector $V_I$ and the hyper-parameter $\alpha$ with equation (6):

$$m_{Ik}=\begin{cases}1, & v_{Ik}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{6}$$

In equation (6), $v_{Ik}$ denotes the k-th random value of the sample-end random vector $V_I$, and $m_{Ik}$ denotes the k-th mask value of $M_I$;

step 5.4, according to the candidate feature representation $x_r^{instance}$ and the sample-end mask vector $M_I$, performing feature fusion with equation (7) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{I}$:

$$\tilde{x}_j^{I}=M_I\odot x_j^{novel}+(\mathbf{1}-M_I)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_r^{instance}\big) \tag{7}$$

In equation (7), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution.
4. The method for classifying small sample images based on similarity feature fusion according to claim 3, wherein step 6 comprises:

step 6.1, for the feature representation $x_j^{novel}$ of the j-th image sample, denoting the vector representation in $V_{novel}$ of the text information corresponding to its support category as $v^{novel}(j)$, extracting the text similarity relation $R^{T}(j)$ between $v^{novel}(j)$ and all base categories in the base category set $C_{base}$, and extracting the inter-sample similarity relation $R^{I}(j)$ between $x_j^{novel}$ and the feature representations of all natural images in the base set $D_{base}$;

step 6.2, selecting from the text similarity relation $R^{T}(j)$ of the feature representation $x_j^{novel}$ of the j-th image sample the base categories corresponding to the $\beta$ closest distances, and taking the feature representations of all natural images in these $\beta$ base categories as the text-end candidate set $D_{textual}=\{x_r^{textual}\}$, wherein $x_r^{textual}$ denotes the feature representation of the r-th natural image in the text-end candidate set $D_{textual}$;

step 6.3, selecting from the text-end candidate set $D_{textual}$, according to the inter-sample similarity relation $R^{I}(j)$, the $\gamma$ closest base image samples as the candidate set $D_{candidate}=\{x_f^{candidate}\}$, wherein $x_f^{candidate}$ denotes the feature representation of the f-th natural image in the candidate set $D_{candidate}$ and serves as the candidate feature representation for feature fusion;

step 6.4, generating a random vector $V\in\mathbb{R}^d$ whose entries obey the uniform distribution on $[0,1]$, $V\sim U(0,1)$, defining a hyper-parameter $\alpha$ with $\alpha\in[0,1]$, and constructing the mask vector $M\in\mathbb{R}^d$ from the random vector $V$ and the hyper-parameter $\alpha$ with equation (8):

$$m_{t}=\begin{cases}1, & v_{t}<\alpha\\ 0, & \text{otherwise}\end{cases} \tag{8}$$

step 6.5, according to the candidate feature representation $x_f^{candidate}$ and the mask vector $M$, performing feature fusion with equation (9) on the feature representation $x_j^{novel}$ of the j-th image sample, generating the fused feature $\tilde{x}_j^{M}$:

$$\tilde{x}_j^{M}=M\odot x_j^{novel}+(\mathbf{1}-M)\odot\big(\lambda\,x_j^{novel}+(1-\lambda)\,x_f^{candidate}\big) \tag{9}$$

In equation (9), $\odot$ denotes the element-wise vector product, and $\lambda$ is a harmonic factor randomly sampled from the Beta(2,2) distribution.
5. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor in executing the method for classifying a small sample image according to any one of claims 1-4, and the processor is configured to execute the program stored in the memory.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for classifying a small sample image according to any one of claims 1 to 4.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310032701.2A 2023-01-10 2023-01-10 Small sample image classification method based on similarity feature fusion

Publications (1)

Publication Number Publication Date
CN115965818A 2023-04-14

Cited By (4)

* Cited by examiner, † Cited by third party

Publication number Priority date Publication date Title
CN116452895A * 2023-06-13 2023-07-18 Small sample image classification method, device and medium based on multi-mode symmetrical enhancement
CN116452895B * 2023-06-13 2023-10-20 Small sample image classification method, device and medium based on multi-mode symmetrical enhancement
CN116503674A * 2023-06-27 2023-07-28 Small sample image classification method, device and medium based on semantic guidance
CN116503674B * 2023-06-27 2023-10-20 Small sample image classification method, device and medium based on semantic guidance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination