CN116994076A

CN116994076A - Small sample image recognition method based on double-branch mutual learning feature generation

Info

Publication number: CN116994076A
Application number: CN202311264423.XA
Authority: CN
Inventors: 魏志强; 王矶法; 黄磊
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2023-09-28
Filing date: 2023-09-28
Publication date: 2023-11-03
Anticipated expiration: 2043-09-28
Also published as: CN116994076B

Abstract

The invention discloses a small sample image recognition method based on double-branch mutual learning feature generation, relating to the technical field of small sample image recognition. The method comprises the following steps: acquiring a set of small sample images to be identified to form a query set to be identified; sending each image in the query set into a first feature generation module of a pre-constructed global branch to generate a first semantic feature of each image; sending each image in the query set into a second feature generation module of a pre-constructed local branch to generate a second semantic feature of each image; adding the first semantic feature and the second semantic feature to determine a third semantic feature of each image in the query set; and calculating the similarity between the third semantic feature of each image in the query set and the class prototypes of a plurality of categories in a support set to determine the image category of each image. By mining the semantic relationship between the local features and the global features of a sample, the method achieves the technical effect of accurately identifying small sample images.

Description

Small sample image recognition method based on double-branch mutual learning feature generation

Technical Field

The invention relates to the technical field of small sample image recognition, in particular to a small sample image recognition method based on double-branch mutual learning feature generation.

Background

In recent years, with the help of large-scale data sets and vast computing resources, artificial intelligence algorithms represented by deep learning have achieved great success in image recognition fields such as face recognition, autonomous driving and robotics. However, deep learning relies on a large amount of labeled data, and in practical applications such data are often difficult to obtain. Some data, such as face data, raise personal privacy issues; in other cases the objects of interest are themselves scarce, as in the recognition of rare protected animals. In addition, data labeling often consumes a large amount of manpower and material resources. These factors hinder the development of deep learning in some image recognition fields. In contrast, a human being can recognize a new object from a very small number of samples. Inspired by this ability to learn quickly, researchers hope that a machine learning model, after learning from a large amount of data of certain categories, can quickly learn a new category from only a small number of samples. The problem of small sample image recognition has thus gradually become a research hotspot.

A core problem of the small sample image recognition task is that the sample size is too small, resulting in low sample diversity. When the data volume is limited, sample diversity can be improved by data enhancement. Data enhancement refers to data expansion or feature enhancement of the original small sample data set with the aid of auxiliary data or auxiliary information. Data expansion refers to adding new unlabeled data or synthesized labeled data to the original data set; feature enhancement refers to adding features that facilitate classification to the feature space of the original samples, thereby improving the feature diversity of the samples.

Existing methods based on feature enhancement rely mainly on the global semantic features of samples when generating new features: they generate new global semantic features by analyzing the similarities or differences between the global semantic features of different samples. Although this approach can increase the size and diversity of the data set, it ignores the local feature information of the samples. In the small sample setting, because the amount of data is limited, each sample may contain unique or important local feature information that is very useful for distinguishing between categories or tasks. If only global semantic features are used to generate new features, such local feature information may be lost or confused, so that the generated features are of low quality or inaccurate.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a small sample image recognition method based on double-branch mutual learning feature generation.

According to one aspect of the present invention, there is provided a small sample image recognition method based on a dual-branch mutual learning feature generation, including:

acquiring a small sample image set to be identified to form a query set to be identified;

sending each image in the query set to be identified into a first feature generation module of a pre-constructed global branch to generate a first semantic feature of each image;

sending each image in the query set to be identified into a second feature generation module of a pre-constructed local branch to generate a second semantic feature of each image;

adding the first semantic features and the second semantic features of each image in the query set to be identified, and determining the third semantic features of each image in the query set to be identified;

and respectively calculating the similarity between the third semantic feature of each image in the query set to be identified and the prototypes of a plurality of categories in the support set, and determining the image category of each image in the query set to be identified.

Optionally, the process of constructing a class prototype for each class of the support set is as follows:

sequentially inputting all images in the support set into a first feature generation module of the global branch, and outputting fourth semantic features of each image in the support set;

Sequentially inputting all images in the support set into a second feature generation module of the local branch, and outputting fifth semantic features of each image in the support set;

and adding and averaging the fourth semantic features and the fifth semantic features of all the images of each category in the support set, and determining a category prototype of each category in the support set.

Optionally, the training process of the first feature generation module of the global branch and the second feature generation module of the local branch is as follows:

according to the small sample image training set, a small sample recognition task is constructed, wherein the small sample recognition task comprises N classes, and each class comprises M sample images;

performing feature extraction on each sample image in the small sample recognition task by using a feature extraction network, and determining the global features and the local features of each sample image;

training the first feature generation module of the global branch with the global features of each sample image in the small sample recognition task;

training the second feature generation module of the local branch with the global features and the local features of each sample image in the small sample recognition task;

making the training information of the global branch and the training information of the local branch learn from each other, so as to train the first feature generation module and the second feature generation module;

and optimizing the first feature generation module and the second feature generation module according to a preset total training loss function.

Optionally, training the first feature generation module of the global branch with the global features of each sample image in the small sample recognition task includes:

masking the global features of the M sample images of each category so that the global feature of only one sample image is retained;

replacing the masked global features in each category with a learnable vector;

training the first feature generation module of the global branch according to the learnable vectors substituted in each category and the global feature retained after masking;

and optimizing the first feature generation module according to a preset global branch loss function, wherein the global branch loss function comprises a global prediction loss function and a global classification loss function.

Optionally, training the second feature generation module of the local branch with the global features and the local features of each sample image in the small sample recognition task includes:

selecting local features of one sample image in each category;

training a second feature generation module according to the local features selected by each category and M preset learnable vectors;

And optimizing the second feature generation module according to a preset local branch loss function, wherein the local branch loss function comprises a local prediction loss function and a local classification loss function.

Optionally, the method further comprises: calculating the KL divergence as the mutual learning loss function by which the training information of the global branch and the local branch learn from each other.

Optionally, the total training loss function is the sum of the global branch loss function, the local branch loss function, and the mutual learning loss function.

Optionally, calculating the similarity between the third semantic feature of each image in the query set to be identified and the class prototype of the plurality of classes in the support set, and determining the image class of each image in the query set to be identified includes:

respectively calculating the similarity between the third semantic feature of each image in the query set to be identified and category prototypes of a plurality of categories in the support set, and determining the probability value of each image in the query set to be identified belonging to each category in the support set;

and taking the category with the maximum probability value corresponding to each image in the query set to be identified as the image category of the image.

According to another aspect of the present invention, there is provided a small sample image recognition apparatus based on a dual-branch mutual learning feature generation, comprising:

The acquisition module is used for acquiring a small sample image set to be identified to form a query set to be identified;

the first generation module is used for sending each image in the query set to be identified into the first feature generation module of the pre-constructed global branch to generate the first semantic feature of each image;

the second generation module is used for sending each image in the query set to be identified into the second feature generation module of the pre-constructed local branch to generate the second semantic feature of each image;

the first determining module is used for adding the first semantic feature and the second semantic feature of each image in the query set to be identified and determining the third semantic feature of each image in the query set to be identified;

and the second determining module is used for respectively calculating the similarity between the third semantic feature of each image in the query set to be identified and the prototypes of the multiple categories in the support set and determining the image category of each image in the query set to be identified.

According to a further aspect of the present invention there is provided a computer readable storage medium storing a computer program for performing the method according to any one of the above aspects of the present invention.

According to still another aspect of the present invention, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method according to any of the above aspects of the present invention.

Therefore, the invention provides a small sample image recognition method based on double-branch mutual learning feature generation, which constructs a feature generation module based on local feature information. The method comprises a branch that generates features from local feature information and a branch that generates features from global semantic features; the two branches learn from each other, capture complementary information and promote implicit mutual knowledge transfer, enabling the model to generate more discriminative features.

Drawings

Exemplary embodiments of the present invention may be more completely understood in consideration of the following drawings:

FIG. 1 is a flow chart of a small sample image recognition method based on dual-branch mutual learning feature generation provided by an exemplary embodiment of the present invention;

FIG. 2 is a schematic diagram of a small sample image recognition device based on dual-branch mutual learning feature generation according to an exemplary embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present invention.

Detailed Description

Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present invention and not all embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.

It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present invention are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.

It should also be understood that in embodiments of the present invention, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.

It should also be appreciated that any component, data, or structure referred to in an embodiment of the invention may be generally understood as one or more without explicit limitation or the contrary in the context.

In addition, the term "and/or" in the present invention is merely an association relationship describing the association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In the present invention, the character "/" generally indicates that the front and rear related objects are an or relationship.

It should also be understood that the description of the embodiments of the present invention emphasizes the differences between the embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.

Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.

The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, the techniques, methods, and apparatus should be considered part of the specification.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations with electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.

Exemplary method

Fig. 1 is a flowchart of a small sample image recognition method based on a dual-branch mutual learning feature generation according to an exemplary embodiment of the present invention. The present embodiment is applicable to an electronic device, and as shown in fig. 1, a small sample image recognition method 100 generated based on a dual-branch mutual learning feature includes the following steps:

step 101, acquiring a small sample image set to be identified to form a query set to be identified.

Step 102, sending each image in the query set to be identified into a first feature generation module of a pre-constructed global branch to generate a first semantic feature of each image.

Step 103, sending each image in the query set to be identified into a second feature generation module of the pre-constructed local branch to generate a second semantic feature of each image.

Step 104, adding the first semantic feature and the second semantic feature of each image in the query set to be identified, and determining the third semantic feature of each image in the query set to be identified.

Step 105, calculating the similarity between the third semantic feature of each image in the query set to be identified and the prototypes of multiple categories in the support set, and determining the image category of each image in the query set to be identified.
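
Steps 101 to 105 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the two branches are stood in for by plain callables, and cosine similarity is assumed as the similarity measure because the text does not name one. The names `classify_query`, `toy_global`, `toy_local` and `toy_prototypes` are hypothetical.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity (an assumed choice; the source does not fix the measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_query(
    image: Vector,
    global_branch: Callable[[Vector], Vector],
    local_branch: Callable[[Vector], Vector],
    prototypes: Dict[str, Vector],
) -> str:
    first = global_branch(image)                       # step 102: first semantic feature
    second = local_branch(image)                       # step 103: second semantic feature
    third = [g + l for g, l in zip(first, second)]     # step 104: element-wise sum
    scores = {label: cosine_similarity(third, proto)   # step 105: compare with prototypes
              for label, proto in prototypes.items()}
    return max(scores, key=scores.get)


# Toy stand-ins for the trained branches (hypothetical, for illustration only).
def toy_global(image: Vector) -> Vector:
    return list(image)


def toy_local(image: Vector) -> Vector:
    return [0.5 * v for v in image]


toy_prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
predicted = classify_query([0.9, 0.1], toy_global, toy_local, toy_prototypes)
```

In a real system the callables would be the trained feature generation modules and the prototypes would come from the support set.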

Specifically, N classes are selected, and S images are selected from each class to form the support set. The query set images are identified based on the support set images.

The class prototypes of the support set are computed as follows. First, the N × S images of the support set are sent to the global branch and the local branch respectively. The fourth semantic features are obtained through the first feature generation module of the global branch and, likewise, the fifth semantic features are obtained through the second feature generation module of the local branch. The semantic features generated by the two branches are then added to obtain the total semantic features. Finally, the semantic features in each class are added and averaged to give the class prototype of that class, where each feature entering the average is a global feature of the image.
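
The prototype computation can be sketched in plain Python as follows. This is a hedged illustration: each branch is assumed to return a list of generated feature vectors per image, and the features of both branches are pooled and averaged per class, which is one reading of "added and averaged". The names `build_prototypes`, `demo_global` and `demo_local` are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]
Branch = Callable[[Vector], List[Vector]]  # one image in, several feature vectors out


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_prototypes(
    support: Dict[str, List[Vector]],
    global_branch: Branch,
    local_branch: Branch,
) -> Dict[str, Vector]:
    prototypes = {}
    for label, images in support.items():
        features: List[Vector] = []
        for image in images:
            features.extend(global_branch(image))  # fourth semantic features
            features.extend(local_branch(image))   # fifth semantic features
        prototypes[label] = mean_vector(features)  # class prototype
    return prototypes


# Toy branches (assumptions for illustration only).
def demo_global(image: Vector) -> List[Vector]:
    return [list(image)]


def demo_local(image: Vector) -> List[Vector]:
    return [[2.0 * v for v in image]]


demo_support = {"a": [[1.0, 0.0], [3.0, 0.0]]}
demo_prototypes = build_prototypes(demo_support, demo_global, demo_local)
```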

The images in the query set are then identified. Each image x in the query set to be identified is sent to the global branch and the local branch respectively. M first semantic features are obtained through the first feature generation module of the global branch, and M second semantic features are obtained through the second feature generation module of the local branch. The semantic features generated by the two branches are added and averaged to obtain the third semantic feature of image x. The probability that each sample of the query set belongs to each category is then calculated from the similarity between that sample and each class prototype, and the category with the maximum probability is taken as the predicted label of the sample. The probability calculation formula is:

where $C_i$ is the category label, among the $N$ categories, of a sample in the query set to be identified, and the predicted label is the category label predicted from the distances between that sample and the $N$ category prototypes.
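
A common way to turn prototype distances into class probabilities is a softmax over negative distances; the sketch below assumes that form and squared Euclidean distance, neither of which is confirmed by the text. The function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probabilities(feature: Vector, prototypes: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative squared distances (an assumed formula)."""
    logits = {label: -squared_distance(feature, proto)
              for label, proto in prototypes.items()}
    largest = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - largest) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def predict_label(feature: Vector, prototypes: Dict[str, Vector]) -> str:
    probabilities = class_probabilities(feature, prototypes)
    return max(probabilities, key=probabilities.get)


example_prototypes = {"a": [0.0, 0.0], "b": [2.0, 0.0]}
example_probs = class_probabilities([0.5, 0.0], example_prototypes)
example_label = predict_label([0.5, 0.0], example_prototypes)
```

The predicted label is simply the category whose probability is largest, matching the rule stated in the text.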

Specifically, the training steps of the first feature generation module of the global branch and the second feature generation module of the local branch are as follows:

Step 1: Construct small sample recognition tasks. A series of small sample recognition tasks is constructed for training. Specifically, N classes are randomly selected from the training set, and M samples are then randomly sampled from each class, forming a task T with N × M images in total.
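
Task construction of this kind can be sketched with the Python standard library. The data layout (a dictionary from class name to a list of image identifiers) and the name `sample_task` are assumptions made for illustration.

```python
import random
from typing import Dict, List


def sample_task(
    dataset: Dict[str, List[str]],
    n_classes: int,
    m_samples: int,
    rng: random.Random,
) -> Dict[str, List[str]]:
    """Randomly pick N classes, then M images from each, giving N x M images."""
    chosen_classes = rng.sample(sorted(dataset), n_classes)
    return {name: rng.sample(dataset[name], m_samples) for name in chosen_classes}


# Toy training set: 5 classes with 10 image identifiers each (hypothetical data).
toy_dataset = {
    f"class_{c}": [f"class_{c}_img_{i}" for i in range(10)] for c in range(5)
}
task = sample_task(toy_dataset, n_classes=3, m_samples=4, rng=random.Random(0))
total_images = sum(len(images) for images in task.values())
```

Passing an explicit `random.Random` instance keeps the sampled tasks reproducible across runs.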

Step 2: Feature extraction. Each image x in task T is input into the feature extraction network, which extracts features using the vision self-attention model ViT (Vision Transformer). The network comprises four basic encoder blocks (Transformer Blocks), and the local branch and the global branch share the parameters of the first three Transformer Blocks. For the local branch, both the global feature and the local features of the image are extracted. For the global branch, only the global feature of the image is extracted. Here H, W and C are the height, width and number of channels of the feature map, respectively.
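
The parameter sharing described in Step 2 can be sketched structurally as follows. This is a hedged illustration: blocks are modeled as plain callables on a token list rather than real Transformer Blocks, each branch is assumed to own its fourth block, and reading the first (class) token as the global feature and the remaining tokens as local features is an assumption, since the text does not say how the two are read out.

```python
from typing import Callable, List, Sequence, Tuple

Token = List[float]
Block = Callable[[List[Token]], List[Token]]


def run_blocks(tokens: List[Token], blocks: Sequence[Block]) -> List[Token]:
    for block in blocks:
        tokens = block(tokens)
    return tokens


def extract_global(tokens: List[Token], shared: Sequence[Block], global_block: Block) -> Token:
    """Global branch: shared blocks, then its own block; keep only the first token."""
    return global_block(run_blocks(tokens, shared))[0]


def extract_local(
    tokens: List[Token], shared: Sequence[Block], local_block: Block
) -> Tuple[Token, List[Token]]:
    """Local branch: shared blocks, then its own block; keep global and local tokens."""
    out = local_block(run_blocks(tokens, shared))
    return out[0], out[1:]


# Toy blocks standing in for Transformer Blocks (illustration only).
def add_one(tokens: List[Token]) -> List[Token]:
    return [[v + 1.0 for v in t] for t in tokens]


def double(tokens: List[Token]) -> List[Token]:
    return [[2.0 * v for v in t] for t in tokens]


shared_blocks = [add_one, add_one, add_one]        # the three shared blocks
tokens_in = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # one class token, two patch tokens
global_feature = extract_global(tokens_in, shared_blocks, add_one)
local_global, local_patches = extract_local(tokens_in, shared_blocks, double)
```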

Step 3: global branching. Obtaining global features of all images in the task T based on the step 2, and for each classMThe global features of the images are masked, the global features of only one image are retained, and then the masked global features are replaced with a learnable vector.

Is of the categoryiIn (a) and (b)MThe global feature vector of the sheet of image,maskthe mask operation is represented as a result of a masking operation,a learnable vector representing feature substitutions to the mask.
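
The masking step can be illustrated with a short sketch. It assumes that the index of the retained image is given and that every masked position receives a copy of the same learnable vector; the names `mask_class_features`, `keep_index` and `mask_vector` are hypothetical.

```python
from typing import List

Vector = List[float]


def mask_class_features(
    features: List[Vector], keep_index: int, mask_vector: Vector
) -> List[Vector]:
    """Keep one global feature of a class and replace the others by the learnable vector."""
    if not 0 <= keep_index < len(features):
        raise IndexError("keep_index must address one of the M features")
    return [
        list(feature) if j == keep_index else list(mask_vector)
        for j, feature in enumerate(features)
    ]


class_features = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # M = 3 global features
learnable = [0.0, 0.5]                                   # stands in for the learnable vector
masked = mask_class_features(class_features, keep_index=1, mask_vector=learnable)
```

In training, the retained index would typically be drawn at random per class and the learnable vector updated by gradient descent; both details are assumptions here.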

The masked feature vectors are then fed into the first feature generation module of the global branch, which consists of a series of Transformer Blocks. The module predicts the masked features from the retained global feature information and learns the intra-class variation of the samples in a one-to-many manner, so that the model can generate diversified features.

Here the first feature generation module is applied to the masked feature vectors, and its outputs are the generated features.

The difference between the predicted feature and the feature before masking is measured using the Mean Square Error (MSE) as a global prediction loss function.
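
A mean square error between predicted and original features can be computed as in the sketch below. Averaging over every element of every feature vector is an assumption; the source does not state the normalization.

```python
from typing import List

Vector = List[float]


def mse_loss(predicted: List[Vector], target: List[Vector]) -> float:
    """Mean of squared element-wise differences over all feature vectors."""
    total = 0.0
    count = 0
    for pred_vec, target_vec in zip(predicted, target):
        for p, t in zip(pred_vec, target_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


perfect = mse_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]])
off_target = mse_loss([[0.0, 0.0]], [[3.0, 4.0]])
```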

To make the features extracted by the global branch discriminative, the global branch is trained under the supervision of the true labels of the input images and learns the relationships among classes over the whole class space. The global classification loss is:

where $y_i$ is the category label of $x_i$, and $h$ denotes the classifier, which is a fully connected layer.
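
A classification loss with a fully connected classifier is often implemented as cross-entropy; the sketch below assumes that form, which the text does not confirm. The weights, bias and function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def fully_connected(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """The classifier h: one linear layer producing one logit per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits: Vector, label: int) -> float:
    """Negative log-softmax of the true class, computed stably."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[label]


def classification_loss(
    features: List[Vector], labels: List[int], weights: List[Vector], bias: Vector
) -> float:
    losses = [
        cross_entropy(fully_connected(f, weights, bias), y)
        for f, y in zip(features, labels)
    ]
    return sum(losses) / len(losses)


identity_weights = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
sample_features = [[5.0, 0.0], [0.0, 5.0]]
loss_correct = classification_loss(sample_features, [0, 1], identity_weights, zero_bias)
loss_wrong = classification_loss(sample_features, [1, 0], identity_weights, zero_bias)
```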

Finally, the loss of global branches is:

where the weighting coefficient is a hyperparameter.

Step 4: Local branch. The global features and local features of all images in task T are obtained from Step 2. For the M images of each class, the local features of only one image are selected and, together with M learnable vectors, fed to the feature generation module of the local branch. The M learnable vectors are used to generate predicted global features, which are supervised by the M global features extracted by the feature extraction module.

The inputs are thus the M learnable vectors used to generate the predicted global features and the W × H local features.

The M learnable vectors and the W × H local features are then sent to the second feature generation module of the local branch, which mines the semantic relationship between the local features and the global features and uses the local feature information to generate global semantic features.

The outputs of the second feature generation module are the M generated global features.
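
How the M learnable vectors and the W × H local features might be assembled for the second feature generation module is sketched below. Concatenating them into one token sequence and reading the first M outputs as the generated global features are assumptions borrowed from common Transformer practice, not details given in the text.

```python
from typing import List

Vector = List[float]


def build_local_branch_input(
    learnable_vectors: List[Vector], local_feature_map: List[List[Vector]]
) -> List[Vector]:
    """Concatenate M learnable vectors with the flattened H x W local features."""
    tokens = [list(v) for v in learnable_vectors]
    for row in local_feature_map:        # H rows
        for cell in row:                 # W cells per row, each a C-dimensional vector
            tokens.append(list(cell))
    return tokens


def read_generated_features(outputs: List[Vector], m: int) -> List[Vector]:
    """Take the outputs at the M learnable positions as generated global features."""
    return outputs[:m]


m_vectors = [[0.0] * 4 for _ in range(3)]                           # M = 3, C = 4
feature_map = [[[1.0] * 4, [2.0] * 4], [[3.0] * 4, [4.0] * 4]]      # H = 2, W = 2
sequence = build_local_branch_input(m_vectors, feature_map)
generated = read_generated_features(sequence, m=3)  # identity module, for illustration
```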

As with the global branch, the difference between the predicted and original features is measured using the Mean Square Error (MSE) as a local prediction loss function.

where $g_{i,j}$ denotes the generated global feature and $f_{i,j}$ denotes the $j$-th global feature of the $i$-th class.
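
One way to write the local prediction loss consistently with these two symbols is given below. The loss symbol and the normalization by $N \times M$ are assumptions; the source states only that the mean square error between the generated and the original global features is used.

```latex
L^{l}_{\mathrm{pred}} = \frac{1}{N M} \sum_{i=1}^{N} \sum_{j=1}^{M}
  \left\lVert g_{i,j} - f_{i,j} \right\rVert_2^2
```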

In the local classification loss, $y_i$ is the category label of $x_i$ among the selected $N$ classes, $h$ denotes the classifier, which is a fully connected layer, and $k$ denotes the number of images in task $T$.

Finally, the total loss of local branches is:

where the weighting coefficient is a hyperparameter.

Step 5: Mutual learning. To enable information exchange between the two branches, so that each branch learns complementary information from the other, the KL divergence is calculated as the mutual learning loss of the two branches:

where $F_g$ denotes the image features extracted by the global branch and $F_l$ denotes the image features extracted by the local branch.
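
A KL-divergence mutual learning loss can be sketched as follows. The text does not say how the branch outputs are turned into distributions or whether the divergence is taken in one direction or both; the sketch assumes softmax-normalized outputs and a symmetric sum of the two directions.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0
    )


def mutual_learning_loss(global_out: List[float], local_out: List[float]) -> float:
    """Symmetric KL between the two branches' distributions (an assumed form)."""
    p = softmax(global_out)
    q = softmax(local_out)
    return kl_divergence(p, q) + kl_divergence(q, p)


same = mutual_learning_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = mutual_learning_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

The loss is zero when both branches produce the same distribution and grows as they disagree, which is what pushes each branch toward the other's view.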

Step 6: Total loss. The global branch loss, the local branch loss, and the mutual learning loss of the two branches are combined to form the total loss:

where the three weighting coefficients are hyperparameters.
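
A weighted combination of the three losses could take the following form. The symbols $\alpha$, $\beta$ and $\gamma$ and the loss names are placeholders chosen for illustration, since the original symbols are not recoverable; setting all three weights to 1 gives the plain sum described in the disclosure.

```latex
L_{\mathrm{total}} = \alpha \, L_{\mathrm{global}} + \beta \, L_{\mathrm{local}} + \gamma \, L_{\mathrm{ml}}
```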

In summary, existing feature enhancement methods mainly generate new global semantic features by analyzing the similarities or differences between the global semantic features of different samples. Although this approach can increase the size and diversity of the data set, it ignores the local feature information of the samples. In the small sample setting, because the amount of data is limited, each sample may contain unique or important local feature information that is very useful for distinguishing between categories or tasks. If only global semantic features are used to generate new features, such local feature information may be lost or confused, so that the generated features are of low quality or inaccurate. In contrast, the invention constructs a feature generation module based on local feature information. By mining the semantic relationship between the local features and the global features of a sample, this module uses the local feature information of the sample to generate diversified global semantic features for feature enhancement. In addition, to overcome the limitation of using only global features or only local features for feature enhancement, the invention provides a double-branch mutual learning feature generation method comprising a branch that generates features from local feature information and a branch that generates features from global semantic features. The two branches learn from each other, capture complementary information and promote implicit mutual knowledge transfer, enabling the model to generate more discriminative features.

Exemplary apparatus

Fig. 2 is a schematic structural diagram of a small sample image recognition apparatus based on a dual-branch mutual learning feature generation according to an exemplary embodiment of the present invention. As shown in fig. 2, the apparatus 200 includes:

an obtaining module 210, configured to obtain a small sample image set to be identified, and form a query set to be identified;

a first generating module 220, configured to send each image in the query set to be identified to a first feature generating module of a pre-constructed global branch, to generate a first semantic feature of each image;

a second generating module 230, configured to send each image in the query set to be identified to a second feature generating module of the local branch constructed in advance, to generate a second semantic feature of each image;

a first determining module 240, configured to add the first semantic feature and the second semantic feature of each image in the query set to be identified, and determine a third semantic feature of each image in the query set to be identified;

the second determining module 250 is configured to calculate the similarity between the third semantic feature of each image in the query set to be identified and the prototypes of multiple categories in the support set, and determine the image category of each image in the query set to be identified.

Optionally, the process of constructing the category prototype for each category of the support set in the second determination module 250 is as follows:

The first output sub-module is used for sequentially inputting all images in the support set into the first feature generation module of the global branch and outputting fourth semantic features of each image in the support set;

the second output sub-module is used for sequentially inputting all the images in the support set into the second feature generation module of the local branch and outputting the fifth semantic features of each image in the support set;

and the determining submodule is used for adding and averaging the fourth semantic features and the fifth semantic features of all images of each category in the support set and determining a category prototype of each category in the support set.

Optionally, the training process of the first feature generation module of the global branch and the second feature generation module of the local branch in the first generation module 220 and the second generation module 230 is as follows:

the construction submodule is used for constructing a small sample recognition task according to the small sample image training set, wherein the small sample recognition task comprises N classes, and each class comprises M sample images;

the extraction sub-module is used for carrying out feature extraction on each sample image in the small sample recognition task by utilizing a feature extraction network, and determining global features and local features of each sample image;

the first training sub-module is used for training the first feature generation module of the global branch through the global features of each sample image in the small sample recognition task;

the second training sub-module is used for training the second feature generation module of the local branch through the global feature and the local feature of each sample image in the small sample recognition task;

the third training sub-module is used for training the first feature generation module and the second feature generation module by mutual learning between the training information of the global branch and that of the local branch;

and the optimizing sub-module is used for optimizing the first feature generation module and the second feature generation module according to a preset training total loss function.
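As an illustration of the first step above, the construction of a small sample recognition task with N classes and M sample images per class might look like the sketch below. The function name `build_task`, the dictionary-based dataset layout, and the `seed` parameter are hypothetical and are introduced only for this example.

```python
import random

def build_task(dataset, n_classes, m_samples, seed=None):
    """Sample an N-class task with M images per class.

    dataset: dict mapping a class label to a list of image identifiers
    (an assumed layout; the text does not specify one).
    """
    rng = random.Random(seed)
    eligible = [c for c, imgs in dataset.items() if len(imgs) >= m_samples]
    if len(eligible) < n_classes:
        raise ValueError("not enough classes with at least m_samples images")
    chosen = rng.sample(sorted(eligible), n_classes)
    return {c: rng.sample(dataset[c], m_samples) for c in chosen}

# Toy training set: 5 classes with 6 images each.
toy = {f"class_{i}": [f"img_{i}_{j}" for j in range(6)] for i in range(5)}
task = build_task(toy, n_classes=3, m_samples=4, seed=0)
```

Each sampled task would then be passed through the feature extraction network to obtain the global and local features used by the two branches.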

Optionally, the first training sub-module includes:

the masking unit is used for masking the global features of the M sample images of each category so that the global feature of one sample image is retained;

the replacement unit is used for replacing the masked global features of the sample images of each category with a learnable vector;

the first training unit is used for training the first feature generation module of the global branch according to the learnable vector substituted in each category and the global feature retained after masking;

the first optimizing unit is used for optimizing the first feature generating module according to a preset global branch loss function, wherein the global branch loss function comprises a global prediction loss function and a global classification loss function.
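The masking and replacement steps can be pictured with the short sketch below. It assumes one possible reading of the text, namely that a single global feature per category is retained and every other global feature of that category is replaced by the same learnable vector. The names `mask_class_features`, `keep_index`, and `mask_token` are hypothetical, and the learnable vector is shown as a fixed list, whereas in training it would be a trainable parameter.

```python
def mask_class_features(class_feats, keep_index, learnable_vector):
    """Keep one global feature and replace the others with the learnable vector.

    class_feats: list of M global feature vectors of one category.
    keep_index: position of the feature that is retained (assumed to be chosen
    by the caller, for example at random).
    """
    masked = []
    for i, feat in enumerate(class_feats):
        if i == keep_index:
            masked.append(list(feat))              # retained global feature
        else:
            masked.append(list(learnable_vector))  # masked position
    return masked

# One category with M = 3 sample images and 2-dimensional global features.
feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
mask_token = [0.0, 0.0]  # stand-in for the learnable vector
model_input = mask_class_features(feats, keep_index=1, learnable_vector=mask_token)
```

The resulting sequence is what the first feature generation module would receive as input under this reading; how the global prediction loss and the global classification loss are computed from its output is not specified in this passage.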

Optionally, the second training sub-module includes:

the selection unit is used for selecting the local characteristics of one sample image in each category;

the second training unit is used for training the second feature generation module according to the local features selected in each category and M preset learnable vectors;

and the second optimizing unit is used for optimizing the second feature generating module according to a preset local branch loss function, wherein the local branch loss function comprises a local prediction loss function and a local classification loss function.

Optionally, the apparatus 200 further comprises: a module for calculating the KL divergence as the mutual learning loss function for the training information of the global branch and the local branch.
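A KL-divergence-based mutual learning loss can be sketched as follows. The text states only that the KL divergence is used. Treating the "training information" as class logits from each branch, converting them with a softmax, and summing the divergence in both directions are assumptions made for this example. All function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mutual_learning_loss(global_logits, local_logits):
    """Assumed symmetric form: KL(global || local) + KL(local || global)."""
    p = softmax(global_logits)
    q = softmax(local_logits)
    return kl_divergence(p, q) + kl_divergence(q, p)

loss_same = mutual_learning_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = mutual_learning_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

When the two branches agree the loss is zero, and it grows as their predicted distributions diverge, which is what pushes each branch toward the other during training.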

Optionally, the second determining module 250 includes:

the computing sub-module is used for respectively computing the similarity between the third semantic feature of each image in the query set to be identified and the category prototypes of a plurality of categories in the support set, and determining the probability value that each image in the query set to be identified belongs to each category in the support set;

and a sub-module for taking, for each image in the query set to be identified, the category with the maximum probability value as the image category of that image.
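The recognition step for a single query image can be sketched as below. The text does not name the similarity measure or how similarities become probability values. Cosine similarity followed by a softmax is assumed here, and the `temperature` parameter and all function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(first_feat, second_feat, prototypes, temperature=1.0):
    """Fuse the two branch features and pick the most probable category."""
    # Third semantic feature = first (global) + second (local) semantic feature.
    third = [g + l for g, l in zip(first_feat, second_feat)]
    labels = list(prototypes)
    sims = [cosine(third, prototypes[c]) / temperature for c in labels]
    # Softmax over similarities gives a probability value per category.
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    probs = {c: e / total for c, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

prototypes = {"cat": [3.0, 2.0], "dog": [2.0, 4.0]}
predicted, probs = classify([1.0, 0.5], [0.5, 0.5], prototypes)
```

In this toy case the fused query feature is parallel to the "cat" prototype, so "cat" receives the highest probability value and is returned as the image category.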

Exemplary electronic device

Fig. 3 shows the structure of an electronic device provided in an exemplary embodiment of the present invention. As shown in fig. 3, the electronic device 30 includes one or more processors 31 and a memory 32.

The processor 31 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.

The memory 32 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 31 may execute them to implement the methods of the various embodiments of the present invention described above and/or other desired functions. In one example, the electronic device may further include an input device 33 and an output device 34, which are interconnected by a bus system and/or another form of connection mechanism (not shown).

In addition, the input device 33 may also include, for example, a keyboard, a mouse, and the like.

The output device 34 can output various information to the outside. The output device 34 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.

Of course, only some of the components of the electronic device that are relevant to the present invention are shown in fig. 3 for simplicity, components such as buses, input/output interfaces, etc. being omitted. In addition, the electronic device may include any other suitable components depending on the particular application.

Exemplary computer program product and computer readable storage Medium

In addition to the methods and apparatus described above, embodiments of the invention may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the invention described in the "exemplary methods" section of this specification.

Program code of the computer program product for performing the operations of embodiments of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the invention may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the invention described in the "exemplary method" section of the description above.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The basic principles of the present invention have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present invention are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be considered as essential to the various embodiments of the present invention. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the invention is not necessarily limited to practice with the above described specific details.

In this specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. The description of the system embodiments is relatively brief because they essentially correspond to the method embodiments, and reference should be made to the description of the method embodiments for the relevant points.

The block diagrams of the devices, apparatuses, and systems according to the present invention are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to" and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."

The method and system of the present invention may be implemented in a number of ways. For example, the methods and systems of the present invention may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present invention are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.

It is also noted that in the systems, devices and methods of the present invention, components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered as equivalent aspects of the present invention. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the invention to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims

1. The small sample image recognition method based on the double-branch mutual learning feature generation is characterized by comprising the following steps of:

adding the first semantic feature and the second semantic feature of each image in the query set to be identified, and determining a third semantic feature of each image in the query set to be identified;

and respectively calculating the similarity between the third semantic feature of each image in the query set to be identified and a plurality of category prototypes in the support set, and determining the image category of each image in the query set to be identified.

2. The method of claim 1, wherein the process of constructing a category prototype for each category of the support set is as follows:

sequentially inputting all images in the support set into the first feature generation module of the global branch, and outputting fourth semantic features of each image in the support set;

sequentially inputting all images in the support set into the second feature generation module of the local branch, and outputting fifth semantic features of each image in the support set;

and averaging the fourth semantic features and the fifth semantic features of all images of each category in the support set, and determining the category prototype of each category in the support set.

3. The method according to claim 1 or 2, wherein the training process of the first feature generation module of the global branch and the second feature generation module of the local branch is as follows:

constructing a small sample recognition task according to a small sample image training set, wherein the small sample recognition task comprises N classes, and each class comprises M sample images;

training the first feature generation module of the global branch through the global features of each sample image in the small sample recognition task;

training the second feature generation module of the local branch through the global feature and the local feature of each sample image in the small sample recognition task;

training the first feature generation module and the second feature generation module by mutually learning training information of the global branch and the local branch;

4. A method according to claim 3, wherein training the first feature generation module of the global branch with the global features of each sample image in the small sample recognition task comprises:

masking the global features of the M sample images of each category so that the global feature of one sample image is retained;

replacing the masked global features of the sample images of each category with a learnable vector;

training the first feature generation module of the global branch according to the learnable vector substituted in each category and the global feature retained after masking;

5. The method of claim 4, wherein training the second feature generation module of the local branch with the global feature and the local feature of each sample image in the small sample recognition task comprises:

selecting local features of one sample image in each category;

training the second feature generation module according to the local features selected by each category and M preset learnable vectors;

6. The method of claim 5, further comprising: calculating the KL divergence as the mutual learning loss function for the mutual learning between the training information of the global branch and that of the local branch.

7. The method of claim 6, wherein the training total loss function is the sum of the global branch loss function, the local branch loss function, and the mutual learning loss function.

8. The method of claim 1, wherein respectively calculating the similarity between the third semantic feature of each image in the query set to be identified and the category prototypes of a plurality of categories in the support set, and determining the image category of each image in the query set to be identified, comprises:

respectively calculating the similarity between the third semantic feature of each image in the query set to be identified and the category prototypes of a plurality of categories in the support set, and determining the probability value that each image in the query set to be identified belongs to each category in the support set;

9. A small sample image recognition device based on dual-branch mutual learning feature generation, comprising:

the first generation module is used for sending each image in the query set to be identified into a first feature generation module of a pre-constructed global branch to generate a first semantic feature of each image;

the second generation module is used for sending each image in the query set to be identified into a second feature generation module of a pre-constructed local branch to generate a second semantic feature of each image;

and the second determining module is used for respectively calculating the similarity between the third semantic feature of each image in the query set to be identified and a plurality of category prototypes in the support set and determining the image category of each image in the query set to be identified.

10. A computer readable storage medium, characterized in that the storage medium stores a computer program for executing the method of any of the preceding claims 1-8.