CN113033612A - Image classification method and device - Google Patents

Image classification method and device

Info

Publication number
CN113033612A
CN113033612A (application CN202110209031.8A)
Authority
CN
China
Prior art keywords
image
foreground
training
sample
extractor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110209031.8A
Other languages
Chinese (zh)
Inventor
黄高
王朝飞
宋士吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110209031.8A
Publication of CN113033612A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides an image classification method and device. The method includes the following steps: acquiring an object image to be classified, a foreground object extractor, and a final classifier; inputting the object image into the foreground object extractor to obtain a foreground object image; and inputting the foreground object image into the final classifier to obtain a classification result. The final classifier consists of an image feature extractor and a small-sample object classifier. The image feature extractor is obtained by training with foreground object images of known-type objects having a large number of samples as training data; the small-sample object classifier is obtained by training with expanded foreground object images of new-type objects as training data, where the expansion method performs pose transformation on the foreground object images of the new-type objects with a pose transformation generator to obtain expanded foreground object images. The scheme of this embodiment effectively solves the problem of classifying fine-grained images under small-sample conditions and effectively improves the classification accuracy.

Description

Image classification method and device
Technical Field
The present disclosure relates to image recognition technologies, and more particularly, to an image classification method and apparatus.
Background
Fine-grained image classification has wide research demand and application scenarios in both industry and academia. Its purpose is to distinguish different subclasses within one large class, for example different species of birds or different models of cars, and it has been a hot research topic in the field of computer vision in recent years. The rapid development of deep learning has greatly improved the accuracy of fine-grained image classification, but such techniques generally rely on a large number of labeled samples for model training. However, many practical application scenarios, such as mechanical fault detection, medical image recognition, and deep-sea organism recognition, frequently face a shortage of labeled samples because category data are scarce, labeling cost is high, and so on; this is the problem of small-sample fine-grained image classification. This problem is more challenging than either the fine-grained image classification problem or the ordinary small-sample image classification problem, because it inherits the difficulties of both: first, in fine-grained image classification the intra-class variance is large and the inter-class variance is small; second, in small-sample image classification the samples are too few to train a deep learning model.
Disclosure of Invention
The embodiments of the present application provide an image classification method and device, which can effectively solve the problem of classifying fine-grained images under small-sample conditions and effectively improve the classification accuracy.
An embodiment of the present application provides an image classification method, which may include the following steps:
acquiring an object image to be classified, a pre-trained foreground object extractor, and a final classifier;
inputting the object image to be classified into the foreground object extractor, and acquiring a foreground object image of the object image to be classified;
inputting the foreground object image of the object image to be classified into the final classifier, and outputting, by the final classifier, a classification result for the object image to be classified;
wherein the final classifier consists of a pre-trained image feature extractor and a small-sample object classifier;
the image feature extractor is obtained by training with foreground object images of known-type objects having a large number of samples as training data;
the small-sample object classifier is obtained by training with expanded foreground object images of new-type objects as training data; the expansion method includes performing pose transformation on the foreground object images of the new-type objects with a pose transformation generator to obtain expanded foreground object images.
In an exemplary embodiment of the present application, the foreground object extractor may include: a salient object detection model and an image operation module comprising a plurality of image operations;
wherein the image operations may comprise, in sequence, one or more of the following: a pixel-level logic operation, a multiplication operation, a masking operation, and a zoom operation;
the salient object detection model is used for acquiring a saliency map of its input image, the saliency map being a feature map containing the pose of the foreground object; the image operation module is used for performing image processing on the saliency map.
In an exemplary embodiment of the present application, the obtaining the foreground object extractor may include: acquiring the salient object detection model;
the obtaining the salient object detection model may include: training a constructed convolutional neural network with a first training set consisting of images of known-type objects and the saliency maps corresponding to those images, to obtain the salient object detection model.
In an exemplary embodiment of the present application, the method of acquiring the image feature extractor may include:
generating foreground object images of first images based on the first images of known-type objects having a large number of samples and the foreground object extractor, and training a convolutional neural network with the foreground object images to obtain the image feature extractor.
In an exemplary embodiment of the present application, the method for acquiring the small-sample object classifier may include:
generating new samples of the new-type objects based on first images of known-type objects having a large number of samples, second images of new-type objects having a small number of samples, a pre-trained foreground object extractor, and a pre-trained pose transformation generator;
expanding the new samples into the small sample set to constitute an expanded sample set for the new-type objects;
and training with the expanded sample set to obtain the small-sample object classifier.
In an exemplary embodiment of the present application, the generating new samples of the new-type objects based on first images of known-type objects having a large number of samples, second images of new-type objects having a small number of samples, a pre-trained foreground object extractor, and a pre-trained pose transformation generator may include:
setting type labels for the first image set and the second image set respectively, and acquiring the foreground object extractor and the pose transformation generator;
inputting the first image set and the second image set provided with type labels into the pre-trained foreground object extractor respectively, and acquiring a first foreground object image set for the known-type objects and a second foreground object image set for the new-type objects respectively;
inputting the first foreground object image set and the second foreground object image set into the pose transformation generator to transform the pose of the new-type objects into the pose of the known-type objects, and outputting, by the pose transformation generator, pose-transformed images of the new-type objects;
and taking the acquired plurality of pose-transformed images as new samples of the new-type objects.
In an exemplary embodiment of the present application, the method for acquiring the pose transformation generator may include:
extracting a multiplet training set from a second training set consisting of the first images of known-type objects having a large number of samples;
and training a constructed auto-encoder neural network with the multiplet training set to obtain the pose transformation generator.
In an exemplary embodiment of the present application, the multiplet training set comprises a quadruplet training set;
the extracting a multiplet training set from a second training set consisting of the first images of known-type objects having a large number of samples may include:
obtaining a plurality of quadruplet data pairs {A1, A2, B1, B2} from the second training set to form the quadruplet training set;
wherein a quadruplet data pair {A1, A2, B1, B2} satisfies: A1 and A2 come from one known type, B1 and B2 come from another known type, the similarity between the saliency map of A1 and the saliency map of B1 is greater than or equal to a preset similarity threshold, the similarity between the saliency map of A2 and the saliency map of B2 is greater than or equal to the similarity threshold, and the pose change from A1 to A2 is consistent with the pose change from B1 to B2.
In an exemplary embodiment of the present application, the training the constructed auto-encoder network with the multiplet training set to obtain the pose transformation generator may include:
taking {A1, A2, B1} as the input and B2 as the target output of the pose transformation generator, and taking B̂2 as the actual output of the pose transformation generator;
optimizing a preset loss function according to the target output B2, the actual output B̂2, and the class label y of the target output B2, determining that training is finished when the loss value of the loss function meets a preset requirement, and using the trained auto-encoder network as the pose transformation generator.
In an exemplary embodiment of the present application, the loss function may include:
Loss = MSE(B̂2, B2) + λ · CE(B̂2, y)
where Loss is the loss value, MSE(B̂2, B2) is the mean square error between the actual output and the target output, CE(B̂2, y) is the cross-entropy loss obtained by taking B̂2 as the input with class label y, and λ is a hyper-parameter that adjusts the ratio between the mean square error and the cross-entropy loss.
In an exemplary embodiment of the present application, the small-sample object classifier may be a cosine-distance-based classifier.
An embodiment of the present application further provides an image classification apparatus, which may include a processor and a computer-readable storage medium storing instructions, wherein when the instructions are executed by the processor, the image classification method according to any one of the above is implemented.
Compared with the related art, an embodiment of the present application may include: acquiring an object image to be classified, a pre-trained foreground object extractor, and a final classifier; inputting the object image to be classified into the foreground object extractor, and acquiring a foreground object image of the object image to be classified; inputting the foreground object image of the object image to be classified into the final classifier, and outputting, by the final classifier, a classification result for the object image to be classified; wherein the final classifier consists of a pre-trained image feature extractor and a small-sample object classifier; the image feature extractor is obtained by training with foreground object images of known-type objects having a large number of samples as training data; the small-sample object classifier is obtained by training with expanded foreground object images of new-type objects as training data; and the expansion method includes performing pose transformation on the foreground object images of the new-type objects with a pose transformation generator to obtain expanded foreground object images.
Through the scheme of this embodiment, the problem of classifying fine-grained images under small-sample conditions is effectively solved, and the classification accuracy is effectively improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flowchart of an image classification method according to an embodiment of the present application;
fig. 2 is a schematic diagram of the basic structure of the Baseline++ method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the overall architecture and workflow of the FOT method according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a process of processing an image by a foreground object extractor according to an embodiment of the present application;
FIG. 5 is a flowchart of an acquisition method of the small-sample object classifier according to an embodiment of the present application;
FIG. 6 is a flowchart of an acquisition method of the pose transformation generator according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the network structure of the pose transformation generator according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an example of a new type of image generated according to the FOT method in an embodiment of the present application;
fig. 9 is a block diagram of an image classification apparatus according to an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
In the exemplary embodiments of the present application, the two difficulties raised in the background are analyzed as follows:
For difficulty one, current fine-grained image classification methods usually rely on additional information such as annotation boxes and local region positions to assist in identifying the important regions of an image, or capture similar information with deep learning methods. However, these approaches still require a large number of labeled samples and cannot be applied well in the small-sample image classification setting.
For difficulty two, current small-sample image classification methods often address the shortage of samples through sample expansion, using image cropping, mirror transformation, moderate rotation, and similar operations. This is clearly effective for the general small-sample image classification problem, but its effect is very limited when fine-grained images are the object. To explore the source of this problem, two phenomena were carefully observed. First, images of different categories in fine-grained image classification have similar backgrounds; for example, the living habits of different birds do not differ greatly, and their images usually take woods, water, sky, and so on as the background. Second, the key features for distinguishing categories are all located on the foreground object of the image; for example, local features such as beak, color, and tail feathers are often needed to distinguish birds. The first phenomenon shows that the image background plays a mostly negative role in small-sample fine-grained image classification, while the second phenomenon shows that the image foreground object plays a mostly positive role. Therefore, the mainstream sample expansion methods adopted in current small-sample image classification do not treat the foreground and background of an image differently, and expanding samples directly through whole-image transformations can hardly achieve a good effect, because the negative effect of the background is expanded along with the samples.
Based on the above analysis, the present application provides an image classification method. As shown in fig. 1, the method may include steps S101 to S103:
S101, acquiring an object image to be classified, and a pre-trained foreground object extractor and final classifier;
S102, inputting the object image to be classified into the foreground object extractor, and acquiring a foreground object image of the object image to be classified;
S103, inputting the foreground object image of the object image to be classified into the final classifier, and outputting, by the final classifier, a classification result for the object image to be classified;
wherein the final classifier consists of a pre-trained image feature extractor and a small-sample object classifier;
the image feature extractor is obtained by training with foreground object images of known-type objects having a large number of samples as training data;
the small-sample object classifier is obtained by training with expanded foreground object images of new-type objects as training data; the expansion method includes performing pose transformation on the foreground object images of the new-type objects with a pose transformation generator to obtain expanded foreground object images.
In an exemplary embodiment of the present application, the large-sample object classifier refers to a classifier that can only classify known-type objects for which a large number of samples are available, while the small-sample object classifier is a classifier that can classify new-type objects for which only a small number of samples are available.
In an exemplary embodiment of the present application, the foreground object extractor is configured to extract the foreground object image of the input object image to be classified, the image feature extractor is configured to extract a feature map of the foreground object image, and the small-sample object classifier is configured to perform type classification based on the input feature map.
In the exemplary embodiment of the application, a small-sample fine-grained image classification method based on foreground object pose transformation is provided. The scheme is mainly based on a trained final classification model: after a test picture (i.e., an object image to be classified) of a new-type object passes through the foreground object extractor to obtain its foreground object image, the foreground object image is used as the input of the final classification model; image features are extracted from the foreground object image by the image feature extractor, and the extracted feature map is classified by the small-sample object classifier, so that the class label corresponding to the test picture is obtained. In this way, the problem of classifying fine-grained images under small-sample conditions can be effectively solved, and the classification accuracy effectively improved.
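As a concrete illustration of this inference pipeline, the following sketch uses hypothetical PyTorch code; the argument names foreground_extractor, feature_extractor, and cosine_classifier are assumptions introduced here, not identifiers from the original disclosure.

```python
import torch

def classify_image(image, foreground_extractor, feature_extractor, cosine_classifier):
    """Classify one test image: extract the foreground, extract features, classify.

    All three arguments are assumed to be pre-trained torch.nn.Module objects
    (the names are illustrative, not taken from the patent text).
    """
    with torch.no_grad():
        # Step 1: remove the background and crop/zoom the foreground object.
        foreground = foreground_extractor(image)               # (3, H, W) -> (3, 224, 224)
        # Step 2: embed the foreground image with the CNN trained on known types.
        features = feature_extractor(foreground.unsqueeze(0))  # (1, D)
        # Step 3: score against the small-sample (new-type) classifier.
        logits = cosine_classifier(features)                   # (1, num_new_classes)
    return logits.argmax(dim=1).item()
```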
In an exemplary embodiment of the application, the small-sample image classification method based on foreground pose transformation (FOT for short) proposed in this embodiment is a complete new method formed by adding two new modules, a foreground object extractor and a pose transformation generator, to the small-sample image classification baseline method Baseline++ used as the basic framework. The basic structure of the Baseline++ method is shown in fig. 2 and can be divided into two stages: 1. a training stage (first stage), in which the image feature extractor and a large-sample object classifier are trained with known-type object images that have a sufficient number of samples; 2. a fine-tuning stage (second stage), in which the image feature extractor is kept fixed, the limited new samples are expanded, and the small-sample object classifier is trained with the expanded samples.
In an exemplary embodiment of the present application, the overall architecture of the FOT method proposed in this embodiment may be as shown in fig. 3, and the workflow may include:
(I) Foreground object extraction. The known-type and new-type images are processed by the foreground object extractor to obtain enlarged foreground objects of all images with the background removed; the detailed process can be as shown in fig. 4.
(II) Training on the known types. The known types have a large number of labeled samples; an image feature extraction network based on a convolutional neural network (CNN) and a classifier for the known types are obtained by training on the known-type images (i.e., the foreground object images) processed in the previous step.
(III) Training the pose transformation generator. A qualified multiplet training set (e.g., a quadruplet data set) is constructed on the known types, and the pose transformation generator is trained on this multiplet training set.
(IV) Generating new-type samples. To address the shortage of new-type samples, each new-type sample can be used to find a known-class sample with a similar saliency map, and the pose transformation of that known sample is transferred onto the new-type sample, thereby generating additional new-type samples.
(V) Training the small-sample object classifier. After the generated new-type samples are added to the new types, the new-type classifier is trained with the expanded new-type data set.
(VI) Through the above steps, a complete classifier (i.e., the final classifier) is obtained, which can be formed by sequentially connecting the image feature extractor and the small-sample object classifier. When a new-class test image (i.e., an object image to be classified) is used as input, the corresponding class label is obtained through image feature extraction and new-type classification.
In an exemplary embodiment of the present application, as can be seen from the above steps, the scheme for obtaining the final classifier may include: first, designing a foreground object extractor, which consists of a saliency detection model and several image operations, and extracting foreground object images from the object images of the known types and the new types; training with the acquired known-type foreground object images to obtain a neural-network-based image feature extractor; constructing a quadruplet data set (also called a quadruplet training set) and training an auto-encoder neural network structure with it to obtain the pose transformation generator; performing pose transformation on the new-type images with the pose transformation generator to generate additional samples (i.e., new samples of the new-type objects); training with the expanded data set to obtain the classifier for the new-type object images (i.e., the small-sample object classifier); and sequentially connecting the image feature extractor and the small-sample object classifier to form the final classifier.
In the exemplary embodiments of the present application, the acquisition methods for the components of the final classifier are described in detail below.
In an exemplary embodiment of the present application, the foreground object extractor may include: a salient object detection model and an image operation module comprising a plurality of image operations;
wherein the image operations may comprise, in sequence, one or more of the following: a pixel-level logic operation, a multiplication operation, a masking operation, and a zoom operation;
the salient object detection model is used for acquiring a saliency map of its input image, the saliency map being a feature map containing the pose of the foreground object; the image operation module is used for performing image processing on the saliency map.
In exemplary embodiments of the present application, the foreground object extractor may be composed of a pre-trained salient object detection model (BASNet) f, a pixel-level logic operation σ, a multiplication operation ⊗, a masking operation g, and a zoom operation z, as shown in fig. 4.
In an exemplary embodiment of the present application, the pre-trained salient object detection model adopts the BASNet network model; other salient object detection models may also be adopted.
In an exemplary embodiment of the present application, the obtaining the foreground object extractor may include: acquiring the salient object detection model;
the obtaining the salient object detection model may include: training a constructed convolutional neural network with a first training set consisting of images of known-type objects and the saliency maps corresponding to those images, to obtain the salient object detection model.
In the exemplary embodiment of the present application, in practical applications, a three-channel RGB (red, green, blue) image X is taken as input, and the final foreground object picture Y can be obtained through 5 processing steps:
(I) obtaining the saliency map f(X) of the input image X with the pre-trained BASNet model;
(II) performing a pixel-level logic operation on the saliency map to obtain a single-channel 0-1 logic map σ(f(X)); this operation is carried out by giving a threshold γ: when the mean value of the three channels of a pixel is greater than γ, the pixel value is set to 1, and otherwise it is set to 0;
(III) multiplying the single-channel 0-1 logic map with the original image, so that the input image becomes an image with a black background, σ(f(X)) ⊗ X;
(IV) using a masking operation to crop the image and obtain only the region (e.g., a rectangular region) containing the foreground object, g(σ(f(X)) ⊗ X), where the size and position of the mask frame are automatically read from the logic map of the previous step;
(V) enlarging the region containing the foreground object to a uniform size with the zoom operation, as the input of the subsequent deep learning model, giving z(g(σ(f(X)) ⊗ X)).
In an exemplary embodiment of the present application, combining the above steps, the input-output relationship may be expressed as:
Y = z(g(σ(f(X)) ⊗ X))
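A minimal sketch of these five steps, assuming a PyTorch setting and a simplified BASNet interface that directly returns a saliency map (the real BASNet returns several side outputs), is given below; the threshold value gamma=0.5 and the output size 224 are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn.functional as F

def extract_foreground(x, basnet, gamma=0.5, out_size=224):
    """Foreground object extraction: Y = z(g(sigma(f(X)) * X)).

    x      : (3, H, W) RGB image tensor with values in [0, 1]
    basnet : pre-trained salient object detection model (assumed to return a saliency map)
    gamma  : threshold of the pixel-level logic operation (value is an assumption)
    """
    with torch.no_grad():
        sal = basnet(x.unsqueeze(0)).squeeze(0)                    # f(X): saliency map
    # Pixel-level logic operation sigma: mean over channels, threshold at gamma.
    logic = (sal.mean(dim=0, keepdim=True) > gamma).float()       # (1, H, W), values in {0, 1}
    # Multiplication: black out the background.
    fg = x * logic                                                 # sigma(f(X)) * X
    # Masking g: crop the bounding rectangle of the foreground pixels.
    ys, xs = torch.nonzero(logic[0], as_tuple=True)
    if ys.numel() == 0:                                            # no salient pixel found; fall back to the full image
        crop = fg
    else:
        crop = fg[:, ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Zoom z: enlarge the cropped region to a uniform size.
    y = F.interpolate(crop.unsqueeze(0), size=(out_size, out_size),
                      mode='bilinear', align_corners=False).squeeze(0)
    return y
```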
in an exemplary embodiment of the present application, the method of acquiring the image feature extractor may include:
and generating a foreground target image of the first image based on the first image of the object of the known type of the large sample and the foreground target extractor, and training a convolutional neural network by adopting the foreground target image to obtain the image feature extractor.
In an exemplary embodiment of the present application, the structure of the image feature extractor is typically a convolutional neural network (CNN), and its training data set uses the foreground object images of the known types.
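The following sketch illustrates this first training stage under stated assumptions: a ResNet-18 backbone, an Adam optimizer, and cross-entropy training of the large-sample classifier are choices made here for illustration and are not prescribed by the original text.

```python
import torch
import torch.nn as nn
from torchvision import models

def train_feature_extractor(loader, num_known_classes, epochs=30, lr=1e-3, device='cuda'):
    """Stage-1 training: fit a CNN feature extractor plus a large-sample classifier
    on foreground object images of the known types.  The backbone and the
    hyper-parameters are illustrative assumptions."""
    backbone = models.resnet18(weights=None)
    feat_dim = backbone.fc.in_features
    backbone.fc = nn.Identity()                       # keep the CNN as a pure feature extractor
    classifier = nn.Linear(feat_dim, num_known_classes)
    model = nn.Sequential(backbone, classifier).to(device)

    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for fg_images, labels in loader:              # loader yields (foreground image, known-type label)
            fg_images, labels = fg_images.to(device), labels.to(device)
            loss = ce(model(fg_images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return backbone                                    # the trained image feature extractor
```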
In an exemplary embodiment of the present application, as shown in fig. 5, the method for acquiring the small-sample object classifier may include steps S201 to S203:
S201, generating new samples of the new-type objects based on first images of known-type objects having a large number of samples, second images of new-type objects having a small number of samples, the pre-trained foreground object extractor, and a pre-trained pose transformation generator.
In an exemplary embodiment of the present application, as shown in fig. 6, the method for acquiring the pose transformation generator may include steps S301 to S302:
S301, extracting a multiplet training set from a second training set consisting of the first images of known-type objects having a large number of samples.
In an exemplary embodiment of the present application, the multiplet training set comprises a quadruplet training set;
the extracting a multiplet training set from a second training set consisting of the first images of known-type objects having a large number of samples may include:
obtaining a plurality of quadruplet data pairs {A1, A2, B1, B2} from the second training set to form the quadruplet training set;
wherein a quadruplet data pair {A1, A2, B1, B2} satisfies: A1 and A2 come from one known type, B1 and B2 come from another known type, the similarity between the saliency map of A1 and the saliency map of B1 is greater than or equal to a preset similarity threshold, the similarity between the saliency map of A2 and the saliency map of B2 is greater than or equal to the similarity threshold, and the pose change from A1 to A2 is consistent with the pose change from B1 to B2.
In an exemplary embodiment of the present application, the pose transformation generator is an auto-encoder neural network, the architecture of which is shown in fig. 7. It is a deep learning model, so besides the network structure itself the emphasis is on model training, and its training data set uses a multiplet training set (e.g., a quadruplet training set) constructed from the known types. For this purpose, a saliency-map-based multiplet training set construction algorithm is proposed on the known types (base classes): the training set of the deep learning model is composed of a large number of quadruplet data pairs {A1, A2, B1, B2}, in which A1 and A2 come from one known type, B1 and B2 come from another known type, the saliency map of A1 is similar to that of B1, and the saliency map of A2 is similar to that of B2. Since a saliency map can represent the pose of a foreground object, the pose change from A1 to A2 is then consistent with the pose change from B1 to B2. The formalization is described as follows:
find all {A1, A2, B1, B2} such that A1, A2 ∈ c_i, B1, B2 ∈ c_j (i ≠ j), Distance(S_A1, S_B1) ≤ α, Distance(S_A2, S_B2) ≤ β,
where c_i and c_j denote two different known types, S_A1, S_A2, S_B1, S_B2 denote the saliency maps of A1, A2, B1, B2 respectively, Distance(·) uses the Euclidean distance, and α and β are two thresholds that bound the magnitude of the Euclidean distance.
S302, training the constructed auto-encoder neural network with the multiplet training set to obtain the pose transformation generator.
In an exemplary embodiment of the present application, the training the constructed auto-encoder network with the multiplet training set to obtain the pose transformation generator may include:
taking {A1, A2, B1} as the input and B2 as the target output of the pose transformation generator, and taking B̂2 as the actual output of the pose transformation generator;
optimizing a preset loss function according to the target output B2, the actual output B̂2, and the class label y of the target output B2, determining that training is finished when the loss value of the loss function meets a preset requirement, and using the trained auto-encoder network as the pose transformation generator.
In an exemplary embodiment of the present application, the loss function may include:
Loss = MSE(B̂2, B2) + λ · CE(B̂2, y)
where Loss is the loss value, MSE(B̂2, B2) is the mean square error between the actual output and the target output, CE(B̂2, y) is the cross-entropy loss obtained by taking B̂2 as the input with class label y, and λ is a hyper-parameter that adjusts the ratio between the mean square error and the cross-entropy loss.
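A training-loop sketch for this loss is given below; the generator architecture, the frozen auxiliary classifier used for the cross-entropy term, and all hyper-parameter values are assumptions, since the original text only specifies the form of the loss.

```python
import torch
import torch.nn as nn

def train_pose_generator(generator, aux_classifier, quad_loader, lam=0.1,
                         epochs=50, lr=1e-3, device='cuda'):
    """Train the auto-encoder pose transformation generator on quadruplet data.

    generator      : network mapping (A1, A2, B1) -> B2_hat (architecture assumed)
    aux_classifier : frozen classifier providing class logits for the cross-entropy term
    lam            : hyper-parameter balancing MSE and cross-entropy (value assumed)
    """
    mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    generator.to(device).train()
    aux_classifier.to(device).eval()

    for _ in range(epochs):
        for a1, a2, b1, b2, y in quad_loader:         # y: class label of B2
            a1, a2, b1, b2, y = (t.to(device) for t in (a1, a2, b1, b2, y))
            b2_hat = generator(a1, a2, b1)            # actual output B2_hat
            # Loss = MSE(B2_hat, B2) + lambda * CE(classifier(B2_hat), y)
            loss = mse(b2_hat, b2) + lam * ce(aux_classifier(b2_hat), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return generator
```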
In an exemplary embodiment of the present application, through training on such data, the pose transformation generator can learn how to transfer the pose change from A1 to A2 onto B1, so as to generate an output B̂2 that is similar to B2.
In an exemplary embodiment of the present application, the generating new samples of the new-type objects based on first images of known-type objects having a large number of samples, second images of new-type objects having a small number of samples, the pre-trained foreground object extractor, and the pre-trained pose transformation generator may include:
setting type labels for the first image set and the second image set respectively, and acquiring the foreground object extractor and the pose transformation generator;
inputting the first image set and the second image set provided with type labels into the pre-trained foreground object extractor respectively, and acquiring a first foreground object image set for the known-type objects and a second foreground object image set for the new-type objects respectively;
inputting the first foreground object image set and the second foreground object image set into the pose transformation generator to transform the pose of the new-type objects into the pose of the known-type objects, and outputting, by the pose transformation generator, pose-transformed images of the new-type objects;
and taking the acquired plurality of pose-transformed images as new samples of the new-type objects.
In an exemplary embodiment of the present application, applying the pose transformation generator to perform pose transformation on new-class samples may follow these steps: for each training sample Z1 of a new type, find a known-type sample X1 whose saliency map is similar to that of Z1, combine it with another sample X2 of the same known type, and take [X1, X2, Z1] together as the input of the pose transformation generator, thereby obtaining a new sample Ẑ2 of Z1 after pose transformation. This process is repeated until a sufficient number of new-class samples is obtained.
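The following sketch mirrors these steps under assumed data structures; the nearest-neighbour matching of saliency maps by Euclidean distance is an illustrative choice consistent with the quadruplet construction above, not a detail fixed by the original text.

```python
import torch

def generate_new_samples(new_samples, known_samples, generator, device='cuda'):
    """Generate extra samples for a new type by transferring known-type pose changes.

    new_samples   : list of (Z1_image, Z1_saliency) for one new class
    known_samples : dict mapping known label -> list of (image, saliency_map)
    generator     : trained pose transformation generator taking (X1, X2, Z1)
    Returns a list of generated images Z2_hat.
    """
    generated = []
    generator.to(device).eval()
    with torch.no_grad():
        for z1_img, z1_sal in new_samples:
            # Find the known-type sample X1 whose saliency map is closest to Z1's.
            best = min(
                ((lbl, img, sal) for lbl, items in known_samples.items() for img, sal in items),
                key=lambda t: torch.norm(t[2].flatten() - z1_sal.flatten()).item(),
            )
            x1_label, x1_img, _ = best
            # Pair X1 with the other samples X2 of the same known type.
            for x2_img, _ in known_samples[x1_label]:
                z2_hat = generator(x1_img.unsqueeze(0).to(device),
                                   x2_img.unsqueeze(0).to(device),
                                   z1_img.unsqueeze(0).to(device))
                generated.append(z2_hat.squeeze(0).cpu())
    return generated
```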
S202, expanding the new samples into the small sample set to constitute an expanded sample set for the new-type objects.
S203, training with the expanded sample set to obtain the small-sample object classifier.
In an exemplary embodiment of the present application, the small-sample object classifier may be a cosine-distance-based classifier; other types of classifiers may also be employed.
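A common form of cosine-distance-based classifier (as popularized by Baseline++) is sketched below; the temperature-like scale factor and its value are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Cosine-distance-based classifier: the logit of class k is the scaled cosine
    similarity between the feature vector and a learnable class weight vector."""

    def __init__(self, feat_dim, num_classes, scale=10.0):   # scale value is an assumption
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, features):                 # features: (batch, feat_dim)
        f = F.normalize(features, dim=1)         # unit-norm features
        w = F.normalize(self.weight, dim=1)      # unit-norm class weights
        return self.scale * f @ w.t()            # (batch, num_classes) cosine logits

# Example usage in the fine-tuning stage:
# clf = CosineClassifier(feat_dim=512, num_classes=5)
```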
In the exemplary embodiments of the present application, the detailed implementation of the embodiments of the present application is further described below with reference to an implementation case. It should be apparent that the embodiment described in this section is only one embodiment of the present application, not all embodiments. All other embodiments obtained by a person of ordinary skill in the art without inventive work, based on the embodiments of the present application, fall within the protection scope of the embodiments of the present application.
In the exemplary embodiment of the application, taking the fine-grained image classification benchmark data sets CUB Birds, Stanford Dogs, and Stanford Cars as examples, two small-sample fine-grained classification settings are mainly adopted. One is 5-way 1-shot: for 5 new classes (i.e., new types), the training set contains only 1 image per class. The other is 5-way 5-shot: for 5 new classes, the training set contains only 5 images per class. During testing, 15 images per class are selected as test images under both the 5-way 1-shot and the 5-way 5-shot settings.
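To make this evaluation protocol concrete, the following helper (an assumption, not part of the original text) samples one N-way K-shot episode with 15 query images per class, matching the 5-way 1-shot and 5-way 5-shot settings just described.

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, seed=None):
    """Sample one N-way K-shot episode: a support set (training images) and a
    query set (test images) drawn from n_way new classes."""
    rng = random.Random(seed)
    classes = rng.sample(list(images_by_class.keys()), n_way)
    support, query = [], []
    for label in classes:
        imgs = rng.sample(images_by_class[label], k_shot + n_query)
        support += [(img, label) for img in imgs[:k_shot]]   # k_shot training images per class
        query += [(img, label) for img in imgs[k_shot:]]     # 15 test images per class
    return support, query

# Example: 5-way 1-shot and 5-way 5-shot episodes
# support1, query1 = sample_episode(new_class_images, n_way=5, k_shot=1)
# support5, query5 = sample_episode(new_class_images, n_way=5, k_shot=5)
```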
In the exemplary embodiment of the present application, to illustrate the effectiveness of the method of this embodiment, comparison experiments are performed on the three benchmark data sets against two baseline methods, Baseline and Baseline++, four mainstream small-sample image classification methods, MatchingNet, ProtoNet, MAML, and RelationNet, and four state-of-the-art fine-grained small-sample image classification methods, PCM, PABN, CovaMNet, and DN4; the results are shown in Table 1. Table 1 compares the experimental results of the different methods on the three fine-grained image benchmark data sets; the dark gray areas represent the best result in each column, and the light gray areas represent the second-best result in each column.
TABLE 1 (the table contents appear as images in the original publication; they list the 5-way 1-shot and 5-way 5-shot accuracies of each compared method on the CUB Birds, Stanford Dogs, and Stanford Cars data sets)
In the exemplary embodiment of the present application, as can be seen from Table 1, on the basis of the Baseline++ method, the method (FOT) proposed in the embodiment of the present application improves the accuracy by 7.20% and 5.99% on average under the 5-way 1-shot and 5-way 5-shot settings, respectively; it greatly surpasses the current mainstream small-sample image classification methods; and compared with the current state-of-the-art fine-grained small-sample image classification methods, it obtains the best results on both the CUB Birds and Stanford Dogs data sets, while on Stanford Cars it is second only to the DN4 method and still shows strong classification and generalization performance.
In an exemplary embodiment of the present application, as a sample expansion method, the FOT method provided by the embodiment of the present application can also be combined, as an auxiliary method, with current small-sample image classification methods to improve their performance. Table 2 shows the experimental results of combining the FOT method with the four current mainstream small-sample image classification methods.
TABLE 2 (the table contents appear as images in the original publication; they list the results of the four mainstream small-sample image classification methods with and without the FOT module on the benchmark data sets)
In an exemplary embodiment of the present application, as a data expansion method, the FOT method proposed by the embodiment of the present application can be conveniently applied as an auxiliary module to any existing small-sample image classification method. As can be seen from Table 2, the FOT module can effectively improve the performance of the mainstream small-sample image classification methods. Virtually all of the methods listed in Table 1 can be further enhanced by the FOT module.
In the exemplary embodiment of the present application, to further visualize the effect of the method of the embodiment of the present application, fig. 8 shows partial examples of new-type images generated by the FOT method proposed in the embodiment of the present application. Each column represents a different quadruplet-based data pair {X1, X2, Z1, Ẑ2}, where Z1 is an image of a new type, X1 is a known-type image whose saliency map is similar to that of Z1, X2 is another image of the same known type as X1, and Ẑ2 is the actually generated image. It can be seen that Ẑ2 has the category attributes of Z1 (e.g., bird color, key part features, etc.) and the pose of X2.
An embodiment of the present application further provides an image classification device 1, as shown in fig. 9, which may include a processor 11 and a computer-readable storage medium 12. The computer-readable storage medium 12 stores instructions, and when the instructions are executed by the processor 11, any one of the image classification methods described above is implemented.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media, as known to those skilled in the art.

Claims (12)

1. An image classification method, the method comprising:
acquiring an object image to be classified, a pre-trained foreground object extractor, and a final classifier;
inputting the object image to be classified into the foreground object extractor, and acquiring a foreground object image of the object image to be classified;
inputting the foreground object image of the object image to be classified into the final classifier, and outputting, by the final classifier, a classification result for the object image to be classified;
wherein the final classifier consists of a pre-trained image feature extractor and a small-sample object classifier;
the image feature extractor is obtained by training with foreground object images of known-type objects having a large number of samples as training data;
the small-sample object classifier is obtained by training with expanded foreground object images of new-type objects as training data; the expansion method comprises performing pose transformation on the foreground object images of the new-type objects with a pose transformation generator to obtain expanded foreground object images.
2. The image classification method according to claim 1, wherein the foreground object extractor comprises: a salient object detection model and an image operation module comprising a plurality of image operations;
wherein the image operations comprise, in sequence, one or more of the following: a pixel-level logic operation, a multiplication operation, a masking operation, and a zoom operation;
the salient object detection model is used for acquiring a saliency map of its input image, the saliency map being a feature map containing the pose of the foreground object; and the image operation module is used for performing image processing on the saliency map.
3. The image classification method according to claim 2, wherein obtaining the foreground object extractor comprises: acquiring the salient object detection model;
the acquiring the salient object detection model comprises: training a constructed convolutional neural network with a first training set consisting of images of known-type objects and saliency maps corresponding to the images, to obtain the salient object detection model.
4. The image classification method according to claim 1, wherein the method for acquiring the image feature extractor comprises:
generating foreground object images of first images based on the first images of known-type objects having a large number of samples and the foreground object extractor, and training a convolutional neural network with the foreground object images to obtain the image feature extractor.
5. The image classification method according to claim 1, wherein the method for acquiring the small-sample object classifier comprises:
generating new samples of the new-type objects based on first images of known-type objects having a large number of samples, second images of new-type objects having a small number of samples, the pre-trained foreground object extractor, and a pre-trained pose transformation generator;
expanding the new samples into the small sample set to constitute an expanded sample set for the new-type objects;
and training with the expanded sample set to obtain the small-sample object classifier.
6. The image classification method according to claim 5, wherein the generating new samples of the new-type objects based on first images of known-type objects having a large number of samples, second images of new-type objects having a small number of samples, the pre-trained foreground object extractor, and the pre-trained pose transformation generator comprises:
setting type labels for the first image set and the second image set respectively, and acquiring the foreground object extractor and the pose transformation generator;
inputting the first image set and the second image set provided with type labels into the pre-trained foreground object extractor respectively, and acquiring a first foreground object image set for the known-type objects and a second foreground object image set for the new-type objects respectively;
inputting the first foreground object image set and the second foreground object image set into the pose transformation generator to transform the pose of the new-type objects into the pose of the known-type objects, and outputting, by the pose transformation generator, pose-transformed images of the new-type objects;
and taking the acquired plurality of pose-transformed images as new samples of the new-type objects.
7. The image classification method according to claim 5 or 6, wherein the method for acquiring the pose transformation generator comprises:
extracting a multiplet training set from a second training set consisting of the first images of known-type objects having a large number of samples;
and training a constructed auto-encoder neural network with the multiplet training set to obtain the pose transformation generator.
8. The image classification method according to claim 7, wherein the multiplet training set comprises a quadruplet training set;
the extracting a multiplet training set from a second training set consisting of the first images of known-type objects having a large number of samples comprises:
obtaining a plurality of quadruplet data pairs {A1, A2, B1, B2} from the second training set to form the quadruplet training set;
wherein a quadruplet data pair {A1, A2, B1, B2} satisfies: A1 and A2 come from one known type, B1 and B2 come from another known type, the similarity between the saliency map of A1 and the saliency map of B1 is greater than or equal to a preset similarity threshold, the similarity between the saliency map of A2 and the saliency map of B2 is greater than or equal to the similarity threshold, and the pose change from A1 to A2 is consistent with the pose change from B1 to B2.
9. The image classification method according to claim 8, wherein the training the constructed auto-encoder network with the multiplet training set to obtain the pose transformation generator comprises:
taking {A1, A2, B1} as the input and B2 as the target output of the pose transformation generator, and taking B̂2 as the actual output of the pose transformation generator;
optimizing a preset loss function according to the target output B2, the actual output B̂2, and the class label y of the target output B2, determining that training is finished when the loss value of the loss function meets a preset requirement, and using the trained auto-encoder network as the pose transformation generator.
10. The image classification method according to claim 9, wherein the loss function comprises:
Loss = MSE(B̂2, B2) + λ · CE(B̂2, y)
wherein Loss is the loss value, MSE(B̂2, B2) is the mean square error between the actual output and the target output, CE(B̂2, y) is the cross-entropy loss obtained by taking B̂2 as the input with class label y, and λ is a hyper-parameter that adjusts the ratio between the mean square error and the cross-entropy loss.
11. The image classification method according to any one of claims 1 to 4, wherein the small-sample object classifier is a cosine-distance-based classifier.
12. An image classification device, comprising a processor and a computer-readable storage medium having instructions stored therein, wherein the instructions, when executed by the processor, implement the image classification method according to any one of claims 1 to 9.
CN202110209031.8A (filed 2021-02-24, priority 2021-02-24): Image classification method and device, Pending, published as CN113033612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110209031.8A CN113033612A (en) 2021-02-24 2021-02-24 Image classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110209031.8A CN113033612A (en) 2021-02-24 2021-02-24 Image classification method and device

Publications (1)

Publication Number Publication Date
CN113033612A (en) 2021-06-25

Family

ID=76461165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110209031.8A Pending CN113033612A (en) 2021-02-24 2021-02-24 Image classification method and device

Country Status (1)

Country Link
CN (1) CN113033612A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821207A (en) * 2022-06-30 2022-07-29 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal
CN114821207B (en) * 2022-06-30 2022-11-04 浙江凤凰云睿科技有限公司 Image classification method and device, storage medium and terminal


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination