CN111340097B - Image fine granularity classification method, device, storage medium and equipment - Google Patents


Info

Publication number
CN111340097B
CN111340097B (application CN202010111834.5A)
Authority
CN
China
Prior art keywords
model
classification
image
preset
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010111834.5A
Other languages
Chinese (zh)
Other versions
CN111340097A (en)
Inventor
Dai Qiuju (戴秋菊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010111834.5A priority Critical patent/CN111340097B/en
Publication of CN111340097A publication Critical patent/CN111340097A/en
Application granted granted Critical
Publication of CN111340097B publication Critical patent/CN111340097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an image fine granularity classification method, apparatus, storage medium and device, wherein the method comprises the following steps: acquiring at least two sample images; performing feature extraction on the at least two sample images to obtain feature information corresponding to each of the at least two sample images; determining a first label value corresponding to the at least two sample images; constructing a preset metric model according to the obtained feature information and the first label value, wherein the preset metric model represents a metric model that optimizes the distance between at least two pieces of feature information; and merging the preset metric model with a pre-trained preset classification model to obtain a target classification model, where the target classification model is used to achieve fine-grained classification of an image to be processed.

Description

Image fine granularity classification method, device, storage medium and equipment
Technical Field
The present disclosure relates to the field of image classification technologies, and in particular, to a method, an apparatus, a storage medium, and a device for classifying fine granularity of an image.
Background
Fine-grained image classification differs from the general image classification task in that the granularity of the categories is finer, and the differences between fine-grained object categories show up only in subtle details. At the same time, objects of the same class may show widely varying intra-class visual differences owing to environment, position, background and appearance conditions. For example, dogs of different breeds all belong to the general class of dogs, so the variability between these fine-grained classes is small; yet the variability within a single class remains large because of the diversity of conditions such as background and appearance.
Existing schemes improve on general classification algorithms, but a general classification algorithm discriminates poorly between the classes of a fine-grained task: the centre distances between class features are relatively close, the features within each class are not gathered tightly enough, and the feature distributions of several classes overlap, which easily leads to misclassification between classes. In addition, the operations in current schemes are complex and introduce considerable latency.
Disclosure of Invention
The embodiment of the application provides an image fine granularity classification method, apparatus, storage medium and device, which can improve the inter-class separability and intra-class cohesion of a fine-grained classification algorithm while reducing the computational workload.
In order to achieve the above purpose, the technical scheme of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a method for classifying fine granularity of an image, including:
acquiring at least two sample images;
performing feature extraction on the at least two sample images to obtain feature information corresponding to each of the at least two sample images;
determining first label values corresponding to the at least two sample images;
constructing a preset metric model according to the obtained feature information and the first label value; wherein the preset metric model represents a metric model that optimizes the distance between at least two pieces of feature information;
merging the preset metric model with a pre-trained preset classification model to obtain a target classification model; the target classification model is used to achieve fine-grained classification of an image to be processed.
In a second aspect, an embodiment of the present application provides an image fine granularity classification apparatus, including an acquisition unit, an extraction unit, a determination unit, a construction unit, and a merging unit; wherein,
an acquisition unit configured to acquire at least two sample images;
the extraction unit is configured to perform feature extraction on the at least two sample images to obtain feature information corresponding to each of the at least two sample images;
a determining unit configured to determine first label values corresponding to the at least two sample images;
the construction unit is configured to construct a preset metric model according to the obtained feature information and the first label value; wherein the preset metric model represents a metric model that optimizes the distance between at least two pieces of feature information;
the merging unit is configured to merge the preset metric model with a pre-trained preset classification model to obtain a target classification model; the target classification model is used to achieve fine-grained classification of an image to be processed.
In a third aspect, embodiments of the present application provide an image fine granularity classification apparatus, including a memory and a processor; wherein,
a memory for storing a computer program capable of running on the processor;
a processor for performing the method as described in the first aspect when the computer program is run.
In a fourth aspect, embodiments of the present application provide a computer storage medium storing a computer program which, when executed by at least one processor, implements a method according to the first aspect.
In a fifth aspect, embodiments of the present application provide an apparatus, which at least includes the image fine-granularity classification device according to the second aspect or the third aspect.
According to the image fine granularity classification method, apparatus, storage medium and device, at least two sample images are acquired; feature extraction is performed on the at least two sample images to obtain the feature information corresponding to each of them; the first label value corresponding to the at least two sample images is determined; a preset metric model is constructed according to the obtained feature information and the first label value, where the preset metric model represents a metric model that optimizes the distance between at least two pieces of feature information; and the preset metric model is merged with a pre-trained preset classification model to obtain a target classification model, which is used to achieve fine-grained classification of the image to be processed. In this way, because the preset metric model optimizes the distance between at least two pieces of feature information and the first label value is determined according to the labels corresponding to the at least two sample images, the preset metric model can achieve inter-class separability and intra-class cohesion for the classification algorithm while reducing the computational workload; in addition, since the target classification model superimposes the preset metric model on the preset classification model, the precision of fine-grained classification can be improved, enhancing the overall effect of fine-grained classification.
Drawings
Fig. 1 is a flow chart of an image fine granularity classification method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating another method for classifying fine granularity of an image according to an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram for constructing a binary set metric model according to an embodiment of the present application;
FIG. 4 is a flowchart of another image fine granularity classification method according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an image fine granularity classifying device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of another image fine granularity classification apparatus according to an embodiment of the present application;
fig. 7 is a schematic hardware structure diagram of an image fine granularity classification apparatus according to an embodiment of the present application;
fig. 8 is a schematic diagram of a composition structure of an apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the application and not limiting of it. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
In an embodiment of the present application, referring to fig. 1, a flowchart of an image fine granularity classification method provided in an embodiment of the present application is shown. As shown in fig. 1, the method may include:
s101: acquiring at least two sample images;
it should be noted that the method is applied to an image fine-grained classification apparatus or a device integrated with such an apparatus. Here, the device may be a server, such as a web server or a data server; it may also be a terminal, such as a smartphone, tablet computer, notebook computer, palmtop computer, personal digital assistant (Personal Digital Assistant, PDA), navigation device, wearable device, digital camera or desktop computer; the embodiments of the present application are not limited in this regard.
It should be noted that image classification occurs often in daily life: images of different categories can be distinguished according to their semantic information. It is an important research topic in computer vision and is also the basis of other high-level visual tasks such as image detection, image segmentation, object tracking and behaviour analysis. Image classification techniques include coarse-grained classification, fine-grained classification and the like; coarse-grained classification mainly identifies the main objects in an image, while fine-grained classification divides the large categories of coarse-grained classification into finer sub-categories, where different categories can be distinguished only by small local differences. At this point, at least two sample images need to be acquired to construct a binary group, a ternary group or even an N-tuple, where N is an integer greater than or equal to 2.
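As a minimal sketch of this tuple-construction step (the sample-image names here are hypothetical placeholders, not part of the patent), the binary groups, ternary groups and N-tuples mentioned above can be enumerated as follows:

```python
import itertools

# Build all unordered N-tuples of distinct samples, N >= 2.
# For N = 2 this yields the binary groups (pairs) used by the
# metric model constructed later.
def build_tuples(samples, n=2):
    return list(itertools.combinations(samples, n))

samples = ["img_a", "img_b", "img_c"]   # hypothetical sample images
pairs = build_tuples(samples, n=2)      # 3 binary groups
triplets = build_tuples(samples, n=3)   # 1 ternary group
```

In practice the pairs would be sampled from the training set rather than exhaustively enumerated, since the number of pairs grows quadratically with the number of samples.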
S102: performing feature extraction on the at least two sample images to obtain the feature information corresponding to each of the at least two sample images;
it should be noted that the features in an image are descriptions of its salient characteristics or attributes, and each image has its own feature information, such as brightness, edge contours, texture or colour; feature extraction and representation are the basis of image classification, and the extracted feature information should fully represent the semantic content of the image. Here, there are various ways to perform feature extraction, such as a convolutional neural network (Convolutional Neural Networks, CNN) model, a local binary pattern (Local Binary Patterns, LBP) algorithm model, a histogram of oriented gradients (Histogram of Oriented Gradient, HOG) feature-extraction model, a scale-invariant feature transform (Scale-Invariant Feature Transform, SIFT) operator model, and the like; in the embodiments of the present application, feature extraction may generally be performed using a convolutional neural network model, but the embodiments of the present application are not particularly limited.
Further, a convolutional neural network is a feedforward neural network with a deep structure that involves convolution computation, and it is one of the representative algorithms of deep learning. Because it has feature-learning capability, it is widely applied in the field of visual image recognition. As a neural network model produced by combining artificial neural networks with deep-learning techniques, it is characterized by global training that unites local receptive fields, a hierarchical structure, and joint feature extraction and classification, and it has relatively accurate recognition capability for local parts of an image. Thus, in some embodiments, for S102, performing feature extraction on the at least two sample images to obtain the feature information corresponding to each of the at least two sample images may include:
performing feature extraction on the at least two sample images by using a convolutional neural network model to obtain the feature information corresponding to each of the at least two sample images.
Here, the convolutional neural network model includes an input layer, hidden layers and an output layer. The input layer receives the sample image and mainly performs standardization on it; that is, the input data is normalized. The hidden layers comprise convolution layers, pooling layers and fully connected layers. A convolution layer performs feature extraction on its input data and contains a number of convolution kernels, where each element of a kernel corresponds to a weight coefficient and a bias. After a convolution layer extracts features, the output feature information is passed to a pooling layer for feature selection and information filtering. The fully connected layers are located at the end of the hidden layers and transmit signals only to other fully connected layers; that is, the convolution and pooling layers in the convolutional neural network model extract features from the input sample image, while the fully connected layers combine the extracted feature information non-linearly to produce the output. In other words, the fully connected layers have no feature-extraction capability themselves and use the extracted features to complete the learning target. Finally, the layer upstream of the output layer is usually a fully connected layer; for an image-classification problem, the output layer uses a logistic function or a normalized exponential function to output classification labels, thereby obtaining the feature information corresponding to each sample image.
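The layer sequence described above (convolution with weights and a bias, pooling, then flattening for a fully connected layer) can be illustrated with a toy pure-Python sketch. The 4x4 image and 2x2 kernel below are made-up values, not the patent's actual network, which would in practice be a deep model such as MobileNet:

```python
def conv2d(image, kernel, bias=0.0):
    """Valid 2-D convolution (cross-correlation, as in CNN practice)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s + bias)  # each kernel element has a weight; plus bias
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: feature selection / information filtering."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

# Toy 4x4 "image" and a 2x2 difference kernel (hypothetical values).
image = [[1, 2, 0, 1],
         [3, 1, 1, 0],
         [0, 2, 4, 1],
         [1, 0, 2, 3]]
kernel = [[1, 0],
          [0, -1]]
fmap = conv2d(image, kernel)                    # 3x3 feature map
pooled = max_pool(fmap)                         # pooled feature map
features = [v for row in pooled for v in row]   # flattened, for the FC layer
```

The flattened `features` vector is what the fully connected layers would then combine non-linearly to produce the classification output.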
S103: determining first label values corresponding to the at least two sample images;
after the at least two sample images are obtained, labels corresponding to the at least two sample images may be obtained, and then, according to the labels corresponding to the at least two sample images, a first label value corresponding to the at least two sample images may be determined. Here, the tag may include a category to which each sample image belongs, that is, the tag corresponding to each sample image may refer to a category to which each sample image belongs, for example, the tag may include a cat category, a dog category, a bird category, and the like.
Specifically, for S103, the determining the first label value corresponding to the at least two sample images may include:
acquiring labels corresponding to the at least two sample images respectively;
judging whether labels corresponding to the at least two sample images belong to the same category or not;
if the labels corresponding to the at least two sample images belong to the same category, determining the first label value as a first value;
and if the labels corresponding to the at least two sample images do not belong to the same category, determining that the first label value is a second value.
The first value is different from the second value; preferably, the first value is equal to 1 and the second value is equal to 0. That is, after obtaining the labels corresponding to the at least two sample images, determining whether the labels corresponding to the at least two sample images belong to the same category; if the labels corresponding to the at least two sample images belong to the same category, determining that the first label value is 1; if the labels corresponding to the at least two sample images do not belong to the same category, then the first label value may be determined to be 0.
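A minimal sketch of this first-label-value rule, assuming the preferred values 1 and 0 and using hypothetical category names:

```python
# First label value for a pair of samples: the first value (1) when the
# two labels belong to the same category, otherwise the second value (0).
def first_label_value(label_a, label_b, same_value=1, diff_value=0):
    return same_value if label_a == label_b else diff_value

pairs = [("cat", "cat"), ("cat", "dog"), ("bird", "bird")]
y_values = [first_label_value(a, b) for a, b in pairs]
```

Any two distinct values would work in principle; 1 and 0 are convenient because they let the metric loss switch between its two terms by simple multiplication.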
In this way, after at least two sample images are acquired, the feature information and the first label value corresponding to each of the at least two sample images can be obtained, and then a preset measurement model is constructed according to the obtained feature information and the first label value so as to carry out fine-granularity classification on the images to be processed.
S104: constructing a preset metric model according to the obtained feature information and the first label value; the preset metric model represents a metric model that optimizes the distance between at least two pieces of feature information;
it should be noted that the distance between at least two pieces of feature information may be optimized by the preset metric model; for example, when the at least two pieces of feature information belong to the same category, the distance between them can be made smaller; when they belong to different categories, the distance between them can be made larger.
In some embodiments, after S104, the method may further include:
when the first label value is the first value, controlling, by using the preset metric model, the distance between the at least two pieces of feature information to decrease;
and when the first label value is the second value, controlling, by using the preset metric model, the distance between the at least two pieces of feature information to increase.
That is, when the first label value is the first value (for example, 1), it indicates that the at least two pieces of feature information belong to the same category, and the preset metric model can be used to control their distance to decrease; when the first label value is the second value (for example, 0), it indicates that the at least two pieces of feature information belong to different categories, and the preset metric model can be used to control their distance to increase. Therefore, by constraining feature information of the same category, the preset metric model makes the distance closer; and by constraining feature information of different categories, it makes the distance larger.
Illustratively, taking two sample images as an example, a binary group is constructed from the two sample images. Suppose I_p and I_q are the two inputs of the binary group, the first label value is denoted by y, and the Euclidean distance between the features of I_p and I_q is denoted by d. The preset metric model can then be written as

L_siamese(I_p, I_q) = y · d² + (1 − y) · max(α − d, 0)²  (1)

where L_siamese(I_p, I_q) denotes the preset metric model, which may also be called a loss function based on the pairwise metric; y equals 1 when I_p and I_q belong to the same category and 0 when they belong to different categories; and α is a constant (which may also be written as m or margin) whose main purpose is to enlarge the distance between different categories. Here, α may be set according to experimental conditions or empirical values; in general α is not less than 0.5, and in the embodiment of the present application α may be set to 0.5, although the embodiment of the present application is not particularly limited.
Thus, according to the preset metric model shown in formula (1): when I_p and I_q belong to the same category, the loss reduces to L = d², and minimizing it makes the distance d between the two smaller; when I_p and I_q belong to different categories, the loss reduces to L = max(α − d, 0)², and minimizing it pushes the distance d beyond the margin α. In this way, using the preset metric model can improve both the separability between different categories and the cohesion within the same category in the fine-grained classification algorithm.
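The pairwise metric loss described above can be sketched in a few lines, assuming the standard contrastive (siamese) form with margin α = 0.5; the feature vectors below are toy values, not real network outputs:

```python
import math

def euclidean_distance(f_p, f_q):
    """Euclidean distance d between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_p, f_q)))

def siamese_loss(f_p, f_q, y, alpha=0.5):
    """Contrastive loss: y*d^2 + (1-y)*max(alpha - d, 0)^2.

    y = 1 for a same-category pair (pulls features together),
    y = 0 for a different-category pair (pushes them beyond alpha).
    """
    d = euclidean_distance(f_p, f_q)
    return y * d ** 2 + (1 - y) * max(alpha - d, 0.0) ** 2

# Same-category pair at distance 0.5: loss = 0.25, so training shrinks d.
same_loss = siamese_loss([0.1, 0.2], [0.4, 0.6], y=1)
# Different-category pair already farther apart than the margin: loss = 0.
diff_loss = siamese_loss([0.0, 0.0], [1.0, 1.0], y=0)
```

Note that a different-category pair contributes no gradient once its distance exceeds α, which is what keeps the extra computation small.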
S105: combining the preset measurement model with a pre-trained preset classification model to obtain a target classification model; the target classification model is used for achieving fine-grained classification of the image to be processed.
It should be noted that the target classification model is obtained by superimposing the preset metric model onto the preset classification model; it can not only achieve fine-grained classification of the image to be processed but also improve the effect of fine-grained classification. Here, the preset classification model may be trained on a sample training set according to a general classification algorithm. Specifically, in some embodiments, before the preset metric model and the pre-trained preset classification model are merged to obtain the target classification model, the method may further include:
and carrying out model training on the sample training set by using a general classification algorithm, and determining a preset classification model.
It should also be noted that the most common loss function of a general classification algorithm is the softmax function. The softmax function is a generalization of the logistic function, may also be called the normalized exponential function, and can be used in multi-class classification.
Taking the softmax function as an example, the preset classification model is shown in the following formula:

L_softmax = −(1/N) · Σ_{i=1..N} log( e^{W_{y_i}^T x_i + b_{y_i}} / Σ_{j=1..n} e^{W_j^T x_i + b_j} )  (2)

where x_i denotes the feature information extracted by the classification network from the i-th sample image in the sample training set (the feature information may be represented as a feature vector); y_i denotes the label corresponding to the i-th sample image, and b_{y_i} denotes the bias term corresponding to y_i; W denotes the classification-layer weight matrix applied to the features extracted by the convolutional neural network model, W_j denotes the weight vector corresponding to the j-th column (i.e. the j-th category), b_j denotes the bias term corresponding to the j-th category, and W^T denotes the transpose of W.
Here, i = 1, 2, …, N, where N denotes the total number of samples in the sample training set and is an integer greater than or equal to 2; and j = 1, 2, …, n, where n denotes the total number of labels (i.e. the total number of categories) in the sample training set and is an integer greater than or equal to 2.
Thus, although the preset classification model shown in formula (2) can perform fine-grained classification of the image to be processed, it does not include a metric model that optimizes the distance between at least two pieces of feature information; that is, it cannot optimize the distance metric among pieces of feature information belonging to the same category, so the accuracy of fine-grained classification is not high.
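For illustration, the softmax cross-entropy loss of formula (2) can be computed directly from per-class logits (the values W_j^T x_i + b_j); the logits and labels below are made-up toy numbers, not trained parameters:

```python
import math

def softmax_loss(logits_per_sample, labels):
    """Average of -log(softmax probability of the true class) over samples.

    logits_per_sample[i][j] plays the role of W_j^T x_i + b_j.
    """
    total = 0.0
    for logits, y in zip(logits_per_sample, labels):
        denom = sum(math.exp(z) for z in logits)
        total += -math.log(math.exp(logits[y]) / denom)
    return total / len(labels)

logits = [[2.0, 0.5, 0.1],   # sample 0, true class 0
          [0.2, 1.5, 0.3]]   # sample 1, true class 1
loss = softmax_loss(logits, labels=[0, 1])
```

The loss shrinks as the true-class logit dominates the others, but nothing in it directly constrains the distance between feature vectors of the same class, which is the gap the metric model fills.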
In the embodiment of the application, at least two sample images can be selected from the sample training set on the basis of the softmax function so as to construct a preset metric model; the preset metric model and the preset classification model are then merged, and the target classification model obtained after merging can not only perform fine-grained classification on the image to be processed but also improve the accuracy of fine-grained classification by optimizing the distance between at least two pieces of feature information.
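The merging step can then be sketched as a combined training objective; the patent does not spell out the combination rule here, so the additive form and the balancing factor `lam` below are assumptions for illustration:

```python
# Combined objective of the target model: classification (softmax) term
# plus a weighted metric (siamese) term. `lam` is a hypothetical
# hyperparameter balancing the two losses.
def target_loss(classification_loss, metric_loss, lam=1.0):
    return classification_loss + lam * metric_loss

# Toy values, e.g. a softmax loss of 0.385 and a pairwise loss of 0.25.
combined = target_loss(0.385, 0.25, lam=1.0)
```

With such a sum, gradients from the metric term tighten intra-class clusters and separate inter-class ones while the softmax term keeps the classifier accurate.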
Further, current fine-grained classification algorithms can be roughly divided into the following branches: algorithms based on fine-tuning an existing classification network, algorithms based on fine-grained feature learning, algorithms based on detecting and classifying target blocks, and algorithms based on visual attention mechanisms. Fine-tuning an existing classification network generally means pre-training an existing network (such as a MobileNet or Xception network) on a large-scale visual database (ImageNet) to obtain a trained classification model and then continuing to fine-tune it on a fine-grained data set, so that the trained classification model better fits the sub-categories of the target domain. Fine-grained feature learning requires combining the information acquired by two networks, one used to acquire the position information of the target and the other to extract an abstract feature expression of the target. The detection-and-classification approach borrows the idea of target detection: a target-detection module marks the target region of the image, and fine-grained classification is then performed on that region, where the classifier may be a traditional Support Vector Machines (SVM) classifier or a general classification network. Finally, compared with a general classification algorithm, attention-based algorithms add an attention mechanism so that the classification model pays more attention to the information expressed at the target position.
However, these fine-grained classification algorithms still discriminate poorly between classes, and the features within each class are not gathered tightly enough, so the feature distributions of several classes overlap and misclassification easily occurs; meanwhile, algorithms that add a detection module also introduce complex operations, increasing the computational cost and causing more latency. Based on this, in order to solve these problems, the embodiment of the present application introduces a preset metric model; by means of metric learning, it can improve the separability between different categories and the cohesion within the same category in the fine-grained classification algorithm without adding extra computation.
This embodiment provides an image fine granularity classification method: acquiring at least two sample images; performing feature extraction on the at least two sample images to obtain the feature information corresponding to each of them; determining the first label value corresponding to the at least two sample images; constructing a preset metric model according to the obtained feature information and the first label value, where the preset metric model represents a metric model that optimizes the distance between at least two pieces of feature information; and merging the preset metric model with a pre-trained preset classification model to obtain a target classification model, which is used to achieve fine-grained classification of the image to be processed. In this way, because the preset metric model optimizes the distance between at least two pieces of feature information and the first label value is determined from the labels of the at least two sample images, the preset metric model can achieve inter-class separability and intra-class cohesion for the classification algorithm while reducing the computational workload; moreover, since the target classification model superimposes the preset metric model on the preset classification model, the precision and effect of fine-grained classification can be improved.
In another embodiment of the present application, referring to fig. 2, a schematic flow chart of another image fine granularity classification method provided in an embodiment of the present application is shown. As shown in fig. 2, the method may include:
S201: acquiring a sample training set and a label corresponding to each sample image in the sample training set;
It should be noted that the sample training set may include at least two sample images, and the label may indicate the category to which each sample image belongs; that is, the label corresponding to each sample image may refer to the category to which that sample image belongs. Here, the labels may include a cat category, a dog category, a bird category, and the like.
It should be further noted that, before model training is performed, a sample training set needs to be acquired first, together with the label corresponding to each sample image. Here, the sample training set is mainly used to train the preset classification model so that the loss function is optimized; generally, the more samples the training set contains, the better the loss function can be optimized.
S202: extracting the characteristics of each sample image in the sample training set by using a convolutional neural network model to obtain the characteristic information corresponding to each sample image;
After the sample training set is obtained, feature extraction may be performed on each sample image in the sample training set by using a convolutional neural network model, so as to obtain the feature information corresponding to each sample image. Here, the features extracted by the convolutional neural network model are more robust than simple hand-crafted features such as projection, orientation, and centroid, so feature extraction does not become the bottleneck for improving accuracy. In addition, the fitting capacity of the whole model can be controlled by choosing different convolution kernels, pooling operations, and the size of the final output feature vector: the dimension of the feature vector can be reduced when overfitting occurs, and the output dimension of the convolutional layers can be increased when underfitting occurs, which is more flexible than other feature extraction approaches. Thus, in the embodiments of the present application, a convolutional neural network model may be used to perform feature extraction on each sample image in the sample training set.
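As a toy illustration of the convolution-and-pooling feature extraction described above, the following numpy sketch runs one convolution stage with ReLU and max pooling and flattens the result into a feature vector; the image and kernels are random toy data, and this is not the network actually used by the application:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Naive "valid" 2-D cross-correlation of a single-channel image.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    # Non-overlapping max pooling; trims edges that do not fit.
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    pooled = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return pooled.max(axis=(1, 3))

def extract_features(image, kernels, pool_size=2):
    # One conv + ReLU + pooling stage per kernel, flattened into a feature vector.
    maps = [np.maximum(conv2d_valid(image, k), 0.0) for k in kernels]
    return np.concatenate([max_pool(m, pool_size).ravel() for m in maps])

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
feat = extract_features(image, kernels)
print(feat.shape)  # (18,): two 6x6 maps pooled to 3x3 each
```

Changing the kernel count, kernel size, or pooling size changes the length of `feat`, which is the "controlling the fitting capacity via the output feature vector size" lever mentioned above.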
S203: based on the convolutional neural network model, acquiring a weight value corresponding to each category;
It should be further noted that, after the label corresponding to each sample image is obtained, the number of categories included in the sample training set can be determined; in the convolutional neural network model, a weight value corresponding to each category can also be obtained.
In this way, after the feature information and label corresponding to each sample image and the weight value corresponding to each category are obtained, the preset classification model can be trained.
S204: model training is performed based on the feature information and label corresponding to each sample image and the weight value corresponding to each category, to obtain the preset classification model;
It should be noted that the preset classification model represents a model obtained by training with a general classification algorithm on the basis of a sample training set. Here, the most common loss function for a general classification algorithm is the softmax function; taking the softmax function as an example, the preset classification model may be as shown in formula (2) above.
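For reference, a minimal numpy sketch of a softmax cross-entropy loss of the kind the preset classification model uses is given below; since formula (2) itself is not reproduced in this excerpt, the logit form W·f (per-class weight vectors times the feature vector) is an assumption:

```python
import numpy as np

def softmax_cross_entropy(features, weights, label):
    # Softmax loss for one sample: logits = W @ f, loss = -log p(label).
    # features: (d,) feature vector; weights: (num_classes, d) per-class weights.
    logits = weights @ features
    logits = logits - logits.max()          # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label]), probs

rng = np.random.default_rng(1)
f = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
loss, probs = softmax_cross_entropy(f, W, label=2)
print(round(float(probs.sum()), 6))  # 1.0, the probabilities normalise
```

This branch alone is what the text calls the "preset classification model"; the metric terms are added to it later.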
Since fine-grained classification performed only according to the preset classification model shown in formula (2) lacks a metric model that optimizes the distance between at least two pieces of feature information (that is, the distance between pieces of feature information belonging to the same category cannot be optimized), the accuracy of fine-grained classification is not high. At this point, a preset metric model can be constructed from the sample training set on the basis of the softmax function; that is, at least two sample images may be selected from the sample training set to construct a preset metric model that optimizes the distance between at least two pieces of feature information, thereby improving the accuracy of fine-grained classification.
S205: acquiring the at least two sample images from the sample training set;
S206: constructing a preset measurement model based on the at least two sample images;
Here, there is no required order between constructing the preset classification model and constructing the preset metric model; that is, steps S201 to S204 and steps S205 to S206 may be processed in parallel or in series, which is not limited in the embodiments of the present application.
It should be noted that if the preset metric model is a binary metric model, two sample images may be selected from the sample training set and used as two branches of the preset metric model to construct the binary metric model.
Illustratively, taking a binary metric model as an example, as shown in fig. 3, the construction and training process may involve a first branch 301, a second branch 302, and a binary metric model 303. In the first branch 301, the first sample image yields first feature information Ip after feature extraction by the convolutional neural network; in the second branch 302, the second sample image yields second feature information Iq after feature extraction by the convolutional neural network. From these two pieces of feature information (Ip and Iq), the loss function shown in formula (1) can be obtained, i.e., the binary metric model 303 is constructed. Here, the convolutional neural networks in the first branch 301 and the second branch 302 are the same network, and the two branches are processed in parallel. Thus, if Ip and Iq belong to the same category, y is equal to 1, and the binary metric model makes the distance between them smaller; if Ip and Iq belong to different categories, y is equal to 0, and the binary metric model makes the distance between them greater.
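A common concrete form of such a pair loss is the contrastive loss. Since formula (1) itself is not reproduced in this excerpt, the sketch below assumes that form, with a hypothetical margin parameter; it only illustrates the behaviour described above (y = 1 shrinks the distance, y = 0 pushes it beyond the margin):

```python
import numpy as np

def pair_metric_loss(feat_p, feat_q, y, margin=1.0):
    # Contrastive-style pair loss: pulls same-class features (y = 1) together
    # and pushes different-class features (y = 0) at least `margin` apart.
    d = np.linalg.norm(feat_p - feat_q)
    return y * d**2 + (1 - y) * max(0.0, margin - d)**2

same = pair_metric_loss(np.array([0.0, 0.0]), np.array([0.6, 0.8]), y=1)
diff = pair_metric_loss(np.array([0.0, 0.0]), np.array([0.6, 0.8]), y=0)
print(same, diff)  # 1.0 0.0 (distance is exactly 1.0, so the y=0 hinge is zero)
```

Minimizing `same` drives intra-class features together; `diff` is already zero here because the pair sits exactly at the margin, so no further push is needed.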
S207: combining the preset metric model and the preset classification model to obtain the target classification model.
It should be noted that the target classification model may be obtained by combining the loss function shown in formula (1) with the softmax function shown in formula (2); that is, the preset classification model and the preset metric model are combined to obtain the target classification model.
It should be further noted that, for the preset metric model, constructing pairs (2-tuples) is simpler and the network converges faster than constructing triplets or even N-tuples; therefore, in the embodiments of the present application, the preset metric model generally refers to a binary metric model. If the sample training set includes N sample images, they can be combined pairwise to construct N×(N-1)/2 binary metric models; the target classification model may then be obtained by jointly combining the preset classification model and the N×(N-1)/2 preset metric models. Since the target classification model takes the preset metric model into account, the discrimination between different categories and the compactness within the same category in the fine-grained classification algorithm can be improved, so the accuracy of fine-grained classification can be improved and the effect of fine-grained classification enhanced.
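The pairwise enumeration can be sketched as follows; the labels are hypothetical toy data, and the point is only that N samples yield N×(N-1)/2 pairs, each carrying its first label value y:

```python
from itertools import combinations

def all_pairs(labels):
    # Enumerate the N*(N-1)/2 sample pairs and their first label value y
    # (1 if the two samples share a category, 0 otherwise).
    return [(p, q, 1 if labels[p] == labels[q] else 0)
            for p, q in combinations(range(len(labels)), 2)]

labels = ["cat", "dog", "cat", "bird"]    # N = 4 sample images
pairs = all_pairs(labels)
print(len(pairs))                          # 6 = 4*3/2
print(sum(y for _, _, y in pairs))         # 1, only the (cat, cat) pair
```

The total training objective described in the text would then be the softmax classification loss plus the sum of the pair losses over these tuples.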
The embodiment provides a fine-granularity image classification method, which is used for elaborating the specific implementation of the foregoing embodiment, and it can be seen that a preset measurement model is newly added on the basis of a preset classification model (i.e. softmax function); because the preset measurement model represents a measurement model for optimizing the distance between at least two pieces of characteristic information, the classification algorithm can realize the inter-class distinction and intra-class convergence based on the preset measurement model, so that the precision of fine-granularity classification can be improved, and the effect of fine-granularity classification can be improved.
In yet another embodiment of the present application, reference is made to fig. 4, which is a schematic flow chart illustrating yet another image fine-granularity classification method provided in an embodiment of the present application. As shown in fig. 4, the method may include:
S401: acquiring an image to be processed;
S402: determining a target classification model; the target classification model is obtained by combining a preset measurement model and a preset classification model;
It should be noted that the preset metric model may refer to the loss function shown in formula (1), which optimizes the distance between at least two pieces of feature information; the preset classification model may be the softmax function shown in formula (2), obtained by training with a general classification algorithm on a sample training set. Since the preset classification model does not optimize the distance between pieces of feature information belonging to the same category, the degree of discrimination between categories is not high; at this point, metric learning is performed by combining the preset metric model with the preset classification model, so that the effect of fine-grained classification can be improved.
S403: and carrying out fine-granularity classification on the image to be processed through the target classification model to obtain a classification result of the image to be processed.
It should also be noted that the target classification model may be applied to fine-grained classification or fine-grained recognition. After the target classification model is obtained, fine-grained classification can be performed on the image to be processed to obtain a more accurate classification result. That is, by combining metric learning with a general classification algorithm, the feature distribution within a class becomes more compact and the feature distributions between classes differ more, thereby improving the precision of fine-grained classification.
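One plausible reading of inference with the target classification model is sketched below: the metric terms shape the features during training, and prediction itself reduces to the classification branch, i.e., picking the class with the highest score. The weight matrix, feature vector, and class names are illustrative assumptions:

```python
import numpy as np

def classify(features, weights, class_names):
    # Prediction with the trained target model: the pair-loss terms only act
    # during training, so inference scores each class via the softmax branch
    # and returns the highest-scoring class name.
    logits = weights @ features
    return class_names[int(np.argmax(logits))]

# Hypothetical 3-class weight matrix and a feature vector for illustration.
W = np.eye(3, 4)
f = np.array([0.1, 0.9, 0.2, 0.0])
print(classify(f, W, ["cat", "dog", "bird"]))  # dog
```

Because argmax is invariant to the softmax normalisation, the probabilities themselves need not be computed at inference time.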
The embodiment provides a fine-granularity image classification method, which is used for elaborating the specific implementation of the foregoing embodiment, and it can be seen that a preset measurement model is newly added on the basis of a preset classification model (i.e. softmax function); at this time, the fine granularity classification is carried out on the image to be processed, so that the inter-class distinction and intra-class convergence of the classification algorithm can be realized, the precision of the fine granularity classification can be improved, and the effect of the fine granularity classification can be improved.
Based on the same inventive concept as the previous embodiments, referring to fig. 5, a schematic diagram of the composition structure of an image fine granularity classifying apparatus 50 according to an embodiment of the present application is shown. As shown in fig. 5, the image fine granularity classification apparatus 50 may include an acquisition unit 501, an extraction unit 502, a determination unit 503, a construction unit 504, and a merging unit 505; wherein,
an acquisition unit 501 configured to acquire at least two sample images;
the extracting unit 502 is configured to perform feature extraction on the at least two sample images to obtain feature information corresponding to each of the at least two sample images;
a determining unit 503 configured to determine first tag values corresponding to the at least two sample images;
a construction unit 504 configured to construct a preset metric model according to the obtained feature information and the first tag value; wherein the preset metric model represents a metric model that optimizes a distance between at least two feature information;
The merging unit 505 is configured to merge the preset measurement model and a pre-trained preset classification model to obtain a target classification model; the target classification model is used for achieving fine-grained classification of the image to be processed.
In the above-described aspect, referring to fig. 6, the image fine-granularity classification apparatus 50 may further include a judgment unit 506; wherein,
the acquiring unit 501 is further configured to acquire labels corresponding to the at least two sample images respectively;
a judging unit 506, configured to judge whether the labels corresponding to the at least two sample images respectively belong to the same category;
the determining unit 503 is specifically configured to determine that the first label value is a first value if labels corresponding to the at least two sample images respectively belong to the same category; if the labels corresponding to the at least two sample images do not belong to the same category, determining that the first label value is a second value; wherein the first value is different from the second value.
In the above solution, the construction unit 504 is specifically configured to control, when the first tag value is the first value, the decrease of the distance between the at least two feature information by using the preset metric model; and when the first label value is the second value, controlling the distance between the at least two pieces of characteristic information to be increased by utilizing the preset measurement model.
In the above scheme, the first value is equal to 1, and the second value is equal to 0.
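The judgment these units implement can be sketched as a small helper (the function name is hypothetical; the values 1 and 0 follow the scheme above):

```python
def first_label_value(label_p: str, label_q: str) -> int:
    # Return the first label value: the first value (1) if the two sample
    # images share a category, otherwise the second value (0).
    return 1 if label_p == label_q else 0

print(first_label_value("cat", "cat"), first_label_value("cat", "dog"))  # 1 0
```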
In the above-mentioned scheme, referring to fig. 6, the image fine-granularity classification apparatus 50 may further include a training unit 507 configured to perform model training on the sample training set by using a general classification algorithm, so as to obtain the preset classification model.
In the above solution, the obtaining unit 501 is further configured to obtain a sample training set and a label corresponding to each sample image in the sample training set; the sample training set comprises at least two sample images, and the label comprises a category to which each sample image belongs;
the extracting unit 502 is further configured to perform feature extraction on each sample image in the sample training set by using a convolutional neural network model, so as to obtain feature information corresponding to each sample image;
the obtaining unit 501 is further configured to obtain a weight value corresponding to each class based on the convolutional neural network model;
the training unit 507 is specifically configured to perform model training based on the feature information and the label corresponding to each sample image and the weight value corresponding to each class, so as to obtain the preset classification model.
In the above-described aspect, referring to fig. 6, the image fine-granularity classification apparatus 50 may further include a classification unit 508; wherein,
An acquisition unit 501 further configured to acquire an image to be processed;
and the classification unit 508 is configured to perform fine granularity classification on the image to be processed through the target classification model to obtain a classification result of the image to be processed.
It will be appreciated that in this embodiment, the "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., and may of course be a module, or may be non-modular. Furthermore, the components in the present embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional modules.
If implemented in the form of software functional modules and not sold or used as separate products, the integrated units may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present embodiment may be embodied essentially, or in part, in the form of a software product, which is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Accordingly, the present embodiment provides a computer storage medium storing a computer program which, when executed by at least one processor, implements the steps of the method of any of the preceding embodiments.
Based on the composition of the image fine granularity classification apparatus 50 and the computer storage medium described above, referring to fig. 7, which shows a specific hardware configuration example of the image fine granularity classification apparatus 50 provided in the embodiment of the application, may include: a communication interface 701, a memory 702, and a processor 703; the various components are coupled together by a bus system 704. It is appreciated that bus system 704 is used to enable connected communications between these components. The bus system 704 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration, the various buses are labeled as bus system 704 in fig. 7. The communication interface 701 is configured to receive and send signals in a process of receiving and sending information with other external network elements;
a memory 702 for storing a computer program capable of running on the processor 703;
a processor 703 for executing, when running the computer program:
Acquiring at least two sample images;
extracting the characteristics of the at least two sample images to obtain the characteristic information corresponding to each of the at least two sample images;
determining first label values corresponding to the at least two sample images;
according to the obtained characteristic information and the first label value, a preset measurement model is constructed; wherein the preset metric model represents a metric model that optimizes a distance between at least two feature information;
combining the preset measurement model with a pre-trained preset classification model to obtain a target classification model; the target classification model is used for achieving fine-grained classification of the image to be processed.
It is appreciated that the memory 702 in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 702 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The processor 703 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 703 or by instructions in the form of software. The processor 703 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or registers. The storage medium is located in the memory 702, and the processor 703 reads the information in the memory 702 and completes the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP devices, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Optionally, as another embodiment, the processor 703 is further configured to perform the steps of the method of any of the preceding embodiments when the computer program is run.
Referring to fig. 8, a schematic diagram of a composition structure of an apparatus according to an embodiment of the present application is shown. As shown in fig. 8, the apparatus 80 may include at least the image fine-granularity classification apparatus 50 according to any of the foregoing embodiments. In this way, the device 80 can realize the inter-class distinction and intra-class cohesion of the classification algorithm through the included image fine-granularity classification device 50, and simultaneously reduce the operation workload; in addition, since the target classification model in the image fine-granularity classification device 50 is a preset measurement model superimposed on the preset classification model, the accuracy of fine-granularity classification can be improved, so that the effect of fine-granularity classification is improved.
It should be noted that, in this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.
The methods disclosed in the several method embodiments provided in the present application may be arbitrarily combined without collision to obtain a new method embodiment.
The features disclosed in the several product embodiments provided in the present application may be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or apparatus embodiments provided in the present application may be arbitrarily combined without conflict to obtain new method embodiments or apparatus embodiments.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of fine-grained classification of images, the method comprising:
acquiring at least two sample images;
extracting the characteristics of the at least two sample images to obtain the characteristic information corresponding to each of the at least two sample images;
Determining first label values corresponding to the at least two sample images;
according to the obtained characteristic information and the first label value, a preset binary set measurement model is constructed; the preset binary set measurement model represents a measurement model for optimizing the distance between at least two pieces of characteristic information of the same category;
combining the preset binary set measurement model with a pre-trained preset classification model to obtain a target classification model; the target classification model is used for realizing fine granularity classification of the image to be processed;
under the condition that the sample training set comprises N sample images, N×(N-1)/2 preset binary set measurement models are constructed; and the target classification model is obtained by combining the preset classification model and the N×(N-1)/2 preset binary set measurement models.
2. The method of claim 1, wherein determining the first label value corresponding to the at least two sample images comprises:
acquiring labels corresponding to the at least two sample images respectively;
judging whether labels corresponding to the at least two sample images belong to the same category or not;
If the labels corresponding to the at least two sample images belong to the same category, determining the first label value as a first value;
if the labels corresponding to the at least two sample images do not belong to the same category, determining that the first label value is a second value; wherein the first value is different from the second value.
3. The method of claim 2, wherein after said building a pre-set binary metric model, the method further comprises:
when the first label value is the first value, controlling the distance between the at least two pieces of characteristic information to be reduced by utilizing the preset binary set measurement model;
and when the first label value is the second value, controlling the distance between the at least two pieces of characteristic information to be increased by utilizing the preset binary set measurement model.
4. The method of claim 2, wherein the first value is equal to 1 and the second value is equal to 0.
5. The method of claim 1, wherein prior to said combining the pre-set binary metrics model and pre-trained pre-set classification model, the method further comprises:
and performing model training on the sample training set by using a general classification algorithm to obtain the preset classification model.
6. The method of claim 5, wherein the model training the sample training set using a general classification algorithm to obtain the preset classification model comprises:
acquiring a sample training set and a label corresponding to each sample image in the sample training set; the sample training set comprises at least two sample images, and the label comprises a category to which each sample image belongs;
extracting the characteristics of each sample image in the sample training set by using a convolutional neural network model to obtain the characteristic information corresponding to each sample image;
based on the convolutional neural network model, acquiring a weight value corresponding to each category;
and performing model training based on the characteristic information corresponding to each sample image, the label and the weight value corresponding to each class to obtain the preset classification model.
7. The method according to any one of claims 1 to 6, wherein after said deriving a target classification model, the method further comprises:
acquiring an image to be processed;
and carrying out fine-granularity classification on the image to be processed through the target classification model to obtain a classification result of the image to be processed.
8. The image fine granularity classifying device is characterized by comprising an acquisition unit, an extraction unit, a determination unit, a construction unit and a merging unit; wherein,
the acquisition unit is configured to acquire at least two sample images;
the extraction unit is configured to perform feature extraction on the at least two sample images to obtain feature information corresponding to each of the at least two sample images;
the determining unit is configured to determine first label values corresponding to the at least two sample images;
the construction unit is configured to construct a preset binary set measurement model according to the obtained characteristic information and the first label value; the preset binary set measurement model represents a measurement model for optimizing the distance between at least two pieces of characteristic information of the same category;
the merging unit is configured to merge the preset binary group measurement model and a pre-trained preset classification model to obtain a target classification model; the target classification model is used for realizing fine granularity classification of the image to be processed;
under the condition that the sample training set comprises N sample images, N×(N-1)/2 preset binary set measurement models are constructed; and the target classification model is obtained by combining the preset classification model and the N×(N-1)/2 preset binary set measurement models.
9. An image fine granularity classification device, characterized in that the image fine granularity classification device comprises a memory and a processor; wherein,
the memory is used for storing a computer program capable of running on the processor;
the processor being adapted to perform the method of any of claims 1 to 7 when the computer program is run.
10. A computer storage medium storing a computer program which, when executed by at least one processor, implements the method of any one of claims 1 to 7.
11. An image fine-granularity classification apparatus, characterized in that the apparatus comprises at least an image fine-granularity classification device as claimed in claim 8 or 9.
CN202010111834.5A 2020-02-24 2020-02-24 Image fine granularity classification method, device, storage medium and equipment Active CN111340097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010111834.5A CN111340097B (en) 2020-02-24 2020-02-24 Image fine granularity classification method, device, storage medium and equipment


Publications (2)

Publication Number Publication Date
CN111340097A CN111340097A (en) 2020-06-26
CN111340097B true CN111340097B (en) 2024-03-12

Family

ID=71185435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010111834.5A Active CN111340097B (en) 2020-02-24 2020-02-24 Image fine granularity classification method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN111340097B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931823B (en) * 2020-07-16 2024-07-16 平安科技(深圳)有限公司 Fine-granularity image classification model processing method and device
CN111814898A (en) * 2020-07-20 2020-10-23 上海眼控科技股份有限公司 Image segmentation method and device, computer equipment and storage medium
CN112818347B (en) * 2021-02-22 2024-04-09 深信服科技股份有限公司 File tag determining method, device, equipment and storage medium
CN113591539B (en) * 2021-06-01 2024-04-16 中国电子科技集团公司第三研究所 Target identification method, device and readable storage medium
CN113536003B (en) * 2021-06-08 2024-03-12 支付宝(杭州)信息技术有限公司 Feature extraction model training method, image retrieval method, device and equipment
CN117746193B (en) * 2024-02-21 2024-05-10 之江实验室 Label optimization method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960001A (en) * 2017-05-17 2018-12-07 富士通株式会社 Method and apparatus of the training for the image processing apparatus of recognition of face
CN109934249A (en) * 2018-12-14 2019-06-25 网易(杭州)网络有限公司 Data processing method, device, medium and calculating equipment
CN110458233A (en) * 2019-08-13 2019-11-15 腾讯云计算(北京)有限责任公司 Combination grain object identification model training and recognition methods, device and storage medium
CN110543817A (en) * 2019-07-25 2019-12-06 北京大学 Pedestrian re-identification method based on posture guidance feature learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11636328B2 (en) * 2018-03-28 2023-04-25 University Of Maryland, College Park L2 constrained softmax loss for discriminative face verification



Similar Documents

Publication Publication Date Title
CN111340097B (en) Image fine granularity classification method, device, storage medium and equipment
CN111797893B (en) Neural network training method, image classification system and related equipment
Kim et al. A shape-based approach for salient object detection using deep learning
Quoc Bao et al. Plant species identification from leaf patterns using histogram of oriented gradients feature space and convolution neural networks
WO2017075939A1 (en) Method and device for recognizing image contents
WO2017023539A1 (en) Media classification
Lee et al. A unified scheme of shot boundary detection and anchor shot detection in news video story parsing
CN111325276A (en) Image classification method and device, electronic equipment and computer-readable storage medium
Rodrigues et al. Evaluating cluster detection algorithms and feature extraction techniques in automatic classification of fish species
CN111178196B (en) Cell classification method, device and equipment
CN114117141A (en) Self-adaptive density clustering method, storage medium and system
Arlazarov et al. Evolution of the Viola-Jones Object Detection Method: A Survey
Shukla et al. Automatic image annotation using SURF features
CN111091140A (en) Object classification method and device and readable storage medium
CN111666902B (en) Training method of pedestrian feature extraction model, pedestrian recognition method and related device
Pliakos et al. Weight estimation in hypergraph learning
CN116012744A (en) Closed loop detection method, device, equipment and storage medium
CN111325225B (en) Image classification method, electronic device and storage medium
Cao et al. A multi-label classification method for vehicle video
Kabir et al. Content-Based Image Retrieval Using AutoEmbedder
Chu et al. Automatic image annotation combining svms and knn algorithm
CN111091198A (en) Data processing method and device
Pradipkumar et al. Performance analysis of deep learning models for tree species identification from UAV images
Chamasemani et al. Region-based surveillance video retrieval with effective object representation
Jendoubi et al. An evidential k-nearest neighbors combination rule for tree species recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant