CN112101437A - Fine-grained classification model processing method based on image detection and related equipment thereof - Google Patents
- Publication number
- CN112101437A (application CN202010930234.1A)
- Authority
- CN
- China
- Prior art keywords
- fine-grained
- image
- model
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/532—Query formulation, e.g. graphical querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The embodiment of the application belongs to the field of artificial intelligence and relates to a fine-grained classification model processing method based on image detection, which comprises the steps of receiving a keyword and constructing an image data set through a search engine; randomly grouping the image data set into a plurality of groups of training sets; inputting the plurality of groups of training sets into a fine-grained classification initial model to obtain attention weighting vectors of the images in the plurality of groups of training sets; pooling the attention weighting vectors to respectively generate training examples corresponding to the groups of training sets; inputting the training examples into a classifier of the fine-grained classification initial model to calculate a model loss; and adjusting model parameters according to the model loss to obtain a fine-grained classification model. The application also provides a fine-grained classification model processing apparatus based on image detection, a computer device, and a storage medium. In addition, the application also relates to blockchain technology: the trained model parameters can be stored in a blockchain. The method and the device can quickly and accurately realize classification processing of fine-grained images.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for processing a fine-grained classification model based on image detection, a computer device, and a storage medium.
Background
With the development of computer technology, research on and applications of computer vision have become increasingly extensive, and fine-grained image classification is a hot topic within computer vision. The aim of fine-grained image classification is to retrieve and identify images of different subclasses under one large class, which involves image detection in artificial intelligence.
In conventional fine-grained image classification technology, improving classification accuracy generally requires preparing a large-scale image data set, and training and application can proceed only after the images in the data set have been manually labeled. This is time-consuming and labor-intensive, so the processing efficiency of fine-grained image classification is low.
Disclosure of Invention
An embodiment of the application aims to provide a fine-grained classification model processing method and device based on image detection, computer equipment and a storage medium, so as to solve the problem of low fine-grained image classification processing efficiency.
In order to solve the above technical problem, an embodiment of the present application provides a fine-grained classification model processing method based on image detection, which adopts the following technical solutions:
constructing an image data set by a search engine based on the received keywords;
randomly grouping the image data sets into a plurality of groups of training sets;
inputting the groups of training sets into a fine-grained classification initial model to obtain attention weighting vectors of images in the groups of training sets;
pooling the attention weighting vectors to respectively generate training examples corresponding to the groups of training sets;
inputting the obtained training example into a classifier of the fine-grained classification initial model to calculate model loss;
and adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
Further, the step of constructing an image data set by a search engine based on the received keywords comprises:
receiving a keyword sent by a terminal;
sending the keyword to a search engine to instruct the search engine to perform image search from the Internet according to the keyword;
an image dataset is constructed based on the searched images.
Further, the step of inputting the plurality of groups of training sets into a fine-grained classification initial model to obtain the attention weighting vector of each image in the plurality of groups of training sets comprises:
inputting each image in the plurality of groups of training sets into a convolution layer of a fine-grained classification initial model respectively to obtain convolution characteristic vectors of each image area in each image;
calculating, by an attention detector, a regularized attention score for the convolved feature vectors; wherein the regularized attention score is used to characterize a degree of association of an image region with the keyword;
and correspondingly multiplying the regularized attention score and the convolution feature vector to obtain the attention weighting vector of each image.
Further, the step of inputting each image in the plurality of sets of training sets into a convolution layer of a fine-grained classification initial model to obtain a convolution feature vector of each image region in each image includes:
inputting the groups of training sets into a convolution layer of a fine-grained classification initial model;
acquiring a convolution feature map output by the last of the convolution layers;
and setting the vector corresponding to each image area in the convolution feature map as a convolution feature vector.
Further, the step of inputting the obtained training example into the classifier of the fine-grained classification initial model to calculate the model loss comprises:
inputting the obtained training examples into a classifier to calculate the loss of the classifier;
calculating a regularization factor according to the convolution feature vector;
and performing linear operation on the classifier loss and the regularization factor to obtain model loss.
Further, the step of inputting the obtained training examples into a classifier to calculate a classifier loss comprises:
inputting the obtained training examples into a classifier to obtain fine-grained classes of the images in the training examples;
setting the keyword as an instance tag;
and calculating the classifier loss of the training example according to the example label and the fine-grained category of each image in the training example.
Further, after the step of adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model, the method further includes:
acquiring an image to be classified;
inputting the image to be classified into the fine-grained classification model to obtain an attention weighted vector of the image to be classified;
generating a test case of the image to be classified based on the attention weighting vector;
and inputting the test case into a classifier of the fine-grained classification model to obtain the fine-grained classification of the image to be classified.
In order to solve the above technical problem, an embodiment of the present application further provides an apparatus for processing a fine-grained classification model based on image detection, which adopts the following technical solutions:
the data set construction module is used for constructing an image data set through a search engine based on the received keywords;
the data set grouping module is used for randomly grouping the image data sets into a plurality of groups of training sets;
the data set input module is used for inputting the plurality of groups of training sets into a fine-grained classification initial model to obtain an attention weighting vector of each image in the plurality of groups of training sets;
the example generation module is used for pooling the attention weighting vectors and respectively generating training examples corresponding to the plurality of groups of training sets;
the loss calculation module is used for inputting the obtained training examples into a classifier of the fine-grained classification initial model so as to calculate model loss;
and the parameter adjusting module is used for adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the above fine-grained classification model processing method based on image detection when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for processing the fine-grained classification model based on image detection is implemented.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects: the image data set is directly constructed through a search engine according to the keywords, the image data set can be rapidly expanded through the Internet, and the speed of establishing the image data set is improved; because the images are independent, the image data are randomly grouped into a plurality of groups of training sets, so that the negative influence of the images which do not conform to the labels is reduced; inputting a plurality of groups of training sets into a fine-grained classification initial model, and calculating an attention weighting vector of an input image by the fine-grained classification initial model in a manner of fusing an attention mechanism so as to enhance an image area related to keywords in the image and enable the model to be concentrated in the image area related to classification; generating a training example according to the attention weighting vector, wherein the training example comprises the characteristics of each image in the corresponding training set; after the training examples are input into the classifier to obtain model loss, model parameters are adjusted according to the model loss to obtain a fine-grained classification model capable of being classified accurately, and the fine-grained image classification is rapidly and accurately processed.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for image detection-based fine-grained classification model processing according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of an apparatus for processing a fine-grained classification model based on image detection according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the fine-grained classification model processing method based on image detection provided by the embodiment of the present application is generally executed by a server, and accordingly, a fine-grained classification model processing apparatus based on image detection is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a method for image detection-based fine-grained classification model processing according to the present application is shown. The fine-grained classification model processing method based on image detection comprises the following steps:
step S201, an image data set is constructed by a search engine based on the received keyword.
In this embodiment, an electronic device (for example, the server shown in fig. 1) on which the fine-grained classification model processing method based on image detection runs may communicate with a terminal through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
Wherein the keyword may be a character, word, or phrase instructing the server to search for images; the keyword may be the name of a subclass in fine-grained image classification. The image data set may be a collection of images acquired based on the keyword.
Specifically, the fine-grained image classification requires a theme, that is, a keyword, the name of a subclass in the fine-grained image classification task may be used as the keyword, and the keyword may be manually input and sent to the server. And after receiving the keywords, the server searches pictures in a search engine according to the keywords and constructs an image data set according to search results.
In one embodiment, the image dataset may comprise positive examples and negative examples, wherein the positive examples are related to the keywords and the negative examples are unrelated to the keywords.
In one embodiment, constructing the image dataset by the search engine based on the received keywords comprises: receiving a keyword sent by a terminal; sending the keywords to a search engine to instruct the search engine to perform image search from the Internet according to the keywords; an image dataset is constructed based on the searched images.
Specifically, the user may control the processing of the fine-grained classification initial model at the terminal. The user inputs the keywords at the terminal, and the terminal sends the keywords to the server. The server calls an interface of the search engine and sends the keywords to the search engine, so that image search is performed from the Internet through the search engine.
The server can directly search the keyword in a search engine, take the searched images as positive samples, and construct an image data set based on the positive samples. In addition, the server can randomly search images in the search engine to obtain negative samples and combine the positive and negative samples into the image data set; the negative samples then serve as noise interference during training to prevent the model from overfitting. It should be noted that the present application takes the positive samples as an example for explanation; after being input into the model, the negative samples undergo the same data processing as the positive samples and are processed in parallel with them.
For example, assume that swans consist of black swans and white swans, so that black swan is a subclass of swan; "black swan" can then be used as the keyword, and the server searches the search engine for images related to black swans as positive samples. It should be noted that the positive samples are not necessarily all images of black swans; there may also be images of white swans and the like, but all positive samples come from the search results of the keyword. The negative samples are unrelated to the fine-grained image classification; for example, they may be images of cars, landscapes, etc.
In the embodiment, after the keywords are received, the search engine is used for searching from the internet, so that a large number of images can be obtained quickly, and the construction speed of the image data set is greatly improved.
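As an illustrative sketch of the dataset construction described above (the `search_images` helper below is a hypothetical stand-in for a real search-engine API call, and all names and parameters are assumptions, not part of this application):

```python
import random

def search_images(query, limit):
    """Hypothetical search-engine call; returns a list of image identifiers.
    A real system would invoke a search-engine API here."""
    return [f"{query}_{i}" for i in range(limit)]

def build_dataset(keyword, n_positive=100, n_negative=20,
                  noise_queries=("car", "landscape")):
    # Positive samples: direct keyword search results (may contain mismatches,
    # e.g. white swans returned for the keyword "black swan").
    positives = [(img, 1) for img in search_images(keyword, n_positive)]
    # Negative samples: images unrelated to the keyword, acting as noise
    # interference to prevent the model from overfitting.
    negatives = [(img, 0)
                 for q in noise_queries
                 for img in search_images(q, n_negative // len(noise_queries))]
    dataset = positives + negatives
    random.shuffle(dataset)
    return dataset

dataset = build_dataset("black swan")
```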
Step S202, the image data sets are randomly grouped into a plurality of training sets.
Specifically, if a single image is taken from the image data set, there is a certain probability that it does not match the keyword. When a plurality of images are taken from the image data set, however, the probability that none of them matches the keyword is extremely low: as long as one of the images matches the keyword, the group of images as a whole can be considered to match the keyword, and the keyword can be regarded as the label of the group.
Therefore, the server randomly groups the image data sets to obtain a plurality of training sets. Assuming that the probability of the mismatch between the images in the image data set and the keywords is ζ, because the images have independence with each other, the probability p of the correct label of the training set is:
p = 1 - ζ^K (1)
wherein K is the number of images in the training set, and K is a positive integer. It is easy to know that as K increases, the probability that the training set label is correct will increase rapidly.
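The random grouping and the label-correctness probability of equation (1) can be sketched as follows (a minimal illustration, not the application's implementation; the group size K and the handling of leftover images are assumptions):

```python
import random

def group_into_training_sets(image_dataset, k):
    """Randomly group the image data set into training sets of K images each."""
    images = list(image_dataset)
    random.shuffle(images)
    # Drop any incomplete trailing group for simplicity.
    usable = len(images) - len(images) % k
    return [images[i:i + k] for i in range(0, usable, k)]

def label_correct_probability(zeta, k):
    """Equation (1): p = 1 - zeta**K, the probability that a training set of
    K independent images contains at least one image matching the keyword."""
    return 1 - zeta ** k

groups = group_into_training_sets(range(100), 4)
```

As K increases, `label_correct_probability` rises quickly toward 1, which is the rationale for grouping given above.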
Step S203, inputting a plurality of groups of training sets into the fine-grained classification initial model to obtain the attention weighting vector of each image in the plurality of groups of training sets.
Wherein, the fine-grained classification initial model may be a fine-grained classification model that has not yet been trained. The attention weighting vector may be the vector representation output after each image is processed, having been weighted by an attention mechanism.
Specifically, the server inputs a plurality of groups of training sets into a convolutional layer of the fine-grained classification initial model, the convolutional layer performs convolution processing on each image in each group of training sets, and performs attention weighting on vectors in the convolutional layer by combining an attention mechanism to obtain an attention weighting vector of each image.
The vectors in the convolutional layer are used for fine-grained image classification. The attention mechanism aims to polarize these vectors: vectors related to the keyword are strengthened by the attention mechanism, while vectors unrelated to the keyword are weakened, so that the fine-grained classification initial model can learn better from the strengthened vectors and the classification accuracy is improved. An attention detector may be arranged in the fine-grained classification initial model to implement the attention mechanism.
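A minimal NumPy sketch of such an attention weighting step. The linear scoring and softmax normalization here are assumptions for illustration only; the application specifies merely that regularized scores strengthen keyword-related region vectors and weaken unrelated ones:

```python
import numpy as np

def attention_weight(conv_features, detector_w):
    """conv_features: (d*d, c) convolution feature vectors, one per image region.
    detector_w: (c,) parameters of an assumed linear attention detector.
    Returns regularized attention scores and attention-weighted vectors."""
    raw_scores = conv_features @ detector_w        # one score per image region
    exp = np.exp(raw_scores - raw_scores.max())    # numerically stable softmax
    scores = exp / exp.sum()                       # regularized attention scores
    weighted = scores[:, None] * conv_features     # strengthen relevant regions
    return scores, weighted

rng = np.random.default_rng(0)
feats = rng.normal(size=(9, 4))                    # 3x3 feature map, 4 channels
scores, weighted = attention_weight(feats, rng.normal(size=4))
```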
Step S204, pooling attention weighting vectors and respectively generating training examples corresponding to a plurality of groups of training sets.
A training example is a fusion of the images in a training set, combining the attention weighting vectors of the images in that training set.
Specifically, a pooling layer may be set in the fine-grained image classification initial model, and the pooling layer performs global average pooling on the attention weighting vectors, thereby generating training examples of the training set respectively. The training example fuses the image characteristics of each image in the training set for further fine-grained image classification.
In one embodiment, the formula for the global average pooling is:

h_n = (1 / (K·d²)) · Σ_{k=1}^{K} Σ_{i=1}^{d} Σ_{j=1}^{d} x̂_{(i,j)}^{n,k}

wherein h_n is the training example of the nth training set, d is the spatial dimension of the feature map in the model, k indexes the kth picture in the training set, and x̂_{(i,j)}^{n,k} is the attention weighting vector of image region (i, j) of the kth picture in the nth training set.
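The global average pooling over attention-weighted vectors can be sketched as follows (shapes are illustrative assumptions; the training example is the mean of all region vectors across the K images of one training set):

```python
import numpy as np

def training_example(weighted_vectors):
    """weighted_vectors: (K, d, d, c) attention-weighted vectors for the K
    images of one training set, on a d x d feature map with c channels.
    Global average pooling over images and spatial positions yields the
    training example h_n of shape (c,)."""
    k, d1, d2, c = weighted_vectors.shape
    return weighted_vectors.sum(axis=(0, 1, 2)) / (k * d1 * d2)

vecs = np.ones((4, 3, 3, 8))   # K=4 images, 3x3 feature map, 8 channels
h_n = training_example(vecs)
```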
Step S205, inputting the obtained training examples into a classifier of the fine-grained classification initial model to calculate the model loss.
Specifically, the server inputs the training examples into a classifier of the fine-grained classification initial model, and the classifier classifies according to the training examples and outputs classification results. The server may calculate the model loss based on the classification result and the label using the keyword as the label.
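A hedged sketch of this loss computation. Cross-entropy and the weighting hyperparameter `lam` are assumptions; the application states only that a classifier loss and a regularization factor are combined linearly to obtain the model loss:

```python
import math

def classifier_loss(predicted_probs, label_index):
    """Cross-entropy between the classifier's softmax output and the
    keyword label (the keyword serves as the example label)."""
    return -math.log(predicted_probs[label_index])

def model_loss(cls_loss, reg_factor, lam=0.1):
    """Linear combination of classifier loss and regularization factor;
    lam is a hypothetical weighting hyperparameter."""
    return cls_loss + lam * reg_factor

loss = model_loss(classifier_loss([0.7, 0.2, 0.1], 0), reg_factor=0.5)
```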
Step S206, adjusting model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
Specifically, the server adjusts model parameters of the fine-grained classification initial model with the aim of reducing model loss, continues training after adjusting the model parameters each time, and stops training when the model loss meets a training stop condition to obtain the fine-grained classification model. Wherein the training stop condition may be that the model loss is less than a preset loss threshold.
The adjusted model parameters include parameters in the convolutional layer, the attention detector, and the classifier. After training, the attention detector can effectively identify image areas irrelevant to the keywords in the image, and can suppress or weaken the attention weighting vectors of the image areas and strengthen the attention weighting vectors of the image areas relevant to the keywords.
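The parameter-adjustment loop of step S206 can be sketched abstractly as follows. The gradient step is simplified to a generic `update` callback, and the loss threshold value is illustrative; the application specifies only that training stops when the model loss falls below a preset threshold:

```python
def train_until_converged(compute_loss, update, loss_threshold=0.01, max_steps=1000):
    """Repeat: compute the model loss, adjust parameters to reduce it, and
    stop once the loss satisfies the training stop condition (loss below a
    preset threshold), with a step cap as a safeguard."""
    loss = compute_loss()
    steps = 0
    while loss >= loss_threshold and steps < max_steps:
        update()   # adjust conv-layer, attention-detector, classifier params
        loss = compute_loss()
        steps += 1
    return loss, steps

# Toy demonstration: the loss halves on each parameter update.
state = {"loss": 1.0}
final_loss, steps = train_until_converged(
    lambda: state["loss"],
    lambda: state.update(loss=state["loss"] / 2),
)
```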
It should be emphasized that, in order to further ensure the privacy and security of the model parameters, the trained model parameters may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions, used to verify the validity (tamper resistance) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In the embodiment, the image data set is directly constructed through the search engine according to the keywords, the image data set can be rapidly expanded through the Internet, and the speed of establishing the image data set is improved; because the images are independent, the image data are randomly grouped into a plurality of groups of training sets, so that the negative influence of the images which do not conform to the labels is reduced; inputting a plurality of groups of training sets into a fine-grained classification initial model, and calculating an attention weighting vector of an input image by the fine-grained classification initial model in a manner of fusing an attention mechanism so as to enhance an image area related to keywords in the image and enable the model to be concentrated in the image area related to classification; generating a training example according to the attention weighting vector, wherein the training example comprises the characteristics of each image in the corresponding training set; after the training examples are input into the classifier to obtain model loss, model parameters are adjusted according to the model loss to obtain a fine-grained classification model capable of being classified accurately, and the fine-grained image classification is rapidly and accurately processed.
Further, the step S203 may include: inputting each image in a plurality of groups of training sets into a convolution layer of a fine-grained classification initial model respectively to obtain convolution characteristic vectors of each image area in each image; calculating a regularized attention score for the convolved feature vectors by an attention detector; the regularization attention score is used for representing the association degree of the image area and the keywords; and correspondingly multiplying the regularized attention score and the convolution feature vector to obtain the attention weighting vector of each image.
The convolution feature vector may be a vector representation output by the convolution layer after performing convolution processing on an image region in each image.
Specifically, the server inputs each image in the plurality of training sets into the convolution layer of the fine-grained classification initial model, and after convolution processing the convolution layer outputs the convolution feature vector of each image area in each image. An image area may be a single pixel or a block of multiple pixels, for example a 2 × 2 or 3 × 3 pixel block.
For each training set, the server collects the convolution feature vectors and inputs the convolution feature vectors into an attention detector, and the attention detector calculates the regularized attention scores of the convolution feature vectors according to the weights and the offsets.
The regularization attention score can represent the degree of association between the image region corresponding to the convolution feature vector and the keyword, and the higher the degree of association, the larger the regularization attention score can be. For each image, the server multiplies the convolution feature vector by the corresponding regularized attention score to obtain an attention weighting vector.
In an embodiment, the step of inputting each image in the plurality of training sets into the convolution layer of the fine-grained classification initial model to obtain the convolution feature vector of each image area in each image includes: inputting the plurality of training sets into the convolution layer of the fine-grained classification initial model; acquiring the convolution feature map output by the last of the convolution layers; and setting the vector corresponding to each image area in the convolution feature map as that area's convolution feature vector.
The convolution feature map may be a vector matrix, and each sub-matrix of the convolution feature map corresponds to each image area in the image.
Specifically, the convolution layer may be composed of a plurality of sublayers that perform multi-layer convolution processing on the input training set. The last convolution layer is the final of these sublayers; the server acquires the convolution feature map it outputs, in which the sub-matrix at each position corresponds to an image area in the image, and takes the vector corresponding to each image area in the convolution feature map as that area's convolution feature vector.
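The correspondence described above can be sketched as follows, assuming the last convolution layer outputs a C × H × W feature map; each spatial position (i, j) is treated as one image area whose C channel values form its convolution feature vector (the shapes are illustrative):

```python
import numpy as np

def feature_vectors_from_map(feature_map):
    """Treat each spatial position (i, j) of the last conv layer's
    output (shape C x H x W) as one image area; the C channel values
    at that position form the area's convolution feature vector."""
    c, h, w = feature_map.shape
    # reorder to (H, W, C) so vectors[i, j] is the C-dim vector of area (i, j)
    return feature_map.transpose(1, 2, 0)

fmap = np.random.rand(512, 7, 7)      # e.g. a 512-channel 7x7 feature map
vectors = feature_vectors_from_map(fmap)
print(vectors.shape)                  # (7, 7, 512): 49 areas, 512-dim each
```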
In this embodiment, the training set is input into the convolutional layer, a convolutional feature map output by the last convolutional layer is obtained, vectors in the convolutional feature map correspond to image regions in the image, and convolutional feature vectors can be accurately extracted according to the corresponding relationship.
In one embodiment, let u(i,j) denote the convolution feature vector corresponding to image area (i, j) on the k-th image in the n-th training set, and let a(i,j) denote the attention score the attention detector calculates from the convolution feature vector:
a(i,j) = f(w·u(i,j) + b) (3)
f(x) = ln(1 + exp(x)) (4)
wherein w ∈ Rc and b ∈ R respectively represent the weight and the bias of the attention detector; they are the key factors by which the attention detector strengthens or weakens image areas, and are obtained by adjusting the model parameters.
After the attention detector obtains the attention scores, they may be normalized to compress them into the [0, 1] interval, yielding the regularized attention score â(i,j):
â(i,j) = (a(i,j) + ε) / Σ(i',j') (a(i',j') + ε) (5)
wherein ε is a constant that may take an empirical value and makes the distribution of the regularized attention scores more reasonable: without it, when all attention scores are very small, a very small a(i,j) could correspond to a very large â(i,j); with ε set reasonably, a very small a(i,j) instead yields â(i,j) ≈ 1/d², where d is the dimension of the feature map in the model. After the regularized attention score is obtained, element-by-element multiplication of the convolution feature vector by its corresponding regularized attention score gives the vector representation weighted by the regularized attention score, i.e. the attention weighting vector v(i,j) = â(i,j) ⊙ u(i,j), wherein ⊙ indicates element-by-element multiplication.
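The score, normalization, and weighting steps can be sketched numerically as follows, assuming an (H, W, C) layout of the convolution feature vectors and an empirical smoothing constant `eps`; f(x) = ln(1 + exp(x)) is the squashing function of equation (4):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))        # f(x) = ln(1 + exp(x)), equation (4)

def attention_weighted_vectors(u, w, b, eps=1e-4):
    """u: (H, W, C) convolution feature vectors for one image;
    w: (C,) attention detector weight; b: scalar bias; eps: assumed
    smoothing constant. Returns regularized scores and weighted vectors."""
    scores = softplus(u @ w + b)      # attention score per image area
    # normalize into a distribution; eps keeps an all-small score map
    # from exploding (it tends toward a uniform 1/(H*W) instead)
    reg = (scores + eps) / np.sum(scores + eps)
    weighted = u * reg[..., None]     # scalar score times each area's vector
    return reg, weighted
```

With a zero weight and bias every area gets the same score, so the regularized scores become uniform, matching the behavior described above for images with no keyword-related areas.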
In this embodiment, images in a training set are input into a convolutional layer to obtain convolutional feature vectors of each image region in the images, an attention mechanism is introduced through an attention detector, the convolutional feature vectors are calculated to obtain regularized attention scores, the regularized attention scores can be used as weights of the convolutional feature vectors, attention weighting vectors are obtained after corresponding multiplication, the attention weighting vectors already complete reinforcement or suppression of the image regions, and therefore a fine-grained classification initial model can perform targeted learning.
Further, the step S205 may include: inputting the obtained training examples into a classifier to calculate the loss of the classifier; calculating a regularization factor according to the convolution feature vector; and performing linear operation on the classifier loss and the regularization factor to obtain the model loss.
Wherein, the classifier loss can be the loss calculated by the classifier; the model loss can be the total loss obtained by calculating the fine-grained classification initial model; the regularization factor may be a factor that regularizes classifier loss.
Specifically, the server inputs the training examples into a classifier of the fine-grained classification initial model, the classifier classifies according to the training examples, outputs classification results, and calculates the loss of the classifier according to the classification results.
The attention mechanism in this application aims to give a high regularized attention score to one or a few image areas in those training-set images that match the keyword; for images that do not match the keyword or are irrelevant to fine-grained image classification, the regularized attention scores of all image areas should be close to one another and low. To achieve this goal during training, this application sets a regularization factor in addition to the classifier loss. The negative samples in this application serve as noise interference and also help regularize the attention calculation.
Specifically, the regularization factor is calculated from the convolution feature vector. And after the server obtains the regularization factor, linearly adding the classifier loss and the regularization factor to obtain the model loss of the model level.
In this embodiment, the training examples are input into the classifier to calculate the classifier loss, the regularization factor is calculated according to the convolution feature vector to further strengthen or suppress the image, and the model loss is obtained based on linear operation on the classifier loss and the regularization factor, so that the model parameters of the fine-grained classification initial model can be more reasonably adjusted according to the model loss.
Further, the step of inputting the obtained training examples into the classifier to calculate the classifier loss comprises: inputting the obtained training examples into a classifier to obtain fine-grained classes of the images in the training examples; setting the keywords as instance tags; and calculating the classifier loss of the training example according to the example label and the fine-grained classification of each image in the training example.
Wherein the fine-grained category may be a classification result output by the classifier.
Specifically, the server inputs the training examples into a classifier of the fine-grained classification initial model, the classifier classifies according to the training examples and outputs a plurality of fine-grained classes, and the number of the fine-grained classes is equal to the number of the images in the training set.
The keywords can be used as example labels, and the server calculates the classifier loss by taking the training examples as a whole according to the output fine-grained categories and the example labels.
In one embodiment, the classifier loss is a cross-entropy loss, and the calculation formula is as follows:
Lclass = -Σn log Fn(yn) (6)
wherein Fn is the fine-grained class distribution output for the n-th training instance, Fn(yn) is the probability it assigns to the instance label yn, and Lclass is the classifier loss.
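A minimal sketch of the cross-entropy classifier loss on one pooled training instance; the softmax probabilities `p` and the label index are illustrative values, not from the original disclosure:

```python
import numpy as np

def classifier_loss(probs, label_index):
    """probs: softmax output of the classifier for one training instance;
    label_index: index of the instance label (the keyword's class).
    Returns the cross-entropy classifier loss for this instance."""
    return -np.log(probs[label_index])

p = np.array([0.7, 0.2, 0.1])   # predicted fine-grained class probabilities
loss = classifier_loss(p, 0)    # low loss: the label's class is most probable
```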
When the regularization factor is calculated from the convolution feature vector, a second attention score ã(i,j) is defined. The second attention score differs from the score a(i,j) involved in the regularized attention score calculation in that the squashing function f is not applied:
ã(i,j) = w·u(i,j) + b (7)
wherein u(i,j) may come from positive samples or from negative samples in the training set, and w and b are the weight and bias of the attention detector. When u(i,j) comes from a negative sample in the training set, the attention mechanism aims to achieve ã(i,j) ≤ -1 for every image area; when u(i,j) comes from a positive sample in the training set, the attention mechanism aims to achieve at least one image area such that ã(i,j) ≥ 1. Combining the two cases, the regularization factor is as follows:
R = Σn,k max(0, 1 - t(n,k) · max(i,j) ã(i,j)) (8)
wherein t(n,k) ∈ {1, -1}: it takes 1 when the image is a positive sample and -1 otherwise.
The regularization factor R is combined with the classifier loss Lclass by a linear operation, giving the model loss:
L = Lclass + λR (9)
wherein λ is a weight used to adjust the relative importance of the classifier loss and the regularization factor, and R is the regularization factor in equation (8).
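A sketch of combining the classifier loss with the regularization factor as in equation (9); the hinge-style form of the per-image regularization term and its margin of 1 are assumptions consistent with the positive/negative-sample goals described above, and `lam` is an illustrative weight:

```python
import numpy as np

def regularization_factor(second_scores, t):
    """second_scores: list of (H, W) second attention score maps, one per
    image; t: list of +1 (positive sample) / -1 (negative sample) flags.
    Assumed hinge form: a negative sample pushes all area scores down,
    while a positive sample requires at least one strongly scored area."""
    return sum(max(0.0, 1.0 - tk * s.max())
               for s, tk in zip(second_scores, t))

def model_loss(l_class, r, lam=0.1):
    return l_class + lam * r      # L = Lclass + lambda * R, equation (9)
```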
The specific effect of the attention mechanism is as follows: if an image from the training set is related to the fine-grained image classification task and to the keyword, the regularized attention score is boosted in the image areas related to the keyword; for images unrelated to fine-grained image classification or to the keyword, the regularized attention score tends on average toward zero in every image area, so the classifier pays little attention to those areas, i.e., it considers their features less when learning or classifying. The attention mechanism in this application can therefore filter out image areas in the training-set images that are irrelevant to the fine-grained image classification task or to the keyword, and can detect image areas that are helpful for fine-grained image classification.
In the embodiment, the training examples are input into the classifier to obtain fine-grained classes, then the keywords are used as example labels, the training examples are used as a whole to calculate the loss of the classifier, and the loss of the classifier can be guaranteed by considering the information fused in the training examples.
Further, after step S206, the method may further include: acquiring an image to be classified; inputting the image to be classified into a fine-grained classification model to obtain an attention weighting vector of the image to be classified; generating a test case of the image to be classified based on the attention weighting vector; and inputting the test case into a classifier of a fine-grained classification model to obtain a fine-grained classification of the image to be classified.
Specifically, the server obtains a fine-grained classification model after completing training. When the method is applied, the image to be classified is obtained, and the image to be classified can be sent by the terminal. The server inputs the image to be classified into the convolution layer of the fine-grained classification model, the output of the last convolution layer of the convolution layer is input into the attention detector, and the attention weighting vector of each image area in the image to be classified is obtained.
Different from training, where a plurality of images are input at a time, one image is input at a time during test application, so no pooling layer is needed; the test case of the image to be classified is obtained directly from the attention weighting vectors. In the test case, image areas related to fine-grained image classification are strengthened and unrelated areas are suppressed; the test case is input into the classifier, which processes it and outputs the fine-grained category of the image to be classified.
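The single-image test flow can be sketched as follows; `conv_layers`, `attention`, and `classifier` are assumed stand-ins for the trained model components, and the flattening step is one illustrative way to form the test case:

```python
import numpy as np

def classify_image(conv_layers, attention, classifier, image):
    """Single-image inference: no pooling layer is needed because only
    one image is input at a time; the attention-weighted vectors form
    the test case directly."""
    u = conv_layers(image)           # (H, W, C) convolution feature vectors
    _, weighted = attention(u)       # strengthen keyword-related areas
    test_case = weighted.reshape(-1) # flatten into one test case
    return classifier(test_case)     # fine-grained category
```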
In the embodiment, the image to be classified is input into the fine-grained classification model during the application test to obtain the test case, the test case strengthens the image area related to the fine-grained image classification, and inhibits the image area unrelated to the fine-grained image classification task, so that the classifier can accurately output the fine-grained classification.
The processing of the fine-grained classification model is explained below through a specific application scenario, taking recognition of swan types as an example: swan is the coarse category, black swan and white swan are its subcategories, and a model that distinguishes black swans from white swans is a fine-grained classification model.
In the training stage, a large number of images are obtained from the internet using the keyword "black swan" to form the image data set. The image data set is randomly grouped into several training sets, with "black swan" as the label of each training set. Each image in a training set is input into the convolution layer of the fine-grained classification initial model to obtain convolution feature vectors, the convolution feature vectors are input into the attention detector to obtain attention weighting vectors, and the attention weighting vectors are pooled to obtain a training example. The training example fuses the features of the images in the training set: images related to black swans are strengthened by the attention detector, while images that do not conform to the label (such as images of white swans) are suppressed by it. In other words, the attention detector filters the information in the images so that the model can focus its learning. The classifier classifies according to the training example, the model loss is calculated, and the fine-grained classification initial model adjusts its model parameters according to the model loss to strengthen the attention detector and the classifier; the fine-grained classification model is obtained after training is completed.
The fine-grained classification initial model can learn the characteristics of the black swan and the white swan in training. When the subclasses of the fine-grained image classification tasks are more, images of other subclasses can be collected for supplementary training. For example, images of white swans may be collected for additional training.
When the fine-grained classification model is used, an image to be classified is input into the model, the fine-grained classification model calculates the attention weighting vector of the image to be classified and generates a test case, the test case weights the image to be classified, and an area which is useful for fine-grained classification in the image to be classified is strengthened. After the test case is input into the classifier, the classifier can accurately identify whether the image is a black swan or a white swan according to the test case, and fine-grained image classification is achieved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware through computer readable instructions, which can be stored in a computer readable storage medium; when the instructions are executed, the processes of the embodiments of the methods described above may be included. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps have no strict order constraint and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and not necessarily in sequence, but possibly in turns or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an apparatus for processing a fine-grained classification model based on image detection, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the fine-grained classification model processing apparatus 300 based on image detection according to the present embodiment includes: a data set construction module 301, a data set grouping module 302, a data set input module 303, an instance generation module 304, a loss calculation module 305, and a parameter adjustment module 306, wherein:
a data set construction module 301, configured to construct an image data set through a search engine based on the received keywords.
A data set grouping module 302 for randomly grouping the image data sets into sets of training sets.
And the data set input module 303 is configured to input the plurality of groups of training sets into the fine-grained classification initial model to obtain an attention weighting vector of each image in the plurality of groups of training sets.
The instance generating module 304 is configured to pool the attention weighting vectors, and generate training instances corresponding to a plurality of groups of training sets respectively.
And a loss calculation module 305, configured to input the obtained training example into a classifier of the fine-grained classification initial model to calculate a model loss.
And the parameter adjusting module 306 is configured to adjust a model parameter of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
In the embodiment, the image data set is directly constructed through the search engine according to the keywords, the image data set can be rapidly expanded through the Internet, and the speed of establishing the image data set is improved; because the images are independent, the image data are randomly grouped into a plurality of groups of training sets, so that the negative influence of the images which do not conform to the labels is reduced; inputting a plurality of groups of training sets into a fine-grained classification initial model, and calculating an attention weighting vector of an input image by the fine-grained classification initial model in a manner of fusing an attention mechanism so as to enhance an image area related to keywords in the image and enable the model to be concentrated in the image area related to classification; generating a training example according to the attention weighting vector, wherein the training example comprises the characteristics of each image in the corresponding training set; after the training examples are input into the classifier to obtain model loss, model parameters are adjusted according to the model loss to obtain a fine-grained classification model capable of being classified accurately, and the fine-grained image classification is rapidly and accurately processed.
In some optional implementations of this embodiment, the data set constructing module 301 includes: a receiving submodule, a searching submodule and a constructing submodule, wherein:
and the receiving submodule is used for receiving the keywords sent by the terminal.
And the search sub-module is used for sending the keywords to a search engine so as to instruct the search engine to perform image search from the Internet according to the keywords.
A construction sub-module for constructing an image dataset based on the searched images.
In the embodiment, after the keywords are received, the search engine is used for searching from the internet, so that a large number of images can be obtained quickly, and the construction speed of the image data set is greatly improved.
In some optional implementations of this embodiment, the data set input module 303 includes: a data set input sub-module, a score calculation sub-module, and a multiplication sub-module, wherein:
and the data set input submodule is used for respectively inputting each image in the plurality of groups of training sets into the convolution layer of the fine-grained classification initial model to obtain the convolution characteristic vector of each image area in each image.
A score calculation sub-module for calculating a regularized attention score of the convolution feature vector by the attention detector; wherein the regularized attention score is used to characterize a degree of association of the image region with the keyword.
And the multiplication submodule is used for correspondingly multiplying the regularized attention score and the convolution characteristic vector to obtain an attention weighting vector of each image.
In this embodiment, images in a training set are input into a convolutional layer to obtain convolutional feature vectors of each image region in the images, an attention mechanism is introduced through an attention detector, the convolutional feature vectors are calculated to obtain regularized attention scores, the regularized attention scores can be used as weights of the convolutional feature vectors, attention weighting vectors are obtained after corresponding multiplication, the attention weighting vectors already complete reinforcement or suppression of the image regions, and therefore a fine-grained classification initial model can perform targeted learning.
In some optional implementations of this embodiment, the data set input sub-module includes:
and the training set input unit is used for inputting a plurality of groups of training sets into the convolution layer of the fine-grained classification initial model.
And the output acquisition unit is used for acquiring the convolution characteristic diagram output by the last convolution layer of the convolution layers.
And the vector setting unit is used for setting the vector corresponding to each image area in the convolution characteristic diagram as a convolution characteristic vector.
In this embodiment, the training set is input into the convolutional layer, a convolutional feature map output by the last convolutional layer is obtained, vectors in the convolutional feature map correspond to image regions in the image, and convolutional feature vectors can be accurately extracted according to the corresponding relationship.
In some optional implementations of this embodiment, the loss calculating module includes: loss calculation submodule, factor calculation submodule and linear operation submodule, wherein:
and the loss calculation submodule is used for inputting the obtained training examples into the classifier so as to calculate the classifier loss.
And the factor calculation submodule is used for calculating the regularization factor according to the convolution feature vector.
And the linear operation submodule is used for performing linear operation on the classifier loss and the regularization factor to obtain the model loss.
In this embodiment, the training examples are input into the classifier to calculate the classifier loss, the regularization factor is calculated according to the convolution feature vector to further strengthen or suppress the image, and the model loss is obtained based on linear operation on the classifier loss and the regularization factor, so that the model parameters of the fine-grained classification initial model can be more reasonably adjusted according to the model loss.
In some optional implementations of this embodiment, the loss calculating sub-module includes: an example input unit, a label setting unit, and a loss calculation unit, wherein:
and the example input unit is used for inputting the obtained training examples into the classifier to obtain the fine-grained classes of the images in the training examples.
And the label setting unit is used for setting the keywords as example labels.
And the loss calculation unit is used for calculating the classifier loss of the training examples according to the example labels and the fine-grained classes of the images in the training examples.
In the embodiment, the training examples are input into the classifier to obtain fine-grained classes, then the keywords are used as example labels, the training examples are used as a whole to calculate the loss of the classifier, and the loss of the classifier can be guaranteed by considering the information fused in the training examples.
In some optional implementations of the embodiment, the fine-grained classification model processing apparatus 300 based on image detection further includes: the device comprises an acquisition module to be classified, an input module to be classified, a test generation module and a test input module, wherein:
and the to-be-classified acquisition module is used for acquiring the to-be-classified image.
And the to-be-classified input module is used for inputting the to-be-classified image into the fine-grained classification model to obtain the attention weighting vector of the to-be-classified image.
And the test generation module is used for generating a test case of the image to be classified based on the attention weighting vector.
And the test input module is used for inputting the test case into the classifier of the fine-grained classification model to obtain the fine-grained classification of the image to be classified.
In the embodiment, the image to be classified is input into the fine-grained classification model during the application test to obtain the test case, the test case strengthens the image area related to the fine-grained image classification, and inhibits the image area unrelated to the fine-grained image classification task, so that the classifier can accurately output the fine-grained classification.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components are required to be implemented, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of a fine-grained classification model processing method based on image detection. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the fine-grained classification model processing method based on image detection.
The network interface 43 may comprise a wireless network interface or a wired network interface, and is generally used for establishing a communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment may execute the steps of the fine-grained classification model processing method based on image detection, where these steps may be the steps of the method in the above embodiments.
In this embodiment, the image data set is constructed directly through a search engine according to the keyword, so the data set can be expanded rapidly through the Internet, which improves the speed of building the image data set. Because the images are independent of one another, the image data set is randomly grouped into a plurality of training sets, which reduces the negative influence of images that do not match their labels. The training sets are input into a fine-grained classification initial model that fuses an attention mechanism to calculate an attention weighting vector for each input image, enhancing the image regions related to the keyword so that the model focuses on the regions relevant to classification. A training example is generated from the attention weighting vectors and contains the features of each image in the corresponding training set. After the training examples are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model capable of accurate classification, so that fine-grained image classification is processed quickly and accurately.
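The attention weighting and pooling described above can be illustrated with a minimal numpy sketch; the softmax regularization, mean pooling, and array shapes are illustrative assumptions rather than operations fixed by this embodiment:

```python
import numpy as np

def group_training_example(feature_maps, attn_scores):
    """Pool one group (training set) of images into a single training example.

    feature_maps: list of (regions, dim) arrays, one per image in the group.
    attn_scores: list of (regions,) raw attention scores, one per image.
    Returns a (dim,) vector carrying the features of every image in the group.
    """
    pooled = []
    for feats, s in zip(feature_maps, attn_scores):
        a = np.exp(s - s.max())  # regularize scores with a softmax (assumption)
        a /= a.sum()
        # Attention-weight each region vector, then mean-pool the regions.
        pooled.append((feats * a[:, None]).mean(axis=0))
    # Pool across the images of the group into one training example.
    return np.mean(pooled, axis=0)
```

Because each group contributes one example, a mislabeled image only dilutes its group's example instead of producing a wrongly labeled example of its own, which is the stated benefit of random grouping.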
The present application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor to cause the at least one processor to perform the steps of the fine-grained classification model processing method based on image detection described above.
In this embodiment, the image data set is constructed directly through a search engine according to the keyword, so the data set can be expanded rapidly through the Internet, which improves the speed of building the image data set. Because the images are independent of one another, the image data set is randomly grouped into a plurality of training sets, which reduces the negative influence of images that do not match their labels. The training sets are input into a fine-grained classification initial model that fuses an attention mechanism to calculate an attention weighting vector for each input image, enhancing the image regions related to the keyword so that the model focuses on the regions relevant to classification. A training example is generated from the attention weighting vectors and contains the features of each image in the corresponding training set. After the training examples are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss, yielding a fine-grained classification model capable of accurate classification, so that fine-grained image classification is processed quickly and accurately.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and may of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for causing a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, falls within the protection scope of the present application.
Claims (10)
1. A fine-grained classification model processing method based on image detection is characterized by comprising the following steps:
constructing an image data set by a search engine based on the received keywords;
randomly grouping the image data sets into a plurality of groups of training sets;
inputting the plurality of groups of training sets into a fine-grained classification initial model to obtain an attention weighting vector of each image in the plurality of groups of training sets;
pooling the attention weighting vectors to respectively generate training examples corresponding to the plurality of groups of training sets;
inputting the obtained training examples into a classifier of the fine-grained classification initial model to calculate model loss;
and adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
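The random grouping step of claim 1 can be sketched as a shuffle followed by a split into fixed-size groups; the group size and the fate of the remainder are illustrative parameters, as the claim does not fix them:

```python
import random

def random_group(image_ids, group_size, seed=None):
    """Randomly group an image data set into training sets of `group_size`.

    Returns a list of groups; the last group may be smaller than group_size.
    """
    ids = list(image_ids)
    # The images are independent, so a random order loses no information.
    random.Random(seed).shuffle(ids)
    return [ids[i:i + group_size] for i in range(0, len(ids), group_size)]
```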
2. The method of claim 1, wherein the step of constructing the image data set by a search engine based on the received keywords comprises:
receiving a keyword sent by a terminal;
sending the keyword to a search engine to instruct the search engine to perform image search from the Internet according to the keyword;
an image dataset is constructed based on the searched images.
3. The method according to claim 1, wherein the step of inputting the plurality of groups of training sets into the fine-grained classification initial model to obtain the attention weighting vector of each image in the plurality of groups of training sets comprises:
inputting each image in the plurality of groups of training sets into a convolution layer of a fine-grained classification initial model respectively to obtain convolution characteristic vectors of each image area in each image;
calculating, by an attention detector, a regularized attention score for the convolved feature vectors; wherein the regularized attention score is used to characterize a degree of association of an image region with the keyword;
and correspondingly multiplying the regularized attention score and the convolution feature vector to obtain the attention weighting vector of each image.
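The attention weighting of claim 3 can be sketched as follows; a softmax is used here as the regularization of the attention scores, which is an illustrative assumption, since the claim only requires that the regularized scores characterize the degree of association of each image region with the keyword:

```python
import numpy as np

def attention_weighting_vectors(conv_feats, attn_logits):
    """conv_feats: (regions, dim) convolution feature vectors of one image.
    attn_logits: (regions,) raw scores from the attention detector.
    Returns (regions, dim) attention weighting vectors."""
    # Regularized attention score: softmax so the region scores are
    # non-negative and sum to 1 (assumed regularization).
    a = np.exp(attn_logits - attn_logits.max())
    a /= a.sum()
    # Correspondingly multiply each region's feature vector by its score,
    # enhancing regions associated with the keyword.
    return conv_feats * a[:, None]
```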
4. The method according to claim 3, wherein the step of inputting each image in the training sets into a convolution layer of a fine-grained classification initial model to obtain a convolution feature vector of each image region in each image comprises:
inputting the groups of training sets into a convolution layer of a fine-grained classification initial model;
acquiring a convolution feature map output by the last of the convolution layers;
and setting the vector corresponding to each image region in the convolution feature map as a convolution feature vector.
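The step of claim 4 that turns the convolution feature map into per-region convolution feature vectors is essentially a reshape; the (H, W, C) layout of the feature map is an assumed convention:

```python
import numpy as np

def region_feature_vectors(conv_feature_map):
    """conv_feature_map: (H, W, C) output of the last convolution layer.

    Returns (H*W, C): one convolution feature vector per image region,
    where each spatial position of the map is treated as a region."""
    h, w, c = conv_feature_map.shape
    return conv_feature_map.reshape(h * w, c)
```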
5. The method of claim 3, wherein the step of inputting the obtained training examples into the classifier of the fine-grained classification initial model to calculate model loss comprises:
inputting the obtained training examples into a classifier to calculate the loss of the classifier;
calculating a regularization factor according to the convolution feature vector;
and performing linear operation on the classifier loss and the regularization factor to obtain model loss.
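The linear operation of claim 5 can be sketched as below; the mean-square regularization factor and the weight `lam` are illustrative assumptions, since the claim only specifies that the factor is computed from the convolution feature vectors and combined linearly with the classifier loss:

```python
import numpy as np

def model_loss(classifier_loss, conv_feats, lam=0.01):
    """Linear combination of classifier loss and a regularization factor.

    conv_feats: (regions, dim) convolution feature vectors.
    lam: assumed linear weight of the regularization term."""
    # Assumed regularization factor: mean squared feature value.
    reg = float(np.mean(conv_feats ** 2))
    return classifier_loss + lam * reg
```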
6. The method of claim 5, wherein the step of inputting the obtained training examples into a classifier to calculate classifier loss comprises:
inputting the obtained training examples into a classifier to obtain fine-grained classes of the images in the training examples;
setting the keyword as an instance tag;
and calculating the classifier loss of the training example according to the example label and the fine-grained category of each image in the training example.
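With the keyword set as the instance label, the classifier loss of claim 6 can be sketched as a cross-entropy between the classifier output for a training example and the label index; the choice of cross-entropy is an assumption, as the claim does not name the loss function:

```python
import numpy as np

def classifier_loss(logits, label_index):
    """logits: (num_classes,) raw classifier scores for one training example.
    label_index: integer index of the keyword-derived instance label.
    Returns the cross-entropy loss (assumed loss function)."""
    z = logits - logits.max()                # numerical stability shift
    log_probs = z - np.log(np.exp(z).sum())  # log-softmax over classes
    return float(-log_probs[label_index])
```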
7. The method according to any one of claims 1 to 6, wherein after the step of adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model, the method further comprises:
acquiring an image to be classified;
inputting the image to be classified into the fine-grained classification model to obtain an attention weighted vector of the image to be classified;
generating a test case of the image to be classified based on the attention weighting vector;
and inputting the test case into a classifier of the fine-grained classification model to obtain the fine-grained classification of the image to be classified.
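The inference path of claim 7 (attention weighting of the image to be classified, pooling into a test case, then classification) can be sketched as follows; the softmax regularization, mean pooling, and linear classifier are illustrative assumptions:

```python
import numpy as np

def classify_image(conv_feats, attn_logits, classifier_weights):
    """conv_feats: (regions, dim) feature vectors of the image to classify.
    attn_logits: (regions,) attention detector scores.
    classifier_weights: (num_classes, dim) assumed linear classifier.
    Returns the predicted fine-grained class index."""
    # Attention weighting vector of the image to be classified.
    a = np.exp(attn_logits - attn_logits.max())
    a /= a.sum()
    weighted = conv_feats * a[:, None]
    # Pool the weighted region vectors into a single test case.
    test_case = weighted.mean(axis=0)
    # Classifier of the fine-grained classification model.
    scores = classifier_weights @ test_case
    return int(np.argmax(scores))
```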
8. A fine-grained classification model processing device based on image detection is characterized by comprising:
the data set construction module is used for constructing an image data set through a search engine based on the received keywords;
the data set grouping module is used for randomly grouping the image data sets into a plurality of groups of training sets;
the data set input module is used for inputting the plurality of groups of training sets into a fine-grained classification initial model to obtain an attention weighting vector of each image in the plurality of groups of training sets;
the example generation module is used for pooling the attention weighting vectors and respectively generating training examples corresponding to the plurality of groups of training sets;
the loss calculation module is used for inputting the obtained training examples into a classifier of the fine-grained classification initial model so as to calculate model loss;
and the parameter adjusting module is used for adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model.
9. A computer device, comprising a memory and a processor, wherein the memory stores computer readable instructions, and the processor, when executing the computer readable instructions, implements the steps of the fine-grained classification model processing method based on image detection of any one of claims 1 to 7.
10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the image detection-based fine-grained classification model processing method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010930234.1A CN112101437B (en) | 2020-09-07 | 2020-09-07 | Fine granularity classification model processing method based on image detection and related equipment thereof |
PCT/CN2020/124434 WO2021143267A1 (en) | 2020-09-07 | 2020-10-28 | Image detection-based fine-grained classification model processing method, and related devices |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010930234.1A CN112101437B (en) | 2020-09-07 | 2020-09-07 | Fine granularity classification model processing method based on image detection and related equipment thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112101437A true CN112101437A (en) | 2020-12-18 |
CN112101437B CN112101437B (en) | 2024-05-31 |
Family
ID=73750691
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010930234.1A Active CN112101437B (en) | 2020-09-07 | 2020-09-07 | Fine granularity classification model processing method based on image detection and related equipment thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112101437B (en) |
WO (1) | WO2021143267A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094171A (en) * | 2021-03-31 | 2021-07-09 | 北京达佳互联信息技术有限公司 | Data processing method and device, electronic equipment and storage medium |
CN115082432A (en) * | 2022-07-21 | 2022-09-20 | 北京中拓新源科技有限公司 | Small target bolt defect detection method and device based on fine-grained image classification |
CN117115565A (en) * | 2023-10-19 | 2023-11-24 | 南方科技大学 | Autonomous perception-based image classification method and device and intelligent terminal |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723256A (en) * | 2021-08-24 | 2021-11-30 | 北京工业大学 | Pollen particle identification method and device |
CN114529574A (en) * | 2022-02-23 | 2022-05-24 | 平安科技(深圳)有限公司 | Image matting method and device based on image segmentation, computer equipment and medium |
CN115457308B (en) * | 2022-08-18 | 2024-03-12 | 苏州浪潮智能科技有限公司 | Fine granularity image recognition method and device and computer equipment |
CN115953622B (en) * | 2022-12-07 | 2024-01-30 | 广东省新黄埔中医药联合创新研究院 | Image classification method combining attention mutual exclusion rules |
CN116109629B (en) * | 2023-04-10 | 2023-07-25 | 厦门微图软件科技有限公司 | Defect classification method based on fine granularity recognition and attention mechanism |
CN116310425B (en) * | 2023-05-24 | 2023-09-26 | 山东大学 | Fine-grained image retrieval method, system, equipment and storage medium |
CN117372791B (en) * | 2023-12-08 | 2024-03-22 | 齐鲁空天信息研究院 | Fine grain directional damage area detection method, device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160307072A1 (en) * | 2015-04-17 | 2016-10-20 | Nec Laboratories America, Inc. | Fine-grained Image Classification by Exploring Bipartite-Graph Labels |
CN107730553A (en) * | 2017-11-02 | 2018-02-23 | 哈尔滨工业大学 | Weakly supervised object detection method based on a pseudo ground-truth search method |
CN108805259A (en) * | 2018-05-23 | 2018-11-13 | 北京达佳互联信息技术有限公司 | neural network model training method, device, storage medium and terminal device |
CN109086792A (en) * | 2018-06-26 | 2018-12-25 | 上海理工大学 | Fine-grained image classification method based on a detection and recognition network architecture |
CN110647912A (en) * | 2019-08-15 | 2020-01-03 | 深圳久凌软件技术有限公司 | Fine-grained image recognition method and device, computer equipment and storage medium |
CN111126459A (en) * | 2019-12-06 | 2020-05-08 | 深圳久凌软件技术有限公司 | Method and device for identifying fine granularity of vehicle |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704877B (en) * | 2017-10-09 | 2020-05-29 | 哈尔滨工业大学深圳研究生院 | Image privacy perception method based on deep learning |
CN107958272B (en) * | 2017-12-12 | 2020-11-24 | 北京旷视科技有限公司 | Picture data set updating method, device and system and computer storage medium |
CN111079862B (en) * | 2019-12-31 | 2023-05-16 | 西安电子科技大学 | Deep learning-based thyroid papillary carcinoma pathological image classification method |
CN111178458B (en) * | 2020-04-10 | 2020-08-14 | 支付宝(杭州)信息技术有限公司 | Training of classification model, object classification method and device |
2020
- 2020-09-07 CN CN202010930234.1A patent/CN112101437B/en active Active
- 2020-10-28 WO PCT/CN2020/124434 patent/WO2021143267A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160307072A1 (en) * | 2015-04-17 | 2016-10-20 | Nec Laboratories America, Inc. | Fine-grained Image Classification by Exploring Bipartite-Graph Labels |
CN107730553A (en) * | 2017-11-02 | 2018-02-23 | 哈尔滨工业大学 | Weakly supervised object detection method based on a pseudo ground-truth search method |
CN108805259A (en) * | 2018-05-23 | 2018-11-13 | 北京达佳互联信息技术有限公司 | neural network model training method, device, storage medium and terminal device |
CN109086792A (en) * | 2018-06-26 | 2018-12-25 | 上海理工大学 | Fine-grained image classification method based on a detection and recognition network architecture |
CN110647912A (en) * | 2019-08-15 | 2020-01-03 | 深圳久凌软件技术有限公司 | Fine-grained image recognition method and device, computer equipment and storage medium |
CN111126459A (en) * | 2019-12-06 | 2020-05-08 | 深圳久凌软件技术有限公司 | Method and device for identifying fine granularity of vehicle |
Non-Patent Citations (3)
Title |
---|
Zhang Qian et al.: "Fine-grained image classification based on Xception", Journal of Chongqing University, vol. 41, no. 05, 15 May 2018 (2018-05-15), pages 85-91 *
Wang Peisen et al.: "Fine-grained image classification based on multi-channel visual attention", Journal of Data Acquisition and Processing, vol. 34, no. 1, pages 157-166 *
Luo Xiongwen: "Research on a two-stage disease diagnosis method for medical images based on deep learning", China Masters' Theses Full-text Database, Medicine & Health Sciences, no. 2, pages 16-19 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094171A (en) * | 2021-03-31 | 2021-07-09 | 北京达佳互联信息技术有限公司 | Data processing method and device, electronic equipment and storage medium |
CN115082432A (en) * | 2022-07-21 | 2022-09-20 | 北京中拓新源科技有限公司 | Small target bolt defect detection method and device based on fine-grained image classification |
CN115082432B (en) * | 2022-07-21 | 2022-11-01 | 北京中拓新源科技有限公司 | Small target bolt defect detection method and device based on fine-grained image classification |
CN117115565A (en) * | 2023-10-19 | 2023-11-24 | 南方科技大学 | Autonomous perception-based image classification method and device and intelligent terminal |
Also Published As
Publication number | Publication date |
---|---|
CN112101437B (en) | 2024-05-31 |
WO2021143267A1 (en) | 2021-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112101437B (en) | Fine granularity classification model processing method based on image detection and related equipment thereof | |
CN108287864B (en) | Interest group dividing method, device, medium and computing equipment | |
CN112507125A (en) | Triple information extraction method, device, equipment and computer readable storage medium | |
CN113434716B (en) | Cross-modal information retrieval method and device | |
CN113704531A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN112395390B (en) | Training corpus generation method of intention recognition model and related equipment thereof | |
CN112699297A (en) | Service recommendation method, device and equipment based on user portrait and storage medium | |
CN112632278A (en) | Labeling method, device, equipment and storage medium based on multi-label classification | |
CN112528029A (en) | Text classification model processing method and device, computer equipment and storage medium | |
CN113722438A (en) | Sentence vector generation method and device based on sentence vector model and computer equipment | |
CN114398477A (en) | Policy recommendation method based on knowledge graph and related equipment thereof | |
CN107291774B (en) | Error sample identification method and device | |
CN112528040B (en) | Detection method for guiding drive corpus based on knowledge graph and related equipment thereof | |
CN113869063A (en) | Data recommendation method and device, electronic equipment and storage medium | |
CN113220828A (en) | Intention recognition model processing method and device, computer equipment and storage medium | |
CN112381236A (en) | Data processing method, device, equipment and storage medium for federal transfer learning | |
CN116796730A (en) | Text error correction method, device, equipment and storage medium based on artificial intelligence | |
CN116958852A (en) | Video and text matching method and device, electronic equipment and storage medium | |
CN115392361A (en) | Intelligent sorting method and device, computer equipment and storage medium | |
CN115203391A (en) | Information retrieval method and device, computer equipment and storage medium | |
CN113989618A (en) | Recyclable article classification and identification method | |
CN112395450A (en) | Picture character detection method and device, computer equipment and storage medium | |
Wang et al. | Learning image embeddings without labels | |
CN113792342B (en) | Desensitization data reduction method, device, computer equipment and storage medium | |
CN115471893B (en) | Face recognition model training, face recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |