CN112101437B - Fine granularity classification model processing method based on image detection and related equipment thereof


Info

Publication number
CN112101437B
Authority
CN
China
Prior art keywords
fine
image
model
granularity
training
Prior art date
Legal status
Active
Application number
CN202010930234.1A
Other languages
Chinese (zh)
Other versions
CN112101437A (en)
Inventor
林春伟
刘莉红
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010930234.1A priority Critical patent/CN112101437B/en
Priority to PCT/CN2020/124434 priority patent/WO2021143267A1/en
Publication of CN112101437A publication Critical patent/CN112101437A/en
Application granted granted Critical
Publication of CN112101437B publication Critical patent/CN112101437B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 Querying
    • G06F 16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application belongs to the field of artificial intelligence and relates to a fine-grained classification model processing method based on image detection, which comprises the steps of: receiving keywords and constructing an image data set through a search engine; randomly grouping the image data set into a plurality of training sets; inputting the plurality of training sets into a fine-grained classification initial model to obtain attention weighting vectors of the images in the plurality of training sets; pooling the attention weighting vectors to respectively generate training examples corresponding to the plurality of training sets; inputting the training examples into a classifier of the fine-grained classification initial model to calculate a model loss; and adjusting model parameters according to the model loss to obtain a fine-grained classification model. The application also provides a fine-grained classification model processing apparatus based on image detection, a computer device and a storage medium. In addition, the application relates to blockchain technology, and the trained model parameters can be stored in a blockchain. The application can rapidly and accurately realize fine-grained image classification processing.

Description

Fine granularity classification model processing method based on image detection and related equipment thereof
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a fine-granularity classification model processing method, a fine-granularity classification model processing device, computer equipment and a storage medium based on image detection.
Background
With the development of computer technology, research and application of computer vision are also becoming more and more widespread, wherein fine-grained image classification is a hot topic in computer vision. The goal of fine-grained image classification is to retrieve and identify images of different subclasses under a large class, involving image detection in artificial intelligence.
In traditional fine-grained image classification technology, a large-scale image data set generally needs to be prepared in order to improve classification accuracy, and the images in the data set can only be used for training and application after being labelled manually. This is time-consuming and labour-intensive, so the processing efficiency of fine-grained image classification is low.
Disclosure of Invention
The embodiment of the application aims to provide a fine-granularity image classification model processing method, device, computer equipment and storage medium based on image detection, so as to solve the problem of low fine-granularity image classification processing efficiency.
In order to solve the above technical problems, the embodiment of the present application provides a fine-granularity classification model processing method based on image detection, which adopts the following technical scheme:
constructing an image dataset by a search engine based on the received keywords;
Randomly grouping the image data sets into a plurality of sets of training sets;
inputting the training sets into a fine-granularity classification initial model to obtain attention weighting vectors of images in the training sets;
pooling the attention weighted vectors to respectively generate training examples corresponding to the plurality of groups of training sets;
Inputting the obtained training examples into a classifier of the fine-grained classification initial model to calculate model loss;
and adjusting model parameters of the fine-granularity classification initial model according to the model loss to obtain a fine-granularity classification model.
Further, the step of constructing the image dataset by the search engine based on the received keywords includes:
Receiving keywords sent by a terminal;
sending the keywords to a search engine to instruct the search engine to search images from the Internet according to the keywords;
An image dataset is constructed based on the searched image.
Further, the step of inputting the plurality of sets of training sets into a fine-granularity classification initial model to obtain the attention weighting vector of each image in the plurality of sets of training sets includes:
Respectively inputting each image in the plurality of groups of training sets into a convolution layer of a fine-granularity classification initial model to obtain convolution feature vectors of each image region in each image;
Calculating a regularized attention score of the convolved feature vector by an attention detector; wherein the regularized attention score is used for representing the association degree of the image area and the keywords;
and correspondingly multiplying the regularized attention score by the convolution feature vector to obtain attention weighted vectors of the images.
Further, the step of inputting each image in the plurality of sets of training sets into the convolution layer of the fine-grained classification initial model to obtain the convolution feature vector of each image region in each image includes:
inputting the training sets into a convolution layer of a fine-granularity classification initial model;
acquiring a convolution characteristic diagram of the output of a last convolution layer of the convolution layers;
and setting vectors corresponding to the image areas in the convolution feature map as convolution feature vectors.
Further, the step of inputting the obtained training examples into the classifier of the fine-grained classification initial model to calculate model loss includes:
Inputting the obtained training examples into a classifier to calculate classifier loss;
Calculating a regularization factor according to the convolution feature vector;
and performing linear operation on the classifier loss and the regularization factor to obtain model loss.
Further, the step of inputting the resulting training examples into a classifier to calculate a classifier loss includes:
inputting the obtained training examples into a classifier to obtain fine granularity categories of each image in the training examples;
setting the keywords as instance tags;
and calculating the classifier loss of the training example according to the example label and the fine granularity category of each image in the training example.
Further, after the step of adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain a fine-grained classification model, the method further includes:
Acquiring an image to be classified;
Inputting the image to be classified into the fine granularity classification model to obtain an attention weighting vector of the image to be classified;
generating a test instance of the image to be classified based on the attention weighting vector;
and inputting the test case into a classifier of the fine-granularity classification model to obtain the fine-granularity category of the image to be classified.
In order to solve the above technical problems, the embodiment of the present application further provides a fine-granularity classification model processing device based on image detection, which adopts the following technical scheme:
The data set construction module is used for constructing an image data set through a search engine based on the received keywords;
the data set grouping module is used for randomly grouping the image data sets into a plurality of groups of training sets;
The data set input module is used for inputting the training sets into a fine-granularity classification initial model to obtain attention weighting vectors of the images in the training sets;
The example generation module is used for pooling the attention weighted vectors and respectively generating training examples corresponding to the training sets;
The loss calculation module is used for inputting the obtained training examples into the classifier of the fine-granularity classification initial model so as to calculate model loss;
And the parameter adjustment module is used for adjusting the model parameters of the fine-granularity classification initial model according to the model loss to obtain a fine-granularity classification model.
In order to solve the above technical problems, an embodiment of the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the fine-grained classification model processing method based on image detection when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the fine granularity classification model processing method based on image detection is implemented.
Compared with the prior art, the embodiment of the application has the following main beneficial effects: the image data set is directly built through the search engine according to the keywords, and can be rapidly expanded through the Internet, so that the speed of building the image data set is improved; because the images are mutually independent, the image data sets are randomly grouped into a plurality of groups of training sets, so that the negative influence of the images which do not accord with the labels is reduced; inputting a plurality of groups of training sets into a fine-granularity classification initial model, and calculating attention weighting vectors of the input images by fusing an attention mechanism to strengthen image areas related to keywords in the images so that the model concentrates on the image areas related to classification; generating a training example according to the attention weighted vector, wherein the training example comprises the characteristics of each image in the corresponding training set; after the training examples are input into the classifier to obtain model loss, model parameters are adjusted according to the model loss, and a fine-granularity classification model capable of accurately classifying is obtained, so that fine-granularity image classification processing is rapidly and accurately realized.
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a fine-grained classification model processing method based on image detection in accordance with the application;
FIG. 3 is a schematic diagram of an embodiment of an image detection-based fine-grained classification model processing apparatus according to the application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the fine-grained classification model processing method based on image detection provided by the embodiment of the application is generally executed by a server, and correspondingly, the fine-grained classification model processing device based on image detection is generally arranged in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a fine-grained classification model processing method based on image detection in accordance with the application is shown. The fine granularity classification model processing method based on image detection comprises the following steps:
Step S201, based on the received keywords, an image dataset is constructed by a search engine.
In this embodiment, an electronic device (for example, the server shown in fig. 1) on which the fine-grained classification model processing method based on image detection runs may communicate with a terminal through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a Wi-Fi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future.
Wherein, the keyword may be a word, a word or a phrase that instructs the server to search for an image; the keywords may be names of sub-categories in the fine-grained image classification. The image dataset may be a collection of images acquired based on keywords.
Specifically, the fine-grained image classification requires a theme, namely a keyword, and the names of the subclasses in the fine-grained image classification task can be used as the keyword, and the keyword can be manually input and sent to the server. After the server receives the keywords, picture searching is carried out in a search engine according to the keywords, and an image dataset is constructed according to the searching results.
In one embodiment, the image dataset may include positive samples related to keywords and negative samples unrelated to keywords.
In one embodiment, building an image dataset by a search engine based on received keywords includes: receiving keywords sent by a terminal; sending the keywords to a search engine to instruct the search engine to search images from the Internet according to the keywords; an image dataset is constructed based on the searched image.
Specifically, the user can control the processing of the fine-grained classification initial model at the terminal. The user inputs the keywords at the terminal, and the terminal sends the keywords to the server. The server calls an interface of the search engine and sends the keywords to the search engine, so that image searching is carried out from the Internet through the search engine.
The server may search the keywords directly in the search engine, take the searched images as positive samples, and construct the image data set based on the positive samples. In addition, the server may also search for random images in the search engine to obtain negative samples and combine the positive and negative samples into the image data set; in this case the negative samples serve as noise interference during training and prevent the model from overfitting. In the following description, the application is explained by taking the positive samples as an example; after being input into the model, a negative sample goes through the same data processing procedure as a positive sample and is processed synchronously with it.
For example, assume that the large class swan is composed of the subclasses black swan and white swan; "black swan" may then be used as a keyword, and the server searches a search engine for images related to black swans as positive samples. It should be noted that the positive samples are not necessarily all photographs of black swans: images of white swans, swan drawings and the like may also be present, but all positive samples come from the search results of the keyword. The negative samples are unrelated to the fine-grained image classification task; for example, they may be images of automobiles, landscapes, and so on.
In the embodiment, after the keywords are received, the search engine searches the Internet, so that a large number of images can be obtained quickly, and the construction speed of the image data set is improved greatly.
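For illustration only, a minimal Python sketch of this dataset-construction step is given below; the search endpoint, its parameter names and the file layout are hypothetical assumptions and not part of the embodiment.

```python
import os

import requests  # assumption: images are fetched over HTTP

SEARCH_URL = "https://image-search.example/api"  # hypothetical search endpoint


def fetch_urls(query, n):
    """Ask the (hypothetical) search engine for image URLs matching a query."""
    resp = requests.get(SEARCH_URL, params={"q": query, "count": n})
    return resp.json()["urls"]


def build_image_dataset(keyword, out_dir, n_pos=500, n_neg=100):
    """Positive samples come from the keyword search; negative samples are
    randomly searched images that act as noise interference during training."""
    os.makedirs(out_dir, exist_ok=True)
    dataset = []  # list of (file_path, delta) with delta = +1 (positive) / -1 (negative)
    for query, delta, n in [(keyword, +1, n_pos), ("random photo", -1, n_neg)]:
        for i, url in enumerate(fetch_urls(query, n)):
            path = os.path.join(out_dir, f"{delta}_{i}.jpg")
            with open(path, "wb") as f:
                f.write(requests.get(url).content)
            dataset.append((path, delta))
    return dataset
```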
Step S202, randomly grouping the image data sets into a plurality of training sets.
Specifically, if an image is directly taken out from the image dataset, the image has a certain probability of not matching with the keywords; when a plurality of images are taken out from the image data set, the probability that none of the plurality of images matches the keyword is extremely small, and as long as one of the plurality of images matches the keyword, the whole formed by the plurality of images can be considered to match the keyword, and the keyword can be regarded as a label of the whole.
Therefore, the server randomly groups the image data set to obtain a plurality of training sets. Assume that the probability that an image in the image data set does not match the keyword is ζ. Because the images are mutually independent, the probability p that the label of a training set is correct is:

p = 1 − ζ^K (1)

where K is the number of images in the training set and K is a positive integer. It is readily seen that as K increases, the probability that the training set label is correct rapidly approaches 1.
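For illustration only, the random grouping of step S202 and formula (1) can be sketched in Python as follows; the group size K and the mismatch probability ζ are example inputs, not values fixed by the embodiment.

```python
import random


def group_into_training_sets(image_paths, k):
    """Randomly split the image data set into groups of K images; each group
    as a whole carries the keyword as its label."""
    paths = list(image_paths)
    random.shuffle(paths)
    return [paths[i:i + k] for i in range(0, len(paths) - k + 1, k)]


def label_correct_probability(zeta, k):
    """Formula (1): probability that at least one image in a group of K images
    matches the keyword, given a per-image mismatch probability zeta."""
    return 1.0 - zeta ** k


print(label_correct_probability(zeta=0.3, k=4))  # -> 0.9919
```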
Step S203, inputting a plurality of groups of training sets into the fine granularity classification initial model to obtain attention weighting vectors of images in the plurality of groups of training sets.
The fine-grained classification initial model may be a fine-grained classification model that has not been trained. The attention weighting vector may be a vector representation of the processed output of each image, which has undergone a weighting process by the attention mechanism.
Specifically, the server inputs a plurality of groups of training sets into a convolution layer of a fine-granularity classification initial model, the convolution layer carries out convolution processing on each image in each group of training sets, and the vector in the convolution layer is weighted by combining an attention mechanism to obtain an attention weighted vector of each image.
The vectors in the convolution layer are used for fine-grained image classification. The attention mechanism aims to polarize these vectors: vectors related to the keyword are strengthened by the attention mechanism, while vectors unrelated to the keyword are weakened, so that the fine-grained classification initial model can learn better from the strengthened vectors and the classification accuracy is improved. An attention detector may be provided in the fine-grained classification initial model, and the attention mechanism is implemented by the attention detector.
Step S204, the attention weighted vectors are pooled to generate training examples corresponding to a plurality of groups of training sets respectively.
A training example is a fusion of the images in a training set, obtained by combining the attention weighting vectors of those images.
Specifically, a pooling layer can be set in the fine-grained image classification initial model, and the pooling layer carries out global average pooling on the attention weighting vectors, so that training examples of the training set are respectively generated. The training examples fuse the image features of the images in the training set for further fine-grained image classification.
In one embodiment, the formula for the global average pooling is:

h_n = (1 / (K·d²)) Σ_{k=1..K} Σ_{i=1..d} Σ_{j=1..d} x̂^k_{n,(i,j)} (2)

where h_n is the training example, d is the scale of the feature map in the model, k indexes the k-th picture in the training set, and x̂^k_{n,(i,j)} is the attention weighting vector of image region (i, j) of the k-th picture in the n-th training set.
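For illustration only, a minimal PyTorch sketch of the global average pooling of formula (2) is given below, assuming the attention weighting vectors of one training set are arranged as a tensor of shape (K, C, d, d).

```python
import torch


def pool_training_instance(weighted_maps):
    """Global average pooling of formula (2): average over the K images and the
    d x d image regions of one training set, giving one training example h_n."""
    # weighted_maps: (K, C, d, d) attention-weighted feature maps of one training set
    return weighted_maps.mean(dim=(0, 2, 3))


h_n = pool_training_instance(torch.randn(4, 2048, 7, 7))
print(h_n.shape)  # torch.Size([2048])
```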
In step S205, the obtained training examples are input into a classifier of the fine-grained classification initial model to calculate model loss.
Specifically, the server inputs the training examples into the classifier of the fine-grained classification initial model, and the classifier classifies according to the training examples and outputs classification results. Taking the keyword as the label, the server may then calculate the model loss based on the classification results and the label.
And S206, adjusting model parameters of the fine-granularity classification initial model according to the model loss to obtain a fine-granularity classification model.
Specifically, the server adjusts the model parameters of the fine-grained classification initial model with the goal of reducing the model loss. Training continues after each adjustment of the model parameters and stops when the model loss meets the training stop condition, yielding the fine-grained classification model. The training stop condition may be that the model loss is smaller than a preset loss threshold.
The adjusted model parameters include parameters in the convolutional layer, the attention detector, and the classifier. After training, the attention detector can effectively identify image areas in the image which are irrelevant to the keywords, inhibit or weaken attention weighted vectors of the image areas, and strengthen the attention weighted vectors of the image areas relevant to the keywords.
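For illustration only, the parameter-adjustment loop described above can be sketched as follows; the optimizer choice, the loss threshold and the assumed model_loss helper (wrapping steps S203 to S205) are illustrative assumptions.

```python
import torch

LOSS_THRESHOLD = 0.05  # assumed preset loss threshold for the training stop condition


def train(model, training_sets, labels, epochs=50, lr=1e-4):
    """Adjust the parameters of the convolution layers, the attention detector and
    the classifier until the average model loss falls below the preset threshold."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        total = 0.0
        for bag, y in zip(training_sets, labels):
            optimizer.zero_grad()
            loss = model.model_loss(bag, y)  # assumed helper: classifier loss + lambda * R
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / len(training_sets) < LOSS_THRESHOLD:  # training stop condition
            break
    return model
```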
It should be emphasized that, to further ensure the privacy and security of the model parameters described above, the trained model parameters may also be stored in a blockchain node.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The blockchain (Blockchain), essentially a de-centralized database, is a string of data blocks that are generated in association using cryptographic methods, each of which contains information from a batch of network transactions for verifying the validity (anti-counterfeit) of its information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
In the embodiment, the image data set is directly built through the search engine according to the keywords, and the image data set can be rapidly expanded through the Internet, so that the speed of building the image data set is improved; because the images are mutually independent, the image data sets are randomly grouped into a plurality of groups of training sets, so that the negative influence of the images which do not accord with the labels is reduced; inputting a plurality of groups of training sets into a fine-granularity classification initial model, and calculating attention weighting vectors of the input images by fusing an attention mechanism to strengthen image areas related to keywords in the images so that the model concentrates on the image areas related to classification; generating a training example according to the attention weighted vector, wherein the training example comprises the characteristics of each image in the corresponding training set; after the training examples are input into the classifier to obtain model loss, model parameters are adjusted according to the model loss, and a fine-granularity classification model capable of accurately classifying is obtained, so that fine-granularity image classification processing is rapidly and accurately realized.
Further, the step S203 may include: respectively inputting each image in a plurality of groups of training sets into a convolution layer of a fine-granularity classification initial model to obtain convolution feature vectors of each image region in each image; calculating a regularized attention score of the convolution feature vector by an attention detector; the regularized attention score is used for representing the association degree of the image area and the keywords; and correspondingly multiplying the regularized attention score with the convolution feature vector to obtain an attention weighting vector of each image.
The convolution feature vector may be a vector representation output after the convolution layer convolves the image region in each image.
Specifically, the server inputs each image in the plurality of training sets into the convolution layer of the fine-grained classification initial model, and after convolution processing the convolution layer outputs the convolution feature vector of each image region in each image. An image region may be a single pixel or a block of several pixels, for example a 2×2 or 3×3 pixel block.
For each training set, the server collects the convolution feature vectors and inputs them into the attention detector, and the attention detector calculates the regularized attention score of each convolution feature vector according to its weight and bias.
The regularized attention score may characterize a degree of association of the image region corresponding to the convolution feature vector with the keyword, and the higher the degree of association, the greater the regularized attention score may be. For each image, the server multiplies the convolution feature vector by the corresponding regularized attention score to obtain an attention weighting vector.
In one embodiment, the step of inputting each image in the plurality of sets of training sets into the convolution layer of the fine-grained classification initial model to obtain the convolution feature vector of each image region in each image includes: inputting a plurality of groups of training sets into a convolution layer of a fine-grained classification initial model; acquiring a convolution characteristic diagram of the output of a last convolution layer of the convolution layers; and setting vectors corresponding to the image areas in the convolution feature map as convolution feature vectors.
The convolution feature map may be a vector matrix, and each sub-matrix of the convolution feature map corresponds to each image region in the image.
Specifically, the convolution layer may be composed of a plurality of sub-layers that perform multi-layer convolution processing on the input training sets. The server acquires the convolution feature map output by the last of these convolution sub-layers; the sub-matrix at each position of the convolution feature map corresponds to an image region of the image, and the vectors corresponding to the image regions in the convolution feature map are taken as the convolution feature vectors.
In this embodiment, the training set is input into the convolution layer, and the convolution feature image output by the last convolution layer is obtained, where the vectors in the convolution feature image correspond to each image area in the image, and the convolution feature vectors can be accurately extracted according to the correspondence.
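For illustration only, the extraction of the convolution feature map and of the per-region convolution feature vectors can be sketched as follows, assuming a torchvision ResNet-50 backbone stands in for the convolution layers (the embodiment does not mandate any particular backbone).

```python
import torch
import torch.nn as nn
from torchvision import models

# assumption: a randomly initialised ResNet-50 stands in for the convolution layers
backbone = models.resnet50()
conv_layers = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool and fc


def region_feature_vectors(images):
    """images: (K, 3, H, W) -> convolution feature vectors, one per image region.
    Returns a tensor of shape (K, d*d, C)."""
    fmap = conv_layers(images)  # (K, C, d, d): output of the last convolution layer
    k, c, d, _ = fmap.shape
    return fmap.permute(0, 2, 3, 1).reshape(k, d * d, c)


vecs = region_feature_vectors(torch.randn(4, 3, 224, 224))
print(vecs.shape)  # torch.Size([4, 49, 2048])
```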
In one embodiment, let x^k_{n,(i,j)} denote the convolution feature vector corresponding to the (i, j) image region of the k-th picture in the n-th training set. The attention detector calculates an attention score a^k_{n,(i,j)} from the convolution feature vector:

a^k_{n,(i,j)} = f(w·x^k_{n,(i,j)} + b) (3)

f(x) = ln(1 + exp(x)) (4)

where w ∈ R^c and b ∈ R respectively denote the weight and the bias of the attention detector; they are the key factors by which the attention detector strengthens or weakens image regions and are obtained by adjusting the model parameters.

After the attention detector obtains the attention scores, a regularization operation can be carried out on them, compressing each score into the interval [0, 1] to obtain the regularized attention score:

â^k_{n,(i,j)} = a^k_{n,(i,j)} / (Σ_{(i,j)} a^k_{n,(i,j)} + ε) (5)

where ε is a constant, which may be an empirical value, used to make the distribution of the regularized attention scores more reasonable: if ε were absent and the attention scores a^k_{n,(i,j)} of an image were all very small, very small attention scores could still be mapped to comparatively large regularized scores â^k_{n,(i,j)}; if ε is set reasonably, a small a^k_{n,(i,j)} causes â^k_{n,(i,j)} to stay well below the uniform level 1/d², where d is the scale of the feature map in the model. After the regularized attention scores are obtained, each convolution feature vector is multiplied element by element with its corresponding regularized attention score, which gives the vector representation weighted by the regularized attention score, i.e. the attention weighting vector x̂^k_{n,(i,j)} = â^k_{n,(i,j)} ⊙ x^k_{n,(i,j)}, where ⊙ denotes element-wise multiplication.
In this embodiment, the images in the training set are input into the convolution layer to obtain the convolution feature vectors of each image region in the images, the attention detector is used to introduce an attention mechanism, the convolution feature vectors are calculated to obtain regularized attention scores, the regularized attention scores can be used as weights of the convolution feature vectors, attention weighted vectors are obtained after corresponding multiplication, and the attention weighted vectors are used to strengthen or inhibit the image regions, so that the fine-granularity classification initial model can perform targeted learning.
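For illustration only, a minimal PyTorch sketch of the attention detector is given below; the sum-plus-ε normalization follows the regularized attention score of formula (5) as reconstructed above and is an assumption, and the layer shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionDetector(nn.Module):
    """Scores each image region by its relevance to the keyword and re-weights
    the convolution feature vectors accordingly (formulas (3)-(5))."""

    def __init__(self, channels, eps=0.01):  # eps: the empirical constant of formula (5)
        super().__init__()
        self.w = nn.Linear(channels, 1)  # weight w and bias b of the attention detector
        self.eps = eps

    def forward(self, region_vecs):
        # region_vecs: (K, R, C) convolution feature vectors of R = d*d image regions
        scores = F.softplus(self.w(region_vecs))  # a = f(w.x + b), f(x) = ln(1 + exp(x))
        # assumed normalization: divide by the per-image sum of scores plus eps,
        # compressing each score into [0, 1] (regularized attention score)
        reg_scores = scores / (scores.sum(dim=1, keepdim=True) + self.eps)
        return region_vecs * reg_scores  # attention weighting vectors
```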
Further, the step S205 may include: inputting the obtained training examples into a classifier to calculate classifier loss; calculating a regularization factor according to the convolution feature vector; and performing linear operation on the classifier loss and the regularization factor to obtain model loss.
The classifier loss can be a loss calculated by the classifier; the model loss can be the total loss obtained by calculation of the fine-grained classification initial model; the regularization factor may be a factor that regularizes the classifier loss.
Specifically, the server inputs the training examples into a classifier of the fine-grained classification initial model, the classifier classifies according to the training examples, outputs classification results, and calculates classifier losses according to the classification results.
The attention mechanism in the application aims at enabling regularized attention scores of one or a plurality of image areas to have higher values in images matched with keywords in the training set; for images that do not match keywords or are not related to fine-grained image classification, the regularized attention score for each image region should be close and low. In order to achieve the above object in training, the present application separately sets a regularization factor in addition to classifier loss. The negative sample in the application is used as noise interference, and regularization of attention calculation can be realized.
Specifically, the regularization factor is calculated from the convolution feature vector. After the regularization factor is obtained by the server, the classifier loss and the regularization factor are added linearly, and the model loss of the model layer is obtained.
In this embodiment, the training examples are input into the classifier to calculate the classifier loss, and then the regularization factor is calculated according to the convolution feature vector to further strengthen or inhibit the image, and the model loss is obtained based on the linear operation of the classifier loss and the regularization factor, so that the fine-grained classification initial model can more reasonably adjust the model parameters according to the model loss.
Further, the step of inputting the obtained training examples into the classifier to calculate the classifier loss includes: inputting the obtained training examples into a classifier to obtain fine granularity categories of each image in the training examples; setting the keywords as instance labels; and calculating the classifier loss of the training example according to the example label and the fine granularity category of each image in the training example.
The fine-grained category may be a classification result output by the classifier.
Specifically, the server inputs the training examples into a classifier of the fine-granularity classification initial model, the classifier classifies according to the training examples, and outputs a plurality of fine-granularity categories, wherein the number of the fine-granularity categories is equal to the number of images in the training set.
The keywords can be used as instance labels, and the server calculates classifier losses as a whole by training instances according to the output fine granularity categories and the instance labels.
In one embodiment, the classifier loss is a cross-entropy loss, calculated as follows:

L_class = − Σ_n y_n · log(F_n) (6)

where F_n is the fine-grained category output of the classifier for the n-th training example, y_n is the instance label, and L_class is the classifier loss.
When the regularization factor is calculated from the convolution feature vectors, a second attention score s^k_{n,(i,j)} is defined. Unlike the attention score a^k_{n,(i,j)} involved in the regularized attention score computation, the second attention score is the raw response of the attention detector:

s^k_{n,(i,j)} = w·x^k_{n,(i,j)} + b (7)

where x^k_{n,(i,j)} may come from a positive sample or from a negative sample of the training set, and b is the bias of the attention detector. When x^k_{n,(i,j)} comes from a negative sample in the training set, the attention mechanism aims at achieving s^k_{n,(i,j)} ≤ 0 for every image region; when x^k_{n,(i,j)} comes from a positive sample in the training set, the attention mechanism aims at having at least one image region for which s^k_{n,(i,j)} > 0. Combining the two cases, the regularization factor is as follows:

R = Σ_n max(0, −δ_n · max_{k,(i,j)} s^k_{n,(i,j)}) (8)

where δ_n ∈ {1, −1} takes the value 1 when the image is a positive sample and −1 otherwise.
A linear operation on the classifier loss L_class and the regularization factor then gives the model loss:

L = L_class + λR (9)

where λ is a weight used to adjust the relative importance of the classifier loss and the regularization factor, and R is the regularization factor in equation (8).
The specific effect of the attention mechanism is as follows: for an image from the training set that is related to the fine-grained image classification task and to the keywords, the regularized attention score is pushed up in the image regions related to the keywords; for an image that is unrelated to fine-grained image classification or to the keywords, the regularized attention scores tend to be uniformly close to zero across the image regions, and the classifier does not pay much attention to these regions, i.e. the features of these regions are given little weight in learning and classification. Therefore, the attention mechanism in the application can filter out image regions in the training-set images that are irrelevant to the fine-grained image classification task or to the keywords, and can also detect the image regions that contribute to fine-grained image classification.
In this embodiment, the training examples are input into the classifier to obtain fine-grained categories, then the keyword is used as an example label, and the training examples are used as a whole to calculate the classifier loss, so that the classifier loss is ensured to take the information fused in the training examples into consideration.
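For illustration only, the model loss of formula (9) can be sketched as follows; the batch-averaged form of the regularization factor and the tensor shapes are assumptions consistent with the reconstruction of formulas (6) to (8) above.

```python
import torch
import torch.nn.functional as F


def model_loss(logits, labels, raw_scores, delta, lam=0.1):
    """logits: (N, num_classes) classifier output for the N training examples
    labels:  (N,) long tensor of instance labels derived from the keywords
    raw_scores: (N, K, R) responses w.x + b of the attention detector (formula (7))
    delta:   (N,) float tensor, +1 for groups built from positive samples, -1 otherwise
    lam:     weight lambda balancing classifier loss and regularization factor."""
    l_class = F.cross_entropy(logits, labels)  # classifier loss, formula (6)
    max_s = raw_scores.amax(dim=(1, 2))        # strongest region response of each group
    # aim: max_s > 0 for positive groups, max_s <= 0 for negative groups;
    # batch-mean version of the reconstructed regularization factor, formula (8)
    r = torch.relu(-delta * max_s).mean()
    return l_class + lam * r                   # model loss, formula (9)
```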
Further, after the step S206, the method may further include: acquiring an image to be classified; inputting the images to be classified into a fine-granularity classification model to obtain attention weighting vectors of the images to be classified; generating a test instance of the image to be classified based on the attention weighting vector; and inputting the test instance into a classifier of the fine-granularity classification model to obtain the fine-granularity category of the image to be classified.
Specifically, the server obtains a fine-grained classification model after training. When the method is applied, the image to be classified is acquired, and the image to be classified can be sent by the terminal. The server inputs the images to be classified into the convolution layers of the fine-granularity classification model, and the output of the last convolution layer of the convolution layers is input to the attention detector to obtain attention weighting vectors of all image areas in the images to be classified.
Unlike the training process, in which multiple images are input at a time, one image can be input at a time during test application, so that a pooling layer is not needed during application test, and a test instance of the image to be classified can be obtained according to the attention weighting vector. In the test example, the image area related to the fine-granularity image classification is enhanced, the image area unrelated to the fine-granularity image classification is restrained, the test example is input into a classifier, the classifier processes according to the test example, and the fine-granularity class of the image to be classified is output.
In the embodiment, when a test is applied, an image to be classified is input into the fine-granularity classification model to obtain a test example, the test example strengthens an image area related to fine-granularity image classification, suppresses an image area unrelated to a fine-granularity image classification task, and enables the classifier to accurately output fine-granularity categories.
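For illustration only, the test-time path can be sketched as follows, reusing the assumed components of the earlier sketches; averaging over image regions to form the test instance is an assumption, since the embodiment only states that no bag-level pooling is needed.

```python
import torch


@torch.no_grad()
def classify_image(image, conv_features, attention, classifier):
    """image: (3, H, W) tensor -> predicted fine-grained category index.
    conv_features, attention and classifier are the trained components from the
    earlier sketches (assumed interfaces)."""
    vecs = conv_features(image.unsqueeze(0))  # (1, R, C) region convolution feature vectors
    weighted = attention(vecs)                # (1, R, C) attention weighting vectors
    test_instance = weighted.mean(dim=1)      # (1, C); region average as the test example
    logits = classifier(test_instance)
    return logits.argmax(dim=1).item()
```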
The processing of the fine-grained classification model is described below through a specific application scenario, taking the identification of swan types as an example: swan is the large class, black swan and white swan are its subclasses, and a model that distinguishes black swans from white swans is a fine-grained classification model.
In the training stage, a large number of images are acquired from the Internet according to the keyword "black swan", giving an image data set. The image data set is randomly grouped into a plurality of training sets, and each training set carries "black swan" as its label. The images in a training set are input into the convolution layer of the fine-grained classification initial model to obtain convolution feature vectors, the convolution feature vectors are input into the attention detector to obtain attention weighting vectors, and the attention weighting vectors are pooled to obtain a training example. The training example integrates the features of the images in the training set: image regions related to black swans are strengthened by the attention detector, while images that do not match "black swan" (such as images of white swans) are suppressed, i.e. the attention detector filters the information in the images so that the model can concentrate on learning. The classifier classifies according to the training example and the model loss is calculated; the fine-grained classification initial model adjusts its model parameters according to the model loss to strengthen the attention detector and the classifier, and the fine-grained classification model is obtained after training is completed.
During training, the fine-grained classification initial model learns the characteristics of the black swan subclass. When the fine-grained image classification task involves more subclasses, images of the other subclasses can be collected in the same way for supplementary training; for example, images of white swans may be acquired for additional training.
When the fine-granularity classification model is used, an image to be classified is input into the model, the fine-granularity classification model calculates the attention weighting vector of the image to be classified and generates a test instance, the test instance weights the image to be classified, and the region of the image to be classified, which is useful for fine-granularity classification, is reinforced. After the test examples are input into the classifier, the classifier can accurately identify whether the image is a black swan or a white swan according to the test examples, and fine-grained image classification is achieved.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by computer readable instructions stored in a computer readable storage medium that, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a random access Memory (Random Access Memory, RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an apparatus for processing a fine-grained classification model based on image detection, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 3, the fine-granularity classification model processing apparatus 300 based on image detection according to the present embodiment includes: a data set construction module 301, a data set grouping module 302, a data set input module 303, an instance generation module 304, a loss calculation module 305, and a parameter adjustment module 306, wherein:
The data set construction module 301 is configured to construct an image data set by a search engine based on the received keywords.
A data set grouping module 302, configured to randomly group the image data sets into a plurality of training sets.
The data set input module 303 is configured to input a plurality of training sets into the fine-granularity classification initial model, so as to obtain attention weighting vectors of images in the plurality of training sets.
The instance generating module 304 is configured to pool the attention weighted vectors to generate training instances corresponding to the training sets respectively.
The loss calculation module 305 is configured to input the obtained training examples into a classifier of the fine-grained classification initial model to calculate model loss.
The parameter adjustment module 306 is configured to adjust model parameters of the fine-grained classification initial model according to model loss, and obtain a fine-grained classification model.
In the embodiment, the image data set is directly built through the search engine according to the keywords, and the image data set can be rapidly expanded through the Internet, so that the speed of building the image data set is improved; because the images are mutually independent, the image data sets are randomly grouped into a plurality of groups of training sets, so that the negative influence of the images which do not accord with the labels is reduced; inputting a plurality of groups of training sets into a fine-granularity classification initial model, and calculating attention weighting vectors of the input images by fusing an attention mechanism to strengthen image areas related to keywords in the images so that the model concentrates on the image areas related to classification; generating a training example according to the attention weighted vector, wherein the training example comprises the characteristics of each image in the corresponding training set; after the training examples are input into the classifier to obtain model loss, model parameters are adjusted according to the model loss, and a fine-granularity classification model capable of accurately classifying is obtained, so that fine-granularity image classification processing is rapidly and accurately realized.
In some optional implementations of this embodiment, the data set construction module 301 includes: receiving submodule, searching submodule and constructing submodule, wherein:
And the receiving sub-module is used for receiving the keywords sent by the terminal.
And the searching sub-module is used for sending the keywords to the search engine so as to instruct the search engine to search images from the Internet according to the keywords.
A construction sub-module for constructing an image dataset based on the searched image.
In the embodiment, after the keywords are received, the search engine searches the Internet, so that a large number of images can be obtained quickly, and the construction speed of the image data set is improved greatly.
In some optional implementations of this embodiment, the data set input module 303 includes: a dataset input sub-module, a score computation sub-module, and a phase multiplication sub-module, wherein:
and the data set input sub-module is used for respectively inputting each image in the plurality of groups of training sets into the convolution layer of the fine-granularity classification initial model to obtain the convolution feature vector of each image area in each image.
The score calculation sub-module is used for calculating regularized attention scores of the convolution feature vectors through the attention detector; wherein the regularized attention score is used to characterize the degree of association of the image region with the keywords.
And the phase multiplication submodule is used for correspondingly multiplying the regularized attention score with the convolution characteristic vector to obtain an attention weighting vector of each image.
In this embodiment, the images in the training set are input into the convolution layer to obtain the convolution feature vectors of each image region in the images, the attention detector is used to introduce an attention mechanism, the convolution feature vectors are calculated to obtain regularized attention scores, the regularized attention scores can be used as weights of the convolution feature vectors, attention weighted vectors are obtained after corresponding multiplication, and the attention weighted vectors are used to strengthen or inhibit the image regions, so that the fine-granularity classification initial model can perform targeted learning.
In some optional implementations of this embodiment, the data set input submodule includes:
and the training set input unit is used for inputting a plurality of groups of training sets into the convolution layer of the fine-granularity classification initial model.
And the output acquisition unit is used for acquiring a convolution characteristic map of the final convolution layer output of the convolution layer.
And the vector setting unit is used for setting the vector corresponding to each image area in the convolution characteristic diagram as a convolution characteristic vector.
In this embodiment, the training set is input into the convolution layer, and the convolution feature image output by the last convolution layer is obtained, where the vectors in the convolution feature image correspond to each image area in the image, and the convolution feature vectors can be accurately extracted according to the correspondence.
In some optional implementations of this embodiment, the loss calculation module includes: an loss calculation sub-module, a factor calculation sub-module, and a linear operation sub-module, wherein:
and the loss calculation sub-module is used for inputting the obtained training examples into the classifier to calculate the classifier loss.
And the factor calculation sub-module is used for calculating the regularization factor according to the convolution feature vector.
And the linear operation sub-module is used for carrying out linear operation on the classifier loss and the regularization factor to obtain model loss.
In this embodiment, the training examples are input into the classifier to calculate the classifier loss, and then the regularization factor is calculated according to the convolution feature vector to further strengthen or inhibit the image, and the model loss is obtained based on the linear operation of the classifier loss and the regularization factor, so that the fine-grained classification initial model can more reasonably adjust the model parameters according to the model loss.
In some optional implementations of this embodiment, the loss calculation submodule includes: an example input unit, a tag setting unit, and a loss calculation unit, wherein:
The example input unit is used for inputting the obtained training examples into the classifier to obtain the fine granularity category of each image in the training examples.
The tag setting unit is used for setting the keyword as the instance tag.
The loss calculation unit is used for calculating the classifier loss of the training example according to the example label and the fine granularity category of each image in the training example.
In this embodiment, the training examples are input into the classifier to obtain the fine-grained categories, the keyword is then used as the instance tag, and each training example is treated as a whole when calculating the classifier loss, which ensures that the classifier loss takes the information fused within the training example into account.
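A sketch of this step under the assumption that the classifier emits fine-grained logits for each image in the training example and that "treating the example as a whole" is realized by averaging those logits before a cross-entropy against the class index of the keyword (the averaging is an assumption, not the patent's stated mechanism):

import torch
import torch.nn.functional as F

def example_classifier_loss(classifier: torch.nn.Module,
                            example_features: torch.Tensor,
                            keyword_class: int) -> torch.Tensor:
    # example_features: (num_images, channels) - one pooled feature vector per image in the example
    logits = classifier(example_features)              # (num_images, num_classes) fine-grained logits
    example_logits = logits.mean(dim=0, keepdim=True)  # treat the training example as a whole
    label = torch.tensor([keyword_class])              # the keyword serves as the instance tag
    return F.cross_entropy(example_logits, label)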
In some optional implementations of this embodiment, the fine-granularity classification model processing apparatus 300 based on image detection further includes: the device comprises an acquisition module to be classified, an input module to be classified, a test generation module and a test input module, wherein:
The to-be-classified acquisition module is used for acquiring the image to be classified.
The to-be-classified input module is used for inputting the image to be classified into the fine-granularity classification model to obtain the attention weighting vector of the image to be classified.
The test generation module is used for generating a test instance of the image to be classified based on the attention weighting vector.
The test input module is used for inputting the test instance into the classifier of the fine-granularity classification model to obtain the fine-granularity category of the image to be classified.
In this embodiment, at application time the image to be classified is input into the fine-granularity classification model to obtain a test instance. The test instance strengthens the image regions related to fine-granularity image classification and suppresses the regions unrelated to that task, enabling the classifier to output the fine-granularity category accurately.
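A sketch of the application-time flow, assuming the trained fine-granularity classification model exposes the same region extraction, attention scoring and pooling used during training; the attribute names conv_layers and attention_detector and the mean pooling are illustrative assumptions:

import torch

@torch.no_grad()
def classify(model: torch.nn.Module, classifier: torch.nn.Module, image: torch.Tensor) -> int:
    # image: preprocessed image to be classified, shape (1, 3, H, W)
    feature_map = model.conv_layers(image)                        # output of the last convolution layer
    b, c, h, w = feature_map.shape
    regions = feature_map.reshape(b, c, h * w).permute(0, 2, 1)   # convolution feature vectors per region
    scores = model.attention_detector(regions)                    # regularized attention scores
    weighted = regions * scores.unsqueeze(-1)                     # attention weighting vector
    test_instance = weighted.mean(dim=1)                          # pooled test instance, (1, channels)
    logits = classifier(test_instance)                            # fine-grained class scores
    return int(logits.argmax(dim=-1).item())                      # index of the fine-granularity category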
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components need to be implemented, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device. The computer device can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, a voice control device, or the like.
The memory 41 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is typically used to store the operating system and the various types of application software installed on the computer device 4, such as the computer readable instructions of the fine-grained classification model processing method based on image detection. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the fine-grained classification model processing method based on image detection.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The computer apparatus provided in the present embodiment may perform the steps of the fine-grained classification model processing method based on image detection described above. The steps of the fine-grained classification model processing method based on image detection herein may be the steps of the fine-grained classification model processing method based on image detection of the above-described respective embodiments.
In this embodiment, the image data set is constructed directly through a search engine according to the keyword, and the data set can be expanded rapidly through the Internet, which improves the speed of constructing the image data set. Because the images are mutually independent, the image data set is randomly grouped into a plurality of training sets, which reduces the negative influence of images that do not match the label. The training sets are input into the fine-granularity classification initial model, which fuses an attention mechanism to calculate the attention weighting vectors of the input images and thereby strengthens the image regions related to the keyword, so that the model concentrates on the regions relevant to classification. Training examples are then generated from the attention weighting vectors, each training example containing the features of every image in the corresponding training set. After the training examples are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss to obtain a fine-granularity classification model capable of accurate classification, so that fine-granularity image classification is realized rapidly and accurately.
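Tying the steps together, the following is a minimal sketch of one training step over a single randomly grouped training set (every name is illustrative; dataset construction through the search engine is assumed to have already produced the tensor of downloaded images, and the regularization factor shown is only one hypothetical choice that rewards peaked attention scores, in the spirit of the behaviour described in claim 1):

import torch
import torch.nn.functional as F

def training_step(model, classifier, optimizer, group_images, keyword_class, lambda_reg=0.1):
    # group_images: (num_images, 3, H, W) - the images of one randomly grouped training set
    feature_map = model.conv_layers(group_images)                 # last convolution layer output
    b, c, h, w = feature_map.shape
    regions = feature_map.reshape(b, c, h * w).permute(0, 2, 1)   # convolution feature vectors
    scores = model.attention_detector(regions)                    # regularized attention scores
    weighted = regions * scores.unsqueeze(-1)                     # attention weighting vectors
    per_image = weighted.mean(dim=1)                              # pool regions -> one vector per image
    training_example = per_image.mean(dim=0, keepdim=True)        # pool images -> the training example
    cls_loss = F.cross_entropy(classifier(training_example),
                               torch.tensor([keyword_class]))     # keyword used as the instance tag
    reg_factor = -scores.max(dim=-1).values.log().mean()          # hypothetical regularization factor
    loss = cls_loss + lambda_reg * reg_factor                     # model loss (linear operation)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                              # adjust the model parameters
    return loss.item()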
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the fine-grained classification model processing method based on image detection as described above.
In this embodiment, the image data set is constructed directly through a search engine according to the keyword, and the data set can be expanded rapidly through the Internet, which improves the speed of constructing the image data set. Because the images are mutually independent, the image data set is randomly grouped into a plurality of training sets, which reduces the negative influence of images that do not match the label. The training sets are input into the fine-granularity classification initial model, which fuses an attention mechanism to calculate the attention weighting vectors of the input images and thereby strengthens the image regions related to the keyword, so that the model concentrates on the regions relevant to classification. Training examples are then generated from the attention weighting vectors, each training example containing the features of every image in the corresponding training set. After the training examples are input into the classifier to obtain the model loss, the model parameters are adjusted according to the model loss to obtain a fine-granularity classification model capable of accurate classification, so that fine-granularity image classification is realized rapidly and accurately.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware, although in many cases the former is preferable. Based on this understanding, the technical solution of the present application, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) that includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application; the preferred embodiments shown in the drawings do not limit the scope of the claims. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application is thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the contents of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the application.

Claims (9)

1. A fine-granularity classification model processing method based on image detection, characterized by comprising the following steps:
constructing an image dataset by a search engine based on the received keywords;
Randomly grouping the image data sets into a plurality of sets of training sets;
inputting the training sets into a fine-granularity classification initial model to obtain attention weighting vectors of images in the training sets;
pooling the attention weighted vectors to respectively generate training examples corresponding to the plurality of groups of training sets;
Inputting the obtained training examples into a classifier of the fine-grained classification initial model to calculate model loss;
adjusting model parameters of the fine-granularity classification initial model according to the model loss to obtain a fine-granularity classification model;
the step of inputting the resulting training examples into a classifier of the fine-grained classification initial model to calculate model loss includes:
Inputting the obtained training examples into a classifier to calculate classifier loss;
calculating a regularization factor according to a convolution feature vector, wherein the convolution feature vector is obtained from the last convolution layer after each image in the training sets is input into the fine-granularity classification initial model; the regularization factor causes the regularized attention score of at least one image area to take a higher value in images that match the keywords, and causes the regularized attention scores of all the image areas to be close to one another and relatively low in images that do not match the keywords or are irrelevant to fine-granularity image classification; the convolution feature vectors and the regularized attention scores are used for calculating the attention weighting vectors;
and performing linear operation on the classifier loss and the regularization factor to obtain model loss.
2. The image detection-based fine-grained classification model processing method according to claim 1, wherein the step of constructing an image dataset by a search engine based on the received keywords comprises:
Receiving keywords sent by a terminal;
sending the keywords to a search engine to instruct the search engine to search images from the Internet according to the keywords;
An image dataset is constructed based on the searched image.
3. The method for processing the fine-granularity classification model based on image detection according to claim 1, wherein the step of inputting the plurality of sets of training sets into the fine-granularity classification initial model to obtain the attention weighting vector of each image in the plurality of sets of training sets comprises:
Respectively inputting each image in the plurality of groups of training sets into a convolution layer of a fine-granularity classification initial model to obtain convolution feature vectors of each image region in each image;
Calculating a regularized attention score of the convolved feature vector by an attention detector; wherein the regularized attention score is used for representing the association degree of the image area and the keywords;
and correspondingly multiplying the regularized attention score by the convolution feature vector to obtain attention weighted vectors of the images.
4. The method for processing the fine-granularity classification model based on image detection according to claim 3, wherein the step of inputting each image in the plurality of sets of training sets into a convolution layer of the fine-granularity classification initial model to obtain a convolution feature vector of each image region in each image comprises:
inputting the training sets into a convolution layer of a fine-granularity classification initial model;
acquiring a convolution characteristic diagram of the output of a last convolution layer of the convolution layers;
and setting vectors corresponding to the image areas in the convolution feature map as convolution feature vectors.
5. The method of claim 1, wherein the step of inputting the resulting training examples into a classifier to calculate classifier losses comprises:
inputting the obtained training examples into a classifier to obtain fine granularity categories of each image in the training examples;
setting the keywords as instance tags;
and calculating the classifier loss of the training example according to the example label and the fine granularity category of each image in the training example.
6. The image detection-based fine-grained classification model processing method according to any one of claims 1-5, further comprising, after the step of adjusting the model parameters of the fine-grained classification initial model according to the model loss to obtain the fine-grained classification model:
Acquiring an image to be classified;
Inputting the image to be classified into the fine granularity classification model to obtain an attention weighting vector of the image to be classified;
generating a test instance of the image to be classified based on the attention weighting vector;
and inputting the test case into a classifier of the fine-granularity classification model to obtain the fine-granularity category of the image to be classified.
7. A fine-grained classification model processing device based on image detection, characterized by comprising:
The data set construction module is used for constructing an image data set through a search engine based on the received keywords;
the data set grouping module is used for randomly grouping the image data sets into a plurality of groups of training sets;
The data set input module is used for inputting the training sets into a fine-granularity classification initial model to obtain attention weighting vectors of the images in the training sets;
The example generation module is used for pooling the attention weighted vectors and respectively generating training examples corresponding to the training sets;
The loss calculation module is used for inputting the obtained training examples into the classifier of the fine-granularity classification initial model so as to calculate model loss;
The parameter adjustment module is used for adjusting the model parameters of the fine-granularity classification initial model according to the model loss to obtain a fine-granularity classification model;
The inputting the resulting training examples into a classifier of the fine-grained classification initial model to calculate model loss includes:
Inputting the obtained training examples into a classifier to calculate classifier loss;
calculating a regularization factor according to a convolution feature vector, wherein the convolution feature vector is obtained from the last convolution layer after each image in the training sets is input into the fine-granularity classification initial model; the regularization factor causes the regularized attention score of at least one image area to take a higher value in images that match the keywords, and causes the regularized attention scores of all the image areas to be close to one another and relatively low in images that do not match the keywords or are irrelevant to fine-granularity image classification; the convolution feature vectors and the regularized attention scores are used for calculating the attention weighting vectors;
and performing linear operation on the classifier loss and the regularization factor to obtain model loss.
8. A computer device comprising a memory having stored therein computer readable instructions which when executed implement the steps of the fine-grained classification model processing method based on image detection of any of claims 1-6.
9. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the fine-grained classification model processing method based on image detection of any of claims 1 to 6.
CN202010930234.1A 2020-09-07 2020-09-07 Fine granularity classification model processing method based on image detection and related equipment thereof Active CN112101437B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010930234.1A CN112101437B (en) 2020-09-07 2020-09-07 Fine granularity classification model processing method based on image detection and related equipment thereof
PCT/CN2020/124434 WO2021143267A1 (en) 2020-09-07 2020-10-28 Image detection-based fine-grained classification model processing method, and related devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010930234.1A CN112101437B (en) 2020-09-07 2020-09-07 Fine granularity classification model processing method based on image detection and related equipment thereof

Publications (2)

Publication Number Publication Date
CN112101437A CN112101437A (en) 2020-12-18
CN112101437B true CN112101437B (en) 2024-05-31

Family

ID=73750691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010930234.1A Active CN112101437B (en) 2020-09-07 2020-09-07 Fine granularity classification model processing method based on image detection and related equipment thereof

Country Status (2)

Country Link
CN (1) CN112101437B (en)
WO (1) WO2021143267A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094171A (en) * 2021-03-31 2021-07-09 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN113723256A (en) * 2021-08-24 2021-11-30 北京工业大学 Pollen particle identification method and device
CN114529574A (en) * 2022-02-23 2022-05-24 平安科技(深圳)有限公司 Image matting method and device based on image segmentation, computer equipment and medium
CN115082432B (en) * 2022-07-21 2022-11-01 北京中拓新源科技有限公司 Small target bolt defect detection method and device based on fine-grained image classification
CN115457308B (en) * 2022-08-18 2024-03-12 苏州浪潮智能科技有限公司 Fine granularity image recognition method and device and computer equipment
CN115953622B (en) * 2022-12-07 2024-01-30 广东省新黄埔中医药联合创新研究院 Image classification method combining attention mutual exclusion rules
CN116109629B (en) * 2023-04-10 2023-07-25 厦门微图软件科技有限公司 Defect classification method based on fine granularity recognition and attention mechanism
CN116310425B (en) * 2023-05-24 2023-09-26 山东大学 Fine-grained image retrieval method, system, equipment and storage medium
CN117115565A (en) * 2023-10-19 2023-11-24 南方科技大学 Autonomous perception-based image classification method and device and intelligent terminal
CN117372791B (en) * 2023-12-08 2024-03-22 齐鲁空天信息研究院 Fine grain directional damage area detection method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730553A (en) * 2017-11-02 2018-02-23 哈尔滨工业大学 A kind of Weakly supervised object detecting method based on pseudo- true value search method
CN108805259A (en) * 2018-05-23 2018-11-13 北京达佳互联信息技术有限公司 neural network model training method, device, storage medium and terminal device
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN110647912A (en) * 2019-08-15 2020-01-03 深圳久凌软件技术有限公司 Fine-grained image recognition method and device, computer equipment and storage medium
CN111126459A (en) * 2019-12-06 2020-05-08 深圳久凌软件技术有限公司 Method and device for identifying fine granularity of vehicle

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10074041B2 (en) * 2015-04-17 2018-09-11 Nec Corporation Fine-grained image classification by exploring bipartite-graph labels
CN107704877B (en) * 2017-10-09 2020-05-29 哈尔滨工业大学深圳研究生院 Image privacy perception method based on deep learning
CN107958272B (en) * 2017-12-12 2020-11-24 北京旷视科技有限公司 Picture data set updating method, device and system and computer storage medium
CN111079862B (en) * 2019-12-31 2023-05-16 西安电子科技大学 Deep learning-based thyroid papillary carcinoma pathological image classification method
CN111178458B (en) * 2020-04-10 2020-08-14 支付宝(杭州)信息技术有限公司 Training of classification model, object classification method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107730553A (en) * 2017-11-02 2018-02-23 哈尔滨工业大学 A kind of Weakly supervised object detecting method based on pseudo- true value search method
CN108805259A (en) * 2018-05-23 2018-11-13 北京达佳互联信息技术有限公司 neural network model training method, device, storage medium and terminal device
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
CN110647912A (en) * 2019-08-15 2020-01-03 深圳久凌软件技术有限公司 Fine-grained image recognition method and device, computer equipment and storage medium
CN111126459A (en) * 2019-12-06 2020-05-08 深圳久凌软件技术有限公司 Method and device for identifying fine granularity of vehicle

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Fine-grained image classification based on Xception; Zhang Qian et al.; Journal of Chongqing University; 2018-05-15; Vol. 41, No. 05; pp. 85-91 *
Fine-grained image classification based on multi-channel visual attention; Wang Peisen et al.; Journal of Data Acquisition and Processing; Vol. 34, No. 1; pp. 157-166 *
Research on a two-stage deep-learning-based disease diagnosis method for medical images; Luo Xiongwen; China Master's Theses Full-text Database, Medicine and Health Sciences; No. 2; pp. 16-19 *

Also Published As

Publication number Publication date
WO2021143267A1 (en) 2021-07-22
CN112101437A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112101437B (en) Fine granularity classification model processing method based on image detection and related equipment thereof
WO2021155713A1 (en) Weight grafting model fusion-based facial recognition method, and related device
CN108875487B (en) Training of pedestrian re-recognition network and pedestrian re-recognition based on training
CN112395390B (en) Training corpus generation method of intention recognition model and related equipment thereof
CN112528029A (en) Text classification model processing method and device, computer equipment and storage medium
CN112330331A (en) Identity verification method, device and equipment based on face recognition and storage medium
CN112016502B (en) Safety belt detection method, safety belt detection device, computer equipment and storage medium
CN113283222B (en) Automatic report generation method and device, computer equipment and storage medium
CN109447943B (en) Target detection method, system and terminal equipment
CN112528040B (en) Detection method for guiding drive corpus based on knowledge graph and related equipment thereof
CN113220828A (en) Intention recognition model processing method and device, computer equipment and storage medium
CN112036501A (en) Image similarity detection method based on convolutional neural network and related equipment thereof
CN110837596B (en) Intelligent recommendation method and device, computer equipment and storage medium
CN113989618A (en) Recyclable article classification and identification method
CN110263196B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN112966150A (en) Video content extraction method and device, computer equipment and storage medium
CN112417886A (en) Intention entity information extraction method and device, computer equipment and storage medium
CN114359582B (en) Small sample feature extraction method based on neural network and related equipment
CN113792342B (en) Desensitization data reduction method, device, computer equipment and storage medium
CN113139490B (en) Image feature matching method and device, computer equipment and storage medium
CN115471893B (en) Face recognition model training, face recognition method and device
CN113936286B (en) Image text recognition method, device, computer equipment and storage medium
CN117132950A (en) Vehicle tracking method, system, equipment and storage medium
CN113361629A (en) Training sample generation method and device, computer equipment and storage medium
CN113988223A (en) Certificate image recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant