CN113780284B

CN113780284B - Logo detection method based on target detection and metric learning

Info

Publication number: CN113780284B
Application number: CN202111090918.6A
Authority: CN
Inventors: 吕晨; 吴志强
Original assignee: Focus Technology Co Ltd
Current assignee: Focus Technology Co Ltd
Priority date: 2021-09-17
Filing date: 2021-09-17
Publication date: 2024-04-19
Anticipated expiration: 2041-09-17
Also published as: CN113780284A

Abstract

The invention discloses a logo detection method based on target detection and metric learning, comprising the following steps: S1, constructing and training a logo detection model; S2, constructing and training a logo feature extraction model; and S3, detecting whether a candidate logo target exists in the picture to be detected and determining whether the candidate logo target belongs to a logo category in a logo retrieval picture library. The method efficiently detects logo positions in commodity poster data, determines candidate logo regions, can identify newly added brands, and achieves redundant detection. The model does not need to be retrained, and the accuracy of brand logo feature extraction is improved more effectively. System complexity is greatly reduced: logo detection improves the recall rate, logo feature extraction and retrieval improve accuracy, and the recognition effect is better than that of a single target detection method.

Description

Logo detection method based on target detection and metric learning

Technical Field

The invention relates to the field of computer vision, in particular to a logo detection method based on target detection and metric learning.

Background

With the growth of online shopping, infringement in commodity pictures on web pages has become increasingly serious. Manually auditing massive numbers of commodity pictures consumes a great deal of manpower and material resources, so automatic logo infringement detection has become very important.

In the prior art, when the number of infringing logo categories is small, target detection alone can meet basic requirements. When the set of infringing logo categories keeps expanding, however, retraining the model each time becomes cumbersome and time-consuming, so the extensibility of the logo detection model becomes important.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a logo detection method that is extensible, efficient and accurate. A detection technique identifies whether a logo exists in the picture to be detected, a feature extraction model based on metric learning extracts the logo picture features, a retrieval technique recalls similar logo pictures, and the category of the logo is judged by voting.

In order to solve the technical problems, the invention provides a logo detection method based on target detection and metric learning, which is characterized in that a candidate logo target in a picture is detected by using a target detection technology, the candidate logo target is subjected to feature extraction by using a metric learning technology, and finally the candidate logo is judged by logo retrieval to determine the category of the candidate logo, and the method specifically comprises the following steps:

Step S1, constructing and training a logo detection model, which specifically comprises the following steps:

S1-1, constructing a logo detection data set, wherein the commodity pictures in the logo detection data set are taken from a logo database, and the logo database is LogoDet-3K;

S1-2, constructing and training a logo detection model, wherein the logo detection model is a two-class model developed based on the yolov target detection algorithm and is used for detecting whether a picture contains a logo;

Step S2, constructing and training a logo feature extraction model, which specifically comprises the following steps:

S2-3, constructing a logo classification data set;

S2-4, constructing and training a logo feature extraction model, wherein the logo feature extraction model is constructed based on metric learning and trained on the logo classification data set, and after training is completed the features of the last convolution layer are taken out as the picture features;

S2-5, constructing a logo retrieval picture library for candidate logo retrieval judgment;

S2-6, extracting feature vectors of the logo pictures in the logo retrieval picture library, wherein the logo pictures are input into the logo feature extraction model and the corresponding feature vectors are output and stored for retrieval;

Step S3, detecting whether a candidate logo target exists in the picture to be detected; if so, further extracting the logo picture features, comparing them with the logo retrieval picture library, and determining whether the candidate logo target belongs to a logo category in the logo retrieval picture library;

S3-7, detecting the picture to be detected with the trained logo detection model to obtain the positions of candidate logos, and cropping the candidate logos from the original picture;

Step S3-8, scaling each cropped candidate logo to a resolution of 256 x 256 and inputting it into the logo feature extraction model to obtain a feature vector, calculating the cosine similarity between the feature vector of the candidate logo and the feature vectors stored for the logo retrieval picture library in step S2-6, and returning the nearest 10 samples according to cosine distance, wherein the cosine distance is defined as follows (a larger value means a closer match):

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

where A and B are the two feature vectors being compared; the category of the logo picture is then determined through sample voting.
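
The retrieval in step S3-8 can be illustrated with a minimal sketch in plain Python. The function names `cosine_similarity` and `top_k`, and the representation of the gallery as a list of (label, vector) pairs, are assumptions made for illustration and are not taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def top_k(query, gallery, k=10):
    """Return the k gallery entries most similar to the query vector.

    gallery: list of (label, vector) pairs.
    Result:  list of (similarity, label) pairs, most similar first.
    """
    scored = [(cosine_similarity(query, vec), label) for label, vec in gallery]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan like this is enough for a small gallery; a larger library would likely need a vector index, which the text does not discuss.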

Step S1-2 further comprises: when the commodity pictures in the logo detection data set are fed into the logo detection model for training, all brand categories are merged into a single category labeled "logo", so the specific brand of a logo is not judged at this stage. The category confidence parameter of the logo detection model is set to 0.2. This yields a large number of redundant logo detection targets, which ensures a high recall rate for the logo detection part and minimizes missed logos.
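
The two ideas in this step, merging all brands into one class and keeping low-confidence detections, can be sketched as follows. The dictionary keys `box`, `brand`, `label` and `score` are hypothetical, and treating 0.2 as an inclusive lower bound is an assumption; the patent does not describe the data format.

```python
LOGO_LABEL = "logo"      # the single class used to train the detector
CONF_THRESHOLD = 0.2     # low threshold from the text, chosen to favour recall


def collapse_to_single_class(annotations):
    """Replace every brand label with the single 'logo' label.

    annotations: list of dicts with hypothetical keys 'box' and 'brand'.
    """
    return [{"box": ann["box"], "label": LOGO_LABEL} for ann in annotations]


def keep_candidates(detections, threshold=CONF_THRESHOLD):
    """Keep detections whose confidence is at least the threshold."""
    return [det for det in detections if det["score"] >= threshold]
```

Precision lost by the low threshold is meant to be recovered later by the retrieval and voting stage.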

In the step S2-3, the logo classification data set is constructed by selecting 500 brands, cropping 200 logo pictures for each brand, and resizing the logo pictures to 256 x 256.

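In the step S2-5, the logo feature extraction model is constructed based on the EfficientNet algorithm, the features of the last convolution layer are converted into a 1792-dimensional feature by average pooling, the training loss is based on the ArcFace algorithm, and the mathematical expression is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$

where N is the number of samples in a batch, n is the number of categories, y_i is the category of the i-th sample, θ_j is the angle between the feature vector of the i-th sample and the weight vector of category j, s is the weight (scale), and m is the margin.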

In the step S3-8, the sample voting rule is as follows: the 10 nearest samples are returned according to cosine distance; if at least 7 of them belong to the same category and the maximum similarity value is greater than 0.6, the category of the logo to be detected is determined to be the voted category; otherwise, the logo to be detected has no specific category and is discarded directly.
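
A minimal sketch of this voting rule is given below. The function name `vote` and the (similarity, label) pair format are assumptions. The text does not say whether the maximum similarity is taken over all 10 returned samples or only over those of the winning category; the sketch assumes all returned samples.

```python
from collections import Counter


def vote(neighbors, min_votes=7, min_similarity=0.6):
    """Apply the voting rule to the retrieved neighbours.

    neighbors: list of (similarity, label) pairs for the nearest samples.
    Returns the winning label, or None when the candidate is discarded.
    """
    if not neighbors:
        return None
    label, count = Counter(lbl for _, lbl in neighbors).most_common(1)[0]
    # assumption: the maximum is taken over all returned samples
    best_similarity = max(sim for sim, _ in neighbors)
    if count >= min_votes and best_similarity > min_similarity:
        return label
    return None
```

Returning `None` corresponds to discarding a candidate that has no specific category.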

In the step S2-5, the hyper-parameters of the ArcFace algorithm are configured as follows: the weight (scale) s is 30, the margin m is 0.5, and the initial learning rate of model training is 1e-4.

The beneficial effects achieved by the invention are as follows:

1. The method efficiently detects logo positions in commodity poster data and determines candidate logo regions. Because logos of many brands are trained together without distinguishing specific brands, newly added brands can also be detected, which effectively improves the model's recall rate and achieves redundant detection.

2. The feature extraction model built on metric learning facilitates extracting feature vectors for newly added brand logos without retraining the model, and the metric learning method improves the accuracy of brand logo feature extraction more effectively.

3. Compared with the traditional approach of detecting and identifying logos directly through target detection, logos of different brands can be added at will without retraining the model, which greatly reduces system complexity. Logo detection improves the recall rate, logo feature extraction and retrieval improve accuracy, and the recognition effect is better than that of single target detection.

Drawings

FIG. 1 is a method flow diagram of an exemplary embodiment of the present invention;

FIG. 2 is a schematic diagram of a model structure of an exemplary embodiment of the present invention.

Detailed Description

The invention provides a logo detection method based on target detection and metric learning, which uses a target detection technique to locate candidate logos in commodity pictures and then uses a metric learning technique to retrieve the detection results and determine their categories. The method comprises the following steps:

Step 1, constructing and training a logo detection model;

Constructing the logo detection data: because there are many logo categories, labeling detection data is a heavy workload, and crawling and labeling data from scratch would consume a great deal of labor. The logo detection data can therefore be constructed from published logo data sets, such as LogoDet-3K, Logo-2k and the like.

The logo detection model is mainly developed based on yolov and is used to locate candidate logo regions, so the specific category of a logo can be ignored. For this reason, only a two-class model is trained, which considers only the foreground and background of the picture. The detection part thus only needs to ensure high recall so that the candidate logo regions are detected.

The trained model is then used to run inference on commodity poster pictures and locate the candidate logo regions.

Step 2, constructing and training a logo feature extraction model.

Constructing the logo classification data: this data can be constructed from the detection data in step 1. Because the logo regions were already boxed when the detection data was constructed, the classification data set can be built simply by cropping the corresponding logo regions according to the annotation files.
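
As a rough illustration of reusing the detection annotations, the sketch below groups annotated boxes by brand so that each group can later be cropped into one classification category. The annotation keys `image_id`, `box` and `brand` are hypothetical; the actual annotation format is not given in the text.

```python
from collections import defaultdict


def group_boxes_by_brand(annotations):
    """Map each brand to the (image_id, box) pairs annotated for it.

    annotations: list of dicts with hypothetical keys
    'image_id', 'box' and 'brand'.
    """
    by_brand = defaultdict(list)
    for ann in annotations:
        by_brand[ann["brand"]].append((ann["image_id"], ann["box"]))
    return dict(by_brand)
```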

Constructing and training the logo feature extraction model: feature extraction can be realized either through classification learning or through metric learning. Because the differences between logos are small and lie mainly in details, a model trained by the classification method extracts logo picture features poorly, so metric learning is adopted. Metric learning can fully mine the subtle differences between logo pictures and amplify them, so that the logo features of different categories are separated in vector space. The training process is the same as that of an ordinary classification model, except that the classification loss function is replaced by a metric loss function; in this model, cross entropy is replaced by ArcFace. After the feature extraction model is trained, the features of the last layer of the model are taken out as the features of the picture. The ArcFace hyper-parameters are configured as follows: the weight s is 30, the margin is 0.5, and the initial learning rate of model training is 1e-4.
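
To make the replacement of cross entropy by ArcFace concrete, the following sketch computes the ArcFace loss for a single sample in plain Python, using the stated s = 30 and margin 0.5 as defaults. It assumes the cosines between the normalized feature and each normalized class weight are already available. The function name `arcface_loss` is illustrative, and the sketch omits the special handling that practical implementations apply when the angle plus margin exceeds π.

```python
import math


def arcface_loss(cosines, target, s=30.0, m=0.5):
    """ArcFace loss for one sample.

    cosines: cosine between the normalized feature and each class weight.
    target:  index of the ground-truth class.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))           # guard acos against rounding
        if j == target:
            c = math.cos(math.acos(c) + m)   # add the angular margin
        logits.append(s * c)
    # numerically stable softmax cross entropy on the scaled logits
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

With `m=0.0` the function reduces to ordinary softmax cross entropy on scaled cosines, so the margin is the only difference from the classification loss it replaces.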

Constructing the logo picture library: the library serves mainly as the recall gallery, so its data must be of high quality. Features extracted from high-quality pictures can represent the center vector of each category, which makes the retrieval result more accurate. At least 200 pictures are selected for each category.

Extracting the logo feature vectors: the pictures in the logo picture library are input into the logo feature extraction model, and the corresponding feature vectors are output and stored. The stored feature vectors are used for subsequent retrieval.
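
Building the stored vector library can be sketched as below. `extract_features` is a hypothetical callable standing in for the trained feature extraction model. L2-normalizing the vectors before storage is an added assumption, not something the text states; it makes later cosine comparisons equivalent to dot products.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def build_gallery(pictures, extract_features):
    """Run every gallery picture through the extractor once and keep the result.

    pictures:         list of (label, picture) pairs.
    extract_features: hypothetical callable standing in for the trained model.
    """
    return [(label, l2_normalize(extract_features(pic))) for label, pic in pictures]
```

Because the gallery is computed once and stored, adding a brand only requires extracting vectors for its pictures.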

Step 3, retrieving and recalling logo pictures.

The first-step target detection is performed on a new poster picture to locate candidate logo regions. Each candidate region is input into the logo feature extraction model for feature extraction, the cosine distance between the extracted features and the feature vectors stored in step 2 is calculated to determine similarity, the nearest 10 samples are returned according to the distance, and the category is finally determined through sample voting.

In the step 2, the public data set has a class imbalance problem, and training the feature extraction model directly on it would make training more difficult. Because the model is used only for feature extraction, the long-tail problem does not need to be solved, so categories with little data are removed during model training.
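
Removing the sparse categories can be sketched as a simple count-based filter. The name `drop_small_classes` and the `min_count` parameter are hypothetical; this passage does not state which cut-off was used.

```python
from collections import Counter


def drop_small_classes(samples, min_count):
    """Keep only samples whose class has at least min_count examples.

    samples: list of (picture, label) pairs.
    """
    counts = Counter(label for _, label in samples)
    return [(pic, label) for pic, label in samples if counts[label] >= min_count]
```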

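In the step 2, the model is mainly constructed based on the EfficientNet model: the features of the last convolution layer of the EfficientNet model are converted into a 1792-dimensional feature through average pooling, and the other structures are kept unchanged. The training loss is based on ArcFace, and the mathematical expression is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$

where N is the number of samples in a batch, n is the number of categories, y_i is the category of the i-th sample, θ_j is the angle between the feature vector of the i-th sample and the weight vector of category j, s is the weight (scale), and m is the margin.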

In the step 3, features are extracted from the candidate regions located in step 1 by the model of step 2, and the cosine similarity with the logo feature vectors of step 2 is then calculated to obtain the final logo category. The cosine distance is defined as follows:

$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

where A and B are the two feature vectors being compared.

The invention constructs and trains a YOLOv-based logo detection model for detecting whether candidate logos exist in a picture, constructs an efficientnetb-based feature vector extraction model for extracting logo feature vectors, and constructs a recall model based on picture similarity. The best practical parameters were obtained by tuning the rules: for each recall of the top 10 similar logos, the category is finally determined when at least 7 results share the same category and the maximum similarity is greater than 0.6.

The invention is further described below with reference to the drawings and exemplary embodiments:

As shown in FIG. 1, the logo detection algorithm based on target detection and metric learning in this example mainly includes the following steps:

Step S1: collecting and labeling logo detection data, mainly to gather sufficient data for training the target detection model.

Step S2: training the logo detection model, which is mainly trained based on yolov and is mainly used for detecting candidate regions in poster data.

Step S3: locating logos in the poster data with the trained detection algorithm to determine the logo positions in the poster.

Step S4: constructing the logo classification data, mainly by cropping out the annotated bounding boxes in the logo detection data.

Step S5: constructing and training the logo feature extraction model, which is mainly built as a logo classification algorithm based on metric learning.

Step S6: extracting features from the logo data in the gallery library to generate feature vectors for subsequent logo retrieval.

Step S7: retrieving the detected candidate regions to determine the final logo category.

In the step S1, the detection model may be trained on the public data sets LogoDet-3K and Logo-2k, or a detection data set may be constructed by crawling and labeling data.

In the step S2, the detection model is mainly trained based on yolov, but may also be based on other detection algorithms such as yolov, SSD, etc. A single-stage model is used mainly for its high detection speed and efficiency.

In the step S3, the trained detection model is used to detect and locate logos in the test data, yielding the candidate logo regions of the picture data.

In the step S4, the classification data for metric learning is constructed, mainly from the detection data described above.

In the step S5, the feature extraction model is mainly constructed with a metric learning algorithm and trained mainly with the ArcFace loss.

In the step S6, features are extracted from the data in the gallery library to obtain the final feature vectors; the logo feature matrix is obtained with a single forward pass over the data in the library.

FIG. 2 is a schematic block diagram of a logo detection method based on target detection and metric learning according to the present invention.

Module 1 is the candidate logo detection module, which detects whether a logo may exist in a picture. Redundant detection is achieved by lowering the confidence threshold of the candidate boxes so that multiple candidate logo regions are sampled. The module outputs the coordinates of the candidate logos in the original picture.

Module 2 is the logo vector generation module. For the logo retrieval pictures, it generates the corresponding logo retrieval picture vector library, with a 1792-dimensional feature vector for each picture. For the candidate logo regions, it crops the regions from the original picture according to the candidate logo coordinates, scales them to 256×256, and generates 1792-dimensional feature vectors. The module outputs a 1792-dimensional floating-point feature vector for each picture.
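
The cropping and scaling performed before feature extraction can be sketched in plain Python on a row-major image stored as nested lists. The box convention (x1, y1, x2, y2) with exclusive end coordinates and the use of nearest-neighbour sampling are assumptions; the text specifies only the 256×256 target size.

```python
def crop(image, box):
    """Cut the box (x1, y1, x2, y2), end-exclusive, out of a row-major image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def resize_nearest(image, size=256):
    """Resize to size x size with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]
```

An image library would normally handle this with better interpolation; the sketch only shows the geometry of the step.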

Module 3 is the logo retrieval and identification module, which calculates the cosine similarity between candidate logo feature vectors and logo retrieval picture vectors. For each candidate logo feature vector, the 10 retrieval library picture vectors with the nearest cosine distance are obtained, and the specific category of the candidate logo is determined by the rule that at least 7 results share the same category and the maximum similarity is greater than 0.6.

The above embodiments are not intended to limit the present invention in any way, and all other modifications and applications of the above embodiments which are equivalent to the above embodiments fall within the scope of the present invention.

Claims

1. A logo detection method based on target detection and metric learning is characterized in that a candidate logo target in a picture is detected by using a target detection technology, the candidate logo target is subjected to feature extraction by using a metric learning technology, and finally the candidate logo is judged by logo retrieval to determine the category of the candidate logo, and the method specifically comprises the following steps:

S2-3, constructing a logo classification data set;

A, B are feature vectors of the vectors A and B respectively, and the category of logo pictures is determined through sample voting;

step S1-2 further comprises that when commodity pictures in the logo detection data set are sent to a logo detection model for training, brand categories are all changed into single categories, tags are set to be logo, and category confidence parameters of the logo detection model are set to be 0.2;

In the step S2-5, the hyper-parameters of ArcFace algorithm are configured as follows: the weight s is 30, the margin is 0.5, and the initial learning rate of model training is 1e-4;

in the step S3-8, the sample voting rule is as follows: the cosine distance returns the nearest 10 samples, if the number of the classes belonging to the same class is not less than 7 and the maximum similarity value is greater than 0.6, the class of the logo to be detected is determined to be a voting class, otherwise, the logo to be detected has no specific class.

2. The logo detection method based on target detection and metric learning as claimed in claim 1, wherein: in the step S2-3, the logo classification data set is constructed by selecting 500 brands, cropping 200 logo pictures for each brand, and resizing the logo pictures to 256 x 256.