CN116680435B - Similar image retrieval matching method based on multi-layer feature extraction - Google Patents

Similar image retrieval matching method based on multi-layer feature extraction

Info

Publication number
CN116680435B
CN116680435B (application CN202310967050.6A)
Authority
CN
China
Prior art keywords
feature
image
feature vector
vector
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310967050.6A
Other languages
Chinese (zh)
Other versions
CN116680435A (en)
Inventor
陈彤
陈松辉
杨丰玉
熊宇
吴超庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Hangkong University
Original Assignee
Nanchang Hangkong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Hangkong University filed Critical Nanchang Hangkong University
Priority to CN202310967050.6A priority Critical patent/CN116680435B/en
Publication of CN116680435A publication Critical patent/CN116680435A/en
Application granted granted Critical
Publication of CN116680435B publication Critical patent/CN116680435B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a similar image retrieval matching method based on multi-layer feature extraction. Several convolutional neural networks are selected and combined into a feature extractor, the feature extractor is trained using transfer learning, and image retrieval matching is optimized by dividing the feature space into subspaces with KMeans clustering.

Description

Similar image retrieval matching method based on multi-layer feature extraction
Technical Field
The invention relates to the technical field of image analysis, in particular to a similar image retrieval matching method based on multi-layer feature extraction.
Background
At present, similar image retrieval matching is a fundamental and important technology in the field of computer vision and has many applications in industrial production. Taking automobile insurance damage assessment as an example, images of the accident scene must be photographed from multiple angles and uploaded to the system, and verifying the authenticity of these images is an important but tedious task: similar images may be passed off as genuine ones to file repeated claims and fraudulently obtain insurance payouts, causing economic losses to insurance companies. Accurately and efficiently retrieving and matching similar images to assist investigation is therefore a critical and significant task.
In the related art, image feature extraction and matching techniques based on traditional hand-crafted features and on deep learning both exist, but neither achieves strong robustness, and their accuracy on complex and varied images is low. Moreover, in the retrieval and matching stage, linear search over large-scale vector sets often consumes a large amount of time, which is unacceptable in practical application scenarios.
Disclosure of Invention
The invention aims to provide a similar image retrieval matching method based on multi-layer feature extraction, so as to solve the technical problems of low accuracy and low speed of similar image investigation in the prior art.
A similar image retrieval matching method based on multi-layer feature extraction comprises the following steps:
s1, collecting related image data of a vehicle accident to form a reference image set, generating a similar image set by using the reference image set, and constructing an original data set by the reference image set and the similar image set;
s2, selecting a plurality of convolutional neural networks, combining to obtain a feature extractor, and training the feature extractor by utilizing a transfer learning technology based on an original data set to obtain a trained feature extractor;
s3, respectively extracting features of the images in the existing image library and the images to be searched by using the trained feature extractor to obtain feature sets of the existing image library and feature vectors of the images to be searched;
s4, dividing the existing image library feature set into a plurality of sub feature vector sets by using a KMeans clustering method;
s5, calculating cosine similarity between the feature vector of the image to be searched and the center of each sub-feature vector set, arranging the feature vector according to descending order, and selecting a target sub-feature vector set corresponding to the cosine similarity with the preset number which is ranked at the front;
s6, calculating cosine similarity of the feature vector of the image to be retrieved and the feature vector in each target sub-feature vector set, and outputting an image corresponding to the feature vector with the largest cosine similarity as a matching result;
in step S4, for the existing image library feature set $X=\{x_1,x_2,\dots,x_n\}$, which contains $n$ objects each with $m$-dimensional attributes, $k$ cluster centers first need to be initialized, and the Euclidean distance from each object to each cluster center is calculated by the following formula:

$$d(x_i,c_j)=\sqrt{\sum_{t=1}^{m}\left(x_{i,t}-c_{j,t}\right)^{2}}$$

where $x_i$ represents the $i$-th object of the existing image library feature set $X$, $1\le i\le n$; $c_j$ represents the $j$-th cluster center, $1\le j\le k$; $d(x_i,c_j)$ represents the Euclidean distance between $x_i$ and $c_j$; $x_{i,t}$ represents the $t$-th attribute of $x_i$; and $c_{j,t}$ represents the $t$-th attribute of $c_j$;
all objects in the existing image library feature set are assigned to the cluster whose center is closest to them, yielding $k$ clusters, and the cluster center of each cluster is updated to the mean of all objects of the cluster in each dimension, with the calculation formula:

$$c_l=\frac{1}{\left|S_l\right|}\sum_{p=1}^{\left|S_l\right|}x_p,\qquad x_p\in S_l$$

where $c_l$ represents the cluster center of the $l$-th cluster, $1\le l\le k$; $S_l$ represents the $l$-th cluster; $\left|S_l\right|$ represents the number of objects in the $l$-th cluster; and $x_p$ represents the $p$-th object of the $l$-th cluster;
then the Euclidean distance from each object to the cluster center of each cluster is recalculated, and the iteration continues until the change of the cluster centers is smaller than a threshold; the clusters obtained by the clustering algorithm are the sub-feature vector sets into which the existing image library feature set is divided, and the cluster center of each cluster is the center of the corresponding sub-feature vector set.
According to the similar image retrieval matching method based on multi-layer feature extraction, several convolutional neural networks are selected and combined into a feature extractor, the feature extractor is trained using transfer learning, and image retrieval matching is optimized by dividing the feature space into subspaces with KMeans clustering.
Drawings
Fig. 1 is a flowchart of a similar image retrieval matching method based on multi-layer feature extraction according to an embodiment.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a similar image retrieval matching method based on multi-layer feature extraction includes steps S1 to S6:
s1, collecting vehicle accident related picture data to form a reference image set, generating a similar image set by using the reference image set, and constructing an original data set by the reference image set and the similar image set.
The step S1 specifically includes:
collecting a plurality of field images in a car insurance scene as a reference image set;
constructing an image transformation function according to predefined independent variables and their variation distribution ranges, and transforming the reference images so as to construct a similar image set, wherein the predefined independent variables comprise cropping, flipping, rotation, noise, brightness change and blurring;
the original dataset is constructed from the reference image set and the similar image set.
The invention collects a number of scene images from car insurance scenarios as the reference image set. The invention targets the problem of repeated claims based on similar images in the anti-fraud field of car insurance, where the images in question are typically different photographs of the same subject taken from different angles and distances within the same time period. Such pictures mainly involve viewpoint and scale changes, and since real scenes are complex and variable, illumination changes, noise, occlusion and other conditions may also be present. Given that similar images of the reference set are lacking, the invention produces a similar image set by implementing a multi-scale-variation similar image generation method.
In the process of generating the similar image set, the variation range of each transformation mode is determined according to the specific task and conditions, and the variation parameters are treated as a series of independent variables, including cropping, flipping, rotation, noise, brightness change and blurring. Specifically: cropping is a random square crop with side length equal to the short side of the image; flipping is horizontal (i.e. mirror) flipping; the rotation angle is drawn at random from the list [-45, -30, -15, 15, 30, 45]; the Gaussian noise variance is drawn from [10, 20, 30, 40]; the brightness is scaled by a factor drawn from [0.5, 0.7, 1.3, 1.5]; and the blur kernel size is drawn from [1, 2, 3, 4]. An image transformation function is constructed from the predefined independent variables and their distribution ranges and applied to each reference image: for each of the six modes, a group of simulated images is generated by randomly sampling from the parameter ranges set above, serving as similar images of the reference image, and together these constitute the similar image set. Finally, the reference image set and the similar image set constructed by the several transformation methods together form the original data set. A minimal sketch of such a transformation function is given below.
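The following sketch illustrates the six transformation modes, assuming Pillow and NumPy; the function name generate_similar_images and the sampling details are illustrative assumptions, not the patent's own implementation.

```python
# Minimal sketch of the similar-image generation step (illustrative only).
import random
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

ROTATIONS = [-45, -30, -15, 15, 30, 45]
NOISE_VARS = [10, 20, 30, 40]
BRIGHTNESS = [0.5, 0.7, 1.3, 1.5]
BLUR_KERNELS = [1, 2, 3, 4]

def generate_similar_images(ref: Image.Image) -> list:
    """Apply the six predefined transformation modes to one reference image."""
    w, h = ref.size
    side = min(w, h)
    # 1. random square crop with side equal to the short side
    left, top = random.randint(0, w - side), random.randint(0, h - side)
    cropped = ref.crop((left, top, left + side, top + side))
    # 2. horizontal (mirror) flip
    flipped = ref.transpose(Image.FLIP_LEFT_RIGHT)
    # 3. rotation by a randomly chosen angle
    rotated = ref.rotate(random.choice(ROTATIONS), expand=True)
    # 4. additive Gaussian noise with a randomly chosen variance
    arr = np.asarray(ref).astype(np.float32)
    noise = np.random.normal(0.0, np.sqrt(random.choice(NOISE_VARS)), arr.shape)
    noisy = Image.fromarray(np.clip(arr + noise, 0, 255).astype(np.uint8))
    # 5. brightness scaling by a randomly chosen factor
    bright = ImageEnhance.Brightness(ref).enhance(random.choice(BRIGHTNESS))
    # 6. Gaussian blur with a randomly chosen kernel radius
    blurred = ref.filter(ImageFilter.GaussianBlur(random.choice(BLUR_KERNELS)))
    return [cropped, flipped, rotated, noisy, bright, blurred]
```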
S2, selecting a plurality of convolutional neural networks, combining to obtain a feature extractor, and training the feature extractor by utilizing a transfer learning technology based on an original data set to obtain the trained feature extractor.
The step S2 specifically includes:
three convolutional neural networks VGG16, denseNet121 and ResNet50 are selected to form a feature extractor. The depth of the VGG16 is deeper, a small convolution kernel is adopted, the receptive field is large, the shallow layer coarse granularity characteristic of the picture is extracted, and in the training stage, the VGG16 performs model training by using a cross entropy loss function and a random gradient descent (SGD) optimization algorithm; denseNet121 connects densely, raise the efficiency of the characteristic reuse, reduce and fit excessively, draw deep texture of the picture and characteristic of fine granularity of edge, it is made up of 4 Dense blocks (Dense Block) and 3 Transition layers (Transition Layer). Each dense block comprises a plurality of convolution layers, and residual connection enables the convolution layers in each dense block to directly receive the input of all previous layers, and meanwhile, the structure is simple and training is easy. Meanwhile, batch normalization (Batch Normalization) and ReLU activation functions are used in each dense block, so that the nonlinear characteristics and robustness of the model are further enhanced. ResNet50 has a deeper network structure and a residual structure, and can more effectively extract deep image features.
Pre-trained models of the three convolutional neural networks VGG16, DenseNet121 and ResNet50 trained on the ImageNet data set are taken as the basis for transfer learning, and all three pre-trained models are then trained on the original data set, with the reference image set used as the validation set and the similar image set used as the training set, optimizing the model parameters of the feature extractor. The trained feature extractor is thereby adapted to the specific car-insurance scenario. The feature extractor obtained by training can fully represent the original image at each level, realizing multi-model feature fusion. A minimal sketch of this fine-tuning setup is given below.
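The sketch below assumes PyTorch and torchvision; build_backbones and fine_tune are hypothetical helper names, and the learning rate and momentum are illustrative assumptions (the patent specifies only cross-entropy loss and SGD).

```python
# Minimal sketch of the transfer-learning setup (illustrative only).
import torch
import torch.nn as nn
from torchvision import models

def build_backbones(num_classes: int):
    """Load ImageNet-pretrained VGG16, DenseNet121 and ResNet50 and replace
    their classification heads for fine-tuning on the original data set."""
    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    vgg.classifier[6] = nn.Linear(vgg.classifier[6].in_features, num_classes)

    dense = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
    dense.classifier = nn.Linear(dense.classifier.in_features, num_classes)

    res = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    res.fc = nn.Linear(res.fc.in_features, num_classes)
    return vgg, dense, res

def fine_tune(model, train_loader, epochs: int = 100, lr: float = 1e-3):
    """Fine-tune one backbone with cross-entropy loss and SGD, as in the text."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```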
S3, using the trained feature extractor to extract features from the images in the existing image library and from the image to be retrieved, respectively, obtaining the existing image library feature set and the feature vector of the image to be retrieved.
The invention addresses the problem of repeated claims using similar images in the anti-fraud field of car insurance; the goal is to select the image most similar to the image to be retrieved from the existing image library. In the present invention, the feature extraction process for one image is as follows: the trained feature extractor extracts 512-dimensional, 1024-dimensional and 2048-dimensional sub-feature vectors from the image (an image in the existing image library or the image to be retrieved), and after normalization the three sub-feature vectors are concatenated in order, finally yielding the feature vector corresponding to the image.
In step S3, L2-norm normalization is selected as the normalization method: the original vector is divided by its second-order (Euclidean) norm, so that the L2 norm of the processed vector is 1, i.e. the direction is unchanged but the length becomes 1. This maps the original feature vectors into a fixed range in order to accelerate the subsequent retrieval and matching. Specifically, the following conditional expression is satisfied:

$$\hat{v}=\frac{v}{\|v\|_{2}},\qquad \|v\|_{2}=\sqrt{\sum_{r=1}^{w}v_{r}^{2}}$$

where $v$ represents a $w$-dimensional vector, $v_r$ represents the $r$-th attribute of the vector $v$, and $\hat{v}$ represents the L2-norm-normalized vector.
Through L2-norm normalization, all vectors have the same length and differences in pixel values across regions are eliminated, making image feature extraction more robust and retrieval matching more stable and reliable.
Feature extraction is performed on each image in the existing image library, and the resulting feature vectors form the existing image library feature set. Feature extraction is performed on the image to be retrieved to obtain its feature vector. A minimal sketch of this extraction and normalization step is given below.
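The sketch below assumes the three fine-tuned torchvision backbones from the previous sketch; the pooling choices are assumptions, since the patent fixes only the sub-feature dimensions (512, 1024, 2048) and the L2 normalization.

```python
# Minimal sketch of step S3: multi-layer feature extraction (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def l2_normalize(v: torch.Tensor) -> torch.Tensor:
    """Divide a vector by its Euclidean norm so its L2 norm becomes 1."""
    return v / v.norm(p=2)

@torch.no_grad()
def extract_feature(image: torch.Tensor, vgg, dense, res) -> torch.Tensor:
    """Extract 512-, 1024- and 2048-dim sub-feature vectors from one image
    (shape [1, 3, H, W]), L2-normalize each, and concatenate them in order."""
    f_vgg = F.adaptive_avg_pool2d(vgg.features(image), 1).flatten(1)      # 512-d
    f_dense = F.adaptive_avg_pool2d(dense.features(image), 1).flatten(1)  # 1024-d
    res_trunk = nn.Sequential(*list(res.children())[:-1])                 # drop fc
    f_res = res_trunk(image).flatten(1)                                   # 2048-d
    parts = [l2_normalize(f.squeeze(0)) for f in (f_vgg, f_dense, f_res)]
    return torch.cat(parts)  # 3584-dimensional image feature vector
```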
S4, dividing the existing image library feature set into a plurality of sub feature vector sets by using a KMeans clustering method.
For an existing image library feature set $X=\{x_1,x_2,\dots,x_n\}$, which contains $n$ objects each with $m$-dimensional attributes, $k$ cluster centers first need to be initialized, and the Euclidean distance from each object to each cluster center is calculated by the following formula:

$$d(x_i,c_j)=\sqrt{\sum_{t=1}^{m}\left(x_{i,t}-c_{j,t}\right)^{2}}$$

where $x_i$ represents the $i$-th object of the existing image library feature set $X$, $1\le i\le n$; $c_j$ represents the $j$-th cluster center, $1\le j\le k$; $d(x_i,c_j)$ represents the Euclidean distance between $x_i$ and $c_j$; $x_{i,t}$ represents the $t$-th attribute of $x_i$; and $c_{j,t}$ represents the $t$-th attribute of $c_j$.
All objects in the existing image library feature set are assigned to the cluster whose center is closest to them, yielding $k$ clusters, and the cluster center of each cluster is updated to the mean of all objects of the cluster in each dimension, with the calculation formula:

$$c_l=\frac{1}{\left|S_l\right|}\sum_{p=1}^{\left|S_l\right|}x_p,\qquad x_p\in S_l$$

where $c_l$ represents the cluster center of the $l$-th cluster, $1\le l\le k$; $S_l$ represents the $l$-th cluster; $\left|S_l\right|$ represents the number of objects in the $l$-th cluster; and $x_p$ represents the $p$-th object of the $l$-th cluster.
Then the Euclidean distance from each object to the cluster center of each cluster is recalculated, and the iteration continues until the change of the cluster centers is smaller than a threshold. The clusters obtained by the clustering algorithm are the sub-feature vector sets into which the existing image library feature set is divided, and the cluster center of each cluster is the center of the corresponding sub-feature vector set. A minimal sketch of this division step is given below.
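The sketch below uses scikit-learn's KMeans, which implements the iterate-until-convergence procedure described above; the values of k and the tolerance are illustrative assumptions.

```python
# Minimal sketch of step S4: dividing the library into sub-feature-vector sets.
import numpy as np
from sklearn.cluster import KMeans

def divide_into_subsets(features: np.ndarray, k: int = 16, tol: float = 1e-4):
    """Divide the library feature set into k sub-feature-vector sets and
    return (list of subsets, their cluster centers)."""
    km = KMeans(n_clusters=k, tol=tol, n_init=10).fit(features)
    subsets = [features[km.labels_ == l] for l in range(k)]
    return subsets, km.cluster_centers_
```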
S5, calculating the cosine similarity between the feature vector of the image to be retrieved and the center of each sub-feature vector set, sorting the similarities in descending order, and selecting the target sub-feature vector sets corresponding to a preset number of the highest-ranked cosine similarities.
The cosine similarity between the feature vector of the image to be searched and the center of each sub feature vector set is calculated by adopting the following formula:
$$\cos(\mathrm{Vec},c_j)=\frac{\mathrm{Vec}\cdot c_j}{\|\mathrm{Vec}\|\,\|c_j\|}$$

where $\mathrm{Vec}$ is the feature vector of the image to be retrieved, $\cos(\mathrm{Vec},c_j)$ represents the cosine similarity between $\mathrm{Vec}$ and the center $c_j$ of the $j$-th sub-feature vector set, $\|\mathrm{Vec}\|$ represents the norm of $\mathrm{Vec}$, and $\|c_j\|$ represents the norm of $c_j$.
The larger the cosine similarity, the smaller the angle between the two feature vectors and the more similar the corresponding images. Specifically, the cosine similarities between the feature vector of the image to be checked and the centers of the sub-feature vector sets are sorted in descending order, and the sub-feature vector sets corresponding to the top three similarities are taken as the target sub-feature vector sets. A minimal sketch of this coarse selection is given below.
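In the sketch below, top_k=3 follows the text above; select_target_subsets is a hypothetical helper name.

```python
# Minimal sketch of step S5: rank clusters by center similarity, keep top-3.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target_subsets(query: np.ndarray, centers: np.ndarray,
                          subsets: list, top_k: int = 3):
    """Rank sub-feature-vector sets by center similarity and keep the top_k."""
    sims = np.array([cosine_sim(query, c) for c in centers])
    order = np.argsort(sims)[::-1][:top_k]  # descending order of similarity
    return [subsets[i] for i in order]
```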
S6, calculating the cosine similarity between the feature vector of the image to be retrieved and the feature vectors in each target sub-feature vector set, and outputting the image corresponding to the feature vector with the largest cosine similarity as the matching result.
The feature vector with the maximum cosine similarity meets the following conditional expression:
$$S_{\max}=\max_{j}\frac{\mathrm{Vec}\cdot v_j}{\|\mathrm{Vec}\|\,\|v_j\|}$$

where $S_{\max}$ represents the maximum of the cosine similarities between the feature vector of the image to be retrieved and the feature vectors $v_j$ in the target sub-feature vector sets, $\max$ represents the maximum-value function, and $\|v_j\|$ represents the norm of $v_j$.
Finally, the image corresponding to $S_{\max}$ is output as the matching result. A minimal sketch of this final matching step is given below.
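The sketch below performs exhaustive cosine comparison restricted to the selected target subsets; the index bookkeeping (mapping back to an image) is an illustrative assumption.

```python
# Minimal sketch of step S6: fine matching inside the target subsets only.
import numpy as np

def match(query: np.ndarray, target_subsets: list) -> tuple:
    """Return (subset index, row index, best cosine similarity) over all
    feature vectors in the selected target sub-feature vector sets."""
    best = (-1, -1, -1.0)
    for s, subset in enumerate(target_subsets):
        for r, vec in enumerate(subset):
            sim = float(np.dot(query, vec) /
                        (np.linalg.norm(query) * np.linalg.norm(vec)))
            if sim > best[2]:
                best = (s, r, sim)
    return best
```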
In order to verify the effectiveness of the proposed method, three experiments are designed to verify, respectively, the effectiveness of the transfer learning method, of the multi-model feature fusion method, and of the subspace-division retrieval method.
To better adapt to the specific scenario of the invention, transfer learning based on the reference image set and the similar image set is performed, and the parameters are fine-tuned for better performance. The similar image set is used as training data and the reference image set as validation data; training runs for 100 epochs with cross-entropy loss, and accuracy is used as the performance evaluation index. The accuracy of the three models before and after transfer learning is shown in Table 1.
TABLE 1 comparison of accuracy of models before and after transfer learning
The three deep learning methods selected by the invention extract effective information from the image at different angles and levels as feature vectors, and fusing the features of the three models optimizes the overall effect; the comparison is shown in Table 2.
TABLE 2 Accuracy comparison between single models and multi-model feature fusion
In addition to building an efficient feature extractor, optimizing retrieval matching under large data volumes is also one of the focuses of the invention. A retrieval mode that divides subspaces based on KMeans clustering is used, which improves the speed of image retrieval matching. The performance comparison between brute-force search and subspace-division search based on KMeans clustering is shown in Table 3.
TABLE 3 comparison of the performance of traversal and sub-space-divided search modes at different data volumes
In summary, the similar image retrieval matching method based on multi-layer feature extraction provided by the invention selects and combines several convolutional neural networks into a feature extractor, trains the feature extractor using transfer learning, and optimizes image retrieval matching by dividing the feature space into subspaces with KMeans clustering.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (6)

1. A similar image retrieval matching method based on multi-layer feature extraction is characterized by comprising the following steps:
s1, collecting related image data of a vehicle accident to form a reference image set, generating a similar image set by using the reference image set, and constructing an original data set by the reference image set and the similar image set;
s2, selecting a plurality of convolutional neural networks, combining to obtain a feature extractor, and training the feature extractor by utilizing a transfer learning technology based on an original data set to obtain a trained feature extractor;
s3, respectively extracting features of the images in the existing image library and the images to be searched by using the trained feature extractor to obtain feature sets of the existing image library and feature vectors of the images to be searched;
s4, dividing the existing image library feature set into a plurality of sub feature vector sets by using a KMeans clustering method;
s5, calculating cosine similarity between the feature vector of the image to be searched and the center of each sub-feature vector set, arranging the feature vector according to descending order, and selecting a target sub-feature vector set corresponding to the cosine similarity with the preset number which is ranked at the front;
s6, calculating cosine similarity of the feature vector of the image to be retrieved and the feature vector in each target sub-feature vector set, and outputting an image corresponding to the feature vector with the largest cosine similarity as a matching result;
in step S4, for the existing image library feature set $X=\{x_1,x_2,\dots,x_n\}$, which contains $n$ objects each with $m$-dimensional attributes, $k$ cluster centers first need to be initialized, and the Euclidean distance from each object to each cluster center is calculated by the following formula:

$$d(x_i,c_j)=\sqrt{\sum_{t=1}^{m}\left(x_{i,t}-c_{j,t}\right)^{2}}$$

where $x_i$ represents the $i$-th object of the existing image library feature set $X$, $1\le i\le n$; $c_j$ represents the $j$-th cluster center, $1\le j\le k$; $d(x_i,c_j)$ represents the Euclidean distance between $x_i$ and $c_j$; $x_{i,t}$ represents the $t$-th attribute of $x_i$; and $c_{j,t}$ represents the $t$-th attribute of $c_j$;
all objects in the existing image library feature set are assigned to the cluster whose center is closest to them, yielding $k$ clusters, and the cluster center of each cluster is updated to the mean of all objects of the cluster in each dimension, with the calculation formula:

$$c_l=\frac{1}{\left|S_l\right|}\sum_{p=1}^{\left|S_l\right|}x_p,\qquad x_p\in S_l$$

where $c_l$ represents the cluster center of the $l$-th cluster, $1\le l\le k$; $S_l$ represents the $l$-th cluster; $\left|S_l\right|$ represents the number of objects in the $l$-th cluster; and $x_p$ represents the $p$-th object of the $l$-th cluster;
then the Euclidean distance from each object to the cluster center of each cluster is recalculated, and the iteration continues until the change of the cluster centers is smaller than a threshold; the clusters obtained by the clustering algorithm are the sub-feature vector sets into which the existing image library feature set is divided, and the cluster center of each cluster is the center of the corresponding sub-feature vector set;
the step S2 specifically includes:
the three convolutional neural networks VGG16, DenseNet121 and ResNet50 are selected to form the feature extractor, and in the training stage VGG16 is trained with a cross-entropy loss function and a stochastic gradient descent optimization algorithm; DenseNet121 consists of 4 dense blocks and 3 transition layers, each dense block comprises a plurality of convolution layers, the dense connections enable the convolution layers in each dense block to directly receive the input of all previous layers, and batch normalization and ReLU activation functions are used in each dense block;
the method comprises the steps of taking a pretraining model trained on an ImageNet data set by adopting three convolutional neural networks VGG16, denseNet121 and ResNet50 as a basis of transfer learning, and then training all three pretraining models on an original data set, wherein a reference image set is used as a verification set, a similar image set is used as a training set, and model parameters of a feature extractor are optimized, so that the trained feature extractor is obtained.
2. The similar image retrieval matching method based on multi-layer feature extraction according to claim 1, wherein step S1 specifically comprises:
collecting a plurality of field images in a car insurance scene as a reference image set;
constructing an image transformation function according to a predefined independent variable and a variation distribution range of the independent variable, and transforming a reference image so as to construct a similar image set, wherein the predefined independent variable comprises cutting, overturning, rotating, noise, brightness variation and blurring;
the original dataset is constructed from the reference image set and the similar image set.
3. The similar image retrieval matching method based on multi-layer feature extraction according to claim 2, wherein in step S3, the feature extraction process for an image in the existing image library or the image to be retrieved is as follows:
and extracting 512-dimensional, 1024-dimensional and 2048-dimensional sub-feature vectors from the images in the existing image library or the images to be retrieved by using the trained feature extractor, and sequentially splicing the three sub-feature vectors after normalization processing to finally obtain the corresponding feature vectors.
4. The similar image retrieval matching method based on multi-layer feature extraction as claimed in claim 3, wherein in step S3, the normalization processing method selects an L2 norm normalization method, which satisfies the following conditional expression:
$$\hat{v}=\frac{v}{\|v\|_{2}},\qquad \|v\|_{2}=\sqrt{\sum_{r=1}^{w}v_{r}^{2}}$$

wherein $v$ represents a $w$-dimensional vector, $v_r$ represents the $r$-th attribute of the vector $v$, and $\hat{v}$ represents the L2-norm-normalized vector.
5. The method for matching search of similar images based on multi-layer feature extraction according to claim 4, wherein in step S5, the cosine similarity between the feature vector of the image to be searched and the center of each sub-feature vector set is calculated by using the following formula:
$$\cos(\mathrm{Vec},c_j)=\frac{\mathrm{Vec}\cdot c_j}{\|\mathrm{Vec}\|\,\|c_j\|}$$

wherein $\mathrm{Vec}$ is the feature vector of the image to be retrieved, $\cos(\mathrm{Vec},c_j)$ represents the cosine similarity between $\mathrm{Vec}$ and the center $c_j$ of the $j$-th sub-feature vector set, $\|\mathrm{Vec}\|$ represents the norm of $\mathrm{Vec}$, and $\|c_j\|$ represents the norm of $c_j$.
6. The method for matching search of similar images based on multi-layer feature extraction according to claim 5, wherein in step S6, the feature vector with the largest cosine similarity satisfies the following conditional expression:
$$S_{\max}=\max_{j}\frac{\mathrm{Vec}\cdot v_j}{\|\mathrm{Vec}\|\,\|v_j\|}$$

wherein $S_{\max}$ represents the maximum of the cosine similarities between the feature vector of the image to be retrieved and the feature vectors $v_j$ in the target sub-feature vector sets, $\max$ represents the maximum-value function, and $\|v_j\|$ represents the norm of $v_j$.
CN202310967050.6A 2023-08-03 2023-08-03 Similar image retrieval matching method based on multi-layer feature extraction Active CN116680435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310967050.6A CN116680435B (en) 2023-08-03 2023-08-03 Similar image retrieval matching method based on multi-layer feature extraction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310967050.6A CN116680435B (en) 2023-08-03 2023-08-03 Similar image retrieval matching method based on multi-layer feature extraction

Publications (2)

Publication Number Publication Date
CN116680435A CN116680435A (en) 2023-09-01
CN116680435B (en) 2024-01-19

Family

ID=87782269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310967050.6A Active CN116680435B (en) 2023-08-03 2023-08-03 Similar image retrieval matching method based on multi-layer feature extraction

Country Status (1)

Country Link
CN (1) CN116680435B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140082385A (en) * 2012-12-24 2014-07-02 계명대학교 산학협력단 medical image retrieval method based on image clustering
CN107066559A (en) * 2017-03-30 2017-08-18 天津大学 A kind of method for searching three-dimension model based on deep learning
US10503775B1 (en) * 2016-12-28 2019-12-10 Shutterstock, Inc. Composition aware image querying
CN110825899A (en) * 2019-09-18 2020-02-21 武汉纺织大学 Clothing image retrieval method integrating color features and residual network depth features
CN112559791A (en) * 2020-11-30 2021-03-26 广东工业大学 Cloth classification retrieval method based on deep learning
CN114461837A (en) * 2022-02-16 2022-05-10 北京百度网讯科技有限公司 Image processing method and device and electronic equipment
CN115410199A (en) * 2022-09-02 2022-11-29 中国银行股份有限公司 Image content retrieval method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522986B (en) * 2020-04-23 2023-10-10 北京百度网讯科技有限公司 Image retrieval method, device, equipment and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140082385A (en) * 2012-12-24 2014-07-02 계명대학교 산학협력단 medical image retrieval method based on image clustering
US10503775B1 (en) * 2016-12-28 2019-12-10 Shutterstock, Inc. Composition aware image querying
CN107066559A (en) * 2017-03-30 2017-08-18 天津大学 A kind of method for searching three-dimension model based on deep learning
CN110825899A (en) * 2019-09-18 2020-02-21 武汉纺织大学 Clothing image retrieval method integrating color features and residual network depth features
CN112559791A (en) * 2020-11-30 2021-03-26 广东工业大学 Cloth classification retrieval method based on deep learning
CN114461837A (en) * 2022-02-16 2022-05-10 北京百度网讯科技有限公司 Image processing method and device and electronic equipment
CN115410199A (en) * 2022-09-02 2022-11-29 中国银行股份有限公司 Image content retrieval method, device, equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Yimin Yang et al.; Disaster Image Filtering and Summarization Based on Multi-layered Affinity Propagation; 2012 IEEE International Symposium on Multimedia; pp. 100-103 *
Yang, Y. F. et al.; Parallel Hierarchical K-means Clustering-based Image Index Construction Method; 2012 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science; 2013; pp. 424-428 *
Fang Chen et al.; Image clustering based on dynamic subspace distance; Proceedings of the 23rd China Database Academic Conference (Research Reports); pp. 563-568 *
Li Zhendong, Zhong Yong, Zhang Boyan, Cao Dongping; Massive face image retrieval based on deep feature clustering; Journal of Harbin Institute of Technology, no. 11; pp. 107-115 *
Cai Fei; Research on clothing image retrieval based on hybrid attention mechanism and multi-layer feature fusion; China Master's Theses Full-text Database (Engineering Science and Technology I); p. B024-1826 *

Also Published As

Publication number Publication date
CN116680435A (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN110298404B (en) Target tracking method based on triple twin Hash network learning
CN112184752A (en) Video target tracking method based on pyramid convolution
CN112396027A (en) Vehicle weight recognition method based on graph convolution neural network
CN110322445B (en) Semantic segmentation method based on maximum prediction and inter-label correlation loss function
CN108805151B (en) Image classification method based on depth similarity network
CN111125397B (en) Cloth image retrieval method based on convolutional neural network
CN113269224B (en) Scene image classification method, system and storage medium
CN113989890A (en) Face expression recognition method based on multi-channel fusion and lightweight neural network
CN111325237B (en) Image recognition method based on attention interaction mechanism
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN114219824A (en) Visible light-infrared target tracking method and system based on deep network
CN112785626A (en) Twin network small target tracking method based on multi-scale feature fusion
CN108564116A (en) A kind of ingredient intelligent analysis method of camera scene image
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN113011444B (en) Image identification method based on neural network frequency domain attention mechanism
CN113780550A (en) Convolutional neural network pruning method and device for quantizing feature map similarity
CN111339342B (en) Three-dimensional model retrieval method based on angle ternary center loss
CN111461135B (en) Digital image local filtering evidence obtaining method integrated by convolutional neural network
CN113688856A (en) Pedestrian re-identification method based on multi-view feature fusion
CN116680435B (en) Similar image retrieval matching method based on multi-layer feature extraction
CN117011655A (en) Adaptive region selection feature fusion based method, target tracking method and system
CN113515661B (en) Image retrieval method based on filtering depth convolution characteristics
CN115100694A (en) Fingerprint quick retrieval method based on self-supervision neural network
CN114022510A (en) Target long-time tracking method based on content retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant