CN106557533B - Single-target multi-image joint retrieval method and device - Google Patents


Info

Publication number
CN106557533B
CN106557533B (granted publication of application CN201510672504.2A)
Authority
CN
China
Prior art keywords
retrieval
target
similarity
feature
features
Prior art date
Legal status
Active
Application number
CN201510672504.2A
Other languages
Chinese (zh)
Other versions
CN106557533A (en)
Inventor
郭阶添
浦世亮
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Publication of CN106557533A (application publication)
Application granted
Publication of CN106557533B (granted publication)
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 — Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 — Querying
    • G06F 16/432 — Query formulation
    • G06F 16/434 — Query formulation using image data, e.g. images, photos, pictures taken by a user

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for single-target multi-image joint retrieval. The method comprises the following steps: receiving a plurality of images containing the same retrieval target; generating a difference weight for each of the images; fusing the features of the retrieval target in the plurality of images according to the difference weights to generate a feature set; retrieving in an image database based on the feature set to obtain a first retrieval result; and performing weighted fusion on the first retrieval result according to the difference weights, and outputting the weighted-fused second retrieval result. Difference weights and a feature set are extracted from a plurality of samples of the retrieval target, retrieval is performed based on the feature set, and the difference weights are weighted into the retrieval result; because a plurality of samples of a single target are fused for retrieval, the retrieval result is more intuitive, and both precision and recall are higher.

Description

Single-target multi-image joint retrieval method and device
Technical Field
The invention relates to the field of image processing, and in particular to a method and a device for single-target multi-image joint retrieval.
Background
Image retrieval is widely used in video surveillance, intelligent analysis, pattern recognition and similar fields. Conventional image retrieval generally extracts features from a single query image and searches the database with them. In a video surveillance scene, the volume of video and image data is huge, and the images of a given target held by the user may be numerous, captured under various lighting conditions and from various angles. Common practice is for the user to search with them one by one and inspect each retrieval result separately.
The existing retrieval schemes cannot fuse the results of multiple query images of one target, so recall for the same target under different illumination conditions and viewing angles is low.
Disclosure of Invention
The invention aims to provide a method and a device for single-target multi-image joint retrieval, in which difference weights and a feature set are extracted from a plurality of samples, retrieval is performed based on the feature set, and the difference weights are weighted into the retrieval result; a plurality of samples of a single target are fused for retrieval, so that the retrieval result is more intuitive and both precision and recall are higher.
To realize this purpose, the following technical scheme is adopted:
In one aspect, a method for single-target multi-image joint retrieval is adopted, comprising the following steps:
receiving a plurality of images containing the same retrieval target;
generating a difference weight for each of the images;
fusing the features of the retrieval target in the plurality of images according to the difference weights to generate a feature set;
retrieving in an image database based on the feature set to obtain a first retrieval result;
and performing weighted fusion on the first retrieval result according to the difference weights, and outputting the weighted-fused second retrieval result.
In another aspect, an apparatus for single-target multi-image joint retrieval is adopted, including:
a retrieval target receiving unit for receiving a plurality of images including the same retrieval target;
the difference weight calculation unit is used for respectively generating difference weights of the images;
a feature set generating unit configured to fuse features of the search targets in the plurality of images according to the difference weights to generate a feature set;
the primary retrieval unit is used for retrieving in an image database based on the feature set to obtain a first retrieval result;
and the weighted fusion unit is used for performing weighted fusion on the first retrieval result according to the difference weight and outputting a weighted-fused second retrieval result.
The invention has the beneficial effects that difference weights and a feature set are extracted from a plurality of samples of the retrieval target, retrieval is performed based on the feature set, and the difference weights are weighted into the retrieval result; because a plurality of samples of a single target are fused for retrieval, the retrieval result is more intuitive and both precision and recall are higher.
Drawings
Fig. 1 is a flowchart of a first embodiment of the single-target multi-image joint retrieval method provided by an embodiment of the present invention.
Fig. 2 is a flowchart of a second embodiment of the single-target multi-image joint retrieval method provided by an embodiment of the present invention.
Fig. 3 is a block diagram of a first embodiment of the single-target multi-image joint retrieval apparatus provided by an embodiment of the present invention.
Fig. 4 is a block diagram of a second embodiment of the single-target multi-image joint retrieval apparatus provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
Please refer to fig. 1, a flowchart of the first embodiment of the single-target multi-image joint retrieval method of the present invention. The method of this embodiment is mainly used for image retrieval in various videos, in particular surveillance videos. As shown, the method includes:
step S101: a plurality of images including the same retrieval target are received.
Features extracted from a single image are not sufficient to describe the retrieval target fully: when the environment of the target or its local appearance changes, retrieval based on one image may be incomplete or inaccurate. In this scheme, a plurality of images of the same retrieval target are received, the features of the target are extracted comprehensively from all of them, and retrieval is conducted on this multi-view feature basis, improving the completeness of the retrieval.
Step S102: and respectively generating the difference weights of the plurality of images.
The difference (or similarity) between the images is obtained with an image retrieval method; this is equivalent to performing image retrieval on a small scale among the samples carried in the multiple images.
Step S103: and fusing the features of the retrieval targets in the plurality of images according to the difference weights to generate a feature set.
Feature extraction and feature fusion over the multiple samples of the single target generate features that describe the target more fully.
Step S104: and searching in an image database based on the feature set to obtain a first search result.
The fused feature set is used to search the whole image database for images related to the retrieval target. The retrieval method is the same as that used in step S102; image retrieval itself can be realized in many ways in the prior art and is not further described here.
Step S105: and performing weighted fusion on the first retrieval result according to the difference weight, and outputting a weighted and fused second retrieval result.
After the first retrieval result is weighted and fused with the difference weights, the accuracy of the retrieval result can be judged across multiple feature angles, giving a more comprehensive result.
In summary, difference weights and a feature set are extracted from the multiple samples of the retrieval target, retrieval is performed based on the feature set, and the difference weights are weighted into the retrieval result; because multiple samples of a single target are fused for retrieval, the retrieval result is more intuitive and both precision and recall are higher.
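The five steps above can be sketched end to end as a toy example. The sketch below is illustrative only, not the patented implementation: cosine similarity over global feature vectors stands in for the retrieval engine, the feature-set fusion is reduced to per-sample retrieval followed by weighted score fusion, and all names (`joint_retrieve`, `queries`, `database`) are hypothetical.

```python
import numpy as np

def joint_retrieve(queries, database):
    """Toy sketch of steps S101-S105.
    queries:  (k, d) one feature vector per query image of the same target
    database: (N, d) feature vectors of the image database
    Returns database indices ranked best-first by the fused (second) result."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sim = q @ q.T                                   # S102: small-scale retrieval among the samples
    k = len(q)
    r = (sim.sum(axis=1) - np.diag(sim)) / (k - 1)  # average similarity r_i to the other samples
    t = 1.0 - r                                     # average difference t_i
    w = t / np.linalg.norm(t)                       # L2-normalized difference weights w_i
    first = q @ db.T                                # S104: first retrieval result, one row per sample
    second = w @ first                              # S105: weighted fusion of the first result
    return np.argsort(-second)                      # ranked high to low
```

With two complementary query views, the fused ranking favors the database image that matches both views rather than either one alone.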
Please refer to fig. 2, a flowchart of the second embodiment of the single-target multi-image joint retrieval method of the present invention. As shown in the figure, the method includes:
step S201: a plurality of images including the same retrieval target are received.
The multiple images should preferably differ from one another, for example images of a vehicle acquired from different angles, under different illumination and against different backgrounds.
Step S202: and extracting the local features of the retrieval target, and carrying out compression coding on the local features to generate a target model.
The local features extracted from the input target-region images include, but are not limited to, SIFT and SURF. The extracted local features are compression-coded with an offline-trained model, using methods including but not limited to bag of visual words (BoW), Hamming embedding, locality-sensitive hashing and CDVS (Compact Descriptors for Visual Search), to generate a target model.
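As a minimal sketch of this coding step, assuming the local descriptors (e.g. SIFT or SURF) have already been extracted and a codebook of visual words has been trained offline, a bag-of-visual-words encoding can look like the following; BoW is only one of the codings the patent lists, and the function and argument names are hypothetical.

```python
import numpy as np

def bow_encode(descriptors, codebook):
    """descriptors: (n, d) local descriptors of one target region;
    codebook: (m, d) visual words from offline training.
    Returns an L2-normalized bag-of-visual-words histogram as a compact code."""
    # squared Euclidean distance from every descriptor to every visual word
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                 # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (np.linalg.norm(hist) or 1.0)
```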
Step S203: retrieving among the target models and returning the similarities of every pair of target models, ranked from high to low; the similarity is the similarity truncated by a preset similarity threshold.
During retrieval, bag of visual words (BoW), Hamming embedding, locality-sensitive hashing and CDVS methods are used, and the image similarities truncated at the specified similarity threshold are returned, ranked from high to low.
Step S204: and calculating the average similarity of each target model and other target models in the similarity ranking.
The average similarity is

$r_i = \frac{1}{k-1} \sum_{0 \le j < k,\, j \ne i} s_{i,j},$

with $0 \le i, j < k$ and $i \ne j$.
Step S205: and calculating an average difference according to the average similarity, and performing L2 normalization on the average difference to obtain a difference weight.
The average difference is $t_i = 1 - r_i$, $0 \le i < k$;
The L2 normalization specifically includes:

$\mathrm{sum} = \sqrt{\sum_{i=0}^{k-1} t_i^2}, \qquad w_i = t_i / \mathrm{sum}, \quad 0 \le i < k;$

where k denotes the total number of samples of the retrieval target, $s_{i,j}$ the similarity between sample i and sample j, and $w_i$ the difference weight.
This describes the differences among the multiple input samples of the same retrieval target. The similarity of each sample to the other samples is computed with an image retrieval method: a sample with high similarity to the others receives a small difference weight, and a sample with low similarity receives a large one.
Suppose k (k > 1) sample images are given for a single target. An image retrieval algorithm computes, for each sample, its similarity to the other samples in the set, yielding the pairwise similarities $s_{i,j}$ ($0 \le i, j < k$, $i \ne j$), from which the average similarity $r_i$ to the other samples is obtained.
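Steps S203–S205 follow directly from these formulas. In the sketch below, the pairwise similarity matrix is taken as given (it would come from the small-scale retrieval among the samples), and the function name is hypothetical.

```python
import numpy as np

def difference_weights(sim):
    """sim: (k, k) pairwise similarities s_{i,j} among the k query samples.
    Returns the L2-normalized difference weights w_i."""
    k = sim.shape[0]
    # average similarity r_i of sample i to the other k-1 samples (j != i)
    r = (sim.sum(axis=1) - np.diag(sim)) / (k - 1)
    t = 1.0 - r                         # average difference t_i = 1 - r_i
    return t / np.sqrt((t ** 2).sum())  # w_i = t_i / sqrt(sum_i t_i^2)
```

A sample that resembles the others closely ends up with a small weight; an outlier view ends up with a large one.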
Step S206: and obtaining feature word bag centers of all features of the retrieval target through off-line training.
The offline training methods include k-means and GMM (Gaussian mixture model).
Step S207: and respectively calculating the membership states of the feature and the feature bag center.
For each feature, the nearest feature bag-of-words center under Euclidean distance is computed, and membership is recorded according to these nearest-neighbor assignments.
Step S208: and removing redundant information at the center of the feature word bag according to the distance between the features to obtain a feature set.
The distance between features describes their similarity: the larger the distance, the smaller the similarity; the smaller the distance, the larger the similarity; a distance of 0 means the two features are identical. For every pair of features assigned to the same bag-of-words center, the distance is computed; all features whose distance is below a specified threshold are mean-fused, and the fused feature replaces the original features.
The bag-of-words center to which a feature belongs is computed as

$\mathrm{id}(f_i) = \arg\min_{0 \le p < m} \sum_{j=0}^{d-1} (f_{i,j} - c_{p,j})^2;$

the distance between two features $f_p$ and $f_q$ is computed as

$\mathrm{dist}(f_p, f_q) = \sqrt{\sum_{j=0}^{d-1} (f_{p,j} - f_{q,j})^2};$

and the mean fusion of the features is computed as

$\bar{f}_j = \frac{1}{n_{thd,i}} \sum_{p:\, \mathrm{dist} < thd} f_{p,j}, \qquad 0 \le j < d;$

where $f_{i,j}$ ($0 \le i < n$, $0 \le j < d$) denotes a feature component and $c_{i,j}$ ($0 \le i < m$, $0 \le j < d$) a bag-of-words center component; k denotes the total number of samples of the retrieval target; n the total number of features; m the number of feature bag-of-words centers; d the dimension of the features; and $n_{thd,i}$ the number of features whose distance is below the specified threshold.
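Steps S206–S208 can be sketched compactly, assuming the bag-of-words centers come from offline k-means training. The function and parameter names are hypothetical, and the greedy grouping below is one plausible reading of "mean-fuse all features whose distance is below a specified threshold".

```python
import numpy as np

def deduplicate_features(feats, centers, thd):
    """feats: (n, d) local features; centers: (m, d) bag-of-words centers.
    Assigns each feature to its nearest center (Euclidean), then mean-fuses
    features of the same center whose distance falls below thd."""
    # S207: membership = nearest bag-of-words center for every feature
    d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    ids = d2.argmin(axis=1)
    fused = []
    # S208: within each center, replace near-duplicate features by their mean
    for c in range(centers.shape[0]):
        group = feats[ids == c]
        used = np.zeros(len(group), dtype=bool)
        for i in range(len(group)):
            if used[i]:
                continue
            close = np.linalg.norm(group - group[i], axis=1) < thd
            close &= ~used
            used |= close
            fused.append(group[close].mean(axis=0))
    return np.array(fused)
```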
Step S209: and searching in an image database based on the feature set to obtain a first search result.
Step S210: and performing weighted fusion on the first retrieval result according to the difference weight, and outputting the sequence of the similarity from high to low after weighted fusion.
The weighted-fused similarity is specifically computed as

$T_j = \sum_{i=0}^{k-1} w_i \, t_{i,j}, \qquad 0 \le j < l;$

where $w_i$ denotes a difference weight, $t_{i,j}$ the similarity between image j in the first retrieval result and sample i of the retrieval target, k the total number of samples of the retrieval target, and l the total number of images in the first retrieval result.
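The weighted fusion of step S210 is a single weighted sum per database image. A sketch under the symbol names above (the function name is hypothetical):

```python
import numpy as np

def fuse_results(w, t):
    """w: (k,) difference weights; t: (k, l) similarities t_{i,j} between the
    k query samples and the l images of the first retrieval result.
    Returns (fused scores T_j, indices ranked from high to low)."""
    scores = w @ t                      # T_j = sum_i w_i * t_{i,j}
    return scores, np.argsort(-scores)  # rank the second result high to low
```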
In summary, difference weights and a feature set are extracted from the multiple samples of the retrieval target, retrieval is performed based on the feature set, and the difference weights are weighted into the retrieval result; because multiple samples of a single target are fused for retrieval, the retrieval result is more intuitive and both precision and recall are higher. Meanwhile, constraining the data processing at each stage improves the running efficiency of the scheme and raises the precision and recall of image retrieval.
The following are embodiments of the single-target multi-image joint retrieval apparatus provided by the present invention. The apparatus embodiments are implemented on the basis of the method embodiments above; for details not described in the apparatus embodiments, please refer to those method embodiments.
Referring to fig. 3, it is a block diagram illustrating a first embodiment of an apparatus for single-target multi-image joint search according to an embodiment of the present invention, as shown in the drawing, the apparatus includes:
a retrieval target receiving unit 310 configured to receive a plurality of images including the same retrieval target;
a difference weight calculating unit 320, configured to generate difference weights for the multiple images respectively;
a feature set generating unit 330, configured to fuse features of the search targets in the multiple images according to the difference weights to generate a feature set;
a primary retrieving unit 340, configured to retrieve, based on the feature set, in an image database to obtain a first retrieval result;
and a weighted fusion unit 350, configured to perform weighted fusion on the first search result according to the difference weight, and output a weighted-fused second search result.
In summary, through the cooperation of the above units, difference weights and a feature set are extracted from the multiple samples of the retrieval target, retrieval is performed based on the feature set, and the difference weights are weighted into the retrieval result; because multiple samples of a single target are fused for retrieval, the retrieval result is more intuitive and both precision and recall are higher.
Referring to fig. 4, it is a block diagram illustrating a second embodiment of an apparatus for single-target multi-image joint search according to an embodiment of the present invention, as shown in the drawing, the apparatus includes:
a retrieval target receiving unit 310 configured to receive a plurality of images including the same retrieval target;
a difference weight calculating unit 320, configured to generate difference weights for the multiple images respectively;
a feature set generating unit 330, configured to fuse features of the search targets in the multiple images according to the difference weights to generate a feature set;
a primary retrieving unit 340, configured to retrieve, based on the feature set, in an image database to obtain a first retrieval result;
and a weighted fusion unit 350, configured to perform weighted fusion on the first search result according to the difference weight, and output a weighted-fused second search result.
Wherein the difference weight calculating unit 320 includes:
a target model generating module 321, configured to extract local features of the search target, perform compression coding on the local features, and generate a target model;
a similarity truncation module 322, configured to retrieve the target models and return a ranking of similarity of any two target models from high to low; the similarity is the similarity cut off by a preset similarity threshold;
a similarity accumulation module 323, configured to calculate an average similarity between each target model and other target models in the similarity ranking;
a difference normalization module 324, configured to calculate an average difference according to the average similarity, and perform L2 normalization on the average difference to obtain a difference weight.
Wherein the feature set generating unit 330 includes:
the offline training module 331 is configured to perform offline training to obtain feature word bag centers of all features of the search target;
a membership state judging module 332, configured to calculate membership states of the feature and the center of the feature word bag, respectively;
and the redundant eliminating module 333 is used for eliminating the redundant information at the center of the feature bag according to the distance between the features to obtain a feature set.
The weighted fusion unit 350 is specifically configured to:
and performing weighted fusion on the first retrieval result by using the difference weight, and outputting the sequence of the similarity from high to low after weighted fusion.
Wherein the average similarity is

$r_i = \frac{1}{k-1} \sum_{0 \le j < k,\, j \ne i} s_{i,j},$

with $0 \le i, j < k$ and $i \ne j$;

the average difference is $t_i = 1 - r_i$, $0 \le i < k$;

the L2 normalization specifically includes:

$\mathrm{sum} = \sqrt{\sum_{i=0}^{k-1} t_i^2}, \qquad w_i = t_i / \mathrm{sum}, \quad 0 \le i < k;$

where k denotes the total number of samples of the retrieval target, $s_{i,j}$ the similarity between sample i and sample j, and $w_i$ the difference weight.
The offline training methods include k-means and GMM;
the membership status determining module 332 is specifically configured to:
compute, for each feature, the nearest feature bag-of-words center under Euclidean distance, and record membership from these nearest-neighbor assignments;
the redundancy elimination module 333 is specifically configured to:
compute the distance between every pair of features assigned to the same bag-of-words center, mean-fuse all features whose distance is below a specified threshold, and replace the original features with the fused feature.
Wherein the bag-of-words center to which a feature belongs is computed as

$\mathrm{id}(f_i) = \arg\min_{0 \le p < m} \sum_{j=0}^{d-1} (f_{i,j} - c_{p,j})^2;$

the distance between two features is computed as

$\mathrm{dist}(f_p, f_q) = \sqrt{\sum_{j=0}^{d-1} (f_{p,j} - f_{q,j})^2};$

the mean fusion of the features is computed as

$\bar{f}_j = \frac{1}{n_{thd,i}} \sum_{p:\, \mathrm{dist} < thd} f_{p,j}, \qquad 0 \le j < d;$

where $f_{i,j}$ ($0 \le i < n$, $0 \le j < d$) denotes a feature component and $c_{i,j}$ ($0 \le i < m$, $0 \le j < d$) a bag-of-words center component; k denotes the total number of samples of the retrieval target; n the total number of features; m the number of feature bag-of-words centers; d the dimension of the features; and $n_{thd,i}$ the number of features whose distance is below the specified threshold.
Wherein the weighted-fused similarity is specifically computed as

$T_j = \sum_{i=0}^{k-1} w_i \, t_{i,j}, \qquad 0 \le j < l;$

where $w_i$ denotes a difference weight, $t_{i,j}$ the similarity between image j in the first retrieval result and sample i of the retrieval target, k the total number of samples of the retrieval target, and l the total number of images in the first retrieval result.
In summary, through the cooperation of the above functional modules, difference weights and a feature set are extracted from the multiple samples of the retrieval target, retrieval is performed based on the feature set, and the difference weights are weighted into the retrieval result; because multiple samples of a single target are fused for retrieval, the retrieval result is more intuitive and both precision and recall are higher. Meanwhile, constraining the data processing at each stage improves the running efficiency of the scheme and raises the precision and recall of image retrieval.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.
Although the embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and scope of the invention.

Claims (14)

1. A method for single-target multi-image joint retrieval is characterized by comprising the following steps:
receiving a plurality of images comprising the same retrieval target;
respectively generating difference weights for the plurality of images;
fusing the features of the retrieval targets in the multiple images according to the difference weights to generate a feature set;
retrieving in an image database based on the feature set to obtain a first retrieval result;
and performing weighted fusion on the first retrieval result according to the difference weight, and outputting a weighted and fused second retrieval result.
2. The method of claim 1, wherein the separately generating the differential weights for the plurality of images comprises:
extracting local features of the retrieval target, and carrying out compression coding on the local features to generate a target model;
retrieving the target models and returning the similarity of any two target models in a sequence from high to low; the similarity is the similarity cut off by a preset similarity threshold;
calculating the average similarity of each target model and other target models in the similarity sequence;
and calculating an average difference according to the average similarity, and performing L2 normalization on the average difference to obtain a difference weight.
3. The method according to claim 1, wherein the fusing the features of the retrieval target in the plurality of images according to the difference weights to generate a feature set comprises:
off-line training to obtain the bag centers of the feature words of all the features of the retrieval target;
respectively calculating membership states of the characteristic and the characteristic word bag center;
and removing redundant information at the center of the feature word bag according to the distance between the features to obtain a feature set.
4. The method according to claim 1, wherein the performing weighted fusion on the first search result according to the difference weight and outputting a weighted-fused second search result comprises:
and performing weighted fusion on the first retrieval result by using the difference weight, and outputting the sequence of the similarity from high to low after weighted fusion.
5. The method of claim 2, wherein the average similarity is

$r_i = \frac{1}{k-1} \sum_{0 \le j < k,\, j \ne i} s_{i,j},$

with $0 \le i, j < k$ and $i \ne j$;

the average difference is $t_i = 1 - r_i$, $0 \le i < k$;

the L2 normalization specifically includes:

$\mathrm{sum} = \sqrt{\sum_{i=0}^{k-1} t_i^2}, \qquad w_i = t_i / \mathrm{sum}, \quad 0 \le i < k;$

where k denotes the total number of samples of the retrieval target, $s_{i,j}$ the similarity between sample i and sample j, and $w_i$ the difference weight.
6. The method of claim 3, wherein the offline training modes include kmeans and GMMs;
the respectively calculating the membership states of the feature and the feature bag center comprises the following steps:
respectively calculating nearest neighbors of Euclidean distances between the features and the feature word bag center, and counting membership conditions according to the nearest neighbors;
the removing of the redundant information of the center of the feature word bag according to the distance between the features to obtain a feature set comprises the following steps:
calculating the distance between every two characteristics belonging to the same characteristic bag center, performing mean value fusion on all characteristics with the distance smaller than a specified threshold value, and replacing the original characteristics with the fused characteristics;
wherein the bag-of-words center to which a feature belongs is computed as

$\mathrm{id}(f_i) = \arg\min_{0 \le p < m} \sum_{j=0}^{d-1} (f_{i,j} - c_{p,j})^2;$

the distance between two features is computed as

$\mathrm{dist}(f_p, f_q) = \sqrt{\sum_{j=0}^{d-1} (f_{p,j} - f_{q,j})^2};$

the mean fusion of the features is computed as

$\bar{f}_j = \frac{1}{n_{thd,i}} \sum_{p:\, \mathrm{dist} < thd} f_{p,j}, \qquad 0 \le j < d;$

where $f_{i,j}$ ($0 \le i < n$, $0 \le j < d$) denotes a feature component and $c_{i,j}$ ($0 \le i < m$, $0 \le j < d$) a bag-of-words center component; k denotes the total number of samples of the retrieval target; n the total number of features; m the number of feature bag-of-words centers; d the dimension of the features; and $n_{thd,i}$ the number of features whose distance is below the specified threshold.
7. The method according to claim 1, wherein the weighted-fused similarity is computed as

$T_j = \sum_{i=0}^{k-1} w_i \, t_{i,j}, \qquad 0 \le j < l;$

where $w_i$ denotes a difference weight, $t_{i,j}$ the similarity between image j in the first retrieval result and sample i of the retrieval target, k the total number of samples of the retrieval target, and l the total number of images in the first retrieval result.
8. An apparatus for single-target multi-image joint retrieval, comprising:
a retrieval target receiving unit for receiving a plurality of images including the same retrieval target;
the difference weight calculation unit is used for respectively generating difference weights of the images;
a feature set generating unit configured to fuse features of the search targets in the plurality of images according to the difference weights to generate a feature set;
the primary retrieval unit is used for retrieving in an image database based on the feature set to obtain a first retrieval result;
and the weighted fusion unit is used for performing weighted fusion on the first retrieval result according to the difference weight and outputting a weighted-fused second retrieval result.
9. The apparatus of claim 8, wherein the differential weight calculation unit comprises:
the target model generation module is used for extracting the local features of the retrieval target, carrying out compression coding on the local features and generating a target model;
the similarity truncation module is used for retrieving among the target models and returning the pairwise similarities of any two target models sorted from high to low; the similarities are truncated by a preset similarity threshold;
the similarity accumulation module is used for calculating the average similarity of each target model and other target models in the similarity sequence;
and the difference degree normalization module is used for calculating the average degree of difference according to the average similarity, and carrying out L2 normalization on the average degree of difference to obtain the difference weights.
10. The apparatus of claim 8, wherein the feature set generating unit comprises:
the off-line training module is used for off-line training to obtain the bag-of-words centers of all the features of the retrieval target;
the membership judging module is used for calculating the membership state of each feature with respect to the bag-of-words centers;
and the redundancy eliminating module is used for eliminating redundant information at the bag-of-words centers according to the distances between the features, to obtain the feature set.
11. The apparatus according to claim 9, wherein the weighted fusion unit is specifically configured to:
and performing weighted fusion on the first retrieval result by using the difference weight, and outputting the sequence of the similarity from high to low after weighted fusion.
12. The apparatus of claim 9, wherein the average similarity is:
r_i = ( Σ_{j=0, j≠i}^{k-1} s_{i,j} ) / (k - 1), 0 ≤ i < k;
the average degree of difference is:
t_i = 1 - r_i, 0 ≤ i < k;
the L2 normalization specifically comprises:
sum = sqrt( Σ_{i=0}^{k-1} t_i² );
w_i = t_i / sum, 0 ≤ i < k;
wherein k represents the total number of samples of the retrieval target, s_{i,j} represents the similarity of sample i and sample j, and w_i represents the difference weight.
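The average-similarity, average-difference, and L2-normalization steps can be sketched as follows; this is a minimal NumPy illustration, and the function name `difference_weights` is hypothetical:

```python
import numpy as np

def difference_weights(s):
    # s: k x k pairwise similarity matrix of the k target-model samples
    s = np.asarray(s, dtype=float)
    k = s.shape[0]
    # average similarity r_i of each sample to the other k-1 samples
    r = (s.sum(axis=1) - np.diag(s)) / (k - 1)
    t = 1.0 - r                       # average degree of difference t_i
    sum_l2 = np.sqrt(np.sum(t ** 2))  # L2 norm of the difference vector
    return t / sum_l2                 # difference weights w_i
```

By construction the resulting weight vector has unit L2 norm, and samples that look less like the others (lower average similarity) receive larger weights.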
13. The apparatus of claim 10, wherein the off-line training modes comprise k-means and GMM;
the membership state judgment module is specifically configured to:
calculating, for each feature, the nearest neighbor among the bag-of-words centers by Euclidean distance, and counting the membership according to the nearest neighbors;
the redundancy eliminating module is specifically used for:
calculating the distance between every two features belonging to the same bag-of-words center, performing mean value fusion on all features whose distance is smaller than a specified threshold, and replacing the original features with the fused feature;
wherein the calculation formula of the bag-of-words center of a feature is as follows:
c*(f_a) = argmin_{0≤i<m} sqrt( Σ_{j=0}^{d-1} (f_{a,j} - c_{i,j})² );
the calculation formula of the distance between the features is as follows:
dist(f_a, f_b) = sqrt( Σ_{j=0}^{d-1} (f_{a,j} - f_{b,j})² );
the calculation formula of the mean value fusion of the features is as follows:
f'_j = (1 / n_{thd,i}) · Σ_i f_{i,j}, the sum running over the n_{thd,i} features whose distance is less than the specified threshold;
wherein f_{i,j} (0 ≤ i < n, 0 ≤ j < d) represents a feature, and c_{i,j} (0 ≤ i < m, 0 ≤ j < d) represents a bag-of-words center; k represents the total number of samples of the retrieval target; n represents the total number of features; m represents the number of bag-of-words centers; d represents the dimension of the features; n_{thd,i} represents the number of features whose distance is less than the specified threshold.
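The feature-set construction in claim 13 — nearest-center membership assignment followed by redundancy elimination — can be sketched as below. This is a minimal NumPy sketch under stated assumptions (a greedy fusion pass per center group); the names `assign_to_centers` and `build_feature_set` are hypothetical:

```python
import numpy as np

def assign_to_centers(features, centers):
    # nearest bag-of-words center (by Euclidean distance) for each feature
    f = np.asarray(features, dtype=float)   # shape (n, d)
    c = np.asarray(centers, dtype=float)    # shape (m, d)
    d2 = ((f[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    return d2.argmin(axis=1)                # membership index of each feature

def build_feature_set(features, centers, threshold):
    # group features by their nearest center, then mean-fuse the features
    # in each group that lie within `threshold` of each other
    f = np.asarray(features, dtype=float)
    member = assign_to_centers(f, centers)
    feature_set = []
    for ci in np.unique(member):
        group = f[member == ci]
        used = np.zeros(len(group), dtype=bool)
        for i in range(len(group)):
            if used[i]:
                continue
            # greedy pass: gather the features closer than the threshold
            near = [j for j in range(len(group)) if not used[j]
                    and np.linalg.norm(group[i] - group[j]) < threshold]
            used[near] = True
            feature_set.append(group[near].mean(axis=0))
    return np.array(feature_set)
```

An isolated feature is "fused" only with itself and passes through unchanged, so the output is never larger than the input.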
14. The apparatus according to claim 8, wherein the weighted-fused similarity is calculated as:
T_j = Σ_{i=0}^{k-1} w_i · t_{i,j}, 0 ≤ j < l;
wherein w_i represents a difference weight, t_{i,j} represents the similarity between image j in the first retrieval result and retrieval-target sample i, k represents the total number of samples of the retrieval target, and l represents the total number of images in the first retrieval result.
CN201510672504.2A 2015-09-24 2015-10-16 Single-target multi-image joint retrieval method and device Active CN106557533B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2015106167309 2015-09-24
CN201510616730 2015-09-24

Publications (2)

Publication Number Publication Date
CN106557533A CN106557533A (en) 2017-04-05
CN106557533B true CN106557533B (en) 2020-03-06

Family

ID=58417898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510672504.2A Active CN106557533B (en) 2015-09-24 2015-10-16 Single-target multi-image joint retrieval method and device

Country Status (1)

Country Link
CN (1) CN106557533B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083732B (en) * 2019-03-12 2021-08-31 浙江大华技术股份有限公司 Picture retrieval method and device and computer storage medium
CN111815284B (en) * 2020-07-14 2022-10-21 广东冶建施工图审查中心有限公司 Engineering flow building system based on data analysis

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103793721A (en) * 2014-03-04 2014-05-14 武汉大学 Pedestrian repeat recognition method and system based on area related feedback
CN103810252A (en) * 2014-01-21 2014-05-21 南京信息工程大学 Image retrieval method based on group sparse feature selection
CN104281572A (en) * 2013-07-01 2015-01-14 中国科学院计算技术研究所 Target matching method and system based on mutual information
CN104298758A (en) * 2014-10-22 2015-01-21 天津大学 Multi-perspective target retrieval method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
EP2897059A4 (en) * 2012-09-13 2016-07-06 Ntt Docomo Inc User interface device, search method, and program

Also Published As

Publication number Publication date
CN106557533A (en) 2017-04-05

Similar Documents

Publication Publication Date Title
CN109815364B (en) Method and system for extracting, storing and retrieving mass video features
Li et al. Contextual bag-of-words for visual categorization
CN104035949B (en) Similarity data retrieval method based on locality sensitive hashing (LASH) improved algorithm
Zheng et al. L_p-Norm IDF for Scalable Image Retrieval
Zhou et al. Towards codebook-free: Scalable cascaded hashing for mobile image search
CN102254015B (en) Image retrieval method based on visual phrases
US20140193077A1 (en) Image retrieval apparatus, image retrieval method, query image providing apparatus, query image providing method, and program
CN109635686B (en) Two-stage pedestrian searching method combining human face and appearance
CN108596010B (en) Implementation method of pedestrian re-identification system
KR20130142191A (en) Robust feature matching for visual search
CN104794219A (en) Scene retrieval method based on geographical position information
CN111507350B (en) Text recognition method and device
CN110503643B (en) Target detection method and device based on multi-scale rapid scene retrieval
CN111460961A (en) CDVS-based similarity graph clustering static video summarization method
CN110046568B (en) Video action recognition method based on time perception structure
CN115062186B (en) Video content retrieval method, device, equipment and storage medium
TW202109312A (en) Image feature extraction method, network training method, electronic device and computer readable storage medium
CN109241315B (en) Rapid face retrieval method based on deep learning
CN106557533B (en) Single-target multi-image joint retrieval method and device
CN104778272B (en) A kind of picture position method of estimation excavated based on region with space encoding
CN108764258B (en) Optimal image set selection method for group image insertion
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
CN109918529A (en) A kind of image search method based on the quantization of tree-like cluster vectors
CN113705310A (en) Feature learning method, target object identification method and corresponding device
CN104699783A (en) Social image searching method allowing adaptive adjustment and based on personalized vision dictionary

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant