CN112287140A - Image retrieval method and system based on big data - Google Patents

Image retrieval method and system based on big data Download PDF

Info

Publication number
CN112287140A
CN112287140A (application CN202011173216.XA)
Authority
CN
China
Prior art keywords
image
data
information
image data
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011173216.XA
Other languages
Chinese (zh)
Inventor
汪礼君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202011173216.XA priority Critical patent/CN112287140A/en
Publication of CN112287140A publication Critical patent/CN112287140A/en
Withdrawn legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of image retrieval and discloses an image retrieval method based on big data, which comprises the following steps: acquiring massive image data and performing distributed storage of the massive image data; performing graying and gray-stretch preprocessing on the stored massive image data; processing the preprocessed image data with a self-encoder-based multi-label semantic extraction algorithm to obtain multi-label semantic information of the images; establishing an image information connection graph according to the multi-label semantic information of the images; and storing the image data combined with the image information connection graph by using a data storage method based on depth hash, so that the hash code value serves as the image feature index, and performing image retrieval according to the image feature index. The invention also provides an image retrieval system based on big data, thereby realizing efficient image retrieval.

Description

Image retrieval method and system based on big data
Technical Field
The invention relates to the technical field of image retrieval, in particular to an image retrieval method and system based on big data.
Background
In the internet era, instant messaging software, office software, shopping platforms, game platforms and the like have greatly facilitated and enriched people's study, life and work, while at the same time generating massive amounts of multi-class, heterogeneous and unstructured data. Because images are intuitive and information-rich, the volume of image data has grown explosively; this brings great convenience to daily life, but the sheer quantity, uneven quality and complex application scenarios of image data also place higher demands on image retrieval.
Most current search engines retrieve images according to text keywords, which often do not match the true semantics of the images, so retrieval performance suffers. Meanwhile, current image retrieval mainly traverses images in sequence without a good indexing mechanism, which increases the load on the retrieval system; moreover, traditional image retrieval methods mostly build the image index offline at fixed times from existing data, so retrieval of newly added images suffers from poor timeliness.
In view of this, how to extract more accurate image semantic information and index-encode it, so that image retrieval can be performed according to an image index, is a problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The invention provides an image retrieval method based on big data: massive image data are stored in a distributed manner through an HDFS (Hadoop Distributed File System); multi-label semantic information of the image data is extracted using a self-encoder-based multi-label semantic extraction algorithm; an image connection graph is established according to the multi-label semantic information of the images; finally, the image data combined with the connection-graph information is stored using a data storage method based on depth hash, and more efficient image retrieval is performed according to the hash indexes of the images.
In order to achieve the above object, the present invention provides an image retrieval method based on big data, comprising:
acquiring massive image data and performing distributed storage of the massive image data;
performing graying and gray-stretch preprocessing on the stored massive image data;
processing the preprocessed image data with a self-encoder-based multi-label semantic extraction algorithm to obtain multi-label semantic information of the images;
establishing an image information connection graph according to the multi-label semantic information of the images;
and storing the image data combined with the image information connection graph by using a data storage method based on depth hash, so that the hash code value serves as the image feature index, and performing image retrieval according to the image feature index.
Optionally, the performing distributed storage on the mass image data includes:
1) merging every k image data into 1 file, where k is set to 10, and converting the merged file into a byte-code format;
2) creating an image storage table and designing two column families MD (image data) and MI (image info), where the column family MD stores the byte codes of the files and the column family MI stores the image information, which comprises the image id and the image feature index; in a specific embodiment of the invention, the number of partitions is 9 and the RowKey range of each partition is N/(9k), where N is the total number of image data;
3) storing the massive image data into the partitioned image storage table, with the image feature index initially set to blank.
Optionally, the preprocessing of the stored massive image data by graying and gray-level stretching proceeds as follows:
1) obtaining a grayscale image of the stored image by taking, for each pixel, the maximum of its three color components and setting that maximum as the gray value of the pixel; the graying formula is as follows:
G(i,j)=max{R(i,j),G(i,j),B(i,j)}
wherein:
(i, j) is a pixel point in the stored image;
R(i, j), G(i, j) and B(i, j) are respectively the values of the pixel point (i, j) in the R, G and B color channels;
G(i, j) is the gray value of the pixel point (i, j);
2) based on the grayscale image, the gray levels of the image are stretched by means of a piecewise linear transformation (the transformation formula appears only as an equation image, BDA0002747944250000021, and is not reproduced here), wherein:
f(x, y) is the gray value of the grayscale image at pixel (x, y);
MAX_f(x,y) and MIN_f(x,y) are respectively the maximum and minimum gray values of the grayscale image.
Optionally, the processing of the preprocessed image data with the self-encoder-based multi-label semantic extraction algorithm includes:
1) constructing a denoising self-encoder with m layers, which takes the preprocessed image data set X as input and carries out m layers of self-encoding; the self-encoding result of the image data set X is formed from m copies of X, each copy being corrupted by adding random noise δ_i (the corresponding equations appear only as equation images, BDA0002747944250000031 to BDA0002747944250000035, and are not reproduced here), where δ_i is random noise;
2) for a single semantic label h and a single training sample x_i ∈ W, calculating, by means of the KNN algorithm, the sample set knn_s of the k_s nearest neighbours of x_i ∈ W that have the same label and the sample set knn_d of the k_d nearest neighbours of x_i ∈ W that have different labels; repeating this step until all semantic labels have been traversed; all knn_s sets and all knn_d sets together constitute a global geometric matrix L, in which the i-th row corresponds to the i-th training sample x_i: the first k_s entries of the row are the sample set knn_s of the i-th training sample, followed by the k_d entries of the sample set knn_d of the i-th training sample;
3) for all semantic labels, obtaining a characterization matrix L_g of the global geometric matrix L by fusing the influence of the multiple semantic labels on the geometric structure among the image samples (the fusion formula appears only as an equation image, BDA0002747944250000036, and is not reproduced here), wherein:
Y is the semantic label set, and |Y| is the number of semantic labels;
L is the global geometric matrix;
4) solving for the set of eigenvectors of the characterization matrix L_g corresponding to its r smallest non-zero eigenvalues; the feature space formed by this set is the reduced feature space, whose sample dimension is r, where r is the number of image data; each eigenvector is the multi-label semantic information of the corresponding image data.
Optionally, the step of establishing an image information connection graph according to the multi-label semantic information of the image is as follows:
The representation of the connection graph is G = (V, E), where V = {v_1, v_2, …, v_N} is the set of vertices and E is the set of edges; a node in the graph may be represented as a triplet (v, c_v, f_v), where v is a node identifier representing an image; each node is associated with a self-encoding result c_v and multi-label semantic information f_v.
The construction process of the image information connection graph is as follows: the Euclidean distances between the self-encoding results of the images are calculated, and each image is connected to its k nearest images according to these Euclidean distances.
Optionally, the storing the image data combined with the image information connection map by using a data storage method based on depth hash includes:
According to the image information connection graph, s_ij is used to indicate the connection information between two images: s_ij = 1 means that an information connection exists between the two images, and s_ij = 0 means that no information connection exists between them.
The sign-approximating function tanh(·) is used to hash-encode s_ij together with the multi-label semantic information of the image, and the hash code value is used as the image feature index; the hash encoding formula is:
h_i = tanh(h(W^T X_i + b_i))
X_i = {s_ij, x_i1, …, x_im}
wherein:
W is a preset vector weight, which the invention sets to 0.2;
b_i is an offset vector, 0 ≤ b_i ≤ 1;
h is a hash function;
{x_i1, …, x_im} is the multi-label semantic information of the i-th image, and m is the dimension of the multi-label semantic information;
s_ij is the connection information between the i-th image and the j-th image; in detail, the j-th image and the i-th image are in the same data storage table;
The loss function of the depth-hash-based data storage method is given as an equation image (BDA0002747944250000041, not reproduced here), wherein:
γ is a hyperparameter, which the invention sets to 0.02;
w_ij represents the weight of each training pair; the invention adjusts the weights with a weight-balance formula given as an equation image (BDA0002747944250000042, not reproduced here);
S denotes the union of S_1 and S_0, where S_1 denotes the set of pairs with s_ij = 1 and S_0 denotes the set of pairs with s_ij = 0;
d(h_i, h_j) = 1 - cos(h_i, h_j) represents the distance between image i and image j.
Optionally, the retrieving the image according to the image feature index includes:
The image x_q to be retrieved is hash-encoded in the same way;
the Hamming distance between the hash code of x_q and each image feature index is then calculated, and the Hamming distance results are sorted; the Hamming distance formula is given as an equation image (BDA0002747944250000043, not reproduced here) and counts the bit positions in which the two hash codes differ, wherein:
h_q represents the hash code of the image to be retrieved;
h_j represents the hash code of the j-th image in the database;
the smaller the distance between the two codes, the closer their semantic information; the image retrieval result is obtained quickly by quick-sorting the images according to the Hamming distance.
In addition, to achieve the above object, the present invention also provides an image retrieval system based on big data, the system comprising:
the image data acquisition device is used for acquiring mass image data;
the image processor is used for performing graying and gray-stretch preprocessing on the stored massive image data and for processing the preprocessed image data with the self-encoder-based multi-label semantic extraction algorithm to obtain multi-label semantic information of the images;
and the image retrieval device is used for establishing an image information connection graph according to the multi-label semantic information of the images, storing the image data combined with the image information connection graph by using the depth-hash-based data storage method, taking the hash code value as the image feature index, and retrieving images according to the image feature index.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon image retrieval program instructions executable by one or more processors to implement the steps of the big-data-based image retrieval method described above.
Compared with the prior art, the invention provides an image retrieval method based on big data, which has the following advantages:
Firstly, most traditional image feature extraction approaches are self-encoder-based: feature extraction is performed on the original image data in a self-encoding stage, which gives the extracted feature space strong anti-interference capability, but because the correlation between the labels and the feature space is not considered and the dimensionality of the feature space is not reduced, the learning accuracy and time performance of semantic extraction algorithms built on this basis are limited. The invention therefore provides a self-encoder-based multi-label semantic extraction algorithm: a self-encoder is first used to obtain a robust representation of the attribute space of the data set; the new data space is then combined with the different data labels into several data views, and the geometric relationship among the samples is constructed under each single view, with each edge in a view describing a common attribute of several image samples; next, a manifold space is constructed, via the Laplacian eigenmap method, from the Laplacian matrices corresponding to the data views under the multiple semantic labels; finally, a complete characterization matrix is built by fusing the manifold spaces, and a low-dimensional semantic space is obtained by eigenvalue decomposition of this matrix. Associating the multiple semantic labels with the feature space makes the extracted semantic features contain more accurate multi-label classification information, while reducing the feature dimensionality of the multi-semantic-label data avoids the overfitting problem caused by high-dimensional semantic features.
Meanwhile, the image data combined with the image information connection graph is stored using the depth-hash-based data storage method, so that the hash code value serves as the image feature index and image retrieval is performed according to this index. Because the hash method maps the image features to binary codes, the distribution characteristics of the features in the original space are preserved; at the same time, because the features are represented by binary codes, the feature matching time and the memory cost required for retrieval can be greatly reduced. To perform faster and more accurate matching, the invention uses the sign-approximating function tanh(·) to hash-encode s_ij together with the multi-label semantic information of the image, and uses the hash code value as the image feature index; the hash encoding formula is:
h_i = tanh(h(W^T X_i + b_i))
X_i = {s_ij, x_i1, …, x_im}
wherein: W is a preset vector weight, which the invention sets to 0.2; b_i is an offset vector, 0 ≤ b_i ≤ 1; h is a hash function; {x_i1, …, x_im} is the multi-label semantic information of the i-th image, and m is the dimension of the multi-label semantic information; s_ij is the connection information between the i-th image and the j-th image. According to the connection information between images, the data storage table in which an image is located can be found very quickly, enabling faster queries.
Drawings
Fig. 1 is a schematic flowchart of an image retrieval method based on big data according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an image retrieval system based on big data according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method stores massive image data in a distributed manner through an HDFS (Hadoop Distributed File System), extracts multi-label semantic information of the image data using a self-encoder-based multi-label semantic extraction algorithm, establishes an image connection graph according to the multi-label semantic information of the images, stores the image data combined with the connection-graph information using a depth-hash-based data storage method, and performs more efficient image retrieval according to the hash indexes of the images. Fig. 1 is a schematic diagram illustrating an image retrieval method based on big data according to an embodiment of the present invention.
In this embodiment, the image retrieval method based on big data includes:
and S1, acquiring the mass image data and storing the mass image data in a distributed manner.
Firstly, acquiring mass image data, and performing distributed storage on the mass image data; the distributed storage scheme of the mass data is as follows:
1) merging every k image data into 1 file, where k is set to 10, and converting the merged file into a byte-code format;
2) creating an image storage table and designing two column families MD (image data) and MI (image info), where the column family MD stores the byte codes of the files and the column family MI stores the image information, which comprises the image id and the image feature index; in a specific embodiment of the invention, the number of partitions is 9 and the RowKey range of each partition is N/(9k), where N is the total number of image data;
3) storing the massive image data into the partitioned image storage table, with the image feature index initially set to blank.
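By way of illustration only, the following Python sketch (not part of the original embodiment) shows one way the storage scheme described above could be organised: every k = 10 images are merged into one byte blob, each blob becomes a row with column families MD and MI, and rows are spread over 9 partitions of roughly N/(9k) rows each. The in-memory dictionary stands in for the HDFS/HBase-backed table, and all helper names are hypothetical.

```python
from pathlib import Path

K = 10              # images merged per file (k = 10 in the embodiment)
NUM_PARTITIONS = 9  # number of partitions in the embodiment

def merge_images(paths):
    """Concatenate K image files into one byte blob, recording (id, offset, length)."""
    blob, offsets, pos = bytearray(), [], 0
    for p in paths:
        data = Path(p).read_bytes()
        offsets.append((Path(p).stem, pos, len(data)))
        blob.extend(data)
        pos += len(data)
    return bytes(blob), offsets

def partition_for(row_key, total_rows):
    """Map a row key to one of NUM_PARTITIONS ranges of roughly N/(9k) rows each."""
    rows_per_partition = max(1, total_rows // NUM_PARTITIONS)
    return min(row_key // rows_per_partition, NUM_PARTITIONS - 1)

def store(image_paths):
    """Build an in-memory stand-in for the partitioned image storage table."""
    table = {p: {} for p in range(NUM_PARTITIONS)}                 # partition -> {row_key: row}
    groups = [image_paths[i:i + K] for i in range(0, len(image_paths), K)]
    for row_key, group in enumerate(groups):
        blob, offsets = merge_images(group)
        table[partition_for(row_key, len(groups))][row_key] = {
            "MD:bytes": blob,                                      # column family MD: file byte code
            "MI:images": [{"id": img_id, "feature_index": None}    # column family MI: id + feature index
                          for img_id, _, _ in offsets],            # feature index left blank at ingest time
        }
    return table
```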
S2, performing graying and gray-stretch preprocessing on the stored massive image data.
Further, the invention carries out preprocessing of image graying and gray stretching on the stored massive image data;
in an embodiment of the present invention, the image preprocessing process includes:
1) obtaining a grayscale image of the stored image by taking, for each pixel, the maximum of its three color components and setting that maximum as the gray value of the pixel; the graying formula is as follows:
G(i,j)=max{R(i,j),G(i,j),B(i,j)}
wherein:
(i, j) is a pixel point in the stored image;
R(i, j), G(i, j) and B(i, j) are respectively the values of the pixel point (i, j) in the R, G and B color channels;
G(i, j) is the gray value of the pixel point (i, j);
2) based on the grayscale image, the gray levels of the image are stretched by means of a piecewise linear transformation (the transformation formula appears only as an equation image, BDA0002747944250000071, and is not reproduced here), wherein:
f(x, y) is the gray value of the grayscale image at pixel (x, y);
MAX_f(x,y) and MIN_f(x,y) are respectively the maximum and minimum gray values of the grayscale image.
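As an illustration of the preprocessing step, the sketch below implements the max-channel graying formula G(i, j) = max{R, G, B} and a simple min-max linear stretch; the stretch is only a stand-in, since the exact piecewise linear transformation of the embodiment appears only as an equation image.

```python
import numpy as np

def to_gray_max(rgb):
    """Graying: G(i, j) = max{R(i, j), G(i, j), B(i, j)} over the three colour channels."""
    return rgb.astype(np.float64).max(axis=2)

def stretch_gray(gray, low=0.0, high=255.0):
    """Min-max linear stretch; a stand-in for the embodiment's piecewise linear
    transform, whose exact segment boundaries are not reproduced in the text."""
    g_min, g_max = float(gray.min()), float(gray.max())
    if g_max == g_min:                       # flat image: nothing to stretch
        return np.full_like(gray, low)
    return (gray - g_min) / (g_max - g_min) * (high - low) + low

# usage: gray = stretch_gray(to_gray_max(image))   # image: H x W x 3 array of RGB values
```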
S3, processing the preprocessed image data with the self-encoder-based multi-label semantic extraction algorithm to obtain multi-label semantic information of the images.
Further, the invention utilizes a multi-label semantic extraction algorithm based on a self-encoder to process the preprocessed image data to obtain multi-label semantic information of the image; the flow of the multi-label semantic extraction algorithm based on the self-encoder comprises the following steps:
1) constructing an m-layer denoising self-encoder that takes the preprocessed image data set X as input; through the m layers of self-encoding, a least-squares optimization problem with a globally optimal solution is solved to extract a robust representation of the feature space, which effectively improves the noise resistance of the multi-label data; the self-encoding result of the image data set X is formed from m copies of X, each copy being corrupted by adding random noise δ_i (the corresponding equations appear only as equation images, BDA0002747944250000081 to BDA0002747944250000085, and are not reproduced here), where δ_i is random noise;
2) for a single semantic label h and a single training sample x_i ∈ W, calculating, by means of the KNN algorithm, the sample set knn_s of the k_s nearest neighbours of x_i ∈ W that have the same label and the sample set knn_d of the k_d nearest neighbours of x_i ∈ W that have different labels; repeating this step until all semantic labels have been traversed; all knn_s sets and all knn_d sets together constitute a global geometric matrix L, in which the i-th row corresponds to the i-th training sample x_i: the first k_s entries of the row are the sample set knn_s of the i-th training sample, followed by the k_d entries of the sample set knn_d of the i-th training sample;
3) for all semantic labels, obtaining a characterization matrix L_g of the global geometric matrix L by fusing the influence of the multiple semantic labels on the geometric structure among the image samples (the fusion formula appears only as an equation image, BDA0002747944250000086, and is not reproduced here), wherein:
Y is the semantic label set, and |Y| is the number of semantic labels;
L is the global geometric matrix;
4) solving for the set of eigenvectors of the characterization matrix L_g corresponding to its r smallest non-zero eigenvalues; the feature space formed by this set is the reduced feature space, whose sample dimension is r, where r is the number of image data; each eigenvector is the multi-label semantic information of the corresponding image data.
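The following sketch illustrates, under stated assumptions, the geometric part of this multi-label semantic extraction step: per-label nearest-neighbour graphs over the self-encoded features, a fused Laplacian, and the eigenvectors of the r smallest non-zero eigenvalues as the reduced semantic space. The fusion rule (simple averaging) is an assumption, since the exact fusion formula is not reproduced in the text.

```python
import numpy as np

def multi_label_semantics(Z, labels, k_s=5, k_d=5, r=16):
    """Sketch: Z is the (n, d) matrix of self-encoded image features, labels is the
    (n, L) binary multi-label matrix. For each label, link every sample to its k_s
    same-label and k_d different-label nearest neighbours, fuse the per-label graph
    Laplacians by averaging (assumed), and return the eigenvectors of the r smallest
    non-zero eigenvalues as the reduced multi-label semantic space."""
    n = Z.shape[0]
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)   # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)

    fused = np.zeros((n, n))
    for y in range(labels.shape[1]):                               # one graph per semantic label
        W = np.zeros((n, n))
        same = labels[:, y][:, None] == labels[:, y][None, :]
        for i in range(n):
            d_same = np.where(same[i], dist[i], np.inf)            # same-label candidates
            d_diff = np.where(~same[i], dist[i], np.inf)           # different-label candidates
            for j in np.argsort(d_same)[:k_s]:
                W[i, j] = W[j, i] = 1.0
            for j in np.argsort(d_diff)[:k_d]:
                W[i, j] = W[j, i] = 1.0
        L = np.diag(W.sum(axis=1)) - W                             # graph Laplacian for label y
        fused += L / labels.shape[1]                               # assumed fusion: average

    vals, vecs = np.linalg.eigh(fused)
    nonzero = vals > 1e-8
    order = np.argsort(vals[nonzero])[:r]                          # r smallest non-zero eigenvalues
    return vecs[:, nonzero][:, order]                              # (n, r) multi-label semantics
```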
S4, establishing an image information connection graph according to the multi-label semantic information of the images.
Further, based on the multi-label semantic information of the images, the invention builds the image information connection graph. The representation of the connection graph is G = (V, E), where V = {v_1, v_2, …, v_N} is the set of vertices and E is the set of edges; a node in the graph may be represented as a triplet (v, c_v, f_v), where v is a node identifier representing an image; each node is associated with a self-encoding result c_v and multi-label semantic information f_v.
The construction process of the image information connection graph is as follows: the Euclidean distances between the self-encoding results of the images are calculated, and each image is connected to its k nearest images according to these Euclidean distances.
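A minimal sketch of this connection-graph construction, assuming plain Euclidean k-nearest-neighbour linking over the self-encoding results; the resulting adjacency matrix plays the role of s_ij in the storage step below.

```python
import numpy as np

def build_connection_graph(codes, k=5):
    """Connect each image to its k nearest neighbours by Euclidean distance between
    self-encoding results; returns the adjacency used as s_ij (1 = connected)."""
    n = codes.shape[0]
    dist = np.linalg.norm(codes[:, None, :] - codes[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    s = np.zeros((n, n), dtype=np.int8)
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:   # k nearest images of image i
            s[i, j] = s[j, i] = 1           # undirected information connection
    return s
```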
S5, storing the image data combined with the image information connection graph by using the depth-hash-based data storage method, so that the hash code value serves as the image feature index, and performing image retrieval according to the image feature index.
Further, the invention calculates the depth hash index value of the image by using a data storage method based on depth hash, and sets the depth hash index value of the image as an image feature index to finish the storage of the image data, wherein the flow of the image data storage is as follows:
According to the image information connection graph, s_ij is used to indicate the connection information between two images: s_ij = 1 means that an information connection exists between the two images, and s_ij = 0 means that no information connection exists between them.
The sign-approximating function tanh(·) is used to hash-encode s_ij together with the multi-label semantic information of the image, and the hash code value is used as the image feature index; the hash encoding formula is:
h_i = tanh(h(W^T X_i + b_i))
X_i = {s_ij, x_i1, …, x_im}
wherein:
W is a preset vector weight, which the invention sets to 0.2;
b_i is an offset vector, 0 ≤ b_i ≤ 1;
h is a hash function;
{x_i1, …, x_im} is the multi-label semantic information of the i-th image, and m is the dimension of the multi-label semantic information;
s_ij is the connection information between the i-th image and the j-th image; in detail, the j-th image and the i-th image are in the same data storage table;
The loss function of the depth-hash-based data storage method is given as an equation image (BDA0002747944250000091, not reproduced here), wherein:
γ is a hyperparameter, which the invention sets to 0.02;
w_ij represents the weight of each training pair; the invention adjusts the weights with a weight-balance formula given as an equation image (BDA0002747944250000092, not reproduced here);
S denotes the union of S_1 and S_0, where S_1 denotes the set of pairs with s_ij = 1 and S_0 denotes the set of pairs with s_ij = 0;
d(h_i, h_j) = 1 - cos(h_i, h_j) represents the distance between image i and image j.
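The sketch below illustrates one possible reading of the hash-encoding and loss formulas above. Because the inner hash function h(·), the exact loss, and the weight-balance formula appear only as equation images, the random projection, the quantisation penalty, and the |S|/|S1|-style balancing used here are assumptions, not the patent's definitive formulas.

```python
import numpy as np

GAMMA = 0.02   # hyperparameter gamma from the text
W0 = 0.2       # preset vector weight W from the text

def hash_code(semantics, s_row, n_bits=32):
    """h_i = tanh(h(W^T X_i + b_i)) with X_i = {s_ij, x_i1, ..., x_im}.
    The inner hash h(.) is modelled as a fixed random projection to n_bits; this is an
    assumption, since the text does not specify h(.)."""
    x = np.concatenate([s_row, semantics])                 # X_i: connection info + semantics
    rng = np.random.default_rng(0)                         # fixed seed -> same projection for all images
    proj = rng.standard_normal((n_bits, x.size))           # stand-in for the hash function h(.)
    b = np.zeros(n_bits)                                   # offset vector b_i, 0 <= b_i <= 1
    return np.tanh(proj @ (W0 * x) + b)                    # relaxed (non-binarised) hash code

def pair_weight(s, s_ij):
    """Weight balancing between connected (S1) and unconnected (S0) pairs; an
    |S|/|S1|-style rule is assumed, as the exact formula is not reproduced."""
    n1, n0 = int((s == 1).sum()), int((s == 0).sum())
    total = n1 + n0
    return total / max(n1, 1) if s_ij == 1 else total / max(n0, 1)

def pair_loss(h_i, h_j, s_ij, w_ij):
    """One pair term of the storage loss: a weighted cosine-distance term plus a
    gamma-weighted quantisation penalty pushing codes toward +/-1 (an illustrative
    reconstruction of the loss, which appears only as an equation image)."""
    d = 1.0 - float(np.dot(h_i, h_j) / (np.linalg.norm(h_i) * np.linalg.norm(h_j)))
    quantisation = float(np.sum((np.abs(h_i) - 1) ** 2) + np.sum((np.abs(h_j) - 1) ** 2))
    return w_ij * (d - (1 - s_ij)) ** 2 + GAMMA * quantisation
```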
Further, the image retrieval according to the image feature index comprises the following steps:
the image x_q to be retrieved is hash-encoded in the same way;
the Hamming distance between the hash code of x_q and each image feature index is then calculated, and the Hamming distance results are sorted; the Hamming distance formula is given as an equation image (BDA0002747944250000101, not reproduced here) and counts the bit positions in which the two hash codes differ, wherein:
h_q represents the hash code of the image to be retrieved;
h_j represents the hash code of the j-th image in the database;
the smaller the distance between the two codes, the closer their semantic information; the image retrieval result is obtained quickly by quick-sorting the images according to the Hamming distance.
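For illustration, a small sketch of this retrieval step: the query image is hash-encoded in the same way, database codes are ranked by Hamming distance, and the nearest codes are returned.

```python
import numpy as np

def hamming_distance(h_q, h_j):
    """Number of bit positions in which the binarised codes differ."""
    return int(np.sum(np.sign(h_q) != np.sign(h_j)))

def retrieve(h_q, index, top_k=10):
    """Rank stored images by Hamming distance to the query code; smaller distance
    means closer semantic information."""
    ranked = sorted(index.items(), key=lambda kv: hamming_distance(h_q, kv[1]))
    return ranked[:top_k]            # [(image_id, hash_code), ...], nearest first

# usage (illustrative): retrieve(hash_code(query_semantics, query_s_row), {img_id: code, ...})
```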
The following describes a specific embodiment of the present invention through an algorithmic experiment that tests the retrieval method of the invention. The hardware test environment of the algorithm is: an Intel Core i5-4460 3.2 GHz processor with 8 GB of memory; the programming language is C# and the database is MySQL. The comparison retrieval methods are an image retrieval method based on an autoencoder, an image retrieval method based on inverted-index storage, and an image retrieval method without an index.
In the algorithm experiment of the invention, the data set is 10000 different image data. In the experiment, 10000 different pieces of image data are stored and retrieved by different image retrieval methods, and the time required for completing retrieval is used as an evaluation index of the image retrieval method.
According to the experimental results, the retrieval time of the autoencoder-based image retrieval method is 15.18 s, the retrieval time of the image retrieval method based on inverted-index storage is 16.28 s, and the retrieval time of the image retrieval method without an index is 21.32 s, while the retrieval time of the proposed algorithm is 14.30 s; compared with the comparison algorithms, the image retrieval method based on big data provided by the invention therefore achieves higher image retrieval efficiency.
The invention also provides an image retrieval system based on the big data. Fig. 2 is a schematic diagram illustrating an internal structure of a big data based image retrieval system according to an embodiment of the present invention.
In the present embodiment, the big data based image retrieval system 1 includes at least an image data acquisition device 11, an image processor 12, an image retrieval device 13, a communication bus 14, and a network interface 15.
The image data acquiring device 11 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, or a mobile Computer, or may be a server.
Image processor 12 includes at least one type of readable storage medium including flash memory, a hard disk, a multi-media card, a card-type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. Image processor 12 may in some embodiments be an internal storage unit of the big data based image retrieval system 1, such as a hard disk of the big data based image retrieval system 1. The image processor 12 may also be an external storage device of the big data based image retrieval system 1 in other embodiments, such as a plug-in hard disk provided on the big data based image retrieval system 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and so on. Further, the image processor 12 may also include both an internal storage unit and an external storage device of the big-data based image retrieval system 1. The image processor 12 can be used not only to store application software installed in the big data based image retrieval system 1 and various kinds of data, but also to temporarily store data that has been output or is to be output.
Image retrieval device 13 may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip for executing program codes stored in image processor 12 or processing data, such as image retrieval program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the system 1 and other electronic devices.
Optionally, the system 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the big data based image retrieval system 1 and for displaying a visualized user interface.
While FIG. 2 only shows the big data based image retrieval system 1 with components 11 to 15, it will be understood by those skilled in the art that the illustrated structure does not constitute a limitation of the big data based image retrieval system 1, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.
In the embodiment of device 1 shown in fig. 2, image processor 12 has stored therein image retrieval program instructions; the steps of the image retrieval device 13 executing the image retrieval program instructions stored in the image processor 12 are the same as the implementation method of the image retrieval method based on big data, and are not described here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon image retrieval program instructions, which are executable by one or more processors to implement the following operations:
acquiring massive image data and performing distributed storage of the massive image data;
performing graying and gray-stretch preprocessing on the stored massive image data;
processing the preprocessed image data with a self-encoder-based multi-label semantic extraction algorithm to obtain multi-label semantic information of the images;
establishing an image information connection graph according to the multi-label semantic information of the images;
and storing the image data combined with the image information connection graph by using a data storage method based on depth hash, so that the hash code value serves as the image feature index, and performing image retrieval according to the image feature index.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. An image retrieval method based on big data, characterized in that the method comprises:
acquiring massive image data and performing distributed storage of the massive image data;
performing graying and gray-stretch preprocessing on the stored massive image data;
processing the preprocessed image data with a self-encoder-based multi-label semantic extraction algorithm to obtain multi-label semantic information of the images;
establishing an image information connection graph according to the multi-label semantic information of the images;
and storing the image data combined with the image information connection graph by using a data storage method based on depth hash, so that the hash code value serves as the image feature index, and performing image retrieval according to the image feature index.
2. The image retrieval method based on big data as claimed in claim 1, wherein the storing the massive image data in a distributed manner comprises:
1) merging every k image data into 1 file, where k is set to 10, and converting the merged file into a byte-code format;
2) creating an image storage table and designing two column families MD (image data) and MI (image info), wherein the column family MD stores the byte codes of the files and the column family MI stores the image information, the image information comprising the id of an image and an image feature index; in a specific embodiment of the invention, the number of partitions is 9 and the RowKey range of each partition is N/(9k), where N is the total number of image data;
3) storing the massive image data into the partitioned image storage table, with the image feature index initially set to blank.
3. The image retrieval method based on big data as claimed in claim 2, wherein the pre-processing procedure of image graying and grayscale stretching for the stored massive image data is as follows:
1) obtaining a grayscale image of the stored image by taking, for each pixel, the maximum of its three color components and setting that maximum as the gray value of the pixel; the graying formula is as follows:
G(i,j)=max{R(i,j),G(i,j),B(i,j)}
wherein:
(i, j) is a pixel point in the stored image;
R(i, j), G(i, j) and B(i, j) are respectively the values of the pixel point (i, j) in the R, G and B color channels;
G(i, j) is the gray value of the pixel point (i, j);
2) based on the grayscale image, the gray levels of the image are stretched by means of a piecewise linear transformation (the transformation formula appears only as an equation image, FDA0002747944240000021, and is not reproduced here), wherein:
f(x, y) is the gray value of the grayscale image at pixel (x, y);
MAX_f(x,y) and MIN_f(x,y) are respectively the maximum and minimum gray values of the grayscale image.
4. The big data-based image retrieval method of claim 3, wherein the processing of the preprocessed image data by using the self-encoder-based multi-label semantic extraction algorithm comprises:
1) constructing a denoising self-encoder with m layers, which takes the preprocessed image data set X as input and carries out m layers of self-encoding; the self-encoding result of the image data set X is formed from m copies of X, each copy being corrupted by adding random noise δ_i (the corresponding equations appear only as equation images, FDA0002747944240000022 to FDA0002747944240000026, and are not reproduced here), where δ_i is random noise;
2) for a single semantic label h and a single training sample x_i ∈ W, calculating, by means of the KNN algorithm, the sample set knn_s of the k_s nearest neighbours of x_i ∈ W that have the same label and the sample set knn_d of the k_d nearest neighbours of x_i ∈ W that have different labels; repeating this step until all semantic labels have been traversed; all knn_s sets and all knn_d sets together constitute a global geometric matrix L, in which the i-th row corresponds to the i-th training sample x_i: the first k_s entries of the row are the sample set knn_s of the i-th training sample, followed by the k_d entries of the sample set knn_d of the i-th training sample;
3) for all semantic labels, obtaining a characterization matrix L_g of the global geometric matrix L by fusing the influence of the multiple semantic labels on the geometric structure among the image samples (the fusion formula appears only as an equation image, FDA0002747944240000027, and is not reproduced here), wherein:
Y is the semantic label set, and |Y| is the number of semantic labels;
L is the global geometric matrix;
4) solving for the set of eigenvectors of the characterization matrix L_g corresponding to its r smallest non-zero eigenvalues; the feature space formed by this set is the reduced feature space, whose sample dimension is r, where r is the number of image data; each eigenvector is the multi-label semantic information of the corresponding image data.
5. The image retrieval method based on big data as claimed in claim 4, wherein the step of building the image information connection map according to the multi-label semantic information of the image is as follows:
the representation of the connection graph is G = (V, E), where V = {v_1, v_2, ..., v_N} is the set of vertices and E is the set of edges; a node in the graph may be represented as a triplet (v, c_v, f_v), where v is a node identifier representing an image; each node is associated with a self-encoding result c_v and multi-label semantic information f_v;
the construction process of the image information connection graph is as follows: the Euclidean distances between the self-encoding results of the images are calculated, and each image is connected to its k nearest images according to these Euclidean distances.
6. The big data-based image retrieval method of claim 5, wherein the storing the image data combined with the image information connection map by using the depth hash-based data storage method comprises:
according to the image information connection graph, s_ij is used to indicate the connection information between two images: s_ij = 1 means that an information connection exists between the two images, and s_ij = 0 means that no information connection exists between them;
the sign-approximating function tanh(·) is used to hash-encode s_ij together with the multi-label semantic information of the image, and the hash code value is used as the image feature index; the hash encoding formula is:
h_i = tanh(h(W^T X_i + b_i))
X_i = {s_ij, x_i1, ..., x_im}
wherein:
W is a preset vector weight, which is set to 0.2 by the present invention;
b_i is an offset vector, 0 ≤ b_i ≤ 1;
h is a hash function;
{x_i1, ..., x_im} is the multi-label semantic information of the i-th image, and m is the dimension of the multi-label semantic information;
s_ij is the connection information between the i-th image and the j-th image; in detail, the j-th image and the i-th image are in the same data storage table;
the loss function of the depth-hash-based data storage method is given as an equation image (FDA0002747944240000031, not reproduced here), wherein:
γ is a hyperparameter, which is set to 0.02 by the present invention;
w_ij represents the weight of each training pair; the present invention adjusts the weights with a weight-balance formula given as an equation image (FDA0002747944240000041, not reproduced here);
S denotes the union of S_1 and S_0, where S_1 denotes the set of pairs with s_ij = 1 and S_0 denotes the set of pairs with s_ij = 0;
d(h_i, h_j) = 1 - cos(h_i, h_j) represents the distance between image i and image j.
7. The big data-based image retrieval method according to claim 6, wherein the image retrieval according to the image feature index comprises:
the image x_q to be retrieved is hash-encoded in the same way;
the Hamming distance between the hash code of x_q and each image feature index is then calculated, and the Hamming distance results are sorted; the Hamming distance formula is given as an equation image (FDA0002747944240000042, not reproduced here) and counts the bit positions in which the two hash codes differ, wherein:
h_q represents the hash code of the image to be retrieved;
h_j represents the hash code of the j-th image in the database;
the smaller the distance between the two codes, the closer their semantic information; the image retrieval result is obtained quickly by quick-sorting the images according to the Hamming distance.
8. An image retrieval system based on big data, the system comprising:
the image data acquisition device is used for acquiring mass image data;
the image processor is used for performing graying and gray-stretch preprocessing on the stored massive image data and for processing the preprocessed image data with the self-encoder-based multi-label semantic extraction algorithm to obtain multi-label semantic information of the images;
and the image retrieval device is used for establishing an image information connection graph according to the multi-label semantic information of the images, storing the image data combined with the image information connection graph by using the depth-hash-based data storage method, taking the hash code value as the image feature index, and retrieving images according to the image feature index.
9. A computer readable storage medium having stored thereon image retrieval program instructions executable by one or more processors to implement the steps of the big-data-based image retrieval method as claimed in any one of claims 1 to 7.
CN202011173216.XA 2020-10-28 2020-10-28 Image retrieval method and system based on big data Withdrawn CN112287140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011173216.XA CN112287140A (en) 2020-10-28 2020-10-28 Image retrieval method and system based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011173216.XA CN112287140A (en) 2020-10-28 2020-10-28 Image retrieval method and system based on big data

Publications (1)

Publication Number Publication Date
CN112287140A true CN112287140A (en) 2021-01-29

Family

ID=74373683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011173216.XA Withdrawn CN112287140A (en) 2020-10-28 2020-10-28 Image retrieval method and system based on big data

Country Status (1)

Country Link
CN (1) CN112287140A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095286A (en) * 2021-04-30 2021-07-09 汪知礼 Big data image processing algorithm and system
CN113127515A (en) * 2021-04-12 2021-07-16 中国电力科学研究院有限公司 Power grid-oriented regulation and control data caching method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108427738B (en) Rapid image retrieval method based on deep learning
CN110334272B (en) Intelligent question-answering method and device based on knowledge graph and computer storage medium
CN110222160A (en) Intelligent semantic document recommendation method, device and computer readable storage medium
CN108334805B (en) Method and device for detecting document reading sequence
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN110795527B (en) Candidate entity ordering method, training method and related device
CN112199462A (en) Cross-modal data processing method and device, storage medium and electronic device
CN110457514A (en) A kind of multi-tag image search method based on depth Hash
CN113657087B (en) Information matching method and device
CN110866042A (en) Intelligent table query method and device and computer readable storage medium
CN112287140A (en) Image retrieval method and system based on big data
CN114780746A (en) Knowledge graph-based document retrieval method and related equipment thereof
CN114519120A (en) Image searching method and device based on multi-modal algorithm
US20120117090A1 (en) System and method for managing digital contents
CN106951509B (en) Multi-tag coring canonical correlation analysis search method
CN110598123B (en) Information retrieval recommendation method, device and storage medium based on image similarity
CN113761262B (en) Image retrieval category determining method, system and image retrieval method
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
CN117370650A (en) Cloud computing data recommendation method based on service combination hypergraph convolutional network
CN111382254A (en) Electronic business card recommendation method, device, equipment and computer readable storage medium
CN114329016B (en) Picture label generating method and text mapping method
CN113672804B (en) Recommendation information generation method, system, computer device and storage medium
CN114139658A (en) Method for training classification model and computer readable storage medium
CN112905820B (en) Multi-graph retrieval method based on logic learning
Rad et al. A multi-view-group non-negative matrix factorization approach for automatic image annotation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210129