CN108073356B - Data storage and search method, device and data processing system

Info

Publication number: CN108073356B
Application number: CN201611037957.9A
Authority: CN
Inventors: 陈晨; 王宇; 曹毅
Original assignee: Hangzhou Hikvision System Technology Co Ltd
Current assignee: Hangzhou Hikvision System Technology Co Ltd
Priority date: 2016-11-10
Filing date: 2016-11-10
Publication date: 2021-07-20
Anticipated expiration: 2036-11-10
Also published as: CN108073356A

Abstract

Embodiments of the invention disclose a data storage and search method, a device, and a data processing system. The storage method comprises: classifying data to be stored according to at least one preset dimension; dividing each class of data to be stored into N parts, where N is the determined number of disks and N is greater than 1; and storing the N parts in the N disks respectively. In this scheme, data of the same class is spread across the N disks, so similar data is stored in a distributed manner. When data needs to be searched, the search runs across the N disks, and because the read/write speed permitted by N disks together is far higher than that permitted by a single disk, the occurrence of I/O (input/output) blocking can be reduced.

Description

Data storage and search method, device and data processing system

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data storage method, a data search method, a data storage device, a data search device, and a data processing system.

Background

Most existing data storage schemes store similar data together on a single disk, for example, data that is close in time or similar in characteristics. As a result, when the stored data needs to be searched, the search is usually performed on that one disk.

When data is searched on a single disk, the read/write speed the disk allows is limited, so I/O (input/output) blocking occurs easily when the disk holds a large amount of data. This is especially true in image search applications, for example: a plurality of images and their corresponding attribute information are stored on a certain disk; a first image (the target data) uploaded by a user is determined; a second image matching the first image is searched for on the disk; and the attribute information of the second image found is displayed to the user.

In the above solution, if the amount of data stored in the disk is very large, when the second image matching the first image is searched in the disk, the first image needs to be matched with each image stored in the disk, and I/O blocking is likely to occur.

Disclosure of Invention

Embodiments of the present invention provide a data storage method, a data search method, a data storage device, a data search device, and a data processing system, so as to implement distributed storage of similar data and reduce occurrence of I/O blocking.

In order to achieve the above object, an embodiment of the present invention discloses a data storage method, including:

acquiring data to be stored;

classifying the data to be stored according to at least one preset dimension;

dividing each type of data to be stored into N parts, wherein N is the number of the determined disks, and is more than 1;

and respectively storing the N parts of data to be stored into the N disks, wherein the disks corresponding to each part of data to be stored are different.

Optionally, the step of classifying the data to be stored according to at least one preset dimension may include:

and dividing the data to be stored into N types according to the time dimension and/or the characteristic dimension of the data to be stored.

Optionally, the step of dividing each type of data to be stored into N parts may include:

and averagely dividing each type of data to be stored into N parts.

Optionally, the step of respectively storing the N pieces of data to be stored in the N disks includes:

distributing corresponding first threads for the N disks respectively;

and for each disk, storing the corresponding data to be stored by using the corresponding first thread.

Optionally, the data to be stored is an image file and attribute information corresponding to the image file.

In order to achieve the above object, an embodiment of the present invention further discloses a data searching method, including:

determining target data;

searching for matching data of the target data in each of the determined N disks, wherein N is greater than 1, and the data in the N disks is stored according to the data storage method described above.

Optionally, the step of searching for matching data of the target data in the determined N disks respectively may include:

determining parameters of the target data under a filtering dimension according to a preset filtering dimension;

filtering the data in the N disks according to the parameters to obtain filtered data;

and searching the matched data of the target data in the filtered data.

Optionally, when the filtering dimension is a time dimension, the step of determining the parameter of the target data in the filtering dimension according to a preset filtering dimension may include:

determining a time parameter corresponding to the target data;

the step of filtering the data stored in the N disks according to the parameter to obtain filtered data may include:

and determining a target time parameter interval according to the time parameters, and determining the data which are stored in the N disks and are positioned in the interval as filtered data.

Optionally, the step of searching for matching data of the target data in the filtered data may include:

for each filtered data, carrying out similarity calculation on the filtered data and the target data to obtain a corresponding calculation result;

sorting each filtered data according to a calculation result corresponding to each filtered data;

and determining the matching data of the target data in each filtered data according to the sorting result.

Optionally, the steps of performing, for each filtered data, similarity calculation between the filtered data and the target data to obtain a corresponding calculation result, and of sorting each filtered data according to the calculation result corresponding to each filtered data, may include:

caching each filtered data;

for each cached filtered data, performing similarity calculation on each filtered data and the target data to obtain a calculation result corresponding to each filtered data;

determining the position of each filtered data in the sequence by utilizing a binary sorting method according to a calculation result corresponding to each filtered data; the sequence is composed of each filtered data, and each filtered data in the sequence is sorted according to the corresponding calculation result.

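Optionally, the step of searching for matching data of the target data in the determined N disks respectively may include:

distributing corresponding second threads for the N disks respectively;

and for each disk, searching the matching data of the target data in the disk by using the second thread corresponding to the disk.

Optionally, the step of searching for matching data of the target data in the determined N disks respectively may include:

respectively searching the matching data of the target data in the N disks in parallel.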

Optionally, the target data is an image file; the matching data is an image file and corresponding attribute information thereof;

after the step of searching for matching data of the target data in the determined N disks, the method may further include:

determining matched data to be output in the searched matched data;

and determining the attribute information contained in the matching data to be output as the attribute information corresponding to the target data.

In order to achieve the above object, an embodiment of the present invention further discloses a data storage device, including:

the acquisition module is used for acquiring data to be stored;

the classification module is used for classifying the data to be stored according to at least one preset dimension;

the dividing module is used for dividing each type of data to be stored into N parts, wherein N is the determined number of the disks, and N is greater than 1;

and the storage module is used for respectively storing the N parts of data to be stored into the N disks, wherein the disks corresponding to each part of data to be stored are different.

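Optionally, the classification module may be specifically configured to:

dividing the data to be stored into N types according to the time dimension and/or the characteristic dimension of the data to be stored.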

Optionally, the dividing module may be specifically configured to:

and averagely dividing each type of data to be stored into N parts.

Optionally, the storage module may be specifically configured to:

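distributing corresponding first threads for the N disks respectively;

and for each disk, storing the corresponding data to be stored by using the corresponding first thread.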

In order to achieve the above object, an embodiment of the present invention further discloses a data searching apparatus, including:

the first determining module is used for determining target data;

a searching module, configured to search for matching data of the target data in each of the determined N disks, where N is greater than 1, and the data in the N disks is stored by the data storage device described above.

Optionally, the searching module may include:

the first determining submodule is used for determining parameters of the target data under a filtering dimension according to a preset filtering dimension;

the filtering submodule is used for filtering the data in the N disks according to the parameters to obtain filtered data;

and the searching submodule is used for searching the matched data of the target data in the filtered data.

Optionally, in a case that the filtering dimension is a time dimension, the first determining submodule may be specifically configured to: determining a time parameter corresponding to the target data;

the filtering submodule may be specifically configured to: and determining a target time parameter interval according to the time parameters, and determining the data which are stored in the N disks and are positioned in the interval as filtered data.

Optionally, the searching module may include:

the calculation submodule is used for calculating the similarity of each filtered data and the target data to obtain a corresponding calculation result;

the sorting submodule is used for sorting each filtered data according to the calculation result corresponding to each filtered data;

and the second determining submodule is used for determining the matching data of the target data in each filtered data.

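Optionally, the calculation sub-module may be specifically configured to:

caching each filtered data;

for each cached filtered data, performing similarity calculation on the filtered data and the target data to obtain a calculation result corresponding to each filtered data;

the sorting submodule may be specifically configured to:

determining the position of each filtered data in the sequence by utilizing a binary sorting method according to the calculation result corresponding to each filtered data; the sequence is composed of each filtered data, and each filtered data in the sequence is sorted according to the corresponding calculation result.

Optionally, the search module may be specifically configured to:

distributing corresponding second threads for the N disks respectively;

and for each disk, searching the matching data of the target data in the disk by using the second thread corresponding to the disk.

Optionally, the search module may be specifically configured to:

respectively searching the matching data of the target data in the N disks in parallel.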

Optionally, the target data is an image file; the matching data is an image file and corresponding attribute information thereof; the apparatus may further include:

the second determining module is used for determining the matched data to be output in the searched matched data;

and the third determining module is used for determining the attribute information contained in the to-be-output matching data as the attribute information corresponding to the target data.

In order to achieve the above object, an embodiment of the present invention further discloses a data processing system, including: a management device and a storage node, wherein,

the management equipment is used for acquiring data to be stored; classifying the data to be stored according to at least one preset dimension; dividing each type of data to be stored into N parts, wherein N is the number of the determined storage nodes, and is greater than 1; and respectively storing the N parts of data to be stored into the N storage nodes, wherein the storage nodes corresponding to each part of data to be stored are different.

Optionally, the management device is further configured to determine target data; and searching the matching data of the target data in the N storage nodes respectively.

By applying the embodiments of the invention, the data to be stored is classified according to at least one preset dimension, and each class of data to be stored is divided into N parts, where N is the determined number of disks and N is greater than 1; the N parts are then stored in the N disks respectively. In this scheme, data of the same class is spread across the N disks, so similar data is stored in a distributed manner. When data needs to be searched, the search runs across the N disks, and because the read/write speed permitted by N disks together is far higher than that permitted by a single disk, the occurrence of I/O (input/output) blocking can be reduced.

Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a data searching method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a data storage device according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a data searching apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to solve the foregoing technical problems, embodiments of the present invention provide a data storage method, a data search method, a data storage device, a data search device, and a data processing system. These methods and devices can be applied to various electronic devices such as a computer or a tablet computer, without particular limitation. First, the data storage method provided by an embodiment of the present invention is explained in detail below.

Fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present invention, including:

S101: Acquire data to be stored.

As an embodiment, the data to be stored may be an image file and attribute information corresponding to the image file, or may also be other data, such as an audio file, or an audio file and attribute information corresponding to the audio file, and is not limited herein.

When the data to be stored is an image file and attribute information corresponding to the image file, the image file may be a binary file, or may also be a file in another format, which is not limited herein. If the picture content of the image file is a vehicle collected by the image collecting device, the attribute information may be a license plate, a body color, or a brand and a model of the vehicle.

The binary file can be a file formed by binary data corresponding to the image; modeling can also be performed on the image, and the obtained model value is stored as the binary file; the feature value of the image may also be extracted, and the extracted feature value may be stored as the binary file, and the like, which is not limited herein.

S102: Classify the data to be stored according to at least one preset dimension.

As an implementation manner, the data to be stored may be divided into N classes according to the time dimension of the data to be stored.

For example, suppose that the data to be stored includes one piece of data stored at 8:00 on October 20, 2016 (first sub-data), one piece of data stored at 10:00 on October 21, 2016 (second sub-data), one piece of data stored at 12:00 on October 22, 2016 (third sub-data), and one piece of data stored at 11:00 on October 21, 2016 (fourth sub-data).

A day may be used as a time period, a week may be used as a time period, and the like, which are not limited specifically. It is assumed here that a day is taken as a time period, and each piece of data stored at the same time period is divided into one type of data, that is, the second piece of sub-data and the fourth piece of sub-data are the same type of data, and the other pieces of data are each one type of data.
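The day-based classification in this example can be sketched as follows. This is only a minimal illustration under assumptions: the record layout `(name, stored_at)` and the function name `classify_by_day` are hypothetical and do not appear in the embodiment.

```python
from collections import defaultdict
from datetime import datetime


def classify_by_day(records):
    """Group (name, stored_at) records into classes keyed by calendar day."""
    classes = defaultdict(list)
    for name, stored_at in records:
        # One day is one time period, so the date alone is the class key.
        classes[stored_at.date()].append(name)
    return dict(classes)


# The four pieces of sub-data from the example above.
records = [
    ("sub1", datetime(2016, 10, 20, 8, 0)),
    ("sub2", datetime(2016, 10, 21, 10, 0)),
    ("sub3", datetime(2016, 10, 22, 12, 0)),
    ("sub4", datetime(2016, 10, 21, 11, 0)),
]
classes = classify_by_day(records)
```

With a week as the time period, the key would instead be derived from the calendar week of `stored_at`.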

As an implementation manner, the data to be stored may be classified into N classes according to the feature dimension of the data to be stored.

The characteristic dimension may be a type of data or other characteristic of the data. When the data to be stored is an image file and attribute information corresponding to the image file, the characteristic dimension may be a body color or a brand of a vehicle included in the attribute information.

For example, it is assumed that the characteristic dimension is a data type, and the data to be stored includes a data (a first sub data) with a storage type of TXT, a data (a second sub data) with a storage type of JPG, a data (a third sub data) with a storage type of WMV, and a data (a fourth sub data) with a storage type of JPG.

That is, the second sub-data and the fourth sub-data belong to the same class, while the first sub-data and the third sub-data each form a class of their own.

As an implementation manner, the data to be stored may be classified into N classes according to a time dimension and a feature dimension of the data to be stored.

For example, assume that the characteristic dimension is the data type, and that the data to be stored includes one piece of data of type TXT (first sub-data), one piece of data of type JPG (second sub-data), one piece of data of type WMV (third sub-data), and one piece of data of type JPG (fourth sub-data). The storage time of the first sub-data is 8:00 on October 20, 2016; that of the second sub-data is 10:00 on October 21, 2016; that of the third sub-data is 12:00 on October 22, 2016; and that of the fourth sub-data is 11:00 on October 21, 2016.

In this case, data with the same storage time period and the same type is grouped into one class. With one day as the time period, the second sub-data and the fourth sub-data belong to the same class, while the first sub-data and the third sub-data each form a class of their own.
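Classification along several dimensions at once can be sketched by building a composite key from one function per dimension. The dictionary record layout and the names `classify`, `by_day`, and `by_type` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def classify(records, dimensions):
    """Group records by the tuple of values they take under each dimension."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(dim(rec) for dim in dimensions)
        classes[key].append(rec["name"])
    return dict(classes)


def by_day(rec):
    # Time dimension, with one day as the time period.
    return rec["stored_at"].date()


def by_type(rec):
    # Characteristic dimension: the data type.
    return rec["type"]


records = [
    {"name": "sub1", "type": "TXT", "stored_at": datetime(2016, 10, 20, 8, 0)},
    {"name": "sub2", "type": "JPG", "stored_at": datetime(2016, 10, 21, 10, 0)},
    {"name": "sub3", "type": "WMV", "stored_at": datetime(2016, 10, 22, 12, 0)},
    {"name": "sub4", "type": "JPG", "stored_at": datetime(2016, 10, 21, 11, 0)},
]
classes = classify(records, [by_day, by_type])
```

Passing only `[by_day]` or only `[by_type]` reproduces the single-dimension cases described earlier.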

S103: Divide each class of data to be stored into N parts, where N is the determined number of disks and N is greater than 1.

In the scheme, to realize the distributed storage of the similar data, namely to store the similar data into a plurality of disks, the number of available disks needs to be determined first, and the similar data is divided according to the number. Here, each type of data to be stored is similar data.

As an embodiment, each class of data to be stored may be divided evenly into N parts. Alternatively, the data may be divided into N parts at random; this is not specifically limited.
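One way to divide a class evenly is round-robin assignment, sketched below. The embodiment does not prescribe how the even division is done, so the striding approach and the name `split_evenly` are assumptions.

```python
def split_evenly(items, n):
    """Divide items into n parts whose sizes differ by at most one."""
    if n <= 1:
        raise ValueError("N must be greater than 1")
    # Item i goes to part i % n, so consecutive items land in different parts.
    return [items[i::n] for i in range(n)]


one_class = list(range(10))          # ten items of the same class
parts = split_evenly(one_class, 4)   # N = 4 disks
```

A random division could be obtained by shuffling the items before splitting.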

It should be noted that "disk" in this embodiment may also be understood as each storage node in the storage cluster.

S104: Store the N parts of data to be stored in the N disks respectively, where each part is stored on a different disk.

Since N is greater than 1, assume that N is 4 and that the 4 disks are disk A, disk B, disk C, and disk D. The data to be stored is divided into 4 parts, A1, B1, C1, and D1, which are stored on the 4 disks respectively. For convenience of description, assume that A1 is stored on disk A, B1 on disk B, C1 on disk C, and D1 on disk D.

As an implementation manner, the N disks may be respectively allocated with their corresponding first threads; and for each disk, storing the corresponding data to be stored by using the corresponding first thread.

Continuing the above example, assume that 8 threads are available for the 4 disks to store data. The 8 threads are allocated to the 4 disks; here the allocation is assumed to be even, that is, each disk corresponds to 2 threads, although the allocation may also be uneven and is not specifically limited. Assume that disk A is allocated thread 1 and thread 2, disk B is allocated thread 3 and thread 4, disk C is allocated thread 5 and thread 6, and disk D is allocated thread 7 and thread 8.

Disk A stores A1 with thread 1 and thread 2, disk B stores B1 with thread 3 and thread 4, disk C stores C1 with thread 5 and thread 6, and disk D stores D1 with thread 7 and thread 8. Specifically, when a disk stores data using multiple threads, a Handler object may be called.
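The per-disk thread allocation can be sketched with Python thread pools. This is an illustration under assumptions, not the embodiment's implementation: temporary directories stand in for disks A to D, each disk gets its own pool of 2 worker threads, and the Handler object mentioned above is not modeled. The names `store_part` and `store_parts` are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def store_part(disk_dir, part, workers=2):
    """Write each (file name, payload) pair of one part using this disk's own threads."""
    def write_one(item):
        name, payload = item
        (Path(disk_dir) / name).write_bytes(payload)
        return name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(write_one, part))


def store_parts(disk_dirs, parts, workers_per_disk=2):
    """Store part i on disk i, with all disks being written concurrently."""
    with ThreadPoolExecutor(max_workers=len(disk_dirs)) as outer:
        futures = [
            outer.submit(store_part, disk, part, workers_per_disk)
            for disk, part in zip(disk_dirs, parts)
        ]
        return [f.result() for f in futures]


# Directories standing in for disks A, B, C, and D.
root = Path(tempfile.mkdtemp())
disks = []
for label in "ABCD":
    disk = root / f"disk_{label}"
    disk.mkdir()
    disks.append(disk)

# Parts A1, B1, C1, and D1, each holding three small files.
parts = [[(f"{label}1_{i}.bin", bytes([i])) for i in range(3)] for label in "ABCD"]
stored = store_parts(disks, parts)
```

An uneven allocation would simply pass a different worker count for each disk.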

It should be noted that the dimension information can be stored together with the data. If the data to be stored is classified according to the time dimension, the data and its corresponding time dimension can be stored together. If the data is classified according to the characteristic dimension, the data and its corresponding characteristic dimension can be stored together. If the data is classified according to both the time dimension and the characteristic dimension, the data can be stored together with both. In this way, when data is searched later, it can first be filtered according to the dimension information, which improves search efficiency.
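One simple way to keep the dimension information with the data is to encode it in the stored file name. The embodiment does not specify how the dimensions are stored, so the naming scheme `<day>_<type>_<id>.bin` and the function `stored_name` below are purely hypothetical.

```python
from datetime import datetime


def stored_name(item_id, stored_at, data_type):
    """Build a file name that carries the time and characteristic dimensions."""
    return f"{stored_at:%Y%m%d}_{data_type}_{item_id}.bin"


name = stored_name("0002", datetime(2016, 10, 21, 10, 0), "JPG")
```

Because the dimensions are part of the name, a later search can discard non-matching files without opening them.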

By applying the embodiment shown in fig. 1 of the present invention, the data to be stored is classified according to at least one preset dimension, and each class of data to be stored is divided into N parts, where N is the determined number of disks and N is greater than 1; the N parts are then stored in the N disks respectively. In this scheme, data of the same class is spread across the N disks, so similar data is stored in a distributed manner. When data needs to be searched, the search runs across the N disks, and because the read/write speed permitted by N disks together is far higher than that permitted by a single disk, the occurrence of I/O (input/output) blocking can be reduced.

An embodiment of the present invention further provides a data searching method, as shown in fig. 2, including:

S201: Determine target data.

The target data may be an image file uploaded by a user, or may be other data, such as audio data, which is not limited herein. In the embodiments of the invention, searching for data means searching for the matching data of the target data.

S202: searching matching data of the target data in the determined N disks respectively, wherein N is greater than 1; the data in the N disks is stored according to the data storage method provided by the embodiment of the invention shown in fig. 1.

According to the data storage method provided by the embodiment shown in fig. 1 of the present invention, the same type of data of the target data is stored in N disks, so that when data is searched, matching data of the target data needs to be searched in the N disks.
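Searching all N disks at once can be sketched with one worker thread per disk, in the spirit of the second threads described earlier. The in-memory lists standing in for disks, the `predicate` argument, and the names `search_disk` and `search_all` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_disk(disk, predicate):
    """Return the records on one disk that satisfy the predicate."""
    return [rec for rec in disk if predicate(rec)]


def search_all(disks, predicate):
    """Search every disk in its own thread and merge the hits in disk order."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        per_disk = pool.map(search_disk, disks, [predicate] * len(disks))
        return [rec for hits in per_disk for rec in hits]


# Four in-memory lists standing in for the 4 disks.
disks = [[1, 2], [3, 4], [5, 6], [7, 8]]
hits = search_all(disks, lambda rec: rec % 2 == 0)
```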

Specifically, parameters of the target data in the filtering dimension can be determined according to a preset filtering dimension; filtering the data in the N disks according to the parameters to obtain filtered data; and searching the matched data of the target data in the filtered data.

Corresponding to the embodiment shown in fig. 1, the filtering dimension may be a time dimension and/or a feature dimension.

If the data is stored with its corresponding time dimension in the embodiment shown in FIG. 1, the filter dimension may be the time dimension.

For example, assume that the target data is an image file uploaded by a user, and the data stored in the disks is image files and their corresponding attribute information. The time parameter of the target data in the time dimension may be understood as the creation time, the acquisition time, or the like of the target data; it is assumed here that the time parameter is 10:00 on October 21, 2016.

The data in the 4 disks is filtered according to the time parameter. Specifically, a target time parameter interval can be determined according to the time parameter, which can also be understood as determining the time period in which the time parameter falls. The time parameter interval may be a day, a week, or the like, and is not specifically limited; the "time parameter interval" in the embodiment shown in fig. 2 may be the same as or different from the "time period" in the embodiment shown in fig. 1. Here, it is assumed that one day is used as the time parameter interval, so the target time parameter interval is October 21, 2016. The data in the 4 disks whose time dimension is October 21, 2016 is determined as the filtered data. Matching data of the target data is then searched for only in the filtered data, which improves search efficiency.
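The time-based filtering can be sketched as follows, assuming one day as the time parameter interval. The record layout `(name, stored_at)`, the in-memory lists standing in for the 4 disks, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def day_interval(time_parameter):
    """Return the half-open interval [start, end) of the day containing the parameter."""
    start = datetime(time_parameter.year, time_parameter.month, time_parameter.day)
    return start, start + timedelta(days=1)


def filter_by_day(disks, time_parameter):
    """Keep the records on any disk whose stored time falls inside the target interval."""
    start, end = day_interval(time_parameter)
    return [
        name
        for disk in disks
        for name, stored_at in disk
        if start <= stored_at < end
    ]


disks = [
    [("a1", datetime(2016, 10, 20, 8, 0)), ("a2", datetime(2016, 10, 21, 10, 0))],
    [("b1", datetime(2016, 10, 21, 11, 0))],
    [("c1", datetime(2016, 10, 22, 12, 0))],
    [("d1", datetime(2016, 10, 21, 23, 59))],
]
filtered = filter_by_day(disks, datetime(2016, 10, 21, 10, 0))
```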

If the data is stored with its corresponding feature dimension in the embodiment shown in FIG. 1, the filter dimension may be the feature dimension.

For example, assume that the target data is an image file uploaded by a user, and the data stored in the disk is the image file and its corresponding attribute information. The characteristic parameter of the target data in the characteristic dimension is assumed to be the type of the target data, and is assumed to be JPG.

And according to the characteristic parameters, filtering the data in the 4 disks, namely determining the data of the 4 disks with the type of JPG as the filtered data. And only the matched data of the target data is searched in the filtered data, so that the searching efficiency is improved.

The feature dimension may also be other features, and is not particularly limited.

If the data is stored with its corresponding time dimension and feature dimension in the embodiment shown in fig. 1, the filter dimension may be the time dimension and the feature dimension.

For example, assume that the target data is an image file uploaded by a user, and the data stored in the disks is image files and their corresponding attribute information. The time parameter of the target data in the time dimension may be understood as the creation time, the acquisition time, or the like of the target data; it is assumed here that the time parameter is 10:00 on October 21, 2016. The characteristic parameter of the target data in the characteristic dimension is assumed to be the type of the target data, which is assumed to be JPG.

The data in the 4 disks is filtered according to the time parameter and the characteristic parameter. Specifically, a target time parameter interval can be determined according to the time parameter, which can also be understood as determining the time period in which the time parameter falls. The time parameter interval may be a day, a week, or the like, and is not specifically limited; the "time parameter interval" in the embodiment shown in fig. 2 may be the same as or different from the "time period" in the embodiment shown in fig. 1. Here, it is assumed that one day is used as the time parameter interval, so the target time parameter interval is October 21, 2016. The data in the 4 disks whose time dimension is October 21, 2016 and whose type is JPG is determined as the filtered data. Matching data of the target data is then searched for only in the filtered data, which improves search efficiency.

In the process of filtering the data in the disks, a Pattern can be generated according to the parameters under the filtering dimension, and files can be filtered by regular-expression matching, which can further improve search efficiency. Specifically, a configuration file may be generated in advance according to the filtering dimension; when the data in the disks is filtered, parameters are passed according to the filtering dimension contained in the configuration file so as to generate the Pattern. For example, assuming that the filtering dimension contained in the configuration file is the time dimension, the parameter of the target data in the time dimension is determined and passed to a preset program, and the program generates the Pattern. In addition, model data corresponding to the image files is stored in the disks, and filtering the model data according to the Pattern achieves the filtering of the image files.
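Pattern-based filtering might look like the sketch below, which compiles a regular expression from the filter parameters and applies it to file names. The file naming scheme `<day>_<type>_<id>.bin` is an assumption, as are the names `build_pattern` and `filter_names`; the embodiment does not state what the Pattern is matched against.

```python
import re


def build_pattern(day=None, data_type=None):
    """Compile a pattern from the filter parameters; an omitted dimension matches anything."""
    day_part = re.escape(day) if day else r"\d{8}"
    type_part = re.escape(data_type) if data_type else r"[A-Za-z0-9]+"
    return re.compile(rf"^{day_part}_{type_part}_.+\.bin$")


def filter_names(names, pattern):
    """Keep only the file names that match the pattern."""
    return [name for name in names if pattern.match(name)]


names = [
    "20161020_TXT_0001.bin",
    "20161021_JPG_0002.bin",
    "20161022_WMV_0003.bin",
    "20161021_JPG_0004.bin",
    "20161021_TXT_0005.bin",
]
by_time = filter_names(names, build_pattern(day="20161021"))
by_both = filter_names(names, build_pattern(day="20161021", data_type="JPG"))
```

Which dimensions are passed to `build_pattern` would be driven by the configuration file described above.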

As an embodiment, finding matching data of the target data in the filtered data may include: for each filtered data, carrying out similarity calculation on the filtered data and the target data to obtain a corresponding calculation result; sorting each filtered data according to a calculation result corresponding to each filtered data; and determining the matching data of the target data in each filtered data according to the sorting result.

In this embodiment, the specific process of performing similarity calculation on the filtered data and the target data and sorting the filtered data according to the calculation result may include:

caching each filtered data;

In this embodiment, a plurality of linked lists may be created, and the filtered data and the corresponding calculation results may be stored in the linked lists. For example, assume that the similarity calculation results have 10 levels: 0-10, where level 0 indicates that the filtered data is completely different from the target data, and level 10 indicates that the filtered data is identical to the target data. 10 linked lists may be created, each holding the calculation results of one level and the corresponding filtered data. When data is searched with multiple threads, each thread then searches a different linked list, which reduces the performance loss compared with all threads searching a single linked list. Alternatively, the filtered data and the corresponding calculation results may be stored in other forms, such as an array, which is not limited herein.
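The per-level caching can be sketched with one list per similarity level. Here `collections.deque` stands in for a linked list, and lists are created on demand rather than in a fixed number; both choices, along with the name `bucket_by_level`, are assumptions.

```python
from collections import defaultdict, deque


def bucket_by_level(results):
    """Cache each (name, level) result in the list that belongs to its level."""
    buckets = defaultdict(deque)
    for name, level in results:
        buckets[level].append(name)
    return buckets


results = [("img1", 10), ("img2", 3), ("img3", 10), ("img4", 0)]
buckets = bucket_by_level(results)
```

Each worker thread can then be handed a different level's list, so that threads do not contend for a single shared structure.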

Specifically, when the position of each filtered data in the sequence is determined by the binary sorting method, the similarity calculation result corresponding to the filtered data can first be compared with the calculation results corresponding to the filtered data at the head and at the tail of the existing sequence.

If the sequence arranges the filtered data in the order of the calculation results from large to small, the calculation result corresponding to the filtered data at the head of the sequence is the largest, and the calculation result corresponding to the filtered data at the tail of the sequence is the smallest. Or, the sequence arranges the filtered data in the order of the calculation results from small to large, so that the calculation result corresponding to the filtered data at the head of the sequence is the smallest, and the calculation result corresponding to the filtered data at the tail of the sequence is the largest. Here, it is assumed that the sequence arranges the respective filtered data in the order of the calculation results from large to small.

If the similarity calculation result corresponding to the filtered data is larger than the calculation result corresponding to the filtered data at the head of the sequence, the filtered data can be directly arranged at the head of the sequence;

if the similarity calculation result corresponding to the filtered data is smaller than the calculation result corresponding to the filtered data at the tail of the sequence, the filtered data can be directly arranged at the tail of the sequence;

and if the similarity calculation result corresponding to the filtered data is neither greater than the calculation result corresponding to the filtered data at the head of the sequence nor smaller than the calculation result corresponding to the filtered data at the tail of the sequence, the calculation result is compared with the calculation result corresponding to the filtered data at the middle position of the sequence, and so on, until the position of the filtered data in the sequence is determined.
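The three cases above (insert at the head, insert at the tail, otherwise halve the search range) can be sketched as a binary insertion into a sequence kept in descending order. This is a minimal sketch, not the embodiment's implementation: the function name `insert_descending`, the use of a Python list of `(score, item)` pairs, and the choice to place equal scores after existing ones are all assumptions.

```python
def insert_descending(sequence, item, score):
    """Insert (score, item) into a list kept in descending score order.

    Returns the index at which the new entry was placed.
    """
    # Case 1: empty sequence, or larger than the head -> goes to the head.
    if not sequence or score > sequence[0][0]:
        sequence.insert(0, (score, item))
        return 0
    # Case 2: smaller than the tail -> goes to the tail.
    if score < sequence[-1][0]:
        sequence.append((score, item))
        return len(sequence) - 1
    # Case 3: compare with the middle element and halve the range each time.
    lo, hi = 0, len(sequence)
    while lo < hi:
        mid = (lo + hi) // 2
        if sequence[mid][0] >= score:
            lo = mid + 1
        else:
            hi = mid
    sequence.insert(lo, (score, item))
    return lo


# Hypothetical similarity results for five filtered data items.
seq = []
for name, score in [("a", 0.5), ("b", 0.9), ("c", 0.1), ("d", 0.7), ("e", 0.5)]:
    insert_descending(seq, name, score)
```

The head and tail checks let very good and very poor matches be placed with a single comparison each; only the remaining items pay for the logarithmic search.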

If the target data is an image file uploaded by a user, and the data stored in the disks are image files and their corresponding attribute information, the target data may be modeled when data is searched. Model data of the stored image files is also stored in the disks. Similarity calculation is performed between the data obtained by modeling the target data and the model data stored in the disks, and the matching data of the target data is determined according to the calculation results.

It should be noted that after each filtered data is sorted, the matching data of the target data can be determined in each filtered data according to the sorting result. If the filtered data are sorted in the descending order of the similarity calculation result, the preset number of data arranged in the front can be determined as the matching data of the target data. The determined matching data may be output, i.e. presented to the user.
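As an illustration of scoring, sorting and selecting a preset number of matches, the sketch below assumes that model data is a numeric feature vector and that cosine similarity is the similarity measure. Neither is specified in this description; the names `cosine_similarity` and `top_matches` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_matches(target_model, stored_models, preset_count):
    """Return the names of the preset_count most similar stored models."""
    scored = [
        (cosine_similarity(target_model, model), name)
        for name, model in stored_models.items()
    ]
    # Sort by similarity only, largest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:preset_count]]


# Hypothetical model data for three stored image files.
stored = {"img_1": [1.0, 0.0], "img_2": [0.0, 1.0], "img_3": [1.0, 1.0]}
matches = top_matches([1.0, 0.1], stored, 2)
```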

When the target data is an image file uploaded by a user, the matching data is a matched image file and its corresponding attribute information. The attribute information contained in all of the found matching data may be determined as the attribute information corresponding to the target data. Alternatively, matching data to be output may be determined among the found matching data, and only the attribute information contained in the matching data to be output is determined as the attribute information corresponding to the target data. Either only the attribute information corresponding to the target data is displayed to the user, or the attribute information is displayed together with its corresponding image file, which is not specifically limited.

As an implementation manner, corresponding second threads may be allocated to the N disks respectively; and for each disk, the matching data of the target data is searched for in that disk by using the second thread corresponding to that disk.

In this embodiment, a thread applied when storing data is referred to as a first thread, and a thread applied when searching for data is referred to as a second thread.

Continuing with the above example, N is 4, the 4 disks are disk A, disk B, disk C and disk D, and the data of the same type as the target data are A1, B1, C1 and D1, where A1 is stored in disk A, B1 is stored in disk B, C1 is stored in disk C, and D1 is stored in disk D.

Assume that 12 threads are available for searching data in the 4 disks, and that the 12 threads are allocated to the 4 disks. Here the allocation is assumed to be even, that is, each disk corresponds to 3 threads; of course, the allocation need not be even, which is not specifically limited. Assume that the 3 threads allocated to disk A are thread 11, thread 12, and thread 13; the 3 threads allocated to disk B are thread 14, thread 15, and thread 16; the 3 threads allocated to disk C are thread 17, thread 18, and thread 19; and the 3 threads allocated to disk D are thread 20, thread 21, and thread 22.

Disk A performs a data lookup using thread 11, thread 12, and thread 13, disk B performs a data lookup using thread 14, thread 15, and thread 16, disk C performs a data lookup using thread 17, thread 18, and thread 19, and disk D performs a data lookup using thread 20, thread 21, and thread 22.

It should be noted that, when the matching data of the target data is searched in the determined N disks, the searching may be performed in parallel, so that the searching efficiency may be improved to a greater extent. Of course, serial lookups are also possible, which can also reduce the occurrence of I/O blocking situations.
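The thread allocation and parallel lookup can be sketched with one thread pool per disk. This is a sketch under assumptions: each disk is modeled as an in-memory list, the lookup is a simple predicate scan, and the names `allocate_threads`, `scan` and `search_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def allocate_threads(disk_names, total_threads):
    """Split total_threads across the disks as evenly as possible."""
    if total_threads < len(disk_names):
        raise ValueError("need at least one thread per disk")
    base, extra = divmod(total_threads, len(disk_names))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(disk_names)}


def scan(chunk, predicate):
    """Work done by one thread: keep the records that match."""
    return [record for record in chunk if predicate(record)]


def search_all(disks, predicate, total_threads):
    """Search every disk in parallel, each disk using only its own threads."""
    plan = allocate_threads(list(disks), total_threads)
    pools = {name: ThreadPoolExecutor(max_workers=plan[name]) for name in disks}
    try:
        # Each disk's records are striped across that disk's threads.
        futures = {
            name: [
                pools[name].submit(scan, disks[name][i::plan[name]], predicate)
                for i in range(plan[name])
            ]
            for name in disks
        }
        return {name: [r for f in fs for r in f.result()] for name, fs in futures.items()}
    finally:
        for pool in pools.values():
            pool.shutdown()


# Hypothetical contents of disks A to D; the "match" is any even number.
disks = {"A": [1, 2, 3, 4, 5, 6], "B": [7, 8, 9], "C": [10, 11, 12], "D": [13, 14]}
plan = allocate_threads(list(disks), 12)
results = search_all(disks, lambda r: r % 2 == 0, 12)
```

All per-disk tasks are submitted before any result is collected, so the four disks are searched concurrently. A serial variant would simply collect each disk's results before submitting work for the next disk.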

By applying the embodiment shown in fig. 2 of the present invention, the same type of data is stored in N disks, and when data is searched, the data is searched in the N disks, and the read-write speed allowed by the N disks is much higher than the read-write speed allowed by one disk, so that the occurrence of I/O blocking can be reduced.

Corresponding to the above method embodiments, the embodiments of the present invention further provide a data storage device and a data search device.

Fig. 3 is a schematic structural diagram of a data storage device according to an embodiment of the present invention, including:

an obtaining module 301, configured to obtain data to be stored;

a classification module 302, configured to classify the data to be stored according to at least one preset dimension;

a dividing module 303, configured to divide each type of data to be stored into N parts, where N is the determined number of disks, and N is greater than 1;

the storage module 304 is configured to store the N pieces of data to be stored into the N disks, respectively, where a disk corresponding to each piece of data to be stored is different.

In this embodiment, the classification module 302 may be specifically configured to:

In this embodiment, the dividing module 303 may be specifically configured to:

evenly dividing each type of data to be stored into N parts.

In this embodiment, the storage module 304 may be specifically configured to:

distributing corresponding first threads for the N disks respectively;

In this embodiment, the data to be stored is an image file and attribute information corresponding to the image file.

By applying the embodiment shown in fig. 3 of the present invention, data to be stored is classified according to at least one preset dimension, and each class of data to be stored is divided into N parts, where N is the determined number of disks, and N is greater than 1; and respectively storing the N data to be stored into the N magnetic disks. Therefore, in the scheme, the data of the same type is stored in the N disks, the scattered storage of similar data is realized, when the data is required to be searched, the data is searched in the N disks, and the allowed read-write speed of the N disks is far higher than the allowed read-write speed of one disk, so that the occurrence of I/O (input/output) blocking can be reduced.
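The classify, divide and store steps can be sketched as below. The sketch assumes that records are classified by a caller-supplied key (for example a date string standing in for the time dimension), that each disk is modeled as an in-memory dictionary, and that the division is even; `classify`, `split_evenly` and `distribute` are hypothetical names.

```python
from collections import defaultdict


def classify(records, key):
    """Group records into classes according to one preset dimension."""
    classes = defaultdict(list)
    for record in records:
        classes[key(record)].append(record)
    return classes


def split_evenly(items, n):
    """Divide items into n consecutive parts whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


def distribute(records, key, n):
    """Store the n parts of every class on n different (simulated) disks."""
    disks = [defaultdict(list) for _ in range(n)]
    for label, items in classify(records, key).items():
        for disk, part in zip(disks, split_evenly(items, n)):
            disk[label].extend(part)
    return disks


# Hypothetical records: (date, sequence number), classified by date.
records = [("2016-11-10", i) for i in range(8)] + [("2016-11-11", i) for i in range(4)]
disks = distribute(records, key=lambda r: r[0], n=4)
```

After distribution every disk holds a share of every class, so a later lookup for one class touches all N disks instead of only one.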

Fig. 4 is a schematic structural diagram of a data searching apparatus according to an embodiment of the present invention, including:

a first determining module 401, configured to determine target data;

a searching module 402, configured to search for matching data of the target data in the determined N disks, respectively, where N is greater than 1; the data in the N disks is stored by using the data storage device provided by the embodiment of fig. 3 of the present invention.

In this embodiment, the searching module 402 may include: a first determination sub-module, a filtering sub-module, and a lookup sub-module (not shown), wherein,

In this embodiment, when the filtering dimension is a time dimension, the first determining submodule may be specifically configured to: determining a time parameter corresponding to the target data;

In this embodiment, the searching module 402 may include: a computation submodule, a sorting submodule and a second determination submodule (not shown in the figure), wherein,

In this embodiment, the calculation submodule may be specifically configured to:

caching each filtered data;

the sorting submodule may be specifically configured to:

In this embodiment, the searching module 402 may be specifically configured to:

distributing corresponding second threads for the N disks respectively;

In this embodiment, the searching module 402 may be specifically configured to:

In this embodiment, the target data is an image file; the matching data is an image file and corresponding attribute information thereof; the apparatus may further include: a second determination module and a third determination module (not shown in the figures), wherein,

By applying the embodiment shown in fig. 4 of the present invention, the same type of data is stored in N disks, and when data is searched, the data is searched in the N disks, and the read-write speed allowed by the N disks is much higher than the read-write speed allowed by one disk, so that the occurrence of I/O blocking can be reduced.

An embodiment of the present invention further provides a data processing system, including: a management device and a storage node, wherein,

the management device may be configured to acquire data to be stored; classifying the data to be stored according to at least one preset dimension; dividing each type of data to be stored into N parts, wherein N is the number of the determined storage nodes, and is greater than 1; and respectively storing the N parts of data to be stored into the N storage nodes, wherein the storage nodes corresponding to each part of data to be stored are different.

In the process of storing data by the management device, the management device may classify the data to be stored into N classes according to the time dimension and/or the characteristic dimension of the data to be stored.

In this process, the management device may evenly divide each type of data to be stored into N parts.

In this process, the management device may allocate corresponding first threads to the N storage nodes, respectively;

and for each storage node, storing the corresponding data to be stored by using the corresponding first thread.

In the process, the data to be stored are image files and corresponding attribute information thereof.

In this embodiment, the management device may be further configured to determine target data; and searching the matching data of the target data in the N storage nodes respectively.

In the process of the management device searching for data, the management device may determine a parameter of the target data in a filtering dimension according to a preset filtering dimension; filter the data in the N storage nodes according to the parameter to obtain filtered data; and search for the matching data of the target data in the filtered data in the N storage nodes respectively.

In the process, under the condition that the filtering dimension is the time dimension, the management device may determine a time parameter corresponding to the target data; and determining a target time parameter interval according to the time parameters, and determining the data which are stored in the N storage nodes and are positioned in the interval as filtered data.
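Filtering in the time dimension can be sketched as follows for the records of one storage node. How the target interval is derived from the time parameter is not specified here, so the sketch assumes a symmetric window around it; the names `target_interval` and `filter_by_time`, and the `half_width` parameter, are hypothetical.

```python
from datetime import datetime, timedelta


def target_interval(time_parameter, half_width):
    """Assumed rule: a symmetric window around the target's time parameter."""
    return time_parameter - half_width, time_parameter + half_width


def filter_by_time(records, time_parameter, half_width):
    """Keep only the records whose time falls inside the target interval."""
    start, end = target_interval(time_parameter, half_width)
    return [r for r in records if start <= r["time"] <= end]


# Hypothetical records held by one storage node.
records = [
    {"name": "img_1", "time": datetime(2016, 11, 10, 10, 30)},
    {"name": "img_2", "time": datetime(2016, 11, 10, 11, 30)},
    {"name": "img_3", "time": datetime(2016, 11, 10, 12, 45)},
    {"name": "img_4", "time": datetime(2016, 11, 10, 14, 0)},
]
filtered = filter_by_time(records, datetime(2016, 11, 10, 12, 0), timedelta(hours=1))
```

Only the filtered records go on to similarity calculation, which keeps the more expensive comparison step off data that falls outside the interval.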

In the process, the management device may perform similarity calculation on each filtered data and the target data to obtain a corresponding calculation result; sorting each filtered data according to a calculation result corresponding to each filtered data; and determining the matching data of the target data in each filtered data according to the sorting result.

In this process, the management device may cache each filtered data; for each cached filtered data, performing similarity calculation on each filtered data and the target data to obtain a calculation result corresponding to each filtered data; determining the position of each filtered data in the sequence by utilizing a binary sorting method according to the corresponding calculation result of each filtered data; the sequence is composed of each filtered data, and each filtered data in the sequence is sorted according to the corresponding calculation result.

In this process, the management device may allocate corresponding second threads to the N storage nodes respectively; and for each storage node, search for the matching data of the target data in that storage node by using the second thread corresponding to that storage node.

In this process, the management device may search for matching data of the target data in parallel in the N storage nodes, respectively.

In the process, the target data is an image file; the matching data is an image file and corresponding attribute information thereof; the management device may determine matching data to be output among the searched matching data; and determining the attribute information contained in the matching data to be output as the attribute information corresponding to the target data.

By applying the system embodiment of the invention, the data to be stored is classified according to at least one preset dimension, and each class of data to be stored is divided into N parts, wherein N is the number of the determined storage nodes, and N is more than 1; and respectively storing the N data to be stored into the N storage nodes. Therefore, in the scheme, the data of the same type is stored in the N storage nodes, the scattered storage of similar data is realized, when the data is required to be searched, the data is searched in the N storage nodes, and the read-write speed allowed by the N storage nodes is far higher than the read-write speed allowed by one storage node, so that the occurrence of I/O (input/output) blocking can be reduced.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Those skilled in the art will appreciate that all or part of the steps in the above method embodiments may be implemented by a program to instruct relevant hardware to perform the steps, and the program may be stored in a computer-readable storage medium, which is referred to herein as a storage medium, such as: ROM/RAM, magnetic disk, optical disk, etc.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of storing data, comprising:

acquiring data to be stored;

classifying the data to be stored according to at least one preset dimension;

2. The method according to claim 1, wherein the step of classifying the data to be stored according to a preset at least one dimension comprises:

3. The method according to claim 1, wherein the step of dividing each type of data to be stored into N shares comprises:

evenly dividing each type of data to be stored into N parts.

4. The method according to claim 1, wherein the step of storing the N copies of data to be stored in the N disks respectively comprises:

distributing corresponding first threads for the N disks respectively;

5. The method according to any one of claims 1 to 4, wherein the data to be stored is an image file and its corresponding attribute information.

6. A method for data retrieval, comprising:

determining target data;

7. The method of claim 6, wherein the step of searching the determined N disks for matching data of the target data comprises:

and searching the matched data of the target data in the filtered data.

8. The method according to claim 7, wherein, in the case that the filtering dimension is a time dimension, the step of determining the parameters of the target data in the filtering dimension according to a preset filtering dimension comprises:

determining a time parameter corresponding to the target data;

the step of filtering the data stored in the N disks according to the parameters to obtain filtered data includes:

9. The method of claim 7, wherein the step of finding matching data of the target data in the filtered data comprises:

10. The method according to claim 9, wherein for each filtered data, similarity calculation is performed between the filtered data and the target data to obtain a corresponding calculation result; the step of sorting each filtered data according to the calculation result corresponding to each filtered data comprises:

caching each filtered data;

11. The method of claim 6, wherein the step of searching the determined N disks for matching data of the target data comprises:

distributing corresponding second threads for the N disks respectively;

12. The method of claim 6, wherein the step of searching the determined N disks for matching data of the target data comprises:

13. The method according to any one of claims 6 to 12, wherein the target data is an image file; the matching data is an image file and corresponding attribute information thereof;

after the step of searching for matching data of the target data in the determined N disks, the method further includes:

determining matched data to be output in the searched matched data;

14. A data storage device, comprising:

the acquisition module is used for acquiring data to be stored;

15. The apparatus according to claim 14, wherein the classification module is specifically configured to:

16. The apparatus according to claim 14, wherein the partitioning module is specifically configured to:

evenly dividing each type of data to be stored into N parts.

17. The apparatus of claim 14, wherein the storage module is specifically configured to:

distributing corresponding first threads for the N disks respectively;

18. The apparatus according to any one of claims 14-17, wherein the data to be stored is an image file and its corresponding attribute information.

19. A data search apparatus, comprising:

the first determining module is used for determining target data;

20. The apparatus of claim 19, wherein the lookup module comprises:

21. The apparatus according to claim 20, wherein, in case the filtering dimension is a time dimension, the first determining submodule is specifically configured to: determining a time parameter corresponding to the target data;

the filtering submodule is specifically configured to: and determining a target time parameter interval according to the time parameters, and determining the data which are stored in the N disks and are positioned in the interval as filtered data.

22. The apparatus of claim 20, wherein the lookup module comprises:

23. The apparatus according to claim 22, wherein the computation submodule is specifically configured to:

caching each filtered data;

the sorting submodule is specifically configured to:

24. The apparatus of claim 19, wherein the lookup module is specifically configured to:

distributing corresponding second threads for the N disks respectively;

25. The apparatus of claim 19, wherein the lookup module is specifically configured to:

26. The apparatus according to any one of claims 19-25, wherein the target data is an image file; the matching data is an image file and corresponding attribute information thereof; the device further comprises:

27. A data processing system, comprising: a management device and a storage node, wherein,

28. The system of claim 27,

the management device is also used for determining target data; and searching the matching data of the target data in the N storage nodes respectively.