CN113535824B - Data searching method, device, electronic equipment and storage medium

Info

Publication number: CN113535824B
Application number: CN202110850414.3A
Authority: CN
Inventors: 陈畅怀
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2021-07-27
Filing date: 2021-07-27
Publication date: 2024-06-07
Anticipated expiration: 2041-07-27
Also published as: CN113535824A

Abstract

The embodiments of the application provide a data searching method, a data searching device, electronic equipment and a storage medium. The method includes: obtaining target data to be queried, submitted by a data querying party; calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and assigning each sample data to the corresponding similarity interval according to its similarity to the target data; selecting the sample data in a designated similarity interval according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result; and sending the first sorting result to the data querying party. Because only the sample data in the designated similarity interval is sorted before the first sorting result is sent, rather than all the sample data, the efficiency of data searching can be increased.

Description

Data searching method, device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data searching method, apparatus, electronic device, and storage medium.

Background

With the rapid development of information searching and artificial intelligence technology, information searching has come to be applied across many industries, for example in multimedia information searching such as image searching, video searching, document searching and web searching.

In the related art, after data to be retrieved is obtained, the similarity between the data and all sample data in a database is calculated, all the sample data are ordered according to the sequence from high to low of the similarity, the ordering queues of all the sample data are cached, and then corresponding sample data are selected according to the ordering queues of all the sample data and fed back to a data querying party.

However, this method requires all the sample data to be sorted. The response time grows rapidly as the data size increases, and a large amount of computing resources is consumed, which seriously reduces the efficiency of data searching.

Disclosure of Invention

The embodiment of the application aims to provide a data searching method, a data searching device, electronic equipment and a storage medium, so as to increase the efficiency of data searching. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present application provides a data searching method, where the method includes:

acquiring target data to be queried submitted by a data querying party;

calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and assigning each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data;

selecting sample data in a designated similarity interval according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result;

and sending the first sorting result to the data querying party.

In one possible implementation manner, the calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and distributing each sample data to a corresponding similarity interval according to the similarity between each sample data and the target data includes:

calculating the similarity between the target data and each sample data in a preset order, and adjusting the similarity range corresponding to each similarity interval according to the upper and lower bounds of the similarities obtained so far;

and assigning each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data.

In one possible implementation manner, after the assigning each sample data into a corresponding similarity interval according to the similarity between each sample data and the target data, the method further includes:

for any similarity interval, when the number of sample data in the similarity interval exceeds a preset number threshold, re-dividing the similarity interval into a plurality of similarity intervals, and redistributing its sample data among the re-divided similarity intervals accordingly.

In a possible implementation manner, the selecting, according to a preset interval selection rule, sample data in a specified similarity interval, and sorting the selected sample data according to similarity, to obtain a first sorting result, includes:

obtaining the maximum number of sample data that a single page of the data querying party can display, to obtain a first value;

selecting, in order of similarity from high to low, the top similarity intervals, in a number equal to a second value, as the designated similarity intervals, wherein the total number of sample data in these intervals is not less than the first value, the total number of sample data in the top similarity intervals numbering a third value is less than the first value, and the third value is equal to the second value minus 1;

and sorting the sample data in the designated similarity intervals in order of similarity from high to low to obtain a first sequence, and selecting the top sample data in the first sequence, in a number equal to the first value, as the first sorting result.

In a possible implementation manner, after the sending the first sorting result to the data querying party, the method further includes:

when a query message from the data querying party requesting more query results is received, determining the number of sample data in the designated similarity intervals other than the first sorting result, to obtain a fourth value;

calculating, from the fourth value and the first value, the number of sample data still to be selected, to obtain a fifth value;

selecting, in order of similarity from high to low, the top similarity intervals among the similarity intervals other than the designated similarity intervals, in a number equal to a sixth value, as the current designated similarity intervals, wherein the total number of sample data in these intervals is not less than the fifth value, the total number of sample data in the top similarity intervals numbering a seventh value is less than the fifth value, and the seventh value is equal to the sixth value minus 1;

sorting the sample data in the current designated similarity intervals in order of similarity from high to low to obtain a second sequence, and selecting the last sample data in the first sequence, in a number equal to the fourth value, together with the top sample data in the second sequence, in a number equal to the fifth value, as a second sorting result;

and sending the second sorting result to the data querying party.

In one possible implementation manner, after the sending the first sorting result to the data querying party, the method further includes:

among the similarity intervals whose sample data has not yet been sorted, selecting, in order of similarity from high to low, the sample data in the top similarity intervals, in a number equal to an eighth value, and sorting it to obtain a third sorting result, wherein the eighth value is a preset number of intervals, or the eighth value is such that, among the similarity intervals whose sample data has not yet been sorted, the total number of sample data in the top similarity intervals numbering the eighth value is not less than a preset sample value and the total number of sample data in the top similarity intervals numbering a ninth value is less than the preset sample value, the ninth value being equal to the eighth value minus 1;

and sending the third sorting result to the data querying party.

In one possible implementation manner, after the sending the first sorting result to the data querying party, the method further includes:

when a query message from the data querying party requesting display of the query results of the page whose number is a tenth value is received, selecting, in order of similarity from high to low, the eleventh-value sample interval through the twelfth-value sample interval as target sample intervals, according to the first value (the maximum number of sample data that a single page of the data querying party can display) and the number of sample data in each sample interval, wherein: the total number of sample data in the first (eleventh value minus 1) sample intervals is not more than a thirteenth value, the thirteenth value being equal to the product of the first value and the tenth value minus the first value; the total number of sample data in the first eleventh-value sample intervals is more than the thirteenth value; the total number of sample data in the first (twelfth value minus 1) sample intervals is less than a fourteenth value, the fourteenth value being equal to the product of the first value and the tenth value; and the total number of sample data in the first twelfth-value sample intervals is not less than the fourteenth value;

sorting the sample data in the target sample intervals in order of similarity from high to low to obtain a third sequence;

selecting the fifteenth-value through sixteenth-value sample data in the third sequence as a fourth sorting result, wherein the fifteenth value is equal to the thirteenth value minus a seventeenth value plus 1, the sixteenth value is equal to the fourteenth value minus the seventeenth value, and the seventeenth value is the total number of sample data in the first (eleventh value minus 1) sample intervals;

and sending the fourth sorting result to the data querying party.

In a second aspect, an embodiment of the present application provides a data searching apparatus, including:

a target data acquisition module, used for acquiring target data to be queried submitted by a data querying party;

a sample data distribution module, used for calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and assigning each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data;

a sample data sorting module, used for selecting sample data in a designated similarity interval according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result;

and a sorting result sending module, used for sending the first sorting result to the data querying party.

In a possible implementation manner, the sample data distribution module is specifically configured to: calculate the similarity between the target data and each sample data in a preset order, and adjust the similarity range corresponding to each similarity interval according to the upper and lower bounds of the similarities obtained so far; and assign each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data.

In a possible implementation manner, the sample data distribution module is further configured to: for any similarity interval, when the number of sample data in the similarity interval exceeds a preset number threshold, re-divide the similarity interval into a plurality of similarity intervals, and redistribute its sample data among the re-divided similarity intervals accordingly.

In one possible implementation, the sample data sorting module includes:

a sample data quantity acquisition sub-module, used for obtaining the maximum number of sample data that a single page of the data querying party can display, to obtain a first value;

a similarity interval selecting sub-module, used for selecting, in order of similarity from high to low, the top similarity intervals, in a number equal to a second value, as the designated similarity intervals, wherein the total number of sample data in these intervals is not less than the first value, the total number of sample data in the top similarity intervals numbering a third value is less than the first value, and the third value is equal to the second value minus 1;

and a sample data selecting sub-module, used for sorting the sample data in the designated similarity intervals in order of similarity from high to low to obtain a first sequence, and selecting the top sample data in the first sequence, in a number equal to the first value, as the first sorting result.

In a possible implementation manner, the apparatus further comprises a data delay ordering module, configured to: when a query message from the data querying party requesting more query results is received, determine the number of sample data in the designated similarity intervals other than the first sorting result, to obtain a fourth value; calculate, from the fourth value and the first value, the number of sample data still to be selected, to obtain a fifth value; select, in order of similarity from high to low, the top similarity intervals among the similarity intervals other than the designated similarity intervals, in a number equal to a sixth value, as the current designated similarity intervals, wherein the total number of sample data in these intervals is not less than the fifth value, the total number of sample data in the top similarity intervals numbering a seventh value is less than the fifth value, and the seventh value is equal to the sixth value minus 1; sort the sample data in the current designated similarity intervals in order of similarity from high to low to obtain a second sequence, and select the last sample data in the first sequence, in a number equal to the fourth value, together with the top sample data in the second sequence, in a number equal to the fifth value, as a second sorting result; and send the second sorting result to the data querying party.

In a possible implementation manner, the apparatus further comprises a data delay ordering module, configured to: among the similarity intervals whose sample data has not yet been sorted, select, in order of similarity from high to low, the sample data in the top similarity intervals, in a number equal to an eighth value, and sort it to obtain a third sorting result, wherein the eighth value is a preset number of intervals, or the eighth value is such that, among the similarity intervals whose sample data has not yet been sorted, the total number of sample data in the top similarity intervals numbering the eighth value is not less than a preset sample value and the total number of sample data in the top similarity intervals numbering a ninth value is less than the preset sample value, the ninth value being equal to the eighth value minus 1; and send the third sorting result to the data querying party.

In a possible implementation manner, the apparatus further comprises a data delay ordering module, configured to: when a query message from the data querying party requesting display of the query results of the page whose number is a tenth value is received, select, in order of similarity from high to low, the eleventh-value sample interval through the twelfth-value sample interval as target sample intervals, according to the first value (the maximum number of sample data that a single page of the data querying party can display) and the number of sample data in each sample interval, wherein: the total number of sample data in the first (eleventh value minus 1) sample intervals is not more than a thirteenth value, the thirteenth value being equal to the product of the first value and the tenth value minus the first value; the total number of sample data in the first eleventh-value sample intervals is more than the thirteenth value; the total number of sample data in the first (twelfth value minus 1) sample intervals is less than a fourteenth value, the fourteenth value being equal to the product of the first value and the tenth value; and the total number of sample data in the first twelfth-value sample intervals is not less than the fourteenth value; sort the sample data in the target sample intervals in order of similarity from high to low to obtain a third sequence; select the fifteenth-value through sixteenth-value sample data in the third sequence as a fourth sorting result, wherein the fifteenth value is equal to the thirteenth value minus a seventeenth value plus 1, the sixteenth value is equal to the fourteenth value minus the seventeenth value, and the seventeenth value is the total number of sample data in the first (eleventh value minus 1) sample intervals; and send the fourth sorting result to the data querying party.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to implement any one of the data searching methods according to the present application when executing the program stored in the memory.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the data searching methods of the present application.

The embodiment of the application has the beneficial effects that:

With the data searching method, the device, the electronic equipment and the storage medium provided by the embodiments of the application, target data to be queried, submitted by a data querying party, is acquired; the similarity between the target data and each sample data is calculated, a plurality of similarity intervals is determined, and each sample data is assigned to the corresponding similarity interval according to its similarity to the target data; the sample data in a designated similarity interval is selected according to a preset interval selection rule and sorted by similarity to obtain a first sorting result; and the first sorting result is sent to the data querying party. Because only the sample data in the designated similarity interval is sorted before the first sorting result is sent, rather than all the sample data, the efficiency of data searching can be increased. Of course, no single product or method practicing the application needs to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

FIG. 1 is a schematic diagram of a data searching method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a possible implementation of step S102 in an embodiment of the present application;

FIG. 3 is a schematic diagram of a possible implementation of step S103 in an embodiment of the present application;

FIG. 4a is a schematic diagram of a data searching method according to an embodiment of the present application;

FIG. 4b is a schematic diagram showing a first possible implementation of step S105 according to an embodiment of the present application;

FIG. 5 is a schematic diagram showing a second possible implementation manner of step S105 in the embodiment of the present application;

FIG. 6 is a schematic diagram of a third possible implementation of step S105 according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a data searching apparatus according to an embodiment of the present application;

FIG. 8 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the embodiments of the present application, all other embodiments obtained by the person skilled in the art based on the present application are included in the scope of protection of the present application.

In order to increase the efficiency of data searching, an embodiment of the present application provides a data searching method, referring to fig. 1, the method includes:

S101, acquiring target data to be queried submitted by a data querying party.

The data searching method of the embodiment of the application can be implemented by an electronic device, which may be a personal computer, a hard disk video recorder, a database server, a search server or the like. The target data is the data that the data querying party submits for querying; it may take the form of text, images, video, sound, tables and so on, all of which fall within the protection scope of the application.

S102, calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and distributing each sample data to the corresponding similarity interval according to the similarity between each sample data and the target data.

The similarity between the target data and each sample data can be calculated in parallel using multiple threads, and each sample data is assigned to the corresponding similarity interval according to its similarity. The similarity interval corresponding to a sample data is the interval whose similarity range contains the similarity between that sample data and the target data. For example, suppose the similarity between sample data X and the target data is 60%, the similarity range of similarity interval a is 80%-100%, that of similarity interval b is 50%-80%, and that of similarity interval c is 30%-50%; sample data X is then assigned to similarity interval b.
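As an illustration only, the assignment in the example above could be sketched as follows. The interval table, the function name `assign_to_intervals` and the choice to treat each lower bound as inclusive are assumptions made for this sketch; the text does not specify how boundary values are handled.

```python
from typing import Dict, List, Tuple

# Hypothetical interval table mirroring the example: (name, lower, upper).
# Treating the lower bound as inclusive is an assumption of this sketch.
INTERVALS: List[Tuple[str, float, float]] = [
    ("a", 0.80, 1.00),
    ("b", 0.50, 0.80),
    ("c", 0.30, 0.50),
]


def assign_to_intervals(similarities: Dict[str, float]) -> Dict[str, List[str]]:
    """Place each sample id into the interval whose range contains its similarity."""
    buckets: Dict[str, List[str]] = {name: [] for name, _, _ in INTERVALS}
    for sample_id, sim in similarities.items():
        for name, lower, upper in INTERVALS:
            # The top interval also accepts its upper bound (similarity 1.0).
            if lower <= sim < upper or (sim == upper == 1.00):
                buckets[name].append(sample_id)
                break
        # Samples below every range are simply left out in this sketch.
    return buckets


buckets = assign_to_intervals({"X": 0.60, "Y": 0.95, "Z": 0.35})
```

Here sample X, with similarity 60%, lands in interval b, as in the example.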

The similarity intervals may be divided in advance or in real time, and may be of equal or unequal width; all of these fall within the protection scope of the application. In one example, the upper and lower bounds of the similarity distribution may be obtained, and the range between them may be divided into a specified number of similarity intervals. In one possible embodiment, referring to fig. 2, the calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and assigning each sample data to a corresponding similarity interval according to the similarity between each sample data and the target data includes:

S1021, calculating the similarity between the target data and each sample data in a preset order, and adjusting the similarity range corresponding to each similarity interval according to the upper and lower bounds of the similarities obtained so far.

S1022, assigning each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data.

Different threads can be used to calculate the similarity between the target data and the sample data, and the similarity range corresponding to each similarity interval can be adjusted dynamically according to the upper and lower bounds of the similarities obtained so far; for example, one similarity interval can be re-divided into several similarity intervals, or several similarity intervals can be merged into one. The sample data can be assigned after the similarity intervals have been divided, or while the similarity intervals are being dynamically adjusted. In one example, the sample data in the affected similarity intervals is redistributed at the same time as the similarity ranges are adjusted, which speeds up the assignment of sample data and ultimately increases the efficiency of data searching.
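One possible way to picture S1021 and S1022 is sketched below, under simplifying assumptions: the similarities arrive one at a time in a single thread, the intervals are always k equal-width pieces of the range seen so far, and all samples are redistributed whenever that range widens. The function names are hypothetical, and the text equally allows unequal widths, splitting or merging individual intervals, and multi-threaded calculation.

```python
from typing import List


def equal_width_bounds(lo: float, hi: float, k: int) -> List[float]:
    """Return k+1 boundaries, from high to low, splitting [lo, hi] into k intervals."""
    step = (hi - lo) / k
    return [hi - i * step for i in range(k)] + [lo]


def bucket_index(sim: float, bounds: List[float]) -> int:
    """Index of the interval (0 = most similar) whose range contains sim."""
    k = len(bounds) - 1
    for i in range(k):
        if sim >= bounds[i + 1]:
            return i
    return k - 1


def assign_streaming(sims: List[float], k: int) -> List[List[float]]:
    """Assign similarities to k intervals, re-dividing whenever the bounds change."""
    lo, hi = float("inf"), float("-inf")
    bounds: List[float] = []
    buckets: List[List[float]] = [[] for _ in range(k)]
    seen: List[float] = []
    for sim in sims:
        seen.append(sim)
        if sim < lo or sim > hi:
            # The observed range widened: adjust every interval's range
            # and redistribute the samples assigned so far.
            lo, hi = min(lo, sim), max(hi, sim)
            bounds = equal_width_bounds(lo, hi, k)
            buckets = [[] for _ in range(k)]
            for s in seen:
                buckets[bucket_index(s, bounds)].append(s)
        else:
            buckets[bucket_index(sim, bounds)].append(sim)
    return buckets


result = assign_streaming([0.5, 1.0, 0.0, 0.75, 0.25], k=4)
```

Redistributing every sample on each range change is the simplest choice for a sketch; a real system would presumably move only the samples in the intervals whose ranges actually changed.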

In one possible embodiment, after the assigning each of the sample data to a corresponding similarity interval according to the similarity between each of the sample data and the target data, the method further includes: for any similarity interval, when the number of sample data in the similarity interval exceeds a preset number threshold, re-dividing the similarity interval into a plurality of similarity intervals, and redistributing its sample data among the re-divided similarity intervals accordingly.

When the number of sample data in a similarity interval exceeds the preset number threshold, the interval has not been divided reasonably: it contains so many sample data that sorting them would involve a large amount of wasted work. The interval therefore needs to be re-divided into several similarity intervals, which reduces the number of sample data in each interval and the wasted work during sorting, and so improves the efficiency of data searching. The preset number threshold may be determined from the number O of sample data that a single page of the data querying party can display at most; for example, it may be set to O, 0.8O, 0.7O, 0.6O, 0.5O, 0.4O, 0.3O, or the like.
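A minimal sketch of such a re-division is given below. The `Interval` class, the function `split_if_overfull` and the decision to cut an overfull interval into equal-width parts are assumptions for illustration; the text does not prescribe how the new boundaries are chosen.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Interval:
    lower: float
    upper: float
    samples: List[Tuple[str, float]] = field(default_factory=list)  # (id, similarity)


def split_if_overfull(interval: Interval, threshold: int, parts: int = 2) -> List[Interval]:
    """Re-divide an interval holding more than `threshold` samples into `parts` pieces."""
    if len(interval.samples) <= threshold:
        return [interval]
    width = (interval.upper - interval.lower) / parts
    # New sub-intervals, listed from the highest similarity range to the lowest.
    subs = [
        Interval(interval.upper - (i + 1) * width, interval.upper - i * width)
        for i in range(parts)
    ]
    subs[-1].lower = interval.lower  # guard against floating-point drift
    for sample_id, sim in interval.samples:
        for sub in subs:
            if sim >= sub.lower:
                sub.samples.append((sample_id, sim))
                break
    return subs


# With O = 5 and a threshold of 0.6 * O = 3, an interval of 4 samples is split.
crowded = Interval(0.5, 1.0, [("p", 0.9), ("q", 0.6), ("r", 0.75), ("s", 0.55)])
pieces = split_if_overfull(crowded, threshold=3)
```

If many samples share nearly the same similarity, an equal-width split may still leave one piece over the threshold; repeating the split, or choosing boundaries from the data, are possible refinements outside this sketch.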

S103, selecting sample data in a designated similarity interval according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result.

The preset interval selection rule can be set in a self-defined mode according to actual conditions, for example, a preset number of similarity intervals can be selected as designated similarity intervals according to the sequence of similarity from high to low by default.

In one example, referring to fig. 3, the selecting sample data in a designated similarity interval according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result, includes:

S1031, acquiring the number O (first numerical value) of sample data that a single page of the data querying party can display at most.

S1032, selecting the first N (second numerical value) similarity intervals as designated similarity intervals according to the sequence of the similarity from high to low, wherein the total number of the sample data in the first N similarity intervals is not less than O, and the total number of the sample data in the first N-1 (third numerical value) similarity intervals is less than O, and N is a positive integer.

S1033, sorting the sample data in the specified similarity interval according to the sequence from high similarity to low similarity to obtain a first sequence, and selecting the first O sample data in the first sequence as a first sorting result.

The data querying party generally displays search results in pages, each page showing at most O sample data. The smallest N satisfying the following condition can therefore be selected, taking the similarity intervals in order of similarity from high to low:

$$\sum_{i=1}^{N} n_i \ge O$$

where $n_i$ is the number of sample data in the i-th similarity interval (in order of similarity from high to low).
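Steps S1031 to S1033 can be sketched as follows. This is an illustrative reading, not the patented implementation: the `Sample` alias and the function `first_page` are hypothetical names, and the intervals are assumed to be supplied already ordered from the highest similarity range to the lowest.

```python
from typing import List, Tuple

Sample = Tuple[str, float]  # (sample id, similarity to the target data)


def first_page(intervals: List[List[Sample]], page_size: int):
    """Return (first sorting result, first sequence, N) for page size O."""
    total = 0
    n = 0
    # Smallest N such that n_1 + ... + n_N >= O.
    while n < len(intervals) and total < page_size:
        total += len(intervals[n])
        n += 1
    # Only the samples in the first N intervals are sorted.
    selected = [s for bucket in intervals[:n] for s in bucket]
    first_sequence = sorted(selected, key=lambda s: s[1], reverse=True)
    return first_sequence[:page_size], first_sequence, n


intervals = [
    [("a", 0.9), ("b", 0.95)],             # highest similarity range
    [("c", 0.7), ("d", 0.75), ("e", 0.6)],
    [("f", 0.4)],                          # lowest similarity range
]
page, first_sequence, n = first_page(intervals, page_size=3)
```

With O = 3, the first interval alone holds only 2 samples, so N = 2; five samples are sorted and the third interval is never touched.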

S104, the first sorting result is sent to the data inquiring party.

And sending the first sorting result to the data inquiring party so that the data inquiring party can display the first sorting result.

In a possible embodiment, referring to fig. 4a, after step S104, the method further includes: S105, sending the sample data other than the first sorting result to the data querying party by means of delayed sorting.

Delayed sorting is applied to the sample data that has not been sorted: this sample data is left unsorted for the time being and is sorted only when the data querying party asks to view more sample data. Most of the time the data querying party does not view all the results; it generally views only the top T results, where T is less than S. The samples that the user never views therefore do not need to be sorted, which avoids the cost of sorting them and can greatly save computing resources.

In a possible implementation manner, referring to fig. 4b, for the other sample data except the first sorting result, the sending the other sample data to the data querying party in a delayed sorting manner includes:

S1051, when a query message from the data querying party requesting more query results is received, determining the number M (fourth numerical value) of sample data in the designated similarity intervals other than the first sorting result.

S1052, calculating, according to M and O, the number A (fifth numerical value) of sample data that still needs to be selected. In one example, A = O - M.

S1053, selecting, in order of similarity from high to low, the first B (sixth numerical value) similarity intervals among the similarity intervals other than the designated similarity intervals as the current designated similarity intervals, wherein the total number of sample data in the first B similarity intervals is not less than A, the total number of sample data in the first B-1 (seventh numerical value) similarity intervals is less than A, and B is a positive integer.

S1054, sorting the sample data in the current designated similarity intervals in order of similarity from high to low to obtain a second sequence, and selecting the last M sample data in the first sequence and the first A sample data in the second sequence as a second sorting result.

S1055, the second sorting result is sent to the data inquiring party.

When the data querying party requests more query results again, another O sample data can be selected and sent to the data querying party; the selection process is similar to S1051-S1054 and is not repeated here.
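Steps S1051 to S1054 can be sketched as below, using the example A = O - M. The function `next_page` and its parameters are hypothetical, and the branch for M not less than O (where the leftover samples already fill a page) is an assumption of this sketch rather than something stated in the text.

```python
from typing import List, Tuple

Sample = Tuple[str, float]  # (sample id, similarity to the target data)


def next_page(first_sequence: List[Sample], already_sent: int,
              remaining_intervals: List[List[Sample]], page_size: int):
    """Return (second sorting result, second sequence, B)."""
    leftover = first_sequence[already_sent:]  # the last M samples of the first sequence
    m = len(leftover)
    if m >= page_size:
        # Assumption: nothing new needs sorting if the leftovers fill a page.
        return leftover[:page_size], [], 0
    a = page_size - m  # A = O - M
    total = 0
    b = 0
    # Smallest B whose first B remaining intervals hold at least A samples.
    while b < len(remaining_intervals) and total < a:
        total += len(remaining_intervals[b])
        b += 1
    newly_selected = [s for bucket in remaining_intervals[:b] for s in bucket]
    second_sequence = sorted(newly_selected, key=lambda s: s[1], reverse=True)
    return leftover + second_sequence[:a], second_sequence, b


first_sequence = [("b", 0.95), ("a", 0.9), ("d", 0.75), ("c", 0.7), ("e", 0.6)]
remaining = [[("f", 0.4)], [("g", 0.2), ("h", 0.25)]]
second_page, second_sequence, b = next_page(first_sequence, 3, remaining, page_size=3)
```

In this run M = 2 and A = 1, so only the single-sample interval containing f is sorted, and the interval containing g and h stays unsorted.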

The remaining unsorted similarity intervals can be sorted all at once or interval by interval. For example, in order of similarity from high to low, Q similarity intervals may be sorted each time, or as many consecutive similarity intervals as are needed for their total number of sample data to exceed a preset sample value R. This lowers the peak resource demand of search sorting: the unsorted samples are sorted while the user is viewing the search results returned first, so resource consumption is spread out, which can improve the response speed of the system or reduce operating cost, without affecting the accuracy of the search results and without the user noticing.

In a possible implementation, referring to fig. 5, sending the sample data other than the first sorting result to the data querying party in a delayed sorting manner includes:

S105A, from the similarity intervals whose sample data have not been sorted, selecting the sample data in the first Q (eighth numerical value) similarity intervals in order of similarity from high to low and sorting them to obtain a third sorting result, wherein Q is a preset number of intervals, or Q satisfies that, among the similarity intervals whose sample data have not been sorted, the total number of sample data in the first Q similarity intervals is not less than R and the total number of sample data in the first Q-1 (ninth numerical value) similarity intervals is less than R, R being a preset sample value;

S105B, sending the third sorting result to the data querying party.
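The two ways of choosing Q in S105A can be sketched as below. The function name `presort_next_intervals`, its keyword parameters `q` and `r`, and the `(similarity, sample_id)` tuples are hypothetical names chosen for this sketch; the embodiment itself does not prescribe any particular interface.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, str]  # (similarity, sample id); hypothetical layout


def presort_next_intervals(unsorted_intervals: List[List[Sample]],
                           q: Optional[int] = None,
                           r: Optional[int] = None) -> Tuple[List[Sample], int]:
    """Sort the first Q unsorted intervals and return (third sorting result, Q).

    unsorted_intervals must be ordered from the highest similarity range
    to the lowest. Either pass a preset interval count `q`, or a preset
    sample value `r`, in which case Q is the smallest count whose
    intervals together hold at least `r` samples.
    """
    if q is None:
        if r is None:
            raise ValueError("either q or r must be given")
        q, total = 0, 0
        for interval in unsorted_intervals:
            if total >= r:
                break
            total += len(interval)
            q += 1

    picked = [s for interval in unsorted_intervals[:q] for s in interval]
    return sorted(picked, key=lambda s: s[0], reverse=True), q


pending = [
    [(0.85, "a"), (0.82, "b")],
    [(0.71, "c"), (0.78, "d"), (0.74, "e")],
    [(0.65, "f")],
]
by_threshold, q_used = presort_next_intervals(pending, r=3)
by_count, _ = presort_next_intervals(pending, q=1)
print(q_used, by_threshold)
print(by_count)
```

With `r=3` the first interval alone holds only two samples, so two intervals are sorted; with `q=1` only the first interval is sorted.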

When the first sorting result is returned, the overall paging information of the search results can also be sent to the data querying party, so that the data querying party can choose which page to view next. After the sample data of the page currently requested by the data querying party have been sorted and sent, the other sample data may be left unsorted for the time being, or the P pages following the current page may be sorted. In general, the data querying party views subsequent results in page order, so several pages after the current page are sorted while the user views the current page, and the sorted results can be displayed directly when the user turns pages in sequence. Alternatively, the other sample data are not sorted at all, and the sample data corresponding to a page are sorted and returned for display only when the data querying party triggers an instruction to view that page. The similarity intervals that need to be sorted can be determined from the number of sample data in each similarity interval and the page to be viewed, and only the sample data in those similarity intervals are sorted.

In a possible implementation, referring to fig. 6, sending the sample data other than the first sorting result to the data querying party in a delayed sorting manner includes:

S105a, when a query message from the data querying party indicating that the C-th (tenth numerical value) page of query results is to be displayed is received, selecting, according to the number O of sample data that a single page of the data querying party can display at most and the number of sample data in each sample interval, the D-th (eleventh numerical value) to E-th (twelfth numerical value) sample intervals in order of similarity from high to low as target sample intervals, wherein the total number of sample data in the first D-1 sample intervals is not more than (C-1)×O (thirteenth numerical value), the total number of sample data in the first D sample intervals is more than (C-1)×O, the total number of sample data in the first E-1 sample intervals is less than C×O (fourteenth numerical value), and the total number of sample data in the first E sample intervals is not less than C×O;

S105b, sorting the sample data in the target sample intervals in order of similarity from high to low to obtain a third sequence;

S105c, selecting the ((C-1)×O-F+1)-th (fifteenth numerical value) to (C×O-F)-th (sixteenth numerical value) sample data in the third sequence as a fourth sorting result, wherein F (seventeenth numerical value) is the total number of sample data in the first D-1 sample intervals;

S105d, sending the fourth sorting result to the data querying party.

For example, assume that the similarity range is divided into 7 similarity intervals, [0,0.2), [0.2,0.4), [0.4,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9) and [0.9,1], that the numbers of sample data in the database whose similarity to the target data falls in these intervals are 20, 30, 35, 56, 90, 90 and 60 respectively, and that a single page of the data querying party can display at most 20 sample data when presenting search results. The whole database contains 381 sample data, which need 20 pages to display, the last page showing only one sample. Suppose the data querying party, after viewing the first page, chooses to view page 8. Page 8 corresponds to the 141st to 160th entries of the overall sorting result, which fall in the two intervals [0.7,0.8) and [0.8,0.9). If these two intervals have not been sorted, they are sorted, and the sample data corresponding to page 8 are then selected for display, namely the 81st to 90th sample data in the interval [0.8,0.9) and the 1st to 10th sample data in the interval [0.7,0.8).
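The page lookup of S105a-S105c and the page 8 example can be checked with a short Python sketch. The names `locate_page` and `fetch_page` are hypothetical, and two assumptions are made: the seven interval counts sum to the stated 381, with [0.7,0.8) and [0.8,0.9) each holding 90 samples, and the upper bound C×O is clipped to the total so that a partial last page can be served.

```python
from itertools import accumulate
from typing import List, Tuple

Sample = Tuple[float, str]  # (similarity, sample id); hypothetical layout


def locate_page(counts_high_to_low: List[int], page: int,
                page_size: int) -> Tuple[int, int, int, int]:
    """Return (D, E, start, end), all 1-based.

    D..E are the target intervals; start..end are the positions of the
    page inside the sorted concatenation of those intervals.
    """
    prefix = list(accumulate(counts_high_to_low))
    lower = (page - 1) * page_size              # (C-1) x O
    if lower >= prefix[-1]:
        raise ValueError("page out of range")
    upper = min(page * page_size, prefix[-1])   # C x O, clipped (assumption)
    d = next(i for i, p in enumerate(prefix, start=1) if p > lower)
    e = next(i for i, p in enumerate(prefix, start=1) if p >= upper)
    f = prefix[d - 2] if d > 1 else 0           # F: samples before interval D
    return d, e, lower - f + 1, upper - f


def fetch_page(intervals_high_to_low: List[List[Sample]], page: int,
               page_size: int) -> List[Sample]:
    """Sort only the intervals that the requested page touches."""
    counts = [len(iv) for iv in intervals_high_to_low]
    d, e, start, end = locate_page(counts, page, page_size)
    target = [s for iv in intervals_high_to_low[d - 1:e] for s in iv]
    third_sequence = sorted(target, key=lambda s: s[0], reverse=True)  # S105b
    return third_sequence[start - 1:end]                               # S105c


# Counts from [0.9,1] down to [0,0.2), as assumed above.
counts = [60, 90, 90, 56, 35, 30, 20]
print(locate_page(counts, page=8, page_size=20))   # (2, 3, 81, 100)
print(locate_page(counts, page=20, page_size=20))  # (7, 7, 20, 20)
```

Positions 81 to 100 of the 180 sorted samples from [0.8,0.9) and [0.7,0.8) are the last 10 samples of the first interval followed by the first 10 of the second, which agrees with the example. The other five intervals are never sorted for this request.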

With the method of the embodiment of the application, only the results to be viewed are sorted and the results not viewed are left unsorted, which saves resources. Because the intervals corresponding to the viewed page contain relatively few samples, the real-time performance of the search is not affected. In addition, whether a similarity interval needs to be divided more finely can be judged from the number of sample data in the similarity intervals corresponding to the page selected by the data querying party, which further improves search efficiency and reduces the consumption of computing resources.

The delayed sorting in the embodiment of the application can improve system concurrency on the same hardware resources. Assuming that the number of database samples is $S$, the time complexity of fully sorting the database samples is $S \log S$. With the method of the embodiment of the application, the sample data are distributed into $M$ equally spaced similarity intervals according to similarity, and $N$ similarity intervals are selected for priority display, so the number of priority samples is $\frac{N}{M}S$ and the time complexity when the result is returned for the first time is $\frac{N}{M}S \log\left(\frac{N}{M}S\right)$. Since $M > N$, $\frac{N}{M}S \log\left(\frac{N}{M}S\right) < S \log S$, and the resources consumed by the retrieval system are linearly related to the sorting time complexity. Therefore, on the same hardware resources, a search system implemented with this scheme can support more simultaneous users, that is, the concurrency supported by the system can be significantly increased; or, for the same supported concurrency, the hardware resource requirement can be significantly reduced and the usage cost of the system lowered. For unordered samples.
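To give a feel for the size of the saving, the comparison can be evaluated numerically. The sketch below assumes that the samples are spread evenly over the M intervals, so that the N priority intervals hold about S×N/M samples, and it uses n·log2(n) as a cost model for sorting n items. Both are simplifying assumptions of this sketch, and the function names are hypothetical. Note that M and N here denote the interval counts of this paragraph, not the sample counts used in the earlier steps.

```python
import math


def sort_cost(n: float) -> float:
    """Cost model for sorting n items: n * log2(n), in arbitrary units."""
    return n * math.log2(n) if n > 1 else 0.0


def first_response_ratio(s: int, m: int, n: int) -> float:
    """Cost of sorting only the priority samples, relative to a full sort.

    s: total number of samples, m: number of similarity intervals,
    n: number of intervals sorted before the first response (n < m).
    """
    priority_samples = s * n / m   # assumes an even spread over the intervals
    return sort_cost(priority_samples) / sort_cost(s)


ratio = first_response_ratio(s=1_000_000, m=10, n=1)
print(f"first response costs {ratio:.1%} of a full sort")
```

For one million samples, ten intervals and one priority interval, the ratio is (1/10)×(5/6) = 1/12 under this model, so the first response needs roughly 8% of the sorting work of a full sort. Real similarity distributions are rarely uniform, so actual savings will vary.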

The embodiment of the application also provides a data searching device, referring to fig. 7, the device comprises:

The target data acquisition module 11 is used for acquiring target data to be queried submitted by a data querying party;

The sample data distribution module 12 is configured to calculate a similarity between the target data and each sample data, determine a plurality of similarity intervals, and distribute each sample data to a corresponding similarity interval according to a similarity between each sample data and the target data;

The sample data sorting module 13 is configured to select sample data in a specified similarity interval according to a preset interval selection rule, sort the selected sample data according to similarity, and obtain a first sorting result;

The sorting result sending module 14 is configured to send the first sorting result to the data querying party.
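The cooperation of modules 11 to 14 can be illustrated with a compact Python sketch. It is not the device's implementation: the function name `first_sorting_result`, the caller-supplied `similarity` function returning values in [0, 1], and the use of equal-width intervals are assumptions made for this sketch.

```python
from typing import Any, Callable, List, Sequence, Tuple

Scored = Tuple[float, Any]  # (similarity, sample)


def first_sorting_result(target: Any,
                         samples: Sequence[Any],
                         similarity: Callable[[Any, Any], float],
                         num_intervals: int,
                         page_size: int) -> List[Scored]:
    """Bucket samples by similarity, sort only the top buckets, return one page."""
    # Module 12: score every sample and drop it into its similarity interval.
    buckets: List[List[Scored]] = [[] for _ in range(num_intervals)]
    for sample in samples:
        score = similarity(target, sample)
        index = min(int(score * num_intervals), num_intervals - 1)
        buckets[index].append((score, sample))

    # Module 13: walk from the highest interval down until at least
    # page_size samples are collected, then sort only those samples.
    designated: List[Scored] = []
    for bucket in reversed(buckets):
        if len(designated) >= page_size:
            break
        designated.extend(bucket)
    first_sequence = sorted(designated, key=lambda p: p[0], reverse=True)

    # Module 14: the first page is what gets sent to the querying party.
    return first_sequence[:page_size]


# Hypothetical precomputed similarities standing in for a real metric.
scores = {"a": 0.97, "b": 0.42, "c": 0.91, "d": 0.15, "e": 0.88, "f": 0.63}
page = first_sorting_result("query", list(scores), lambda t, s: scores[s],
                            num_intervals=5, page_size=2)
print(page)
```

Here only the three samples in the top interval are sorted to produce the first page; the other intervals are left for delayed sorting.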

In one possible embodiment, the apparatus further comprises:

The data delay ordering module is configured to send the sample data other than the first sorting result to the data querying party in a delayed sorting manner.

In one possible implementation, the sample data sorting module includes:

In a possible implementation, the data delay ordering module is specifically configured to: when a query message from the data querying party indicating a request for more query results is received, determine the number of sample data in the designated similarity interval other than the first sorting result to obtain a fourth numerical value; calculate, according to the fourth numerical value and the first numerical value, the number of sample data still to be selected to obtain a fifth numerical value; select, in order of similarity from high to low, the first sixth numerical value of similarity intervals from the similarity intervals other than the designated similarity interval as the current designated similarity interval, wherein the total number of sample data in the first sixth numerical value of similarity intervals is not less than the fifth numerical value, the total number of sample data in the first seventh numerical value of similarity intervals is less than the fifth numerical value, and the seventh numerical value is equal to the sixth numerical value minus 1; sort the sample data in the current designated similarity interval in order of similarity from high to low to obtain a second sequence, and select the last fourth numerical value of sample data in the first sequence and the first fifth numerical value of sample data in the second sequence as a second sorting result; and send the second sorting result to the data querying party.

In a possible implementation, the data delay ordering module is specifically configured to: from the similarity intervals whose sample data have not been sorted, select the sample data in the first eighth numerical value of similarity intervals in order of similarity from high to low and sort them to obtain a third sorting result, wherein the eighth numerical value is a preset number of intervals, or the eighth numerical value satisfies that, among the similarity intervals whose sample data have not been sorted, the total number of sample data in the first eighth numerical value of similarity intervals is not less than a preset sample value and the total number of sample data in the first ninth numerical value of similarity intervals is less than the preset sample value, the ninth numerical value being equal to the eighth numerical value minus 1; and send the third sorting result to the data querying party.

In a possible implementation, the data delay ordering module is specifically configured to: when a query message from the data querying party indicating that the page of query results numbered by a tenth numerical value is to be displayed is received, select, according to the first numerical value, which is the number of sample data that a single page of the data querying party can display at most, and the number of sample data in each sample interval, the sample intervals from the one numbered by an eleventh numerical value to the one numbered by a twelfth numerical value, in order of similarity from high to low, as target sample intervals, wherein the total number of sample data in the first (eleventh numerical value minus 1) sample intervals is not more than a thirteenth numerical value, the thirteenth numerical value is equal to the product of the first numerical value and the tenth numerical value minus the first numerical value, the total number of sample data in the first eleventh numerical value of sample intervals is more than the thirteenth numerical value, the total number of sample data in the first (twelfth numerical value minus 1) sample intervals is less than a fourteenth numerical value, the fourteenth numerical value is equal to the product of the first numerical value and the tenth numerical value, and the total number of sample data in the first twelfth numerical value of sample intervals is not less than the fourteenth numerical value; sort the sample data in the target sample intervals in order of similarity from high to low to obtain a third sequence; select the sample data from the position given by a fifteenth numerical value to the position given by a sixteenth numerical value in the third sequence as a fourth sorting result, wherein the fifteenth numerical value is equal to the thirteenth numerical value minus a seventeenth numerical value plus 1, the sixteenth numerical value is equal to the fourteenth numerical value minus the seventeenth numerical value, and the seventeenth numerical value is the total number of sample data in the first (eleventh numerical value minus 1) sample intervals; and send the fourth sorting result to the data querying party.

The embodiment of the application also provides electronic equipment, which comprises: a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to implement any one of the data searching methods according to the present application when executing the computer program stored in the memory.

Optionally, referring to fig. 8, in addition to the processor 21 and the memory 23, the electronic device according to the embodiment of the present application further includes a communication interface 22 and a communication bus 24, where the processor 21, the communication interface 22, and the memory 23 complete communication with each other through the communication bus 24.

The communication bus mentioned for the above electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is used in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program, when executed by a processor, implements any one of the data searching methods of the present application.

In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any one of the data searching methods of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)).

It should be noted that, in this document, the technical features of the alternatives may be combined to form a solution as long as they do not contradict one another, and all such solutions are within the scope of the disclosure of the present application. Relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the apparatus, electronic device, computer program product and storage medium are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the description of the method embodiments.

The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims

1. A data searching method, the method comprising:

Acquiring target data to be queried submitted by a data querying party;

Transmitting the first sorting result to the data inquiring party;

Selecting sample data in a specified similarity interval according to a preset interval selection rule, and sorting the selected sample data according to the similarity to obtain a first sorting result, wherein the method comprises the following steps:

2. The method of claim 1, wherein the calculating the similarity of the target data to each sample data, determining a plurality of similarity intervals, and assigning each sample data to a corresponding similarity interval according to the similarity of each sample data to the target data, comprises:

3. The method of claim 2, wherein after assigning each of the sample data into a corresponding similarity interval according to the similarity of each of the sample data and the target data, the method further comprises:

4. The method of claim 1, wherein after said sending the first ranked result to the data querying party, the method further comprises:

And sending the second sorting result to the data inquirer.

5. The method of claim 1, wherein after said sending the first ranked result to the data querying party, the method further comprises:

And sending the third sorting result to the data inquirer.

6. The method of claim 1, wherein after said sending the first ranked result to the data querying party, the method further comprises:

When a query message of the data querying party indicating that the query result of a tenth numerical value page is to be displayed is received, selecting, according to a first numerical value of sample data which can be displayed at most by a single page of the data querying party and the number of sample data in each sample interval, an eleventh numerical value sample interval to a twelfth numerical value sample interval as target sample intervals in order of similarity from high to low, wherein the total number of sample data in the first (eleventh numerical value minus 1) sample intervals is not more than a thirteenth numerical value, the thirteenth numerical value is equal to the product of the first numerical value and the tenth numerical value minus the first numerical value, the total number of sample data in the first eleventh numerical value of sample intervals is more than the thirteenth numerical value, the total number of sample data in the first (twelfth numerical value minus 1) sample intervals is less than a fourteenth numerical value, the fourteenth numerical value is equal to the product of the first numerical value and the tenth numerical value, and the total number of sample data in the first twelfth numerical value of sample intervals is not less than the fourteenth numerical value;

And sending the fourth sorting result to the data inquirer.

7. A data search device, the device comprising:

The sample data distribution module is used for calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and distributing each sample data to the corresponding similarity interval according to the similarity between each sample data and the target data;

the sequencing result sending module is used for sending the first sequencing result to the data inquiring party;

The sample data sorting module is specifically configured to: obtain the number of sample data that a single page of the data querying party can display at most, as a first numerical value; select, in order of similarity from high to low, the first second numerical value of similarity intervals as the designated similarity interval, wherein the total number of sample data in the first second numerical value of similarity intervals is not less than the first numerical value, the total number of sample data in the first third numerical value of similarity intervals is less than the first numerical value, and the third numerical value is equal to the second numerical value minus 1; and sort the sample data in the designated similarity interval in order of similarity from high to low to obtain a first sequence, and select the leading first numerical value of sample data in the first sequence as a first sorting result.

8. The apparatus according to claim 7, wherein the sample data distribution module is specifically configured to: calculating the similarity of the target data and each sample data according to a preset sequence, and adjusting the similarity range corresponding to each similarity interval according to the upper and lower boundaries of each similarity obtained currently; and distributing each sample data to a corresponding similarity interval according to the similarity between each sample data and the target data.

9. The apparatus of claim 8, wherein the sample data distribution module is further configured to: and for any similarity interval, when the number of the sample data in the similarity interval exceeds a preset number threshold, re-dividing the similarity interval into a plurality of similarity intervals, and correspondingly adjusting the sample data in each re-divided similarity interval.

10. An electronic device, comprising a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to implement the data searching method according to any one of claims 1 to 6 when executing the program stored in the memory.

11. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the data search method of any of claims 1-6.