CN110019289B - Data query method and device and electronic equipment - Google Patents


Info

Publication number
CN110019289B
Authority
CN
China
Prior art keywords
signature
data query
processed
query request
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710744290.4A
Other languages
Chinese (zh)
Other versions
CN110019289A (en
Inventor
韦仁忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201710744290.4A priority Critical patent/CN110019289B/en
Publication of CN110019289A publication Critical patent/CN110019289A/en
Application granted granted Critical
Publication of CN110019289B publication Critical patent/CN110019289B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data query method, a data query device, and electronic equipment. The data query method comprises the following steps: obtaining a signature of a data query request to be processed, and judging whether a signature library contains a target signature whose similarity to that signature falls within a set threshold range, wherein each target signature in the signature library corresponds to a processed data query request and a database address; when a target signature exists in the signature library, that is, when a signature similar or identical to the signature of the data query request to be processed exists, obtaining the database address corresponding to the target signature; and sending the data query request to be processed to the database corresponding to the database address, so that the database processes the request based on its cached data.

Description

Data query method and device and electronic equipment
Technical Field
The present application relates to the field of databases, and in particular, to a data query method, apparatus, and electronic device.
Background
With the continuous development of internet technology, databases must serve more and more access requests at ever higher frequencies. When a database is under high load, read-write separation is often required: the database that accepts writes is separated from the database that serves queries, and data consistency between the two is maintained through the master-slave replication functions provided by the underlying database engines.
In high-concurrency access systems, some redundancy is required to share the load of a large number of access requests. Most such systems read far more often than they write, so a common strategy is to clone the query database into one or more mirror libraries, forming a cluster dedicated to database read operations, and to distribute database query requests (that is, read requests) to different instances (that is, databases) in the cluster through load-balancing middleware, thereby sharing the pressure on the query database and improving access speed. However, as the number of concurrent accesses increases exponentially, the access speed of the database needs to be improved further.
Disclosure of Invention
The embodiment of the specification provides a data query method, a data query device and electronic equipment, which are used for improving the processing speed of a data query request to be processed and improving the access speed of a database.
In a first aspect, embodiments of the present disclosure provide a data query method, where the method includes:
obtaining a signature of a data query request to be processed;
judging whether a signature library contains a target signature whose similarity to the signature is within a set threshold range, wherein the signature library is used for storing signature related information of processed data query requests;
if the target signature exists in the signature library, a database address in the signature related information corresponding to the target signature is obtained;
and sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through the database.
Optionally, the obtaining the signature of the pending data query request includes:
obtaining a parameter ordering of n target parameters affecting the query speed;
obtaining m target parameters and filtering conditions contained in the data query request to be processed, wherein n and m are positive integers, and m is less than or equal to n;
and obtaining codes of the data query requests to be processed as the signatures based on the m target parameters, the parameter ordering and the filtering conditions.
Optionally, the obtaining the signature of the pending data query request includes:
enumerating the types of data query requests to be processed and assigning a code to each enumerated type;
and obtaining codes corresponding to the types of the data query requests to be processed as the signatures.
Optionally, when the signature is the code of the data query request to be processed, the method further includes:
and taking the signature of the data query request to be processed and the signature in the signature library as numerical values, and obtaining the numerical value difference between the signature of the data query request to be processed and the signature in the signature library as the similarity between the two signatures.
Optionally, the obtaining the signature of the pending data query request includes:
and mapping the data query request to be processed into a vector based on the query content of the data query request to be processed, and taking the vector as the signature.
Optionally, when the signature is a vector corresponding to the data query request to be processed, the method further includes:
and obtaining the included angle between the two vectors as the similarity between the two signatures corresponding to the two vectors.
Optionally, the method further comprises:
obtaining a processed data query request with a processing time greater than or equal to a time threshold or with a request frequency greater than or equal to a frequency threshold;
and generating the signature related information and storing the signature related information into the signature library based on the signature of the processed data query request and the database address of the processing operation.
Optionally, the generating the signature related information based on the signature of the processed data query request and the database address of the processing operation, and storing the signature related information in the signature library includes:
obtaining the processing time length of the processed data query request and the expiration time of the signature of the processed data query request;
and generating signature related information based on the signature of the processed data query request, the database address, the processing time length and the expiration time, and storing the signature related information into the signature library.
Optionally, the data query request to be processed is implemented by a structured query language SQL.
Optionally, the n target parameters include: n column names in the structured query language SQL.
In a second aspect, embodiments of the present disclosure provide a data query device, including:
the computing unit is used for obtaining the signature of the data query request to be processed;
the judging unit is used for judging whether a signature library contains a target signature whose similarity to the signature is within a set threshold range, wherein the signature library is used for storing signature related information of processed data query requests;
the acquisition unit is used for acquiring a database address in the signature related information corresponding to the target signature under the condition that the target signature exists in the signature library;
the distribution unit is used for sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through the database.
Optionally, the computing unit is configured to:
obtaining a parameter ordering of n target parameters affecting the query speed;
obtaining m target parameters and filtering conditions contained in the data query request to be processed, wherein n and m are positive integers, and m is less than or equal to n;
and obtaining codes of the data query requests to be processed as the signatures based on the m target parameters, the parameter ordering and the filtering conditions.
Optionally, the computing unit is configured to:
enumerating the types of data query requests to be processed and assigning a code to each enumerated type;
and obtaining codes corresponding to the types of the data query requests to be processed as the signatures.
Optionally, when the signature is the code of the pending data query request, the apparatus further includes: a similarity calculation unit configured to:
and taking the signature of the data query request to be processed and the signature in the signature library as numerical values, and obtaining the numerical value difference between the signature of the data query request to be processed and the signature in the signature library as the similarity between the two signatures.
Optionally, the computing unit is configured to: and mapping the data query request to be processed into a vector based on the query content of the data query request to be processed, and taking the vector as the signature.
Optionally, when the signature is a vector corresponding to the data query request to be processed, the apparatus further includes:
and the similarity calculation unit is used for obtaining the included angle between the two vectors as the similarity between the two signatures corresponding to the two vectors.
Optionally, the acquiring unit is further configured to: obtaining a processed data query request with a processing time greater than or equal to a time threshold or with a request frequency greater than or equal to a frequency threshold;
the apparatus further comprises: and the generation unit is used for generating the signature related information based on the signature of the processed data query request and the database address of the processing operation, and storing the signature related information into the signature library.
Optionally, the generating unit includes:
an obtaining subunit, configured to obtain a processing duration of the processed data query request and an expiration time of the signature of the processed data query request;
and the generation subunit is used for generating the signature related information based on the signature of the processed data query request, the database address, the processing duration and the expiration time, and storing the signature related information into the signature library.
Optionally, the data query request to be processed is implemented by a structured query language SQL.
Optionally, the n target parameters include: n column names in the structured query language SQL.
In a third aspect, the present description provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
obtaining a signature of a data query request to be processed;
judging whether a signature library contains a target signature whose similarity to the signature is within a set threshold range, wherein the signature library is used for storing signature related information of processed data query requests;
if the target signature exists in the signature library, a database address in the signature related information corresponding to the target signature is obtained;
and sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through the database.
In a fourth aspect, the present description implementation provides an electronic device comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
obtaining a signature of a data query request to be processed;
judging whether a signature library contains a target signature whose similarity to the signature is within a set threshold range, wherein the signature library is used for storing signature related information of processed data query requests;
if the target signature exists in the signature library, a database address in the signature related information corresponding to the target signature is obtained;
and sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through the database.
The above technical solutions in the embodiments of the present application at least have the following technical effects:
The embodiment of the application provides a data query method that obtains a signature of a data query request to be processed and judges whether a signature library contains a target signature whose similarity to that signature is within a set threshold range. This determines whether any database instance has already processed a similar data query request and has therefore cached data relevant to the request to be processed. If such a target signature exists in the signature library, the data query request to be processed is sent to the database address corresponding to the target signature and is processed by the instance service at that address. Because the data needed by the request is already cached in that database, the processing speed of the data query request to be processed can be greatly improved, which in turn improves the access speed of the database.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a data access system according to an embodiment of the present application;
FIG. 2 is a flowchart of a data query method according to an embodiment of the present application;
FIG. 3 is a flowchart of a system for initial access to a data query request to be processed according to an embodiment of the present application;
FIG. 4 is a flowchart of a system for accessing a subsequent pending data query request provided by an embodiment of the present application;
FIG. 5 is a flowchart for obtaining a signature of a pending data query request according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a data query device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a data query method, a data query device and electronic equipment, which are used for improving the processing speed of a data query request to be processed and improving the access speed of a database.
The main implementation principle, the specific implementation manner and the corresponding beneficial effects of the technical scheme of the embodiment of the application are described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, a data access system provided in an embodiment of the present disclosure includes:
1) A client (denoted by Client), connected to the load balancing component, for sending data query requests to be processed and data write requests and for receiving request feedback.
2) A load balancing component (denoted by HA), for distributing the requests sent by the client according to a certain strategy.
3) Read databases and clusters (denoted by R): instances or clusters that provide database read capability.
4) A write database (denoted by W): an instance that provides database write capability. Data is synchronized through the master-slave replication function provided by the underlying database, so that the data of the read databases stays consistent with the data of the write database.
In general, after processing a data query request, the read databases and clusters R keep the data required for that request in cache for a period of time. The embodiments of this specification use this cached data, through the data query method below, to accelerate the processing of similar or identical data query requests to be processed. The multi-level cache design of a database engine caches data very efficiently: for data laid out adjacently on physical storage according to a clustered index structure, large runs of contiguous data blocks are cached. Using this cached data to serve subsequent data query requests can increase their processing speed by hundreds of thousands of times.
Referring to FIG. 2, a data query method provided in the embodiment of the present disclosure is applied to the load balancing component HA. The load balancing component HA is not limited to database data access systems; it can also be applied to the data access system of an ordinary electronic device. The data query method comprises the following steps:
S21: obtaining a signature of a data query request to be processed;
S22: judging whether a signature library contains a target signature whose similarity to the signature of the data query request to be processed is within a set threshold range, wherein the signature library is used for storing signature related information of processed data query requests;
S23: if the target signature exists in the signature library, acquiring the database address in the signature related information corresponding to the target signature;
S24: sending the data query request to be processed to the database corresponding to the database address, and processing the data query request to be processed through the database.
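Steps S22 to S24 can be sketched as a lookup followed by a routing decision. The following is a minimal illustration, not the patented implementation: the names `SignatureRecord`, `find_target`, and `route` are hypothetical, signatures are assumed to be integers, and "similarity within a set threshold range" is assumed to mean that a distance between two signatures does not exceed a threshold.

```python
from dataclasses import dataclass


@dataclass
class SignatureRecord:
    """One entry of signature related information (field names are hypothetical)."""
    signature: int
    database_address: str


def find_target(library, signature, threshold, distance):
    """S22: return the closest stored record within the threshold, or None."""
    best, best_dist = None, None
    for record in library:
        d = distance(signature, record.signature)
        if d <= threshold and (best_dist is None or d < best_dist):
            best, best_dist = record, d
    return best


def route(signature, library, threshold, default_address,
          distance=lambda a, b: abs(a - b)):
    """S23/S24: choose the database address the request is sent to.

    If no target signature exists, fall back to the address the load
    balancer would have chosen anyway (default_address).
    """
    target = find_target(library, signature, threshold, distance)
    return target.database_address if target is not None else default_address
```

For instance, with a library holding signature 100 for `R_1`, a request with signature 103 and a threshold of 5 would be routed to `R_1`, while a request with signature 300 would go to the default address.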
In this method, the signature library stores signature related information of processed data query requests. The signature related information includes a database address, and the data used while processing a processed data query request is cached in the database at that address. A data query request to be processed can therefore be distributed to a database that has processed a similar request, and the cached data in that database accelerates the processing. Because the similarity between data query requests is judged by comparing signatures, the amount of computation is reduced and the query rate is improved.
In the implementation process, the signature obtained in S21 falls into one of two cases. In the first, the signature library contains no similar or identical signature: the similarity between the signature of the request and every signature in the signature library is outside the set threshold range, as for a data query request to be processed that accesses the database for the first time. In the second, the signature library contains a similar or identical signature: the similarity between the signature of the request and a signature in the signature library is within the set threshold range. Correspondingly, when S22 judges whether the signature library contains a target signature whose similarity to the signature of the data query request to be processed is within the set threshold range, one of two results is obtained: (1) the target signature does not exist in the signature library, that is, the signature library contains no signature similar or identical to the signature of the data query request to be processed; (2) the target signature exists in the signature library, that is, the signature library contains a signature similar or identical to the signature of the data query request to be processed.
Based on the different determination results obtained in S22, the present embodiment provides different processing methods. FIG. 3 shows the processing flow for a data query request to be processed when the target signature does not exist in the signature library. After Client A establishes a connection with the database R_1 through the load balancing component HA and sends a data query request to be processed, the following steps are executed:
3.1, the load balancing component HA performs signature calculation on the received data query request to be processed and obtains its signature.
3.2, the calculated signature is sent to the signature library for comparison, to judge whether a similar or identical signature exists in the signature library. In this case the result is that no similar or identical signature exists, and step 3.3 is performed.
3.3, the data query request to be processed is sent to the database R_1. The database R_1 processes the request and, after processing is finished, feeds back the processing result and the processing duration to the load balancing component HA.
3.4, the load balancing component HA updates the signature library according to the processing duration.
3.5, the load balancing component HA returns the processing result to Client A.
The load balancing component HA updates the signature library according to the processing duration as follows:
When the processing duration of a processed data query request is greater than or equal to a time threshold, the request is a slow query that takes a long time to process. Signature related information is then generated based on the signature of the processed request and the address of the database that processed it, and is sent to the signature library to update it. Further, in this case the processing duration of the processed request and the expiration time of its signature can also be obtained, and the signature related information can be generated from the signature, the expiration time, the processing duration, and the database address. That is, the signature related information may include not only the signature and the database address but also the processing duration and the expiration time. When the expiration time in the signature related information is reached, the signature library is updated automatically and the signature related information is deleted, which reduces the amount of data held in the signature library. The expiration time of the signature related information may be set according to the expiration time of the cached data and/or the processing duration; for example, it may be set equal to the expiration time of the cached data, or set to be greater than or equal to the processing duration.
In the implementation process, the load balancing component HA may also update the signature library according to the request frequency of the data query request to be processed. When the request frequency is greater than or equal to a frequency threshold, signature related information is generated based on the signature of the data query request and the address of the database that processed it, and is sent to the signature library to update it.
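The update rules above can be sketched as a small in-memory store. This is an illustrative sketch under several assumptions: the class name `SignatureLibrary`, its method names, and the default thresholds are hypothetical; "request frequency" is simplified to a running count of how often a signature has been seen; and the record lifetime is taken as the larger of a configured time-to-live and the processing duration, which is one way to keep it greater than or equal to the processing duration.

```python
import time


class SignatureLibrary:
    """Toy in-memory signature library; names and defaults are illustrative."""

    def __init__(self, time_threshold=1.0, frequency_threshold=10, ttl=60.0):
        self.time_threshold = time_threshold            # seconds; marks a slow query
        self.frequency_threshold = frequency_threshold  # count that marks a frequent query
        self.ttl = ttl                                  # default record lifetime, seconds
        self.records = {}                               # signature -> record dict
        self.request_counts = {}                        # signature -> times seen

    def report(self, signature, database_address, processing_time, now=None):
        """Called after a database has processed a request (steps 3.4 and 4.4)."""
        now = time.time() if now is None else now
        count = self.request_counts.get(signature, 0) + 1
        self.request_counts[signature] = count
        slow = processing_time >= self.time_threshold
        frequent = count >= self.frequency_threshold
        if slow or frequent:
            self.records[signature] = {
                "database_address": database_address,
                "processing_time": processing_time,
                # Lifetime is never shorter than the processing duration.
                "expires_at": now + max(self.ttl, processing_time),
            }

    def purge_expired(self, now=None):
        """Delete records whose expiration time has been reached."""
        now = time.time() if now is None else now
        expired = [s for s, r in self.records.items() if r["expires_at"] <= now]
        for s in expired:
            del self.records[s]
```

Passing `now` explicitly keeps the sketch deterministic for testing; a real component would more likely rely on the clock and on a periodic purge.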
FIG. 4 shows the processing flow for a data query request to be processed when the target signature exists in the signature library. After Client B establishes a connection with the database R_2 through the load balancing component HA and sends a data query request to be processed, the following steps are executed:
4.1, the load balancing component HA performs signature calculation on the received data query request to be processed and obtains its signature.
4.2, the calculated signature is sent to the signature library for comparison, to judge whether a similar or identical signature exists in the signature library. In this case the result is that the signature library contains a target signature whose similarity to the signature of the data query request is within the set threshold range (that is, a signature similar or identical to the signature of the data query request to be processed), and step 4.3 is performed.
4.3, the database address in the signature related information corresponding to the target signature is obtained from the signature library, and the data query request to be processed is sent to the database R_1 corresponding to that address for processing, instead of being sent to the database R_2. The database R_1 processes the request and, after processing is finished, feeds back the processing result and the processing duration to the load balancing component HA.
4.4, the load balancing component HA updates the signature library. The signature library is updated in the same way as before, according to the processing duration and/or the request frequency of the data query request to be processed.
4.5, the load balancing component HA returns the processing result to Client B.
In the above processing steps, the signature library allows the load balancing component HA to distribute the data query request to be processed more reasonably, as in step 4.3. That is, a subsequent request is accelerated by using all or part of the results that a previous request left cached on the database instance, which greatly improves the overall processing speed of data query requests to be processed.
In the implementation process, S21 and S22 relate to signature and signature similarity of the data query request to be processed, and the embodiment of the present disclosure provides the following three ways to calculate the signature and signature similarity:
Mode one
Based on the query content of the data query request to be processed, mapping the data query request to be processed into a vector, and taking the vector of the data query request to be processed as a signature thereof.
For example: a data query request to be processed generally comprises a query name, a filtering condition, an operation name and the like, which are mapped into a vector A = {a_1, a_2, a_3, …}. Similarly, the processed data query request behind each signature in the signature library is mapped by the same method into a vector B = {b_1, b_2, b_3, …}.
For the signatures obtained by the method, the similarity between the signatures can be obtained by calculating the included angle between the two corresponding vectors, and the closer the included angle between the two vectors is to zero, the greater the similarity is. Specifically, an included angle between two vectors may be obtained as a similarity between two signatures corresponding to the two vectors, and an included angle cosine value between the two vectors may also be obtained as a similarity between two signatures corresponding to the two vectors. For example, the similarity can be obtained by the following formula:
similarity(A, B) = (A · B) / (|A| × |B|)    (Formula one)
When comparing with the signature in the signature library, A represents the signature of the data query request to be processed, and B represents the signature in the signature library.
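Formula one is the cosine of the angle between the two vectors. A minimal sketch follows; the function name `cosine_similarity` and the handling of zero vectors are assumptions, since the source does not say how a request is turned into numeric components or what happens when a vector is all zeros.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Formula one: (A . B) / (|A| * |B|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# A: pending request, B: signature from the library (made-up components).
print(cosine_similarity([1, 0, 1], [1, 0, 1]))
print(cosine_similarity([1, 0], [0, 1]))
```

A value close to 1 corresponds to an included angle close to zero, that is, to a high similarity.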
Mode two
The types of data query requests to be processed are enumerated and each type is assigned a code; the code corresponding to the type of the data query request to be processed is taken as its signature. The codes of two signatures can then be treated as numbers, and the numerical difference between them is taken as the similarity between the two signatures. Specifically, the numerical difference can be calculated with formula two, which is given under mode three.
Mode three
Referring to fig. 5, a method for obtaining a signature of a data query request to be processed includes:
S51: obtain the parameter ordering of n target parameters affecting the query speed, where n is a positive integer.
Specifically, the data query request to be processed, such as a database query request, is typically implemented in SQL (Structured Query Language). An SQL statement contains operations, column names, table names, filtering conditions, sorting and calculation operations, and the like. Because caching is usually performed according to the column names and the filtering conditions, these two elements affect the query speed. The signature of the data query request to be processed can therefore use the column names in the SQL statement that are affected by caching as the target parameters; there are generally several such column names.
The n target parameters are sorted according to the weight with which each affects the query speed, to obtain the parameter ordering. Considering clustered indexes, prefix indexes and similar structures used in data storage, the weight and ordering of each target parameter are defined as follows:
1) A column that is the primary key has a higher weight;
2) Columns contained in an index have higher weights;
3) Columns that occur in multiple indexes have higher weights than columns that occur in only a single index;
4) Among columns with the same weight, a column that appears earlier in the index is ranked higher.
For example: suppose the n target parameters are the column names Id and Column1 to Column7. The weight and final ranking position of each column name under the definition above are shown in Table one below:
Column name | Weight | Description            | Final ranking position
------------|--------|------------------------|-----------------------
Id          | 1      | Primary key            | 2
Column1     | 2      | Appears in two indexes | 1
Column2     | 1      | Appears in one index   | 3
Column3     | 1      | Appears in one index   | 4
Column4     | 0      | Not in any index       | 6
Column5     | 1      | Appears in one index   | 5
Column6     | 0      | Not in any index       | 7
Column7     | 0      | Not in any index       | 8
Table one
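The ranking rule can be sketched as a stable sort by descending weight. The sketch below reproduces the "final ranking position" column of Table one; the function name `rank_columns` is hypothetical, and the order in which the columns are listed is used as a stand-in for their position within the index, which the source does not spell out.

```python
from typing import List, Tuple

# (column name, weight) rows copied from Table one, in the order listed there.
COLUMNS: List[Tuple[str, int]] = [
    ("Id", 1), ("Column1", 2), ("Column2", 1), ("Column3", 1),
    ("Column4", 0), ("Column5", 1), ("Column6", 0), ("Column7", 0),
]


def rank_columns(columns: List[Tuple[str, int]]) -> List[str]:
    """Order columns by descending weight, keeping listed order on ties."""
    # sorted() is stable, so equal-weight columns keep their listed order
    # (assumed here to reflect their position within the index).
    ordered = sorted(columns, key=lambda item: -item[1])
    return [name for name, _ in ordered]


for position, name in enumerate(rank_columns(COLUMNS), start=1):
    print(position, name)
```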
According to the weight of each column name, the column names are obtained in the following order:
Column2,Id,Column1,Column3,Column5,Column6,Column7
S52: obtain the m target parameters and filtering conditions contained in the data query request to be processed, where m is a positive integer and m is less than or equal to n.
The SQL statement of a data query request to be processed contains some or all of the n target parameters; that is, the number m of target parameters contained in the data query request to be processed is less than or equal to the total number n of target parameters. The m target parameters typically form filtering conditions together with the filter statements in SQL, for example: select id, column2, column3 from table_a where column1 = something and column2 = something.
S53: based on the m target parameters, the parameter ordering and the filtering conditions, obtain the code of the data query request to be processed as its signature.
Specifically, the clause following select in the filter statement is taken as the longitude, and the filter clause following where is taken as the latitude. For a data query request to be processed, the select clause in the SQL is encoded following the parameter ordering: a position is 1 if the corresponding target parameter is included, and 0 otherwise. The where clause is encoded in the same way. Finally, the code formed by the select clause is placed in the odd bits and the code formed by the where clause in the even bits, giving the final binary sequence code, which is taken as the signature of the data query request to be processed.
For example: suppose that the parameter ordering of the obtained n target parameters is: column2, id, column1, column3, column5, column6, column7; the filtering statement of a data query request to be processed is as follows:
select id, column2, column3 from table_a where column1 = something and column2 = something;
the select clause yields the code 1101000;
the where clause yields the code 1010000;
the signature obtained by interleaving the two codes, with the select bits in odd positions and the where bits in even positions, is: 11100110000000.
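The worked example above can be reproduced with a short sketch. The names `presence_bits`, `interleave` and `signature` are hypothetical, and the column lists of the select and where clauses are passed in directly rather than parsed from the SQL text, which is a simplification.

```python
from typing import Iterable, List

# Parameter ordering taken from the worked example in the text.
PARAM_ORDER: List[str] = [
    "column2", "id", "column1", "column3", "column5", "column6", "column7",
]


def presence_bits(columns: Iterable[str], order: List[str]) -> str:
    """One bit per ordered parameter: 1 if the clause uses it, else 0."""
    used = {c.lower() for c in columns}
    return "".join("1" if name in used else "0" for name in order)


def interleave(select_bits: str, where_bits: str) -> str:
    """Select bits go to odd positions, where bits to even positions (1-based)."""
    return "".join(s + w for s, w in zip(select_bits, where_bits))


def signature(select_cols: Iterable[str], where_cols: Iterable[str],
              order: List[str] = PARAM_ORDER) -> str:
    """Binary sequence code used as the signature of a request."""
    return interleave(presence_bits(select_cols, order),
                      presence_bits(where_cols, order))


# select id, column2, column3 from table_a
#   where column1 = something and column2 = something
print(presence_bits(["id", "column2", "column3"], PARAM_ORDER))    # 1101000
print(presence_bits(["column1", "column2"], PARAM_ORDER))          # 1010000
print(signature(["id", "column2", "column3"], ["column1", "column2"]))
# 11100110000000
```

Because the parameters are ordered by weight, the bits of the higher weighted columns end up in the more significant positions of the signature.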
Obtaining the signature of the data query request to be processed by mode three ensures that: 1) columns using similar indexes have higher similarity; 2) statements querying similar columns have higher similarity; 3) statements that use no index at all still have higher similarity if their filtering conditions are similar. Judging similar signatures on the basis of such signatures is therefore more accurate. For a signature obtained by mode three, the similarity may be calculated as follows:
The signature is regarded as a binary number, and the similarity is obtained from the numerical difference between signature A and signature B, using the following formula:
similarity(A, B) = 1 - Abs(A - B) / (A + B)    (Formula two)
When compared with the signatures in the signature library, A represents the signature of the data query request to be processed, and B represents the signature in the signature library.
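A minimal sketch of formula two, assuming the signatures are held as strings of binary digits. The function name `signature_similarity` and the treatment of two all-zero signatures (where A + B would be zero) are assumptions not covered by the source.

```python
def signature_similarity(sig_a: str, sig_b: str) -> float:
    """Formula two: 1 - Abs(A - B) / (A + B), with A and B read as binary numbers."""
    a, b = int(sig_a, 2), int(sig_b, 2)
    if a + b == 0:
        return 1.0  # assumption: two all-zero signatures count as identical
    return 1 - abs(a - b) / (a + b)


pending = "11100110000000"   # signature from the worked example
near = "11100100000000"      # hypothetical: differs in one low-order bit
far = "00000000000011"       # hypothetical: shares no high-order bits

print(signature_similarity(pending, pending))  # 1.0
print(signature_similarity(pending, near) > signature_similarity(pending, far))  # True
```

Since the result depends on the numeric value, a mismatch in a high-order bit (a higher weighted column) lowers the similarity more than a mismatch in a low-order bit.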
According to the data query method provided by the embodiments of this specification, the load balancing strategy can be applied at the SQL level, so that load balancing is performed at a finer granularity. In addition, the data query method calculates the similarity of SQL statements from the similarity of their signatures and dynamically balances the load of SQL statements accordingly, so that the cache resources of the hosts are fully used and the query speed is improved.
Referring to fig. 6, based on the data query method provided in the foregoing embodiment, the present embodiment further correspondingly provides a data query device, including:
a computing unit 61 for obtaining a signature of the data query request to be processed;
a judging unit 62, configured to judge whether a target signature with a similarity to the signature within a set threshold exists in a signature library, where the signature library is used to store signature related information of a processed data query request;
an obtaining unit 63, configured to obtain, when the target signature exists in the signature library, a database address in the signature related information corresponding to the target signature;
the allocation unit 64 is configured to send the data query request to be processed to a database corresponding to the database address, and process the data query request to be processed through the database.
As an alternative embodiment, the computing unit 61 may obtain the signature of the pending data query request by computing in any of the following ways:
Mode one: the data query request to be processed is mapped into a vector based on the query content of the data query request to be processed, and the vector is used as the signature.
Mode two: the types of data query requests to be processed are enumerated and each type is assigned a code; the code corresponding to the type of the data query request to be processed is obtained as the signature.
Mode three: the parameter ordering of n target parameters affecting the query speed is obtained; the m target parameters and filtering conditions contained in the data query request to be processed are obtained, where n and m are positive integers and m is less than or equal to n; and the code of the data query request to be processed is obtained as the signature based on the m target parameters, the parameter ordering and the filtering conditions.
In the implementation process, the device further comprises a similarity calculation unit 65. When the signature is a vector corresponding to the data query request to be processed, the similarity calculation unit 65 is configured to obtain the included angle between two vectors as the similarity between the two signatures corresponding to the two vectors. When the signature is the code of the data query request to be processed, the similarity calculation unit 65 is configured to treat the signature of the data query request to be processed and the signature in the signature library as numerical values, and to obtain the numerical difference between them as the similarity between the two signatures.
As an alternative embodiment, the obtaining unit 63 is further configured to: obtaining a processed data query request with a processing time greater than or equal to a time threshold or with a request frequency greater than or equal to a frequency threshold; the apparatus further comprises: and the generating unit 66 is used for generating the signature related information and storing the signature related information in the signature library based on the signature of the processed data query request and the database address for executing the processing operation.
Specifically, the generating unit 66 includes an acquisition subunit and a generation subunit. The acquisition subunit is configured to obtain the processing duration of the processed data query request and the expiration time of the signature of the processed data query request. The generation subunit is configured to generate the signature related information based on the signature of the processed data query request, the database address, the processing duration and the expiration time, and to store the signature related information in the signature library.
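The signature related information and the update rule can be sketched as a small data structure. Everything named here (`SignatureInfo`, `SignatureLibrary`, `record`, `lookup`, the `ttl` parameter) is hypothetical; the request frequency is approximated by a simple per-signature counter, and lookups are by exact signature only, leaving the similarity comparison out of this sketch.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SignatureInfo:
    signature: str
    database_address: str
    processing_seconds: float
    expires_at: float  # absolute time after which the entry is ignored


class SignatureLibrary:
    def __init__(self, time_threshold: float, frequency_threshold: int, ttl: float):
        self.time_threshold = time_threshold
        self.frequency_threshold = frequency_threshold
        self.ttl = ttl  # lifetime of an entry in seconds (assumption)
        self._entries: Dict[str, SignatureInfo] = {}
        self._counts: Dict[str, int] = {}

    def record(self, signature: str, database_address: str,
               processing_seconds: float, now: Optional[float] = None) -> bool:
        """Store a processed request if it was slow enough or frequent enough."""
        now = time.time() if now is None else now
        self._counts[signature] = self._counts.get(signature, 0) + 1
        slow = processing_seconds >= self.time_threshold
        frequent = self._counts[signature] >= self.frequency_threshold
        if slow or frequent:
            self._entries[signature] = SignatureInfo(
                signature, database_address, processing_seconds, now + self.ttl)
            return True
        return False

    def lookup(self, signature: str, now: Optional[float] = None) -> Optional[SignatureInfo]:
        """Return the stored entry unless it is missing or has expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(signature)
        if entry is not None and entry.expires_at <= now:
            del self._entries[signature]  # drop expired entries lazily
            return None
        return entry


lib = SignatureLibrary(time_threshold=1.0, frequency_threshold=3, ttl=60.0)
print(lib.record("11100110000000", "R_1", 2.5, now=0.0))   # True: slow request
print(lib.lookup("11100110000000", now=30.0).database_address)  # R_1
print(lib.lookup("11100110000000", now=61.0))               # None: expired
```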
In a specific implementation process, the data query request to be processed is realized by a structured query language SQL. The n target parameters affecting the query speed include: n column names in the structured query language SQL.
The specific manner in which the individual units perform the operations in relation to the apparatus of the above embodiments has been described in detail in relation to the embodiments of the method and will not be explained in detail here.
Referring to fig. 7, a block diagram of an electronic device 700 for implementing a data query method is shown, according to an exemplary embodiment. For example, the electronic device 700 may be a computer, a database console, a tablet device, a personal digital assistant, or the like.
Referring to fig. 7, an electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, a multimedia component 708, an input/output (I/O) interface 710, and a communication component 712.
The processing component 702 generally controls overall operation of the electronic device 700, such as operations associated with display, data communication, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 702 can include one or more modules that facilitate interaction between the processing component 702 and other components.
Memory 704 is configured to store various types of data to support operations at device 700. Examples of such data include instructions for any application or method operating on the electronic device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 706 provides power to the various components of the electronic device 700. Power supply components 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 700.
The I/O interface 710 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The communication component 712 is configured to facilitate communication between the electronic device 700 and other devices, either wired or wireless. The electronic device 700 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 712 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 712 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 704, including instructions executable by processor 720 of electronic device 700 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
A non-transitory computer readable storage medium is provided; when the instructions stored in it are executed by a processor of an electronic device, the electronic device is caused to perform a data query method, the method comprising:
obtaining a signature of a data query request to be processed; judging whether a target signature whose similarity to the signature is within a set threshold range exists in a signature library, wherein the signature library is used for storing signature related information of processed data query requests; if the target signature exists in the signature library, obtaining the database address in the signature related information corresponding to the target signature; and sending the data query request to be processed to the database corresponding to the database address, and processing the data query request to be processed through the database.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The foregoing description of the preferred embodiments of the application is not intended to limit the application to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the application are intended to be included within the scope of the application.

Claims (13)

1. A method of querying data, the method comprising:
obtaining a signature of a data query request to be processed;
judging whether a target signature with the similarity of the signature within a set threshold value range exists in a signature library, wherein the signature library is used for storing signature related information of a processed data query request, the signature related information comprises a database address, and data used in a processing process of the processed data query request is cached in a database corresponding to the database address;
if the target signature exists in the signature library, a database address in the signature related information corresponding to the target signature is obtained;
and sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through cache data of the database, wherein the database belongs to a cluster for database reading operation.
2. The method of claim 1, wherein obtaining a signature of a pending data query request comprises:
obtaining parameter sequencing of n target parameters affecting the query speed;
obtaining m target parameters and filtering conditions contained in the data query request to be processed, wherein n and m are positive integers, and m is less than or equal to n;
and obtaining codes of the data query requests to be processed as the signatures based on the m target parameters, the parameter ordering and the filtering conditions.
3. The method of claim 1, wherein obtaining a signature of a pending data query request comprises:
enumerating the types of the data query requests to be processed, enumerating and encoding various types of the data query requests to be processed;
and obtaining codes corresponding to the types of the data query requests to be processed as the signatures.
4. A method according to claim 2 or 3, wherein when the signature is an encoding of the pending data query request, the method further comprises:
and taking the signature of the data query request to be processed and the signature in the signature library as numerical values, and obtaining the numerical value difference between the signature of the data query request to be processed and the signature in the signature library as the similarity between the two signatures.
5. The method of claim 1, wherein obtaining a signature of a pending data query request comprises:
and mapping the data query request to be processed into a vector based on the query content of the data query request to be processed, and taking the vector as the signature.
6. The method of claim 5, wherein when the signature is a vector corresponding to the pending data query request, the method further comprises:
and obtaining the included angle between the two vectors as the similarity between the two signatures corresponding to the two vectors.
7. The method of any one of claims 1-3, 5-6, wherein the method further comprises:
obtaining a processed data query request with a processing time greater than or equal to a time threshold or with a request frequency greater than or equal to a frequency threshold;
and generating the signature related information and storing the signature related information into the signature library based on the signature of the processed data query request and the database address of the processing operation.
8. The method of claim 7, wherein generating the signature-related information based on the signature of the processed data query request, a database address at which processing operations are performed, and storing the signature in the signature library, comprises:
obtaining the processing time length of the processed data query request and the expiration time of the signature of the processed data query request;
and generating signature related information based on the signature of the processed data query request, the database address, the processing time length and the expiration time, and storing the signature related information into the signature library.
9. The method of claim 2, wherein the pending data query request is implemented by a structured query language, SQL.
10. The method of claim 9, wherein the n target parameters comprise: n column names in the structured query language SQL.
11. A data query device, comprising:
the computing unit is used for obtaining the signature of the data query request to be processed;
the judging unit is used for judging whether a target signature with the similarity of the signature within a set threshold value range exists in a signature library, wherein the signature library is used for storing signature related information of the processed data query request, the signature related information comprises a database address, and data used in the processing process of the processed data query request is cached in a database corresponding to the database address;
the acquisition unit is used for acquiring a database address in the signature related information corresponding to the target signature under the condition that the target signature exists in the signature library;
the distribution unit is used for sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through the cache data of the database, wherein the database belongs to a cluster for database reading operation.
12. A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor performs the steps of:
obtaining a signature of a data query request to be processed;
judging whether a target signature with the similarity of the signature within a set threshold value range exists in a signature library, wherein the signature library is used for storing signature related information of a processed data query request, the signature related information comprises a database address, and data used in a processing process of the processed data query request is cached in a database corresponding to the database address;
if the target signature exists in the signature library, a database address in the signature related information corresponding to the target signature is obtained;
and sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through cache data of the database, wherein the database belongs to a cluster for database reading operation.
13. An electronic device comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
obtaining a signature of a data query request to be processed;
judging whether a target signature with the similarity of the signature within a set threshold value range exists in a signature library, wherein the signature library is used for storing signature related information of a processed data query request, the signature related information comprises a database address, and data used in a processing process of the processed data query request is cached in a database corresponding to the database address;
if the target signature exists in the signature library, a database address in the signature related information corresponding to the target signature is obtained;
and sending the data query request to be processed to a database corresponding to the database address, and processing the data query request to be processed through cache data of the database, wherein the database belongs to a cluster for database reading operation.
CN201710744290.4A 2017-08-25 2017-08-25 Data query method and device and electronic equipment Active CN110019289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710744290.4A CN110019289B (en) 2017-08-25 2017-08-25 Data query method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710744290.4A CN110019289B (en) 2017-08-25 2017-08-25 Data query method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110019289A CN110019289A (en) 2019-07-16
CN110019289B true CN110019289B (en) 2023-10-03

Family

ID=67186124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710744290.4A Active CN110019289B (en) 2017-08-25 2017-08-25 Data query method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110019289B (en)

Also Published As

Publication number Publication date
CN110019289A (en) 2019-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40010847

Country of ref document: HK

GR01 Patent grant