CN111159219B - Data management method, device, server and storage medium

Data management method, device, server and storage medium

Info

Publication number
CN111159219B
Authority
CN
China
Prior art keywords
data
sqlite
data file
information
storage system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911410067.1A
Other languages
Chinese (zh)
Other versions
CN111159219A (en
Inventor
林敏�
叶必胜
周小敏
王全胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Yaxin Software Co ltd
Original Assignee
Hunan Yaxin Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Yaxin Software Co ltd filed Critical Hunan Yaxin Software Co ltd
Priority to CN201911410067.1A priority Critical patent/CN111159219B/en
Publication of CN111159219A publication Critical patent/CN111159219A/en
Application granted granted Critical
Publication of CN111159219B publication Critical patent/CN111159219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2425Iterative querying; Query formulation based on the results of a preceding query
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/252Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data management method, device, server and storage medium. The method comprises: receiving a data query request carrying a data query condition; acquiring each target SQLite data file related to the data query condition from the SQLite data files stored in a distributed storage system, where those SQLite data files were stored in the distributed storage system after a computing platform converted original data into SQLite data files; and querying, from each target SQLite data file, the information matching the data query condition as the data query result. With the invention, mass data can be stored in the distributed storage system, and querying the SQLite data files in that system provides standard SQL (Structured Query Language) queries and fast query responses, so the database itself does not need to be modified and the barrier to use is lowered.

Description

Data management method, device, server and storage medium
Technical Field
The present invention relates to the field of big data management technologies, and in particular, to a data management method, device, server, and storage medium.
Background
Big data centers now ingest ever larger volumes of data, and their applications are increasingly widespread. Some data application scenarios require both storage of mass data and fast responses to data queries. Hadoop can store mass data but responds slowly to data queries, especially queries over small data volumes. Relational databases respond quickly to SQL-based data queries but have limited storage capacity. NoSQL can store mass data, but it must be modified to some extent before it supports SQL queries and the faster query responses they bring, so the barrier to use is high.
Disclosure of Invention
In view of this, the present application provides a data management method, apparatus, server and storage medium that store mass data and respond quickly to data queries while lowering the barrier to use.
The technical solution is as follows:
A first aspect of the invention discloses a data management method, comprising the following steps:
receiving a data query request, wherein the data query request carries data query conditions;
acquiring each target SQLite data file related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into SQLite data files by a computing platform;
and querying, from each target SQLite data file, information matching the data query condition as a data query result.
Optionally, the acquiring each target SQLite data file related to the data query condition from the SQLite data files stored in the distributed storage system includes:
acquiring a table name and an account period in the data query condition;
performing hash calculation on the table name and the account period to generate second information;
querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
wherein the computing platform is used for dividing the original data into data files of different account periods, converting each data file into an SQLite data file, performing, for each SQLite data file, hash calculation on the table name of the SQLite data file and its corresponding account period to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
Optionally, the acquiring each target SQLite data file related to the data query condition from the SQLite data files stored in the distributed storage system includes:
acquiring a table name in the data query condition;
performing hash calculation on the table name to generate second information;
querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
wherein the computing platform is used for dividing the full original data into data files, converting each data file into an SQLite data file, performing, for each SQLite data file, hash calculation on the table name of the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
Optionally, the receiving a data query request includes: receiving, when no historical data query result of the data query request is stored in the cache, the data query request sent through the data access interface according to the data access interface specification.
Optionally, the method further comprises:
and under the condition that the historical data query result of the data query request is stored in the cache, taking the historical data query result as the data query result.
Optionally, the querying, from each target SQLite data file, information matching the data query condition as a data query result includes:
loading each target SQLite data file into a memory;
performing aggregation calculation on each target SQLite data file loaded into a memory to obtain information matched with the data query condition;
and taking the information as a data query result of the data query request.
Optionally, the distributed storage system is an HBase database based on a Hadoop platform.
A second aspect of the present invention discloses a data management apparatus comprising:
the receiving unit is used for receiving a data query request, wherein the data query request carries data query conditions;
the first acquisition unit is used for acquiring each target SQLite data file related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after the computing platform converts original data into SQLite data files;
and the first query unit is used for querying, from each target SQLite data file, information matching the data query condition as a data query result.
A third aspect of the present invention discloses a server comprising: at least one memory and at least one processor; the memory stores a program, and the processor calls the program stored in the memory, where the program is used to implement the data management method disclosed in any one of the first aspect of the present invention.
A fourth aspect of the present invention discloses a computer-readable storage medium having stored therein computer-executable instructions for performing the data management method as disclosed in any one of the above-described first aspects of the present invention.
The invention provides a data management method, device, server and storage medium that receive a data query request, acquire each target SQLite data file related to the data query condition from the SQLite data files stored in a distributed storage system, and query, from each target SQLite data file, the information matching the data query condition as the data query result. With the technical solution provided by the invention, mass data can be stored in the distributed storage system, and querying the SQLite data files in that system provides standard SQL queries and fast query responses, so the database does not need to be modified and the barrier to use is lowered.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic structural diagram of a data management system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data management method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a data management method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for storing SQLite data files in a distributed storage system according to an embodiment of the present invention;
FIG. 5 is a flow chart of a method for querying, from each target SQLite data file, information matching a data query condition as a data query result according to an embodiment of the present invention;
FIG. 6 is an exemplary diagram of another data management method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a data management device according to an embodiment of the present invention;
FIG. 8 is a hardware block diagram of a server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As noted in the background, big data centers ingest ever larger volumes of data and are applied ever more widely. Some data application scenarios require both storage of mass data and fast responses to data queries. The existing approach to storing mass data is to compute and summarize data with Hadoop to obtain original data, and then store the original data in a relational database or transfer it to NoSQL, so that data queries are served by the relational database or by NoSQL.
When the volume of original data is large, a relational database cannot store it because its storage capacity is limited; moreover, when users store and query the same piece of original data at the same time, the table may be locked, causing production faults. NoSQL can store mass data but does not support SQL queries by default. To support them, additional big data components must be layered on top of NoSQL, and NoSQL restricts the query conditions and query scenarios: the query scenarios have to be planned in advance and the query conditions formulated for each scenario before SQL queries work. This places high demands on the design skills of technicians, so the barrier to use is high.
Hadoop can compute data to obtain the original data and can also store and query that original data, but its query response is slow. This is especially true when the data volume is small, because Hadoop's preparatory work is heavy, that is, its start-up time is long.
Therefore, the invention provides a data management method, device, server and storage medium in which mass data is stored in a distributed storage system and SQL queries are performed on the SQLite data files in that system, which speeds up query responses, requires no modification of the database, and lowers the barrier to use.
Referring to fig. 1, a schematic structural diagram of a data management system according to an embodiment of the present invention is shown. The data management system comprises a server, a data access interface, data access middleware and an HBase database. The HBase database stores SQLite data files.
Referring to FIG. 2 in conjunction with FIG. 1, a schematic diagram of a data management method according to an embodiment of the present invention is shown. As shown in FIG. 2, the method proceeds as follows:
The server receives a data query request sent by a user through a front-end application and judges whether a historical data query result of the data query request is stored in a cache. If one is stored, the server returns it to the user as the data query result of the data query request; if not, the server calls a data access interface according to the data access interface specification and sends the data query request to the data access middleware.
The data access middleware parses the data query request with the SQL access engine to obtain the data query condition; acquires each target SQLite data file related to the data query condition from the HBase database; loads each target SQLite data file into its own memory; merges the target SQLite data files in memory; and performs aggregation calculation on the merged files to obtain the information matching the data query condition, which is taken as the data query result of the data query request. The data query result is returned to the server through the data access interface, and the server returns it to the front-end application for the user to view.
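The merge-and-aggregate step performed by the middleware can be sketched as follows. This is a minimal illustration, not the patented implementation: the table schema (`region`, `amount`), the helper names, and the use of temporary files plus `ATTACH DATABASE` to copy rows into one in-memory database are all assumptions, and the byte strings stand in for SQLite data files fetched from the HBase database.

```python
import os
import sqlite3
import tempfile


def make_sqlite_bytes(table, rows):
    """Build a small SQLite data file and return its raw bytes.

    Stand-in for a file fetched from the distributed storage system.
    """
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        con = sqlite3.connect(path)
        con.execute(f"CREATE TABLE {table} (region TEXT, amount REAL)")
        con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
        con.commit()
        con.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def merge_and_aggregate(table, blobs):
    """Merge `table` from every file into one in-memory database, then aggregate."""
    mem = sqlite3.connect(":memory:")
    mem.execute(f"CREATE TABLE {table} (region TEXT, amount REAL)")
    for blob in blobs:
        fd, path = tempfile.mkstemp(suffix=".db")
        os.close(fd)
        try:
            with open(path, "wb") as f:
                f.write(blob)
            # Attach the file, copy its rows into memory, then detach it.
            mem.execute("ATTACH DATABASE ? AS src", (path,))
            mem.execute(f"INSERT INTO {table} SELECT region, amount FROM src.{table}")
            mem.commit()
            mem.execute("DETACH DATABASE src")
        finally:
            os.remove(path)
    rows = mem.execute(
        f"SELECT region, SUM(amount) FROM {table} GROUP BY region ORDER BY region"
    ).fetchall()
    mem.close()
    return rows
```

Once the rows are merged, any standard SQL aggregate can run against the in-memory database; the `SUM ... GROUP BY` query here is only one example.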
It should be noted that the HBase database is a distributed storage system based on the Hadoop platform. The front-end application may be a report Web application.
In this embodiment of the present application, when the server returns the data query result of the data query request to the front-end application, the data query result of the data query request may be stored in a cache, and after the data query result of the data query request is stored in the cache, the data query result may be referred to as a historical data query result of the data query request. Therefore, when the data query request is received, whether a historical data query result of the data query request is stored in the cache or not can be judged; if the historical data query result of the data query request is stored in the cache, returning the historical data query result to the user as the data query result of the data query request; if the historical data query result of the data query request is not stored in the cache, the data access interface is called according to the data access interface specification to send the data query request to the data access middleware, and the data query request is further processed through the SQL access engine in the data access middleware.
For example, the cache stores the historical data query result of the data query request 1, the historical data query result of the data query request 2, and the historical data query result of the data query request 3. If the received data query request is the data query request 1, determining that the historical data query result of the data query request is stored in the cache, and further taking the historical data query result of the data query request 1 as the data query result of the received data query request.
Otherwise, if the received data query request is the data query request 4, and it is determined that the historical data query result of the data query request is not stored in the cache, the data access interface is called according to the data access interface specification to send the data query request to the data access middleware, and the data query request is further processed through the SQL access engine in the data access middleware.
It should be noted that, when the storage duration of the historical data query result stored in the cache reaches the preset duration, the historical data query result with the storage duration reaching the preset duration is deleted from the cache.
In the embodiment of the present application, the preset duration may be, for example, one day or two days. These are only preferred values; the specific value may be set according to actual requirements and is not limited herein.
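The cache behaviour described above (return a stored historical result, otherwise forward the request, and drop entries once the preset duration has elapsed) might look like the sketch below. The class and function names are hypothetical, and expired entries are removed lazily on lookup rather than by a background task, which is a simplification of the deletion described in the text.

```python
import time


class QueryResultCache:
    """Holds historical data query results for a preset duration."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # request key -> (stored_at, result)

    def get(self, request_key):
        entry = self._entries.get(request_key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            # Storage duration reached the preset duration: delete the entry.
            del self._entries[request_key]
            return None
        return result

    def put(self, request_key, result):
        self._entries[request_key] = (self._clock(), result)


def handle_request(request_key, cache, query_middleware):
    """Serve from the cache when possible, otherwise call the middleware."""
    cached = cache.get(request_key)
    if cached is not None:
        return cached
    result = query_middleware(request_key)
    cache.put(request_key, result)
    return result
```

A preset duration of one day corresponds to `QueryResultCache(ttl_seconds=86400)`; the injectable `clock` exists only to make the expiry logic easy to exercise.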
In the embodiment of the invention, when a data query request is received, it is judged whether a historical data query result of the data query request is stored in the cache. If so, the historical data query result is used as the data query result of the data query request. If not, each target SQLite data file related to the data query condition is obtained from the SQLite data files stored in the distributed storage system, and the information matching the data query condition is queried from each target SQLite data file as the data query result. Thus, when a historical data query result is cached, the data query request is not processed again and the cached result is returned directly, which improves the response efficiency of data queries. When no historical data query result is cached, mass data storage is provided by the distributed storage system, and the SQL access engine queries the SQLite data files in that system to provide standard SQL queries and fast query responses, so the database does not need to be modified and the barrier to use is lowered.
A detailed description will now be given of a data management method according to an embodiment of the present invention, where the data management method is applied to a data access middleware, and specifically applied to an SQL access engine in the data access middleware. As shown in fig. 3, the data management method specifically includes the following steps:
S301: receiving a data query request, wherein the data query request carries a data query condition;
In the embodiment of the application, when no historical data query result of the data query request is stored in the cache, the data query request sent through the data access interface according to the data access interface specification is received. The cache may be a data cache region.
In this embodiment of the present application, the manner of determining whether the historical data query result of the data query request is stored in the cache may be: the user sends a data query request to the server based on the front-end application, and the server judges whether a historical data query result of the data query request is stored in a cache after receiving the data query request; if the historical data query result of the data query request is stored in the cache, sending the historical data query result as a data query result; and if the historical data query result of the data query request is not stored in the cache, calling the data access interface according to the data access interface specification to send the data query request to the data access middleware.
S302: acquiring each target SQLite data file related to the data query condition from SQLite data files stored in a distributed storage system;
In the specific execution of step S302, the distributed storage system stores SQLite data files. When a data query request is received, the request is parsed by the SQL access engine to obtain the data query condition, and each target SQLite data file related to the data query condition is then acquired from the distributed storage system according to that condition.
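The text does not specify the format of a data query request or how the SQL access engine parses it. Purely as an illustration, the sketch below assumes the request is a SQL string whose table name follows `FROM` and whose account period appears as a hypothetical `account_period = '...'` condition; a real engine would use a proper SQL parser rather than regular expressions.

```python
import re


def parse_query_condition(sql):
    """Extract the table name and (optional) account period from a SQL string."""
    table = re.search(r"\bFROM\s+([A-Za-z_][A-Za-z0-9_]*)", sql, re.IGNORECASE)
    if table is None:
        raise ValueError("no table name found in query")
    # `account_period` is a hypothetical column name used only in this sketch.
    period = re.search(r"\baccount_period\s*=\s*'([^']+)'", sql, re.IGNORECASE)
    return {
        "table_name": table.group(1),
        "account_period": period.group(1) if period else None,
    }
```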
In the embodiment of the application, the SQLite data file in the distributed storage system is stored in the distributed storage system after the computing platform converts the original data into the SQLite data file.
As a preferred manner of storing the SQLite data files in the distributed storage system, as shown in FIG. 4: the original data is divided by the computing platform into data files of different account periods; each data file is converted into an SQLite data file; for each SQLite data file, hash calculation is performed on the table name of the SQLite data file and its corresponding account period, and the result is taken as the first information of the SQLite data file; the SQLite data file is then stored in the distributed storage system according to target information in its first information.
In the embodiment of the application, after the computing platform performs summarizing computation on the data to obtain the original data, the original data is divided by month into data files of different account periods. For example, the computing platform may divide one year of original data by month into a data file for the account period of month 1, a data file for the account period of month 2, and so on for each remaining month. The specific manner in which the computing platform divides the original data into data files of different account periods may be set according to actual requirements and is not limited in the embodiment of the application.
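As a rough sketch of the division by monthly account period followed by conversion to SQLite data files, assuming records of the form `(bill_date, region, amount)` with ISO dates; the record layout, the `YYYYMM` period format, and the file naming are illustrative assumptions, not details given in the text.

```python
import os
import sqlite3
import tempfile
from collections import defaultdict


def split_by_account_period(records):
    """Group ('YYYY-MM-DD', region, amount) records by 'YYYYMM' account period."""
    groups = defaultdict(list)
    for bill_date, region, amount in records:
        period = bill_date[:4] + bill_date[5:7]
        groups[period].append((bill_date, region, amount))
    return dict(groups)


def write_sqlite_file(directory, table, period, rows):
    """Convert one account period's rows into an SQLite data file; return its path."""
    path = os.path.join(directory, f"{table}_{period}.db")
    con = sqlite3.connect(path)
    con.execute(f"CREATE TABLE {table} (bill_date TEXT, region TEXT, amount REAL)")
    con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()
    return path
```

Each resulting file then corresponds to one table name and one account period, which is the pair the text hashes to produce the first information.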
In the embodiment of the application, each data file can be converted into an SQLite data file, and each converted SQLite data file has a unique table name and account period. For each SQLite data file, hash calculation may be performed on its table name and account period to obtain the first information of the SQLite data file. The first information may be a character string of 32 or 64 characters. Pre-partitioning can then be performed on the target information in the first information, with partition boundaries stepping by odd-numbered hops over the range 01f to fef, and the SQLite data files are stored in the distributed storage system according to the pre-partitioning result so that the distributed storage system stays balanced. The target information may be the first 3 characters of the first information.
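The hash and pre-partitioning steps can be illustrated as below. The text names neither the hash function nor the exact split rule, so several choices here are assumptions: SHA-256 (a 64-character hex digest) as the hash, a `|` separator between table name and account period, and evenly spaced split points from `01f` to `fef` in steps of `0x10` standing in for the "odd-numbered hops" of the original.

```python
import bisect
import hashlib


def first_information(table_name, account_period=""):
    """Hash the table name and account period into a 64-character hex string."""
    key = f"{table_name}|{account_period}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def target_information(info):
    """The first 3 characters of the first information."""
    return info[:3]


# Hypothetical split points 01f, 02f, ..., fef (spacing is an assumption).
SPLIT_POINTS = [f"{v:03x}" for v in range(0x01F, 0xFF0, 0x10)]


def partition_index(info):
    """Index of the pre-split partition whose key range contains `info`."""
    # Lowercase hex strings of equal length sort in numeric order, so a
    # binary search over the split points finds the owning partition.
    return bisect.bisect_right(SPLIT_POINTS, target_information(info))
```

Because a hash digest is close to uniformly distributed over its leading characters, files spread roughly evenly across the partitions, which is the balancing effect the text describes.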
Accordingly, in the embodiment of the present application, each target SQLite data file related to the data query condition may be acquired from the SQLite data files stored in the distributed storage system as follows: after the data query request is parsed by the SQL access engine to obtain the data query condition, the table name and account period in the data query condition are obtained; hash calculation is performed on the obtained table name and account period, and the result is taken as second information; the first information carried by each SQLite data file in the distributed storage system is searched for the SQLite data file whose first information is the same as the second information, and the queried SQLite data file is determined as a target SQLite data file.
In the embodiment of the application, the result of the hash calculation on the table name and account period in the data query condition is used as the second information, so the partition of the distributed storage system in which the target SQLite data file is stored can be determined quickly from the target information in the second information, and the target SQLite data file related to the data query condition is then determined within that partition.
For example, the computing platform may divide the original data into a data file for the account period of month 1 and a data file for the account period of month 2, convert the month-1 data file into SQLite data file 1, and convert the month-2 data file into SQLite data file 2. The table name of SQLite data file 1 is table name 1, and the table name of SQLite data file 2 is table name 2. The first information of SQLite data file 1 is generated by hash calculation on table name 1 and the month-1 account period, and the first information of SQLite data file 2 is generated by hash calculation on table name 2 and the month-2 account period. If the table name and account period in the acquired data query condition are table name 1 and the month-1 account period, the second information generated by hash calculation on table name 1 and the month-1 account period is k1; k1 equals the first information of SQLite data file 1, so the SQLite data file in the distributed storage system whose first information is the same as the second information is determined to be SQLite data file 1.
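The example above reduces to comparing two hash values. In the sketch below a plain dictionary stands in for the distributed storage system, and the hash function (SHA-256), the `|` separator, and the period strings are illustrative assumptions.

```python
import hashlib


def info_for(table_name, account_period):
    """Hash of table name and account period (first or second information)."""
    key = f"{table_name}|{account_period}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Stand-in for the storage system: first information -> stored file.
stored = {
    info_for("table_name_1", "2019-01"): "SQLite data file 1",
    info_for("table_name_2", "2019-02"): "SQLite data file 2",
}


def find_target_file(stored_files, table_name, account_period):
    """Compute the second information and return the file whose first information equals it."""
    second_information = info_for(table_name, account_period)
    return stored_files.get(second_information)
```

Querying with table name 1 and the month-1 account period reproduces the hash used at storage time, so the lookup resolves to SQLite data file 1 without scanning the other files.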
As another preferred manner of the embodiment of the present application, as shown in fig. 4, the manner of storing the SQLite data file into the distributed storage system may be: dividing the original data into data files in full quantity through a computing platform; converting each data file into SQLite data files respectively; and carrying out hash calculation on each SQLite data file by utilizing the table name of the SQLite data file, taking the result obtained by carrying out hash calculation on the table name as first information of the SQLite data file, and further storing the SQLite data file in a distributed storage system according to target information in the first information of the SQLite data file.
In this embodiment of the present application, for each SQLite data file, the SQLite data file has a unique table name, and the hash calculation may be performed according to the table name of the SQLite data file to obtain the first information of the SQLite data file. Wherein the first information may be a character string of 32 bits or 64 bits. And then the pre-partition can be performed according to the target information in the first information and according to the odd-numbered hops among the parts 01 f-fef, and the SQLite data file is stored in the distributed storage system according to the pre-partition result so as to balance the distributed storage system. The target information in the first information may be the first 3-bit character in the first information.
Accordingly, in the embodiment of the present application, the manner of acquiring each target SQLite data file related to the data query condition from the SQLite data files stored in the distributed storage system may be: parsing the data query request based on the SQL access engine to obtain the data query condition; obtaining the table name in the data query condition, performing hash calculation on the obtained table name, and taking the result as second information; and querying, among the first information carried by the SQLite data files of the distributed storage system, for the SQLite data file whose first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file.
In the embodiment of the application, the result obtained by performing hash calculation on the table name in the data query condition is used as the second information, so that the partition in which the target SQLite data file is stored in the distributed storage system can be quickly determined according to the target information in the second information, and the target SQLite data file related to the data query condition is then determined from that partition.
For example, the computing platform divides the full original data into data file 1 and data file 2, converts data file 1 into SQLite data file 1, and converts data file 2 into SQLite data file 2; the table name of SQLite data file 1 is table name 1, and the table name of SQLite data file 2 is table name 2. Hash calculation is performed on table name 1 to generate the first information k1 of SQLite data file 1, and on table name 2 to generate the first information k2 of SQLite data file 2. If the table name in the acquired data query condition is table name 1, and the second information generated by performing hash calculation on table name 1 is k1, then the SQLite data file stored in the distributed storage system whose first information is the same as the second information is determined to be SQLite data file 1.
Preferably, in the embodiment of the present application, the distributed storage system is an HBase database based on a Hadoop platform.
It should be noted that the HBase database uses HDFS, the distributed file system of the Hadoop platform, to realize distributed storage of files.
S303: and inquiring information matched with the data inquiry conditions from each target SQLite data file to serve as a data inquiry result.
Fig. 5 is a flow chart of a method for querying information matched with a data query condition from each target SQLite data file as a data query result according to an embodiment of the present invention.
As shown in fig. 5, the method includes:
S501: loading each target SQLite data file into a memory;
In the embodiment of the application, each target SQLite data file matching the data query condition is acquired from the distributed storage system based on the SQL access engine, and the acquired target SQLite data files are then loaded into the memory.
S502: performing aggregation calculation on each target SQLite data file loaded into the memory to obtain information matched with the data query condition;
In the embodiment of the application, after each target SQLite data file is loaded into the memory, the target SQLite data files in the memory are merged based on the SQL access engine, and aggregation calculation is then performed on the merged target SQLite data files to obtain information matching the data query condition.
It should be noted that merging the target SQLite data files in the memory may be understood as placing the target SQLite data files in the memory into the same SQLite data file.
S503: and taking the information as a data query result of the data query request.
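Steps S501 to S503 can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the target SQLite data files are assumed to be available as local files with an identical table schema, and the helper names `write_file` and `aggregate_files`, the table `kpi`, and its columns are hypothetical.

```python
import os
import sqlite3
import tempfile


def write_file(path, rows):
    """Create a small SQLite data file holding a hypothetical `kpi` table."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE kpi (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO kpi VALUES (?, ?)", rows)
    conn.close()


def aggregate_files(paths, table, sql):
    """Load `table` from every file into one in-memory database, then run `sql`."""
    # isolation_level=None puts the connection in autocommit mode, so ATTACH
    # and DETACH are never issued inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        for i, path in enumerate(paths):
            conn.execute("ATTACH DATABASE ? AS src", (path,))
            if i == 0:
                # The first file defines the merged table and supplies its rows.
                conn.execute(f'CREATE TABLE "{table}" AS SELECT * FROM src."{table}"')
            else:
                # Later files are appended to the same in-memory table.
                conn.execute(f'INSERT INTO "{table}" SELECT * FROM src."{table}"')
            conn.execute("DETACH DATABASE src")
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, "file1.db"), os.path.join(tmp, "file2.db")]
    write_file(paths[0], [("north", 10), ("south", 5)])
    write_file(paths[1], [("north", 7)])
    result = aggregate_files(
        paths,
        "kpi",
        "SELECT region, SUM(amount) FROM kpi GROUP BY region ORDER BY region",
    )

print(result)  # [('north', 17), ('south', 5)]
```

Merging into a single in-memory table first means the aggregation is written as one ordinary SQL statement, regardless of how many target files matched the query condition.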
Further, in the embodiment of the application, the SQL access engine of the data access middleware returns the data query result of the data query request to the server based on the data access interface, so that the server returns the data query result of the data query request to the front-end application for the user to view.
The invention provides a data management method applied to a data access middleware, in particular to an SQL access engine of the data access middleware. The method receives a data query request, acquires each target SQLite data file related to the data query condition from the SQLite data files stored in a distributed storage system, and queries information matching the data query condition from each target SQLite data file as a data query result. The technical solution provided by the invention realizes mass data storage based on the distributed storage system, and because the SQL access engine queries the SQLite data files in the distributed storage system, standard SQL queries are supported and data queries are answered quickly; the database therefore does not need to be modified, and the threshold for use is lowered.
In order to better understand the content in the above-provided data management method, the embodiment of the present invention provides an example of a data management method based on the application in the production and use of the index library. As shown in fig. 6, specifically:
the SPARK index calculation platform performs summarization calculation on the data in the data warehouse to obtain original data, and divides the original data into data files with different account periods; converting each data file into SQLite data files respectively; and performing CUBE (compute unified library) storage, namely performing hash computation on each SQLite data file by using the table name of the SQLite data file and the account period corresponding to the SQLite data file, taking the result obtained by performing hash computation on the table name and the account period as first information of the SQLite data file, and further storing the SQLite data file in a distributed storage system according to target information in the first information of the SQLite data file.
The server receives an SQL request sent by a user through a front-end application and checks whether a historical data query result of the SQL request is stored in a cache; if so, the historical data query result is returned to the front-end application as the data query result of the SQL request.
If the historical data query result of the SQL request is not stored in the cache, calling a data access interface according to the data access interface specification to send the data query request to the index aggregation middleware, namely the data access middleware; the SQL access engine based on the index aggregation middleware analyzes the SQL request to obtain a data query condition; obtaining table names and account periods in the data query conditions, carrying out hash computation on the obtained table names and account periods, and taking results obtained by carrying out hash computation on the table names and the account periods as second information; querying the SQLite data files which are the same as the second information from the first information carried by each SQLite data file of the distributed storage system, and further determining the queried SQLite data files as target SQLite data files; loading each target SQLite data file into a memory, merging each target SQLite data file in the memory, and performing SQL calculation on each merged target SQLite data file, namely performing aggregation calculation on each merged target SQLite data file to obtain information matched with a data query condition; and returning the information matched with the data query condition to the server as a data query result based on the data access interface, so that the server returns the data query result to the front-end application for the user to check.
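The cache-first flow on the server side can be sketched as follows. The class name `QueryServer`, the use of the SQL text as the cache key, and writing a fresh result back into the cache are assumptions for illustration; the source only states that a stored historical result is returned directly and that the middleware is called otherwise.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]


class QueryServer:
    """Hypothetical server wrapper: consult the cache, else call the middleware."""

    def __init__(self, middleware: Callable[[str], Rows]) -> None:
        self._middleware = middleware
        self._cache: Dict[str, Rows] = {}

    def query(self, sql: str) -> Rows:
        # A historical result for this request exists: return it directly.
        if sql in self._cache:
            return self._cache[sql]
        # Otherwise forward the request through the data access interface.
        result = self._middleware(sql)
        # Assumption: the fresh result is kept as the historical result.
        self._cache[sql] = result
        return result


calls = []


def fake_middleware(sql: str) -> Rows:
    """Stand-in for the index aggregation middleware; records each call."""
    calls.append(sql)
    return [("north", 17)]


server = QueryServer(fake_middleware)
sql_text = "SELECT region, SUM(amount) FROM kpi GROUP BY region"
first = server.query(sql_text)
second = server.query(sql_text)
print(len(calls))  # 1
```

The second call is served from the cache, so the middleware, and therefore the distributed storage system, is reached only once for a repeated request.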
Corresponding to the data management method provided in the above embodiment of the present invention, as shown in fig. 7, the embodiment of the present invention further provides a schematic structural diagram of a data management device. The data management device includes:
a receiving unit 71, configured to receive a data query request, where the data query request carries a data query condition;
a first obtaining unit 72, configured to obtain each target SQLite data file related to the data query condition from the SQLite data files stored in the distributed storage system, where the SQLite data files in the distributed storage system are stored in the distributed storage system after the computing platform converts the original data into the SQLite data files;
and a first query unit 73, configured to query, from each target SQLite data file, information matching the data query condition as a data query result.
The specific principle and execution process of each unit in the data management device disclosed in the above embodiment of the present invention are the same as those of the data management method disclosed in the above embodiment of the present invention, and may refer to the corresponding parts in the data management method disclosed in the above embodiment of the present invention, and will not be described in detail here.
The invention provides a data management device, which receives a data query request, acquires each target SQLite data file related to the data query condition from the SQLite data files stored in a distributed storage system, and queries information matching the data query condition from each target SQLite data file as a data query result. The technical solution provided by the invention realizes mass data storage based on the distributed storage system, and because the SQL access engine queries the SQLite data files in the distributed storage system, standard SQL queries are supported and data queries are answered quickly; the database therefore does not need to be modified, and the threshold for use is lowered.
In an embodiment of the present application, preferably, the first obtaining unit includes:
the second acquisition unit is used for acquiring the table name and account period in the data query condition;
the first calculation unit is used for carrying out hash calculation on the table name and the account period to generate second information;
the second query unit is used for querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
the computing platform is used for dividing original data into data files with different account periods, respectively converting each data file into an SQLite data file, carrying out hash computation on each SQLite data file by using the table name of the SQLite data file and the account period corresponding to the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
In an embodiment of the present application, preferably, the first obtaining unit includes:
a third obtaining unit, configured to obtain a table name in the data query condition;
The second calculation unit is used for carrying out hash calculation on the table names to generate second information;
the third query unit is used for querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
the computing platform is used for dividing the whole original data into data files, converting each data file into SQLite data files respectively, carrying out hash computation on each SQLite data file by using the table name of the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
In an embodiment of the present application, preferably, the receiving unit includes:
and the receiving subunit is used for receiving the data query request sent through the data access interface according to the data access interface specification under the condition that the historical data query result of the data query request is not stored in the cache.
Further, the data management device provided in the embodiment of the present application further includes:
And the first determining unit is used for taking the historical data query result as the data query result when the historical data query result of the data query request is stored in the cache.
In the embodiment of the application, when the historical data query result of the data query request is stored in the cache, the data query request is not processed further, and the historical data query result is directly used as the data query result of the data query request, which improves the response efficiency of the data query.
In an embodiment of the present application, preferably, the first query unit includes:
the loading unit is used for loading each target SQLite data file into the memory;
the aggregation calculation unit is used for carrying out aggregation calculation on each target SQLite data file loaded into the memory to obtain information matched with the data query condition;
and the second determining unit is used for taking the information as a data query result of the data query request.
In the embodiment of the present application, preferably, the distributed storage system is an HBase database based on a Hadoop platform.
The following describes in detail a hardware structure of a server to which the data management method provided in the embodiment of the present application is applicable, taking an example that the data management method is applied to the server.
The data management method provided by the embodiment of the application can be applied to a server, and the server can be service equipment for providing services for users by a network side, and can be a server cluster formed by a plurality of servers or a single server.
Optionally, fig. 8 is a block diagram illustrating a hardware structure of a server, to which the data management method provided in the embodiment of the present application is applicable, and referring to fig. 8, the hardware structure of the server may include: a processor 81, a memory 82, a communication interface 83 and a communication bus 84;
in the embodiment of the present invention, the number of the processor 81, the memory 82, the communication interface 83 and the communication bus 84 may be at least one, and the processor 81, the memory 82 and the communication interface 83 complete communication with each other through the communication bus 84;
the processor 81 may be a central processing unit CPU, or a specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present invention, etc.;
the memory 82 may include a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory) or the like, such as at least one disk memory;
Wherein the memory stores a program, and the processor is operable to invoke the program stored in the memory, the program being operable to:
receiving a data query request, wherein the data query request carries a data query condition;
acquiring each target SQLite data file related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into SQLite data files by a computing platform;
and inquiring information matched with the data inquiry conditions from each target SQLite data file to serve as a data inquiry result.
The function of the program may be referred to in the above description of a data management method provided in the embodiments of the present application, which is not described herein.
Further, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions for executing the data management method.
For details of the computer-executable instructions, reference is made to the above detailed description of a data management method provided in the embodiments of the present application, which is not repeated here.
The data management method, device, server and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is only intended to aid understanding of the method and core idea of the present invention. Meanwhile, those skilled in the art may, in accordance with the idea of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and for relevant points reference may be made to the description of the method.
It is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A method of data management, comprising:
receiving a data query request, wherein the data query request carries data query conditions;
acquiring each target SQLite data file related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after original data are converted into SQLite data files by a computing platform;
inquiring information matched with the data inquiry conditions from each target SQLite data file to serve as a data inquiry result;
The step of acquiring each target SQLite data file related to the data query condition from SQLite data files stored in a distributed storage system comprises the following steps:
acquiring a table name and an account period in the data query condition;
performing hash calculation on the table name and the account period to generate second information;
querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
the computing platform is used for dividing original data into data files with different account periods, respectively converting each data file into an SQLite data file, carrying out hash computation on each SQLite data file by using the table name of the SQLite data file and the account period corresponding to the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
2. The method of claim 1, wherein the retrieving each target SQLite data file associated with the data query condition from the SQLite data files stored in the distributed storage system further comprises:
Acquiring a table name in the data query condition;
performing hash calculation on the table names to generate second information;
querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
the computing platform is used for dividing the whole original data into data files, converting each data file into SQLite data files respectively, carrying out hash computation on each SQLite data file by using the table name of the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
3. The method of claim 1, wherein the receiving a data query request comprises: and receiving the data query request sent through the data access interface according to the data access interface specification under the condition that the historical data query result of the data query request is not stored in the cache.
4. A method according to claim 3, further comprising:
And under the condition that the historical data query result of the data query request is stored in the cache, taking the historical data query result as the data query result.
5. The method according to claim 1, wherein the querying information matching the data query condition from each target SQLite data file as a data query result comprises:
loading each target SQLite data file into a memory;
performing aggregation calculation on each target SQLite data file loaded into a memory to obtain information matched with the data query condition;
and taking the information as a data query result of the data query request.
6. The method according to any one of claims 1-5, wherein the distributed storage system is a Hadoop platform based HBase database.
7. A data management apparatus, comprising:
the receiving unit is used for receiving a data query request, wherein the data query request carries data query conditions;
the first acquisition unit is used for acquiring each target SQLite data file related to the data query condition from SQLite data files stored in a distributed storage system, wherein the SQLite data files in the distributed storage system are stored in the distributed storage system after the computing platform converts original data into SQLite data files;
The first query unit is used for querying information matched with the data query conditions from each target SQLite data file as a data query result;
the first acquisition unit includes:
the second acquisition unit is used for acquiring the table name and account period in the data query condition;
the first calculation unit is used for carrying out hash calculation on the table name and the account period to generate second information;
the second query unit is used for querying, from the SQLite data files stored in the distributed storage system, the SQLite data file whose carried first information is the same as the second information, and determining the queried SQLite data file as a target SQLite data file;
the computing platform is used for dividing original data into data files with different account periods, respectively converting each data file into an SQLite data file, carrying out hash computation on each SQLite data file by using the table name of the SQLite data file and the account period corresponding to the SQLite data file to generate first information of the SQLite data file, and storing the SQLite data file in the distributed storage system according to target information in the first information of the SQLite data file.
8. A server, characterized by at least one memory and at least one processor; the memory stores a program, and the processor calls the program stored in the memory, the program being for implementing the data management method according to any one of claims 1 to 6.
9. A computer-readable storage medium having stored therein computer-executable instructions for performing the data management method of any one of claims 1-6.
CN201911410067.1A 2019-12-31 2019-12-31 Data management method, device, server and storage medium Active CN111159219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911410067.1A CN111159219B (en) 2019-12-31 2019-12-31 Data management method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN111159219A CN111159219A (en) 2020-05-15
CN111159219B true CN111159219B (en) 2023-05-23

Family

ID=70559920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911410067.1A Active CN111159219B (en) 2019-12-31 2019-12-31 Data management method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN111159219B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694708A (en) * 2020-05-28 2020-09-22 新浪网技术(中国)有限公司 Data query method and device, electronic equipment and storage medium
CN114143279B (en) * 2020-08-13 2023-10-24 北京有限元科技有限公司 Interactive recording sampling method and device and storage medium
CN112000692B (en) * 2020-09-02 2023-06-23 平安养老保险股份有限公司 Page query feedback method and device, computer equipment and readable storage medium
CN112860695B (en) * 2021-02-08 2023-08-04 北京百度网讯科技有限公司 Monitoring data query method, device, equipment, storage medium and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897362A (en) * 2017-01-11 2017-06-27 中国建设银行股份有限公司 For data storage, the method and system of inquiry
CN106940778A (en) * 2017-03-10 2017-07-11 华东师范大学 A kind of encryption data method cracked based on the parallel dictionaries of GPU in support storehouse
CN109299133A (en) * 2017-07-24 2019-02-01 迅讯科技(北京)有限公司 Data query method, computer system and non-transitory computer-readable medium
CN109726191A (en) * 2018-12-12 2019-05-07 中国联合网络通信集团有限公司 A kind of processing method and system across company-data, storage medium
CN109840254A (en) * 2018-12-14 2019-06-04 湖南亚信软件有限公司 A kind of data virtualization and querying method, device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007036932A2 (en) * 2005-09-27 2007-04-05 Zetapoint Ltd. Data table management system and methods useful therefor
US20160267132A1 (en) * 2013-12-17 2016-09-15 Hewlett-Packard Enterprise Development LP Abstraction layer between a database query engine and a distributed file system
US10545935B2 (en) * 2015-04-20 2020-01-28 Oracle International Corporation System and method for providing access to a sharded database using a cache and a shard technology
CN106649828B (en) * 2016-12-29 2019-12-24 ***股份有限公司 Data query method and system
US11003699B2 (en) * 2018-01-24 2021-05-11 Walmart Apollo, Llc Systems and methods for high efficiency data querying
US10587675B2 (en) * 2018-02-09 2020-03-10 InterPro Solutions, LLC Offline mobile data storage system and method
CN110309334B (en) * 2018-04-20 2023-07-18 腾讯科技(深圳)有限公司 Query method, system, computer device and readable storage medium for graph database
CN109471890A (en) * 2018-10-16 2019-03-15 深圳壹账通智能科技有限公司 Generation method, terminal device and the medium of report file

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant