CN107291947B - Semi-structured data query method and distributed NewSQL database system - Google Patents


Publication number
CN107291947B
Authority
CN
China
Prior art keywords
data
unit
user
query
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710580416.9A
Other languages
Chinese (zh)
Other versions
CN107291947A (en
Inventor
晋彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunrun Da Data Service Co ltd
Original Assignee
Yunrun Da Data Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunrun Da Data Service Co Ltd
Publication of CN107291947A
Application granted
Publication of CN107291947B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9017Indexing; Data structures therefor; Storage structures using directory or table look-up
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2379Updates performed during online database operations; commit processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2219Large Object storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24542Plan optimisation
    • G06F16/24545Selectivity estimation or determination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/252Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/319Inverted lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5022Workload threshold

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Operations Research (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a method for querying semi-structured data, comprising the following steps: receiving a user request through a JDBC/ODBC interface, wherein the user request comprises a query condition for the JSON data to be queried, and the query result is the JSON data obtained according to the query condition; parsing and compiling the user request to generate a corresponding execution plan; acquiring, according to the execution plan, index data corresponding to the query condition of the user request, wherein an index table stores the index data in the form of an inverted index generated by treating the JSON data as a nested type; querying a data table according to the acquired index data to obtain the corresponding query result, wherein the JSON data is stored as a whole; and returning the query result to the user. The invention also discloses a distributed NewSQL database system. The invention enables queries over data in JSON format and addresses the poor effectiveness and performance of processing semi-structured data.

Description

Semi-structured data query method and distributed NewSQL database system
Technical Field
The invention relates to the technical field of big data, in particular to a semi-structured data query method and a distributed NewSQL database system.
Background
HBase is currently one of the best-known distributed NoSQL databases in the Hadoop ecosystem. The main components of the HBase unit are the HMaster and the HRegionServer. The HBase unit provides the user with a table-style data model and divides each table into a plurality of Regions by primary-key range; the HMaster is responsible for managing and assigning the Regions, and the HRegionServer is responsible for reading and writing Region data. The data stored by the existing HBase unit has no data types and consists only of byte arrays, so storing semi-structured data such as JSON causes problems at query time. To store JSON-format data in the HBase unit, the entire JSON object is conventionally stored as a string. This approach has the following drawbacks:
When records are to be filtered, all records must be read out and then filtered at the client, and the performance is unacceptable for large data volumes.
When a record needs to be updated, it must be read out, modified in the specific field, and then written back to the HBase unit to overwrite the old record.
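The two drawbacks can be illustrated with a minimal sketch. The dictionary below is only a hypothetical stand-in for an HBase table that maps row keys to opaque byte arrays; the function names and sample records are assumptions made for illustration and are not part of the invention.

```python
import json

# Hypothetical stand-in for an HBase table: row key -> opaque bytes.
table = {
    b"row1": json.dumps({"name": "alice", "address": {"city": "Beijing"}}).encode(),
    b"row2": json.dumps({"name": "bob", "address": {"city": "Shanghai"}}).encode(),
}

def filter_client_side(table, predicate):
    """Drawback 1: every row is read and decoded before it can be filtered."""
    rows_read = 0
    matches = []
    for key, raw in table.items():
        rows_read += 1
        doc = json.loads(raw)
        if predicate(doc):
            matches.append(key)
    return matches, rows_read

def update_field(table, key, field, value):
    """Drawback 2: read the whole record, change one field, overwrite it."""
    doc = json.loads(table[key])
    doc[field] = value
    table[key] = json.dumps(doc).encode()

matches, rows_read = filter_client_side(
    table, lambda d: d["address"]["city"] == "Beijing"
)
update_field(table, b"row2", "name", "robert")
```

Even though only one record matches the filter, the number of rows read equals the size of the table, which is the cost the invention sets out to avoid.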
Disclosure of Invention
An object of the embodiments of the invention is to provide a semi-structured data query method and a distributed NewSQL database system that enable queries over data in JSON format and address the poor effectiveness and poor performance of processing semi-structured data.
To achieve the above object, an embodiment of the invention provides a semi-structured data query method, applicable to a distributed NewSQL database system, comprising:
receiving a user request through a JDBC/ODBC interface, wherein the user request comprises a query condition for the JSON data to be queried, and the query result is the JSON data obtained according to the query condition;
parsing and compiling the user request to generate a corresponding execution plan;
acquiring, according to the execution plan, index data corresponding to the query condition of the user request; wherein an index table stores the index data in the form of an inverted index generated by treating the JSON data as a nested type;
querying a data table according to the acquired index data to obtain the corresponding query result; wherein the JSON data is stored as a whole;
and returning the query result to the user.
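The index lookup and data-table lookup in the steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class name JsonStore, the dotted-path notation for nested fields, and the use of Python dictionaries in place of the HBase data table and index table are all hypothetical and are not taken from the invention.

```python
import json
from collections import defaultdict

def flatten(doc, prefix=""):
    """Yield (path, value) pairs for every scalar leaf of a nested JSON value."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        for item in doc:
            yield from flatten(item, prefix)
    else:
        yield prefix, doc

class JsonStore:
    def __init__(self):
        self.data_table = {}                 # row key -> whole JSON string
        self.index_table = defaultdict(set)  # (path, value) -> row keys

    def put(self, row_key, doc):
        # The JSON document is stored as a whole; only the index is exploded.
        self.data_table[row_key] = json.dumps(doc)
        for path, value in flatten(doc):
            self.index_table[(path, value)].add(row_key)

    def query(self, path, value):
        # Step 1: inverted-index lookup; step 2: fetch whole documents by row key.
        row_keys = sorted(self.index_table.get((path, value), ()))
        return [json.loads(self.data_table[k]) for k in row_keys]

store = JsonStore()
store.put("r1", {"user": {"name": "alice", "tags": ["a", "b"]}})
store.put("r2", {"user": {"name": "bob", "tags": ["b"]}})
result = store.query("user.tags", "b")
```

Only the rows named by the index entry are read from the data table, so the query does not need to scan and decode every stored document.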
Further, parsing and compiling the user request to generate the corresponding execution plan comprises:
determining whether a pre-stored SQL statement corresponding to the SQL request exists in the shared cache pool; if so, outputting the execution plan corresponding to the pre-stored SQL statement; otherwise,
performing a syntax check on the SQL request; if there is a syntax error, returning error information to the user; otherwise,
performing a semantic check on the SQL request; if there is a semantic error, returning error information to the user; otherwise,
performing view and expression conversion on the SQL request to obtain a corresponding conversion result;
selecting an optimizer according to the conversion result to obtain a corresponding optimizer selection result;
selecting a corresponding data connection mode and connection order according to the optimizer selection result;
selecting a search path according to the connection mode and the connection order;
and generating an execution plan according to the search path, and outputting the execution plan.
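The control flow of the steps above (cache lookup first, then syntax check, then semantic check, then plan generation) can be sketched as follows. The toy grammar, the catalog of table names, and the plan dictionary are hypothetical simplifications; view and expression conversion, optimizer selection, and the choice of connection mode and order are collapsed into a single placeholder step, so this is not the planner of the invention.

```python
class PlanError(Exception):
    """Error information returned to the user."""

class Planner:
    def __init__(self, catalog):
        self.catalog = catalog    # hypothetical: set of known table names
        self.shared_cache = {}    # normalized SQL text -> execution plan
        self.compilations = 0     # counts plans that were actually compiled

    def plan(self, sql):
        key = " ".join(sql.split())          # normalize whitespace only
        if key in self.shared_cache:         # cache hit: skip compilation
            return self.shared_cache[key]

        tokens = key.lower().split()
        # Syntax check (toy grammar): SELECT <columns> FROM <table> ...
        if not tokens or tokens[0] != "select" or "from" not in tokens:
            raise PlanError("syntax error")
        pos = tokens.index("from")
        if pos < 2 or pos == len(tokens) - 1:
            raise PlanError("syntax error")

        # Semantic check: the referenced table must exist.
        table = tokens[pos + 1]
        if table not in self.catalog:
            raise PlanError(f"semantic error: unknown table {table}")

        # Placeholder for conversion, optimizer selection and path selection.
        self.compilations += 1
        plan = {
            "table": table,
            "search_path": "index_scan" if "where" in tokens else "full_scan",
        }
        self.shared_cache[key] = plan
        return plan

planner = Planner({"orders"})
first = planner.plan("SELECT doc FROM orders WHERE id = 1")
second = planner.plan("SELECT  doc  FROM orders WHERE id = 1")
```

The second call differs only in whitespace, so it is served from the shared cache and no second compilation takes place.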
Correspondingly, an embodiment of the invention further provides a distributed NewSQL database system, comprising:
a JDBC/ODBC interface unit, configured to interact with the user, including receiving a user request and returning a query result to the user; wherein the user request comprises a query condition for the JSON data to be queried, and the query result is the JSON data obtained according to the query condition;
a master unit, configured to accept the user request received through the JDBC/ODBC interface unit, coordinate data communication among a plurality of processors, manage the overall flow, and first send the user request to the SQLPlanner unit; the master unit is further configured to return the query result to the JDBC/ODBC interface unit;
an SQLPlanner unit, configured to parse and compile the user request and to produce an execution plan tailored to the user request;
a worker unit, configured to execute the plan in parallel, including: starting, according to the execution plan, a coprocessor module to obtain index data corresponding to the query condition of the user request, and querying a data table according to the obtained index data to obtain the corresponding query result; the worker unit is further configured to return the query result of the HBase unit to the master unit;
an HBase unit, configured to store the data table and the index table; the HBase unit further comprises the coprocessor module; a JSON data type is added to the underlying layer of the HBase unit, and the JSON data is stored as a whole in the underlying HFile;
and a distributed transaction manager, configured to coordinate multiple parties to complete distributed transaction management when the execution plan of the worker unit involves a transaction.
Further, the JDBC/ODBC interface unit is further configured to convert the user request into an SQL request in the form of an SQL statement.
Further, the SQLPlanner unit is configured for:
determining whether a pre-stored SQL statement corresponding to the SQL request exists in the shared cache pool; if so, outputting the execution plan corresponding to the pre-stored SQL statement; otherwise,
performing a syntax check on the SQL request; if there is a syntax error, returning error information to the user; otherwise,
performing a semantic check on the SQL request; if there is a semantic error, returning error information to the user; otherwise,
performing view and expression conversion on the SQL request to obtain a corresponding conversion result;
selecting an optimizer according to the conversion result to obtain a corresponding optimizer selection result;
selecting a corresponding data connection mode and connection order according to the optimizer selection result;
selecting a search path according to the connection mode and the connection order;
and generating an execution plan according to the search path, and outputting the execution plan.
Further, the system also comprises:
a monitor, responsible for metadata management, for monitoring the load of the Regions of the HBase unit, and for reallocating the Regions through the coprocessor module of the HBase unit; the monitor is connected to the master unit.
Further, monitoring the load of the Regions of the HBase unit and reallocating the Regions through the coprocessor module of the HBase unit comprises:
receiving data distribution information of the HBase unit, and receiving, from the master unit, load information of the worker unit, wherein the load information comprises a load deviation value of the worker unit;
comparing the load deviation value of the worker unit with a preset load deviation threshold, and, if the load deviation value exceeds the threshold, triggering the HBase unit to redistribute the Regions on the server with the higher hit rate and the Regions on the server with the lower hit rate;
acquiring the data volume of each Region, comparing it with a preset data volume threshold, and, if the data volume of a Region exceeds the threshold, triggering the HBase unit to split that Region into two Regions.
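The two threshold comparisons performed by the monitor can be sketched as follows. The deviation metric (maximum relative deviation from the mean hit count), the threshold values, and the action tuples are assumptions made for illustration; the invention does not specify how the load deviation value is computed.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    size_mb: int

def load_deviation(server_hits):
    """Hypothetical metric: largest relative deviation from the mean hit count."""
    mean = sum(server_hits.values()) / len(server_hits)
    if mean == 0:
        return 0.0
    return max(abs(hits - mean) for hits in server_hits.values()) / mean

def plan_actions(server_hits, regions, deviation_threshold, size_threshold_mb):
    """Return the actions the monitor would ask the HBase unit to perform."""
    actions = []
    # Check 1: load deviation against the preset load deviation threshold.
    if load_deviation(server_hits) > deviation_threshold:
        busiest = max(server_hits, key=server_hits.get)
        idlest = min(server_hits, key=server_hits.get)
        actions.append(("rebalance", busiest, idlest))
    # Check 2: each Region's data volume against the preset volume threshold.
    for region in regions:
        if region.size_mb > size_threshold_mb:
            actions.append(("split", region.name))
    return actions

actions = plan_actions(
    server_hits={"server1": 90, "server2": 10},
    regions=[Region("region1", 12000), Region("region2", 3000)],
    deviation_threshold=0.5,
    size_threshold_mb=10240,
)
```

With these sample values the hit counts deviate from their mean by 80 percent and one Region exceeds the size threshold, so both a redistribution and a split are requested.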
Further, the JDBC/ODBC interface unit comprises:
a JDBC application module, configured to receive the user request, call JDBC object methods to submit an SQL statement, and extract the result to return to the user;
a JDBC driver manager module, configured to load and invoke the JDBC driver module on behalf of the JDBC application module;
a JDBC driver module, configured to execute the calls to the JDBC object methods, send the SQL statement corresponding to the user request to the underlying database, and return the result obtained from the underlying database to the JDBC application module.
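The division of labour among the three modules can be sketched as follows. This is a language-neutral illustration of the layering written in Python, not the Java JDBC API; the class names, the "mem:" URL prefix, and the in-memory driver are hypothetical.

```python
class InMemoryDriver:
    """Driver module: talks to the underlying database (here a stub)."""

    def accepts(self, url):
        # Hypothetical URL scheme used only for this sketch.
        return url.startswith("mem:")

    def execute(self, url, sql):
        # A real driver would send the SQL to the database and read the result.
        return [f"executed on {url}: {sql}"]

class DriverManager:
    """Driver manager module: loads drivers and selects one for the application."""

    def __init__(self):
        self._drivers = []

    def register(self, driver):
        self._drivers.append(driver)

    def get_driver(self, url):
        for driver in self._drivers:
            if driver.accepts(url):
                return driver
        raise LookupError(f"no suitable driver for {url}")

def run_query(manager, url, sql):
    """Application module: submits the SQL statement and returns the result."""
    return manager.get_driver(url).execute(url, sql)

manager = DriverManager()
manager.register(InMemoryDriver())
rows = run_query(manager, "mem:demo", "SELECT doc FROM orders")
```

The application module never references a concrete driver; it depends only on the driver manager, which is what allows the underlying database to be swapped.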
Compared with the prior art, the semi-structured data query method and the distributed NewSQL database system provided by the invention receive a user request through a JDBC/ODBC interface, wherein the user request comprises a query condition for the JSON data to be queried, and the query result is the JSON data obtained according to the query condition; parse and compile the user request to generate a corresponding execution plan; acquire, according to the execution plan, index data corresponding to the query condition of the user request, wherein an index table stores the index data in the form of an inverted index generated by treating the JSON data as a nested type; query a data table according to the acquired index data to obtain the corresponding query result, wherein the JSON data is stored as a whole; and return the query result to the user. This technical scheme enables queries over data in JSON format and addresses the poor effectiveness and poor performance of processing semi-structured data.
Drawings
Fig. 1 is a schematic flowchart of a method for semi-structured data query according to embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of a distributed NewSQL database system provided in embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a method for semi-structured data query according to embodiment 1 of the present invention; the method is applicable to a distributed NewSQL database system, and embodiment 1 comprises the following steps:
S1, receiving a user request through a JDBC/ODBC interface, wherein the user request comprises a query condition for the JSON data to be queried, and the query result is the JSON data obtained according to the query condition;
S2, parsing and compiling the user request to generate a corresponding execution plan;
S3, acquiring, according to the execution plan, index data corresponding to the query condition of the user request; wherein an index table stores the index data in the form of an inverted index generated by treating the JSON data as a nested type;
S4, querying a data table according to the acquired index data to obtain the corresponding query result; wherein the JSON data is stored as a whole;
and S5, returning the query result to the user.
In the prior art, data stored in HBase has no data types and consists only of byte arrays, so storing semi-structured JSON data causes problems at query time. To store JSON-format data in HBase, the entire JSON object is conventionally stored as a string. This approach has the following drawbacks: when records are to be filtered, all records must be read out and then filtered at the client, and the performance is unacceptable for large data volumes; when a record needs to be updated, it must be read out, modified in the specific field, and then written back to HBase to overwrite the old record. This embodiment, by contrast, supports semi-structured data: a user can directly store data in JSON format, query any field of the JSON, create an index, and delete data. This solves the prior-art problem of poor effectiveness and performance when HBase processes semi-structured data.
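The user-facing operations named in this paragraph (store JSON directly, query a field, create an index, delete data) can be sketched as follows. The Table class, the dotted-path notation, and the restriction of indexes to scalar fields are assumptions made for this illustration and do not describe the storage format of the embodiment.

```python
import json

def scalar_at(doc, path):
    """Return the scalar at a dotted path such as 'address.city', else None."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, (str, int, float, bool)) else None

class Table:
    def __init__(self):
        self.rows = {}     # row key -> whole JSON string
        self.indexes = {}  # indexed path -> {value -> set of row keys}

    def put(self, key, doc):
        if key in self.rows:          # overwrite: drop stale index entries first
            self.delete(key)
        self.rows[key] = json.dumps(doc)
        for path, index in self.indexes.items():
            value = scalar_at(doc, path)
            if value is not None:
                index.setdefault(value, set()).add(key)

    def create_index(self, path):
        """Build an index over rows that are already stored."""
        index = {}
        for key, raw in self.rows.items():
            value = scalar_at(json.loads(raw), path)
            if value is not None:
                index.setdefault(value, set()).add(key)
        self.indexes[path] = index

    def delete(self, key):
        """Remove the row and every index entry that points to it."""
        doc = json.loads(self.rows.pop(key))
        for path, index in self.indexes.items():
            value = scalar_at(doc, path)
            if value is not None and value in index:
                index[value].discard(key)
                if not index[value]:
                    del index[value]

table = Table()
table.put("r1", {"name": "alice", "address": {"city": "Beijing"}})
table.put("r2", {"name": "bob", "address": {"city": "Shanghai"}})
table.create_index("address.city")
table.delete("r1")
```

Deleting a row also removes its index entries, so a later lookup through the index cannot return a row key that no longer exists in the data table.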
Further, step S1 further includes: converting the user request into an SQL request in the form of an SQL statement.
Further, parsing and compiling the user request to generate the corresponding execution plan in step S2 includes:
S21, determining whether a pre-stored SQL statement corresponding to the SQL request exists in the shared cache pool; if so, outputting the execution plan corresponding to the pre-stored SQL statement; otherwise,
S22, performing a syntax check on the SQL request; if there is a syntax error, returning error information to the user; otherwise,
S23, performing a semantic check on the SQL request; if there is a semantic error, returning error information to the user; otherwise,
S24, performing view and expression conversion on the SQL request to obtain a corresponding conversion result;
S25, selecting an optimizer according to the conversion result to obtain a corresponding optimizer selection result;
S26, selecting a corresponding data connection mode and connection order according to the optimizer selection result;
S27, selecting a search path according to the connection mode and the connection order;
and S28, generating an execution plan according to the search path, and outputting the execution plan.
In a specific implementation, a user request is first received through a JDBC/ODBC interface, and the user request is then parsed and compiled to generate a corresponding execution plan; next, according to the execution plan, index data corresponding to the query condition of the user request is acquired, wherein an index table stores the index data in the form of an inverted index generated by treating the JSON data as a nested type; a data table is then queried according to the acquired index data to obtain the corresponding query result, wherein the JSON data is stored as a whole; finally, the query result is returned to the user.
This embodiment enables queries over data in JSON format and addresses the poor effectiveness and performance of processing semi-structured data.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a distributed NewSQL database system according to embodiment 2 of the present invention, where the embodiment includes:
the JDCB/ODBC interface unit 1 is used for carrying out interactive operation with a user, and comprises the steps of receiving a user request and returning a query result to the user; the user request comprises a query condition of JSON data to be queried, and the query result is the JSON data obtained according to the query condition;
the master unit 2 is used for accessing a user request accessed by the JDCB/ODBC interface unit 1, coordinating data communication among a plurality of processors and managing the whole process, and preferentially sending the user request to the SQLPLanner unit 3; the master unit 2 is also used for returning the query result to the JDCB/ODBC interface unit;
the SQLPLaner unit 3 is used for analyzing the user request, compiling and customizing an execution plan according to the user request;
a worker unit 4 for executing the plan in parallel, comprising: according to an execution plan, starting a coprocessor module to obtain index data corresponding to the query conditions requested by the user, and querying a data table according to the obtained index data so as to obtain the corresponding query result; the Hbase unit is also used for returning the query result of the Hbase unit to the master unit 2;
an Hbase unit 6, configured to store the data table and the index table; the Hbase unit 6 further comprises the coprocessor module 61, wherein the bottom layer of the Hbase unit 6 is augmented with JSON type data, which is stored in its entirety in the bottom layer HFile;
In general, the distributed NewSQL database system of this embodiment allows a user to create secondary indexes flexibly according to specific business logic. In practice a user often creates several secondary indexes; at query time the system dynamically calculates the cost of using each index according to the query conditions and automatically selects the most suitable one. Because lookups by rowkey are extremely efficient, the secondary index is implemented by using the coprocessor module 61 and the Filter module 62 of the Hbase unit 6 to generate an index table for the data.
a distributed transaction manager 5, configured to coordinate the participating parties to complete distributed transaction management when the execution plan executed by the worker unit 4 involves a transaction.
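The automatic choice among several secondary indexes mentioned in this embodiment can be sketched as follows. The description does not give the cost formula, so the model used here (expected rows matched by an equality condition, derived from a per-index distinct-value count) is purely an assumption, as are the names `SecondaryIndex`, `estimate_cost` and `choose_index`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondaryIndex:
    name: str
    column: str           # column covered by the index
    distinct_values: int  # hypothetical statistic kept for each index


def estimate_cost(index, table_rows):
    """Assumed cost model: expected rows matched by an equality condition."""
    return table_rows / max(index.distinct_values, 1)


def choose_index(indexes, query_columns, table_rows):
    """Return the cheapest index usable for the query, or None for a full scan."""
    usable = [ix for ix in indexes if ix.column in query_columns]
    if not usable:
        return None
    return min(usable, key=lambda ix: estimate_cost(ix, table_rows))


indexes = [
    SecondaryIndex("idx_city", "city", 100),
    SecondaryIndex("idx_email", "email", 900_000),
    SecondaryIndex("idx_age", "age", 80),
]
best = choose_index(indexes, {"city", "email"}, table_rows=1_000_000)
```

Under this assumed model the more selective index wins, because it narrows the rowkeys to fetch from the data table the most.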
Further, the JDBC/ODBC interface unit 1 is also configured to convert the user request into an SQL request in the form of an SQL statement.
Further, the SQLPlanner unit 3 is configured for:
determining whether a pre-stored SQL statement corresponding to the SQL request exists in the shared cache pool; if so, outputting the execution plan corresponding to the pre-stored SQL statement; otherwise,
performing a syntax check on the SQL request; if there is a syntax error, returning error information to the user; otherwise,
performing a semantic check on the SQL request; if there is a semantic error, returning error information to the user; otherwise,
performing view and expression conversion on the SQL request to obtain a corresponding conversion result;
selecting an optimizer according to the conversion result to obtain a corresponding optimizer selection result;
selecting a corresponding data join method and join order according to the optimizer selection result;
selecting a search path according to the join method and the join order;
and generating an execution plan according to the search path, and outputting the execution plan.
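The planning steps listed above can be sketched in a few lines. This is a toy under stated assumptions, not the SQLPlanner unit itself: it accepts only single-table `SELECT` statements with an optional equality condition, uses a dictionary `PLAN_CACHE` in place of the shared cache pool, checks names against a hypothetical `CATALOG`, and collapses the conversion, optimizer, join and path-selection steps into one choice of access path.

```python
import re

CATALOG = {"orders": {"id", "doc"}}  # hypothetical schema: table -> columns
PLAN_CACHE = {}                      # stands in for the shared cache pool

PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>\*|\w+(?:\s*,\s*\w+)*)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


class PlanError(Exception):
    """Raised when a statement fails the syntax or semantic check."""


def make_plan(sql):
    # Cache lookup: reuse the plan of an identical, previously planned statement.
    if sql in PLAN_CACHE:
        return PLAN_CACHE[sql]
    # Syntax check: the statement must match the tiny grammar above.
    match = PATTERN.match(sql)
    if match is None:
        raise PlanError("syntax error")
    # Semantic check: the table and every referenced column must exist.
    table = match["table"].lower()
    if table not in CATALOG:
        raise PlanError(f"unknown table {table}")
    cols = [c.strip().lower() for c in match["cols"].split(",")]
    referenced = [c for c in cols if c != "*"]
    if match["col"]:
        referenced.append(match["col"].lower())
    unknown = [c for c in referenced if c not in CATALOG[table]]
    if unknown:
        raise PlanError(f"unknown column {unknown[0]}")
    # Remaining steps collapsed: pick an access path and emit the plan.
    has_filter = match["col"] is not None
    plan = {
        "table": table,
        "columns": cols,
        "access_path": "index_lookup" if has_filter else "full_scan",
        "filter": (match["col"].lower(), match["val"]) if has_filter else None,
    }
    PLAN_CACHE[sql] = plan
    return plan


plan = make_plan("SELECT doc FROM orders WHERE id = '7'")
```

Checking the cache before any parsing is what makes repeated statements cheap: a second call with the same text returns the stored plan without re-running the checks.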
Further, this embodiment also includes:
a monitor 8, responsible for metadata management, for monitoring the load on the Regions of the Hbase unit 6, and for reallocating Regions through the coprocessor module 61 of the Hbase unit 6; the monitor 8 is connected to the master unit 2.
Further, monitoring the load on the Regions of the Hbase unit 6 and reallocating Regions through the coprocessor module 61 of the Hbase unit 6 includes:
receiving data distribution information of the Hbase unit 6, and receiving the load information of the worker unit 4 held in the master unit 2, wherein the load information comprises a load deviation value of the worker unit 4;
comparing the load deviation value of the worker unit 4 with a preset load deviation threshold, and if the load deviation value exceeds the threshold, triggering the Hbase unit 6 to redistribute the Regions between the servers with higher hit rates and the servers with lower hit rates;
acquiring the data volume of each Region and comparing it with a preset data volume threshold, and if the data volume of a Region exceeds the threshold, triggering the Hbase unit 6 to split that Region into two Regions.
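The two threshold checks above can be sketched as a function that only decides which actions to trigger. The description does not define how the load deviation value is computed, so the metric below (largest relative distance of a server's hit count from the mean) is an assumption, and `load_deviation`, `plan_rebalance` and the action tuples are hypothetical names.

```python
def load_deviation(loads):
    """Assumed metric: largest relative distance of any server's load from the mean."""
    mean = sum(loads.values()) / len(loads)
    if mean == 0:
        return 0.0
    return max(abs(v - mean) for v in loads.values()) / mean


def plan_rebalance(loads, region_sizes, deviation_threshold, size_threshold):
    """Return the list of actions the monitor would ask the storage layer to perform."""
    actions = []
    # Check 1: if load is too uneven, move a Region from the busiest server to the idlest.
    if load_deviation(loads) > deviation_threshold:
        hot = max(loads, key=loads.get)
        cold = min(loads, key=loads.get)
        actions.append(("move_region", hot, cold))
    # Check 2: any Region above the size threshold is split into two.
    for region, size in sorted(region_sizes.items()):
        if size > size_threshold:
            actions.append(("split_region", region))
    return actions


actions = plan_rebalance(
    loads={"rs1": 900, "rs2": 100, "rs3": 200},   # hits per server (illustrative numbers)
    region_sizes={"regionA": 12, "regionB": 3},   # sizes in GB (illustrative numbers)
    deviation_threshold=0.5,
    size_threshold=10,
)
```

Keeping the decision separate from the action makes the thresholds easy to tune, since the function has no side effects on the cluster.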
Further, the JDBC/ODBC interface unit 1 includes:
a JDBC application module 11, configured to receive a user request, call a JDBC object method to submit an SQL statement, and extract the result to return to the user;
a JDBC driver manager module 12, configured to load and call a JDBC driver module 13 for the JDBC application module 11;
a JDBC driver module 13, configured to execute the call to the JDBC object method, send the SQL statement corresponding to the user request to the underlying database, and return the result obtained from the underlying database to the JDBC application module 11.
In a specific implementation, a user request is first received through the JDBC/ODBC interface unit 1. The master unit 2 then accepts the user request received by the JDBC/ODBC interface unit 1, coordinates data communication among a plurality of processors, manages the whole process, and sends the user request first to the SQLPlanner unit 3. The SQLPlanner unit 3 parses and compiles the user request and formulates an execution plan according to it. The worker unit 4 then executes the plan in parallel: the coprocessor module 61 of the Hbase unit 6 is started to obtain index data corresponding to the query condition of the user request, and the data table is queried according to the obtained index data to obtain the corresponding query result. A JSON data type is added at the bottom layer of the Hbase unit 6, and the JSON data is stored as a whole in the underlying HFile. Finally, the query result of the Hbase unit 6 is returned to the master unit 2, and the master unit 2 returns the query result containing the JSON data to the JDBC/ODBC interface unit 1, which returns it to the user.
The distributed NewSQL database system of this embodiment enables queries on data in JSON format and addresses the poor results and poor performance encountered when processing semi-structured data.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (5)

1. A semi-structured data query method, applicable to a distributed NewSQL database system, comprising the following steps:
receiving a user request through a JDBC/ODBC interface and converting the user request into an SQL request in the form of an SQL statement, wherein the user request comprises a query condition for the JSON data to be queried, and the query result is the JSON data obtained according to the query condition;
parsing and compiling the user request and generating a corresponding execution plan; specifically,
determining whether a pre-stored SQL statement corresponding to the SQL request exists in the shared cache pool; if so, outputting the execution plan corresponding to the pre-stored SQL statement; otherwise,
performing a syntax check on the SQL request; if there is a syntax error, returning error information to the user; if the syntax check finds no error,
performing a semantic check on the SQL request; if there is a semantic error, returning error information to the user; if the semantic check finds no error,
performing view and expression conversion on the SQL request to obtain a corresponding conversion result;
selecting an optimizer according to the conversion result to obtain a corresponding optimizer selection result;
selecting a corresponding data join method and join order according to the optimizer selection result;
selecting a search path according to the join method and the join order;
generating an execution plan according to the search path, and outputting the execution plan;
obtaining, according to the execution plan, index data corresponding to the query condition of the user request; wherein the index table stores the index data in the form of an inverted index generated from the JSON data treated as a nested type;
querying a data table according to the obtained index data to obtain the corresponding query result; wherein the JSON data is stored as a whole;
and returning the query result to the user.
2. A distributed NewSQL database system, comprising:
a JDBC/ODBC interface unit, configured to interact with a user, including receiving a user request, converting the user request into an SQL request in the form of an SQL statement, and returning a query result to the user; the user request comprises a query condition for the JSON data to be queried, and the query result is the JSON data obtained according to the query condition;
a master unit, configured to accept the user request received by the JDBC/ODBC interface unit, to coordinate data communication among a plurality of processors and manage the whole process, and to send the user request first to the SQLPlanner unit; the master unit is also configured to return the query result to the JDBC/ODBC interface unit;
an SQLPlanner unit, configured to parse and compile the user request and to formulate an execution plan according to the user request; specifically,
determining whether a pre-stored SQL statement corresponding to the SQL request exists in the shared cache pool; if so, outputting the execution plan corresponding to the pre-stored SQL statement; otherwise,
performing a syntax check on the SQL request; if there is a syntax error, returning error information to the user; if the syntax check finds no error,
performing a semantic check on the SQL request; if there is a semantic error, returning error information to the user; if the semantic check finds no error,
performing view and expression conversion on the SQL request to obtain a corresponding conversion result;
selecting an optimizer according to the conversion result to obtain a corresponding optimizer selection result;
selecting a corresponding data join method and join order according to the optimizer selection result;
selecting a search path according to the join method and the join order;
generating an execution plan according to the search path, and outputting the execution plan;
a worker unit, configured to execute the plan in parallel, including: according to the execution plan, starting a coprocessor module to obtain index data corresponding to the query condition of the user request, and querying a data table according to the obtained index data to obtain the corresponding query result; the worker unit is also configured to return the query result of the Hbase unit to the master unit;
an Hbase unit, configured to store the data table and the index table; wherein the index table stores the index data in the form of an inverted index generated from the JSON data treated as a nested type; the Hbase unit further comprises the coprocessor module, a JSON data type is added at the bottom layer of the Hbase unit, and the JSON data is stored as a whole in the underlying HFile;
and a distributed transaction manager, configured to coordinate the participating parties to complete distributed transaction management when the execution plan executed by the worker unit involves a transaction.
3. The distributed NewSQL database system according to claim 2, further comprising:
a monitor, responsible for metadata management, for monitoring the load on the Regions of the Hbase unit, and for reallocating Regions through the coprocessor module of the Hbase unit; the monitor is connected to the master unit.
4. The distributed NewSQL database system according to claim 3, wherein monitoring the load on the Regions of the Hbase unit and reallocating Regions through the coprocessor module of the Hbase unit includes:
receiving data distribution information of the Hbase unit, and receiving the load information of the worker unit held in the master unit, wherein the load information comprises a load deviation value of the worker unit;
comparing the load deviation value of the worker unit with a preset load deviation threshold, and if the load deviation value exceeds the threshold, triggering the Hbase unit to redistribute the Regions between the servers with higher hit rates and the servers with lower hit rates;
acquiring the data volume of each Region and comparing it with a preset data volume threshold, and if the data volume of a Region exceeds the threshold, triggering the Hbase unit to split that Region into two Regions.
5. The distributed NewSQL database system of claim 2, wherein the JDBC/ODBC interface unit comprises:
a JDBC application module, configured to receive the user request, call a JDBC object method to submit an SQL statement, and extract the result to return to the user;
a JDBC driver manager module, configured to load and call the JDBC driver module for the JDBC application module;
a JDBC driver module, configured to execute the call to the JDBC object method, send the SQL statement corresponding to the user request to the underlying database, and return the result obtained from the underlying database to the JDBC application module.
CN201710580416.9A 2016-09-21 2017-07-17 Semi-structured data query method and distributed NewSQL database system Expired - Fee Related CN107291947B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610842399.7A CN106446153A (en) 2016-09-21 2016-09-21 Distributed newSQL database system and method
CN2016108423997 2016-09-21

Publications (2)

Publication Number Publication Date
CN107291947A CN107291947A (en) 2017-10-24
CN107291947B true CN107291947B (en) 2020-03-10

Family

ID=58166840

Family Applications (24)

Application Number Title Priority Date Filing Date
CN201610842399.7A Pending CN106446153A (en) 2016-09-21 2016-09-21 Distributed newSQL database system and method
CN201710580423.9A Active CN107402987B (en) 2016-09-21 2017-07-17 Full-text retrieval method and distributed NewSQL database system
CN201710581291.1A Expired - Fee Related CN107463637B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and data storage method
CN201710581195.7A Expired - Fee Related CN107451220B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system
CN201710580431.3A Active CN107491485B (en) 2016-09-21 2017-07-17 Method for generating execution plan, plan unit device and distributed NewSQL database system
CN201710580456.3A Expired - Fee Related CN107402988B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and semi-structured data query method
CN201710580739.8A Expired - Fee Related CN107402990B (en) 2016-09-21 2017-07-17 Distributed New SQL database system and semi-structured data storage method
CN201710581193.8A Expired - Fee Related CN107451219B (en) 2016-09-21 2017-07-17 Method for analyzing second index and distributed New SQL database
CN201710581237.7A Expired - Fee Related CN107463635B (en) 2016-09-21 2017-07-17 Method for inquiring picture data and distributed NewSQL database system
CN201710580416.9A Expired - Fee Related CN107291947B (en) 2016-09-21 2017-07-17 Semi-structured data query method and distributed NewSQL database system
CN201710581273.3A Expired - Fee Related CN107451221B (en) 2016-09-21 2017-07-17 Database interface unit device and distributed NewSQL database system
CN201710580403.1A Expired - Fee Related CN107368575B (en) 2016-09-21 2017-07-17 Load-balanced distributed NewSQL database system
CN201710580791.3A Active CN107291948B (en) 2016-09-21 2017-07-17 Access method of distributed newSQL database
CN201710580435.1A Expired - Fee Related CN107480198B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and full-text retrieval method
CN201710580752.3A Expired - Fee Related CN107247808B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and picture data query method
CN201710581229.2A Expired - Fee Related CN107491345B (en) 2016-09-21 2017-07-17 Method for writing picture data and distributed NewSQL database system
CN201710580720.3A Expired - Fee Related CN107402989B (en) 2016-09-21 2017-07-17 Full-text retrieval establishing method and distributed NewSQL database system
CN201710580754.2A Expired - Fee Related CN107402991B (en) 2016-09-21 2017-07-17 Method for writing semi-structured data and distributed NewSQL database system
CN201710581275.2A Active CN107329837B (en) 2016-09-21 2017-07-17 Load balancing method and unit and distributed NewSQL database system
CN201710580417.3A Expired - Fee Related CN107463632B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and data query method
CN201710580796.6A Expired - Fee Related CN107402992B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and full-text retrieval establishing method
CN201710585103.2A Expired - Fee Related CN107402995B (en) 2016-09-21 2017-07-17 Distributed newSQL database system and method
CN201710581256.XA Expired - Fee Related CN107391653B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and picture data storage method
CN201710580794.7A Expired - Fee Related CN107451214B (en) 2016-09-21 2017-07-17 Non-primary key query method and distributed NewSQL database system


Country Status (1)

Country Link
CN (24) CN106446153A (en)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391744B (en) * 2017-08-10 2020-06-16 东软集团股份有限公司 Data storage method, data reading method, data storage device, data reading device and equipment
CN107480260B (en) * 2017-08-16 2021-02-23 北京奇虎科技有限公司 Big data real-time analysis method and device, computing equipment and computer storage medium
CN107688660B (en) * 2017-09-08 2020-03-13 上海达梦数据库有限公司 Parallel execution plan execution method and device
CN107766572A (en) * 2017-11-13 2018-03-06 北京国信宏数科技有限责任公司 Distributed extraction and visual analysis method and system based on economic field data
CN108228750A (en) * 2017-12-21 2018-06-29 浪潮软件股份有限公司 A kind of distributed data base and its method that data are managed
CN108038215A (en) * 2017-12-22 2018-05-15 上海达梦数据库有限公司 Data processing method and system
CN109992409B (en) * 2018-01-02 2021-07-30 ***通信有限公司研究院 Method, device and system for segmenting data storage area, electronic equipment and medium
CN108829507B (en) * 2018-03-30 2019-07-26 北京百度网讯科技有限公司 The resource isolation method, apparatus and server of distributed data base system
CN108664616A (en) * 2018-05-14 2018-10-16 浪潮软件集团有限公司 ROWID-based Oracle data batch acquisition method
CN108846044A (en) * 2018-05-30 2018-11-20 浪潮软件股份有限公司 A kind of map application dispositions method and device
CN108920519A (en) * 2018-06-04 2018-11-30 贵州数据宝网络科技有限公司 One-to-many data supply system and method
CN109033209B (en) * 2018-06-29 2021-12-31 新华三大数据技术有限公司 Spark storage process processing method and device
CN109241076A (en) * 2018-08-01 2019-01-18 上海依图网络科技有限公司 A kind of data query method and device
CN109271428A (en) * 2018-09-11 2019-01-25 北京市计算中心 Data pick-up method and method for exhibiting data based on geography information
CN109408591B (en) * 2018-10-12 2021-11-09 北京聚云位智信息科技有限公司 Decision-making distributed database system supporting SQL (structured query language) driven AI (Artificial Intelligence) and feature engineering
CN109298976B (en) * 2018-10-17 2022-04-12 成都索贝数码科技股份有限公司 Heterogeneous database cluster backup system and method
CN109408515A (en) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 A kind of index execution method and apparatus
CN109684412A (en) * 2018-12-25 2019-04-26 成都虚谷伟业科技有限公司 A kind of distributed data base system
CN109726250B (en) * 2018-12-27 2020-01-17 星环信息科技(上海)有限公司 Data storage system, metadata database synchronization method and data cross-domain calculation method
CN111488340B (en) * 2019-01-29 2023-09-12 菜鸟智能物流控股有限公司 Data processing method and device and electronic equipment
CN110046161A (en) * 2019-03-18 2019-07-23 平安普惠企业管理有限公司 Method for writing data and device, storage medium, electronic equipment
CN110086602B (en) * 2019-04-16 2022-02-11 上海交通大学 Rapid implementation method of SM3 password hash algorithm based on GPU
CN110110234B (en) * 2019-05-13 2020-10-16 重庆天蓬网络有限公司 Big data real-time searching system and method
CN110275901B (en) * 2019-06-25 2021-08-24 北京创鑫旅程网络技术有限公司 Cache data calling method and device
CN110457363B (en) * 2019-07-05 2023-11-21 中国平安人寿保险股份有限公司 Query method, device and storage medium based on distributed database
CN110413642B (en) * 2019-08-02 2022-05-27 北京快立方科技有限公司 Application-unaware fragmentation database parsing and optimizing method
CN110569257B (en) * 2019-09-16 2022-04-01 上海达梦数据库有限公司 Data processing method, corresponding device, equipment and storage medium
CN110704437B (en) * 2019-09-26 2022-05-20 上海达梦数据库有限公司 Method, device, equipment and storage medium for modifying database query statement
CN112688976A (en) * 2019-10-17 2021-04-20 广州迈安信息科技有限公司 Data processing transmission service system adopting JDBC/HTTP standard
CN110888919B (en) * 2019-12-04 2023-06-30 阳光电源股份有限公司 HBase-based method and device for statistical analysis of big data
CN113032479A (en) * 2019-12-24 2021-06-25 上海昂创信息技术有限公司 HBase non-primary key indexing method and HBase system
CN111309581B (en) * 2020-02-28 2023-09-12 中国工商银行股份有限公司 Application performance detection method and device in database upgrading scene
CN111651453B (en) * 2020-04-30 2024-02-06 中国平安财产保险股份有限公司 User history behavior query method and device, electronic equipment and storage medium
CN113760960A (en) * 2020-06-01 2021-12-07 北京搜狗科技发展有限公司 Information generation method and device for generating information
CN111797112B (en) * 2020-06-05 2022-04-01 武汉大学 PostgreSQL preparation statement execution optimization method
CN113806611A (en) * 2020-06-17 2021-12-17 海信集团有限公司 Method and equipment for storing search engine results
CN111930705B (en) * 2020-07-07 2023-03-14 中国电子科技集团公司电子科学研究院 Binary message protocol data processing method and device
CN112148792B (en) * 2020-09-16 2024-04-12 鹏城实验室 Partition data adjustment method, system and terminal based on HBase
CN112052347B (en) * 2020-10-09 2024-06-04 北京百度网讯科技有限公司 Image storage method and device and electronic equipment
CN112416925B (en) * 2020-11-02 2024-04-09 浙商银行股份有限公司 Query method based on ordered distributed index structure and distributed database system
CN112364033B (en) * 2021-01-13 2021-04-13 北京云真信科技有限公司 Data retrieval system
CN113760900A (en) * 2021-02-19 2021-12-07 西安京迅递供应链科技有限公司 Method and device for real-time data summarization and interval summarization
CN112905615B (en) * 2021-03-02 2023-03-24 浪潮云信息技术股份公司 Distributed consistency protocol submission method and system based on sequence verification
CN112925841B (en) * 2021-03-26 2022-11-08 瀚高基础软件股份有限公司 Distributed JDBC implementation method, device and computer-readable storage medium
CN113407662B (en) * 2021-08-19 2021-12-14 深圳市明源云客电子商务有限公司 Sensitive word recognition method, system and computer readable storage medium
CN113742370B (en) * 2021-11-02 2022-04-19 阿里云计算有限公司 Data query method and statistical information ciphertext generation method of full-encryption database
CN115129724A (en) * 2022-08-29 2022-09-30 畅捷通信息技术股份有限公司 Statistical report paging method, system, equipment and medium
CN116861455B (en) * 2023-06-25 2024-04-26 上海数禾信息科技有限公司 Event data processing method, system, electronic device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477568A (en) * 2009-02-12 2009-07-08 清华大学 Integrated retrieval method for structured data and non-structured data
CN104794123A (en) * 2014-01-20 2015-07-22 阿里巴巴集团控股有限公司 Method and device for establishing NoSQL database index for semi-structured data
CN105518676A (en) * 2013-07-31 2016-04-20 甲骨文国际公司 Generic sql enhancement to query any semi-structured data and techniques to efficiently support such enhancements

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101567006B (en) * 2009-05-25 2012-07-04 中兴通讯股份有限公司 Database system and distributed SQL statement execution plan reuse method
CN102163195B (en) * 2010-02-22 2013-04-24 北京东方通科技股份有限公司 Query optimization method based on unified view of distributed heterogeneous database
CN102375853A (en) * 2010-08-24 2012-03-14 ***通信集团公司 Distributed database system, method for building index therein and query method
CN102201010A (en) * 2011-06-23 2011-09-28 清华大学 Distributed database system without sharing structure and realizing method thereof
CN102289482A (en) * 2011-08-02 2011-12-21 北京航空航天大学 Unstructured data query method
CN103150304B (en) * 2011-12-06 2016-11-23 郑红云 Cloud Database Systems
CN103577407B (en) * 2012-07-19 2016-10-12 国际商业机器公司 Querying method and inquiry unit for distributed data base
US20140074860A1 (en) * 2012-09-12 2014-03-13 Pingar Holdings Limited Disambiguator
CN102902932B (en) * 2012-09-18 2015-12-02 武汉华工安鼎信息技术有限责任公司 The using method of the outside encrypting and deciphering system of the database based on SQL rewrite
CN103092970A (en) * 2013-01-24 2013-05-08 华为技术有限公司 Database operation method and device
US9773021B2 (en) * 2013-01-30 2017-09-26 Hewlett-Packard Development Company, L.P. Corrected optical property value-based search query
CN103377292B (en) * 2013-07-02 2017-02-15 华为技术有限公司 Database result set caching method and device
CN103473321A (en) * 2013-09-12 2013-12-25 华为技术有限公司 Database management method and system
CN103984726B (en) * 2014-05-16 2017-03-29 上海新炬网络信息技术有限公司 A kind of local correction method of data base's implement plan
CN104133858B (en) * 2014-07-15 2017-08-01 武汉邮电科学研究院 Intelligence analysis system with double engines and method based on row storage
CN104503985A (en) * 2014-12-03 2015-04-08 浪潮电子信息产业股份有限公司 Method for automatically creating Solr index file by Hbase data
CN104572895B (en) * 2014-12-24 2018-02-23 天津南大通用数据技术股份有限公司 MPP databases and Hadoop company-datas interoperability methods, instrument and implementation method
CN104731922A (en) * 2015-03-26 2015-06-24 江苏物联网研究发展中心 System and method for rapidly retrieving structural data based on distributed type database HBase
CN104750815B (en) * 2015-03-30 2017-11-03 浪潮集团有限公司 The storage method and device of a kind of Lob data based on HBase
CN104731945B (en) * 2015-03-31 2018-04-06 浪潮集团有限公司 A kind of text searching method and device based on HBase
CN105389375B (en) * 2015-11-18 2018-10-02 福建师范大学 A kind of image index setting method, system and search method based on visible range
CN105740410A (en) * 2016-01-29 2016-07-06 浪潮电子信息产业股份有限公司 Data statistics method based on Hbase secondary index

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Apache Phoenix;@ApachePhoenix;《Apache.org》;20141029;第1-59页 *
Phoenix;James Taylor;《Apache.org》;20141029;第42、88-94页 *
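Research on HBase-Based Full-Text Indexing and Retrieval Technology (基于HBase的全文索引及检索技术的研究); Wu Guoquan (吴国泉); Wanfang Database Dissertations (《万方数据库学位论文》); 20160504 *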

Also Published As

Publication number Publication date
CN107368575A (en) 2017-11-21
CN107402989B (en) 2020-10-27
CN107451221A (en) 2017-12-08
CN107451214B (en) 2020-05-19
CN107402987B (en) 2020-04-03
CN107402992A (en) 2017-11-28
CN107463632A (en) 2017-12-12
CN107391653B (en) 2020-05-19
CN107391653A (en) 2017-11-24
CN107451220A (en) 2017-12-08
CN107491345A (en) 2017-12-19
CN107247808A (en) 2017-10-13
CN107402989A (en) 2017-11-28
CN107402990A (en) 2017-11-28
CN107402988B (en) 2020-01-03
CN107463635A (en) 2017-12-12
CN107402995B (en) 2020-06-09
CN107463635B (en) 2020-09-25
CN107402987A (en) 2017-11-28
CN107491485B (en) 2020-08-04
CN107463632B (en) 2020-06-09
CN106446153A (en) 2017-02-22
CN107491345B (en) 2020-08-04
CN107451221B (en) 2020-09-04
CN107402995A (en) 2017-11-28
CN107402990B (en) 2020-06-09
CN107480198A (en) 2017-12-15
CN107480198B (en) 2020-05-19
CN107291948B (en) 2020-05-19
CN107451214A (en) 2017-12-08
CN107402988A (en) 2017-11-28
CN107463637A (en) 2017-12-12
CN107491485A (en) 2017-12-19
CN107451220B (en) 2020-06-09
CN107291947A (en) 2017-10-24
CN107329837B (en) 2020-06-09
CN107402991B (en) 2020-05-19
CN107402992B (en) 2020-06-09
CN107451219B (en) 2020-06-09
CN107247808B (en) 2020-01-10
CN107463637B (en) 2020-05-19
CN107451219A (en) 2017-12-08
CN107291948A (en) 2017-10-24
CN107329837A (en) 2017-11-07
CN107368575B (en) 2020-06-09
CN107402991A (en) 2017-11-28

Similar Documents

Publication Publication Date Title
CN107291947B (en) Semi-structured data query method and distributed NewSQL database system
US20230376487A1 (en) Processing database queries using format conversion
KR102157925B1 (en) Data query method and apparatus
JP5819376B2 (en) A column smart mechanism for column-based databases
KR101621137B1 (en) Low latency query engine for apache hadoop
JP2021511582A (en) Dimensional context propagation technology for optimizing SQL query plans
US10970300B2 (en) Supporting multi-tenancy in a federated data management system
US20160140205A1 (en) Queries involving multiple databases and execution engines
US20130311454A1 (en) Data source analytics
CN104123374A (en) Method and device for aggregate query in distributed databases
US7788275B2 (en) Customization of relationship traversal
US10157234B1 (en) Systems and methods for transforming datasets
CN105164673A (en) Query integration across databases and file systems
US11354313B2 (en) Transforming a user-defined table function to a derived table in a database management system
US10997170B2 (en) Local database cache
CN116739336A (en) Power grid disaster early warning method and system based on multi-source heterogeneous data fusion model
CN107368477B (en) HBase coprocessor-based SQL-like query method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200212

Address after: Room 5303, 1023 Gaopu Road, Tianhe Software Park, Tianhe District, Guangzhou City, Guangdong 510000

Applicant after: Yunrun Da Data Service Co.,Ltd.

Address before: 510000 Yuexiu District, Guangzhou Province, north of the text of the text of the North Road, No. 68, the east wing of the text of the building on the ground floor, No. six, No. 602, No.

Applicant before: GUANGZHOU TEDAO INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200310

Termination date: 20210717