CN107402995B - Distributed newSQL database system and method

Publication number
CN107402995B
Authority
CN
China
Prior art keywords
data
index
hbase
master
task executor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710585103.2A
Other languages
Chinese (zh)
Other versions
CN107402995A (en
Inventor
张中弦
谭恒亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunrun Da Data Service Co ltd
Original Assignee
Yunrun Da Data Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Classifications

    • G06F16/2379 Updates performed during online database operations; commit processing
    • G06F16/9017 Indexing; Data structures therefor; Storage structures using directory or table look-up
    • G06F16/334 Query execution
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/2228 Indexing structures
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/2219 Large Object storage; Management thereof
    • G06F16/2272 Management thereof
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/2433 Query languages
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24545 Selectivity estimation or determination
    • G06F16/2455 Query execution
    • G06F16/2471 Distributed queries
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/284 Relational databases
    • G06F16/319 Inverted lists
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/951 Indexing; Web crawling techniques
    • G06F9/466 Transaction processing
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G06F2209/5022 Workload threshold

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Operations Research (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a distributed newSQL database system and method. The system comprises a database interface, a Master, an SQLPLaner, a distributed transaction manager and a parallel task executor. The database interface is used by a user to send a request to the Master and to receive the result returned by the Master. The Master accepts user requests over JDBC or ODBC, coordinates data communication among the parties, manages the overall flow, and first forwards each user request to the SQLPLaner. The SQLPLaner parses the user request, compiles it and draws up an execution plan. The distributed transaction manager coordinates the parties involved in the plan to complete distributed transaction management. The parallel task executor executes in parallel the tasks assigned to it by the plan, then merges and summarizes the data obtained from the database and returns it to the Master. The invention optimizes transaction operation under high concurrency and supports distributed transactions, semi-structured data, full-text retrieval and efficient picture storage.

Description

Distributed newSQL database system and method
Technical Field
The invention relates to the technical field of databases, in particular to a distributed newSQL database system.
Background
HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented and scalable distributed storage system; with HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. HBase has become one of the most widely used distributed NoSQL databases, but as more and more applications attempt to migrate to it, its shortcomings have become increasingly apparent. The main ones are:
High cost of use: users must access HBase by programming against its API, which is too costly for complex applications; the standard JDBC/ODBC interfaces are not supported, which makes the ETL process very complicated. This cost directly prevents many of the more complex applications from using HBase.
Non-primary-key queries are not supported efficiently: in practice users often need multi-dimensional queries, and HBase cannot efficiently serve queries on anything other than the primary key.
Only single-row transactions are supported: in practice a transaction often involves multiple rows in multiple tables, so the single-row transactions provided by HBase cannot meet application requirements.
Semi-structured data is not supported efficiently: the HBase data model is fully structured and cannot effectively support semi-structured data such as JSON.
Picture storage is not supported effectively: in fields such as public security and transportation, users often need to store large amounts of picture data, with a typical picture between 500 KB and 2 MB in size, and practice has shown that HBase cannot effectively meet this storage requirement.
Low availability: at any given time each HBase region is served by exactly one HRegionServer. When an HRegionServer fails, the data of all regions on it is temporarily unavailable until the fault-tolerance mechanism reassigns those regions to other HRegionServers, so the availability of HBase is insufficient for typical online services.
Disclosure of Invention
The invention provides a distributed newSQL database system that supports complex business logic, meets the need for non-primary-key queries, optimizes transaction operation under high concurrency, and supports distributed transactions, semi-structured data, full-text retrieval and efficient picture storage.
To achieve this technical purpose, the invention adopts the following technical scheme:
A distributed newSQL database system comprises:
a database interface, used by a user to send a request to the Master and to receive the result returned by the Master;
the Master, which accepts user requests over JDBC or ODBC, coordinates data communication among the parties, manages the overall flow, and first forwards each user request to the SQLPLaner;
the SQLPLaner, which parses the user request, compiles it and draws up an execution plan;
the distributed transaction manager, which coordinates the parties involved in the plan to complete distributed transaction management;
and the parallel task executor, which executes in parallel the tasks assigned to it by the plan, then merges and summarizes the data obtained from the database and returns it to the Master.
Further improvements to the above scheme are as follows:
The parallel task executor acquires data from the database through the hbase and the search engine server.
The Master is connected to a monitor, which is responsible for metadata management and for monitoring the load of the underlying hbase Regions; to keep the load of any particular Region from becoming too high, it redistributes the Regions using the hbase coprocessor.
The invention also provides a method for generating an execution plan using the SQLPLaner of the distributed newSQL database system, comprising the following steps:
inputting an SQL statement through the database interface;
judging whether the SQL statement already exists in the shared cache pool and, if so, outputting the execution plan corresponding to it;
otherwise, performing a syntax check and a semantic check on the SQL statement and, once both checks pass, performing view and expression conversion on the statement;
selecting an optimizer according to the conversion result;
selecting a join method and a join order according to the optimizer selection;
selecting a search path according to the join method and the join order;
and generating an execution plan according to the search path and outputting the execution plan.
The invention also provides a method for establishing and querying a plurality of secondary indexes using the distributed newSQL database system, comprising the following steps:
generating an index table for the data using the Coprocessor and Filter of the hbase, wherein the coprocessors, according to the index definitions, write index data into the index table in parallel in inverted-index form, thereby establishing a plurality of secondary indexes;
the Master dynamically calculating the cost of using each index according to the query conditions, wherein the coprocessors first query the index table according to the index definition and the query conditions, and then query the data table in parallel using the results from the index table.
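The two steps above can be illustrated with a minimal in-memory sketch. Plain Python dictionaries stand in for the hbase data table and index table, and the cost of an index is assumed to be the number of candidate row keys it returns; the class name `IndexedTable`, its methods and this cost measure are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


class IndexedTable:
    """Data table plus one inverted index per indexed column (in-memory stand-in)."""

    def __init__(self, indexed_columns):
        self.rows = {}  # row key -> row dict (the data table)
        # column -> value -> set of row keys (the index table, inverted form)
        self.indexes = {c: defaultdict(set) for c in indexed_columns}
        self.last_index = None  # index chosen by the most recent query

    def put(self, key, row):
        old = self.rows.get(key)
        if old is not None:  # drop stale index entries on overwrite
            for col, idx in self.indexes.items():
                if col in old:
                    idx[old[col]].discard(key)
        self.rows[key] = row
        for col, idx in self.indexes.items():
            if col in row:
                idx[row[col]].add(key)

    def query(self, conditions):
        """conditions: column -> required value (equality only)."""
        usable = [c for c in conditions if c in self.indexes]
        if usable:
            # Assumed cost model: fewer candidate row keys means cheaper.
            best = min(usable, key=lambda c: len(self.indexes[c].get(conditions[c], ())))
            candidates = self.indexes[best].get(conditions[best], set())
            self.last_index = best
        else:
            candidates = self.rows.keys()  # no index applies: full scan
            self.last_index = None
        # Second lookup: fetch rows from the data table and apply all conditions.
        return sorted(k for k in candidates
                      if all(self.rows[k].get(c) == v for c, v in conditions.items()))
```

With indexes on two columns, a query that constrains both is answered through whichever index yields fewer candidates, and the remaining condition is checked against the rows fetched from the data table.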
The invention also provides a method for implementing semi-structured data access using the distributed newSQL database system, comprising the following steps:
the parallel task executor writes the JSON data as a whole, as an ordinary string-type field, into the data table of the hbase; the coprocessor in the hbase extracts data from the JSON according to the field description and writes the index data into a separate hbase index table in inverted-index form, completing the storage of the semi-structured data;
the parallel task executor queries the index table in parallel through the coprocessor according to the query conditions; the index coprocessor in the hbase returns the index IDs from the index table to the parallel task executor; and the parallel task executor queries the data table through the hbase API (application programming interface) according to the index IDs and returns the result, completing the retrieval of the semi-structured data.
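As a rough illustration of this write path and read path, the sketch below keeps the raw JSON string in one dictionary and an inverted index in another. The class `JsonStore`, the dotted field-path notation and the restriction to scalar values are assumptions made for the example; the patent does not specify how field descriptions are expressed.

```python
import json
from collections import defaultdict


class JsonStore:
    def __init__(self, indexed_paths):
        self.indexed_paths = indexed_paths  # hypothetical field description
        self.data = {}                      # row key -> raw JSON string (data table)
        self.index = defaultdict(set)       # (path, value) -> row keys (index table)

    @staticmethod
    def extract(doc, path):
        """Follow a dotted path such as 'owner.city'; return None if absent."""
        cur = doc
        for part in path.split("."):
            if not isinstance(cur, dict) or part not in cur:
                return None
            cur = cur[part]
        return cur

    def put(self, key, raw_json):
        self.data[key] = raw_json  # stored whole, as an ordinary string
        doc = json.loads(raw_json)
        for path in self.indexed_paths:
            value = self.extract(doc, path)
            if isinstance(value, (str, int, float, bool)):  # index scalars only
                self.index[(path, value)].add(key)

    def find(self, path, value):
        # Step 1: the index table yields row keys; step 2: read the data table.
        keys = sorted(self.index.get((path, value), ()))
        return [json.loads(self.data[k]) for k in keys]
```

The data table never needs to understand the JSON; only the indexing step parses it, which mirrors the division of labour between the parallel task executor and the coprocessor described above.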
The invention also provides a method for implementing picture data access using the distributed newSQL database system, comprising the following steps:
the parallel task executor computes a message-digest value of the picture data and writes that digest into the original data table; the parallel task executor writes the picture data, identified by that digest, into a picture data table for independent storage;
the parallel task executor queries the original data table according to the query conditions to obtain the message digest of the picture; and the parallel task executor queries the picture data table through the hbase API according to that digest to obtain the picture data.
As an improvement of the above scheme, an LOB type is added at the bottom layer of the hbase and an alternative index is established for the LOB type; large-object picture data is stored in the database as a bitmap in an independent data table, and only an index ID is stored in the original data table.
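The separation between the original data table and the picture data table can be sketched as follows. The example assumes that the message digest serves as the key of the picture table and uses SHA-256 from the Python standard library; the patent names neither the digest algorithm nor the key layout, and `PictureStore` is a hypothetical name.

```python
import hashlib


class PictureStore:
    def __init__(self):
        self.original = {}   # row key -> ordinary fields plus the picture digest
        self.pictures = {}   # digest -> raw picture bytes, stored separately

    def put(self, row_key, fields, picture):
        # SHA-256 is an assumption; the text only speaks of a message digest.
        digest = hashlib.sha256(picture).hexdigest()
        self.pictures[digest] = picture
        self.original[row_key] = {**fields, "picture_digest": digest}
        return digest

    def get_picture(self, row_key):
        # First read the digest from the original table, then fetch the bytes.
        digest = self.original[row_key]["picture_digest"]
        return self.pictures[digest]
```

Under this assumed layout the original table stays small because it holds only a fixed-length digest per row, and two rows that carry byte-identical pictures share a single stored copy.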
The invention also provides a method for implementing full-text retrieval using the distributed newSQL database system, comprising the following steps:
the parallel task executor writes the fields requiring full-text retrieval into the data table of the hbase as ordinary string types for storage, and the coprocessor in the hbase writes the data into the search engine server for indexing according to the field description;
the parallel task executor queries the search engine server for the matching index IDs according to the query conditions, the search engine server returns those index IDs, and the parallel task executor queries the data table through the hbase API according to the index IDs to obtain the query data.
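The following sketch shows the same two-stage lookup with a toy term index standing in for the search engine server. It does not use Solr or any Solr API; the tokenizer, the AND semantics for multi-word queries and the class name `FullTextTable` are assumptions chosen to keep the example self-contained.

```python
import re
from collections import defaultdict


class FullTextTable:
    def __init__(self, text_fields):
        self.text_fields = text_fields        # fields declared for full-text retrieval
        self.data = {}                        # row key -> row dict (data table)
        self.search_index = defaultdict(set)  # term -> row keys (search server stand-in)

    @staticmethod
    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def put(self, key, row):
        self.data[key] = row                  # text is stored as an ordinary string
        for field in self.text_fields:
            for term in self.tokenize(str(row.get(field, ""))):
                self.search_index[term].add(key)

    def search(self, query):
        terms = self.tokenize(query)
        if not terms:
            return []
        # The search side returns only IDs; every term must match (assumed AND).
        ids = set.intersection(*(self.search_index.get(t, set()) for t in terms))
        # The rows themselves are then read from the data table by ID.
        return [self.data[k] for k in sorted(ids)]
```

The search index holds no row content, only row keys, so the data table remains the single source of the stored values.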
Advantageous effects
The distributed NewSQL database system provided by the invention offers a new approach that retains the large-scale data storage and high-speed read/write capability of the hbase while addressing practical application needs that the hbase alone cannot satisfy at the same time. The system supports SQL and JDBC/ODBC. SQL is supported through the interactive analysis engine UrunSQL, so users can implement complex business logic on the system by writing SQL, which greatly reduces the cost of use; support for the JDBC/ODBC interfaces greatly simplifies the ETL process.
The distributed NewSQL database system supports secondary indexes and thereby handles non-primary-key queries efficiently. Users can flexibly create secondary indexes according to their specific business logic. Because practical applications often create several secondary indexes, the system dynamically calculates the cost of using each index according to the query conditions and automatically selects the most appropriate one.
The distributed NewSQL database system supports distributed transactions across rows and across tables with complete ACID transaction semantics, optimizes transaction operation under high concurrency, and can serve most OLTP applications.
The distributed NewSQL database system supports semi-structured data in JSON format: a user can store JSON data directly in the hbase-backed database of the system, query any field of the JSON, create indexes on it, and delete and modify the data.
The distributed NewSQL database system supports distributed full-text retrieval through Solr: a user can create a full-text index on a table and search it from SQL using a full-text retrieval syntax.
The distributed NewSQL database system supports efficient picture storage: LOB storage is added at the bottom layer of the hbase, which efficiently meets binary storage requirements for single items ranging from several hundred KB to 10 MB, so users can store pictures as LOBs.
The distributed NewSQL database system offers high availability: several replicas of a region can be maintained at the same time, and this region multi-replica mechanism ensures that read service is unaffected when a region server goes down and that write service recovers within seconds, effectively improving availability and meeting the requirements of online services.
Drawings
Fig. 1 is a schematic structural diagram of a distributed newSQL database system according to embodiment 1 of the present invention;
Fig. 2 is a flowchart of a method for generating an execution plan by the distributed newSQL database system SQLPlaner according to embodiment 2 of the present invention;
Fig. 3 is a flowchart of a method for establishing and querying a plurality of secondary indexes in a distributed newSQL database system according to embodiment 3 of the present invention;
Fig. 4 is a flowchart of a method for implementing semi-structured data access by a distributed newSQL database system according to embodiment 4 of the present invention;
Fig. 5 is a flowchart of a method for implementing picture data access by a distributed newSQL database system according to embodiment 5 of the present invention;
Fig. 6 is a flowchart of a method for implementing full-text retrieval by a distributed newSQL database system according to embodiment 6 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a schematic structural diagram of a distributed newSQL database system according to embodiment 1 of the present invention, also called an unrubate database system, the system includes:
the database interface, namely JDBC/ODBC, used by a user to send a request to the Master and to receive the result returned by the Master;
the Master, which accepts user requests over JDBC or ODBC, coordinates data communication among the parties, manages the overall flow, and first forwards each user request to the SQLPLaner;
the SQLPLaner, which parses the user request, compiles it and draws up an execution plan;
the DTM, namely the distributed transaction manager, which coordinates the parties involved in the plan to complete distributed transaction management;
the Worker, namely the parallel task executor, which executes the tasks of the plan in parallel, then merges and summarizes the data obtained from the database and returns it to the Master;
the hbase and the search engine server Solr, each connected to different Workers; the Workers acquire data from the database through the hbase and the search engine server Solr;
a monitor connected to the Master, responsible for metadata management and for monitoring the load of the underlying hbase Regions; to keep the load of any particular Region from becoming too high, it redistributes the Regions using the hbase coprocessor.
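One way to picture the monitor's redistribution of Regions is the sketch below. It assumes a per-Region load figure and a fixed workload threshold, and it moves an overloaded Region to the least-loaded server only when that lowers the load of the source server's peak; the function `rebalance`, its parameters and this policy are hypothetical and do not reproduce the hbase balancer or the patent's actual rule.

```python
def rebalance(assignment, region_load, threshold):
    """assignment: server -> list of regions; region_load: region -> load figure.

    Returns the new assignment and the list of (region, source, target) moves.
    """
    servers = {s: list(rs) for s, rs in assignment.items()}  # work on a copy

    def total(server):
        return sum(region_load[r] for r in servers[server])

    moves = []
    for src in list(servers):
        for region in list(servers[src]):
            if region_load[region] <= threshold:
                continue                      # this Region is not a hot spot
            dst = min(servers, key=total)     # least-loaded server right now
            # Move only if the target ends up lighter than the source is today.
            if dst != src and total(dst) + region_load[region] < total(src):
                servers[src].remove(region)
                servers[dst].append(region)
                moves.append((region, src, dst))
    return servers, moves
```

For example, with Regions `a` (load 90) and `b` (load 40) on one server and `c` (load 10) on another, a threshold of 50 moves only `a`, after which neither server can improve by moving it back.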
The invention discloses a distributed NewSQL database based on Hadoop big data technology. When a user request arrives, the following steps are mainly executed:
S1, the Master is mainly responsible for accepting user requests over JDBC or ODBC, coordinating data communication among the roles and managing the overall flow; it first sends the request to the SQLPLaner;
S2, the SQLPLaner is mainly used to parse the user request, compile the SQL and draw up an execution plan using UrunSQL, a high-speed SQL engine developed in-house by Yunrun;
S3, the DTM is mainly used to coordinate the parties to complete distributed transaction management when the execution plan involves transactions; distributed transaction processing and transaction management are implemented using the Java Transaction API (JTA);
S4, the Worker, that is, the parallel task executor, is mainly responsible for executing the tasks of the execution plan in parallel, then merging and summarizing the data obtained from the database and returning it to the Master;
S5, the Master returns the result of the request to the user.
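Steps S1 to S5 can be traced in a small sketch. The class names follow the roles in the text, but every method, the two-shard fan-out and the rule that only write statements are transactional are assumptions made for illustration; the stand-in Worker returns fixed strings instead of reading from the hbase.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Plan:
    tasks: list           # independent sub-tasks for the Workers
    transactional: bool   # True when the DTM has to take part


class SQLPlaner:
    def plan(self, sql):
        # Toy rule (assumption): writes need a transaction, and every
        # statement fans out to two hypothetical shards.
        verb = sql.split()[0].upper()
        tasks = [(sql, shard) for shard in range(2)]
        return Plan(tasks, verb in {"INSERT", "UPDATE", "DELETE"})


class DTM:
    def __init__(self):
        self.log = []

    def begin(self):
        self.log.append("begin")

    def commit(self):
        self.log.append("commit")


class Worker:
    def run(self, task):
        sql, shard = task
        return [f"shard{shard}:row"]   # stand-in for rows fetched from hbase


class Master:
    def __init__(self):
        self.planer, self.dtm, self.worker = SQLPlaner(), DTM(), Worker()

    def handle(self, sql):
        plan = self.planer.plan(sql)                 # S1 -> S2
        if plan.transactional:                       # S3, only when needed
            self.dtm.begin()
        with ThreadPoolExecutor() as pool:           # S4, tasks run in parallel
            partials = list(pool.map(self.worker.run, plan.tasks))
        merged = [row for part in partials for row in part]
        if plan.transactional:
            self.dtm.commit()
        return merged                                # S5
```

Calling `Master().handle("SELECT * FROM t")` returns the merged rows of both shards without touching the DTM, while an `UPDATE` statement records a begin and a commit around the same parallel execution.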
The Coprocessor is the coprocessor mechanism provided by the hbase; on top of it developers can implement efficient parallel processing of data, and it also provides an extension interface for region management on the master side.
JTA, the Java Transaction API, allows an application to perform distributed transactions, that is, to access and update data on two or more networked computer resources.
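JTA transaction managers typically coordinate their resources with a two-phase commit. The sketch below shows that protocol in miniature so the all-or-nothing behaviour is concrete; it is written in Python rather than against the JTA interfaces, the patent does not state which commit protocol the DTM uses, and `Resource` and `run_transaction` are hypothetical names.

```python
class Resource:
    """One participant, e.g. a table or store that can stage writes."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.pending = {}
        self.committed = {}

    def write(self, key, value):
        self.pending[key] = value      # staged, not yet visible

    def prepare(self):
        return self.can_prepare        # vote yes or no in phase one

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def run_transaction(resources, work):
    """Run work(), then commit on all resources or on none."""
    work()
    if all(r.prepare() for r in resources):    # phase one: collect votes
        for r in resources:                    # phase two: commit everywhere
            r.commit()
        return True
    for r in resources:                        # any "no" vote aborts everyone
        r.rollback()
    return False
```

If one participant refuses to prepare, the staged writes of every participant are discarded, which is the cross-row, cross-table atomicity the DTM is meant to provide.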
Solr is a high-performance full-text search server based on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and provides a complete administration interface.
FIG. 2 is a flowchart of the method by which the SQLPlaner generates an execution plan in the distributed newSQL database system according to embodiment 2 of the present invention. This embodiment builds on embodiment 1, and the method includes the following steps:
1) inputting an SQL statement;
2) judging whether the SQL statement already exists in the shared cache pool; if so, outputting the execution plan corresponding to the SQL statement, and if not, executing the next step;
3) performing a syntax check on the SQL statement; if the syntax is wrong, returning error information to the user, and if the syntax check passes, executing the next step;
4) performing a semantic check on the SQL statement; if the semantics are wrong, returning error information to the user, and if the semantic check passes, executing the next step;
5) carrying out view and expression conversion on the SQL statement;
6) selecting an optimizer according to the conversion result of the previous step;
7) selecting a data connection mode and a connection sequence according to the selection result of the optimizer;
8) selecting a search path according to the connection mode and the connection sequence;
9) generating an execution plan according to the search path;
10) outputting the execution plan.
Once the execution plan has been produced, it is returned to the Master. The Master judges from the content of the execution plan whether the distributed transaction manager needs to be involved: if so, step S3 of embodiment 1 is executed; otherwise step S3 is skipped and step S4 of embodiment 1 is executed.
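The cache lookup and the two checks in steps 2) to 4) can be sketched as follows. This is a toy illustration under stated assumptions: `generate_plan`, `check_syntax`, `check_semantics`, `PlanError` and the shape of the plan dictionary are hypothetical names, the checks are deliberately simplistic, steps 5) to 9) are reduced to a list of stage labels, and storing the new plan back into the cache is an assumption the text implies but does not state.

```python
class PlanError(Exception):
    """Raised when the syntax or semantic check fails (steps 3 and 4)."""


def check_syntax(sql):
    # Toy syntax check: the statement must start with a known SQL verb.
    verbs = ("SELECT", "INSERT", "UPDATE", "DELETE")
    if not sql.strip().upper().startswith(verbs):
        raise PlanError("syntax error")


def check_semantics(sql, known_tables):
    # Toy semantic check: the table named after FROM must exist.
    tokens = sql.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok.upper() == "FROM" and tokens[i + 1] not in known_tables:
            raise PlanError(f"unknown table: {tokens[i + 1]}")


def generate_plan(sql, known_tables, cache):
    # Step 2: reuse the plan if this SQL text is already in the cache pool.
    if sql in cache:
        return cache[sql], True
    check_syntax(sql)                   # step 3
    check_semantics(sql, known_tables)  # step 4
    # Steps 5-9 are represented only by labels in this sketch.
    plan = {"sql": sql,
            "stages": ["convert", "choose_optimizer",
                       "choose_join", "choose_path"]}
    cache[sql] = plan  # assumption: new plans are added to the cache pool
    return plan, False
```

The second element of the returned tuple reports whether the plan came from the cache, which makes the short-circuit in step 2) easy to observe.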
FIG. 3 is a flowchart of the method by which the distributed newSQL database system builds and queries multiple secondary indexes according to embodiment 3 of the present invention. This embodiment builds on embodiment 2, and the method includes the following steps:
Write request:
11) the user initiates a write request;
12) the Master processes the SQL request and generates an execution plan together with the SQLPlaner;
13) the Worker writes the data fields into the data table according to the execution plan;
14) the coprocessor mechanism inside HBase synchronously writes the data into the index table in inverted-index form;
15) the Worker returns the HBase processing result to the Master;
16) the Master returns the result to the user.
Read request:
21) the user initiates a read request;
22) the Master processes the SQL request and generates an execution plan together with the SQLPlaner;
23) the Worker first queries the index table according to the execution plan, using coprocessors to increase query parallelism;
24) the index coprocessor in HBase returns the index IDs from the index table;
25) the Worker queries the data table through the HBase API according to the index IDs and returns the rows to the Master;
26) the Master merges the query results and returns them to the user.
Secondary indexes are supported, so queries on non-primary-key columns are handled efficiently. UrunBase lets users create secondary indexes flexibly according to their business logic. In practice a user often creates several secondary indexes; at query time UrunBase dynamically estimates the cost of using each index from the query conditions and automatically selects the most suitable one. Because HBase queries by rowkey are extremely efficient, the secondary index is implemented by generating an index table for the data with the HBase Coprocessor and Filter. On write, the coprocessor writes index data into the index table in inverted-index form according to the index definition. On query, the index table is consulted first according to the index definition and the query conditions, and the data table is then queried with the result from the index table. The parallelism of the coprocessors is used to increase overall query speed.
Taking the secondary-index flow above as a reference, UrunBase provides specialized handling for different storage types: semi-structured data, picture data, and full-text retrieval.
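The data-table plus index-table arrangement can be illustrated with a small in-memory Python sketch. It is not HBase code: the class `IndexedTable` and its methods are hypothetical, dictionaries stand in for the data table and the index table, and neither cost-based index selection nor coprocessor parallelism is modeled. Removing stale index entries on overwrite is an addition made here so the toy index stays consistent; the text does not describe it.

```python
from collections import defaultdict


class IndexedTable:
    """Toy data table with one inverted index table per indexed field."""

    def __init__(self, indexed_fields):
        self.data = {}  # data table: rowkey -> row
        # index table per field: field value -> set of rowkeys
        self.index = {f: defaultdict(set) for f in indexed_fields}

    def put(self, rowkey, row):
        # Drop stale index entries when a row is overwritten
        # (an assumption of this sketch, not stated in the text).
        old = self.data.get(rowkey)
        if old is not None:
            for f, idx in self.index.items():
                if f in old:
                    idx[old[f]].discard(rowkey)
        self.data[rowkey] = dict(row)
        # Write the index entries alongside the data, as the
        # coprocessor does on the write path.
        for f, idx in self.index.items():
            if f in row:
                idx[row[f]].add(rowkey)

    def query(self, field, value):
        if field in self.index:
            # Indexed field: look up rowkeys first, then fetch by rowkey.
            rowkeys = sorted(self.index[field].get(value, ()))
        else:
            # No index on this field: fall back to a full scan.
            rowkeys = sorted(k for k, r in self.data.items()
                             if r.get(field) == value)
        return [self.data[k] for k in rowkeys]
```

A query on an indexed field touches only the index entry and the matching rows, while a query on any other field has to scan the whole data table.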
FIG. 4 is a flowchart of the method by which the distributed newSQL database system implements semi-structured data access according to embodiment 4 of the present invention. This embodiment builds on embodiment 3, and the method includes the following steps:
Write request:
31) the Worker writes the JSON data as a whole, as an ordinary string-typed field, into an HBase data table;
32) the coprocessor in HBase extracts the data in the JSON according to the field description and writes the index data into a separate HBase index table in inverted-index form.
Read request:
41) the Worker queries the index table according to the query conditions, using coprocessors to increase query parallelism;
42) the index coprocessor in HBase returns the index IDs from the index table;
43) the Worker queries the data table through the HBase API according to the index IDs and returns the result.
UrunBase supports the JSON data format: users can store JSON data directly in UrunBase, query any field of the JSON, create indexes on it, and delete or modify fields. UrunBase adds a JSON data type to the underlying HBase layer. The JSON data is stored as a whole in the underlying HFile, and when a secondary index is built the JSON is indexed as a nested type, so that querying, index creation, and deletion are supported for any field of the JSON.
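A minimal Python sketch of this write and read path follows. It is an illustration under assumptions: the function names, the dotted-path notation for nested fields (for example `user.name`), and the plain dictionaries standing in for the HBase data table and index table are all hypothetical. The `indexed_paths` argument plays the role of the field description that tells the coprocessor which JSON fields to index.

```python
import json


def leaves(obj, path=""):
    # Yield (dotted_path, value) for every scalar leaf of parsed JSON.
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from leaves(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from leaves(item, path)
    else:
        yield path, obj


def write_json(data_table, index_table, rowkey, json_text, indexed_paths):
    # Store the JSON document whole, as a plain string.
    data_table[rowkey] = json_text
    # Extract the described fields and add inverted-index entries.
    for path, value in leaves(json.loads(json_text)):
        if path in indexed_paths:
            index_table.setdefault((path, value), set()).add(rowkey)


def query_json(data_table, index_table, path, value):
    # Look up rowkeys in the index table, then fetch the documents.
    rowkeys = sorted(index_table.get((path, value), ()))
    return [json.loads(data_table[k]) for k in rowkeys]
```

Array elements share the path of their parent field in this sketch, so a query on `user.tags` matches any document whose tag list contains the value.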
FIG. 5 is a flowchart of the method by which the distributed newSQL database system implements picture data access according to embodiment 5 of the present invention. This embodiment builds on embodiment 3, and the method includes the following steps:
Write request:
51) the Worker generates an MD5 digest from the picture data and writes the picture MD5 into the original data table;
52) the Worker writes the picture data into a picture data table for independent storage.
Read request:
61) the Worker queries the original data table according to the query conditions to obtain the MD5 of the picture;
62) the Worker queries the picture data table through the HBase API according to the picture MD5 and returns the result.
UrunBase provides LOB storage. LOB efficiently meets binary storage needs where a single item ranges from hundreds of KB to 10 MB, so users can use LOB to store pictures. UrunBase adds a LOB type to the underlying HBase layer, modeled on the BLOB type in SQL, in which a large object is stored in the database as a bitmap. The LOB implementation establishes another type of index for the LOB type: the picture data is stored as a bitmap in an independent data table, and the original data table stores only an index ID, which reduces the size of the data table. Because picture data can only be modified by atomic overwrite and is queried separately, retrieval speed is greatly improved for queries on non-picture fields.
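The separation of picture bytes from the original row can be sketched in a few lines of Python. The function names, the `picture_md5` column name, and the dictionaries standing in for the two HBase tables are assumptions made for illustration; only the use of an MD5 digest as the link between the tables comes from the text.

```python
import hashlib


def write_picture(original_table, picture_table, rowkey, fields, picture_bytes):
    # Compute the MD5 digest of the picture bytes.
    digest = hashlib.md5(picture_bytes).hexdigest()
    # The original row keeps only the digest, not the picture itself.
    original_table[rowkey] = {**fields, "picture_md5": digest}
    # The picture bytes live in a separate table keyed by the digest.
    picture_table[digest] = picture_bytes
    return digest


def read_picture(original_table, picture_table, rowkey):
    # Read the digest from the original row, then fetch the picture.
    digest = original_table[rowkey]["picture_md5"]
    return picture_table[digest]
```

Because the original table holds a 32-character digest instead of the image bytes, scans over non-picture fields read far less data.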
FIG. 6 is a flowchart of the method by which the distributed newSQL database system implements full-text retrieval according to embodiment 6 of the present invention. This embodiment builds on embodiment 3, and the method includes the following steps:
Write request:
71) the Worker writes the field that requires full-text retrieval, as an ordinary string type, into an HBase data table for storage;
72) the coprocessor in HBase writes the data into Solr for indexing according to the field description.
Read request:
81) the Worker queries Solr for the specific index IDs according to the query conditions;
82) Solr returns the index IDs according to the query conditions;
83) the Worker queries the data table through the HBase API according to the index IDs and returns the result.
UrunBase supports distributed full-text retrieval through Solr: users can create a full-text index on their own tables and search from SQL using full-text retrieval syntax. This mode is a special extension of the secondary index and is likewise implemented with a coprocessor, but for fields that require full-text retrieval the index data is stored in Solr rather than in a separate index table, and Solr provides the full-text retrieval function. At query time, the conditions on full-text-indexed fields are converted from SQL conditional statements into Solr query expressions for the lookup, and the results returned by Solr are converted into a common data format before being returned.
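The conversion from an SQL condition to a Solr query expression might look like the sketch below. The source does not give UrunBase's full-text SQL syntax, so the `CONTAINS(field, 'text')` predicate is purely hypothetical, as is the function name `to_solr_query`. The output uses the `field:"phrase"` form of Solr's standard query parser. The sketch does not validate operator placement and handles only this one predicate joined by AND/OR.

```python
import re

# One token: either a hypothetical CONTAINS(field, 'text') predicate
# (SQL-style '' escapes a quote inside the text) or an AND/OR operator.
_TOKEN = re.compile(
    r"\s*(?:CONTAINS\(\s*(\w+)\s*,\s*'((?:[^']|'')*)'\s*\)|(AND|OR)\b)",
    re.IGNORECASE,
)


def to_solr_query(condition):
    condition = condition.strip()
    pos, parts = 0, []
    while pos < len(condition):
        m = _TOKEN.match(condition, pos)
        if m is None:
            raise ValueError(f"unsupported condition at offset {pos}")
        if m.group(3):
            # Boolean operator: Solr expects it in upper case.
            parts.append(m.group(3).upper())
        else:
            field = m.group(1)
            text = m.group(2).replace("''", "'")
            # Escape backslashes and double quotes for a Solr phrase.
            text = text.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{field}:"{text}"')
        pos = m.end()
    return " ".join(parts)
```

For example, `to_solr_query("CONTAINS(body, 'cats and dogs') AND CONTAINS(title, 'pets')")` yields `body:"cats and dogs" AND title:"pets"`; the word "and" inside the quoted text is left alone because the whole predicate is matched as one token.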
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (8)

1. A distributed newSQL database system, comprising:
the database interface is used for sending a request to the Master by a user and receiving a result returned by the Master;
the Master is used for accessing user requests in a JDBC and ODBC mode, coordinating data communication among a plurality of processors and managing the whole flow, and preferentially sending the user requests to the SQLPLaner; the master is connected with a monitor and is used for being responsible for metadata management and monitoring the load of the underlying hbase Region, avoiding that the load of a specific Region is too high, and redistributing the Region by using the hbase coprocessor;
the SQLPLaner is used for analyzing the user request, compiling and customizing an execution plan;
the distributed transaction manager is used for coordinating multiple parties in the plan to finish distributed transaction management;
and the parallel task executor is used for executing tasks in charge of the plan in parallel and merging and summarizing the data obtained from the database to return to the master.
2. The distributed newSQL database system according to claim 1, wherein the parallel task executor obtains data from the database through hbase and a search engine server.
3. The distributed newSQL database system according to claim 1, wherein the custom execution plan comprises:
inputting SQL sentences through the database interface;
judging whether the SQL already exists in the shared cache pool, if so, outputting an execution plan corresponding to the SQL;
otherwise, carrying out syntax check and semantic check on the SQL statement, and carrying out view and expression conversion on the SQL statement after the syntax check and the semantic check are passed;
carrying out optimizer selection according to the conversion result;
selecting a data connection mode and a connection sequence according to a selection result of the optimizer;
selecting a search path according to the connection mode and the connection sequence;
and generating an execution plan according to the search path and outputting the execution plan.
4. The distributed newSQL database system according to claim 2, wherein the Master is further configured to build and query a plurality of secondary indexes, including:
generating an index table aiming at data by utilizing a Coprocessor and a Filter of the hbase, wherein the Coprocessor writes index data into the index table in parallel in an inverted index mode according to index definitions so as to establish a plurality of secondary indexes;
the Master dynamically calculates the cost of using the index according to the query condition, and the Coprocessor can firstly query the index table according to the index definition and the query condition and parallelly query the data table again through the query result of the index table.
5. The distributed newSQL database system of claim 2, wherein the parallel task executor is further to implement semi-structured data access, including
The parallel task executor writes JSON data as a common character string type as a whole into the data table of the hbase as a field; the coprocessor in the hbase extracts data in the JSON according to the field description, writes the index data into another hbase index table in an inverted index mode, and completes the storage of semi-structured data;
the parallel task executor queries an index table in parallel by using a coprocessor according to query conditions; the index coprocessors in the hbase return the index ID of the index table to the parallel task executor; and the parallel task executor queries a data table by utilizing an API (application programming interface) of the hbase according to the index ID, returns a result and finishes obtaining the semi-structured data.
6. The distributed newSQL database system of claim 2, wherein the parallel task executor is further to implement picture data access, including
The parallel task executor generates, from the picture data, picture data encrypted by a message-digest algorithm, and writes the encrypted picture data into an original data table; the parallel task executor writes the encrypted picture data into a picture data table for independent storage;
the parallel task executor queries the original data table according to query conditions to obtain the picture data encrypted by the message-digest algorithm; and the parallel task executor queries the picture data table by using the API of the hbase according to the encrypted picture data to acquire the picture data.
7. The distributed newSQL database system of claim 6, wherein the enabling picture data access further comprises
adding a LOB type at the hbase bottom layer, establishing another type of index for the LOB type, storing the large-object picture data as a bitmap in the database, storing the picture data in an independent data table in the bitmap mode, and storing only an index ID in the original data table.
8. The distributed newSQL database system of claim 2, wherein the parallel task executor obtains data from the database through a hbase and search engine server, including
The parallel task executor writes fields needing full-text retrieval into a data table of the hbase as common character string types for storage, and a coprocessor in the hbase writes data into a search engine server for indexing according to the description of the fields;
the parallel task executor queries a specific index ID from the search engine server according to a query condition, the search engine server returns the index ID according to the query condition, and the parallel task executor queries a data table by using the API of the hbase according to the index ID to acquire query data.
CN201710585103.2A 2016-09-21 2017-07-17 Distributed newSQL database system and method Expired - Fee Related CN107402995B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2016108423997 2016-09-21
CN201610842399.7A CN106446153A (en) 2016-09-21 2016-09-21 Distributed newSQL database system and method

Publications (2)

Publication Number Publication Date
CN107402995A CN107402995A (en) 2017-11-28
CN107402995B true CN107402995B (en) 2020-06-09

Family

ID=58166840

Family Applications (24)

Application Number Title Priority Date Filing Date
CN201610842399.7A Pending CN106446153A (en) 2016-09-21 2016-09-21 Distributed newSQL database system and method
CN201710585103.2A Expired - Fee Related CN107402995B (en) 2016-09-21 2017-07-17 Distributed newSQL database system and method
CN201710580431.3A Active CN107491485B (en) 2016-09-21 2017-07-17 Method for generating execution plan, plan unit device and distributed NewSQL database system
CN201710580791.3A Active CN107291948B (en) 2016-09-21 2017-07-17 Access method of distributed newSQL database
CN201710580416.9A Expired - Fee Related CN107291947B (en) 2016-09-21 2017-07-17 Semi-structured data query method and distributed NewSQL database system
CN201710580456.3A Expired - Fee Related CN107402988B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and semi-structured data query method
CN201710580403.1A Expired - Fee Related CN107368575B (en) 2016-09-21 2017-07-17 Load-balanced distributed NewSQL database system
CN201710581275.2A Active CN107329837B (en) 2016-09-21 2017-07-17 Load balancing method and unit and distributed NewSQL database system
CN201710581193.8A Expired - Fee Related CN107451219B (en) 2016-09-21 2017-07-17 Method for analyzing second index and distributed New SQL database
CN201710581273.3A Expired - Fee Related CN107451221B (en) 2016-09-21 2017-07-17 Database interface unit device and distributed NewSQL database system
CN201710580435.1A Expired - Fee Related CN107480198B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and full-text retrieval method
CN201710580796.6A Expired - Fee Related CN107402992B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and full-text retrieval establishing method
CN201710580423.9A Active CN107402987B (en) 2016-09-21 2017-07-17 Full-text retrieval method and distributed NewSQL database system
CN201710580417.3A Expired - Fee Related CN107463632B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and data query method
CN201710580739.8A Expired - Fee Related CN107402990B (en) 2016-09-21 2017-07-17 Distributed New SQL database system and semi-structured data storage method
CN201710580720.3A Expired - Fee Related CN107402989B (en) 2016-09-21 2017-07-17 Full-text retrieval establishing method and distributed NewSQL database system
CN201710580794.7A Expired - Fee Related CN107451214B (en) 2016-09-21 2017-07-17 Non-primary key query method and distributed NewSQL database system
CN201710581291.1A Expired - Fee Related CN107463637B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and data storage method
CN201710581229.2A Expired - Fee Related CN107491345B (en) 2016-09-21 2017-07-17 Method for writing picture data and distributed NewSQL database system
CN201710581256.XA Expired - Fee Related CN107391653B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and picture data storage method
CN201710581237.7A Expired - Fee Related CN107463635B (en) 2016-09-21 2017-07-17 Method for inquiring picture data and distributed NewSQL database system
CN201710580754.2A Expired - Fee Related CN107402991B (en) 2016-09-21 2017-07-17 Method for writing semi-structured data and distributed NewSQL database system
CN201710580752.3A Expired - Fee Related CN107247808B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system and picture data query method
CN201710581195.7A Expired - Fee Related CN107451220B (en) 2016-09-21 2017-07-17 Distributed NewSQL database system

Country Status (1)

Country Link
CN (24) CN106446153A (en)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391744B (en) * 2017-08-10 2020-06-16 东软集团股份有限公司 Data storage method, data reading method, data storage device, data reading device and equipment
CN107480260B (en) * 2017-08-16 2021-02-23 北京奇虎科技有限公司 Big data real-time analysis method and device, computing equipment and computer storage medium
CN107688660B (en) * 2017-09-08 2020-03-13 上海达梦数据库有限公司 Parallel execution plan execution method and device
CN107766572A (en) * 2017-11-13 2018-03-06 北京国信宏数科技有限责任公司 Distributed extraction and visual analysis method and system based on economic field data
CN108228750A (en) * 2017-12-21 2018-06-29 浪潮软件股份有限公司 A kind of distributed data base and its method that data are managed
CN108038215A (en) * 2017-12-22 2018-05-15 上海达梦数据库有限公司 Data processing method and system
CN109992409B (en) * 2018-01-02 2021-07-30 ***通信有限公司研究院 Method, device and system for segmenting data storage area, electronic equipment and medium
CN108829507B (en) * 2018-03-30 2019-07-26 北京百度网讯科技有限公司 The resource isolation method, apparatus and server of distributed data base system
CN108664616A (en) * 2018-05-14 2018-10-16 浪潮软件集团有限公司 ROWID-based Oracle data batch acquisition method
CN108846044A (en) * 2018-05-30 2018-11-20 浪潮软件股份有限公司 A kind of map application dispositions method and device
CN108920519A (en) * 2018-06-04 2018-11-30 贵州数据宝网络科技有限公司 One-to-many data supply system and method
CN109033209B (en) * 2018-06-29 2021-12-31 新华三大数据技术有限公司 Spark storage process processing method and device
CN109241076A (en) * 2018-08-01 2019-01-18 上海依图网络科技有限公司 A kind of data query method and device
CN109271428A (en) * 2018-09-11 2019-01-25 北京市计算中心 Data pick-up method and method for exhibiting data based on geography information
CN109408591B (en) * 2018-10-12 2021-11-09 北京聚云位智信息科技有限公司 Decision-making distributed database system supporting SQL (structured query language) driven AI (Artificial Intelligence) and feature engineering
CN109298976B (en) * 2018-10-17 2022-04-12 成都索贝数码科技股份有限公司 Heterogeneous database cluster backup system and method
CN109408515A (en) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 A kind of index execution method and apparatus
CN109684412A (en) * 2018-12-25 2019-04-26 成都虚谷伟业科技有限公司 A kind of distributed data base system
CN109726250B (en) * 2018-12-27 2020-01-17 星环信息科技(上海)有限公司 Data storage system, metadata database synchronization method and data cross-domain calculation method
CN111488340B (en) * 2019-01-29 2023-09-12 菜鸟智能物流控股有限公司 Data processing method and device and electronic equipment
CN110046161A (en) * 2019-03-18 2019-07-23 平安普惠企业管理有限公司 Method for writing data and device, storage medium, electronic equipment
CN110086602B (en) * 2019-04-16 2022-02-11 上海交通大学 Rapid implementation method of SM3 password hash algorithm based on GPU
CN110110234B (en) * 2019-05-13 2020-10-16 重庆天蓬网络有限公司 Big data real-time searching system and method
CN110275901B (en) * 2019-06-25 2021-08-24 北京创鑫旅程网络技术有限公司 Cache data calling method and device
CN110457363B (en) * 2019-07-05 2023-11-21 中国平安人寿保险股份有限公司 Query method, device and storage medium based on distributed database
CN110413642B (en) * 2019-08-02 2022-05-27 北京快立方科技有限公司 Application-unaware fragmentation database parsing and optimizing method
CN110569257B (en) * 2019-09-16 2022-04-01 上海达梦数据库有限公司 Data processing method, corresponding device, equipment and storage medium
CN110704437B (en) * 2019-09-26 2022-05-20 上海达梦数据库有限公司 Method, device, equipment and storage medium for modifying database query statement
CN112688976A (en) * 2019-10-17 2021-04-20 广州迈安信息科技有限公司 Data processing transmission service system adopting JDBC/HTTP standard
CN110888919B (en) * 2019-12-04 2023-06-30 阳光电源股份有限公司 HBase-based method and device for statistical analysis of big data
CN113032479A (en) * 2019-12-24 2021-06-25 上海昂创信息技术有限公司 HBase non-primary key indexing method and HBase system
CN111309581B (en) * 2020-02-28 2023-09-12 中国工商银行股份有限公司 Application performance detection method and device in database upgrading scene
CN111651453B (en) * 2020-04-30 2024-02-06 中国平安财产保险股份有限公司 User history behavior query method and device, electronic equipment and storage medium
CN113760960A (en) * 2020-06-01 2021-12-07 北京搜狗科技发展有限公司 Information generation method and device for generating information
CN111797112B (en) * 2020-06-05 2022-04-01 武汉大学 PostgreSQL preparation statement execution optimization method
CN113806611A (en) * 2020-06-17 2021-12-17 海信集团有限公司 Method and equipment for storing search engine results
CN111930705B (en) * 2020-07-07 2023-03-14 中国电子科技集团公司电子科学研究院 Binary message protocol data processing method and device
CN112148792B (en) * 2020-09-16 2024-04-12 鹏城实验室 Partition data adjustment method, system and terminal based on HBase
CN112052347B (en) * 2020-10-09 2024-06-04 北京百度网讯科技有限公司 Image storage method and device and electronic equipment
CN112416925B (en) * 2020-11-02 2024-04-09 浙商银行股份有限公司 Query method based on ordered distributed index structure and distributed database system
CN112364033B (en) * 2021-01-13 2021-04-13 北京云真信科技有限公司 Data retrieval system
CN113760900A (en) * 2021-02-19 2021-12-07 西安京迅递供应链科技有限公司 Method and device for real-time data summarization and interval summarization
CN112905615B (en) * 2021-03-02 2023-03-24 浪潮云信息技术股份公司 Distributed consistency protocol submission method and system based on sequence verification
CN112925841B (en) * 2021-03-26 2022-11-08 瀚高基础软件股份有限公司 Distributed JDBC implementation method, device and computer-readable storage medium
CN113407662B (en) * 2021-08-19 2021-12-14 深圳市明源云客电子商务有限公司 Sensitive word recognition method, system and computer readable storage medium
CN113742370B (en) * 2021-11-02 2022-04-19 阿里云计算有限公司 Data query method and statistical information ciphertext generation method of full-encryption database
CN115129724A (en) * 2022-08-29 2022-09-30 畅捷通信息技术股份有限公司 Statistical report paging method, system, equipment and medium
CN116861455B (en) * 2023-06-25 2024-04-26 上海数禾信息科技有限公司 Event data processing method, system, electronic device and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572895A (en) * 2014-12-24 2015-04-29 天津南大通用数据技术股份有限公司 MPP (Massively Parallel Processor) database and Hadoop cluster data intercommunication method, tool and realization method

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477568A (en) * 2009-02-12 2009-07-08 清华大学 Integrated retrieval method for structured data and non-structured data
CN101567006B (en) * 2009-05-25 2012-07-04 中兴通讯股份有限公司 Database system and distributed SQL statement execution plan reuse method
CN102163195B (en) * 2010-02-22 2013-04-24 北京东方通科技股份有限公司 Query optimization method based on unified view of distributed heterogeneous database
CN102375853A (en) * 2010-08-24 2012-03-14 ***通信集团公司 Distributed database system, method for building index therein and query method
CN102201010A (en) * 2011-06-23 2011-09-28 清华大学 Distributed database system without sharing structure and realizing method thereof
CN102289482A (en) * 2011-08-02 2011-12-21 北京航空航天大学 Unstructured data query method
CN103150304B (en) * 2011-12-06 2016-11-23 郑红云 Cloud Database Systems
CN103577407B (en) * 2012-07-19 2016-10-12 国际商业机器公司 Querying method and inquiry unit for distributed data base
US20140074860A1 (en) * 2012-09-12 2014-03-13 Pingar Holdings Limited Disambiguator
CN102902932B (en) * 2012-09-18 2015-12-02 武汉华工安鼎信息技术有限责任公司 The using method of the outside encrypting and deciphering system of the database based on SQL rewrite
CN103092970A (en) * 2013-01-24 2013-05-08 华为技术有限公司 Database operation method and device
US9773021B2 (en) * 2013-01-30 2017-09-26 Hewlett-Packard Development Company, L.P. Corrected optical property value-based search query
CN103377292B (en) * 2013-07-02 2017-02-15 华为技术有限公司 Database result set caching method and device
US20150039587A1 (en) * 2013-07-31 2015-02-05 Oracle International Corporation Generic sql enhancement to query any semi-structured data and techniques to efficiently support such enhancements
CN103473321A (en) * 2013-09-12 2013-12-25 华为技术有限公司 Database management method and system
CN104794123B (en) * 2014-01-20 2018-07-27 阿里巴巴集团控股有限公司 Method and device for building NoSQL database indexes for semi-structured data
CN103984726B (en) * 2014-05-16 2017-03-29 上海新炬网络信息技术有限公司 Local correction method for database execution plans
CN104133858B (en) * 2014-07-15 2017-08-01 武汉邮电科学研究院 Intelligence analysis system with double engines and method based on row storage
CN104503985A (en) * 2014-12-03 2015-04-08 浪潮电子信息产业股份有限公司 Method for automatically creating Solr index file by Hbase data
CN104731922A (en) * 2015-03-26 2015-06-24 江苏物联网研究发展中心 System and method for rapidly retrieving structural data based on distributed type database HBase
CN104750815B (en) * 2015-03-30 2017-11-03 浪潮集团有限公司 Storage method and device for LOB data based on HBase
CN104731945B (en) * 2015-03-31 2018-04-06 浪潮集团有限公司 Text search method and device based on HBase
CN105389375B (en) * 2015-11-18 2018-10-02 福建师范大学 Image index setting method, system and search method based on visible range
CN105740410A (en) * 2016-01-29 2016-07-06 浪潮电子信息产业股份有限公司 Data statistics method based on Hbase secondary index

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572895A (en) * 2014-12-24 2015-04-29 天津南大通用数据技术股份有限公司 MPP (Massively Parallel Processing) database and Hadoop cluster data intercommunication method, tool and realization method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Apache Phoenix;@ApachePhoenix,Apache.org;《http://phoenix.apache.org/presentations/OC-HUG-2014-10-4x3.pdf》;20141029;pp. 1-59 *
Phoenix;James Taylor,Apache.org;《http://phoenix.apache.org/presentations/HadoopSummit2013-16x》;20141029;pp. 1-108 *

Also Published As

Publication number Publication date
CN107247808B (en) 2020-01-10
CN107402992B (en) 2020-06-09
CN107480198A (en) 2017-12-15
CN107402991A (en) 2017-11-28
CN107451219B (en) 2020-06-09
CN107451220A (en) 2017-12-08
CN107451220B (en) 2020-06-09
CN107402990A (en) 2017-11-28
CN107451214A (en) 2017-12-08
CN107402992A (en) 2017-11-28
CN107463637B (en) 2020-05-19
CN107391653B (en) 2020-05-19
CN107451221B (en) 2020-09-04
CN107402988A (en) 2017-11-28
CN107463632B (en) 2020-06-09
CN107480198B (en) 2020-05-19
CN107402990B (en) 2020-06-09
CN107402987A (en) 2017-11-28
CN107402987B (en) 2020-04-03
CN107391653A (en) 2017-11-24
CN107491485B (en) 2020-08-04
CN107291947B (en) 2020-03-10
CN107291948A (en) 2017-10-24
CN107291947A (en) 2017-10-24
CN107329837B (en) 2020-06-09
CN107451221A (en) 2017-12-08
CN107491345A (en) 2017-12-19
CN107491485A (en) 2017-12-19
CN107451219A (en) 2017-12-08
CN107463637A (en) 2017-12-12
CN107463635B (en) 2020-09-25
CN107368575B (en) 2020-06-09
CN107491345B (en) 2020-08-04
CN107402988B (en) 2020-01-03
CN107463632A (en) 2017-12-12
CN107247808A (en) 2017-10-13
CN107402995A (en) 2017-11-28
CN107463635A (en) 2017-12-12
CN107451214B (en) 2020-05-19
CN107329837A (en) 2017-11-07
CN107368575A (en) 2017-11-21
CN107402989B (en) 2020-10-27
CN107291948B (en) 2020-05-19
CN106446153A (en) 2017-02-22
CN107402991B (en) 2020-05-19
CN107402989A (en) 2017-11-28

Similar Documents

Publication Publication Date Title
CN107402995B (en) Distributed newSQL database system and method
Ali et al. Comparison between SQL and NoSQL databases and their relationship with big data analytics
JP6617117B2 (en) Scalable analysis platform for semi-structured data
Xie et al. Simba: Efficient in-memory spatial analytics
US20170193041A1 (en) Document-partitioned secondary indexes in a sorted, distributed key/value data store
JP6964384B2 (en) Methods, programs, and systems for the automatic discovery of relationships between fields in a mixed heterogeneous data source environment.
US10997124B2 (en) Query integration across databases and file systems
Aranda-Andújar et al. AMADA: web data repositories in the amazon cloud
Hubail et al. Couchbase analytics: NoETL for scalable NoSQL data analysis
Li et al. An integration approach of hybrid databases based on SQL in cloud computing environment
Bugiotti et al. RDF data management in the Amazon cloud
Wu et al. Comparisons between MongoDB and MS-SQL databases on the TWC website
Caldarola et al. Big data: A survey-the new paradigms, methodologies and tools
US20170177633A1 (en) Dbms-supported score assignment
El Alami et al. Supply of a key value database redis in-memory by data from a relational database
Feuerlicht Database Trends and Directions: Current Challenges and Opportunities.
Ravichandran Big Data processing with Hadoop: a review
Arputhamary et al. A review on big data integration
Yuan et al. VDB-MR: MapReduce-based distributed data integration using virtual database
Kuznetsov et al. Real-time analytics: benefits, limitations, and tradeoffs
Scholz Coping with Dynamic, Unstructured Data Sets–NoSQL a Buzzword or a Savior?
Li Introduction to Big Data
Sachdeva et al. Comparison of data processing tools in Hadoop
Paradies et al. Entity matching for semistructured data in the Cloud
Liu et al. Modeling fuzzy relational database in HBase

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200429

Address after: Room 5303, 1023 Gaopu Road, Tianhe Software Park, Tianhe District, Guangzhou City, Guangdong 510000

Applicant after: Yunrun Da Data Service Co.,Ltd.

Address before: 510000 Yuexiu District, Guangzhou Province, north of the text of the text of the North Road, No. 68, the east wing of the text of the building on the ground floor, No. six, No. 602, No.

Applicant before: GUANGZHOU TEDAO INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200609

Termination date: 20210717