CN106844703B

CN106844703B - An in-memory data warehouse query processing implementation method for database all-in-one machines

Info

Publication number: CN106844703B
Application number: CN201710064131.XA
Authority: CN
Inventors: 张延松; 王珊; 杜小勇
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2017-02-04
Filing date: 2017-02-04
Publication date: 2019-08-02
Anticipated expiration: 2037-02-04
Also published as: CN106844703A

Abstract

The present invention relates to an in-memory data warehouse query processing implementation method for database all-in-one machines, comprising the steps of: constructing an in-memory data warehouse data model; constructing a distributed storage model for the in-memory data warehouse all-in-one machine; applying a data update strategy on the high-performance computing server cluster, whereby, when the memory capacity of the high-performance computing server cluster is insufficient, the oldest data is evicted using a circular-queue update strategy and replaced with the newest data; and implementing OLAP query processing on the in-memory data warehouse all-in-one machine. The invention can improve the utilization of the asymmetric storage and computing resources of a database all-in-one machine and improve overall in-memory OLAP performance; the different processing stages of multiple queries can further be pipelined in parallel on the database all-in-one machine platform, improving the system's OLAP query throughput. The invention is suited to in-memory OLAP application scenarios on in-memory data warehouse all-in-one machines and can meet the demand for in-memory OLAP acceleration under the asymmetric hardware architecture of a database all-in-one machine.

Description

An in-memory data warehouse query processing implementation method for database all-in-one machines

Technical field

The present invention relates to a data warehouse implementation method, and in particular to an in-memory data warehouse query processing implementation method for database all-in-one machines.

Background art

A database all-in-one machine is an integrated hardware and software solution designed for the big data storage and high-performance query processing characteristics of database applications. In terms of hardware design, a database all-in-one machine is usually a server cluster architecture organized by cabinet, which provides scalable big data storage and processing capacity through a built-in high-speed network and server clusters. Each cabinet allows its different server clusters to be expanded, and horizontal scaling is achieved with the cabinet as the unit. A database all-in-one machine generally uses a small-scale high-performance computing server cluster for complex query processing and a large-scale low-end storage server cluster for big data storage, which makes it an asymmetric server cluster architecture. A database all-in-one machine also typically uses special hardware to accelerate storage access and query processing: for example, the Oracle Exadata database all-in-one machine uses large-capacity PCI-e flash to cache disk data and improve data access performance, while IBM Netezza uses field-programmable gate arrays (FPGAs) as dedicated database accelerator cards to handle simple but computationally expensive operations such as decompression, projection and filtering, leaving more complex operations such as aggregation, joins and summarization to the multi-core CPUs. On the software side, the database system must optimize its software design for the special hardware architecture of the all-in-one machine, for example by optimizing the distributed data storage strategy, optimizing the query processing strategy for asymmetric clusters, and optimizing query processing for new flash devices and new accelerator cards (such as FPGA, GPU, Intel MIC Phi, etc.).

Data warehousing is the most important application area of database all-in-one machines. With the development of new storage and processing devices, the in-memory data warehouse is becoming an emerging platform for real-time OLAP analysis, and an in-memory data warehouse built on the database all-in-one machine architecture can better meet the demands of real-time OLAP on big data. Current in-memory data warehouse technology mainly targets homogeneous server clusters; research on optimization techniques for asymmetric server clusters and for new storage and computing devices is still immature. Therefore, how to systematically design an in-memory OLAP query processing framework that targets the characteristics of the in-memory data warehouse all-in-one machine architecture and of new storage and computing devices has become a technical problem in urgent need of a solution. The key issue is how to adapt to the hardware architecture of the in-memory data warehouse all-in-one machine, give full play to its hardware performance advantages, and improve the overall performance of in-memory OLAP.

Summary of the invention

In view of the above problems, the object of the present invention is to provide an in-memory data warehouse query processing implementation method for database all-in-one machines. The method meets the demand for in-memory OLAP acceleration under the asymmetric hardware architecture of the in-memory data warehouse all-in-one machine, gives full play to the machine's hardware performance advantages, and improves the overall performance of in-memory OLAP.

To achieve the above object, the present invention adopts the following technical solution: an in-memory data warehouse query processing implementation method for database all-in-one machines, characterized by comprising the following steps: 1) constructing an in-memory data warehouse data model; 2) constructing a distributed storage model for the in-memory data warehouse all-in-one machine; 3) applying a data update strategy on the high-performance computing server cluster: when the memory capacity of the high-performance computing server cluster is insufficient, the oldest data is evicted using a circular-queue update strategy and replaced with the newest data; 4) implementing OLAP query processing on the in-memory data warehouse all-in-one machine.

In step 1), the in-memory data warehouse data model is a fused multidimensional-relational OLAP model, constructed as follows: 1.1) Logical data model: the data cube structure of the data warehouse is divided into three kinds of data structures: dimensions, multidimensional indexes and measures. 1.2) Physical data model: dimensions are stored as dimension tables and dimension vectors; a dimension table uses a row-store or column-store database engine, and a dimension vector represents a dimension as an array whose subscripts map to dimension coordinates; the multidimensional index uses a column storage model; measures are stored as a fact table using column storage. 1.3) The multidimensional OLAP query model comprises three processing stages: dimension mapping, multidimensional index computation and aggregation computation.

In step 1.3), the specific process is as follows: 1.3.1) Dimension mapping: the OLAP query is mapped onto the relevant dimension tables to generate dimension vectors; the non-null values in a dimension vector identify the component values, on each relevant dimension, of the multidimensional data subset corresponding to the current OLAP query. 1.3.2) Multidimensional index computation: the multidimensional index is mapped onto the corresponding dimension vectors to filter the measure data multidimensionally, and a vector index is created that marks the multidimensional index entries satisfying the current OLAP query; a non-null value in the vector index represents the multidimensional address in the aggregate data cube constructed from the query's grouping attributes. Multidimensional filtering yields the set of measure data that satisfies the OLAP query conditions, and the vector index is created for that measure data. 1.3.3) Aggregation computation: grouped aggregation is performed on the measure data based on the vector index.

In step 2), the distributed storage model of the in-memory data warehouse all-in-one machine uses one of the following two distributed storage strategies: 2.1) dimension tables and the multidimensional index stored centrally, fact table distributed; 2.2) dimension tables stored centrally, multidimensional index and fact table distributed.

In step 2.1), the specific storage strategy is as follows: 2.1.1) the relatively small dimension tables are stored centrally on the high-performance computing server cluster; when the computing cluster is highly configured, the multidimensional index of the in-memory data warehouse is also stored centrally on the nodes of the high-performance computing server cluster; 2.1.2) the huge fact table data is horizontally sharded and distributed across the storage server cluster nodes; 2.1.3) the vector index generated by the multidimensional computation is transferred to the corresponding storage server cluster nodes, which complete the aggregation computation.

In step 2.2), the specific storage strategy is as follows: when the memory capacity of the high-performance computing server cluster is small relative to that of the storage server cluster and cannot hold the entire multidimensional index of the in-memory data warehouse, the dimension tables are stored centrally on the high-performance computing server cluster, while the multidimensional index and the fact table are horizontally sharded and distributed across the high-performance computing server cluster and the storage server cluster.

In step 4), the specific in-memory OLAP query processing method is as follows: 4.1) The OLAP query is executed on the high-performance computing server cluster: the OLAP query command is decomposed into dimension vector generation commands on the relevant dimension tables; the dimension table records are filtered, the grouping attributes are projected and dictionary-encoded, and the dictionary code is stored as the value of the dimension vector cell corresponding to each dimension table record; the dimension vector cells of records that do not satisfy the filter condition are set to null; in this way each dimension vector relevant to the OLAP query is created. 4.2) Under the strategy with a centrally stored multidimensional index and a distributed fact table, the multidimensional index is logically sharded according to the physical shards of the fact table. 4.3) Under the strategy with a distributed multidimensional index and fact table, each server node holds both the multidimensional index and the fact data of its shards; each server node downloads the dimension vectors from the high-performance computing server cluster to the local node and completes the OLAP computation locally. 4.4) When a server node is equipped with many-core coprocessor accelerator cards, the accelerator cards are used to accelerate the multidimensional index computation. 4.5) On the storage server node side, when the memory capacity is smaller than the data shard, optimization strategy one is used to complete the multidimensional index computation.

In step 4.2), the OLAP query comprises the following three steps: 4.2.1) Multidimensional filtering is computed on the multidimensional index using the dimension vectors generated by the OLAP query, producing the corresponding vector index; null cells in the vector index filter out fact table records, and non-null values represent the group code of the fact table record within the OLAP query. When the values at all positions to which the multidimensional index maps in the query's dimension vectors are non-null, the multidimensional coordinate in the grouping data cube given by those dimension vector values is converted to a one-dimensional coordinate and stored in the corresponding cell of the vector index. 4.2.2) The vector index is sent, by logical shard, to the corresponding nodes of the storage server cluster, where the measure columns are filtered by the vector index and aggregated. 4.2.3) The aggregation results on the storage server cluster nodes are sent back to the high-performance computing server cluster, which merges them into the global aggregation result; the multidimensional coordinates of the grouping data cube corresponding to the aggregation results are then mapped through each dimension vector's grouping dictionary table and converted back to grouping attributes, and the OLAP query result is output.
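The three sub-steps of 4.2) can be illustrated with a minimal Python sketch. It assumes a row-major conversion from multidimensional to one-dimensional coordinates, which the text does not specify, and all function names, the two-dimensional grouping cube and the sample values are hypothetical.

```python
from collections import Counter

def linearize(coords, shape):
    """Convert a grouping-cube coordinate to a 1-D address (row-major; an assumption)."""
    address = 0
    for coord, extent in zip(coords, shape):
        address = address * extent + coord
    return address

def delinearize(address, shape):
    """Inverse of linearize: recover the multidimensional coordinate."""
    coords = []
    for extent in reversed(shape):
        address, coord = divmod(address, extent)
        coords.append(coord)
    return tuple(reversed(coords))

def local_aggregate(measure_shard, vector_index_shard):
    """Step 4.2.2 on one storage node: filter by the vector index, then sum per group."""
    partial = Counter()
    for value, address in zip(measure_shard, vector_index_shard):
        if address is not None:        # a null cell filters the record out
            partial[address] += value
    return partial

def global_merge(partials, shape, dictionaries):
    """Step 4.2.3: merge partial results and decode addresses via the dictionary tables."""
    total = Counter()
    for partial in partials:
        total.update(partial)          # Counter.update adds values per key
    return {
        tuple(d[c] for d, c in zip(dictionaries, delinearize(address, shape))): value
        for address, value in total.items()
    }

# Hypothetical grouping cube: 2 regions x 3 years, with one dictionary table per dimension.
dictionaries = [["Asia", "Europe"], [1996, 1997, 1998]]
shape = (2, 3)

# Vector index as produced by step 4.2.1 (None marks a filtered record).
vector_index = [linearize((0, 2), shape), None,
                linearize((1, 0), shape), linearize((0, 2), shape)]
revenue = [5, 7, 11, 13]

# Two logical shards, matching two physical fact table shards.
partials = [
    local_aggregate(revenue[:2], vector_index[:2]),
    local_aggregate(revenue[2:], vector_index[2:]),
]
result = global_merge(partials, shape, dictionaries)
print(result)
```

Only the compact per-group partial results travel back to the computing cluster in this sketch, which is the reason the aggregation is placed next to the measure columns.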

In step 4.4), the specific steps are as follows: 4.4.1) The multidimensional index and the vector index are partitioned according to the memory capacity of the coprocessor accelerator card: following the principle of maximizing accelerator card memory utilization, the largest shard that fits in the accelerator card's memory is allocated and copied to the accelerator card memory. 4.4.2) At query execution time, the dimension vectors are copied to the accelerator card memory, and the accelerator card performs the dimension-vector-based multidimensional index computation, generates the vector index and copies it back to main memory to update the corresponding vector index shard. 4.4.3) For the multidimensional index shard in main memory, the CPU performs the dimension-vector-based multidimensional index computation and generates the corresponding vector index shard. 4.4.4) The CPU and the coprocessor accelerator card process different multidimensional index shards, and the computations on the two shards execute in parallel.
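The partitioning in step 4.4) can be sketched in Python as follows. This is only an illustration under stated assumptions: a worker thread stands in for the coprocessor accelerator card, a single multidimensional index column is used, and the names `split_for_coprocessor`, `index_computation` and `hybrid_index_computation` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_for_coprocessor(n_rows, bytes_per_row, device_memory_bytes):
    """Step 4.4.1: give the accelerator the largest shard that fits in its memory."""
    device_rows = min(n_rows, device_memory_bytes // bytes_per_row)
    return slice(0, device_rows), slice(device_rows, n_rows)

def index_computation(index_shard, dim_vector):
    """Map each index value to its dimension vector cell (None means filtered out)."""
    return [dim_vector[key] for key in index_shard]

def hybrid_index_computation(index_column, dim_vector, device_slice, host_slice):
    """Steps 4.4.2 to 4.4.4: both shards are processed in parallel, then concatenated."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Stand-in for the accelerator card working on its resident shard.
        device_future = pool.submit(index_computation, index_column[device_slice], dim_vector)
        # The CPU handles the shard that stays in main memory.
        host_future = pool.submit(index_computation, index_column[host_slice], dim_vector)
        return device_future.result() + host_future.result()

# Hypothetical data: one index column of 6 rows, 4 bytes per row, 16 bytes of device memory.
index_column = [0, 1, 2, 1, 0, 2]
dim_vector = [0, None, 1]

device_slice, host_slice = split_for_coprocessor(len(index_column), 4, 16)
vector_index = hybrid_index_computation(index_column, dim_vector, device_slice, host_slice)
print(device_slice, host_slice, vector_index)
```

On real hardware the device shard would be copied once and kept resident, so only the small dimension vectors and the resulting vector index shard cross the bus per query.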

In step 4.5), optimization strategy one is as follows: 4.5.1) When node memory can hold the multidimensional index and part of the measure columns, the multidimensional index is stored entirely in memory, and the fact data, with the column as the storage unit, keeps frequently accessed measure columns in memory according to the LRU algorithm while infrequently accessed measure columns are stored in flash. 4.5.2) When node memory cannot hold all multidimensional index columns, the multidimensional index is stored column by column in node memory or flash, and the LRU algorithm selects, column by column, the frequently used multidimensional index columns to keep in memory. 4.5.3) During multidimensional index computation, the dimension vector mapping is performed first on the multidimensional index columns in memory, and the vector index records the partial result computed from those in-memory columns; the positions of the non-null values in the vector index are then used as an index to access the multidimensional index columns in flash and complete the remaining multidimensional index computation.
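A minimal Python sketch of optimization strategy one follows. It makes several simplifying assumptions: a dictionary stands in for flash storage, the two-phase computation only filters (it does not compute group codes), and the class and function names are hypothetical.

```python
from collections import OrderedDict

class LruColumnCache:
    """Keeps the most recently used columns in memory; the rest stay in 'flash'."""

    def __init__(self, capacity, flash):
        self.capacity = capacity
        self.flash = flash              # dict: column name -> column data (flash stand-in)
        self.memory = OrderedDict()     # ordered from least to most recently used
        self.flash_reads = 0

    def get(self, name):
        if name in self.memory:
            self.memory.move_to_end(name)       # mark as most recently used
            return self.memory[name]
        column = self.flash[name]
        self.flash_reads += 1
        self.memory[name] = column
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)     # evict the least recently used column
        return column

def two_phase_filter(n_rows, memory_columns, flash_columns, dim_vectors):
    """Step 4.5.3: in-memory index columns first, then flash only at surviving rows."""
    alive = [True] * n_rows
    for name, column in memory_columns.items():
        vector = dim_vectors[name]
        for i in range(n_rows):
            if alive[i] and vector[column[i]] is None:
                alive[i] = False
    flash_cell_reads = 0
    for name, column in flash_columns.items():
        vector = dim_vectors[name]
        for i in range(n_rows):
            if alive[i]:                        # only non-null positions touch flash
                flash_cell_reads += 1
                if vector[column[i]] is None:
                    alive[i] = False
    return alive, flash_cell_reads

cache = LruColumnCache(capacity=2, flash={"a": [1], "b": [2], "c": [3]})
for name in ["a", "b", "a", "c"]:
    cache.get(name)

alive, flash_cell_reads = two_phase_filter(
    4,
    memory_columns={"date": [0, 1, 0, 1]},
    flash_columns={"customer": [0, 0, 1, 1]},
    dim_vectors={"date": [0, None], "customer": [None, 0]},
)
print(list(cache.memory), cache.flash_reads, alive, flash_cell_reads)
```

In this example the in-memory column eliminates half of the rows, so only two of the four flash-resident cells need to be read.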

By adopting the above technical solution, the present invention has the following advantages:

1. The invention constructs an in-memory OLAP data model oriented to the new computing and storage hardware of a database all-in-one machine, namely the high-performance computing server cluster, the storage server cluster, many-core coprocessor accelerator cards and flash memory. The conceptual data set of the data warehouse is divided into three classes of data (dimensions, multidimensional indexes and measures), which correspond respectively to the memory and computing resources of the high-performance computing server cluster and the many-core coprocessor accelerator cards, and to the memory, flash and computing resources of the storage server cluster, so that data storage and computing characteristics match the hardware characteristics of the database all-in-one machine. In-memory OLAP query processing is simplified to dimension mapping, multidimensional index computation and aggregation computation; the join, the most complex database operation, is converted into multidimensional index computation over simple vector data structures, which makes the data structures and algorithms better suited to programming many-core coprocessor accelerator cards and lets new computing hardware accelerate the core of OLAP processing. The OLAP query task is configured optimally across the high-performance computing server cluster, the many-core coprocessor accelerator cards and the storage server cluster of the database all-in-one machine, improving the utilization of the machine's asymmetric storage and computing resources and the overall performance of in-memory OLAP. Because the OLAP query processing task is decomposed into pipelined tasks across different computing platforms, the different processing stages of multiple queries can further be pipelined in parallel on the database all-in-one machine platform, improving the system's OLAP query throughput.

2. For the asymmetric server cluster architecture of the database all-in-one machine and its hardware configuration of flash memory and coprocessor accelerator cards, the invention proposes hardware-aware in-memory OLAP query optimization techniques that maximize the benefit that the hardware of the in-memory data warehouse all-in-one machine brings to in-memory OLAP performance.

3. In terms of the storage model, under the asymmetric hardware architecture of the in-memory data warehouse all-in-one machine, the invention uses a data distribution strategy that stores the smaller dimension and multidimensional index data centrally on the high-performance computing server cluster and the larger measure data on the storage server cluster, so that the data characteristics of the data warehouse match the storage capacity characteristics of the high-performance computing server cluster and the storage cluster.

4. In terms of the computation model, the invention accelerates in-memory OLAP query processing with many-core coprocessors, exploiting the strong parallel computing capability, low price and low energy consumption of many-core coprocessor accelerator cards (such as FPGA, GPU, Intel MIC Phi, etc.) to accelerate the multidimensional index computation stage of in-memory OLAP and improve overall OLAP query processing performance.

In conclusion the present invention is suitable for the memory OLAP application scenarios towards internal storage data warehouse all-in-one machine, Neng Goushi The memory OLAP performance under database all-in-one machine asymmetry hardware structure is answered to accelerate demand.

Brief description of the drawings

Fig. 1 is a schematic diagram of the hardware architecture of a database all-in-one machine;

Fig. 2 is a schematic diagram of the logical data model, physical data model and multidimensional OLAP computation model used in the present invention;

Fig. 3 is a schematic diagram of the storage strategy of the present invention in which the dimension tables and multidimensional index are stored centrally and the fact table is distributed;

Fig. 4 is a schematic diagram of the storage strategy of the present invention in which the dimension tables are stored centrally and the multidimensional index and fact table are distributed;

Fig. 5 is a schematic diagram of the data update strategy of the high-performance computing server cluster of the present invention;

Fig. 6 is a schematic diagram of the multidimensional index computation of the present invention on the CPU and many-core coprocessor architecture;

Fig. 7 is a schematic diagram of OLAP query processing of the present invention on the database all-in-one machine cluster;

Fig. 8 is a schematic diagram of the pipelined parallel execution of multiple queries in the present invention;

Fig. 9 is a schematic diagram of the OLAP query processing flow of an embodiment of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawings and embodiments.

The present invention provides an in-memory data warehouse query processing implementation method for database all-in-one machines. The method is optimized for the asymmetric hardware architecture of the database all-in-one machine and for new storage and computing hardware such as flash memory and many-core coprocessor accelerator cards, matching them to the characteristics of in-memory OLAP query processing and providing high-performance OLAP query processing for the in-memory data warehouse. The specific steps are as follows:

1) Construct the in-memory data warehouse data model:

As shown in Fig. 1, a database all-in-one machine generally uses an asymmetric hardware architecture, usually consisting of a high-performance computing server cluster and a storage server cluster. The high-performance computing server cluster has a higher hardware configuration, for example large-capacity memory or multiple high-performance many-core coprocessor accelerator cards. The storage server cluster usually has a lower hardware configuration, with relatively small memory and possibly a small number of coprocessor accelerator cards. Given these hardware characteristics, the high-performance cluster is mainly responsible for the main multidimensional computation tasks of the in-memory data warehouse, while the storage cluster is suited to data processing tasks of lower computational complexity.

In view of the hardware architecture of the database all-in-one machine, as shown in Fig. 2, the in-memory data warehouse data model of the present invention is a fused multidimensional-relational OLAP model, constructed as follows:

1.1) Logical data model

The data cube structure of the data warehouse is divided into three kinds of data structures: dimensions, multidimensional indexes and measures. Dimensions correspond to the coordinate axes of the multidimensional data cube of the in-memory data warehouse and are used to construct the conceptual data cube model of the data warehouse; the multidimensional index corresponds to the spatial coordinates of the fact data in the multidimensional data cube and maps measure data to its position in the multidimensional space of the cube; measures correspond to the individual attributes of the fact data.

1.2) Physical data model

In the physical data model, dimensions are stored as dimension tables and dimension vectors. A dimension table may use a row-store or column-store database engine, and each dimension table record maps to a unique coordinate value on the dimension; a dimension vector represents the dimension as an array whose subscripts map to dimension coordinates. The multidimensional index uses a column storage model: the multidimensional coordinates are stored as independent multidimensional index columns, each of which identifies one coordinate component of the fact data in the multidimensional data cube space. The vector index is an array of the same length as the measure columns and is used to retrieve the fact data corresponding to the multidimensional index. Measures are stored as a fact table, using column storage to improve the data compression ratio and analytical processing performance.

1.3) Multidimensional OLAP query model

An OLAP query is a multidimensional operation on the multidimensional data cube structure. OLAP query processing based on the multidimensional-relational OLAP model comprises three processing stages:

1.3.1) Dimension mapping: the OLAP query is mapped onto the relevant dimension tables to generate dimension vectors; the non-null values in a dimension vector identify the component values, on each relevant dimension, of the multidimensional data subset corresponding to the current OLAP query;

1.3.2) Multidimensional index computation: the multidimensional index is mapped onto the corresponding dimension vectors (each multidimensional index value is the array subscript into the relevant dimension vector) to filter the measure data multidimensionally, and a vector index is created that marks the multidimensional index entries satisfying the current OLAP query; a non-null value in the vector index represents the multidimensional address in the aggregate data cube constructed from the query's grouping attributes. Multidimensional filtering yields the set of measure data that satisfies the OLAP query conditions, and the vector index is created for that measure data;

1.3.3) Aggregation computation: grouped aggregation is performed on the measure data based on the vector index.
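The three stages can be traced on a toy star schema with the following Python sketch. The tables, column names and helper functions are hypothetical, `None` plays the role of the null value, and the group address is assumed to be a row-major combination of the per-dimension group codes.

```python
def dimension_mapping(dim_rows, keep, group_key):
    """Stage 1: build a dimension vector (group code per row, None if filtered out)."""
    codes = {}                                   # dictionary table: group value -> code
    vector = []
    for row in dim_rows:
        if keep(row):
            vector.append(codes.setdefault(group_key(row), len(codes)))
        else:
            vector.append(None)
    return vector, codes

def multidim_index_computation(index_columns, dim_vectors, cardinalities):
    """Stage 2: map each fact row through the dimension vectors into a vector index."""
    vector_index = []
    for i in range(len(index_columns[0])):
        address = 0
        for column, vector, cardinality in zip(index_columns, dim_vectors, cardinalities):
            code = vector[column[i]]             # index value = subscript into the vector
            if code is None:
                address = None                   # fact row fails the filter
                break
            address = address * cardinality + code
        vector_index.append(address)
    return vector_index

def aggregate(measure, vector_index):
    """Stage 3: grouped sum of a measure column driven by the vector index."""
    result = {}
    for value, address in zip(measure, vector_index):
        if address is not None:
            result[address] = result.get(address, 0) + value
    return result

# Dimension tables: the row position is the dimension coordinate.
customer = [{"region": "Asia"}, {"region": "Europe"}, {"region": "Asia"}]
date = [{"year": 1997}, {"year": 1998}]

# Fact table, column-wise: two multidimensional index columns and one measure column.
idx_customer = [0, 1, 2, 0, 1]
idx_date = [0, 0, 1, 1, 1]
revenue = [10, 20, 30, 40, 50]

# Query: total revenue of Asian customers, grouped by year.
cust_vec, cust_codes = dimension_mapping(customer, lambda r: r["region"] == "Asia", lambda r: "all")
date_vec, date_codes = dimension_mapping(date, lambda r: True, lambda r: r["year"])

vector_index = multidim_index_computation(
    [idx_customer, idx_date], [cust_vec, date_vec], [len(cust_codes), len(date_codes)])
totals = aggregate(revenue, vector_index)
print(cust_vec, date_vec, vector_index, totals)
```

Note that no join is executed: the fact rows reach the dimension data through array subscripts only, which is what makes the second stage a simple vector operation.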

2) Construct the distributed storage model of the in-memory data warehouse all-in-one machine:

In a data warehouse, dimension tables are usually small and grow slowly, whereas the fact table is huge and grows quickly, but fact table data is append-only (i.e. insert-only mode). Under the database all-in-one machine hardware architecture of the present invention, one of the following two distributed storage strategies is used, depending on the hardware configuration:

2.1) Dimension tables and multidimensional index stored centrally, fact table distributed:

2.1.1) As shown in Fig. 3, the relatively small dimension tables are stored centrally on the high-performance computing server cluster. When the computing cluster is highly configured, for example with large-capacity memory or multiple many-core coprocessor accelerator cards, the multidimensional index of the in-memory data warehouse is also stored centrally on the high-performance computing server cluster nodes, and the powerful computing performance of that cluster is used to complete the multidimensional computation tasks of in-memory OLAP queries;

2.1.2) The huge fact table data is horizontally sharded and distributed across the storage server cluster nodes;

2.1.3) The vector index generated by the multidimensional computation is transferred to the corresponding storage server cluster nodes, which complete the aggregation computation.

When the multidimensional index exceeds the storage capacity of the high-performance cluster nodes, the earliest multidimensional index data, in physical storage order, is demoted to cold data and distributed to the storage server cluster nodes holding the corresponding fact table shards, so that the computation on that part of the multidimensional index is pushed down to the storage server cluster nodes.

2.2) Dimension tables stored centrally, multidimensional index and fact table distributed:

As shown in Fig. 4, when the memory capacity of the high-performance computing server cluster is small relative to that of the storage server cluster and cannot hold the entire multidimensional index of the in-memory data warehouse, the dimension tables are stored centrally on the high-performance computing server cluster, while the multidimensional index and the fact table are horizontally sharded and distributed across the high-performance computing server cluster and the storage server cluster.

3) Data update strategy of the high-performance computing server cluster: when the memory capacity of the high-performance computing server cluster is insufficient, the oldest data is evicted using a circular-queue update strategy (as shown in Fig. 5) and replaced with the newest data. The details are as follows:

The multidimensional index and the fact data use column storage, and the columns are stored in units of row groups. The size of a row group column is an integral multiple of the flash I/O block size, and the row group size (e.g. 1M, 2M, 4M ... rows) is set according to column data access performance. Based on the storage strategy (the high-performance server cluster stores only the multidimensional index, or stores both the multidimensional index and fact data), the column data width and the server's available memory capacity, the maximum number n of row groups that memory can hold is calculated; newly added data is stored into the data columns in units of row groups. When the number of row groups exceeds a threshold, for example 90% of the maximum number of row groups, the column data of the earliest row group is synchronized to flash asynchronously; once all row groups are full, the earliest row group is reused as the storage unit for newly inserted data. The row groups as a whole form a circular queue: the row group at the tail of the queue receives new records, and the row group at the head of the queue is evicted to flash. The data evicted to flash is copied asynchronously to the storage server cluster nodes, and after synchronization completes the data shard in the flash of the high-performance computing server cluster node is deleted.
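The circular-queue behaviour can be sketched in Python as below. This is a simplified model under stated assumptions: row groups are represented by integer ids, the asynchronous copy to flash is performed synchronously, a plain list stands in for flash, and the class name `RowGroupRing` is hypothetical.

```python
from collections import deque

class RowGroupRing:
    """In-memory row groups managed as a circular queue with early flush to flash."""

    def __init__(self, max_groups, flush_percent=90):
        self.max_groups = max_groups
        self.flush_threshold = max_groups * flush_percent // 100
        self.memory = deque()      # head = oldest row group, tail = newest
        self.flash = []            # row groups already copied to flash (stand-in)

    def _flush_oldest(self):
        oldest = self.memory[0]
        if oldest not in self.flash:
            self.flash.append(oldest)    # asynchronous in the described method

    def append(self, row_group):
        if len(self.memory) == self.max_groups:
            # All slots are full: the head slot is reused for the new row group.
            self._flush_oldest()         # normally a no-op, the copy happened earlier
            self.memory.popleft()
        self.memory.append(row_group)
        if len(self.memory) > self.flush_threshold:
            self._flush_oldest()         # copy the head ahead of its eviction

# Memory holds at most 10 row groups; flushing starts above 90% occupancy.
ring = RowGroupRing(max_groups=10)
for group_id in range(12):
    ring.append(group_id)
print(list(ring.memory), ring.flash)
```

Flushing the head before memory is completely full means that, by the time its slot must be reused, the data is already safe on flash and the insert does not have to wait.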

In the storage strategy shown in Fig. 3, the high-performance computing server cluster stores the multidimensional index data centrally. Evicted multidimensional index row groups are synchronized from the flash of the high-performance computing server cluster node to the memory of the corresponding storage server node according to the distribution strategy of the fact data across the storage server cluster, so that multidimensional index row groups and the corresponding fact table row groups are kept on the same node and the computation on that part of the multidimensional index is pushed down to the storage server node. In the storage strategy shown in Fig. 4, the high-performance server nodes store both multidimensional index data and fact data. The memory data replacement strategy is shown in Fig. 5: an evicted row group consists of multidimensional index and fact data, and when the number of row groups in flash reaches a certain threshold (e.g. 32, 64, ...; the number of row groups determines the granularity of data replication to the storage servers), those row groups in flash are treated as one data shard and assigned to a storage server cluster node according to the data distribution strategy of the storage server cluster, completing the transfer of old data from the high-performance computing server cluster to the storage server cluster.
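The batching of evicted row groups into data shards can be sketched in Python as follows. The text leaves the data distribution strategy of the storage server cluster open, so round-robin assignment is used here purely as an assumption; the function name, node names and the small threshold are hypothetical.

```python
def batch_into_shards(evicted_row_groups, groups_per_shard, storage_nodes):
    """Group evicted row groups into shards and assign each shard to a storage node."""
    assignment = {}
    full_shards = len(evicted_row_groups) // groups_per_shard
    for shard_no in range(full_shards):
        start = shard_no * groups_per_shard
        shard = evicted_row_groups[start:start + groups_per_shard]
        node = storage_nodes[shard_no % len(storage_nodes)]   # round-robin (assumption)
        assignment.setdefault(node, []).append(shard)
    # Row groups below the threshold stay in flash until enough have accumulated.
    remainder = evicted_row_groups[full_shards * groups_per_shard:]
    return assignment, remainder

# Seven evicted row groups, shard threshold of 3 (the text suggests e.g. 32 or 64).
assignment, remainder = batch_into_shards(list(range(7)), 3, ["storage-0", "storage-1"])
print(assignment, remainder)
```

A larger threshold gives coarser replication units and fewer transfers, at the cost of old data staying longer in the flash of the computing cluster.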

4) Implement OLAP query processing on the in-memory data warehouse all-in-one machine:

The asymmetry in storage and processing capacity between the high-performance computing server cluster and the storage server cluster of the database all-in-one machine, the asymmetry in processing capacity between the processors and the many-core coprocessor accelerator cards inside a server node, and the asymmetry in capacity and performance between memory and flash all require OLAP query processing on the in-memory data warehouse all-in-one machine to be a loosely coupled distributed computing mechanism, in which different computation stages can be assigned to different storage and computing resources according to the hardware configuration. Taking into account the different hardware configurations and data distribution strategies of the in-memory data warehouse all-in-one machine, the specific in-memory OLAP query processing method is as follows:

4.1) The OLAP query is executed on the high-performance computing server cluster. The OLAP query command is decomposed into dimension vector generation commands on the relevant dimension tables. The dimension table records are filtered, the grouping attributes are projected, and the grouping attributes are dictionary-encoded. The dictionary code is stored as the value of the dimension vector cell corresponding to each dimension table record, and the dimension vector cells of records that do not satisfy the filter condition are set to null. In this way each dimension vector relevant to the OLAP query is created.

The group codes of the dimension vectors together define a grouping data cube; the group value in a dimension vector is the coordinate component of the grouping data cube on that dimension.
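Step 4.1 can be sketched as follows. This is an illustrative sketch only: the function name `build_dimension_vector`, the row representation as dictionaries, and the sample rows are assumptions, and `None` stands in for the null value.

```python
def build_dimension_vector(rows, predicate, group_attr):
    """Filter dimension rows and dictionary-encode the grouping attribute.

    Returns (dimension_vector, dictionary): the vector has one cell per
    dimension row (None when the row is filtered out), and the dictionary
    maps each code back to its grouping attribute value.
    """
    dictionary = []      # code -> attribute value
    code_of = {}         # attribute value -> code
    vector = []
    for row in rows:
        if not predicate(row):
            vector.append(None)          # row fails the filter: null cell
            continue
        value = row[group_attr]
        if value not in code_of:         # first occurrence gets the next code
            code_of[value] = len(dictionary)
            dictionary.append(value)
        vector.append(code_of[value])
    return vector, dictionary


# Made-up sample dimension rows, for illustration only.
customer = [
    {"c_region": "AMERICA", "c_nation": "Canada"},
    {"c_region": "ASIA", "c_nation": "China"},
    {"c_region": "AMERICA", "c_nation": "Brazil"},
    {"c_region": "AMERICA", "c_nation": "Canada"},
]
vector, dictionary = build_dimension_vector(
    customer, lambda r: r["c_region"] == "AMERICA", "c_nation"
)
```

Because the vector is positionally aligned with the dimension table, a foreign key used as an array offset reaches the group code directly, without a join.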

4.2) When the strategy of centralized multi-dimensional index storage and distributed fact table storage is used, the multi-dimensional index is logically sharded according to the physical sharding of the fact table. OLAP query processing comprises the following three steps:

4.2.1) Multi-dimensional filtering is performed on the multi-dimensional index according to the dimension vectors generated for the OLAP query, producing the corresponding vector index. Null cells in the vector index filter out fact table records; non-null values are the group codes of the fact table records in the OLAP query. When every position that the multi-dimensional index maps to in the query's dimension vectors holds a non-null value, the multi-dimensional coordinate of the grouping data cube given by the mapped dimension vector values is converted to a one-dimensional coordinate and stored in the corresponding cell of the vector index;

4.2.2) The created vector index is sent, by logical shard, to the corresponding nodes of the storage server cluster, as shown in Fig. 2; the measure columns are filtered by the vector index and aggregated;

4.2.3) The aggregation results on the storage server cluster nodes are sent back to the high-performance computing server cluster, where they are merged into the global aggregation result. The multi-dimensional coordinates of the grouping data cube corresponding to the aggregation results are then mapped to the grouping dictionary table of each dimension vector and converted to grouping attributes, and the OLAP query result is output.
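Steps 4.2.1 to 4.2.3 can be sketched end to end on toy data. The function names, the row-major linearization order, the use of SUM as the aggregate, and the two-node split are assumptions made for illustration; the sample index, dimension vector, and measure values are invented.

```python
def build_vector_index(index_columns, dim_vectors, cube_shape):
    """4.2.1: map each fact row to a 1-D cube coordinate, or None if filtered out."""
    vector_index = []
    for row in range(len(index_columns[0])):
        linear = 0
        for column, dim_vector, size in zip(index_columns, dim_vectors, cube_shape):
            code = dim_vector[column[row]]   # positional lookup replaces the join
            if code is None:                 # any null dimension cell rejects the row
                linear = None
                break
            linear = linear * size + code    # row-major linearization (assumed order)
        vector_index.append(linear)
    return vector_index


def aggregate_shard(vector_index_shard, measure_shard, cube_size):
    """4.2.2: on one storage node, sum the measure into the cells named by the vector index."""
    agg = [0] * cube_size
    for position, value in zip(vector_index_shard, measure_shard):
        if position is not None:
            agg[position] += value
    return agg


def merge_partials(partials):
    """4.2.3: merge per-node aggregates cell by cell on the compute cluster."""
    return [sum(cells) for cells in zip(*partials)]


customer_vector = [None, 0, 1]          # 2 group codes
supplier_vector = [0, 1, None, 2]       # 3 group codes
l_ck = [2, 0, 1, 2]                     # multi-dimensional index columns
l_sk = [0, 1, 2, 3]
revenue = [946, 10, 20, 54]             # measure column

vector_index = build_vector_index([l_ck, l_sk], [customer_vector, supplier_vector], (2, 3))
node0 = aggregate_shard(vector_index[:2], revenue[:2], 6)   # logical shard for node 0
node1 = aggregate_shard(vector_index[2:], revenue[2:], 6)   # logical shard for node 1
global_agg = merge_partials([node0, node1])
```

Only the compact vector index crosses the network in step 4.2.2; the measure columns never leave the storage nodes.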

4.3) When the strategy of distributed multi-dimensional index and fact table storage is used, each server node holds the complete multi-dimensional index and fact data for its shard. Each server node downloads the dimension vectors from the high-performance computing server cluster to the local node and completes the OLAP computation locally.

On the local node, multi-dimensional index computation, vector index generation, and aggregation can form a pipeline, which improves OLAP query processing performance. The locally aggregated results are returned to a high-performance computing server cluster node, which merges them into the global aggregation result and outputs the query result.

4.4) When a server node is configured with a many-core coprocessor accelerator card, the accelerator card is used to speed up multi-dimensional index computation, as follows:

4.4.1) The multi-dimensional index and the vector index are partitioned according to the memory capacity of the coprocessor accelerator card. Following the principle of maximizing accelerator card memory utilization, the largest shard that fits in the accelerator card memory is allocated and copied to the accelerator card memory;

4.4.2) When a query executes, the dimension vectors are copied to the coprocessor accelerator card memory. The accelerator card performs the multi-dimensional index computation based on dimension vector mapping, generates the vector index, copies it back to main memory, and updates the corresponding vector index shard;

4.4.3) For the multi-dimensional index shards held in main memory, the CPU performs the multi-dimensional index computation based on the dimension vectors and generates the corresponding vector index shards;

4.4.4) The CPU and the coprocessor accelerator card process different multi-dimensional index data shards, and the computations on the two multi-dimensional index shards can execute in parallel.
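The division of work in steps 4.4.1 to 4.4.4 can be sketched as below. This is a rough simulation under stated assumptions: no real accelerator API is used, two Python threads stand in for the accelerator card and the CPU, a single index column is mapped for brevity, and `bytes_per_row` (index cell plus vector index cell) is a hypothetical sizing parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def split_for_accelerator(n_rows, bytes_per_row, accel_memory_bytes):
    """4.4.1: give the accelerator the largest row range that fits in its memory."""
    accel_rows = min(n_rows, accel_memory_bytes // bytes_per_row)
    return range(0, accel_rows), range(accel_rows, n_rows)


def map_rows(index_column, dim_vector, rows):
    """Dimension vector mapping for one shard (same kernel on either device)."""
    return [dim_vector[index_column[row]] for row in rows]


def compute_vector_index(index_column, dim_vector, bytes_per_row, accel_memory_bytes):
    accel_rows, cpu_rows = split_for_accelerator(
        len(index_column), bytes_per_row, accel_memory_bytes
    )
    with ThreadPoolExecutor(max_workers=2) as pool:
        # 4.4.2: shard resident in accelerator memory (simulated by a thread).
        accel_part = pool.submit(map_rows, index_column, dim_vector, accel_rows)
        # 4.4.3: remaining shard computed by the CPU; 4.4.4: both run concurrently.
        cpu_part = pool.submit(map_rows, index_column, dim_vector, cpu_rows)
        return accel_part.result() + cpu_part.result()


result = compute_vector_index(
    index_column=[2, 0, 1, 2, 1],
    dim_vector=[None, 0, 1],
    bytes_per_row=8,
    accel_memory_bytes=24,
)
```

Since each shard writes a disjoint range of the vector index, the two devices need no synchronization beyond the final concatenation.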

4.5) On the storage server node side, when the memory capacity is smaller than the data shard, the following optimization strategies are used for multi-dimensional index computation:

4.5.1) When node memory can hold the multi-dimensional index and part of the measure columns, the multi-dimensional index is stored entirely in memory. Fact data uses the column as its storage unit: frequently accessed measure columns are kept in memory by the LRU (least recently used) algorithm, and infrequently accessed measure columns are stored in flash memory;

4.5.2) When node memory cannot hold all multi-dimensional index columns, the multi-dimensional index is stored column by column in the node server's memory or flash memory. The LRU algorithm selects, column by column, the frequently used multi-dimensional index columns to keep in memory;

4.5.3) During multi-dimensional index computation, the multi-dimensional index columns in memory perform the dimension vector mapping first, and the vector index records the partial result for these in-memory columns. The non-null positions in the vector index are then used as an index to access the multi-dimensional index columns in flash memory and complete the remaining multi-dimensional index computation.
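The two-phase computation of step 4.5.3 can be sketched as follows. The `FlashColumn` class (which only counts reads), the per-column stride representation of the cube coordinate, and all sample data are assumptions for illustration; LRU placement of columns is taken as already decided.

```python
class FlashColumn:
    """Hypothetical flash-resident index column that counts how often it is read."""

    def __init__(self, values):
        self._values = values
        self.reads = 0

    def __getitem__(self, row):
        self.reads += 1
        return self._values[row]


def apply_columns(vector_index, columns):
    """Fold index columns into the vector index, skipping rows that are already null."""
    for column, dim_vector, stride in columns:
        for row, partial in enumerate(vector_index):
            if partial is None:
                continue                      # filtered earlier: no column access at all
            code = dim_vector[column[row]]
            vector_index[row] = None if code is None else partial + code * stride
    return vector_index


def two_phase_vector_index(memory_columns, flash_columns, n_rows):
    vector_index = [0] * n_rows
    apply_columns(vector_index, memory_columns)   # phase 1: in-memory columns
    apply_columns(vector_index, flash_columns)    # phase 2: flash, non-null rows only
    return vector_index


l_ck = [2, 0, 1, 2]                       # kept in memory
l_sk = FlashColumn([0, 1, 2, 3])          # spilled to flash
customer_vector = [None, 0, 1]            # stride 3 in a 2 x 3 cube
supplier_vector = [0, 1, None, 2]         # stride 1

vector_index = two_phase_vector_index(
    [(l_ck, customer_vector, 3)], [(l_sk, supplier_vector, 1)], n_rows=4
)
```

The more selective the in-memory columns are, the fewer flash accesses the second phase needs; here one of four rows is rejected in memory, so the flash column is read three times.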

In conclusion memory database all-in-one machine OLAP query processing technique of the present invention draws OLAP query task It is divided into dimension mapping calculation, multi-dimensional indexing calculates and polymerize three flowing water of calculating and execute the stages, as shown in fig. 7, at OLAP query Dimension mapping calculation, the multi-dimensional indexing of reason calculate and polymerization calculates three calculation stages and is respectively distributed to high performance computing service device collection Group CPU, high performance computing service device cluster coprocessor and when storage server clustered node, the calculated result in each stage with Vector mode passes to next hardware platform and continues to execute.As shown in figure 8, the different execution stages of multiple OLAP queries can be with Flowing water is parallel, improves the utilization rate of each computing resource in database all-in-one machine asymmetry hardware platform, improves system queries and handles up Performance.The ideal conditions of flowing water parallel computation is that the calculating time of three phases is close, calculating time in each stage by data volume, The Multiple factors such as computation complexity, processor memory size, processor quantity, processor performance determine, need to match by optimization Setting hardware keeps the calculating time of three phases relatively uniform, improves the computational efficiency of database all-in-one machine hardware platform.

The present invention is further described below with reference to an embodiment.

As shown in Fig. 9, in this embodiment the entire OLAP query process is divided into three processing stages. The high-performance server cluster of the in-memory database all-in-one machine acts as the master node and receives OLAP queries.

In the dimension table processing stage, the CPUs of the high-performance server cluster apply the selection, projection, and grouping operations that the SQL command specifies on dimension tables to the corresponding dimension tables. The grouping attributes are projected and then compressed with a dictionary table that assigns a unique sequence number to each distinct value. The grouping projection of the dimension table is then turned into a grouping projection vector in which the dictionary codes replace the original grouping attribute values. For example, on the customer table the grouping attribute c_nation is projected under the WHERE clause c_region = 'AMERICA'; the attribute values 'Canada' and 'Brazil' receive the dictionary codes 0 and 1 respectively, which yields a dimension vector with a one-to-one positional mapping to the dimension table. Similarly, a dimension vector is generated on the supplier table, where the three members of the grouping attribute receive the dictionary codes 0, 1, and 2. The two dimension tables thus produce two dimension vectors.

In the multi-dimensional index computation stage, each multi-dimensional index value maps directly to the corresponding offset in its dimension vector, and the group value stored there is read. When any mapped position is null, the current fact table record does not satisfy the output condition of the query, and the corresponding vector index position is set to null. When the dimension vector positions mapped by both multi-dimensional index values are non-null, the corresponding group codes are stored as a multi-dimensional array subscript. For example, if the multi-dimensional index columns l_CK and l_SK of the first record hold the values 2 and 0, they map to dimension vector positions whose values are 1 and 0 respectively; the multi-dimensional array subscript A[1][0] is converted to the one-dimensional array subscript 3 (row-major: 1 × 3 + 0 = 3, since the supplier dimension has three group codes) and stored in the first position of the vector index.

When enough many-core coprocessor accelerator cards are configured, the multi-dimensional index computation is executed on the accelerator cards. The dimension vectors are copied to the accelerator card memory and, together with the multi-dimensional index columns stored there, are used to perform the multi-dimensional index computation; the resulting vector index is copied back to main memory. When the accelerator card memory cannot accommodate the entire multi-dimensional index computation, the computation tasks can run concurrently on the multi-dimensional index column shards in main memory and in accelerator card memory.

The generated vector index is used for aggregation on the measure columns. It is divided into vector shards that correspond to the measure data shards, and each vector shard is transmitted to the corresponding node of the storage server cluster.

In the aggregation stage, the vector index is scanned sequentially, and each non-null position of the vector index is used to access the corresponding record position of the measure column for aggregation. For example, scanning the first cell of the vector index reads the value 3; the first cell of the measure column l_revenue is accessed, and its measure value 946 is mapped to the corresponding cell A[1][0] (or A[3]) of the multi-dimensional array Agg and accumulated there.

After all aggregation is complete, the multi-dimensional array Agg is obtained. The multi-dimensional arrays of the storage server nodes are merged on a high-performance computing server cluster node, and each array cell subscript is mapped to the corresponding position of the dimension table dictionary tables to read the actual grouping attribute values and generate the query result records. For example, A[1][0] corresponds to the nation value Brazil in the customer table and the nation value Japan in the supplier table; the multi-dimensional array subscript is restored to grouping attribute values, which are combined with the aggregate value in the array cell to form an output record.
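The final decoding step can be sketched as below, assuming the row-major layout implied by A[1][0] mapping to subscript 3. The function names and the use of `None` for untouched cells are assumptions; only the codes for Canada, Brazil, and Japan come from the example, and the other two supplier nations are placeholders.

```python
def unravel(linear, cube_shape):
    """Convert a 1-D (row-major) subscript back to multi-dimensional coordinates."""
    coords = []
    for size in reversed(cube_shape):
        linear, component = divmod(linear, size)
        coords.append(component)
    return tuple(reversed(coords))


def decode_result(agg, cube_shape, dictionaries):
    """Turn non-empty cells of the merged aggregate array into output records."""
    records = []
    for linear, total in enumerate(agg):
        if total is None:                 # cell never touched by any fact row
            continue
        coords = unravel(linear, cube_shape)
        groups = tuple(d[c] for d, c in zip(dictionaries, coords))
        records.append(groups + (total,))
    return records


customer_dictionary = ["Canada", "Brazil"]
supplier_dictionary = ["Japan", "China", "India"]   # last two are placeholders
agg = [None, None, None, 946, None, None]

records = decode_result(agg, (2, 3), [customer_dictionary, supplier_dictionary])
```

Using `None` rather than 0 for empty cells keeps groups whose aggregate happens to be zero distinguishable from groups that never occurred.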

Multi-dimensional index computation is the stage that accounts for the largest share of OLAP query execution time. In this stage the algorithm uses fixed-length dimension vectors, multi-dimensional index columns, and vector indexes, and the join operation is reduced to a positional mapping of the multi-dimensional index onto the dimension vectors. An algorithm design based on array access is better suited to the hardware characteristics of many-core coprocessor accelerator cards, which integrate simple cores on a large scale, and makes better use of their parallel computing capability. Under the in-memory data warehouse all-in-one machine architecture, multi-dimensional index computation is designed as an independent computation process, so new many-core coprocessor accelerator cards can be used to further improve its performance. The generated vector index significantly improves aggregation performance on the measure data at the storage server nodes, reducing the computational complexity on those nodes and improving aggregation efficiency.

In conclusion database all-in-one machine is a kind of asymmetric hardware structure, high-end calculation server cluster and low side are deposited It stores up server cluster to service respectively for high-performance complicated calculations and the access of high extension storage, novel flash memory and many-core coprocessor Accelerator card hardware technology further improves the storage and calculated performance of database all-in-one machine.For internal storage data warehouse applications Speech improves memory real-time OLAP query processing performance needs according to the storage and calculated performance feature of different hardware, targetedly Ground optimizes distributed data storage and distribution calculating task, utilizes advanced hardware-accelerated OLAP query process performance.The present invention towards Database all-in-one machine asymmetry hardware structure and devise multi-dimensional relation OLAP model, data warehouse is divided into lesser dimension Degree, medium sized multi-dimensional indexing and biggish metric data three parts are handled with high performance computing service device cluster, many-core association The storage capacity of device accelerator card memory and storage server cluster matches, and optimizes distributed data storage strategy；Meanwhile it will OLAP query treatment process is decomposed into dimension mapping calculation, multi-dimensional indexing calculates and polymerization calculates three phases, at OLAP query The main cost that calculates of reason focuses on multi-dimensional indexing calculation stages, and hardware-accelerated more by novel many-core coprocessor accelerator card Dimension index calculating process, promotes memory OLAP query processing performance by advanced hardware.

The above embodiments are merely illustrative of the present invention. The data structures, data types, application positions, and implementation techniques of the components may vary. Any improvement or equivalent transformation of individual components that is made on the basis of the technical solution of the present invention and in accordance with its principles shall not be excluded from the scope of protection of the present invention.

Claims

1. An in-memory data warehouse query processing implementation method for a database all-in-one machine, characterized by comprising the following steps:

1) Construct the in-memory data warehouse storage model;

The in-memory data warehouse storage model uses a fused multi-dimensional relational OLAP model, which is constructed as follows:

1.1) Data warehouse logical data model: the cube structure is divided into three data structures, namely dimensions, the multi-dimensional index, and measures;

1.2) Physical data model: dimensions are stored as dimension tables and dimension vectors; dimension tables are stored using a row-store or column-store database engine; dimension vectors represent dimensions with an array structure, and array subscripts are mapped to dimension coordinates; the multi-dimensional index uses a column storage model; measures are stored as fact tables using column storage;

1.3) The multi-dimensional OLAP query model comprises three processing stages: dimension mapping, multi-dimensional index computation, and aggregation;

2) Construct the distributed storage model of the in-memory data warehouse all-in-one machine;

The distributed storage model of the in-memory data warehouse all-in-one machine uses the following two distributed storage strategies:

2.1.1) Dimension tables are stored centrally on the high-performance computing server cluster; when the computing cluster is configured with large-capacity memory and multiple many-core coprocessor accelerator card devices, the multi-dimensional index of the in-memory data warehouse is stored centrally on the high-performance computing server cluster nodes;

2.1.2) Fact table data is distributed across the storage server cluster nodes by horizontal sharding;

2.1.3) The vector index generated by the multi-dimensional computation is transmitted to the corresponding storage server cluster nodes, where aggregation is completed;

When the memory capacity of the high-performance computing server cluster is small relative to that of the storage server cluster and cannot hold all the multi-dimensional index data of the in-memory data warehouse, dimension tables are stored centrally on the high-performance computing server cluster, and the multi-dimensional index and the fact table are distributed together, by horizontal sharding, across the high-performance computing server cluster and the storage server cluster;

3) Data update strategy of the high-performance computing server cluster: when the memory capacity of the high-performance computing server cluster is insufficient, a circular queue update strategy is used to evict the oldest data and replace it with the newest data;

4) Implement OLAP query processing on the in-memory data warehouse all-in-one machine, comprising the following steps:

4.1) The OLAP query is executed on the high-performance computing server cluster; the OLAP query command is decomposed into dimension vector generation commands on the relevant dimension tables; the dimension table records are filtered, the grouping attributes are projected, and the grouping attributes are dictionary-encoded; the dictionary code is stored as the value of the dimension vector cell corresponding to each dimension table record, and the dimension vector cells of records that do not satisfy the filter condition are set to null, thereby creating each dimension vector relevant to the OLAP query;

4.2) When the strategy of centralized multi-dimensional index storage and distributed fact table storage is used, the multi-dimensional index is logically sharded according to the physical sharding of the fact table;

4.3) When the strategy of distributed multi-dimensional index and fact table storage is used, each server node holds the complete multi-dimensional index and fact data for its shard; each server node downloads the dimension vectors from the high-performance computing server cluster to the local node and completes the OLAP computation locally;

4.4) When a server node is configured with a many-core coprocessor accelerator card, the accelerator card is used to speed up multi-dimensional index computation;

4.5) On the storage server node side, when the memory capacity is smaller than the data shard, an optimization strategy is used to complete the multi-dimensional index computation.

2. The in-memory data warehouse query processing implementation method for a database all-in-one machine according to claim 1, characterized in that the specific process of step 1.3) is as follows:

1.3.1) Dimension mapping: the OLAP query is mapped to the relevant dimension tables to generate dimension vectors; the non-null values in a dimension vector identify the component values, on each relevant dimension, of the multi-dimensional data subset corresponding to the current OLAP query;

1.3.2) Multi-dimensional index computation: the multi-dimensional index is mapped to the corresponding dimension vectors to perform multi-dimensional filtering of the measure data and to create a vector index that marks the multi-dimensional index entries satisfying the current OLAP query; the non-null values in the vector index represent multi-dimensional addresses in the aggregation data cube constructed from the grouping attributes of the OLAP query; multi-dimensional filtering yields the measure data set that satisfies the OLAP query conditions, and the vector index is created for this measure data;

3. The in-memory data warehouse query processing implementation method for a database all-in-one machine according to claim 1, characterized in that in step 4.2) the OLAP query processing comprises the following three steps:

4.2.1) Multi-dimensional filtering is performed on the multi-dimensional index according to the dimension vectors generated for the OLAP query, producing the corresponding vector index; null cells in the vector index filter out fact table records, and non-null values are the group codes of the fact table records in the OLAP query; when every position that the multi-dimensional index maps to in the query's dimension vectors holds a non-null value, the multi-dimensional coordinate of the grouping data cube given by the mapped dimension vector values is converted to a one-dimensional coordinate and stored in the corresponding cell of the vector index;

4.2.2) The created vector index is sent, by logical shard, to the corresponding nodes of the storage server cluster; the measure columns are filtered by the vector index and aggregated;

4.2.3) The aggregation results on the storage server cluster nodes are sent back to the high-performance computing server cluster and merged into the global aggregation result; the multi-dimensional coordinates of the grouping data cube corresponding to the aggregation results are mapped to the grouping dictionary table of each dimension vector and converted to grouping attributes, and the OLAP query result is output.

4. The in-memory data warehouse query processing implementation method for a database all-in-one machine according to claim 1, characterized in that the specific steps of step 4.4) are as follows:

4.4.1) The multi-dimensional index and the vector index are partitioned according to the memory capacity of the coprocessor accelerator card; following the principle of maximizing accelerator card memory utilization, the largest shard that fits in the accelerator card memory is allocated and copied to the accelerator card memory;

4.4.2) When a query executes, the dimension vectors are copied to the coprocessor accelerator card memory; the accelerator card performs the multi-dimensional index computation based on dimension vector mapping, generates the vector index, copies it back to main memory, and updates the corresponding vector index shard;

4.4.3) For the multi-dimensional index shards held in main memory, the CPU performs the multi-dimensional index computation based on the dimension vectors and generates the corresponding vector index shards;

4.4.4) The CPU and the coprocessor accelerator card process different multi-dimensional index data shards, and the computations on the two multi-dimensional index shards execute in parallel.

5. The in-memory data warehouse query processing implementation method for a database all-in-one machine according to claim 1, characterized in that the optimization strategy in step 4.5) is as follows:

4.5.1) When node memory can hold the multi-dimensional index and part of the measure columns, the multi-dimensional index is stored entirely in memory; fact data uses the column as its storage unit, frequently accessed measure columns are kept in memory by the LRU algorithm, and infrequently accessed measure columns are stored in flash memory;

4.5.2) When node memory cannot hold all multi-dimensional index columns, the multi-dimensional index is stored column by column in the node server's memory or flash memory; the LRU algorithm selects, column by column, the frequently used multi-dimensional index columns to keep in memory;

4.5.3) During multi-dimensional index computation, the multi-dimensional index columns in memory perform the dimension vector mapping first, and the vector index records the partial result for these in-memory columns; the non-null positions in the vector index are then used as an index to access the multi-dimensional index columns in flash memory and complete the remaining multi-dimensional index computation.