CN106528070B

CN106528070B - Data table generation method and device

Info

Publication number: CN106528070B
Application number: CN201510587686.3A
Authority: CN
Inventors: 吴勇军
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2015-09-15
Filing date: 2015-09-15
Publication date: 2019-09-03
Anticipated expiration: 2035-09-15
Also published as: CN106528070A

Abstract

This application discloses a data table generation method. First, a current data table task is generated according to the structural metadata information of an incremental table and a preset task template. After the scheduling information of the data table task has been configured according to the task template, the table creation statement and the initialization script can be executed according to the data table task and the scheduling information. Data tables can thus be built quickly and accurately, which reduces manual effort and improves the efficiency of data warehouse construction.

Description

Data table generation method and device

Technical field

This application relates to the field of communication technology, and in particular to a data table generation method. The application also relates to a data table generation device.

Background art

A data warehouse (Data Warehouse, also known as DW or DWH) is a strategic collection that provides support with all types of data for decision-making processes at every level of an enterprise. It is an environment that provides users with current and historical data for decision support, data that is difficult or impossible to obtain from traditional operational databases. Data warehouse technology is the general term for the techniques and modules that effectively integrate operational data into a unified environment to provide decision-oriented data access. Its ultimate purpose is to let users query the information they need faster and more conveniently, and to provide decision support.

When building a data warehouse, the data extracted from business system databases is first placed in a base layer (also known as the ODS layer), a data layer that stays close to the source, to facilitate subsequent data integration. In current data warehouse construction, building the tables of the ODS layer is an important part of base-layer construction: the incremental data brought in by the synchronization center must be merged into a full copy of the data, which supports later functions such as retaining history, data integration, data analysis and data applications.

Currently, establishing the tables required by the base layer means writing scripts and configuring scheduling information after the table creation statements have been built in production; only then can the scripts and tasks be published and executed. In the course of realizing this application, the inventor found that existing task scripts are numerous and varied, so technical staff easily omit steps or make mistakes when using them. Moreover, generating data tables is basic work that has to be done entirely by hand, which consumes considerable development resources and is inefficient.

Hence, how to automatically generate the tables associated with the ODS layer within the existing data warehouse construction process, so as to reduce manual effort and improve construction efficiency, has become a technical problem that those skilled in the art urgently need to solve.

Summary of the invention

This application provides a data table generation method for efficiently and accurately establishing base-layer data tables for an existing database, so as to reduce manual effort and improve data warehouse construction efficiency. The method comprises:

generating a current data table task according to the structural metadata information of an incremental table and a preset task template;

configuring the scheduling information of the data table task according to the task template;

executing the table creation statement and the initialization script according to the data table task and the scheduling information, so as to generate a data table.

Correspondingly, this application also provides a data table generation device, comprising:

a generation module, configured to generate a current data table task according to the structural metadata information of an incremental table and a preset task template;

a configuration module, configured to configure the scheduling information of the data table task according to the task template;

an execution module, configured to execute the table creation statement and the initialization script according to the data table task and the scheduling information, so as to generate a data table.

With the technical solution of this application, a current data table task is first generated according to the structural metadata information of the incremental table and the preset task template. After the scheduling information of the data table task has been configured according to the task template, the table creation statement and the initialization script can be executed according to the data table task and the scheduling information. Data table generation is thus completed quickly and accurately, reducing manual effort and improving data warehouse construction efficiency.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a data table generation method proposed by this application;

Fig. 2 is a schematic flowchart of a data table generation method according to a specific embodiment of this application;

Fig. 3 is a schematic structural diagram of a data table generation device proposed by this application.

Detailed description of embodiments

To explain the technical solution of this application more clearly, some concepts of current data warehouses are introduced first:

(1) Table

Tables are the most important component of a data warehouse. A table record consists of a key and attribute data (for example, an employee table consists of the employee number (key) and employee attributes such as name and age). In the technical solution of this application, data warehouse construction involves the following two types of tables:

Incremental table: to improve performance, large tables are synchronized incrementally based on a record-modification timestamp field (usually gmt_modify). Each snapshot of an incremental table retains one copy of the incremental data. The table is named tablename_{yyyymmdd}_delta or tablename_delta (partition field dt=yyyymmdd);

Full table: each snapshot retains one full copy of the data. It can be synchronized in full from the production database, or produced by a full outer join of the incremental data synchronized from the production database with yesterday's snapshot of the full table, keeping the newest full copy. The structure of the full table is consistent with that of the incremental table. The table is named tablename_{yyyymmdd} or tablename (partition field dt=yyyymmdd).
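The two naming schemes above can be captured in a few helpers. The following is a minimal Python sketch under stated assumptions: snapshot dates are `datetime.date` values, and the helper names `delta_table_name`, `full_table_name` and `partition_spec` are hypothetical, introduced only for illustration.

```python
from datetime import date
from typing import Optional


def delta_table_name(table: str, snapshot: Optional[date] = None) -> str:
    """Incremental table: tablename_{yyyymmdd}_delta, or tablename_delta
    when snapshots are kept in the partition field dt instead."""
    if snapshot is None:
        return f"{table}_delta"
    return f"{table}_{snapshot:%Y%m%d}_delta"


def full_table_name(table: str, snapshot: Optional[date] = None) -> str:
    """Full table: tablename_{yyyymmdd}, or plain tablename when partitioned."""
    if snapshot is None:
        return table
    return f"{table}_{snapshot:%Y%m%d}"


def partition_spec(snapshot: date) -> str:
    """Partition field used by the partitioned variants: dt=yyyymmdd."""
    return f"dt={snapshot:%Y%m%d}"
```

For example, `delta_table_name("orders", date(2015, 9, 15))` yields `orders_20150915_delta`, while the partitioned variant keeps the bare name `orders_delta` and carries the date as `dt=20150915`.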

(2) View

A view is a virtual table whose content is defined by a query. Like a real table, a view contains a set of named columns and rows of data. Note, however, that a view does not exist in the database as a stored set of data values.

(3) Metadata

Metadata is data that describes data: essentially, descriptive information about data and information resources, including business table structure information, data warehouse table structure information, and so on. Important metadata in data warehouse construction includes scheduling metadata, SQL execution log metadata, table structure metadata, synchronization center metadata and scheduled task metadata.

(4) Merge task

The Merge task is an indispensable mechanism in data warehouse construction. It merges the data of the incremental table with yesterday's snapshot of the full table to produce the newest full snapshot.
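The effect of a Merge task can be illustrated in memory. The sketch below is a simplification under explicit assumptions, not the patent's implementation: each table is a Python dict keyed by primary key, a row present in the incremental data always supersedes yesterday's row, and physical deletes are ignored. The function name `merge_snapshot` is hypothetical.

```python
from typing import Dict, Hashable, Mapping

Row = Dict[str, object]


def merge_snapshot(
    yesterday_full: Mapping[Hashable, Row],
    today_delta: Mapping[Hashable, Row],
) -> Dict[Hashable, Row]:
    """Combine yesterday's full snapshot with today's incremental rows.

    Behaves like a full outer join on the primary key: keys present on either
    side are kept, and when a key is present on both sides the incremental
    (newer) row replaces yesterday's row. Physical deletes are not handled.
    """
    merged: Dict[Hashable, Row] = dict(yesterday_full)  # shallow copy; inputs untouched
    merged.update(today_delta)  # adds new keys and overwrites modified ones
    return merged
```

A production Merge task does the same thing in SQL over partitions; the dictionary version only makes the "newest row wins per key" rule explicit.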

(5) Synchronization center

The synchronization center is an apparatus or device that synchronizes production data into the data warehouse, or flows data warehouse data back to the production system.

Based on the above and on the background of this application: when a data warehouse is built or restructured, a base layer must be established according to the data model architecture. Each incremental data table synchronized from the production database needs a corresponding full table in the ODS layer of the data warehouse to retain the newest full snapshot, and this process involves generating the full table's creation statement, the merge task script, the scheduling dependencies, data initialization, publication and other operations. The technical solution of this application aims to mass-produce base-layer data tables (full tables) through merge tasks, reducing the complexity of data warehouse construction and improving development efficiency while safeguarding work quality.

Fig. 1 is a schematic flowchart of the data table generation method proposed by this application, which comprises the following steps:

S101: generate a current data table task according to the structural metadata information of the incremental table and a preset task template.

As described in the background, the base layer of an existing database requires many full tables and their configuration is cumbersome. Therefore, in a preferred embodiment, this application generates data tables of the base-layer full table type. So that the data tables can later be produced quickly, technical staff can perform a unified data table generation initialization before this step, that is, generate the table creation statement and the data initialization script corresponding to the data table according to the structural metadata information.

In a specific application scenario, incremental data tables are named ods_{source system table name}_delta, and full data tables are named ods_{source system table name}.
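Generating the full table's creation statement and initialization script from structural metadata can be sketched as plain string generation. This is a hedged Python sketch assuming Hive/ODPS-style SQL, a column list taken from the incremental table's metadata, and that the first incremental partition holds a complete extract used to seed the full table; `Column`, `build_create_table` and `build_init_script` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    type: str


def full_table(source_table: str) -> str:
    return f"ods_{source_table}"          # ods_{source system table name}


def delta_table(source_table: str) -> str:
    return f"ods_{source_table}_delta"    # ods_{source system table name}_delta


def build_create_table(source_table: str, columns: List[Column]) -> str:
    """The full table mirrors the incremental table's columns, partitioned by dt."""
    body = ",\n".join(f"  {c.name} {c.type}" for c in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {full_table(source_table)} (\n"
        f"{body}\n"
        f") PARTITIONED BY (dt STRING);"
    )


def build_init_script(source_table: str, columns: List[Column], first_dt: str) -> str:
    """Seed the first full partition (assumes the first delta partition is a full extract)."""
    cols = ", ".join(c.name for c in columns)
    return (
        f"INSERT OVERWRITE TABLE {full_table(source_table)} PARTITION (dt='{first_dt}')\n"
        f"SELECT {cols} FROM {delta_table(source_table)} WHERE dt='{first_dt}';"
    )
```

Because both statements are derived from the same column list, the full table's structure stays consistent with the incremental table, as the description requires.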

Since most existing base-layer full tables are implemented through full-table merge tasks, in a preferred embodiment of this application the task template can be configured by technical staff as a merge task template. In this step, merge task code is generated from the incremental table's structural metadata information and the batch merge task template, and the merge task code is then uploaded to a preset code library as the data table task. In a specific application scenario, the format of the merge task template for mass-producing base-layer full tables is shown in Table 1 below:

Table 1
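As an illustration only, generating merge task code from a template can be sketched as follows. The template below (`MERGE_TEMPLATE`) is a hypothetical stand-in, not the format of Table 1, and the `render_merge_task` function and its parameters (`pk`, `bizdate`, `prev_date`) are likewise assumptions. The Hive/ODPS-style SQL it produces takes the incremental row whenever the key appears in today's delta partition.

```python
from string import Template
from typing import List

# Hypothetical merge task template; $-placeholders are filled per table.
MERGE_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE $full_table PARTITION (dt='$bizdate')\n"
    "SELECT $select_list\n"
    "FROM (SELECT * FROM $full_table WHERE dt='$prev_date') f\n"
    "FULL OUTER JOIN (SELECT * FROM $delta_table WHERE dt='$bizdate') d\n"
    "ON f.$pk = d.$pk;"
)


def render_merge_task(
    source_table: str, columns: List[str], pk: str, bizdate: str, prev_date: str
) -> str:
    """Render merge task code for one incremental table.

    For each column, take the incremental value when the key exists in today's
    delta partition, otherwise keep yesterday's full-snapshot value.
    """
    select_list = ",\n       ".join(
        f"CASE WHEN d.{pk} IS NOT NULL THEN d.{c} ELSE f.{c} END AS {c}"
        for c in columns
    )
    return MERGE_TEMPLATE.substitute(
        full_table=f"ods_{source_table}",
        delta_table=f"ods_{source_table}_delta",
        select_list=select_list,
        pk=pk,
        bizdate=bizdate,
        prev_date=prev_date,
    )
```

Selecting by `d.{pk} IS NOT NULL` rather than per-column `COALESCE` keeps a whole row consistent: a column that was legitimately set to NULL in the source is not silently replaced by yesterday's value.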

Further, the method and apparatus for establishing ODS-layer full tables of a data warehouse form a highly integrated scheme that involves a series of processes such as metadata, the cloud code library, scheduling dependencies, data merge tasks, data initialization and going online. Therefore, to further raise the level of automation and processing efficiency, the incremental table's structural metadata information can be preprocessed before the current data table task is generated from it and the preset task template. Specifically, in a preferred embodiment of this application, before this step the incremental table is synchronized to the data warehouse and its structural metadata information is obtained, thereby establishing a metadata service that can provide the table structure information of the data warehouse and facilitate subsequent processing.

Fig. 2 is a schematic flowchart of the data table generation method according to a specific embodiment of this application. Besides the synchronization center module and the metadata module described above, the figure involves the following modules:

Middle layer: the data layer in data warehouse construction that integrates the synchronized base-layer data into a consolidated copy, so that subsequent applications can conveniently use business data;

Cloud: the integrated development environment for data warehouse development, in which data synchronization configuration, model design, ETL development, unit testing, task publication, operations and maintenance and similar operations can be carried out;

Scheduling: the system that automatically executes data warehouse tasks according to their configuration.

Based on the above modules, the embodiment performs the following steps in the early stage:

Step 1: synchronize the incremental data to the data warehouse through the synchronization center;

Step 2.1: the metadata module generates the full table's creation statement according to the incremental table's structural information and sends it to the cloud module;

Step 2.2: the metadata module generates the Merge task according to the template and the incremental table's structure.
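The early-stage steps 1, 2.1 and 2.2 can be sketched as a small orchestration function. In this Python sketch the metadata module is modeled as an in-memory dictionary, the cloud module as a callback (`send_to_cloud`), and the Merge task as the task name plus the column list a template would consume; `MetadataModule`, `run_early_stage` and the `mrg_ods_...` name are hypothetical illustrations.

```python
from typing import Callable, Dict, List, Tuple

Columns = List[Tuple[str, str]]  # (column name, column type)


class MetadataModule:
    """Hypothetical in-memory stand-in for the metadata module."""

    def __init__(self) -> None:
        self._tables: Dict[str, Columns] = {}

    def register(self, table: str, columns: Columns) -> None:
        self._tables[table] = list(columns)

    def structure(self, table: str) -> Columns:
        return self._tables[table]


def run_early_stage(
    meta: MetadataModule,
    source_table: str,
    synced_columns: Columns,
    send_to_cloud: Callable[[str], None],
) -> Dict[str, object]:
    # Step 1: the sync center lands the delta table; record its structure.
    delta = f"ods_{source_table}_delta"
    meta.register(delta, synced_columns)

    # Step 2.1: derive the full table DDL from the delta structure and send it to the cloud module.
    cols = meta.structure(delta)
    col_sql = ", ".join(f"{name} {typ}" for name, typ in cols)
    ddl = (f"CREATE TABLE IF NOT EXISTS ods_{source_table} ({col_sql}) "
           "PARTITIONED BY (dt STRING);")
    send_to_cloud(ddl)

    # Step 2.2: collect what a merge template needs (task name and column list).
    merge_task = {"name": f"mrg_ods_{source_table}", "columns": [n for n, _ in cols]}
    return {"delta_table": delta, "ddl": ddl, "merge_task": merge_task}
```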

In the above steps, incremental data tables are named ods_{source system table name}_delta and full data tables are named ods_{source system table name}; each module creates and queries tables by these names. During synchronization by the synchronization center, synchronization tasks are named imp_{ODPS table name}, and merge tasks are named mrg_{ODPS table name}.
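These conventions make every artifact derivable from the source system table name. The Python sketch below assumes that the `{ODPS table name}` in the `imp_` and `mrg_` prefixes is the table the task writes (the incremental table for synchronization, the full table for merging); that reading, and the helpers `artifact_names` and `classify`, are assumptions for illustration.

```python
from typing import Dict, Optional


def artifact_names(source_table: str) -> Dict[str, str]:
    delta = f"ods_{source_table}_delta"
    full = f"ods_{source_table}"
    return {
        "delta_table": delta,
        "full_table": full,
        # Assumption: each task is named after the ODPS table it writes.
        "sync_task": f"imp_{delta}",
        "merge_task": f"mrg_{full}",
    }


def classify(name: str) -> Optional[str]:
    """Recover the artifact kind from a name; task prefixes are checked first."""
    if name.startswith("imp_"):
        return "sync_task"
    if name.startswith("mrg_"):
        return "merge_task"
    if name.startswith("ods_") and name.endswith("_delta"):
        return "delta_table"
    if name.startswith("ods_"):
        return "full_table"
    return None  # not an ODS-layer artifact under these conventions
```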

The metadata module can provide the table structure information of the data warehouse, and the cloud code library module can generate merge task code from the incremental table's structural metadata information and the batch merge task template and submit it for storage through the cloud code library service. Therefore, the information maintained according to the batch merge task template for base-layer full tables is uploaded as an Excel file to a system dedicated to data table generation. That system calls the metadata module to generate the base-layer full table's creation statement and data initialization script according to the incremental table's structural information, and calls the cloud code library module to generate the merge task.
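The batch flow can be sketched as reading one row per source table and combining it with structure from the metadata module. In this Python sketch the Excel file is assumed to have been exported to CSV, and its columns (`source_table`, `primary_key`, `owner`, `baseline`), as well as the `load_batch` and `plan_batch` functions, are hypothetical. The primary-key check is one example of the kind of omission that batch generation can catch before anything is published.

```python
import csv
import io
from typing import Dict, List

# Hypothetical batch sheet layout (Excel exported as CSV): one row per source table.
SAMPLE_SHEET = """source_table,primary_key,owner,baseline
orders,id,alice,ods_daily
users,user_id,bob,ods_daily
"""


def load_batch(sheet_text: str) -> List[Dict[str, str]]:
    return list(csv.DictReader(io.StringIO(sheet_text)))


def plan_batch(
    rows: List[Dict[str, str]], metadata: Dict[str, List[str]]
) -> List[Dict[str, object]]:
    """Turn each sheet row into a generation plan using the delta table's columns."""
    plans: List[Dict[str, object]] = []
    for row in rows:
        src = row["source_table"]
        columns = metadata[f"ods_{src}_delta"]  # looked up from the metadata module
        if row["primary_key"] not in columns:
            raise ValueError(f"primary key {row['primary_key']!r} not in ods_{src}_delta")
        plans.append({
            "full_table": f"ods_{src}",
            "columns": columns,
            "primary_key": row["primary_key"],
            "merge_task": f"mrg_ods_{src}",
            "owner": row["owner"],
            "baseline": row["baseline"],
        })
    return plans
```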

S102: configure the scheduling information of the data table task according to the task template.

Once the preparatory work for data table generation has been completed in S101, this step mainly configures the nodes, scripts and other content required for the specific generation. To provide a scheduling configuration service for data table generation, in a preferred embodiment of this application this step generates, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline and scheduling task owner corresponding to the data table task.

In the specific embodiment of Fig. 2, after receiving the Merge task from step 2.2, the cloud module, in step 2.3, configures the relevant information for the Merge task according to the scheduling information in the template, and calls the scheduling configuration service to generate the scheduling dependencies and the merge task's output name. For the content shown in Table 1, this embodiment calls the scheduling configuration service, according to the batch merge task template, to generate the merge task's upstream dependency node, task output name, scheduling task baseline and scheduling task owner.
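The four items configured for each merge task can be held in a simple record. This Python sketch assumes the upstream dependency is the day's synchronization task (`imp_...`), the output name is the full table name, and that the merge task also depends on its own previous cycle because it reads yesterday's full partition. `ScheduleConfig`, `configure_schedule` and `depends_on_previous_cycle` are hypothetical names, not a real scheduler API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScheduleConfig:
    task_name: str
    upstream_nodes: List[str] = field(default_factory=list)  # upstream dependency nodes
    output_name: str = ""   # task output name
    baseline: str = ""      # scheduling task baseline
    owner: str = ""         # scheduling task owner
    depends_on_previous_cycle: bool = True


def configure_schedule(source_table: str, owner: str, baseline: str) -> ScheduleConfig:
    delta = f"ods_{source_table}_delta"
    full = f"ods_{source_table}"
    return ScheduleConfig(
        task_name=f"mrg_{full}",
        # The merge can only run after today's delta has been synchronized.
        upstream_nodes=[f"imp_{delta}"],
        output_name=full,
        baseline=baseline,
        owner=owner,
        # It also reads yesterday's partition of its own output table,
        # so it should wait for its own previous run to finish.
        depends_on_previous_cycle=True,
    )
```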

S103: execute the table creation statement and the initialization script according to the data table task and the scheduling information, so as to generate the data table.

After the data table task has been generated in S101 and the scheduling information configured in S102, the scheduling module, in step 3, instructs the system to package the table creation statement, the initialization script and the merge task and publish them online. The cloud module then instructs the system, in step 4.1, to execute the table creation statement; after the system has executed the table creation statement and the initialization script, the cloud module performs initialization data processing in step 4.2.
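The ordering in steps 3, 4.1 and 4.2 matters: the table must exist before the initialization script writes into it. A minimal Python sketch, with publishing and SQL execution injected as hypothetical callbacks (`publish`, `execute_sql`) rather than any real platform API; `ReleasePackage` and `publish_and_initialize` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReleasePackage:
    ddl: str
    init_script: str
    merge_task: str


def publish_and_initialize(
    pkg: ReleasePackage,
    publish: Callable[[List[str]], None],
    execute_sql: Callable[[str], None],
) -> None:
    # Step 3: package the three artifacts and publish them together.
    publish([pkg.ddl, pkg.init_script, pkg.merge_task])
    # Step 4.1: create the table first, since initialization writes into it.
    execute_sql(pkg.ddl)
    # Step 4.2: seed the first full snapshot; later snapshots come from the
    # scheduled merge task, not from this function.
    execute_sql(pkg.init_script)
```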

Through the above process, data tables are generated and created quickly. Work that previously had to be done by hand—no, more precisely: manually writing table creation statements, hand-coding base-layer full table merge tasks, manually configuring scheduling dependencies, hand-coding data initialization scripts, manual packaging, manual publication, and manually executing the table creation statements and initialization scripts is replaced by automatic generation of data tables, which improves the working efficiency of the data warehouse.

To achieve the above technical purpose, this application also provides a data table generation device, as shown in Fig. 2, comprising:

a generation module 210, configured to generate a current data table task according to the structural metadata information of an incremental table and a preset task template;

a configuration module 220, configured to configure the scheduling information of the data table task according to the task template;

an execution module 230, configured to execute the table creation statement and the initialization script according to the data table task and the scheduling information, so as to generate a data table.
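The three modules can be wired together as a small class. This Python sketch injects each module as a callable so that the device's structure is visible without committing to any platform; `DataTableGenerationDevice` and the callable signatures are hypothetical.

```python
from typing import Any, Callable, Dict, List

Task = Dict[str, Any]
Schedule = Dict[str, Any]


class DataTableGenerationDevice:
    """Generation, configuration and execution modules run in order."""

    def __init__(
        self,
        generation_module: Callable[[Dict[str, Any], str], Task],
        configuration_module: Callable[[Task, str], Schedule],
        execution_module: Callable[[Task, Schedule], List[str]],
    ) -> None:
        self.generation_module = generation_module
        self.configuration_module = configuration_module
        self.execution_module = execution_module

    def run(self, delta_metadata: Dict[str, Any], task_template: str) -> List[str]:
        # Generation module: incremental-table metadata + template -> data table task.
        task = self.generation_module(delta_metadata, task_template)
        # Configuration module: scheduling information derived from the same template.
        schedule = self.configuration_module(task, task_template)
        # Execution module: execute DDL and initialization per task and schedule.
        return self.execution_module(task, schedule)
```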

In a specific application scenario, the data table is a base-layer full table, and the device further comprises:

an initialization module, configured to generate the table creation statement and the data initialization script corresponding to the data table according to the structural metadata information.

In a specific application scenario, the device further comprises:

a synchronization module, configured to synchronize the incremental table to the data warehouse and obtain the structural metadata information of the incremental table.

In a specific application scenario, the generation module is specifically configured to:

generate merge task code according to the incremental table's structural metadata information and the batch merge task template, and upload the merge task code to a preset code library as the data table task.

In a specific application scenario, the configuration module is specifically configured to:

generate, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline and scheduling task owner corresponding to the data table task.

From the description of the above embodiments, those skilled in the art can clearly understand that this application may be implemented by hardware, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application may be embodied as a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive or removable hard disk) and include instructions that cause a computer device (such as a personal computer, server or network device) to perform the methods described in the implementation scenarios of this application.

Those skilled in the art will understand that the drawings are only schematic diagrams of a preferred implementation scenario, and the modules or processes in the drawings are not necessarily required to implement this application.

Those skilled in the art will understand that the modules of the device in an implementation scenario may be distributed in the device as described for that scenario, or may be changed accordingly and located in one or more devices different from that scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.

The serial numbers of this application are for description only and do not indicate the relative merits of the implementation scenarios.

The above discloses only several specific implementation scenarios of this application; however, this application is not limited thereto, and any variation that those skilled in the art can conceive shall fall within the protection scope of this application.

Claims

1. A data table generation method, characterized by comprising:

executing the table creation statement and the initialization script according to the data table task and the scheduling information, so as to generate a data table;

wherein the data table is a base-layer full table, and before the current data table task is generated according to the structural metadata information of the incremental table and the preset task template, the method further comprises:

generating the table creation statement and the data initialization script corresponding to the data table according to the structural metadata information.

2. The method according to claim 1, characterized in that, before the table creation statement and the data initialization script corresponding to the data table are generated according to the structural metadata information, the method further comprises:

synchronizing the incremental table to the data warehouse, and obtaining the structural metadata information of the incremental table.

3. The method according to claim 1, characterized in that the task template is a merge task template, and generating the current data table task according to the structural metadata information of the incremental table and the preset task template specifically comprises:

generating merge task code according to the incremental table's structural metadata information and the batch merge task template;

uploading the merge task code to a preset code library as the data table task.

4. The method according to claim 3, characterized in that configuring the scheduling information of the data table task according to the task template specifically comprises:

generating, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline and scheduling task owner corresponding to the data table task.

5. A data table generation device, characterized by comprising:

a generation module, configured to generate a current data table task according to the structural metadata information of an incremental table and a preset task template;

an execution module, configured to execute the table creation statement and the initialization script according to the data table task and the scheduling information, so as to generate a data table;

wherein the data table is a base-layer full table, and the device further comprises:

an initialization module, configured to generate the table creation statement and the data initialization script corresponding to the data table according to the structural metadata information.

6. The device according to claim 5, characterized by further comprising:

a synchronization module, configured to synchronize the incremental table to the data warehouse and obtain the structural metadata information of the incremental table.

7. The device according to claim 5, characterized in that

the generation module is specifically configured to generate merge task code according to the incremental table's structural metadata information and the batch merge task template, and to upload the merge task code to a preset code library as the data table task.

8. The device according to claim 7, characterized in that

the configuration module is specifically configured to generate, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline and scheduling task owner corresponding to the data table task.