CN106528070B - A kind of data table generating method and equipment - Google Patents


Info

Publication number
CN106528070B
Authority
CN
China
Prior art keywords
data
task
tables
template
merge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510587686.3A
Other languages
Chinese (zh)
Other versions
CN106528070A (en)
Inventor
吴勇军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510587686.3A
Publication of CN106528070A
Application granted
Publication of CN106528070B
Legal status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a data table generation method. A current data table task is first generated from the structural metadata information of an incremental table and a preset task template. After the scheduling information of the data table task has been configured according to the task template, the table creation statement and the initialization script are executed according to the data table task and the scheduling information. Data tables can thus be built quickly and accurately, which reduces manual effort and improves the efficiency of data warehouse construction.

Description

A kind of data table generating method and equipment
Technical field
This application relates to the field of communication technology, and in particular to a data table generation method. The application also relates to a data table generation device.
Background technique
A data warehouse (Data Warehouse, also known as DW or DWH) is a strategic collection that supports decision-making at every level of an enterprise with all types of data. It is an environment that provides users with current and historical data for decision support, data that is difficult or impossible to obtain from a traditional operational database. Data warehouse technology is the general term for the technologies and modules that integrate operational data into a unified environment in order to provide decision-oriented data access. Its ultimate purpose is to let users query the information they need faster and more conveniently, and to provide decision support.
In data warehouse construction, the base layer (also known as the ODS layer) builds a source-aligned data layer from the data extracted from business system databases, which facilitates subsequent data integration. In current data warehouse construction, building the tables of the ODS layer is an important part of base layer construction: the incremental data brought in by the synchronization center must be merged into a full copy of the data, which supports subsequent functions such as history retention, data integration, data analysis, and data applications.
At present, building a base layer table requires writing the table creation statement, then writing scripts and configuring scheduling information, before the scripts and tasks can be published and executed. In the course of implementing this application, the inventor found that existing task scripts are numerous and varied, so technical staff easily omit steps or make mistakes when using them. Moreover, generating data tables is very basic work that has to be done by hand, which consumes substantial development resources and makes manual development inefficient.
How to automatically generate the tables associated with the ODS layer within the existing data warehouse construction process, thereby reducing manual effort and improving the efficiency of data warehouse construction, has therefore become a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
This application provides a data table generation method for building base layer data tables for an existing database efficiently and accurately, thereby reducing manual effort and improving the efficiency of data warehouse construction. The method comprises:
generating a current data table task according to the structural metadata information of an incremental table and a preset task template;
configuring the scheduling information of the data table task according to the task template;
executing a table creation statement and an initialization script according to the data table task and the scheduling information, so as to generate a data table.
Correspondingly, this application also proposes a data table generation device, comprising:
a generation module, configured to generate a current data table task according to the structural metadata information of an incremental table and a preset task template;
a configuration module, configured to configure the scheduling information of the data table task according to the task template;
an execution module, configured to execute a table creation statement and an initialization script according to the data table task and the scheduling information, so as to generate a data table.
With the technical solution of this application, a current data table task is first generated from the structural metadata information of the incremental table and the preset task template. Once the scheduling information of the data table task has been configured according to the task template, the table creation statement and the initialization script can be executed according to the data table task and the scheduling information. Data tables are thus generated quickly and accurately, which reduces manual effort and improves the efficiency of data warehouse construction.
Detailed description of the invention
Fig. 1 is a flow diagram of a data table generation method proposed by this application;
Fig. 2 is a flow diagram of a data table generation method proposed by a specific embodiment of this application;
Fig. 3 is a structural diagram of a data table generation device proposed by this application.
Specific embodiment
To describe the technical solution of this application clearly, some concepts used in current data warehouses are introduced first:
(1) Table
Tables are the most important component of a data warehouse. A table record consists of a key, attribute data, and measures (for example, an employee table consists of the employee number (the key) and employee attributes such as name and age). In the technical solution of this application, data warehouse construction involves the following two types of table:
Incremental table: to improve performance, tables with large data volumes are synchronized incrementally according to a record-modification timestamp field (usually gmt_modify). Each snapshot of an incremental table retains one copy of the incremental data. The table is named tablename_{yyyymmdd}_delta, or tablename_delta with partition field dt=yyyymmdd;
Full table: each snapshot retains one full copy of the table. The copy can be synchronized in full from the production database, or it can be the latest full copy obtained by a full outer join between the incremental data synchronized from the production database and yesterday's snapshot of the full table. The structure of the full table is identical to that of the incremental table. The table is named tablename_{yyyymmdd}, or tablename with partition field dt=yyyymmdd.
(2) View
A view is a virtual table whose content is defined by a query. Like a real table, a view contains a set of named columns and rows of data. Note that a view does not exist in the database as a stored set of data values.
(3) Metadata
Metadata is data that describes data. In essence it is descriptive information about data and information resources, including the structure information of business tables, the structure information of data warehouse tables, and so on. The more important kinds of metadata in data warehouse construction include scheduling metadata, SQL execution log metadata, table structure metadata, synchronization center metadata, and scheduled task metadata.
(4) Merge task
The merge task is an indispensable pattern in data warehouse construction. Its role is to merge the data of the incremental table with yesterday's snapshot of the full table and generate the latest full snapshot table.
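The merge described in item (4) can be illustrated with a minimal in-memory sketch. It assumes that every record carries a unique key column (here called `id`, a hypothetical name) and that a record present in the incremental data replaces the record with the same key in yesterday's snapshot. Deleted records are not handled. This is an illustration of the full-outer-join idea only, not the implementation used in the application.

```python
from typing import Dict, Iterable, List, Mapping

Row = Mapping[str, object]


def merge_snapshot(yesterday_full: Iterable[Row],
                   delta: Iterable[Row],
                   key: str = "id") -> List[Row]:
    """Merge incremental rows into yesterday's full snapshot.

    Equivalent to a full outer join on `key` in which the incremental
    row wins whenever both sides contain the same key.
    """
    # Start from yesterday's full snapshot, indexed by key.
    merged: Dict[object, Row] = {row[key]: row for row in yesterday_full}
    # New keys are added; existing keys are overwritten by the delta row.
    for row in delta:
        merged[row[key]] = row
    # Sort by key so that the result is deterministic.
    return sorted(merged.values(), key=lambda r: r[key])


yesterday = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
today_delta = [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
today_full = merge_snapshot(yesterday, today_delta)
```

Here `today_full` keeps record 1 unchanged, takes the updated record 2 from the incremental data, and adds the new record 3.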
(5) Synchronization center
The synchronization center is a device or apparatus that synchronizes production data to the data warehouse, or flows data warehouse data back to the production system.
Based on the above and on the background of this application: during data warehouse construction or reconstruction, a base layer must be established according to the data model architecture. Incremental data tables synchronized from the production database must be merged into a full table at the ODS layer of the data warehouse, so that an up-to-date full snapshot is retained. This process involves generating the full-table creation statement, the merge task script, and the scheduling dependencies, as well as data initialization and publishing. The technical solution of this application aims to generate base layer data tables (full tables) in batches by means of merge tasks, reducing the complexity of data warehouse construction and improving development efficiency while maintaining work quality.
As shown in Fig. 1, the flow of the data table generation method proposed by this application comprises the following steps:
S101: generate a current data table task according to the structural metadata information of the incremental table and a preset task template.
As stated in the background, existing databases need many base layer full tables, and configuring them is tedious. In a preferred embodiment, this application therefore performs generation for data tables whose type is the base layer full table. So that data tables can subsequently be produced quickly, technical staff may perform a unified initialization before this step, namely generating the table creation statement and the data initialization script corresponding to the data table from the structural metadata information.
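As a rough sketch of that initialization, the following generates a table creation statement and a data initialization script from column metadata. Several things here are assumptions rather than details from the application: the metadata is modeled as a list of (column name, column type) pairs, the SQL uses a Hive-style dialect with a `dt` partition, and the initialization script simply loads one day of the incremental table into the full table. The function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation of structural metadata: (column name, type).
Column = Tuple[str, str]


def build_create_table(full_table: str, columns: List[Column]) -> str:
    """Render a CREATE TABLE statement for the full table.

    Assumption: the full table mirrors the incremental table's columns
    and is partitioned by a string column named dt.
    """
    cols = ",\n".join(f"  {name} {ctype}" for name, ctype in columns)
    return (f"CREATE TABLE IF NOT EXISTS {full_table} (\n"
            f"{cols}\n"
            f") PARTITIONED BY (dt STRING);")


def build_init_script(full_table: str, delta_table: str,
                      columns: List[Column], bizdate: str) -> str:
    """Render an initialization script that seeds the first full snapshot.

    Assumption: the first snapshot is loaded from one partition of the
    incremental table; the application does not spell out this detail.
    """
    names = ", ".join(name for name, _ in columns)
    return (f"INSERT OVERWRITE TABLE {full_table} PARTITION (dt='{bizdate}')\n"
            f"SELECT {names} FROM {delta_table} WHERE dt='{bizdate}';")


member_columns = [("id", "BIGINT"), ("name", "STRING")]
ddl = build_create_table("ods_member", member_columns)
init_sql = build_init_script("ods_member", "ods_member_delta",
                             member_columns, "20150915")
```

Deriving both statements from the same column list keeps the structure of the full table consistent with the incremental table, as the description of the two table types requires.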
In a specific application scenario, incremental data tables are named ods_{source system table name}_delta, and full data tables are named ods_{source system table name}.
Because most existing base layer full tables are produced by full-table merge tasks, in a preferred embodiment of this application technical staff can set the task template to a merge task template. In this step, merge task code is generated in batches from the structural metadata information of the incremental tables and the batch merge task template, and the merge task code is then uploaded to a preset code library as the data table task. In a specific application scenario, the format of the merge task template for batch-generating base layer full tables is shown in Table 1 below:
Table 1
In addition, the method and device for building ODS layer full tables of a data warehouse form a highly integrated solution that involves metadata, the cloud code library, scheduling dependencies, data merge tasks, data initialization, publishing, and a series of other processes. To further improve automation and processing efficiency, the structural metadata information of the incremental table can be preprocessed before the current data table task is generated from it and the preset task template. Specifically, in a preferred embodiment of this application, the incremental table is synchronized to the data warehouse before this step and its structural metadata information is obtained. A metadata service that provides the table structure information of the data warehouse is built on this basis, which facilitates subsequent processing.
Fig. 2 shows the flow of the data table generation method proposed by a specific embodiment of this application. Besides the synchronization center module and the metadata module described above, the figure involves the following modules:
Middle layer: a data layer built up in data warehouse construction by integrating the synchronized base layer data, intended to make business data easier for downstream applications to use;
Cloud: the integrated development environment for data warehouse development, in which data synchronization configuration, model design, ETL development, unit testing, task publishing, and operations and maintenance can be performed;
Scheduling: the system that runs data warehouse tasks automatically according to their configured schedule.
Based on these modules, the specific embodiment first performs the following steps:
Step 1: the synchronization center synchronizes the incremental data to the data warehouse;
Step 2.1: the metadata module generates the full-table creation statement from the incremental table structure information and sends it to the cloud module;
Step 2.2: the metadata module generates the merge task from the template and the incremental table structure.
In the above steps, incremental data tables are named ods_{source system table name}_delta and full data tables are named ods_{source system table name}; each module creates and queries tables under these names. During synchronization by the synchronization center, synchronization tasks are named imp_{ODPS table name} and merge tasks are named mrg_{ODPS table name}.
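The naming conventions above are simple enough to capture in a few helper functions. The function names are hypothetical, and the sketch leaves it to the caller to decide which ODPS table name a task refers to, since the text does not specify that.

```python
def delta_table_name(source_table: str) -> str:
    """Incremental table: ods_{source system table name}_delta."""
    return f"ods_{source_table}_delta"


def full_table_name(source_table: str) -> str:
    """Full table: ods_{source system table name}."""
    return f"ods_{source_table}"


def sync_task_name(odps_table: str) -> str:
    """Synchronization task: imp_{ODPS table name}."""
    return f"imp_{odps_table}"


def merge_task_name(odps_table: str) -> str:
    """Merge task: mrg_{ODPS table name}."""
    return f"mrg_{odps_table}"
```

Centralizing the conventions in one place means that every module derives the same table and task names from a source system table name.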
The metadata module can provide the table structure information of the data warehouse, and the cloud code library module can generate merge task code in batches from the structural metadata information of the incremental tables and the batch merge task template, then submit that code through the cloud code library service to the code library for storage. The required information is therefore maintained according to the merge task template for batch-generating base layer full tables and uploaded as an Excel file to the system dedicated to data table generation. That system calls the metadata module to generate the base layer full-table creation statement and the data initialization script from the incremental table structure information, and calls the cloud code library module to generate the merge task.
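Batch generation of merge task code from a template can be sketched with Python's standard `string.Template`. The template text, the Hive-style SQL dialect, the `dt` partition column, and the `${bizdate}` and `${yesterday}` placeholders (left unresolved for a scheduler to fill in at run time) are all assumptions made for illustration. The content of the application's own merge task template (Table 1) is not reproduced here.

```python
from string import Template
from typing import Dict, List

# Hypothetical merge template: full outer join of yesterday's full snapshot
# with today's incremental data, preferring the incremental row.
MERGE_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE ${full_table} PARTITION (dt='${bizdate}')\n"
    "SELECT ${select_list}\n"
    "FROM (SELECT * FROM ${full_table} WHERE dt='${yesterday}') f\n"
    "FULL OUTER JOIN (SELECT * FROM ${delta_table} WHERE dt='${bizdate}') d\n"
    "ON f.${key} = d.${key};"
)


def render_merge_task(source_table: str, key: str, columns: List[str]) -> str:
    """Render merge task code for one table from its column metadata."""
    select_list = ", ".join(
        f"CASE WHEN d.{key} IS NOT NULL THEN d.{col} ELSE f.{col} END AS {col}"
        for col in columns
    )
    # safe_substitute leaves ${bizdate} and ${yesterday} untouched so that
    # a scheduler (assumed) can substitute them for each run.
    return MERGE_TEMPLATE.safe_substitute(
        full_table=f"ods_{source_table}",
        delta_table=f"ods_{source_table}_delta",
        key=key,
        select_list=select_list,
    )


def render_batch(metadata: Dict[str, Dict]) -> Dict[str, str]:
    """Render merge tasks for many tables, keyed by merge task name."""
    return {
        f"mrg_ods_{name}": render_merge_task(name, meta["key"], meta["columns"])
        for name, meta in metadata.items()
    }


tasks = render_batch({
    "member": {"key": "id", "columns": ["id", "name"]},
    "order": {"key": "order_id", "columns": ["order_id", "amount"]},
})
```

The `CASE WHEN` form is used instead of `COALESCE` so that a column deliberately updated to NULL in the incremental data is not silently replaced by yesterday's value.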
S102: configure the scheduling information of the data table task according to the task template.
Together, the processing of S101 and S102 completes the preparation for data table generation; this step mainly configures the nodes, scripts, and other content required for the actual generation. To provide a scheduling configuration service for data table generation, in a preferred embodiment of this application this step generates, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline, and scheduling task owner corresponding to the data table task.
In the specific embodiment of Fig. 2, after receiving the merge task from step 2.2, the cloud module in step 2.3 configures the relevant information for the merge task according to the scheduling information in the template, and calls the scheduling configuration service to generate the scheduling dependencies and the merge task output name. For the content shown in Table 1, this specific embodiment calls the scheduling configuration service according to the batch merge task template to generate the merge task's upstream dependency node, task output name, scheduling task baseline, and scheduling task owner.
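The four scheduling attributes named above can be modeled as a small record. In this sketch the field names, the shape of the template row, and the choice of the synchronization task as the only upstream dependency are assumptions; the application does not describe the interface of the scheduling configuration service.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScheduleConfig:
    task_name: str             # merge task being scheduled
    upstream_nodes: List[str]  # upstream dependency nodes
    output_name: str           # task output name
    baseline: str              # scheduling task baseline
    owner: str                 # scheduling task owner


def build_schedule_config(source_table: str,
                          template_row: Dict[str, str]) -> ScheduleConfig:
    """Derive scheduling information for one merge task.

    Assumption: the merge task depends on the synchronization task that
    loads the incremental table, and its output is the full table.
    """
    full_table = f"ods_{source_table}"
    return ScheduleConfig(
        task_name=f"mrg_{full_table}",
        upstream_nodes=[f"imp_{full_table}_delta"],
        output_name=full_table,
        baseline=template_row["baseline"],
        owner=template_row["owner"],
    )


config = build_schedule_config(
    "member", {"baseline": "ods_daily", "owner": "example_owner"}
)
```

Baseline and owner are copied from the template row because they cannot be derived from table structure, while the dependency and output name follow from the naming conventions.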
S103: execute the table creation statement and the initialization script according to the data table task and the scheduling information, so as to generate the data table.
After the data table task has been generated in S101 and the scheduling information configured in S102, the scheduling module in step 3 instructs the system to package the table creation statement, the initialization script, and the merge task and publish them online. The cloud module then instructs the system in step 4.1 to execute the table creation statement. After the system has executed the table creation statement and the initialization script, the cloud module performs data initialization in step 4.2.
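The ordering of these final steps can be sketched as a small release routine. The `publish` and `execute` callables stand in for the publishing system and the SQL execution engine, both of which are hypothetical here; the sketch only shows that publishing precedes table creation, which in turn precedes data initialization.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReleasePackage:
    create_table_sql: str  # table creation statement
    init_script_sql: str   # data initialization script
    merge_task_name: str   # merge task published with the package


def release(package: ReleasePackage,
            publish: Callable[[ReleasePackage], None],
            execute: Callable[[str], None]) -> List[str]:
    """Publish the package, then create the table, then initialize data."""
    log: List[str] = []
    publish(package)                   # package and publish online
    log.append("published")
    execute(package.create_table_sql)  # run the table creation statement
    log.append("table created")
    execute(package.init_script_sql)   # run the initialization script
    log.append("data initialized")
    return log


published: List[str] = []
executed: List[str] = []
steps = release(
    ReleasePackage("CREATE TABLE ...;", "INSERT OVERWRITE ...;", "mrg_ods_member"),
    publish=lambda pkg: published.append(pkg.merge_task_name),
    execute=executed.append,
)
```

Passing the two operations in as callables keeps the ordering logic testable without access to a real warehouse.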
Through the above process, data tables are generated and created quickly. The process replaces work that traditionally had to be done by hand: writing table creation statements, coding base layer full-table merge tasks, configuring scheduling dependencies, coding data initialization scripts, packaging, publishing, and executing the table creation statements and initialization scripts. Data tables are generated automatically, which improves the efficiency of data warehouse work.
To achieve the above technical purpose, this application also proposes a data table generation device which, as shown in Fig. 3, comprises:
a generation module 210, configured to generate a current data table task according to the structural metadata information of an incremental table and a preset task template;
a configuration module 220, configured to configure the scheduling information of the data table task according to the task template;
an execution module 230, configured to execute a table creation statement and an initialization script according to the data table task and the scheduling information, so as to generate a data table.
In a specific application scenario, the data table is specifically a base layer full table, and the device further comprises:
an initialization module, configured to generate, according to the structural metadata information, the table creation statement and the data initialization script corresponding to the data table.
In a specific application scenario, the device further comprises:
a synchronization module, configured to synchronize the incremental table to the data warehouse and obtain the structural metadata information of the incremental table.
In a specific application scenario, the generation module is specifically configured to:
generate merge task code in batches from the structural metadata information of the incremental table and the batch merge task template, and upload the merge task code to a preset code library as the data table task.
In a specific application scenario, the configuration module is specifically configured to:
generate, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline, and scheduling task owner corresponding to the data table task.
From the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the method described in each implementation scenario of this application.
Those skilled in the art will appreciate that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required for implementing this application.
Those skilled in the art will appreciate that the modules of the device in an implementation scenario can be distributed in the device as described in that scenario, or can be changed accordingly and located in one or more devices different from that scenario. The modules of the above implementation scenario can be merged into one module or further split into multiple submodules.
The serial numbers used in this application are for description only and do not indicate the relative merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of this application. The application is not limited to them, however, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of this application.

Claims (8)

1. A data table generation method, characterized by comprising:
generating a current data table task according to the structural metadata information of an incremental table and a preset task template;
configuring the scheduling information of the data table task according to the task template;
executing a table creation statement and an initialization script according to the data table task and the scheduling information, so as to generate a data table;
wherein the data table is specifically a base layer full table, and before generating the current data table task according to the structural metadata information of the incremental table and the preset task template, the method further comprises:
generating, according to the structural metadata information, the table creation statement and the data initialization script corresponding to the data table.
2. The method of claim 1, characterized in that, before generating the table creation statement and the data initialization script corresponding to the data table according to the structural metadata information, the method further comprises:
synchronizing the incremental table to a data warehouse, and obtaining the structural metadata information of the incremental table.
3. The method of claim 1, characterized in that the task template is specifically a merge task template, and generating the current data table task according to the structural metadata information of the incremental table and the preset task template specifically comprises:
generating merge task code in batches according to the structural metadata information of the incremental table and the batch merge task template;
uploading the merge task code to a preset code library as the data table task.
4. The method of claim 3, characterized in that configuring the scheduling information of the data table task according to the task template specifically comprises:
generating, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline, and scheduling task owner corresponding to the data table task.
5. A data table generation device, characterized by comprising:
a generation module, configured to generate a current data table task according to the structural metadata information of an incremental table and a preset task template;
a configuration module, configured to configure the scheduling information of the data table task according to the task template;
an execution module, configured to execute a table creation statement and an initialization script according to the data table task and the scheduling information, so as to generate a data table;
wherein the data table is specifically a base layer full table, and the device further comprises:
an initialization module, configured to generate, according to the structural metadata information, the table creation statement and the data initialization script corresponding to the data table.
6. The device of claim 5, characterized by further comprising:
a synchronization module, configured to synchronize the incremental table to a data warehouse and obtain the structural metadata information of the incremental table.
7. The device of claim 5, characterized in that:
the generation module is specifically configured to generate merge task code in batches according to the structural metadata information of the incremental table and the batch merge task template, and to upload the merge task code to a preset code library as the data table task.
8. The device of claim 7, characterized in that:
the configuration module is specifically configured to generate, according to the merge task template, the upstream dependency node, task output name, scheduling task baseline, and scheduling task owner corresponding to the data table task.
CN201510587686.3A 2015-09-15 2015-09-15 A kind of data table generating method and equipment Active CN106528070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510587686.3A CN106528070B (en) 2015-09-15 2015-09-15 A kind of data table generating method and equipment


Publications (2)

Publication Number Publication Date
CN106528070A CN106528070A (en) 2017-03-22
CN106528070B true CN106528070B (en) 2019-09-03

Family

ID=58348745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510587686.3A Active CN106528070B (en) 2015-09-15 2015-09-15 A kind of data table generating method and equipment

Country Status (1)

Country Link
CN (1) CN106528070B (en)


Also Published As

Publication number Publication date
CN106528070A (en) 2017-03-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.