CN108280084A - Method, system and server for constructing a data warehouse - Google Patents

Method, system and server for constructing a data warehouse

Info

Publication number
CN108280084A
Authority
CN
China
Prior art keywords
data
task
warehouse
layer
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710009996.6A
Other languages
Chinese (zh)
Inventor
董林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pre Long Mdt Infotech Ltd
Original Assignee
Shanghai Pre Long Mdt Infotech Ltd
Application filed by Shanghai Pre Long Mdt Infotech Ltd filed Critical Shanghai Pre Long Mdt Infotech Ltd
Priority to CN201710009996.6A priority Critical patent/CN108280084A/en
Publication of CN108280084A publication Critical patent/CN108280084A/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 - Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 - Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/211 - Schema design and management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method, system and server for constructing a data warehouse. The method comprises: building a multi-layer data processing architecture to process data in layers; extracting required data from a data source and cleaning the extracted data into data that meets preset requirements; scheduling and supervising data processing tasks; and controlling access rights to the data in the multi-layer data processing architecture and supervising the overall operating status of the data. The multi-layer data processing architecture comprises: a temporary storage layer that temporarily stores data obtained from the data source, a core data layer that stores and processes the cleaned data, a data mart layer that organizes data into corresponding data subjects, and a user-facing application layer that processes data for specific requirements entered by users. The present invention makes it possible to build an enterprise data warehouse quickly, reduces the complexity of warehouse construction, shortens the development cycle of building a data warehouse, and lowers warehouse development and O&M costs, and therefore has broad application prospects.

Description

Method, system and server for constructing a data warehouse
Technical Field
The present invention relates to the field of computer software technology, and in particular to a method, system and server for constructing a data warehouse.
Background
A data warehouse (abbreviated DW or DWH) is a strategic collection that supports decision-making processes at every level of an enterprise with all types of data. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance for improving business processes and for monitoring time, cost, quality and control.
A data warehouse is the structured data environment that serves as the data source for decision support systems (DSS) and online analytical applications. Data warehouse research addresses the problem of obtaining information from databases. A data warehouse is characterized by being subject-oriented, integrated, stable (non-volatile) and time-variant.
The data warehouse was proposed in 1990 by Bill Inmon, the father of data warehousing. Its main function is to take the large volume of data that an organization has accumulated over the years through the online transaction processing (OLTP) of its information systems and, using the data storage architecture particular to data warehouse theory, analyze and organize it systematically. This facilitates analysis methods such as online analytical processing (OLAP) and data mining, which in turn support the creation of decision support systems (DSS) and executive information systems (EIS). It helps decision makers extract valuable information from large volumes of data quickly and effectively, supports decision making and rapid response to changes in the external environment, and helps build business intelligence (BI).
A data warehouse is an integrated, unified data platform that provides data analysis and processing for enterprise decision makers. It is mainly used to support management, data analysis, decision support, data mining, business reporting and the like. The purpose of a data warehouse is to establish a structured data storage environment that separates the large volume of data needed for analysis and decision making from the traditional operational environment, converting scattered, inconsistent operational data into integrated, unified information. Its main characteristics are as follows:
Data in the data warehouse is obtained by extracting and cleaning the original, scattered database data and then systematically processing, summarizing and organizing it. Inconsistencies in the source data must be eliminated to ensure that the information in the data warehouse is consistent, global information about the whole enterprise.
Data in the data warehouse is mainly used for enterprise decision analysis, and the data operations involved are mainly queries. Once data enters the data warehouse, it is generally retained for a long time. In other words, a data warehouse typically sees a large number of query operations but few modifications and deletions, and usually needs only periodic loading and refreshing.
Data in the data warehouse generally contains historical information. The system records information about the enterprise from some point in the past (such as the point at which the data warehouse was first used) through each stage up to the present. From this information, quantitative analyses and predictions can be made about the enterprise's development history and future trends.
In today's data-intelligence environment, data warehouses provide many cost-effective computing resources in the areas of software and hardware, Internet and intranet solutions, and databases. They can store extremely large amounts of data for analytical use and allow a variety of data access technologies.
In recent years, with the rapid development of the Internet industry, more and more enterprises urgently need to build their own data warehouse for decision analysis and model training, driven by business development and risk control needs. Data at Internet enterprises differs from data at traditional enterprises in the following ways:
1) Data volume grows explosively, far faster than at traditional enterprises.
2) Data in the Internet industry is diverse: it includes not only a large amount of traditional structured data but also much unstructured data.
3) Internet enterprises have far higher requirements for data timeliness than traditional enterprises.
4) The results of warehouse analysis must support a rapid response to the business.
5) Business in the Internet industry changes quickly, so a data warehouse cannot be built top-down as in traditional industries. New business must be incorporated into the data warehouse quickly, and retired business must be easy to take offline from the existing data warehouse.
Traditional data warehouses have high development and O&M costs, long construction periods and a single data format, and can no longer meet the needs of fast-growing Internet enterprises.
Summary of the Invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a method, system and server for constructing a data warehouse, to solve the problems of long construction periods and high development and O&M costs of data warehouses in the prior art.
To achieve the above and other related objects, the present invention provides a method for constructing a data warehouse, the method comprising: building a multi-layer data processing architecture to process data in layers; extracting required data from a data source and cleaning the extracted data into data that meets preset requirements; scheduling and supervising data processing tasks; and controlling access rights to the data in the multi-layer data processing architecture and supervising the overall operating status of the data.
In one embodiment of the present invention, the multi-layer data processing architecture comprises: a temporary storage layer that temporarily stores data obtained from the data source, a core data layer that stores and processes the cleaned data, a data mart layer that organizes data into corresponding data subjects, and a user-facing application layer that processes data for specific requirements entered by users.
In one embodiment of the present invention, extracting the required data from the data source comprises: selecting either full extraction or incremental extraction from a specified timestamp, with optional selective extraction by specified fields and filter conditions.
In one embodiment of the present invention, cleaning the extracted data into data that meets preset requirements comprises one or more of: completing incomplete data, removing erroneous data, deduplicating duplicate data, and converting data formats.
In one embodiment of the present invention, scheduling data processing tasks comprises: grouping tasks and scheduling them either according to the configured dependencies between tasks or according to the configured task priorities. Supervising data processing tasks comprises: supervising the addition, pausing and deletion of tasks, viewing the running status and elapsed time of tasks, and rescheduling tasks that failed to run.
To achieve the above object, the present invention also provides a system for constructing a data warehouse, the system comprising: a layering module for building a multi-layer data processing architecture to process data in layers; an extraction module for extracting required data from a data source; a cleaning module for cleaning the extracted data into data that meets preset requirements; a scheduling module for scheduling data processing tasks; a task supervision module for supervising data processing tasks; a permission management module for controlling access rights to the data in the multi-layer data processing architecture; and a data supervision module for supervising the overall operating status of the data.
In one embodiment of the present invention, the multi-layer data processing architecture built by the layering module comprises: a temporary storage layer that temporarily stores data obtained from the data source, a core data layer that stores and processes the cleaned data, a data mart layer that organizes data into corresponding data subjects, and a user-facing application layer that processes data for specific requirements entered by users.
In one embodiment of the present invention, when extracting the required data from the data source, the extraction module selects either full extraction or incremental extraction from a specified timestamp, and can extract selectively by specified fields and filter conditions.
In one embodiment of the present invention, the cleaning of the extracted data into data that meets preset requirements by the cleaning module comprises one or more of: completing incomplete data, removing erroneous data, deduplicating duplicate data, and converting data formats.
In one embodiment of the present invention, the scheduling of data processing tasks by the scheduling module comprises: grouping tasks and scheduling them either according to the configured dependencies between tasks or according to the configured task priorities. The supervision of data processing tasks by the task supervision module comprises: supervising the addition, pausing and deletion of tasks, viewing the running status and elapsed time of tasks, and rescheduling tasks that failed to run.
To achieve the above object, the present invention also provides a server comprising the data warehouse construction system described above.
As described above, the method, system and server for constructing a data warehouse of the present invention have the following beneficial effects:
The present invention makes it possible to build an enterprise data warehouse quickly, reduces the complexity of warehouse construction, shortens the development cycle of building a data warehouse, and lowers warehouse development and O&M costs, and therefore has broad application prospects.
Brief Description of the Drawings
Fig. 1 is a flowchart of the data warehouse construction method of the present invention.
Fig. 2 is a schematic diagram of the architecture used in the data warehouse construction method of the present invention.
Fig. 3 is a functional block diagram of the data warehouse construction system of the present invention.
Description of Reference Numerals
100 Data warehouse construction system
101 Layering module
102 Extraction module
103 Cleaning module
104 Scheduling module
105 Task supervision module
106 Permission management module
107 Data supervision module
S101~S104 Steps
Detailed Description
Embodiments of the present invention are described below by way of specific examples. Those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the present invention.
The object of the present invention is to provide a method, system and server for constructing a data warehouse, to solve prior-art problems in data warehouse construction such as heavy load, time-consuming exits, data loss, or the need to retransmit data. The principles and embodiments of the method, system and server are described in detail below, so that those skilled in the art can understand them without creative effort.
The method, system and server of this embodiment aim to build an enterprise data warehouse quickly, covering the overall architecture, data extraction and transformation, and task development, scheduling, integration and deployment, so as to reduce warehouse development and O&M costs and realize the value of enterprise data more efficiently.
The method, system and server for constructing a data warehouse in this embodiment are described in detail below.
Specifically, as shown in Fig. 1, this embodiment provides a method for constructing a data warehouse, the method comprising the following steps:
Step S101: build a multi-layer data processing architecture to process data in layers.
Step S102: extract the required data from a data source and clean the extracted data into data that meets preset requirements.
Step S103: schedule and supervise data processing tasks.
Step S104: control access rights to the data in the multi-layer data processing architecture and supervise the overall operating status of the data.
Steps S101 to S104 of this embodiment are described in detail below.
Step S101: build a multi-layer data processing architecture to process data in layers.
Specifically, in this embodiment, the multi-layer data processing architecture comprises: a temporary storage layer that temporarily stores data obtained from the data source, a core data layer that stores and processes the cleaned data, a data mart layer that organizes data into corresponding data subjects, and a user-facing application layer that processes data for specific requirements entered by users. Layering the data simplifies construction of the whole data warehouse, because work that was originally done in one step is spread over several steps, which amounts to splitting one complex job into several simple ones. In this embodiment, therefore, as shown in Fig. 2, the data warehouse is divided into four layers: ODS (temporary storage layer), PDW (data warehouse layer), MID (data mart layer) and APP (application layer).
The ODS (temporary storage layer), PDW (data warehouse layer), MID (data mart layer) and APP (application layer) of this embodiment are described in detail below.
The temporary storage layer (ODS) is a temporary storage area for interface data that prepares for the next step of data processing. In general, the data in the ODS layer has the same structure as the data in the source system; its main purpose is to simplify subsequent data processing.
The data in the data warehouse layer (PDW) should be consistent, accurate and clean, that is, source data after cleaning (removal of impurities). Data in this layer usually follows third normal form, and its granularity is usually the same as that of the ODS. The PDW layer can hold all the historical data in the BI (Business Intelligence) system.
The data mart layer (MID) organizes data by subject, typically in a star or snowflake schema. In terms of granularity, data in this layer is lightly summarized and detail data is no longer present. In terms of time span, it is usually a part of the PDW layer; its main purpose is to meet users' analysis needs.
The application layer (APP) holds data built entirely to meet specific analysis requirements, also in a star or snowflake schema. In terms of granularity, it is highly summarized. In terms of scope, it does not necessarily cover all business data; rather, it is a subset of the MID-layer data.
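The four layers described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the sample rows, the field names (order_id, region, amount) and the four function names are hypothetical, and for brevity the PDW step simply drops incomplete rows instead of completing them.

```python
from collections import defaultdict

# Hypothetical source rows; the field names are illustrative assumptions.
SOURCE_ROWS = [
    {"order_id": 1, "region": "east", "amount": "10.5"},
    {"order_id": 2, "region": "west", "amount": "4.0"},
    {"order_id": 2, "region": "west", "amount": "4.0"},   # duplicate row
    {"order_id": 3, "region": "east", "amount": None},    # incomplete row
]

def load_ods(rows):
    """ODS: stage the source rows unchanged (same structure as the source)."""
    return [dict(r) for r in rows]

def build_pdw(ods_rows):
    """PDW: cleaned detail rows at the same granularity as the ODS."""
    seen, out = set(), []
    for r in ods_rows:
        if r["amount"] is None or r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        out.append({**r, "amount": float(r["amount"])})
    return out

def build_mid(pdw_rows):
    """MID: lightly summarized data organized by subject (here: region)."""
    totals = defaultdict(float)
    for r in pdw_rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def build_app(mid_totals, region):
    """APP: a highly summarized subset of MID for one specific request."""
    return {region: mid_totals[region]}

ods = load_ods(SOURCE_ROWS)
pdw = build_pdw(ods)
mid = build_mid(pdw)
app = build_app(mid, "east")
```

Each function depends only on the output of the layer before it, which is the simplification the layering is meant to provide: a change in one layer's logic does not require touching the others.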
Step S102: extract the required data from a data source and clean the extracted data into data that meets preset requirements.
ETL (Extract-Transform-Load) describes the process of extracting data from a source, transforming it, and loading it into a destination. ETL is an indispensable step in building a data warehouse: the user extracts the required data from the data source, cleans it, and finally loads it into the data warehouse according to a predefined data warehouse model.
In this embodiment, extracting the required data from the data source comprises: selecting either full extraction or incremental extraction from a specified timestamp, with optional selective extraction by specified fields and filter conditions.
In this embodiment, data can be extracted from structured or unstructured database tables as required, the data can be filtered, and the extraction frequency and extraction conditions can be configured flexibly.
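The extraction options above (full or timestamp-based incremental extraction, plus field selection and filter conditions) might be sketched as follows. This uses Python's standard sqlite3 module as a stand-in data source; the `extract` function, the `orders` table and the `updated_at` column are assumptions made for illustration, and the table and field names are expected to come from trusted configuration because they are interpolated into the SQL text.

```python
import sqlite3

def extract(conn, table, fields, since=None, ts_column="updated_at",
            where=None, params=()):
    """Full extraction when `since` is None, otherwise incremental by timestamp.

    `fields` selects specific columns; `where` is an optional filter condition
    with `?` placeholders bound from `params`.
    """
    clauses, args = [], []
    if since is not None:
        clauses.append(f"{ts_column} > ?")
        args.append(since)
    if where:
        clauses.append(f"({where})")
        args.extend(params)
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, args).fetchall()

# Hypothetical source table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, status TEXT, amount REAL, updated_at TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "paid", 10.0, "2017-01-01 08:00:00"),
    (2, "cancelled", 5.0, "2017-01-02 09:00:00"),
    (3, "paid", 7.5, "2017-01-03 10:00:00"),
])

full = extract(conn, "orders", ["id", "amount"])
incremental = extract(conn, "orders", ["id", "amount"],
                      since="2017-01-01 23:59:59")
filtered = extract(conn, "orders", ["id"], since="2017-01-01 23:59:59",
                   where="status = ?", params=("paid",))
```

The timestamps are stored as ISO-style text so that string comparison matches chronological order; a real source would need whatever timestamp handling its database requires.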
In this embodiment, cleaning rules and algorithms are configured so that data is cleaned into data with a standard format, consistent meaning and good quality. Specifically, in this embodiment, cleaning the extracted data into data that meets preset requirements comprises one or more of: completing incomplete data, removing erroneous data, deduplicating duplicate data, and converting data formats.
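The four cleaning operations can be combined into one configurable pass, as in the sketch below. The rule parameters (`defaults`, `is_valid`, `key`, `date_field`) and the sample rows are hypothetical; the source does not specify how cleaning rules are expressed.

```python
from datetime import datetime

def clean(rows, defaults, is_valid, key, date_field, in_fmt="%d/%m/%Y"):
    """Apply the four cleaning operations in order to each row."""
    cleaned, seen = [], set()
    for row in rows:
        # 1) Complete incomplete data: fill missing or None fields from defaults.
        row = {**row, **{k: v for k, v in defaults.items() if row.get(k) is None}}
        # 2) Remove erroneous data according to the configured rule.
        if not is_valid(row):
            continue
        # 3) Deduplicate on the configured key (first occurrence wins).
        if row[key] in seen:
            continue
        seen.add(row[key])
        # 4) Convert the date field to a standard format (ISO 8601 date).
        row[date_field] = datetime.strptime(row[date_field], in_fmt).strftime("%Y-%m-%d")
        cleaned.append(row)
    return cleaned

raw = [
    {"id": 1, "city": None, "amount": 12.0, "date": "03/01/2017"},
    {"id": 2, "city": "Shanghai", "amount": -5.0, "date": "04/01/2017"},  # erroneous
    {"id": 1, "city": "Beijing", "amount": 12.0, "date": "03/01/2017"},   # duplicate id
]
result = clean(raw, defaults={"city": "unknown"},
               is_valid=lambda r: r["amount"] >= 0,
               key="id", date_field="date")
```

Passing the rules in as arguments mirrors the idea of configurable cleaning rules: the same function can serve different tables with different defaults and validity checks.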
Step S103: schedule and supervise data processing tasks.
An enterprise-level data warehouse contains thousands of processing programs, and the relationships between them are intricate. Scheduling and managing these tasks efficiently is a very important part of data warehouse management and is key to improving the warehouse's runtime performance and resource utilization.
In this embodiment, scheduling data processing tasks comprises: grouping tasks and scheduling them either according to the configured dependencies between tasks or according to the configured task priorities.
To improve scheduling performance, scheduling in this embodiment uses thread pool technology. To better isolate tasks from one another and better control resource allocation, tasks are scheduled in groups: each group corresponds to one thread pool, and the number of parallel tasks within a group is configurable.
Specifically, the run frequency of a task can be set to daily, weekly or monthly. Tasks are scheduled according to their configured dependencies: a task enters the runnable queue only after the subtasks it depends on have finished executing. Alternatively, priorities can be assigned to tasks, and among tasks whose run conditions are met, those with higher priority are scheduled first.
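A minimal sketch of this scheduling scheme, using Python's standard concurrent.futures thread pool, is shown below. One call to `run_group` handles one task group with its own pool and a configurable degree of parallelism. The function name, the task tuple layout and the sample group are assumptions for illustration. The source presents dependency-based and priority-based scheduling as alternatives; this sketch combines them by ordering the runnable queue by priority.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_group(tasks, max_parallel=2):
    """Run one task group on its own thread pool.

    `tasks` maps name -> (priority, dependencies, callable).
    Returns the order in which tasks were started.
    """
    done, started, order, running = set(), set(), [], {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while len(done) < len(tasks):
            # Runnable queue: not yet started and every dependency has finished.
            ready = [(-prio, name) for name, (prio, deps, _) in tasks.items()
                     if name not in started and set(deps) <= done]
            heapq.heapify(ready)
            # Start the highest-priority runnable tasks while slots are free.
            while ready and len(running) < max_parallel:
                _, name = heapq.heappop(ready)
                started.add(name)
                order.append(name)
                running[pool.submit(tasks[name][2])] = name
            if not running:
                raise ValueError("cyclic or unsatisfiable dependencies")
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                future.result()                # re-raise if the task failed
                done.add(running.pop(future))
    return order

log = []
etl_group = {
    "extract": (1, [], lambda: log.append("extract")),
    "clean_a": (1, ["extract"], lambda: log.append("clean_a")),
    "clean_b": (5, ["extract"], lambda: log.append("clean_b")),
    "report":  (9, ["clean_a", "clean_b"], lambda: log.append("report")),
}
start_order = run_group(etl_group, max_parallel=2)
```

Although `report` has the highest priority, it starts last because priority is only compared among tasks whose dependencies are already satisfied. Running several groups would mean calling `run_group` once per group, each with its own `max_parallel`, so that a slow group cannot exhaust another group's threads.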
Depending on the data source, tasks in the scheduling process can be implemented in languages such as Java, SQL procedures or shell scripts.
In this embodiment, supervising data processing tasks includes, but is not limited to: supervising the addition, pausing and deletion of tasks, viewing the running status and elapsed time of tasks, and rescheduling tasks that failed to run.
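The supervision operations listed above could be organized around a small registry, as in the following sketch. The `TaskSupervisor` class, its method names and the status strings are hypothetical and chosen only for illustration.

```python
import time

class TaskSupervisor:
    """Tracks tasks, their last status and their elapsed time."""

    def __init__(self):
        self.tasks = {}  # name -> {"fn", "paused", "status", "seconds"}

    def add(self, name, fn):
        self.tasks[name] = {"fn": fn, "paused": False,
                            "status": "new", "seconds": None}

    def pause(self, name):
        self.tasks[name]["paused"] = True

    def delete(self, name):
        del self.tasks[name]

    def run(self, name):
        task = self.tasks[name]
        if task["paused"]:
            return task["status"]        # paused tasks are skipped
        start = time.perf_counter()
        try:
            task["fn"]()
            task["status"] = "success"
        except Exception:
            task["status"] = "failed"
        task["seconds"] = time.perf_counter() - start
        return task["status"]

    def run_all(self):
        for name in list(self.tasks):
            self.run(name)

    def statuses(self):
        return {name: t["status"] for name, t in self.tasks.items()}

    def rerun_failed(self):
        """Reschedule every task whose last run failed."""
        failed = [n for n, t in self.tasks.items() if t["status"] == "failed"]
        for name in failed:
            self.run(name)
        return failed

attempts = {"n": 0}

def flaky():
    # Fails on the first attempt only, to simulate a transient error.
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient failure")

sup = TaskSupervisor()
sup.add("load_orders", lambda: None)
sup.add("load_users", flaky)
sup.add("nightly_report", lambda: None)
sup.add("old_job", lambda: None)
sup.pause("nightly_report")
sup.delete("old_job")
sup.run_all()
first = sup.statuses()
retried = sup.rerun_failed()
second = sup.statuses()
```

The elapsed time of each run is kept in the `seconds` field, which is the kind of information an operator would inspect when viewing a task's running status and time consumption.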
Step S104: control access rights to the data in the multi-layer data processing architecture and supervise the overall operating status of the data.
Specifically, in this embodiment, permissions are managed for the different user groups of the warehouse, and permissions on the tables in the data warehouse are encapsulated around four entities: department, user, role and permission. Authorization can be granted uniformly by department or individually to a specified user, and field-level settings on a table are managed through views.
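One way to picture the four entities and view-based field-level control is the sketch below, which uses dataclasses for the permission model and an in-memory SQLite view to hide a column. The class layout, the `can_read` and `query` helpers, and the `customer` table are assumptions; the source does not describe the actual data model.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Role:
    name: str
    permissions: set = field(default_factory=set)   # readable tables or views

@dataclass
class Department:
    name: str
    roles: list = field(default_factory=list)       # granted to every member

@dataclass
class User:
    name: str
    department: Department
    roles: list = field(default_factory=list)       # granted to this user only

def can_read(user, obj):
    """A user may read obj if a department-wide or individual role permits it."""
    return any(obj in r.permissions for r in user.department.roles + user.roles)

def query(conn, user, obj):
    if not can_read(user, obj):
        raise PermissionError(f"{user.name} may not read {obj}")
    # obj has been checked against a known permission set before reaching SQL.
    return conn.execute(f"SELECT * FROM {obj}").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, phone TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Li', '555-0100')")
# Field-level control: the view exposes only the non-sensitive columns.
conn.execute("CREATE VIEW customer_public AS SELECT id, name FROM customer")

analyst = Role("analyst", {"customer_public"})
dba = Role("dba", {"customer", "customer_public"})
risk = Department("risk", roles=[analyst])           # unified grant by department
alice = User("alice", risk)
bob = User("bob", risk, roles=[dba])                 # individual grant to one user

public_rows = query(conn, alice, "customer_public")
full_rows = query(conn, bob, "customer")
```

Here every member of the hypothetical `risk` department can read the restricted view, while only the individually authorized user can read the underlying table with the `phone` column.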
In this embodiment, supervising the overall operating status of the data comprises: monitoring the operating status of the operating system and the data warehouse database and collecting data on operating load, which makes it easier to identify system performance bottlenecks and abnormal conditions; fallback tasks are also collected centrally to facilitate optimization work.
In addition, in this embodiment, full or incremental backups can be made of the tables in the data mart layer (MID).
To implement the above method, this embodiment correspondingly provides a system for constructing a data warehouse. Specifically, as shown in Fig. 3, the data warehouse construction system 100 comprises: a layering module 101, an extraction module 102, a cleaning module 103, a scheduling module 104, a task supervision module 105, a permission management module 106 and a data supervision module 107.
Specifically, in this embodiment, the layering module 101 is used to build a multi-layer data processing architecture to process data in layers.
In this embodiment, the multi-layer data processing architecture built by the layering module 101 comprises: a temporary storage layer that temporarily stores data obtained from the data source, a core data layer that stores and processes the cleaned data, a data mart layer that organizes data into corresponding data subjects, and a user-facing application layer that processes data for specific requirements entered by users.
Layering the data simplifies construction of the whole data warehouse, because work that was originally done in one step is spread over several steps, which amounts to splitting one complex job into several simple ones. In this embodiment, therefore, as shown in Fig. 2, the data warehouse is divided into four layers: ODS (temporary storage layer), PDW (data warehouse layer), MID (data mart layer) and APP (application layer).
The ODS (temporary storage layer), PDW (data warehouse layer), MID (data mart layer) and APP (application layer) of this embodiment are described in detail below.
The temporary storage layer (ODS) is a temporary storage area for interface data that prepares for the next step of data processing. In general, the data in the ODS layer has the same structure as the data in the source system; its main purpose is to simplify subsequent data processing.
The data in the data warehouse layer (PDW) should be consistent, accurate and clean, that is, source data after cleaning (removal of impurities). Data in this layer usually follows third normal form, and its granularity is usually the same as that of the ODS. The PDW layer can hold all the historical data in the BI (Business Intelligence) system.
The data mart layer (MID) organizes data by subject, typically in a star or snowflake schema. In terms of granularity, data in this layer is lightly summarized and detail data is no longer present. In terms of time span, it is usually a part of the PDW layer; its main purpose is to meet users' analysis needs.
The application layer (APP) holds data built entirely to meet specific analysis requirements, also in a star or snowflake schema. In terms of granularity, it is highly summarized. In terms of scope, it does not necessarily cover all business data; rather, it is a subset of the MID-layer data.
In this embodiment, the extraction module 102 is used to extract the required data from a data source.
ETL (Extract-Transform-Load) describes the process of extracting data from a source, transforming it, and loading it into a destination. ETL is an indispensable step in building a data warehouse: the user extracts the required data from the data source, cleans it, and finally loads it into the data warehouse according to a predefined data warehouse model.
Specifically, in this embodiment, when extracting the required data from the data source, the extraction module 102 selects either full extraction or incremental extraction from a specified timestamp, and can extract selectively by specified fields and filter conditions.
In this embodiment, data can be extracted from structured or unstructured database tables as required, the data can be filtered, and the extraction frequency and extraction conditions can be configured flexibly.
Specifically, in this embodiment, the cleaning module 103 is used to clean the extracted data into data that meets preset requirements.
In this embodiment, cleaning rules and algorithms are configured so that data is cleaned into data with a standard format, consistent meaning and good quality. Specifically, in this embodiment, the cleaning of the extracted data into data that meets preset requirements by the cleaning module 103 comprises one or more of: completing incomplete data, removing erroneous data, deduplicating duplicate data, and converting data formats.
Specifically, in this embodiment, the scheduling module 104 is used to schedule data processing tasks.
An enterprise-level data warehouse contains thousands of processing programs, and the relationships between them are intricate. Scheduling and managing these tasks efficiently is a very important part of data warehouse management and is key to improving the warehouse's runtime performance and resource utilization.
In this embodiment, the scheduling of data processing tasks by the scheduling module 104 comprises: grouping tasks and scheduling them either according to the configured dependencies between tasks or according to the configured task priorities.
To improve scheduling performance, the scheduling module 104 in this embodiment uses thread pool technology. To better isolate tasks from one another and better control resource allocation, the scheduling module 104 schedules tasks in groups: each group corresponds to one thread pool, and the number of parallel tasks within a group is configurable.
Specifically, in the scheduling module 104, the run frequency of a task can be set to daily, weekly or monthly. Tasks are scheduled according to their configured dependencies: a task enters the runnable queue only after the subtasks it depends on have finished executing. Alternatively, priorities can be assigned to tasks, and among tasks whose run conditions are met, those with higher priority are scheduled first.
Depending on the data source, tasks in the scheduling process of the scheduling module 104 can be implemented in languages such as Java, SQL procedures or shell scripts.
Specifically, in this present embodiment, the task administration module 105 supervises data processing task.
In this present embodiment, the task administration module 105, which to data processing task supervise, includes:Task of supervision Operating condition that is newly-increased, suspending and delete, check task and time-consuming situation and the rescheduling operation failure of the task.
Specifically, in this embodiment, the authority management module 106 is used to control access rights to the data in the multi-layer data processing framework.
In this embodiment, permissions are managed for the different user groups of the warehouse, and permissions on the tables in the data warehouse are encapsulated using four entities: department, user, role, and permission. Authorization may be granted uniformly by department or individually to a specified user, and field-level control over tables is achieved through views.
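The four-entity permission model and the view-based field-level control can be sketched as below. This is an assumed data model for illustration only: the source names the entities (department, user, role, permission) but not their attributes, so the classes, the `grant_department` and `grant_user` methods, and the view naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Permission:
    table: str
    columns: FrozenSet[str]             # fields the holder may read


@dataclass
class Role:
    name: str
    permissions: List[Permission] = field(default_factory=list)


@dataclass
class User:
    name: str
    department: str


class AccessControl:
    def __init__(self) -> None:
        self.department_roles: Dict[str, List[Role]] = {}
        self.user_roles: Dict[str, List[Role]] = {}

    def grant_department(self, department: str, role: Role) -> None:
        # Unified authorization: every user of the department gets the role.
        self.department_roles.setdefault(department, []).append(role)

    def grant_user(self, user_name: str, role: Role) -> None:
        # Single authorization for one specified user.
        self.user_roles.setdefault(user_name, []).append(role)

    def allowed_columns(self, user: User, table: str) -> Set[str]:
        roles = (self.department_roles.get(user.department, [])
                 + self.user_roles.get(user.name, []))
        cols: Set[str] = set()
        for role in roles:
            for perm in role.permissions:
                if perm.table == table:
                    cols |= perm.columns
        return cols

    def view_sql(self, user: User, table: str) -> str:
        # Field-level control: expose only the permitted columns through a view.
        # Identifiers are assumed to be trusted; no quoting is performed here.
        cols = sorted(self.allowed_columns(user, table))
        if not cols:
            raise PermissionError(f"{user.name} has no access to {table}")
        return (f"CREATE VIEW {table}_{user.name}_v AS "
                f"SELECT {', '.join(cols)} FROM {table}")
```

Users would then be granted access to the generated view rather than to the underlying table, so columns outside their permissions are never visible.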
Specifically, in this embodiment, the data administration module 107 is used to supervise the overall operation of the data.
In this embodiment, the supervision of the overall operation of the data by the data administration module 107 includes: monitoring the running status of the operating system and of the data warehouse database, and collecting data on the operating load, so that system performance bottlenecks and abnormal conditions can be identified; and uniformly collecting rolled-back tasks to facilitate optimization work.
In addition, in this embodiment, the data administration module 107 can perform full or incremental backups of the tables in the data mart layer (MID).
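The difference between a full and an incremental backup of a data mart table can be illustrated with the sketch below, which uses Python's built-in `sqlite3` module purely as a stand-in database. The function name `backup_table`, the `<table>_bak` naming, and the assumption that each table has an `id` key and an `updated_at` timestamp column are all hypothetical; the source does not specify how changed rows are detected.

```python
import sqlite3
from typing import Optional


def backup_table(conn: sqlite3.Connection, table: str,
                 since: Optional[str] = None) -> int:
    """Copy rows of `table` into `<table>_bak` and return the number copied.

    since=None   -> full backup: the backup table is rebuilt from scratch.
    since=<text> -> incremental backup: only rows with updated_at > since
                    are copied, replacing older copies with the same id.
    Assumes trusted table names and columns named id and updated_at.
    """
    bak = f"{table}_bak"
    if since is None:
        conn.execute(f"DROP TABLE IF EXISTS {bak}")
        conn.execute(f"CREATE TABLE {bak} AS SELECT * FROM {table}")
        copied = conn.execute(f"SELECT COUNT(*) FROM {bak}").fetchone()[0]
    else:
        copied = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE updated_at > ?", (since,)
        ).fetchone()[0]
        # Remove stale copies of the changed rows, then insert the new versions.
        conn.execute(
            f"DELETE FROM {bak} WHERE id IN "
            f"(SELECT id FROM {table} WHERE updated_at > ?)", (since,))
        conn.execute(
            f"INSERT INTO {bak} SELECT * FROM {table} WHERE updated_at > ?",
            (since,))
    conn.commit()
    return copied
```

An incremental run only makes sense after at least one full backup has created the backup table; rows deleted from the source are not detected by this timestamp-based approach.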
Finally, this embodiment also provides a server that includes the data warehouse construction system 100 described above. The construction system 100 has been described in detail above and is not described again here.
In summary, the present invention enables rapid construction of an enterprise data warehouse, reduces the complexity of warehouse construction, shortens the development cycle for an enterprise building a data warehouse, and reduces warehouse development and operation-and-maintenance costs, and therefore has broad application prospects. The present invention thus effectively overcomes various shortcomings of the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (11)

1. A method for constructing a data warehouse, characterized in that the method comprises:
building a multi-layer data processing framework to process data in layers;
extracting required data from a data source and cleaning the extracted data into data that meets preset requirements;
scheduling and supervising data processing tasks; and
controlling access rights to the data in the multi-layer data processing framework and supervising the overall operation of the data.
2. The method for constructing a data warehouse according to claim 1, characterized in that the multi-layer data processing framework comprises: a temporary storage layer that temporarily stores the data obtained from the data source; a core data layer that stores and processes the cleaned data; a data mart layer that organizes data into corresponding data subjects; and a user-facing application layer that processes data for the specific requirements input by users.
3. The method for constructing a data warehouse according to claim 1, characterized in that extracting the required data from the data source comprises: selecting either full extraction or incremental extraction based on a specified timestamp, with selective extraction available through specified fields and filter conditions.
4. The method for constructing a data warehouse according to claim 1, characterized in that cleaning the extracted data into data that meets preset requirements comprises one or a combination of: completing incomplete data, removing erroneous data, deduplicating duplicate data, and converting data formats.
5. The method for constructing a data warehouse according to claim 1, characterized in that scheduling the data processing tasks comprises: grouping the tasks, and scheduling them either according to the configured dependencies between tasks or according to the priorities set for the tasks; and supervising the data processing tasks comprises: supervising the addition, suspension, and deletion of tasks, viewing the running status and elapsed time of tasks, and rescheduling tasks whose runs have failed.
6. A system for constructing a data warehouse, characterized in that the system comprises:
a layering module, for building a multi-layer data processing framework to process data in layers;
an extraction module, for extracting required data from a data source;
a cleaning module, for cleaning the extracted data into data that meets preset requirements;
a scheduler module, for scheduling data processing tasks;
a task administration module, for supervising the data processing tasks;
an authority management module, for controlling access rights to the data in the multi-layer data processing framework; and
a data administration module, for supervising the overall operation of the data.
7. The system for constructing a data warehouse according to claim 6, characterized in that the multi-layer data processing framework built by the layering module comprises: a temporary storage layer that temporarily stores the data obtained from the data source; a core data layer that stores and processes the cleaned data; a data mart layer that organizes data into corresponding data subjects; and a user-facing application layer that processes data for the specific requirements input by users.
8. The system for constructing a data warehouse according to claim 7, characterized in that, when extracting the required data from the data source, the extraction module selects either full extraction or incremental extraction based on a specified timestamp, and can perform selective extraction through specified fields and filter conditions.
9. The system for constructing a data warehouse according to claim 6, characterized in that the cleaning of the extracted data into data that meets preset requirements by the cleaning module comprises one or a combination of: completing incomplete data, removing erroneous data, deduplicating duplicate data, and converting data formats.
10. The system for constructing a data warehouse according to claim 6, characterized in that the scheduling of data processing tasks by the scheduler module comprises: grouping the tasks, and scheduling them either according to the configured dependencies between tasks or according to the priorities set for the tasks; and the supervision of data processing tasks by the task administration module comprises: supervising the addition, suspension, and deletion of tasks, viewing the running status and elapsed time of tasks, and rescheduling tasks whose runs have failed.
11. A server, characterized in that the server comprises the system for constructing a data warehouse according to any one of claims 6 to 10.
CN201710009996.6A 2017-01-06 2017-01-06 A kind of construction method of data warehouse, system and server Pending CN108280084A (en)

Publications (1)

Publication Number Publication Date
CN108280084A true CN108280084A (en) 2018-07-13

Family

ID=62800908

Country Status (1)

Country Link
CN (1) CN108280084A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040249644A1 (en) * 2003-06-06 2004-12-09 International Business Machines Corporation Method and structure for near real-time dynamic ETL (extraction, transformation, loading) processing
CN104239100A (en) * 2014-09-11 2014-12-24 浪潮软件集团有限公司 Universal data processing method
CN104731791A (en) * 2013-12-18 2015-06-24 东阳艾维德广告传媒有限公司 Marketing analysis data market system
CN104933112A (en) * 2015-06-04 2015-09-23 浙江力石科技股份有限公司 Distributed Internet transaction information storage and processing method
CN105933446A (en) * 2016-06-28 2016-09-07 中国农业银行股份有限公司 Service dual-active implementation method and system of big data platform
CN106202346A (en) * 2016-06-29 2016-12-07 浙江理工大学 A kind of data load and clean engine, dispatch and storage system

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109150283B (en) * 2018-07-23 2021-02-19 千寻位置网络有限公司 Observation data transmission method and terminal, proxy server and data broadcasting system
CN109150283A (en) * 2018-07-23 2019-01-04 千寻位置网络有限公司 Observe the transmission method and terminal, proxy server and data broadcasting system of data
CN109241042B (en) * 2018-07-24 2020-12-08 新华三大数据技术有限公司 Data processing method and device and electronic equipment
CN109241042A (en) * 2018-07-24 2019-01-18 新华三大数据技术有限公司 Data processing method, device and electronic equipment
CN109189764A (en) * 2018-09-20 2019-01-11 北京桃花岛信息技术有限公司 A kind of colleges and universities' data warehouse layered design method based on Hive
CN109597846A (en) * 2018-10-22 2019-04-09 平安科技(深圳)有限公司 Big data platform data warehouse data processing method, device and computer equipment
CN109597846B (en) * 2018-10-22 2024-05-07 平安科技(深圳)有限公司 Data processing method, device and computer equipment for large data platform data warehouse
CN109840269A (en) * 2018-12-26 2019-06-04 成都康赛信息技术有限公司 Data relationship visual management method based on four layer data frameworks
CN111104394A (en) * 2019-12-31 2020-05-05 新奥数能科技有限公司 Energy data warehouse system construction method and device
WO2021135177A1 (en) * 2019-12-31 2021-07-08 新奥数能科技有限公司 Construction method and apparatus for energy data warehouse system
WO2021135727A1 (en) * 2019-12-31 2021-07-08 新奥数能科技有限公司 Energy data warehouse system
CN112035450A (en) * 2020-07-30 2020-12-04 深圳市中盛瑞达科技有限公司 Data warehouse real-time construction method based on button
CN112035450B (en) * 2020-07-30 2021-10-29 深圳市中盛瑞达科技有限公司 Data warehouse real-time construction method based on button
CN112231301A (en) * 2020-10-21 2021-01-15 黄河水利委员会黄河水利科学研究院 Yellow river water sand change data warehouse
CN112540854A (en) * 2020-12-28 2021-03-23 上海体素信息科技有限公司 Deep learning model scheduling deployment method and system under condition of limited hardware resources
CN112699096A (en) * 2020-12-30 2021-04-23 银盛支付服务股份有限公司 Method for controlling data access authority based on big data
CN113032495A (en) * 2021-03-23 2021-06-25 深圳市酷开网络科技股份有限公司 Multi-layer data storage system based on data warehouse, processing method and server
CN113190630A (en) * 2021-05-31 2021-07-30 深圳金石创新科技有限公司 Data integration method and system for constructing enterprise data warehouse
CN113190630B (en) * 2021-05-31 2022-02-01 深圳金石创新科技有限公司 Data integration method and system for constructing enterprise data warehouse

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180713
