CN104391989A - Distributed ETL all-in-one machine system - Google Patents

Distributed ETL all-in-one machine system

Publication number: CN104391989A
Authority: CN (China)
Prior art keywords: data, etl, task, node, cluster
Legal status: Pending
Application number: CN201410774178.1A
Other languages: Chinese (zh)
Inventors: 刘伟, 辛国茂, 金洪殿, 亓开元, 房体盈, 曹连超, 卢军佐
Current assignee: Inspur Electronic Information Industry Co Ltd
Original assignee: Inspur Electronic Information Industry Co Ltd
Priority date: 2014-12-16
Filing date: 2014-12-16
Publication date: 2015-03-04

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16 Constructional details or arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

The invention discloses a distributed ETL all-in-one machine system. It comprises a distributed ETL all-in-one hardware system, a cluster intelligent management engine, ETL business logic, ETL task development, data engines, data storage, Client-Server (CS) data transmission, and related system management modules. Compared with the prior art, the system can extract large volumes of offline and streaming data in parallel at high speed. It processes the data on the distributed ETL all-in-one machine and then outputs it to a big data system, completing the ETL process. The system is highly practical, widely applicable, and of considerable technical value.

Description

A distributed ETL all-in-one machine system
Technical field
The present invention relates to the field of computer technology, and specifically to a practical distributed ETL all-in-one machine system.
Background
Human society has now fully entered the information age, and the volume of data it produces grows exponentially year by year. Because of the limitations of conventional technology, however, much of this data lies dormant on storage media. In recent years, big data processing technologies such as Hadoop and Spark have developed, and data has attracted attention as a strategic resource as important as water and oil. Today most large-scale data is stored in traditional SQL databases, which differ greatly from the NoSQL databases used by big data technologies. Data is also diverse. Before a big data platform can process data, the data must therefore be imported into the platform's own storage system. The import generally requires ETL processing: extracting, cleansing, and loading the various types of data.
Traditional ETL systems mainly run on a single machine. Distributed ETL processing also exists, but it mainly targets multi-task scenarios. The functionality of these traditional ETL systems is fairly mature. With large data volumes, however, their processing speed struggles to keep up with demand, and their functions are poorly matched to interfacing with big data platforms.
In the big data era, data volumes will grow further, and so will the demand for ETL processing that targets big data platforms. Timeliness requirements are also becoming more pressing, and traditional ETL processing will eventually struggle to bear the load. A unified ETL processing platform will therefore become a new requirement of the big data era. Such a platform must be designed for big data, offer high-performance data processing and high throughput, and provide complete functionality.
On this basis, a distributed ETL all-in-one machine system designed for big data is provided to meet the ETL processing needs of the big data era.
Summary of the invention
The technical task of the present invention is to address the above shortcomings by providing a practical distributed ETL all-in-one machine system.
A distributed ETL all-in-one machine system, implemented as follows:
A distributed ETL all-in-one hardware system is provided. It comprises a server cluster built from multiple servers suited to transmitting, storing, and processing big data, which forms a high-performance ETL processing hardware platform. The cluster uses a master-slave architecture: the whole cluster consists of one master node and several slave nodes;
A cluster intelligent management engine is provided as the interface between the hardware layer and the ETL business system, and it supplies all supporting services for ETL business. It also acts as the elastic manager of the hardware cluster. It manages the cluster's memory, hard disk, and network hardware resources uniformly, and it is responsible for node expansion, dual-machine hot backup, standby master node selection, and cluster monitoring;
A distributed ETL management center is deployed on the master node. Executed by the master node, it performs ETL task coordination, load balancing, data engine management, and task management, and it works with the cluster intelligent management engine to synchronize related data;
ETL business logic is provided: each node receives tasks distributed by the distributed ETL management center, and the nodes cooperate to complete ETL processing of those tasks. This processing covers the ETL system functions of data extraction, data cleansing, transformation, data loading, data backflow, system analysis, and quality management;
ETL task management is provided, offering graphical task design: ETL tasks are designed visually, and the resulting design metadata is stored in a task metadata repository;
A data engine is provided. It manages the connection drivers for all kinds of data sources and provides a unified database storage interface for the ETL system's own metadata. It also performs unified management of distributed data storage;
Data storage is provided, offering business data storage and user data caching; it uses distributed in-memory storage together with high-speed hard-disk storage;
Master-slave Client-Server data transmission is provided: a Client obtains source data at the data source and then connects to the Server port of the distributed ETL system, where the data is aggregated and collected;
A configuration manager is provided, offering an interactive Web UI for unified configuration management and user management of the cluster;
A log module is provided, which imports the various logs generated by the cluster for unified management and provides statistical analysis of the logs.
In the cluster of the hardware system, one slave node is selected as the standby master node. The standby master node promptly synchronizes all management and configuration information of the master node as a hot standby. If the master node fails and leaves the cluster, the standby master node switches to the master role and takes over the master's task of managing ETL for the whole cluster. At the same time, a new standby master node is selected from the remaining slave nodes;
Each node of the cluster is configured with large-capacity memory of 8 GB or more, so that ETL business processing and data storage take place directly in memory. Each node is also equipped with a large-capacity high-speed hard disk of 500 GB or more as a data cache pool to accommodate very large data volumes. The cluster is connected internally with 10 Gb/s or faster links to guarantee internal data exchange speed. Each node is also configured with multiple network links, managed uniformly by the master node, so that the cluster can extract data from the same data source in parallel;
The cluster intelligent management engine provides the following services to the upper-layer ETL nodes: resource scheduling policies; distributed communication and coordination interfaces; and cluster resource monitoring.
The distributed ETL management center is executed by the master node, and its functions and implementation include:
ETL task scheduling: tasks submitted by each user are scheduled and executed uniformly according to a predetermined policy;
ETL task dispatching: a task to be executed is decomposed and its data partitioned according to data distribution characteristics and the status of each node, and the parts are then distributed to the nodes for execution;
ETL task monitoring: task execution is monitored, and task information is aggregated and reported to users;
Load balancing: task assignment is adjusted dynamically according to task monitoring information and the current resource state of each node in the cluster;
Error handling and fault recovery: when a task fails or a node fails, tasks are reassigned;
Data engine management: engines for various data sources are built in, and new data engines can be added. The engines are stored in a data engine library, and the management center manages all data source driver engines uniformly;
Historical task recording and management.
The ETL business logic handles ETL processing and completes the whole ETL process: extraction (E logic), transformation (T logic), and loading (L logic). It also handles data backflow, system analysis, and data quality management. The E logic handles data extraction: the system performs full or incremental extraction of offline data and real-time extraction of streaming data.
ETL task management uses visual ETL task design, and the resulting design metadata is stored in the task metadata repository. To support development by multiple people, it provides collaborative development and uses version control to manage version iterations.
Data storage is central to the whole ETL system. The stored content is divided into two parts: a business database and a user database. The business database stores data the system itself produces while running, including system configuration data, task metadata, transformation functions, and data engines. Schema data extracted from external data sources is also stored in the business database. The user database stores the data that ETL operates on, and it uses memory as its storage medium by means of a distributed in-memory storage system. When large volumes of offline data are processed, high-speed hard disks serve as a buffer pool, and when memory is insufficient the hard disks act as an external cache.
The distributed ETL all-in-one machine system of the present invention has the following advantages:
The system uses a high-speed network to connect the internal cluster with external source data equipment, and it integrates engines for various data types. It can connect to a wide range of data sources, and it provides complete ETL processing logic and high-speed data transmission. The system is practical, widely applicable, and easy to adopt.
Description of the drawings
Figure 1 is the framework diagram of the distributed ETL all-in-one machine system of the present invention.
Figure 2 is the detailed architecture diagram of the all-in-one machine system of the present invention.
Figure 3 is the cluster architecture diagram of the all-in-one machine system of the present invention.
Embodiments
The invention is further described below with reference to the drawings and specific embodiments.
The present invention provides a distributed ETL all-in-one machine system. A high-performance server cluster, customized for ETL business in big data scenarios, is combined with a distributed ETL system designed to process large data volumes. Together they form an efficient ETL all-in-one machine suited to big data scenarios. The hardware of the system uses a high-speed network to connect the internal cluster with external source data equipment, and it integrates engines for various data types. The system can extract large volumes of offline and streaming data in parallel at high speed. It processes the data on the distributed ETL all-in-one machine and outputs it to a big data system, completing the ETL process. As shown in Figures 1, 2, and 3, the implementation is as follows:
A distributed ETL all-in-one hardware system is provided. It comprises a server cluster built from multiple servers suited to transmitting, storing, and processing big data, which forms a high-performance ETL processing hardware platform. The cluster uses a master-slave architecture: the whole cluster consists of one master node and several slave nodes;
A cluster intelligent management engine is provided as the interface between the hardware layer and the ETL business system, and it supplies all supporting services for ETL business. It also acts as the elastic manager of the hardware cluster. It manages the cluster's memory, hard disk, and network hardware resources uniformly, and it is responsible for node expansion, dual-machine hot backup, standby master node selection, and cluster monitoring;
A distributed ETL management center is deployed on the master node. Executed by the master node, it performs ETL task coordination, load balancing, data engine management, and task management, and it works with the cluster intelligent management engine to synchronize related data;
ETL business logic is provided: each node receives tasks distributed by the distributed ETL management center, and the nodes cooperate to complete ETL processing of those tasks. This processing covers the ETL system functions of data extraction, data cleansing, transformation, data loading, data backflow, system analysis, and quality management;
ETL task management is provided, offering graphical task design: ETL tasks are designed visually, and the resulting design metadata is stored in a task metadata repository;
A data engine is provided. It manages the connection drivers for all kinds of data sources and provides a unified database storage interface for the ETL system's own metadata. It also performs unified management of distributed data storage;
Data storage is provided, offering business data storage and user data caching; it uses distributed in-memory storage together with high-speed hard-disk storage;
Master-slave Client-Server data transmission is provided: a Client obtains source data at the data source and then connects to the Server port of the distributed ETL system, where the data is aggregated and collected;
A configuration manager is provided, offering an interactive Web UI for unified configuration management and user management of the cluster;
A log module is provided, which imports the various logs generated by the cluster for unified management and provides statistical analysis of the logs.
Each of the above steps is described in detail below:
1. Setup of the distributed ETL all-in-one hardware cluster.
The architecture of the distributed ETL all-in-one hardware cluster is shown in Figure 3. The cluster uses a master-slave architecture and consists of one master node and multiple slave nodes. One slave node is selected as the standby master node, and it synchronizes all management and configuration information of the master node in real time as a hot standby. If the master node fails and leaves the cluster, the standby master node switches to the master role and takes over the master's task of managing ETL for the whole cluster. At the same time, a new standby master node is selected from the remaining slave nodes.
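The failover sequence described above can be sketched as a small state model. This is a minimal Python illustration, not the patented implementation. The `Cluster` class and its field names are hypothetical. The rule of picking the lowest-named slave as the new standby is also an assumption, because the description does not say how the standby is chosen. Copying a dictionary stands in for real configuration synchronization.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of the master-slave cluster with one hot-standby master."""

    master: str
    slaves: list[str]
    standby: str = ""
    # Management/configuration info held by the master, and the standby's copy of it.
    master_config: dict = field(default_factory=dict)
    standby_config: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.standby:
            self.standby = self.pick_standby()

    def pick_standby(self) -> str:
        # Assumed policy: lowest node name wins; a real cluster might run an election.
        if not self.slaves:
            raise RuntimeError("no slave node left to act as standby master")
        return min(self.slaves)

    def sync_standby(self) -> None:
        # Hot standby: copy the master's management/configuration info to the standby.
        self.standby_config = dict(self.master_config)

    def fail_master(self) -> None:
        # The standby leaves the slave pool and takes over the master role,
        # starting from the configuration it last synchronized.
        self.slaves.remove(self.standby)
        self.master = self.standby
        self.master_config = dict(self.standby_config)
        # A new standby is chosen from the remaining slaves and synchronized at once.
        self.standby = self.pick_standby()
        self.sync_standby()


if __name__ == "__main__":
    cluster = Cluster(master="m0", slaves=["s1", "s2", "s3"])
    cluster.master_config["etl_jobs"] = 3
    cluster.sync_standby()
    cluster.fail_master()
    print(cluster.master, cluster.standby, cluster.slaves)  # s1 s2 ['s2', 's3']
```

The new standby stays in the slave list because, as in the description, the standby master is itself one of the slave nodes.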
To suit ETL workloads, each node of the cluster is configured with large-capacity memory, so that ETL business processing and data storage can be completed directly in memory. Each node is also equipped with a large-capacity high-speed hard disk as a data cache pool to accommodate very large data volumes; in special cases the disks can also serve as a data center. The cluster is connected internally with 10 Gb/s or faster links to guarantee internal data exchange speed. Each node is also configured with multiple network links, managed uniformly by the master node, so that the cluster can extract data from the same data source in parallel.
2. Setup of the cluster intelligent management engine.
As the elastic manager of the hardware cluster, the cluster intelligent management engine manages hardware resources such as the cluster's memory, hard disks, and network uniformly. It is also responsible for functions such as node expansion, dual-machine hot backup, and standby master node selection.
The cluster intelligent management engine also provides supporting services to the upper layer:
(1) resource scheduling policies, such as fair scheduling, capacity scheduling, delay scheduling, and dominant resource fairness (DRF) scheduling;
(2) distributed communication and coordination interfaces, such as election algorithms, serialization, remote procedure calls, Gossip services, and ZooKeeper services;
(3) cluster resource monitoring.
Overall, the cluster intelligent management engine serves as the interface between the hardware layer and the ETL business system, and it supplies all supporting services for ETL business.
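Of the scheduling strategies listed above, dominant resource fairness (DRF) is the least obvious, so a sketch may help. The following is a minimal Python version of the usual DRF progressive-filling procedure. It repeatedly grants one more task to the job whose dominant share is smallest, where the dominant share is the job's largest fraction of any single resource. It stops when no job's next task fits. The function name, resource names, and job demands are illustrative assumptions; the description does not say how the engine implements DRF.

```python
def drf_allocate(capacity: dict[str, float],
                 demands: dict[str, dict[str, float]]) -> dict[str, int]:
    """Dominant Resource Fairness via progressive filling.

    capacity: total amount of each resource in the cluster, e.g. {"cpu": 9, "mem_gb": 18}
    demands:  per-task demand vector of each job (each vector assumed to be non-zero)
    Returns how many tasks each job is granted.
    """
    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}
    dominant_share = {u: 0.0 for u in demands}
    active = set(demands)
    while active:
        # Serve the job whose dominant share is smallest (ties broken by name).
        user = min(active, key=lambda u: (dominant_share[u], u))
        demand = demands[user]
        if any(used[r] + demand.get(r, 0.0) > capacity[r] for r in capacity):
            active.discard(user)  # its next task no longer fits; stop serving this job
            continue
        for r in capacity:
            used[r] += demand.get(r, 0.0)
        tasks[user] += 1
        # Dominant share = largest fraction of any single resource this job now holds.
        dominant_share[user] = max(tasks[user] * demand.get(r, 0.0) / capacity[r]
                                   for r in capacity)
    return tasks


if __name__ == "__main__":
    # A 9-CPU, 18-GB pool: job A needs <1 CPU, 4 GB> per task, job B needs <3 CPU, 1 GB>.
    print(drf_allocate({"cpu": 9, "mem_gb": 18},
                       {"A": {"cpu": 1, "mem_gb": 4}, "B": {"cpu": 3, "mem_gb": 1}}))
    # {'A': 3, 'B': 2}
```

In the example, job A ends with 3 tasks (12 of 18 GB of memory) and job B with 2 tasks (6 of 9 CPUs). Each job therefore holds a dominant share of 2/3.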
3. Setup of the distributed ETL management center.
The distributed ETL management center is the top management layer of the whole system, and the master node performs this role. Its main functions are:
1) ETL task scheduling. Tasks submitted by each user are scheduled and executed uniformly according to a predetermined policy.
2) ETL task dispatching. A task to be executed is decomposed and its data partitioned according to data distribution characteristics, the status of each node, and similar information. The parts are then distributed to the nodes for execution.
3) ETL task monitoring. Task execution is monitored, and task information is aggregated and reported to users.
4) Load balancing. Task assignment is adjusted dynamically according to task monitoring information and the current resource state of each node in the cluster.
5) Error handling and fault recovery. When a task fails or a node fails, tasks are reassigned.
6) Data engine management. Engines for various data sources are built in, and new data engines can be added. The engines are stored in a data engine library, and the management center manages all data source driver engines uniformly.
7) Historical task recording and management.
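Items 2), 4), and 5) above can be illustrated together. For illustration only, the sketch below assumes that a task is split into contiguous row ranges and that a split's row count is a fair proxy for its cost. The names `Split`, `decompose`, `assign`, and `recover` are hypothetical. Assignment is a greedy least-loaded choice using a min-heap. Recovery reruns that choice for the failed node's splits over the surviving nodes.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One piece of a decomposed ETL task: a half-open range of source rows (assumed)."""
    task_id: str
    start: int
    end: int

    @property
    def rows(self) -> int:
        return self.end - self.start


def decompose(task_id: str, total_rows: int, num_splits: int) -> list[Split]:
    """Data segmentation: cut a task into contiguous row ranges of near-equal size."""
    base, extra = divmod(total_rows, num_splits)
    splits, start = [], 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        splits.append(Split(task_id, start, start + size))
        start += size
    return splits


def assign(splits: list[Split], node_load: dict[str, int]) -> dict[str, list[Split]]:
    """Load balancing: give each split to the node with the least load so far."""
    heap = [(load, node) for node, load in node_load.items()]
    heapq.heapify(heap)
    plan: dict[str, list[Split]] = {node: [] for node in node_load}
    for split in splits:
        load, node = heapq.heappop(heap)
        plan[node].append(split)
        heapq.heappush(heap, (load + split.rows, node))  # row count stands in for cost
    return plan


def recover(plan: dict[str, list[Split]], failed: str) -> dict[str, list[Split]]:
    """Fault recovery: move a failed node's splits onto the surviving nodes."""
    orphaned = plan.pop(failed, [])
    if orphaned and not plan:
        raise RuntimeError("no surviving node to take over the work")
    current = {node: sum(s.rows for s in splits) for node, splits in plan.items()}
    for node, moved in assign(orphaned, current).items():
        plan[node].extend(moved)
    return plan
```

For example, `decompose("t1", 10, 3)` yields the row ranges 0-4, 4-7, and 7-10. On three idle nodes, `assign` gives one range to each node. If the first node then fails, `recover` hands its range to the surviving node with the fewest rows.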
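Item 6) describes a data engine library to which new engines can be added. One way to picture it is a registry keyed by data source type. In this hypothetical sketch, an "engine" is any callable that takes connection options and yields rows as dictionaries. The two built-in engines read CSV text and SQLite through Python's standard `csv` and `sqlite3` modules. They were chosen to keep the example self-contained and are not the engines the system actually ships.

```python
import csv
import io
import sqlite3
from typing import Callable, Iterator

# Hypothetical engine contract: given connection options, yield rows as dicts.
Engine = Callable[[dict], Iterator[dict]]


def csv_engine(options: dict) -> Iterator[dict]:
    # Example built-in engine: parses the CSV text passed in options["text"].
    yield from csv.DictReader(io.StringIO(options["text"]))


def sqlite_engine(options: dict) -> Iterator[dict]:
    # Example built-in engine: runs options["query"] against the SQLite database options["path"].
    conn = sqlite3.connect(options["path"])
    conn.row_factory = sqlite3.Row
    try:
        for row in conn.execute(options["query"]):
            yield dict(row)
    finally:
        conn.close()


class DataEngineLibrary:
    """Registry mapping a data source type to the engine (driver) that reads it."""

    def __init__(self) -> None:
        self._engines: dict[str, Engine] = {"csv": csv_engine, "sqlite": sqlite_engine}

    def register(self, source_type: str, engine: Engine) -> None:
        # New engines can be added at run time but may not silently replace an existing one.
        if source_type in self._engines:
            raise ValueError(f"an engine for {source_type!r} is already registered")
        self._engines[source_type] = engine

    def open(self, source_type: str, options: dict) -> Iterator[dict]:
        if source_type not in self._engines:
            raise LookupError(f"no data engine for source type {source_type!r}")
        return self._engines[source_type](options)

    def source_types(self) -> list[str]:
        return sorted(self._engines)
```

Keeping the registry separate from the engines lets the management center list, add, or reject engines without knowing how any one of them connects to its source.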
4. Setup of the ETL business logic.
The ETL business logic handles ETL processing and completes the whole ETL process: extraction (E logic), transformation (T logic), and loading (L logic). It also handles data backflow, system analysis, data quality management, and related tasks.
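The division into E, T, and L logic can be sketched as three chained generators that collect data quality counters along the way. This is an illustrative Python sketch only. The record fields (`name`, `amount`), the cleansing rules, and the `QualityReport` class are assumptions, and a plain list stands in for the target big data system.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class QualityReport:
    """Data quality counters collected while the pipeline runs (illustrative)."""
    read: int = 0
    loaded: int = 0
    rejected: dict[str, int] = field(default_factory=dict)  # reason -> count

    def reject(self, reason: str) -> None:
        self.rejected[reason] = self.rejected.get(reason, 0) + 1


def extract(source: Iterable[dict], report: QualityReport) -> Iterator[dict]:
    # E: pull raw records from a source; any iterable of dicts stands in for a data engine.
    for record in source:
        report.read += 1
        yield record


def transform(records: Iterable[dict], report: QualityReport) -> Iterator[dict]:
    # T: cleanse (trim text, drop incomplete rows) and convert types.
    for record in records:
        name = str(record.get("name") or "").strip()
        if not name:
            report.reject("missing name")
            continue
        try:
            amount = float(record["amount"])
        except (KeyError, TypeError, ValueError):
            report.reject("bad amount")
            continue
        yield {"name": name, "amount": amount}


def load(records: Iterable[dict], target: list, report: QualityReport) -> None:
    # L: append to the target; a list stands in for the big data system.
    for record in records:
        target.append(record)
        report.loaded += 1
```

Because the stages are generators, each record flows through E, T, and L one at a time. The whole data set never has to sit in memory between stages, which matches the in-memory, streaming character described for the system.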
The E logic handles data extraction: the all-in-one machine can perform full or incremental extraction of offline data and real-time extraction of streaming data. In addition, the schema of the source data can be extracted, stored in an external metadata repository, and edited.
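Full and incremental extraction of offline data, together with schema extraction, can be sketched against a relational source. The example below uses Python's built-in `sqlite3` module as a stand-in source. It assumes each table has a monotonically increasing change column (here `updated_at`) that serves as the incremental watermark. The function names are hypothetical, and real-time extraction of streaming data is not covered.

```python
import sqlite3
from typing import Any, Optional


def extract_schema(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Read a source table's schema (column name, declared type) for the metadata repository."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def extract_table(conn: sqlite3.Connection, table: str, ts_column: str,
                  watermark: Optional[Any] = None) -> tuple[list[dict], Optional[Any]]:
    """Full extraction when watermark is None, otherwise incremental extraction.

    Returns (rows, new_watermark). Table and column names are interpolated into SQL,
    so they are assumed to come from trusted configuration, not user input.
    """
    if watermark is None:
        sql, params = f"SELECT * FROM {table} ORDER BY {ts_column}", ()
    else:
        sql = f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}"
        params = (watermark,)
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, values)) for values in cur.fetchall()]
    # The newest change timestamp seen becomes the starting point for the next run.
    new_watermark = rows[-1][ts_column] if rows else watermark
    return rows, new_watermark
```

The strict `>` comparison assumes that no row can later appear with a timestamp equal to the current watermark. Sources where that can happen need a different cursor, such as a change log or a (timestamp, id) pair.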
The ETL business logic is distributed across the slave nodes, and the distributed ETL management center starts and manages it uniformly.
5. Setup of ETL task management.
ETL task management uses visual ETL task design, and the resulting design metadata is stored in the task metadata repository. To support development by multiple people, it provides collaborative development and uses version control to manage version iterations.
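A versioned table in the task metadata repository could support collaborative development with version control. The sketch below is one possible design, assumed rather than taken from the description. Each save creates a new numbered version. A save based on an outdated version is rejected (optimistic concurrency), so two developers cannot silently overwrite each other's work. The sketch uses SQLite from Python's standard library as a stand-in for the traditional database mentioned for business data, and the table and class names are hypothetical.

```python
import json
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS task_version (
    task_name TEXT    NOT NULL,
    version   INTEGER NOT NULL,
    author    TEXT    NOT NULL,
    design    TEXT    NOT NULL,  -- JSON from the visual designer (format assumed)
    PRIMARY KEY (task_name, version)
)
"""


class StaleVersionError(Exception):
    """Raised when a developer commits on top of an outdated version."""


class TaskMetadataStore:
    """Hypothetical task metadata repository with a linear version history per task."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)

    def latest_version(self, task_name: str) -> int:
        (version,) = self.conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM task_version WHERE task_name = ?",
            (task_name,),
        ).fetchone()
        return version

    def commit(self, task_name: str, author: str, design: dict, base_version: int) -> int:
        """Store a new version built on base_version; refuse if someone committed meanwhile."""
        with self.conn:  # commits on success, rolls back on error
            latest = self.latest_version(task_name)
            if latest != base_version:
                raise StaleVersionError(
                    f"{task_name}: based on v{base_version}, latest is v{latest}")
            # The primary key also rejects a duplicate version if two commits race.
            self.conn.execute(
                "INSERT INTO task_version VALUES (?, ?, ?, ?)",
                (task_name, latest + 1, author, json.dumps(design, sort_keys=True)),
            )
        return latest + 1

    def load(self, task_name: str, version: Optional[int] = None) -> dict:
        if version is None:
            version = self.latest_version(task_name)
        row = self.conn.execute(
            "SELECT design FROM task_version WHERE task_name = ? AND version = ?",
            (task_name, version),
        ).fetchone()
        if row is None:
            raise KeyError(f"{task_name} v{version}")
        return json.loads(row[0])
```

Old versions are never overwritten, so any earlier design can be reloaded, compared, or restored by version number.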
6. Setup of data storage.
Data storage is central to the ETL system. The data storage of this all-in-one machine system is divided into two parts: business data and user data.
1) Business data: mainly the data the system itself produces while running, including system configuration data, task metadata, transformation functions, and data engines. Schema data extracted from external data sources is also stored in the business database.
Business data can be stored in a traditional database, which makes it easy to use the powerful management functions of traditional databases.
2) User data: the data that ETL operates on is the user data. The ETL system is a center for data flow and processing rather than a storage center, so it mainly uses memory as the storage medium. With a distributed in-memory storage system, it can achieve memory-level data read speeds.
However, memory capacity is small and costly, and memory cannot fully replace hard disks. The system therefore also uses high-speed, high-capacity hard disks. They serve as a buffer pool when large volumes of offline data are processed, and as an external cache when memory is insufficient.
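The memory-first, disk-as-overflow policy can be sketched as a small cache. In this hypothetical Python version, values are pickled and kept in memory up to a byte budget. When the budget is exceeded, the least-recently-used entries are written to a spill directory. Reads check memory first and then disk. The class name, the byte-budget accounting, and the LRU choice are assumptions, because the description does not say which data moves to disk first.

```python
import os
import pickle
import tempfile
from collections import OrderedDict
from typing import Any


class SpillingStore:
    """Memory-first cache that spills least-recently-used entries to disk.

    memory_budget counts bytes of pickled data; keys are assumed to be safe file names.
    """

    def __init__(self, memory_budget: int, spill_dir: str) -> None:
        self.memory_budget = memory_budget
        self.spill_dir = spill_dir
        self._mem: "OrderedDict[str, bytes]" = OrderedDict()  # oldest entry first
        self._mem_bytes = 0

    def _path(self, key: str) -> str:
        return os.path.join(self.spill_dir, key + ".pkl")

    def put(self, key: str, value: Any) -> None:
        self._discard(key)
        data = pickle.dumps(value)
        self._mem[key] = data
        self._mem_bytes += len(data)
        # Memory is over budget: move the oldest entries to the disk buffer pool.
        while self._mem_bytes > self.memory_budget and self._mem:
            old_key, old_data = self._mem.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                f.write(old_data)
            self._mem_bytes -= len(old_data)

    def get(self, key: str) -> Any:
        if key in self._mem:
            self._mem.move_to_end(key)  # mark as recently used
            return pickle.loads(self._mem[key])
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None

    def in_memory(self, key: str) -> bool:
        return key in self._mem

    def _discard(self, key: str) -> None:
        if key in self._mem:
            self._mem_bytes -= len(self._mem.pop(key))
        if os.path.exists(self._path(key)):
            os.remove(self._path(key))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        store = SpillingStore(memory_budget=200, spill_dir=spill_dir)
        store.put("a", b"x" * 100)
        store.put("b", b"y" * 100)  # the two entries exceed 200 bytes, so "a" is spilled
        print(store.in_memory("a"), store.in_memory("b"))  # False True
```

Reading a spilled entry does not move it back into memory. A real system might promote hot entries again, but this sketch keeps the policy minimal.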
7. Setup of Client-Server data transmission.
In general, the system can extract data by connecting directly to the data source through a data engine. However, some data sources cannot be connected to directly, and for others it is easier and more efficient to export the data at the source. In these cases a Client can be installed at the data source. The Client collects the data and sends it to the Server side of the all-in-one machine system using its fast data transfer mode. The Server runs on one node of the all-in-one machine cluster. It receives the data and distributes it to the other nodes for the next stage of ETL processing.
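The Client-Server path can be sketched as a framed byte stream, with the Server fanning records out round-robin. The description does not specify the "fast transfer mode". The wire format here (a 4-byte big-endian length prefix followed by a JSON record), the round-robin distribution, and all function names are assumptions made for illustration.

```python
import itertools
import json
import socket
import struct
from typing import Iterable, Optional

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix (assumed wire format)


def send_records(sock: socket.socket, records: Iterable[dict]) -> None:
    """Client side: stream records to the Server, then half-close to mark the end."""
    for record in records:
        body = json.dumps(record).encode("utf-8")
        sock.sendall(HEADER.pack(len(body)) + body)
    sock.shutdown(socket.SHUT_WR)


def _recv_exact(sock: socket.socket, n: int) -> Optional[bytes]:
    """Read exactly n bytes; return None on a clean end of stream before any byte arrives."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            if not buf:
                return None
            raise ConnectionError("stream ended in the middle of a frame")
        buf += chunk
    return bytes(buf)


def receive_and_distribute(sock: socket.socket, nodes: list[str]) -> dict[str, list[dict]]:
    """Server side: decode framed records and hand them to worker nodes round-robin."""
    queues: dict[str, list[dict]] = {node: [] for node in nodes}
    targets = itertools.cycle(nodes)
    while (header := _recv_exact(sock, HEADER.size)) is not None:
        (length,) = HEADER.unpack(header)
        body = _recv_exact(sock, length)
        if body is None:
            raise ConnectionError("stream ended before the record body")
        queues[next(targets)].append(json.loads(body))
    return queues
```

For a local test, `socket.socketpair()` gives a connected pair of sockets without opening a port. In a deployment, the Client would connect to the Server's listening address instead, as the description states.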
8. Cluster configuration management and log management.
A user-friendly Web UI is provided for cluster configuration management and log management.
Cluster configuration management mainly covers configuration import and export, user management, high-availability (HA) configuration, and similar functions.
The log management module uses the Syslog protocol and handles log generation, log ingestion into storage, log statistical analysis, and similar functions.
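A minimal sketch of Syslog-style log generation and one simple statistic follows. The description names only the Syslog protocol. This example assumes the RFC 5424 message layout, the `local0` facility, and hypothetical host and application names. It covers only formatting and counting, not transport or ingestion.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

# RFC 5424 severity keywords, indexed by their numeric code 0..7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
LOCAL0 = 16  # facility code for local0, a common choice for application logs (assumed here)


def format_syslog(severity: str, host: str, app: str, msg: str,
                  when: datetime, facility: int = LOCAL0) -> str:
    """Build an RFC 5424 line: <PRI>1 TIMESTAMP HOST APP PROCID MSGID SD MSG."""
    pri = facility * 8 + SEVERITIES.index(severity)
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # PROCID, MSGID and STRUCTURED-DATA are left as the NILVALUE "-".
    return f"<{pri}>1 {ts} {host} {app} - - - {msg}"


_LINE = re.compile(r"^<(\d{1,3})>1 \S+ (\S+) ")


def count_by_host_and_severity(lines: Iterable[str]) -> Counter:
    """Simple log statistic: number of messages per (host, severity)."""
    counts: Counter = Counter()
    for line in lines:
        match = _LINE.match(line)
        if match:
            severity = SEVERITIES[int(match.group(1)) % 8]  # PRI mod 8 = severity code
            counts[(match.group(2), severity)] += 1
    return counts
```

For instance, an `err` message from `node1` in the `local0` facility gets the priority value 16 × 8 + 3 = 131. The line therefore begins with `<131>1`.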
The above embodiments are only specific examples of the present invention. The scope of patent protection of the present invention includes, but is not limited to, these embodiments. A person of ordinary skill in the art may make appropriate changes or substitutions consistent with the claims of the distributed ETL all-in-one machine system of the present invention. All such changes and substitutions shall fall within the scope of patent protection of the present invention.

Claims (6)

1. A distributed ETL all-in-one machine system, characterized in that it is implemented as follows:
A distributed ETL all-in-one hardware system is provided. It comprises a server cluster built from multiple servers suited to transmitting, storing, and processing big data, which forms a high-performance ETL processing hardware platform. The cluster uses a master-slave architecture: the whole cluster consists of one master node and several slave nodes;
A cluster intelligent management engine is provided as the interface between the hardware layer and the ETL business system, and it supplies all supporting services for ETL business. It also acts as the elastic manager of the hardware cluster. It manages the cluster's memory, hard disk, and network hardware resources uniformly, and it is responsible for node expansion, dual-machine hot backup, standby master node selection, and cluster monitoring;
A distributed ETL management center is deployed on the master node. Executed by the master node, it performs ETL task coordination, load balancing, data engine management, and task management, and it works with the cluster intelligent management engine to synchronize related data;
ETL business logic is provided: each node receives tasks distributed by the distributed ETL management center, and the nodes cooperate to complete ETL processing of those tasks. This processing covers the ETL system functions of data extraction, data cleansing, transformation, data loading, data backflow, system analysis, and quality management;
ETL task management is provided, offering graphical task design: ETL tasks are designed visually, and the resulting design metadata is stored in a task metadata repository;
A data engine is provided. It manages the connection drivers for all kinds of data sources and provides a unified database storage interface for the ETL system's own metadata. It also performs unified management of distributed data storage;
Data storage is provided, offering business data storage and user data caching; it uses distributed in-memory storage together with high-speed hard-disk storage;
Master-slave Client-Server data transmission is provided: a Client obtains source data at the data source and then connects to the Server port of the distributed ETL system, where the data is aggregated and collected;
A configuration manager is provided, offering an interactive Web UI for unified configuration management and user management of the cluster;
A log module is provided, which imports the various logs generated by the cluster for unified management and provides statistical analysis of the logs.
2. The distributed ETL all-in-one machine system according to claim 1, characterized in that one slave node of the cluster of the hardware system is selected as the standby master node. The standby master node promptly synchronizes all management and configuration information of the master node as a hot standby. If the master node fails and leaves the cluster, the standby master node switches to the master role and takes over the master's task of managing ETL for the whole cluster. At the same time, a new standby master node is selected from the remaining slave nodes;
Each node of the cluster is configured with large-capacity memory, so that ETL business processing and data storage take place directly in memory. Each node is also equipped with a large-capacity high-speed hard disk as a data cache pool to accommodate very large data volumes. The cluster is connected internally with 10 Gb/s or faster links to guarantee internal data exchange speed. Each node is also configured with multiple network links, managed uniformly by the master node, so that the cluster can extract data from the same data source in parallel;
The cluster intelligent management engine provides the following services to the upper-layer ETL nodes: resource scheduling policies; distributed communication and coordination interfaces; and cluster resource monitoring.
3. The distributed ETL all-in-one machine system according to claim 1, characterized in that the distributed ETL management center is executed by the master node, and its functions and implementation include:
ETL task scheduling: tasks submitted by each user are scheduled and executed uniformly according to a predetermined policy;
ETL task dispatching: a task to be executed is decomposed and its data partitioned according to data distribution characteristics and the status of each node, and the parts are then distributed to the nodes for execution;
ETL task monitoring: task execution is monitored, and task information is aggregated and reported to users;
Load balancing: task assignment is adjusted dynamically according to task monitoring information and the current resource state of each node in the cluster;
Error handling and fault recovery: when a task fails or a node fails, tasks are reassigned;
Data engine management: engines for various data sources are built in, and new data engines can be added. The engines are stored in a data engine library, and the management center manages all data source driver engines uniformly;
Historical task recording and management.
4. The distributed ETL all-in-one machine system according to claim 1, characterized in that the ETL business logic handles ETL processing and completes the whole ETL process: extraction (E logic), transformation (T logic), and loading (L logic). The ETL business logic also handles data backflow, system analysis, and data quality management. The E logic handles data extraction, and the system performs full or incremental extraction of offline data and real-time extraction of streaming data.
5. The distributed ETL all-in-one machine system according to claim 1, characterized in that the ETL task management uses visual ETL task design, and the resulting design metadata is stored in the task metadata repository. To support development by multiple people, collaborative development is provided, and version control is used to manage version iterations.
6. The distributed ETL all-in-one machine system according to claim 1, characterized in that data storage is central to the whole ETL system, and its content is divided into two parts: a business database and a user database. The business database stores data the system itself produces while running, including system configuration data, task metadata, transformation functions, and data engines; schema data extracted from external data sources is also stored in it. The user database stores the data that ETL operates on, and it uses memory as its storage medium by means of a distributed in-memory storage system. When large volumes of offline data are processed, high-speed hard disks serve as a buffer pool, and when memory is insufficient the hard disks act as an external cache.
Priority Applications (1)

Application Number: CN201410774178.1A; Priority Date: 2014-12-16; Filing Date: 2014-12-16; Title: Distributed ETL all-in-one machine system; Status: Pending

Publications (1)

Publication Number: CN104391989A; Publication Date: 2015-03-04

Family

Family ID: 52609893
Country Status: CN (1), CN104391989A

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104717294A (en) * 2015-03-23 2015-06-17 浪潮集团有限公司 Data extracting method, main server and cluster
CN105069170A (en) * 2015-08-31 2015-11-18 中国科学院遥感与数字地球研究所 Mass spacial information storage and service integrated machine system
CN105227683A (en) * 2015-11-11 2016-01-06 中国建设银行股份有限公司 A kind of LDAP company-data synchronous method and system
CN105447097A (en) * 2015-11-10 2016-03-30 北京北信源软件股份有限公司 Data acquisition method and system
CN105468735A (en) * 2015-11-23 2016-04-06 武汉虹旭信息技术有限责任公司 Stream preprocessing system and method based on mass information of mobile internet
CN106446271A (en) * 2016-10-20 2017-02-22 焦点科技股份有限公司 Design method of BI data interface
CN106599244A (en) * 2016-12-20 2017-04-26 飞狐信息技术(天津)有限公司 Universal original log cleaning device and method
CN106780149A (en) * 2016-12-30 2017-05-31 中核核电运行管理有限公司 A kind of equipment real-time monitoring system based on timed task scheduling
CN106921755A (en) * 2017-05-15 2017-07-04 浪潮软件股份有限公司 A kind of enterprise data integration cloud console, realization method and system
CN107204892A (en) * 2017-04-12 2017-09-26 北京国电通网络技术有限公司 Power telecom network service data processing method and processing device
CN107332926A (en) * 2017-07-28 2017-11-07 郑州云海信息技术有限公司 A kind of application server cluster starts method and device
CN107463610A (en) * 2017-06-27 2017-12-12 北京小度信息科技有限公司 A kind of data storage method and device
CN107463664A (en) * 2017-08-01 2017-12-12 山东浪潮云服务信息科技有限公司 A kind of ETL processing method and processing devices based on government data collection
CN107733986A (en) * 2017-09-15 2018-02-23 中国南方电网有限责任公司 Support the protection of integrated deployment and monitoring operation big data support platform
WO2018036332A1 (en) * 2016-08-22 2018-03-01 中兴通讯股份有限公司 Distributed data etl processing method and apparatus
CN108090844A (en) * 2017-11-20 2018-05-29 广东电网有限责任公司电力调度控制中心 A kind of power distribution network scheduling information reflow method and system
CN108306916A (en) * 2017-01-13 2018-07-20 江苏云创融合信息科技有限公司 Big data multi-internet integration scientific research all-in-one machine stage apparatus
CN108595480A (en) * 2018-03-13 2018-09-28 广州市优普科技有限公司 A kind of big data ETL tool systems and application process based on cloud computing
CN108921728A * 2018-07-03 2018-11-30 北京科东电力控制系统有限责任公司 Distributed real-time database system based on power network dispatching system
CN109117285A (en) * 2018-07-27 2019-01-01 高新兴科技集团股份有限公司 Support the distributed memory computing cluster system of high concurrent
CN109614448A (en) * 2018-11-09 2019-04-12 南京软智信息技术有限公司 A kind of data warehouse management system
CN109669975A (en) * 2018-11-09 2019-04-23 成都数之联科技有限公司 A kind of industry big data processing system and method
CN110941657A (en) * 2019-11-08 2020-03-31 支付宝(杭州)信息技术有限公司 Service data processing method and device
CN111061715A (en) * 2019-12-16 2020-04-24 北京邮电大学 Web and Kafka-based distributed data integration system and method
CN111581254A (en) * 2020-05-03 2020-08-25 上海维信荟智金融科技有限公司 ETL method and system based on internet financial data
CN112307396A (en) * 2020-10-21 2021-02-02 五凌电力有限公司 Platform architecture based on multi-engine data modeling calculation analysis and processing method thereof
CN113282649A (en) * 2020-02-19 2021-08-20 北京国双科技有限公司 Distributed task processing method and device and computer equipment
CN113407633A (en) * 2018-09-13 2021-09-17 华东交通大学 Distributed data source heterogeneous synchronization method
CN114363357A (en) * 2021-12-28 2022-04-15 山东浪潮科学研究院有限公司 Distributed database network connection management method based on Gossip
CN115357657A (en) * 2022-10-24 2022-11-18 成都数联云算科技有限公司 Data processing method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8417762B2 (en) * 2007-04-10 2013-04-09 International Business Machines Corporation Mechanism for execution of multi-site jobs in a data stream processing system
CN103383750A (en) * 2012-05-04 2013-11-06 山西省电力公司阳泉供电公司 Power grid summarized information organic integrated platform
CN103944964A (en) * 2014-03-27 2014-07-23 上海云数信息科技有限公司 Distributed system and method carrying out expansion step by step through same
CN104035522A (en) * 2014-06-16 2014-09-10 南京云创存储科技有限公司 Large database appliance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Song Jie (宋杰) et al.: "Research on MapReduce-Based Distributed ETL Architecture" (基于MapReduce的分布式ETL体系结构研究), Computer Science (《计算机科学》) *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104717294A (en) * 2015-03-23 2015-06-17 浪潮集团有限公司 Data extracting method, main server and cluster
CN105069170A (en) * 2015-08-31 2015-11-18 中国科学院遥感与数字地球研究所 Mass spacial information storage and service integrated machine system
CN105069170B (en) * 2015-08-31 2019-02-15 中国科学院遥感与数字地球研究所 A kind of storage of mass spatial information and service integrated machine system
CN105447097A (en) * 2015-11-10 2016-03-30 北京北信源软件股份有限公司 Data acquisition method and system
CN105227683A (en) * 2015-11-11 2016-01-06 中国建设银行股份有限公司 A kind of LDAP company-data synchronous method and system
CN105468735A (en) * 2015-11-23 2016-04-06 武汉虹旭信息技术有限责任公司 Stream preprocessing system and method based on mass information of mobile internet
WO2018036332A1 (en) * 2016-08-22 2018-03-01 中兴通讯股份有限公司 Distributed data etl processing method and apparatus
CN107766387A (en) * 2016-08-22 2018-03-06 南京中兴软件有限责任公司 A kind of distributed data ETL processing method and processing devices
CN106446271B (en) * 2016-10-20 2019-08-27 焦点科技股份有限公司 A kind of BI data file interface design method
CN106446271A (en) * 2016-10-20 2017-02-22 焦点科技股份有限公司 Design method of BI data interface
CN106599244B (en) * 2016-12-20 2024-01-05 飞狐信息技术(天津)有限公司 General original log cleaning device and method
CN106599244A (en) * 2016-12-20 2017-04-26 飞狐信息技术(天津)有限公司 Universal original log cleaning device and method
CN106780149A (en) * 2016-12-30 2017-05-31 中核核电运行管理有限公司 A kind of equipment real-time monitoring system based on timed task scheduling
CN108306916A (en) * 2017-01-13 2018-07-20 江苏云创融合信息科技有限公司 Big data multi-internet integration scientific research all-in-one machine stage apparatus
CN107204892B (en) * 2017-04-12 2020-07-21 北京国电通网络技术有限公司 Power communication network operation data processing method and device
CN107204892A (en) * 2017-04-12 2017-09-26 北京国电通网络技术有限公司 Power telecom network service data processing method and processing device
CN106921755B (en) * 2017-05-15 2020-04-28 浪潮软件股份有限公司 Enterprise data integration cloud console, implementation method and system
CN106921755A (en) * 2017-05-15 2017-07-04 浪潮软件股份有限公司 A kind of enterprise data integration cloud console, realization method and system
CN107463610B (en) * 2017-06-27 2021-01-26 北京星选科技有限公司 Data warehousing method and device
CN107463610A (en) * 2017-06-27 2017-12-12 北京小度信息科技有限公司 A kind of data storage method and device
CN107332926A (en) * 2017-07-28 2017-11-07 郑州云海信息技术有限公司 A kind of application server cluster starts method and device
CN107463664A (en) * 2017-08-01 2017-12-12 山东浪潮云服务信息科技有限公司 A kind of ETL processing method and processing devices based on government data collection
CN107733986B (en) * 2017-09-15 2021-01-26 中国南方电网有限责任公司 Protection operation big data supporting platform supporting integrated deployment and monitoring
CN107733986A (en) * 2017-09-15 2018-02-23 中国南方电网有限责任公司 Support the protection of integrated deployment and monitoring operation big data support platform
CN108090844A (en) * 2017-11-20 2018-05-29 广东电网有限责任公司电力调度控制中心 A kind of power distribution network scheduling information reflow method and system
CN108595480B (en) * 2018-03-13 2022-01-21 广州市优普科技有限公司 Big data ETL tool system based on cloud computing and application method
CN108595480A (en) * 2018-03-13 2018-09-28 广州市优普科技有限公司 A kind of big data ETL tool systems and application process based on cloud computing
CN108921728A * 2018-07-03 2018-11-30 北京科东电力控制系统有限责任公司 Distributed real-time database system based on power network dispatching system
CN109117285A (en) * 2018-07-27 2019-01-01 高新兴科技集团股份有限公司 Support the distributed memory computing cluster system of high concurrent
CN113407633A (en) * 2018-09-13 2021-09-17 华东交通大学 Distributed data source heterogeneous synchronization method
CN109669975B (en) * 2018-11-09 2020-12-18 成都数之联科技有限公司 Industrial big data processing system and method
CN109614448A (en) * 2018-11-09 2019-04-12 南京软智信息技术有限公司 A kind of data warehouse management system
CN109669975A (en) * 2018-11-09 2019-04-23 成都数之联科技有限公司 A kind of industry big data processing system and method
CN110941657A (en) * 2019-11-08 2020-03-31 支付宝(杭州)信息技术有限公司 Service data processing method and device
CN110941657B (en) * 2019-11-08 2023-03-31 支付宝(杭州)信息技术有限公司 Service data processing method and device
CN111061715B (en) * 2019-12-16 2022-07-01 北京邮电大学 Web and Kafka-based distributed data integration system and method
CN111061715A (en) * 2019-12-16 2020-04-24 北京邮电大学 Web and Kafka-based distributed data integration system and method
CN113282649A (en) * 2020-02-19 2021-08-20 北京国双科技有限公司 Distributed task processing method and device and computer equipment
CN111581254A (en) * 2020-05-03 2020-08-25 上海维信荟智金融科技有限公司 ETL method and system based on internet financial data
CN112307396A (en) * 2020-10-21 2021-02-02 五凌电力有限公司 Platform architecture based on multi-engine data modeling calculation analysis and processing method thereof
CN114363357A (en) * 2021-12-28 2022-04-15 山东浪潮科学研究院有限公司 Distributed database network connection management method based on Gossip
CN114363357B (en) * 2021-12-28 2024-01-19 上海沄熹科技有限公司 Distributed database network connection management method based on Gossip
CN115357657A (en) * 2022-10-24 2022-11-18 成都数联云算科技有限公司 Data processing method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104391989A (en) Distributed ETL all-in-one machine system
CN106202346B (en) A kind of data load cleaning engine, scheduling and storage system
CN105005570B (en) Magnanimity intelligent power data digging method and device based on cloud computing
CN109656911A (en) Distributed variable-frequencypump Database Systems and its data processing method
CN104050042B (en) The resource allocation methods and device of ETL operations
CN109379420B (en) Comprehensive energy service platform system based on distributed architecture
CN102609446B (en) Distributed Bloom filter system and application method thereof
US20210004712A1 (en) Machine Learning Performance and Workload Management
CN103581332B (en) HDFS framework and pressure decomposition method for NameNodes in HDFS framework
CN106528341B (en) Automation disaster tolerance system based on Greenplum database
Ali et al. Recent trends in distributed online stream processing platform for big data: Survey
CN105554123A (en) High-capacity-aware cloud computing platform system
CN102929769A (en) Virtual machine internal-data acquisition method based on agency service
CN114328688A (en) Management and control platform for electric power energy big data
CN104281980B (en) Thermal power generation unit remote diagnosis method and system based on Distributed Calculation
CN110083306A (en) A kind of distributed objects storage system and storage method
CN103036952B (en) A kind of enterprise-level isomery merges storage management system
Perera et al. Database scaling on Kubernetes
Li et al. Hadoop-Based University Ideological and Political Big Data Platform Design and Behavior Pattern Mining
CN107239369A (en) A kind of database filing standby system and method
Wang et al. Power grid data monitoring and analysis system based on edge computing
CN105677853A (en) Data storage method and device based on big data technology framework
Jijun et al. Research on Multi-layer Power Enterprise Data Management Architecture Based on Big Data
Nie Application of virtualization platform in informationization of higher vocational colleges
Tan Application of MongoDB technology in NoSQL database in video intelligent big data analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150304
