CN113111106A - ETL design data access method and data access module based on Web - Google Patents
ETL design data access method and data access module based on Web
- Publication number
- CN113111106A (application number CN202110367312.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- web
- data access
- component
- etl
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Abstract
The invention discloses a Web-based ETL design data access method and a data access module. The method comprises the following steps: defining each flow through a Flow class; representing each process link of each flow with a different FlowLink class to form a component, where each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component; and combining and connecting different components on a visual interface to form a chained flow, so that more complex data access processes can be configured. The parameter information is stored as a JSON string; the component to which a FlowLink belongs is the component that performs the data processing for that FlowLink class; and the ID of the next component is used to chain the FlowLink objects into a complete Flow. The invention can configure and handle more complex data access processes, suits a variety of business scenarios, and supports batch reading and complex processing of big data, which improves data processing performance and ensures, as far as possible, that data is accessed completely and accurately.
Description
Technical Field
The invention relates to the field of big data access, and in particular to a Web-based ETL design data access method.
Background
Data access service (DIS for short) is an essential link in putting a big data platform into practice. Faced with data of many sources and many types, scattered data must be brought together through data access and incorporated into a unified big data platform. By data type, data access mainly covers structured data (databases), log data, IoT data, and files. Data access has to serve many business scenarios, and the schema types of the data sources are not known in advance; in addition, the data volume may fluctuate during access, so the stability of data access can affect the performance of the system.
ETL (Extract, Transform, Load), a data warehouse technology, is the process of extracting, transforming, and loading data, and is an important part of building a data warehouse: the user extracts the required data from a data source, cleans and transforms it, and then loads it into the data warehouse according to a predefined data warehouse model.
An existing scheme first responds, through a Web Service interface, to an access request sent by a data access interface, and acquires the target monitoring data corresponding to the request from the monitoring data provided by monitoring equipment. It then sends the target monitoring data in batches by calling the Web Service interface and writes each batch into a real-time/historical database. In this way the monitoring equipment of each manufacturer can write its monitoring data directly into the real-time/historical database, and using the Web Service interface during access avoids calling the low-level API of the real-time/historical database directly. However, this scheme is inefficient when accessing complex data, which degrades the performance of the whole system.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a Web-based ETL design data access method and a data access module.
The purpose of the invention is realized by the following technical scheme:
A Web-based ETL design data access method comprises the following steps:
defining each flow through a Flow class;
representing each process link of each flow with a different FlowLink class to form a component, where each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component;
combining and connecting different components on a visual interface to form a chained flow, so that more complex data access processes can be configured.
Further, the parameter information is stored as a JSON string.
Further, the component to which a FlowLink belongs is the component that performs the data processing for that FlowLink class.
Further, the ID of the next component is used to chain the FlowLink objects into a complete Flow.
Furthermore, each flow comprises one input source node, N data conversion nodes and one output source node; the input source node reads data; the data conversion nodes process the data content; and the output source node stores the data.
A data access module using the Web-based ETL design data access method comprises a Web-ETL designer and a Web-ETL executor. The Web-ETL designer provides the user with a visual interface on which components are dragged and combined to configure the intermediate steps of data processing for a given scenario, forming a complete, complex data access flow. The Web-ETL executor creates a data access task from the configuration produced by the Web-ETL designer, and reads, cleans, converts, filters, screens and stores the data to be accessed.
The invention has the following beneficial effects: it can configure and handle more complex data access processes, suits a variety of business scenarios, and supports batch reading and complex processing of big data, which improves data processing performance and ensures, as far as possible, that data is accessed completely and accurately.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
In this embodiment, as shown in FIG. 1, a Web-based ETL design data access method defines each flow through a Flow class;
each process link of each flow is represented by a different FlowLink class to form a component, and each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component;
different components are combined and connected on a visual interface to form a chained flow, so that more complex data access processes can be configured.
The parameter information is stored as a JSON string.
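As a minimal sketch of what storing parameters as a JSON string could look like, the snippet below serializes a parameter dictionary and restores it. The parameter names (`connection`, `table`, `batch_size`) and the connection string are hypothetical and are not taken from the patent.

```python
import json

# Hypothetical parameters for a table-input component; the field names and
# the connection string are illustrative assumptions only.
table_input_params = {
    "connection": "jdbc:mysql://db.example.com:3306/sales",
    "table": "orders",
    "batch_size": 1000,
}

# Serialize to a JSON string so it can be stored with the FlowLink record.
params_json = json.dumps(table_input_params, sort_keys=True)

# Parse the string back into a dictionary when the component is created.
restored_params = json.loads(params_json)
```

Because the parameters travel as one opaque string, the FlowLink record itself does not need a different shape for each component type.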
The component to which a FlowLink belongs is the component that performs the data processing for that FlowLink class, and different components are handled by different processing classes. For example, a table input component connects to a database and reads data from a specified data table; a file input component connects to a file server, then fetches and parses files; and so on.
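One way to map component IDs to processing classes is a simple registry. The sketch below is illustrative only: the class names, the registry keys and the `describe` method are assumptions, and the components do not open real connections.

```python
import json
from typing import Any, Dict, Type


class Component:
    """Hypothetical base class: a component receives its parameters as a dict."""

    def __init__(self, params: Dict[str, Any]) -> None:
        self.params = params

    def describe(self) -> str:
        raise NotImplementedError


class TableInput(Component):
    """Stands in for a component that reads from a database table."""

    def describe(self) -> str:
        return f"read rows from table {self.params['table']}"


class FileInput(Component):
    """Stands in for a component that fetches and parses a file."""

    def describe(self) -> str:
        return f"fetch and parse file {self.params['path']}"


# Assumed registry: component ID -> processing class.
REGISTRY: Dict[str, Type[Component]] = {
    "table_input": TableInput,
    "file_input": FileInput,
}


def create_component(component_id: str, params_json: str) -> Component:
    """Look up the processing class and build it from JSON parameters."""
    try:
        component_class = REGISTRY[component_id]
    except KeyError:
        raise ValueError(f"unknown component ID: {component_id!r}") from None
    return component_class(json.loads(params_json))
```

Adding a new component type then only requires a new class and one registry entry.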
The ID of the next component is used to chain the FlowLink objects into a complete Flow.
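The Flow and FlowLink records described above might be modeled as follows. This is a sketch under stated assumptions: the field names, the separate `link_id`, the `first_link_id` entry point and the cycle check are not specified in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowLink:
    link_id: str            # assumed: identifier of this link in the flow
    component_id: str       # ID of the component that processes the data
    params_json: str        # component parameters stored as a JSON string
    next_id: Optional[str]  # ID of the next link; None marks the end


@dataclass
class Flow:
    flow_id: str
    first_link_id: str      # assumed: where the chain starts
    links: Dict[str, FlowLink] = field(default_factory=dict)

    def add(self, link: FlowLink) -> None:
        self.links[link.link_id] = link

    def ordered_links(self) -> List[FlowLink]:
        """Follow next_id from the first link to recover the chain order."""
        ordered: List[FlowLink] = []
        seen = set()
        current: Optional[str] = self.first_link_id
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected at link {current!r}")
            seen.add(current)
            link = self.links[current]
            ordered.append(link)
            current = link.next_id
        return ordered
```

Storing only the next ID keeps each link independent of the rest of the flow, so reordering the chain means updating a few references rather than rewriting the links.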
Each process comprises an input source node, N data conversion nodes and an output source node.
The input source node reads data. When reading, the component class fetches data in batches according to the amount of data in the target table; after a batch has been read it is passed to the next component node and reading of the next batch begins. This avoids the memory overflow that can occur when a large volume of data is fetched at once, and improves data processing performance.
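Batched reading can be sketched with a generator over a database cursor. The example uses Python's built-in `sqlite3` module purely for illustration; the patent does not name a database, and the function name and fixed `batch_size` argument are assumptions.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_batches(
    conn: sqlite3.Connection, table: str, batch_size: int
) -> Iterator[List[Tuple]]:
    """Yield rows from `table` one batch at a time.

    The table name is interpolated into the SQL text, so it must come from
    trusted configuration rather than from user input.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        # The consumer (the next node) handles this batch before the
        # generator resumes and fetches the next one.
        yield batch
```

Only one batch is held in memory at a time, which is the property the paragraph above relies on to avoid memory overflow.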
The data conversion node processes the data content, for example by converting data types (Number -> String, Date -> String, ...). When converting data, the component takes the output of the previous node as its input, applies its own business logic, and passes the result to the next node; each type of data conversion node implements only the business logic of that node.
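A data conversion node of the kind mentioned above (Number -> String, Date -> String) could look like the following sketch. The function name, the row-as-dictionary representation and the default date format are assumptions made for illustration.

```python
from datetime import date
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def to_string_node(
    rows: Iterable[Row], columns: List[str], date_format: str = "%Y-%m-%d"
) -> Iterator[Row]:
    """Convert the listed columns to strings, leaving other columns as-is."""
    for row in rows:
        converted = dict(row)  # copy so the upstream row is not mutated
        for column in columns:
            value = converted[column]
            if value is None:
                continue  # keep missing values untouched
            if isinstance(value, date):  # also matches datetime
                converted[column] = value.strftime(date_format)
            else:
                converted[column] = str(value)
        yield converted
```

The node knows nothing about where rows come from or where they go, which matches the idea that each conversion node carries only its own logic.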
The output source node stores data. When storing, the component takes the output of the previous node and writes the data into the data table according to the matching relationship configured on the node and the specified target database.
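The write step with a configured field mapping can be sketched as below, again using `sqlite3` only as a stand-in target database. The function name and the shape of `column_mapping` (source field name to target column name) are assumptions.

```python
import sqlite3
from typing import Any, Dict, Iterable


def write_rows(
    conn: sqlite3.Connection,
    table: str,
    column_mapping: Dict[str, str],
    rows: Iterable[Dict[str, Any]],
) -> int:
    """Insert rows into `table`, mapping source fields to target columns.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted configuration; row values are bound as parameters.
    """
    source_fields = list(column_mapping)
    target_columns = ", ".join(f'"{column_mapping[f]}"' for f in source_fields)
    placeholders = ", ".join("?" for _ in source_fields)
    sql = f'INSERT INTO "{table}" ({target_columns}) VALUES ({placeholders})'
    values = [tuple(row[f] for f in source_fields) for row in rows]
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, values)
    return len(values)
```

Keeping the mapping in configuration lets the same output component write to tables whose column names differ from the upstream field names.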
A flow is composed of multiple flow nodes (i.e., process links). Each flow node corresponds to a different component, which implements that node's function. Each process link has exactly one specific function, and multiple components are connected to form a chained flow. Each process link is concerned only with its own processing configuration and relates to its upstream and downstream nodes only through its input and output, so the whole data flow can be customized by combining process links in different ways. Complex data processing such as cleaning, conversion, filtering and screening is carried out by three types of nodes: input source, output source and processing nodes. The nodes can be customized and extended, and each node deals only with the attributes its business logic requires and with its input and output.
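The single-purpose, chainable nodes described above can be illustrated with plain functions that each take a stream of rows and return a stream of rows. The node names and the `run_chain` helper are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Node = Callable[[Iterable[Row]], Iterable[Row]]


def strip_names(rows: Iterable[Row]) -> Iterator[Row]:
    """Cleaning node: trim whitespace around the 'name' field."""
    for row in rows:
        yield {**row, "name": row["name"].strip()}


def drop_empty_names(rows: Iterable[Row]) -> Iterator[Row]:
    """Filtering node: discard rows whose 'name' field is empty."""
    for row in rows:
        if row["name"]:
            yield row


def run_chain(source: Iterable[Row], nodes: List[Node]) -> List[Row]:
    """Feed each node's output into the next node, in order."""
    stream: Iterable[Row] = source
    for node in nodes:
        stream = node(stream)
    return list(stream)
```

Because every node shares the same input and output shape, reordering the list or adding a node changes the flow without touching the other nodes.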
In this embodiment, a data access module using the Web-based ETL design data access method comprises a Web-ETL designer and a Web-ETL executor. The Web-ETL designer provides the user with a visual interface on which components are dragged and combined to configure the intermediate steps of data processing for a given scenario, forming a complete, complex data access flow. The Web-ETL executor creates a data access task from the configuration produced by the Web-ETL designer, and reads, cleans, converts, filters, screens and stores the data to be accessed.
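The hand-off from designer to executor might be sketched as follows, assuming the designer saves the flow as a JSON document. The document layout (`start`, `links`, `id`, `component`, `params`, `next`), the handler names and the in-memory row lists are all assumptions; a real executor would also write the result to storage, which is omitted here.

```python
import json
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]
Handler = Callable[[Rows, Dict[str, Any]], Rows]


def list_input(rows: Rows, params: Dict[str, Any]) -> Rows:
    """Input handler: take the rows embedded in the parameters."""
    return [dict(r) for r in params["rows"]]


def keep_where_equal(rows: Rows, params: Dict[str, Any]) -> Rows:
    """Filter handler: keep rows whose field equals the given value."""
    return [r for r in rows if r.get(params["field"]) == params["value"]]


def rename_field(rows: Rows, params: Dict[str, Any]) -> Rows:
    """Conversion handler: rename one field in every row."""
    old, new = params["from"], params["to"]
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


# Hypothetical mapping from component ID to handler function.
HANDLERS: Dict[str, Handler] = {
    "list_input": list_input,
    "keep_where_equal": keep_where_equal,
    "rename_field": rename_field,
}


def execute(flow_json: str) -> Rows:
    """Run a saved flow by following each link's `next` reference."""
    config = json.loads(flow_json)
    links = {link["id"]: link for link in config["links"]}
    rows: Rows = []
    visited = set()
    current = config["start"]
    while current is not None:
        if current in visited:
            raise ValueError(f"cycle detected at link {current!r}")
        visited.add(current)
        link = links[current]
        handler = HANDLERS[link["component"]]
        # Parameters are stored as a JSON string inside the flow document.
        rows = handler(rows, json.loads(link["params"]))
        current = link["next"]
    return rows
```

The executor depends only on the saved configuration, so the designer can run in the browser while tasks run elsewhere.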
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (6)
1. A Web-based ETL design data access method, characterized by comprising the following steps:
defining each flow through a Flow class;
representing each process link of each flow with a different FlowLink class to form a component, wherein each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component;
combining and connecting different components on a visual interface to form a chained flow, so that more complex data access processes can be configured.
2. The Web-based ETL design data access method of claim 1, wherein said parameter information is stored as a JSON string.
3. The Web-based ETL design data access method of claim 1, wherein the component to which a FlowLink belongs is the component that performs the data processing for that FlowLink class.
4. The Web-based ETL design data access method of claim 1, wherein the ID of said next component is used to chain the FlowLink objects into a complete Flow.
5. The Web-based ETL design data access method of claim 1, wherein each flow comprises one input source node, N data conversion nodes and one output source node; the input source node reads data; the data conversion nodes process the data content; and the output source node stores the data.
6. A data access module using the Web-based ETL design data access method of any one of claims 1-5, characterized by comprising a Web-ETL designer and a Web-ETL executor; the Web-ETL designer provides the user with a visual interface on which components are dragged and combined to configure the intermediate steps of data processing for a given scenario, forming a complete, complex data access flow; and the Web-ETL executor creates a data access task from the configuration produced by the Web-ETL designer, and reads, cleans, converts, filters, screens and stores the data to be accessed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110367312.6A | 2021-04-06 | 2021-04-06 | ETL design data access method and data access module based on Web |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113111106A | 2021-07-13 |
Family
ID=76714094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110367312.6A (Pending) | ETL design data access method and data access module based on Web | 2021-04-06 | 2021-04-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113111106A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101567013A (en) * | 2009-06-02 | 2009-10-28 | 阿里巴巴集团控股有限公司 | Method and apparatus for implementing ETL scheduling |
US20090281865A1 (en) * | 2008-05-08 | 2009-11-12 | Todor Stoitsev | Method and system to manage a business process |
CN109669976A (en) * | 2018-11-22 | 2019-04-23 | 武汉达梦数据库有限公司 | Data service method and equipment based on ETL |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114880385A (en) * | 2021-07-27 | 2022-08-09 | 云南省地质环境监测院(云南省环境地质研究院) | Method and device for accessing geological disaster data through automatic combined flow |
CN114880385B (en) * | 2021-07-27 | 2022-11-22 | 云南省地质环境监测院(云南省环境地质研究院) | Method and device for accessing geological disaster data through automatic combination process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210713 |