CN113111106A - ETL design data access method and data access module based on Web - Google Patents


Info

Publication number
CN113111106A
Authority
CN
China
Prior art keywords
data
web
data access
component
etl
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110367312.6A
Other languages
Chinese (zh)
Inventor
陆文斌
周正斌
徐孟宇
周阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Creative Information Technology Co ltd
Original Assignee
Creative Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-04-06
Filing date
2021-04-06
Publication date
2021-07-13
Application filed by Creative Information Technology Co ltd
Priority to CN202110367312.6A
Publication of CN113111106A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/258 Data format conversion from or to a database
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention discloses a Web-based ETL design data access method and a data access module. The method comprises the following steps: defining each flow through a Flow class; representing each process link of a flow with a different FlowLink class to form a component, where each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component; and combining and connecting different components on a visual interface to form a chained flow, so that more complex data access processes can be configured. The parameter information is stored as a JSON string; the component to which a FlowLink belongs is the component that performs the data processing for that FlowLink; and the ID of the next component is used to string the FlowLinks into a complete Flow. The invention can configure and handle more complex data access, suits a variety of business scenarios, supports batch reading and complex processing of big data, improves data processing performance, and ensures complete and accurate data access to the greatest extent possible.

Description

ETL design data access method and data access module based on Web
Technical Field
The invention relates to the field of big data access, and in particular to a Web-based ETL design data access method.
Background
Data access service (DIS for short) is an essential link in deploying a big data platform. Data comes from many sources and in many types, so scattered data must be integrated through data access and brought into a unified big data platform. By data type, data access mainly covers structured data (databases), log data, IoT data, and files. Data access must serve many business scenarios, and the schema types of the data sources are not known in advance; in addition, the data volume may fluctuate during access, so the stability of data access affects system performance.
ETL (Extract, Transform, Load; a data warehouse technology) is the process of extracting, transforming, and loading data, and is an important link in building a data warehouse: the required data is extracted from a data source, cleaned and transformed, and then loaded into the data warehouse according to a predefined data warehouse model.
One existing scheme first responds, through a Web Service interface service, to an access request sent by a data access interface and obtains the target monitoring data corresponding to that request from the monitoring data provided by monitoring equipment. It then sends the target monitoring data in batches by calling the Web Service interface service and writes each batch into a real-time/historical database. Monitoring equipment from any manufacturer can thus write its monitoring data directly into the real-time/historical database, and using the Web Service interface service avoids calling the underlying API of the real-time/historical database directly. However, this scheme is inefficient at accessing complex data, which degrades the performance of the whole system.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a Web-based ETL design data access method and a data access module.
The purpose of the invention is achieved by the following technical scheme:
A Web-based ETL design data access method comprises the following steps:
defining each flow through a Flow class;
representing each process link of each flow with a different FlowLink class to form a component, where each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component;
combining and connecting different components on a visual interface to form a chained flow, so as to configure a more complex data access process.
Further, the parameter information is stored as a JSON string.
Further, the component to which a FlowLink belongs is the component that performs the data processing for that FlowLink class.
Further, the ID of the next component is used to string the FlowLinks into a complete Flow.
Furthermore, each flow comprises one input source node, N data conversion nodes, and one output source node; the input source node reads data; the data conversion nodes process the data content; and the output source node writes the data to storage.
A data access module adopting the Web-based ETL design data access method comprises a Web-ETL designer and a Web-ETL executor. The Web-ETL designer provides the user with a visual interface on which components are dragged and combined to configure the intermediate data processing steps for a given scenario, forming a complete, complex data access flow. The Web-ETL executor creates a data access task from the configuration produced by the Web-ETL designer and reads, cleans, converts, filters, screens, and stores the data to be accessed.
The beneficial effects of the invention are as follows: the method can configure and handle data access that requires more complex processing, suits a variety of business scenarios, and supports batch reading and complex processing of big data, thereby improving data processing performance and ensuring complete and accurate data access to the greatest extent possible.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
In this embodiment, as shown in FIG. 1, a Web-based ETL design data access method defines each flow through a Flow class;
represents each process link of each flow with a different FlowLink class to form a component, where each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component;
and combines and connects different components on a visual interface to form a chained flow, so as to configure a more complex data access process.
The parameter information is stored as a JSON string.
The component is the one that performs the data processing for the FlowLink class. Different components are handled by different processing classes: for example, the table input component connects to a database and reads data from a specified data table, while the file input component connects to a file server and fetches and parses files.
The ID of the next component is used to string the FlowLinks into a complete Flow.
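The Flow and FlowLink structure described above can be sketched roughly as follows. This is a minimal illustration in Python, not the patent's implementation: the field names (`link_id`, `component_id`, `params_json`, `next_id`, `first_id`) and the component names are hypothetical, and the sketch assumes each link carries its own ID so that `next_id` can point at the following link.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowLink:
    """One process link. All field names here are illustrative assumptions."""
    link_id: str
    component_id: str               # component that performs the processing
    params_json: str                # component parameters, kept as a JSON string
    next_id: Optional[str] = None   # ID of the next link; None ends the chain

    def params(self) -> dict:
        """Decode the stored JSON string into a dictionary."""
        return json.loads(self.params_json)


@dataclass
class Flow:
    flow_id: str
    first_id: str
    links: Dict[str, FlowLink] = field(default_factory=dict)

    def add(self, link: FlowLink) -> None:
        self.links[link.link_id] = link

    def chain(self) -> List[FlowLink]:
        """Follow next_id from the first link to recover the whole chain."""
        ordered: List[FlowLink] = []
        seen = set()
        current = self.first_id
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected at link {current!r}")
            seen.add(current)
            link = self.links[current]
            ordered.append(link)
            current = link.next_id
        return ordered


flow = Flow(flow_id="flow-1", first_id="a")
flow.add(FlowLink("a", "table_input", json.dumps({"table": "orders"}), next_id="b"))
flow.add(FlowLink("b", "type_convert", json.dumps({"field": "amount"}), next_id="c"))
flow.add(FlowLink("c", "table_output", json.dumps({"table": "orders_copy"})))
order = [link.component_id for link in flow.chain()]
```

Storing only the next ID on each link keeps every link independent of the rest of the flow; the complete order is recovered on demand by walking the chain.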
Each flow comprises one input source node, N data conversion nodes, and one output source node.
The input source node reads data. When reading, the component class fetches data in batches according to the amount of data in the target table; once a batch has been read it is passed to the next component node and reading of the next batch begins. This avoids the memory overflow that can occur when a large volume of data is fetched at once and improves data processing performance.
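Batch reading of this kind might look like the sketch below, which uses Python's standard `sqlite3` module as a stand-in for the source database. The function name `read_in_batches`, the fixed `batch_size` parameter, and the `src` table are assumptions made for illustration; the patent does not say how the batch size is chosen.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_batches(conn: sqlite3.Connection, table: str,
                    batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield the rows of `table` one batch at a time instead of all at once."""
    # The table name is interpolated directly for brevity; a real component
    # would have to validate it against the configured data source.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        # Hand the batch downstream before the next one is fetched.
        yield batch


# Demo with an in-memory database holding ten rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10)])
batch_sizes = [len(batch) for batch in read_in_batches(conn, "src", batch_size=4)]
```

Because the function is a generator, only one batch is held in memory at a time, which is the property the paragraph above relies on to avoid memory overflow.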
The data conversion node processes the data content, for example by data type conversion (Number -> String, Date -> String, ...). When converting data, the component takes the output of the previous node as its input, applies its own business logic, and outputs the result to the next node; each type of data conversion node implements only its own business logic.
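As an illustration of such a conversion node, the sketch below applies Number -> String and Date -> String rules to batches of rows. The rule names, the `CONVERTERS` table, and the dictionary-per-row representation are hypothetical choices, not details taken from the patent.

```python
from datetime import date
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

# Hypothetical conversion rules, keyed by an illustrative rule name.
CONVERTERS: Dict[str, Callable[[Any], str]] = {
    "number_to_string": lambda value: str(value),
    "date_to_string": lambda value: value.isoformat(),
}


def convert_batches(batches: Iterable[List[Row]],
                    rules: Dict[str, str]) -> Iterator[List[Row]]:
    """Apply per-field conversion rules to every row of every batch."""
    for batch in batches:
        converted = []
        for row in batch:
            new_row = dict(row)  # leave the upstream row untouched
            for field_name, rule in rules.items():
                if new_row.get(field_name) is not None:
                    new_row[field_name] = CONVERTERS[rule](new_row[field_name])
            converted.append(new_row)
        yield converted


source = [[{"id": 7, "day": date(2021, 4, 6), "note": None}]]
rules = {"id": "number_to_string", "day": "date_to_string"}
output = list(convert_batches(source, rules))
```

The node knows nothing about where its input came from or where its output goes, which matches the statement that each conversion node implements only its own business logic.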
The output source node writes data to storage. When storing data, the component takes the output of the previous node and writes it into the data table of the specified target database according to the field mapping configured on the node.
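A possible shape for the output step is sketched below, again with `sqlite3` standing in for the target database. The `mapping` argument (source field to target column), the function name `write_batches`, and the per-batch commit are assumptions for the example.

```python
import sqlite3
from typing import Any, Dict, Iterable, List


def write_batches(conn: sqlite3.Connection, table: str,
                  mapping: Dict[str, str],
                  batches: Iterable[List[Dict[str, Any]]]) -> int:
    """Insert each batch into `table`; `mapping` maps source field -> target column."""
    source_fields = list(mapping)
    columns = ", ".join(f'"{mapping[name]}"' for name in source_fields)
    placeholders = ", ".join("?" for _ in source_fields)
    sql = f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})'
    written = 0
    for batch in batches:
        rows = [tuple(row[name] for name in source_fields) for row in batch]
        conn.executemany(sql, rows)
        conn.commit()  # commit per batch so memory and transaction size stay bounded
        written += len(batch)
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (user_id TEXT, signup_day TEXT)")
total = write_batches(
    conn, "target", {"id": "user_id", "day": "signup_day"},
    [[{"id": "1", "day": "2021-04-06"}], [{"id": "2", "day": "2021-04-07"}]],
)
stored = conn.execute(
    "SELECT user_id, signup_day FROM target ORDER BY user_id").fetchall()
```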
A flow is composed of multiple flow nodes (i.e., process links). Each flow node corresponds to a different component, which implements that node's function. Each process link has exactly one specific function, and multiple components are connected together to form a chained flow. Each process link is concerned only with its own processing configuration and relates to its upstream and downstream nodes only through input and output content, so the whole data flow can be custom-configured through different combinations of process links. Complex data processing such as cleaning, conversion, filtering, and screening is completed through three types of nodes (input source, output source, and processing source). Nodes can be customized and extended, and each node attends only to the attributes its business logic requires and to its input and output content.
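One way to picture single-purpose, extensible nodes joined only by their inputs and outputs is a small component registry, sketched below. The `component` decorator, the `REGISTRY` dictionary, and the two example components are hypothetical and serve only to show the chaining idea.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]
Batches = Iterable[List[Row]]

# Hypothetical registry: component name -> processing function.
REGISTRY: Dict[str, Callable[[Batches, dict], Iterator[List[Row]]]] = {}


def component(name: str):
    """Register a processing function so new node types can be added freely."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register


@component("strip_whitespace")
def strip_whitespace(batches: Batches, params: dict) -> Iterator[List[Row]]:
    target = params["field"]
    for batch in batches:
        yield [{**row, target: row[target].strip()} for row in batch]


@component("filter_non_empty")
def filter_non_empty(batches: Batches, params: dict) -> Iterator[List[Row]]:
    target = params["field"]
    for batch in batches:
        yield [row for row in batch if row[target]]


def run_chain(source: Batches, steps: List[Tuple[str, dict]]) -> Batches:
    """Feed each step's output into the next step, in the configured order."""
    stream = source
    for name, params in steps:
        stream = REGISTRY[name](stream, params)
    return stream


source = [[{"name": " ann "}, {"name": "   "}], [{"name": "bob"}]]
result = list(run_chain(source, [
    ("strip_whitespace", {"field": "name"}),
    ("filter_non_empty", {"field": "name"}),
]))
```

Adding a new kind of node then means registering one more function; existing nodes and the chaining logic stay unchanged.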
In this embodiment, a data access module adopting the Web-based ETL design data access method includes a Web-ETL designer and a Web-ETL executor. The Web-ETL designer provides the user with a visual interface on which components are dragged and combined to configure the intermediate data processing steps for a given scenario, forming a complete, complex data access flow. The Web-ETL executor creates a data access task from the configuration produced by the Web-ETL designer and reads, cleans, converts, filters, screens, and stores the data to be accessed.
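The hand-off from designer to executor could be sketched as follows, assuming the designer saves the flow as a JSON document and the executor turns it into an ordered task. The JSON layout (`first`, `links`, `id`, `component`, `params`, `next`), the `COMPONENT_KINDS` catalogue, and `build_task` are all hypothetical; the only constraint taken from the text is that a flow has one input source node, N conversion nodes, and one output source node.

```python
import json
from typing import List, Tuple

# Hypothetical catalogue: component name -> node kind.
COMPONENT_KINDS = {
    "table_input": "input",
    "file_input": "input",
    "type_convert": "transform",
    "table_output": "output",
}


def build_task(config_json: str) -> List[Tuple[str, dict]]:
    """Turn a designer configuration into an ordered list of (component, params)."""
    config = json.loads(config_json)
    links = {link["id"]: link for link in config["links"]}
    steps: List[Tuple[str, dict]] = []
    kinds: List[str] = []
    current = config["first"]
    while current is not None:
        link = links[current]
        kinds.append(COMPONENT_KINDS[link["component"]])
        # Parameters are stored as a JSON string, so decode them per link.
        steps.append((link["component"], json.loads(link["params"])))
        if len(steps) > len(links):
            raise ValueError("flow contains a cycle")
        current = link["next"]
    # One input node, N conversion nodes, one output node.
    if (len(kinds) < 2 or kinds[0] != "input" or kinds[-1] != "output"
            or any(kind != "transform" for kind in kinds[1:-1])):
        raise ValueError("flow must be input -> N transforms -> output")
    return steps


designer_output = json.dumps({
    "first": "in",
    "links": [
        {"id": "in", "component": "table_input",
         "params": json.dumps({"table": "src"}), "next": "conv"},
        {"id": "conv", "component": "type_convert",
         "params": json.dumps({"field": "id"}), "next": "out"},
        {"id": "out", "component": "table_output",
         "params": json.dumps({"table": "dst"}), "next": None},
    ],
})
task = build_task(designer_output)
```

Validating the shape before execution lets a misconfigured flow fail at task creation rather than partway through a data load.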
The foregoing shows and describes the general principles, main features, and advantages of the present invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the specification merely illustrate the principle of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. A Web-based ETL design data access method, characterized by comprising the following steps:
defining each flow through a Flow class;
representing each process link of each flow with a different FlowLink class to form a component, where each FlowLink class records the parameter information required by the component, the ID of the component to which it belongs, and the ID of the next component;
combining and connecting different components on a visual interface to form a chained flow, so as to configure a more complex data access process.
2. The Web-based ETL design data access method of claim 1, wherein the parameter information is stored as a JSON string.
3. The Web-based ETL design data access method of claim 1, wherein the component to which a FlowLink belongs is the component that performs the data processing for that FlowLink class.
4. The Web-based ETL design data access method of claim 1, wherein the ID of the next component is used to string the FlowLinks into a complete Flow.
5. The Web-based ETL design data access method of claim 1, wherein each flow comprises one input source node, N data conversion nodes, and one output source node; the input source node reads data; the data conversion nodes process the data content; and the output source node writes the data to storage.
6. A data access module adopting the Web-based ETL design data access method of any one of claims 1-5, characterized by comprising a Web-ETL designer and a Web-ETL executor; the Web-ETL designer provides the user with a visual interface on which components are dragged and combined to configure the intermediate data processing steps for a given scenario, forming a complete, complex data access flow; and the Web-ETL executor creates a data access task from the configuration produced by the Web-ETL designer and reads, cleans, converts, filters, screens, and stores the data to be accessed.
CN202110367312.6A 2021-04-06 2021-04-06 ETL design data access method and data access module based on Web Pending CN113111106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110367312.6A CN113111106A (en) 2021-04-06 2021-04-06 ETL design data access method and data access module based on Web

Publications (1)

Publication Number Publication Date
CN113111106A true CN113111106A (en) 2021-07-13

Family

ID=76714094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110367312.6A Pending CN113111106A (en) 2021-04-06 2021-04-06 ETL design data access method and data access module based on Web

Country Status (1)

Country Link
CN (1) CN113111106A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101567013A (en) * 2009-06-02 2009-10-28 阿里巴巴集团控股有限公司 Method and apparatus for implementing ETL scheduling
US20090281865A1 (en) * 2008-05-08 2009-11-12 Todor Stoitsev Method and system to manage a business process
CN109669976A (en) * 2018-11-22 2019-04-23 武汉达梦数据库有限公司 Data service method and equipment based on ETL

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880385A (en) * 2021-07-27 2022-08-09 云南省地质环境监测院(云南省环境地质研究院) Method and device for accessing geological disaster data through automatic combined flow
CN114880385B (en) * 2021-07-27 2022-11-22 云南省地质环境监测院(云南省环境地质研究院) Method and device for accessing geological disaster data through automatic combination process

Similar Documents

Publication Publication Date Title
CN113111107B (en) Data comprehensive access system and method
US8219518B2 (en) Method and apparatus for modelling data exchange in a data flow of an extract, transform, and load (ETL) process
US20210256079A1 (en) Adapting database queries for data virtualization over combined database stores
CN111176867B (en) Data sharing exchange and open application platform
CN102508919B (en) Data processing method and system
CN112148788A (en) Data synchronization method and system for heterogeneous data source
CN111400288A (en) Data quality inspection method and system
CN111460019A (en) Data conversion method and middleware of heterogeneous data source
CN112162915A (en) Test data generation method, device, equipment and storage medium
US10712731B2 (en) Control device, control method, and non-transitory computer-readable recording medium
CN111966739A (en) Method and equipment for processing graph data
CN113111106A (en) ETL design data access method and data access module based on Web
CN114328278B (en) Distributed simulation test method, system, readable storage medium and computer equipment
CN113111109A (en) Interface warehousing analysis access method of data source
CN113704117A (en) Algorithm testing system, method and device
CN112631754A (en) Data processing method, data processing device, storage medium and electronic device
CN111600776A (en) TR069 batch interactive test system and method thereof
CN116483707A (en) Test method, test device, test apparatus, test program, and test program
CN113111108A (en) File data source warehousing analysis access method
CN114297074A (en) Method for realizing automatic testing of functions, interfaces and performances based on dynamic configuration
CN114416305A (en) Robot engine implementation method and system and electronic equipment
CN110543155B (en) Manufacturing process management system
CN113111105A (en) Data customized access method and system based on big data
CN112445811A (en) Data service method, device, storage medium and component based on SQL configuration
CN116560637B (en) Method and system for developing application system in configuration form for digital transformation

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210713)