CN111897865A - Dynamic adjustment method and device for ETL (extract transform load) working load - Google Patents

Publication number
CN111897865A
Authority
CN
China
Prior art keywords
etl
etl software
data
software
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010810516.8A
Other languages
Chinese (zh)
Inventor
张国宇
刘建成
张楠
乔雨倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
ICBC Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ICBC Technology Co Ltd
Priority to CN202010810516.8A
Publication of CN111897865A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and device for dynamically adjusting an ETL (extract, transform, load) workload, and relates to the technical field of big data. The method comprises: periodically obtaining performance data of the running environment of the ETL software; normalizing the performance data to obtain characteristic parameters; obtaining a step length value from the characteristic parameters and a dynamic estimation model, wherein the dynamic estimation model is obtained by training on performance training data of the ETL software; and calculating the product of the step length value and a preset coefficient to serve as the current extraction step length. The device is used to execute the method. The method and device provided by the embodiments of the invention improve the working efficiency of the ETL software.

Description

Dynamic adjustment method and device for ETL (extract transform load) working load
Technical Field
The invention relates to the technical field of big data, in particular to a dynamic adjustment method and device for ETL (extract transform load) workload.
Background
ETL (Extract-Transform-Load) is used to describe the process of extracting, converting, and loading data from a source to a destination. The purpose is to integrate scattered, messy and non-uniform data in enterprises for integral data analysis.
ETL software works by acquiring a portion of data from the source end, processing it in memory, and loading it to the target end, then repeating this process in a loop. Before the ETL software is used, the number of data records acquired from the source end each time must be set. In the prior art, this number is a static parameter that a technician estimates from experience. Because the running environment of the ETL software changes dynamically while the number stays fixed, the number does not match the running environment, which reduces the working efficiency of the ETL software.
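The extract-process-load loop described above can be illustrated with a minimal Python sketch. It is only an assumption-laden illustration of the prior-art behavior with a fixed step: the function name `run_etl`, the in-memory lists standing in for the source and target ends, and the `transform` callback are all hypothetical and do not come from the patent.

```python
from typing import Callable, List


def run_etl(source: List[dict],
            transform: Callable[[dict], dict],
            target: List[dict],
            step: int) -> int:
    """Copy rows from source to target in batches of `step` rows.

    `step` plays the role of the static, pre-set number of records
    acquired per cycle. Returns the number of cycles executed.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    cycles = 0
    for start in range(0, len(source), step):
        batch = source[start:start + step]          # extract one portion
        processed = [transform(r) for r in batch]   # process in memory
        target.extend(processed)                    # load to the target
        cycles += 1
    return cycles
```

With a fixed `step`, the batch size cannot react to changes in CPU, memory, or bandwidth load, which is the mismatch the background section points out.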
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for dynamically adjusting an ETL workload, which can at least partially solve the problems in the prior art.
In one aspect, the present invention provides a method for dynamically adjusting an ETL workload, including:
periodically obtaining performance data of the running environment of the ETL software;
normalizing the performance data to obtain characteristic parameters;
obtaining a step length value according to the characteristic parameters and a dynamic estimation model, wherein the dynamic estimation model is obtained by training on performance training data of the ETL software;
and calculating the product of the step length value and a preset coefficient to serve as the current extraction step length.
In another aspect, the present invention provides an apparatus for dynamic adjustment of ETL workload, comprising:
the obtaining unit is used for periodically obtaining performance data of the running environment of the ETL software;
the first processing unit is used for carrying out normalization processing on the performance data to obtain characteristic parameters;
the estimation unit is used for obtaining a step length value according to the characteristic parameters and the dynamic estimation model, wherein the dynamic estimation model is obtained by training on performance training data of the ETL software;
and the calculating unit is used for calculating the product of the step length numerical value and a preset coefficient to serve as the current extraction step length.
In another aspect, the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the steps of the method for dynamically adjusting an ETL workload according to any of the above embodiments.
In yet another aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the steps of the method for dynamically adjusting an ETL workload according to any one of the above embodiments.
The dynamic adjustment method and device for the ETL workload, provided by the embodiment of the invention, periodically obtain the performance data of the running environment of the ETL software, perform normalization processing on the performance data to obtain the characteristic parameters, obtain the step size value according to the characteristic parameters and the dynamic estimation model, and use the product of the step size value and the preset coefficient as the current extraction step size, thereby realizing the dynamic adjustment of the extraction step size of the ETL software based on the running environment of the ETL software and improving the working efficiency of the ETL software.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
Fig. 1 is a flowchart illustrating a method for dynamically adjusting an ETL workload according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a method for dynamically adjusting an ETL workload according to another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an apparatus for dynamically adjusting an ETL workload according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an apparatus for dynamically adjusting an ETL workload according to another embodiment of the present invention.
Fig. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In order to facilitate understanding of the technical solutions provided in the present application, the following first describes relevant contents of the technical solutions in the present application. In the invention, the number of pieces of data acquired from the source end by the ETL software each time is called the extraction step length. In the prior art, the extraction step length is a static value and is preset. The dynamic adjustment method for the ETL workload provided by the embodiment of the invention dynamically adjusts the extraction step length based on the running environment of the ETL software so as to improve the working efficiency of the ETL software.
Fig. 1 is a schematic flow chart of a method for dynamically adjusting an ETL workload according to an embodiment of the present invention, and as shown in fig. 1, the method for dynamically adjusting an ETL workload according to an embodiment of the present invention includes:
S101, periodically obtaining performance data of the running environment of the ETL software;
Specifically, the server may periodically obtain performance data of the running environment of the ETL software through a performance collection tool, where the performance data includes, but is not limited to, the CPU usage of the ETL software, the memory usage of the ETL software, the bandwidth usage of the ETL software, and the data width corresponding to the ETL software. The data width corresponding to the ETL software is the data width of a data table in a database processed by the ETL software. The performance collection tool is preset. The period is set according to actual needs, for example once every 10 seconds, and is not limited by the embodiment of the invention.
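As a rough illustration of step S101, the sketch below polls a performance reader once per period. The patent does not name a collection tool, so `read_sample` is a hypothetical callback standing in for it, and the dictionary keys, the `max_samples` limit, and the injectable `sleep` function are assumptions made for this example.

```python
import time
from typing import Callable, Dict, Iterator

PerfSample = Dict[str, float]


def sample_periodically(read_sample: Callable[[], PerfSample],
                        period_s: float = 10.0,
                        max_samples: int = 3,
                        sleep: Callable[[float], None] = time.sleep
                        ) -> Iterator[PerfSample]:
    """Yield one performance sample per period, up to max_samples."""
    for i in range(max_samples):
        if i > 0:
            sleep(period_s)  # wait one period between consecutive samples
        yield read_sample()
```

Passing `sleep` as a parameter is only a convenience for exercising the loop without real waiting; a deployment would more likely run it on a scheduler or background thread.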
S102, performing normalization processing on the performance data to obtain characteristic parameters;
Specifically, after obtaining the performance data, the server normalizes the performance data to obtain the characteristic parameters.
For example, the performance data includes the CPU usage of the ETL software, the memory usage of the ETL software, the bandwidth usage of the ETL software, and the data width corresponding to the ETL software. Normalizing the performance data comprises: calculating the CPU utilization rate of the ETL software from the CPU usage of the ETL software and the total amount of CPU corresponding to the ETL software; calculating the memory utilization rate of the ETL software from the memory usage of the ETL software and the total amount of memory corresponding to the ETL software; calculating the bandwidth utilization rate of the ETL software from the bandwidth usage of the ETL software and the total amount of bandwidth corresponding to the ETL software; and calculating the data enrichment rate of the ETL software from the data width corresponding to the ETL software and the maximum data width. The CPU utilization rate, the memory utilization rate, the bandwidth utilization rate and the data enrichment rate of the ETL software are used as the characteristic parameters.
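The normalization in step S102 amounts to four divisions, which can be sketched as follows. The function name `normalize` and the dictionary keys are hypothetical; the text only specifies that each usage figure is related to its corresponding total (or to the maximum data width).

```python
from typing import Dict, List


def normalize(perf: Dict[str, float], totals: Dict[str, float]) -> List[float]:
    """Return [cpu_rate, mem_rate, bw_rate, data_enrichment_rate]."""
    pairs = [
        ("cpu_used", "cpu_total"),
        ("mem_used", "mem_total"),
        ("bw_used", "bw_total"),
        ("data_width", "max_data_width"),
    ]
    features = []
    for used_key, total_key in pairs:
        total = totals[total_key]
        if total <= 0:
            raise ValueError(f"{total_key} must be positive")
        features.append(perf[used_key] / total)  # quotient used as the rate
    return features
```

For instance, 2 of 8 CPU cores in use, 8 of 32 GB of memory, 250 of 1000 Mbit/s of bandwidth, and a table width of 50 against a maximum of 200 all normalize to 0.25.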
S103, obtaining a step size value according to the characteristic parameters and the dynamic estimation model, wherein the dynamic estimation model is obtained by training on performance training data of the ETL software;
Specifically, after obtaining the characteristic parameters, the server inputs the characteristic parameters into the dynamic estimation model, which processes them and outputs a step size value. The dynamic estimation model is obtained by training on the performance training data of the ETL software.
S104, taking the product of the step size value and a preset coefficient as the current extraction step size.
Specifically, after obtaining the step size value, the server calculates the product of the step size value and a preset coefficient, takes the product as the current extraction step size, and then updates the extraction step size of the ETL software to the current extraction step size, thereby dynamically adjusting the extraction step size of the ETL software at regular intervals. The preset coefficient is set according to the actual situation and is not limited by the embodiment of the invention.
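Step S104 can be sketched as a single multiplication followed by a configuration update. The `EtlConfig` class and `apply_step` function are hypothetical names, and rounding to an integer with a lower bound of 1 is an assumption added here because a record count must be a positive whole number; the patent only specifies the product.

```python
class EtlConfig:
    """Hypothetical holder for the ETL software's extraction step."""

    def __init__(self, extraction_step: int) -> None:
        self.extraction_step = extraction_step


def apply_step(config: EtlConfig, step_value: float, coefficient: float) -> int:
    """Set the extraction step to step_value * coefficient.

    The product is rounded and floored at 1 (an assumption of this
    sketch) so that the result is a usable record count.
    """
    current = max(1, round(step_value * coefficient))
    config.extraction_step = current
    return current
```

For example, a step size value of 0.5 and a coefficient of 200,000 give an extraction step of 100,000 records.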
The dynamic adjustment method for the ETL workload, provided by the embodiment of the invention, regularly obtains the performance data of the running environment of the ETL software, performs normalization processing on the performance data to obtain the characteristic parameters, obtains the step size value according to the characteristic parameters and the dynamic estimation model, and takes the product of the step size value and the preset coefficient as the current extraction step size, thereby realizing the dynamic adjustment of the extraction step size of the ETL software based on the running environment of the ETL software and improving the working efficiency of the ETL software.
Fig. 2 is a schematic flow chart of a method for dynamically adjusting an ETL workload according to another embodiment of the present invention, and as shown in fig. 2, on the basis of the foregoing embodiments, the step of training to obtain a dynamic estimation model based on performance training data of ETL software further includes:
S201, acquiring the performance training data;
Specifically, performance acquisition software may be deployed on each server on which the ETL software runs. The CPU usage, the memory usage, the bandwidth usage, the corresponding data width, the preset extraction step length, and the like while the ETL software is running are collected as performance raw data. The performance raw data is cleaned to remove abnormal values such as null values and singular values, and then the CPU usage, the memory usage, the bandwidth usage, the corresponding data width, the preset extraction step length, and the like recorded when the ETL software had high working efficiency are manually selected as the performance training data. The server may obtain the performance training data. The amount of data included in the performance training data is set according to actual needs and is not limited by the embodiment of the invention.
S202, performing normalization processing on the performance training data to obtain a feature training set;
specifically, after obtaining the performance training data, the server performs normalization processing on the performance training data to obtain a feature training set.
For example, the performance training data includes the CPU usage of the ETL software, the memory usage of the ETL software, the bandwidth usage of the ETL software, the data width corresponding to the ETL software, and the preset extraction step size of the ETL software. The server divides the CPU usage of the ETL software by the total CPU of the server running the ETL software to obtain the CPU utilization rate of the ETL software; divides the memory usage of the ETL software by the total memory of the server running the ETL software to obtain the memory utilization rate of the ETL software; divides the bandwidth usage of the ETL software by the total bandwidth of the server running the ETL software to obtain the bandwidth utilization rate of the ETL software; divides the data width corresponding to the ETL software by the maximum data width to obtain the data enrichment rate of the ETL software; and divides the preset extraction step size of the ETL software by a preset value to obtain the step size feature value. Each piece of feature training data in the feature training set comprises the CPU utilization rate, the memory utilization rate, the bandwidth utilization rate and the data enrichment rate of the ETL software and the step size feature value, each of which lies between 0 and 1. The preset value is set according to practical experience, for example 200,000, and is not limited by the embodiment of the invention.
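Building one piece of feature training data, as described above, can be sketched like this. The function name `training_row` and the dictionary keys are hypothetical; the default `preset_value` of 200,000 is the example figure given in the text, and the range check simply enforces the stated requirement that every value lie between 0 and 1.

```python
from typing import Dict, List


def training_row(raw: Dict[str, float],
                 totals: Dict[str, float],
                 preset_value: float = 200_000.0) -> List[float]:
    """Return [cpu_rate, mem_rate, bw_rate, enrichment_rate, step_feature]."""
    row = [
        raw["cpu_used"] / totals["cpu_total"],
        raw["mem_used"] / totals["mem_total"],
        raw["bw_used"] / totals["bw_total"],
        raw["data_width"] / totals["max_data_width"],
        raw["extraction_step"] / preset_value,   # step size feature value
    ]
    if not all(0.0 <= v <= 1.0 for v in row):
        raise ValueError("normalized values must lie in [0, 1]")
    return row
```

The first four entries are the model inputs and the last entry is the training target, so an extraction step above `preset_value` is rejected rather than silently clipped.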
S203, obtaining the dynamic estimation model according to the feature training set and the initial model.
Specifically, the server may input the feature training set into an initial model, and train to obtain the dynamic estimation model. Wherein the initial model is preset, and the initial model includes but is not limited to a linear regression model, a deep learning model, and a neural network model.
For example, the initial model is a linear regression model, expressed as [x1,x2,x3,...,xn]×[b1,b2,b3,...,bn]=y, where x1,x2,x3,...,xn are the input variables, b1,b2,b3,...,bn are the weight coefficients, and y is the output variable.
The feature training set comprises 4 pieces of feature training data, each piece of feature training data comprises a CPU utilization rate c of ETL software, a memory utilization rate m of ETL software, a bandwidth utilization rate n of ETL software, a data enrichment rate w of ETL software and a step length feature value u, and the 4 pieces of feature data are brought into the linear regression model to obtain the following expression:
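[c1,m1,n1,w1]×[b1,b2,b3,b4]=u1
[c2,m2,n2,w2]×[b1,b2,b3,b4]=u2
[c3,m3,n3,w3]×[b1,b2,b3,b4]=u3
[c4,m4,n4,w4]×[b1,b2,b3,b4]=u4
where ci, mi, ni, wi and ui denote the values of c, m, n, w and u in the i-th piece of feature training data.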
Solving the above expression yields the weight coefficients b1, b2, b3 and b4, and thereby the dynamic estimation model.
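With four pieces of feature training data and four unknown weights, solving for b1 to b4 is a 4×4 linear system. The sketch below uses plain Gaussian elimination with partial pivoting; this solver choice and the name `solve_linear_system` are assumptions of the example, since the patent does not say how the expression is solved.

```python
from typing import List


def solve_linear_system(a: List[List[float]], u: List[float]) -> List[float]:
    """Solve a * b = u for b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | u]; copied so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, u)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular; the rows are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (m[i][n] - s) / m[i][i]
    return b
```

Each row of `a` holds [c, m, n, w] for one piece of feature training data and `u` holds the matching step feature values. With more than four training rows the system is overdetermined, and a least-squares fit would be the usual substitute for an exact solve.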
On the basis of the foregoing embodiments, further, the acquiring the performance training data includes:
performing data cleaning on the performance raw data to remove abnormal values.
Specifically, in the process of acquiring the performance training data, the server performs data cleaning on the acquired performance raw data to remove abnormal values in the performance raw data, such as null values and singular values, where the singular values refer to values outside a normal range.
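The cleaning step can be sketched as a filter that drops any record with a null value or a value outside a normal range. The function name `clean`, the record layout, and the per-field `valid_ranges` are hypothetical; the patent does not state how the normal range is determined.

```python
from typing import Dict, List, Optional, Tuple


def clean(records: List[Dict[str, Optional[float]]],
          valid_ranges: Dict[str, Tuple[float, float]]
          ) -> List[Dict[str, Optional[float]]]:
    """Keep only records whose checked fields are present and in range."""
    cleaned = []
    for rec in records:
        ok = True
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            # Missing or None counts as a null value; NaN fails the range test.
            if value is None or not (low <= value <= high):
                ok = False
                break
        if ok:
            cleaned.append(rec)
    return cleaned
```

Dropping the whole record, rather than patching the bad field, keeps each remaining training row internally consistent.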
On the basis of the foregoing embodiments, further, the performance data includes a CPU usage amount of the ETL software, a memory usage amount of the ETL software, a bandwidth usage amount of the ETL software, and a data width corresponding to the ETL software; correspondingly, the normalizing the performance data to obtain the characteristic parameters includes:
calculating and obtaining the CPU utilization rate of the ETL software according to the CPU usage of the ETL software and the total amount of CPUs corresponding to the ETL software;
calculating and obtaining the memory utilization rate of the ETL software according to the memory usage of the ETL software and the total memory amount corresponding to the ETL software;
calculating and obtaining the bandwidth utilization rate of the ETL software according to the bandwidth usage of the ETL software and the total bandwidth amount corresponding to the ETL software;
and calculating to obtain the data enrichment rate of the ETL software according to the data width corresponding to the ETL software and the maximum data width.
Specifically, the performance data may include a CPU usage amount of the ETL software, a memory usage amount of the ETL software, a bandwidth usage amount of the ETL software, and a data width corresponding to the ETL software.
The server calculates the quotient of the CPU usage of the ETL software and the total amount of CPU corresponding to the ETL software, thereby normalizing the CPU usage of the ETL software; the quotient is used as the CPU utilization rate of the ETL software. The total amount of CPU corresponding to the ETL software is the total amount of CPU of the server running the ETL software, and is obtained in advance.
The server calculates the quotient of the memory usage of the ETL software and the total amount of memory corresponding to the ETL software, thereby normalizing the memory usage of the ETL software; the quotient is used as the memory utilization rate of the ETL software. The total amount of memory corresponding to the ETL software is the total amount of memory of the server running the ETL software, and is obtained in advance.
The server calculates the quotient of the bandwidth usage of the ETL software and the total amount of bandwidth corresponding to the ETL software, thereby normalizing the bandwidth usage of the ETL software; the quotient is used as the bandwidth utilization rate of the ETL software. The total amount of bandwidth corresponding to the ETL software is the total amount of bandwidth of the server running the ETL software, and is obtained in advance.
The server calculates the quotient of the data width corresponding to the ETL software and the maximum data width, thereby normalizing the data width corresponding to the ETL software; the quotient is used as the data enrichment rate of the ETL software. The maximum data width is preset according to practical experience and is not limited by the embodiment of the invention.
The CPU utilization rate, the memory utilization rate, the bandwidth utilization rate and the data enrichment rate of the ETL software are used as the characteristic parameters.
Further, after obtaining the CPU utilization C of the ETL software, the memory utilization M of the ETL software, the bandwidth utilization N of the ETL software, and the data enrichment W of the ETL software, the server inputs the above feature parameters into a dynamic estimation model obtained by training a linear regression model, as shown in the following formula:
[C,M,N,W]×[b1,b2,b3,b4]=U
The step size value U is obtained through calculation and multiplied by a preset coefficient k to obtain the current extraction step size kU, where k is equal to the preset value used to normalize the extraction step size in the feature training set.
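The inference formula [C,M,N,W]×[b1,b2,b3,b4]=U is a dot product, followed by scaling with k. A minimal sketch, with the hypothetical function name `estimate_step` and illustrative weights that are not taken from the patent:

```python
from typing import Sequence, Tuple


def estimate_step(features: Sequence[float],
                  weights: Sequence[float],
                  k: float) -> Tuple[float, float]:
    """Return (U, k*U) for features [C, M, N, W] and weights [b1..b4]."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    u = sum(f * b for f, b in zip(features, weights))  # U = C*b1 + M*b2 + N*b3 + W*b4
    return u, k * u
```

For example, with all four features at 0.5, all four weights at 0.25 and k = 200,000, U is 0.5 and the current extraction step size kU is 100,000.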
According to the dynamic adjustment method for the ETL workload provided by the embodiment of the invention, the performance data of the running environment of the ETL software and the corresponding extraction step lengths are analyzed by a machine learning method to obtain a dynamic estimation model, which is then embedded into the ETL software. While the ETL software is running, the current extraction step length is obtained from the periodically collected performance data of the running environment and the dynamic estimation model, and is configured into the ETL software, which improves the working efficiency of the ETL software. Periodically recalculating the extraction step length realizes its dynamic configuration.
Fig. 3 is a schematic structural diagram of an apparatus for dynamically adjusting an ETL workload according to an embodiment of the present invention, and as shown in fig. 3, the apparatus for dynamically adjusting an ETL workload according to an embodiment of the present invention includes an obtaining unit 301, a first processing unit 302, an estimating unit 303, and a calculating unit 304, where:
The obtaining unit 301 is configured to periodically obtain performance data of the running environment of the ETL software; the first processing unit 302 is configured to normalize the performance data to obtain characteristic parameters; the estimating unit 303 is configured to obtain a step size value according to the characteristic parameters and the dynamic estimation model, wherein the dynamic estimation model is obtained by training on performance training data of the ETL software; the calculating unit 304 is configured to calculate the product of the step size value and a preset coefficient as the current extraction step size.
Specifically, the obtaining unit 301 may periodically obtain performance data of the running environment of the ETL software through a performance collection tool, where the performance data includes, but is not limited to, the CPU usage of the ETL software, the memory usage of the ETL software, the bandwidth usage of the ETL software, and the data width corresponding to the ETL software. The data width corresponding to the ETL software is the data width of a data table in a database processed by the ETL software. The performance collection tool is preset. The period is set according to actual needs, for example once every 10 seconds, and is not limited by the embodiment of the invention.
After obtaining the performance data, the first processing unit 302 performs normalization processing on the performance data to obtain a feature parameter.
After obtaining the characteristic parameters, the estimating unit 303 inputs the characteristic parameters into a dynamic estimation model, and outputs a step value through processing of the dynamic estimation model. Wherein the dynamic estimation model is obtained based on the performance training data training of the ETL software.
After obtaining the step size value, the calculating unit 304 calculates a result of multiplying the step size value by a preset coefficient, takes the result of multiplying as a current extraction step size, and then updates the extraction step size of the ETL software to the current extraction step size, thereby implementing the dynamic adjustment of the extraction step size of the ETL software at regular intervals. The preset coefficient is set according to an actual situation, and the embodiment of the present invention is not limited.
The dynamic adjustment device for the ETL workload provided by the embodiment of the invention periodically obtains the performance data of the running environment of the ETL software, performs normalization processing on the performance data to obtain the characteristic parameters, obtains the step size value according to the characteristic parameters and the dynamic estimation model, and takes the product of the step size value and the preset coefficient as the current extraction step size, thereby realizing the dynamic adjustment of the extraction step size of the ETL software based on the running environment of the ETL software and improving the working efficiency of the ETL software.
Fig. 4 is a schematic structural diagram of an apparatus for dynamically adjusting an ETL workload according to another embodiment of the present invention. As shown in fig. 4, on the basis of the foregoing embodiments, the apparatus further includes an acquisition unit 305, a second processing unit 306, and a training unit 307, where:
the acquisition unit 305 is configured to acquire the performance training data; the second processing unit 306 is configured to normalize the performance training data to obtain a feature training set; the training unit 307 is configured to obtain the dynamic estimation model according to the feature training set and the initial model.
Specifically, performance acquisition software may be deployed on each server on which the ETL software runs. The CPU usage, the memory usage, the bandwidth usage, the corresponding data width, the preset extraction step length, and the like while the ETL software is running are collected as performance raw data. The performance raw data is cleaned to remove abnormal values such as null values and singular values, and then the CPU usage, the memory usage, the bandwidth usage, the corresponding data width, the preset extraction step length, and the like recorded when the ETL software had high working efficiency are manually selected as the performance training data. The acquisition unit 305 may acquire the performance training data. The amount of data included in the performance training data is set according to actual needs and is not limited by the embodiment of the invention.
After obtaining the performance training data, the second processing unit 306 performs normalization processing on the performance training data to obtain a feature training set.
The training unit 307 may input the feature training set into an initial model, and train to obtain the dynamic estimation model. Wherein the initial model is preset, and the initial model includes but is not limited to a linear regression model, a deep learning model, and a neural network model.
On the basis of the foregoing embodiments, further, the obtaining unit 305 is specifically configured to:
performing data cleaning on the performance raw data to remove abnormal values.
Specifically, in the process of obtaining the performance training data, the obtaining unit 305 performs data cleaning on the collected performance raw data to remove abnormal values such as null values and singular values, where a singular value is a value outside the normal range.
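The cleaning step can be sketched as follows, under the assumption that each raw record is a mapping from field name to value and that a normal range is configured per field. The field names and the `clean_samples` helper are hypothetical; the embodiment only states that null values and out-of-range values are removed.

```python
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, Optional[float]]
Ranges = Dict[str, Tuple[float, float]]


def clean_samples(samples: List[Sample], ranges: Ranges) -> List[Sample]:
    """Keep only records whose checked fields are present and within range.

    A record is dropped if any checked field is missing or None (null value)
    or lies outside its configured [low, high] range (singular value).
    NaN also fails the range comparison and is therefore dropped.
    """
    cleaned: List[Sample] = []
    for sample in samples:
        valid = True
        for field, (low, high) in ranges.items():
            value = sample.get(field)
            if value is None or not (low <= value <= high):
                valid = False
                break
        if valid:
            cleaned.append(sample)
    return cleaned
```

The ranges themselves would be chosen per deployment, for example from the capacity of the server on which the ETL software runs.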
On the basis of the foregoing embodiments, further, the performance data includes a CPU usage amount of the ETL software, a memory usage amount of the ETL software, a bandwidth usage amount of the ETL software, and a data width corresponding to the ETL software; correspondingly, the first processing unit 302 is specifically configured to:
calculating and obtaining the CPU utilization rate of the ETL software according to the CPU usage of the ETL software and the total amount of CPUs corresponding to the ETL software; calculating and obtaining the memory utilization rate of the ETL software according to the memory usage of the ETL software and the total memory amount corresponding to the ETL software; calculating and obtaining the bandwidth utilization rate of the ETL software according to the bandwidth usage of the ETL software and the total bandwidth amount corresponding to the ETL software; and calculating to obtain the data enrichment rate of the ETL software according to the data width corresponding to the ETL software and the maximum data width.
Specifically, the performance data may include a CPU usage amount of the ETL software, a memory usage amount of the ETL software, a bandwidth usage amount of the ETL software, and a data width corresponding to the ETL software.
The first processing unit 302 calculates the quotient of the CPU usage of the ETL software and the total CPU amount corresponding to the ETL software, thereby normalizing the CPU usage of the ETL software; the quotient is used as the CPU utilization rate of the ETL software. The total CPU amount corresponding to the ETL software is the total CPU amount of the server on which the ETL software runs, and is obtained in advance.
The first processing unit 302 calculates the quotient of the memory usage of the ETL software and the total memory amount corresponding to the ETL software, thereby normalizing the memory usage of the ETL software; the quotient is used as the memory utilization rate of the ETL software. The total memory amount corresponding to the ETL software is the total memory amount of the server on which the ETL software runs, and is obtained in advance.
The first processing unit 302 calculates the quotient of the bandwidth usage of the ETL software and the total bandwidth amount corresponding to the ETL software, thereby normalizing the bandwidth usage of the ETL software; the quotient is used as the bandwidth utilization rate of the ETL software. The total bandwidth amount corresponding to the ETL software is the total bandwidth amount of the server on which the ETL software runs, and is obtained in advance.
The first processing unit 302 calculates the quotient of the data width corresponding to the ETL software and the maximum data width, thereby normalizing the data width corresponding to the ETL software; the quotient is used as the data enrichment rate of the ETL software. The maximum data width is preset according to practical experience and is not limited by the embodiment of the invention.
The CPU utilization rate, the memory utilization rate, the bandwidth utilization rate, and the data enrichment rate of the ETL software together serve as the characteristic parameters.
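The four quotients described above can be sketched as follows. The class and field names (`PerformanceData`, `Capacity`, `normalize`) are hypothetical and chosen only for illustration; units are assumed to match between each usage value and its total.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PerformanceData:
    cpu_used: float        # CPU usage of the ETL software
    mem_used: float        # memory usage of the ETL software
    bandwidth_used: float  # bandwidth usage of the ETL software
    data_width: float      # data width corresponding to the ETL software


@dataclass(frozen=True)
class Capacity:
    cpu_total: float        # total CPU amount of the server
    mem_total: float        # total memory amount of the server
    bandwidth_total: float  # total bandwidth amount of the server
    max_data_width: float   # preset maximum data width


def ratio(used: float, total: float) -> float:
    """Return used / total, rejecting a non-positive denominator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total


def normalize(perf: PerformanceData, cap: Capacity) -> Tuple[float, float, float, float]:
    """Return (CPU utilization, memory utilization, bandwidth utilization, data enrichment rate)."""
    return (
        ratio(perf.cpu_used, cap.cpu_total),
        ratio(perf.mem_used, cap.mem_total),
        ratio(perf.bandwidth_used, cap.bandwidth_total),
        ratio(perf.data_width, cap.max_data_width),
    )
```

For example, 8 of 16 CPU cores, 32 of 64 GB of memory, 250 of 1000 Mbit/s of bandwidth, and a data width of 40 against a maximum of 200 would give the characteristic parameters 0.5, 0.5, 0.25, and 0.2.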
The apparatus embodiment provided by the present invention may be used to execute the processing flows of the above method embodiments. Its functions are not described again here; refer to the detailed description of the above method embodiments.
It should be noted that the method and the device for dynamically adjusting the ETL workload provided by the embodiment of the present invention can be used in the financial field and can also be used in any other technical field.
Fig. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 5, the electronic device may include a processor 501, a communications interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communications interface 502, and the memory 503 communicate with each other via the communication bus 504. The processor 501 may call logic instructions in the memory 503 to perform the following method: periodically obtaining performance data of the running environment of the ETL software; normalizing the performance data to obtain characteristic parameters; obtaining a step length value according to the characteristic parameters and a dynamic estimation model, wherein the dynamic estimation model is obtained by training on performance training data of the ETL software; and calculating the product of the step length value and a preset coefficient to serve as the current extraction step length.
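The method executed by the processor can be sketched as a periodic loop. This is a minimal sketch under stated assumptions: the trained dynamic estimation model is represented as any callable from characteristic parameters to a step length value, and `sample_features` is a hypothetical callable that obtains and normalizes the current performance data. Truncating the product to an integer of at least 1 is an added assumption, since the embodiment only specifies the product itself.

```python
import time
from typing import Callable, Iterator, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]


def current_extraction_step(features: Features, model: Model,
                            preset_coefficient: float) -> int:
    """Step length value from the model, multiplied by the preset coefficient."""
    step_value = model(features)
    # Assumption: an extraction step is a positive whole number of records.
    return max(1, int(step_value * preset_coefficient))


def adjustment_loop(sample_features: Callable[[], Features], model: Model,
                    preset_coefficient: float, period_seconds: float,
                    rounds: int) -> Iterator[int]:
    """Yield a new extraction step once per period, for a fixed number of rounds."""
    for i in range(rounds):
        yield current_extraction_step(sample_features(), model, preset_coefficient)
        if i < rounds - 1:
            time.sleep(period_seconds)  # wait for the next acquisition period
```

A caller would consume the generator and pass each yielded value to the ETL software as its extraction step for the next period.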
In addition, the logic instructions in the memory 503 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The present embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method provided by the above-mentioned method embodiments, for example, comprising: periodically obtaining performance data of the running environment of the ETL software; normalizing the performance data to obtain characteristic parameters; obtaining a step length value according to the characteristic parameters and a dynamic estimation model; wherein the dynamic estimation model is obtained based on the performance training data training of ETL software; and calculating the product of the step length value and a preset coefficient to serve as the current extraction step length.
The present embodiment provides a computer-readable storage medium, which stores a computer program, where the computer program causes the computer to execute the method provided by the above method embodiments, for example, the method includes: periodically obtaining performance data of the running environment of the ETL software; normalizing the performance data to obtain characteristic parameters; obtaining a step length value according to the characteristic parameters and a dynamic estimation model; wherein the dynamic estimation model is obtained based on the performance training data training of ETL software; and calculating the product of the step length value and a preset coefficient to serve as the current extraction step length.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for dynamically adjusting an ETL workload, comprising:
periodically obtaining performance data of the running environment of the ETL software;
normalizing the performance data to obtain characteristic parameters;
obtaining a step length value according to the characteristic parameters and a dynamic estimation model; wherein the dynamic estimation model is obtained based on the performance training data training of ETL software;
and calculating the product of the step length value and a preset coefficient to serve as the current extraction step length.
2. The method of claim 1, wherein the step of training the dynamic estimation model based on the ETL software performance training data comprises:
acquiring the performance training data;
performing normalization processing on the performance training data to obtain a feature training set;
and obtaining the dynamic estimation model according to the feature training set and the initial model.
3. The method of claim 2, wherein the obtaining the performance training data comprises:
performing data cleaning on the performance raw data to remove abnormal values.
4. The method according to any one of claims 1 to 3, wherein the performance data comprises CPU usage of the ETL software, memory usage of the ETL software, bandwidth usage of the ETL software, and data width corresponding to the ETL software; correspondingly, the normalizing the performance data to obtain the characteristic parameters includes:
calculating and obtaining the CPU utilization rate of the ETL software according to the CPU usage of the ETL software and the total amount of CPUs corresponding to the ETL software;
calculating and obtaining the memory utilization rate of the ETL software according to the memory usage of the ETL software and the total memory amount corresponding to the ETL software;
calculating and obtaining the bandwidth utilization rate of the ETL software according to the bandwidth usage of the ETL software and the total bandwidth amount corresponding to the ETL software;
and calculating to obtain the data enrichment rate of the ETL software according to the data width corresponding to the ETL software and the maximum data width.
5. An apparatus for dynamic adjustment of an ETL workload, comprising:
the obtaining unit is used for periodically obtaining performance data of the running environment of the ETL software;
the first processing unit is used for carrying out normalization processing on the performance data to obtain characteristic parameters;
the estimation unit is used for obtaining a step length value according to the characteristic parameters and the dynamic estimation model; wherein the dynamic estimation model is obtained based on the performance training data training of ETL software;
and the calculating unit is used for calculating the product of the step length numerical value and a preset coefficient to serve as the current extraction step length.
6. The apparatus of claim 5, further comprising:
an obtaining unit configured to obtain the performance training data;
the second processing unit is used for carrying out normalization processing on the performance training data to obtain a feature training set;
and the training unit is used for obtaining the dynamic estimation model according to the characteristic training set and the initial model.
7. The apparatus according to claim 6, wherein the obtaining unit is specifically configured to:
performing data cleaning on the performance raw data to remove abnormal values.
8. The apparatus according to any one of claims 5 to 7, wherein the performance data includes a CPU usage of the ETL software, a memory usage of the ETL software, a bandwidth usage of the ETL software, and a data width corresponding to the ETL software; correspondingly, the first processing unit is specifically configured to:
calculating and obtaining the CPU utilization rate of the ETL software according to the CPU usage of the ETL software and the total amount of CPUs corresponding to the ETL software;
calculating and obtaining the memory utilization rate of the ETL software according to the memory usage of the ETL software and the total memory amount corresponding to the ETL software;
calculating and obtaining the bandwidth utilization rate of the ETL software according to the bandwidth usage of the ETL software and the total bandwidth amount corresponding to the ETL software;
and calculating to obtain the data enrichment rate of the ETL software according to the data width corresponding to the ETL software and the maximum data width.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 4 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202010810516.8A 2020-08-13 2020-08-13 Dynamic adjustment method and device for ETL (extract transform load) working load Pending CN111897865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010810516.8A CN111897865A (en) 2020-08-13 2020-08-13 Dynamic adjustment method and device for ETL (extract transform load) working load

Publications (1)

Publication Number Publication Date
CN111897865A true CN111897865A (en) 2020-11-06

Family

ID=73229301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010810516.8A Pending CN111897865A (en) 2020-08-13 2020-08-13 Dynamic adjustment method and device for ETL (extract transform load) working load

Country Status (1)

Country Link
CN (1) CN111897865A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107659595A (en) * 2016-07-25 2018-02-02 Alibaba Group Holding Limited Method and apparatus for assessing the capability of a distributed cluster to process a specified service
CN107872480A (en) * 2016-09-26 2018-04-03 China Telecom Corporation Limited Big data cluster data balancing method and apparatus
CN108009016A (en) * 2016-10-31 2018-05-08 Huawei Technologies Co., Ltd. Resource load balancing control method and cluster scheduler
CN108846076A (en) * 2018-06-08 2018-11-20 Dareway Software Co., Ltd. Massive multi-source ETL processing method and system supporting interface adaptation
CN110287245A (en) * 2019-05-15 2019-09-27 North China University of Technology Method and system for scheduling and executing distributed ETL (extract transform load) tasks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
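Dong Lu: "Research on Spark-Based Cloud Platform Performance Evaluation Technology", China Master's Theses Full-text Database, Information Science and Technology *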

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210122

Address after: 100140, 55, Fuxing Avenue, Xicheng District, Beijing

Applicant after: INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Applicant after: ICBC Technology Co.,Ltd.

Address before: 071700 unit 111, 1st floor, building C, enterprise office area, xiong'an Civic Service Center, Rongcheng County, xiong'an District, Baoding pilot Free Trade Zone, Hebei Province

Applicant before: ICBC Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20201106