CN112597233B - Batch processing method, device and equipment for data indexes and storage medium - Google Patents

Info

Publication number
CN112597233B
Authority
CN
China
Prior art keywords
data
processing
index
task
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011593546.4A
Other languages
Chinese (zh)
Other versions
CN112597233A (en
Inventor
张广智
梁海涛
邢远辉
邵骋
余祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd filed Critical Ping An Bank Co Ltd
Priority to CN202011593546.4A priority Critical patent/CN112597233B/en
Publication of CN112597233A publication Critical patent/CN112597233A/en
Application granted granted Critical
Publication of CN112597233B publication Critical patent/CN112597233B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a batch processing method, device, equipment and storage medium for data indexes, and belongs to the field of data standardization. The method comprises the following steps: scanning for index processing tasks based on configured task parameters, the task parameters comprising a scanning mode, a trigger condition and a calling interface; when an index processing task meeting the trigger condition is scanned, calling the index processing task through the calling interface; and executing the index processing task based on a configured index processing flow to obtain a data index meeting the index requirement, the index processing flow comprising data source information and data processing logic information of the index processing task. According to the invention, the data processing logic is embedded in the nodes of the index processing flow in the form of configuration files, so that it is easy to update and replace, the time needed to change the logic is greatly shortened, and development and release efficiency is improved.

Description

Batch processing method, device and equipment for data indexes and storage medium
Technical Field
The present invention relates to the field of data standardization, and in particular to a batch processing method, apparatus, device and storage medium for data indexes.
Background
A data index is an indicator that reflects the operating condition of a business; it is extracted by analyzing and summarizing the ODS (operational data store) data that an enterprise produces in the course of its operation. Processing a data index generally comprises cleaning the ODS data and loading it into a warehouse, refining the logic of the data index, and implementing that logic technically. By analyzing data through data indexes, an enterprise can understand its operating condition more clearly and make decisions faster and better, which reduces decision risk and makes market opportunities easier to seize.
The data index processing platforms used by enterprises in the industry are generally built on big data platform technology. Such systems involve many interrelated technologies and many dependencies, the chain of the whole process is long, and the technology is difficult to export. Before the index processing logic is finalized, repeated debugging, refinement and verification are needed to confirm that the index is accurate and effective. During debugging, the change process for the index processing logic is long, which makes development and debugging difficult. In addition, the index processing logic differs from one institution to another, support for multiple institutions is weak, and multi-institution support can be achieved only through extensive changes.
Disclosure of Invention
The technical problem to be solved by the invention is the long change process for index processing logic in the prior art; to this end, the invention provides a batch processing method, device, equipment and storage medium for data indexes.
The invention solves the technical problems by the following technical scheme:
A batch processing method of data indexes comprises the following steps:
Scanning an index processing task based on the configured task parameters; the task parameters comprise triggering conditions and calling interfaces; when an index processing task meeting a trigger condition is scanned, calling the index processing task through the calling interface;
Executing the index processing task based on the configured index processing flow to obtain a data index meeting the index requirement; the nodes of the index processing flow comprise data source information and data processing logic information of the index processing task.
According to the technical scheme, the processing logic of the index processing task is configured by combining the timing task with the index processing flow, so that the batch processing of the data indexes is realized.
Preferably, the step of performing the index processing task based on the configured index processing procedure includes:
Extracting data to be processed from a corresponding database according to the data source information;
and processing the data to be processed according to the data processing logic information to obtain a data index meeting the index requirement.
In the technical scheme, the index processing task supports multiple databases, and can process data indexes of data in the multiple databases at the same time.
Preferably, the step of extracting the data to be processed from the corresponding database according to the data source information includes:
Intercepting data source information from the index processing flow, wherein the data source information at least comprises a database;
Connecting to the corresponding database through an abstract connection, based on the data source information;
extracting data to be processed from the connected database and temporarily storing the data in a resource library, wherein the data to be processed is provided with a data flow identifier corresponding to the database.
In the technical scheme, the index processing task supports multiple databases and performs data isolation through the data flow identification, so that the data safety is ensured.
Preferably, the data processing logic information at least comprises a set of data processing rules, the data processing rules are configured in the property file, and the data processing rules are in one-to-one correspondence with the property file;
each database is at least configured with a set of data processing rules;
the step of processing the extracted data to be processed comprises the following steps:
Intercepting the data processing logic information from the index processing flow, wherein the data processing logic information comprises a file name of at least one property file;
based on the data processing logic information, loading a corresponding property file into the resource library;
And processing the data to be processed in the resource library according to the data processing rule configured in the property file.
In this technical scheme, the data processing rules are configured in property files and are loaded by loading those files, which is simple and convenient. The index processing task can be adapted to a variety of industry scenarios: the index processing rules for each scenario can be embedded in the processing nodes of the processing task flow, so the range of application is wide.
Preferably, before the step of intercepting the data processing logic information from the index processing flow, the method further comprises:
traversing the property file;
and when the property file is updated, correspondingly updating the data processing logic information in the index processing flow.
In this technical scheme, the data processing rules are configured in property files, and old logic is replaced by loading a property file, which improves development efficiency and release efficiency.
Preferably, the data index is obtained after the corresponding data to be processed is processed, and the data index is provided with the data flow identifier on the corresponding data to be processed;
After the step of obtaining the data index meeting the index requirement, the method further comprises the following steps:
and storing the data index into a database corresponding to the data flow identifier based on the data flow identifier carried on the data index.
According to the technical scheme, the processed data indexes are ensured to be put into the corresponding databases through the data flow identifiers, so that data leakage is prevented.
Preferably, the data processing rule comprises a data date parameter and data processing logic;
the step of processing the data to be processed in the resource library according to the data processing rule configured in the property file comprises the following steps:
intercepting data date parameters in the data processing rule;
According to the data date parameter, finding, among the data to be processed, the data generated on the date indicated by the data date parameter;
and processing the found data according to the data processing logic.
In this technical scheme, data of any date can be processed according to the data date parameter in the data processing rule, so that the data of that date is processed into index data meeting the index requirement.
The invention also discloses a batch processing device of the data indexes, which comprises:
The task management module is used for scanning the index processing task based on the configured task parameters; the task parameters comprise a scanning mode, a triggering condition and a calling interface; when an index processing task meeting a trigger condition is scanned, calling the index processing task through the calling interface;
the data processing module is used for executing the index processing task based on the configured index processing flow to obtain a data index meeting the index requirement; the nodes of the index processing flow comprise data source information and data processing logic information of the index processing task.
The invention also discloses a computer device, which comprises a memory and a processor, wherein the memory is stored with a computer program, and the computer program realizes the steps of the batch processing method of the data indexes in any one of the technical schemes when being executed by the processor.
The invention also discloses a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program can be executed by at least one processor to realize the steps of the batch processing method of the data indexes in any of the previous technical schemes.
The beneficial effects of the invention are as follows: the processing of data indexes is executed periodically as a timing task; the data processing logic is embedded in the nodes of the index processing flow in the form of configuration files, so that it is easy to update and replace, the time needed to change the logic is greatly shortened, and development and release efficiency is improved; and the data sources are configured in the nodes so that the data of different institutions is isolated, which improves data security.
Drawings
FIG. 1 is a flow chart of a first embodiment of the batch processing method for data indexes of the present invention;
FIG. 2 is a detailed flow chart of step 2 in the first embodiment;
FIG. 3 is a detailed flow chart of step 21 in the first embodiment;
FIG. 4 is a detailed flow chart of step 22 in the first embodiment;
FIG. 5 is a detailed flow chart of step 225 in the first embodiment;
FIG. 6 is a block diagram of a first embodiment of the batch processing device for data indexes of the present invention;
FIG. 7 is a schematic diagram of the hardware architecture of an embodiment of the computer device of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Firstly, the invention provides a batch processing method of data indexes.
In a first embodiment, as shown in fig. 1, the batch processing method of the data index includes the following steps:
Step 1: scanning an index processing task based on the configured task parameters; the task parameters comprise triggering conditions and calling interfaces; and when the index processing task meeting the triggering condition is scanned, calling the index processing task through the calling interface.
Monitoring and scanning of tasks is implemented by a task management platform; an open-source task management platform available on the market, TASKMANAGER, may be adopted.
The index processing task can be configured in the task management platform as a timing task. Before the task management platform is used, the task parameters of the timing task are configured in it; to simplify configuration, this is generally done through a visual interface or an API. The task parameters include the scanning mode of the task, the trigger condition, the calling interface of the system that executes the task (here, the calling interface of the ETL platform), and the judgment mode. Various scanning modes are available, and any one of polling, random, consistent hash, least frequently used, least recently used, failover and busy transfer can be selected. The trigger condition may be the scheduled execution time of each task. The judgment mode is the way of judging whether the trigger condition is satisfied; here it may mean checking the current time to judge whether the execution time of the task has been reached.
After the task parameters are configured, the task management platform continuously scans the timing tasks in the background according to the selected scanning mode and triggers the timing tasks that meet their conditions; for example, the index processing task starts executing at midnight every day. After the index processing task is triggered, the task management platform calls the ETL (Extract-Transform-Load) platform that executes the index processing task through the configured calling interface.
When an index processing task meeting the condition is scanned, the index processing task is called and executed.
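The scan-and-trigger behaviour of step 1 can be sketched roughly as follows. This is only an illustration under assumptions: the `TimedTask` record, the `is_due` and `scan` helpers and the `fake_etl_interface` stand-in are hypothetical names, not part of TASKMANAGER or of the patent, and only the polling scan mode with a daily execution time is shown.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, List, Optional


@dataclass
class TimedTask:
    """Hypothetical task record: name, trigger condition and calling interface."""
    name: str
    run_at: time                      # trigger condition: daily execution time
    call: Callable[[str], str]        # calling interface of the executing system
    last_run: Optional[datetime] = None


def is_due(task: TimedTask, now: datetime) -> bool:
    """Judgment mode: compare the current time with the configured execution time."""
    already_ran_today = task.last_run is not None and task.last_run.date() == now.date()
    return now.time() >= task.run_at and not already_ran_today


def scan(tasks: List[TimedTask], now: datetime) -> List[str]:
    """Polling scan: visit every task in order and call the ones that are due."""
    results = []
    for task in tasks:
        if is_due(task, now):
            results.append(task.call(task.name))
            task.last_run = now
    return results


def fake_etl_interface(task_name: str) -> str:
    # Stand-in for the ETL platform's calling interface (assumption).
    return f"started {task_name}"


tasks = [
    TimedTask("daily_indicators", time(0, 0), fake_etl_interface),
    TimedTask("noon_indicators", time(12, 0), fake_etl_interface),
]
first = scan(tasks, datetime(2020, 12, 29, 0, 5))    # midnight task fires once
second = scan(tasks, datetime(2020, 12, 29, 0, 10))  # nothing new is due
```

A real platform would run the scan in a background loop and support the other scanning modes; the sketch passes the current time in explicitly so that the behaviour is easy to check.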
Step 2: executing the index processing task based on the configured index processing flow to obtain a data index meeting the index requirement; the index processing flow comprises data source information and data processing logic information of the index processing task.
The index processing task here may be implemented by an ETL platform, that is, a system tool that extracts (Extract), transforms (Transform) and loads (Load) data from a source end to a destination end; specifically, Kettle, an ETL platform written in Java, may be used.
When the ETL platform is used, the index processing flow of the index processing task needs to be configured in the ETL platform in advance. Information such as the data source and the data processing logic can be configured at each processing node of the index processing flow, which is simple and convenient. The corresponding operations are then executed by intercepting the data source information or the data processing logic information from the index processing flow. As shown in fig. 2, this specifically comprises the following steps:
Step 21: extracting the data to be processed from the corresponding database according to the data source information.
Step 22: processing the data to be processed according to the data processing logic information to obtain a data index meeting the index requirement.
As shown in fig. 3, step 21 includes the following sub-steps:
Step 211: intercepting data source information from the index processing flow, wherein the data source information at least comprises a database.
The data source information can comprise a plurality of databases, so that the index processing task supports multiple databases. For example, when the same index processing task is used to process the data of several different institutions, the data of each institution is naturally stored in a different database; because several databases can be configured in the data source information, the generality of the index processing task is greatly enhanced.
Step 212: connecting to the corresponding database through an abstract connection, based on the data source information.
When the data source information contains a plurality of databases, each database needs to be accessed; with an abstract connection, the data in every database can be queried and acquired through the same function (method).
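One way to picture the abstract connection is a thin wrapper that exposes a single query method whichever database sits behind it. The sketch below is an assumption-laden illustration, not the patent's implementation: the `DataSource` class and `connect_all` helper are hypothetical, and two in-memory SQLite databases stand in for the databases of two institutions.

```python
import sqlite3
from typing import Dict, List, Tuple


class DataSource:
    """Hypothetical abstraction: one query method for every configured database."""

    def __init__(self, name: str, connection: sqlite3.Connection):
        self.name = name
        self._conn = connection

    def query(self, sql: str, params: Tuple = ()) -> List[tuple]:
        return self._conn.execute(sql, params).fetchall()


def connect_all(source_info: Dict[str, str]) -> Dict[str, DataSource]:
    """Open one connection per database listed in the data source information."""
    return {name: DataSource(name, sqlite3.connect(dsn)) for name, dsn in source_info.items()}


# Each ":memory:" connection is a separate, independent SQLite database.
sources = connect_all({"org_001": ":memory:", "org_002": ":memory:"})
for i, src in enumerate(sources.values(), start=1):
    src.query("CREATE TABLE deposits (amount INTEGER)")
    src.query("INSERT INTO deposits VALUES (?)", (100 * i,))

# The same function call works against every database.
totals = {name: src.query("SELECT SUM(amount) FROM deposits")[0][0]
          for name, src in sources.items()}
```

In Kettle the equivalent role is played by database connections configured on the flow's nodes; the wrapper here only shows why a uniform query method makes multi-database support cheap.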
Step 213: extracting data to be processed from the connected database and temporarily storing the data in a resource library, wherein the data to be processed is provided with a data flow identifier corresponding to the database.
For data security, the data of different databases needs to be isolated, so the data to be processed carries a data flow identifier that identifies the source of the data. For example, the database of each institution is assigned an organization number, and the organization number can be used directly as the data flow identifier: it is associated with the data acquired from that database so that the source of each piece of data can be identified during subsequent warehousing, processing and so on. Because organization numbers correspond to databases, after new data has been produced by later processing it can be stored in the right place simply by reading the organization number, which avoids data leakage.
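The tagging described above can be sketched as follows. The `flow_id` field name, the plain-dictionary rows and the list used as the temporary resource library are all assumptions made for illustration.

```python
from typing import Dict, List


def extract_with_flow_id(databases: Dict[str, List[dict]]) -> List[dict]:
    """Copy rows into a shared staging area, tagging each with its organization number."""
    staging = []
    for org_no, rows in databases.items():
        for row in rows:
            # The organization number doubles as the data flow identifier.
            staging.append({**row, "flow_id": org_no})
    return staging


def rows_for(staging: List[dict], flow_id: str) -> List[dict]:
    """Isolation check: select only the rows that came from one database."""
    return [row for row in staging if row["flow_id"] == flow_id]


staging = extract_with_flow_id({
    "001": [{"amount": 10}],
    "002": [{"amount": 20}, {"amount": 30}],
})
```

Because every staged row carries its identifier, rows from different institutions can share one staging area and still be separated again at any later step.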
The data processing logic information comprises at least one set of data processing rules. The data processing rules are configured in property files, and the sets of data processing rules correspond one-to-one to the property files. Each database is configured with at least one set of data processing rules; in practice, several sets are often configured for one database to meet different business needs. Different data processing rules are distinguished by dimension, and the data processing rules of the same dimension are configured in one property file. This is equivalent to defining the indexes of one dimension in the same table: each index is defined as a field of the table, and the data processing rule of the index is defined as the attribute value of that index. A data processing rule is in fact simply a string of SQL statements.
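As a rough illustration of rules kept as key/value pairs, the sketch below parses a hypothetical property file for one dimension, in which each key is an index name and each value is the SQL statement that computes it. The file content, the table and column names and the minimal `parse_properties` helper (plain `key=value` lines and `#` comments only) are assumptions, not the format used by the patent.

```python
import sqlite3
from typing import Dict

# Hypothetical content of a property file for the "customer" dimension.
CUSTOMER_RULES = """
# index name = SQL statement that computes it
customer_count=SELECT COUNT(*) FROM customers
total_balance=SELECT SUM(balance) FROM customers
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal parser: key=value lines, blank lines and '#' comments skipped."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        rules[key.strip()] = value.strip()
    return rules


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [("a", 10), ("b", 32)])

rules = parse_properties(CUSTOMER_RULES)
# Each rule is executed as-is; the result becomes the value of that index.
indicators = {name: conn.execute(sql).fetchone()[0] for name, sql in rules.items()}
```

Changing an index then means editing one line of the property file rather than the flow itself, which is the convenience the description attributes to this layout.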
As shown in fig. 4, step 22 may specifically include the following substeps:
Step 221: traversing the property files.
The data processing rules are configured in the property files, and the property files can be updated. Therefore, before the data processing rules are extracted from the index processing flow, all property files need to be traversed to determine whether there are new data processing rules.
Step 222: when a property file is updated, correspondingly updating the data processing logic information in the index processing flow.
Here, an update means that a new property file, that is, a new data processing rule, has been added. When a new property file is added, the data processing logic information must be updated in the index processing flow before it is intercepted from that flow; otherwise the new data processing rule will not be executed. Replacing old logic by loading property files effectively improves development efficiency and release efficiency.
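Steps 221 and 222 can be sketched as a directory traversal that registers any property file not yet known to the flow. The `refresh_logic_info` helper, the `.properties` suffix and the use of a set of file names as the flow's data processing logic information are assumptions made for this illustration.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def refresh_logic_info(rule_dir: Path, known: Set[str]) -> List[str]:
    """Traverse the rule directory and register property files not seen before."""
    added = []
    for path in sorted(rule_dir.glob("*.properties")):
        if path.name not in known:
            known.add(path.name)      # update the flow's logic information
            added.append(path.name)
    return added


with tempfile.TemporaryDirectory() as tmp:
    rule_dir = Path(tmp)
    flow_logic_info: Set[str] = set()

    (rule_dir / "customer.properties").write_text("customer_count=SELECT 1\n")
    first_pass = refresh_logic_info(rule_dir, flow_logic_info)

    # A new rule file appears between two runs of the task.
    (rule_dir / "loan.properties").write_text("loan_count=SELECT 1\n")
    second_pass = refresh_logic_info(rule_dir, flow_logic_info)
```

The sketch only detects added files, which matches the description's definition of an update; detecting edits to existing files would need a further check such as comparing modification times.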
Step 223: intercepting the data processing logic information from the index processing flow, wherein the data processing logic information comprises a file name of at least one property file.
Since the data processing logic information is embedded in the processing nodes of the index processing flow, it is acquired by interception.
Step 224: based on the data processing logic information, loading the corresponding property file into the resource library.
The data processing logic information contains file names of the property files, so that after the data processing logic information is intercepted, the corresponding property files are loaded into the resource library according to the file names of the property files contained in the data processing logic information.
Step 225: processing the data to be processed in the resource library according to the data processing rules configured in the property file.
The data processing rules cover extraction, cleaning and processing of the data; after extraction, cleaning and processing, the data to be processed yields data indexes meeting the index requirements.
The data processing rules here may include a data date parameter that specifies which day's data is to be processed. By further introducing a zipper table (rendered literally as "pull chain table") and combining it with the data date parameter in the data processing rule, data of any date can be processed into index data meeting the index requirements. As shown in fig. 5, step 225 specifically includes the following substeps:
Step 2251: intercepting the data date parameter in the data processing rule;
Step 2252: according to the data date parameter, finding, among the data to be processed, the data generated on the date indicated by the data date parameter;
Step 2253: processing the found data according to the data processing logic.
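The three substeps can be sketched with a rule that holds a data date parameter and an SQL statement with a named placeholder. The rule layout, the `transactions` table and the `run_rule` helper are hypothetical; the zipper table is not modelled here, only the date filtering.

```python
import sqlite3

# Hypothetical data processing rule: a data date plus the processing logic.
rule = {
    "data_date": "2020-12-28",
    "logic": "SELECT SUM(amount) FROM transactions WHERE txn_date = :data_date",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("2020-12-27", 5), ("2020-12-28", 7), ("2020-12-28", 8)],
)


def run_rule(connection: sqlite3.Connection, rule: dict) -> int:
    # Step 2251: read the data date parameter from the rule.
    data_date = rule["data_date"]
    # Steps 2252 and 2253: the WHERE clause selects that date's rows and
    # the aggregate processes them into the index value.
    return connection.execute(rule["logic"], {"data_date": data_date}).fetchone()[0]


value_for_28th = run_rule(conn, rule)
value_for_27th = run_rule(conn, {**rule, "data_date": "2020-12-27"})
```

Re-running the same rule with a different date parameter is what allows data of any date to be reprocessed without touching the logic.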
The data index is obtained by processing the corresponding data to be processed, so the data index carries the data flow identifier of that data. Therefore, after a data index meeting the index requirement is obtained, it can be stored in the database corresponding to the data flow identifier it carries. When processed data indexes are put into storage, the database is selected according to the data flow identifier, which further prevents data leakage.
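Routing processed indexes back by their data flow identifier might look like the sketch below. The `indicators` table, the `flow_id` field and the `open_target` and `store_indicators` helpers are assumptions; in-memory SQLite databases stand in for the institutions' databases.

```python
import sqlite3
from typing import Dict, List


def open_target() -> sqlite3.Connection:
    """Create a stand-in target database with an empty indicators table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE indicators (name TEXT, value REAL)")
    return conn


def store_indicators(indicators: List[dict], targets: Dict[str, sqlite3.Connection]) -> None:
    """Write each index to the database selected by its data flow identifier."""
    for ind in indicators:
        # An unknown identifier raises KeyError rather than guessing a target.
        conn = targets[ind["flow_id"]]
        conn.execute("INSERT INTO indicators VALUES (?, ?)", (ind["name"], ind["value"]))
        conn.commit()


targets = {"001": open_target(), "002": open_target()}
store_indicators(
    [
        {"flow_id": "001", "name": "total_balance", "value": 42.0},
        {"flow_id": "002", "name": "total_balance", "value": 7.0},
    ],
    targets,
)
stored = {org: conn.execute("SELECT name, value FROM indicators").fetchall()
          for org, conn in targets.items()}
```

Failing on an unknown identifier is a deliberate choice in this sketch: writing to a default database would defeat the isolation the identifier is meant to provide.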
In the first embodiment, the data indexes are processed as a timing task and are therefore executed periodically; the data processing logic is embedded in the nodes of the index processing flow in the form of configuration files, so that it is easy to update and replace and development and release efficiency is improved; and the data sources are configured in the nodes so that the data of different institutions is isolated, which improves data security.
Next, the present invention provides a batch processing device for data indexes; the device 20 can be divided into one or more modules.
For example, FIG. 6 shows a block diagram of a first embodiment of the batch processing device 20 for data indexes, in which the device 20 may be divided into a task management module 201 and a data processing module 202. The specific functions of the task management module 201 and the data processing module 202 are described below.
The task management module 201 is configured to perform scanning of an index processing task based on the configured task parameters; the task parameters comprise a scanning mode, a triggering condition and a calling interface; and when the index processing task meeting the triggering condition is scanned, calling the index processing task through the calling interface.
Monitoring and scanning of tasks is implemented by a task management platform; an open-source task management platform available on the market, TASKMANAGER, may be adopted.
The index processing task can be configured in the task management platform as a timing task. Before the task management platform is used, the task parameters of the timing task are configured in it; to simplify configuration, this is generally done through a visual interface or an API. The task parameters include the scanning mode of the task, the trigger condition, the calling interface of the system that executes the task (here, the calling interface of the ETL platform), and the judgment mode. Various scanning modes are available, and any one of polling, random, consistent hash, least frequently used, least recently used, failover and busy transfer can be selected. The trigger condition may be the scheduled execution time of each task. The judgment mode is the way of judging whether the trigger condition is satisfied; here it may mean checking the current time to judge whether the execution time of the task has been reached.
After the task parameters are configured, the task management platform continuously scans the timing tasks in the background according to the selected scanning mode and triggers the timing tasks that meet their conditions; for example, the index processing task starts executing at midnight every day. After the index processing task is triggered, the task management platform calls the ETL (Extract-Transform-Load) platform that executes the index processing task through the configured calling interface.
The data processing module 202 is configured to execute the index processing task based on the configured index processing procedure, so as to obtain a data index meeting the index requirement; the index processing flow comprises data source information and data processing logic information of the index processing task.
The index processing task here may be implemented by an ETL platform, that is, a system tool that extracts (Extract), transforms (Transform) and loads (Load) data from a source end to a destination end; specifically, Kettle, an ETL platform written in Java, may be used.
When the ETL platform is used, the index processing flow of the index processing task needs to be configured in the ETL platform in advance. Information such as the data source and the data processing logic can be configured at each processing node of the index processing flow, which is simple and convenient. The corresponding operations are then executed by intercepting the data source information or the data processing logic information from the index processing flow. In particular, the data processing module 202 may be further divided into a data extraction sub-module and a processing sub-module:
The data extraction sub-module is used for extracting data to be processed from the corresponding database according to the data source information.
The processing sub-module is used for processing the data to be processed according to the data processing logic information so as to obtain data indexes meeting index requirements.
Wherein, the data extraction submodule can be further divided into a data source intercepting unit, a database connecting unit and a data extraction unit.
The data source intercepting unit is used for intercepting data source information from the index processing flow, and the data source information at least comprises a database.
The data source information can comprise a plurality of databases, so that the index processing task supports multiple databases. For example, when the same index processing task is used to process the data of several different institutions, the data of each institution is naturally stored in a different database; because several databases can be configured in the data source information, the generality of the index processing task is greatly enhanced.
The database connection unit is used for connecting to the corresponding database through an abstract connection, based on the data source information.
When the data source information contains a plurality of databases, each database needs to be accessed; with an abstract connection, the data in every database can be queried and acquired through the same function (method).
The data extraction unit is used for extracting data to be processed from the connected database and temporarily storing the data in a resource library, wherein the data to be processed is provided with a data flow identifier corresponding to the database.
For data security, the data of different databases need to be isolated from one another, so the data to be processed carries a data flow identifier through which the source of the data can be identified. For example, an organization number is assigned to the database of each organization, and the organization number can be used directly as the data flow identifier and associated with the data acquired from that database, so that the source of each piece of data can be identified during subsequent warehousing, processing and so on. Through the correspondence between organization numbers and databases, newly processed data can later be stored separately simply by identifying the organization number, which avoids data leakage.
The data processing logic information comprises at least one set of data processing rules. The data processing rules are configured in property files, and the sets of data processing rules are in one-to-one correspondence with the property files. Each database is configured with at least one set of data processing rules, but because different business needs usually call for different rules, a database is often configured with several sets. Different sets of data processing rules are distinguished by dimension, and the data processing rules of the same dimension are configured in one property file. This is equivalent to defining the indexes of the same dimension in the same table: each index is defined as a field of the table, and the data processing rule of the index is defined as the attribute value of that field. Each data processing rule is in practice simply a string of SQL statements.
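A property file of this kind might look like the sample below, with one index per key and its SQL rule as the value. The file name, index names, table and SQL are all hypothetical, and the `key = value` layout is an assumption, since the patent does not specify the file syntax. The parser is a minimal sketch that splits each line on the first `=` only, so that `=` signs inside the SQL are preserved.

```python
# Hypothetical property file for one dimension: index name = SQL rule.
SAMPLE = """\
# customer_dimension.properties
total_customers = SELECT COUNT(*) FROM customer
active_customers = SELECT COUNT(*) FROM customer WHERE status = 'active'
"""


def parse_property_text(text):
    """Parse `key = value` lines into {index_name: sql}; '#' lines are comments."""
    rules = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; later '=' signs belong to the SQL.
        key, _, value = line.partition("=")
        rules[key.strip()] = value.strip()
    return rules
```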
The processing sub-module may be further divided into a traversal unit, a logic update unit, a logic intercept unit, a logic load unit, and a data processing unit.
The traversing unit is used for traversing the property file.
The data processing rules are configured in the property file, and the property file is updatable, so that before the data processing rules are extracted from the index processing flow, all the property files need to be traversed to determine whether new data processing rules exist.
The logic update unit is used for correspondingly updating the data processing logic information in the index processing flow when the property files are updated.
Here, an update means that a new property file, and therefore a new set of data processing rules, has been added. When a new property file is added, the data processing logic information in the index processing flow needs to be updated before it is intercepted from the index processing flow; otherwise the new data processing rules will not be executed. Replacing old logic by loading property files in this way can effectively improve development efficiency and release efficiency.
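The traverse-then-update step can be sketched as below. The sketch assumes the property files live in one directory with a `.properties` suffix and that a flow node keeps the list of file names it already references; both assumptions, along with the `refresh_flow_node` name and the `property_files` key, are illustrative and not taken from the patent.

```python
from pathlib import Path


def refresh_flow_node(flow_node, rule_dir):
    """Add property files that appeared in `rule_dir` since the last traversal.

    `flow_node` is a hypothetical dict whose "property_files" list holds the
    file names already referenced by the index processing flow.
    Returns the newly added file names.
    """
    on_disk = {p.name for p in Path(rule_dir).glob("*.properties")}
    new_files = sorted(on_disk - set(flow_node["property_files"]))
    flow_node["property_files"].extend(new_files)
    return new_files
```

Calling this before each run means a newly dropped-in file is picked up without changing the flow definition by hand.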
The logic intercepting unit is used for intercepting the data processing logic information from the index processing flow, and the data processing logic information comprises a file name of at least one property file.
Since the data processing logic information is embedded in the processing node of the index processing flow, the data processing logic information is acquired by intercepting.
The logic loading unit is used for loading the corresponding property file into the resource library based on the data processing logic information.
The data processing logic information contains file names of the property files, so that after the data processing logic information is intercepted, the corresponding property files are loaded into the resource library according to the file names of the property files contained in the data processing logic information.
The data processing unit is used for processing the data to be processed in the resource library according to the data processing rule configured in the property file.
The data processing rule covers the extraction, cleaning and processing of data; after these steps, the data to be processed yields data indexes meeting the index requirements.
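Since each rule is a SQL statement, processing the staged data can be pictured as running every rule against the resource library and collecting one value per index. In this sketch an in-memory `sqlite3` database stands in for the resource library, and `compute_indexes` as well as the rule names are hypothetical.

```python
import sqlite3


def compute_indexes(resource_db, rules):
    """Evaluate each rule against the staged data.

    `rules` maps an index name to a SQL statement that returns a single value.
    Returns {index_name: value}; a rule that yields no row maps to None.
    """
    indexes = {}
    for name, sql in rules.items():
        row = resource_db.execute(sql).fetchone()
        indexes[name] = row[0] if row is not None else None
    return indexes
```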
The data processing rules may include a data date parameter that specifies which day's data is to be processed. By introducing a zipper table (rendered literally as a "pull chain table") and combining it with the data date parameter in the data processing rule, the data of any date can be processed into index data meeting the index requirements. To achieve this, the data processing unit may be further divided into a date intercepting subunit, a data matching subunit, and a data processing subunit.
The date intercepting subunit is used for intercepting the data date parameters in the data processing rule.
The data matching subunit is used for finding, from the data to be processed and according to the data date parameter, the data generated on the date represented by the data date parameter.
The data processing subunit is used for processing the found data according to the data processing logic.
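The three subunits can be illustrated with a small sketch. It assumes, as is common for zipper tables in data warehousing but not stated in the patent, that each row carries a validity interval `[start, end)` and that the row valid on the data date is the one to process. The field names, `rows_valid_on` and `process_rule` are hypothetical.

```python
from datetime import date


def rows_valid_on(zipper_rows, data_date):
    """Return the rows whose validity interval [start, end) contains data_date."""
    return [r for r in zipper_rows if r["start"] <= data_date < r["end"]]


def process_rule(rule, zipper_rows):
    """Apply a rule made of a data date parameter and a processing function."""
    matched = rows_valid_on(zipper_rows, rule["data_date"])
    return rule["logic"](matched)
```

Because the date is a parameter of the rule rather than the run time of the task, the same rule can be rerun for any past date.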
The data index is obtained by processing the corresponding data to be processed, so the data index carries the data flow identifier of that data. Therefore, after a data index meeting the index requirements is obtained, it can be stored into the database corresponding to the data flow identifier it carries. When the processed data indexes are warehoused, the target database is selected according to the data flow identifier, which further prevents data leakage.
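Routing by data flow identifier can be sketched as a lookup from identifier to target database. Here plain lists stand in for the per-organization databases, and `store_indexes` and the `flow_id` key are hypothetical names. An identifier with no configured database raises an error instead of falling back to some other store, which matches the isolation goal described above.

```python
def store_indexes(indexes, databases):
    """Write each data index to the database matching its data flow identifier.

    `databases` maps a data flow identifier (e.g. an organization number) to a
    list standing in for that organization's database. An unknown identifier
    raises KeyError rather than being written elsewhere.
    """
    for index in indexes:
        databases[index["flow_id"]].append(index)
```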
According to this embodiment, the index processing logic is embedded, in the form of configuration files, into the nodes of the index processing flow of the timing task, so that the index processing logic can be replaced quickly.
The invention further provides computer equipment.
Fig. 7 is a schematic diagram of a hardware architecture of an embodiment of a computer device according to the present invention. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, or a tower server (including a stand-alone server or a server cluster composed of a plurality of servers), etc. As shown in Fig. 7, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, and a network interface 23 communicatively coupled to each other via a system bus. Wherein:
The memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, a flash card or the like provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit of the computer device 2 and an external storage device. In this embodiment, the memory 21 is generally used to store an operating system and various application software installed on the computer device 2, such as a computer program for implementing the batch processing method of the data index. Further, the memory 21 may be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 2, such as performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 22 is configured to execute a program code stored in the memory 21 or process data, for example, execute a computer program or the like for implementing a batch processing method for the data index.
The network interface 23 may comprise a wireless network interface or a wired network interface, and is typically used for establishing a communication connection between the computer device 2 and other computer devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It is noted that fig. 7 only shows a computer device 2 having components 21-23, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented.
In the present embodiment, a computer program for implementing the batch processing method of the data index stored in the memory 21 may be executed by one or more processors (the processor 22 in the present embodiment) to perform the operations of:
Step 1: scanning an index processing task based on the configured task parameters; the task parameters comprise a scanning mode, a triggering condition and a calling interface; and when the index processing task meeting the triggering condition is scanned, calling the index processing task through the calling interface.
Step 2: executing the index processing task based on the configured index processing flow to obtain a data index meeting the index requirement; the nodes of the index processing flow comprise data source information and data processing logic information of the index processing task.
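Step 1 can be pictured as a scan loop that checks each task's trigger condition and, when it holds, invokes the task through its calling interface. The sketch below is an assumption-laden illustration: each task is a hypothetical dict with a `name`, a `trigger` predicate over the current time, and a `call` function standing in for the calling interface. The scanning mode (for example, how often the loop runs) is left to the caller.

```python
from datetime import datetime


def scan_tasks(tasks, now):
    """Invoke every task whose trigger condition holds at `now`.

    Each task is a hypothetical dict with a `name`, a `trigger` predicate and
    a `call` function standing in for the calling interface.
    Returns {task name: result of the call}.
    """
    results = {}
    for task in tasks:
        if task["trigger"](now):
            results[task["name"]] = task["call"]()
    return results
```

In a full pipeline, the function behind `call` would carry out Step 2 by running the configured index processing flow for that task.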
Furthermore, the present invention provides a computer-readable storage medium, which is a non-volatile readable storage medium storing a computer program, the computer program being executable by at least one processor to implement the operations of the batch processing method or apparatus for data indexes described above.
The computer-readable storage medium includes flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer-readable storage medium may also be an external storage device of a computer device, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, a flash card, or the like, provided on the computer device. Of course, the computer-readable storage medium may also include both an internal storage unit of a computer device and an external storage device. In this embodiment, the computer-readable storage medium is typically used to store an operating system and various types of application software installed on a computer device, such as a computer program for implementing the batch processing method of the data index. Furthermore, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.

Claims (8)

1. A batch processing method for data indexes, characterized by comprising the following steps:
Scanning an index processing task based on the configured task parameters; the task parameters comprise triggering conditions and calling interfaces; when the index processing task meeting the triggering condition is scanned, calling the index processing task through the calling interface;
Executing the index processing task based on the configured index processing flow to obtain a data index meeting the index requirement; the nodes of the index processing flow comprise data source information and data processing logic information of the index processing task;
the step of executing the index processing task based on the configured index processing flow comprises the following steps:
Extracting data to be processed from a corresponding database according to the data source information;
processing the data to be processed according to the data processing logic information to obtain data indexes meeting index requirements;
The step of extracting the data to be processed from the corresponding database according to the data source information comprises the following steps:
Intercepting data source information from the index processing flow, wherein the data source information at least comprises a database;
Based on the data source information, abstracting and connecting a corresponding database;
extracting data to be processed from the connected database and temporarily storing the data in a resource library, wherein the data to be processed is provided with a data flow identifier corresponding to the database.
2. The batch processing method of data indexes according to claim 1, wherein the data processing logic information at least comprises a set of data processing rules, the data processing rules are configured in a property file, and the data processing rules are in one-to-one correspondence with the property file;
each database is at least configured with a set of data processing rules;
the step of processing the data to be processed according to the data processing logic information comprises the following steps:
Intercepting the data processing logic information from the index processing flow, wherein the data processing logic information comprises a file name of at least one property file;
based on the data processing logic information, loading a corresponding property file into the resource library;
And processing the data to be processed in the resource library according to the data processing rule configured in the property file.
3. The batch processing method of data indexes according to claim 2, further comprising, prior to the step of intercepting the data processing logic information from the index processing flow:
traversing the property file;
and when the property file is updated, correspondingly updating the data processing logic information in the index processing flow.
4. The batch processing method of data indexes according to claim 2, wherein
the data index is obtained by processing the corresponding data to be processed, and the data index carries the data flow identifier of the corresponding data to be processed;
After the step of obtaining the data index meeting the index requirement, the method further comprises the following steps:
and storing the data index into a database corresponding to the data flow identifier based on the data flow identifier carried on the data index.
5. The batch processing method of data indexes according to claim 2, wherein the data processing rule comprises a data date parameter and data processing logic;
the step of processing the data to be processed in the resource library according to the data processing rule configured in the property file comprises the following steps:
intercepting data date parameters in the data processing rule;
according to the data date parameter, finding, from the data to be processed, the data generated on the date represented by the data date parameter;
and processing the found data according to the data processing logic.
6. A batch processing apparatus for data indexes, comprising:
The task management module is used for scanning the index processing task based on the configured task parameters; the task parameters comprise a scanning mode, a triggering condition and a calling interface; when the index processing task meeting the triggering condition is scanned, calling the index processing task through the calling interface;
the data processing module is used for executing the index processing task based on the configured index processing flow to obtain a data index meeting the index requirement; the nodes of the index processing flow comprise data source information and data processing logic information of the index processing task;
the data processing module is divided into a data extraction sub-module and a processing sub-module:
The data extraction sub-module is used for extracting data to be processed from the corresponding database according to the data source information;
The processing sub-module is used for processing the data to be processed according to the data processing logic information so as to obtain a data index meeting the index requirement;
The data extraction submodule is divided into a data source intercepting unit, a database connecting unit and a data extraction unit:
the data source intercepting unit is used for intercepting data source information from the index processing flow, and the data source information at least comprises a database;
the database connection unit is used for abstracting and connecting the corresponding database based on the data source information;
The data extraction unit is used for extracting data to be processed from the connected database and temporarily storing the data in a resource library, wherein the data to be processed is provided with a data flow identifier corresponding to the database.
7. A computer device comprising a memory and a processor, characterized in that the memory has stored thereon a computer program which, when executed by the processor, implements the steps of the batch processing method of data indexes according to any one of claims 1-5.
8. A computer-readable storage medium having stored therein a computer program executable by at least one processor to implement the steps of the batch processing method of data indexes according to any one of claims 1-5.
CN202011593546.4A 2020-12-29 2020-12-29 Batch processing method, device and equipment for data indexes and storage medium Active CN112597233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011593546.4A CN112597233B (en) 2020-12-29 2020-12-29 Batch processing method, device and equipment for data indexes and storage medium

Publications (2)

Publication Number Publication Date
CN112597233A CN112597233A (en) 2021-04-02
CN112597233B true CN112597233B (en) 2024-06-25

Family

ID=75203241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011593546.4A Active CN112597233B (en) 2020-12-29 2020-12-29 Batch processing method, device and equipment for data indexes and storage medium

Country Status (1)

Country Link
CN (1) CN112597233B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245029A (en) * 2019-05-21 2019-09-17 中国平安财产保险股份有限公司 A kind of data processing method, device, storage medium and server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102890651B (en) * 2011-07-19 2016-06-08 阿里巴巴集团控股有限公司 The method of testing of a kind of contextual data and device

Also Published As

Publication number Publication date
CN112597233A (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN110309125B (en) Data verification method, electronic device and storage medium
WO2019134339A1 (en) Desensitization method and procedure, application server and computer readable storage medium
CN111367925A (en) Data dynamic real-time updating method, device and storage medium
CN110688378B (en) Migration method and system for database storage process
CN111737227B (en) Data modification method and system
CN112764874B (en) Virtual machine server information acquisition method based on CMDB configuration management system
JP2020057416A (en) Method and device for processing data blocks in distributed database
WO2019148657A1 (en) Method for testing associated environments, electronic device and computer readable storage medium
CN111124872A (en) Branch detection method and device based on difference code analysis and storage medium
CN112039900A (en) Network security risk detection method, system, computer device and storage medium
CN115794839B (en) Data collection method based on Php+Mysql system, computer equipment and storage medium
CN113535677A (en) Data analysis query management method and device, computer equipment and storage medium
CN112416957A (en) Data increment updating method and device based on data model layer and computer equipment
CN117033424A (en) Query optimization method and device for slow SQL (structured query language) statement and computer equipment
CN112395307A (en) Statement execution method, statement execution device, server and storage medium
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN113918954A (en) Automated vulnerability scanning integration method, device, equipment and storage medium
CN112597233B (en) Batch processing method, device and equipment for data indexes and storage medium
CN109271431B (en) Data extraction method, device, computer equipment and storage medium
CN112003837B (en) Intelligent equipment adaptation method and device based on Modbus protocol and storage medium
CN105302604A (en) Application version update method and apparatus
CN109992573B (en) Method and system for realizing automatic monitoring of HDFS file occupancy rate
CN105630889A (en) Method and device for realizing generic cache
CN111767299A (en) Database operation method, device and system, storage medium and electronic equipment
CN113051329B (en) Data acquisition method, device, equipment and storage medium based on interface

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant