CN118093707A - Multi-mode data acquisition method, system, terminal and storage medium - Google Patents


Info

Publication number
CN118093707A
Authority
CN
China
Prior art keywords
data
acquisition
task
execution
target database
Legal status
Pending
Application number
CN202410518256.5A
Other languages
Chinese (zh)
Inventor
刘保卫
Current Assignee
North Health Medical Big Data Technology Co ltd
Original Assignee
North Health Medical Big Data Technology Co ltd
Application filed by North Health Medical Big Data Technology Co ltd
Priority to CN202410518256.5A
Publication of CN118093707A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-mode data acquisition method, system, terminal and storage medium, and relates to the technical field of data processing. The method comprises the following steps:
- creating an acquisition task;
- configuring a data source and a target database for the acquisition task, where the data source is the source from which data is acquired and the target database is the database that stores the acquired data;
- creating acquisition subtasks under the acquisition task, and configuring for each created subtask a data mapping rule between the data source and the target database;
- selecting an applicable task executor for each created subtask according to the configured data source and target database, where the task executor comprises a plurality of execution engines and the execution engines are a plurality of ETL tools;
- calling the task executor to execute the corresponding acquisition subtask.

Because the execution engine can be selected as required, the invention can acquire data of different modalities and obtain richer data information in a timely and convenient way.

Description

Multi-mode data acquisition method, system, terminal and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a multi-mode data acquisition method, a system, a terminal and a storage medium.
Background
With the rapid development of technology, data has become a central element of decision-making in organizations and enterprises. In the medical field, data is not only the basis of diagnosis and treatment but also the key to medical research and improvement. The accuracy and timeliness of data acquisition are therefore critical to the development of the medical industry.
Traditional data acquisition relies mainly on ETL tools, and many ETL tools support only a limited set of database types. Complex and diverse acquisition scenarios (that is, multi-mode data acquisition) involve data that is stored in scattered locations, in varied formats and across many types of data sources. As a result, several different tools and methods are often needed to acquire data of different modalities. Moreover, data of modalities that cannot be acquired directly with an ETL tool usually has to be converted first and then acquired with a suitable ETL tool. This increases the complexity and cost of data acquisition and degrades its real-time performance.
Disclosure of Invention
To solve at least one of the above problems, the invention provides a multi-mode data acquisition method, system, terminal and storage medium. They execute data acquisition tasks with execution engines selected as required, acquire data of different modalities, and obtain richer data information in a timely and convenient way.
In a first aspect, the present invention provides a method for multi-modal data collection, the method comprising:
creating an acquisition task;
Configuring a data source and a target database for the acquisition task; the data source is a source for data acquisition, and the target database is a database for storing acquired data;
Creating an acquisition subtask under an acquisition task, and configuring a data mapping rule between the data source and the target database for the created acquisition subtask;
Selecting an applicable task executor for the created acquisition subtasks according to the configured data source and target database, wherein the task executor comprises a plurality of execution engines, and the execution engines are a plurality of ETL tools which are assembled in advance, including but not limited to a DataX tool, a Kettle tool, a Canal tool, a Flume tool, a Sqoop tool and an OGG tool;
and calling a task executor to execute the corresponding acquisition subtask.
Further, the task executor further comprises a data collector, which is a custom execution engine rather than an ETL tool. The data collector comprises a real-time interface collector, a real-time sequence collector and an MQ interface collector, and is used for collecting data that the execution engines cannot collect.
Further, configuring the data source and target databases includes: configuring a source path, an address and a port of a data source; and configuring the address, port and library table information of the target database.
Further, the method further includes, when configuring data mapping rules between the data source and the target database for the created acquisition subtasks, configuring a data acquisition mode of the data source for each of the created acquisition subtasks, including a full data mode, an incremental data mode, and a time period data mode.
Further, the method further comprises the step of monitoring the execution state, the execution time, the execution log and the occupied resources of the acquisition subtask when the task executor executes the acquisition subtask;
The method further comprises counting and displaying the number of successful executions of the acquisition subtasks, the number of failed executions of the acquisition subtasks and the execution time of the acquisition subtasks.
In a second aspect, the present invention provides a multi-modal data acquisition system, the system comprising:
the task processing module is used for creating an acquisition task;
The configuration management module is used for configuring a data source and a target database for the acquisition task; the data source is a source for data acquisition, and the target database is a database for storing acquired data;
The task processing module is also used for creating an acquisition subtask under the acquisition task and configuring a data mapping rule between the data source and the target database for the created acquisition subtask;
The task processing module is also used for selecting an applicable task executor for the created acquisition subtask according to the configured data source and target database. The task executor comprises a plurality of execution engines, which are a plurality of ETL tools assembled in advance, including but not limited to a DataX tool, a Kettle tool, a Canal tool, a Flume tool, a Sqoop tool and an OGG tool. The task executor further comprises a data collector, which is a custom execution engine rather than an ETL tool; it comprises a real-time interface collector, a real-time sequence collector and an MQ interface collector, and is used for collecting data that the execution engines cannot collect;
and the task processing module is also used for calling the task executor to execute the corresponding acquisition subtasks.
Further, the configuration management module comprises a data source configuration module and a target database configuration module;
the data source configuration module is used for configuring a source path, an address and a port of a data source;
and the target database configuration module is used for configuring the address, the port and the library table information of the target database.
Further, the task processing module comprises a task management module, a task configuration module, a task executor management module, a task monitoring module and a task display module;
the task management module is used for managing the creation, starting, suspension, resumption and termination of the acquisition task and the acquisition subtask;
The task configuration module is used for configuring a data mapping rule between the data source and the target database for the created acquisition subtask. It is further used for configuring the data acquisition mode of the data source for the created acquisition subtask, including a full data mode, an incremental data mode and a time period data mode;
The task executor management module is used for realizing the management of the task executor;
The task monitoring module is used for monitoring the execution state, the execution time, the execution log and the occupied resources of the acquisition subtask when the task executor executes the acquisition subtask;
the task display module is used for counting and displaying the number of successful executions of the acquisition subtasks, the number of failed executions of the acquisition subtasks and the execution time of the acquisition subtasks.
In a third aspect, the present invention provides a terminal comprising a memory and a processor;
the memory is used for storing a multi-mode data acquisition program;
and the processor is used for realizing the multi-mode data acquisition method according to any one of the first aspect when executing the multi-mode data acquisition program.
In a fourth aspect, the present invention provides a computer readable storage medium storing a multi-modal data collection program which, when executed by a processor, implements the multi-modal data collection method according to any one of the first aspect.
From the above technical scheme, the invention has the following advantages:
Based on multi-mode data acquisition technology, the invention creates a plurality of acquisition subtasks under an acquisition task and selects a different task executor for each subtask. By choosing among the ETL tools that serve as execution engines in the task executor, data of different modalities can be acquired. This gives data acquisition better extensibility and compatibility and helps to obtain richer data in a timely and convenient way.
The invention further provides a data collector as a custom execution engine. Through the data collector, data of modalities that the ETL tools in the execution engines cannot acquire directly can still be acquired directly. This makes data acquisition flexible and adaptable to different multi-mode scenarios, such as medical data acquisition. A suitable task executor can be selected flexibly, which reduces the complexity and cost of data acquisition and improves its real-time performance.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a multi-modal data collection method according to the present invention;
FIG. 2 is a schematic block diagram of one embodiment of a multi-modal data acquisition system in accordance with the present invention;
FIG. 3 is a schematic diagram of a terminal according to an embodiment of the present invention;
FIG. 4 is a schematic structural view of an embodiment of the readable storage medium according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The multi-mode data acquisition method provided by the embodiment of the invention is executed by the computer equipment, and correspondingly, the multi-mode data acquisition system runs in the computer equipment.
As shown in fig. 1, the present invention provides a multi-mode data acquisition method, which includes:
creating an acquisition task;
Configuring a data source and a target database for the acquisition task; the data source is a source for data acquisition, and the target database is a database for storing acquired data;
Creating an acquisition subtask under an acquisition task, and configuring a data mapping rule between the data source and the target database for the created acquisition subtask;
selecting an applicable task executor for the created acquisition subtasks according to the configured data source and target database, wherein the task executor comprises a plurality of execution engines, and the execution engines are ETL tools including but not limited to DataX, Kettle, Canal, Flume, Sqoop and OGG tools;
and calling a task executor to execute the corresponding acquisition subtask.
In order to facilitate understanding of the present invention, the multi-mode data acquisition method provided by the present invention is further described below with reference to the process of the multi-mode data acquisition method in the embodiment.
Specifically, the multi-mode data acquisition method comprises the following steps:
Step 110, creating an acquisition task.
One or more acquisition tasks can be created as required.
Step 120, configuring a data source and a target database for the acquisition task; the data source is the source from which the data collection is to be performed, and the target database is the database for storing the collected data.
For each created acquisition task, the source from which data is to be acquired is recorded as the data source, and the database that stores the acquired data is recorded as the target database. The source path, address and port of the data source, and the address, port and library table information of the target database, are then configured.
The source path of the data source refers to the path of the source of the data to be collected.
It should be noted that each acquisition task is configured with one data source and one target database. Data may need to be acquired from one data source into multiple target databases, from multiple data sources into one target database, or from multiple data sources into multiple target databases. In these cases a corresponding number of acquisition tasks can be created. For each acquisition task, the source path, address and port of the corresponding data source and the address, port and library table information of the corresponding target database are then configured.
Step 130, creating an acquisition subtask under the acquisition task, and configuring a data mapping rule between the data source and the target database for the created acquisition subtask.
One or more acquisition subtasks can be created according to actual needs.
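Because each acquisition task binds exactly one data source to one target database, a one-to-many, many-to-one or many-to-many requirement expands into one task per (source, target) pair. The following is a minimal Python sketch under that reading. The class names `DataSourceConfig`, `TargetDbConfig` and `AcquisitionTask`, their fields, and the example hosts and paths are hypothetical illustrations; the patent does not define a concrete schema.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DataSourceConfig:
    # Hypothetical fields mirroring the text: source path, address and port.
    source_path: str
    address: str
    port: int


@dataclass(frozen=True)
class TargetDbConfig:
    # Hypothetical fields mirroring the text: address, port and library/table info.
    address: str
    port: int
    database: str
    table: str


@dataclass(frozen=True)
class AcquisitionTask:
    # Each acquisition task binds exactly one data source to one target database.
    name: str
    source: DataSourceConfig
    target: TargetDbConfig


def expand_tasks(sources, targets, prefix="task"):
    """Create one acquisition task per (data source, target database) pair.

    Handles one-to-many, many-to-one and many-to-many requirements uniformly.
    """
    return [
        AcquisitionTask(f"{prefix}-{i}", src, tgt)
        for i, (src, tgt) in enumerate(product(sources, targets))
    ]


# Illustrative values only.
his = DataSourceConfig("/his/outpatient", "10.0.0.5", 3306)
lis = DataSourceConfig("/lis/results", "10.0.0.6", 1521)
dw = TargetDbConfig("10.0.1.1", 5432, "warehouse", "visits")
tasks = expand_tasks([his, lis], [dw])
print(len(tasks))  # 2
```

Two sources and one target yield two tasks. Two sources and two targets would yield four.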
The data mapping rules between the data source and the target database include table mapping rules between the data source and the target database and field mapping rules between the data source and the target database.
The data mapping rule between the data source and the target database refers to a rule for mapping data in the data source to the target database; the table mapping rule between the data source and the target database refers to the corresponding relation between the data of the source table in the data source and the target table in the target database; the field mapping rule between the data source and the target database refers to the correspondence between the fields in the source table of the data source and the fields in the target table of the target database.
After an acquisition subtask is created, either a table mapping rule or a field mapping rule between the data source and the target database is configured for it. When the subtask is executed, a table mapping rule causes the data of a source table in the data source to be acquired into the corresponding target table in the target database. A field mapping rule causes a field in the source table to be acquired into the corresponding target field of the target table.
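The distinction between table and field mapping rules can be made concrete with a small sketch. The assumption here is that a subtask holds one table mapping plus an optional source-to-target field map. With a field map, only the mapped fields are carried over, renamed. Without one, whole rows are copied unchanged. `MappingRule`, `apply_mapping` and the table names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MappingRule:
    # Hypothetical structure: a table mapping plus an optional field mapping.
    source_table: str
    target_table: str
    # Source field name -> target field name; empty means "copy whole rows".
    field_map: dict = field(default_factory=dict)


def apply_mapping(rule, rows):
    """Map rows read from rule.source_table into records for rule.target_table."""
    mapped = []
    for row in rows:
        if rule.field_map:
            # Field mapping: keep only mapped fields, renamed to target names.
            record = {tgt: row[src] for src, tgt in rule.field_map.items() if src in row}
        else:
            # Table mapping only: copy the row unchanged.
            record = dict(row)
        mapped.append(record)
    return rule.target_table, mapped


rule = MappingRule("pat_info", "dim_patient", {"pid": "patient_id", "nm": "name"})
table, out = apply_mapping(rule, [{"pid": 1, "nm": "Li", "tmp": "x"}])
# table == "dim_patient"; out == [{"patient_id": 1, "name": "Li"}]
```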
Step 140, selecting an applicable task executor for the created acquisition subtask according to the configured data source and target database, wherein the task executor comprises a plurality of execution engines, and the execution engines are ETL tools including but not limited to a DataX tool, a Kettle tool, a Canal tool, a Flume tool, a Sqoop tool and an OGG tool.
The execution engines in the task executor are existing off-the-shelf ETL tools that are integrated in advance. Each tool serves as one execution engine for collecting data from data sources; the tools include but are not limited to DataX, Kettle, Canal, Flume, Sqoop and OGG.
Data source types supported by the DataX tool include, but are not limited to, relational databases (e.g., MySQL, Oracle, SQL Server), NoSQL databases (e.g., MongoDB, HBase), big data storage (e.g., HDFS, Hive), cloud storage (e.g., OSS, OBS) and message queues (e.g., Kafka, RabbitMQ).
The data source types supported by the Kettle tool include, but are not limited to, relational databases, NoSQL databases, text files, XML files and Web service data.
The main data source type supported by the Canal tool is MySQL (MariaDB is also supported).
The data source types supported by the Flume tool include, but are not limited to, Avro sources, log files, MySQL and Oracle databases, message queues (e.g., Kafka, RabbitMQ), network data streams (e.g., TCP, UDP) and Web service data.
The primary data source type supported by the Sqoop tool is a Relational Database (RDBMS).
The types of data sources supported by the OGG tool include, but are not limited to, Oracle and non-Oracle databases (e.g., MySQL, IBM DB2, Microsoft SQL Server, Sybase), file and unstructured data sources (e.g., XML, JSON and CSV files), and big data platforms (e.g., those based on Apache Hive, Apache HDFS and Apache Hadoop).
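The support lists above suggest one simple way to pick an execution engine: return the first engine, in a preference order, whose supported source types include the configured source. If no pre-integrated ETL tool fits, fall back to the custom data collector.

This sketch rests on several assumptions. `SOURCE_SUPPORT` is a simplified, illustrative subset of the lists above. The preference order and the returned name `"data_collector"` are hypothetical. Only the source type is checked, whereas the text also takes the target database into account.

```python
# Illustrative, simplified subset of the support lists above (lowercase keys).
SOURCE_SUPPORT = {
    "DataX": {"mysql", "oracle", "sqlserver", "mongodb", "hbase", "hdfs", "hive", "oss", "obs", "kafka"},
    "Kettle": {"mysql", "oracle", "sqlserver", "mongodb", "text", "xml", "web_service"},
    "Sqoop": {"mysql", "oracle", "sqlserver"},  # relational databases only
    "Canal": {"mysql", "mariadb"},
    "Flume": {"avro", "log_file", "mysql", "oracle", "kafka", "tcp", "udp", "web_service"},
    "OGG": {"oracle", "mysql", "db2", "sqlserver", "sybase", "xml", "json", "csv", "hive", "hdfs"},
}

# Hypothetical preference order: the first engine that supports the source wins.
DEFAULT_PREFERENCE = ("DataX", "Kettle", "Sqoop", "Canal", "Flume", "OGG")


def select_executor(source_type, preference=DEFAULT_PREFERENCE):
    """Return the name of the execution engine to use for a given source type."""
    key = source_type.lower()
    for engine in preference:
        if key in SOURCE_SUPPORT.get(engine, set()):
            return engine
    # No pre-integrated ETL tool fits: fall back to the custom data collector.
    return "data_collector"


print(select_executor("MySQL"))           # DataX
print(select_executor("iot_timeseries"))  # data_collector
```

Passing a different `preference` tuple is one way to favour a change-data-capture tool such as Canal for MySQL sources when real-time synchronization matters more than batch throughput.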
Step 150, calling the task executor to execute the corresponding acquisition subtask.
The task executor selected in Step 140 is called to execute the corresponding acquisition subtask according to the data mapping rule between the data source and the target database configured in Step 130.
As one embodiment of the invention, the task executor further comprises a data collector, which is a custom execution engine rather than an ETL tool. The data collector comprises a real-time interface collector, a real-time sequence collector and an MQ interface collector, and is used for collecting data that the execution engines cannot collect.
The real-time interface collector is used for collecting data which occurs in real time or data which is uploaded in real time.
The real-time sequence collector is used for collecting equipment information data of the Internet of things.
The MQ interface collector is used for collecting data from large numbers of highly concurrent requests.
An applicable task executor is selected for the created acquisition subtask according to the configured data source and target database. When the selected task executor is the data collector, the data collector is called and acquires data according to the data mapping rule between the data source and the target database. It acquires the data of the source table in the data source into the target table in the target database, or a field in the source table into the target field of the target table. While executing the acquisition subtask, the data collector first cleans the acquired data. It then converts the cleaned data from its format in the data source into the data format of the corresponding target table or target field. Finally, it stores the converted data in that target table or target field.
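The clean, convert, store sequence of the data collector can be sketched as three small functions. What counts as "cleaning" is an assumption here: dropping empty records and trimming whitespace. So are the type names used for the target schema (`"int"`, `"float"`, `"datetime"`) and the `store` callback, which stands in for a real database insert. All function names are hypothetical.

```python
from datetime import datetime


def clean(records):
    """Data cleaning (assumed rules): drop empty records and trim string values."""
    cleaned = []
    for rec in records:
        if not rec:
            continue
        cleaned.append({k: v.strip() if isinstance(v, str) else v for k, v in rec.items()})
    return cleaned


def convert(record, target_types):
    """Format conversion: cast each field to the type its target field expects."""
    out = {}
    for name, value in record.items():
        kind = target_types.get(name)
        if value is None or kind is None:
            out[name] = value  # no declared target type: pass through unchanged
        elif kind == "int":
            out[name] = int(value)
        elif kind == "float":
            out[name] = float(value)
        elif kind == "datetime":
            out[name] = datetime.fromisoformat(value)
        else:
            out[name] = str(value)
    return out


def collect(records, target_types, store):
    """Clean, convert, then hand each record to a storage callback (e.g. a DB insert)."""
    converted = [convert(r, target_types) for r in clean(records)]
    for rec in converted:
        store(rec)
    return len(converted)


sink = []
n = collect(
    [{"device": " hr-01 ", "bpm": "72", "ts": "2024-04-28T10:00:00"}, {}],
    {"bpm": "int", "ts": "datetime"},
    sink.append,
)
```

Here one IoT-style reading survives cleaning. It is stored with `bpm` as the integer 72 and `ts` as a `datetime`.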
As one embodiment of the invention, when the data mapping rule between the data source and the target database is configured for the created acquisition subtask, the data acquisition mode of the data source is also configured for each created acquisition subtask, including a full data mode, an incremental data mode and a time period data mode.
In the full data mode, all data currently in the data source to be collected is acquired. In the incremental data mode, only the data newly added to the data source since the last acquisition is acquired. In the time period data mode, all data within a specified time period in the data source is acquired. When this mode is configured, a start time and an end time of acquisition, each given as year, month, day and hour, are set. When the data acquisition task is executed, all data in the data source from the start time to the end time is acquired.
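The three acquisition modes reduce to three row filters. This sketch has several assumptions. Every record carries a timestamp field (called `updated_at` here, hypothetically). Incremental mode uses a "last acquisition time" watermark rather than, for example, an auto-increment key or a binlog position. The time period is treated as inclusive at both ends. `Mode` and `select_rows` are illustrative names.

```python
from datetime import datetime
from enum import Enum


class Mode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"
    TIME_PERIOD = "time_period"


def select_rows(rows, mode, ts_field="updated_at", last_run=None, start=None, end=None):
    """Return the rows a subtask should collect under the given acquisition mode."""
    if mode is Mode.FULL:
        return list(rows)
    if mode is Mode.INCREMENTAL:
        if last_run is None:
            return list(rows)  # first run: nothing collected yet, so take everything
        return [r for r in rows if r[ts_field] > last_run]
    if mode is Mode.TIME_PERIOD:
        if start is None or end is None:
            raise ValueError("time period mode needs both start and end")
        return [r for r in rows if start <= r[ts_field] <= end]
    raise ValueError(f"unknown mode: {mode}")


rows = [{"id": i, "updated_at": datetime(2024, 4, i, 8)} for i in (1, 2, 3)]
recent = select_rows(rows, Mode.INCREMENTAL, last_run=datetime(2024, 4, 1, 9))
window = select_rows(rows, Mode.TIME_PERIOD, start=datetime(2024, 4, 2), end=datetime(2024, 4, 2, 23))
```

After a run, the caller would store the largest collected timestamp as the next `last_run`, so that each record is picked up only once.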
As one embodiment of the invention, the method further comprises the step of monitoring the execution state, the execution time, the execution log and the occupied resources of the acquisition subtasks when the task executor executes the acquisition subtasks.
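One way to capture the execution state, execution time, execution log and occupied resources of a subtask is to wrap each execution in a monitoring function. This sketch makes several assumptions. The subtask is a Python callable. "Occupied resources" is approximated by peak Python heap usage from `tracemalloc`; an external ETL process such as a DataX JVM would instead need OS-level metrics. `RunRecord` and `run_monitored` are hypothetical names.

```python
import time
import tracemalloc
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    subtask: str
    state: str = "PENDING"
    duration_s: float = 0.0
    peak_mem_bytes: int = 0
    log: list = field(default_factory=list)


def run_monitored(name, fn, *args, **kwargs):
    """Execute fn as subtask `name`, recording state, duration, log lines and peak memory."""
    rec = RunRecord(name, state="RUNNING")
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        rec.state = "SUCCEEDED"
        rec.log.append(f"{name}: finished")
    except Exception as exc:
        result = None
        rec.state = "FAILED"
        rec.log.append(f"{name}: failed: {exc!r}")
    finally:
        rec.duration_s = time.perf_counter() - t0
        _, rec.peak_mem_bytes = tracemalloc.get_traced_memory()  # (current, peak)
        tracemalloc.stop()
    return result, rec


result, record = run_monitored("demo-subtask", lambda: sum(range(10)))
```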
As one embodiment of the invention, the method further comprises counting and displaying the number of successful executions of the acquisition subtasks, the number of failed executions of the acquisition subtasks and the execution time of the acquisition subtasks.
The statistics cover a period of time ending at the current moment. Specifically, the number of successful executions, the number of failed executions and the execution time of the acquisition subtasks within the day before the current moment are counted and displayed. In practical applications, those skilled in the art can set the length of this period according to actual needs.
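The sliding-window statistics described above can be sketched as a single aggregation over finished runs. This sketch assumes each run is stored as a `(finished_at, state, duration_seconds)` tuple. It reports both total and maximum duration as two possible readings of "execution time". The one-day window is a default parameter, matching the configurable period length in the text. `summarize` is a hypothetical name.

```python
from datetime import datetime, timedelta


def summarize(runs, now, window=timedelta(days=1)):
    """Count successes/failures and aggregate durations for runs finished within `window` of `now`.

    `runs` is an iterable of (finished_at, state, duration_seconds) tuples.
    """
    since = now - window
    recent = [r for r in runs if since <= r[0] <= now]
    succeeded = sum(1 for _, state, _ in recent if state == "SUCCEEDED")
    failed = sum(1 for _, state, _ in recent if state == "FAILED")
    durations = [d for _, _, d in recent]
    return {
        "succeeded": succeeded,
        "failed": failed,
        "total_duration_s": sum(durations),
        "max_duration_s": max(durations, default=0.0),
    }


now = datetime(2024, 4, 28, 12)
runs = [
    (now - timedelta(hours=1), "SUCCEEDED", 30.0),
    (now - timedelta(hours=2), "FAILED", 5.0),
    (now - timedelta(days=2), "SUCCEEDED", 99.0),  # outside the one-day window
]
stats = summarize(runs, now)
```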
As shown in fig. 2, the multi-modal data collection system 200 includes: a task processing module 210 and a configuration management module 220.
A task processing module 210, configured to create an acquisition task;
a configuration management module 220, configured to configure a data source and a target database for the acquisition task; the data source is a source for data acquisition, and the target database is a database for storing acquired data;
the task processing module 210 is further configured to create an acquisition subtask under an acquisition task, and configure a data mapping rule between the data source and the target database for the created acquisition subtask;
The task processing module 210 is further configured to select an applicable task executor for the created acquisition subtask according to the configured data source and target database. The task executor comprises a plurality of execution engines, which are a plurality of ETL tools assembled in advance, including but not limited to a DataX tool, a Kettle tool, a Canal tool, a Flume tool, a Sqoop tool and an OGG tool. The task executor further comprises a data collector, which is a custom execution engine rather than an ETL tool; it comprises a real-time interface collector, a real-time sequence collector and an MQ interface collector, and is used for collecting data that the execution engines cannot collect;
the task processing module 210 is further configured to invoke a task executor to execute a corresponding acquisition subtask.
As one embodiment of the present invention, the configuration management module 220 includes a data source configuration module 221 and a target database configuration module 222.
A data source configuration module 221, configured to configure a source path, an address, and a port of a data source;
the target database configuration module 222 is configured to configure the address, port and library table information of the target database.
As one embodiment of the present invention, the task processing module 210 includes a task management module 211, a task configuration module 212, a task executor management module 213, a task monitoring module 214 and a task display module 215.
The task management module 211 is configured to manage the creation of acquisition tasks and the creation, start, pause, resumption and termination of acquisition subtasks. The four operations are defined as follows:
- Starting an acquisition subtask means executing it on the applicable task executor selected for it.
- Pausing it means controlling the task executor to suspend its execution.
- Resuming it means controlling the task executor to leave the suspended state and continue executing it.
- Terminating it means controlling the task executor to end its execution.
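The start, pause, resume and terminate operations imply a small state machine for each subtask. The sketch below encodes that state machine and rejects operations that are invalid in the current state, for example resuming a subtask that was never paused. The state names, the transition table and `SubtaskLifecycle` are hypothetical. A real implementation would likely also need a "completed" state for subtasks that finish normally.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    TERMINATED = "terminated"


# action -> (states the action may be applied in, resulting state)
TRANSITIONS = {
    "start": ({State.CREATED}, State.RUNNING),
    "pause": ({State.RUNNING}, State.PAUSED),
    "resume": ({State.PAUSED}, State.RUNNING),
    "terminate": ({State.CREATED, State.RUNNING, State.PAUSED}, State.TERMINATED),
}


class SubtaskLifecycle:
    def __init__(self, name):
        self.name = name
        self.state = State.CREATED  # creation puts the subtask in CREATED

    def apply(self, action):
        """Apply a lifecycle action, raising ValueError if it is invalid now."""
        allowed, nxt = TRANSITIONS[action]
        if self.state not in allowed:
            raise ValueError(f"cannot {action} subtask {self.name!r} in state {self.state.value}")
        self.state = nxt
        return self.state


sub = SubtaskLifecycle("his-to-dw-1")
for action in ("start", "pause", "resume"):
    sub.apply(action)
```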
The task configuration module 212 is configured to configure a data mapping rule between the data source and the target database for the created acquisition subtask, and the task configuration module 212 is further configured to configure a data acquisition mode of the data source for the created acquisition subtask, including a full data mode, an incremental data mode, and a time period data mode.
The task executor management module 213 is configured to manage task executors, including adding, deleting, modifying, querying and monitoring them. Specifically, through the task executor management module 213 the user may add a new task executor or delete a faulty one as required, and may modify the configuration of a task executor or query its configuration or state. The task executor management module 213 also monitors the state and performance of the task executors in real time. When it detects an abnormal task executor, it sends an alarm notification to alert the user.
The task monitoring module 214 is configured to monitor the execution state, execution time, execution log and occupied resources of the acquisition subtask when the task executor executes the acquisition subtask.
The task display module 215 is configured to count and display the number of successful executions of the acquisition subtasks, the number of failed executions of the acquisition subtasks and the execution duration of the acquisition subtasks.
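The add, delete, modify, query and monitor responsibilities of the task executor management module can be sketched as a registry with a pluggable health probe and an alert callback. The following are all assumptions: the `probe(info) -> bool` health check, the alert-on-transition policy (alert once when an executor turns abnormal, not on every check), and the names `ExecutorInfo` and `ExecutorRegistry`.

```python
from dataclasses import dataclass


@dataclass
class ExecutorInfo:
    name: str
    engine: str   # e.g. "DataX" or "data_collector"
    config: dict
    healthy: bool = True


class ExecutorRegistry:
    """Add, delete, modify, query and health-check task executors."""

    def __init__(self, alert):
        self._executors = {}
        self._alert = alert  # callback that notifies the user of an abnormal executor

    def add(self, info):
        if info.name in self._executors:
            raise KeyError(f"executor {info.name!r} already exists")
        self._executors[info.name] = info

    def delete(self, name):
        del self._executors[name]

    def modify(self, name, **config):
        self._executors[name].config.update(config)

    def query(self, name):
        return self._executors[name]

    def check_health(self, probe):
        """Run probe(info) -> bool on every executor; alert when one becomes abnormal."""
        for info in self._executors.values():
            ok = probe(info)
            if not ok and info.healthy:
                self._alert(f"executor {info.name} ({info.engine}) is abnormal")
            info.healthy = ok


alerts = []
registry = ExecutorRegistry(alerts.append)
registry.add(ExecutorInfo("dx-1", "DataX", {"channel": 4}))
registry.add(ExecutorInfo("fl-1", "Flume", {}))
registry.check_health(lambda info: info.engine != "Flume")  # pretend Flume is down
```

Alerting only on the healthy-to-abnormal transition avoids flooding the user with repeated notifications while an executor stays down.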
Fig. 3 is a schematic structural diagram of a terminal 300 according to an embodiment of the present invention, where the terminal 300 may be used to execute the multi-mode data acquisition method according to the embodiment of the present invention.
The terminal 300 may include: processor 310, memory 320, and communication module 330. The components may communicate via one or more buses, and it will be appreciated by those skilled in the art that the configuration of the server as shown in the drawings is not limiting of the invention, as it may be a bus-like structure, a star-like structure, or include more or fewer components than shown, or may be a combination of certain components or a different arrangement of components.
The memory 320 may be used to store instructions for execution by the processor 310, and the memory 320 may be implemented by any type of volatile or non-volatile memory terminal or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The execution of the instructions in memory 320, when executed by processor 310, enables terminal 300 to perform some or all of the steps in the method embodiments described below.
The processor 310 is a control center of the storage terminal, connects various parts of the entire electronic terminal using various interfaces and lines, and performs various functions of the electronic terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320, and invoking data stored in the memory. The processor may be comprised of an integrated circuit (INTEGRATED CIRCUIT, simply referred to as an IC), for example, a single packaged IC, or may be comprised of multiple packaged ICs connected to one another for the same function or for different functions. For example, the processor 310 may include only a central processing unit (Central Processing Unit, CPU for short). In the embodiment of the invention, the CPU can be a single operation core or can comprise multiple operation cores.
The communication module 330 is configured to establish a communication channel so that the terminal can communicate with other terminals, for example to receive user data sent by other terminals or to send user data to other terminals.
As shown in fig. 4, the present invention further provides a computer storage medium 400. The computer storage medium 400 may store a program 401 which, when executed, may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Based on multi-modal data acquisition technology, the invention creates a plurality of acquisition subtasks under an acquisition task and selects a suitable task executor for each acquisition subtask. Because the execution engines in a task executor comprise a plurality of ETL tools from which a suitable one can be selected, data of different modalities can be acquired. This gives data acquisition better extensibility and compatibility and makes it easier to acquire richer data in a timely manner.
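The executor selection step can be illustrated with a minimal Python sketch. The patent names the candidate ETL tools but does not disclose how one is chosen, so the routing table, the source/target kind labels, and the names `DataSource`, `TargetDatabase`, `AcquisitionSubtask`, `ENGINE_RULES`, and `select_executor` are all hypothetical. The pairings merely reflect common uses of each tool (for example, Canal for MySQL binlog capture and Sqoop for bulk transfer into HDFS). Unmatched combinations fall back to a custom data collector, in line with claim 2.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: (source kind, target kind) -> ETL engine.
# The patent lists the tools but not the selection rules; these pairings
# are illustrative assumptions based on each tool's common use.
ENGINE_RULES: dict[tuple[str, str], str] = {
    ("mysql_binlog", "rdbms"): "Canal",  # MySQL binlog change capture
    ("oracle_redo", "rdbms"): "OGG",     # Oracle GoldenGate replication
    ("rdbms", "hdfs"): "Sqoop",          # bulk transfer into Hadoop
    ("log_file", "hdfs"): "Flume",       # streaming log collection
    ("rdbms", "rdbms"): "DataX",         # offline database-to-database sync
    ("file", "rdbms"): "Kettle",         # file-based ETL jobs
}


@dataclass
class DataSource:
    kind: str
    address: str
    port: int
    path: str = ""  # source path (claim 3); may be empty for database sources


@dataclass
class TargetDatabase:
    kind: str
    address: str
    port: int
    table: str  # database table information (claim 3)


@dataclass
class AcquisitionSubtask:
    name: str
    source: DataSource
    target: TargetDatabase
    mapping: dict[str, str] = field(default_factory=dict)  # source field -> target field


def select_executor(subtask: AcquisitionSubtask) -> str:
    """Return an ETL engine name, or fall back to a custom data collector."""
    key = (subtask.source.kind, subtask.target.kind)
    return ENGINE_RULES.get(key, "custom_collector")


if __name__ == "__main__":
    sub = AcquisitionSubtask(
        name="orders_to_hdfs",
        source=DataSource("rdbms", "10.0.0.5", 3306),
        target=TargetDatabase("hdfs", "10.0.0.9", 8020, "ods.orders"),
        mapping={"id": "order_id"},
    )
    print(select_executor(sub))  # Sqoop
```

A lookup table keeps the selection rule declarative, so supporting an additional tool would mean adding an entry rather than changing control flow.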
Based on multi-modal data acquisition technology, the invention also provides a data collector that serves as a custom execution engine. Through this data collector, data of modalities that the ETL tools in the execution engine cannot acquire directly can still be acquired, which makes data acquisition flexible and adaptable. The invention can therefore be applied to different multi-modal data acquisition scenarios, such as medical data acquisition, in which a suitable task executor is selected flexibly, reducing the complexity and cost of data acquisition and improving its real-time performance. In addition, the invention has a reliable design principle, a simple structure, and broad application prospects. For the other technical effects achieved by this embodiment, refer to the description above; they are not repeated here.
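A custom data collector of the kind described above might be structured as follows. This is a minimal sketch under stated assumptions: `queue.Queue` stands in for a real message broker, an in-memory list stands in for the target database, and `DataCollector`, `MQInterfaceCollector`, `apply_mapping`, and `run_collector` are hypothetical names not taken from the patent. The field mapping plays the role of the data mapping rule configured for an acquisition subtask.

```python
import queue
from typing import Iterable, Iterator, Protocol


class DataCollector(Protocol):
    """Hypothetical interface for a custom (non-ETL) execution engine."""

    def collect(self) -> Iterable[dict]: ...


class MQInterfaceCollector:
    """Drains up to max_batch messages; queue.Queue stands in for a real MQ broker."""

    def __init__(self, source: "queue.Queue[dict]", max_batch: int = 100) -> None:
        self.source = source
        self.max_batch = max_batch

    def collect(self) -> Iterator[dict]:
        for _ in range(self.max_batch):
            try:
                yield self.source.get_nowait()
            except queue.Empty:
                return  # nothing left to read in this batch


def apply_mapping(record: dict, mapping: dict[str, str]) -> dict:
    """Rename source fields to target fields; unmapped fields are dropped."""
    return {dst: record[src] for src, dst in mapping.items() if src in record}


def run_collector(collector: DataCollector, mapping: dict[str, str], target: list[dict]) -> int:
    """Collect records, apply the mapping rule, and append them to the target."""
    written = 0
    for record in collector.collect():
        target.append(apply_mapping(record, mapping))
        written += 1
    return written


if __name__ == "__main__":
    mq: "queue.Queue[dict]" = queue.Queue()
    mq.put({"pid": 1, "hr": 72, "device": "m1"})
    mq.put({"pid": 2, "hr": 80, "device": "m2"})
    table: list[dict] = []
    n = run_collector(MQInterfaceCollector(mq), {"pid": "patient_id", "hr": "heart_rate"}, table)
    print(n, table)  # 2 [{'patient_id': 1, 'heart_rate': 72}, {'patient_id': 2, 'heart_rate': 80}]
```

Because the collector only has to expose `collect()`, a real-time interface collector or a timed collector could be plugged into the same `run_collector` loop.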
Those skilled in the art will clearly understand that the techniques in the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code, and includes several instructions for causing a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention.
Identical or similar parts of the embodiments in this specification may be understood by reference to one another. In particular, the terminal embodiment is substantially similar to the method embodiment, so its description is relatively brief; for relevant details, refer to the description of the method embodiment.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and other divisions are possible in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connections between systems or modules may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
Although the present invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present invention is not limited thereto. Those skilled in the art may make various equivalent modifications or substitutions to the embodiments of the present invention without departing from the spirit and scope of the present invention, and all such modifications and substitutions shall fall within the scope of the present invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A multi-modal data collection method, the method comprising the steps of:
creating an acquisition task;
Configuring a data source and a target database for the acquisition task; the data source is a source for data acquisition, and the target database is a database for storing acquired data;
Creating an acquisition subtask under an acquisition task, and configuring a data mapping rule between the data source and the target database for the created acquisition subtask;
Selecting an applicable task executor for the created acquisition subtasks according to the configured data source and target database, wherein the task executor comprises a plurality of execution engines, and the execution engines are a plurality of ETL tools which are assembled in advance, including but not limited to a DataX tool, a Kettle tool, a Canal tool, a Flume tool, a Sqoop tool and an OGG tool;
and calling a task executor to execute the corresponding acquisition subtask.
2. The method of claim 1, wherein the task executor further comprises a data collector; the data collector is a custom execution engine and a non-ETL tool, comprises a real-time interface collector, a real-time timing collector, and an MQ interface collector, and is configured to collect data that cannot be collected by the execution engines.
3. The method of claim 1, wherein configuring the data source and the target database comprises: configuring a source path, an address, and a port of the data source; and configuring the address, port, and database table information of the target database.
4. The multi-modal data collection method of claim 1, wherein configuring the data mapping rules between the data source and the target database for the created acquisition subtasks further comprises configuring a data collection mode of the data source for each created acquisition subtask, the data collection mode comprising a full data mode, an incremental data mode, and a time period data mode.
5. The method of claim 1, further comprising monitoring the execution status, execution time, execution log, and occupied resources of the acquisition subtask as the task executor executes the acquisition subtask;
The method further comprises counting and displaying the number of successful executions of the acquisition subtasks, the number of failed executions of the acquisition subtasks, and the execution time of the acquisition subtasks.
6. A multi-modal data acquisition system, the system comprising:
the task processing module is used for creating an acquisition task;
The configuration management module is used for configuring a data source and a target database for the acquisition task; the data source is a source for data acquisition, and the target database is a database for storing acquired data;
The task processing module is also used for creating an acquisition subtask under the acquisition task and configuring a data mapping rule between the data source and the target database for the created acquisition subtask;
The task processing module is also used for selecting an applicable task executor for the created acquisition subtask according to the configured data source and target database; the task executor comprises a plurality of execution engines, the execution engines being a plurality of pre-assembled ETL tools including but not limited to a DataX tool, a Kettle tool, a Canal tool, a Flume tool, a Sqoop tool, and an OGG tool; the task executor further comprises a data collector, the data collector being a custom execution engine and a non-ETL tool that comprises a real-time interface collector, a real-time sequence collector, and an MQ interface collector and is used for collecting data that cannot be collected by the execution engines;
and the task processing module is also used for calling the task executor to execute the corresponding acquisition subtasks.
7. The multi-modal data collection system of claim 6 wherein the configuration management module comprises a data source configuration module and a target database configuration module;
the data source configuration module is used for configuring a source path, an address and a port of a data source;
and the target database configuration module is used for configuring the address, the port, and the database table information of the target database.
8. The multi-modal data collection system of claim 6 wherein the task processing module includes a task management module, a task configuration module, a task executor management module, a task monitoring module, and a task display module;
the task management module is used for managing the creation, starting, suspension, resumption, and termination of the acquisition task and the acquisition subtasks;
The task configuration module is used for configuring a data mapping rule between the data source and the target database for the created acquisition subtask, and is also used for configuring a data acquisition mode of the data source for the created acquisition subtasks, the data acquisition mode comprising a full data mode, an incremental data mode, and a time period data mode;
The task executor management module is used for managing the task executors;
The task monitoring module is used for monitoring the execution state, the execution time, the execution log and the occupied resources of the acquisition subtask when the task executor executes the acquisition subtask;
the task display module is used for counting and displaying the number of successful executions of the acquisition subtasks, the number of failed executions of the acquisition subtasks, and the execution time of the acquisition subtasks.
9. A terminal comprising a memory and a processor;
the memory is used for storing a multi-mode data acquisition program;
A processor for implementing the multi-modality data acquisition method of any one of claims 1-5 when executing the multi-modality data acquisition program.
10. A computer readable storage medium storing a computer program, characterized in that the readable storage medium has stored thereon a multi-modal data collection program, which when executed by a processor implements the multi-modal data collection method of any one of claims 1-5.
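The full, incremental, and time period data modes recited in claims 4 and 8 can be sketched as row filters. The patent does not define how each mode is implemented; this sketch assumes every row carries an `updated_at` timestamp, that incremental collection tracks a stored high-water mark, and that a time period is a half-open interval [start, end). `CollectionMode`, `ModeConfig`, and `select_rows` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class CollectionMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"
    TIME_PERIOD = "time_period"


@dataclass
class ModeConfig:
    mode: CollectionMode
    watermark: Optional[datetime] = None  # newest timestamp already collected (incremental)
    start: Optional[datetime] = None      # inclusive lower bound (time period)
    end: Optional[datetime] = None        # exclusive upper bound (time period)


def select_rows(
    rows: list[dict], cfg: ModeConfig, ts_field: str = "updated_at"
) -> tuple[list[dict], Optional[datetime]]:
    """Return the rows to collect and the watermark to store for the next run."""
    if cfg.mode is CollectionMode.FULL:
        return list(rows), cfg.watermark
    if cfg.mode is CollectionMode.INCREMENTAL:
        picked = [r for r in rows if cfg.watermark is None or r[ts_field] > cfg.watermark]
        # Advance the high-water mark to the newest timestamp just collected.
        new_mark = max((r[ts_field] for r in picked), default=cfg.watermark)
        return picked, new_mark
    if cfg.start is None or cfg.end is None:
        raise ValueError("time period mode needs both start and end")
    return [r for r in rows if cfg.start <= r[ts_field] < cfg.end], cfg.watermark


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": datetime(2024, 4, 1)},
        {"id": 2, "updated_at": datetime(2024, 4, 20)},
    ]
    cfg = ModeConfig(CollectionMode.INCREMENTAL, watermark=datetime(2024, 4, 10))
    picked, mark = select_rows(rows, cfg)
    print([r["id"] for r in picked], mark)  # [2] 2024-04-20 00:00:00
```

In a real deployment the same predicates would more likely be pushed down into the query sent to the data source rather than applied in memory.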
CN202410518256.5A 2024-04-28 2024-04-28 Multi-mode data acquisition method, system, terminal and storage medium Pending CN118093707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410518256.5A CN118093707A (en) 2024-04-28 2024-04-28 Multi-mode data acquisition method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410518256.5A CN118093707A (en) 2024-04-28 2024-04-28 Multi-mode data acquisition method, system, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN118093707A true CN118093707A (en) 2024-05-28

Family

ID=91156665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410518256.5A Pending CN118093707A (en) 2024-04-28 2024-04-28 Multi-mode data acquisition method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN118093707A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111983A (en) * 2014-06-30 2014-10-22 Institute of Information Engineering, Chinese Academy of Sciences (中国科学院信息工程研究所) Open-type multi-source data collection system and method
US20180011912A1 (en) * 2016-07-11 2018-01-11 Al-Elm Information Security Co. Methods and systems for multi-dynamic data retrieval and data disbursement
CN108846076A (en) * 2018-06-08 2018-11-20 Dareway Software Co., Ltd. (山大地纬软件股份有限公司) Massive multi-source ETL processing method and system supporting interface adaptation
CN109582723A (en) * 2018-11-30 2019-04-05 Shenzhen Sidi Information Technology Co., Ltd. (深圳市思迪信息技术股份有限公司) Distributed ETL collection method and device
CN109669983A (en) * 2018-12-27 2019-04-23 Hangzhou Huoshu Technology Co., Ltd. (杭州火树科技有限公司) Visual multi-data-source ETL tool
CN114661665A (en) * 2022-03-01 2022-06-24 Industrial and Commercial Bank of China Ltd. (中国工商银行股份有限公司) Execution engine determination method, model training method and device
CN114880387A (en) * 2022-05-07 2022-08-09 Bank of China Ltd. (中国银行股份有限公司) Data integration script generation method and device, storage medium and electronic equipment
CN116028192A (en) * 2023-03-29 2023-04-28 CETC Big Data Research Institute Co., Ltd. (中电科大数据研究院有限公司) Multi-source heterogeneous data acquisition method, device and storage medium
JP2023146397A (en) * 2022-03-29 2023-10-12 Topcon Corporation (株式会社トプコン) Data management system, management method, and management program
CN117331923A (en) * 2023-09-28 2024-01-02 North Health Medical Big Data Technology Co., Ltd. (北方健康医疗大数据科技有限公司) Medical data platform data aggregation mapping system and terminal

Non-Patent Citations (2)

Title
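Lin Kun: "Research and Implementation of a Data-Warehouse-Oriented ETL Tool", Computing Technology and Automation (计算技术与自动化), no. 01, 15 March 2018 (2018-03-15) *
Chi Yanqing et al.: "Implementation of Cross-Platform Distributed Real-Time Data Acquisition Technology Based on Big Data", Information & Computer (信息与电脑), no. 24, 25 December 2019 (2019-12-25) *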

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination