CN117931809A

CN117931809A - Digitalization-based data lineage generation method and system

Info

Publication number: CN117931809A
Application number: CN202410222131.8A
Authority: CN
Inventors: 彭云苹; 于强; 朱乐乐
Original assignee: Shanghai Kuanrui Information Technology Co ltd
Current assignee: Shanghai Kuanrui Information Technology Co ltd
Priority date: 2024-02-28
Filing date: 2024-02-28
Publication date: 2024-04-26

Abstract

The invention provides a digitalization-based data lineage generation method and system. The method comprises the following steps: recording basic characteristics of a data source file in a platform acquisition information table; recording storage information of the data source file in a platform file storage information table by filling in first configuration information; connecting a platform database access information table to an upstream source database by filling in second configuration information, and obtaining data from the upstream source database; having a platform API call information table access and call a specified API interface by filling in third configuration information, obtaining data from the API interface and storing it in a specified table of a specified database; establishing ods layer tables according to the content of the stored platform files, a database access table dictionary and an API access table dictionary; and associating multiple ods layer tables into a dw layer table. The method and system make it possible to obtain the upstream and downstream relationships of data quickly, greatly reduce the development error rate and save time.

Description

Digitalization-based data lineage generation method and system

Technical Field

The invention relates to the technical field of database management, and in particular to a digitalization-based data lineage generation method and system.

Background

Data lineage techniques currently on the market typically derive field-level lineage relationships from database query SQL (Structured Query Language). The invention patent with application number CN202210878726.X, titled "a database query SQL field lineage relationship generating method", discloses such a method: an SQL lineage analyzer is first constructed and the relevant SQL statements are input; the statements are parsed to obtain a set of field expressions and a set of query expressions; the field expressions are processed and checked for connections; all connected expressions of the tables are traversed to obtain a mapping relationship; and the data lineage result is obtained from that mapping.

However, this data lineage technique is narrowly applicable, its test process runs for a long time, it is complex to operate, and it is inefficient.

It is therefore desirable to provide a digitalization-based data lineage generation method and system that can solve the above problems.

Disclosure of Invention

In view of the problems and shortcomings of the prior art, the invention provides a digitalization-based data lineage generation method and system.

The invention solves the technical problem through the following technical solution:

The invention provides a digitalization-based data lineage generation method, comprising the following steps:

recording basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;

recording storage information of the data source file in a platform file storage information table by filling in first configuration information, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID;

connecting a platform database access information table to an upstream source database by filling in second configuration information, obtaining data from the upstream source database and storing it in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID;

having a platform API call information table access and call a specified API interface by filling in third configuration information, obtaining data from the API interface and storing it in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;

establishing an ods layer table according to the content of the stored platform files, a database access table dictionary and an API access table dictionary, and storing the upstream source data in a database, wherein the ods layer table and the platform file storage information table are associated through the storage database name and the storage table name to establish a lineage relationship; and

associating a plurality of ods layer tables into a dw layer table, wherein the association relationships between the dw layer table and the plurality of ods layer tables are stored in a program function.
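As a rough illustration of how the acquisition table and the file storage table might be joined into lineage edges, the sketch below models the two tables as Python dataclasses. The class names, the field names and the matching rule (a server file path lying under the acquisition's file storage path) are assumptions made for this example; the text only states that the two tables are associated through the file storage path and the storage table name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionRecord:
    """One row of the platform acquisition information table (hypothetical model)."""
    acquisition_id: str
    acquisition_name: str
    file_storage_path: str


@dataclass(frozen=True)
class FileStorageRecord:
    """One row of the platform file storage information table (hypothetical model)."""
    config_id: str
    file_id: str
    server_file_path: str
    storage_database: str
    storage_table: str


def link_acquisition_to_storage(acquisitions, storages):
    """Return (upstream, downstream) lineage edges.

    Assumption: a storage record belongs to an acquisition when its server
    file path lies under the acquisition's file storage path.
    """
    edges = []
    for acq in acquisitions:
        prefix = acq.file_storage_path.rstrip("/") + "/"
        for st in storages:
            if st.server_file_path.startswith(prefix):
                edges.append((
                    f"acquisition:{acq.acquisition_id}",
                    f"{st.storage_database}.{st.storage_table}",
                ))
    return edges
```

Each returned pair reads as "the acquisition on the left feeds the stored table on the right"; a real platform would more likely express the same join as a query over the two metadata tables.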

Preferably, the data source file is obtained through a third party API interface or the data source file is obtained through a third party platform.

Preferably, the data source file is obtained through a database account number provided by a third party, and a database of the third party is extracted to a local database.

Preferably, a view name and a creation statement are obtained from the data source file, and the characters of the creation statement are segmented to obtain a first keyword and a second keyword; names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword are recorded as downstream tables of the data source file. The lineage relationship is compared with the upstream tables and the downstream tables: the data source file is judged to be a first type table if it has no upstream table, and a second type table if it has an upstream table.
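The keyword-based segmentation can be sketched as follows. The text does not say which SQL keywords play the roles of the first and second keyword, so this example assumes `FROM`/`JOIN` for upstream names and `VIEW` for the downstream name; the function names are likewise hypothetical. It is a simplified tokenizer, not a full SQL parser (string literals, comments and quoted identifiers are not handled).

```python
import re

# Assumed keyword roles; the source does not name the actual keywords.
UPSTREAM_KEYWORDS = {"FROM", "JOIN"}   # "first keyword"
DOWNSTREAM_KEYWORDS = {"VIEW"}         # "second keyword"


def parse_view_lineage(create_statement):
    """Split a view creation statement into tokens and collect the names
    that directly follow the upstream and downstream keywords."""
    tokens = re.findall(r"[A-Za-z_][\w.]*|[(),]", create_statement)
    upstream, downstream = [], []
    for keyword, name in zip(tokens, tokens[1:]):
        if not re.match(r"[A-Za-z_]", name):
            continue  # e.g. "FROM (" introduces a subquery, not a table name
        kw = keyword.upper()
        if kw in UPSTREAM_KEYWORDS and name not in upstream:
            upstream.append(name)
        elif kw in DOWNSTREAM_KEYWORDS and name not in downstream:
            downstream.append(name)
    return upstream, downstream


def classify(upstream_tables):
    """First type: no upstream table; second type: at least one upstream table."""
    return "second type" if upstream_tables else "first type"
```

For a statement such as `CREATE VIEW dw.v_quote AS SELECT ... FROM ods.t_a a JOIN ods.t_b b ON ...`, the sketch records `ods.t_a` and `ods.t_b` as upstream and `dw.v_quote` as downstream.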

Preferably, if a table is recorded as a first type table but is absent from the data source file, the lineage relationship no longer exists, and the first type table is deleted;

if the first type table and the second type table are duplicated, the lineage relationship has changed, and the first type table or the second type table is deleted;

if a table is present in the data source file but in neither the first type tables nor the second type tables, it is a new table and is added.
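These three reconciliation rules amount to set comparisons between the recorded lineage and the tables currently found in the data source file. The following is a minimal sketch under that reading; the function name and the returned keys are hypothetical, and which side of a duplicated pair is deleted is left to the caller because the text allows either.

```python
def reconcile(first_type, second_type, source_tables):
    """Compare recorded lineage tables with the tables found in the source.

    first_type / second_type: sets of table names already recorded.
    source_tables: set of table names present in the data source file.
    """
    recorded = first_type | second_type
    return {
        # recorded as first type but gone from the source: lineage no longer exists
        "stale": sorted(first_type - source_tables),
        # present in both types: the lineage relationship has changed
        "changed": sorted(first_type & second_type),
        # in the source but not recorded anywhere: a new table to add
        "new": sorted(source_tables - recorded),
    }
```

For example, with first type tables `{a, b, c}`, second type tables `{c, d}` and source tables `{b, c, d, e}`, table `a` is stale, `c` has changed and `e` is new.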

The invention also provides a digitalization-based data lineage generation system, comprising:

a basic feature acquisition module, configured to record basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;

a first configuration module, configured to record storage information of the data source file in a platform file storage information table by filling in first configuration information, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID;

a second configuration module, configured to connect a platform database access information table to an upstream source database by filling in second configuration information, obtain data from the upstream source database and store it in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID;

a third configuration module, configured to have a platform API call information table access and call a specified API interface by filling in third configuration information, obtain data from the API interface and store it in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;

an ods layer table construction module, configured to establish an ods layer table according to the content of the stored platform files, a database access table dictionary and an API access table dictionary, and to store the upstream source data in a database, wherein the ods layer table and the platform file storage information table are associated through the storage database name and the storage table name to establish a lineage relationship; and

a dw layer table construction module, configured to associate a plurality of ods layer tables into a dw layer table, wherein the association relationships between the dw layer table and the ods layer tables are stored in a program function.

Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:

In the digitalization-based data lineage generation method and system provided by the embodiments of the invention, basic characteristics of a data source file are recorded in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID; storage information of the data source file is recorded in a platform file storage information table by filling in first configuration information, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID; a platform database access information table is connected to an upstream source database by filling in second configuration information, and data from the upstream source database is obtained and stored in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID; a platform API call information table accesses and calls a specified API interface by filling in third configuration information, and data from the API interface is obtained and stored in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID; an ods layer table is established according to the content of the stored platform files, a database access table dictionary and an API access table dictionary, and the upstream source data is stored in a database, wherein the ods layer table and the platform file storage information table are associated through the storage database name and the storage table name to establish a lineage relationship; and a plurality of ods layer tables are associated into a dw layer table, with the association relationships between the dw layer table and the ods layer tables stored in a program function. As a result, the upstream and downstream relationships of data are obtained quickly, the development error rate is greatly reduced, and time is saved;

Further, the data source file is obtained through a third party API interface, or the data source file is obtained through a third party platform, or the data source file is obtained through a database account number provided by a third party, and a database of the third party is extracted to a local database, so that the data source file can be comprehensively and rapidly obtained;

Further, a view name and a creation statement are obtained from the data source file, and the characters of the creation statement are segmented to obtain a first keyword and a second keyword; names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword are recorded as downstream tables of the data source file. The lineage relationship is compared with the upstream tables and the downstream tables: the data source file is judged to be a first type table if it has no upstream table, and a second type table if it has an upstream table. If a table is recorded as a first type table but is absent from the data source file, the lineage relationship no longer exists and the first type table is deleted; if the first type table and the second type table are duplicated, the lineage relationship has changed and the first type table or the second type table is deleted; if a table is present in the data source file but in neither the first type tables nor the second type tables, it is a new table and is added. In this way, lineage relationships of tables that no longer exist are deleted promptly, and first type tables whose lineage relationship has changed are deleted promptly.

Drawings

Fig. 1 is a flow chart of a digitalization-based data lineage generation method according to an embodiment of the invention;

Fig. 2 is a schematic structural diagram of a digitalization-based data lineage generation system according to an embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The technical solution of the invention is described in detail below through specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.

In view of the problems in the prior art, as shown in Fig. 1, the invention provides a digitalization-based data lineage generation method, comprising the following steps:

Step S101: recording basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;

Step S102: recording storage information of the data source file in a platform file storage information table by filling in first configuration information, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID;

Step S103: connecting a platform database access information table to an upstream source database by filling in second configuration information, obtaining data from the upstream source database and storing it in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID;

Step S104: having a platform API (Application Programming Interface) call information table access and call a specified API interface by filling in third configuration information, obtaining data from the API interface and storing it in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;

Step S105: establishing an ods (Operational Data Store) layer table according to the content of the stored platform files, a database access table dictionary and an API access table dictionary, and storing the upstream source data in a database, wherein the ods layer table and the platform file storage information table are associated through the storage database name and the storage table name to establish a lineage relationship;

Step S106: associating a plurality of ods layer tables into a dw (Data Warehouse) layer table, wherein the association relationships between the dw layer table and the plurality of ods layer tables are stored in a program function.
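Once steps S101 to S106 have produced their associations, the upstream and downstream relationships of any table can be read off a directed graph. The sketch below is one possible in-memory representation, not the program function the embodiment refers to; the class name, method names and node labels are all hypothetical.

```python
class LineageGraph:
    """Directed graph of (upstream -> downstream) lineage edges."""

    def __init__(self):
        self._parents = {}   # node -> set of direct upstream nodes
        self._children = {}  # node -> set of direct downstream nodes

    def add_edge(self, upstream, downstream):
        self._parents.setdefault(downstream, set()).add(upstream)
        self._children.setdefault(upstream, set()).add(downstream)

    def _walk(self, start, neighbours):
        # Iterative depth-first traversal; `seen` also guards against cycles.
        seen, stack = set(), [start]
        while stack:
            for nxt in neighbours.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def all_upstream(self, node):
        """Every table or file that directly or indirectly feeds `node`."""
        return self._walk(node, self._parents)

    def all_downstream(self, node):
        """Every table that directly or indirectly depends on `node`."""
        return self._walk(node, self._children)
```

A caller would add one edge per association (file to ods table, ods table to dw table) and then query `all_upstream` on a dw table to see everything it depends on.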

Specifically, in step S101, the information fields of the platform acquisition information table further include: ID, whether the file is looped, loop frequency, URL, whether decrypted, identification keyword, name keyword, whether the URL contains a date, call type, basic information parameter, header information, keyword, file naming standard, file storage type, collection website, collection photo, announcement release time, action chain, remark, release time, update time and update ID.

In step S102, the information fields of the platform file storage information table further include: ID, file name, file type, database ID, delimiter, field name row, first data row, last data row, full table limit, release time and update ID.

In step S103, the information fields of the platform database access information table further include: data flag, release time, update ID, valid flag and delete time. The upstream source database is the database selected when the table is created, and the table is created in that database.

In step S104, the information fields of the platform API call information table further include: ID, configuration ID, table ID, URL, API interface code, call parameter information, data source code, data flag, release time, update ID, valid flag and delete time. The specified table of the specified database is the table that the dictionary table associates through the table name intotab eng fn in the file storage configuration table. Through this association the downstream can be linked: the table into which the data is stored is the downstream, and the upstream is the configuration ID of the associated table.

In a specific implementation, the data source file is acquired through a third party API interface or acquired through a third party platform.

In specific implementation, the data source file is obtained through a database account number provided by a third party, and a database of the third party is extracted to a local database.

In a specific implementation, a view name and a creation statement are obtained from the data source file, and the characters of the creation statement are segmented to obtain a first keyword and a second keyword; names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword are recorded as downstream tables of the data source file. The lineage relationship is compared with the upstream tables and the downstream tables: the data source file is judged to be a first type table if it has no upstream table, and a second type table if it has an upstream table.

In a specific implementation, if a table is recorded as a first type table but is absent from the data source file, the lineage relationship no longer exists, and the first type table is deleted.

A basic feature acquisition module 21, configured to record basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;

a first configuration module 22, configured to record storage information of the data source file in a platform file storage information table by filling in first configuration information, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID;

a second configuration module 23, configured to connect a platform database access information table to an upstream source database by filling in second configuration information, obtain data from the upstream source database and store it in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID;

a third configuration module 24, configured to have a platform API call information table access and call a specified API interface by filling in third configuration information, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;

an ods layer table construction module 25, configured to establish an ods layer table according to the content of the stored platform files, a database access table dictionary and an API access table dictionary, and to store the upstream source data in a database, wherein the ods layer table and the platform file storage information table are associated through the storage database name and the storage table name to establish a lineage relationship;

and a dw layer table construction module 26, configured to associate a plurality of ods layer tables into a dw layer table, wherein the association relationships between the dw layer table and the ods layer tables are stored in a program function.

The above solution is further described below, taking the off-exchange tick-by-tick information of convertible bonds on the Shenzhen Stock Exchange as an example.

First, the basic information of the Shenzhen Stock Exchange convertible bond off-exchange tick-by-tick table is filled into the information fields of the platform acquisition information table: acquisition ID (240101), acquisition name (Shenzhen Stock Exchange convertible bond off-exchange tick-by-tick), file storage path (/home/data/mydata/SZ/SZFI/SZFI_OTC_tic).

Then, the basic information of the same table is filled into the information fields of the platform file storage information table: file ID (24010002), file storage path (/home/data/mydata/SZ/SZFI/SZFI_OTC_tick/{YYYYMMDD}-SZFI_OTC_tick_kzz_new.xlsx), storage database (mysql/odssse), storage table English name (Tick_SZFI_OTC). The operational data store is established through a custom table structure, whose information fields include: table code (odssse), table Chinese name (Shenzhen tick-by-tick quotation), database (mysql/odssse), table English name (Tick_SZFI_OTC).
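To make the example configuration concrete, the sketch below resolves the `{YYYYMMDD}` placeholder in the file path for a given day and derives the ods table the file is loaded into. It uses the path with the stray extraction spaces removed, and it assumes that `mysql/odssse` means "engine/schema"; both readings, and the two function names, are assumptions of this example rather than something the text states.

```python
from datetime import date

# Values taken from the example above (spaces introduced by extraction removed).
FILE_PATH_TEMPLATE = (
    "/home/data/mydata/SZ/SZFI/SZFI_OTC_tick/"
    "{YYYYMMDD}-SZFI_OTC_tick_kzz_new.xlsx"
)
STORAGE_DATABASE = "mysql/odssse"
STORAGE_TABLE = "Tick_SZFI_OTC"


def resolve_file_path(template, day):
    """Substitute the {YYYYMMDD} placeholder with the given calendar day."""
    return template.replace("{YYYYMMDD}", day.strftime("%Y%m%d"))


def ods_target(database, table):
    """Build a 'schema.table' name, assuming 'engine/schema' notation."""
    _engine, schema = database.split("/", 1)
    return f"{schema}.{table}"


def file_to_ods_edge(day):
    """One lineage edge: the daily source file feeds the ods layer table."""
    return (
        resolve_file_path(FILE_PATH_TEMPLATE, day),
        ods_target(STORAGE_DATABASE, STORAGE_TABLE),
    )
```

Calling `file_to_ods_edge(date(2024, 2, 28))` pairs the file for 28 February 2024 with `odssse.Tick_SZFI_OTC`.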

The information fields in the API call information table include: table code (dwkr), table Chinese name (bond off-exchange tick-by-tick), database (mysql/dwkr), table English name (bood_otr_outr_quot).

The data source file can be obtained through a third party API interface or obtained through a third party platform, or can be obtained through a database account number provided by a third party, and a database of the third party is extracted to a local database.

In summary, in the digitalization-based data lineage generation method and system provided by the embodiments of the invention, basic characteristics of a data source file are recorded in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID; storage information of the data source file is recorded in a platform file storage information table by filling in first configuration information, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID; a platform database access information table is connected to an upstream source database by filling in second configuration information, and data from the upstream source database is obtained and stored in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID; a platform API call information table accesses and calls a specified API interface by filling in third configuration information, and data from the API interface is obtained and stored in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID; an ods layer table is established according to the content of the stored platform files, a database access table dictionary and an API access table dictionary, and the upstream source data is stored in a database, wherein the ods layer table and the platform file storage information table are associated through the storage database name and the storage table name to establish a lineage relationship; and a plurality of ods layer tables are associated into a dw layer table, with the association relationships between the dw layer table and the ods layer tables stored in a program function. As a result, the upstream and downstream relationships of data are obtained quickly, the development error rate is greatly reduced, and time is saved.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A digitalization-based data lineage generation method, the method comprising:

2. The digitalization-based data lineage generation method according to claim 1, wherein the data source file is acquired through a third party API interface or the data source file is acquired through a third party platform.

3. The digitalization-based data lineage generation method according to claim 1, wherein the data source file is obtained through a database account number provided by a third party, and a database of the third party is extracted to a local database.

4. The digitalization-based data lineage generation method according to claim 1, wherein a view name and a creation statement are obtained from the data source file, the characters of the creation statement are segmented to obtain a first keyword and a second keyword, names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword are recorded as downstream tables of the data source file; the lineage relationship is compared with the upstream tables and the downstream tables, the data source file is judged to be a first type table if it has no upstream table, and the data source file is judged to be a second type table if it has an upstream table.

5. The digitalization-based data lineage generation method according to claim 4, wherein if a table is recorded as a first type table but is absent from the data source file, the lineage relationship no longer exists, and the first type table is deleted;

6. A digitalization-based data lineage generation system, the system comprising:

7. The digitalization-based data lineage generation system according to claim 1, wherein the data source file is acquired through a third party API interface or the data source file is acquired through a third party platform.

8. The digitalization-based data lineage generation system according to claim 1, wherein the data source file is acquired through a database account number provided by a third party, and a database of the third party is extracted to a local database.

9. The digitalization-based data lineage generation system according to claim 1, wherein a view name and a creation statement are obtained from the data source file, the characters of the creation statement are segmented to obtain a first keyword and a second keyword, names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword are recorded as downstream tables of the data source file; the lineage relationship is compared with the upstream tables and the downstream tables, the data source file is judged to be a first type table if it has no upstream table, and the data source file is judged to be a second type table if it has an upstream table.

10. The digitalization-based data lineage generation system according to claim 9, wherein if a table is recorded as a first type table but is absent from the data source file, the lineage relationship no longer exists, and the first type table is deleted;