CN117931809A - Data lineage generation method and system based on digitalization - Google Patents


Info

Publication number
CN117931809A
Authority
CN
China
Prior art keywords
file
data source
platform
database
source file
Prior art date
Legal status
Pending
Application number
CN202410222131.8A
Other languages
Chinese (zh)
Inventor
彭云苹
于强
朱乐乐
Current Assignee
Shanghai Kuanrui Information Technology Co ltd
Original Assignee
Shanghai Kuanrui Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Kuanrui Information Technology Co ltd
Priority to CN202410222131.8A
Publication of CN117931809A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/25 Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a digitalization-based data lineage generation method and system. The method comprises: recording the basic characteristics of a data source file in a platform acquisition information table; recording the ingestion information of the data source file in a platform file ingestion information table by filling in first configuration information; connecting a platform database access information table to an upstream source database by filling in second configuration information, and obtaining the data in the upstream source database; by filling in third configuration information, having a platform API call information table access and call a specified API interface, obtaining the data from the API interface and storing it in a specified table of a specified database; establishing ods layer tables according to the content of the files ingested by the platform, a database access table dictionary and an API access table dictionary; and associating multiple ods layer tables into dw layer tables. The digitalization-based data lineage generation method and system can quickly obtain the upstream and downstream relationships of data, greatly reduce the development error rate and save time.

Description

Data lineage generation method and system based on digitalization
Technical Field
The invention relates to the technical field of database management, and in particular to a digitalization-based data lineage generation method and system.
Background
Current data lineage techniques on the market typically obtain SQL (Structured Query Language) field lineage relationships by querying the database. The invention patent with application number CN202210878726.X, entitled "Method for generating SQL field lineage relationships of database queries", discloses such a method: first, an SQL lineage analyzer is constructed and the relevant SQL statements are input; the SQL statements are parsed to obtain the field expression set and the query expression set of the result; the field expressions are processed and checked for connections; all connected expressions of the tables are traversed to obtain a mapping relationship; and the data lineage result is thereby obtained.
However, this data lineage technique has limited applicability: the test process runs for a long time, the operation is complex, and efficiency is low.
It is therefore desirable to provide a digitalization-based data lineage generation method and system that solve the above problems.
Disclosure of Invention
In view of the problems and shortcomings of the prior art, the invention provides a digitalization-based data lineage generation method and system.
The invention solves the above technical problems through the following technical solution:
The invention provides a digitalization-based data lineage generation method, comprising the following steps:
Recording basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table include at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;
recording, by filling in first configuration information, the ingestion information of the data source file in a platform file ingestion information table, wherein the information fields of the platform file ingestion information table include at least a first configuration ID, a file ID, a server file path, an ingestion database name and an ingestion table name; the platform file ingestion information table and the platform acquisition information table are associated through the file storage path and the ingestion table name, establishing a lineage relationship; and the platform file ingestion information table is associated with a file ingestion scheduler through the first configuration ID;
connecting, by filling in second configuration information, a platform database access information table to an upstream source database, obtaining the data in the upstream source database and storing it in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID;
accessing and calling, by filling in third configuration information, a specified API interface through a platform API call information table, obtaining the data from the API interface and storing it in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;
establishing ods layer tables according to the content of the files ingested by the platform, a database access table dictionary and an API access table dictionary, and storing the upstream data in a database, wherein each ods layer table is associated with the platform file ingestion information table through the ingestion database name and the ingestion table name, establishing a lineage relationship;
and associating multiple ods layer tables into a dw layer table, wherein the association relationship between the dw layer table and the multiple ods layer tables is stored in a program function.
Preferably, the data source file is obtained through a third-party API interface or through a third-party platform.
Preferably, the data source file is obtained by using a database account provided by a third party to extract the third party's database into a local database.
Preferably, a view name and a creation statement are obtained from the data source file; the characters of the creation statement are split to obtain a first keyword and a second keyword; the name following the first keyword is recorded as an upstream table of the data source file, and the name following the second keyword is recorded as a downstream table of the data source file; the upstream and downstream tables are compared with the lineage relationships, and the data source file is classified as a first-type table if it has no upstream table and as a second-type table if it has an upstream table.
Preferably, if a table is present among the first-type tables but absent from the data source files, its lineage relationship no longer exists, and the first-type table is deleted;
if a table appears among both the first-type tables and the second-type tables, its lineage relationship has changed, and the first-type or the second-type entry is deleted;
if a table is present in the data source files but among neither the first-type nor the second-type tables, it is a new table and is added.
The invention also provides a digitalization-based data lineage generation system, comprising:
A basic feature acquisition module, used to record basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table include at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;
A first configuration module, used to record, by filling in first configuration information, the ingestion information of the data source file in a platform file ingestion information table, wherein the information fields of the platform file ingestion information table include at least a first configuration ID, a file ID, a server file path, an ingestion database name and an ingestion table name; the platform file ingestion information table and the platform acquisition information table are associated through the file storage path and the ingestion table name, establishing a lineage relationship; and the platform file ingestion information table is associated with a file ingestion scheduler through the first configuration ID;
A second configuration module, used to connect a platform database access information table to an upstream source database by filling in second configuration information, obtain the data in the upstream source database and store it in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID;
A third configuration module, used to access and call a specified API interface through a platform API call information table by filling in third configuration information, obtain the data from the API interface and store it in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;
An ods layer table construction module, used to establish ods layer tables according to the content of the files ingested by the platform, the database access table dictionary and the API access table dictionary, and to store the upstream data in a database, wherein each ods layer table is associated with the platform file ingestion information table through the ingestion database name and the ingestion table name, establishing a lineage relationship;
And a dw layer table construction module, used to associate multiple ods layer tables into a dw layer table, wherein the association relationship between the dw layer table and the ods layer tables is stored in a program function.
Preferably, the data source file is obtained through a third-party API interface or through a third-party platform.
Preferably, the data source file is obtained by using a database account provided by a third party to extract the third party's database into a local database.
Preferably, a view name and a creation statement are obtained from the data source file; the characters of the creation statement are split to obtain a first keyword and a second keyword; the name following the first keyword is recorded as an upstream table of the data source file, and the name following the second keyword is recorded as a downstream table of the data source file; the upstream and downstream tables are compared with the lineage relationships, and the data source file is classified as a first-type table if it has no upstream table and as a second-type table if it has an upstream table.
Preferably, if a table is present among the first-type tables but absent from the data source files, its lineage relationship no longer exists, and the first-type table is deleted;
if a table appears among both the first-type tables and the second-type tables, its lineage relationship has changed, and the first-type or the second-type entry is deleted;
if a table is present in the data source files but among neither the first-type nor the second-type tables, it is a new table and is added.
Compared with the prior art, the technical solutions of the embodiments of the invention have the following beneficial effects:
In the digitalization-based data lineage generation method and system provided by the embodiments of the invention, the basic characteristics of a data source file are recorded in a platform acquisition information table, whose information fields include at least an acquisition ID, an acquisition name and a file storage path; the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID. By filling in first configuration information, the ingestion information of the data source file is recorded in a platform file ingestion information table, whose information fields include at least a first configuration ID, a file ID, a server file path, an ingestion database name and an ingestion table name; this table is associated with the platform acquisition information table through the file storage path and the ingestion table name, establishing a lineage relationship, and is associated with a file ingestion scheduler through the first configuration ID. By filling in second configuration information, which comprises a second configuration ID, a platform database access information table is connected to an upstream source database, and the data in that database are obtained and stored in a specified table of a specified database; the platform database access information table is associated with a database access scheduler through the second configuration ID. By filling in third configuration information, which comprises a third configuration ID, a platform API call information table accesses and calls a specified API interface, and the data from the interface are obtained and stored in a specified table of a specified database; the platform API call information table is associated with an API access scheduler through the third configuration ID. Ods layer tables are established according to the content of the files ingested by the platform, a database access table dictionary and an API access table dictionary, the upstream data are stored in a database, and each ods layer table is associated with the platform file ingestion information table through the ingestion database name and the ingestion table name, establishing a lineage relationship. Multiple ods layer tables are associated into dw layer tables, and the association relationships between the dw layer tables and the ods layer tables are stored in a program function. As a result, the upstream and downstream relationships of data can be obtained quickly, the development error rate is greatly reduced, and time is saved;
Further, the data source file is obtained through a third-party API interface, through a third-party platform, or by using a database account provided by a third party to extract the third party's database into a local database, so that data source files can be obtained comprehensively and quickly;
Further, a view name and a creation statement are obtained from the data source file; the characters of the creation statement are split to obtain a first keyword and a second keyword; the name following the first keyword is recorded as an upstream table of the data source file and the name following the second keyword as a downstream table; the upstream and downstream tables are compared with the lineage relationships, and the data source file is classified as a first-type table if it has no upstream table and as a second-type table if it has one. If a table is present among the first-type tables but absent from the data source files, its lineage relationship no longer exists and the first-type table is deleted; if a table appears among both the first-type and second-type tables, its lineage relationship has changed and the first-type or second-type entry is deleted; if a table is present in the data source files but among neither the first-type nor the second-type tables, it is a new table and is added. In this way, the lineage relationships of tables that no longer exist are deleted promptly, and first-type entries whose lineage relationship has changed are deleted promptly.
Drawings
FIG. 1 is a flow chart of a digitalization-based data lineage generation method according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a digitalization-based data lineage generation system according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical solution of the invention is described in detail below through specific embodiments. The following embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
In view of the problems of the prior art, as shown in FIG. 1, the invention provides a digitalization-based data lineage generation method, comprising the following steps:
Step S101: recording basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table include at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;
Step S102: recording, by filling in first configuration information, the ingestion information of the data source file in a platform file ingestion information table, wherein the information fields of the platform file ingestion information table include at least a first configuration ID, a file ID, a server file path, an ingestion database name and an ingestion table name; the platform file ingestion information table and the platform acquisition information table are associated through the file storage path and the ingestion table name, establishing a lineage relationship; and the platform file ingestion information table is associated with a file ingestion scheduler through the first configuration ID;
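The association in steps S101 and S102 can be pictured as matching rows of the two tables and emitting one lineage edge per match. The sketch below is a minimal illustration under stated assumptions, not the method as filed: the class and field names are hypothetical, only a few fields are modelled, and it assumes an ingestion row belongs to an acquisition row when its server file path lies under the acquisition's file storage path (the description names the fields used for the association but not the exact matching rule).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionInfo:
    """Minimal row of the platform acquisition information table (hypothetical names)."""
    acquisition_id: str      # also the key of the acquisition task scheduler
    acquisition_name: str
    file_storage_path: str


@dataclass(frozen=True)
class IngestionInfo:
    """Minimal row of the platform file ingestion information table (hypothetical names)."""
    config_id: str           # first configuration ID, key of the file ingestion scheduler
    file_id: str
    server_file_path: str
    ingestion_db: str
    ingestion_table: str


def link_acquisition_to_ingestion(acquisitions, ingestions):
    """Return lineage edges (acquisition_id, "db.table").

    Assumption: an ingestion row belongs to an acquisition row when its server
    file path lies under the acquisition's file storage path.
    """
    edges = []
    for acq in acquisitions:
        # Trailing slash prevents "/data/src" from matching "/data/srcX/...".
        prefix = acq.file_storage_path.rstrip("/") + "/"
        for ing in ingestions:
            if ing.server_file_path.startswith(prefix):
                edges.append((acq.acquisition_id, f"{ing.ingestion_db}.{ing.ingestion_table}"))
    return edges
```

The resulting edge points from the acquisition record to the table the file is loaded into, which is the first hop of the lineage chain.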
Step S103: connecting, by filling in second configuration information, a platform database access information table to an upstream source database, obtaining the data in the upstream source database and storing it in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with a database access scheduler through the second configuration ID;
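Step S103 amounts to a scheduled extract-and-load job keyed by the second configuration ID. A minimal sketch, assuming both databases are reachable through Python's DB-API; the standard-library sqlite3 module stands in for the real upstream and local databases, and DbAccessConfig, run_db_access, register and SCHEDULER are hypothetical names. Table names are interpolated into SQL, so they must come from trusted configuration, never from user input.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class DbAccessConfig:
    config_id: str      # second configuration ID; key of the database access scheduler
    source_table: str   # table in the upstream source database
    target_table: str   # specified table in the specified (local) database


SCHEDULER = {}          # hypothetical scheduler registry: config ID -> job


def run_db_access(cfg: DbAccessConfig, source: sqlite3.Connection,
                  target: sqlite3.Connection) -> int:
    """Copy every row of cfg.source_table into cfg.target_table; return the row count."""
    cur = source.execute(f"SELECT * FROM {cfg.source_table}")
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    # Untyped columns are legal in SQLite; a real warehouse would use the table dictionary.
    target.execute(f"CREATE TABLE IF NOT EXISTS {cfg.target_table} ({col_list})")
    target.executemany(
        f"INSERT INTO {cfg.target_table} ({col_list}) VALUES ({placeholders})", rows
    )
    target.commit()
    return len(rows)


def register(cfg: DbAccessConfig, source, target) -> None:
    """Associate the job with the scheduler through the second configuration ID."""
    SCHEDULER[cfg.config_id] = lambda: run_db_access(cfg, source, target)
```

A scheduler would then trigger `SCHEDULER["C2"]()` on its own timetable; the config ID is the only link between the information table row and the job.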
Step S104: by filling in third configuration information, a platform API (Application Programming Interface) call information table accesses and calls a specified API interface, and the data from the API interface are obtained and stored in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;
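Step S104 can be sketched the same way for an HTTP API. This is an illustration only: it assumes the endpoint returns a JSON array of flat objects, uses the standard library's urllib for the default fetch, and lets a fetch function be injected so the loader can be exercised without a network. ApiCallConfig, http_get_json and run_api_call are hypothetical names, and sqlite3 again stands in for the specified database.

```python
import json
import sqlite3
import urllib.parse
import urllib.request
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ApiCallConfig:
    config_id: str                               # third configuration ID; API scheduler key
    url: str                                     # endpoint from the API call information table
    target_table: str                            # specified table in the specified database
    params: dict = field(default_factory=dict)   # call parameter information


def http_get_json(url: str):
    """Default fetcher: GET the URL and decode a JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def run_api_call(cfg: ApiCallConfig, target: sqlite3.Connection, fetch=http_get_json) -> int:
    """Call the API and store its records; assumes a JSON array of flat objects."""
    query = urllib.parse.urlencode(cfg.params)
    records = fetch(f"{cfg.url}?{query}" if query else cfg.url)
    if not records:
        return 0
    cols = list(records[0])                      # column order taken from the first record
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    target.execute(f"CREATE TABLE IF NOT EXISTS {cfg.target_table} ({col_list})")
    target.executemany(
        f"INSERT INTO {cfg.target_table} ({col_list}) VALUES ({placeholders})",
        [tuple(rec.get(c) for c in cols) for rec in records],
    )
    target.commit()
    return len(records)
```

Injecting `fetch` keeps the storage logic independent of how the third-party interface is actually reached (authentication, paging and retries are left out).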
Step S105: establishing ods (Operational Data Store) layer tables according to the content of the files ingested by the platform, a database access table dictionary and an API access table dictionary, and storing the upstream data in a database, wherein each ods layer table is associated with the platform file ingestion information table through the ingestion database name and the ingestion table name, establishing a lineage relationship;
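The association in step S105 is a match on the pair (ingestion database name, ingestion table name). A minimal sketch with hypothetical names: ingestion rows are plain tuples, and the configuration IDs CFG1 and CFG2 are invented for illustration, while mysql/odssse and Tick_SZFI_OTC are the database and table names used in the worked example of this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OdsTable:
    db_name: str       # database holding the ods layer table
    table_name: str    # ods layer table name


def link_ingestion_to_ods(ingestion_rows, ods_tables):
    """Return lineage edges (config_id, "db.table") for ingestion rows that land in an ods table.

    ingestion_rows: iterable of (first configuration ID, ingestion db name, ingestion table name).
    """
    ods_keys = {(t.db_name, t.table_name) for t in ods_tables}
    return [
        (config_id, f"{db}.{table}")
        for config_id, db, table in ingestion_rows
        if (db, table) in ods_keys
    ]
```

Using a set of (database, table) keys makes each lookup constant-time, so the join stays cheap even with many ingestion configurations.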
Step S106: associating multiple ods layer tables into a dw (Data Warehouse) layer table, wherein the association relationship between the dw layer table and the multiple ods layer tables is stored in a program function.
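Once the steps above yield (upstream, downstream) edges, the question "where does this dw table come from?" becomes a graph walk. The sketch assumes a hypothetical in-memory edge list with invented node labels; it does not reproduce the program function in which the description stores the dw-to-ods association.

```python
from collections import deque


def build_upstream_index(edges):
    """Map each node to the set of its direct upstream nodes."""
    index = {}
    for upstream, downstream in edges:
        index.setdefault(downstream, set()).add(upstream)
    return index


def all_upstream(index, node):
    """Breadth-first walk returning every transitive upstream node of `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for parent in index.get(queue.popleft(), ()):
            if parent not in seen:   # the seen-set also keeps the walk finite on cycles
                seen.add(parent)
                queue.append(parent)
    return seen
```

For example, with edges from an acquisition and an API source into two ods tables that both feed `dw.x`, `all_upstream(index, "dw.x")` returns both ods tables and both original sources.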
Specifically, in step S101, the information fields of the platform acquisition information table further include: ID, whether the file is looped, loop frequency, URL, whether decrypted, identification keyword, name keyword, whether the URL contains a date, call type, basic information parameters, header information, keyword, file naming convention, file storage type, collection website, collection photo, announcement release time, action chain, remark, release time, update time and update ID.
In step S102, the information fields of the platform file ingestion information table further include: ID, file name, file type, database ID, delimiter, field name row, first data row, last data row, full table limit, release time and update ID.
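The delimiter, field name row, first data row and last data row fields describe how a loader cuts a delimited file into records. A minimal sketch using the standard csv module, assuming 1-based row numbers, an inclusive last data row, and a missing last row meaning "to the end of the file"; FileLayout and parse_with_layout are hypothetical names. It covers delimited text only; spreadsheet files such as the .xlsx file in the worked example would need a spreadsheet reader instead.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileLayout:
    delimiter: str                       # "delimiter" field
    header_row: int                      # 1-based line holding the field names
    first_data_row: int                  # 1-based first data line
    last_data_row: Optional[int] = None  # 1-based, inclusive; None means end of file


def parse_with_layout(text: str, layout: FileLayout):
    """Return the data rows as dicts keyed by the header row's field names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=layout.delimiter))
    header = rows[layout.header_row - 1]
    end = layout.last_data_row if layout.last_data_row is not None else len(rows)
    return [dict(zip(header, row)) for row in rows[layout.first_data_row - 1:end]]
```

Setting the last data row lets the loader skip trailing summary lines (such as a "total" line) that many exported reports append.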
In step S103, the information fields of the platform database access information table further include: data flag, release time, update ID, valid flag and deletion time. The upstream source database is the database selected when the table is created, and the table is created in that database.
In step S104, the information fields of the platform API call information table further include: ID, configuration ID, table ID, URL, API interface code, call parameter information, data source code, data flag, release time, update ID, valid flag and deletion time. Specifically, the specified table of the specified database is the table that the dictionary table associates with the table name recorded in the intotab eng fn field of the file ingestion configuration table; through this association the downstream can be linked: the table recorded in the configuration table is the downstream, and the upstream is the configuration ID of the associated table.
In a specific implementation, the data source file is obtained through a third-party API interface or through a third-party platform.
In a specific implementation, the data source file is obtained by using a database account provided by a third party to extract the third party's database into a local database.
In a specific implementation, a view name and a creation statement are obtained from the data source file; the characters of the creation statement are split to obtain a first keyword and a second keyword; the name following the first keyword is recorded as an upstream table of the data source file, and the name following the second keyword is recorded as a downstream table of the data source file; the upstream and downstream tables are compared with the lineage relationships, and the data source file is classified as a first-type table if it has no upstream table and as a second-type table if it has an upstream table.
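The description does not name the first and second keywords. The sketch below assumes FROM and JOIN mark upstream tables and that the name after VIEW or TABLE (as in CREATE VIEW or CREATE TABLE) is the downstream table; these keyword sets, and the function names, are illustrative assumptions. It uses simple regular-expression tokenization rather than a real SQL parser, so subqueries, CTEs and clauses such as IF NOT EXISTS are not handled.

```python
import re

# Assumed keywords; the description does not specify them.
UPSTREAM_KEYWORDS = {"FROM", "JOIN"}       # "first keyword": source tables follow it
DOWNSTREAM_KEYWORDS = {"VIEW", "TABLE"}    # "second keyword": as in CREATE VIEW / CREATE TABLE
PUNCTUATION = {"(", ")", ",", ";"}


def split_create_statement(sql: str):
    """Return (upstream_tables, downstream_tables) named in a CREATE statement."""
    tokens = re.findall(r'[\w.`"]+|[(),;]', sql)
    upstream, downstream = [], []
    for keyword, name in zip(tokens, tokens[1:]):
        if name in PUNCTUATION:            # e.g. "FROM (" opens a subquery, not a table name
            continue
        name = name.strip('`"')
        key = keyword.upper()
        if key in UPSTREAM_KEYWORDS and name not in upstream:
            upstream.append(name)
        elif key in DOWNSTREAM_KEYWORDS and name not in downstream:
            downstream.append(name)
    return upstream, downstream


def classify(sql: str):
    """First-type table: no upstream table; second-type table: has an upstream table."""
    upstream, downstream = split_create_statement(sql)
    return ("second" if upstream else "first"), upstream, downstream
```

A view built from two ods tables is classified as second-type, while a plain CREATE TABLE with only column definitions has no upstream table and is classified as first-type.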
In a specific implementation, if a table is present among the first-type tables but absent from the data source files, its lineage relationship no longer exists, and the first-type table is deleted;
if a table appears among both the first-type tables and the second-type tables, its lineage relationship has changed, and the first-type or the second-type entry is deleted;
if a table is present in the data source files but among neither the first-type nor the second-type tables, it is a new table and is added.
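The three update rules above map naturally onto set operations. A minimal sketch with a hypothetical function name: for rule 2 it assumes the stale first-type entry is the one dropped (the description allows either), and it reports new tables separately because the description does not say which set they join.

```python
def reconcile(source_tables, first_type, second_type):
    """Apply the three update rules to the stored first/second-type table sets."""
    source = set(source_tables)
    first, second = set(first_type), set(second_type)
    removed = first - source                  # rule 1: lineage no longer exists
    first -= removed
    changed = first & second                  # rule 2: lineage changed
    first -= changed                          # assumption: drop the first-type entry
    new = source - set(first_type) - second   # rule 3: table not seen before
    return {"first": first, "second": second,
            "removed": removed, "changed": changed, "new": new}
```

Each rule is a single set difference or intersection, so a full reconciliation costs time roughly linear in the number of tables.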
The invention also provides a digitalization-based data lineage generation system, comprising:
A basic feature acquisition module 21, used to record basic characteristics of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table include at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and the acquisition task scheduler is associated through the acquisition ID;
A first configuration module 22, used to record, by filling in first configuration information, the ingestion information of the data source file in a platform file ingestion information table, wherein the information fields of the platform file ingestion information table include at least a first configuration ID, a file ID, a server file path, an ingestion database name and an ingestion table name; the platform file ingestion information table and the platform acquisition information table are associated through the file storage path and the ingestion table name, establishing a lineage relationship; and the platform file ingestion information table is associated with a file ingestion scheduler through the first configuration ID;
A second configuration module 23, used to connect to the upstream source database by filling in second configuration information, obtain the data in the upstream source database and store it in a specified table of the specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is associated with the database access scheduler through the second configuration ID;
A third configuration module 24, used to access and call a specified API interface by filling in third configuration information, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is associated with an API access scheduler through the third configuration ID;
An ods layer table construction module 25, used to establish ods layer tables according to the content of the files ingested by the platform, the database access table dictionary and the API access table dictionary, and to store the upstream data in the database, wherein each ods layer table is associated with the platform file ingestion information table through the ingestion database name and the ingestion table name, establishing a lineage relationship;
And a dw layer table construction module 26, used to associate multiple ods layer tables into a dw layer table, wherein the association relationship between the dw layer table and the ods layer tables is stored in a program function.
In a specific implementation, the data source file is obtained through a third-party API interface or through a third-party platform.
In a specific implementation, the data source file is obtained by using a database account provided by a third party to extract the third party's database into a local database.
In a specific implementation, a view name and a creation statement are obtained from the data source file; the characters of the creation statement are split to obtain a first keyword and a second keyword; the name following the first keyword is recorded as an upstream table of the data source file, and the name following the second keyword is recorded as a downstream table of the data source file; the upstream and downstream tables are compared with the lineage relationships, and the data source file is classified as a first-type table if it has no upstream table and as a second-type table if it has an upstream table.
In a specific implementation, if a table is present among the first-type tables but absent from the data source files, its lineage relationship no longer exists, and the first-type table is deleted;
if a table appears among both the first-type tables and the second-type tables, its lineage relationship has changed, and the first-type or the second-type entry is deleted;
if a table is present in the data source files but among neither the first-type nor the second-type tables, it is a new table and is added.
The above scheme is further described below, taking the off-exchange (OTC) tick-by-tick information of convertible bonds on the Shenzhen Stock Exchange as an example.
First, the basic information of the Shenzhen Stock Exchange convertible bond OTC tick-by-tick information table is filled into the information fields of the platform acquisition information table, including: acquisition ID (240101), acquisition name (Shenzhen Stock Exchange convertible bond OTC tick-by-tick), and file storage path (/home/data/mydata/SZ/SZFI/SZFI_OTC_tic).
Then, filling the basic information of the turn-by-turn information table outside the debt field of the deep exchange in the information field in the platform file storage information table, wherein the basic information comprises the following steps: file ID (24010002), file deposit path (/ home/data/mydata/SZ/SZFI/SZFI _OTC_tick/{ YYYYMMDD } -SZFI _OTC_tick_ kzz _new.xlsx), warehouse entry database (mysql/odssse), warehouse entry table English name (Tick_ SZFI _OTC). Establishing an operable data storage through a custom table structure: the information fields in the custom table structure include: form coding (odssse), form chinese name (deep harvest line by line quotation), database (mysql/odssse), form english name (Tick_ SZFI _OTC).
The information fields in the platform API call information table include: table code (dwkr), table Chinese name (bond off-exchange tick-by-tick), database (mysql/dwkr), and table English name (bood_otr_outr_quot).
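The acquisition and file storage rows in this example can be modeled as simple records linked into lineage edges. This is a sketch under stated assumptions: the class, field and function names are hypothetical; the `{YYYYMMDD}` placeholder is expanded with the run date; and an acquisition row is linked to a storage row when the stored file's directory lies under the acquisition directory, since the text says the two tables are associated through the file storage path but does not give the exact matching rule.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AcquisitionRecord:      # one row of the platform acquisition information table
    acquisition_id: str
    acquisition_name: str
    file_path: str            # directory the acquisition task writes into


@dataclass(frozen=True)
class FileStorageRecord:      # one row of the platform file storage information table
    file_id: str
    file_path_template: str   # may contain the {YYYYMMDD} placeholder
    storage_database: str
    storage_table: str


def daily_file_path(rec: FileStorageRecord, day: date) -> str:
    """Expand the {YYYYMMDD} placeholder for one run date."""
    return rec.file_path_template.replace("{YYYYMMDD}", day.strftime("%Y%m%d"))


def lineage_edges(acq: AcquisitionRecord, rec: FileStorageRecord) -> list[tuple[str, str]]:
    """Return (upstream, downstream) edges: acquisition -> file -> storage table."""
    file_dir = PurePosixPath(rec.file_path_template).parent
    acq_dir = PurePosixPath(acq.file_path)
    # Assumed matching rule: the file must sit in or under the acquisition directory.
    if file_dir != acq_dir and acq_dir not in file_dir.parents:
        return []
    return [
        (f"acquisition:{acq.acquisition_id}", f"file:{rec.file_id}"),
        (f"file:{rec.file_id}", f"{rec.storage_database}.{rec.storage_table}"),
    ]
```

For example, expanding the storage path above for 2 January 2024 gives `/home/data/mydata/SZ/SZFI/SZFI_OTC_tick/20240102-SZFI_OTC_tick_kzz_new.xlsx`.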
The data source file can be obtained through a third-party API or a third-party platform, or through a database account provided by a third party, in which case the third party's database is extracted into a local database.
In summary, in the digitization-based data lineage generation method and system provided by the embodiments of the present invention, the basic features of a data source file are recorded in a platform acquisition information table whose information fields include at least an acquisition ID, an acquisition name and a file storage path; the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID. By filling in first configuration information, a platform file storage information table records the storage information of the data source file; its information fields include at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name. The platform file storage information table is associated with the platform acquisition information table through the file storage path and the storage table name, establishing a lineage relationship, and is associated with a file storage scheduler through the first configuration ID. By filling in second configuration information, a platform database access information table connects to an upstream source database, obtains its data and stores them in a specified table of a specified database; the second configuration information comprises a second configuration ID, through which the platform database access information table is connected to a database access scheduler. By filling in third configuration information, a platform API call information table calls a specified API, obtains its data and stores them in a specified table of a specified database; the third configuration information comprises a third configuration ID, through which the platform API call information table is connected to an API access scheduler. ods layer tables are established from the contents of the files stored by the platform, a database access table dictionary and an API access table dictionary, and the upstream source data are stored in a database; the ods layer tables are associated with the platform file storage information table through the storage database name and the storage table name, establishing a lineage relationship. Multiple ods layer tables are associated into dw layer tables, and the association relationships between each dw layer table and its ods layer tables are stored in a program function. As a result, the upstream and downstream relationships of the data can be obtained quickly, the development error rate is greatly reduced, and time is saved;
Further, the data source file can be obtained through a third-party API, through a third-party platform, or through a database account provided by a third party with the third party's database extracted into a local database, so that data source files can be obtained comprehensively and quickly;
Further, a view name and a creation statement are obtained from the data source file, and the characters of the creation statement are split to obtain a first keyword and a second keyword; names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword as its downstream tables. The existing lineage relationships are compared with these upstream and downstream tables: a data source file without an upstream table is classified as a first-type table, and one with an upstream table as a second-type table. If a first-type table exists but its data source file does not, the lineage relationship no longer exists and the first-type table is deleted; if the same table is recorded as both a first-type and a second-type table, the lineage relationship has changed and the first-type or second-type table is deleted; if a data source file exists but is recorded as neither a first-type nor a second-type table, it is a new table and is added. In this way, the lineage relationships of tables that no longer exist are deleted promptly, and the outdated first-type table is deleted promptly when a lineage relationship changes.
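The summary states that keeping the dw/ods associations in a program function lets the upstream and downstream relationships of any table be obtained quickly. A minimal sketch, assuming lineage is held as (upstream, downstream) edges: `dw_associations` plays the role of that program function, and it, the table names `ods.bond_info` and `dw.bond_quote`, and the node labels are all hypothetical. A breadth-first search then lists every ancestor or descendant of a node.

```python
from collections import defaultdict, deque


def dw_associations() -> dict[str, list[str]]:
    """Hypothetical program function holding the dw -> ods associations."""
    return {"dw.bond_quote": ["ods.Tick_SZFI_OTC", "ods.bond_info"]}


def build_graph(edges: list[tuple[str, str]]):
    """Index (upstream, downstream) edges in both directions."""
    up, down = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        up[dst].add(src)
        down[src].add(dst)
    return up, down


def walk(node: str, graph) -> set[str]:
    """Breadth-first search collecting every node reachable from `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Acquisition -> file -> ods edges, plus ods -> dw edges read from the function.
edges = [("acquisition:240101", "file:24010002"),
         ("file:24010002", "ods.Tick_SZFI_OTC")]
edges += [(ods, dw) for dw, ods_tables in dw_associations().items() for ods in ods_tables]
upstream, downstream = build_graph(edges)
```

Keeping both directions indexed means an impact query (what breaks downstream if this file changes) costs the same as a provenance query (where this dw table comes from).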
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A digitization-based data lineage generation method, the method comprising:
recording basic features of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;
recording, by filling in first configuration information, storage information of the data source file in a platform file storage information table, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID;
connecting a platform database access information table to an upstream source database by filling in second configuration information, obtaining the data in the upstream source database and storing them in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is connected to a database access scheduler through the second configuration ID;
calling a specified API through a platform API call information table by filling in third configuration information, obtaining the data from the API and storing them in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is connected to an API access scheduler through the third configuration ID;
establishing ods layer tables according to the contents of the files stored by the platform, a database access table dictionary and an API access table dictionary, storing the upstream source data in a database, and associating the ods layer tables with the platform file storage information table through the storage database name and the storage table name to establish a lineage relationship;
and associating a plurality of ods layer tables into a dw layer table, wherein the association relationships between the dw layer table and the plurality of ods layer tables are stored in a program function.
2. The digitization-based data lineage generation method according to claim 1, wherein the data source file is acquired through a third-party API or through a third-party platform.
3. The digitization-based data lineage generation method according to claim 1, wherein the data source file is obtained using a database account provided by a third party, and the third party's database is extracted into a local database.
4. The digitization-based data lineage generation method according to claim 1, wherein a view name and a creation statement are obtained from the data source file; the characters of the creation statement are split to obtain a first keyword and a second keyword; names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword are recorded as downstream tables of the data source file; and the lineage relationships are compared with the upstream and downstream tables, the data source file being judged a first-type table if it has no upstream table and a second-type table if it has an upstream table.
5. The digitization-based data lineage generation method according to claim 4, wherein:
if a first-type table exists but its data source file does not, the lineage relationship no longer exists, and the first-type table is deleted;
if the same table is recorded as both a first-type table and a second-type table, the lineage relationship has changed, and the first-type or second-type table is deleted;
if the data source file exists but is recorded as neither a first-type nor a second-type table, it is a new table, and the table is added.
6. A digitization-based data lineage generation system, the system comprising:
a basic feature acquisition module, configured to record basic features of a data source file in a platform acquisition information table, wherein the information fields of the platform acquisition information table comprise at least an acquisition ID, an acquisition name and a file storage path, the data source file comprises a table or a view, and an acquisition task scheduler is associated through the acquisition ID;
a first configuration module, configured to record, by filling in first configuration information, storage information of the data source file in a platform file storage information table, wherein the information fields of the platform file storage information table comprise at least a first configuration ID, a file ID, a server file path, a storage database name and a storage table name, the platform file storage information table and the platform acquisition information table are associated through the file storage path and the storage table name to establish a lineage relationship, and the platform file storage information table is associated with a file storage scheduler through the first configuration ID;
a second configuration module, configured to connect a platform database access information table to an upstream source database by filling in second configuration information, and to obtain the data in the upstream source database and store them in a specified table of a specified database, wherein the second configuration information comprises a second configuration ID, and the platform database access information table is connected to a database access scheduler through the second configuration ID;
the platform API call information table, configured to call a specified API by filling in third configuration information, and to obtain the data from the API and store them in a specified table of a specified database, wherein the third configuration information comprises a third configuration ID, and the platform API call information table is connected to an API access scheduler through the third configuration ID;
an ods layer table construction module, configured to construct ods layer tables according to the contents of the files stored by the platform, a database access table dictionary and an API access table dictionary, to store the upstream source data in a database, and to associate the ods layer tables with the platform file storage information table through the storage database name and the storage table name to establish a lineage relationship;
and a dw layer table construction module, configured to associate a plurality of the ods layer tables into a dw layer table, wherein the association relationships between the dw layer table and the ods layer tables are stored in a program function.
7. The digitization-based data lineage generation system according to claim 1, wherein the data source file is acquired through a third-party API or through a third-party platform.
8. The digitization-based data lineage generation system according to claim 1, wherein the data source file is obtained using a database account provided by a third party, and the third party's database is extracted into a local database.
9. The digitization-based data lineage generation system according to claim 1, wherein a view name and a creation statement are obtained from the data source file; the characters of the creation statement are split to obtain a first keyword and a second keyword; names following the first keyword are recorded as upstream tables of the data source file, and names following the second keyword are recorded as downstream tables of the data source file; and the lineage relationships are compared with the upstream and downstream tables, the data source file being judged a first-type table if it has no upstream table and a second-type table if it has an upstream table.
10. The digitization-based data lineage generation system according to claim 9, wherein:
if a first-type table exists but its data source file does not, the lineage relationship no longer exists, and the first-type table is deleted;
if the same table is recorded as both a first-type table and a second-type table, the lineage relationship has changed, and the first-type or second-type table is deleted;
if the data source file exists but is recorded as neither a first-type nor a second-type table, it is a new table, and the table is added.
CN202410222131.8A 2024-02-28 2024-02-28 Data blood margin generation method and system based on digitalization Pending CN117931809A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410222131.8A CN117931809A (en) 2024-02-28 2024-02-28 Data blood margin generation method and system based on digitalization

Publications (1)

Publication Number Publication Date
CN117931809A true CN117931809A (en) 2024-04-26

Family

ID=90750910

Country Status (1)

Country Link
CN (1) CN117931809A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221699A (en) * 2018-11-27 2020-06-02 北京神州泰岳软件股份有限公司 Resource association relationship discovery method and device and electronic equipment
WO2022156087A1 (en) * 2021-01-22 2022-07-28 平安科技(深圳)有限公司 Data blood relationship establishing method and apparatus, computer device, and storage medium
CN116881512A (en) * 2023-06-30 2023-10-13 青岛银行股份有限公司 Cross-system metadata blood-edge automatic analysis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FENFEN GUAN: "The Research of Data Blood Relationship Analysis on Metadata", COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE, 8 February 2019 (2019-02-08), pages 344 *
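王东龙 (Wang Donglong): "基于数据中台的应用数据全链路提速研究" (Research on full-link acceleration of application data based on a data middle platform), 信息技术与信息化 (Information Technology and Informatization), no. 5, 25 May 2023 (2023-05-25), pages 121 - 124 *
王训霞等 (Wang Xunxia et al.): "面向自然资源管理的血缘分析技术研究" (Research on lineage analysis technology for natural resource management), 测绘与空间地理信息 (Geomatics & Spatial Information Technology), vol. 46, no. 7, 25 July 2023 (2023-07-25), pages 68 - 71 *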

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination