CN113392085A - Distributed file batch processing method and platform - Google Patents

Publication number
CN113392085A
Authority
CN
China
Prior art keywords
file
batch
files
check
identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110653768.9A
Other languages
Chinese (zh)
Inventor
丁文定
徐平
伊布拉音江·玉素甫
王金余
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110653768.9A
Publication of CN113392085A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification relate to the technical field of big data and disclose a distributed file batch processing method and a distributed file batch processing platform. The method comprises: after receiving a batch file transmission request, sending a batch file generation instruction to the batch file generation module arranged in each system node specified by the request, so that the module generates batch files and feeds back the generated batch files and their file information to the data exchange platform, the file information comprising at least a file identifier and a file size; after receiving, from every specified system node, a message that batch file generation succeeded, generating a check file for the batch file transmission request based on the file information corresponding to each specified system node; and transmitting the batch files to a downstream system based on the check file. The method can improve the accuracy and integrity of batch file transmission while reducing system resource consumption.

Description

Distributed file batch processing method and platform
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a distributed file batch processing method and a distributed file batch processing platform.
Background
With the rapid development of modern information technology, large enterprise systems have gradually evolved into application systems with a fine-grained and well-defined division of labor, which has given rise to data exchange platforms that handle batch file exchange among an enterprise's many application systems. Meanwhile, as enterprise data volumes grow rapidly, the large mainframe databases traditionally used in the financial industry are being migrated to distributed databases. After such a migration, a complete batch file that a single mainframe database could previously generate is instead made up of batch files produced by multiple databases. And because the number of databases in a distributed deployment is adjusted dynamically, the number and names of the batch files generated by the databases cannot be fixed, so peripheral downstream systems that consume these files cannot easily adapt to changes in the mainframe system.
At present, there are two common solutions. In the first, the migrated distributed system generates the batch files corresponding to its multiple databases and then merges them itself, so that peripheral downstream systems are not affected. However, this requires a data merging device designed specifically for the distributed system; because the data volume is large, resource consumption increases, and when the system is a core business transaction system, the stability of its external services can be affected. In addition, when an enterprise is migrating a large number of systems from a centralized to a distributed architecture, this approach adds to the overall implementation cost and slows the migration. In the second, the files generated by the distributed system are merged by the data exchange platform. In scenarios with massive file exchange, however, this easily makes the data exchange platform a bottleneck, increases the time taken for data exchange, and significantly degrades the platform's processing timeliness. A more efficient and accurate distributed batch file transmission method is therefore needed.
Disclosure of Invention
An object of the embodiments of this specification is to provide a distributed file batch processing method and a distributed file batch processing platform that support data exchange, systematically and at low cost, while an enterprise migrates from a mainframe to a distributed system, and that improve the accuracy and integrity of batch file transmission.
The specification provides a distributed file batch processing method and a distributed file batch processing platform, which are realized in the following modes:
A distributed file batch processing method, applied to a data exchange platform, comprises the following steps: after receiving a batch file transmission request, sending a batch file generation instruction to the batch file generation module arranged in each system node specified by the batch file transmission request, so that the batch file generation module generates the batch file for its system node and feeds back the generated batch file and the file information of the batch file to the data exchange platform, the file information comprising at least a file identifier and a file size; after receiving, from every specified system node, a message that batch file generation succeeded, generating a check file for the batch file transmission request based on the file information corresponding to each specified system node; and transmitting the batch files to a downstream system based on the check file.
In other embodiments, the file identifier at least includes a file name, a system node identifier, and a file generation time.
In other embodiments, the check file and the batch files are of different file types.
In other embodiments, the check file is further configured with a first association relationship between each file identifier and the specified downstream system to which the corresponding batch file is to be transmitted; transmitting the batch files to a downstream system based on the check file includes: for any specified downstream system, extracting from the check file, based on the first association relationship, all file identifiers corresponding to that downstream system to obtain a file identifier set; and transmitting the batch files corresponding to the file identifiers in the file identifier set to the specified downstream system.
In other embodiments, transmitting the batch files corresponding to the file identifiers in the file identifier set to the specified downstream system includes: extracting from the check file the total file size of the batch files corresponding to the file identifiers in the file identifier set; and checking, based on the file identifiers in the set and the total file size, whether all of the corresponding batch files have been transmitted to the specified downstream system.
In other embodiments, the check file further includes a second association relationship between each file identifier and the database table of the specified downstream system into which the corresponding batch file is to be loaded; transmitting the batch files to a downstream system based on the check file includes: once it is determined that all of the batch files have been transmitted to the specified downstream system, loading the batch files corresponding to the file identifiers in the file identifier set into the corresponding database tables based on the second association relationship.
In other embodiments, loading the batch files corresponding to the file identifiers in the file identifier set into the corresponding database tables includes: taking each file identifier in the file identifier set in turn as the specified file identifier, and looking up in the check file the database table configuration information for the table into which the corresponding batch file is to be loaded; and, if the database table configuration information cannot be found, issuing an exception prompt indicating that the table information for the specified file identifier is undefined.
In other embodiments, the method further comprises: processing the batch files based on batch file processing logic; and extracting the file size of the processed batch files, updating the file size of the corresponding batch files in the check files by using the extracted file size to obtain updated check files, and transmitting the processed batch files to a downstream system based on the updated check files.
In another aspect, the embodiments of this specification further provide a data exchange platform comprising at least a distributed batch device and a file loading device. The distributed batch device comprises at least a batch scheduling module, a batch file generation module, and a check file generation module, the batch file generation module being arranged in each system node of the distributed system. The batch scheduling module is configured to, after receiving a batch file transmission request, send a batch file generation instruction to the batch file generation module in each system node specified by the request. The batch file generation module is configured to generate batch files based on the batch file generation instruction and to feed back the generated batch files and their file information to the batch scheduling module, the file information comprising at least a file identifier and a file size. The batch scheduling module is further configured to, after receiving from every specified system node a message that batch file generation succeeded, send a check file generation instruction to the check file generation module, the instruction comprising the file information of the batch files corresponding to each specified system node. The check file generation module is configured to receive the check file generation instruction, generate the check file for the batch file transmission request based on the file information in the instruction, and feed the check file back to the batch scheduling module. The batch scheduling module is further configured to send the check file and the batch files to the file loading device, and the file loading device is configured to transmit the batch files to a downstream system based on the check file.
In other embodiments, the platform further comprises a file exchange device. The batch scheduling module is configured to send the check file and the batch files to the file exchange device. The file exchange device is configured to process the batch files based on batch file processing logic, extract the file sizes of the processed batch files, update the file sizes of the corresponding batch files in the check file with the extracted sizes to obtain an updated check file, and send the processed batch files and the updated check file to the file loading device. The file loading device is configured to transmit the processed batch files to a downstream system based on the updated check file.
In another aspect, this specification further provides a data exchange platform, where the platform includes at least one processor and a memory for storing processor-executable instructions, and the instructions, when executed by the processor, implement the steps of the method according to any one or more of the foregoing embodiments.
With the distributed file batch processing method and platform provided by one or more embodiments of this specification, the problem that the number and names of batch files are not fixed when an enterprise migrates from a mainframe to a distributed system is addressed by redesigning the system architecture. Once the system is deployed, batch files can be transmitted without either the distributed system or the data exchange platform merging them, which avoids the excessive resource consumption that file merging causes during batch file transmission. The downstream system can also cope with the number and names of the upstream system's files not being fixed, which further ensures the accuracy and integrity of loading the batch files into the downstream system.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort. In the drawings:
FIG. 1 is a block diagram of a data exchange platform provided in the present specification;
FIG. 2 is a block diagram illustrating a module structure and a file transfer flow of a distributed batch apparatus provided herein;
FIG. 3 is a schematic diagram illustrating a module structure and a file transmission flow of a file exchange device provided in the present specification;
FIG. 4 is a schematic diagram illustrating a module structure and a file transmission flow of a file loading apparatus provided in the present specification;
FIG. 5 is a schematic flow chart of an implementation of the distributed file batch processing method provided in this specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the drawings in one or more embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the specification, and not all embodiments. All other embodiments obtained by a person skilled in the art based on one or more embodiments of the present specification without making any creative effort shall fall within the protection scope of the embodiments of the present specification.
In one example scenario provided in this specification, the distributed file batch processing method may be applied to a data exchange platform. As shown in fig. 1, the data exchange platform may include a distributed batch apparatus 1, a file exchange apparatus 2, and a file loading apparatus 3.
The distributed batch device 1 is connected to the distributed system and processes the data of each system node of the distributed system to generate the batch files corresponding to each system node, recording the file information of the generated batch files in a check file. The distributed batch device 1 may also be connected to the file exchange device 2 to transmit the generated batch files and the check file to the file exchange device 2.
The file exchange device 2 is also connected to the file loading device 3. The file exchange device 2 may process the batch files, for example by filtering and cleaning them, and transmit the processed batch files and the check file to the file loading device 3.
The file loading device 3 is also connected to the database of the peripheral downstream system. The file loading device 3 can check the batch files against the file information in the check file and, once the check passes without errors, load the batch files into the database of the peripheral downstream system.
Fig. 2 is a schematic diagram of the system architecture and the file processing flow of the distributed batch apparatus 1. As shown in FIG. 2, the distributed batch apparatus 1 may include at least a batch scheduling module, a batch file generation module, and a check file generation module. The batch file generation module is arranged in each system node to process data in a database under the corresponding system node and generate a batch file corresponding to the corresponding system node. The configuration center of the platform may pre-store database configuration information under each system node of each distributed system. The database configuration information may include at least a system node to which the database belongs.
Accordingly, the distributed batch apparatus 1 can perform the following steps 201 to 206 for batch processing of files.
Step 201: After receiving the batch file transmission request, the batch scheduling module may send a request to the configuration center to read the database configuration information of each system node. The batch file transmission request may include, for example, the system node to which the data to be transmitted belongs, the database under that system node, the table information in the database, and the manner in which the data in the database is to be processed. It may also include the configuration information of the downstream system to which each generated batch file is to be transmitted, the database table configuration information of the downstream system into which the data in the batch file is to be loaded, and so on. How the batch file transmission request is generated and what it contains may be configured as required and is not limited here. The configuration center may be preconfigured with the configuration information of the system nodes under the distributed system, the configuration information of the downstream systems, and the like.
Step 202: after receiving the request of the batch scheduling module, the configuration center can return the stored database configuration information to the batch scheduling module.
Step 203: The batch scheduling module sends a batch file generation instruction to the batch file generation module under each system node specified by the batch file transmission request, according to the database configuration information of the distributed system.
Step 204: After receiving the batch file generation instruction, the batch file generation module can process the data in the database under its system node according to the batch file transmission request to generate a batch file, and return the file information of the generated batch file to the batch scheduling module. The file information may include at least a file identifier and a file size. The file identifier may include at least a file name, a system node identifier, a file generation time, and the like. For example, the file identifier may be: "[file English name]-[distributed database number]-[date].BIN". Including the distributed database number in the file identifier ensures that file identifiers generated by databases under different system nodes do not collide, and also makes tracing easier.
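The identifier rule in step 204 can be illustrated with a minimal Python sketch. The function names, the sample values, and the YYYYMMDD date layout are assumptions made for illustration; the specification only gives the pattern "[file English name]-[distributed database number]-[date].BIN".

```python
from datetime import date


def build_file_identifier(file_name: str, db_number: str,
                          biz_date: date, suffix: str = "BIN") -> str:
    """Compose '[file English name]-[distributed database number]-[date].BIN'."""
    # Assumption: the date is rendered as YYYYMMDD; the source does not fix a layout.
    return f"{file_name}-{db_number}-{biz_date:%Y%m%d}.{suffix}"


def parse_file_identifier(identifier: str) -> dict:
    """Split an identifier back into its parts, e.g. for tracing a file to its node."""
    stem, suffix = identifier.rsplit(".", 1)
    # Split from the right so a file name that itself contains '-' is preserved.
    file_name, db_number, day = stem.rsplit("-", 2)
    return {"file_name": file_name, "db_number": db_number,
            "date": day, "suffix": suffix}
```

Because the database number is part of the identifier, two nodes producing a file with the same name on the same day still yield distinct identifiers.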
Step 205: After receiving, from each specified system node, a message that batch file generation succeeded, the batch scheduling module sends a check file generation instruction to the check file generation module. The check file generation instruction includes at least the file information received from each system node.
Step 206: After receiving the check file generation instruction, the check file generation module generates the check file and returns the result to the batch scheduling module. The file identifier rule for the check file may be "[file English name]-[date].CHK". It is kept similar to the file identifiers of the batch files, but uses a different suffix to distinguish the check file from the batch files. A preferred content and format for the check file is as follows:
(1) The first record of the check file may include general information about the batch files that the batch file transmission request requires to be transmitted. Table 1 is an example of the field information included in the first record.
TABLE 1
Field name    Field description
APP_ID        Application identification of source application
APP_NAME      Source application name
BIZ_DATE      Date of file transmission
FILE_COUNT    Number of files in batch
BAK_FIELD     Spare field
(2) The second through last records contain file information. Table 2 is an example of the field information for the second through last records.
TABLE 2
[Table 2 appears as an image in the original publication (BDA0003111831550000061, BDA0003111831550000071) and is not reproduced here.]
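As a rough illustration of the check file layout described above, the following Python sketch writes a header record with the Table 1 fields followed by one record per batch file. The pipe delimiter and the per-file field set (file identifier, file size, target system, target table) are assumptions: Table 2 is not reproduced in this text, and the specification only states that the file information includes at least a file identifier and a file size and that the two association relationships may be recorded in the check file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DELIM = "|"  # assumed field delimiter; the source does not specify one


@dataclass
class FileRecord:
    file_id: str        # e.g. "[file English name]-[database number]-[date].BIN"
    file_size: int      # size in bytes
    target_system: str  # first association (hypothetical field)
    target_table: str   # second association (hypothetical field)


def render_check_file(app_id: str, app_name: str, biz_date: str,
                      records: List[FileRecord]) -> str:
    """First record: APP_ID, APP_NAME, BIZ_DATE, FILE_COUNT, BAK_FIELD (left empty)."""
    lines = [DELIM.join([app_id, app_name, biz_date, str(len(records)), ""])]
    for r in records:
        lines.append(DELIM.join([r.file_id, str(r.file_size),
                                 r.target_system, r.target_table]))
    return "\n".join(lines) + "\n"


def parse_check_file(text: str) -> Tuple[Dict[str, object], List[FileRecord]]:
    """Read the header and file records back, checking FILE_COUNT against the body."""
    lines = text.splitlines()
    app_id, app_name, biz_date, file_count, bak = lines[0].split(DELIM)
    records = [FileRecord(f[0], int(f[1]), f[2], f[3])
               for f in (line.split(DELIM) for line in lines[1:])]
    if int(file_count) != len(records):
        raise ValueError("FILE_COUNT does not match the number of file records")
    header = {"APP_ID": app_id, "APP_NAME": app_name, "BIZ_DATE": biz_date,
              "FILE_COUNT": int(file_count), "BAK_FIELD": bak}
    return header, records
```

Carrying FILE_COUNT in the first record lets a consumer detect a truncated check file before it looks at any batch file.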
FIG. 3 is a schematic diagram of the system architecture of the file exchange device 2 and its processing flow for batch files. The file exchange device distributes the batch files and the check file received from the distributed batch device 1 to the file loading device associated with each downstream system, according to preset configuration. The file exchange device 2 may include a file receiving module, a file processing module, and a file distribution module. The configuration center of the platform may also be configured with the processing modes for the batch files (such as field format conversion, decoding mode, and filtering configuration) and the target downstream systems to which the batch files are distributed (for example, M files go to system A and system B; N files go to system A and system C). Alternatively, the configuration information of the target downstream system to which each batch file should be distributed, and of the database table into which it is to be loaded, may be recorded in the check file. Correspondingly, the check file may be configured with a first association relationship between each file identifier and the specified downstream system to which the corresponding batch file is to be transmitted, and a second association relationship between each file identifier and the database table of the specified downstream system into which the corresponding batch file is to be loaded.
Accordingly, the file exchange apparatus 2 can perform the following steps 301 to 303 for the batch processing of the files.
Step 301: the file receiving module can receive the batch files and the check files sent by the distributed batch device 1 and provide the batch files and the check files for the file processing module to process.
Step 302: the file processing module can call a processing mode of the batch files from the configuration center so as to filter, clean and other processing of the batch files, and update the file sizes of the processed batch files into the verification files.
Step 303: the file distribution module may distribute the batch files processed by the file processing module and the updated verification files to the file loading device 3 associated with the corresponding specified downstream system based on the first association relationship.
Fig. 4 is a schematic diagram of the system architecture of the file loading apparatus 3 and a processing flow of batch files. The file loading device is matched with a peripheral downstream system to receive the batch files and the check files provided by the file exchange device 2, and loads all the batch files to a corresponding database table of the downstream system according to file information in the check files, so that the downstream system can simply and effectively adapt to the problems that the number and the name of the files of the distributed system are not fixed and the like. Each file loading apparatus 3 may load the batch file into a database table of a downstream system using steps 401 to 404.
Step 401: Determine whether all batch files corresponding to the check file have arrived. Specifically, using the file information in the check file and the file identifier "[file English name]-[distributed database number]-[date]", it may be determined one by one whether each batch file for the downstream system associated with the file loading device has arrived at that downstream system. When all batch files have arrived and the total size of the actual batch files matches the records in the check file (which verifies the integrity of the batch files), step 402 is executed; otherwise, after waiting for a period of time for batch file transmission to continue, step 401 is executed again.
Step 402: Based on the second association relationship, look up in the check file the database table configuration information of the downstream system into which each file needs to be loaded; this information may include the target database and table information. If the corresponding database and table information is found, go to step 404; otherwise, go to step 403.
Step 403: If the corresponding database table configuration information is not found in step 402, an exception prompt of "check file undefined" is raised.
Step 404: Load the file into the corresponding database table according to the database table configuration information found in step 402 and the file identifier recorded in the check file.
These steps are repeated until all batch files for the downstream system associated with the file loading device have been loaded. Transmitting batch files with this system architecture requires no merging of the batch files from the individual system nodes, which avoids the excessive resource consumption that merging causes. Even if the number and names of the distributed system's files are not fixed, the batch files can be loaded accurately and completely into the database tables of the downstream system, improving the accuracy and integrity of batch file loading. And if a file cannot be loaded because of an exception, the source of the problem file can be traced quickly, improving the efficiency of correcting loading failures.
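Steps 401 to 404 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: check file records are represented as dictionaries with hypothetical keys `file_id`, `file_size`, and `target_table`; `load_file` is a caller-supplied stand-in for the real database load; and the bounded retry count is an addition, since the specification only says to wait and run step 401 again.

```python
import os
import time
from typing import Callable, Dict, List, Tuple


class TableInfoUndefined(Exception):
    """Step 403: no database table configuration found for a file identifier."""


def all_arrived(records: List[Dict], inbox: str) -> bool:
    """Step 401: every file is present and the total size matches the check file."""
    paths = [os.path.join(inbox, r["file_id"]) for r in records]
    if not all(os.path.isfile(p) for p in paths):
        return False
    actual_total = sum(os.path.getsize(p) for p in paths)
    return actual_total == sum(r["file_size"] for r in records)


def load_batch(records: List[Dict], inbox: str,
               load_file: Callable[[str, str], None],
               retries: int = 3, wait_seconds: float = 0.0) -> List[Tuple[str, str]]:
    for _ in range(retries):
        if all_arrived(records, inbox):
            break
        time.sleep(wait_seconds)  # wait for transmission to continue, then re-check
    else:
        raise TimeoutError("batch files did not all arrive")
    loaded = []
    for r in records:
        table = r.get("target_table")  # step 402: look up the table configuration
        if not table:
            raise TableInfoUndefined(r["file_id"])  # step 403
        load_file(os.path.join(inbox, r["file_id"]), table)  # step 404
        loaded.append((r["file_id"], table))
    return loaded
```

Because the loader iterates over whatever records the check file lists, it needs no prior knowledge of how many files the upstream nodes produced or what they are called.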
Based on the above scenario example, an embodiment of the present specification provides a distributed batch file transmission method, as shown in fig. 5. FIG. 5 is a flow diagram of one embodiment of a distributed file batch processing method provided herein. The method may be applied to a data exchange platform and may include the following steps.
S52: after receiving a batch file transmission request, sending a batch file generation instruction to a batch file generation module arranged in a system node specified by the batch file transmission request so that the batch file generation module generates a batch file corresponding to the corresponding system node, and feeding back the generated batch file and file information of the batch file to the data exchange platform; the file information at least comprises a file identifier and a file size;
S54: after receiving a message that the batch file generation is successful, which is fed back by each specified system node, generating a check file of the batch file transmission request based on file information corresponding to each specified system node;
S56: transmitting the batch files to a downstream system based on the check file.
By adopting the method to transmit the batch files, the transmission of the batch files can be realized without merging the batch files of each system node, and the problem of overlarge resource consumption caused by merging the batch files is avoided. Even if the number and the name of the files of the distributed system are not fixed, the batch files can be accurately and completely loaded into a database table of a downstream system, and the accuracy and the integrity of batch file loading are improved.
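The ordering constraint between S52 and S54 (the check file is generated only after every specified node reports success) can be sketched in Python. The thread pool standing in for remote dispatch, the function name, and the `generate` callback are assumptions for illustration; a real platform would send instructions to modules running on the system nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

FileInfo = Tuple[str, int]  # (file identifier, file size in bytes)


def collect_file_info(nodes: List[str],
                      generate: Callable[[str], List[FileInfo]]) -> List[FileInfo]:
    """Dispatch generation to every specified node and gather the file information.

    If any node fails, its exception propagates and nothing is returned, so the
    caller never builds a check file from a partial set of nodes.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # map() yields results in node order and re-raises a worker's exception.
        per_node = list(pool.map(generate, nodes))
    return [info for node_files in per_node for info in node_files]
```

The flattened list is what a check file generator would consume to write one record per batch file.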
In other embodiments, the file identifier may include at least a file name, a system node identifier, and a file generation time. The file name may be generated from the content of the batch file or from a random code. The system node identifier may be, for example, the code of the database under the system node. Configuring the file identifier in this way prevents file identifiers generated by different system nodes at different points in time from being identical, so that files can be handled accurately during batch file transmission, improving the accuracy of batch file transmission and loading.
In other embodiments, the check file and the batch files are of different file types, which makes them easy to tell apart. For example, different file identifier suffixes may be used to distinguish the check file from the batch files: the batch files use "BIN" as the file identifier suffix, and the check file uses "CHK".
In other embodiments, the check file is further configured with a first association relationship between each file identifier and the specified downstream system to which the batch file corresponding to that file identifier is to be transmitted. Transmitting the batch files to a downstream system based on the check file may include: for any specified downstream system, extracting from the check file, based on the first association relationship, all file identifiers corresponding to that downstream system to obtain a file identifier set; and transmitting the batch files corresponding to the file identifiers in the file identifier set to the specified downstream system.
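A minimal sketch of extracting the file identifier set for each downstream system is shown below. It assumes the first association relationship has already been parsed from the check file into `(file_id, downstream_system)` pairs; that representation and the name `group_by_downstream` are hypothetical.

```python
from collections import defaultdict


def group_by_downstream(check_entries):
    """Build one file identifier set per downstream system.

    check_entries: iterable of (file_id, downstream_system) pairs read
    from the check file (assumed representation).
    """
    groups = defaultdict(set)
    for file_id, downstream in check_entries:
        groups[downstream].add(file_id)
    return dict(groups)
```

Each resulting set is then the unit of transmission for one downstream system.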
In other embodiments, transmitting the batch files corresponding to the file identifiers in the file identifier set to the specified downstream system may include: extracting from the check file the total file size of the batch files corresponding to the file identifiers in the file identifier set; and checking, based on the file identifiers in the file identifier set and the total file size, whether all of those batch files have been transmitted to the specified downstream system. Checking all file identifiers together with the total file size for each downstream system determines more accurately whether every batch file destined for that system has arrived, and therefore whether the transmission is complete. The file loading step is executed only once all files are present, which further improves the accuracy of file loading.
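The completeness check can be sketched as follows, under the assumption that the downstream side can report which files have arrived and how large each one is. The function name `is_transfer_complete` and the `received` mapping are illustrative, not taken from the specification.

```python
def is_transfer_complete(expected_ids, expected_total_size, received):
    """Check arrival of every expected file and the combined size.

    expected_ids: set of file identifiers taken from the check file.
    expected_total_size: total size in bytes recorded in the check file.
    received: mapping of file identifier -> size in bytes of the file that arrived.
    """
    if set(expected_ids) - set(received):
        return False  # at least one expected file has not arrived
    actual_total = sum(received[file_id] for file_id in set(expected_ids))
    return actual_total == expected_total_size
```

Loading would start only when this check returns `True`; a missing or truncated file makes it return `False`.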
In other embodiments, the check file further includes a second association relationship between each file identifier and the database table of the specified downstream system into which the batch file corresponding to that file identifier is to be loaded. Transmitting the batch files to a downstream system based on the check file may include: when it is determined that the batch files have all been transmitted to the specified downstream system, loading the batch files corresponding to the file identifiers in the file identifier set into the corresponding database tables based on the second association relationship. Configuring the second association relationship in the check file can greatly improve the efficiency and accuracy of file loading for each downstream system.
In other embodiments, loading the batch files corresponding to the file identifiers in the file identifier set into the corresponding database tables includes: taking each file identifier in the file identifier set in turn as the specified file identifier, and looking up in the check file the configuration information of the database table into which the batch file corresponding to the specified file identifier is to be loaded; and, if the database table configuration information cannot be found, issuing an exception alert indicating that the table information for the specified file identifier is undefined. Because the alert is raised against a specific file identifier when loading fails, the cause of a loading exception can be traced far more efficiently and corrected in time.
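One way to picture this lookup-then-alert behaviour is the sketch below. It assumes the second association relationship is available as a mapping from file identifier to table name and that the actual load is delegated to a caller-supplied function; `TableInfoUndefinedError` and `load_batch_files` are hypothetical names.

```python
class TableInfoUndefinedError(LookupError):
    """Raised when the check file has no table configuration for a file identifier."""


def load_batch_files(file_ids, table_config, load):
    """Load each batch file into its configured database table.

    table_config: mapping of file identifier -> database table name (assumed shape).
    load: callable(file_id, table) that performs the actual load.
    """
    for file_id in file_ids:
        table = table_config.get(file_id)
        if table is None:
            # The alert names the file identifier so the problem can be traced.
            raise TableInfoUndefinedError(
                f"table information undefined for file identifier {file_id!r}"
            )
        load(file_id, table)
```

Raising an exception is only one possible form of the alert; a platform might instead log the event or notify an operator.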
In other embodiments, the method further comprises: the file exchange device processes the batch files based on batch file processing logic; it then extracts the file size of each processed batch file, updates the file size of the corresponding batch file in the check file with the extracted value to obtain an updated check file, and transmits the processed batch files and the updated check file to the file loading device associated with the corresponding downstream system. The batch file processing logic may be preconfigured. Preferably, it is configured per file identifier. Correspondingly, a processing logic identifier for each processing logic can be recorded in the check file. During batch file transmission, the processing logic identifier corresponding to each file identifier can be read from the check file, and the processing logic it names is invoked to process the batch file corresponding to that file identifier, which improves processing efficiency.
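The following sketch shows one possible shape for this step, with file contents held in memory as bytes. The `PROCESSORS` registry, the logic identifiers "strip" and "upper", and the dictionary layout of the check file are all assumptions chosen to keep the example small.

```python
# Hypothetical registry: processing logic identifier -> processing function.
PROCESSORS = {
    "strip": bytes.strip,
    "upper": bytes.upper,
}


def process_and_update(files, check):
    """Process each batch file and record its new size in a copy of the check data.

    files: mapping of file identifier -> file content (bytes).
    check: mapping of file identifier -> {"size": int, "logic": str}.
    Returns (processed_files, updated_check); the inputs are left unchanged.
    """
    processed, updated = {}, {}
    for file_id, entry in check.items():
        data = PROCESSORS[entry["logic"]](files[file_id])
        processed[file_id] = data
        updated[file_id] = {**entry, "size": len(data)}
    return processed, updated
```

Updating the recorded size matters because any later size-based completeness check would otherwise compare processed files against their pre-processing sizes.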
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. For details, reference may be made to the description of the related embodiments of the related processing, and details are not repeated herein.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the method and system architecture provided by the above embodiments, an embodiment of the present specification further provides a data exchange platform, where the platform may at least include a distributed batch device and a file loading device; the distributed batch device at least comprises a batch scheduling module, a batch file generating module and a check file generating module, wherein the batch file generating module is arranged in each system node of the distributed system.
The batch scheduling module may be configured to send a batch file generation instruction to a batch file generation module in a system node specified by the batch file transmission request after receiving the batch file transmission request.
The batch file generation module can be used for generating batch files based on the batch file generation instruction and feeding back the generated batch files and file information of the batch files to the batch scheduling module; the file information at least comprises a file identifier and a file size.
The batch scheduling module may be configured to send a check file generation instruction to the check file generation module after receiving, from each specified system node, a message that its batch file was generated successfully; the check file generation instruction includes the file information of the batch files corresponding to the specified system nodes.
The check file generation module may be configured to receive the check file generation instruction, generate the check file of the batch file transmission request based on the file information in the check file generation instruction, and feed the check file back to the batch scheduling module.
The batch scheduling module may be configured to send the check file and the batch file to the file loading device.
The file loading device may be configured to transmit the batch file to a downstream system based on the check file.
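The interaction between these modules can be summarized in a short sketch in which each module is passed in as a plain callable. The function `handle_transmission_request`, its parameters, and the choice to stop without a check file when a node fails are assumptions for illustration; the specification does not say how failures are handled.

```python
def handle_transmission_request(node_ids, generate_batch_file, generate_check_file, load):
    """Hypothetical batch scheduling flow.

    generate_batch_file(node_id) -> (file_id, size) on success, None on failure.
    generate_check_file(file_info) -> check file content.
    load(check_file, file_ids) hands the check file and batch files to the loader.
    """
    file_info = {}
    for node_id in node_ids:
        result = generate_batch_file(node_id)
        if result is None:
            return None  # assumed behaviour: no check file unless every node succeeds
        file_info[node_id] = result
    check_file = generate_check_file(file_info)
    load(check_file, [file_id for file_id, _ in file_info.values()])
    return check_file
```

Keeping the scheduler as the only component that talks to all other modules mirrors the description, where both the file information and the check file are fed back to the batch scheduling module.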
In other embodiments, the platform may further comprise a file exchange device. Correspondingly, the batch scheduling module may be configured to send the check file and the batch files to the file exchange device. The file exchange device may be configured to process the batch files based on batch file processing logic; extract the file size of each processed batch file and update the file size of the corresponding batch file in the check file with the extracted value to obtain an updated check file; and send the processed batch files and the updated check file to the file loading device. The file loading device may be configured to transmit the processed batch files to the downstream system based on the updated check file.
It should be noted that the above-mentioned platform may also include other embodiments according to the description of the above-mentioned embodiments. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
The present specification also provides a data exchange platform that may include at least one processor and a memory for storing processor-executable instructions that, when executed by the processor, implement the steps of the method of any one or more of the embodiments described above. The memory may include physical means for storing information, typically by digitizing the information and storing it on a medium using electrical, magnetic or optical means. The storage medium may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM, ROM, etc.; devices that store information using magnetic energy, such as hard disks, floppy disks, tapes, core memories, bubble memories, and USB disks; devices that store information optically, such as CDs or DVDs. Of course, there are other kinds of readable storage media, such as quantum memory, graphene memory, and so forth.
It should be noted that the embodiments of the present disclosure are not limited to the cases where the data model/template is necessarily compliant with the standard data model/template or the description of the embodiments of the present disclosure. Certain industry standards, or implementations modified slightly from those described using custom modes or examples, may also achieve the same, equivalent, or similar, or other, contemplated implementations of the above-described examples. The embodiments using these modified or transformed data acquisition, storage, judgment, processing, etc. may still fall within the scope of the alternative embodiments of the present description.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described briefly, and for the relevant points, reference may be made to the corresponding description of the method embodiment. In the description of the specification, reference to the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, provided they do not contradict each other, the various embodiments or examples described in this specification, and the features of different embodiments or examples, can be combined by one skilled in the art.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (11)

1. A distributed file batch processing method, applied to a data exchange platform, comprising the following steps:
after receiving a batch file transmission request, sending a batch file generation instruction to a batch file generation module arranged in a system node specified by the batch file transmission request so that the batch file generation module generates a batch file corresponding to the corresponding system node, and feeding back the generated batch file and file information of the batch file to the data exchange platform; the file information at least comprises a file identifier and a file size;
after receiving, from each specified system node, a message that its batch file was generated successfully, generating a check file for the batch file transmission request based on the file information corresponding to each specified system node;
and transmitting the batch files to a downstream system based on the check files.
2. The method of claim 1, wherein the file identifier comprises at least a file name, a system node identifier, and a file generation time.
3. The method of claim 1, wherein the check file is of a different file type than the batch file.
4. The method according to claim 1, wherein the check file is further configured with a first association relationship between file identifiers and designated downstream systems to which the batch files corresponding to the corresponding file identifiers are to be transmitted;
the transmitting the batch files to a downstream system based on the check file includes: for any specified downstream system, extracting from the check file, based on the first association relationship, all file identifiers corresponding to that downstream system to obtain a file identifier set; and transmitting the batch files corresponding to the file identifiers in the file identifier set to the specified downstream system.
5. The method according to claim 4, wherein the transmitting the batch files corresponding to the file identifiers in the file identifier set to the specified downstream system comprises:
extracting the total file size of the batch files corresponding to each file identifier in the file identifier set from the check file;
and checking, based on the file identifiers in the file identifier set and the total file size, whether all the batch files corresponding to the file identifiers in the file identifier set have been transmitted to the specified downstream system.
6. The method according to claim 5, wherein the check file further comprises a second association relationship between the file identifier and a database table of a designated downstream system to which the batch file corresponding to the corresponding file identifier is to be loaded;
the transmitting the batch files to a downstream system based on the check file includes: when it is determined that the batch files have all been transmitted to the specified downstream system, loading the batch files corresponding to the file identifiers in the file identifier set into the corresponding database tables based on the second association relationship.
7. The method of claim 6, wherein loading the batch files corresponding to the file identifiers in the file identifier set into the corresponding database table comprises:
taking each file identifier in the file identifier set in turn as a specified file identifier, and looking up in the check file the configuration information of the database table into which the batch file corresponding to the specified file identifier is to be loaded;
and issuing an exception alert indicating that the table information for the specified file identifier is undefined if the database table configuration information cannot be found.
8. The method of claim 1, further comprising:
processing the batch files based on batch file processing logic;
and extracting the file size of each processed batch file, updating the file size of the corresponding batch file in the check file with the extracted file size to obtain an updated check file, and transmitting the processed batch files to a downstream system based on the updated check file.
9. A data exchange platform is characterized in that the platform at least comprises a distributed batch device and a file loading device; the distributed batch device at least comprises a batch scheduling module, a batch file generating module and a check file generating module, wherein the batch file generating module is arranged in each system node of the distributed system;
the batch scheduling module is used for sending a batch file generation instruction to a batch file generation module in a system node specified by a batch file transmission request after receiving the batch file transmission request;
the batch file generation module is used for generating batch files based on the batch file generation instruction and feeding back the generated batch files and file information of the batch files to the batch scheduling module; the file information at least comprises a file identifier and a file size;
the batch scheduling module is used for sending a check file generation instruction to the check file generation module after receiving, from each specified system node, a message that its batch file was generated successfully; the check file generation instruction comprises file information of the batch files corresponding to the specified system nodes;
the check file generation module is used for receiving the check file generation instruction and generating the check file of the batch file transmission request based on the file information in the check file generation instruction; and feeding the check file back to the batch scheduling module;
the batch scheduling module is used for sending the check files and the batch files to the file loading device;
and the file loading device is used for transmitting the batch files to a downstream system based on the check files.
10. The platform of claim 9, wherein the platform further comprises a file exchange device;
the batch scheduling module is used for sending the check files and the batch files to the file exchange device;
the file exchange device is used for processing the batch files based on batch file processing logic; extracting the file size of each processed batch file, and updating the file size of the corresponding batch file in the check file with the extracted file size to obtain an updated check file; and sending the processed batch files and the updated check file to the file loading device;
and the file loading device is used for transmitting the processed batch files to a downstream system based on the updated check files.
11. A data exchange platform comprising at least one processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 8.
CN202110653768.9A 2021-06-11 2021-06-11 Distributed file batch processing method and platform Pending CN113392085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110653768.9A CN113392085A (en) 2021-06-11 2021-06-11 Distributed file batch processing method and platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110653768.9A CN113392085A (en) 2021-06-11 2021-06-11 Distributed file batch processing method and platform

Publications (1)

Publication Number Publication Date
CN113392085A true CN113392085A (en) 2021-09-14

Family

ID=77620583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110653768.9A Pending CN113392085A (en) 2021-06-11 2021-06-11 Distributed file batch processing method and platform

Country Status (1)

Country Link
CN (1) CN113392085A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168549A (en) * 2021-12-10 2022-03-11 中国建设银行股份有限公司 File processing method and device

Similar Documents

Publication Publication Date Title
EP2474919B1 (en) System and method for data replication between heterogeneous databases
CN103164523A (en) Inspection method, device and system of data consistency inspection
CN104090901A (en) Method, device and server for processing data
CN110688828A (en) File processing method and device, file processing system and computer equipment
CN110287251B (en) MongoDB-HBase distributed high fault-tolerant data real-time synchronization method
CN106612330A (en) System and method supporting distributed multi-file importing
US20230030856A1 (en) Distributed table storage processing method, device and system
CN107391611A (en) A kind of process model generation method of the General ETL Tool based on workflow
CN113392085A (en) Distributed file batch processing method and platform
CN110019169B (en) Data processing method and device
CN112258266B (en) Distributed order processing method, device, equipment and storage medium
CN111581227A (en) Event pushing method and device, computer equipment and storage medium
CN116089527A (en) Data verification method, storage medium and device
CN115170152A (en) Data distribution method, device, equipment and storage medium
CN112596806A (en) Data lake data loading script generation method and system
EP4081911A1 (en) Edge table representation of processes
CN111651259A (en) Dependency relationship-based system management method and device and storage medium
CN112069772A (en) Data processing method and device based on FPGA, electronic equipment and storage medium
CN111324783B (en) Data processing method and device
CN117076546B (en) Data processing method, terminal device and computer readable storage medium
US11194665B2 (en) Systems and methods for seamless redelivery of missing data
CN115658383A (en) Backup data processing method and device and computer equipment
CN113360511A (en) Method, device and equipment for processing credit investigation information
CN118152036A (en) Scheduling template processing method, device, computer equipment and storage medium
CN113553329A (en) Data integration system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination