WO2020238860A1 - Distributed file batch processing method and apparatus, and readable storage medium - Google Patents

Distributed file batch processing method and apparatus, and readable storage medium

Info

Publication number
WO2020238860A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
download
exported
terminal
database
Prior art date
Application number
PCT/CN2020/092139
Other languages
French (fr)
Chinese (zh)
Inventor
魏艳梅
侯向辉
李斌
江旻
Original Assignee
深圳前海微众银行股份有限公司
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2020238860A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of financial technology (Fintech), in particular to distributed file batch processing methods, devices, and readable storage media.
  • a shared NAS disk (Network Attached Storage) is used to store data to realize data exchange in a distributed architecture. Specifically, a local machine downloads a file from cloud storage to its local disk and then writes it to the NAS disk; the processing system reads the file from the NAS disk and processes it. Finally, the processing system writes the processed file back to the NAS disk and synchronizes it to the cloud storage for data exchange.
  • the main purpose of this application is to propose a distributed file batch processing method, device, and readable storage medium, aiming to improve the efficiency of processing distributed files.
  • the distributed file batch processing method includes the following steps:
  • a file download notification is detected, the download terminal corresponding to the file download notification is determined, and the file corresponding to the file download notification is downloaded from the corresponding cloud storage to the local disk corresponding to the download terminal through the download terminal;
  • the step of detecting the file download notification, determining the download terminal corresponding to the file download notification, and downloading the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal through the download terminal includes:
  • the file download notification is detected, and the download terminal corresponding to the file download notification is determined;
  • the file status of the file is changed to downloading or downloaded through the download terminal, and the file list is updated according to the current file status.
  • the step of reading the file content of the file in the local disk through the download terminal, and importing the file into the database corresponding to the download terminal through the download terminal based on the file content, includes:
  • the file is imported into the database corresponding to the download terminal through the download terminal.
  • the step of importing the file into the database corresponding to the download terminal through the download terminal based on the import type includes:
  • the step of exporting the file to be exported to the local disk through the processing terminal and sending the file to be exported to the cloud storage includes:
  • the processing terminal determines the first directory corresponding to the file to be exported in the local disk and the cloud storage, splits the file to be exported into a second preset number of second split files, and exports them to the first directory;
  • the second split files are merged into a merged file in the first directory, and the merged file is placed in a second directory corresponding to the file to be exported.
  • the step in which the processing terminal determines the first directory corresponding to the file to be exported in the local disk and the cloud storage, splits the file to be exported into a second preset number of second split files, and exports them to the first directory includes:
  • the present application also provides a distributed file batch processing device.
  • the distributed file batch processing device includes a memory, a processor, and a distributed file batch processing program that is stored on the memory and can run on the processor; when the distributed file batch processing program is executed by the processor, the following steps are implemented:
  • a file download notification is detected, the download terminal corresponding to the file download notification is determined, and the file corresponding to the file download notification is downloaded from the corresponding cloud storage to the local disk corresponding to the download terminal through the download terminal;
  • the file download notification is detected, and the download terminal corresponding to the file download notification is determined;
  • the file status of the file is changed to downloading or downloaded through the download terminal, and the file list is updated according to the current file status.
  • the file is imported into the database corresponding to the download terminal through the download terminal.
  • the distributed file batch program is executed by the processor to implement the following steps:
  • the first split file is imported into the database corresponding to the download terminal in parallel through the download terminal.
  • the processing terminal determines the first directory corresponding to the file to be exported in the local disk and the cloud storage, splits the file to be exported into a second preset number of second split files, and exports them to the first directory;
  • the second split files are merged into a merged file in the first directory, and the merged file is placed in a second directory corresponding to the file to be exported.
  • this application also provides a distributed file batch processing device, including: a memory, a processor, and a distributed file batch processing program that is stored on the memory and can run on the processor, where the program implements the steps of the distributed file batch processing method described above when executed by the processor.
  • the present application also provides a computer-readable storage medium on which a distributed file batch processing program is stored; when the program is executed by a processor, the steps of the distributed file batch processing method described above are implemented.
  • the distributed file batch processing method proposed in this application detects the file download notification, determines the download terminal corresponding to the file download notification, and downloads the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal through the download terminal; reads the file content of the file in the local disk through the download terminal, and imports the file into the database corresponding to the download terminal based on the file content; processes the file in the database through the processing terminal corresponding to the database to obtain the file to be exported; and exports the file to be exported to the local disk through the processing terminal and sends it to the cloud storage.
  • the NAS disk is replaced by a database to realize distributed data exchange.
  • FIG. 1 is a schematic diagram of a device structure of a hardware operating environment involved in a solution of an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a second embodiment of a distributed file batch processing method of this application.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
  • the terminal in the embodiment of the present application may be a PC or a server device.
  • the terminal may include a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001.
  • FIG. 1 does not constitute a limitation on the device, and may include more or fewer components than shown in the figure, or a combination of certain components, or different component arrangements.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a distributed file batch processing program.
  • the operating system is a program that manages and controls the distributed file batch processing device and software resources, and supports the operation of the network communication module, the user interface module, the distributed file batch processing program, and other programs or software; the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.
  • the distributed file batch processing device calls the distributed file batch processing program stored in the memory 1005 through the processor 1001, and executes the operations in each of the following embodiments of the distributed file batch processing method.
  • Fig. 2 is a schematic flowchart of a first embodiment of a distributed file batch processing method according to this application, and the method includes:
  • Step S10 a file download notification is detected, the download terminal corresponding to the file download notification is determined, and the file corresponding to the file download notification is downloaded from the corresponding cloud storage to the local disk corresponding to the download terminal through the download terminal;
  • Step S20 read the file content of the file in the local disk through the download terminal, and based on the file content, import the file into the database corresponding to the download terminal through the download terminal;
  • Step S30 processing the file in the database through the processing terminal corresponding to the database to obtain the file to be exported;
  • Step S40 Export the file to be exported to the local disk through the processing terminal, and send the file to be exported to the cloud storage.
  • a database is used instead of the NAS disk: the file is downloaded to the local disk through the download terminal and imported into the database, then the file is read from the database and processed through the processing terminal, and finally the processed file is exported to the local disk and synchronized to the cloud storage, realizing batch processing of distributed files.
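  • As an illustration only, steps S10 to S40 can be sketched in Python as follows. The dicts standing in for cloud storage, the local disk, and the database, every function name, and the placeholder "processing" (summing amounts) are assumptions made for this sketch and do not come from the application.

```python
# Hypothetical stand-ins: plain dicts play the role of cloud storage,
# the download terminal's local disk, and the database.
cloud = {"repay.txt": "acct1,100\nacct2,250"}
local_disk = {}
database = {}


def download(name):
    """S10: copy the notified file from cloud storage to the local disk."""
    local_disk[name] = cloud[name]


def import_to_db(name):
    """S20: read the file content on the local disk and import it as rows."""
    database[name] = [line.split(",") for line in local_disk[name].splitlines()]


def process(name):
    """S30: process the rows in the database and build the file to export.

    Summing the amounts is only a placeholder for real business logic.
    """
    total = sum(int(amount) for _, amount in database[name])
    return name + ".out", "total," + str(total)


def export(out_name, content):
    """S40: write the result to the local disk, then send it to cloud storage."""
    local_disk[out_name] = content
    cloud[out_name] = content


def handle_notification(name):
    """Run S10 to S40 for one file download notification."""
    download(name)
    import_to_db(name)
    out_name, content = process(name)
    export(out_name, content)
    return out_name


result_name = handle_notification("repay.txt")
```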
  • Step S10 a file download notification is detected, the download terminal corresponding to the file download notification is determined, and the file corresponding to the file download notification is downloaded from the corresponding cloud storage to the local disk corresponding to the download terminal through the download terminal.
  • the distributed file batch processing method of this embodiment is applied to the distributed file batch processing equipment of financial institutions or banking systems.
  • the distributed file batch processing equipment is hereinafter referred to as the processing equipment.
  • the processing equipment includes several download terminals.
  • the file download notification is issued by the message bus. Any downloading terminal can receive the file download notification.
  • the downloading terminal that receives the file download notification is responsible for notifying the file download to the corresponding file library.
  • the download terminal that received the file download notification is determined, and the download terminal downloads the corresponding file from the cloud storage corresponding to the file download notification to the local disk corresponding to the download terminal.
  • the IP information of the download terminal needs to be recorded, and when the corresponding file is downloaded to the local disk, the recorded IP information is associated with the current file; that is, it is recorded which download terminal downloaded the current file.
  • step S10 includes:
  • Step a the file download notification is detected, and the download terminal corresponding to the file download notification is determined;
  • any download terminal can receive a file download notification. Specifically, a polling process is started, and each download terminal monitors the file status of each file in the file list in turn. When the file download notification is detected, the downloader in the monitoring state is determined to be the downloader this time, and the IP information of the downloader is recorded.
  • Step b Download, through the download terminal, the files whose file status is to-be-downloaded in the file list corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
  • the file list records the file status of each file, indicating whether the file needs to be downloaded.
  • the download terminal scans the file statuses indicated in the file list corresponding to the file download notification, and downloads the files whose file status is to-be-downloaded from the corresponding cloud storage to the local disk corresponding to the download terminal.
  • Step c Change the file status of the file to downloading or downloaded through the download terminal, and update the file list according to the current file status.
  • the file status of the current file is first changed to downloading and, after the download is completed, to downloaded; each change is written to the file list in real time so that the file list stays up to date.
  • the download terminal only downloads files whose file status is to-be-downloaded. When the file status of the current file is downloading, even if the download is not completed within the polling interval, the next download terminal that detects the file download notification does not need to download it again.
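  • A minimal sketch of steps a to c, assuming an in-memory file list: one polling pass picks up only files whose status is to-be-downloaded and records the download terminal's IP. The `Status` names, the `poll_once` function, and the dict layout are hypothetical; a real deployment would also need the status change to be atomic across terminals, which this sketch does not attempt.

```python
from enum import Enum


class Status(Enum):
    TO_BE_DOWNLOADED = "to_be_downloaded"
    DOWNLOADING = "downloading"
    DOWNLOADED = "downloaded"


def poll_once(file_list, my_ip, fetch):
    """One polling pass by the download terminal with address my_ip.

    file_list maps file name -> {"status": Status, "ip": str or None}.
    fetch(name) is a caller-supplied function that performs the download.
    Returns the names downloaded in this pass.
    """
    done = []
    for name, entry in file_list.items():
        if entry["status"] is not Status.TO_BE_DOWNLOADED:
            continue  # downloading or downloaded: leave it alone
        entry["status"] = Status.DOWNLOADING  # step c, first transition
        entry["ip"] = my_ip                   # record which terminal downloads it
        fetch(name)
        entry["status"] = Status.DOWNLOADED   # step c, second transition
        done.append(name)
    return done


files = {
    "a.txt": {"status": Status.TO_BE_DOWNLOADED, "ip": None},
    "b.txt": {"status": Status.DOWNLOADING, "ip": "10.0.0.2"},
}
fetched = []
first_pass = poll_once(files, "10.0.0.1", fetched.append)
second_pass = poll_once(files, "10.0.0.3", fetched.append)
```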
  • the download terminal currently executing the download action not only downloads the to-be-downloaded files in the file list to the local disk, but also checks whether the download terminal corresponding to each file whose status is downloading is working normally. Specifically, it can determine the IP information associated with the file whose status is downloading and send a survival detection packet to the download terminal corresponding to that IP information, then determine whether the corresponding response packet is received within a first preset time; or it can determine whether the download progress of the file whose status is downloading has changed within a second preset time, and so on.
  • if the download terminal is normal, the file being downloaded does not need to be adjusted; if not, it is determined that the download terminal corresponding to the file whose status is downloading is down or that its download process has been interrupted. In that case a switching action is executed: the download operation is switched to the current download terminal, and the IP information associated with the corresponding file in the file list is changed to the IP information of the current download terminal.
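  • The health check and switching action might look like the sketch below. It is not the application's implementation: `is_alive` stands in for the survival detection packet and its response within the first preset time, `stall_seconds` stands in for the second preset time, and the `progress_at` timestamp (time of the last observed progress change) is an assumed field.

```python
def needs_takeover(entry, is_alive, now, stall_seconds):
    """Decide whether a file in the 'downloading' state should be taken over.

    entry: {"status", "ip", "progress_at"} for one file in the file list.
    is_alive(ip): caller-supplied liveness probe for the recorded terminal.
    stall_seconds: how long progress may stay unchanged before a takeover.
    """
    if entry["status"] != "downloading":
        return False
    if not is_alive(entry["ip"]):
        return True  # terminal is down
    return now - entry["progress_at"] > stall_seconds  # download is stalled


def take_over(entry, my_ip, now):
    """Switch the download to the current terminal and re-associate the IP."""
    entry["ip"] = my_ip
    entry["progress_at"] = now


entry = {"status": "downloading", "ip": "10.0.0.2", "progress_at": 100.0}
dead = needs_takeover(entry, lambda ip: False, now=105.0, stall_seconds=30)
stalled = needs_takeover(entry, lambda ip: True, now=200.0, stall_seconds=30)
healthy = needs_takeover(entry, lambda ip: True, now=105.0, stall_seconds=30)
if dead:
    take_over(entry, "10.0.0.1", now=105.0)
```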
  • step S20 the file content of the file is read in the local disk through the download terminal, and based on the file content, the file is imported into the database corresponding to the download terminal through the download terminal.
  • the file content of the current file is read in the local disk through the download terminal.
  • the file content includes the IP information associated with the current file. Based on the file content, it is determined whether the current file was downloaded by the current download terminal: if not, it is not processed; if so, the current file is imported into the database corresponding to the download terminal.
  • that is, when the file status of the current file is downloaded and the file was downloaded by the current download terminal, the current download terminal can import it into the database. For files whose status is to-be-downloaded or downloading, or files downloaded by other download terminals, the current download terminal does not need to import them into the database.
  • the import timing is determined based on the file content. Some files do not require urgent processing, so the downloaded files can be divided in advance into urgent files and non-urgent files. Urgent files can be imported into the database in real time. Non-urgent files can be processed in batches: the currently downloaded non-urgent files are held, and when their number reaches a preset number, all of them are imported into the database together; alternatively, based on a preset interval, the non-urgent files accumulated within the interval are imported into the database.
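  • One way to picture the import timing is the hypothetical scheduler below. The text offers the count threshold and the time interval as alternatives; this sketch supports both at once for brevity. The class name, parameters, and the caller-supplied `import_fn` are assumptions.

```python
class ImportScheduler:
    """Imports urgent files at once and holds non-urgent files for a batch."""

    def __init__(self, import_fn, batch_size, interval):
        self.import_fn = import_fn    # caller-supplied: imports a list of files
        self.batch_size = batch_size  # the preset number of non-urgent files
        self.interval = interval      # the preset interval, in seconds
        self.pending = []
        self.last_flush = 0.0

    def submit(self, name, urgent, now):
        if urgent:
            self.import_fn([name])    # real-time import
            return
        self.pending.append(name)
        if len(self.pending) >= self.batch_size:
            self.flush(now)

    def tick(self, now):
        """Call periodically; flushes once the interval has elapsed."""
        if self.pending and now - self.last_flush >= self.interval:
            self.flush(now)

    def flush(self, now):
        self.import_fn(self.pending)
        self.pending = []
        self.last_flush = now


batches = []
scheduler = ImportScheduler(batches.append, batch_size=2, interval=60)
scheduler.submit("u.txt", urgent=True, now=0)
scheduler.submit("n1.txt", urgent=False, now=1)
scheduler.submit("n2.txt", urgent=False, now=2)   # reaches the preset number
scheduler.submit("n3.txt", urgent=False, now=3)
scheduler.tick(now=10)                            # interval not yet elapsed
scheduler.tick(now=70)                            # interval elapsed
```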
  • Step S30 Process the file in the database through the processing terminal corresponding to the database to obtain the file to be exported.
  • the processing terminal corresponding to the database is used to read the file in the database and process the file.
  • the specific processing is determined according to the business type of the file. If the current file is a repayment entry file and belongs to the accounting category, the repayment account and the repayment amount in the repayment entry file are obtained, the corresponding amount is deducted from the repayment account, and the deducted amount is entered into the account; if the current file is an interest settlement file, the interest rate, the savings amount, and the storage time of each account in the interest settlement file are obtained, and the interest of each account is calculated based on the interest rate, amount, and storage time; if the current file is a flow file and belongs to the flow category, the flow data is exported to generate a flow report, and so on.
  • the processing terminal is used to read the current file and determine the service type to which the current file belongs, and based on the service type, the current file is processed in a corresponding processing mode to obtain the file to be exported.
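  • Choosing a processing mode by business type can be sketched as a dispatch table. The application does not give the interest formula; simple interest on a 365-day year is assumed here purely for illustration, and the handler names, type keys, and row layouts are hypothetical.

```python
def settle_interest(rows):
    """rows: (account, amount, annual_rate, days). Assumes simple interest."""
    return [
        (account, round(amount * rate * days / 365, 2))
        for account, amount, rate, days in rows
    ]


def flow_report(rows):
    """rows: (account, amount). Returns one report line per flow record."""
    return [account + ":" + str(amount) for account, amount in rows]


# Hypothetical mapping from business type to processing mode.
HANDLERS = {
    "interest_settlement": settle_interest,
    "flow": flow_report,
}


def process_file(business_type, rows):
    """Pick the processing mode for the file's business type and apply it."""
    try:
        handler = HANDLERS[business_type]
    except KeyError:
        raise ValueError("unknown business type: " + business_type)
    return handler(rows)


interest = process_file("interest_settlement", [("acct1", 10000, 0.0365, 100)])
report = process_file("flow", [("acct2", 5)])
```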
  • Step S40 Export the file to be exported to the local disk through the processing terminal, and send the file to be exported to the cloud storage.
  • the file to be exported is exported to the local disk through the processing terminal to update the corresponding file in the local disk, and the file to be exported is sent to the cloud storage to update the corresponding file in the cloud storage.
  • the files to be exported can be kept, and after the third preset time interval, all the files to be exported within the third preset time are exported to the local disk together, and all the files to be exported are synchronously sent to the cloud storage .
  • the files to be exported can be retained, and when the number of files to be exported reaches a preset number, all the files to be exported are exported to a local disk together, and all the files to be exported are synchronously sent to cloud storage.
  • the download terminal corresponding to the file download notification is determined, and the file corresponding to the file download notification is downloaded from the corresponding cloud storage to the download terminal through the download terminal.
  • the file content of the file is read in the local disk through the download terminal, and based on the file content, the file is imported into the database corresponding to the download terminal through the download terminal.
  • the NAS disk is replaced by a database.
  • the download terminal downloads the target file and imports it into the database.
  • the corresponding processing terminal processes the target file. After processing, the processed target file is exported to the local disk and synchronized to the cloud storage, which realizes distributed data exchange.
  • step S20 includes:
  • Step S21 Read the sending and receiving configuration table of the file in the local disk through the download terminal, and determine whether the file needs to be imported based on the sending and receiving configuration table;
  • Step S22 if necessary, determine the file type to which the file belongs through the processing terminal based on the transceiver configuration table, and determine the import type of the file based on the file type;
  • Step S23 based on the import type, import the file into the database corresponding to the download terminal through the download terminal.
  • the file type and import type of the file need to be judged, so that the correct file can be smoothly imported into the database, and the intelligence of file import is improved.
  • Step S21 Read the sending and receiving configuration table of the file in the local disk through the download terminal, and determine whether the file needs to be imported based on the sending and receiving configuration table.
  • each file corresponds to a sending and receiving configuration table, in which the import and export conditions of the file are set in advance, including whether it needs to be imported into the database, whether it needs to be exported from the database, and the corresponding file type. Therefore, the sending and receiving configuration table of the current file can be read through the download terminal to determine whether the current file needs to be imported into the database.
  • Step S22 If necessary, determine the file type to which the file belongs through the processing terminal based on the transceiver configuration table, and determine the import type of the file based on the file type.
  • the sending and receiving configuration table corresponding to the current file is further read to determine the file type of the current file, where the file type is either fixed-length or non-fixed-length. If the current file is determined to be a fixed-length file, the fixed-length characters corresponding to the file must also be determined, so that the current file is imported into the database according to those fixed-length characters.
  • the import type can be determined according to the file type. Specifically, if the current file is a fixed-length file, the corresponding import type is multi-threaded import; if the current file is a non-fixed-length file, the corresponding import type is single-threaded import.
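  • Steps S21 and S22 reduce to a small decision function. The dict keys `needs_import` and `file_type` are hypothetical names for fields of the sending and receiving configuration table, whose actual layout the application does not specify.

```python
def plan_import(config):
    """Return the import type for one file, or None if no import is needed.

    config: the file's sending and receiving configuration as a dict
    (hypothetical keys "needs_import" and "file_type").
    """
    if not config.get("needs_import", False):
        return None                      # S21: the file need not be imported
    if config["file_type"] == "fixed_length":
        return "multi_threaded"          # S22: fixed-length files
    return "single_threaded"             # S22: non-fixed-length files


fixed = plan_import({"needs_import": True, "file_type": "fixed_length"})
variable = plan_import({"needs_import": True, "file_type": "non_fixed_length"})
skipped = plan_import({"needs_import": False, "file_type": "fixed_length"})
```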
  • Step S23 based on the import type, import the file into the database corresponding to the download terminal through the download terminal.
  • the current file is imported into the database corresponding to the download terminal through the download terminal.
  • step S23 specifically includes:
  • in this step, it is first determined whether a failure record corresponding to the current file exists in the database. Failures such as downtime or process interruption may occur while a file is being imported into the database, causing the import to fail; the data left behind in the database is the failure record. If the current file is one that previously failed to import, importing it again would duplicate data and occupy too much space in the database, so it is necessary to determine whether the database currently holds a failure record corresponding to the current file.
  • if a failure record exists, it is cleaned up, and the file configuration of the current file is read through the download terminal. Files of the same kind have the same file header when imported into the database, so whether to import the file header can be determined in advance for some files according to the actual situation. If the line currently being imported is the first line of the current file and the file is configured to skip the file header, the file header of the current file is not imported into the database; otherwise the file header is also imported into the database.
  • if no failure record exists, the step of reading the file configuration of the file through the download terminal is executed directly to determine whether the file header of the file needs to be skipped.
  • if the file header needs to be skipped, it is skipped through the download terminal, and the rest of the file is imported into the database corresponding to the download terminal.
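  • A minimal sketch of the single-threaded path, using Python's built-in `sqlite3` as a stand-in database. The `records` table, its columns, and the `import_file` function are assumptions for illustration; the application does not name a database or schema.

```python
import sqlite3


def import_file(conn, file_name, lines, skip_header):
    """Import the lines of one file, first removing rows from a failed import."""
    cur = conn.cursor()
    # Clean up any failure record: rows left behind by an earlier attempt.
    cur.execute("DELETE FROM records WHERE file_name = ?", (file_name,))
    start = 1 if skip_header else 0  # line 0 is the file header
    cur.executemany(
        "INSERT INTO records (file_name, line_no, content) VALUES (?, ?, ?)",
        [(file_name, i, line) for i, line in enumerate(lines) if i >= start],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (file_name TEXT, line_no INTEGER, content TEXT)"
)
# Simulate a partial row left by an import that failed midway.
conn.execute("INSERT INTO records VALUES ('a.csv', 1, 'stale')")
import_file(conn, "a.csv", ["id,amount", "1,100", "2,250"], skip_header=True)
rows = conn.execute("SELECT content FROM records ORDER BY line_no").fetchall()
```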
  • step S23 specifically includes:
  • in this step, it is first determined whether a failure record corresponding to the current file exists in the database. Failures such as downtime or process interruption may occur while a file is being imported into the database, causing the import to fail; the data left behind in the database is the failure record. If the current file is one that previously failed to import, importing it again would duplicate data and occupy too much space in the database, so it is necessary to determine whether the database currently holds a failure record corresponding to the current file.
  • if a failure record exists, it is cleaned up, and the current file is split into a first preset number of first split files through the download terminal, where the first preset number can be set according to the actual situation.
  • the fixed-length characters corresponding to the current file and the file content of the current file can be used to determine the corresponding split number, and according to the split number, the current file is split into that number of split files.
  • the first split files are imported into the database corresponding to the download terminal in parallel through the download terminal, that is, multiple threads import them into the database at the same time.
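  • Fixed-length records make parallel import straightforward because every record boundary can be computed from an offset. The sketch below, which is an illustration rather than the application's implementation, splits on record boundaries and uses `concurrent.futures.ThreadPoolExecutor`; `import_chunk` only parses records and stands in for a real database insert.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fixed_length(data, record_len, parts):
    """Split data into at most `parts` chunks that end on record boundaries."""
    if len(data) % record_len != 0:
        raise ValueError("data is not a whole number of records")
    if not data:
        return []
    n_records = len(data) // record_len
    per_part = -(-n_records // parts)  # ceiling division
    step = per_part * record_len
    return [data[i:i + step] for i in range(0, len(data), step)]


def import_chunk(chunk, record_len):
    """Stand-in for a database insert: returns the records in the chunk."""
    return [chunk[i:i + record_len] for i in range(0, len(chunk), record_len)]


def parallel_import(data, record_len, parts):
    """Split the file and import the split files on a pool of threads."""
    chunks = split_fixed_length(data, record_len, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = pool.map(lambda c: import_chunk(c, record_len), chunks)
        return [record for part in results for record in part]


chunks = split_fixed_length("AAAABBBBCCCCDDDDEEEE", record_len=4, parts=3)
records = parallel_import("AAAABBBBCCCCDDDDEEEE", record_len=4, parts=3)
```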
  • the file type and import type of the file need to be judged, so that the correct file can be smoothly imported into the database, and the intelligence of file import is improved.
  • step S40 includes:
  • the processing terminal determines the first directory corresponding to the file to be exported in the local disk and the cloud storage, splits the file to be exported into a second preset number of second split files, and exports them to the first directory;
  • the second split files are merged into a merged file in the first directory, and the merged file is placed in a second directory corresponding to the file to be exported.
  • the processing terminal obtains the transceiver configuration table corresponding to the file to be exported, and determines the initialization state of the file to be exported based on the transceiver configuration table corresponding to the file to be exported.
  • when exporting the file to be exported, the processing terminal first obtains the sending and receiving configuration table corresponding to the file to be exported. Each file carries its sending and receiving configuration table from beginning to end: whether the file is being downloaded to the local disk, imported into the database, or processed to generate the file to be exported, the table is always carried with it.
  • the sending and receiving configuration table includes the initialization status of the file to be exported. Therefore, the initialization status of the file to be exported can be determined by obtaining the sending and receiving configuration table of the file to be exported.
  • the initialization state is specifically the export state. Therefore, it is possible to determine whether the current file to be exported needs to be exported by reading the initialization state of the file to be exported, that is, the file to be exported whose initialization state is the export needs to be exported.
  • the processing terminal reads the initialization state of the file to be exported, and determines the file to be exported whose initialization state is the export state as needing to be exported. It is understandable that there can be one or more processing ends. In the case of multiple processing ends, a polling mechanism can also be used to poll and read the initialization status of the file to be exported, and set the initialization status to the export status. And the file to be exported processed by this processing end is determined to be exported. That is, in the process of processing the current file to generate the file to be exported, the processing terminal needs to record the IP information of the processing terminal to determine which files to be exported are processed by the machine through the IP information.
  • the processing terminal determines the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splits the file to be exported into a second preset number of second split files, and exports them to the first directory.
  • if the processing terminal determines that the file to be exported needs to be exported, it determines the first directory corresponding to the file to be exported on the local disk and in the cloud storage, that is, the location to which the file will be exported, and splits the file to be exported into a second preset number of second split files. The second preset number can be set according to the actual situation; for example, if the preset number is 7, the file to be exported is split into 7 split files.
  • the split number can be determined from the fixed-length characters corresponding to the file to be exported and from the file content of the file to be exported, and the file to be exported is then split into the number of split files given by the split number.
  • the step of exporting the second split files to the first directory includes:
  • in this step, it is determined whether residual files exist in the first directory. If residual files exist, they are deleted and the second split files are exported to the first directory through the processing terminal; if not, the second split files are exported directly to the first directory through the processing terminal.
  • the second split files are merged into a merged file in the first directory, and the merged file is placed in the second directory corresponding to the file to be exported.
  • the second split files in the first directory are merged into a merged file, which is placed in the second directory.
  • specifically, when the file to be exported is split into the second split files, the split files can be numbered, for example part1, part2, part3, and so on. After the second split files have been exported to the first directory, they are merged one by one, in number order, into a merged file. At this point, the file to be exported has been synchronously exported to the local disk and the cloud storage.
  • the download terminal and/or the processing terminal suffers a fault such as downtime or process interruption
  • the fault file corresponding to the fault and the fault action corresponding to the fault file are recorded, and the standby terminal corresponding to the current download terminal and/or processing terminal continues execution according to the recorded fault file and fault action.
  • the application also provides a distributed file batch processing device.
  • the distributed file batch processing apparatus of the present application includes a memory, a processor, and a distributed file batch processing program that is stored in the memory and can run on the processor.
  • when the distributed file batch processing program is executed by the processor, the following steps are implemented:
  • a file download notification is detected, the download terminal corresponding to the file download notification is determined, and the file corresponding to the file download notification is downloaded from the corresponding cloud storage to the local disk corresponding to the download terminal through the download terminal;
  • the file download notification is detected, and the download terminal corresponding to the file download notification is determined;
  • the file status of the file is changed to downloading or downloaded through the download terminal, and the file list is updated according to the current file status.
  • the file is imported into the database corresponding to the download terminal through the download terminal.
  • the processing terminal determines the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splits the file to be exported into a second preset number of second split files, and exports them to the first directory;
  • the second split files are merged into a merged file in the first directory, and the merged file is placed in the second directory corresponding to the file to be exported.
  • the application also provides a computer-readable storage medium.
  • a distributed file batch processing program is stored on the computer-readable storage medium of the present application, and the distributed file batch processing program is executed by a processor to implement the steps of the distributed file batch processing method as described above.
  • for the method implemented when the distributed file batch processing program is executed on the processor, refer to the embodiments of the distributed file batch processing method of the present application; the details are not repeated here.
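The export-selection rule described in the list above (a file is exported when its initialization state is the export state and it was produced by this processing terminal, identified by its recorded IP) can be illustrated with a minimal sketch. The `ConfigRow` fields, the `"EXPORT"` label, and the sample IP addresses are assumptions made for illustration; the source does not specify the layout of the send/receive configuration table.

```python
from dataclasses import dataclass

EXPORT_STATE = "EXPORT"  # hypothetical label for the "export state"


@dataclass
class ConfigRow:
    """One row of a (hypothetical) send/receive configuration table."""
    file_name: str
    init_state: str    # initialization state of the file to be exported
    processor_ip: str  # IP recorded by the processing terminal that produced the file


def files_to_export(rows, local_ip):
    """Return the files this processing terminal should export.

    A file qualifies only if its initialization state is the export state
    and it was processed by the terminal whose IP is `local_ip`.
    """
    return [
        row.file_name
        for row in rows
        if row.init_state == EXPORT_STATE and row.processor_ip == local_ip
    ]


rows = [
    ConfigRow("ledger.dat", "EXPORT", "10.0.0.1"),
    ConfigRow("report.dat", "EXPORT", "10.0.0.2"),
    ConfigRow("draft.dat", "PENDING", "10.0.0.1"),
]
selected = files_to_export(rows, "10.0.0.1")
```

With several processing terminals polling the same table, filtering on the recorded IP keeps two terminals from exporting the same file.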

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed is a distributed file batch processing method, comprising: when a file downloading notification is detected, determining a downloading end corresponding to the file downloading notification, and downloading, by means of the downloading end, a file, corresponding to the file downloading notification, from corresponding cloud storage to a local disk corresponding to the downloading end; reading, by means of the downloading end, file content of the file in the local disk, and on the basis of the file content, importing, by means of the downloading end, the file to a database corresponding to the downloading end; processing the file in the database by means of a processing end corresponding to the database, so as to obtain a file to be exported; and exporting the file to be exported to the local disk by means of the processing end, and sending the file to be exported to the cloud storage. Further disclosed are a distributed file batch processing apparatus and device and a storage medium.

Description

Distributed file batch processing method, apparatus, and readable storage medium
This application claims priority to Chinese patent application No. 201910465966.5, filed on May 30, 2019 and entitled "Distributed file batch processing method, apparatus, device and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of financial technology (Fintech), and in particular to a distributed file batch processing method, apparatus, and readable storage medium.
Background
In recent years, with the continuous development of financial technology (Fintech), especially Internet finance, data processing technology has been introduced into the daily services of banks and other financial institutions. In the course of these services, batch runs are often required. A batch run mainly generates the general ledger, reconciles the general and subsidiary ledgers, or carries out high-volume transactions such as interest settlement, accrual, and collection and payment on behalf of customers; it may also generate reports or export transaction journal data. In other words, the system needs to batch process distributed files from the various branches, which raises the question of how to batch process distributed files.
In the prior art, a shared NAS (Network Attached Storage) disk is used to store data and to exchange data in a distributed architecture. Specifically, a local machine downloads a file from cloud storage to its local disk and writes it to the NAS disk; the processing system then reads the file from the NAS disk and processes it; finally, the processing system writes the processed file back to the NAS disk and synchronizes it to the cloud storage, thereby completing the data exchange.
However, this way of batch processing distributed files relies on NAS disks, which entails high hardware cost, and NAS writes are slow, especially when a large number of small files are written. The prior-art batch run processing of distributed files therefore needs improvement.
Summary of the Invention
Technical Problem
Solution to the Problem
Technical Solution
The main purpose of this application is to propose a distributed file batch processing method, apparatus, and readable storage medium, with the aim of improving the efficiency of processing distributed files.
To achieve the above purpose, this application provides a distributed file batch processing method, which includes the following steps:
When a file download notification is detected, determining the download terminal corresponding to the file download notification, and downloading, through the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
Reading, through the download terminal, the file content of the file in the local disk, and, based on the file content, importing the file through the download terminal into the database corresponding to the download terminal;
Processing the file in the database through the processing terminal corresponding to the database to obtain a file to be exported; and
Exporting the file to be exported to the local disk through the processing terminal, and sending the file to be exported to the cloud storage.
In an embodiment, the step of determining, when a file download notification is detected, the download terminal corresponding to the file download notification, and downloading, through the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal includes:
When a file download notification is detected, determining the download terminal corresponding to the file download notification;
Downloading, through the download terminal, the files whose file status is "to be downloaded" in the file list corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal; and
Changing, through the download terminal, the file status of the file to "downloading" or "downloaded", and updating the file list according to the current file status.
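The status-driven download described in these steps can be sketched as follows. This is a minimal illustration, assuming the file list is a mapping from file name to status and that `fetch` stands in for the actual cloud-to-local copy; neither the data structure nor the function names come from the source.

```python
from enum import Enum


class FileStatus(Enum):
    TO_BE_DOWNLOADED = "to be downloaded"
    DOWNLOADING = "downloading"
    DOWNLOADED = "downloaded"


def download_pending(file_list, fetch):
    """Download every file marked TO_BE_DOWNLOADED and update the list in place.

    `file_list` maps file name -> FileStatus (assumed layout).
    `fetch` is a hypothetical callable that copies one file from cloud
    storage to the local disk.
    """
    done = []
    for name, status in list(file_list.items()):
        if status is not FileStatus.TO_BE_DOWNLOADED:
            continue  # already downloading or downloaded; leave untouched
        file_list[name] = FileStatus.DOWNLOADING
        fetch(name)
        file_list[name] = FileStatus.DOWNLOADED
        done.append(name)
    return done


file_list = {
    "a.dat": FileStatus.TO_BE_DOWNLOADED,
    "b.dat": FileStatus.DOWNLOADED,
    "c.dat": FileStatus.TO_BE_DOWNLOADED,
}
fetched = []
downloaded = download_pending(file_list, fetched.append)
```

Marking a file as "downloading" before the copy starts lets another terminal that reads the same list skip it.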
In an embodiment, the step of reading, through the download terminal, the file content of the file in the local disk, and importing the file, through the download terminal and based on the file content, into the database corresponding to the download terminal includes:
Reading, through the download terminal, the send/receive configuration table of the file in the local disk, and determining, based on the send/receive configuration table, whether the file needs to be imported;
If so, determining, through the processing terminal and based on the send/receive configuration table, the file type to which the file belongs, and determining the import type of the file based on the file type; and
Based on the import type, importing the file through the download terminal into the database corresponding to the download terminal.
In an embodiment, if the import type is single-threaded import, the step of importing the file, based on the import type and through the download terminal, into the database corresponding to the download terminal includes:
Determining whether a failure record corresponding to the file exists in the database;
If so, clearing the failure record, and reading the file configuration of the file through the download terminal to determine whether the file header of the file needs to be skipped; and
If so, importing the file, with its file header skipped, through the download terminal into the database corresponding to the download terminal.
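The single-threaded import steps above can be sketched with SQLite from the Python standard library. This is an illustration under stated assumptions: the `records` table, the modeling of a "failure record" as rows left behind by an earlier failed import of the same file, and the `skip_header` flag standing in for the file configuration are all hypothetical.

```python
import sqlite3


def import_single_thread(conn, file_name, lines, skip_header):
    """Import one file row by row after clearing leftovers of a failed attempt."""
    cur = conn.cursor()
    # Assumed meaning of "failure record": rows of this file left in the
    # database by an earlier import that did not finish. Deleting is a
    # no-op when no such rows exist.
    cur.execute("DELETE FROM records WHERE file_name = ?", (file_name,))
    # The file configuration decides whether the first line is a header.
    body = lines[1:] if skip_header else lines
    cur.executemany(
        "INSERT INTO records (file_name, line) VALUES (?, ?)",
        [(file_name, line) for line in body],
    )
    conn.commit()
    return len(body)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (file_name TEXT, line TEXT)")
# Simulate a row left over from a failed import.
conn.execute("INSERT INTO records VALUES ('trades.dat', 'partial row')")
count = import_single_thread(
    conn, "trades.dat", ["HEADER", "r1", "r2"], skip_header=True
)
stored = [row[0] for row in conn.execute("SELECT line FROM records ORDER BY rowid")]
```

Clearing the leftovers first makes the import safe to rerun, since a retry cannot produce duplicate rows.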
In an embodiment, if the import type is multi-threaded import, the step of importing the file, based on the import type and through the download terminal, into the database corresponding to the download terminal includes:
Determining whether a failure record corresponding to the file exists in the database;
If so, clearing the failure record, and splitting the file, through the download terminal, into a first preset number of first split files; and
Importing the first split files in parallel, through the download terminal, into the database corresponding to the download terminal.
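The split-then-import-in-parallel idea can be sketched as below. It assumes a line-oriented file, a hypothetical `records` table, and an in-memory SQLite connection shared across threads. Because one SQLite connection cannot be written concurrently, the sketch serializes the inserts with a lock, so only row preparation actually runs in parallel; a production database with one connection per thread would not need this.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, parts):
    """Split `lines` into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def import_parallel(conn, file_name, lines, parts):
    """Import the split chunks using one worker thread per chunk."""
    lock = threading.Lock()  # serializes writes on the shared connection

    def load(chunk):
        rows = [(file_name, line) for line in chunk]
        with lock:
            conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
        return len(rows)

    with ThreadPoolExecutor(max_workers=parts) as pool:
        total = sum(pool.map(load, split_lines(lines, parts)))
    conn.commit()
    return total


conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE records (file_name TEXT, line TEXT)")
lines = [f"row{i}" for i in range(10)]
total = import_parallel(conn, "trades.dat", lines, parts=3)
stored = sorted(row[0] for row in conn.execute("SELECT line FROM records"))
```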
In an embodiment, the step of exporting the file to be exported to the local disk through the processing terminal and sending the file to be exported to the cloud storage includes:
Obtaining, through the processing terminal, the send/receive configuration table corresponding to the file to be exported, and determining the initialization state of the file to be exported based on that table;
Based on the initialization state, determining, through the processing terminal, whether the file to be exported needs to be exported;
If so, determining, through the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory; and
Merging the second split files into a merged file in the first directory, and placing the merged file in the second directory corresponding to the file to be exported.
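The split and merge steps of the export can be sketched on the local file system. The `part1`, `part2`, ... naming is an assumed numbering scheme, and the directory and file names are illustrative. The merge sorts parts by their numeric suffix, since a plain lexical sort would place `part10` before `part2`.

```python
import tempfile
from pathlib import Path


def export_in_parts(data: bytes, first_dir: Path, parts: int):
    """Write `data` into `parts` numbered split files inside `first_dir`."""
    size = -(-len(data) // parts)  # ceiling division: bytes per part
    paths = []
    for i in range(parts):
        path = first_dir / f"part{i + 1}"
        path.write_bytes(data[i * size:(i + 1) * size])
        paths.append(path)
    return paths


def merge_parts(first_dir: Path, target: Path) -> Path:
    """Concatenate the split files in numeric order into `target`."""
    parts = sorted(first_dir.glob("part*"), key=lambda p: int(p.name[4:]))
    with target.open("wb") as out:
        for path in parts:
            out.write(path.read_bytes())
    return target


with tempfile.TemporaryDirectory() as tmp:
    first_dir = Path(tmp) / "first"
    second_dir = Path(tmp) / "second"
    first_dir.mkdir()
    second_dir.mkdir()
    payload = bytes(range(256)) * 5
    export_in_parts(payload, first_dir, parts=12)
    part_count = len(list(first_dir.glob("part*")))
    merged = merge_parts(first_dir, second_dir / "export.dat").read_bytes()
```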
In an embodiment, the step of, if so, determining, through the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory includes:
If so, determining the first directory corresponding to the file to be exported on the local disk and in the cloud storage, and determining whether residual files exist in the first directory;
If not, splitting the file to be exported, through the processing terminal, into a second preset number of second split files, and exporting them to the first directory; and
If so, deleting the residual files, splitting the file to be exported, through the processing terminal, into a second preset number of second split files, and exporting them to the first directory.
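The residual-file check that precedes the export can be sketched as a small helper. This is a minimal illustration, assuming that any regular file found in the first directory counts as residue from an earlier, interrupted export; the source does not say how residual files are identified.

```python
import tempfile
from pathlib import Path


def prepare_first_dir(first_dir: Path):
    """Delete residual files in the export directory and return their names.

    Returns an empty list when the directory is already clean, in which
    case the export can proceed directly.
    """
    first_dir.mkdir(parents=True, exist_ok=True)
    removed = []
    for path in sorted(first_dir.iterdir()):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    first_dir = Path(tmp) / "first"
    first_dir.mkdir()
    (first_dir / "part1").write_text("stale")
    (first_dir / "part2").write_text("stale")
    removed = prepare_first_dir(first_dir)        # residue present
    removed_again = prepare_first_dir(first_dir)  # already clean
    remaining = list(first_dir.iterdir())
```

Cleaning the directory before writing keeps stale split files from being picked up by the later merge.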
In addition, to achieve the above purpose, this application also provides a distributed file batch processing apparatus. The apparatus includes a memory, a processor, and a distributed file batch processing program that is stored in the memory and can run on the processor. When the distributed file batch processing program is executed by the processor, the following steps are implemented:
When a file download notification is detected, determining the download terminal corresponding to the file download notification, and downloading, through the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
Reading, through the download terminal, the file content of the file in the local disk, and, based on the file content, importing the file through the download terminal into the database corresponding to the download terminal;
Processing the file in the database through the processing terminal corresponding to the database to obtain a file to be exported; and
Exporting the file to be exported to the local disk through the processing terminal, and sending the file to be exported to the cloud storage.
In an embodiment, when the distributed file batch processing program is executed by the processor, the following steps are implemented:
When a file download notification is detected, determining the download terminal corresponding to the file download notification;
Downloading, through the download terminal, the files whose file status is "to be downloaded" in the file list corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal; and
Changing, through the download terminal, the file status of the file to "downloading" or "downloaded", and updating the file list according to the current file status.
In an embodiment, when the distributed file batch processing program is executed by the processor, the following steps are implemented:
Reading, through the download terminal, the send/receive configuration table of the file in the local disk, and determining, based on the send/receive configuration table, whether the file needs to be imported;
If so, determining, through the processing terminal and based on the send/receive configuration table, the file type to which the file belongs, and determining the import type of the file based on the file type; and
Based on the import type, importing the file through the download terminal into the database corresponding to the download terminal.
In an embodiment, if the import type is single-threaded import, the following steps are implemented when the distributed file batch processing program is executed by the processor:
Determining whether a failure record corresponding to the file exists in the database;
If so, clearing the failure record, and reading the file configuration of the file through the download terminal to determine whether the file header of the file needs to be skipped; and
If so, importing the file, with its file header skipped, through the download terminal into the database corresponding to the download terminal.
In an embodiment, if the import type is multi-threaded import, the following steps are implemented when the distributed file batch processing program is executed by the processor:
Determining whether a failure record corresponding to the file exists in the database;
If so, clearing the failure record, and splitting the file, through the download terminal, into a first preset number of first split files; and
Importing the first split files in parallel, through the download terminal, into the database corresponding to the download terminal.
In an embodiment, when the distributed file batch processing program is executed by the processor, the following steps are implemented:
Obtaining, through the processing terminal, the send/receive configuration table corresponding to the file to be exported, and determining the initialization state of the file to be exported based on that table;
Based on the initialization state, determining, through the processing terminal, whether the file to be exported needs to be exported;
If so, determining, through the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory; and
Merging the second split files into a merged file in the first directory, and placing the merged file in the second directory corresponding to the file to be exported.
In an embodiment, when the distributed file batch processing program is executed by the processor, the following steps are implemented:
If so, determining the first directory corresponding to the file to be exported on the local disk and in the cloud storage, and determining whether residual files exist in the first directory;
If not, splitting the file to be exported, through the processing terminal, into a second preset number of second split files, and exporting them to the first directory; and
If so, deleting the residual files, splitting the file to be exported, through the processing terminal, into a second preset number of second split files, and exporting them to the first directory.
In addition, to achieve the above purpose, this application also provides a distributed file batch processing device. The device includes a memory, a processor, and a distributed file batch processing program that is stored in the memory and can run on the processor; when the distributed file batch processing program is executed by the processor, the steps of the distributed file batch processing method described above are implemented.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium on which a distributed file batch processing program is stored; when the distributed file batch processing program is executed by a processor, the steps of the distributed file batch processing method described above are implemented.
In the distributed file batch processing method proposed in this application, when a file download notification is detected, the download terminal corresponding to the file download notification is determined, and the file corresponding to the file download notification is downloaded through the download terminal from the corresponding cloud storage to the local disk corresponding to the download terminal; the file content of the file is read in the local disk through the download terminal, and, based on the file content, the file is imported through the download terminal into the database corresponding to the download terminal; the file in the database is processed through the processing terminal corresponding to the database to obtain a file to be exported; and the file to be exported is exported to the local disk through the processing terminal and sent to the cloud storage. This application replaces the NAS disk with a database to implement distributed data exchange.
Beneficial Effects of the Invention
Brief Description of the Drawings
Description of the Drawings
FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of the first embodiment of the distributed file batch processing method of this application;
FIG. 3 is a schematic flowchart of the second embodiment of the distributed file batch processing method of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the Invention
Implementation of the Invention
It should be understood that the specific embodiments described here are intended only to explain this application, not to limit it.
FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.
The terminal in the embodiments of this application may be a PC or a server device.
As shown in FIG. 1, the terminal may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the device structure shown in FIG. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a distributed file batch processing program.
The operating system is a program that manages and controls the distributed file batch processing device and its software resources, and supports the running of the network communication module, the user interface module, the distributed file batch processing program, and other programs or software; the network communication module manages and controls the network interface 1004; the user interface module manages and controls the user interface 1003.
In the distributed file batch processing device shown in FIG. 1, the device calls, through the processor 1001, the distributed file batch processing program stored in the memory 1005, and performs the operations in the embodiments of the distributed file batch processing method described below.
Based on the above hardware structure, the embodiments of the distributed file batch processing method of this application are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the distributed file batch processing method of this application. The method includes:
Step S10: when a file download notification is detected, determining the download terminal corresponding to the file download notification, and downloading, through the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
Step S20: reading, through the download terminal, the file content of the file in the local disk, and, based on the file content, importing the file through the download terminal into the database corresponding to the download terminal;
Step S30: processing the file in the database through the processing terminal corresponding to the database to obtain a file to be exported;
Step S40: exporting the file to be exported to the local disk through the processing terminal, and sending the file to be exported to the cloud storage.
In this embodiment, a database is used in place of a NAS disk: the download terminal downloads the file to the local disk and imports it into the database; the processing terminal then reads the file from the database and processes it; finally, the processed file is exported to the local disk and synchronized to the database, thereby implementing batch processing of distributed files.
Each step is described in detail below:
Step S10: upon detecting a file download notification, determine the download terminal corresponding to the file download notification, and, through the download terminal, download the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal.
The distributed file batch processing method of this embodiment is applied to a distributed file batch processing device of a financial institution, such as a wealth management institution or a banking system. For ease of description, the distributed file batch processing device is hereinafter referred to as the processing device. The processing device includes several download terminals, several processing terminals, a database, and so on. File download notifications are published by a message bus, and any download terminal can receive a file download notification; the download terminal that receives a notification is responsible for storing the corresponding file in the database.
Specifically, when a file download notification published by the message bus is detected, the download terminal that received the notification is determined, and that download terminal downloads the corresponding file from the cloud storage corresponding to the notification to the local disk corresponding to the download terminal.
It should be noted that, when the download terminal that received the file download notification is determined, the IP information of that download terminal needs to be recorded, and when the corresponding file is downloaded to the local disk, the recorded IP information is associated with the current file; that is, a record is kept of which download terminal downloaded the current file.
Further, step S10 includes:
Step a: upon detecting a file download notification, determine the download terminal corresponding to the file download notification;
In this step, to prevent a single point of failure, any download terminal can receive the file download notification. Specifically, a polling process is started, and the download terminals take turns monitoring the file status of each file in the file list. When a file download notification is detected, the download terminal that is currently monitoring is determined to be the download terminal for this download, and its IP information is recorded.
Step b: through the download terminal, download the files whose file status is "to be downloaded" in the file list corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
In this step, the file list holds the file status of each file, which indicates whether the file needs to be downloaded. Specifically, the download terminal scans the file statuses marked in the file list corresponding to the file download notification, and downloads the files whose status is "to be downloaded" from the corresponding cloud storage to the local disk corresponding to the download terminal.
Step c: through the download terminal, change the file status of the file to "downloading" or "downloaded", and update the file list according to the current file status.
In this step, while the download terminal is downloading the current file, the file status of the current file is changed to "downloading"; after the download completes, the status is changed to "downloaded". The file status of the current file is written to the file list in real time so that the file list stays up to date.
It can be understood that, in this embodiment, a download terminal downloads only files whose status is "to be downloaded". When the status of the current file is "downloading", the next download terminal does not need to download it when it detects the file download notification, even if the download has not completed within the polling interval.
Of course, to avoid a file remaining in the "downloading" state indefinitely because a download terminal has gone down or a download process has been interrupted, the download terminal currently performing the download action, in addition to downloading the files in the file list that are "to be downloaded" to the local disk, also checks whether the download terminal responsible for each file in the "downloading" state is working normally. Specifically, it can determine the IP information associated with the file in the "downloading" state, send a liveness detection packet from the current download terminal to the download terminal at that IP, and determine whether a corresponding response packet is received within a first preset time; alternatively, it can determine whether the download progress of the file in the "downloading" state changes within a second preset time, and so on. If so, the download terminal responsible for the file is working normally, and no adjustment is needed for the file being downloaded. If not, it is determined that the download terminal responsible for the file has gone down or that its download process has been interrupted; in this case a switchover is performed: the download operation is switched to the current download terminal, and the IP information associated with the corresponding file in the file list is changed to the IP information of the current download terminal.
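As a purely illustrative sketch of steps a to c and the switchover described above, one polling round of a download terminal might look like the following. The names `FileEntry`, `poll_once`, `is_alive`, and `download` are hypothetical and do not come from this application; the liveness check and the download itself are passed in as callables, and the file list is an in-memory dictionary standing in for whatever shared store an implementation would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

PENDING, DOWNLOADING, DOWNLOADED = "pending", "downloading", "downloaded"


@dataclass
class FileEntry:
    name: str
    status: str = PENDING
    owner_ip: Optional[str] = None  # IP of the download terminal handling the file


def poll_once(file_list: Dict[str, FileEntry], my_ip: str,
              is_alive: Callable[[str], bool],
              download: Callable[[str], None]) -> List[str]:
    """Run one polling round for the download terminal identified by my_ip."""
    handled = []
    for entry in file_list.values():
        if entry.status == DOWNLOADING and entry.owner_ip != my_ip:
            if is_alive(entry.owner_ip):
                continue  # the owning terminal is healthy; leave the file alone
            # the owning terminal appears to be down: fall through and take over
        elif entry.status != PENDING:
            continue  # already downloaded, or already being downloaded by us
        entry.status, entry.owner_ip = DOWNLOADING, my_ip
        download(entry.name)
        entry.status = DOWNLOADED
        handled.append(entry.name)
    return handled
```

In a real deployment with several terminals, the status change from "to be downloaded" to "downloading" would presumably need to be atomic in the shared file list; this sketch does not attempt that.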
Step S20: through the download terminal, read the file content of the file from the local disk, and, based on the file content, import the file into the database corresponding to the download terminal.
In this embodiment, after the file is downloaded to the local disk, the download terminal reads the file content of the current file from the local disk. The file content includes the IP information associated with the current file. Based on the file content, it is determined whether the current file was downloaded by the current download terminal: if not, the file is not processed; if so, the current file is imported into the database corresponding to the download terminal. That is, the current download terminal may import the current file into the database only if the file status is "downloaded" and the file was downloaded by the current download terminal. Files whose status is "to be downloaded" or "downloading", or that were downloaded by another download terminal, do not need to be imported by the current download terminal.
Further, after the file content is read, the import timing is determined based on the file content. It can be understood that some files do not require urgent processing; therefore, downloaded files can be classified in advance as urgent or non-urgent. Urgent files can be imported into the database in real time, whereas non-urgent files can be handled in batches: the currently downloaded non-urgent files are held, and when the number of non-urgent files reaches a preset count, all of them are imported into the database together; alternatively, the non-urgent files accumulated within a preset interval are imported into the database together.
Specifically, while the current file is being imported into the database, its file status is marked as "importing"; after the import completes, its file status is marked as "imported", which avoids repeated imports.
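The ownership check and import timing described for step S20 can be sketched as follows. This is an assumption-laden illustration rather than the implementation of this application: `ImportScheduler`, `offer`, `flush`, and the `batch_size` threshold are hypothetical names, and `do_import` stands in for the actual database import.

```python
from typing import Callable, List


class ImportScheduler:
    """Decides when a downloaded file is imported into the database."""

    def __init__(self, my_ip: str, do_import: Callable[[List[str]], None],
                 batch_size: int = 3) -> None:
        self.my_ip = my_ip
        self.do_import = do_import
        self.batch_size = batch_size  # hypothetical preset count for non-urgent files
        self.waiting: List[str] = []  # non-urgent files held back for batch import

    def offer(self, name: str, status: str, owner_ip: str, urgent: bool) -> bool:
        """Return True if this download terminal accepted the file for import."""
        if status != "downloaded" or owner_ip != self.my_ip:
            return False  # not ready yet, or downloaded by another terminal
        if urgent:
            self.do_import([name])  # urgent files are imported immediately
        else:
            self.waiting.append(name)
            if len(self.waiting) >= self.batch_size:
                self.flush()
        return True

    def flush(self) -> None:
        """Import all held non-urgent files; a timer could also call this."""
        if self.waiting:
            batch, self.waiting = self.waiting, []
            self.do_import(batch)
```

Calling `flush` from a periodic timer would cover the interval-based variant, while the count check in `offer` covers the count-based variant.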
Step S30: process the file in the database through the processing terminal corresponding to the database to obtain a file to be exported.
In this embodiment, the processing terminal corresponding to the database reads the file from the database and processes it. The specific processing depends on the business type of the file. For example, if the current file is a repayment posting file, which belongs to the accounting category, the repayment accounts and repayment amounts in the file are reconciled so that the corresponding amount is deducted from each repayment account and the deducted amount is posted. If the current file is an interest settlement file, which belongs to the interest settlement category, the interest rate, savings amount, and deposit duration of each account are obtained from the file, and the interest of each account is calculated based on the rate, amount, and duration. If the current file is a transaction flow file, which belongs to the transaction ledger category, the transaction data is exported and a transaction report is generated, and so on.
That is, the processing terminal reads the current file, determines the business type to which it belongs, and processes it in the manner corresponding to that business type to obtain the file to be exported.
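One way to picture the dispatch by business type is a lookup table of handlers, sketched below. The handler names, the row fields, and in particular the simple-interest formula (annual rate, days over 365) are assumptions made for illustration; this application does not specify how interest is computed.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def post_repayments(rows: List[Row]) -> List[Row]:
    """Sum the repayment amounts (in cents) to deduct from each account."""
    totals: Dict[str, int] = {}
    for r in rows:
        totals[r["account"]] = totals.get(r["account"], 0) + r["amount"]
    return [{"account": a, "deducted": t} for a, t in totals.items()]


def settle_interest(rows: List[Row]) -> List[Row]:
    """Assumed formula: simple interest with an annual rate over a 365-day year."""
    return [
        {"account": r["account"],
         "interest": round(r["balance"] * r["rate"] * r["days"] / 365, 2)}
        for r in rows
    ]


# Hypothetical mapping from business type to processing routine.
HANDLERS: Dict[str, Callable[[List[Row]], List[Row]]] = {
    "repayment": post_repayments,
    "interest": settle_interest,
}


def process_file(business_type: str, rows: List[Row]) -> List[Row]:
    """Process the rows of one file according to its business type."""
    try:
        handler = HANDLERS[business_type]
    except KeyError:
        raise ValueError(f"no handler for business type {business_type!r}")
    return handler(rows)
```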
Step S40: through the processing terminal, export the file to be exported to the local disk, and send the file to be exported to the cloud storage.
In this embodiment, after the file to be exported is obtained, the processing terminal exports it to the local disk to update the corresponding file on the local disk, and sends it to the cloud storage to update the corresponding file in the cloud storage.
Further, files to be exported may be held and, after a third preset time has elapsed, all files to be exported accumulated within the third preset time are exported to the local disk together and synchronously sent to the cloud storage.
Further, files to be exported may be held and, when their number reaches a preset count, all of them are exported to the local disk together and synchronously sent to the cloud storage.
In this embodiment, when a file download notification is detected, the download terminal corresponding to the notification is determined, and the download terminal downloads the file corresponding to the notification from the corresponding cloud storage to the local disk corresponding to the download terminal; the download terminal reads the file content of the file from the local disk and, based on the file content, imports the file into the database corresponding to the download terminal; the processing terminal corresponding to the database processes the file in the database to obtain a file to be exported; and the processing terminal exports the file to be exported to the local disk and sends it to the cloud storage. This application uses a database in place of a NAS disk: the download terminal downloads the target file and imports it into the database, the corresponding processing terminal performs data processing on the target file, and the processed target file is then exported to the local disk and synchronized to the cloud storage, thereby implementing distributed data exchange.
Further, based on the first embodiment of the distributed file batch processing method of this application, a second embodiment of the distributed file batch processing method of this application is proposed.
The second embodiment of the distributed file batch processing method differs from the first embodiment in that, referring to FIG. 3, step S20 includes:
Step S21: through the download terminal, read the sending and receiving configuration table of the file from the local disk, and determine, based on the sending and receiving configuration table, whether the file needs to be imported;
Step S22: if so, determine, through the processing terminal and based on the sending and receiving configuration table, the file type to which the file belongs, and determine the import type of the file based on the file type;
Step S23: based on the import type, import the file, through the download terminal, into the database corresponding to the download terminal.
In this embodiment, when a downloaded file is imported into the database, the file type and import type of the file are determined first, so that the correct files are imported into the database smoothly and file import becomes more intelligent.
Each step is described in detail below:
Step S21: through the download terminal, read the sending and receiving configuration table of the file from the local disk, and determine, based on the sending and receiving configuration table, whether the file needs to be imported.
In this embodiment, each file has a corresponding sending and receiving configuration table, in which the import and export settings of the file are configured in advance, including whether the file needs to be imported into the database, whether it needs to be exported from the database, the corresponding file type, and so on. Therefore, the download terminal can read the sending and receiving configuration table of the current file to determine whether the current file needs to be imported into the database.
Step S22: if so, determine, through the processing terminal and based on the sending and receiving configuration table, the file type to which the file belongs, and determine the import type of the file based on the file type.
In this embodiment, if it is determined that the current file needs to be imported into the database, the sending and receiving configuration table corresponding to the current file is read further to determine the file type of the current file. File types include fixed-length files and non-fixed-length files. It can be understood that, if the current file is determined to be a fixed-length file, the fixed character length corresponding to the fixed-length file also needs to be determined, so that the current file can later be imported into the database according to that fixed character length.
Once the file type of the current file has been determined, the import type can be determined from it. Specifically, if the current file is a fixed-length file, the corresponding import type is multi-threaded import; if the current file is a non-fixed-length file, the corresponding import type is single-threaded import.
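The decision made in steps S21 and S22 can be sketched as a small function over a configuration record. The `TransferConfig` fields below are a hypothetical rendering of the sending and receiving configuration table, chosen only to mirror the settings named in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferConfig:
    """Hypothetical in-memory form of one sending and receiving configuration table."""
    needs_import: bool
    needs_export: bool
    fixed_length: bool
    record_length: Optional[int] = None  # fixed character length, if fixed_length


def choose_import_type(cfg: TransferConfig) -> Optional[str]:
    """Return the import type for a file, or None if it must not be imported."""
    if not cfg.needs_import:
        return None
    if cfg.fixed_length:
        if not cfg.record_length or cfg.record_length <= 0:
            raise ValueError("a fixed-length file needs a positive record length")
        return "multi-threaded"
    return "single-threaded"
```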
Step S23: based on the import type, import the file, through the download terminal, into the database corresponding to the download terminal.
In this embodiment, the download terminal imports the current file into the database corresponding to the download terminal using the import method that corresponds to the determined import type.
Specifically, if the import type is single-threaded import, step S23 includes:
determining whether a failure record corresponding to the file exists in the database;
In this step, it is first determined whether a failure record corresponding to the current file exists in the database. It can be understood that a fault such as a crash or a process interruption may well occur while a file is being imported into the database, causing the import to fail; the data left behind in the database is the failure record. If the current file is one whose earlier import failed, importing it again without cleanup would duplicate data and occupy too much database space. Therefore, it is necessary to determine first whether a failure record corresponding to the current file exists in the database.
if so, clearing the failure record, and reading the file configuration of the file through the download terminal to determine whether the file header of the file needs to be skipped;
In this step, if it is determined that a failure record corresponding to the current file exists in the database, the failure record is cleared, and the download terminal reads the file configuration of the current file. It can be understood that, when files are imported into the database, files of the same kind have the same file header; therefore, whether the file header needs to be imported can be decided in advance for certain files according to actual conditions. If the line currently being imported is the first line of the current file and the file configuration specifies skipping the file header, the file header of the current file is not imported into the database; otherwise, the file header is imported as well.
It can be understood that, if no failure record exists, the step of reading the file configuration of the file through the download terminal to determine whether the file header needs to be skipped is performed directly.
if so, importing the file, with the file header skipped, into the database corresponding to the download terminal through the download terminal.
In this step, if it is determined that the file header needs to be skipped, the download terminal skips the file header and imports the rest of the file into the database corresponding to the download terminal.
It can be understood that, if the file header does not need to be skipped, the current file is imported directly into the corresponding database.
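The single-threaded path can be illustrated with Python's built-in `sqlite3` module. The `records` table, its columns, and the function names are assumptions made for this sketch; this application does not describe a schema. Deleting any rows already stored for the file plays the role of clearing the failure record, and the whole import runs in one transaction.

```python
import sqlite3
from typing import Iterable


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical table that holds imported lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(file_name TEXT, line_no INTEGER, content TEXT)"
    )


def import_single_threaded(conn: sqlite3.Connection, file_name: str,
                           lines: Iterable[str], skip_header: bool) -> int:
    """Import one file line by line and return the number of rows written."""
    with conn:  # commits on success, rolls back if an exception is raised
        # Remove leftovers from an earlier failed import of the same file.
        conn.execute("DELETE FROM records WHERE file_name = ?", (file_name,))
        rows = [
            (file_name, n, line)
            for n, line in enumerate(lines, start=1)
            if not (skip_header and n == 1)  # line 1 is the file header
        ]
        conn.executemany(
            "INSERT INTO records (file_name, line_no, content) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)
```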
If the import type is multi-threaded import, step S23 includes:
determining whether a failure record corresponding to the file exists in the database;
In this step, it is first determined whether a failure record corresponding to the current file exists in the database. It can be understood that a fault such as a crash or a process interruption may well occur while a file is being imported into the database, causing the import to fail; the data left behind in the database is the failure record. If the current file is one whose earlier import failed, importing it again without cleanup would duplicate data and occupy too much database space. Therefore, it is necessary to determine first whether a failure record corresponding to the current file exists in the database.
if so, clearing the failure record, and splitting the file into a first preset number of first split files through the download terminal;
In this step, if it is determined that a failure record corresponding to the current file exists in the database, the failure record is cleared, and the download terminal splits the current file into a first preset number of first split files, where the first preset number can be set according to actual conditions.
Further, when the current file is split into first split files, the number of splits can be determined from the fixed character length corresponding to the current file and the file content of the current file, and the current file is then split into that number of split files.
importing the first split files in parallel into the database corresponding to the download terminal through the download terminal.
In this step, the download terminal imports the first split files in parallel into the database corresponding to the download terminal; that is, multiple threads are used to import them into the database simultaneously.
Finally, a corresponding file record is generated; specifically, the file status of the current file is marked as "imported".
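A minimal sketch of the multi-threaded path, assuming that a fixed-length file is a byte string made of records of equal length: the file is cut on record boundaries into at most the preset number of pieces, and the pieces are handed to a thread pool. `split_fixed_length`, `import_parallel`, and `import_chunk` are hypothetical names; `import_chunk` stands in for the database write and is expected to return the number of records it stored.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_fixed_length(data: bytes, record_length: int, parts: int) -> List[bytes]:
    """Split data into at most `parts` chunks without cutting a record in two."""
    if record_length <= 0 or parts <= 0:
        raise ValueError("record_length and parts must be positive")
    if len(data) % record_length:
        raise ValueError("data is not a whole number of records")
    if not data:
        return []
    total = len(data) // record_length
    per_part = -(-total // parts)  # ceiling division: records per chunk
    step = per_part * record_length
    return [data[i:i + step] for i in range(0, len(data), step)]


def import_parallel(chunks: List[bytes],
                    import_chunk: Callable[[bytes], int],
                    workers: int = 4) -> int:
    """Import all chunks concurrently and return the total record count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(import_chunk, chunks))
```

Because every record has the same length, each chunk can be parsed independently, which is presumably why only fixed-length files are assigned the multi-threaded import type.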
In this embodiment, when a downloaded file is imported into the database, the file type and import type of the file are determined first, so that the correct files are imported into the database smoothly and file import becomes more intelligent.
Further, based on the first and second embodiments of the distributed file batch processing method of this application, a third embodiment of the distributed file batch processing method of this application is proposed.
The third embodiment of the distributed file batch processing method differs from the first and second embodiments in that step S40 includes:
obtaining, through the processing terminal, the sending and receiving configuration table corresponding to the file to be exported, and determining the initialization state of the file to be exported based on that table;
determining, through the processing terminal and based on the initialization state, whether the file to be exported needs to be exported;
if so, determining, through the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory;
merging the second split files into a merged file in the first directory, and merging the merged file into the second directory corresponding to the file to be exported.
In this embodiment, when files to be exported are exported to the local disk and the cloud storage, each file to be exported is evaluated further; that is, not every file to be exported needs to be exported, and only those that meet the conditions are exported. In addition, the appropriate export method is selected when exporting, which makes file export faster and more convenient.
Each step is described in detail below:
通过所述处理端获取所述待导出文件对应的收发配置表,并基于所述待导出文件对应的收发配置表确定所述待导出文件的初始化状态。The processing terminal obtains the transceiver configuration table corresponding to the file to be exported, and determines the initialization state of the file to be exported based on the transceiver configuration table corresponding to the file to be exported.
在本实施例中,在将待导出文件导出时,先通过处理端获取待导出文件对应的收发配置表,可以理解的,每一个文件至始至终都带有一个收发配置表,在将文件下载到本地或者将文件导入数据库或者对文件进行处理生成待导出文件的过程中,始终携带着收发配置表。收发配置表包括待导出文件的初始化状态。因此,可通过获取待导出文件的收发配置表确定待导出文件的初始化状态。In this embodiment, when exporting the file to be exported, the processing terminal first obtains the sending and receiving configuration table corresponding to the file to be exported. It is understandable that each file has a sending and receiving configuration table from beginning to end. In the process of downloading to the local or importing the file into the database or processing the file to generate the file to be exported, the sending and receiving configuration table is always carried. The sending and receiving configuration table includes the initialization status of the file to be exported. Therefore, the initialization status of the file to be exported can be determined by obtaining the sending and receiving configuration table of the file to be exported.
基于所述初始化状态,通过所述处理端确定所述待导出文件是否需要导出。Based on the initialization state, it is determined by the processing terminal whether the file to be exported needs to be exported.
在本实施例中,初始化状态具体为导出状态,因此,可通过读取待导出文件的初始化状态,确定当前的待导出文件是否需要导出,即初始化状态为导出的待导出文件需要导出。In this embodiment, the initialization state is specifically the export state. Therefore, it is possible to determine whether the current file to be exported needs to be exported by reading the initialization state of the file to be exported, that is, the file to be exported whose initialization state is the export needs to be exported.
具体的,通过处理端读取待导出文件的初始化状态,将初始化状态为导出状态的待导出文件确定为需要导出。可以理解的,处理端可以是一个或者是多个,在处理端是多个的情况下,也可采用轮询机制,轮询读取待导出文件的初始化状态,并将初始化状态为导出状态,且是本处理端处理的待导出文件确定为需要导出。即处理端在将当前文件进行处理,以生成待导出文件的过程中,还需记录处理端的IP信息,以通过IP信息确定哪些待导出文件是本机处理的。Specifically, the processing terminal reads the initialization state of the file to be exported, and determines the file to be exported whose initialization state is the export state as needing to be exported. It is understandable that there can be one or more processing ends. In the case of multiple processing ends, a polling mechanism can also be used to poll and read the initialization status of the file to be exported, and set the initialization status to the export status. And the file to be exported processed by this processing end is determined to be exported. That is, in the process of processing the current file to generate the file to be exported, the processing terminal needs to record the IP information of the processing terminal to determine which files to be exported are processed by the machine through the IP information.
If export is needed, the processing terminal determines the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splits the file to be exported into a second preset number of second split files, and exports them to the first directory.
In this embodiment, if the processing terminal determines that the current file to be exported needs to be exported, it determines the first directory corresponding to the file on the local disk and in the cloud storage, that is, the location to which the file will be exported, and splits the file into a second preset number of second split files. The second preset number can be set according to the actual situation; for example, if the preset number is 7, the file to be exported is split into 7 split files.
Further, the number of splits may be determined from the fixed-length characters corresponding to the file to be exported and from its file content, and the file to be exported is then split into that number of split files.
After the file to be exported has been split into multiple split files, multi-threaded parallel export is started and the split files are exported to the determined first directory.
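As a rough sketch of splitting followed by multi-threaded export, the example below divides a list of lines into near-equal chunks and writes each chunk on a worker thread. The function names `split_lines` and `export_parallel`, the `partN` file naming, and the use of a thread pool are assumptions for illustration; the application does not specify how the split or the threads are implemented.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_lines(lines, parts):
    """Split a list of lines into at most `parts` contiguous, near-equal chunks."""
    size, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        if start < end:  # skip empty chunks when there are fewer lines than parts
            chunks.append(lines[start:end])
        start = end
    return chunks


def export_parallel(lines, first_dir, parts):
    """Write each chunk to first_dir/partN, one worker thread per chunk."""
    first_dir = Path(first_dir)
    first_dir.mkdir(parents=True, exist_ok=True)
    chunks = split_lines(lines, parts)

    def write(indexed):
        index, chunk = indexed
        path = first_dir / f"part{index}"
        path.write_text("".join(chunk), encoding="utf-8")
        return path

    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        return list(pool.map(write, enumerate(chunks, start=1)))


with tempfile.TemporaryDirectory() as tmp:
    demo_lines = [f"row{i}\n" for i in range(10)]
    written = export_parallel(demo_lines, tmp, parts=7)
    print([p.name for p in written])
```

With 10 lines and a preset number of 7, the first three chunks hold two lines each and the remaining four hold one line each.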
Further, the step of exporting the second split files to the first directory includes:
if export is needed, determining the first directory corresponding to the file to be exported on the local disk and in the cloud storage, and determining whether residual files exist in the first directory;
if no residual files exist, splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory;
if residual files exist, deleting the residual files, then splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory.
In this step, it is determined whether residual files exist in the first directory. If they do, the residual files are deleted and the processing terminal then exports the second split files to the first directory; if they do not, the processing terminal exports the second split files to the first directory directly.
It can be understood that a failure such as a machine crash or a process interruption may occur while a file to be exported is being exported. In that case the corresponding directory will contain incompletely exported files, that is, residual files. The residual files therefore need to be deleted before the processing terminal exports the second split files, which avoids duplicate exports. If no residual files exist in the corresponding directory, the processing terminal exports the second split files directly.
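The residual-file check can be illustrated with a short sketch. It assumes, purely for the example, that every regular file already present in the first directory is a leftover from an interrupted export; the helper name `clear_residual_files` is hypothetical.

```python
import tempfile
from pathlib import Path


def clear_residual_files(first_dir):
    """Delete every regular file left in first_dir and return the names removed."""
    first_dir = Path(first_dir)
    first_dir.mkdir(parents=True, exist_ok=True)
    removed = []
    for path in sorted(first_dir.iterdir()):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a partially written split file left behind by a crash.
    (Path(tmp) / "part1").write_text("half-written", encoding="utf-8")
    print(clear_residual_files(tmp))
```

Calling the helper before every export makes the export step safe to repeat: a second call on a clean directory removes nothing.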
The second split files are merged into a merged file in the first directory, and the merged file is placed in the second directory corresponding to the file to be exported.
In this embodiment, after the second split files have been exported to the first directory, they are merged into a merged file, which is placed in the second directory. Specifically, the split files may be numbered when the file to be exported is split, for example part1, part2, part3, and so on; after the second split files have been exported to the first directory, they are merged one by one in numerical order into the merged file. At this point the file to be exported has been exported synchronously to both the local disk and the cloud storage.
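Merging by part number can be sketched as below. The `partN` naming follows the example in the text; the function name `merge_parts`, the regular expression, and the directory layout are assumptions. The parts are ordered by their numeric suffix rather than by file name, so that part10 follows part2.

```python
import re
import tempfile
from pathlib import Path

PART_PATTERN = re.compile(r"^part(\d+)$")


def merge_parts(first_dir, merged_path):
    """Concatenate first_dir/partN files in numeric order into merged_path."""
    numbered = []
    for path in Path(first_dir).iterdir():
        match = PART_PATTERN.match(path.name)
        if match:  # ignore anything that is not a numbered split file
            numbered.append((int(match.group(1)), path))
    merged_path = Path(merged_path)
    merged_path.parent.mkdir(parents=True, exist_ok=True)
    with merged_path.open("wb") as out:
        for _, path in sorted(numbered, key=lambda item: item[0]):
            out.write(path.read_bytes())
    return merged_path


with tempfile.TemporaryDirectory() as tmp:
    first_dir = Path(tmp) / "first"
    first_dir.mkdir()
    for number in (2, 10, 1):
        (first_dir / f"part{number}").write_text(f"chunk{number}\n", encoding="utf-8")
    merged = merge_parts(first_dir, Path(tmp) / "second" / "merged.dat")
    print(merged.read_text(encoding="utf-8"))
```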
In this embodiment, when files to be exported are exported to the local disk and the cloud storage, each file is first checked further: not every file to be exported actually needs to be exported, and only those that meet the condition are exported. In addition, an appropriate export method is selected at export time, which makes the export faster and more convenient.
It should be noted that, in the above embodiments, if a failure such as a machine crash or a process interruption occurs on the download terminal and/or the processing terminal, the faulty file and the action that was being performed on it are recorded, execution switches to the standby terminal corresponding to the current download terminal and/or processing terminal, and the standby terminal continues from the recorded faulty file and action.
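One way to picture the recorded fault file and fault action is a small log that a standby terminal replays. Everything in this sketch is hypothetical, including the `FaultLog` class, the action names, and the handler mapping; the application does not describe how the record is stored or how the switch to the standby terminal is made.

```python
from dataclasses import dataclass, field


@dataclass
class FaultLog:
    # Each entry is a (file name, action name) pair recorded at failure time.
    entries: list = field(default_factory=list)

    def record(self, file_name, action):
        self.entries.append((file_name, action))


def resume_on_standby(fault_log, handlers):
    """Replay each recorded (file, action) pair with the standby's handler for that action."""
    results = []
    for file_name, action in fault_log.entries:
        results.append(handlers[action](file_name))
    fault_log.entries.clear()  # nothing left to resume once every entry has run
    return results


log = FaultLog()
log.record("a.dat", "import")
log.record("b.dat", "export")
handlers = {
    "import": lambda name: f"imported {name}",
    "export": lambda name: f"exported {name}",
}
print(resume_on_standby(log, handlers))
```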
The present application further provides a distributed file batch processing apparatus. The apparatus includes a memory, a processor, and a distributed file batch processing program that is stored in the memory and executable on the processor; when executed by the processor, the program implements the following steps:
upon detecting a file download notification, determining the download terminal corresponding to the file download notification, and downloading, by the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
reading, by the download terminal, the file content of the file on the local disk, and importing, by the download terminal and based on the file content, the file into the database corresponding to the download terminal;
processing, by the processing terminal corresponding to the database, the file in the database to obtain a file to be exported;
exporting, by the processing terminal, the file to be exported to the local disk, and sending the file to be exported to the cloud storage.
Further, when executed by the processor, the distributed file batch processing program implements the following steps:
upon detecting a file download notification, determining the download terminal corresponding to the file download notification;
downloading, by the download terminal, the files whose file status is "to be downloaded" in the file list corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
changing, by the download terminal, the file status of the file to "downloading" or "downloaded", and updating the file list according to the current file status.
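The status handling in the download steps above can be sketched with a dictionary that stands in for the file list. The status strings, the function `download_pending`, and the `fetch` callable (a placeholder for the actual cloud storage download) are assumptions for illustration.

```python
PENDING, DOWNLOADING, DOWNLOADED = "pending", "downloading", "downloaded"


def download_pending(file_list, fetch):
    """Download every pending file, updating its status in file_list as it goes."""
    for name, status in list(file_list.items()):
        if status != PENDING:
            continue  # already downloading or downloaded; leave it alone
        file_list[name] = DOWNLOADING
        fetch(name)  # hypothetical stand-in for the cloud storage download
        file_list[name] = DOWNLOADED
    return file_list


file_list = {"a.dat": PENDING, "b.dat": DOWNLOADED, "c.dat": PENDING}
fetched = []
download_pending(file_list, fetched.append)
print(fetched, file_list)
```

Because only files in the pending state are picked up, running the function again after a restart does not download the same file twice.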
Further, when executed by the processor, the distributed file batch processing program implements the following steps:
reading, by the download terminal, the sending and receiving configuration table of the file on the local disk, and determining, based on the sending and receiving configuration table, whether the file needs to be imported;
if so, determining, by the processing terminal and based on the sending and receiving configuration table, the file type to which the file belongs, and determining the import type of the file based on the file type;
importing, by the download terminal and based on the import type, the file into the database corresponding to the download terminal.
Further, if the import type is single-threaded import, the distributed file batch processing program, when executed by the processor, implements the following steps:
determining whether a failure record corresponding to the file exists in the database;
if so, clearing the failure record, and reading, by the download terminal, the file configuration of the file to determine whether the file header of the file needs to be skipped;
if so, importing, by the download terminal, the file with its file header skipped into the database corresponding to the download terminal.
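A single-threaded import with cleanup and header skipping might look like the sketch below, which uses an in-memory SQLite database. It assumes that a "failure record" means rows left behind by an earlier failed import of the same file, and that the table name `imported`, its columns, and the `skip_header` flag are illustrative; none of these details come from the application.

```python
import sqlite3


def import_single_threaded(conn, file_name, lines, skip_header):
    """Clear rows left by a failed import of file_name, then insert its lines."""
    cur = conn.cursor()
    # Assumption: leftover rows for this file are the failure record to clear.
    cur.execute("DELETE FROM imported WHERE file_name = ?", (file_name,))
    body = lines[1:] if skip_header else lines
    cur.executemany(
        "INSERT INTO imported (file_name, content) VALUES (?, ?)",
        [(file_name, line) for line in body],
    )
    conn.commit()
    return len(body)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imported (file_name TEXT, content TEXT)")
conn.execute("INSERT INTO imported VALUES ('a.csv', 'stale row')")
count = import_single_threaded(conn, "a.csv", ["id,name", "1,x", "2,y"], skip_header=True)
print(count)
```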
Further, if the import type is multi-threaded import, the distributed file batch processing program, when executed by the processor, implements the following steps:
determining whether a failure record corresponding to the file exists in the database;
if so, clearing the failure record, and splitting, by the download terminal, the file into a first preset number of first split files;
importing, by the download terminal, the first split files in parallel into the database corresponding to the download terminal.
Further, when executed by the processor, the distributed file batch processing program implements the following steps:
obtaining, by the processing terminal, the sending and receiving configuration table corresponding to the file to be exported, and determining the initialization state of the file to be exported based on that table;
determining, by the processing terminal and based on the initialization state, whether the file to be exported needs to be exported;
if so, determining, by the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory;
merging the second split files in the first directory into a merged file, and placing the merged file in the second directory corresponding to the file to be exported.
Further, when executed by the processor, the distributed file batch processing program implements the following steps:
if export is needed, determining the first directory corresponding to the file to be exported on the local disk and in the cloud storage, and determining whether residual files exist in the first directory;
if no residual files exist, splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory;
if residual files exist, deleting the residual files, then splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory.
The present application further provides a computer-readable storage medium.
A distributed file batch processing program is stored on the computer-readable storage medium of the present application; when executed by a processor, the program implements the steps of the distributed file batch processing method described above.
For the method implemented when the distributed file batch processing program running on the processor is executed, reference may be made to the embodiments of the distributed file batch processing method of the present application, which are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (15)

  1. A distributed file batch processing method, wherein the distributed file batch processing method includes the following steps:
    upon detecting a file download notification, determining the download terminal corresponding to the file download notification, and downloading, by the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
    reading, by the download terminal, the file content of the file on the local disk, and importing, by the download terminal and based on the file content, the file into the database corresponding to the download terminal;
    processing, by the processing terminal corresponding to the database, the file in the database to obtain a file to be exported; and
    exporting, by the processing terminal, the file to be exported to the local disk, and sending the file to be exported to the cloud storage.
  2. The distributed file batch processing method according to claim 1, wherein the step of, upon detecting a file download notification, determining the download terminal corresponding to the file download notification and downloading, by the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal includes:
    upon detecting a file download notification, determining the download terminal corresponding to the file download notification;
    downloading, by the download terminal, the files whose file status is "to be downloaded" in the file list corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal; and
    changing, by the download terminal, the file status of the file to "downloading" or "downloaded", and updating the file list according to the current file status.
  3. The distributed file batch processing method according to claim 1, wherein the step of reading, by the download terminal, the file content of the file on the local disk and importing, by the download terminal and based on the file content, the file into the database corresponding to the download terminal includes:
    reading, by the download terminal, the sending and receiving configuration table of the file on the local disk, and determining, based on the sending and receiving configuration table, whether the file needs to be imported;
    if so, determining, by the processing terminal and based on the sending and receiving configuration table, the file type to which the file belongs, and determining the import type of the file based on the file type; and
    importing, by the download terminal and based on the import type, the file into the database corresponding to the download terminal.
  4. The distributed file batch processing method according to claim 3, wherein, if the import type is single-threaded import, the step of importing, by the download terminal and based on the import type, the file into the database corresponding to the download terminal includes:
    determining whether a failure record corresponding to the file exists in the database;
    if so, clearing the failure record, and reading, by the download terminal, the file configuration of the file to determine whether the file header of the file needs to be skipped; and
    if so, importing, by the download terminal, the file with its file header skipped into the database corresponding to the download terminal.
  5. The distributed file batch processing method according to claim 3, wherein, if the import type is multi-threaded import, the step of importing, by the download terminal and based on the import type, the file into the database corresponding to the download terminal includes:
    determining whether a failure record corresponding to the file exists in the database;
    if so, clearing the failure record, and splitting, by the download terminal, the file into a first preset number of first split files; and
    importing, by the download terminal, the first split files in parallel into the database corresponding to the download terminal.
  6. The distributed file batch processing method according to any one of claims 1 to 5, wherein the step of exporting, by the processing terminal, the file to be exported to the local disk and sending the file to be exported to the cloud storage includes:
    obtaining, by the processing terminal, the sending and receiving configuration table corresponding to the file to be exported, and determining the initialization state of the file to be exported based on that table;
    determining, by the processing terminal and based on the initialization state, whether the file to be exported needs to be exported;
    if so, determining, by the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory; and
    merging the second split files in the first directory into a merged file, and placing the merged file in the second directory corresponding to the file to be exported.
  7. The distributed file batch processing method according to claim 6, wherein the step of, if export is needed, determining, by the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory includes:
    if export is needed, determining the first directory corresponding to the file to be exported on the local disk and in the cloud storage, and determining whether residual files exist in the first directory;
    if no residual files exist, splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory; and
    if residual files exist, deleting the residual files, then splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory.
  8. A distributed file batch processing apparatus, wherein the distributed file batch processing apparatus includes a memory, a processor, and a distributed file batch processing program that is stored in the memory and executable on the processor, and the distributed file batch processing program, when executed by the processor, implements the following steps:
    upon detecting a file download notification, determining the download terminal corresponding to the file download notification, and downloading, by the download terminal, the file corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal;
    reading, by the download terminal, the file content of the file on the local disk, and importing, by the download terminal and based on the file content, the file into the database corresponding to the download terminal;
    processing, by the processing terminal corresponding to the database, the file in the database to obtain a file to be exported; and
    exporting, by the processing terminal, the file to be exported to the local disk, and sending the file to be exported to the cloud storage.
  9. The distributed file batch processing apparatus according to claim 8, wherein the distributed file batch processing program, when executed by the processor, implements the following steps:
    upon detecting a file download notification, determining the download terminal corresponding to the file download notification;
    downloading, by the download terminal, the files whose file status is "to be downloaded" in the file list corresponding to the file download notification from the corresponding cloud storage to the local disk corresponding to the download terminal; and
    changing, by the download terminal, the file status of the file to "downloading" or "downloaded", and updating the file list according to the current file status.
  10. The distributed file batch processing apparatus according to claim 8, wherein the distributed file batch processing program, when executed by the processor, implements the following steps:
    reading, by the download terminal, the sending and receiving configuration table of the file on the local disk, and determining, based on the sending and receiving configuration table, whether the file needs to be imported;
    if so, determining, by the processing terminal and based on the sending and receiving configuration table, the file type to which the file belongs, and determining the import type of the file based on the file type; and
    importing, by the download terminal and based on the import type, the file into the database corresponding to the download terminal.
  11. The distributed file batch processing apparatus according to claim 10, wherein, if the import type is single-threaded import, the distributed file batch processing program, when executed by the processor, implements the following steps:
    determining whether a failure record corresponding to the file exists in the database;
    if so, clearing the failure record, and reading, by the download terminal, the file configuration of the file to determine whether the file header of the file needs to be skipped; and
    if so, importing, by the download terminal, the file with its file header skipped into the database corresponding to the download terminal.
  12. The distributed file batch processing apparatus according to claim 10, wherein, if the import type is multi-threaded import, the distributed file batch processing program, when executed by the processor, implements the following steps:
    determining whether a failure record corresponding to the file exists in the database;
    if so, clearing the failure record, and splitting, by the download terminal, the file into a first preset number of first split files; and
    importing, by the download terminal, the first split files in parallel into the database corresponding to the download terminal.
  13. The distributed file batch processing apparatus according to any one of claims 8 to 12, wherein the distributed file batch processing program, when executed by the processor, implements the following steps:
    obtaining, by the processing terminal, the sending and receiving configuration table corresponding to the file to be exported, and determining the initialization state of the file to be exported based on that table;
    determining, by the processing terminal and based on the initialization state, whether the file to be exported needs to be exported;
    if so, determining, by the processing terminal, the first directory corresponding to the file to be exported on the local disk and in the cloud storage, splitting the file to be exported into a second preset number of second split files, and exporting them to the first directory; and
    merging the second split files in the first directory into a merged file, and placing the merged file in the second directory corresponding to the file to be exported.
  14. The distributed file batch processing apparatus according to claim 13, wherein the distributed file batch processing program, when executed by the processor, implements the following steps:
    if export is needed, determining the first directory corresponding to the file to be exported on the local disk and in the cloud storage, and determining whether residual files exist in the first directory;
    if no residual files exist, splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory; and
    if residual files exist, deleting the residual files, then splitting, by the processing terminal, the file to be exported into a second preset number of second split files and exporting them to the first directory.
  15. A computer-readable storage medium, wherein a distributed file batch processing program is stored on the computer-readable storage medium, and the distributed file batch processing program, when executed by a processor, implements the steps of the distributed file batch processing method according to any one of claims 1 to 7.
PCT/CN2020/092139 2019-05-30 2020-05-25 Distributed file batch processing method and apparatus, and readable storage medium WO2020238860A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910465966.5A CN110191182B (en) 2019-05-30 2019-05-30 Distributed file batch processing method, device, equipment and readable storage medium
CN201910465966.5 2019-05-30

Publications (1)

Publication Number Publication Date
WO2020238860A1 true WO2020238860A1 (en) 2020-12-03

Family

ID=67719137

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092139 WO2020238860A1 (en) 2019-05-30 2020-05-25 Distributed file batch processing method and apparatus, and readable storage medium

Country Status (2)

Country Link
CN (1) CN110191182B (en)
WO (1) WO2020238860A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597228A (en) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 File processing method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110191182B (en) * 2019-05-30 2023-04-21 深圳前海微众银行股份有限公司 Distributed file batch processing method, device, equipment and readable storage medium
CN113722277A (en) * 2020-05-25 2021-11-30 中兴通讯股份有限公司 Data import method, device, service platform and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103379149A (en) * 2012-04-19 2013-10-30 北京智慧风云科技有限公司 Cloud service system providing function of processing files according to received commands
CN104580437A (en) * 2014-12-30 2015-04-29 创新科存储技术(深圳)有限公司 Cloud storage client and high-efficiency data access method thereof
CN105956481A (en) * 2015-09-17 2016-09-21 ***股份有限公司 Data processing method and device
US20180088984A1 (en) * 2016-09-23 2018-03-29 EMC IP Holding Company LLC Methods and devices of batch process of content management
CN107967347A (en) * 2017-12-07 2018-04-27 湖北三新文化传媒有限公司 Batch data processing method, server, system and storage medium
CN110191182A (en) * 2019-05-30 2019-08-30 深圳前海微众银行股份有限公司 Distributed file batch processing method, device, equipment and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102307219B (en) * 2011-03-18 2015-04-08 北京思特奇信息技术股份有限公司 File access system, file uploading method and file downloading method
US10152489B2 (en) * 2015-07-24 2018-12-11 Salesforce.Com, Inc. Synchronize collaboration entity files
CN108830715B (en) * 2018-05-30 2023-04-25 平安科技(深圳)有限公司 Batch file part disc returning processing method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597228A (en) * 2020-12-26 2021-04-02 中国农业银行股份有限公司 File processing method and system
CN112597228B (en) * 2020-12-26 2024-06-07 中国农业银行股份有限公司 File processing method and system

Also Published As

Publication number Publication date
CN110191182B (en) 2023-04-21
CN110191182A (en) 2019-08-30

Similar Documents

Publication Publication Date Title
WO2020238860A1 (en) Distributed file batch processing method and apparatus, and readable storage medium
US10908977B1 (en) Efficient message queuing service
US11615082B1 (en) Using a data store and message queue to ingest data for a data intake and query system
US8707194B1 (en) System and method for decentralized performance monitoring of host systems
CN113254466B (en) Data processing method and device, electronic equipment and storage medium
WO2020181810A1 (en) Data processing method and apparatus applied to multi-level caching in cluster
US11966797B2 (en) Indexing data at a data intake and query system based on a node capacity threshold
CN105224445A (en) Distributed tracking system
CN107710215A (en) The method and apparatus of mobile computing device safety in test facilities
US11822433B2 (en) Qualification parameters for captain selection in a search head cluster
CN111143382B (en) Data processing method, system and computer readable storage medium
US12019634B1 (en) Reassigning a processing node from downloading to searching a data group
US10031948B1 (en) Idempotence service
US11892976B2 (en) Enhanced search performance using data model summaries stored in a remote data store
CN110535901A (en) Service degradation method, apparatus, computer equipment and storage medium
US20180121531A1 (en) Data Updating Method, Device, and Related System
CN109614271A (en) Control method, device, equipment and the storage medium of multiple company-data consistency
CN106557530B (en) Operation system, data recovery method and device
CN107291524B (en) Remote command processing method and device
US10496467B1 (en) Monitoring software computations of arbitrary length and duration
CN115905151A (en) Method, system and device for querying circulation information based on backup log
CN116208487A (en) Method, device, equipment and medium for upgrading consensus algorithm in block chain system
US10728323B2 (en) Method and apparatus for operating infrastructure layer in cloud computing architecture
CN116360931A (en) Link tracking method, device, system and storage medium
US11841827B2 (en) Facilitating generation of data model summaries

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20813287; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: PCT application non-entry in European phase (Ref document number: 20813287; Country of ref document: EP; Kind code of ref document: A1)