CN110134516B

CN110134516B - Financial data processing method, apparatus, device and computer readable storage medium

Info

Publication number: CN110134516B
Application number: CN201910411302.0A
Authority: CN
Inventors: 赵雄
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2019-05-16
Filing date: 2019-05-16
Publication date: 2024-07-02
Anticipated expiration: 2039-05-16
Also published as: CN110134516A

Abstract

The invention discloses a financial data processing method, apparatus, device and computer readable storage medium. The method comprises the following steps: if a data processing request for a target account is received, acquiring the data to be processed of the target account; determining, based on the data to be processed, the number of computing nodes in a distributed computing framework corresponding to the data processing request; and distributing a plurality of task instances generated from the data to be processed to a plurality of computing nodes respectively, starting the computing nodes to perform parallel computation, and obtaining, after the computation, a processing result corresponding to the data processing request. The invention improves the accounting speed of financial institutions when processing massive financial data and meets the business side's requirement for highly timely financial data processing.

Description

Financial data processing method, apparatus, device and computer readable storage medium

Technical Field

The present invention relates to the technical field of financial technology (Fintech), and in particular to a method, an apparatus, a device and a computer readable storage medium for processing financial data.

Background

With the development of computer technology, more and more technologies (such as artificial intelligence, blockchain and cloud computing) are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Fintech involves many financial data processing scenarios. For example, to account for the funding cost or funding income of its businesses, a financial institution performs internal Funds Transfer Pricing (FTP): for an asset business, the FTP price represents its funding cost, on which FTP interest must be paid; for a liability business, the FTP price represents its funding income, from which FTP interest income is obtained. As financial data volumes keep growing, a financial institution processing massive financial data, for example when performing internal funds transfer pricing, accounts slowly and cannot meet the business side's requirement for highly timely financial data processing.

Disclosure of Invention

The main object of the present invention is to provide a financial data processing method, apparatus, device and computer readable storage medium, so as to solve the problem that, as financial data volumes keep growing, a financial institution processing massive financial data accounts slowly and cannot meet the business side's requirement for highly timely financial data processing.

To achieve the above object, the present invention provides a financial data processing method including the steps of:

if a data processing request for a target account is received, acquiring the data to be processed of the target account;

determining, based on the data to be processed, the number of computing nodes in a distributed computing framework corresponding to the data processing request;

distributing a plurality of task instances generated from the data to be processed to a plurality of computing nodes respectively, starting the plurality of computing nodes to perform parallel computation, and obtaining, after the computation, a processing result corresponding to the data processing request.

Optionally, the step of acquiring the data to be processed of the target account if a data processing request for the target account is received includes:

if a data processing request for the target account is received, acquiring the Hive (data warehouse tool) table corresponding to the target account;

accessing, in a preset access mode, the Hadoop Distributed File System (HDFS) file corresponding to the Hive table to obtain the data to be processed of the target account.

Optionally, the step of accessing, in a preset access mode, the HDFS file corresponding to the Hive table to obtain the data to be processed of the target account includes:

determining, according to the HDFS file corresponding to the Hive table, the number of access processes for the HDFS file;

accessing the HDFS file with the determined number of access processes to acquire the data to be processed of the target account.

Optionally, the step of determining, based on the data to be processed, the number of computing nodes in the distributed computing framework corresponding to the data processing request includes:

acquiring a target data volume of the data to be processed;

setting, according to the type of the data processing request, a computation data volume threshold for each computing node in the distributed computing framework corresponding to the data processing request;

determining the number of computing nodes for the data to be processed according to the target data volume and the computation data volume threshold.

Optionally, the step of distributing the plurality of task instances generated from the data to be processed to the plurality of computing nodes respectively, starting the computing nodes to perform parallel computation, and obtaining the processing result corresponding to the data processing request after the computation includes:

generating a plurality of task instances from the data to be processed, and distributing the task instances to a plurality of computing nodes respectively;

retrieving, by each computing node, the rule configuration parameters corresponding to the data processing request from an in-memory database, the rule configuration parameters having been imported into the in-memory database from a disk database;

starting the plurality of computing nodes to perform parallel computation based on each computing node's task instance and the rule configuration parameters, and obtaining, after the computation, the processing result corresponding to the data processing request.

Optionally, after the step of distributing the plurality of task instances generated from the data to be processed to the plurality of computing nodes respectively, starting the computing nodes to perform parallel computation, and obtaining the processing result corresponding to the data processing request after the computation, the method further includes:

storing the processing result in the HDFS file and updating the Hive table, so that the updated Hive table corresponds to the HDFS file containing the processing result.

Optionally, after the step of storing the processing result in the HDFS file and updating the Hive table so that the updated Hive table corresponds to the HDFS file containing the processing result, the method further includes:

generating a data processing analysis report for the target account based on a preset reporting tool and the updated Hive table.

In addition, to achieve the above object, the present invention further provides a financial data processing apparatus, including:

an acquisition module, configured to acquire the data to be processed of a target account if a data processing request for the target account is received;

a determining module, configured to determine, based on the data to be processed, the number of computing nodes in a distributed computing framework corresponding to the data processing request;

a processing module, configured to distribute a plurality of task instances generated from the data to be processed to a plurality of computing nodes respectively, start the computing nodes to perform parallel computation, and obtain, after the computation, a processing result corresponding to the data processing request.

Optionally, the acquisition module includes:

a first acquisition unit, configured to acquire the Hive (data warehouse tool) table corresponding to a target account if a data processing request for the target account is received;

a second acquisition unit, configured to access, in a preset access mode, the Hadoop Distributed File System (HDFS) file corresponding to the Hive table to obtain the data to be processed of the target account.

Optionally, the second acquisition unit includes:

a determining subunit, configured to determine, according to the HDFS file corresponding to the Hive table, the number of access processes for the HDFS file;

an access subunit, configured to access the HDFS file with the determined number of access processes to acquire the data to be processed of the target account.

Optionally, the determining module includes:

a third acquisition unit, configured to acquire a target data volume of the data to be processed;

a setting unit, configured to set, according to the type of the data processing request, a computation data volume threshold for each computing node in the distributed computing framework corresponding to the data processing request;

a determining unit, configured to determine the number of computing nodes for the data to be processed according to the target data volume and the computation data volume threshold.

Optionally, the processing module includes:

a distribution unit, configured to generate a plurality of task instances from the data to be processed and distribute them to a plurality of computing nodes respectively;

a retrieval unit, configured to retrieve, by each computing node, the rule configuration parameters corresponding to the data processing request from the in-memory database, the rule configuration parameters having been imported into the in-memory database from a disk database;

a processing unit, configured to start the computing nodes to perform parallel computation based on each computing node's task instance and the rule configuration parameters, and obtain, after the computation, the processing result corresponding to the data processing request.

Optionally, the apparatus further comprises:

an updating module, configured to store the processing result in the HDFS file and update the Hive table, so that the updated Hive table corresponds to the HDFS file containing the processing result.

Optionally, the apparatus further comprises:

a report module, configured to generate a data processing analysis report for the target account based on a preset reporting tool and the updated Hive table.

In addition, to achieve the above object, the present invention further provides a financial data processing device, comprising a memory, a processor, and a financial data processing program stored in the memory and executable on the processor, wherein the financial data processing program, when executed by the processor, implements the steps of the financial data processing method described in any of the above.

In addition, to achieve the above object, the present invention further provides a computer readable storage medium storing a financial data processing program which, when executed by a processor, implements the steps of the financial data processing method described in any of the above.

According to the present invention, if a data processing request for a target account is received, the data to be processed of the target account is acquired; the number of computing nodes in the distributed computing framework corresponding to the request is determined based on the data to be processed; and the task instances generated from the data to be processed are distributed to the computing nodes, which are started to compute in parallel and yield the processing result corresponding to the request. Thus, for massive data to be processed, it suffices to add computing nodes to the distributed computing framework and compute on them in parallel, which greatly improves the accounting speed of financial institutions processing massive financial data and meets the business side's requirement for highly timely financial data processing.

Drawings

FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of a first embodiment of a financial data processing method according to the present invention;

FIG. 3 is a flowchart of a financial data processing method according to a second embodiment of the present invention;

FIG. 4 is a flowchart of a third embodiment of a financial data processing method according to the present invention;

FIG. 5 is a flowchart of a financial data processing method according to a fourth embodiment of the present invention;

FIG. 6 is a flowchart of a financial data processing method according to a fifth embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

FIG. 1 is a schematic structural diagram of the hardware operating environment of the financial data processing device according to an embodiment of the present invention.

The financial data processing device of the embodiment of the present invention may be a terminal device such as a PC or a portable computer.

As shown in FIG. 1, the financial data processing device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 connects these components so that they can communicate. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory such as disk storage, and may optionally be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the structure shown in FIG. 1 does not limit the financial data processing device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

As shown in FIG. 1, the memory 1005, as a computer readable storage medium, may store an operating system, a network communication module, a user interface module and a financial data processing program. The operating system manages and controls the hardware and software resources of the financial data processing device and supports the running of the financial data processing program and other software.

In the financial data processing device shown in FIG. 1, the user interface 1003 is mainly used for data communication with terminals; the network interface 1004 is mainly used to connect to a background server and communicate with it; and the processor 1001 may be configured to invoke the financial data processing program stored in the memory 1005 and perform the following operations:

Further, the processor 1001 may be further configured to invoke a financial data processing program stored in the memory 1005, and perform the following steps:

acquiring a target data volume of the data to be processed;

Based on the above-described structure, various embodiments of the financial data processing method of the present invention are presented.

Referring to FIG. 2, which is a flowchart of the first embodiment of the financial data processing method of the present invention.

This embodiment of the present invention provides an embodiment of the financial data processing method. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one shown or described here.

The financial data processing method of the embodiment comprises the following steps:

Step S100: if a data processing request for a target account is received, acquiring the data to be processed of the target account;

Fintech involves many financial data processing scenarios. As financial data volumes keep growing, financial institutions processing massive financial data commonly account slowly and cannot meet the business side's requirement for highly timely financial data processing.

For example, to account for the funding cost or income of its businesses, a financial institution performs internal Funds Transfer Pricing (FTP): for an asset business, the FTP price represents its funding cost, on which FTP interest must be paid; for a liability business, the FTP price represents its funding income, from which FTP interest income is obtained. In the prior art, funds transfer pricing relies on a traditional Oracle database that stores the account detail data and rule configuration information, and FTP prices are computed by a pricing engine written in C++. Its throughput reaches only about 2,000 TPS (transactions per second), so the existing pricing method hits a performance bottleneck on massive data and cannot meet the business side's requirement that reports be produced on day T+1.
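As a rough illustration of the quantity the pricing engine ultimately feeds, FTP interest over a period can be sketched as simple interest on the balance at the FTP rate. This is a minimal sketch under stated assumptions: the function name `ftp_interest`, the simple (non-compounding) formula and the ACT/360 day-count basis are illustrative choices; the patent does not specify how FTP interest is accrued.

```python
from datetime import date

# Assumed day-count basis (ACT/360); the patent does not specify one.
DAY_COUNT_BASIS = 360


def ftp_interest(balance: float, ftp_rate: float, start: date, end: date,
                 basis: int = DAY_COUNT_BASIS) -> float:
    """Simple (non-compounding) FTP interest accrued from start to end.

    For an asset account this is the funding cost charged to the business;
    for a liability account it is the funding income credited to it.
    """
    days = (end - start).days
    if days < 0:
        raise ValueError("end must not precede start")
    return balance * ftp_rate * days / basis


# A balance of 1,000,000 priced at an FTP rate of 3.6% for 30 days.
print(round(ftp_interest(1_000_000, 0.036, date(2019, 5, 1), date(2019, 5, 31)), 2))  # → 3000.0
```

Whether the amount is booked as an expense or as income depends only on whether the account is an asset or a liability business, so one function serves both sides.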

In this embodiment, if a data processing request for a target account is received, the data to be processed of the target account is acquired. Specifically, the request comes from an upstream business system; in one implementation, the request is a signal that the upstream business system sends once it has finished preparing the account detail data of the target account, i.e., the data to be processed. Further, in this embodiment the data to be processed of the target account is stored in an HBase database. HBase is a distributed, column-oriented database built on the Hadoop Distributed File System that provides random, real-time read/write access to data; after receiving the signal, the financial institution reads the account detail data of the target account from HBase.
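The acquisition in step S100 can be sketched as a handler that, on the upstream "data ready" signal, pulls every detail row of the account. This sketch assumes a hypothetical row-key layout `<account_id>#<sequence>` and replaces HBase with an in-memory store kept sorted by row key; `SortedRowStore` and `on_data_ready` are hypothetical names. It illustrates the idea of a prefix scan over lexicographically ordered row keys, not any HBase client API.

```python
import bisect
from typing import Dict, List, Tuple


class SortedRowStore:
    """In-memory stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = dict(rows)
        self._keys = sorted(self._rows)

    def scan_prefix(self, prefix: str) -> List[Tuple[str, dict]]:
        # Binary-search to the first key >= prefix, then walk forward while keys
        # still start with the prefix (the same idea as an HBase prefix scan).
        start = bisect.bisect_left(self._keys, prefix)
        hits: List[Tuple[str, dict]] = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            hits.append((key, self._rows[key]))
        return hits


def on_data_ready(store: SortedRowStore, account_id: str) -> List[dict]:
    """Handle the upstream 'data ready' signal: fetch all detail rows of one account."""
    # Hypothetical row-key layout "<account_id>#<sequence>"; the separator keeps
    # account "A001" from also matching account "A0010".
    return [row for _, row in store.scan_prefix(account_id + "#")]


store = SortedRowStore({
    "A001#0002": {"balance": 50.0},
    "A002#0001": {"balance": 7.0},
    "A001#0001": {"balance": 100.0},
})
print(on_data_ready(store, "A001"))  # → [{'balance': 100.0}, {'balance': 50.0}]
```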

Step S200: determining, based on the data to be processed, the number of computing nodes in the distributed computing framework corresponding to the data processing request;

Specifically, the financial institution preprocesses the acquired account detail data to generate the input required by the computing engine corresponding to the data processing request. For example, when the request is for FTP pricing, the FTP pricing engine requires the target account's account number, subject, product, institution, currency, balance, value date, maturity date, original term, repricing frequency, last repricing date, next repricing date, and so on. In the distributed computing framework, the maximum amount of data each computing node processes is set according to the complexity of the computing logic (computing engine) for the request. If the logic is complex (FTP pricing, for instance, performs a series of logical checks, rule matching and rule calculations for each account), a node assigned too much data may take a long time to finish, causing performance problems. In one implementation of this embodiment, the financial institution therefore sets the amount of data per computing node according to the complexity of the current computing logic. Because the computing nodes in the distributed framework compute independently of one another, more nodes are added when the volume of data to be processed is large, which keeps each node's computation fast.
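The FTP pricing engine input listed above can be modelled as a typed record produced by the preprocessing step. A minimal sketch, assuming the raw detail row arrives as a dictionary of strings with ISO-8601 dates and terms expressed in months; `FtpPricingInput`, `preprocess` and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FtpPricingInput:
    account_no: str
    subject: str                  # general-ledger subject code
    product: str
    institution: str              # owning branch or organisation
    currency: str
    balance: float
    value_date: date              # date interest starts accruing
    maturity_date: date
    original_term_months: int
    repricing_freq_months: int    # assumed 0 for fixed-rate accounts
    last_repricing_date: Optional[date]
    next_repricing_date: Optional[date]

    def __post_init__(self) -> None:
        # Reject obviously inconsistent rows before they reach the engine.
        if self.maturity_date < self.value_date:
            raise ValueError("maturity_date precedes value_date")


def _opt_date(text: str) -> Optional[date]:
    # An empty string means "not applicable" (e.g. the account was never repriced).
    return date.fromisoformat(text) if text else None


def preprocess(raw: dict) -> FtpPricingInput:
    """Convert one raw account-detail row (all values are strings) into engine input."""
    return FtpPricingInput(
        account_no=raw["account_no"],
        subject=raw["subject"],
        product=raw["product"],
        institution=raw["institution"],
        currency=raw["currency"],
        balance=float(raw["balance"]),
        value_date=date.fromisoformat(raw["value_date"]),
        maturity_date=date.fromisoformat(raw["maturity_date"]),
        original_term_months=int(raw["original_term_months"]),
        repricing_freq_months=int(raw["repricing_freq_months"]),
        last_repricing_date=_opt_date(raw.get("last_repricing_date", "")),
        next_repricing_date=_opt_date(raw.get("next_repricing_date", "")),
    )
```

Converting types once during preprocessing keeps parsing out of the per-account pricing loop on each computing node.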

Step S300: distributing a plurality of task instances generated from the data to be processed to a plurality of computing nodes respectively, starting the computing nodes to perform parallel computation, and obtaining, after the computation, the processing result corresponding to the data processing request.

Each computing node invokes the preset computing logic (algorithm) corresponding to the data processing request to process the data assigned to it. Under the distributed computing framework, multiple computing nodes are started at the same time and compute in parallel, and the processing result corresponding to the data processing request is obtained once the computation finishes.
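Step S300 can be sketched on a single machine, with worker threads standing in for computing nodes: the data is cut into task instances, each task instance runs on its own worker, and the partial results are merged in input order. `make_task_instances`, `run_parallel` and the squaring `compute` function in the usage lines are hypothetical; in the described system, task instances would be dispatched to separate nodes of the distributed framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_task_instances(records: Sequence[T], per_node: int) -> List[Sequence[T]]:
    """Split the data to be processed into task instances of at most per_node records."""
    if per_node <= 0:
        raise ValueError("per_node must be positive")
    return [records[i:i + per_node] for i in range(0, len(records), per_node)]


def run_parallel(tasks: List[Sequence[T]], compute: Callable[[T], R]) -> List[R]:
    """Start one worker per task instance and merge the results in input order."""
    def run_task(task: Sequence[T]) -> List[R]:
        # Each worker applies the computing logic to its own records only.
        return [compute(record) for record in task]

    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        partial_results = list(pool.map(run_task, tasks))  # map preserves task order
    return [result for part in partial_results for result in part]


tasks = make_task_instances(list(range(10)), per_node=3)
print(len(tasks))                            # → 4
print(run_parallel(tasks, lambda x: x * x))  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Threads keep the sketch portable; for CPU-bound Python pricing logic, `concurrent.futures.ProcessPoolExecutor` with a module-level compute function would be the closer single-machine analogue, because Python threads share one interpreter lock.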

In this embodiment, if a data processing request for a target account is received, the data to be processed of the target account is acquired; the number of computing nodes in the distributed computing framework corresponding to the request is determined based on the data to be processed; and the task instances generated from the data to be processed are distributed to the computing nodes, which are started to compute in parallel and yield the processing result corresponding to the request. Thus, for massive data to be processed, it suffices to add computing nodes to the distributed computing framework and compute on them in parallel, which greatly improves the accounting speed of financial institutions processing massive financial data and meets the business side's requirement for highly timely financial data processing.

Further, a second embodiment of the financial data processing method of the present invention is presented.

Referring to FIG. 3, which is a flowchart of the second embodiment of the financial data processing method of the present invention. Based on the first embodiment, in this embodiment step S100 (acquiring the data to be processed of the target account if a data processing request for the target account is received) includes the following steps:

Step S110: if a data processing request for the target account is received, acquiring the Hive (data warehouse tool) table corresponding to the target account;

In this embodiment, the data to be processed of the target account is stored in the corresponding distributed file system in the form of a Hive table. Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables and provides a simple SQL query capability, converting SQL statements into MapReduce jobs for execution. In this embodiment, after preparing the account detail data of the target account (the data to be processed), the upstream business system stores it in the corresponding HDFS file as a Hive table and sends a data processing request for the target account; upon receiving the request, the financial institution obtains the Hive table corresponding to the account detail data from the business system side.

Step S120: accessing, in a preset access mode, the Hadoop Distributed File System (HDFS) file corresponding to the Hive table to obtain the data to be processed of the target account;

Specifically, in one implementation of this embodiment, step S120 includes:

Step a: determining, according to the HDFS file corresponding to the Hive table, the number of access processes for the HDFS file;

Step b: accessing the HDFS file with the determined number of access processes to acquire the data to be processed of the target account.
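Steps a and b can be sketched as grouping the HDFS files behind the Hive table into batches, one batch per access process, so that no process exceeds its data threshold; the number of batches is then the number of access processes. This is a sketch under assumptions: the file names, sizes, `plan_access_processes` and the in-order grouping rule are all hypothetical, and the patent does not say how files are assigned to processes.

```python
from typing import List, Tuple


def plan_access_processes(files: List[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Group HDFS files into batches; each batch is read by one access process.

    Files are taken in order, and a new batch starts whenever adding the next file
    would push the current batch past max_bytes. A single file larger than
    max_bytes is placed in a batch of its own.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1024 * 1024
files = [("part-00000", 40 * MB), ("part-00001", 30 * MB), ("part-00002", 50 * MB),
         ("part-00003", 20 * MB), ("part-00004", 90 * MB)]
plan = plan_access_processes(files, max_bytes=100 * MB)
print(len(plan))  # → 3 access processes
```

When the Hive table grows, the same threshold simply yields more batches, which is the horizontal scaling of access processes the embodiment describes.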

Specifically, in this embodiment multiple access processes are configured, and the maximum amount of data each access process handles is capped. When the HDFS file of the target account is large, more concurrent access processes are started to speed up processing. Each access process reads its portion of the data to be processed from the HDFS file, up to its own threshold, and preprocesses it further. For example, when the request is for FTP pricing, the financial institution reads from the HDFS file the target account's account number, subject, product, institution, currency, balance, value date, maturity date, original term, repricing frequency, last repricing date, next repricing date and similar data, and then splices them into account information, the previous day's pricing result and the repayment plan, which together form the input of the downstream FTP pricing engine. The pricing engine can be written as a Hive UDF (user-defined function). Capping each access process with a maximum threshold avoids the performance problems caused by a single access process handling too much data.
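The splicing of account information, the previous day's pricing result and the repayment plan can be sketched as a join keyed on the account number. A minimal sketch, assuming each source is a list of dictionaries with an `account_no` field and that repayment-plan items carry an ISO-8601 `due_date`; `splice_engine_input` and all field names are hypothetical, and in the described system this step would run inside Hive rather than in Python.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def splice_engine_input(accounts: List[dict],
                        last_day_results: List[dict],
                        repayment_plans: List[dict]) -> List[dict]:
    """Join account info, the previous day's pricing result and the repayment plan.

    Every account yields exactly one spliced record (a left join from accounts).
    """
    prior: Dict[str, dict] = {r["account_no"]: r for r in last_day_results}
    plans: Dict[str, List[dict]] = defaultdict(list)
    for item in repayment_plans:
        plans[item["account_no"]].append(item)

    spliced = []
    for acct in accounts:
        no = acct["account_no"]
        last: Optional[dict] = prior.get(no)  # None for an account with no prior result
        spliced.append({
            "account": acct,
            "last_day_result": last,
            # ISO-8601 date strings sort chronologically as plain strings.
            "repayment_plan": sorted(plans.get(no, []), key=lambda p: p["due_date"]),
        })
    return spliced
```

A left join keeps newly opened accounts, which have no previous-day pricing result, in the engine input instead of silently dropping them.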

In this embodiment, if a data processing request for a target account is received, the Hive table corresponding to the target account is acquired; the number of access processes for the HDFS file corresponding to the Hive table is determined from that file; the HDFS file is accessed with the determined number of access processes to acquire the data to be processed of the target account; the number of computing nodes in the distributed computing framework corresponding to the request is determined from the data to be processed; and the task instances generated from the data are distributed to the computing nodes, which are started to compute in parallel and yield the processing result corresponding to the request. Therefore, as the volume of data to be processed grows, access processes can be scaled out horizontally, and the system's concurrent processing capacity can be increased by adding computing nodes, i.e., computing resources (CPU + memory). In the FTP pricing scenario, this embodiment can support daily FTP price calculation for hundreds of millions of accounts or more, and the report for day T can be generated before work begins on day T+1 for relevant staff to review and analyze.

Further, a third embodiment of the financial data processing method of the present invention is presented.

Referring to FIG. 4, which is a flowchart of the third embodiment of the financial data processing method of the present invention. Based on the first embodiment, in this embodiment step S200 (determining, based on the data to be processed, the number of computing nodes in the distributed computing framework corresponding to the data processing request) includes:

Step S210: acquiring the target data volume of the data to be processed;

Step S220: setting, according to the type of the data processing request, a computation data volume threshold for each computing node in the distributed computing framework corresponding to the data processing request;

Step S230: determining the number of computing nodes for the data to be processed according to the target data volume and the computation data volume threshold.

In this embodiment, the number of computing nodes in the distributed computing framework corresponding to the data processing request is determined from the actual volume of the data to be processed and the computation data volume threshold of each computing node. Specifically, the target data volume of the data to be processed is acquired; it is the total volume of data that actually has to be fed into the computing nodes after preprocessing. Next, the computation data volume threshold of each computing node is set according to the type of the data processing request: different request types have different computing logic, and therefore different computational complexity and speed, so the maximum amount of data each node computes (the computation data volume threshold) is determined by the specific request type, i.e., by its complexity. This prevents slow computation caused by assigning a computing node too much data. Finally, the number of computing nodes for the data to be processed is determined from the target data volume and the threshold. For example, if each computing node can process a 3 MB data file and the data to be processed totals 100 MB, 34 computing nodes are started to compute in parallel: 33 nodes each process 3 MB and the remaining node processes 1 MB. The data computed by each node is independent of the others, so when the data to be processed grows, more concurrent computing nodes can be added simply by adding computing resources (CPU + memory). This greatly improves the accounting speed of financial institutions processing massive financial data and meets the business side's requirement for highly timely financial data processing.
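The node-count rule of steps S210 to S230 is a ceiling division of the target data volume by the per-node threshold. The sketch below reproduces the 100 MB / 3 MB example; it assumes volumes in whole megabytes, and the threshold table keyed by request type is hypothetical (the 3 MB value is borrowed from the example above), as are the function names.

```python
from typing import Dict, List

# Hypothetical per-node thresholds in MB, keyed by request type. The patent only
# says the threshold depends on the complexity of the request's computing logic;
# the 3 MB figure is borrowed from the worked example.
NODE_THRESHOLD_MB: Dict[str, int] = {"ftp_pricing": 3}


def node_count(target_mb: int, threshold_mb: int) -> int:
    """Number of computing nodes: ceil(target / threshold), using integer arithmetic."""
    if threshold_mb <= 0:
        raise ValueError("threshold_mb must be positive")
    return -(-target_mb // threshold_mb)


def node_allocations(target_mb: int, threshold_mb: int) -> List[int]:
    """Data volume per node: full nodes at the threshold, plus one remainder node."""
    if threshold_mb <= 0:
        raise ValueError("threshold_mb must be positive")
    full, rest = divmod(target_mb, threshold_mb)
    return [threshold_mb] * full + ([rest] if rest else [])


threshold = NODE_THRESHOLD_MB["ftp_pricing"]
print(node_count(100, threshold))             # → 34
print(node_allocations(100, threshold)[-2:])  # → [3, 1]
```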

Further, a fourth embodiment of the financial data processing method of the present invention is proposed.

Referring to fig. 5, fig. 5 is a flowchart of a fourth embodiment of the financial data processing method of the present invention. Based on the first embodiment of the financial data processing method, in this embodiment step S300 (respectively distributing a plurality of task instances generated according to the data to be processed to a plurality of computing nodes, starting the plurality of computing nodes to perform parallel computation, and obtaining a processing result corresponding to the data processing request after computation) includes:

step S310, generating a plurality of task instances according to the data to be processed, and respectively distributing the task instances to a plurality of computing nodes;

step S320, retrieving, by the computing nodes, rule configuration parameters corresponding to the data processing request from an in-memory database, the rule configuration parameters having been imported into the in-memory database from a disk database;

step S330, starting the plurality of computing nodes to perform parallel computation based on the task instances and the rule configuration parameters on each computing node, and obtaining a processing result corresponding to the data processing request after computation.
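Steps S310 to S330 can be illustrated with a single-process sketch. Here a `ThreadPoolExecutor` merely stands in for the distributed computing nodes. The record layout `(account_id, product, balance)`, the per-product rates in `rules`, and the balance-times-rate calculation are all hypothetical; the source does not specify the calculation logic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record layout: (account_id, product, balance).
Record = tuple[str, str, float]


def make_task_instances(records: list[Record], n_nodes: int) -> list[list[Record]]:
    """Step S310 (sketch): split the data into at most n_nodes contiguous task instances."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    if not records:
        return []
    size = -(-len(records) // n_nodes)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_task(task: list[Record], rules: dict[str, float]) -> list[tuple[str, float]]:
    """One node's work (sketch): apply the rule parameter for each record's product."""
    return [(acct, balance * rules[product]) for acct, product, balance in task]


def process(records: list[Record], n_nodes: int,
            rules: dict[str, float]) -> list[tuple[str, float]]:
    """Steps S310-S330 (sketch): distribute task instances and compute them concurrently."""
    tasks = make_task_instances(records, n_nodes)
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        partials = pool.map(run_task, tasks, [rules] * len(tasks))
        # pool.map yields results in task order, so output order matches input order.
        return [row for part in partials for row in part]


if __name__ == "__main__":
    rules = {"loan": 0.5, "deposit": 0.25}  # hypothetical rule parameters
    data = [("A1", "loan", 100.0), ("A2", "deposit", 40.0), ("A3", "loan", 8.0)]
    print(process(data, 2, rules))  # [('A1', 50.0), ('A2', 10.0), ('A3', 4.0)]
```

In CPython, threads do not speed up CPU-bound Python code because of the global interpreter lock. In the embodiment, each task instance instead runs on its own computing node.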

In the prior art, the rule configuration parameters are stored in a disk database and must be read from the disk database for every calculation, which slows down the calculation.

In this embodiment, the rule configuration parameters corresponding to the data processing request are imported from the disk database into the in-memory database, and the computing nodes retrieve them from the in-memory database. This reduces frequent queries to the disk database during calculation and further improves the calculation speed of the computing nodes.
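The difference between the two approaches can be shown by counting disk queries. The source does not name a specific in-memory database, so in this Python sketch a plain dict stands in for it. `DiskRuleStore` and `MemoryRuleStore` are hypothetical classes written for illustration only.

```python
class DiskRuleStore:
    """Hypothetical stand-in for the disk database; counts every query."""

    def __init__(self, rows: dict[str, float]) -> None:
        self._rows = dict(rows)
        self.queries = 0

    def get(self, key: str) -> float:
        self.queries += 1
        return self._rows[key]

    def load_all(self) -> dict[str, float]:
        self.queries += 1
        return dict(self._rows)


class MemoryRuleStore:
    """In-memory copy of the rule configuration parameters, imported once."""

    def __init__(self, disk: DiskRuleStore) -> None:
        self._rules = disk.load_all()  # single bulk import from the disk store

    def get(self, key: str) -> float:
        return self._rules[key]


if __name__ == "__main__":
    disk = DiskRuleStore({"loan": 0.5, "deposit": 0.25})
    for _ in range(1000):  # prior-art style: one disk query per calculation
        disk.get("loan")
    print(disk.queries)  # 1000

    disk2 = DiskRuleStore({"loan": 0.5, "deposit": 0.25})
    mem = MemoryRuleStore(disk2)
    for _ in range(1000):  # this embodiment: lookups served from memory
        mem.get("loan")
    print(disk2.queries)  # 1
```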

Further, a fifth embodiment of the financial data processing method of the present invention is proposed.

Referring to fig. 6, fig. 6 is a flowchart of a fifth embodiment of the financial data processing method of the present invention. Based on the second embodiment, in this embodiment, after step S300 (respectively distributing a plurality of task instances generated according to the data to be processed to a plurality of computing nodes, starting the computing nodes to perform parallel computation, and obtaining a processing result corresponding to the data processing request after the computation), the method further includes:

Step S400, storing the processing result into the HDFS file, and updating the Hive table so that the updated Hive table corresponds to the HDFS file comprising the processing result.

Each computing node performs the corresponding logic judgment, rule matching and rule calculation on its data to be processed according to the calculation logic, obtains a calculation result, and outputs the result to Hive.
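The source does not say how the Hive table is made to correspond to the HDFS file holding the results. One common approach, assumed here, is to write the results into a new directory of a partitioned Hive table and register that directory as a partition. This uses Hive's `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION` statement. The Python helper below only builds that statement string. The table name, partition column and HDFS path are hypothetical, and submitting the statement to Hive is left out.

```python
import re

_NAME = r"[A-Za-z_][A-Za-z0-9_]*"
_TABLE = re.compile(rf"{_NAME}(\.{_NAME})?")  # table or db.table
_COLUMN = re.compile(_NAME)


def add_partition_sql(table: str, partition: dict[str, str], hdfs_dir: str) -> str:
    """Build a HiveQL statement that registers hdfs_dir as a partition of table."""
    if not partition:
        raise ValueError("at least one partition column is required")
    if not _TABLE.fullmatch(table) or not all(_COLUMN.fullmatch(c) for c in partition):
        raise ValueError("unexpected table or column name")
    for text in [*partition.values(), hdfs_dir]:
        if "'" in text:  # no escaping in this sketch; reject rather than guess
            raise ValueError("single quotes are not supported in this sketch")
    spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec}) LOCATION '{hdfs_dir}'"


if __name__ == "__main__":
    print(add_partition_sql("ftp_result", {"dt": "2019-05-16"},
                            "/warehouse/ftp_result/dt=2019-05-16"))
```

An alternative for directory layouts that already follow Hive's `col=value` convention is `MSCK REPAIR TABLE`, which discovers partitions in bulk.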

Further, after step S400 (storing the processing result in the HDFS file and updating the Hive table so that the updated Hive table corresponds to the HDFS file including the processing result), the method further includes:

Step S500, generating a data processing analysis report corresponding to the target account based on a preset report tool and the updated Hive table.

A corresponding report tool is selected for each type of data processing request. For example, when the data processing request is for FTP pricing, a report tool such as MSTR or Bi@report is selected. The calculated result detail data are aggregated along dimensions such as product and subject to generate an FTP analysis report, which is then viewed through MSTR.
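The aggregation step can be illustrated with a minimal Python sketch that rolls result detail rows up along chosen dimensions. The field names `product`, `subject` and `ftp_interest` and the sample values are hypothetical. In practice this roll-up would more likely be a `GROUP BY` query against the updated Hive table, issued by the report tool.

```python
from collections import defaultdict


def summarize(rows: list[dict], dims: tuple[str, ...], measure: str) -> dict[tuple, float]:
    """Sum `measure` over all rows that share the same values for `dims`."""
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    detail = [  # hypothetical result detail rows
        {"product": "loan", "subject": "S1", "ftp_interest": 10.0},
        {"product": "loan", "subject": "S1", "ftp_interest": 2.5},
        {"product": "deposit", "subject": "S2", "ftp_interest": 5.0},
    ]
    print(summarize(detail, ("product", "subject"), "ftp_interest"))
    # {('loan', 'S1'): 12.5, ('deposit', 'S2'): 5.0}
```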

In this embodiment, if a data processing request for a target account is received, a Hive table of a data warehouse tool corresponding to the target account is obtained; based on a preset access mode, accessing a Hadoop Distributed File System (HDFS) file corresponding to the Hive table to obtain data to be processed of the target account; determining the number of computing nodes in a distributed computing framework corresponding to the data processing request based on the data to be processed; respectively distributing a plurality of task instances generated according to the data to be processed to a plurality of computing nodes, starting the computing nodes to perform parallel computation, and obtaining a processing result corresponding to the data processing request after computation; storing the processing result into the HDFS file, and updating the Hive table to enable the updated Hive table to correspond to the HDFS file comprising the processing result; generating a data processing analysis report corresponding to the target account based on a preset report tool and the updated Hive table; therefore, the accounting speed of the financial institution when processing massive financial data is improved, and the high-timeliness requirement of the business end on the financial data processing is met.

In addition, an embodiment of the present invention further provides a financial data processing apparatus, where the financial data processing apparatus includes:

Preferably, the acquiring module includes:

Preferably, the second acquisition unit includes:

Preferably, the determining module includes:

Preferably, the processing module includes:

Preferably, the apparatus further comprises:

When each module of the financial data processing apparatus provided in this embodiment runs, it implements the steps of the financial data processing method described above; details are not repeated here.

In addition, the embodiment of the invention also provides a computer readable storage medium, wherein the storage medium stores a financial data processing program, and the financial data processing program realizes the steps of the financial data processing method when being executed by a processor.

For the method implemented when the financial data processing program is executed on the processor, reference may be made to the embodiments of the financial data processing method of the present invention; details are not repeated here.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. This product is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods of the embodiments of the present invention.

The foregoing describes only preferred embodiments of the present invention and does not thereby limit the patent scope of the invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention is likewise included within the scope of patent protection of the present invention. The same applies to any direct or indirect application thereof in other related technical fields.

Claims

1. A financial data processing method, characterized in that when processing massive financial data, the financial data processing method comprises the following steps:

if a data processing request directed at a target account is received, acquiring data to be processed of the target account, the data processing request including FTP pricing;

Respectively distributing a plurality of task instances generated according to the data to be processed to a plurality of computing nodes, starting the computing nodes to perform parallel computation, and obtaining a processing result corresponding to the data processing request after computation;

wherein the step of determining the number of computing nodes in the distributed computing framework corresponding to the data processing request based on the data to be processed includes:

acquiring a target data volume of the data to be processed, the target data volume being the total volume of the data to be processed that actually needs to be input into the computing nodes after data preprocessing;

since different types of data processing requests correspond to different calculation logic, calculation complexity and calculation speed, setting, according to the type of the data processing request (i.e., its specific complexity), a computing data volume threshold for each computing node in the distributed computing framework corresponding to the data processing request, so as to avoid slow calculation caused by allocating an excessive volume of data to be processed to a computing node;

Determining the number of the computing nodes corresponding to the data to be processed according to the target data volume and the computing data volume threshold;

and when the data file to be processed becomes larger, adding concurrent computing nodes by adding related computing resources.

2. The financial data processing method of claim 1, wherein the step of obtaining the data to be processed for the target account if a data processing request for the target account is received comprises:

3. The financial data processing method according to claim 2, wherein the step of accessing the HDFS file corresponding to the Hive table based on a preset access manner to obtain the data to be processed of the target account includes:

4. A financial data processing method according to any one of claims 1 to 3, wherein the step of distributing the plurality of task instances generated from the data to be processed to the plurality of computing nodes, respectively, and starting parallel computation by the plurality of computing nodes, and obtaining the processing result corresponding to the data processing request after computation includes:

5. The financial data processing method according to claim 2, wherein the step of distributing the plurality of task instances generated according to the data to be processed to the plurality of computing nodes, respectively, and starting the plurality of computing nodes to perform parallel computation, and obtaining the processing result corresponding to the data processing request after the computation further comprises:

6. The method of claim 5, wherein the steps of saving the processing result in the HDFS file and updating the Hive table such that the updated Hive table corresponds to the HDFS file including the processing result further comprise:

7. A financial data processing apparatus, characterized in that, when processing massive financial data, the financial data processing apparatus comprises:

An acquisition module configured to acquire data to be processed of a target account if a data processing request directed at the target account is received, the data processing request including FTP pricing;

A processing module configured to respectively distribute a plurality of task instances generated according to the data to be processed to a plurality of computing nodes, start the computing nodes to perform parallel computation, and obtain a processing result corresponding to the data processing request after the computation;

Wherein the determining module comprises:

A third acquisition unit configured to acquire a target data amount of the data to be processed, the target data amount being the total amount of the data to be processed that actually needs to be input into the computing nodes after data preprocessing;

A setting unit configured to set, according to the type of the data processing request (i.e., its specific complexity), a calculation data amount threshold for each computing node in the distributed computing framework corresponding to the data processing request. Different types of data processing requests correspond to different calculation logic, calculation complexity and calculation speed, and the threshold avoids slow calculation caused by allocating an excessive amount of data to be processed to a computing node;

A determining unit configured to determine, according to the target data amount and the calculation data amount threshold, the number of computing nodes corresponding to the data to be processed;

8. The financial data processing apparatus of claim 7 wherein the acquisition module comprises:

9. The financial data processing apparatus of claim 8, wherein the second acquisition unit comprises:

10. The financial data processing apparatus of any one of claims 7 to 9, wherein the processing module comprises:

11. The financial data processing apparatus of claim 8, wherein the apparatus further comprises:

12. The financial data processing apparatus of claim 11, wherein the apparatus further comprises:

13. A financial data processing apparatus, the apparatus comprising: a memory, a processor and a financial data processing program stored on the memory and executable on the processor, which when executed by the processor, performs the steps of the financial data processing method as claimed in any one of claims 1 to 6.

14. A computer-readable storage medium, on which a financial data processing program is stored, which, when executed by a processor, implements the steps of the financial data processing method as claimed in any one of claims 1 to 6.