CN117424890A

CN117424890A - Data processing method, device, equipment and medium

Info

Publication number: CN117424890A
Application number: CN202311358413.2A
Authority: CN
Inventors: 雷志勇
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2023-10-19
Filing date: 2023-10-19
Publication date: 2024-01-19

Abstract

Embodiments of the present application provide a data processing method, apparatus, device and medium, applied to the field of data processing. An operation command result is obtained by configuring a task of a data processing engine to execute an operation command. When the target address is mapped to a database, a target table of the database is obtained and the operation command result is written into the target table to obtain a table file. When the target address is mapped to a file transmission server, a segmentation dimension and a file type are obtained, the operation command result is segmented into subfiles according to the segmentation dimension, a target file of the corresponding file type is generated from the subfiles, and the target file is uploaded to the file transmission server. The input operation command is thus run automatically to acquire financial business data, and the data is converted into a result in the target format, which reduces manual operations and process steps and improves processing efficiency.

Description

Data processing method, device, equipment and medium

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a data processing method, apparatus, device, and medium.

Background

When financial business data needs to be retrieved, a user must manually log in to a Linux host and query and download table data from the command line, modify the SQL statements provided by the business side to download the financial business data in batches to a local directory, convert the data into files of a specific format according to different requirements, package them, and then send the resulting files to the requesting user by email. The operations and process are cumbersome and time-consuming.

Disclosure of Invention

The present application aims to solve at least one of the technical problems existing in the related art to a certain extent.

Therefore, an object of the embodiments of the present application is to provide a data processing method, apparatus, device, and medium, which can reduce operations and procedures and improve processing efficiency.

An embodiment of a first aspect of the present application provides a data processing method, including:

acquiring an operation command and a target address;

configuring a task of a data processing engine to execute the operation command to obtain an operation command result;

when the target address is mapped to a database, a target table of the database is obtained, and the operation command result is written into the target table to obtain a table file;

when the target address is mapped to a file transmission server, obtaining a segmentation dimension and a file type, segmenting the operation command result into subfiles according to the segmentation dimension, generating a target file corresponding to the file type according to the subfiles, and uploading the target file to the file transmission server.

According to certain embodiments of the first aspect of the present application, configuring the task of the data processing engine to execute the operation command to obtain the operation command result includes:

dividing the operation command into a plurality of keywords, and parsing the keywords into nodes of a grammar tree according to a preset semantic rule to form the grammar tree;

performing data type binding and function binding on nodes of the grammar tree to express keywords in the grammar tree through metadata information;

performing equivalent conversion on the nodes of the grammar tree according to a preset optimization strategy to obtain an optimized grammar tree;

generating, according to the optimized grammar tree, a plurality of physical plans that the data processing engine can execute, acquiring the cost of each physical plan, and selecting the target physical plan with the minimum cost from the plurality of physical plans;

and executing the operation command in the form of a distributed data set according to the target physical plan to obtain an operation command result.

According to certain embodiments of the first aspect of the present application, the writing the operation command result into the target table includes:

when the target table does not exist in the database;

establishing a new target table;

and writing the operation command result into a new target table.

According to certain embodiments of the first aspect of the present application, establishing the new target table includes:

extracting a field and a value type of the operation command result;

acquiring a preset table name;

establishing a new target table, enabling the field of the new target table to correspond to the field of the operation command result, enabling the value type of the new target table to correspond to the value type of the operation command result, and enabling the name of the new target table to be the preset table name.

According to certain embodiments of the first aspect of the present application, when the segmentation dimension is empty, segmenting the operation command result into subfiles according to the segmentation dimension includes:

acquiring a preset file name;

establishing a new file, and taking the preset file name as the name of the new file;

writing the operation command result into the new file to obtain a subfile.

According to certain embodiments of the first aspect of the present application, after generating the target file of the corresponding file type from the subfiles, the data processing method further includes:

dividing the target file into at least one data block;

storing the data blocks in a scattered manner on nodes of a cluster of the distributed file system;

and creating a unique identifier for the data block, and recording the mapping relation between the unique identifier and the data block.

According to certain embodiments of the first aspect of the present application, the uploading the target file to the file transfer server includes:

constructing an encrypted transmission channel connected to the file transmission server;

acquiring transmission parameters;

encrypting the target file to obtain an encrypted file;

and calling a transmission tool based on a secure file transmission protocol to upload the target file to the file transmission server through the encrypted transmission channel according to the transmission parameters.

An embodiment of the second aspect of the present application provides a data processing apparatus, including:

the input unit is used for acquiring an operation command and a target address;

the command execution unit is used for configuring the task of the data processing engine to execute the operation command to obtain an operation command result;

the first transmission unit is used for acquiring a target table of the database when the target address is mapped to the database, and writing the operation command result into the target table to obtain a table file;

and the second transmission unit is used for acquiring a segmentation dimension and a file type when the target address is mapped to the file transmission server, segmenting the operation command result into subfiles according to the segmentation dimension, generating a target file corresponding to the file type according to the subfiles, and uploading the target file to the file transmission server.

An embodiment of the third aspect of the present application provides an electronic device, including a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for connecting the processor and the memory for communication; when executed by the processor, the program implements the data processing method described above.

An embodiment of the fourth aspect of the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the data processing method described above.

With the data processing method, apparatus, device and medium disclosed in the embodiments of the present application, an operation command result is obtained by configuring a task of a data processing engine to execute an operation command. When the target address is mapped to a database, a target table of the database is obtained and the operation command result is written into the target table to obtain a table file. When the target address is mapped to a file transmission server, a segmentation dimension and a file type are obtained, the operation command result is segmented into subfiles according to the segmentation dimension, a target file of the corresponding file type is generated from the subfiles, and the target file is uploaded to the file transmission server. The input operation command is thus run automatically to acquire financial business data, and the data is converted into a result in the target format, which reduces manual operations and process steps and improves processing efficiency.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the accompanying drawings are briefly described below. It should be understood that the drawings described below serve only to describe some embodiments of the technical solutions of the present application conveniently and clearly, and that those skilled in the art may obtain other drawings from these drawings without inventive effort.

FIG. 1 is a step diagram of a data processing method provided by an embodiment of the present application;

FIG. 2 is a sub-step diagram of step S200;

FIG. 3 is a sub-step diagram of the step of creating a new target table;

FIG. 4 is a sub-step diagram of a step of splitting an operation command result into subfiles according to a splitting dimension;

FIG. 5 is a sub-step diagram of a step of encapsulating a target file in a distributed file system;

FIG. 6 is a sub-step diagram of uploading a target file to a file transfer server;

FIG. 7 is a block diagram of a data processing apparatus provided by an embodiment of the present application;

FIG. 8 is a block diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: only A is present, only B is present, or both A and B are present, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.

In the description of the present specification, reference to the terms "one embodiment," "another embodiment," or "certain embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The embodiments described herein are intended to explain the technical solutions of the embodiments of the present application more clearly and do not limit them. As those skilled in the art will appreciate, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

It will be appreciated by those skilled in the art that the technical solutions shown in the figures do not constitute limitations of the embodiments of the present application, and may include more or fewer steps than shown, or may combine certain steps, or different steps.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.

The embodiment of the application provides a data processing method which is applied to a data batch downloading tool.

Specifically, the data batch downloading tool is applied to Hive table data.

Referring to FIG. 1, a data processing method includes, but is not limited to, the steps of:

step S100, obtaining an operation command and a target address;

step S200, configuring a task of a data processing engine to execute the operation command to obtain an operation command result;

step S300, when the target address is mapped to a database, obtaining a target table of the database, and writing the operation command result into the target table to obtain a table file;

step S400, when the target address is mapped to a file transmission server, obtaining a segmentation dimension and a file type, segmenting the operation command result into subfiles according to the segmentation dimension, generating a target file of the corresponding file type from the subfiles, and uploading the target file to the file transmission server.

For step S100, the user inputs an operation command and a target address through the input device, and the data batch download tool acquires the operation command and the target address.

The input device may be a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting input board, a joystick, a microphone, etc. An input device is a device that inputs data and information into a computer; it is the bridge through which the computer communicates with users or other devices, and one of the primary means of information exchange between a user and a computer system. Through it, a person or an external system inputs raw data, and the programs that process that data, into the computer. The computer can receive various kinds of data, either numeric data or non-numeric data such as text, images and sounds.

The operation command is an SQL statement. SQL (Structured Query Language) is a database language with both data manipulation and data definition functions. For example, the SQL statement for financial business data may be an SQL statement for querying product information of financial product A, or an SQL statement for querying sales information of financial product B.

The target address maps to the storage destination of the operation command result obtained by executing the operation command; specifically, the target address is the link address of that storage destination.

If the user wants to store the operation command result in the database, the link address of the database can be input as a target address; if the user wants to store the operation command result in the file transfer server, the link address of the file transfer server can be input as the target address.
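The routing between the two storage destinations can be sketched as follows. This is only a minimal illustration under the assumption that the link address is written as a URL whose scheme names the destination type; the scheme lists and the function name `classify_target` are hypothetical and do not come from the application, which confirms the destination type from information returned over the connection.

```python
from urllib.parse import urlparse

# Hypothetical scheme lists; the application does not specify an address syntax.
DATABASE_SCHEMES = {"oracle", "mysql"}
FILE_SERVER_SCHEMES = {"sftp"}


def classify_target(address: str) -> str:
    """Return 'database' or 'file_server' for a link address."""
    scheme = urlparse(address).scheme.lower()
    if scheme in DATABASE_SCHEMES:
        return "database"      # handled as in step S300
    if scheme in FILE_SERVER_SCHEMES:
        return "file_server"   # handled as in step S400
    raise ValueError(f"unsupported target address: {address!r}")
```

For example, `classify_target("sftp://files.example.com/out")` selects the file transmission server branch, while an address with an unknown scheme is rejected rather than guessed.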

Referring to FIG. 2, for step S200, configuring the task of the data processing engine to execute the operation command to obtain the operation command result includes, but is not limited to, the following steps:

step S210, dividing the operation command into a plurality of keywords, and analyzing the keywords into nodes of a grammar tree according to a preset semantic rule to form the grammar tree;

step S220, data type binding and function binding are carried out on the nodes of the grammar tree so as to express keywords in the grammar tree through metadata information;

step S230, performing equivalent conversion on nodes of the grammar tree according to a preset optimization strategy to obtain an optimized grammar tree;

step S240, generating, according to the optimized grammar tree, a plurality of physical plans that the data processing engine can execute, acquiring the cost of each physical plan, and selecting the target physical plan with the minimum cost from the plurality of physical plans;

step S250, executing the operation command in the form of a distributed data set according to the target physical plan to obtain an operation command result.

The data batch downloading tool calls a Spark task in the background to execute the SQL statement, and the SQL result obtained is the operation command result.

Spark is a data processing engine designed for large-scale data processing, with a computational model based on the RDD (resilient distributed dataset). Spark processes large data collections in a distributed manner: it splits the data, computes on the parts separately, and then combines the partial results.

Spark can process data in memory or on disk and has the advantage of speed. It supports multiple programming languages and an interactive shell that facilitates development and testing, which makes it easy to use. It offers a one-stop solution covering batch processing, interactive query, real-time stream processing, graph computation and machine learning, and supports multiple running modes, which makes it general-purpose.

For an SQL statement to become a program that the execution engine can recognize, it must go through three processes: parsing (Parser), optimization (Optimizer) and execution (Execution).

For step S210, the operation command is divided into a plurality of keywords, and the keywords are parsed into nodes of a grammar tree according to a preset semantic rule to form the grammar tree. Specifically, the SQL string is split into tokens (keywords), which are then parsed into nodes of the grammar tree according to the preset semantic rule; this can be implemented with the third-party library ANTLR. This stage checks whether the SQL statement conforms to the grammar, for example whether keywords such as SELECT, FROM and WHERE appear together as required. Table names and table fields are not checked at this stage.
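The tokenizing part of step S210 can be illustrated with a small hand-written sketch. It is a stand-in for illustration only and is not ANTLR; the token categories, the keyword list and the function names are assumptions introduced here. As in the description, only the shape of the statement is checked, not whether the table or fields exist.

```python
import re

# One token per match: a number, a (possibly dotted) name, or a symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_][A-Za-z_0-9.]*)|(>=|<=|<>|[=<>(),*+]))")
KEYWORDS = {"SELECT", "FROM", "WHERE", "JOIN", "ON", "AND"}


def tokenize(sql):
    """Split an SQL string into (kind, value) tokens."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos}: {sql[pos]!r}")
        number, word, symbol = match.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            if word.upper() in KEYWORDS:
                tokens.append(("KEYWORD", word.upper()))
            else:
                tokens.append(("IDENT", word))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens


def clauses_in_order(tokens):
    """Check that SELECT and FROM appear in order, optionally followed by WHERE."""
    clauses = [v for k, v in tokens if k == "KEYWORD" and v in ("SELECT", "FROM", "WHERE")]
    return clauses[:2] == ["SELECT", "FROM"] and clauses[2:] in ([], ["WHERE"])
```

A statement such as `select name from peoples where age > 10` passes this check even if no table named `peoples` exists, which mirrors the point that names are validated only in the later binding step.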

For step S220, data type binding and function binding are performed on the nodes of the grammar tree so that the keywords in the grammar tree are expressed through metadata information. The parsed logical plan provides only a skeleton; basic metadata is needed to give meaning to its morphemes. The most important metadata comprises two parts: the table schema and basic function information. The table schema mainly includes the basic definition of the table (column names and data types), the data format of the table (for example JSON or Text), the physical location of the table, and so on; the basic function information mainly refers to class information. The whole grammar tree is traversed again, and data type binding and function binding are performed on each node. For example, according to the metadata table information, the morpheme peoples is resolved to a table containing the three columns age, id and name, peoples.age is resolved to a variable of data type int, and sum is resolved to a specific aggregation function. This process determines whether the table names and field names in the SQL statement actually exist in the metadata repository.
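The binding described for step S220 amounts to looking names up in metadata. The following sketch assumes an in-memory catalog and function registry; both dictionaries, the types given for id and name, and the function names are hypothetical and serve only to illustrate the lookup.

```python
# Hypothetical metadata catalog: table name -> {column name: data type}.
# Only the int type of age is taken from the description; the rest is assumed.
CATALOG = {"peoples": {"age": "int", "id": "int", "name": "string"}}

# Hypothetical function registry: SQL function name -> Python callable.
FUNCTIONS = {"sum": sum, "max": max}


def bind_column(reference, catalog=CATALOG):
    """Resolve 'table.column' to its data type, or raise if it is unknown."""
    table, _, column = reference.partition(".")
    if table not in catalog:
        raise LookupError(f"unknown table: {table}")
    if column not in catalog[table]:
        raise LookupError(f"unknown column: {reference}")
    return catalog[table][column]


def bind_function(name, functions=FUNCTIONS):
    """Resolve a function name to a concrete implementation."""
    try:
        return functions[name.lower()]
    except KeyError:
        raise LookupError(f"unknown function: {name}") from None
```

Here `bind_column("peoples.age")` yields the type `int`, while a reference to a column that is absent from the catalog fails at this stage rather than at execution time.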

For step S230, the optimizer performs equivalent conversions on the nodes of the grammar tree according to a preset optimization strategy to obtain the optimized grammar tree. Optimizers fall into two types: rule-based optimization (RBO) and cost-based optimization (CBO). A rule-based optimization strategy traverses the grammar tree, pattern-matches the nodes that satisfy a specific rule, and applies the corresponding equivalent conversion. Three common rules are described below: predicate pushdown (Predicate Pushdown), constant folding (Constant Folding) and column pruning (Column Pruning).

Predicate pushdown moves the filter operation below the join so that filtering is performed before the join; the amount of data entering the join is then significantly reduced, and the join time decreases accordingly. For example, suppose the two tables in the grammar tree are first joined and the result is then filtered with age > 10. Join is a very time-consuming operator whose cost generally depends on the sizes of the two tables involved, so reducing the size of those tables greatly reduces the time the join requires.
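The equivalence behind predicate pushdown can be shown on plain Python lists. This is an illustrative sketch, not Spark code; the row layout (dictionaries joined on an `id` key) and the function names are assumptions.

```python
def join_then_filter(left, right, predicate):
    """Unoptimized order: join every matching pair, then filter the joined rows."""
    joined = [(l, r) for l in left for r in right if l["id"] == r["id"]]
    return [(l, r) for l, r in joined if predicate(l)]


def filter_then_join(left, right, predicate):
    """Pushed-down order: filter the left table first, then join the smaller input."""
    reduced_left = [l for l in left if predicate(l)]
    return [(l, r) for l in reduced_left for r in right if l["id"] == r["id"]]
```

Both functions return the same rows when the predicate refers only to columns of the left table, but the second compares fewer pairs because rows failing age > 10 never reach the join.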

Constant folding (constant accumulation) replaces, for example, the calculation x + (100 + 80) with x + 180. Without this optimization, the operation 100 + 80 must be performed once for every row before being added to x. After the optimization, 100 + 80 no longer needs to be evaluated repeatedly.
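The rewrite of x + (100 + 80) into x + 180 can be reproduced on a Python expression tree. The sketch below uses Python's standard `ast` module as a stand-in for the SQL grammar tree and handles only addition of numeric constants; it assumes Python 3.9 or later for `ast.unparse`, and the class and function names are illustrative.

```python
import ast


class ConstantFolder(ast.NodeTransformer):
    """Replace additions of two numeric constants with their precomputed sum."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first (bottom-up)
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


def fold(expression: str) -> str:
    """Parse an expression, fold constant additions, and return the new source."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(tree)
```

Calling `fold("x + (100 + 80)")` returns `"x + 180"`, so the addition of the two constants is done once at optimization time instead of once per row.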

Column pruning (column value clipping) means that when a table is used, not all of its columns need to be scanned; only the columns actually required, for example id, are scanned, and the unnecessary columns are pruned. On the one hand, this optimization greatly reduces network and memory consumption; on the other hand, it greatly improves scanning efficiency for columnar storage databases.
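A minimal sketch of column pruning over a column-oriented table is given below. The table layout (a dictionary mapping each column name to its list of values) and the function name `scan` are assumptions made for illustration.

```python
def scan(table, needed_columns):
    """Read only the requested columns from a column-oriented table."""
    return {name: table[name] for name in needed_columns}
```

With a table holding the columns id, age and name, `scan(table, ["id"])` touches a single column, which is why the benefit is largest for columnar storage.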

For step S240, a plurality of physical plans that the data processing engine can execute are generated according to the optimized grammar tree, the cost of each physical plan is acquired, and the target physical plan with the minimum cost is selected from the plurality of physical plans. After the logical execution plan has been optimized, it still cannot actually be executed: it is only logically feasible, and Spark does not yet know how to execute it. For example, join is an abstract concept meaning that two tables are merged on the same id, but the logical execution plan does not specify how the merge is implemented. The logical execution plan must therefore be converted into a physical execution plan, that is, a plan that Spark can actually execute. For the join operator, for instance, Spark provides different algorithm strategies for different scenarios, such as BroadcastHashJoin, ShuffledHashJoin and SortMergeJoin. In practice, SparkPlanner converts the optimized logical plan and generates a plurality of executable physical plans.

Then, the cost-based optimization (CBO) strategy calculates the cost of each physical plan according to a cost model and selects the physical plan with the minimum cost as the final target physical plan.
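Selecting the minimum-cost plan can be sketched as follows. The cost formulas and the broadcast threshold below are a toy model invented for this illustration; they are not Spark's cost model, and only the three strategy names are those of Spark's join operators.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPlan:
    name: str
    cost: float


def candidate_join_plans(left_rows, right_rows, broadcast_limit=1000):
    """Enumerate candidate join plans with toy (hypothetical) cost estimates."""
    plans = [
        # Sorting both inputs dominates: n * log2(n) per side.
        PhysicalPlan(
            "SortMergeJoin",
            left_rows * math.log2(max(left_rows, 2))
            + right_rows * math.log2(max(right_rows, 2)),
        ),
        # Shuffle both inputs, then build and probe a hash table.
        PhysicalPlan("ShuffledHashJoin", 2.0 * (left_rows + right_rows)),
    ]
    # Broadcasting is only a candidate when one side is small enough.
    if min(left_rows, right_rows) <= broadcast_limit:
        plans.append(PhysicalPlan("BroadcastHashJoin", float(left_rows + right_rows)))
    return plans


def choose_plan(plans):
    """Return the plan with the minimum estimated cost."""
    return min(plans, key=lambda plan: plan.cost)
```

Under this toy model, joining a large table with a table of 100 rows selects BroadcastHashJoin, whereas two equally large tables fall back to one of the shuffle-based strategies.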

For step S250, the operation command is executed in the form of a distributed data set according to the target physical plan to obtain the operation command result. Specifically, Java bytecode is generated according to the optimal physical execution plan, the SQL statement is converted into a DAG (directed acyclic graph), and the operations are performed on distributed data sets to obtain the operation command result.

For example, for an SQL statement that queries product information of financial product A, the product information of financial product A is obtained after the task of the data processing engine executes the statement. Likewise, for an SQL statement that queries sales information of financial product B, the sales information of financial product B is obtained after the task of the data processing engine executes the statement.

For step S300, when the target address is the link address of a database, a communication path to the database is established through the target address and a connection to the database is made; the database returns information indicating its own type through the communication path, and from this information it can be confirmed that the target address is mapped to the database. Specifically, the database may be an Oracle database. It will be appreciated that although this embodiment uses an Oracle database as an example, this does not limit the type of database. In other embodiments, the database may be of another type, such as a MySQL database.

After the database is connected through its link address, the target table is queried in the database, and the operation command result corresponding to the financial business data is written into the target table to obtain a table file. Specifically, the user inputs the table name of the target table, for example table1; the data batch downloading tool receives this table name and searches the database for it. When the database contains a table with that name, the table is determined to be the target table, and the operation command result is written into it to obtain the table file.

When the target table does not exist in the database, a new target table is established and the operation command result is written into the new target table.

Referring to FIG. 3, establishing a new target table includes, but is not limited to, the following steps:

step S301, extracting a field and a value type of an operation command result;

step S302, obtaining a preset table name;

step S303, a new target table is established, and the field of the new target table corresponds to the field of the operation command result, the value type of the new target table corresponds to the value type of the operation command result, and the name of the new target table is the preset table name.

For example, when the field of the operation command result extracted from the corresponding financial service data is a character string type, the value type is CHAR, and the preset table name input by the user is table1, a new target table is built according to the field and the value type of the financial service data and the preset table name, so that the field of the new target table is a character string type, the value type of the new target table is CHAR, and the name of the new target table is table1.

It will be appreciated that although this embodiment uses a string-type field of the operation command result as an example, this does not limit the field types of the operation command result. In other embodiments, the field may be of another type, such as a numeric type.

Although the present embodiment exemplifies an example in which the value type of the operation command result is CHAR, this does not limit the value type of the operation command result. In other embodiments, the value type may be other, such as int, etc.
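Steps S301 to S303, together with the write into the target table, can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a self-contained stand-in for the Oracle or MySQL database of the embodiment; the type mapping, the function name `write_result` and the representation of the result as a non-empty list of dictionaries are assumptions.

```python
import sqlite3

# Hypothetical mapping from Python value types to SQLite column types.
TYPE_MAP = {str: "TEXT", int: "INTEGER", float: "REAL"}


def write_result(conn, table_name, rows):
    """Write rows (a non-empty list of dicts) into table_name, creating it if absent."""
    fields = list(rows[0])
    # Names are interpolated into SQL below, so accept plain identifiers only.
    for name in [table_name, *fields]:
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")

    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if exists is None:
        # S301/S303: derive fields and value types from the result itself.
        columns = ", ".join(f"{f} {TYPE_MAP[type(rows[0][f])]}" for f in fields)
        conn.execute(f"CREATE TABLE {table_name} ({columns})")

    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(
        f"INSERT INTO {table_name} ({', '.join(fields)}) VALUES ({placeholders})",
        [tuple(row[f] for f in fields) for row in rows],
    )
    conn.commit()
```

The first call with the preset table name table1 creates the table from the fields and value types of the result; later calls find the existing table and only append rows.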

When the user needs to download a table file from the Oracle database, the user inputs the table name, the corresponding table file is looked up by that name, and the table file is downloaded from the Oracle database to the local machine.

For step S400, when the target address is the link address of a file transmission server, a communication path to the file transmission server is established through the target address and a connection is made; the file transmission server returns information indicating its own type through the communication path, and from this information it can be confirmed that the target address is mapped to the file transmission server. The segmentation dimension and the file type are then acquired, the operation command result corresponding to the financial business data is segmented into subfiles according to the segmentation dimension, a target file of the corresponding file type is generated from the subfiles, and the target file is uploaded to the file transmission server according to its link address.

Specifically, the file transfer server is an SFTP server, that is, a server based on the SFTP network protocol. SFTP (Secure File Transfer Protocol) is a network protocol for securely transferring files between computers. It is encrypted by the SSH (Secure Shell) protocol and provides secure access to and transmission of files.

The SFTP server uses the SSH protocol for data encryption and authentication to protect data from unauthorized access and eavesdropping. All file transfers take place over an encrypted channel, so the data remains confidential throughout the transfer. The SFTP server provides management functions for remote files: a user can upload, download, delete and rename files, and can also create and navigate directories, which makes file organization and management more convenient. The SFTP server allows the administrator to set different permissions and access levels for each user or user group as required. This ensures that only authorized users can access a particular file or directory and increases the security of the system. The SFTP server records user logins, file transfers and other operations and generates corresponding logs. These logs can be used for auditing and troubleshooting, and help monitor and maintain the operating state of the server. SFTP servers typically offer high reliability and redundancy mechanisms, supporting a variety of storage media and cluster deployments to ensure the availability and persistence of the file transfer service. SFTP is a cross-platform protocol that enables file transfer between different operating systems (e.g., Windows, Linux, macOS), which makes data exchange in heterogeneous environments more flexible and convenient.

Referring to fig. 4, when the segmentation dimension input by the user is empty, segmenting the operation command result into subfiles according to the segmentation dimension includes, but is not limited to, the following steps:

Step S401, obtaining a preset file name;

Step S402, creating a new file, and using the preset file name as the name of the new file;

Step S403, writing the operation command result into the new file to obtain the subfile.

For step S401, the user inputs a preset file name through the input device, and the data batch download tool receives the preset file name.

For step S402, a new file is created. Because the user has not input a segmentation dimension, the default segmentation dimension is 1, so only one new file is created. The preset file name input by the user is used as the name of the new file, so that the file can be looked up later.

For step S403, all the operation command results are written into the new file to obtain the subfiles. Since only one new file is created, all operation command results need to be written into the new file.
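Steps S401 to S403 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `write_subfiles`, the CSV output format, and the reading of the segmentation dimension as a number of subfiles are hypothetical and are not specified by the embodiment. An empty segmentation dimension falls back to the default of 1, so a single file carrying the preset file name receives the whole operation command result.

```python
import csv
from pathlib import Path


def write_subfiles(rows, header, out_dir, preset_name, dimension=None):
    """Write rows into `dimension` CSV subfiles; an empty dimension means 1."""
    # Assumption: the segmentation dimension is a number of subfiles.
    count = dimension or 1
    out_dir = Path(out_dir)
    # A single file keeps the preset name; several files get a numeric suffix.
    if count == 1:
        names = [preset_name]
    else:
        names = [f"{preset_name}_{i + 1}" for i in range(count)]
    paths = []
    for i, name in enumerate(names):
        path = out_dir / f"{name}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(rows[i::count])  # round-robin split of the rows
        paths.append(path)
    return paths
```

Calling `write_subfiles(rows, ["id", "name"], "/tmp", "policy_export")` without a dimension would produce the single subfile `policy_export.csv`; the directory and file name here are placeholders.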

Referring to fig. 5, after generating the target file of the corresponding file type from the subfiles, the data processing method further includes the step of encapsulating the target file in a distributed file system:

Step S411, dividing the target file into at least one data block;

Step S412, storing the data blocks in a distributed manner on the nodes of the cluster of the distributed file system;

Step S413, creating a unique identifier for each data block, and recording the mapping relationship between the unique identifier and the data block.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS provides high-throughput data access and is well suited to applications with large data sets. HDFS relaxes some POSIX constraints in order to allow streaming access to file system data. HDFS adopts a master/slave architecture: an HDFS cluster consists of one NameNode and a number of DataNodes. The NameNode acts as the master server and manages the file system namespace and client access to files; the DataNodes in the cluster manage the stored data.

When the target file is uploaded to the distributed file system, the distributed file system automatically divides the file into a plurality of data blocks. The block size is configurable and is typically 128 MB by default. The data blocks are stored in a distributed manner on different nodes of the cluster. The distributed file system creates a unique identifier for each data block and records the mapping between these identifiers and the file to which the blocks belong.

The encapsulation process is transparent to the user and requires only uploading the file into a distributed file system, similar in operation to a conventional file system. When a file is downloaded from a distributed file system, the Hadoop framework automatically retrieves the corresponding data blocks from the cluster and reassembles the complete file.

Thus, when using a distributed file system, there is no need to manually perform the file encapsulation operation, which is automatically handled by the Hadoop framework. Only the files need to be uploaded, downloaded and operated according to the interfaces provided by the distributed file system.
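Although the distributed file system performs these steps itself, the idea behind steps S411 to S413 can be illustrated with a small in-memory Python model. This is a toy sketch and not how HDFS is implemented: the function names, the round-robin placement, and the use of random UUIDs as block identifiers are assumptions made only for illustration.

```python
import uuid


def split_into_blocks(data: bytes, block_size: int, nodes: list):
    """Split data into blocks, place them round-robin, and record ID mappings."""
    placement = {}  # block ID -> (node name, block bytes)
    order = []      # block IDs in file order, needed for reassembly
    # max(..., 1) guarantees at least one block, even for an empty file.
    for n, offset in enumerate(range(0, max(len(data), 1), block_size)):
        block_id = uuid.uuid4().hex
        node = nodes[n % len(nodes)]
        placement[block_id] = (node, data[offset:offset + block_size])
        order.append(block_id)
    return order, placement


def reassemble(order, placement) -> bytes:
    """Rebuild the original file by fetching the blocks in file order."""
    return b"".join(placement[block_id][1] for block_id in order)
```

The ordered list of identifiers plays the role of the recorded mapping: it is what allows the complete file to be rebuilt from blocks held on different nodes.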

Referring to fig. 6, uploading a target file to a file transfer server includes, but is not limited to, the steps of:

Step S421, constructing an encrypted transmission channel connected to the file transfer server;

Step S422, obtaining transmission parameters;

Step S423, encrypting the target file to obtain an encrypted file;

Step S424, calling a transmission tool based on a secure file transfer protocol to upload the target file to the file transfer server through the encrypted transmission channel according to the transmission parameters.

Specifically, an SFTP client connects to the SFTP server and supplies a user name and password for authentication. Because SFTP transfers data over the SSH protocol, a secure encrypted channel is established automatically once the connection is set up; this channel ensures confidentiality and integrity during transmission. Once connected, the client can choose among operations such as upload, download, and delete. Before uploading, the client must tell the server which file is to be uploaded and where on the server it is to be stored. To this end, the client sends the server a command to open the file path and provides the required transmission parameters. Once the file channel is open, the client transmits the file to the server through it. The transfer takes place over the encrypted channel, and the correctness of the transferred data can be checked during the transfer. When the file transfer is complete, the client sends the server a command to close the file channel so that the corresponding resources are released. When all data transfers are complete, or when the client no longer needs the connection, the client can send a quit command to close the SFTP session and disconnect from the server.
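The upload sequence described above can be sketched in Python as follows. The embodiment does not name a client library; the third-party `paramiko` package is assumed here purely for illustration, and the helper names `open_session` and `upload_file` are hypothetical. The sketch covers connection, authentication, and upload only; the separate file encryption of step S423 is not shown.

```python
import os
import posixpath


def upload_file(sftp, local_path: str, remote_dir: str) -> str:
    """Upload one file through an already authenticated SFTP session."""
    remote_path = posixpath.join(remote_dir, os.path.basename(local_path))
    sftp.put(local_path, remote_path)  # transfer over the encrypted channel
    sftp.stat(remote_path)             # raises if the file is not on the server
    return remote_path


def open_session(host: str, port: int, username: str, password: str):
    """Open an SSH transport and an SFTP session (requires paramiko)."""
    import paramiko  # third-party; imported lazily so the sketch loads without it

    transport = paramiko.Transport((host, port))
    transport.connect(username=username, password=password)
    return transport, paramiko.SFTPClient.from_transport(transport)
```

A caller would typically obtain `transport, sftp = open_session(...)`, call `upload_file(sftp, local_path, remote_dir)`, and then close both `sftp` and `transport` in a `finally` block so that the session is released even if the upload fails.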

When the user needs to download a file from the SFTP server, the user inputs the file name, the corresponding file is located by that name, and the file is downloaded from the SFTP server to the local machine.

It will be appreciated that the files downloaded locally have been converted to the type of file required by the user.

In this embodiment, the user only needs to input the operation command and the target address for the financial business data into the data batch download tool. The tool then automatically processes and acquires the financial business data according to the operation command and converts the operation command result into the file type required by the user, which reduces manual operations and process steps and improves processing efficiency.

The embodiment of the application provides a data processing device.

Referring to fig. 7, a data processing apparatus includes: an input unit 110, a command execution unit 120, a first transmission unit 130, and a second transmission unit 140.

The input unit 110 is configured to acquire an operation command and a target address. The command execution unit 120 is configured to configure a task of the data processing engine to execute the operation command and obtain an operation command result. The first transmission unit 130 is configured to, when the target address maps to a database, obtain a target table of the database and write the operation command result into the target table to obtain a table file. The second transmission unit 140 is configured to, when the target address maps to a file transfer server, obtain the segmentation dimension and the file type, segment the operation command result into subfiles according to the segmentation dimension, generate a target file of the corresponding file type from the subfiles, and upload the target file to the file transfer server.

The user inputs an operation command and a target address through the input device, and the data batch download tool acquires the operation command and the target address through the input unit 110.

The target address is used to map to a storage destination of an operation command result obtained by executing the operation command.

The command execution unit 120 divides the operation command into a plurality of keywords and parses the keywords into nodes of a syntax tree according to a preset semantic rule to form the syntax tree; performs data type binding and function binding on the nodes of the syntax tree, so that the keywords in the syntax tree are expressed through metadata information; performs equivalent conversion on the nodes of the syntax tree according to a preset optimization strategy to obtain an optimized syntax tree; generates, from the optimized syntax tree, a plurality of physical plans executable by the configured data processing engine, obtains the cost of each physical plan, and selects the target physical plan with the minimum cost; and executes the operation command in the form of a distributed data set according to the target physical plan to obtain the operation command result.

First, the operation command is divided into a plurality of keywords, and the keywords are parsed into nodes of a syntax tree according to a preset semantic rule to form the syntax tree. Specifically, the SQL string is split into tokens, and the tokens are then parsed into nodes of the syntax tree according to the preset rule; this can be implemented with the third-party library ANTLR. This stage checks whether the SQL statement is well formed, for example whether keywords such as SELECT, FROM, and WHERE appear together as required. Table names and table fields are not checked at this stage.
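The tokenizing and parsing stage can be illustrated with a deliberately small Python sketch. It does not use ANTLR and handles only statements of the form SELECT ... FROM ... [WHERE ...]; the regular expression, the function names, and the dictionary used as a syntax tree are assumptions made for illustration. As in the stage described above, only the shape of the statement is checked, not whether the table or columns exist.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE"}
# Identifiers (optionally dotted), integers, or any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_][\w.]*|\d+|[^\s\w]")


def tokenize(sql: str):
    """Split an SQL string into tokens, upper-casing recognised keywords."""
    return [t.upper() if t.upper() in KEYWORDS else t for t in TOKEN_RE.findall(sql)]


def parse_select(sql: str):
    """Build a tiny syntax tree for SELECT <cols> FROM <table> [WHERE <cond>]."""
    tokens = tokenize(sql)
    if not tokens or tokens[0] != "SELECT" or "FROM" not in tokens:
        raise SyntaxError("expected SELECT ... FROM ...")
    from_at = tokens.index("FROM")
    where_at = tokens.index("WHERE") if "WHERE" in tokens else len(tokens)
    # Require at least one column and exactly one table name after FROM.
    if from_at == 1 or where_at - from_at != 2:
        raise SyntaxError("clauses are missing or out of order")
    return {
        "select": [t for t in tokens[1:from_at] if t != ","],
        "from": tokens[from_at + 1],
        "where": tokens[where_at + 1:],
    }
```

For a statement without a FROM clause, `parse_select` raises `SyntaxError`, which corresponds to rejecting a statement that does not meet the specification.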

Next, data type binding and function binding are performed on the nodes of the syntax tree so that the keywords in the tree are expressed through metadata information. The parsed logical plan provides only a skeleton; basic metadata is needed to give its tokens meaning. The most important metadata consists of two parts: table schemas and basic function information. A table schema mainly includes the basic definition of the table (column names and data types), the data format of the table (such as JSON or Text), and the physical location of the table; basic function information mainly refers to class information. The whole syntax tree is traversed again, and data type binding and function binding are performed on each node. For example, based on the table metadata, the token peoples is resolved to a table containing the three columns age, id, and name; peoples.age is resolved to a variable of data type int; and sum is resolved to a specific aggregate function. This stage determines whether the table names and field names in the SQL statement actually exist in the metadata repository.
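The binding stage can be sketched in Python as a lookup against a metadata catalog. The catalog contents, the type of the name column, and the function name `bind_columns` are hypothetical; a real engine resolves names against its own metadata repository.

```python
# Hypothetical metadata catalog: table name -> {column name: data type}.
CATALOG = {"peoples": {"id": "int", "name": "string", "age": "int"}}


def bind_columns(table: str, columns: list, catalog: dict) -> dict:
    """Resolve each column to its data type, failing on unknown names."""
    if table not in catalog:
        raise LookupError(f"table not found: {table}")
    schema = catalog[table]
    bound = {}
    for col in columns:
        name = col.split(".")[-1]  # accept both `age` and `peoples.age`
        if name not in schema:
            raise LookupError(f"column not found: {table}.{name}")
        bound[col] = schema[name]
    return bound
```

A lookup failure at this point corresponds to a table name or field name that does not exist in the metadata repository.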

Then, an optimizer performs equivalent conversion on the nodes of the syntax tree according to a preset optimization strategy to obtain an optimized syntax tree. Optimizers fall into two types: rule-based optimization (RBO) and cost-based optimization (CBO). A rule-based optimization strategy traverses the syntax tree, pattern-matches the nodes that satisfy a specific rule, and applies the corresponding equivalent conversion. Three common rules are predicate pushdown (Predicate Pushdown), constant folding (Constant Folding), and column pruning (Column Pruning).
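As an illustration of a rule-based equivalent conversion, the following Python sketch applies constant folding to a tiny expression tree. The node classes `Lit`, `Col`, and `BinOp` are hypothetical stand-ins for syntax tree nodes and are not the classes of any particular engine.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr):
    """Bottom-up rewrite: replace an operator over two literals with one literal."""
    if isinstance(expr, BinOp):
        left, right = fold_constants(expr.left), fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr  # literals and column references are left unchanged
```

Folding `age + (2 * 3)` yields `age + 6`: the rewritten tree is equivalent to the original, but the multiplication no longer has to be evaluated for every row.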

After that, a plurality of physical plans executable by the configured data processing engine are generated from the optimized syntax tree, the cost of each physical plan is obtained, and the target physical plan with the minimum cost is selected. Even after the logical execution plan has been optimized, it still cannot actually be executed: it is only logically feasible, and Spark does not yet know how to execute it. For example, join is an abstract concept meaning that two tables are merged on the same id, but the logical execution plan does not specify how the merge is implemented. The logical execution plan therefore has to be converted into a physical execution plan, that is, a plan that Spark can actually execute. For the join operator, for example, Spark provides different algorithm strategies for different scenarios, such as BroadcastHashJoin, ShuffleHashJoin, and SortMergeJoin. In practice, SparkPlanner converts the optimized logical plan and generates a plurality of executable physical plans.
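The selection of the minimum-cost physical plan can be sketched in Python as follows. The cost formulas and the `broadcast_limit` threshold are made-up values chosen only to make the example concrete; they are not Spark's actual cost model, and the names `PhysicalPlan`, `candidate_join_plans`, and `select_target_plan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPlan:
    strategy: str
    cost: float


def candidate_join_plans(left_rows, right_rows, broadcast_limit=10_000):
    """List candidate join plans with made-up costs (illustrative only)."""
    small, large = sorted((left_rows, right_rows))
    # Sort-merge join is always a candidate in this toy model.
    plans = [PhysicalPlan("SortMergeJoin", 2.0 * (left_rows + right_rows))]
    if small <= broadcast_limit:
        # Broadcasting is only a candidate when the smaller side is small enough.
        plans.append(PhysicalPlan("BroadcastHashJoin", float(large + small)))
    return plans


def select_target_plan(plans):
    """Pick the physical plan with the minimum cost."""
    return min(plans, key=lambda plan: plan.cost)
```

Under these assumed costs, joining a large table with a small one selects the broadcast strategy, while joining two large tables leaves only the sort-merge candidate.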

Finally, the operation command is executed in the form of a distributed data set according to the target physical plan to obtain the operation command result. Specifically, Java bytecode is generated from the optimal physical execution plan, the SQL statement is converted into a DAG (directed acyclic graph), and the operations are carried out on distributed data sets to obtain the operation command result.

For the first transmission unit 130, when the target address is the link address of the database, a communication path to the database is established through the target address. The database returns information indicating its own type through the communication path, and this information confirms that the target address maps to the database. Specifically, the database may be an Oracle database. Although this embodiment uses an Oracle database as an example, the type of database is not limited thereto; in other embodiments, the database may be of another type, such as a MySQL database.

After the connection to the database is established through its link address, the target table is looked up in the database, and the operation command result corresponding to the financial business data is written into the target table to obtain a table file. Specifically, the user inputs the table name of the target table, for example table1; the data batch download tool receives the table name and searches the database for it. When the database contains a table with that name, that table is determined to be the target table, and the operation command result is written into it to obtain the table file.

When the target table does not exist in the database, a new target table is established: the fields and value types of the operation command result corresponding to the financial business data are extracted; a preset table name is acquired; and the new target table is created so that its fields correspond to the fields of the operation command result, its value types correspond to the value types of the operation command result, and its name is the preset table name.

For example, if the field of the extracted operation command result is a string type, the value type is CHAR, and the preset table name input by the user is table1, then a new target table is built according to the field and the value type of the operation command result and the preset table name, so that the field of the new target table is a string type, the value type of the new target table is CHAR, and the name of the new target table is table1.
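Creating a new target table from the fields and value types of the operation command result can be sketched in Python. SQLite, through the standard-library `sqlite3` module, is used here only as a stand-in so that the sketch is runnable; it is not the Oracle database of the embodiment. The type mapping and the function name `create_and_fill` are assumptions.

```python
import sqlite3

# Assumed mapping from Python value types to column types (SQLite names).
TYPE_MAP = {int: "INTEGER", float: "REAL", str: "TEXT"}


def create_and_fill(conn, table, fields, rows):
    """Create `table` from the result's fields and value types, then insert rows."""
    if not rows:
        raise ValueError("cannot infer value types from an empty result")
    # Identifiers cannot be bound as parameters, so validate them before use.
    if not table.isidentifier() or not all(f.isidentifier() for f in fields):
        raise ValueError("table and field names must be plain identifiers")
    columns = ", ".join(
        f"{name} {TYPE_MAP.get(type(value), 'TEXT')}"
        for name, value in zip(fields, rows[0])
    )
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
```

With `conn = sqlite3.connect(":memory:")`, calling `create_and_fill(conn, "table1", ["policy_no", "premium"], rows)` creates table1 if it does not yet exist and writes the rows into it; the column names are placeholders.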

For the second transmission unit 140, when the target address maps to the link address of the file transfer server, a communication path to the file transfer server is established through the target address. The file transfer server returns information indicating its own type through the communication path, and this information confirms that the target address maps to the file transfer server. The second transmission unit 140 acquires the segmentation dimension and the file type, segments the operation command result corresponding to the financial business data into subfiles according to the segmentation dimension, generates a target file of the corresponding file type from the subfiles, and uploads the target file to the file transfer server according to the link address of the file transfer server.

Specifically, the file transfer server is an SFTP server, which is an SFTP network protocol-based server that encrypts via the SSH (Secure Shell) protocol, providing secure access and transfer of files.

When the segmentation dimension input by the user is empty, a preset file name is acquired; a new file is created with the preset file name as its name; and the operation command result corresponding to the financial business data is written into the new file to obtain the subfile. The user inputs the preset file name through the input device, and the data batch download tool receives the preset file name.

A new file is created. Because the user has not input a segmentation dimension, the default segmentation dimension is 1, so only one new file is created. The preset file name input by the user is used as the name of the new file, so that the file can be looked up later.

All operation command results are written into the new file to obtain the subfile. Since only one new file is created, all operation command results need to be written into it.

The target file is divided into at least one data block; the data blocks are stored in a distributed manner on the nodes of the cluster of the distributed file system; and a unique identifier is created for each data block, with the mapping relationship between the unique identifier and the data block being recorded.

An encrypted transmission channel connected to the file transfer server is constructed; transmission parameters are acquired; the target file is encrypted to obtain an encrypted file; and a transmission tool based on a secure file transfer protocol is called to upload the target file to the file transfer server through the encrypted transmission channel according to the transmission parameters.

The embodiment of the application provides an electronic device. Referring to fig. 8, the electronic device includes a memory 220, a processor 210, a program stored on the memory 220 and executable on the processor 210, and a data bus 230 for enabling connection and communication between the processor 210 and the memory 220. When executed by the processor 210, the program implements the data processing method described above.

The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.

Generally, regarding the hardware structure of the electronic device, the processor 210 may be implemented as a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used for executing related programs to implement the technical solutions provided in the embodiments of the present application.

The memory 220 may be implemented in the form of read-only memory (Read-Only Memory, ROM), static storage, dynamic storage, or random access memory (Random Access Memory, RAM). The memory 220 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present application are implemented by software or firmware, the relevant program code is stored in the memory 220 and is invoked by the processor 210 to execute the data processing method of the embodiments of the present application.

The input/output interface is used for realizing information input and output.

The communication interface is used for realizing communication interaction between the device and other devices, and can realize communication in a wired mode (such as USB, network cable and the like) or in a wireless mode (such as mobile network, WIFI, bluetooth and the like).

Bus 230 conveys information between various components of the device (e.g., processor 210, memory 220, input/output interfaces, and communication interfaces). The processor 210, memory 220, input/output interfaces, and communication interfaces enable communication connections to each other within the device via bus 230.

Embodiments of the present application provide a computer-readable storage medium. The computer-readable storage medium stores computer-executable instructions for causing a computer to perform the data processing method as described above.

Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

In the foregoing description of the present specification, descriptions of the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like, are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied, in essence or in whole or in part, in the form of a software product. The software product is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or any other medium capable of storing a program.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be electrical, mechanical, or in another form.

While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.

While the preferred embodiments of the present application have been described in detail, the present application is not limited to the embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims

1. A method of data processing, comprising:

acquiring an operation command and a target address;

2. The data processing method of claim 1, wherein the task of the configuration data processing engine executing the operation command to obtain an operation command result comprises:

3. The data processing method according to claim 1, wherein the writing the operation command result to the target table includes:

when the target table does not exist in the database;

establishing a new target table;

and writing the operation command result into a new target table.

4. A data processing method according to claim 3, wherein said creating a new destination table comprises:

extracting a field and a value type of the operation command result;

acquiring a preset table name;

5. The data processing method of claim 1, wherein when the partition dimension is empty; the step of dividing the operation command result into subfiles according to the dividing dimension comprises the following steps:

acquiring a preset file name;

writing the operation command result into the new file to obtain a subfile.

6. The data processing method according to claim 1, wherein after generating the target file of the corresponding file type from the subfiles, the data processing method further comprises:

dividing the target file into at least one data block;

7. The data processing method according to claim 1, wherein the uploading the target file to the file transfer server includes:

acquiring transmission parameters;

encrypting the target file to obtain an encrypted file;

8. A data processing apparatus, comprising:

the input unit is used for acquiring an operation command and a target address;

9. An electronic device comprising a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for enabling a connection communication between the processor and the memory, the program when executed by the processor implementing the data processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the data processing method according to any one of claims 1 to 7.