CN110677402A - Data integration method and device based on intelligent network card - Google Patents

Data integration method and device based on intelligent network card

Info

Publication number
CN110677402A
CN110677402A (application CN201910904415.4A)
Authority
CN
China
Prior art keywords
data
network card
intelligent network
node
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910904415.4A
Other languages
Chinese (zh)
Other versions
CN110677402B (en)
Inventor
郑琳琳
刘畅
郑文琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201910904415.4A priority Critical patent/CN110677402B/en
Publication of CN110677402A publication Critical patent/CN110677402A/en
Application granted granted Critical
Publication of CN110677402B publication Critical patent/CN110677402B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/50 - Network services
    • H04L67/56 - Provisioning of proxy services
    • H04L67/565 - Conversion or adaptation of application format or content
    • H04L67/5651 - Reducing the amount or size of exchanged application data
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 - Protocols for data compression, e.g. ROHC

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a data integration method and device based on an intelligent network card. The method comprises the following steps: a first intelligent network card acquires data to be transmitted and a target compression ratio from a first node; the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to a second intelligent network card, so that the second intelligent network card decompresses the compressed data to obtain target data; the second intelligent network card sends the target data to a second node, so that the second node integrates the target data; the first node and the second node are two nodes in a big data compute engine platform. The method reduces the CPU burden, saves network resources and lowers communication overhead, while allowing a user to obtain an optimal data compression ratio according to actual needs.

Description

Data integration method and device based on intelligent network card
Technical Field
The application relates to the field of big data, in particular to a data integration method and device based on an intelligent network card.
Background
Spark (a compute engine) is a distributed big data parallel processing platform based on in-memory computation. It integrates batch processing, real-time stream processing, interactive query and graph computation, avoiding the resource waste of deploying a separate cluster for each kind of workload.
MapReduce (a programming and computing model) is a computing model, framework and platform oriented to parallel processing of big data, serving as a cluster-based high-performance parallel computing platform. Integrating Spark with MapReduce yields a more capable big data processing platform: Spark can reuse the cluster management and underlying storage of the existing framework, MapReduce handles data types such as log files and static batch jobs effectively, and other processing tasks are assigned to Spark. The MapReduce process on the integrated Spark platform involves transmitting a large amount of data. The Spark platform comprises a plurality of worker nodes that process the tasks distributed by the platform in parallel, and data transmission during the MapReduce process occupies network resources and makes data synchronization between Spark worker nodes time-consuming. In some technical schemes, the data transmitted between nodes is compressed before transmission and decompressed after reaching the destination node, which saves network resources and reduces communication overhead. However, the compression and decompression operations themselves must be completed by the CPU, occupying a large amount of CPU time at a very high cost. Freeing CPU resources and finding another way to process the data transmitted in the Spark platform is therefore an urgent problem.
Disclosure of Invention
The invention provides a data integration method and device based on an intelligent network card, applied to a big data compute engine platform to compress the data transmitted during the MapReduce process, thereby reducing the CPU burden, saving network resources and lowering communication overhead.
In a first aspect, an embodiment of the present application provides a data integration method based on an intelligent network card, where the method includes:
the first intelligent network card acquires data to be transmitted and a target compression ratio from a first node;
the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to a second intelligent network card so that the second intelligent network card decompresses the compressed data to obtain target data; and the second intelligent network card sends the target data to a second node so that the second node integrates the target data, wherein the first node and the second node are two nodes in a big data computing engine platform.
Optionally, the compressing, by the first intelligent network card, the data to be transmitted according to the target compression ratio to obtain compressed data, including:
determining the preamble data and the subsequent data in the data to be transmitted according to the target compression ratio;
determining a first function rule according to the preamble data, and taking the coefficient values of the first function rule as the compressed data, wherein the first function rule reflects the association between the preamble data and the subsequent data;
the first intelligent network card sending the compressed data to the second intelligent network card includes:
sending the preamble data, the first function rule and the compressed data to the second intelligent network card, so that the second intelligent network card determines predicted values of the subsequent data according to the preamble data, the first function rule and the compressed data, wherein the predicted values of the subsequent data and the preamble data form the target data.
Optionally, the first function rule satisfies the following formula:
P(x) = Σ_{i=0}^{n-1} p_i x^i
wherein x is the index of a data item, n is the number of preamble data, p_i is the i-th polynomial coefficient, and P(x) is the value of the data item with index x; the coefficient set P = {p_i} is the compressed data.
Optionally, determining preamble data in the data to be transmitted according to the target compression ratio includes:
determining the number n of preamble data according to the following formula:
n=(1-m)*K;
wherein m is the target compression ratio, and K is the number of data in the data to be transmitted.
Optionally, before sending the compressed data to the second intelligent network card, the method further includes:
calculating the compression loss rate of the data to be transmitted;
outputting prompt information based on the compression loss rate, wherein the prompt information is used for asking a user whether the compression loss rate is acceptable;
sending the compressed data to the second intelligent network card, including:
if a confirmation instruction is received, sending the compressed data to the second intelligent network card.
Optionally, the outputting prompt information based on the compression loss rate further includes:
if a negative instruction is received, prompting the user to reset the target compression ratio, and compressing, by the first intelligent network card, the data to be transmitted based on the reset target compression ratio.
Optionally, calculating a compression loss rate of the data to be transmitted includes:
when x is the index of the j-th item in the subsequent data, determining the predicted value of the j-th subsequent item according to the formula
P(x) = Σ_{i=0}^{n-1} p_i x^i;
calculating the difference e(j) between the predicted value and the true value of the j-th subsequent item, and determining the difference sum Σ e(j);
calculating the compression loss rate according to the formula
E = Σ e(j) / Σ_j D_j;
wherein E is the compression loss rate, Σ e(j) is the difference sum, and D_j is the true value of the j-th subsequent item.
In a second aspect, an embodiment of the present application provides a data integration apparatus based on an intelligent network card, where the apparatus includes:
an acquisition module, configured to acquire data to be transmitted and a target compression ratio;
the processing module is used for compressing the data to be transmitted according to the target compression ratio to obtain compressed data;
the communication module is used for sending the compressed data to a second intelligent network card;
the processing module is further used for decompressing the compressed data to obtain target data;
the communication module is further configured to send the target data to the second node;
the processing module is further used for integrating the target data;
wherein the first node and the second node are two nodes in a big data compute engine platform.
In a third aspect, an embodiment of the present application provides an intelligent network card, including:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing one or more steps of any of the above methods according to the obtained program instructions.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to perform one or more steps of the intelligent network card-based data integration method provided in the first aspect.
In a fifth aspect, an embodiment of the present application provides a program product, where the program product includes program instructions, and when the program instructions are executed by a computer, the computer executes one or more steps of the data integration method based on the intelligent network card as provided in the first aspect.
The beneficial effects of this application are as follows:
in the technical scheme of the embodiment of the application, a data integration method based on an intelligent network card is provided, and the method comprises the following steps: the first intelligent network card acquires data to be transmitted and a target compression ratio from a first node; the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to a second intelligent network card so that the second intelligent network card decompresses the compressed data to obtain target data; the second intelligent network card sends the target data to a second node so that the second node integrates the target data; wherein the first node and the second node are two nodes in a big data compute engine platform. The first node and the second node can be represented by a CPU, and the intelligent network card replaces the CPU to compress data, so that the burden of the CPU is reduced, network resources are saved, and communication overhead is reduced.
Drawings
Fig. 1 is a schematic diagram of a data integration system architecture based on an intelligent network card according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the MapReduce process;
fig. 3 is a schematic flowchart of a data integration method based on an intelligent network card according to an embodiment of the present application;
fig. 4 is a graph of polynomial fitting of the data to be transmitted numbered "D0-D9" according to an embodiment of the present application;
FIG. 5 is a schematic flowchart illustrating a process of obtaining an optimal compression ratio by a user according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a data integration device based on an intelligent network card according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. The shapes and sizes of the various elements in the drawings are not to be considered as true proportions, but rather are merely intended to illustrate the context of the application.
The following describes a data integration system based on an intelligent network card according to an embodiment of the present application. The system is applicable to a big data compute engine platform and is used for compressing and decompressing the data transmitted inside the platform. Referring to fig. 1, fig. 1 is a schematic diagram of a data integration system architecture based on an intelligent network card according to an embodiment of the present disclosure. Fig. 1 takes a first node and a second node of a big data compute engine platform as an example, and one node may be represented by one CPU. The first intelligent network card and the second intelligent network card each include an FPGA chip, a network port module and a PCIe interface: the FPGA chip instantiates customized compression and decompression modules; the network port module lets the first intelligent network card and the second intelligent network card communicate over a network link; and the PCIe interface is used for communication between an intelligent network card and its CPU. It should be understood that the first intelligent network card and the second intelligent network card have the same architecture and the same functions, and both can perform the same operations on data in the big data compute engine platform; as shown in fig. 1, the embodiment of the present application only elaborates the data flow of the data to be transmitted from the first intelligent network card to the second intelligent network card.
When the big data compute engine platform is a Spark platform, take its MapReduce process as an example. The MapReduce process is divided into three stages: map, shuffle and reduce. Before the shuffle, that is, in the map stage, MapReduce performs a split operation on the data to be processed and allocates a MapTask to each split. The map() function then processes each row of data in each split to obtain a key-value pair (key, value), where key is an offset and value is the content of a row; the resulting key-value pairs are also called the "intermediate result". The process then enters the shuffle stage, which processes the "intermediate result": the irregularly ordered output of the map end is arranged according to a specified rule into data with a fixed order, so that the reduce end can receive and process it; after the reduce end receives the ordered data, the reduce() function partitions and merges it.
For example, as shown in fig. 2, fig. 2 is a schematic diagram of the MapReduce process. The Spark platform reads an HDFS file; the map end slices the data in the HDFS file into three regions, each containing three types of data; the shuffle process disperses the data of each region, classifies it by data type and transmits it to the reduce end after classification is finished; the reduce end merges the classified data to obtain three regions, each filled with a single type of data.
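To make the map-shuffle-reduce flow of fig. 2 concrete, the following is a minimal Python sketch; the three slices, the data types "A"/"B"/"C" and the payload values are illustrative assumptions, not data from the patent.

from collections import defaultdict

def map_phase(slices):
    # map(): emit (key, value) pairs; here the key is the data type and the
    # value is the record payload, mimicking the three-type slices of fig. 2.
    for split in slices:
        for record in split:
            yield (record["type"], record["payload"])

def shuffle_phase(pairs):
    # shuffle: group the irregular map output by key, so the reduce end
    # receives data arranged according to a fixed rule (the data type).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce(): merge each group so each region holds a single type of data.
    return {key: sorted(values) for key, values in groups.items()}

slices = [
    [{"type": "A", "payload": 1}, {"type": "B", "payload": 2}, {"type": "C", "payload": 3}],
    [{"type": "B", "payload": 4}, {"type": "C", "payload": 5}, {"type": "A", "payload": 6}],
    [{"type": "C", "payload": 7}, {"type": "A", "payload": 8}, {"type": "B", "payload": 9}],
]
print(reduce_phase(shuffle_phase(map_phase(slices))))
# {'A': [1, 6, 8], 'B': [2, 4, 9], 'C': [3, 5, 7]}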
The Spark platform comprises a plurality of worker nodes, hereinafter referred to simply as "nodes". Taking the first node and the second node as an example, one possible case is that the first node is connected with a first intelligent network card and the second node is connected with a second intelligent network card. After the Spark platform acquires the HDFS file, the first node sends the data to be transmitted in the HDFS file to the first intelligent network card; the first intelligent network card acquires the target compression ratio set by the user and compresses the data to be transmitted according to it to obtain compressed data; the first intelligent network card sends the compressed data to the second intelligent network card, and the second intelligent network card decompresses the compressed data to obtain target data; the second intelligent network card then sends the target data to the second node so that the second node integrates and processes the target data, where the second node's processing of the target data can be the map processing followed by the shuffle and reduce processing.
Another possible case, again taking a first node and a second node in the Spark platform as an example, with the first node connected to a first intelligent network card and the second node connected to a second intelligent network card: after the Spark platform acquires the HDFS file, the first node processes the data to be transmitted in the HDFS file, performing the map processing and shuffle processing of the MapReduce process; the first node then sends the data to be transmitted to the first intelligent network card, and the first intelligent network card compresses it according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to the second intelligent network card, and the second intelligent network card decompresses the compressed data to obtain target data; the second intelligent network card sends the target data to the second node so that the second node integrates and processes it. In this case the second node's integration processing may be the reduce processing of the MapReduce process, that is, classifying and merging the target data.
Referring to fig. 3, fig. 3 is a schematic flow chart of a data integration method based on an intelligent network card according to an embodiment of the present disclosure. The method may be applied to a Spark platform; it may process the data transmitted during the MapReduce process in the Spark platform, and may also process other data transmitted between nodes in the Spark platform, which is not specifically limited in this embodiment. The Spark platform comprises a first node and a second node; the first node is connected with a first intelligent network card, and the second node is connected with a second intelligent network card. The method comprises the following steps:
s301: the first intelligent network card acquires data to be transmitted and a target compression ratio;
As a new technology, the intelligent network card was originally designed to support various virtualization functions at a much lower cost than a general-purpose CPU and to assist the CPU in handling network load. It has a programmable network interface, usually comprising multiple ports and an internal switch, forwards data at a higher speed than the CPU, intelligently maps data to the related applications based on network packets, application sockets and the like, and detects and manages network traffic. In addition, as the first gateway through which data flows enter and exit, the network card can also implement monitoring and sniffing, defending against network attacks and achieving security isolation.
At present, the mainstream intelligent network card architectures differ and can be roughly classified into three types: Application Specific Integrated Circuit (ASIC) network cards, Field Programmable Gate Array (FPGA) network cards, and System on Chip (SoC) network cards. ASIC-based intelligent network cards (such as the Mellanox ConnectX-5 series) are inexpensive and perform excellently; they generally have a programmable interface, but because the processing logic is hard-wired in the ASIC, the room for flexible control is smaller. FPGA-based intelligent network cards (such as the Napatech NT100E3-1-PTP series) are more flexible by comparison, but cost slightly more and are harder to program. The SoC architecture contains a dedicated CPU (e.g., the Mellanox BlueField series) and offers a balance of performance and controllability; vendors' self-developed network cards typically use this architecture.
Therefore, the embodiment of the application adopts an FPGA-based intelligent network card, and the compression and decompression modules instantiated and customized in the FPGA chip replace the CPU in compressing and decompressing the data transmitted during the MapReduce process in the Spark platform.
Example 1: after the Spark platform acquires an HDFS (distributed file system) file, the first intelligent network card directly acquires the data to be transmitted in the first node and the target compression ratio set by the user through the PCIe interface.
Example 2: after the Spark platform acquires an HDFS file, the first node performs the map processing and shuffle processing on the data to be transmitted in the HDFS file, and the first intelligent network card acquires the processed data to be transmitted and the target compression ratio set by the user through the PCIe interface.
S302: the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data. Optionally, this includes: the first intelligent network card determines the preamble data and the subsequent data in the data to be transmitted according to the target compression ratio; the first intelligent network card determines a first function rule according to the preamble data and takes the coefficient values of the first function rule as the compressed data, wherein the first function rule reflects the association between the preamble data and the subsequent data;
the first intelligent network card sends the preamble data, the first function rule and the compressed data to the second intelligent network card; the second intelligent network card determines predicted values of the subsequent data according to the preamble data, the first function rule and the compressed data, and the predicted values of the subsequent data together with the preamble data form the target data.
For example, assume that there are 10 items of data to be transmitted and the target compression ratio input by the user is 40%. The first intelligent network card divides the data to be transmitted into 6 items of preamble data and 4 items of subsequent data, and performs curve fitting on the 6 preamble items to obtain a function rule y = a + bx; the polynomial coefficient set {a, b} is taken as the compressed data. It sends the 6 preamble items, the compressed data and the function rule y = a + bx to the second intelligent network card; the second intelligent network card predicts the 4 subsequent items using y = a + bx to obtain a predicted value for each, and the 4 predicted values together with the 6 preamble items form the target data, i.e., the decompressed data.
Optionally, the first function rule satisfies the following formula:
P(x) = Σ_{i=0}^{n-1} p_i x^i
wherein x is the index of a data item, n is the number of preamble data, p_i is the i-th polynomial coefficient, and P(x) is the value of the data item with index x; the coefficient set P = {p_i} is the compressed data.
Optionally, the determining, by the first intelligent network card, the preamble data in the data to be transmitted according to the target compression ratio includes: determining the number n of preamble data according to the following formula:
n=(1-m)*K;
wherein m is the target compression ratio, and K is the number of data in the data to be transmitted.
For example, assuming that there are 10 pieces of data to be transmitted, the target compression ratio m is 60%, that is, 60% of the data to be transmitted is ignored during the compression process, and the ignored data is subsequent data, then the number n of the preamble data is 4.
It should be understood that the lower the target compression ratio set by the user, the larger the number n of preamble data and the smaller the compression error; the higher the target compression ratio, the smaller n and the larger the compression error.
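To make S302 concrete, the following Python sketch splits the data to be transmitted by n = (1 - m) * K and fits the preamble with a least-squares polynomial; np.polyfit is only a stand-in for the fitting algorithm instantiated in the FPGA, and the sample values are invented for illustration.

import numpy as np

def compress(data, m):
    # Split the data by the target compression ratio m: the first
    # n = (1 - m) * K items are the preamble data, the rest are subsequent.
    K = len(data)
    n = int((1 - m) * K)
    preamble = np.asarray(data[:n], dtype=float)
    x = np.arange(n)                       # index of each preamble item
    # Degree n - 1 fits the n preamble points exactly; the coefficient
    # array plays the role of the set P = {p_i} (highest degree first).
    coeffs = np.polyfit(x, preamble, deg=n - 1)
    return preamble, coeffs

data = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1, 7.0, 7.9, 9.1, 10.0]
preamble, coeffs = compress(data, m=0.4)   # 6 preamble items, 4 subsequent
print(len(preamble), len(coeffs))          # 6 6

Only the preamble and the coefficient set cross the wire; the subsequent items themselves are never transmitted and are reconstructed at the receiver.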
Optionally, before sending the compressed data to the second intelligent network card, the method further includes: the first intelligent network card calculates the compression loss rate of the data to be transmitted; the first node outputs prompt information based on the compression loss rate, wherein the prompt information is used for asking the user whether the compression loss rate is acceptable;
the first intelligent network card sending the compressed data to the second intelligent network card includes: if the first node receives a confirmation instruction, sending the compressed data to the second intelligent network card.
Optionally, the first node outputs prompt information based on the compression loss rate, and further includes:
If the first node receives a negative instruction, the user is prompted to reset the target compression ratio, and the first intelligent network card compresses the data to be transmitted based on the reset target compression ratio.
Optionally, the first intelligent network card calculating the compression loss rate of the data to be transmitted includes:
when x is the index of the j-th item in the subsequent data, the first intelligent network card determines the predicted value of the j-th subsequent item according to the formula
P(x) = Σ_{i=0}^{n-1} p_i x^i;
the first intelligent network card calculates the difference e(j) between the predicted value and the true value of the j-th subsequent item, and determines the difference sum Σ e(j);
the first intelligent network card calculates the compression loss rate according to the formula
E = Σ e(j) / Σ_j D_j;
wherein E is the compression loss rate, Σ e(j) is the difference sum, and D_j is the true value of the j-th subsequent item.
For example, take the first node and the second node in the Spark platform, and assume there are 10 items in the data to be transmitted and the target compression ratio input by the user the first time is 60%; the 10 items are numbered "D0-D9" in sequence. According to this target compression ratio, the 10 items are divided into 4 preamble items numbered D0-D3 and 6 subsequent items numbered D4-D9, and polynomial fitting is performed on the preamble items numbered D0-D3.
Referring to fig. 4, fig. 4 is a polynomial fitting curve for the data numbered "D0-D9" provided by the embodiment of the present application. The first half of the curve is fitted to the items numbered D0-D3 to obtain the first function rule, which satisfies the formula
P(x) = Σ_{i=0}^{n-1} p_i x^i
wherein x is the index of a data item, n is the number of preamble items, and p_i is a polynomial coefficient of the first function rule; the polynomial coefficient set P = {p_i} of the first function rule is preserved as the compressed data.
Before the first intelligent network card sends the compressed data to the second intelligent network card, it predicts the subsequent data according to the first function rule to obtain their predicted values. That is, when x is the index of the j-th subsequent item, the formula
P(x) = Σ_{i=0}^{n-1} p_i x^i
is used to predict the subsequent items numbered D4-D9 and obtain their predicted values, which form the second half of the curve. As shown in fig. 4, the circles on the second half of the curve are the predicted values of the subsequent items and the small squares are their true values; from these, the difference between the true value and the predicted value of each subsequent item can be calculated, and the differences summed. The first intelligent network card then calculates the compression loss rate of the 10 items, where the loss comes mainly from the subsequent data. Specifically: the first intelligent network card calculates the differences between the predicted values and the true values of the items numbered D4-D9, and the six differences are added to obtain the error sum; the first intelligent network card then calculates the compression loss rate of the 10 items according to the formula
E = Σ e(j) / Σ_j D_j
wherein E is the compression loss rate, Σ e(j) is the difference sum, and D_j is the true value of the j-th subsequent item.
After obtaining the compression loss rate of the 10 items, the first node may further generate prompt information from the compression loss rate, for example a dialog box: "Accept this compression loss rate? If yes, proceed to the next step; if no, reset the target compression ratio." The user gives a confirmation instruction or a negative instruction according to the prompt. If the first intelligent network card receives a confirmation instruction passed on from the first node, it sends the compressed data to the second intelligent network card; if it receives a negative instruction, it outputs information through the first node prompting the user to reset the target compression ratio, compresses the data to be transmitted based on the reset ratio, and repeats the above steps until a loss rate the user accepts is reached; the finally accepted target compression ratio is taken as the optimal compression ratio. The detailed flow is shown in fig. 5, a flow chart of a user obtaining the optimal compression ratio according to an embodiment of the present application: the user may set the target compression ratio multiple times according to whether the compression loss rate is acceptable, until it is accepted. A code sketch of this loop follows.
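The sketch below covers the loss check and the fig. 5 prompt loop; the denominator of loss_rate follows the reconstruction E = Σ e(j) / Σ_j D_j used above, and the ask_user object (with next_ratio() and accept()) is a hypothetical stand-in for the dialog box, not an interface defined by the patent.

import numpy as np

def loss_rate(data, n, coeffs):
    # E = (sum of |predicted - true| over the subsequent data) / (sum of D_j)
    x = np.arange(n, len(data))               # indices of the subsequent data
    predicted = np.polyval(coeffs, x)         # P(x) for each subsequent item
    true = np.asarray(data[n:], dtype=float)
    e_sum = np.abs(predicted - true).sum()    # the difference sum, sum of e(j)
    return e_sum / true.sum()

def choose_ratio(data, ask_user):
    # Repeat compression with user-supplied ratios until the resulting loss
    # rate is accepted; the last ratio is the optimal compression ratio.
    data = np.asarray(data, dtype=float)
    while True:
        m = ask_user.next_ratio()             # user (re)sets the target ratio
        n = int((1 - m) * len(data))
        coeffs = np.polyfit(np.arange(n), data[:n], deg=n - 1)
        if ask_user.accept(loss_rate(data, n, coeffs)):
            return m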
It should be understood that different users need to process different types of data and tolerate different compression loss rates. Therefore, in the data integration method based on the intelligent network card provided in the embodiment of the present application, a user may select the target compression ratio according to actual needs and thereby reach an acceptable compression loss rate; after the user's final target compression ratio is determined, that is, after the confirmation instruction is received, the first intelligent network card compresses the data to be transmitted according to that target compression ratio in the actual operating environment.
S303: the first intelligent network card sends the compressed data to the second intelligent network card;
As shown in fig. 1, the network port module of the first intelligent network card communicates with the network port module of the second intelligent network card over a network link; the first intelligent network card sends the compressed data to the second intelligent network card through the network port module, and at the same time sends the preamble data of the data to be transmitted to the second intelligent network card over the network link.
S304: the second intelligent network card decompresses the compressed data to obtain target data;
Illustratively, after the second intelligent network card receives the compressed data and the preamble data of the data to be transmitted, the decompression module in its FPGA chip decompresses the compressed data according to the built-in algorithm and the preamble data to obtain the target data.
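A minimal sketch of the decompression side (S304), under the same assumptions as the compression sketch above: the second intelligent network card evaluates the received polynomial rule at the indices of the subsequent data and concatenates the result with the preamble data.

import numpy as np

def decompress(preamble, coeffs, K):
    # Rebuild all K items: the preamble is kept as-is, and the subsequent
    # items are replaced by their predicted values P(x).
    n = len(preamble)
    x = np.arange(n, K)                       # indices of the subsequent data
    predicted = np.polyval(coeffs, x)         # predicted subsequent values
    return np.concatenate([preamble, predicted])   # the target data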
S305: the second intelligent network card sends the target data to the second node;
For example, as shown in fig. 1, the second intelligent network card sends the target data to the second node through the PCIe interface.
S306: and the second node integrates the target data.
For example, the second node may be a CPU, and the CPU performs a series of processing on the target data obtained after decompression according to the Spark platform's built-in algorithms; the processing may be the map processing of the MapReduce process, the intermediate shuffle processing, or the reduce processing, which is not specifically limited in the embodiments of the present application.
A complete embodiment is presented below.
Taking two nodes in Spark as an example: after the Spark platform acquires an HDFS file, the first intelligent network card directly acquires the data to be transmitted in the first node and the target compression ratio set by the user through the PCIe interface. Assume the data to be transmitted has 2000 items, numbered "D0-D1999" in sequence, and the target compression ratio input by the user the first time is 40%. The first intelligent network card divides the data to be transmitted into preamble data and subsequent data: by n = (1 - m) * K there are 1200 preamble items, numbered "D0-D1199", and 800 subsequent items, numbered "D1200-D1999". Curve fitting is performed on the 1200 preamble items to obtain the function rule
P(x) = Σ_{i=0}^{1199} p_i x^i
and the polynomial coefficient set {p_0, ..., p_1199} is taken as the compressed data;
Before the first intelligent network card sends the compressed data to the second intelligent network card, it predicts the subsequent data according to the first function rule to obtain their predicted values. That is, when x is the index of the j-th subsequent item, the formula
P(x) = Σ_{i=0}^{1199} p_i x^i
is used to predict the subsequent items numbered "D1200-D1999" and obtain their predicted values; the difference between the true value and the predicted value of each subsequent item is calculated, and the differences are summed. The first intelligent network card then calculates the compression loss rate of the 2000 items, where the loss comes mainly from the subsequent data. Specifically: the first intelligent network card calculates the differences between the predicted values and the true values of the items numbered "D1200-D1999", and the 800 differences are added to obtain the error sum; the first intelligent network card then calculates the compression loss rate of the 2000 items according to the formula
E = Σ e(j) / Σ_j D_j
wherein E is the compression loss rate, Σ e(j) is the difference sum, and D_j is the true value of the j-th subsequent item.
After obtaining the compression loss rate of the 2000 items, the first node may generate prompt information from it, for example a dialog box: "Accept this compression loss rate? If yes, proceed to the next step; if no, reset the target compression ratio." The user gives a confirmation instruction or a negative instruction according to the prompt.
If the first intelligent network card receives a confirmation instruction from the first node, it sends the compressed data to the second intelligent network card; if it receives a negative instruction, it prompts the user through the first node to reset the target compression ratio, compresses the data to be transmitted based on the reset ratio, and repeats the above steps until a loss rate the user accepts is reached; the finally accepted target compression ratio is taken as the optimal compression ratio.
The first intelligent network card sends the compressed data, the preamble data numbered "D0-D1199" and the first function rule
P(x) = Σ_{i=0}^{1199} p_i x^i
to the second intelligent network card, and the second intelligent network card decompresses the compressed data. Specifically: the 1200 preamble items, the polynomial coefficient set {p_0, ..., p_1199} and the function rule are sent to the second intelligent network card; the second intelligent network card uses the formula
P(x) = Σ_{i=0}^{1199} p_i x^i
to predict the 800 subsequent items and obtain a predicted value for each; the 800 predicted values together with the 1200 preamble items form the target data, that is, the decompressed data.
The second intelligent network card sends the target data to the second node through the PCIe interface, and the second node performs reduce processing or map processing on the target data.
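Tying the sketches together, a round trip over the compress, decompress and loss_rate functions defined earlier mirrors this embodiment at a smaller, invented scale:

data = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1, 7.0, 7.9, 9.1, 10.0]
preamble, coeffs = compress(data, m=0.4)            # 6 preamble, 4 subsequent
target = decompress(preamble, coeffs, K=len(data))  # the decompressed target data
print(target.round(2))                              # preamble kept, rest predicted
print(loss_rate(data, len(preamble), coeffs))       # E, checked against the user's tolerance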
It should be noted that the time the intelligent network card spends compressing and decompressing the transmitted data in the MapReduce process of the Spark platform is less than the transmission time saved by compression, so compressing the data not only frees CPU resources but also does not add to the CPU's data processing time.
Based on the same inventive concept, an embodiment of the invention provides a data integration apparatus based on an intelligent network card, applied to a Spark platform. Taking a first node and a second node of the Spark platform as an example, the first node is connected with a first intelligent network card and the second node is connected with a second intelligent network card. Referring to fig. 6, fig. 6 is a schematic structural diagram of the data integration apparatus based on an intelligent network card according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes an obtaining module 601, a processing module 602, and a communication module 603.
An obtaining module 601, configured to obtain data to be transmitted and a target compression ratio;
a processing module 602, configured to compress the data to be transmitted according to the target compression ratio to obtain compressed data;
the communication module 603 is configured to send the compressed data to a second intelligent network card;
the processing module 602 is further configured to decompress the compressed data to obtain target data;
the communication module 603 is further configured to send the target data to the second node;
the processing module 602 is further configured to perform integration processing on the target data;
wherein the first node and the second node are two nodes in a Spark platform.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A data integration method based on an intelligent network card is characterized by comprising the following steps:
the first intelligent network card acquires data to be transmitted and a target compression ratio from a first node;
the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to a second intelligent network card so that the second intelligent network card decompresses the compressed data to obtain target data; and the second intelligent network card sends the target data to a second node so that the second node integrates the target data, wherein the first node and the second node are two nodes in a big data computing engine platform.
2. The method of claim 1, wherein the first intelligent network card compressing the data to be transmitted according to the target compression ratio to obtain compressed data comprises:
determining the preamble data and the subsequent data in the data to be transmitted according to the target compression ratio;
determining a first function rule according to the preamble data, and taking the coefficient values of the first function rule as the compressed data, wherein the first function rule reflects the association between the preamble data and the subsequent data;
the first intelligent network card sending the compressed data to the second intelligent network card comprises:
sending the preamble data, the first function rule and the compressed data to the second intelligent network card, so that the second intelligent network card determines predicted values of the subsequent data according to the preamble data, the first function rule and the compressed data, wherein the predicted values of the subsequent data and the preamble data form the target data.
3. The method of claim 2, wherein the first function rule satisfies the following formula:
P(x) = Σ_{i=0}^{n-1} p_i x^i
wherein x is the index of a data item, n is the number of preamble data, p_i is the i-th polynomial coefficient, and P(x) is the value of the data item with index x; the coefficient set P = {p_i} is the compressed data.
4. The method of claim 2, wherein determining preamble data in the data to be transmitted according to a target compression ratio comprises:
determining the number n of preamble data according to the following formula:
n=(1-m)*K;
wherein m is the target compression ratio, and K is the number of data in the data to be transmitted.
5. The method of claim 3, wherein prior to sending the compressed data to the second smart network card, the method further comprises:
calculating the compression loss rate of the data to be transmitted;
outputting prompt information based on the compression loss rate, wherein the prompt information is used for asking a user whether the compression loss rate is acceptable;
sending the compressed data to the second intelligent network card, including:
if a confirmation instruction is received, sending the compressed data to the second intelligent network card.
6. The method of claim 5, wherein outputting a hint information based on the compression loss rate further comprises:
if a negative instruction is received, prompting a user to reset the target compression ratio, and compressing, by the first intelligent network card, the data to be transmitted based on the reset target compression ratio.
7. The method of claim 5, wherein calculating the compression loss rate of the data to be transmitted comprises:
when x is the index of the j-th item in the subsequent data, determining the predicted value of the j-th subsequent item according to the formula
P(x) = Σ_{i=0}^{n-1} p_i x^i;
calculating the difference e(j) between the predicted value and the true value of the j-th subsequent item, and determining the difference sum Σ e(j);
calculating the compression loss rate according to the formula
E = Σ e(j) / Σ_j D_j;
wherein E is the compression loss rate, Σ e(j) is the difference sum, and D_j is the true value of the j-th subsequent item.
8. A data integration apparatus based on an intelligent network card, applied to a first node connected with a first intelligent network card, wherein the apparatus comprises:
an acquisition module, configured to acquire data to be transmitted and a target compression ratio;
the processing module is used for compressing the data to be transmitted according to the target compression ratio to obtain compressed data;
the communication module is used for sending the compressed data to a second intelligent network card;
the processing module is further used for decompressing the compressed data to obtain target data;
the communication module is further configured to send the target data to the second node;
the processing module is further used for integrating the target data;
wherein the first node and the second node are two nodes in a big data compute engine platform.
9. An intelligent network card, comprising: a memory, a processor, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the data integration method based on an intelligent network card according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1-7.
CN201910904415.4A 2019-09-24 2019-09-24 Data integration method and device based on intelligent network card Active CN110677402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910904415.4A CN110677402B (en) 2019-09-24 2019-09-24 Data integration method and device based on intelligent network card

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910904415.4A CN110677402B (en) 2019-09-24 2019-09-24 Data integration method and device based on intelligent network card

Publications (2)

Publication Number Publication Date
CN110677402A (en) 2020-01-10
CN110677402B (en) 2022-12-20

Family

ID=69078617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910904415.4A Active CN110677402B (en) 2019-09-24 2019-09-24 Data integration method and device based on intelligent network card

Country Status (1)

Country Link
CN (1) CN110677402B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020205A (en) * 2012-12-05 2013-04-03 北京普泽天玑数据技术有限公司 Compression and decompression method based on hardware accelerator card on distributive-type file system
CN106713394A (en) * 2015-11-16 2017-05-24 华为技术有限公司 Data transmission method and device
CN110177083A (en) * 2019-04-26 2019-08-27 阿里巴巴集团控股有限公司 A kind of network interface card, data transmission/method of reseptance and equipment

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778320A (en) * 2020-06-09 2021-12-10 华为技术有限公司 Network card and method for processing data by network card
WO2021249059A1 (en) * 2020-06-09 2021-12-16 华为技术有限公司 Network card and method for network card to process data
US12014173B2 (en) 2020-06-09 2024-06-18 Huawei Technologies Co., Ltd. Data processing method for network adapter and network adapter
CN111541789A (en) * 2020-07-08 2020-08-14 支付宝(杭州)信息技术有限公司 Data synchronization method and device based on block chain all-in-one machine
US11336660B2 (en) 2020-07-08 2022-05-17 Alipay (Hangzhou) Information Technology Co., Ltd. Methods and apparatuses for identifying replay transaction based on blockchain integrated station
US11444783B2 (en) 2020-07-08 2022-09-13 Alipay (Hangzhou) Information Technology Co., Ltd. Methods and apparatuses for processing transactions based on blockchain integrated station
US11463553B2 (en) 2020-07-08 2022-10-04 Alipay (Hangzhou) Information Technology Co., Ltd. Methods and apparatuses for identifying to-be-filtered transaction based on blockchain integrated station
US11665234B2 (en) 2020-07-08 2023-05-30 Alipay (Hangzhou) Information Technology Co., Ltd. Methods and apparatuses for synchronizing data based on blockchain integrated station
US11783339B2 (en) 2020-07-08 2023-10-10 Alipay (Hangzhou) Information Technology Co., Ltd. Methods and apparatuses for transferring transaction based on blockchain integrated station
CN112596669A (en) * 2020-11-25 2021-04-02 新华三云计算技术有限公司 Data processing method and device based on distributed storage

Also Published As

Publication number Publication date
CN110677402B (en) 2022-12-20

Similar Documents

Publication Publication Date Title
CN110677402B (en) Data integration method and device based on intelligent network card
US11429852B2 (en) Convolution acceleration and computing processing method and apparatus, electronic device, and storage medium
CN106325967B (en) A kind of hardware-accelerated method, compiler and equipment
CN109993299A (en) Data training method and device, storage medium, electronic device
CN111459665A (en) Distributed edge computing system and distributed edge computing method
CN110308984B (en) Cross-cluster computing system for processing geographically distributed data
CN108012156A (en) A kind of method for processing video frequency and control platform
WO2016114862A1 (en) Graph-based application programming interface architectures with equivalency classes for enhanced image processing parallelism
CN111931917A (en) Forward computing implementation method and device, storage medium and electronic device
CN111694643B (en) Task scheduling execution system and method for graph neural network application
Hu et al. {QZFS}:{QAT} Accelerated Compression in File System for Application Agnostic and Cost Efficient Data Storage
CN107204998B (en) Method and device for processing data
US20130067113A1 (en) Method of optimizing routing in a cluster comprising static communication links and computer program implementing that method
CN114399035A (en) Method for transferring data, direct memory access device and computer system
CN103997648A (en) System and method for achieving decompression of JPEG2000 standard images rapidly based on DSPs
CN110502337B (en) Optimization system for shuffling stage in Hadoop MapReduce
US11734214B2 (en) Semi-programmable and reconfigurable co-accelerator for a deep neural network with normalization or non-linearity
Sierra et al. High-performance decoding of variable-length memory data packets for FPGA stream processing
CN113126958A (en) Decision scheduling customization method and system based on information flow
CN112732634A (en) ARM-FPGA (advanced RISC machine-field programmable gate array) cooperative hardware resource local dynamic reconstruction processing method for edge calculation
CN109240978B (en) FPGA system and equipment for building acceleration platform and acceleration platform
CN113535637B (en) Operation acceleration unit and operation method thereof
CN109308327A (en) Figure calculation method device medium apparatus based on the compatible dot center's model of subgraph model
CN111488216B (en) Data processing method and device and electronic equipment
CN115344526B (en) Hardware acceleration method and device of data flow architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant