CN114756564B - Data processing method, device, equipment and medium for stream computing

Info

Publication number: CN114756564B
Application number: CN202210505210.0A
Authority: CN
Inventors: 易晓博
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2022-05-10
Filing date: 2022-05-10
Publication date: 2024-07-02
Anticipated expiration: 2042-05-10
Also published as: CN114756564A

Abstract

The invention relates to data processing technology and discloses a data processing method for stream computing, which comprises the following steps: storing preset data in partitions and sub-buckets; when it is monitored that a user executes a new operation or a deletion operation on the preset data, generating a corresponding newly added data file or deleted data file; merging the newly added data files and the deleted data files at fixed time, refreshing the preset data according to the merging result, and deleting the newly added data files and deleted data files that have been merged; and when it is monitored that the user executes an updating operation on the preset data, identifying the sub-bucket where the data to be updated corresponding to the updating operation is located, acquiring the data to be updated from that sub-bucket, and updating the data to be updated according to the updating operation. The invention also provides a data processing device, equipment and medium for stream computing. The invention can improve the efficiency of stream computing.

Description

Data processing method, device, equipment and medium for stream computing

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus for streaming computing, an electronic device, and a computer readable storage medium.

Background

Streaming computing is a common computing mode in big data application scenarios. It computes on large-scale streaming data in real time while the data is continuously changing, so that the addition, update and deletion of the related data can be completed in a very short time. Currently, in big data stream computing scenarios, the industry mainly adopts the following two schemes:

The first scheme is read-time merging, which is used, for example, by the non-relational database Apache HBase. In this scheme, user operations such as addition, update and deletion are recorded in an LSM tree (Log-Structured Merge-Tree), and when a user reads data, log merging and data analysis and calculation are performed according to the operation logs recorded in the LSM tree to obtain the latest data. This scheme needs to store a large number of user operation records, which creates a large number of fragmented files in the storage system and puts pressure on data storage.

The second scheme is copy-on-write, i.e. each operation of adding, updating or deleting data takes the data file corresponding to the relevant data out of the storage system, executes the operation on it, and finally writes the updated data file back to the storage system. In this scheme, every data refresh requires data scanning and data reading and writing over a large range, so the system overhead is high and the data refreshing efficiency is reduced.

Disclosure of Invention

The invention provides a data processing method, a data processing device and a computer readable storage medium for streaming computing, which are mainly used for improving the efficiency of streaming computing.

To achieve the above object, the present invention provides a data processing method for streaming computing, including:

Partitioning preset data, and dividing the data in each partition into different sub-buckets for storage;

when monitoring that a user executes a new operation or a deletion operation on the preset data, generating corresponding new data files and deletion data files, and storing the new data files and the deletion data files into corresponding partitions and sub-buckets;

The new data files and the deleted data files in each partition and each sub-bucket are acquired at fixed time to be combined to obtain a combination result, the preset data are refreshed according to the combination result, and the combined new data files and the combined deleted data files are deleted;

When the user is monitored to execute the updating operation on the preset data, identifying the sub-bucket where the data to be updated corresponding to the updating operation is located, acquiring the data to be updated from the sub-bucket where the data to be updated is located, and updating the data to be updated according to the updating operation.

Optionally, partitioning the preset data, and dividing the data in each partition into different sub-buckets for storage, including:

Dividing the preset data into different partitions by using preset partition keys;

dividing the data in each partition into different sub-buckets by using a preset sub-bucket key;

Creating data tables whose number is consistent with the number of sub-buckets;

And storing the data corresponding to each sub-bucket in each partition into the same data table one by one.

Optionally, the generating a corresponding new data file includes:

Acquiring a new operation record corresponding to the new operation;

Analyzing the newly-added operation record to obtain newly-added operation data information, wherein the newly-added operation data information comprises a data table to be operated;

Identifying a table structure of the data table to be operated;

Generating a data file consistent with the table structure of the data table to be operated according to the newly-added operation data information, and taking the data file as a newly-added data file.

Optionally, the storing the new added data file and the deleted data file in the corresponding partitions and buckets includes:

identifying partition keys and sub-bucket keys corresponding to the newly added data files;

Positioning a partition where the newly added data file is located by utilizing a partition key corresponding to the newly added data file;

Positioning the sub-bucket of the newly added data file according to the sub-bucket key corresponding to the newly added data file in the partition of the newly added data file;

And scanning a data table in the sub-bucket where the newly added data file is located, and inserting the newly added data file into the last row of the data table in the sub-bucket where the newly added data file is located.

Optionally, the step of acquiring the new data file and the deleted data file in each partition and each sub-bucket at the fixed time to combine includes:

according to a preset timing task, scanning each partition and each sub-bucket of the preset data at fixed time;

Acquiring a new data file and a deleted data file in each sub-bucket in each partition;

the newly added data files and the deleted data files in each sub-bucket are combined for the first time one by one to obtain first combined data files;

and carrying out second merging on all the first merging data files in the same partition one by one to obtain the merging result.

Optionally, the obtaining the new data file and the deleted data file in each of the sub-buckets in each of the partitions includes:

acquiring an execution time point of each preset timing task;

Adding a preset time difference to each execution time point to obtain a file coverage time period;

And obtaining newly added data files and deleted data files generated in the file coverage time period in each sub-bucket in each partition.

Optionally, the first merging the added data files and the deleted data files in each sub-bucket one by one to obtain a first merged data file, which includes:

Analyzing the newly added data files and the deleted data files in each sub-bucket to obtain an operation data table and an operation field corresponding to each newly added data file and each deleted data file;

And summing up the newly added data files and the deleted data files which are the same in the operation data table and correspond to the same operation field in each sub-bucket to obtain the first merged data file.

In order to solve the above-mentioned problems, the present invention also provides a data processing apparatus for streaming computation, the apparatus comprising:

the data partitioning and bucketing module is used for partitioning preset data and dividing the data in each partition into different sub-buckets for storage;

The data adding and deleting file generating module is used for generating corresponding added data files and deleted data files when monitoring that a user executes new adding operation or deleting operation on the preset data, and storing the added data files and the deleted data files into corresponding partitions and sub-buckets;

the data adding and deleting file merging module is used for acquiring, at fixed time, the newly added data files and deleted data files in each partition and each sub-bucket and merging them to obtain a merging result, refreshing the preset data according to the merging result, and deleting the merged newly added data files and the merged deleted data files;

The data updating processing module is used for identifying the sub-bucket where the data to be updated corresponding to the updating operation is located when the user executes the updating operation on the preset data, acquiring the data to be updated from the sub-bucket where the data to be updated is located, and updating the data to be updated according to the updating operation.

In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:

A memory storing at least one computer program; and

And a processor executing the program stored in the memory to implement the data processing method for streaming computation.

In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned data processing method for streaming computation.

According to the embodiment of the invention, the newly added data files and the deleted data files generated by data adding operations and data deleting operations are merged and deleted at fixed time, which avoids a sharp increase in the number of newly added data files and deleted data files and reduces the storage pressure. Refreshing the preset data according to the merging result reduces the amount of calculation needed to refresh the preset data. Because the preset data is stored in partitions and sub-buckets, whether an adding operation, a deleting operation or an updating operation is executed on the preset data, only the corresponding partition and sub-bucket need to be scanned to acquire the corresponding data, without scanning the whole data, which reduces the cost of data scanning. Therefore, the invention can improve the efficiency of streaming computing.

Drawings

FIG. 1 is a flow chart of a data processing method for streaming computing according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a detailed implementation of one of the steps in the data processing method for streaming computing shown in FIG. 1;

FIG. 3 is a functional block diagram of a data processing apparatus for streaming computing according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device implementing the data processing method for streaming computing according to an embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The embodiment of the application provides a data processing method for streaming computing. The execution body of the data processing method for streaming computing includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the data processing method for streaming computing may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms.

Referring to fig. 1, a flow chart of a data processing method for streaming computing according to an embodiment of the invention is shown. In this embodiment, the data processing method for streaming computing includes:

S1, partitioning preset data, and dividing the data in each partition into different sub-buckets for storage;

In the embodiment of the present invention, the preset data refers to data that is authorized to be accessed, for example, user web browsing data, user order data, and the like. The preset data may be structured data stored in a relational database.

In the embodiment of the invention, in order to avoid scanning all of the preset data when performing operations such as adding, deleting and updating on specific data in the preset data, the preset data is stored in partitions and sub-buckets.

In the embodiment of the invention, the partitioning refers to designating a specific partition space for a data table when the data table is created, wherein the partition space corresponds to different file directories, so that related data can be searched through the partition, all data of the preset data are not required to be scanned, and only the data of the designated partition are required to be scanned.

In the embodiment of the invention, sub-bucketing means that, on the basis of partitioning, a specific column is designated for the data files in a partition, hash calculation is performed on the values of that column to obtain the corresponding hash values, and the data corresponding to the same hash value is stored in the same bucket. Sub-bucketing is therefore a finer-grained division of the data range.

In detail, the partitioning the preset data and dividing the data in each partition into different sub-buckets for storage includes: dividing the preset data into different partitions by using preset partition keys; dividing the data in each partition into different sub-buckets by using a preset sub-bucket key; creating data tables whose number is consistent with the number of sub-buckets; and storing the data corresponding to each sub-bucket in each partition into the same data table one by one.

In the embodiment of the present invention, the preset partition key and the preset sub-bucket key may be specified in advance according to actual situations.

For example, the data table named Goods-price includes fields such as commodity ID, commodity name and commodity price. The preset partition key is the price level, which comprises three types, A, B and C, and the data table named Goods-price is stored in partitions according to the partition key as follows:

/hive/test.db/Goods-price/rank=A

/hive/test.db/Goods-price/rank=B

/hive/test.db/Goods-price/rank=C

Wherein, the data in the Goods-price data table whose commodity price level is A is stored under the directory /hive/test.db/Goods-price/rank=A. It will be appreciated that the table structure of the data stored in each partition is consistent with the table structure of the data table Goods-price; only the stored data differs.

In the embodiment of the present invention, sub-bucketing by the preset sub-bucket key may be carried out by performing hash calculation on a certain field of the data table stored in a certain partition, and storing the data records with the same hash value into the same bucket file.
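The partition-then-bucket layout described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the bucket count, the `bucket_` directory naming, the use of CRC32 as the hash function, and the names `NUM_BUCKETS`, `bucket_index` and `bucket_path` are all hypothetical and are not specified by the embodiment.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical number of sub-buckets per partition


def bucket_index(bucket_key_value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash the sub-bucket key value and map it to a bucket number.

    zlib.crc32 is an assumed choice; it is stable across runs, unlike
    Python's built-in hash(), which is salted for strings.
    """
    return zlib.crc32(bucket_key_value.encode("utf-8")) % num_buckets


def bucket_path(table_dir: str, partition_key: str, partition_value: str,
                bucket_key_value: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Build the directory of the sub-bucket inside the partition directory."""
    idx = bucket_index(bucket_key_value, num_buckets)
    # The partition directory follows the key=value layout of the example
    # above; the bucket file name is an illustrative assumption.
    return f"{table_dir}/{partition_key}={partition_value}/bucket_{idx:05d}"


# Records with the same sub-bucket key value always land in the same bucket.
path = bucket_path("/hive/test.db/Goods-price", "rank", "A", "045")
```

Because the bucket number depends only on the key value, a later lookup for the same commodity ID can go straight to one bucket file instead of scanning the whole partition.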

In another alternative embodiment of the present invention, a data table based on the Apache Iceberg storage format may be created. Apache Iceberg is a storage component suitable for streaming computing and supports functions such as partition change, which facilitates maintenance of the preset data after partitioning and sub-bucketing.

S2, when a user executes a new operation or a deletion operation on the preset data, generating corresponding new data files and deletion data files, and storing the new data files and the deletion data files into corresponding partitions and sub-buckets;

In the embodiment of the invention, taking an online shopping scene as an example, when a user adds a commodity to a shopping cart, an operation record of adding data to the shopping cart is generated, and when the user deletes a commodity from the shopping cart, an operation record of deleting data from the shopping cart is generated. It can be understood that many users shopping online at the same time generate a great number of data adding and deleting operations. In a streaming computing scenario, the system does not execute the corresponding data calculation every time an adding record or a deleting record is generated; instead, it stores the corresponding operation record, and when the user needs to acquire the latest data, it performs the adding or deleting calculation according to the stored operation records and finally presents the latest data to the user.

In detail, the generating the corresponding newly added data file includes: acquiring a new operation record corresponding to the new operation; analyzing the newly-added operation record to obtain newly-added operation data information, wherein the newly-added operation data information comprises a data table to be operated; identifying a table structure of the data table to be operated; generating a data file consistent with the table structure of the data table to be operated according to the newly-added operation data information, and taking the data file as a newly-added data file.

In the embodiment of the invention, the newly added operation data information comprises information such as a data table to be operated, an operation type, an operation condition, a field to be operated, a field variation and operation generation time.

For example, if the data table to be operated is a commodity ordering table, the corresponding table structure is "user ID, commodity ID, commodity unit price, commodity ordering amount". If, in the newly added operation data information, the operation type is newly added, the operation condition is that the user ID is 00123, the commodity ID is 045 and the commodity unit price is 68, the field to be operated is the commodity ordering amount, and the field change amount is 3, then the content of the data file generated according to the newly added operation data information, with the same table structure as the data table to be operated, is "00123 045 68 3".
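The generation of a data file row from an operation record can be sketched as below. The `OperationRecord` class, the `TABLE_SCHEMAS` lookup, the column names and the space-separated row format are assumptions made for illustration, chosen so that the values of the example above are reproduced.

```python
from dataclasses import dataclass


@dataclass
class OperationRecord:
    """Parsed operation data information (hypothetical structure)."""
    table: str        # data table to be operated
    op_type: str      # "add" or "delete"
    conditions: dict  # operation condition: column name -> value
    field: str        # field to be operated
    change: int       # field change amount


# Assumed table structure of the commodity ordering table.
TABLE_SCHEMAS = {
    "goods_order": ["user_id", "goods_id", "unit_price", "order_quantity"],
}


def build_data_file_row(record: OperationRecord) -> str:
    """Lay the record out in the column order of the target table."""
    schema = TABLE_SCHEMAS[record.table]       # identify the table structure
    values = dict(record.conditions)
    values[record.field] = record.change       # the changed field carries the delta
    return " ".join(str(values[column]) for column in schema)


record = OperationRecord(
    table="goods_order",
    op_type="add",
    conditions={"user_id": "00123", "goods_id": "045", "unit_price": 68},
    field="order_quantity",
    change=3,
)
row = build_data_file_row(record)  # "00123 045 68 3"
```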

In the embodiment of the present invention, the method for generating the corresponding deleted data file is the same as the method for generating the corresponding newly added data file, which is not described herein again.

In detail, the storing the newly added data file in the corresponding partition and the corresponding bucket includes: identifying partition keys and barrel dividing keys corresponding to the newly added data files; positioning a partition where the newly added data file is located by utilizing a partition key corresponding to the newly added data file; positioning the sub-bucket of the newly added data file according to the sub-bucket key corresponding to the newly added data file in the partition of the newly added data file; and scanning a data table in the sub-bucket where the newly added data file is located, and inserting the newly added data file into the last row of the data table in the sub-bucket where the newly added data file is located.
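The locate-then-append procedure above might look like the following in-memory sketch. The `BucketedStore` class, its dictionary-based tables and the CRC32 hash are hypothetical stand-ins for the real storage system, which the embodiment does not specify.

```python
import zlib
from collections import defaultdict


class BucketedStore:
    """Toy store: one row list (data table) per (partition, sub-bucket)."""

    def __init__(self, num_buckets: int = 4):
        self.num_buckets = num_buckets
        # (partition value, bucket index) -> rows of that bucket's data table
        self.tables = defaultdict(list)

    def locate(self, partition_value: str, bucket_key_value: str):
        """Locate the partition first, then the sub-bucket inside it."""
        idx = zlib.crc32(bucket_key_value.encode("utf-8")) % self.num_buckets
        return (partition_value, idx)

    def append(self, partition_value: str, bucket_key_value: str, row: str):
        """Insert the data file row after the last row of the bucket's table."""
        location = self.locate(partition_value, bucket_key_value)
        self.tables[location].append(row)
        return location


store = BucketedStore()
first = store.append("A", "045", "00123 045 68 3")
second = store.append("A", "045", "00456 045 68 1")
```

Both rows share the sub-bucket key value 045, so they end up in the same table and the second row is placed after the first.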

In the embodiment of the present invention, the method for storing the deleted data file in the corresponding partition and the corresponding sub-bucket is the same as the method for storing the newly added data file in the corresponding partition and the corresponding sub-bucket, and is not described herein again.

S3, acquiring new data files and deleted data files in each partition and each sub-bucket at fixed time, merging to obtain a merging result, refreshing the preset data according to the merging result, and deleting the merged new data files and the merged deleted data files;

In a conventional streaming computing scenario, the system generally performs the merging calculation of the corresponding newly added data files and deleted data files only when a user needs to acquire the latest data. As a result, the newly added data files and deleted data files stored in the system may proliferate or accumulate and occupy memory, and the workload of the merging calculation increases, causing a certain delay when the user acquires the latest data. Therefore, in the embodiment of the invention, the newly added data files and the deleted data files are merged at fixed time, which avoids a sharp increase in the number of these files and reduces the workload of the merging calculation.

In detail, referring to fig. 2, the step of acquiring the newly added data files and the deleted data files in each partition and each sub-bucket at fixed time for merging includes:

S31, scanning each partition and each sub-bucket of the preset data at fixed time according to a preset timing task;

S32, acquiring newly added data files and deleted data files in each sub-bucket in each partition;

S33, merging the newly added data files and deleted data files in each sub-bucket for the first time one by one to obtain first merged data files;

S34, carrying out second combination on all the first combined data files in the same partition one by one to obtain the combination result.

Further, the obtaining the new data file and the deleted data file in each of the sub-buckets in each of the partitions includes: acquiring an execution time point of each preset timing task; adding a preset time difference to each execution time point to obtain a file coverage time period; and obtaining newly added data files and deleted data files generated in the file coverage time period in each sub-bucket in each partition.

For example, if the execution time point of a preset timing task is 15:00 and the preset time difference is 30 minutes, the file coverage time period is 14:30 to 15:30, and the newly added data files and deleted data files generated in the time period from 14:30 to 15:30 need to be acquired.
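The window calculation in this example can be sketched as follows. Treating the time difference as applied on both sides of the execution time point is an interpretation based on the 14:30 to 15:30 example; the function names, the dictionary layout of a file entry and the date used are hypothetical.

```python
from datetime import datetime, timedelta


def coverage_window(execution_time: datetime, time_difference: timedelta):
    """Return the (start, end) of the file coverage time period.

    Assumption: the preset time difference extends the window on both sides
    of the execution time point, as in the 15:00 / 30 minutes example.
    """
    return execution_time - time_difference, execution_time + time_difference


def files_in_window(files, window):
    """Keep only the data files generated inside the coverage window."""
    start, end = window
    return [f for f in files if start <= f["created_at"] <= end]


window = coverage_window(datetime(2022, 5, 10, 15, 0), timedelta(minutes=30))
files = [
    {"name": "add_001", "created_at": datetime(2022, 5, 10, 14, 45)},
    {"name": "del_002", "created_at": datetime(2022, 5, 10, 16, 0)},
]
selected = files_in_window(files, window)  # only add_001 falls in the window
```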

In detail, the step of merging the newly added data files and the deleted data files in each sub-bucket for the first time one by one to obtain a first merged data file includes: analyzing the newly added data files and the deleted data files in each sub-bucket to obtain an operation data table and an operation field corresponding to each newly added data file and each deleted data file; and carrying out data summation calculation on the newly added data files and the deleted data files which are in the same operation data table and correspond to the same operation field in each sub-bucket to obtain the first merged data file.
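A minimal sketch of the two-level merge is given below. It assumes that a deleted data file contributes its change amount with a negative sign, which the text does not state explicitly; the dictionary layout of a data file and the function names are also hypothetical, and row-level keys are omitted for brevity.

```python
from collections import defaultdict


def first_merge(bucket_files):
    """Sum the changes of one sub-bucket per (operation table, operation field)."""
    totals = defaultdict(int)
    for f in bucket_files:
        sign = 1 if f["op"] == "add" else -1  # assumption: deletions subtract
        totals[(f["table"], f["field"])] += sign * f["change"]
    return dict(totals)


def second_merge(first_merged_files):
    """Combine the first merged results of all sub-buckets in one partition."""
    totals = defaultdict(int)
    for bucket_result in first_merged_files:
        for key, delta in bucket_result.items():
            totals[key] += delta
    return dict(totals)


bucket_0 = [
    {"op": "add", "table": "goods_order", "field": "order_quantity", "change": 3},
    {"op": "delete", "table": "goods_order", "field": "order_quantity", "change": 1},
]
bucket_1 = [
    {"op": "add", "table": "goods_order", "field": "order_quantity", "change": 4},
]
merge_result = second_merge([first_merge(bucket_0), first_merge(bucket_1)])
```

Merging inside each sub-bucket first keeps every intermediate result small, so the partition-level step only has to combine one entry per table and field from each bucket.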

In the embodiment of the invention, the 'read-time merging' technology in the streaming calculation can be utilized, and the preset data can be refreshed according to the merging result.

In another optional embodiment of the present invention, if the user initiates a request for acquiring the latest data before the execution time point of the preset timing task arrives, a merging operation on the newly added data files and the deleted data files may be performed in response to that request, where the newly added data files and deleted data files to be merged are the related newly added data files and deleted data files that are stored in the system and have not yet been deleted.
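The interplay between the timed merge and a read request that arrives before the timer fires can be illustrated with the toy class below. `PendingChanges` and its single numeric value are hypothetical simplifications; real data files are reduced here to signed deltas.

```python
class PendingChanges:
    """Toy model: a base value plus data files that are not yet merged."""

    def __init__(self, base_value: int = 0):
        self.base_value = base_value
        self.pending = []  # signed deltas from undeleted add/delete data files

    def record(self, delta: int) -> None:
        """Store an add (+) or delete (-) data file without computing anything."""
        self.pending.append(delta)

    def merge(self) -> None:
        """Merge pending files, refresh the data, then delete the merged files.

        This is what the timing task would call at each execution time point.
        """
        self.base_value += sum(self.pending)
        self.pending.clear()

    def read_latest(self) -> int:
        """Serve a read; merge on demand if the timer has not run yet."""
        if self.pending:
            self.merge()
        return self.base_value


cart = PendingChanges(base_value=10)
cart.record(3)
cart.record(-1)
latest = cart.read_latest()
```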

And S4, when monitoring that a user executes the updating operation on the preset data, identifying a sub-bucket where the data to be updated corresponding to the updating operation is located, acquiring the data to be updated from the sub-bucket where the data to be updated is located, and updating the data to be updated according to the updating operation.

In the embodiment of the invention, the data to be updated can be obtained by analyzing the updating operation. The partition key and the sub-bucket key corresponding to the data to be updated are identified and used to locate the sub-bucket where the data to be updated is located. Using the copy-on-write technique in stream computing, the data to be updated is taken out of that sub-bucket and updated according to the updating operation, and the updated data is written back into the sub-bucket where the data to be updated is located.
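A bucket-scoped copy-on-write update can be sketched as follows. The in-memory `buckets` dictionary, the row dictionaries and the function name are hypothetical; the point of the sketch is only that one sub-bucket is read, rewritten and written back while the others are left alone.

```python
def copy_on_write_update(buckets, bucket_id, match, updates):
    """Rewrite only the sub-bucket that holds the data to be updated.

    buckets:   bucket id -> list of row dicts (stand-in for bucket files)
    match:     predicate selecting the rows to be updated
    updates:   column name -> new value
    Returns the number of rows that were updated.
    """
    original = buckets[bucket_id]                 # take the data out of the bucket
    rewritten = [
        {**row, **updates} if match(row) else row  # build the updated copy
        for row in original
    ]
    buckets[bucket_id] = rewritten                # write the copy back
    return sum(1 for row in original if match(row))


buckets = {
    0: [{"user_id": "00123", "order_quantity": 3}],
    1: [{"user_id": "00999", "order_quantity": 1}],
}
untouched = buckets[1]
updated_rows = copy_on_write_update(
    buckets, 0, lambda row: row["user_id"] == "00123", {"order_quantity": 5}
)
```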

It can be understood that in a conventional streaming computing scenario, each data updating operation generally needs to take the data block corresponding to the data to be updated out of the corresponding storage and, after the data updating is completed, write the updated data block back to the original storage. The data blocks involved in each read and write are relatively large, so the corresponding data reading and writing costs are relatively high. According to the embodiment of the invention, by storing the data in partitions and sub-buckets, data blocks with a larger data volume are split into data with a smaller data volume that exists in the form of bucket files. Each time data needs to be updated, only the bucket file where the data to be updated is located needs to be scanned, so that the data to be updated is rapidly located and acquired, and the cost of data reading and writing is reduced.

FIG. 3 is a functional block diagram of a data processing apparatus for streaming computing according to an embodiment of the present invention.

The data processing apparatus 100 for streaming computing according to the present invention may be installed in an electronic device. According to the implemented functions, the data processing apparatus 100 for streaming computing may include a data partitioning and bucketing module 101, a data adding and deleting file generating module 102, a data adding and deleting file merging module 103, and a data updating processing module 104. A module of the invention, which may also be referred to as a unit, refers to a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.

In the present embodiment, the functions concerning the respective modules/units are as follows:

The data partitioning and bucketing module 101 is configured to partition preset data, and divide the data in each partition into different sub-buckets for storage;

the data adding and deleting file generating module 102 is configured to generate a corresponding newly added data file or deleted data file when it is monitored that a user performs a new operation or a deletion operation on the preset data, and store the newly added data file and the deleted data file in the corresponding partition and sub-bucket;

the data adding and deleting file merging module 103 is configured to acquire, at fixed time, the newly added data files and deleted data files in each partition and each sub-bucket, merge them to obtain a merging result, refresh the preset data according to the merging result, and delete the merged newly added data files and the merged deleted data files;

The data update processing module 104 is configured to identify a sub-bucket in which data to be updated corresponding to the update operation is located when a user performs the update operation on the preset data, obtain the data to be updated from the sub-bucket in which the data to be updated is located, and update the data to be updated according to the update operation.

In detail, the specific implementation manner of each module of the data processing apparatus 100 for streaming computing is as follows:

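Step one, partitioning preset data, and dividing the data in each partition into different sub-buckets for storage; for example, a data table partitioned by price level is stored under the following directories:

/hive/test.db/Goods-price/rank=A

/hive/test.db/Goods-price/rank=B

/hive/test.db/Goods-price/rank=C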

Step two, when monitoring that a user executes a new operation or a deletion operation on the preset data, generating corresponding new data files and deletion data files, and storing the new data files and the deletion data files into corresponding partitions and sub-buckets;

Step three, acquiring new data files and deleted data files in each partition and each sub-bucket at regular time, merging to obtain a merging result, refreshing the preset data according to the merging result, and deleting the merged new data files and the merged deleted data files;

In detail, the step of obtaining the new data file and the deleted data file in each partition and each sub-bucket at fixed time to combine includes: according to a preset timing task, scanning each partition and each sub-bucket of the preset data at fixed time; acquiring a new data file and a deleted data file in each sub-bucket in each partition; the newly added data files and the deleted data files in each sub-bucket are combined for the first time one by one to obtain first combined data files; and carrying out second merging on all the first merging data files in the same partition one by one to obtain the merging result.

And step four, when monitoring that a user executes updating operation on the preset data, identifying a sub-bucket where the data to be updated corresponding to the updating operation is located, acquiring the data to be updated from the sub-bucket where the data to be updated is located, and updating the data to be updated according to the updating operation.

According to the embodiment of the invention, the newly added data files and the deleted data files generated by data adding operations and data deleting operations are merged and deleted at fixed time, which avoids a sharp increase in the number of newly added data files and deleted data files and reduces the storage pressure. Refreshing the preset data according to the merging result reduces the amount of calculation needed to refresh the preset data. Because the preset data is stored in partitions and sub-buckets, whether an adding operation, a deleting operation or an updating operation is executed on the preset data, only the corresponding partition and sub-bucket need to be scanned to acquire the corresponding data, without scanning the whole data, which reduces the cost of data scanning. Therefore, the data processing device for streaming computing provided by the invention can improve the efficiency of streaming computing.

Fig. 4 is a schematic structural diagram of an electronic device implementing a data processing method for streaming computing according to an embodiment of the present invention.

The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a data processing program for streaming computing.

The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card or the like provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as the code of a data processing program for streaming computing, but also for temporarily storing data that has been output or is to be output.

The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit (Control Unit) of the electronic device. It connects the parts of the entire electronic device using various interfaces and lines, runs the programs or modules stored in the memory 11 (e.g., a data processing program for streaming computing), and invokes data stored in the memory 11 to perform the various functions of the electronic device 1 and to process data.

The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection and communication between the memory 11 and the at least one processor 10, among other components.

Fig. 4 shows only an electronic device with certain components. It will be understood by a person skilled in the art that the structure shown in Fig. 4 does not limit the electronic device 1, which may comprise fewer or more components than shown, may combine certain components, or may use a different arrangement of components.

For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management and power consumption management are implemented through the power management device. The power source may also include one or more of a direct current or alternating current power supply, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which are not described herein.

Further, the electronic device 1 may also comprise a network interface. Optionally, the network interface may comprise a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.

The electronic device 1 may optionally further comprise a user interface, which may be a display (Display) or an input unit such as a keyboard (Keyboard), and may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.

It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.

The data processing program for streaming calculation stored in the memory 11 in the electronic device 1 is a combination of instructions, which when run in the processor 10, can realize:

Further, the modules/units integrated in the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).

The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware combined with software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each of which contains information on a batch of network transactions and is used for verifying the validity of that information (anti-counterfeiting) and for generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.

Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims can also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A data processing method for streaming computing, the method comprising:

When monitoring that a user executes an updating operation on the preset data, identifying a sub-bucket where data to be updated corresponding to the updating operation is located, acquiring the data to be updated from the sub-bucket where the data to be updated is located, and updating the data to be updated according to the updating operation;

The storing of the newly added data file and the deleted data file in the corresponding partition and the corresponding sub-bucket includes: identifying the partition key and the bucket key corresponding to the newly added data file; locating the partition where the newly added data file is located by using the partition key corresponding to the newly added data file; locating, within that partition, the sub-bucket of the newly added data file according to the bucket key corresponding to the newly added data file; scanning the data table in the sub-bucket where the newly added data file is located, and inserting the newly added data file into the last row of that data table;

The step of obtaining each partition and the newly added data file and the deleted data file in each sub-bucket at fixed time for merging comprises the following steps: according to a preset timing task, scanning each partition and each sub-bucket of the preset data at fixed time; acquiring a new data file and a deleted data file in each sub-bucket in each partition; the newly added data files and the deleted data files in each sub-bucket are combined for the first time one by one to obtain first combined data files; carrying out second merging on all the first merging data files in the same partition one by one to obtain a merging result;

The obtaining the new data file and the deleted data file in each sub-bucket in each partition includes: acquiring an execution time point of each preset timing task; adding a preset time difference to each execution time point to obtain a file coverage time period; acquiring newly added data files and deleted data files generated in the file coverage time period in each sub-bucket in each partition;

The step of merging the newly added data files and the deleted data files in each sub-bucket for the first time one by one to obtain a first merged data file comprises the following steps: analyzing the newly added data files and the deleted data files in each sub-bucket to obtain the operation data table and the operation field corresponding to each newly added data file and each deleted data file; and summing up, in each sub-bucket, the newly added data files and the deleted data files that share the same operation data table and the same operation field to obtain the first merged data file.
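The file-selection and first-merge steps recited above can be illustrated with the following sketch, which is offered only as one possible reading. The claim does not state the sign of the preset time difference or how files are "summed"; the sketch accepts a positive or negative offset and pools the rows of files that share the same operation data table and operation field. The file records (dictionaries with `kind`, `table`, `field`, `rows` and `created` entries) and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def coverage_period(execution_time, time_difference):
    """Return (start, end) spanned by the execution point and the offset point."""
    other = execution_time + time_difference
    return min(execution_time, other), max(execution_time, other)


def select_files(files, period):
    """Keep only the files created inside the file coverage time period."""
    start, end = period
    return [f for f in files if start <= f["created"] <= end]


def first_merge(files):
    """Group files by (operation table, operation field) and pool their rows,
    keeping newly added rows and deleted rows in separate lists."""
    merged = defaultdict(lambda: {"new": [], "deleted": []})
    for f in files:
        merged[(f["table"], f["field"])][f["kind"]].extend(f["rows"])
    return dict(merged)
```

For instance, with an execution point of 12:00 and a time difference of minus 30 minutes, only files created between 11:30 and 12:00 take part in the first merge of that cycle.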

2. The data processing method for streaming computing according to claim 1, wherein partitioning preset data and dividing the data in each partition into different buckets for storage, comprises:

Creating a data table consistent with the number of sub-buckets;

3. The method for data processing of streaming computing according to claim 1, wherein said generating a corresponding newly added data file comprises:

Acquiring a new operation record corresponding to the new operation;

Identifying a table structure of the data table to be operated;

4. A data processing apparatus for streaming computation for implementing the data processing method for streaming computation according to any one of claims 1 to 3, characterized in that the apparatus comprises:

5. An electronic device, the electronic device comprising:

At least one processor; and

A memory communicatively coupled to the at least one processor; wherein,

The memory stores a computer program executable by the at least one processor, instructions being executable by the at least one processor to enable the at least one processor to perform the data processing method for streaming computation of any one of claims 1 to 3.

6. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the data processing method for streaming computation according to any one of claims 1 to 3.