CN104951306B - Data processing method and system based on a real-time computing framework - Google Patents

Info

Publication number
CN104951306B
Authority
CN
China
Prior art keywords
data
transaction identifier
result
working node
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510338373.4A
Other languages
Chinese (zh)
Other versions
CN104951306A (en)
Inventor
杜冲
谢贵明
徐钊
陈志军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd filed Critical Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201510338373.4A priority Critical patent/CN104951306B/en
Publication of CN104951306A publication Critical patent/CN104951306A/en
Application granted granted Critical
Publication of CN104951306B publication Critical patent/CN104951306B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data processing method and system based on a real-time computing framework. The method includes: a control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to a first working node according to the transaction identifier; the first working node pulls a batch of data from a specified data source according to the transaction command message; the first working node stores the metadata of the data in correspondence with the transaction identifier and sends the data to a second working node; the second working node processes the data and submits the processing result of the data to a database according to the transaction identifier, the processing result including a field for the transaction identifier. Compared with traditional data processing methods, the present invention ensures that data is neither lost nor processed repeatedly, and is highly reliable.

Description

Data processing method and system based on a real-time computing framework
Technical field
The present invention relates to the field of network data processing, and in particular to a data processing method and system based on a real-time computing framework.
Background
With the rapid development of Internet technology, the demand for data processing keeps growing. The value of data diminishes over time. If data can be collected, transmitted, and processed in real time and then fed back into an online system (for example, to modify the model parameters of the online system), the value of the data can be maximized. Technologies for computing on data in real time have emerged accordingly.
Most existing real-time computing systems are designed for high throughput and low latency, and provide no guarantee, or only a weak guarantee, of data reliability. In some business scenarios, however, highly important data must be processed in real time, such as fee-deduction data needed for real-time accounting reports, or data used to update system model parameters in real time. Traditional real-time computing systems therefore cannot meet the demand for reliable real-time data processing.
In traditional real-time computing frameworks, reliable real-time data processing can generally only guarantee that data is not lost. Data may still be processed more than once, that is, duplicate processing exists. For some data, repeated processing does not affect the result. For example, when updating the avatar information of a communication user in a Key-Value system, with the communication number as the key and the avatar information as the value, if each update overwrites the previous value, then applying the same update several times has no effect on the Key-Value pair. For computing operations such as reports, however, data must not be processed repeatedly, otherwise the result is affected.
Summary of the invention
In view of the above technical problem, it is necessary to provide a data processing method and system based on a real-time computing framework that ensures data is neither lost nor processed repeatedly.
A data processing method based on a real-time computing framework, the method including:
a control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to a first working node according to the transaction identifier;
the first working node pulls a batch of data from a specified data source according to the transaction command message;
the first working node stores the metadata of the data in correspondence with the transaction identifier and sends the data to a second working node;
the second working node processes the data and submits the processing result of the data to a database according to the transaction identifier, the processing result including a field for the transaction identifier.
A data processing system based on a real-time computing framework, the system including:
a control node, configured to initiate a transaction, generate a corresponding transaction identifier, and send a transaction command message according to the transaction identifier;
a first working node, configured to receive the transaction command message sent by the control node and pull a batch of data from a specified data source according to the transaction command message;
the first working node being further configured to store the metadata of the data in correspondence with the transaction identifier;
a second working node, configured to receive the data pulled by the first working node, process the data, and submit the processing result of the data to a database according to the transaction identifier, the processing result including a field for the transaction identifier;
a database, configured to store the processing result of the data submitted by the second working node.
In the above data processing method and system based on a real-time computing framework, the control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to the first working node according to the transaction identifier. The first working node pulls data from the specified data source according to the transaction command message, stores the metadata of the data in correspondence with the transaction identifier, and sends the data to the second working node. The second working node processes the data and submits the processing result to the database according to the transaction identifier, the processing result including a field for the transaction identifier. If data processing fails, the metadata can be retrieved by the transaction identifier and the corresponding data pulled again, so no data is lost. In addition, when submitting a processing result, the second working node only needs to compare transaction identifiers to determine whether that result has already been submitted, which avoids duplicate submission of processing results. Compared with traditional data processing methods, reliability is high.
Brief description of the drawings
Fig. 1 is a diagram of the application environment in which the data processing method based on a real-time computing framework is implemented in one embodiment;
Fig. 2 is a block diagram of a Storm cluster in one embodiment;
Fig. 3 is a diagram of the internal structure of the computing framework in one embodiment;
Fig. 4 is a flow chart of the data processing method based on a real-time computing framework in one embodiment;
Fig. 5 is a detailed flow chart of the second working node submitting the processing result of the data to the database according to the transaction identifier in one embodiment;
Fig. 6 is a detailed flow chart of the steps after the second working node submits the processing result of the data to the database according to the transaction identifier;
Fig. 7 is a structural block diagram of the data processing system based on a real-time computing framework in one embodiment.
Detailed description
To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
It will be understood that terms such as "first" and "second" used in the present invention may describe various elements, but these elements are not limited by these terms. The terms are only used to distinguish one element from another.
Fig. 1 shows the application environment in which the data processing method based on a real-time computing framework is implemented in one embodiment. The application environment includes a service server 102, a message queue 104, a computing framework 106, and a database 108. The service server 102 may be a computer that directly produces business data, or an intermediary server that forwards business data. Business data may be various kinds of request data, advertisement refresh data, and so on.
When the message queue 104 receives business data from the service server 102, it can convert the business data into a distributed message queue. The computing framework 106 pulls data from the message queue 104, processes it, and updates the processing result into the database 108. Once the database 108 notifies the computing framework 106 that the processing result has been stored successfully, the data is persisted and will not be lost.
The computing framework 106 processes data batch by batch and assigns a unique transaction identifier to each batch. Each transaction identifier is incremented relative to the previously generated one. The metadata of the data is stored together with the corresponding transaction identifier, and the metadata can be used to re-read that batch from the message queue 104. The computing framework 106 ensures that processing results are committed to the database 108 in order. In one embodiment, the database 108 is a Key-Value distributed storage system.
In one embodiment, the computing framework 106 in Fig. 1 is a Storm-based cluster. A block diagram of the Storm cluster is shown in Fig. 2.
Storm is a real-time computing system open-sourced by Twitter. It is real-time, distributed, and highly fault tolerant. Storm is essentially a message processing network made up of processors and message queues. It provides a set of real-time computing primitives that free developers from complex matters such as message queue maintenance, fault detection, and cluster management, so that they can focus on developing business functions.
A Storm cluster includes one control node 202 and multiple working nodes 206. The control node 202 and the working nodes 206 are coordinated through a coordination management cluster (ZooKeeper) 204.
Specifically, the control node 202 runs a background program responsible for code distribution, task assignment, monitoring the state of the working nodes 206, fault handling, and so on. Each working node 206 runs a background program that listens for tasks sent by the control node 202 and starts or stops worker processes accordingly. The coordination management cluster 204 is responsible for coordination between the working nodes 206 and the control node 202. The states of the working nodes 206 and the control node 202 are stored in the coordination management cluster 204, so that a process that dies unexpectedly can be restarted quickly.
In one embodiment, the internal structure of the computing framework in Fig. 1 is as shown in Fig. 3. The computing framework includes a control node, a coordination management cluster, first working nodes, and second working nodes. The control node is the control center of the whole computing framework. A first working node pulls data from a specified data source (such as a database, a file, or a log system) and sends it to a second working node for processing. A second working node can perform many kinds of work, such as filtering, aggregation, and accessing files or databases. A second working node receives data from a first working node and processes it. For complex data processing, a second working node may also send its own output to another second working node for further processing. One second working node can forward data to multiple second working nodes, and can also receive data from multiple first working nodes or second working nodes. A degree of parallelism is set for each first working node and each second working node; when processing capacity is insufficient, it can be scaled by raising the degree of parallelism.
Fig. 4 is a flow chart of the data processing method based on a real-time computing framework in one embodiment. The method in Fig. 4 is described as running on the computing framework in Fig. 3. The method includes the following steps:
Step S402: The control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to a first working node according to the transaction identifier.
Specifically, all data received from the service server must be processed by the computing framework and updated into the database, and the data is processed only once the control node initiates a transaction. When the control node initiates a transaction it generates a corresponding transaction identifier, and then sends a transaction command message to a first working node according to the transaction identifier.
There are multiple first working nodes. The control node can compute a hash value using the transaction identifier as the key, and route transaction identifiers with the same hash value to the same first working node. This ensures that even if the same transaction command message is sent several times, it is always delivered to the same first working node.
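The hash-based routing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `route_to_first_worker`, the choice of SHA-256, and the modulo mapping are all assumptions made for the example.

```python
import hashlib


def route_to_first_worker(txid: int, num_workers: int) -> int:
    """Map a transaction identifier to the index of a first working node.

    Hypothetical helper: a stable hash of the identifier, reduced modulo
    the number of first working nodes.
    """
    digest = hashlib.sha256(str(txid).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


# Sending the command for the same transaction twice picks the same node.
first = route_to_first_worker(42, num_workers=4)
again = route_to_first_worker(42, num_workers=4)
```

Because the mapping depends only on the transaction identifier, a resent command reaches the node that already holds the state for that transaction.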
Step S404: The first working node pulls a batch of data from the specified data source according to the transaction command message.
Specifically, after the control node sends the transaction command message, the first working node pulls a batch of data from the specified data source. In one embodiment, the specified data source is a message queue containing various kinds of business data.
Because the data source is a distributed message queue, the first working node only needs to connect to the data source when pulling data, without cumbersome operations such as restarting or insertion, which makes pulling data more flexible.
Step S406: The first working node stores the metadata of the data in correspondence with the transaction identifier and sends the data to the second working node.
Specifically, each batch of data has corresponding metadata. Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes the properties of data and supports functions such as indicating storage location, historical data, resource lookup, and file records.
Storing the metadata of the data in correspondence with the transaction identifier establishes a one-to-one mapping among the data, the metadata, and the transaction identifier. The corresponding metadata can then be obtained from the transaction identifier, and the metadata can be used to return to the specified data source and pull the corresponding data again.
In one embodiment, the metadata of the data is stored in correspondence with the transaction identifier in the coordination management cluster (ZooKeeper).
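The mapping from transaction identifier to metadata, and its use for re-pulling a batch, might look like the sketch below. Everything here is illustrative: a Python list stands in for the message queue, a dict stands in for the coordination management cluster, and the metadata layout (`offset`, `count`) is an assumption, since the patent does not specify what the metadata contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchMetadata:
    offset: int  # assumed: position of the first record of the batch
    count: int   # assumed: number of records in the batch


class FirstWorker:
    def __init__(self, queue, metadata_store):
        self.queue = queue                    # list standing in for the message queue
        self.metadata_store = metadata_store  # dict standing in for ZooKeeper
        self.next_offset = 0

    def pull(self, txid, batch_size):
        """Pull a batch and record its metadata under the transaction identifier."""
        data = self.queue[self.next_offset:self.next_offset + batch_size]
        self.metadata_store[txid] = BatchMetadata(self.next_offset, len(data))
        self.next_offset += len(data)
        return data

    def replay(self, txid):
        """Re-pull exactly the batch recorded for this transaction identifier."""
        meta = self.metadata_store[txid]
        return self.queue[meta.offset:meta.offset + meta.count]
```

With this layout, `replay(txid)` returns the same records that `pull(txid, ...)` returned, which is the property the one-to-one mapping is meant to provide.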
Step S408: The second working node processes the data and submits the processing result of the data to the database according to the transaction identifier.
Specifically, after the metadata of the data pulled by one or more first working nodes has been stored, the first working node sends the data to the second working node for processing, and the processing result is finally submitted to the database according to the transaction identifier.
The processing result includes a field for the transaction identifier. Comparing the transaction identifiers of the batches guarantees the order in which processing results are submitted, and also deduplicates submissions of processing results.
In one embodiment, each transaction identifier generated by the control node is incremented relative to the previously generated one. For example, if the previous transaction identifier generated by the control node is N, the next one is N+1. If the processing result of the data corresponding to transaction identifier N has not yet been submitted successfully, the control node does not let the working node submit the processing result of the data corresponding to transaction identifier N+1.
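As a rough sketch of this ordering rule, the control node can be modeled as issuing incrementing identifiers and allowing a commit only for the identifier immediately after the last committed one. The class and method names below are hypothetical, and starting identifiers at 1 is an assumption.

```python
class Coordinator:
    """Toy model of the control node's identifier issuing and commit gating."""

    def __init__(self):
        self.last_issued = 0     # assumption: identifiers start at 1
        self.last_committed = 0

    def begin(self):
        """Initiate a transaction and return its identifier (previous + 1)."""
        self.last_issued += 1
        return self.last_issued

    def may_commit(self, txid):
        """Transaction N+1 may commit only after transaction N has committed."""
        return txid == self.last_committed + 1

    def mark_committed(self, txid):
        if not self.may_commit(txid):
            raise ValueError(f"transaction {txid} cannot commit out of order")
        self.last_committed = txid
```

A transaction whose predecessor is still pending simply waits until `may_commit` returns true.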
In the above data processing method and system based on a real-time computing framework, the control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to the first working node according to the transaction identifier. The first working node pulls data from the specified data source according to the transaction command message, stores the metadata of the data in correspondence with the transaction identifier, and sends the data to the second working node. The second working node processes the data and submits the processing result to the database according to the transaction identifier, the processing result including a field for the transaction identifier. If data processing fails, the metadata can be retrieved by the transaction identifier and the corresponding data pulled again, so no data is lost. In addition, when submitting a processing result, the second working node only needs to compare transaction identifiers to determine whether that result has already been submitted, which avoids duplicate submission of processing results. Compared with traditional data processing methods, reliability is high.
As shown in Fig. 5, in one embodiment, the step in which the second working node submits the processing result of the data to the database according to the transaction identifier includes:
Step S502: The second working node checks whether the transaction identifier is greater than the transaction identifier contained in the most recently submitted processing result. If so, step S504 is performed; if not, step S506 is performed.
Specifically, each transaction identifier generated by the control node is incremented relative to the previously generated one. If the transaction identifier of the processing result to be submitted is N+1 and the transaction identifier contained in the most recently submitted processing result is N, the current processing result has not yet been submitted, and step S504 is performed directly to submit it. If the transaction identifier of the processing result to be submitted is N and the transaction identifier contained in the most recently submitted processing result is also N, the current processing result has already been submitted, and step S506 is performed directly to avoid a duplicate submission. In theory, the transaction identifier of a processing result to be submitted is never less than the transaction identifier contained in the most recently submitted processing result, because the internal mechanism of the real-time computing framework guarantees this.
Step S504: Submit the processing result of the data.
Step S506: Do not submit the processing result of the data.
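Steps S502 to S506 amount to a single comparison before the write. The sketch below shows that check against a dict standing in for the database; the function name `submit_result`, the `{"value", "txid"}` record layout, and the convention that identifiers start at 1 are assumptions for illustration.

```python
def submit_result(db, key, new_value, txid):
    """Submit a processing result only if its transaction identifier is newer.

    Returns True if the result was written (S504), False if skipped (S506).
    """
    stored = db.get(key)
    last_txid = stored["txid"] if stored else 0  # assumption: identifiers start at 1
    if txid > last_txid:                         # S502: compare identifiers
        db[key] = {"value": new_value, "txid": txid}  # S504: submit
        return True
    return False                                 # S506: already submitted, skip
```

Submitting the same transaction twice leaves the stored record unchanged on the second call, which is the deduplication effect described above.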
In one embodiment, the data processing method based on a real-time computing framework further includes the following steps: when a failure occurs while the processing result of the data is being submitted to the database, the second working node notifies the control node according to the transaction identifier; the control node then controls the first working node, according to the transaction identifier, to pull the data again from the specified data source.
Specifically, when the control node receives the fault message sent by the second working node, it obtains the corresponding metadata according to the transaction identifier, sends the metadata to the first working node, and then controls the first working node to pull that batch of data again from the specified data source according to the metadata.
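The failure path can be pictured as a small retry loop: look up the metadata by transaction identifier, re-pull the batch, and try again. The sketch below is illustrative only; the callables `pull_by_metadata` and `process_and_commit`, the use of `RuntimeError` as the failure signal, and the `max_attempts` cap are all assumptions not found in the patent.

```python
def run_with_replay(txid, metadata_store, pull_by_metadata, process_and_commit,
                    max_attempts=3):
    """Replay the batch for txid until it commits; return the attempts used."""
    metadata = metadata_store[txid]           # control node: metadata by transaction ID
    for attempt in range(1, max_attempts + 1):
        batch = pull_by_metadata(metadata)    # first working node: re-pull same batch
        try:
            process_and_commit(txid, batch)   # second working node: process and submit
            return attempt
        except RuntimeError:
            continue                          # failure reported: replay the batch
    raise RuntimeError(f"transaction {txid} failed after {max_attempts} attempts")
```

Because the batch is always rebuilt from the stored metadata, every attempt sees the same records.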
Fig. 6 is a detailed flow chart of the steps after the second working node submits the processing result of the data to the database according to the transaction identifier.
Step S602: The second working node notifies the control node, according to the transaction identifier, that the processing result was submitted successfully.
Specifically, after the second working node submits the processing result of the data to the database according to the transaction identifier, it notifies the control node that the processing result was submitted successfully, and also reports the corresponding transaction identifier to the control node.
Step S604: The control node obtains the corresponding metadata according to the transaction identifier.
Specifically, since metadata is always stored in correspondence with a transaction identifier, the control node can obtain the corresponding metadata directly from the transaction identifier.
Step S606: The control node sends a confirmation to the specified data source according to the metadata, so that the batch of data is removed from the data source.
Specifically, after obtaining the metadata, the control node sends a confirmation to the data source, indicating that the data corresponding to the metadata has been processed and successfully submitted to the database. The cursor that marks the current data position in the data source then advances, so that the next batch of data is pulled the next time the first working node pulls data according to a transaction command message.
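The cursor behaviour can be illustrated with a toy queue in which reading does not advance the cursor and only a confirmation does. This is a sketch under assumptions: the class `MessageQueue`, its `read` and `ack` methods, and the offset-based confirmation are invented for the example and do not correspond to any particular message queue product.

```python
class MessageQueue:
    """Toy data source whose cursor only advances on confirmation."""

    def __init__(self, records):
        self.records = list(records)
        self.cursor = 0  # index of the first unconfirmed record

    def read(self, count):
        """Return (offset, batch) starting at the cursor, without moving it."""
        return self.cursor, self.records[self.cursor:self.cursor + count]

    def ack(self, offset, count):
        """Confirm a batch; the cursor advances only if it starts at the cursor."""
        if offset == self.cursor:
            self.cursor += count
```

Until `ack` is called, repeated reads return the same batch, so an unconfirmed batch can always be replayed.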
The principle of the above data processing method based on a real-time computing framework is illustrated below through a concrete application scenario, using the computing framework in Fig. 3 as an example.
As shown in Fig. 3, the control node is the control center of the whole real-time computing framework, the first working node pulls data from the message queue, and the second working node processes the data pulled by the first working node. The control node is responsible for sending transaction command messages, controlling the first working node to pull data from the message queue, and confirming whether the data was processed successfully. After data processing succeeds, it sends a confirmation to the message queue; if data processing fails, the control node resends the transaction command message to re-read that batch of data. After the control node initiates a transaction, generates a new transaction identifier, and sends a transaction command message to one of the first working nodes according to the transaction identifier, that first working node pulls data from the message queue. If the first working node pulls data from the message queue successfully, it stores the metadata of the data in correspondence with the transaction identifier in the coordination management cluster.
After the metadata has been stored, the first working node sends the data to the second working node for processing. Second working nodes can process transactions in parallel and fall into two types. One type does not change external storage state; after processing the data, it aggregates the processing results into a second working node of the other type. Actions that update external storage state are generally performed in the other type of second working node. After the processing results have been aggregated, the control node issues a transaction commit command to the second working node. After successfully committing the processing result to the database, the second working node sends the control node a success message containing the transaction identifier. The control node obtains the metadata corresponding to the transaction identifier from the coordination management cluster and sends a confirmation to the message queue according to the metadata. The cursor that marks the current data position in the message queue then advances, so that the next batch of data is pulled the next time the first working node pulls data according to a transaction command message.
Further, the transaction identifier, the metadata, and the data are in one-to-one correspondence. If the second working node fails while processing data or while submitting data, the metadata corresponding to the data can be obtained from the coordination management cluster according to the transaction identifier, and that batch of data can then be pulled again from the message queue according to the metadata. If pulling the data fails, empty metadata is recorded.
To improve performance, data processing is divided into a processing stage and a confirmation stage. The processing stage can run in parallel, while the confirmation stage must commit transactions in order. Operations within a batch that do not update external storage state, such as filtering, counting, and merging, are placed in the processing stage. Because the processing stage is parallel, several batches of data can be processed at the same time. Operations that update external storage state must be placed in the confirmation stage. A transaction enters the confirmation stage only after its processing stage has completed and the previous transaction has been committed successfully. The transitions between transaction states are coordinated by the control node.
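A minimal sketch of the two stages, assuming a thread pool for the parallel processing stage and a list standing in for external storage: batches are processed concurrently, and results are then committed strictly in ascending transaction identifier order. The function names and the use of `sum` as the per-batch computation are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def process(batch):
    """Processing stage: pure computation that touches no external state."""
    return sum(batch)


def run(batches):
    """batches maps transaction identifier -> batch of numbers.

    Returns the commit log as a list of (txid, result) in commit order.
    """
    committed = []  # stands in for the external store
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Processing stage: all batches are submitted and may run in parallel.
        futures = {txid: pool.submit(process, b) for txid, b in batches.items()}
        # Confirmation stage: commit in ascending identifier order, waiting
        # for each transaction's processing to finish before committing it.
        for txid in sorted(futures):
            committed.append((txid, futures[txid].result()))
    return committed
```

Even if a later batch finishes processing first, its commit waits behind all earlier transactions.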
Whether in the processing stage or in the confirmation stage, if a transaction fails (for example, writing to the database fails), the second working node sends a fault message to notify the control node that processing of that batch of data has failed. The control node then enters the exception handling flow: it obtains the metadata from the coordination management cluster, replays the batch of data according to the metadata, and continues with subsequent processing until the transaction is committed successfully.
A distributed system also commonly has a "timeout" state, in which it is not known whether the corresponding processing succeeded. The processing may have succeeded while the confirmation message in the reply was merely delayed, causing a timeout; or the machine or process may have crashed, so that no processing took place at all. In one embodiment, a timeout is treated as a failure: if no explicit message that the transaction was processed successfully is received within the configured timeout, the transaction is handled as failed, and the data is replayed in the same way.
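Treating a timeout as a failure can be expressed as waiting on a reply channel with a deadline. In the sketch below, a `queue.Queue` stands in for the channel on which the second working node reports back; the function name `wait_for_commit` and the `(txid, ok)` message shape are assumptions for illustration.

```python
import queue


def wait_for_commit(acks, txid, timeout_s):
    """Return True only on an explicit success message for txid.

    A timeout, a failure message, or a message for another transaction
    are all treated as failure, so the caller replays the batch.
    """
    try:
        got_txid, ok = acks.get(timeout=timeout_s)
    except queue.Empty:
        return False  # no explicit success within the timeout: treat as failed
    return ok and got_txid == txid
```

Since a timed-out transaction may in fact have committed, the replay that follows relies on the transaction identifier check at submission to avoid a duplicate write.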
Whether the failure is reported by a fault message or caused by a timeout, replaying data may produce duplicate data if the data processing is not idempotent. In one embodiment, the database is a Key-Value distributed storage system.
When designing the Key-Value format, a transaction ID field must be added to the Value. This field records the transaction ID of the most recently submitted processing result, and it must be updated each time a processing result is submitted.
For example, suppose the first working node pulls a batch of advertisement data from the specified data source, and the advertisement data includes advertisement item numbers. The control node generates a unique corresponding transaction identifier for each batch of advertisement data, so that results are not submitted twice during processing. If the computing framework needs to compute the click count of each advertisement, the result stored in the database uses the advertisement item number as the Key and the click count as the Value, and the Value also includes the transaction identifier field. The advertisement item number makes it easy to retrieve the corresponding click count from the database.
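The advertisement example can be sketched as follows: click counting is not idempotent, so the transaction identifier stored in each Value is what prevents a replayed batch from being counted twice. A dict stands in for the Key-Value store, and the function name `commit_clicks` and the `{"clicks", "txid"}` layout are assumptions for illustration.

```python
from collections import Counter


def commit_clicks(db, batch, txid):
    """Add the clicks in a batch to the stored counts, at most once per txid.

    batch is an iterable of advertisement item numbers, one entry per click.
    """
    for ad_id, clicks in Counter(batch).items():
        value = db.get(ad_id, {"clicks": 0, "txid": 0})
        if txid > value["txid"]:  # skip keys already updated by this transaction
            db[ad_id] = {"clicks": value["clicks"] + clicks, "txid": txid}
```

Replaying the same batch under the same transaction identifier leaves every count unchanged, while a later transaction increments normally.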
As shown in Fig. 7, in one embodiment, a data processing system 700 based on a real-time computing framework is provided, which implements the data processing method based on a real-time computing framework of the above embodiments. The data processing system 700 includes a control node 702, a first working node 704, a second working node 706, and a database 708.
The control node 702 is configured to initiate a transaction, generate a corresponding transaction identifier, and send a transaction command message according to the transaction identifier.
The first working node 704 is configured to receive the transaction command message sent by the control node 702 and pull a batch of data from a specified data source according to the transaction command message.
The first working node 704 is further configured to store the metadata of the data in correspondence with the transaction identifier.
The second working node 706 is configured to receive the data pulled by the first working node 704, process the data, and submit the processing result of the data to the database 708 according to the transaction identifier, the processing result including a field for the transaction identifier.
The database 708 is configured to store the processing result of the data submitted by the second working node 706.
In one embodiment, it is between Transaction Identifier and the Transaction Identifier of last generation corresponding to control node generation It is incremented by relation.
Second working node 706 is used to detect whether Transaction Identifier is more than what the last result submitted was included Transaction Identifier, if so, the result of data then is committed into database 708, otherwise, the result of data is not submitted to Database 708.
In one embodiment, the second working node 706 is further configured to notify the control node 702 according to the transaction identifier when a failure occurs while the processing result of the data is being submitted to the database 708.
The control node 702 is further configured to control the first working node 704 to pull the data from the specified data source again according to the transaction identifier.
Specifically, the control node 702 is configured to obtain the corresponding metadata according to the transaction identifier, send the metadata to the first working node 704, and control the first working node 704 to pull the data from the specified data source again according to the metadata.
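The failure path described above can be sketched as a retry loop driven by the stored metadata. The sketch makes several assumptions that are not in the patent: metadata is an offset and a size, a submission failure surfaces as `ConnectionError`, processing is a sum, and the retry limit of three is arbitrary.

```python
def pull(source: list, metadata: dict) -> list:
    """Pull exactly the batch described by the metadata."""
    start = metadata["offset"]
    return source[start:start + metadata["size"]]


def run_transaction(txn_id, source, metadata_store, submit, max_attempts=3):
    """Control-node loop: on a reported failure, replay the batch by metadata."""
    for attempt in range(1, max_attempts + 1):
        metadata = metadata_store[txn_id]  # looked up by transaction identifier
        batch = pull(source, metadata)     # first working node re-pulls the data
        try:
            submit(txn_id, sum(batch))     # second working node submits the result
            return attempt
        except ConnectionError:
            continue                       # failure notification: try again
    raise RuntimeError(f"transaction {txn_id} failed after {max_attempts} attempts")


source = [10, 20, 30, 40]
metadata_store = {7: {"offset": 2, "size": 2}}
database = []
failures = iter([True, False])  # the first submission fails, the second succeeds


def flaky_submit(txn_id, result):
    if next(failures):
        raise ConnectionError("simulated database failure")
    database.append({"txn_id": txn_id, "result": result})


attempts = run_transaction(7, source, metadata_store, flaky_submit)
```

Because the re-pull is driven by metadata rather than by the current read position, the retry processes the same batch that failed, so no data is lost.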
In one embodiment, the second working node 706 is further configured to, after submitting the processing result of the data to the database 708 according to the transaction identifier, notify the control node 702 according to the transaction identifier that the processing result was submitted successfully.
The control node 702 is further configured to obtain the corresponding metadata according to the transaction identifier and to send confirmation information to the specified data source according to the metadata, so that the next batch of data can be pulled from the data source.
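The success path can be sketched in the same spirit. The `DataSource` class below is hypothetical: it assumes a source that keeps serving the same batch until that batch is confirmed, with the confirmation expressed through offset and size metadata. Real data sources may expose confirmation differently.

```python
class DataSource:
    """Hypothetical source that re-serves a batch until it is acknowledged."""

    def __init__(self, records, batch_size):
        self.records = records
        self.batch_size = batch_size
        self.committed_offset = 0  # everything before this has been acknowledged

    def pull(self):
        start = self.committed_offset
        batch = self.records[start:start + self.batch_size]
        return batch, {"offset": start, "size": len(batch)}

    def acknowledge(self, metadata):
        # Confirmation moves the read position past the acknowledged batch.
        self.committed_offset = metadata["offset"] + metadata["size"]


source = DataSource(["a", "b", "c", "d"], batch_size=2)
metadata_store = {}

batch_1, metadata_store[1] = source.pull()
unacked_again, _ = source.pull()       # no confirmation yet: same batch again
source.acknowledge(metadata_store[1])  # control node confirms transaction 1
batch_2, metadata_store[2] = source.pull()
```

Deferring the confirmation until the result has been submitted means the source only advances past data whose processing result is already in the database.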
Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), etc.
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (8)

1. A data processing method based on a real-time computing framework, the method comprising:
a control node initiating a transaction and generating a corresponding transaction identifier, and sending a transaction command message to a first working node according to the transaction identifier, wherein the control node is the control center of the entire computing framework and transitions of the transaction state are coordinated by the control node;
the first working node pulling a batch of data from a specified data source according to the transaction command message;
the first working node storing metadata of the data in association with the transaction identifier and sending the data to a second working node, wherein the metadata is information describing attributes of the data, such that a one-to-one mapping exists among the data, the metadata, and the transaction identifier;
the second working node processing the data and submitting the processing result of the data to a database according to the transaction identifier, the processing result including a field for the transaction identifier, wherein whether the processing result of the corresponding data has already been submitted is determined according to the transaction identifier; if it has been submitted, the processing result of the data is not submitted; otherwise, the processing result of the data is submitted;
when a failure occurs while the processing result of the data is being submitted to the database, the second working node notifying the control node according to the transaction identifier; and
the control node controlling the first working node to pull the data from the specified data source again according to the transaction identifier.
2. The method according to claim 1, wherein, in the step of generating the corresponding transaction identifier, the generated transaction identifier is incremented relative to the previously generated transaction identifier;
the step of the second working node submitting the processing result of the data to the database according to the transaction identifier comprises:
the second working node detecting whether the transaction identifier is greater than the transaction identifier contained in the most recently submitted processing result; if so, submitting the processing result of the data; otherwise, not submitting the processing result of the data.
3. The method according to claim 1, wherein the step of the control node controlling the first working node to pull the data from the specified data source again according to the transaction identifier comprises:
the control node obtaining the corresponding metadata according to the transaction identifier and sending the metadata to the first working node; and
the first working node pulling the data from the specified data source again according to the metadata.
4. The method according to claim 3, wherein, after the step of the second working node submitting the processing result of the data to the database according to the transaction identifier, the method further comprises:
the second working node notifying the control node, according to the transaction identifier, that the processing result was submitted successfully;
the control node obtaining the corresponding metadata according to the transaction identifier; and
the control node sending confirmation information to the specified data source according to the metadata, so that the next batch of data can be pulled from the data source.
5. A data processing system based on a real-time computing framework, the system comprising:
a control node, configured to initiate a transaction, generate a corresponding transaction identifier, and send a transaction command message according to the transaction identifier, wherein the control node is the control center of the entire computing framework and transitions of the transaction state are coordinated by the control node;
a first working node, configured to receive the transaction command message sent by the control node and to pull a batch of data from a specified data source according to the transaction command message;
the first working node being further configured to store the metadata corresponding to the data in association with the transaction identifier, wherein the metadata is information describing attributes of the data, such that a one-to-one mapping exists among the data, the metadata, and the transaction identifier;
a second working node, configured to receive the data pulled by the first working node, process the data, and submit the processing result of the data to a database according to the transaction identifier, the processing result including a field for the transaction identifier, wherein whether the processing result of the corresponding data has already been submitted is determined according to the transaction identifier; if it has been submitted, the processing result of the data is not submitted; otherwise, the processing result of the data is submitted;
the second working node being further configured to notify the control node according to the transaction identifier when a failure occurs while the processing result of the data is being submitted to the database;
the control node being further configured to control the first working node to pull the data from the specified data source again according to the transaction identifier; and
the database, configured to store the processing results of the data submitted by the second working node.
6. The system according to claim 5, wherein the transaction identifier generated by the control node is incremented relative to the previously generated transaction identifier;
the second working node is configured to detect whether the transaction identifier is greater than the transaction identifier contained in the most recently submitted processing result; if so, to submit the processing result of the data to the database; otherwise, not to submit the processing result of the data to the database.
7. The system according to claim 5, wherein the control node is configured to obtain the corresponding metadata according to the transaction identifier, send the metadata to the first working node, and control the first working node to pull the data from the specified data source again according to the metadata.
8. The system according to claim 7, wherein the second working node is further configured to, after submitting the processing result of the data to the database according to the transaction identifier, notify the control node according to the transaction identifier that the processing result was submitted successfully;
the control node is further configured to obtain the corresponding metadata according to the transaction identifier and to send confirmation information to the specified data source according to the metadata, so that the next batch of data can be pulled from the data source.
CN201510338373.4A 2015-06-17 2015-06-17 Data processing method and system based on real-time Computational frame Active CN104951306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510338373.4A CN104951306B (en) 2015-06-17 2015-06-17 Data processing method and system based on real-time Computational frame

Publications (2)

Publication Number Publication Date
CN104951306A CN104951306A (en) 2015-09-30
CN104951306B true CN104951306B (en) 2018-03-20

Family

ID=54165977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510338373.4A Active CN104951306B (en) 2015-06-17 2015-06-17 Data processing method and system based on real-time Computational frame

Country Status (1)

Country Link
CN (1) CN104951306B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107870982B (en) * 2017-10-02 2021-04-23 深圳前海微众银行股份有限公司 Data processing method, system and computer readable storage medium
CN108009849B (en) * 2017-11-30 2021-12-17 北京小度互娱科技有限公司 Method and device for generating account state
CN110045912B (en) * 2018-01-16 2021-06-01 华为技术有限公司 Data processing method and device
CN109144761A (en) * 2018-07-12 2019-01-04 北京猫眼文化传媒有限公司 A kind of data fault processing method and system
CN110955509A (en) * 2019-12-11 2020-04-03 深圳迅策科技有限公司 Finance concurrent transaction processing apparatus
CN111988217B (en) * 2020-08-31 2022-09-23 Oppo广东移动通信有限公司 Data interaction method and device, electronic equipment and storage medium
CN113821407B (en) * 2021-09-15 2023-08-01 浙江浙大网新软件产业集团有限公司 Storm distributed real-time computing method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103235747A (en) * 2013-04-24 2013-08-07 曙光信息产业(北京)有限公司 Method and system for recovering metadata
CN104408552A (en) * 2014-11-13 2015-03-11 华为技术有限公司 Method, device and system for cooperatively processing task

Also Published As

Publication number Publication date
CN104951306A (en) 2015-09-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant