CN104951306B - Data processing method and system based on a real-time computing framework - Google Patents
- Publication number
- CN104951306B CN201510338373.4A
- Authority
- CN
- China
- Prior art keywords
- data
- transaction identifier
- result
- working node
- node
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data processing method and system based on a real-time computing framework. The method includes: a control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to a first working node according to the transaction identifier; the first working node pulls a batch of data from a specified data source according to the transaction command message; the first working node stores the metadata of the data in association with the transaction identifier and sends the data to a second working node; the second working node processes the data and submits the processing result of the data to a database according to the transaction identifier, the processing result including a transaction identifier field. Compared with traditional data processing methods, the invention ensures that data is neither lost nor processed repeatedly, and reliability is high.
Description
Technical field
The present invention relates to the field of network data processing, and more particularly to a data processing method and system based on a real-time computing framework.
Background
With the rapid development of Internet technology, the demand for data processing keeps growing. The value of data decays over time: if data can be collected, transmitted, and processed in real time, and then fed back positively into the online system (for example, by modifying the model parameters of the online system), its value can be maximized. Technologies for computing on data in real time have emerged to meet this need.
Most existing real-time computing systems are designed for high throughput and low latency; they provide no guarantee of data reliability, or only a weak one. In some business scenarios, however, highly important data must be processed in real time, such as fee-deduction data that requires real-time accounting, or data used to update system model parameters in real time. Traditional real-time computing systems therefore cannot meet the demand for reliable real-time data processing.
In traditional real-time computing frameworks, reliable real-time processing usually only guarantees that data is not lost; the data may still be processed more than once. For some data, repeated processing does not affect the result. For example, when a communication user's avatar information is updated in a Key-Value system, with the communication number as the key and the avatar information as the value, the Key-Value entry is unaffected by repeated updates as long as each update overwrites the previous value. For computations such as reports, however, data must not be processed repeatedly, or the result will be affected.
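The difference between the two kinds of update can be illustrated with a minimal sketch. The dictionaries and function names below are hypothetical stand-ins for a Key-Value system and are not taken from the patent; the loop simply replays every record twice, as a framework that only guarantees "no data loss" might.

```python
# Hypothetical in-memory stand-ins for a Key-Value system.
avatars = {}
click_counts = {}

def set_avatar(user_number, avatar):
    # Overwrite update: applying the same record twice leaves the same state.
    avatars[user_number] = avatar

def add_clicks(ad_id, clicks):
    # Accumulating update: applying the same record twice double-counts.
    click_counts[ad_id] = click_counts.get(ad_id, 0) + clicks

# Replay each record twice to simulate repeated processing.
for _ in range(2):
    set_avatar("10001", "avatar_v2.png")
    add_clicks("ad-7", 5)

print(avatars["10001"])      # avatar_v2.png
print(click_counts["ad-7"])  # 10, although only 5 clicks occurred
```

The overwrite is idempotent, while the accumulating count is not, which is the case the method described here is meant to protect.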
Summary of the invention
In view of this, it is necessary to address the above technical problem by providing a data processing method and system based on a real-time computing framework that ensure data is neither lost nor processed repeatedly.
A data processing method based on a real-time computing framework, the method including:
A control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to a first working node according to the transaction identifier;
The first working node pulls a batch of data from a specified data source according to the transaction command message;
The first working node stores the metadata of the data in association with the transaction identifier, and sends the data to a second working node;
The second working node processes the data and submits the processing result of the data to a database according to the transaction identifier, the processing result including a transaction identifier field.
A data processing system based on a real-time computing framework, the system including:
A control node, configured to initiate a transaction, generate a corresponding transaction identifier, and send a transaction command message according to the transaction identifier;
A first working node, configured to receive the transaction command message sent by the control node and pull a batch of data from a specified data source according to the transaction command message;
The first working node is further configured to store the metadata of the data in association with the transaction identifier;
A second working node, configured to receive the data pulled by the first working node, process the data, and submit the processing result of the data to a database according to the transaction identifier, the processing result including a transaction identifier field;
A database, configured to store the processing result of the data submitted by the second working node.
In the above data processing method and system based on a real-time computing framework, the control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to the first working node according to the transaction identifier. The first working node pulls data from the specified data source according to the transaction command message, stores the metadata of the data in association with the transaction identifier, and sends the data to the second working node. The second working node processes the data and submits the processing result to the database according to the transaction identifier, the processing result including a transaction identifier field. If data processing fails, the metadata can be retrieved by the transaction identifier and the corresponding data pulled again, so no data is lost. In addition, when submitting the processing result, the second working node only needs to compare transaction identifiers to determine whether the result has already been submitted, which avoids repeated submission. Compared with traditional data processing methods, reliability is high.
Brief description of the drawings
Fig. 1 is a diagram of the application environment in which the data processing method based on a real-time computing framework is implemented in one embodiment;
Fig. 2 is a block diagram of a Storm cluster in one embodiment;
Fig. 3 is a diagram of the internal structure of the computing framework in one embodiment;
Fig. 4 is a flowchart of the data processing method based on a real-time computing framework in one embodiment;
Fig. 5 is a detailed flowchart of the second working node submitting the processing result of the data to the database according to the transaction identifier in one embodiment;
Fig. 6 is a detailed flowchart of the steps performed after the second working node submits the processing result of the data to the database according to the transaction identifier;
Fig. 7 is a structural block diagram of the data processing system based on a real-time computing framework in one embodiment.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
It will be understood that terms such as "first" and "second" are used here to describe various elements, but the elements are not limited by these terms. The terms only distinguish one element from another.
Fig. 1 shows the application environment in which the data processing method based on a real-time computing framework is implemented in one embodiment. The application environment includes a service server 102, a message queue 104, a computing framework 106, and a database 108. The service server 102 may be a computer that directly produces business data, or an intermediary server that forwards business data. The business data may be various kinds of request data, advertisement refresh data, and so on.
When receiving business data from the service server 102, the message queue 104 can convert the business data into a distributed message queue. The computing framework 106 pulls data from the message queue 104 for processing and updates the processing result into the database 108. Once the database 108 notifies the computing framework 106 that the processing result has been stored successfully, the data is persisted and will not be lost.
The computing framework 106 processes data batch by batch and assigns a unique transaction identifier to each batch; each transaction identifier is incremented relative to the previously generated one. The metadata of the data is stored together with the corresponding transaction identifier, and the metadata can be used to re-read that batch from the message queue 104. The computing framework 106 ensures that processing results are committed to the database 108 in order. In one embodiment, the database 108 is a Key-Value distributed storage system.
In one embodiment, the computing framework 106 in Fig. 1 is a Storm-based cluster. A block diagram of the Storm cluster is shown in Fig. 2.
Storm is a real-time computation system open-sourced by Twitter. It is real-time, distributed, and highly fault-tolerant. Storm is essentially a message processing network made up of processors and message queues. It provides a set of real-time computation primitives that free developers from tasks such as complex message queue maintenance, fault detection, and cluster management, so that they can focus on developing business functions.
A Storm cluster includes one control node 202 and multiple working nodes 206. The control node 202 and the working nodes 206 are coordinated through a coordination management cluster (ZooKeeper) 204.
Specifically, the control node 202 runs a daemon responsible for code distribution, task assignment, monitoring the state of the working nodes 206, and fault handling. Each working node 206 runs a daemon that listens for tasks sent by the control node 202 and starts or stops worker processes accordingly. The coordination management cluster 204 is responsible for coordination between the working nodes 206 and the control node 202. The state of the working nodes 206 and the control node 202 is stored in the coordination management cluster 204, so that a process can be restarted quickly if it dies unexpectedly.
In one embodiment, the internal structure of the computing framework in Fig. 1 is shown in Fig. 3. The computing framework includes a control node, a coordination management cluster, first working nodes, and second working nodes. The control node is the control center of the whole computing framework. A first working node pulls data from a specified data source (such as a database, a file, or a log system) and sends it to a second working node for processing. A second working node can do many kinds of work, such as filtering, aggregation, and accessing files or databases. A second working node receives data from a first working node and processes it; for complex processing, it may send its own result to another second working node for further processing. One second working node can forward data to multiple second working nodes, and can receive data from multiple first working nodes or second working nodes. A degree of parallelism is set for each first working node and second working node; when processing capacity is insufficient, it can be scaled by raising the degree of parallelism.
Fig. 4 is a flowchart of the data processing method based on a real-time computing framework in one embodiment. The method in Fig. 4 is illustrated as running on the computing framework in Fig. 3. The method includes the following steps:
Step S402: The control node initiates a transaction, generates a corresponding transaction identifier, and sends a transaction command message to a first working node according to the transaction identifier.
Specifically, all data received from the service server must be processed by the computing framework and updated into the database, and the control node must initiate a transaction before the data can be processed. When initiating a transaction, the control node generates a corresponding transaction identifier and then sends a transaction command message to a first working node according to the transaction identifier.
There are multiple first working nodes. The control node can compute a hash value using the transaction identifier as the key and assign transaction identifiers with the same hash value to the same first working node. This ensures that even if the same transaction command message is sent several times, it always reaches the same first working node.
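The routing rule can be sketched as follows. The patent does not name a hash function or a node-selection formula, so the SHA-256 digest, the modulo step, and the name `pick_first_working_node` are assumptions made only for illustration.

```python
import hashlib

def pick_first_working_node(transaction_id, nodes):
    # Stable hash of the transaction identifier. hashlib is used because
    # Python's built-in hash() of a string can differ between processes.
    digest = hashlib.sha256(str(transaction_id).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

nodes = ["first-node-0", "first-node-1", "first-node-2"]
first = pick_first_working_node(42, nodes)
again = pick_first_working_node(42, nodes)  # resend of the same command
print(first == again)  # True
```

Because the choice depends only on the transaction identifier and the node list, a resent command for the same transaction lands on the same first working node as long as the node list is unchanged.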
Step S404: The first working node pulls a batch of data from the specified data source according to the transaction command message.
Specifically, after the control node sends the transaction command message, the first working node pulls a batch of data from the specified data source. In one embodiment, the specified data source is a message queue containing various kinds of business data.
Because the data source is a distributed message queue, the first working node only needs to connect to the data source to pull data; no cumbersome restart or insertion operations are required, which makes pulling data more flexible.
Step S406: The first working node stores the metadata of the data in association with the transaction identifier, and sends the data to the second working node.
Specifically, each batch of data has corresponding metadata. Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes the properties of the data and supports functions such as indicating storage location, historical data, resource lookup, and file records.
Storing the metadata of the data in association with the transaction identifier establishes a one-to-one mapping among the data, the metadata, and the transaction identifier. The metadata can then be retrieved by the transaction identifier, and the corresponding data can be pulled again from the specified data source according to the metadata.
In one embodiment, the metadata of the data is stored, in association with the transaction identifier, in the coordination management cluster (ZooKeeper).
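A minimal sketch of this mapping is shown below. The patent does not specify what the metadata contains, so the sketch assumes it is an offset and a size within a list-backed queue; `MetadataStore`, `pull_batch`, and `repull` are hypothetical names, and the dictionary merely stands in for the coordination management cluster.

```python
class MetadataStore:
    """Hypothetical stand-in for the coordination management cluster."""

    def __init__(self):
        self._by_txid = {}

    def save(self, txid, metadata):
        self._by_txid[txid] = metadata

    def load(self, txid):
        return self._by_txid[txid]

def pull_batch(queue, offset, size):
    # Return the batch plus metadata sufficient to read the same batch again.
    return queue[offset:offset + size], {"offset": offset, "size": size}

def repull(queue, metadata):
    start = metadata["offset"]
    return queue[start:start + metadata["size"]]

queue = ["m0", "m1", "m2", "m3", "m4"]
store = MetadataStore()
batch, meta = pull_batch(queue, offset=2, size=2)
store.save(7, meta)                       # metadata stored under transaction 7
replayed = repull(queue, store.load(7))   # same batch recovered by identifier
print(replayed == batch)  # True
```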
Step S408: The second working node processes the data and submits the processing result of the data to the database according to the transaction identifier.
Specifically, after the metadata of the data pulled by one or more first working nodes has been stored, the first working node sends the data to the second working node for processing, and the processing result is finally submitted to the database according to the transaction identifier.
The processing result includes a transaction identifier field. Comparing the transaction identifiers of the batches guarantees the order in which processing results are submitted, and also deduplicates submissions of processing results.
In one embodiment, each transaction identifier generated by the control node is incremented relative to the previously generated one. For example, if the previously generated transaction identifier is N, the next one is N+1. If the processing result of the data corresponding to transaction identifier N has not yet been submitted successfully, the control node will not direct the working node to submit the processing result of the data corresponding to transaction identifier N+1.
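The ordering rule can be expressed as a small gate on the control node. This is only a sketch under the assumption that identifiers increase by exactly one; the class `CommitGate` and its methods are hypothetical and do not come from the patent or from Storm.

```python
class CommitGate:
    """Hypothetical control-node helper that releases commits in identifier order."""

    def __init__(self, last_committed=0):
        self.last_committed = last_committed

    def may_commit(self, txid):
        # Transaction N+1 may commit only after transaction N has committed.
        return txid == self.last_committed + 1

    def mark_committed(self, txid):
        if not self.may_commit(txid):
            raise ValueError(f"transaction {txid} is out of order")
        self.last_committed = txid

gate = CommitGate(last_committed=4)
blocked = gate.may_commit(6)   # transaction 5 has not committed yet
gate.mark_committed(5)
allowed = gate.may_commit(6)
print(blocked, allowed)  # False True
```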
As shown in Fig. 5, in one embodiment, the step in which the second working node submits the processing result of the data to the database according to the transaction identifier includes:
Step S502: The second working node checks whether the transaction identifier is greater than the transaction identifier contained in the most recently submitted processing result. If so, step S504 is performed; if not, step S506 is performed.
Specifically, each transaction identifier generated by the control node is incremented relative to the previously generated one. If the transaction identifier of the processing result to be submitted is N+1 and the transaction identifier contained in the most recently submitted processing result is N, the current processing result has not yet been submitted, and step S504 is performed to submit it. If the transaction identifier of the processing result to be submitted is N and the transaction identifier contained in the most recently submitted processing result is also N, the current processing result has already been submitted, and step S506 is performed to avoid a repeated submission. In theory, the transaction identifier of a processing result to be submitted is never less than the one contained in the most recently submitted processing result, because this is guaranteed by the internal mechanism of the real-time computing framework.
Step S504: Submit the processing result of the data.
Step S506: Do not submit the processing result of the data.
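Steps S502 to S506 amount to a single comparison before the write. The sketch below assumes the database can be modeled as a dictionary holding the last submitted result and its transaction identifier; the function name `submit_if_new` and the dictionary layout are illustrative only.

```python
def submit_if_new(database, txid, result):
    """Return True if the result was written (S504), False if skipped (S506)."""
    if txid > database["last_txid"]:      # S502: compare with last submitted identifier
        database["result"] = result
        database["last_txid"] = txid      # identifier is stored together with the result
        return True
    return False

db = {"last_txid": 10, "result": None}
first_try = submit_if_new(db, 11, "batch-11")
retry = submit_if_new(db, 11, "batch-11")   # replay of the same transaction
print(first_try, retry)  # True False
```

In a real store the comparison and the write would need to be applied atomically; the sketch leaves that aspect out.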
In one embodiment, the data processing method based on a real-time computing framework further includes the following steps: when a failure occurs while the processing result of the data is being submitted to the database, the second working node notifies the control node according to the transaction identifier; the control node then directs the first working node to pull the data again from the specified data source according to the transaction identifier.
Specifically, when the control node receives the failure message sent by the second working node, it retrieves the corresponding metadata by the transaction identifier, passes the metadata to the first working node, and directs the first working node to pull that batch of data again from the specified data source according to the metadata.
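The failure path can be sketched as a replay loop. The assumptions here are that metadata is an offset and size into a list-backed queue, that a failed submission surfaces as an `OSError`, and that retries are capped by a hypothetical `max_attempts` parameter; none of these details are stated in the patent.

```python
def run_transaction(txid, metadata_store, queue, process_and_submit, max_attempts=3):
    """Replay the batch for `txid` until submission succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        meta = metadata_store[txid]                  # control node looks up the metadata
        start = meta["offset"]
        batch = queue[start:start + meta["size"]]    # first working node pulls the batch again
        try:
            process_and_submit(txid, batch)          # second working node processes and submits
            return attempt
        except OSError:
            continue                                 # failure reported: replay the same batch
    raise RuntimeError(f"transaction {txid} failed after {max_attempts} attempts")

calls = []

def flaky_submit(txid, batch):
    # Simulated second working node whose first database write fails.
    calls.append(list(batch))
    if len(calls) == 1:
        raise OSError("simulated database write failure")

attempts = run_transaction(3, {3: {"offset": 0, "size": 2}}, ["a", "b", "c"], flaky_submit)
print(attempts)  # 2
```

Both attempts see the identical batch, which is what makes the identifier comparison at submission time sufficient for deduplication.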
Fig. 6 is a detailed flowchart of the steps performed after the second working node submits the processing result of the data to the database according to the transaction identifier.
Step S602: The second working node notifies the control node, according to the transaction identifier, that the processing result was submitted successfully.
Specifically, after submitting the processing result of the data to the database according to the transaction identifier, the second working node notifies the control node that the submission succeeded, and also tells the control node the corresponding transaction identifier.
Step S604: The control node obtains the corresponding metadata according to the transaction identifier.
Specifically, because all metadata is stored in association with its transaction identifier, the control node can obtain the corresponding metadata directly by the transaction identifier.
Step S606: The control node sends a confirmation to the specified data source according to the metadata, so that the next batch of data can be pulled from the data source.
Specifically, after obtaining the metadata, the control node sends a confirmation to the data source, indicating that the data corresponding to the metadata has been processed and successfully submitted to the database. The cursor that the data source uses to mark the current data then moves forward, so that the next time the first working node pulls data according to a transaction command message, it pulls the next batch.
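The cursor behavior can be sketched with a toy queue whose read position advances only on confirmation. The class `MessageQueue`, its `pull` and `ack` methods, and the offset-and-size metadata are hypothetical and do not describe any particular message queue product.

```python
class MessageQueue:
    """Hypothetical queue whose cursor moves only when a batch is confirmed."""

    def __init__(self, messages):
        self.messages = messages
        self.cursor = 0

    def pull(self, size):
        batch = self.messages[self.cursor:self.cursor + size]
        return batch, {"offset": self.cursor, "size": len(batch)}

    def ack(self, metadata):
        # Confirmation from the control node: move past the confirmed batch.
        self.cursor = metadata["offset"] + metadata["size"]

mq = MessageQueue(["m0", "m1", "m2", "m3"])
batch1, meta1 = mq.pull(2)
same_batch, _ = mq.pull(2)   # no confirmation yet, so the same batch is returned
mq.ack(meta1)
batch2, _ = mq.pull(2)       # after confirmation the next batch is returned
print(batch1, batch2)
```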
The principle of the above data processing method based on a real-time computing framework is illustrated below through a concrete application scenario, using the computing framework in Fig. 3 as an example.
As shown in Fig. 3, the control node is the control center of the whole real-time computing framework, the first working node pulls data from the message queue, and the second working node processes the data pulled by the first working node. The control node is responsible for sending transaction command messages, directing the first working node to pull data from the message queue, and confirming whether the data has been processed successfully. After successful processing, it sends a confirmation to the message queue; if processing fails, it resends the transaction command message to read the batch back. The control node initiates a transaction and generates a new transaction identifier; after it sends a transaction command message to one of the first working nodes according to the transaction identifier, that first working node pulls data from the message queue. If the pull succeeds, the metadata of the data is stored in the coordination management cluster in association with the transaction identifier.
After storing the metadata, the first working node sends the data to the second working node for processing. Second working nodes can process transactions in parallel and fall into two classes. One class does not change external storage state; after processing the data, it aggregates the processing results into second working nodes of the other class. Actions that update external storage state are generally performed in this other class of second working node. After the processing results have been aggregated, the control node issues a transaction commit command to the second working node. After successfully committing the processing result to the database, the second working node returns a success message containing the transaction identifier to the control node. The control node obtains the metadata corresponding to that transaction identifier from the coordination management cluster and sends a confirmation to the message queue according to the metadata. The cursor that the message queue uses to mark the current data then moves forward, so that the next time the first working node pulls data according to a transaction command message, it pulls the next batch.
Further, transaction identifiers, metadata, and data are in one-to-one correspondence. If the second working node fails while processing the data or while submitting the data, the metadata corresponding to the data can be obtained from the coordination management cluster by the transaction identifier, and the batch can then be pulled again from the message queue according to the metadata. If pulling the data fails, empty metadata is recorded.
To improve performance, data processing is divided into a processing stage and a confirmation stage. The processing stage can run in parallel, while the confirmation stage must commit transactions in order. Operations within a batch that do not update external storage state, such as filtering, counting, and merging, are placed in the processing stage; because this stage is parallel, multiple batches can be processed at the same time. Operations that update external storage state must be placed in the confirmation stage. A transaction enters the confirmation stage only after its processing stage has completed and the previous transaction has been committed successfully. The control node coordinates the transitions between transaction states.
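The two stages can be sketched with a thread pool: batches are processed concurrently, but results are committed strictly in identifier order. The use of `concurrent.futures`, the `process` function, and the `committed` list standing in for external storage are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def process(batch):
    # Processing stage: pure computation with no external state updates.
    return sum(batch)

batches = {1: [1, 2], 2: [3, 4], 3: [5]}   # transaction identifier -> batch
committed = []                             # stands in for external storage

with ThreadPoolExecutor(max_workers=3) as pool:
    # All batches enter the processing stage at once.
    futures = {txid: pool.submit(process, batch) for txid, batch in batches.items()}
    # Confirmation stage: commit in ascending identifier order, waiting for
    # each transaction's processing to finish before committing it.
    for txid in sorted(futures):
        committed.append((txid, futures[txid].result()))

print(committed)  # [(1, 3), (2, 7), (3, 5)]
```

Even if a later batch finishes processing first, its commit waits until every earlier transaction has been committed.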
Whether in the processing stage or in the confirmation stage, if a transaction fails (for example, a database write fails), the second working node sends a failure message to the control node to report that processing of the batch failed. The control node then enters the exception handling flow: it obtains the metadata from the coordination management cluster and replays the batch according to the metadata, and subsequent processing continues until the transaction is committed successfully.
In a distributed system there is usually also a "timeout" state, in which it is unknown whether the corresponding processing succeeded. The processing may have succeeded while the acknowledgement in the reply packet was delayed, causing the timeout; or the machine or process may have crashed so that no processing took place at all. In one embodiment, a timeout is handled as a failure: if no clear success message for the transaction is received within the configured timeout, the transaction is treated as failed, and the data is likewise replayed.
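Treating a timeout as a failure can be sketched as follows. The helper `transaction_succeeded`, the thread-pool wrapper, and the timeout values are hypothetical; the point is only that an unknown outcome is reported the same way as a known failure, so the caller replays the batch in both cases.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def transaction_succeeded(submit, timeout_seconds):
    """Return True only if `submit` clearly reports success within the timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(submit)
    try:
        return future.result(timeout=timeout_seconds) is True
    except TimeoutError:
        return False   # outcome unknown: handled as a failure
    finally:
        pool.shutdown(wait=False)

def slow_submit():
    # Simulated submission whose acknowledgement arrives too late.
    time.sleep(0.3)
    return True

fast = transaction_succeeded(lambda: True, timeout_seconds=1.0)
slow = transaction_succeeded(slow_submit, timeout_seconds=0.05)
print(fast, slow)  # True False
```

Note that `slow_submit` may still complete after the timeout, which is exactly why replayed data must be deduplicated at submission time.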
Whether the failure was reported by a failure message or caused by a timeout, replaying the data may produce duplicates if the data processing is not idempotent. In one embodiment, the database is a Key-Value distributed storage system.
When designing the Key-Value format, a transaction ID field must be added to the Value. This field records the ID of the transaction that most recently submitted a processing result, and it must be updated each time a processing result is submitted.
For example, the first working node pulls a batch of advertisement data from the specified data source, and the advertisement data includes advertisement item numbers. The control node generates a unique transaction identifier for each batch of advertisement data so that results are not submitted repeatedly during processing. If the computing framework needs to calculate advertisement click counts, the result is stored in the database with the advertisement item number as the Key and the click count as the Value, and the Value includes a transaction identifier field. The advertisement item number makes it easy to retrieve the corresponding click count from the database.
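The advertisement example can be sketched with a dictionary as the Key-Value store. The value layout (`clicks` plus `txid`), the function `commit_clicks`, and the assumption that clicks are already aggregated per advertisement within a batch are illustrative choices, not details given in the patent.

```python
def commit_clicks(kv, txid, clicks_by_ad):
    """Add per-advertisement click counts, skipping keys already updated by `txid`."""
    for ad_id, clicks in clicks_by_ad.items():
        value = kv.get(ad_id, {"clicks": 0, "txid": 0})
        if txid <= value["txid"]:
            continue   # this transaction already updated the key
        # The transaction ID field in the Value is updated with every submission.
        kv[ad_id] = {"clicks": value["clicks"] + clicks, "txid": txid}

kv = {}
commit_clicks(kv, 1, {"ad-7": 5})
commit_clicks(kv, 1, {"ad-7": 5})   # replay of transaction 1 is ignored
commit_clicks(kv, 2, {"ad-7": 3})
print(kv["ad-7"])  # {'clicks': 8, 'txid': 2}
```

Although the click count is an accumulating, non-idempotent update, the stored transaction identifier makes a replayed batch harmless.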
As shown in Fig. 7, in one embodiment, a data processing system 700 based on a real-time computing framework is provided, which has the functions needed to implement the data processing method based on a real-time computing framework of the embodiments above. The data processing system 700 includes a control node 702, a first working node 704, a second working node 706, and a database 708.
The control node 702 is configured to initiate a transaction, generate a corresponding transaction identifier, and send a transaction command message according to the transaction identifier.
The first working node 704 is configured to receive the transaction command message sent by the control node 702 and pull a batch of data from the specified data source according to the transaction command message.
The first working node 704 is further configured to store the metadata of the data in association with the transaction identifier.
The second working node 706 is configured to receive the data pulled by the first working node 704, process the data, and submit the processing result of the data to the database 708 according to the transaction identifier, the processing result including a transaction identifier field.
The database 708 is configured to store the processing result of the data submitted by the second working node 706.
In one embodiment, each transaction identifier generated by the control node is incremented relative to the previously generated one.
The second working node 706 is configured to check whether the transaction identifier is greater than the transaction identifier contained in the most recently submitted processing result; if so, it commits the processing result of the data to the database 708; otherwise, it does not submit the processing result of the data to the database 708.
In one embodiment, the second working node 706 is further configured to notify the control node 702 according to the transaction identifier when a failure occurs while the processing result of the data is being submitted to the database 708.
The control node 702 is further configured to direct the first working node 704 to pull the data again from the specified data source according to the transaction identifier.
Specifically, the control node 702 is configured to obtain the corresponding metadata according to the transaction identifier, pass the metadata to the first working node 704, and direct the first working node 704 to pull the data again from the specified data source according to the metadata.
In one embodiment, the second working node 706 is further configured to notify the control node 702, according to the transaction identifier, that the processing result was submitted successfully after it has submitted the processing result of the data to the database 708 according to the transaction identifier.
The control node 702 is further configured to obtain the corresponding metadata according to the transaction identifier, and to send a confirmation to the specified data source according to the metadata so that the next batch of data can be pulled from the data source.
A person of ordinary skill in the art will understand that all or part of the flows in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM).
The technical features of the embodiments described above can be combined arbitrarily. To keep the description concise, not all possible combinations of these technical features are described; however, as long as a combination of these technical features is not contradictory, it should be considered within the scope of this specification.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (8)
1. A data processing method based on a real-time computing framework, the method comprising:
a control node initiating a transaction and generating a corresponding transaction identifier, and sending a transaction command message to a first working node according to the transaction identifier, wherein the control node is the control center of the entire computing framework, and transitions of the transaction state are coordinated by the control node;
the first working node pulling a batch of data from a specified data source according to the transaction command message;
the first working node storing the metadata of the data in correspondence with the transaction identifier, and sending the data to a second working node, wherein the metadata is information describing attributes of the data, such that a one-to-one mapping exists among the data, the metadata, and the transaction identifier;
the second working node processing the data and submitting the processing result of the data to a database according to the transaction identifier, wherein the processing result includes a field of the transaction identifier, and whether the processing result of the corresponding data has already been submitted is judged according to the transaction identifier; if it has been submitted, the processing result of the data is not submitted; otherwise, the processing result of the data is submitted;
when a failure occurs while the processing result of the data is being submitted to the database, the second working node notifying the control node according to the transaction identifier; and
the control node controlling the first working node to pull the data from the specified data source again according to the transaction identifier.
2. The method according to claim 1, characterized in that the transaction identifier generated in the step of generating the corresponding transaction identifier is incremental relative to the previously generated transaction identifier;
the step of the second working node submitting the processing result of the data to the database according to the transaction identifier comprises:
the second working node detecting whether the transaction identifier is greater than the transaction identifier contained in the most recently submitted processing result; if so, submitting the processing result of the data; otherwise, not submitting the processing result of the data.
3. The method according to claim 1, characterized in that the step of the control node controlling the first working node to pull the data from the specified data source again according to the transaction identifier comprises:
the control node obtaining the corresponding metadata according to the transaction identifier, and sending the metadata to the first working node; and
the first working node pulling the data from the specified data source again according to the metadata.
4. The method according to claim 3, characterized in that, after the step of the second working node submitting the processing result of the data to the database according to the transaction identifier, the method further comprises:
the second working node notifying the control node, according to the transaction identifier, that the processing result was submitted successfully;
the control node obtaining the corresponding metadata according to the transaction identifier; and
the control node sending confirmation information to the specified data source according to the metadata, so that the next batch of data can be pulled from the data source.
5. A data processing system based on a real-time computing framework, the system comprising:
a control node, configured to initiate a transaction and generate a corresponding transaction identifier, and to send a transaction command message according to the transaction identifier, wherein the control node is the control center of the entire computing framework, and transitions of the transaction state are coordinated by the control node;
a first working node, configured to receive the transaction command message sent by the control node and to pull a batch of data from a specified data source according to the transaction command message;
the first working node being further configured to store the metadata of the data in correspondence with the transaction identifier, wherein the metadata is information describing attributes of the data, such that a one-to-one mapping exists among the data, the metadata, and the transaction identifier;
a second working node, configured to receive the data pulled by the first working node, process the data, and submit the processing result of the data to a database according to the transaction identifier, wherein the processing result includes a field of the transaction identifier, and whether the processing result of the corresponding data has already been submitted is judged according to the transaction identifier; if it has been submitted, the processing result of the data is not submitted; otherwise, the processing result of the data is submitted;
the second working node being further configured to notify the control node according to the transaction identifier when a failure occurs while the processing result of the data is being submitted to the database;
the control node being further configured to control the first working node to pull the data from the specified data source again according to the transaction identifier; and
a database, configured to store the processing results of the data submitted by the second working node.
6. The system according to claim 5, characterized in that the transaction identifier generated by the control node is incremental relative to the previously generated transaction identifier;
the second working node is configured to detect whether the transaction identifier is greater than the transaction identifier contained in the most recently submitted processing result; if so, to submit the processing result of the data to the database; otherwise, not to submit the processing result of the data to the database.
7. The system according to claim 5, characterized in that the control node is configured to obtain the corresponding metadata according to the transaction identifier and send the metadata to the first working node, thereby controlling the first working node to pull the data from the specified data source again according to the metadata.
8. The system according to claim 7, characterized in that the second working node is further configured to, after submitting the processing result of the data to the database according to the transaction identifier, notify the control node according to the transaction identifier that the processing result was submitted successfully;
the control node is further configured to obtain the corresponding metadata according to the transaction identifier and to send confirmation information to the specified data source according to the metadata, so that the next batch of data can be pulled from the data source.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510338373.4A CN104951306B (en) | 2015-06-17 | 2015-06-17 | Data processing method and system based on real-time Computational frame |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510338373.4A CN104951306B (en) | 2015-06-17 | 2015-06-17 | Data processing method and system based on real-time Computational frame |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104951306A CN104951306A (en) | 2015-09-30 |
CN104951306B CN104951306B (en) | 2018-03-20 |
Family
ID=54165977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510338373.4A Active CN104951306B (en) | 2015-06-17 | 2015-06-17 | Data processing method and system based on real-time Computational frame |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104951306B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107870982B (en) * | 2017-10-02 | 2021-04-23 | 深圳前海微众银行股份有限公司 | Data processing method, system and computer readable storage medium |
CN108009849B (en) * | 2017-11-30 | 2021-12-17 | 北京小度互娱科技有限公司 | Method and device for generating account state |
CN110045912B (en) * | 2018-01-16 | 2021-06-01 | 华为技术有限公司 | Data processing method and device |
CN109144761A (en) * | 2018-07-12 | 2019-01-04 | 北京猫眼文化传媒有限公司 | A kind of data fault processing method and system |
CN110955509A (en) * | 2019-12-11 | 2020-04-03 | 深圳迅策科技有限公司 | Finance concurrent transaction processing apparatus |
CN111988217B (en) * | 2020-08-31 | 2022-09-23 | Oppo广东移动通信有限公司 | Data interaction method and device, electronic equipment and storage medium |
CN113821407B (en) * | 2021-09-15 | 2023-08-01 | 浙江浙大网新软件产业集团有限公司 | Storm distributed real-time computing method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103235747A (en) * | 2013-04-24 | 2013-08-07 | 曙光信息产业(北京)有限公司 | Method and system for recovering metadata |
CN104408552A (en) * | 2014-11-13 | 2015-03-11 | 华为技术有限公司 | Method, device and system for cooperatively processing task |
Also Published As
Publication number | Publication date |
---|---|
CN104951306A (en) | 2015-09-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||