Background art
When a database updates a data row, it places an exclusive lock on that row to guarantee the correctness of the update; the next update is allowed only after the current one completes. In certain typical business scenarios, some data is, at the database level, both a huge hot spot and a single point. For example, a large account with frequent bookkeeping needs its balance updated; in an accounting system, more than a hundred million entries per day drive updates to the amount incurred and the balance of a few dozen general ledger subjects; a clearing system must, for more than a hundred million payment messages per day, produce clearing results by business or by subject matter for each pair of institutions. In these scenarios the hot-spot data (account, general ledger subject, clearing result) must absorb massive concurrent updates every second or even every millisecond, a scale of operation that a database system cannot bear. Real-time updating cannot be achieved, and even quasi-real-time updating is a quite challenging undertaking.
Meanwhile in the large scale system of distributed structure/architecture, the delay of interaction between naturally occurring system, it may be possible to which network prolongs
Slow reason, it is also possible to which the response of some asynchronous message middlewares itself is delayed.These usual delays are very of short duration, but
Under certain unusual conditions, some delays may be amplified to very long, it may be possible to minute rank even hour rank.Therefore, at certain
It is counted a bit using the time as dimension, during more new data, for example updates the daily amount incurred of some account or sorting two
The each hour receivables and payables amount of money etc. of financial institution, it is also necessary to it considers how to identify and handle this kind of data delayed to reach,
It is also desirable to reduce the resource overhead to database as much as possible.
To solve the quasi-real-time update problem for hot-spot data, there are currently two main schemes:
First, when an update request is received, it is not applied in real time; instead, the data is staged in the database. At the end of the day, all staged detail records are traversed by date, the values to be updated are aggregated, and the update is applied once. Some scenarios need a higher update frequency, so the cycle can be an hour or even a minute: the staged detail records are traversed by start and end time, the values to be updated are aggregated, and the update is applied once. This aggregated update is also called a catch-up update.
This scheme solves the hot-spot update problem well. However, some update requests may be delayed for various reasons and not yet properly received by the system when the aggregation runs; under this scheme they are missed, which makes the updated data inaccurate.
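The weakness of the first scheme can be illustrated with a minimal sketch. The record layout (`account`, `amount`, `generated_at`) and the function name `aggregate_window` are assumptions made for illustration only; the source does not specify a schema.

```python
from datetime import datetime

def aggregate_window(staged, start, end):
    """Sum staged amounts per account for records generated in [start, end)."""
    totals = {}
    for rec in staged:
        if start <= rec["generated_at"] < end:
            totals[rec["account"]] = totals.get(rec["account"], 0) + rec["amount"]
    return totals

# Hypothetical staged detail records for the 00:00-00:30 window.
staged = [
    {"account": "A", "amount": 100, "generated_at": datetime(2014, 8, 30, 0, 5)},
    {"account": "A", "amount": 50, "generated_at": datetime(2014, 8, 30, 0, 20)},
]
start, end = datetime(2014, 8, 30, 0, 0), datetime(2014, 8, 30, 0, 30)

# The aggregation runs and its result is applied to the hot-spot row once.
first_pass = aggregate_window(staged, start, end)

# A delayed request generated inside the window arrives only afterwards.
staged.append(
    {"account": "A", "amount": 7, "generated_at": datetime(2014, 8, 30, 0, 29, 50)}
)
second_pass = aggregate_window(staged, start, end)
```

The total already applied from `first_pass` lacks the delayed amount that `second_pass` would include, which is the omission described above.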
Second, after the update requests of a given period have been traversed and aggregated, those records are given a special mark in the database or are deleted. In the next round of traversal, every request still present in the database is by default taken to be not yet aggregated, so nothing is counted twice or missed. However, this scheme requires batch update or delete operations on a large amount of data at almost the same time, which is a huge shock to the database and greatly affects the stability of the system.
Summary of the invention
The purpose of the present invention is to provide a method and a device for updating and counting data in a distributed system, which ensure the timeliness of data statistics and effectively prevent data that was delayed and not counted in time from being missed, thereby ensuring accurate and punctual statistics of information from a large-scale distributed architecture system.
To solve the above technical problem, embodiments of the present invention disclose a method for updating and counting data in a distributed system, comprising the following steps:
Judging whether the time difference between the time of reception of received data and the generation moment of the data exceeds a predetermined threshold;
If the judgment result is yes, generating the batch number of the data according to the time of reception;
If the judgment result is no, generating the batch number of the data according to the generation moment;
Storing the data with the batch number in a database;
Counting the data with batch numbers stored in the database according to batch number.
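The judging step can be sketched as follows. This is only an illustration under stated assumptions: the function name `batch_basis` is hypothetical, and the 10-minute threshold is the example value used later in the description, not a fixed part of the method.

```python
from datetime import datetime, timedelta

def batch_basis(generated_at: datetime, received_at: datetime,
                threshold: timedelta) -> datetime:
    """Return the timestamp from which the batch number should be derived."""
    delay = received_at - generated_at
    # Delay above the threshold: the batch the data originally belonged to
    # may already have been counted, so use the time of reception instead.
    if delay > threshold:
        return received_at
    # Otherwise the data is on time and keeps its generation moment.
    return generated_at

threshold = timedelta(minutes=10)  # assumed example value
generated = datetime(2014, 8, 30, 0, 29, 50)
on_time = batch_basis(generated, datetime(2014, 8, 30, 0, 30, 50), threshold)
late = batch_basis(generated, datetime(2014, 8, 30, 0, 59, 16), threshold)
```

Here `on_time` is the generation moment, while `late` is the time of reception; the batch number itself is then derived from whichever timestamp is returned.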
Embodiments of the present invention also disclose a device for updating and counting data in a distributed system, comprising:
A judging unit, for judging whether the time difference between the time of reception of received data and the generation moment of the data exceeds a predetermined threshold;
A first generation unit, for generating the batch number of the data according to the time of reception when the judgment result of the judging unit is yes;
A second generation unit, for generating the batch number of the data according to the generation moment when the judgment result of the judging unit is no;
A storage unit, for storing the data with the batch number in a database;
A statistic unit, for counting the data with batch numbers stored in the database according to batch number.
Compared with the prior art, the main differences and effects of the embodiments of the present invention are as follows:
Stamping data with a batch number according to its generation moment ensures the timeliness of the subsequent statistics, while stamping data that is received late with a batch number according to its time of reception effectively prevents delayed data that was not counted in time from being missed, ensuring accurate and punctual statistics of information from a large-scale distributed architecture system. Moreover, counting by batch reduces the update load on the database system, particularly for database systems whose hot-spot data must absorb massive concurrent updates every second or even every millisecond.
Detailed description of the embodiments
In the following description, many technical details are set out so that the reader may better understand the present application. However, those of ordinary skill in the art will appreciate that the technical solutions claimed in the claims of the present application can be realized even without these technical details and with many variations and modifications based on the following embodiments.
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The first embodiment of the present invention relates to a method for updating and counting data in a distributed system. Fig. 1 is a flow diagram of the method for updating and counting data in the distributed system.
Specifically, as shown in Fig. 1, the method for updating and counting data in the distributed system comprises the following steps:
In step 101, it is judged whether the time difference between the time of reception of the received data and the generation moment of the data exceeds a predetermined threshold. If the judgment result is yes, the process proceeds to step 102; otherwise, it proceeds to step 103.
In step 102, the batch number of the data is generated according to the time of reception.
In step 103, the batch number of the data is generated according to the generation moment.
The process then proceeds to step 104, in which the data with the batch number is stored in the database.
The process then proceeds to step 105, in which the data with batch numbers stored in the database is counted according to batch number.
The process then ends.
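Steps 101 to 105 can be sketched end to end with an in-memory SQLite database. Everything concrete here is an assumption for illustration: the table name `staged`, its columns, the 10-minute threshold, and the half-hour batch numbering (borrowed from the example given later in this description).

```python
import sqlite3
from datetime import datetime, timedelta

THRESHOLD = timedelta(minutes=10)  # assumed predetermined threshold

def batch_number(ts):
    # Assumed numbering: date followed by the half-hour slice index 01..48.
    slice_no = (ts.hour * 60 + ts.minute) // 30 + 1
    return f"{ts:%Y%m%d}{slice_no:02d}"

def store(conn, account, amount, generated_at, received_at):
    # Steps 101-103: pick reception time for late data, else generation moment.
    late = received_at - generated_at > THRESHOLD
    batch = batch_number(received_at if late else generated_at)
    # Step 104: store the data together with its batch number.
    conn.execute(
        "INSERT INTO staged (batch, account, amount) VALUES (?, ?, ?)",
        (batch, account, amount),
    )
    return batch

def count_batch(conn, batch):
    # Step 105: count only the rows carrying the given batch number.
    rows = conn.execute(
        "SELECT account, SUM(amount) FROM staged WHERE batch = ? GROUP BY account",
        (batch,),
    )
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staged (batch TEXT, account TEXT, amount INTEGER)")
day = datetime(2014, 8, 30)
store(conn, "A", 100, day.replace(minute=5), day.replace(minute=6))
store(conn, "A", 50, day.replace(minute=29, second=50), day.replace(minute=30, second=50))
store(conn, "A", 7, day.replace(minute=29, second=50), day.replace(minute=59, second=16))

first_batch = count_batch(conn, "2014083001")
second_batch = count_batch(conn, "2014083002")
```

The hot-spot row then receives one update per batch rather than one per request, and the late record is picked up by the second batch instead of being lost.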
In a preferred example of the present invention, the above step 105 includes the following sub-step:
Determining, according to the generation moment of the data, the statistical time at which all data belonging to the same batch as that data is counted.
In addition, it will be understood that in other embodiments of the present invention the statistical time of the different batches may also be set in other ways according to the actual situation. For example, a day is divided into multiple time intervals, and the data with batch numbers stored in the database during each time interval is counted at the moment that interval ends.
In another preferred example of the present invention, a day is divided in advance into multiple time intervals, each time interval corresponds to one batch number, and the batch number includes the date of that day. The above step 102 includes the following sub-step:
Using the batch number corresponding to the time interval in which the time of reception of the data falls as the batch number of the data.
And the above step 103 includes the following sub-step:
Using the batch number corresponding to the time interval in which the generation moment of the data falls as the batch number of the data.
One specific implementation of this preferred example is as follows:
Hot-spot data is counted in half-hour units, so the 24 hours of a day are divided into 48 time slices, and the slices can be numbered by concatenating the date with 01 to 48, i.e. yyyyMMdd01 to yyyyMMdd48. Thus, if a business transaction actually occurred at 00:29:50 on August 30, 2014, which falls within the 00:00-00:30 slice, we stamp it with batch number 2014083001 when the data is received and store it in the database. Once 00:30 has passed, we scan the database for all data carrying batch number 2014083001 and aggregate it along some dimension. The aggregated result is the total of the data updated in the half hour 00:00-00:30.
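The yyyyMMdd01 to yyyyMMdd48 numbering can be sketched as a pair of helper functions. The names `batch_number` and `slice_bounds` are hypothetical; only the numbering scheme itself comes from the description.

```python
from datetime import datetime, timedelta

SLICE = timedelta(minutes=30)  # half-hour time slices, 48 per day

def batch_number(ts: datetime) -> str:
    """Date concatenated with the 1-based index of the slice containing ts."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    index = (ts - midnight) // SLICE + 1  # timedelta // timedelta yields an int
    return f"{ts:%Y%m%d}{index:02d}"

def slice_bounds(batch: str):
    """Inverse mapping: the [start, end) interval a batch number stands for."""
    day = datetime.strptime(batch[:8], "%Y%m%d")
    start = day + (int(batch[8:]) - 1) * SLICE
    return start, start + SLICE

first = batch_number(datetime(2014, 8, 30, 0, 29, 50))
last = batch_number(datetime(2014, 8, 30, 23, 59, 59))
bounds = slice_bounds("2014083001")
```

With these definitions, the 00:29:50 transaction maps to `2014083001`, and that batch number maps back to the interval 00:00-00:30 on August 30, 2014.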
However, not all data arrives immediately. For example, the data that occurred at 00:29:50 may only be received and processed at 00:30:50 (delays are generally on the order of seconds or even milliseconds). We therefore cannot start scanning right at 00:30; instead we set a predetermined threshold, for example 10 minutes (the appropriate value must be chosen by plotting the distribution of delay times in the real data and finding the most suitable point), so that almost all of the business in a time slice has arrived before 00:40. If we then count the 00:00-00:30 data at 00:40, we obtain a highly accurate result.
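One possible way to pick the threshold from observed delays, and to derive the scan time from it, is sketched below. The coverage-based rule in `pick_threshold` is an assumption; the description only says that the delay distribution should be examined to find a suitable point.

```python
import math
from datetime import datetime, timedelta

def pick_threshold(delays, coverage):
    """Smallest observed delay such that at least `coverage` of samples fit."""
    ordered = sorted(delays)
    needed = max(1, math.ceil(coverage * len(ordered)))
    return ordered[needed - 1]

def scan_time(slice_end, threshold):
    """A slice is counted once its end plus the threshold has passed."""
    return slice_end + threshold

# Hypothetical delay samples: mostly seconds, with one extreme outlier.
samples = [timedelta(seconds=s) for s in (1, 2, 3, 4)] + [timedelta(hours=2)]
typical = pick_threshold(samples, 0.8)   # ignores the outlier
worst = pick_threshold(samples, 1.0)     # would wait for the outlier too

when = scan_time(datetime(2014, 8, 30, 0, 30), timedelta(minutes=10))
```

Choosing a coverage below 100% keeps the scan close to the end of the slice; the few records beyond the threshold are then handled as late data rather than by waiting longer.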
However, in some extreme environments a small amount of data may be delayed for a very long time, for example half an hour or several hours. In that case, even with the 10-minute predetermined threshold added, such data has not yet arrived when we start counting at 00:40 and is therefore not counted. If this data is received at 00:59:16 and is stamped with batch number 2014083001 according to the actual business time of 00:29:50, it will never be counted, because that batch has already been aggregated. Therefore, data that actually arrives after the counting time of the batch it should have belonged to must be stamped with the batch number of another batch and processed with that batch. For example, the data that arrives at 00:59:16 is stamped with batch number 2014083002 and counted together with all data that occurred in 00:30-01:00. Likewise, data received at 01:59:16 is stamped with batch number 2014083004 and counted together with all data that occurred in 01:30-02:00. The same handling applies to all subsequent delayed data.
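The three arrival cases discussed above can be reproduced with a short sketch. The function names are hypothetical, and the threshold is the 10-minute example value from the description.

```python
from datetime import datetime, timedelta

THRESHOLD = timedelta(minutes=10)  # example threshold from the description

def slice_batch(ts):
    # Date plus half-hour slice index 01..48.
    return f"{ts:%Y%m%d}{(ts.hour * 60 + ts.minute) // 30 + 1:02d}"

def assign_batch(generated_at, received_at):
    # Late data joins the batch of its reception time; on-time data keeps
    # the batch of its generation moment.
    if received_at - generated_at > THRESHOLD:
        return slice_batch(received_at)
    return slice_batch(generated_at)

generated = datetime(2014, 8, 30, 0, 29, 50)
arrivals = [
    datetime(2014, 8, 30, 0, 30, 50),  # one minute late: within threshold
    datetime(2014, 8, 30, 0, 59, 16),  # about half an hour late
    datetime(2014, 8, 30, 1, 59, 16),  # about an hour and a half late
]
batches = [assign_batch(generated, received) for received in arrivals]
```

The resulting batch numbers are 2014083001, 2014083002 and 2014083004 respectively, matching the worked example.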
It will be understood that in other embodiments of the present invention the batch number may also be determined in other ways, for example by dividing time into intervals in units of hours, months and so on and then determining the batch number corresponding to each interval, or by generating subsequent batch numbers incrementally from a batch number that has already been generated.
In addition, in another preferred example of the present invention, the timeliness of the database update and statistical operation is quasi-real-time, and the received data is e-commerce transaction data. It will be understood that in the present invention the data (for example, transaction data) is received in real time from a real-time database or other real-time data source of the distributed system, its batch number is then generated, and, owing to the performance limits of the distributed system, the data with batch numbers is finally counted and the corresponding records in the database are updated in quasi-real time.
Furthermore, it is to be understood that in the present invention, information included by batch number can determine as the case may be, for example, can
The date generated including the batch number may also comprise corresponding disparate databases if data can be stored in multiple databases
Mark, for example, the batch number of data can be if data are stored in two statistic frequencies different database A and B
201408301a5b, indicate same data be stored in two databases batch be respectively 1 and 5.Data may be from same system,
It can be from multiple systems, partial data will appear transmission delay in transmission process.
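The multi-database batch number 201408301a5b can be read as the date followed by one batch index and database identifier per database. A minimal sketch of composing such a string follows; the function name `composite_batch` and the exact encoding (unpadded index, lowercase identifier) are assumptions inferred from that single example.

```python
def composite_batch(date_str, per_db_batches):
    """Concatenate the date with '<batch index><database id>' per database."""
    # per_db_batches maps a database identifier to the batch index the data
    # falls into under that database's own statistic frequency.
    return date_str + "".join(
        f"{index}{db_id}" for db_id, index in per_db_batches.items()
    )

combined = composite_batch("20140830", {"a": 1, "b": 5})
```

Note that unpadded indices become ambiguous to parse once an index exceeds 9, so a real encoding would likely need fixed-width fields or separators.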
Stamping data with a batch number according to its generation moment ensures the timeliness of the subsequent statistics, while stamping data that is received late with a batch number according to its time of reception effectively prevents delayed data that was not counted in time from being missed, ensuring accurate and punctual statistics of information from a large-scale distributed architecture system. Moreover, counting by batch reduces the update load on the database system, particularly for database systems whose hot-spot data must absorb massive concurrent updates every second or even every millisecond.
Each method embodiment of the present invention can be implemented in software, hardware, firmware, and so on. Regardless of whether the present invention is implemented in software, hardware or firmware, the instruction code may be stored in any type of computer-accessible memory (for example permanent or modifiable, volatile or non-volatile, solid-state or non-solid-state, fixed or replaceable media, and so on). Likewise, the memory may be, for example, programmable array logic (Programmable Array Logic, "PAL"), random access memory (Random Access Memory, "RAM"), programmable read-only memory (Programmable Read Only Memory, "PROM"), read-only memory (Read-Only Memory, "ROM"), electrically erasable programmable read-only memory (Electrically Erasable Programmable ROM, "EEPROM"), a magnetic disk, an optical disc, a digital versatile disc (Digital Versatile Disc, "DVD"), and so on.
The second embodiment of the present invention relates to a device for updating and counting data in a distributed system. Fig. 2 is a structural schematic diagram of the device for updating and counting data in the distributed system.
As shown in Fig. 2, the device for updating and counting data in the distributed system includes:
A judging unit, for judging whether the time difference between the time of reception of received data and the generation moment of the data exceeds a predetermined threshold.
A first generation unit, for generating the batch number of the data according to the time of reception when the judgment result of the judging unit is yes.
A second generation unit, for generating the batch number of the data according to the generation moment when the judgment result of the judging unit is no.
A storage unit, for storing the data with the batch number in the database.
A statistic unit, for counting the data with batch numbers stored in the database according to batch number.
In a preferred example of the present invention, a day is divided in advance into multiple time intervals, each time interval corresponds to one batch number, and the batch number includes the date of that day. The above first generation unit includes the following subunit:
A first batch number generation subunit, for using the batch number corresponding to the time interval in which the time of reception of the data falls as the batch number of the data.
And the above second generation unit includes the following subunit:
A second batch number generation subunit, for using the batch number corresponding to the time interval in which the generation moment of the data falls as the batch number of the data.
In another preferred example of the present invention, the above statistic unit includes the following subunit:
A generation-moment statistics subunit, for determining, according to the generation moment of the data, the statistical time at which all data belonging to the same batch as that data is counted.
In addition, in another preferred example of the present invention, the timeliness of the above database update and statistical operation is quasi-real-time, and the received data is e-commerce transaction data.
The first embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in the present embodiment and, to reduce repetition, are not described again here. Correspondingly, the relevant technical details mentioned in the present embodiment also apply to the first embodiment.
It should be noted that each unit mentioned in the device embodiments of the present invention is a logic unit. Physically, a logic unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units; the physical realization of these logic units is not in itself what matters most, and it is the combination of the functions they realize that is the key to solving the technical problem addressed by the present invention. In addition, to highlight the innovative part of the present invention, the above device embodiments do not introduce units that are less closely related to solving the technical problem addressed by the present invention; this does not mean that the above device embodiments contain no other units.
It should be noted that, in the claims and the specification of this patent, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "including a" does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
Although the present invention has been shown and described with reference to certain preferred embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made to it without departing from the spirit and scope of the present invention.