CN109145033A - Computer system and computer implemented method

Info

Publication number: CN109145033A
Application number: CN201811141251.6A
Authority: CN
Inventors: 阿德南·法科
Original assignee: Individual
Current assignee: Individual
Priority date: 2009-09-25
Filing date: 2010-09-22
Publication date: 2019-01-04
Anticipated expiration: 2030-09-22
Also published as: CN109145033B; WO2011036448A2; JP2013506180A; WO2011036448A3; SG10201703775XA; CN102648467B; CN102648467A; JP2016026353A; JP5892937B2; EP2480991A2

Abstract

This application discloses a computer system and a computer-implemented method. A computer system is provided for calculating a quantity from a set of input values. The computer system comprises a database configured to store a sequence of first input values x_n, where n = 1, 2, ..., i, wherein, for any value of n, x_n lies between x_{n+1} and x_{n-1} in the sequence. Each value of x is associated with a second input value y_n, a conversion value p_n calculated from y_n according to a conversion algorithm, and an output value z_n, where z_n = z_{n-1} + p_n. The database may further be configured to store the plurality of output values z_n. The database may also be configured to identify a minimum and/or maximum value in a sequence of values stored therein, and to define at least a first subsequence of values and a second subsequence of values following the first subsequence, wherein the boundary between the first subsequence and the second subsequence is located at the position of the minimum or maximum value of the sequence.

Description

Computer system and computer implemented method

This application is a divisional application of patent application No. 201080053621.6, entitled "Database and method for evaluating data from a database", which entered the national phase on May 25, 2012 from international application No. PCT/GB2010/001784, having an international filing date of September 22, 2010, the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to databases for storing data such as financial data and, more particularly, to computer-implemented methods of building a database, of storing data in a database, and/or of operating on data stored in a database.

The invention further relates to computer-implemented query mechanisms for obtaining and/or evaluating data from a database and, in particular, to a query mechanism for obtaining a minimum or maximum value from a sequence of values stored in a database.

Background art

A database is a structure for storing data, for example in the memory of a computer system, and for relating the stored data. Different database architectures exist depending on the intended use. The primary use of a general-purpose database is to manage and facilitate the entry and retrieval of data associated with a related application. A recent trend is the emergence of specialised database architectures optimised for specific application domains.

Complex event processing (CEP) is a technology for low-latency filtering, correlation, aggregation and/or computation of real-world event data (for example, financial data). Such data is usually generated at high frequency and therefore needs to be stored in a suitable database so that it can be evaluated in real time or at a later stage. A number of specialised database products have emerged that attempt to store such data, which is generated in volumes that usually overwhelm general-purpose databases.

Table 1 shows products that can be used for CEP applications and that provide different functions for operating on CEP data.

Table 1

These products aim to improve the underlying database technology and processing capability. However, data is still stored, queried and retrieved according to conventional processes. Although these databases can be well suited to performing traditional transaction-oriented operations, they provide no efficient means, beyond standard query methods, of accessing and/or evaluating large amounts of continuous data.

Such requests on large amounts of continuous data involve providing descriptive statistics, where the importance of any individual record is less than that of the overall description. Descriptive statistics are becoming increasingly important, especially for high-frequency, high-volume data applications.

At the core of evaluating large continuous data sets is the response to requests for statistically descriptive data.

The financial services community consists of data providers and clients. Data providers deal with large institutional clients (for example, banks) and smaller clients (for example, retail investors). Larger clients are served either directly or through third-party vendors (for example, Vhayu), which provide them with all market data so that advanced and accurate statistical variables can be constructed. At present, however, this is not available to smaller clients because of the costs associated with the larger bandwidth and computation required to deliver a complete market feed. Smaller clients can therefore only be provided with snapshots or summaries, which allow only approximations of those variables.

Furthermore, determining the minimum or maximum value in a series of values stored in a database usually requires retrieving and evaluating every record in the series in order to determine the position and/or magnitude of the minimum or maximum.

The operation is therefore costly, both in the I/O bus and/or network bandwidth used to retrieve the data set and in the computation required to evaluate it. These costs increase with the number of values in the requested data series.

In particular, the cost of comparative experiments is especially high, because a large number of individual data sequences must be retrieved and evaluated.

Summary of the invention

In a first aspect, the present invention provides a database for storing data which is configured to generate an intermediate description of the stored data, so as to allow the stored data to be operated on more efficiently.

More specifically, the first aspect of the present invention may provide a database configured to store a sequence of first input values x_n, where n = 1, 2, ..., i,

wherein, for any value of n, x_n lies between x_{n+1} and x_{n-1} in the sequence and is associated with:

a second input value y_n;

a conversion value p_n, calculated from y_n according to a conversion algorithm; and

an output value z_n, where z_n = z_{n-1} + p_n;

the database being further configured to store the plurality of output values z_n.
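
The storage scheme above can be sketched as follows. This is a minimal illustration only: an in-memory list stands in for the database, z_0 is assumed to be 0 (the text does not state it), and the names `build_cumulative` and `convert` are hypothetical.

```python
from itertools import accumulate

def build_cumulative(y_values, convert=lambda y: y):
    """Return the output values z_n, where z_n = z_{n-1} + p_n and p_n = convert(y_n)."""
    # Assumption: z_0 = 0, so that z_1 = p_1.
    return list(accumulate(convert(y) for y in y_values))

prices = [10.0, 12.0, 11.0, 13.0]                  # second input values y_1..y_4
z = build_cumulative(prices)                       # p_n = y_n
z_sq = build_cumulative(prices, lambda y: y * y)   # p_n = y_n^2
```

Each stored z_n depends only on z_{n-1} and the newest record, so appending a record costs one addition.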

The database is part of a computer system for calculating a quantity from a set of input values.

The first input value x_n is generally a numerical value and is usually a time value.

The second input value y_n may be a variable describing a financial transaction, such as a price or a trading volume.

The conversion value p_n may be equal to y_n. In this case, the output value z_n stored in the database is the cumulative total of all second input values y_n from 1 to n. Alternatively, the output value z_n may be the cumulative total of a different function p_n of the second input value y_n, for example y_n^2 or x_n·y_n as listed in Table 2.

By providing a database in which the second input values y_n (for example, stock prices) are stored in cumulative form within the output values z_n, the database stores the second input values in an intermediate form (that is, a pre-computed or partially processed form). This generally means that fewer operations are needed to generate descriptive data from a database according to the first aspect of the invention than from a conventional database in which the data is stored in raw, unprocessed form.

Therefore, in a second aspect, the present invention provides a computer-implemented method of calculating a quantity from a set of second input values y_n corresponding to a portion (from n = a to n = b) of the sequence of first input values x_n, the method comprising: extracting at least one output value z_n, where n = b, from a database according to the first aspect of the invention.

Typically, the method of the second aspect further comprises extracting a further output value z_n, where n = a, from the database.

Using the method of the second aspect in combination with a database according to the first aspect, the quantities shown in Table 2 can typically be calculated by retrieving from the database the output value z_b corresponding to the end of the data sequence of interest and, optionally, the output value z_a corresponding to its beginning.

Table 2

Function p_n	Available descriptive statistic
p_n = y_n	Sum
p_n = y_n	Mean
p_n = y_n^2	Variance
p_n = x_n·y_n	Correlation of the first input value and the second input value

Therefore, according to the method of the second aspect, a range of descriptive statistics relating to a continuous data sequence can be generated by retrieving only two output values from a database according to the first aspect. This reduces retrieval costs compared with a conventional database, where all data values in the sequence of interest usually have to be retrieved. The costs associated with I/O bus usage and/or network bandwidth usage therefore tend to be lower for the method of the second aspect than for conventional data evaluation methods. The computational cost associated with the method of the second aspect also tends to be lower than that of conventional methods.
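
A minimal sketch of such a range query, under stated assumptions: the cumulative lists carry a leading 0 for z_0, the range is taken to cover records a+1 to b so that its sum is z_b - z_a (the endpoint convention is an assumption), and the variance is computed from the cumulative sums of both y_n and y_n^2. The name `range_stats` is hypothetical.

```python
from itertools import accumulate

def range_stats(z, z_sq, a, b):
    """Sum, mean and population variance of y_{a+1}..y_b from stored cumulative values.

    z and z_sq hold cumulative sums of y_n and y_n**2, each with a leading 0 for z_0.
    """
    count = b - a
    total = z[b] - z[a]              # two retrievals give the sum
    total_sq = z_sq[b] - z_sq[a]     # two more give the sum of squares
    mean = total / count
    variance = total_sq / count - mean ** 2
    return total, mean, variance

y = [10.0, 12.0, 11.0, 13.0, 14.0]
z = [0.0] + list(accumulate(y))
z_sq = [0.0] + list(accumulate(v * v for v in y))
total, mean, variance = range_stats(z, z_sq, 1, 4)   # covers 12.0, 11.0, 13.0
```

The work done per query is the same whether the range covers three records or three million.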

The reduction in computational cost associated with the method of the second aspect is particularly important in financial applications. The method may allow all end users of market data (whether large institutional clients or small clients such as retail investors) to generate highly accurate and complex descriptive variables on demand, even if they have no access to the whole market feed. This is because only a small amount of data needs to be sent to the client, and the client needs to perform only a small number of calculations.

In particular, if multiple requests relating to the same data sequence are made, the cost of building a database according to the first aspect is rapidly amortised. In effect, the database of the first aspect embeds a major part of the data description within the stored data itself, thereby reducing the overall cost per request when the database is queried using the method of the second aspect.

The cost of generating descriptive statistics using the database of the first aspect generally depends only weakly on the size of the data sequence of interest. This is in contrast to conventional databases, in which the cost of generating descriptive statistics is approximately proportional to the size of the data sequence of interest.

By reducing the cost of generating descriptive statistics, the database of the first aspect also reduces the cost of comparative experiments, for example the comparison of two data sequences of interest, such as the variation of two stock prices over different time periods.

In effect, any descriptive statistic that has an embedded summation operator can be generated using the method of the second aspect in combination with the database of the first aspect.

The conversion value p of first aspect present invention_nIt can be the extremely complex variable calculated according to multiple raw values. For example, conversion value p_nIt can be to provide and be included within output valve z_nIn part processing input variable customized index, responding Use when user requests and calculates.

The precomputation of extremely complex variable or customized index can make the slave number in response to user query by progress According to the reduction for the data volume retrieved in library, and also reduce the quantity of the calculating step in response to each query execution.Specifically Ground, it is contemplated that calculating variable can be reused to reply different user query, which thereby enhance whole efficiency.

Raw value for calculating precomputation variable can be weighted with statistical or dynamical fashion weighting.That is, working as When building precomputation variable (static state weighting) or weight can change over time (i.e., it is possible to updating for each record), often A original value can be assigned specified weight.

In the case of dynamic weighting, each weighted variable (or index) is expensive to produce, so the cost saving associated with providing a pre-computed index that can be reused in response to multiple user queries is especially high.

As an example in a financial context, the concept of providing the conversion value p_n as a compound function of several measured values can be used to construct a market-capitalisation-weighted index of stock prices, in which the weight of each stock in the index can vary with its changing market capitalisation. Depending on how heavily such an index is used, large savings in data and computational cost can be made compared with performing the index construction process separately for each request.

Pre-computation of data (providing the conversion value p_n as a complex variable) also allows more complex multi-level operations to be performed, which may be regarded as descriptions of descriptions. A further financial example is the construction of several indices, each based on a set of stocks weighted by market capitalisation (for example, an industrial companies index or a telecommunications companies index). This allows indices (or even individual companies and indices) to be combined into different subsets in order to monitor their activity.

Typically, the database of the first aspect is configured to store values x (the first input variable) that are evenly spaced along the sequence of first input values, so that x_{n+1} - x_n = x_n - x_{n-1}.

Typically, where x_n is a time value, the interval between consecutive time values is less than 0.5 s, preferably less than 0.1 s, and more preferably less than 0.05 s.

Typically, the database is also configured to store a sequence of first input values comprising, for example, 1000 records (preferably 10000 records).

The database of the first aspect may be an adaptation of a conventional database.

Many applications (for example, the analysis of financial data) make use of continuously growing time series data. In such cases, the aim is often to find a subsequence of historical data that matches the most recent subsequence. In many cases the search must be repeated for different subsequence lengths and/or positions. When the data is stored in a conventional database, each search has to start afresh, working from the raw data entered into the database.

However, it has been found that in many cases the process of matching subsequences consists of measuring the distance between corresponding records of the two subsequences and then summing these distance measurements to obtain the final matching result. Since this operation has an embedded summation operator, it can be performed using a suitably configured database according to the first aspect in combination with the method of the second aspect.

More specifically, therefore, in some cases the function p_n is a function of the distance between y_n and another second input value. The distance is typically a Euclidean distance, but may be another distance measure, such as the Mahalanobis distance.

Therefore, the database of the first aspect may be configured to calculate and store cumulative distance measurements between, for example, a recent data sequence and a historical sequence. In this way, the distance between a subsequence of the recent data sequence and the corresponding subsequence of the historical data sequence can be calculated from the cumulative distance measurements at the beginning and end of the subsequence.
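
A minimal sketch of the cumulative distance idea, assuming two already aligned sequences of equal length and a squared-difference term p_n = (r_n - h_n)^2, so that the Euclidean distance over a subsequence is the square root of a difference of two stored values. The function names are hypothetical.

```python
from itertools import accumulate

def cumulative_sq_distance(recent, history):
    """Cumulative sum of squared differences between aligned records, with a leading 0."""
    return [0.0] + list(accumulate((r - h) ** 2 for r, h in zip(recent, history)))

def subsequence_distance(cum, start, end):
    """Euclidean distance over aligned records start..end-1, from two stored values."""
    return (cum[end] - cum[start]) ** 0.5

recent = [1.0, 2.0, 3.0, 4.0]
history = [1.0, 4.0, 0.0, 4.0]
cum = cumulative_sq_distance(recent, history)
d = subsequence_distance(cum, 1, 3)   # distance over the second and third records
```

Repeating the query for another subsequence length or position reuses `cum` and needs no further pass over the raw records.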

An advantage of using the invention for this application is that subsequences of any requested size can easily be matched against all corresponding subsequences in the database using only the difference between the values at the beginning and end of each subsequence. This greatly reduces the cost that would otherwise be incurred by measuring the distance between individual records for each request.

Although a large number of operations must be performed to store the cumulative distance values in the database, the retrieval and computation cost of each request is low, because only two records need to be retrieved from the pre-computed data. The cost of building the database is therefore rapidly shared across the multiple requests that are processed.

In some cases, the conversion value p_n is a function of the second input value y_n and of a further input value y'_n associated with the first input value x_n. For example, in some cases p_n = y'_n·y_n. In this case, the database of the first aspect can be used in combination with the method of the second aspect to determine the correlation between the second input value y_n and the further input value y'_n over the portion of the sequence of first input values x_n that is of interest to the user.
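
As a sketch of how such a correlation might be assembled, the example below assumes the database stores five cumulative series (sums of y_n, y'_n, y_n^2, y'_n^2 and y_n·y'_n), each with a leading 0, and computes the Pearson coefficient over records a+1 to b. The source names only the product term; the other four series, and the names `build_store` and `range_correlation`, are assumptions made for illustration.

```python
from itertools import accumulate
from math import sqrt

def prefix(values):
    """Cumulative sums with a leading 0."""
    return [0.0] + list(accumulate(values))

def build_store(y, yp):
    """Cumulative series needed for a Pearson correlation between y and yp."""
    return {
        "y": prefix(y),
        "yp": prefix(yp),
        "yy": prefix(v * v for v in y),
        "ypyp": prefix(v * v for v in yp),
        "yyp": prefix(u * v for u, v in zip(y, yp)),
    }

def range_correlation(store, a, b):
    """Pearson correlation over records a+1..b using only values stored at a and b."""
    n = b - a
    s = {name: series[b] - series[a] for name, series in store.items()}
    cov = n * s["yyp"] - s["y"] * s["yp"]
    var_y = n * s["yy"] - s["y"] ** 2
    var_yp = n * s["ypyp"] - s["yp"] ** 2
    return cov / sqrt(var_y * var_yp)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
yp = [2.0, 4.0, 6.0, 8.0, 3.0]
store = build_store(y, yp)
r = range_correlation(store, 0, 4)   # the first four records are exactly linear
```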

According to the method of the second aspect, a request for data from the database of the first aspect consists of extracting the beginning and the end of the requested data segment. When extraction or retrieval is expensive because of latency (for example, when the database must be accessed by request or the data must be fetched over a network), it is advantageous to prefetch data, in order to eliminate the waiting overhead of additional fetch operations and to exploit temporal locality in the cache.

Therefore, the method of the second aspect may be adapted to include the further step of extracting from the database additional values adjacent to the value of immediate interest. These additional values can then be used in later calculations.

The latency cost of retrieving several data values is close to that of retrieving a single value, so the total cost of retrieving several values together is less than the cost of retrieving each value individually. The additional values are typically stored in a cache memory, from which they can be fetched as required at a lower latency cost.

Advantageously, the data values retrieved in a single fetch operation form a contiguous subsequence of the data. This increases the efficiency savings associated with this adaptation of the method of the second aspect.
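
The prefetching adaptation might look like the sketch below. The class `PrefetchingReader`, its `fetch_block` callable and the block size of 4 are all hypothetical; a list slice stands in for a remote fetch.

```python
class PrefetchingReader:
    """Fetches a contiguous block around each requested index and caches it."""

    def __init__(self, fetch_block, block_size=4):
        self.fetch_block = fetch_block   # hypothetical callable: (start, stop) -> list of values
        self.block_size = block_size
        self.cache = {}
        self.fetch_count = 0             # number of (expensive) fetch operations issued

    def get(self, index):
        if index not in self.cache:
            start = (index // self.block_size) * self.block_size
            values = self.fetch_block(start, start + self.block_size)
            self.fetch_count += 1
            self.cache.update({start + i: v for i, v in enumerate(values)})
        return self.cache[index]

stored_z = [0.0, 10.0, 22.0, 33.0, 46.0, 60.0, 75.0, 91.0]
reader = PrefetchingReader(lambda start, stop: stored_z[start:stop])
first = reader.get(5)    # one fetch brings in indices 4..7
second = reader.get(6)   # served from the cache, no second fetch
```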

The method of the second aspect, used in combination with the database of the first aspect, can be particularly advantageous for monitoring banking operations (such as money transfers) and for detecting abnormal activity.

A major pattern of fraud is carried out over a long period using transactions of small magnitude and long periodicity, so that it is hidden within normal behaviour. Using the method of the second aspect in combination with the database of the first aspect allows multiple averages to be monitored in order to detect any relative change that may provide evidence of fraud.

For example, if the average value of transfers over the past five hours equals the average over the last hour, this may indicate normal behaviour, whereas any increase in the average for the last hour may indicate possible fraud. Since the exact parameters or combination of averages to be monitored are not known in advance, a wide range of values has to be studied. The ability to respond quickly and easily to different requests greatly facilitates this process. The approach can also be applied to other detection applications, such as trading surveillance, which searches for abnormal behaviour on the basis of a history of normal behaviour.
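
The comparison of a short-window average with a long-window average can be sketched as follows, assuming hourly totals, a cumulative list with a leading 0, and an arbitrary alert ratio of 1.5. The ratio and the function names are illustrative assumptions, not values from the source.

```python
from itertools import accumulate

def window_mean(z, end, width):
    """Mean of the last `width` records ending at record `end`, from two cumulative values."""
    return (z[end] - z[end - width]) / width

def looks_unusual(z, end, short, long, ratio=1.5):
    """Flag when the short-window mean exceeds `ratio` times the long-window mean."""
    return window_mean(z, end, short) > ratio * window_mean(z, end, long)

hourly_transfers = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 400.0]
z = [0.0] + list(accumulate(hourly_transfers))
normal = looks_unusual(z, end=6, short=1, long=5)    # last hour matches the 5-hour mean
suspect = looks_unusual(z, end=7, short=1, long=5)   # last hour jumps to 400
```

Because each window mean costs two lookups, trying many window sizes in parallel adds little work.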

As an extension of this application, many averages over windows of different sizes can be monitored in parallel and in real time. As noted above, historical experimental evaluation of alternative window sizes can also be performed easily and efficiently. Both are direct results of the cost-effective operation of the method of the second aspect in combination with the database of the first aspect.

Generally, in a database configured to store a large amount of data (more than 1000 records, and often more than 10000 records), it is difficult for a user to access every record.

Typically, a user of the database is mainly interested in the records most recently added to the database.

Therefore, in a third aspect, the present invention may provide a computer-implemented method of compiling, at a time t, a database storing previous values of a time-dependent variable, comprising the step of: selecting a set of values of the time-dependent variable corresponding to a sequence of predetermined time intervals measured backwards from time t.

The database may be a database according to the first aspect, but this is not essential.

Typically, the time intervals for recently entered records are more closely spaced than the time intervals for older records. As a result, the database stores recent data at a fine granularity and more distant data at a coarser granularity.

In highly liquid financial markets, where large volumes of data flow every second, the database of the third aspect can be used to reduce the amount of data that is stored and subsequently sent to users. Intuitively, a user who works with data at the millisecond level will mainly be interested in the most recent data, whereas a user interested in longer periods (minutes or hours) does not usually need millisecond precision. The third aspect therefore allows different client requests to be handled more efficiently according to each client's needs (and/or constraints). Sending only what the user requires reduces the total amount of data transmitted, and hence the network bandwidth and storage costs.
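
A minimal sketch of such a selection, assuming samples are (time, value) pairs sorted by time and that the predetermined intervals double with each step back from t. The doubling schedule and the name `select_history` are assumptions chosen for illustration.

```python
from bisect import bisect_right

def select_history(samples, t, offsets):
    """Keep, for each offset back from time t, the latest sample at or before t - offset.

    samples: list of (time, value) pairs sorted by time.
    offsets: hypothetical predetermined intervals measured backwards from t.
    """
    times = [time for time, _ in samples]
    kept = []
    for offset in offsets:
        i = bisect_right(times, t - offset) - 1
        if i >= 0 and samples[i] not in kept:
            kept.append(samples[i])
    return kept

samples = [(s, float(s)) for s in range(0, 101)]   # one value per time unit
offsets = [0, 1, 2, 4, 8, 16, 32, 64]              # dense near t, sparse further back
kept = select_history(samples, t=100, offsets=offsets)
kept_times = [time for time, _ in kept]
```

Here 101 stored samples are reduced to 8, with the closest spacing near time t.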

Typically, a database configured to store several related data sequences (such as, but not only, a database of the first aspect) is subjected to sorting operations in order to monitor relative changes in the stored data. For example, the database may store a particular market index for each of a set of financial instruments and may be sorted according to the relative values of the market index data. As the values of the market index change, further sorting operations are needed to update the database.

Similarly, when a database is used to determine the degree of pattern matching between, for example, a recent data sequence and historical data sequences, the different matching results may be sorted in order to find, for example, the historical sequence that best matches the recent sequence. (Such pattern matching may be performed using the database of the first aspect in combination with the method of the second aspect, or using a conventional database and method.)

However, such sorting operations are computationally expensive. Therefore, in a fourth aspect, the present invention may provide a computer-implemented method of sorting a data set according to a predetermined sort criterion, comprising the steps of:

traversing the data set to determine whether the values of the data set are out of order according to the predetermined sort criterion; and

selectively sorting the data set according to the predetermined sort criterion if the values of the data set are out of order.

Therefore, the computational cost of sorting the database is incurred only when the values of the data set are out of order. If the values of the data set are in order, the only additional cost is that of traversing the data to establish this.
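
The two steps can be sketched as a single linear pass followed by a conditional sort. The name `sort_if_needed` is hypothetical, and ascending order is assumed as the sort criterion.

```python
def sort_if_needed(values, key=None):
    """Sort in place only when one pass finds a pair out of order; return True if a sort ran."""
    k = key if key is not None else (lambda v: v)
    # Traversal step: a single pass over adjacent pairs.
    if all(k(a) <= k(b) for a, b in zip(values, values[1:])):
        return False
    # Sorting step: only reached when the traversal found disorder.
    values.sort(key=key)
    return True

index_values = [1.0, 2.5, 2.5, 4.0]
ran_first = sort_if_needed(index_values)    # already ordered, so no sort runs
index_values[0] = 3.0                       # one value moves
ran_second = sort_if_needed(index_values)   # now out of order, so the sort runs
```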

This method is particularly effective where the data (for example, market indices for financial instruments) has low volatility. In this case sorting is rarely required, so the method avoids unnecessary sorting operations and thereby reduces the total computational cost.

In particular, the method is effective when the values that determine the sort order of the data set (for example, the market indices of financial instruments) are calculated values, each determined as a function of several measured data values. Such calculated values usually have low volatility because they are relatively insensitive to changes in any single measured data value. This is especially the case for values calculated from data accumulated over a long period.

Such a calculated value may be any of the following:

a) the average of a plurality of measured data values;

b) the sum of a plurality of measured data values;

c) the maximum or minimum of a plurality of measured data values; or

d) the standard deviation of a plurality of measured data values.

Also disclosed herein is a database for storing data which is configured to generate an intermediate description of the stored data, so as to allow the maximum and/or minimum value of a specified data series of particular interest to the user to be identified more efficiently.

Identifying the minimum and/or maximum value in a data sequence is useful in, for example, financial applications, where the range of a stock price over a given period (that is, the difference between its maximum and minimum) is commonly used as an indication of the volatility of the stock price or of the market. It is therefore advantageous to be able to locate the maximum and/or minimum of, for example, a stock price for different periods by an efficient process with a low computational cost.

Identifying the minimum and/or maximum value in a data sequence is also useful for locating outliers in the sequence, which is particularly important when detecting fraud. Abnormal behaviour can be detected by comparing the maximum and/or minimum of, for example, a stock price with the normal fluctuation of that price. This process requires many periods to be evaluated (to build up a picture either of the normal behaviour of, for example, the stock price or of actually detected fraud), and so requires efficient identification of maximum and/or minimum values in order to keep the computational cost at an acceptable level.

Therefore, in a fifth aspect, the present invention provides a database configured to:

(i) store a sequence of values;

(ii) identify the minimum and/or maximum value in the sequence of values; and

(iii) define at least a first subsequence of values and a second subsequence of values following the first subsequence,

wherein the boundary between the first subsequence and the second subsequence is located at the position of the minimum or maximum value of the sequence.

The database is provided in a computer system for determining the minimum and/or maximum value of a specified subsequence of the sequence of values.

The term "following" relates to the directionality of the data sequence stored in the database. Typically, the directionality of the data sequence is defined by the order in which the values of the sequence were entered into the database. A more recently entered value is therefore typically regarded as "following" a previously entered value, and the values in the second subsequence have generally been entered into the database later than the values of the first subsequence.

In many applications of the database, the stored data sequence describes how a parameter changes over time. In this case, the most recent value of the parameter "follows" the older values of the parameter along the sequence of values.

The database of the fifth aspect is configured to define at least two subsequences of values in the stored data set, the boundary between the two subsequences being located at the position of the maximum or minimum value of the sequence.

In this way, a user of the database who wishes to identify the maximum or minimum value in a user-defined data series can, at least as an initial step, determine whether the specified data series spans the boundary between the two subsequences, and hence whether the maximum or minimum value of the entire database is contained in the specified data series. This avoids the user having to retrieve and evaluate the entire data series, reducing the retrieval and computational costs associated with the query.

Therefore, in a sixth aspect, the present invention provides a computer-implemented method of determining the minimum and/or maximum value in a user-defined data series, comprising the steps of:

(i) providing a database according to the fifth aspect of the invention;

(ii) determining whether the specified data series spans the first subsequence and the second subsequence; and

(iii) if the specified data series spans the first subsequence and the second subsequence, extracting the value at the boundary between the first subsequence and the second subsequence.
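
The steps above can be sketched for the minimum case as follows. The sketch assumes the boundary is recorded as the index of the global minimum and that a range counts as spanning both subsequences when it contains that index; the fallback scan for ranges that miss the boundary is an assumption, since the steps above do not cover that case. Function names are hypothetical.

```python
def split_at_minimum(values):
    """Return the index of the minimum, used as the boundary between the two subsequences."""
    return min(range(len(values)), key=values.__getitem__)

def range_minimum(values, boundary, start, end):
    """Minimum of values[start..end]; scans only when the range misses the boundary."""
    if start <= boundary <= end:
        return values[boundary]           # the range spans both subsequences
    return min(values[start:end + 1])     # assumed fallback: ordinary scan

values = [5.0, 3.0, 7.0, 1.0, 4.0, 6.0]
boundary = split_at_minimum(values)
spanning = range_minimum(values, boundary, 1, 4)   # contains the boundary record
```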

Typically, the data stored in the database is divided into further subsequences, the boundaries between adjacent subsequences being local maxima or local minima. Thus, where the database of the fifth aspect is configured to identify the minimum value in the stored data sequence, the database is typically also configured to:

(i) identify a local minimum corresponding to the minimum of the values following the first subsequence;

(ii) define a third subsequence of values following the second subsequence of values, wherein the boundary between the second subsequence and the third subsequence is located at the position of the local minimum;

(iii) identify a further local minimum corresponding to the minimum of the values following the n-th subsequence, where n = 2;

(iv) define an (n+2)-th subsequence of values following the (n+1)-th subsequence of values, wherein the boundary between the (n+1)-th subsequence and the (n+2)-th subsequence is located at the position of the further local minimum; and

(v) repeat steps (iii) and (iv) for all integer values of n until n = k, where at n = k no further local minimum is available to define a boundary between adjacent subsequences.
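
One way to read this iteration is as a repeated search for the minimum of whatever follows the previous boundary. The sketch below is a naive illustration with a hypothetical name, `minimum_boundaries`; note that in this reading the final record always becomes a boundary, because it is the minimum of the one-element tail, and the source does not say how that case is treated.

```python
def minimum_boundaries(values):
    """Positions of the global minimum, then of the minimum of everything after it, and so on."""
    boundaries = []
    start = 0
    while start < len(values):
        # Minimum of the values that follow the previous boundary (first occurrence on ties).
        idx = min(range(start, len(values)), key=values.__getitem__)
        boundaries.append(idx)
        start = idx + 1
    return boundaries

values = [5.0, 3.0, 7.0, 1.0, 4.0, 6.0, 2.0, 8.0]
boundaries = minimum_boundaries(values)
```

This version rescans the tail for every boundary, so it is quadratic in the worst case; it is meant to show the structure rather than an efficient construction.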

Where the database of the fifth aspect is configured to identify the maximum value in the stored data sequence, the division of the database into multiple subsequences bounded by local maxima can be carried out in a similar manner.

Typically, the data values in the first subsequence are each provided with a label corresponding to the value of the minimum or maximum at the boundary between the first subsequence and the second subsequence.

Similarly, the data values in any one of the multiple subsequences can each be provided with a label corresponding to the value of the local minimum or maximum at the boundary between that subsequence and the following subsequence. Typically, all subsequences up to the last boundary value (minimum or maximum) are labeled in this way. In that case, it is preferable that the data values in the final subsequence (that is, the data values after the last boundary value) are labeled as well. Preferably, the data values in the final subsequence are each provided with a label corresponding to the value of the record itself.

The terms "last" and "final" refer to the directionality of the database. Thus, the "last" boundary value is typically the boundary value associated with the most recently entered data, for example the boundary value between the k-th subsequence and the (k+1)-th subsequence. Similarly, the "final" subsequence is the subsequence that includes the most recently entered data in the data sequence stored in the database.

Once the records in the database have been labeled in this way, the method of the sixth aspect of the present invention may be used to determine the maximum or minimum value in a specified data string. In this case, the specified data string typically has an end point corresponding to the end point of the sequence of values in the database, and the minimum or maximum value of the specified data string is determined by reading the label of the value located at the start point of the data string.
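The labeling and single-value lookup described here can be sketched as follows. This is a minimal illustration for the minimum case, assuming that each label is the smallest value between the record itself and the most recent record; the function names `label_suffix_minima` and `min_from` are hypothetical and do not come from the specification.

```python
def label_suffix_minima(values):
    """Label each record with the minimum of all values from that record
    to the end (most recent record) of the stored sequence."""
    labels = [0] * len(values)
    running_min = float("inf")
    # Walk from the most recent record back to the earliest one.
    for i in range(len(values) - 1, -1, -1):
        running_min = min(running_min, values[i])
        labels[i] = running_min
    return labels


def min_from(labels, start):
    """Minimum of the data string running from `start` to the end of the
    stored sequence: one label lookup, no scan of the data."""
    return labels[start]


values = [5, 3, 8, 2, 7, 4, 6]
labels = label_suffix_minima(values)
# Records up to the global minimum (value 2) carry the label 2, the next
# subsequence carries the label 4, and the final record carries its own value.
print(labels)
print(min_from(labels, 4))
```

The maximum case would mirror this with `max` and a starting value of negative infinity, giving the second intermediate description.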

Therefore, the method for sixth aspect present invention allows to determine the minimum of specified serial data by retrieval single data values Value or maximum value.Therefore, it is only to retrieve the cost of the data value that user, which executes the cost of the analysis,.

After traditional database usually requires that retrieval and analyzes all values in serial data, minimum value and/or most is determined Before big value, therefore higher retrieval and calculating cost are generated to user.

In contrast, the method for the sixth aspect present invention used with the database combination of fifth aspect present invention significantly subtracts I/O bus required for the serial data of small customer analysis storage in the database uses and/or network bandwidth.

Therefore, by generating the intermediate description of the data of storage in the database and visiting centre description can by user It asks, the present invention allows user to inquire database with the calculating cost of reduction and evaluates the data of storage in the database.

The terms "start point" and "end point" relate to the directionality of the database. Typically, the "end point" denotes the most recently entered record in the data sequence, and the "start point" denotes a record entered at an earlier time.

The method of the sixth aspect of the present invention is to some extent counter-intuitive, in that it is not one that a person relying on purely mental processing would readily adopt to identify the maximum and/or minimum value in a data sequence. A person without computer assistance would tend simply to scan the data values in the sequence to identify the maximum or minimum value, and would wish to avoid the complex, non-intuitive method of the sixth aspect.

However, the conventional approach of scanning the data sequence to identify the maximum and/or minimum value (whether by purely mental processing or as part of a computer-implemented method) is not suitable for large volumes of data. In such cases, the more complex and non-intuitive steps of the method of the sixth aspect lend themselves to implementation on a technical apparatus, thereby allowing large volumes of data to be evaluated quickly and efficiently.

In certain embodiments of the present invention, the database of the fifth aspect can be configured to generate two intermediate descriptions of the stored data sequence: a first intermediate description that facilitates identification of the minimum value in a specified data string, and a second intermediate description that facilitates identification of the maximum value in a specified data string.

The database can be configured to store data representing the variation of a parameter over time. The parameter may be, for example, a parameter describing a financial transaction. Typically, the parameter is measured at time intervals of less than 0.5 s, preferably less than 0.1 s, more preferably less than 0.05 s.

Typically, the database is configured to store a data sequence having more than 1000 values.

Specific embodiment

Application field

The present invention is beneficial for applications in which data streams change over time, are of unpredictable length, and arrive continuously and rapidly. The limited storage of traditional DBMS approaches is unsuited to such applications, which require rapid and continuous loading of data in addition to continuous querying and processing (M. Kontaki, A.N. Papadopoulos and Y. Manolopoulos, Adaptive similarity search in streaming time series with sliding windows, Data & Knowledge Engineering, Volume 63, Issue 2, November 2007, Pages: 478-502). An additional difficulty when attempting to apply standard analysis to such data is that, because of the continuous and unpredictable behavior of data streams, the data can be read only once or a limited number of times, so that random access to the data is not available (P. Tsai, Mining top-k frequent closed itemsets over data streams using the sliding window model, Expert Systems with Applications: An International Journal, Volume 37, Issue 10, October 2010, Pages: 6968-6973). This calls for modified analysis methods that provide fast answers to range queries without requiring multiple scans of the data.

Query type

The mechanism of the present invention supports the following query types required by continuous-data applications (F. Buccafurri and G. Lax, Approximating sliding windows by cyclic tree-like histograms for efficient range queries, Data & Knowledge Engineering, Volume 69, Issue 9, September 2010, Pages: 979-997).

1. Point query: returns the k-th data point of the data stream.

2. Range query: returns aggregated data for a given time interval.

3. Similarity query: returns true if a similar pattern occurs in the data stream (P. Capitani and P. Ciaccia, Warping the time on data streams, Data & Knowledge Engineering, Volume 62, Issue 3, September 2007, Pages: 438-458).

Application examples

Finance

Market transparency

Financial market authorities are increasingly required to ensure that their markets are fair and transparent to participants. As trading volumes increase (hundreds of gigabytes per day in some markets), it becomes increasingly difficult to distribute data to all participants. Only institutions and large investors can afford full access to such data. This in itself makes transparency a major problem for individual investors who cannot afford to receive such volumes of data. The difficulty increases when demand extends beyond transaction data to more complex data such as order flow and non-trade execution information. It is an object of the present invention to provide the ability to present accurate aggregated data to all participants, who can select the data they want using a customizable window-size mechanism. This provides the following advantages. First, the calculation is performed only once, when the cumulative data is stored. A user then requests the desired data range and receives only the start and end data elements needed to complete the request. This saves a large amount of computation, particularly as the number of users or requests increases, because the upper limit on computation is set when the data is generated. Second, only a limited amount of data is sent to the user for the desired range, so that bandwidth is greatly reduced and an upper limit on cost is effectively set regardless of the data range requested. Third, the ability to extract customized data ranges without large computation or bandwidth overheads makes large-scale simultaneous real-time analysis and experimentation feasible. Fourth, allowing participants to perform all of their data queries without the entire data range being transmitted helps to protect official data from unauthorized use or retransmission. Finally, the present invention provides a means of making the market fully transparent to all participants at the aggregate level, without disclosing all individual data points (which are usually sold at a high price).

Telecommunications

Network flow monitoring

Optimization of network utilization depends on the management of router and switch queues (E. Hernandez-Orallo and J. Vila-Carbo, Network queue and loss analysis using histogram-based traffic models, Computer Communications, Volume 33, Issue 2, February 2010, Pages: 190-201) (S.K. Tanbeer, C.F. Ahmed, B. Jeong and Y. Lee, Sliding window-based frequent pattern mining over data streams, Information Sciences, Volume 179, Issue 227, November 2009, Pages: 3843-3865). In general, network traffic is modeled as being presented to a queue of finite size that is served at a given service rate, the aim being to determine information about queue utilization. The traffic is then modeled in terms of its distribution, from which average and fluctuation values are determined continuously. As networks grow larger, the data generated and the resulting amount of computation also increase, making it more difficult to customize reports to each user's requirements. The present invention can make three main contributions to this problem:

1. At present, monitoring reports with a fixed period (for example, updated every 24 hours) are sent to users, because of the difficulty of building a custom report for each user based on their individual requirements. The present invention allows data to be precomputed and stored that users can then use to generate their own custom reports, thereby alleviating the need for custom calculations at the level of each user.

2. Statistical calculations can easily be converted into cumulative calculations, which minimizes the data volume and bandwidth used by each user request, because only the end-point data items of the requested range need to be sent.

3. Information security is increased for the provider, because only the data relevant to the user's request is sent rather than the entire data set.

Intrusion detection

Information held on data servers must be protected from network attacks (H. Li and S. Lee, Mining frequent itemsets over data streams using efficient window sliding techniques, Expert Systems with Applications, Volume 36, Issue 2, Part 1, March 2009, Pages: 1466-1477) (W. Wang, X. Guan and X. Zhang, Processing of massive audit streams for real-time anomaly intrusion detection, Computer Communications, Volume 31, Issue 1, January 2008, Pages: 58-72). The two basic approaches are signature-based detection (in which malicious behavior is detected by matching against previously stored prototype attacks) and anomaly detection (in which profiles of normal users are maintained and attempts are made to identify unacceptable deviations as possible attacks). Signature-based methods can make use of the distance measurement mechanism of the invention, in which distances to a predetermined set of signatures are recorded in real time. A particular advantage of the invention is that distances can be obtained not only to full signatures but also to subsets of a signature. This allows rapid experimentation and detection without time-consuming recalculation of distances. Anomaly detection also benefits from the ability to perform calculations on subsets of the data. This is particularly suited to automatic calibration methods, in which multiple time periods can be measured without additional computation, greatly reducing the computational cost compared with conventional methods.

Engineering

Physical structures

Monitoring techniques are used to track the performance of mass damping systems for tall physical structures, in order to enhance their damping and maintain their safety (J.M.W. Brownjohn, E.P. Carden, C.R. Goddard and G. Oudin, Real time performance monitoring of tuned mass damper system for a 183m reinforced concrete chimney, Journal of Wind Engineering and Industrial Aerodynamics Vol.8, No.3, March 2010, pp.169-179). Accelerometers are attached to the structure and monitored remotely to provide real-time information on whether the displacement of the structure exceeds a critical threshold. Large-scale application of this monitoring technique to hundreds or even thousands of structures would call for a unified data management system (such as that of the present invention) that allows all users to perform their requested monitoring tasks simultaneously. With a compute-once, use-many approach, many automated systems can efficiently monitor different phenomena at the same time by using windows of different sizes and different combinations of data. In addition, the low bandwidth requirement that results from the cumulative data storage structure means that monitoring sites can be located remotely at small communication overhead.

Drilling optimization

The aim is to optimize the drilling process so as to minimize cost while maintaining safe operating standards. This is done by continuously evaluating the drilling process during operation so as to maximize the overall rate of penetration based on the cumulative footage drilled. Because of drill bit wear and the delivery time associated with replacing old components with new ones, the basic decision that must be made is a trade-off between maximizing drill bit utilization and minimizing project time. This is done by continuously analyzing multiple variables (such as weight on bit and rotation speed), applying these data to a mathematical model, and making the decision as late as possible, so that drill bit use is maximized without jeopardizing the progress of the drilling project. An advantage of the solution of the invention is that custom time windows can be evaluated at any time at negligible computational cost. This is very important when drilling through different geological formations, since it allows the penetration-rate model to be changed or even calibrated in real time as work proceeds, while retaining the ability to carry out a comprehensive analysis regardless of the initial strategy. It also allows multiple different models to be evaluated simultaneously using the same initial data. In addition, the low bandwidth requirement is an important feature, especially when drilling takes place in remote areas where it is difficult to station analysts and where high-bandwidth communication channels are usually very expensive.

Scientific data analysis

Earthquake prediction

Early warning devices can provide a short warning before the arrival of a major tremor, based on the different propagation speeds of the various vibrations generated. This application is characterized by large volumes of continuous data that must be processed immediately to be of value. A measuring center can also use readings from neighboring centers in an effort to increase the detection time before an earthquake. Because of the volume of data generated by each measuring device, the cost of doing so is prohibitive for all but the largest centers. However, with the present invention serving as the main data store, hundreds or even thousands of neighboring centers can share data by exploiting the low bandwidth requirement of the invention. In addition, the negligible computational overhead of using different sliding-window sizes means that multi-level detection can be carried out simultaneously to detect anomalies of different sizes. This is important for experimentation, because a large number of potential models can be tested simultaneously and potentially adopted.

Tropical atmosphere ocean

In-situ environmental sensors are physically located in the environment they monitor, and their time-series data are transmitted continuously to a single data warehouse (D.J. Hill and B.S. Minsker, Anomaly detection in streaming environmental sensor data: A data-driven modeling approach, Environmental Modelling & Software, Volume 25, Issue 9, September 2010, Pages: 1014-1022). Automated data quality assurance and control are needed to detect and identify anomalous data that deviate significantly from historical patterns. Such anomaly detection can also be used in the field of adaptive monitoring, where anomalous data indicate phenomena that call for further investigation. With the present invention serving as the main data store, the customizable sliding-window mechanism can be used to run simultaneous data-assurance tests, adding a further layer of sophistication to quality monitoring. In addition, the same mechanism can be used to detect multiple different anomalous phenomena simultaneously, or to allow detection strategies to be changed on the fly, at no additional computational cost. A further advantage is the ability to serve other remote users, whose data requests incur only minimal communication cost because of the low-bandwidth data transfer mechanism.

Detailed description

The following description and Examples 1-8 show how a database according to the first aspect of the present invention can be constructed and used to store data and to provide information in response to user requests.

In a first embodiment, the database is used to store a sequence of time values. Each time value is associated with one value in a sequence of values of a variable y. For example, the variable y may be a variable describing a financial transaction, such as price or trading volume.

The time values are arranged in order from the earliest time value to the most recent, and are evenly spaced. For example, the difference between consecutive time values may be 0.5 s, 0.1 s or 0.05 s.

The database is configured to store more than 1000 such time values.

In the first embodiment, the database also stores the cumulative value of the variable y, that is, the sum of the values of y over the time range from the first time value to the n-th time value, $\Sigma y(n) = \sum_{i=1}^{n} y_i$. Each time value n is therefore associated with the sum of the values of y recorded from the earliest time value up to that time value. This allows the total of the variable y over the period from time a to time b to be calculated by subtracting Σy at time a from Σy at time b (that is, Σy(b) − Σy(a)). A user can therefore calculate the cumulative sum of the variable y added to the database during a given period by retrieving two data points, Σy(a) and Σy(b), from the database and performing a single operation to subtract one value from the other.

To calculate the average value of the variable y over the period between time a and time b, Σy(a) is subtracted from Σy(b) and the result is divided by the number of values in the subsequence extending from time a to time b. The user therefore need only perform a single subtraction and a single division.
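The two-lookup total and average can be sketched as follows. This is a minimal illustration, assuming the cumulative values are held in an in-memory list `cum` with `cum[0] = 0`, so that `cum[b] - cum[a]` covers the values entered after time a up to and including time b; the names `range_total` and `range_mean` are hypothetical.

```python
from itertools import accumulate

y = [4, 7, 1, 9, 3, 6]

# cum[n] is the sum of the first n values of y; cum[0] is 0 by convention.
cum = [0] + list(accumulate(y))


def range_total(cum, a, b):
    """Total of y over the period (a, b]: two lookups and one subtraction."""
    return cum[b] - cum[a]


def range_mean(cum, a, b):
    """Average of y over the period (a, b]: one subtraction, one division."""
    return (cum[b] - cum[a]) / (b - a)


print(range_total(cum, 2, 5))  # total of the 3rd, 4th and 5th values
print(range_mean(cum, 2, 5))
```

The cost of either query does not depend on how many records lie between a and b.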

In a comparative example using a traditional database, calculating the cumulative sum of the variable y added to the database during a given period would require all of the values of y added during that period to be retrieved from the database and added together.

Similarly, in another comparative example, the average of the values stored in a traditional database is calculated by retrieving all of the values, adding them together and dividing by the number of values.

In other embodiments of the first aspect of the present invention, the database can store, for each time value between 1 and n, the cumulative value of a function of the variable y. For example, the database can store the cumulative value of y². This allows the variance of the variable y to be calculated using the following formula:

Thus, the variance of the variable y over the period from time a to time b can be calculated from four values retrieved from the database (Σ(y²) and Σ(y) at time a, and Σ(y²) and Σ(y) at time b).
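One standard form of the variance that uses exactly these four retrieved values is given below. It is offered as an assumption rather than as the formula of the specification: it is the population variance, $N$ denotes the number of values in the period from time $a$ to time $b$, and $\Sigma y(t)$ and $\Sigma y^{2}(t)$ denote the stored cumulative values at time $t$.

```latex
\operatorname{Var}(y)_{a \to b}
  = \frac{\Sigma y^{2}(b) - \Sigma y^{2}(a)}{N}
  - \left( \frac{\Sigma y(b) - \Sigma y(a)}{N} \right)^{2}
```

The first term is the mean of $y^{2}$ over the period and the second is the square of the mean of $y$, each obtained with one subtraction and one division.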

In a further embodiment of the present invention, the database stores a sequence of values of a further input variable y′, each value of y′ being associated with one time value in the sequence of time values. In this case, the further input variable y′ can be stored in the database in the form of the cumulative value of [y′·y]. That is, the database stores, for each time value up to the n-th time value, the cumulative sum of [y′·y] from the earliest time value, $\sum_{i=1}^{n} y'_i y_i$. This allows the correlation between the variables y and y′ over a period of interest between times a and b to be calculated using the following formula:
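A correlation calculated from cumulative values can be sketched as follows. This is an illustration under stated assumptions, not the formula of the specification: it uses the Pearson correlation coefficient, and it assumes that cumulative values of y, y′, y² and y′² are stored alongside the cumulative value of [y′·y]. The function names are hypothetical.

```python
from itertools import accumulate
from math import sqrt


def cumulative(values):
    """Cumulative sums with a leading 0, so cum[n] covers the first n values."""
    return [0.0] + list(accumulate(values))


def correlation(cums, a, b):
    """Pearson correlation of y and y' over the period (a, b].

    `cums` holds the cumulative sums of y, y', y*y, y'*y' and y'*y, in that
    order. Only the entries at a and b are read (ten lookups in total).
    """
    n = b - a
    sy, sx, syy, sxx, sxy = (c[b] - c[a] for c in cums)
    cov = sxy / n - (sx / n) * (sy / n)
    var_y = syy / n - (sy / n) ** 2
    var_x = sxx / n - (sx / n) ** 2
    return cov / sqrt(var_x * var_y)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_prime = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]  # exactly 2*y + 1

cums = (
    cumulative(y),
    cumulative(y_prime),
    cumulative([v * v for v in y]),
    cumulative([v * v for v in y_prime]),
    cumulative([u * v for u, v in zip(y_prime, y)]),
)
r = correlation(cums, 1, 6)
print(r)
```

Because y′ is an exact linear function of y in this sample data, the coefficient comes out at 1.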

Example 1: data storage

For the database of the first embodiment of the first aspect of the present invention, in which the cumulative value of the variable y is stored, the cost of building the database (compared with the comparative example of a traditional database) is set out below:

Cost of adding a new record = computation cost of adding the new record to the sum of all preceding records + storage cost of storing the new cumulative record.

For a traditional database, cost of adding a new record = storage cost of storing one or more records (no computation cost).

The storage cost is the cost of sending the record to network storage (including the use of network bandwidth).

The computation cost is the cost of the mathematical operations needed to bring the data into the desired form.

Thus, data storage in the first embodiment of the invention carries an additional computation cost, namely that of adding the new record to the sum of all preceding records, relative to the comparative example of a traditional database. Because a cumulative value is larger in magnitude than the raw data, the storage cost for the first embodiment may also be slightly greater than the cost associated with a traditional database. For example, a two-digit value (e.g. a raw data value) that remains constant and is accumulated will result in a four-digit value being stored in the database within 100 seconds.

Example 2: data manipulation

In this example, the average value of a series of data records is calculated for the database of the first embodiment of the first aspect of the present invention.

This requires the following steps:

1. Retrieve the data records at the start and end of the sequence (retrieval cost = two fetch operations).

2. Subtract the start record value from the end record value (computation cost = one subtraction).

3. Divide the result by the number of data records (computation cost = one division).

In a traditional database, this process would require the following steps:

1. Retrieve all data records in the sequence of interest (retrieval cost = n fetch operations).

2. Add together all of the retrieved records (computation cost = (n-1) additions).

Thus, the database of the first embodiment has a significantly lower retrieval cost when calculating the average of the stored data, and a lower computation cost when performing the calculation. In general, these lower data-manipulation costs will tend to offset the slightly higher one-off cost associated with data storage. This is particularly the case for continuously repeated requests. For example, if the database of the first embodiment is queried for the average of the last 100 data records, and this value is updated every time a new record is entered, the savings in computation and retrieval relative to the conventional approach outweigh the precomputation overhead after the first average has been calculated. Moreover, if the request is modified to cover the average of the last 200 data records, the cost does not increase.
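The repeated-request scenario can be sketched as follows. This is a minimal in-memory illustration, not the database of the specification; the class name `CumulativeStore` and its methods are hypothetical. Each insert costs one addition, and each average costs two lookups, one subtraction and one division whatever the window size.

```python
class CumulativeStore:
    """Stores only cumulative values of y (hypothetical illustration)."""

    def __init__(self):
        self._cum = [0.0]  # _cum[n] is the sum of the first n records

    def append(self, y):
        # One addition at insert time: new cumulative = previous + y.
        self._cum.append(self._cum[-1] + y)

    def mean_of_last(self, window):
        """Average of the most recent `window` records.

        Call only after at least one record has been appended.
        """
        end = len(self._cum) - 1
        start = max(0, end - window)
        return (self._cum[end] - self._cum[start]) / (end - start)


store = CumulativeStore()
rolling = []
for value in range(1, 301):
    store.append(float(value))
    rolling.append(store.mean_of_last(100))  # refreshed on every new record

print(store.mean_of_last(100))
print(store.mean_of_last(200))  # wider window, same number of operations
```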

Example 3: the calculating of volume weighted average price

The volume-weighted average price (VWAP) of a stock is obtained by dividing the total traded currency amount by the total traded volume. The currency amount of a trade is the volume of the trade multiplied by its price.

In a database according to a second embodiment of the first aspect of the present invention, the cumulative volume and the cumulative currency amount (the product of volume and price) are stored as a function of time. In this case, the VWAP can be calculated by the following steps:

1. Retrieve the currency-amount records at the start and end of the sequence of interest (retrieval cost = two fetch operations).

2. Subtract the start record value from the end record value (computation cost = one subtraction).

3. Retrieve the volume records at the start and end of the sequence of interest (retrieval cost = two fetch operations).

4. Subtract the start record value from the end record value (computation cost = one subtraction).

5. Divide the value obtained in step 2 by the value obtained in step 4 (computation cost = one division).

Thus, the total cost of calculating the VWAP is four fetch operations, two subtractions and one division.
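The five steps can be sketched as follows. This is a minimal illustration with made-up trade data; the lists `cum_amount` and `cum_volume` stand in for the stored cumulative records, and the function name `vwap` is hypothetical.

```python
from itertools import accumulate

prices = [10.0, 10.5, 10.2, 10.8, 11.0]
volumes = [100, 200, 150, 50, 300]

# Cumulative records as they would be stored, with a leading 0 entry.
cum_volume = [0] + list(accumulate(volumes))
cum_amount = [0.0] + list(accumulate(p * v for p, v in zip(prices, volumes)))


def vwap(a, b):
    """VWAP of the trades after position a up to and including position b."""
    amount = cum_amount[b] - cum_amount[a]  # steps 1 and 2
    volume = cum_volume[b] - cum_volume[a]  # steps 3 and 4
    return amount / volume                  # step 5


print(vwap(1, 4))  # VWAP over the 2nd, 3rd and 4th trades
```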

In a traditional database, this process would require the following steps:

1. Retrieve all currency-amount data records in the sequence of interest.

2. Add together all of the records obtained in step 1.

3. Retrieve all volume data records in the sequence of interest.

4. Add together all of the records obtained in step 3.

5. Divide the value obtained in step 2 by the value obtained in step 4.

Thus, as a comparative example, the total cost of calculating the VWAP using a traditional database is 2 × n fetch operations (where n is the number of values in the data sequence of interest), 2 × (n-1) additions and one division.

Example 4: pattern matching

In a third embodiment, the database of the first aspect of the present invention has six time periods, each time period having a sequence of five records, labeled 1-5. The database is configured to store the Euclidean distance between corresponding records in different time periods. The Euclidean distance is stored in cumulative form, according to the following formula:

Cumulative Euclidean distance:

where r is the number of records, TA is a first time period and TB is a second time period.

Thus, the database stores the Euclidean distance between the first records of period TA and period TB. The database also stores the sum of the Euclidean distance between the first records of periods TA and TB and the Euclidean distance between the second records of those periods. Similarly, the database stores further cumulative values covering the first to third records, the first to fourth records, and the first to fifth records of those periods.

Likewise, the database stores corresponding cumulative distance values for the distances between corresponding records of the other time periods.

The Euclidean distance between corresponding subsequences of records in different time periods is given by the following formula:

where the subsequence extends between records p and q.
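A form of the two expressions that is consistent with the operations listed for this example (a subtraction of two cumulative values followed by a square root) is sketched below. The notation is an assumption: $TA_i$ and $TB_i$ denote the $i$-th records of periods TA and TB, $D_r$ denotes the stored cumulative value after $r$ records, and $D_0 = 0$.

```latex
D_r(TA, TB) = \sum_{i=1}^{r} \left( TA_i - TB_i \right)^{2}

d_{p \to q}(TA, TB) = \sqrt{ D_q(TA, TB) - D_{p-1}(TA, TB) }
```

Under this reading the stored quantity is the running sum of squared differences, and the square root is taken only when a request is answered.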

Thus, the Euclidean distance between two corresponding subsequences in different time periods can be calculated quickly from the stored cumulative distance values.

The cumulative distance values are stored in the database and can readily be reused in response to subsequent requests.

Therefore, the operations that must be performed in response to a request for the distance between two corresponding subsequences in different time periods are:

1. Fetch the cumulative distance values corresponding to the start and end of the subsequence of interest for the two time periods of interest.

2. Subtract the cumulative value at the start of the subsequence from the cumulative value at the end of the subsequence.

3. Calculate the square root of the difference between the two cumulative values.

In contrast, in the comparative example using a traditional database, the distance between the two subsequences is calculated directly, in response to the user request, from the raw data stored in the database. The operations that must be performed in response to the user request are therefore:

1. Fetch 2n records (where n is the length of the subsequence).

2. n subtractions (performed on the corresponding records of the subsequences in the different time periods).

3. n multiplications (to calculate the squares of the differences).

4. (n-1) additions.

5. One square-root operation.

Thus, compared with the comparative example of a traditional database, this example provides large savings in the data retrieval and computation costs incurred in responding to a user request. These savings increase with the length of the subsequence of interest.

Using this embodiment of the first aspect of the present invention, pattern matching can be carried out between a time period of interest (typically the most recent time period) and other time periods (more distant time periods), in order to find the historical time period that best matches the period of interest.

In this case, the Euclidean distance is calculated between the subsequence of interest and the corresponding subsequence in each historical period of interest. A sort operation is then performed to identify the historical data subsequence having the smallest Euclidean distance from the subsequence of the period of interest.
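The matching and sorting step can be sketched as follows. This is a minimal illustration with made-up records, again assuming that running sums of squared differences are stored; the period names `T1` to `T3` and the function names are hypothetical.

```python
from itertools import accumulate
from math import sqrt


def cumulative_sq_dist(period_a, period_b):
    """Running sum of squared differences; index 0 holds 0.0."""
    return [0.0] + list(
        accumulate((x - y) ** 2 for x, y in zip(period_a, period_b))
    )


def rank_periods(stored, p, q):
    """Sort historical periods by distance to the period of interest over
    records p..q (1-based, inclusive), closest first."""
    distances = {
        name: sqrt(cum[q] - cum[p - 1]) for name, cum in stored.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


current = [1.0, 2.0, 3.0, 4.0, 5.0]
history = {
    "T1": [1.0, 2.0, 3.0, 9.0, 9.0],
    "T2": [5.0, 5.0, 3.0, 4.0, 5.0],
    "T3": [2.0, 3.0, 4.0, 5.0, 6.0],
}

# Computed once at insert time and reused for every later request.
stored = {name: cumulative_sq_dist(current, recs) for name, recs in history.items()}

print(rank_periods(stored, 3, 5))  # best match over records 3 to 5
print(rank_periods(stored, 1, 2))  # different subsequence, same stored data
```

Changing the subsequence of interest changes the ranking without touching the raw records again.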

Clearly, in this embodiment of the first aspect of the present invention, a large number of operations must be performed when data is inserted into the database, in order to calculate and store the cumulative Euclidean distances. However, because the stored data can be reused, the overhead per request diminishes as the number of requests increases.

Example 5: data prefetching

In general, a data request carried out using the method of the second aspect of the present invention involves fetching the start and end of the required period of data. When the fetch process is expensive because of latency (for example, when accessing a database or fetching over a network), it can be advantageous to prefetch data, so as to eliminate the latency overhead of additional fetch operations and to exploit temporal locality when caching.

This example illustrates the prefetch concept using the following scenario. A continuous real-time request is being executed, and the current time is 105. With each update of the database, the request involves fetching the start and end of the 10 most recent cumulative values, for example to allow the average of the 10 most recent measured values to be calculated.

In an embodiment of the method according to the second aspect of the present invention, when the first end-point cumulative value (corresponding to time 96) is fetched from database A, all cumulative values up to, for example, the cumulative value corresponding to time 100 are fetched as well and stored in a local cache B. As the database is updated with the cumulative values for times 106, 107 and so on, cumulative values are retrieved from the cache as needed to respond to subsequent requests.

The cost of retrieving a cumulative value from database A is α, and the cost of retrieving a cumulative value from cache B is β.

Table 3 therefore gives the cost of the fetch/retrieval operations for responding to successive requests each time the database is updated:

Table 3

Table 3 shows how, when the first request is received, the cumulative values for times 96-100 are fetched from database A at cost α and stored in cache B. Cumulative value 96 is then retrieved from cache B at cost β, and cumulative value 105 is fetched from database A at cost α. In this case, the two cumulative values are used to calculate the average of the measurements recorded from time 96 to time 105.

The next calculation is performed after the database has been updated to include the most recent cumulative value, for time 106. That value is retrieved from database A at cost α, and the end value 97 is retrieved from cache B at cost β.

The process is repeated until no more values are stored in the cache, or until no more user requests for calculations are received.

In this example, the total retrieval or fetch cost for providing the start and end values of the sequences as the database is updated is 6α + 5β.

By contrast, if the method of the second aspect of the invention is not used in this way, the total retrieval cost for fetching the start and end values of the five sequences is 10α (that is, each of the 10 start and end values must be fetched from database A).

Assuming that the latency cost α of fetching data from the database is significantly greater than the latency cost β of fetching data from the cache, this prefetching greatly reduces the total latency cost of responding to consecutive requests.
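The cost comparison above can be reproduced with a small counting sketch. It is illustrative only: the function name `access_counts` and the parameter `prefetch_size` are hypothetical, and the sketch assumes, as in the example, that a batch of older values can be fetched from database A in a single access of cost α.

```python
def access_counts(num_requests, prefetch_size=0):
    """Count accesses to database A (cost alpha each) and cache B (cost beta each).

    Each request needs one older cumulative value and the newest cumulative
    value. With prefetch_size > 0, a batch of older values is fetched from A
    in one access and kept in cache B (assumption: the batch costs one alpha).
    """
    alpha_count = beta_count = 0
    cached = 0  # older values still available in cache B
    for _ in range(num_requests):
        if prefetch_size:
            if cached == 0:
                alpha_count += 1      # one batched fetch from database A
                cached = prefetch_size
            beta_count += 1           # older value read from cache B
            cached -= 1
        else:
            alpha_count += 1          # older value read from database A
        alpha_count += 1              # newest value always read from database A
    return alpha_count, beta_count
```

For five consecutive requests, `access_counts(5, prefetch_size=5)` gives 6 database accesses and 5 cache accesses (6α + 5β), whereas `access_counts(5)` gives 10 database accesses (10α).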

Example 6: Construction of a database of varying granularity

In an embodiment of the third aspect of the invention, all received data is stored in a primary database. A secondary database is also constructed, which stores records corresponding to predetermined time intervals measured back from one of the following times: (i) the time at which the database was most recently updated, or (ii) the current time.

If the time intervals are measured back from the most recent update time, the secondary database is updated whenever a new record is entered. If the time intervals are measured back from the current time, the secondary database is updated at regular intervals, for example every second.

Thus, in one example, the secondary database stores the records from 5 seconds, 30 seconds, 5 minutes, 30 minutes and 2 hours before the most recent update. In this way, the secondary database stores more records relating to recent data and fewer records relating to older data; that is, the granularity of the database is greater for recent data than for older data.

A possible structure for the secondary database is one in which a predetermined percentage of the data is provided at high granularity and the remaining data is provided at low granularity. For example, for data obtained over a total period of 600 minutes (10 hours), the data recorded in this period may be stored as shown in Table 4:

Table 4

Period extending back from the current time	Granularity
0-0.5 s	Millisecond
0.5 s-5 s	Half second
5 s-90 s	Second
90 s-30 minutes	Half minute
30 minutes-10 hours	Minute

In this example, the storage space required by the secondary database is only about 5% of the space required by the primary database, in which all the data is stored at millisecond granularity.
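The banding in Table 4 can be expressed as a simple lookup from the age of a record to the granularity at which it is kept. The following is a sketch under stated assumptions: the names `AGE_LIMITS_S`, `STEPS_S` and `granularity_for_age` are hypothetical, and a record lying exactly on a band boundary is assigned to the coarser band, which Table 4 does not specify.

```python
import bisect

# Upper age limit (seconds) of each band in Table 4, and its sampling step (seconds).
AGE_LIMITS_S = [0.5, 5.0, 90.0, 30 * 60.0, 10 * 3600.0]
STEPS_S = [0.001, 0.5, 1.0, 30.0, 60.0]

def granularity_for_age(age_s):
    """Return the sampling step, in seconds, for a record `age_s` seconds old."""
    if age_s < 0 or age_s > AGE_LIMITS_S[-1]:
        raise ValueError("age outside the 10-hour window")
    # Find the first band whose upper limit is strictly greater than the age.
    band = bisect.bisect_right(AGE_LIMITS_S, age_s)
    # A record exactly 10 hours old still belongs to the last band.
    return STEPS_S[min(band, len(STEPS_S) - 1)]
```

For instance, a record 3 seconds old would be kept at half-second granularity, and a record 2 hours old at minute granularity.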

Example 7: Conditional sorting

In this example, the database stores data relating to financial instruments, such as a market indicator for each instrument. The data is sorted according to the value of the market indicator. As the market indicators are updated, the data values become out of order, so the database needs to be re-sorted.

According to this embodiment of the fourth aspect of the invention, the data is sorted according to the following algorithm:

for i = 2 to n
    if x[i] < x[i-1]
        sort_instruments
        return

where i is a financial instrument and x[i] is the market indicator for that instrument.

Thus, the database is re-sorted only when the financial instruments are out of order according to their market indicators. The computational cost of the sort operation is incurred only when the data is out of order; otherwise, only the cost of traversing the data is incurred.
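The pseudocode above translates almost line for line into Python. The sketch below is illustrative: the function name `conditional_sort` and the `key` argument (a function returning the market indicator of a record) are assumptions, and Python's built-in `list.sort` stands in for `sort_instruments`.

```python
def conditional_sort(instruments, key):
    """Sort `instruments` in place only if they are out of order.

    Returns True if a sort was performed, False if the traversal found the
    data already in ascending order of `key`.
    """
    for i in range(1, len(instruments)):
        # The first out-of-order neighbouring pair triggers a full sort.
        if key(instruments[i]) < key(instruments[i - 1]):
            instruments.sort(key=key)
            return True
    # Only the traversal cost was incurred.
    return False
```

When the indicator values are already in order, the call costs a single linear pass and leaves the list untouched.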

For example, the database may be configured to store market indicators relating to 10 financial instruments. The market indicators are updated every second, but the data is sorted on the basis of a 5-minute average of the market indicator. In this case, the database can be traversed every second (whenever the market indicators are updated) to determine whether the values are still in order, and the data re-sorted only if the values are out of order. Because the 5-minute average fluctuates only slowly, re-sorting is seldom needed, so the computational cost associated with the sort operation is reduced.

Empirically, it has been found that the total computational cost of operating the database is reduced by up to 50% when the conditional sorting algorithm is used, compared with a conventional sort operation (in which a sort is performed whenever a value in the database is updated).

Example 8: Market indicators

The database according to the first aspect of the invention can be used to generate descriptive financial indicators obtained by synthesising a full order book from generally available raw data.

In this example, the database is used to generate typical financial indicators for the following data types: money value, volume, trades, price, return, return squared, and return times index. An advantage of the database of this example is that it can also be used to provide other, atypical variables extracted from executed trades and order book updates (which are usually of less interest in themselves because of their larger volume). However, representing the aggregate characteristics of these variables over a period of time using statistical measures can provide significant insights that are not obtained by studying the variables individually. Typical variables can be decomposed in detail; for example, executed trades can be decomposed into normal/hidden/exceptional executed buy/sell trades. Other order book changes can be used to decompose pending orders into orders added to or removed from the ask/bid side, for each individual order book price level or any combination thereof. The trade/order decompositions can then be used to specify atypical financial indicators.

The order book synthesis process is set out below.

Synthesis of level III (full order book) using level II updates

Typically, level III is available only to market makers and specialists. However, using the following method, this data can be synthesised from the more widely available level II data messages.

Level III (full order book)	Level II (input order book)

Message format

The data fields in Table 5 are assumed to be available as input to the synthesis process, and represent the minimum information required to generate the financial indicators included in this document.

Table 5

Type (trade/order book activity)

Symbol

Price

Volume

Sequence

Table 6 - Supporting data structures

All of the tables (except inputTR_table) and lists in Table 6 exist for both the bid side and the ask side.

The process involves maintaining a set of tables and lists for the full order book as a result of synthesising the input order book, which is populated using market messages.

Process 1

The process starts with an input message classified as order book activity. It involves merging the message into the latest inputOB_table (level II), comparing it with the latest fullOB_table (level III), and generating a new tempOB_table and a change list. The tempOB_table then becomes the latest fullOB_table, and the change list becomes available for trade messages.

The following four-step algorithm is performed to compare inputOB_table with fullOB_table and to make any required modifications. Note that all steps of the algorithm apply to the ask/bid tables/lists according to the initial price match. Subsequent modifications to the tables/lists are made for the matching side.

Price deleted

The latest message indicates that an old price level has been deleted from the order book.

Condition

price(fullOB_table, fullOB_pointer) > price(inputOB_table, inputOB_pointer) (bid side)

price(fullOB_table, fullOB_pointer) < price(inputOB_table, inputOB_pointer) (ask side)

or

inputOB_pointer → end and inputOB_pointer < level2size

Update

append [price(fullOB_table, fullOB_pointer), -vol(fullOB_table, fullOB_pointer)] to change_list

increment fullOB_pointer

Price added

The latest message indicates that a new price level has been added to the order book.

Condition

price(fullOB_table, fullOB_pointer) < price(inputOB_table, inputOB_pointer) (bid side)

price(fullOB_table, fullOB_pointer) > price(inputOB_table, inputOB_pointer) (ask side)

or

fullOB_pointer → end (or empty)

Update

append [price(inputOB_table, inputOB_pointer), vol(inputOB_table, inputOB_pointer)] to tempOB_list

append [price(inputOB_table, inputOB_pointer), vol(inputOB_table, inputOB_pointer)] to change_list

increment inputOB_pointer

Price same

The latest message does not affect the existing price levels in the order book.

Condition

price(fullOB_table, fullOB_pointer) = price(inputOB_table, inputOB_pointer)

Update

append [price(inputOB_table, inputOB_pointer), Δvol(inputOB_table, inputOB_pointer)] to change_list

increment fullOB_pointer

increment inputOB_pointer

If the pointers point to the top price of the order book table, the market status is updated to "open".

Price dropped

The price level is now below level2size.

Condition

inputOB_pointer > level2_size

Update

append [price(fullOB_table, fullOB_pointer), vol(fullOB_table, fullOB_pointer)] to tempOB_list

increment fullOB_pointer

Price levels that lie beyond the level 2 size limit may change while they are out of view, so when they return the price levels may not be entirely accurate.
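The four steps of Process 1 amount to a two-pointer merge of two price-sorted tables. The sketch below shows one possible reading for the bid side only and should not be taken as the definitive procedure. Its assumptions are: each table is a list of (price, volume) pairs sorted by descending price; the name `merge_bid_side` is hypothetical; in the "price same" step the level is also carried into the new full book, which the steps above do not state explicitly; and once the input table is exhausted, remaining full-book levels are treated as deleted if the input table holds fewer than `level2_size` levels and as dropped out of view otherwise.

```python
def merge_bid_side(full_ob, input_ob, level2_size):
    """Merge a level II bid table into the synthesised full bid table.

    Returns (temp_ob, change_list): the new full table and the list of
    (price, volume change) entries.
    """
    temp_ob, change_list = [], []
    f = i = 0  # fullOB_pointer and inputOB_pointer (0-based here)
    while f < len(full_ob) or i < len(input_ob):
        full_left, input_left = f < len(full_ob), i < len(input_ob)
        if full_left and input_left and full_ob[f][0] == input_ob[i][0]:
            # Price same: record the volume change, advance both pointers.
            price, vol = input_ob[i]
            temp_ob.append((price, vol))  # assumption: level is kept in the new book
            change_list.append((price, vol - full_ob[f][1]))
            f += 1
            i += 1
        elif input_left and (not full_left or full_ob[f][0] < input_ob[i][0]):
            # Price added: a level present in the input but not in the full book.
            temp_ob.append(input_ob[i])
            change_list.append(input_ob[i])
            i += 1
        elif input_left or len(input_ob) < level2_size:
            # Price deleted: a full-book level no longer present in the input.
            price, vol = full_ob[f]
            change_list.append((price, -vol))
            f += 1
        else:
            # Price dropped: level lies beyond the visible level II depth; keep it.
            temp_ob.append(full_ob[f])
            f += 1
    return temp_ob, change_list
```

The ask side would follow the same pattern with the price comparisons reversed and the tables sorted by ascending price.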

Process 2

The process starts with an input message classified as a trade. It converts the input message into inputTR_table and performs matching and modification against the latest change_list. The change_list is therefore a synthesis of all trade and order book activity, and is the main source of input for generating the financial indicators.

Apart from the other conditions mentioned below, the algorithm consists of matching inputTR_table so as to find the closest match in change_list according to sequence number. Note that all steps of the algorithm apply to the ask/bid lists according to the initial price match. Subsequent modifications to the lists are made for the matching side.

Normal

Condition

vol(inputTR_table) = vol(change_list)

price(inputTR_table) = price(change_list)

or

market status = open

Update

mark (price, vol) in change_list as a normal trade

If the market status is open, determine the minimum top-of-book bid/ask quantity, then loop and mark all trades until the sum of the trades equals that minimum.

Hidden

Condition

price(inputTR_table) = price(change_list)

vol(inputTR_table) > vol(change_list)

Update

mark (price, vol) in change_list as a hidden trade

Unclassified

Condition

price(inputTR_table) = price(change_list)

vol(inputTR_table) < vol(change_list)

Update

mark (price, vol) in change_list as a hidden trade

Note that there is a delay in determining whether order book activity is a trade, because it is necessary to wait until the trade information has been received before the status of the activity can be determined. This is a direct result of the way in which the exchange processes this information.

Descriptive financial indicators

A financial indicator is a triple consisting of {data type, action, action type}. Tables 7 and 8 illustrate what each element of these tuples represents. Note that an order can denote a trade order (executed) or a pending order (waiting in the order book).

Table 7

Table 8

Table 9

Depending on the desired action and action type, the data type covers all orders for a single instrument in the specified period that meet the selection criteria shown in Table 10:

Table 10

As an example, {money value, pending, bid} represents the sum of price multiplied by volume for all orders for a particular instrument that are still pending on the bid side of the order book, and so represents the pending buy orders.
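The example triple can be evaluated with a one-line aggregation. In this sketch the order layout (dictionaries with `price`, `volume`, `status` and `side` keys) and the function name `money_value` are hypothetical and are chosen only to mirror the {data type, action, action type} triple.

```python
def money_value(orders, status, side):
    """Sum of price x volume over the orders matching the given action and action type."""
    return sum(
        order["price"] * order["volume"]
        for order in orders
        if order["status"] == status and order["side"] == side
    )

# Hypothetical orders for a single instrument within the period of interest.
orders = [
    {"price": 100, "volume": 10, "status": "pending", "side": "bid"},
    {"price": 99, "volume": 5, "status": "pending", "side": "bid"},
    {"price": 101, "volume": 7, "status": "pending", "side": "ask"},
    {"price": 100, "volume": 3, "status": "executed", "side": "bid"},
]
pending_bid_value = money_value(orders, "pending", "bid")
```

Here only the first two orders contribute, since the others are either on the ask side or already executed.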

It is advantageous to use the aforementioned financial indicators together with an accumulation of the changes occurring within a period (for example, one second), because a typical user viewing this data would in any case be unable to monitor changes occurring in sub-second real time. This also leads to a large reduction in the cost of the data once it is sent to the user (apart from bandwidth costs, this cost may be an I/O cost).

The following description and examples relate to the fifth and sixth aspects of the invention, and show how a database can be used to make it easier to identify the minimum value in a specified data string. The maximum value in a specified data string can be identified in an analogous manner. Algorithms for constructing and querying the database are explained below.

For simplicity, the examples described below relate to relatively short data series of up to about 20 values. Clearly, however, the principles described can readily be applied to larger data series, for which the evaluation of the data series is performed particularly efficiently by technical means.

A data sequence stored in a database forming part of a computer system is first sampled, with the aim of generating an intermediate description of the records in the data sequence so as to simplify the identification and extraction of the minimum value in a subsequence of interest. This intermediate description divides the sequence into a series of subsequences delimited by local minima. The intermediate description and the series of subsequences are generated by a technique referred to as coverage mapping.

Example 9

An example of the coverage mapping technique is shown using the sample data set in Table 11. The sample data set of Table 11 has 10 records, each of which is assigned a position. The position numbers define the "direction" of the database; that is, position numbers increase in the forward direction of the database and decrease in the backward direction. Typically, data records are entered into the database in chronological order, so that the first position contains the oldest record and the highest position number contains the most recent record.

Table 11

Position	1	2	3	4	5	6	7	8	9	10
Record	5	16	4	10	21	22	13	6	7	7

In order to generate the intermediate description, the following operations must be performed in turn for each data record r_c in the sequence, moving from the oldest record to the most recent record.

Operation 1

Starting from the current position, a check is made to find the earliest position covered by the current value. That is, the database is traversed backwards, in the direction of older records, until a record (r₁) is encountered whose value is less than or equal to that of the current record (r_c). The position of record (r₁) is then used to define the coverage value of the current position. The coverage value is given by (position(r₁) + 1). If no record matches the search criterion, position(r₁) = 0 and the coverage value is 1.

Table 12 shows the data set of Table 11, updated to show the coverage value for each record.

Table 12

Position	1	2	3	4	5	6	7	8	9	10
Record	5	16	4	10	21	22	13	6	7	7
Coverage	1	2	1	4	5	6	5	4	9	10

The coverage values are identifiers of local trends in the data set. If the local trend is for the record values to remain constant or to increase over time (for example, in the subsequence of positions 4 to 6), the coverage value is the same as the position value. If the local trend is for the record values to decrease over time (for example, in the subsequence of records 7 and 8), the coverage value will be less than the position value.

Operation 2

If the coverage value is less than the position of record (r_c), then all preceding records from (but not including) the covered position (r₁) up to (and including) the current position (r_c) are labelled with the value of the record at the current position. This may mean that labels assigned when this operation was performed on earlier records in the sequence are overwritten. Table 13 shows a modified version of the database of Table 11, in which operations 1 and 2 have been performed in turn for all 10 records.

Table 13

Position	1	2	3	4	5	6	7	8	9	10
Record	5	16	4	10	21	22	13	6	7	7
Coverage	1	2	1	4	5	6	5	4	9	10
Label	4	4	4	6	6	6	6	6	7	7

Once operations 1 and 2 have been performed in turn on each record in the sequence, the modified database can be used to identify minimum values in the input data sequence.

To locate the minimum value in a specified data string extending back from the most recent record, it is necessary to identify the label at the starting position of the specified data string. In this example, if position 10 contains the most recent data value, the minimum of the last five records is given by the label of the fifth most recent record (that is, the label at position 6). Similarly, the minimum of the last nine records is given by the label of the ninth most recent record (that is, the label at position 2).

In this example, the technique for locating the minimum value is valid only where the data string of interest extends back from the most recent record. The database (including the coverage value and label value of each record) needs to be updated for each new record added to the database.
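Operations 1 and 2 and the label lookup can be sketched in Python as follows, using the sample data of Table 11. The function names `build_min_description` and `min_of_last` are hypothetical, and the sketch assumes, consistently with Table 13, that a record whose coverage value equals its own position is labelled with its own value.

```python
def build_min_description(records):
    """Return (coverage, labels) for `records`, indexed by position 1..n."""
    coverage, labels = [], []
    for pos, value in enumerate(records, start=1):
        # Operation 1: walk back to the nearest older record with value <= current.
        r1 = pos - 1
        while r1 >= 1 and records[r1 - 1] > value:
            r1 -= 1
        cover = r1 + 1  # r1 == 0 means no match, so the coverage value is 1
        coverage.append(cover)
        labels.append(value)  # assumption: a record starts with its own value as label
        # Operation 2: relabel positions cover..pos-1 with the current value.
        for p in range(cover, pos):
            labels[p - 1] = value
    return coverage, labels

def min_of_last(labels, k):
    """Minimum of the k most recent records: the label of the k-th most recent record."""
    return labels[len(labels) - k]

records = [5, 16, 4, 10, 21, 22, 13, 6, 7, 7]  # Table 11
coverage, labels = build_min_description(records)
```

With this data, `coverage` and `labels` reproduce the Coverage and Label rows of Table 13, and `min_of_last(labels, 5)` and `min_of_last(labels, 9)` return 6 and 4 respectively, matching the two queries described above.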

Example 10

Table 14 shows a detailed example of how the database is built up in a series of stages as each new record is entered. Thus, in stage 1 the database has one record at position 1, and in stage n the database has n records, the nth record being at position n. As each new record is inserted into the database, the coverage values and label values are updated, as explained in the row labelled "Algorithm steps".

Table 14

At each stage, the database can be used to determine the minimum value of a specified data string extending back from the most recent record. Thus, at stage n, the minimum value in a specified data string extending back from the nth record can be determined.

For example, in stage 4, the minimum of the last two records is given by the label of the second most recent record (that is, the label at position 3, which in this case is equal to 3).

In stage 5, the minimum of the last two records is given by the label of the second most recent record (that is, the label at position 4, which in this case is equal to 5).

Example 11

Tables 15-18 show how the intermediate description is generated for a longer data series. In these tables, the coverage values and labels are updated relative to the most recently entered record value. The tables represent snapshots of the database at stages 5, 10, 15 and 20 (that is, after 5, 10, 15 and 20 records respectively have been entered). Sample queries are shown for data strings of interest extending back from the most recent record at each of these stages.

Table 15 (stage 5)

Table 16 (stage 10)

Table 17 (stage 15)

Table 18 (stage 20)

Determining the maximum value in a subsequence

The methods and algorithms described in Examples 9-11 for constructing and querying the database can be modified to allow the maximum value of a subsequence in the database to be identified.

Again, each record in the database is assigned a "position". Typically, records are entered into the database in chronological order, so that the oldest record is assigned position 1 and the nth record is assigned position n. Position numbers therefore increase in the forward direction of the database (for example, as records become more recent) and decrease in the backward direction of the database (for example, as records become older).

An intermediate description of the data is generated by performing the following operations for each data record r_c in the sequence, moving from the oldest record to the most recent record. The intermediate description must be updated whenever a new record is added.

Operation 1: Starting from the current position, a check is made to find the earliest position covered by the current value. That is, the database is traversed backwards, in the direction of older records, until a record (r₁) is encountered whose value is greater than or equal to that of the current record (r_c). The position of record (r₁) is then used to define the coverage value of the current position. The coverage value is given by (position(r₁) + 1). If no record matches the search criterion, position(r₁) = 0 and the coverage value is 1.

Operation 2: If the coverage value is less than the position of record (r_c), then all preceding records from (but not including) the covered position (r₁) up to the current position (r_c) are labelled with the value of the record at the current position. This may require labels assigned when operation 2 was performed on earlier records in the sequence to be overwritten.

Example 12

Table 19 shows how a database for determining maximum values is built up in a series of stages as each new record is entered. Thus, in stage 1 the database has one record at position 1, and in stage n the database has n records, the nth record being at position n. As each new record is inserted into the database, the coverage values and label values are updated, as explained in the row labelled "Algorithm steps".

Table 19

For a given stage, the database allows the maximum value of a specified data string extending back from the most recent record to be determined. Thus, for example, in stage 5 the maximum of the last four records is given by the label at position 2 (which in this case has the value 17).

In this example, queries locating the maximum value in a subsequence of interest are valid only for data strings extending back from the record most recently added to the database.
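The stage-by-stage construction for maximum values can be sketched as an incremental structure that is updated on every insertion. This is an illustration under stated assumptions: the class name `RunningMaxIndex` and its methods are hypothetical, the backward search stops at the nearest older record whose value is greater than or equal to the new value, and each new record is initially labelled with its own value. The data used in the usage note is the Table 11 sample rather than the data of Table 19.

```python
class RunningMaxIndex:
    """Maintains coverage values and labels so that the maximum of the k most
    recent records can be read from a single label."""

    def __init__(self):
        self.records = []
        self.coverage = []
        self.labels = []

    def append(self, value):
        pos = len(self.records) + 1  # 1-based position of the new record
        # Operation 1: walk back to the nearest older record with value >= new value.
        r1 = pos - 1
        while r1 >= 1 and self.records[r1 - 1] < value:
            r1 -= 1
        cover = r1 + 1
        self.records.append(value)
        self.coverage.append(cover)
        self.labels.append(value)  # assumption: new record is labelled with itself
        # Operation 2: relabel positions cover..pos-1 with the new value.
        for p in range(cover, pos):
            self.labels[p - 1] = value

    def max_of_last(self, k):
        """Maximum of the k most recent records (1 <= k <= number of records)."""
        return self.labels[len(self.labels) - k]
```

After appending the values 5, 16, 4, 10, 21, 22, 13, 6, 7, 7 one at a time, `max_of_last(4)` reads the label at position 7 and `max_of_last(10)` reads the label at position 1; no scan of the records themselves is needed at query time.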

Although the invention has been described in conjunction with the exemplary embodiments above, many equivalent modifications and variations will be apparent to those skilled in the art given this disclosure. Accordingly, the exemplary embodiments of the invention set out above are to be considered illustrative and not restrictive. Various changes may be made to the described embodiments without departing from the spirit and scope of the invention.

All references mentioned above are hereby incorporated by reference.

Claims

1. A computer system for calculating a quantity from a set of input values, the computer system comprising a database configured to store a sequence of first input values x_n, where n = 1, 2, ... i,

wherein, for any value of n, x_n is located between x_{n-1} and x_{n+1} in the sequence, and is associated with:

a second input value y_n;

a conversion value p_n, calculated from y_n according to a conversion algorithm; and

an output value z_n, where z_n = z_{n-1} + p_n;

the database being further configured to store a plurality of the output values z_n.

2. The computer system according to claim 1, wherein x_n is a time value.

3. The computer system according to claim 1 or 2, wherein x_n is a numerical value.

4. The computer system according to claim 3, wherein x_{n+1} - x_n = x_n - x_{n-1}.

5. The computer system according to claim 4, wherein x_n is a time value, and x_n - x_{n-1} is less than 0.5 s, preferably less than 0.1 s, more preferably less than 0.05 s.

6. The computer system according to any preceding claim, wherein i > 1000.

7. The computer system according to any preceding claim, wherein the second input value y_n is a variable describing a financial transaction.

8. The computer system according to any preceding claim, wherein:

p_n = y_n;

p_n = y_n²;

p_n = x_n·y_n; or

p_n is a function of the distance between y_n and another second input value.

9. The computer system according to any preceding claim, wherein, for any value of n, x_n is associated with a further input value y′_n, and p_n = y_n·y′_n.

10. A computer-implemented method of calculating a quantity from a set of second input values y_n corresponding to a portion of the sequence of first input values x_n extending from n = a to n = b, comprising extracting at least one output value z_n, where n = b, from the database according to any preceding claim.

11. The method according to claim 10, comprising the further step of extracting a further output value z_n, where n = a, from the database.

12. The method according to claim 10 or 11, wherein the calculated quantity is:

the sum of the second input values y_n corresponding to the portion of the sequence of first input values x_n;

the average of the second input values y_n corresponding to the portion of the sequence of first input values x_n;

the variance of the second input values y_n corresponding to the portion of the sequence of first input values x_n;

the distance covered by the second input values y_n corresponding to the portion of the sequence of first input values x_n; or

the correlation of the second input values y_n with the first input values x_n corresponding to the portion of the sequence of first input values x_n.

13. The method according to claim 10 or 11, using the database according to claim 9, wherein the calculated quantity is:

the average of the product of the second input values y_n and the further input values y′_n corresponding to the portion of the sequence of first input values x_n; or

the correlation of the second input values y_n with the further input values y′_n corresponding to the portion of the sequence of first input values x_n.

14. The method according to any one of claims 11 to 13, further comprising the step of extracting additional values from the database at the same time as the step of extracting the further output value, the additional values being stored in a cache for use in subsequent calculations.

15. The method according to claim 14, wherein the further output value and the additional values together provide a subsequence of the sequence of output values z_n.

16. A computer-implemented method of compiling, at time t, a database for storing previous values of a time-dependent variable, comprising the step of selecting a set of values of the time-dependent variable corresponding to a sequence of predetermined time intervals measured back from time t.

17. The method according to claim 16, comprising the step of selecting the value of the time-dependent variable corresponding to time t.

18. The method according to claim 16 or 17, wherein the difference between the two largest time intervals in the sequence of predetermined time intervals is greater than the difference between the two smallest time intervals in the sequence.

19. The method according to claim 18, wherein the sequence of predetermined time intervals measured back from time t comprises:

two consecutive time intervals, each of less than one minute, and

two other consecutive time intervals in the sequence, the difference between the two other time intervals being at least one minute.

20. The method according to claim 18 or 19, wherein the difference between consecutive time intervals in the sequence of predetermined time intervals increases as the magnitude of the time intervals measured back from time t increases.

21. A computer-implemented method of sorting a data set according to a predetermined sorting criterion, comprising the steps of:

traversing the data set to determine whether the values in the data set are out of order according to the predetermined sorting criterion, and

if the values in the data set are out of order, sorting the data set according to the predetermined criterion.

22. The method according to claim 21, wherein each value in the data set is a function of a plurality of measured data values.

23. The method according to claim 22, wherein each value in the data set is:

a) the average of the plurality of measured data values;

b) the sum of the plurality of measured data values;

c) the maximum or minimum of the plurality of measured data values; or

d) the standard deviation of the plurality of measured data values.