CN102779097B - Memory access method of flow data - Google Patents

Memory access method of flow data

Info

Publication number
CN102779097B
CN102779097B CN201110124780.7A CN201110124780A
Authority
CN
China
Prior art keywords
data
header structure
inverted list
data source
data item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110124780.7A
Other languages
Chinese (zh)
Other versions
CN102779097A (en)
Inventor
杨威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhenghua Heavy Industries Co Ltd
Original Assignee
Shanghai Zhenghua Heavy Industries Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhenghua Heavy Industries Co Ltd filed Critical Shanghai Zhenghua Heavy Industries Co Ltd
Priority to CN201110124780.7A priority Critical patent/CN102779097B/en
Publication of CN102779097A publication Critical patent/CN102779097A/en
Application granted granted Critical
Publication of CN102779097B publication Critical patent/CN102779097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a memory access method for stream data that improves the real-time throughput of a database. The method organizes data with a structured inverted list and comprises the following steps: performing an initialization step that establishes a mapping table, an inverted-list header structure and a data storage area in memory, wherein the mapping table records the mapping between each data source and a unique ID, the inverted-list header structure records the ID, attribute information and data-queue pointer of each data source, and the data storage area stores one or more data items of each data source in data queues; when a data item is received, locating the inverted-list header structure through the ID corresponding to the data item's data source; and then filtering and compressing based on the information recorded in the header structure to determine whether to store the data item: if so, storing it in the storage area corresponding to its data source; if not, discarding it.

Description

Memory access method for stream data
Technical field
The present invention relates to real-time databases (RTDBs) in industrial control, and in particular to a memory access method for the stream data of a real-time database.
Background technology
A real-time database (Real Time Data Base, RTDB) collects the operating data of plant devices, tracks their running condition, and monitors and analyzes the critical data of the production process, so that problems can be handled promptly and historical data can be analyzed scientifically. This keeps the production process stable and the material supply balanced, reduces unit consumption, improves economic efficiency and lowers cost.
At present a small number of U.S. companies monopolize the industrial-control RTDB field. Their products are expensive and usually affordable only for very large enterprises; the real-time database products needed by a medium-sized enterprise may be quoted at millions of RMB. Such high deployment costs have become a threshold for process-control automation and informatization, and constrain the development of domestic small and medium enterprises.
Fig. 1 illustrates the data processing flow of an industrial-control RTDB. Referring to Fig. 1, data from a data source 10 is first filtered by a compression filter 20 and then fed into a buffer structure 30; when the archiving moment arrives, the data is deposited into a historical database 40 for durable storage. The buffer structure 30 is implemented in the computer's memory, while the historical database 40 is usually implemented on the computer's hard disk.
Traditional real-time database vendors focus their attention on raising the data compression ratio and reducing storage space, and pay little attention to how data is organized in memory. When this processing mode leaves persistence to a later multitasking stage, writing the data requires a certain amount of optimization in order to use the hard disk efficiently. At the same time, because the in-memory buffer structure is simple, retrieving historical data from memory is difficult and inefficient. For example, the PI (Plant Information System) of the U.S. company OSI Software puts compressed significant data into a cache; this cache is implemented in memory as a plain buffer and does not classify the data. This seemingly simplifies operation and improves throughput, but writing to the hard disk then needs heavy CPU intervention: if the persistence process is not optimized it causes frequent hard-disk requests, while the optimization itself in turn reduces the throughput of the system.
In the Info Plus system of the U.S. company Aspen Tech, historical data is stored in memory as record structures, and each record structure belongs to a specific class or family. Compared with the cache storage of the PI system, this approach needs more CPU time and lowers system throughput, but its customized data structures better satisfy application requests and more easily embed data-handling steps (such as time tags and overload alarms). However, this memory model is still not structured, so data persistence still requires intervention, which further reduces performance.
Summary of the invention
The present invention provides a memory access method for stream data based on an inverted list, so as to solve the problems of existing memory structures.
The technical solution adopted by the present invention to solve the above technical problems is a memory access method for stream data, comprising the following steps:
performing an initialization step, which establishes a mapping table, an inverted-list header structure and a data storage area in memory, wherein the mapping table records the mapping between each data source and a unique ID, the inverted-list header structure records the ID, attribute information and data-queue pointer of each data source, and the data storage area stores one or more data items of each data source in data queues;
receiving a data item;
finding the ID corresponding to the data source that produced the data item;
if the data source has no corresponding ID, mapping the data source to a new ID and establishing a header structure body for it;
if the data source has a corresponding ID, using that ID to locate its header structure body;
filtering and compressing the data item with the information recorded in the header structure body to determine whether to store it; if so, storing the data item in the data storage area, and if not, discarding it.
In one embodiment of the invention, the inverted-list header structure comprises multiple header structure bodies, each corresponding to one data source.
In one embodiment of the invention, the memory size occupied by each data item of a given data source is constant.
In one embodiment of the invention, the data-queue pointer records the head pointer of the data storage area and the current available pointer.
In one embodiment of the invention, the method further maintains a snapshot store in the inverted-list header structure, which records the latest data item of each data source.
In one embodiment of the invention, the method further maintains compression-related data in the inverted-list header structure, which records the temporary data items of the above compression process.
In one embodiment of the invention, at persistence time the method writes the data-item queues of the data storage area directly to the hard disk.
By adopting the above technical solution, the present invention structures the memory at system start-up, which benefits the retrieval of in-memory data while the system is running; at persistence time, the in-memory data can be written to the hard disk directly, without classification or optimization. Compared with the prior art, the invention can therefore reach higher throughput.
Accompanying drawing explanation
In order to make the above objects, features and advantages of the present invention more apparent, specific embodiments of the invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 illustrates the data processing flow of an industrial-control RTDB.
Fig. 2 is a flowchart of the memory access method of one embodiment of the invention.
Fig. 3 illustrates the mapping table of the memory structure according to an embodiment of the invention.
Fig. 4 illustrates the inverted-list header structure of the memory structure according to an embodiment of the invention.
Fig. 5 illustrates the data storage area of the memory structure according to an embodiment of the invention.
Embodiment
Broadly speaking, the present invention adopts a data storage technique in memory based on an inverted list. Historical data is organized in memory into the very structure it will have when persisted, which both makes it convenient to track the state of an individual data source and supports the storage of mass data. Although the in-memory inverted-list structure affects system throughput to some extent, structured in-memory data makes the hard-disk persistence process easy and simple: it needs little CPU intervention, and with the intervention of the DMA controller, writing contiguous memory to the hard disk is more efficient.
The memory structure of the data mainly consists of three parts: the mapping table between data sources and IDs, the inverted-list header structure vector, and the data storage area. The latter two parts form an inverted list, and the first part provides high-speed access to the content of the inverted list.
Fig. 3 illustrates the mapping table of the memory structure according to an embodiment of the invention. Referring to Fig. 3, the mapping table 300 maps data sources to IDs. In the table a data source is represented by its tag name, each tag name corresponds to one ID, and the entries are stored in hash buckets; a hash bucket may contain one or more data sources. The mapping table is updated when a new data source is added or deleted: the tag name 302 of an incoming data source is assigned a corresponding ID by the mapper. A data item takes the basic form (ID, value, timestamp), and the tag name can be found from the ID. In addition, the mapping table is persisted to the hard disk promptly whenever it changes.
The mapping table is a prerequisite for system operation. When the system starts, and during fault recovery, the mapping table between data sources and IDs must be established first; any change to this table must be mutually exclusive and must be persisted immediately after the operation completes. To access a data source quickly by ID and to facilitate the hard-disk storage of historical data, a fast-lookup pointer array 301 is designed, indexed by ID and holding pointers as array items. When a new data source is added to the system, the mapper allocates a system ID for it. In the real-time database this ID is the unique identifier of the data source; the name of the data source itself is saved in the mapping table and is not used during normal operation. For the efficiency of subsequent operations, system IDs are sequential positive integers.
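The mapping table described above can be sketched in Python as a hash map from tag names to IDs plus an ID-indexed array for the fast lookup 301; sequential positive-integer IDs make the array index the ID itself. Class and method names here are illustrative, not from the patent, and the tag names in the usage example are hypothetical:

```python
import itertools

class MappingTable:
    """Minimal sketch of the data-source/ID mapping table of Fig. 3."""

    def __init__(self):
        self._ids = itertools.count(1)   # sequential positive integers
        self._by_tag = {}                # tag name -> ID (the hash buckets)
        self._by_id = [None]             # index 0 unused; the ID is the index

    def map_source(self, tag_name):
        """Return the existing ID for tag_name, or allocate a new one."""
        if tag_name in self._by_tag:
            return self._by_tag[tag_name]
        new_id = next(self._ids)
        self._by_tag[tag_name] = new_id
        self._by_id.append(tag_name)     # fast-lookup array 301, indexed by ID
        # a real system would persist the table here, under mutual exclusion
        return new_id

    def tag_of(self, source_id):
        """Recover the tag name from an ID via the fast-lookup array."""
        return self._by_id[source_id]

# usage: repeated mapping of the same (hypothetical) tag reuses its ID
table = MappingTable()
table.map_source("FIC-101")   # -> 1
table.map_source("TIC-202")   # -> 2
```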
Fig. 4 illustrates the inverted-list header structure of the memory structure according to an embodiment of the invention. Referring to Fig. 4, the inverted-list header structure 400 comprises multiple header structure bodies 401, one established for each data source. The inverted-list header structure is the point database that stores the important information of each data source, and it is also the memory carrier of the auxiliary information required by data-item processing (compression and filtering); its vector subscript is the system ID.
As shown on the right side of Fig. 4, each header structure body 401 optionally comprises the ID, the point attribute set, the snapshot store, the compression-related data, the data-queue pointer, statistics-collection information, and so on. The ID can be used to index the header structure body. The point attribute set records the attributes of the point's data. The snapshot store records the latest data item of the data source. The compression-related data records temporary items of the compression performed before data enters the memory structure, for example the previous observed value, the previous stored value, the maximum slope Kmax, the minimum slope Kmin, the maximum-slope data item U_Kmax and the minimum-slope data item U_Kmin. When the slope comparison method is used to compress data, this information gives the compression filter fast access and thereby raises the speed of compression. The data-queue pointer points to the data queue of the data storage area (described below with reference to Fig. 5). The statistics-collection information records the collection state of the data queue behind the header structure body, which is further described later.
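The slope comparison method with Kmax/Kmin bookkeeping reads like swinging-door trending compression, though the patent does not spell out the algorithm. The following Python sketch is one plausible reading under that assumption; the class name, the deviation parameter and the reset logic are illustrative, and only the running Kmax/Kmin slopes follow the description above:

```python
class SlopeCompressor:
    """Swinging-door-style sketch of the slope comparison method."""

    def __init__(self, deviation):
        self.E = deviation            # allowed compression deviation (assumed)
        self.last_stored = None       # last archived (t, v): 'previous stored value'
        self.prev = None              # 'previous observed value'
        self.kmax = float("-inf")     # running maximum slope Kmax
        self.kmin = float("inf")      # running minimum slope Kmin

    def offer(self, t, v):
        """Return the data item to archive, or None if (t, v) is absorbed."""
        if self.last_stored is None:  # the first item is always stored
            self.last_stored = self.prev = (t, v)
            return (t, v)
        t0, v0 = self.last_stored
        dt = t - t0
        self.kmax = max(self.kmax, (v - v0 - self.E) / dt)
        self.kmin = min(self.kmin, (v - v0 + self.E) / dt)
        if self.kmax > self.kmin:     # corridor empty: archive the previous item
            stored = self.prev
            self.last_stored = stored
            t0, v0 = stored           # reopen the doors from the stored item
            dt = t - t0
            self.kmax = (v - v0 - self.E) / dt
            self.kmin = (v - v0 + self.E) / dt
            self.prev = (t, v)
            return stored
        self.prev = (t, v)
        return None
```

With this reading, a perfectly linear signal stores only its first point, while a step change forces the point just before the step to be archived.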
Within each header structure body 401, every particular data record sits at a specific position, so an access only needs the structure's start pointer plus a data offset to fetch a particular item. The structure also keeps the entry address of the historical-data storage area, so appending important historical data (data that must be stored into the historical database) for a particular source also needs to access this data head.
Fig. 5 illustrates the data storage area of the memory structure according to an embodiment of the invention. Referring to Fig. 5, the data storage area 500 is composed of data queues. A data queue is a block of physically contiguous memory that the system allocates for a given data source, and data items are written into it in the basic structure {ID, value, timestamp}. For a particular data source, each data item occupies a constant amount of memory to make searching convenient. Meanwhile, the header structure body of Fig. 4 records the head pointer of the data region and the current available pointer (the data-queue pointer and statistics-collection information), which makes it easy to append new data items to the region. A data queue becoming full triggers the system to allocate new memory space for the data source. Archiving of the in-memory data is triggered by specific events, for example the memory usage reaching a threshold or the archiving time of the in-memory data arriving. During archiving no extra processing is needed; the whole data storage area is simply written to the hard disk for persistence.
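A data queue as described above can be sketched as a contiguous buffer of fixed-size {ID, value, timestamp} records, with a head pointer and a current available pointer, and a persistence step that dumps the region as-is. The record layout and names below are assumptions for illustration:

```python
import struct

# fixed-size record: 32-bit ID, 64-bit value, 64-bit timestamp (assumed layout)
ITEM = struct.Struct("<I d d")

class DataQueue:
    """Sketch of one data source's queue in the data storage area of Fig. 5."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity * ITEM.size)  # contiguous memory block
        self.head = 0       # head pointer of the region
        self.avail = 0      # current available pointer (next free byte offset)

    def append(self, source_id, value, timestamp):
        """Store one item; return True when the queue is full (triggers archiving)."""
        ITEM.pack_into(self.buf, self.avail, source_id, value, timestamp)
        self.avail += ITEM.size
        return self.avail == len(self.buf)

    def item(self, n):
        """Random access: head pointer plus a fixed offset, since items are constant-size."""
        return ITEM.unpack_from(self.buf, self.head + n * ITEM.size)

    def flush(self, fileobj):
        """Persistence: write the whole region as-is, with no reorganization."""
        fileobj.write(memoryview(self.buf)[: self.avail])
        self.avail = 0
```

Because every item of a source has the same size, the n-th item is found by pure pointer arithmetic, and the flush is a single sequential write that a DMA controller can handle without CPU-side reordering.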
Fig. 2 is a flowchart of the memory access method of one embodiment of the invention. Referring to Fig. 2, an initialization step S1 is carried out when the system starts.
This step completes the configuration of most data points at system start-up: it establishes the mapping table of Fig. 3 and the inverted-list header structure and data storage area of Fig. 4 and Fig. 5, with the data storage area dynamically reserving memory space for subsequent data points. The vector structure is thus established and memory is allocated for each item before real-time data arrives.
When the real-time database processes input data, a data item that needs persistence only has to be stored, after processing, into the memory space configured for the corresponding data point. The processing usually comprises filtering, compression, smoothing and so on; the data records and information these actions use are also stored in the memory structure.
Processing starts at step S2, where a data item is received. In step S3 the mapper (see Fig. 3) is searched for the ID of the data source that produced the item. If no ID is found, the item comes from a new data source: the tag name of the new data source is mapped to an ID and saved into the mapping table; then in step S9 a header structure body is established for the item's data source, and in step S7 the item is stored into the data storage area indexed by the newly built header structure. If a mapped ID is found in step S3, step S5 locates the header structure body by the ID. Then step S6 performs the filter and compression operations with the information stored in the header structure body to determine whether to store the item; if so, the item is stored into the data storage area in step S7, otherwise it is discarded in step S8.
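The steps S2 through S9 above can be sketched end to end as follows. The container shapes, the sequential ID allocation and the simple dead-band filter are illustrative stand-ins for the patent's mapper, header structure bodies and filter/compress step; the tag names are hypothetical:

```python
def handle_item(tag_name, value, timestamp, mapping, headers, storage):
    """Sketch of the receive path S2-S9; returns True if the item was stored."""
    source_id = mapping.get(tag_name)
    if source_id is None:                 # S3 miss: a new data source
        source_id = len(mapping) + 1      # map to a new sequential positive ID
        mapping[tag_name] = source_id
        # S9: establish a header structure body (fields here are illustrative)
        headers[source_id] = {"last_value": None, "deadband": 0.5}
        storage[source_id] = []           # this source's data queue
    header = headers[source_id]           # S5: locate the header body by ID
    last = header["last_value"]           # S6: filter using header information
    if last is not None and abs(value - last) < header["deadband"]:
        return False                      # S8: discard the item
    header["last_value"] = value
    # S7: store the item as {ID, value, timestamp} in the data storage area
    storage[source_id].append((source_id, value, timestamp))
    return True
```

Small fluctuations within the dead band are discarded, and only significant changes reach the per-source queue, mirroring how the header structure body carries the state the filter needs.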
When the data storage area reaches a certain condition (the persistence time point arrives, or the memory consumption reaches a threshold), bulk persistence is carried out.
When storing into the historical database, only the memory blocks organized by the header structures and the data queues have to be written to the hard disk. The classification that formerly had to be done before writing, and most of the work of optimizing the hard-disk writes, are thereby moved to system initialization, when the system has no real-time requirement; the goal of improving system throughput is thus achieved.
Because different data sources are allocated different memory regions, the memory operations on the data can be carried out concurrently by multiple threads, without the intervention of synchronization locks.
In addition, the in-memory storage adopts an inverted-list structure: each data source obtains a unique ID at initialization, and this ID uniquely determines one inverted-list header structure. Such an underlying storage structure effectively supports distributing the system. In distributed operation, the servers need to synchronize only when a data source ID changes; at all other times the system needs no synchronization and each server can run independently. In a stably running real-time database system, data-source changes are rare events, so this characteristic provides a good architectural basis for system expansion.
Meanwhile, storage based on an inverted list can be distributed effectively. Existing real-time database systems are constrained by hardware computing power, and their throughput has a limit (the PI system, an outstanding product in the industry, processes about one million data items per second on a high-end server). If necessary, the present system can be built on a cheap cluster (for example based on an open-source distributed processing platform such as Hadoop) and expand its processing power without a theoretical limit.
Although the present invention is disclosed above with preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor modifications and refinements without departing from the spirit and scope of the invention, so the protection scope of the invention shall be defined by the appended claims.

Claims (7)

1. A memory access method for stream data, comprising the following steps:
performing an initialization step, which establishes a mapping table, an inverted-list header structure and a data storage area in memory, wherein the mapping table records the mapping between each data source and a unique ID, the inverted-list header structure records the ID, attribute information and data-queue pointer of each data source, and the data storage area stores one or more data items of each data source in data queues;
receiving a data item;
finding the ID corresponding to the data source that produced the data item;
if the data source has no corresponding ID, mapping the data source to a new ID and establishing an inverted-list header structure for it;
if the data source has a corresponding ID, using that ID to locate the inverted-list header structure;
filtering and compressing the data item with the information recorded in the inverted-list header structure to determine whether to store it; if so, storing the data item in the data storage area, and if not, discarding it.
2. The method of claim 1, wherein the inverted-list header structure comprises multiple header structure bodies, each corresponding to one data source.
3. The method of claim 1, wherein the memory size occupied by each data item of a given data source is constant.
4. The method of claim 1, wherein the data-queue pointer records the head pointer of the data storage area and the current available pointer.
5. The method of claim 1, further comprising maintaining a snapshot store in the inverted-list header structure, which records the latest data item of each data source.
6. The method of claim 1, further comprising maintaining compression-related data in the inverted-list header structure, which records the temporary data items of the compression process.
7. The method of claim 1, further comprising, at persistence time, writing the data-item queues of the data storage area directly to the hard disk.
CN201110124780.7A 2011-05-13 2011-05-13 Memory access method of flow data Active CN102779097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110124780.7A CN102779097B (en) 2011-05-13 2011-05-13 Memory access method of flow data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110124780.7A CN102779097B (en) 2011-05-13 2011-05-13 Memory access method of flow data

Publications (2)

Publication Number Publication Date
CN102779097A CN102779097A (en) 2012-11-14
CN102779097B true CN102779097B (en) 2015-01-21

Family

ID=47124015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110124780.7A Active CN102779097B (en) 2011-05-13 2011-05-13 Memory access method of flow data

Country Status (1)

Country Link
CN (1) CN102779097B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077405B (en) * 2014-07-08 2018-06-08 国家电网公司 Time sequential type data access method
CN105488047B (en) * 2014-09-16 2019-01-18 华为技术有限公司 Metadata reading/writing method and device
CN112181732A (en) * 2020-10-29 2021-01-05 第四范式(北京)技术有限公司 Recovery method and recovery system of parameter server node

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Temporal database system architecture for real-time applications; Wan Da et al.; Application Research of Computers; October 2008; Vol. 25, No. 10; pp. 2926-2928, 2945 *

Also Published As

Publication number Publication date
CN102779097A (en) 2012-11-14

Similar Documents

Publication Publication Date Title
CN102521405B (en) Massive structured data storage and query methods and systems supporting high-speed loading
CN102521406B (en) Distributed query method and system for complex task of querying massive structured data
CN102467570B (en) Connection query system and method for distributed data warehouse
CN102779138B (en) The hard disk access method of real time data
CN100407203C (en) Method for processing mass data
CN100565512C (en) Eliminate the system and method for redundant file in the document storage system
WO2020024799A1 (en) Method for aggregation optimization of time series data
CN103473276B (en) Ultra-large type date storage method, distributed data base system and its search method
CN105243155A (en) Big data extracting and exchanging system
WO2021223451A1 (en) Method, system and device for acquiring action data of automobile production and storage medium
CN103366015A (en) OLAP (on-line analytical processing) data storage and query method based on Hadoop
CN102073697A (en) Data processing method and data processing device
CN106126601A (en) A kind of social security distributed preprocess method of big data and system
CN109947363A (en) Data caching method of distributed storage system
CN110309233A (en) Method, apparatus, server and the storage medium of data storage
CN107220188A (en) A kind of automatic adaptation cushion block replacement method
CN106570145B (en) Distributed database result caching method based on hierarchical mapping
CN103177035A (en) Data query device and data query method in data base
CN107895046A (en) A kind of Heterogeneous Database Integration Platform
CN106649687A (en) Method and device for on-line analysis and processing of large data
CN102779097B (en) Memory access method of flow data
CN109165096B (en) Cache utilization system and method for web cluster
CN109918450A (en) Based on the distributed parallel database and storage method under analysis classes scene
CN102724279B (en) System for realizing log-saving and log-managing
Chai et al. Adaptive lower-level driven compaction to optimize LSM-tree key-value stores

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant