CN102779097B - Memory access method of flow data - Google Patents

Memory access method of flow data

Info

Publication number
CN102779097B
CN102779097B CN201110124780.7A CN201110124780A
Authority
CN
China
Prior art keywords
data
header structure
inverted list
data source
data item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110124780.7A
Other languages
Chinese (zh)
Other versions
CN102779097A (en)
Inventor
杨威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhenghua Heavy Industries Co Ltd
Original Assignee
Shanghai Zhenghua Heavy Industries Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhenghua Heavy Industries Co Ltd filed Critical Shanghai Zhenghua Heavy Industries Co Ltd
Priority to CN201110124780.7A priority Critical patent/CN102779097B/en
Publication of CN102779097A publication Critical patent/CN102779097A/en
Application granted granted Critical
Publication of CN102779097B publication Critical patent/CN102779097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a memory access method for stream data that improves the real-time throughput of a database. The method organizes data with a structured inverted list and comprises the following steps: performing an initialization step that establishes a mapping table, an inverted-list header structure and a data storage area in memory, wherein the mapping table records the mapping between each data source and a unique ID, the inverted-list header structure records the ID, attribute information and data-queue pointer of each data source, and the data storage area stores one or more data items of each data source in data queues; when a data item is received, locating the inverted-list header structure through the ID corresponding to the data item's data source; and then filtering and compressing based on the information recorded in the header structure to determine whether to store the data item: if so, storing it in the storage area corresponding to its data source; if not, discarding it.

Description

Memory access method for stream data
Technical field
The present invention relates to real-time databases (RTDBs) in industrial control, and in particular to a memory access method for the stream data of a real-time database.
Background technology
A real-time database (Real Time Data Base, RTDB) collects the operating data of plant devices, tracks their running condition, and monitors and analyzes the critical data of the production process, so that problems can be handled promptly and historical data can be analyzed scientifically. This keeps the production process stable and the material supply balanced, reduces unit consumption, improves economic efficiency and lowers cost.
At present a small number of U.S. companies monopolize the industrial-control RTDB field. Their products are expensive and usually affordable only for very large enterprises; the real-time database products needed by a medium-sized enterprise may be quoted at millions of RMB. Such high deployment costs have become a threshold for process-control automation and informatization, and constrain the development of domestic small and medium enterprises.
Fig. 1 illustrates the data processing flow of an industrial-control RTDB. Referring to Fig. 1, data from a data source 10 is first filtered by a compression filter 20 and then fed into a buffer structure 30; when the archiving moment arrives, the data is deposited into a historical database 40 for durable storage. The buffer structure 30 is implemented in the computer's memory, while the historical database 40 is usually implemented on the computer's hard disk.
Traditional real-time database vendors focus their attention on raising the data compression ratio and reducing storage space, and pay little attention to how data is organized in memory. When this processing mode leaves persistence to a later multitasking stage, writing the data requires a certain amount of optimization in order to use the hard disk efficiently. At the same time, because the in-memory buffer structure is simple, retrieving historical data from memory is difficult and inefficient. For example, the PI (Plant Information System) of the U.S. company OSI Software puts compressed significant data into a cache; this cache is implemented in memory as a plain buffer and does not classify the data. This seemingly simplifies operation and improves throughput, but writing to the hard disk then needs heavy CPU intervention: if the persistence process is not optimized it causes frequent hard-disk requests, while the optimization itself in turn reduces the throughput of the system.
In the Info Plus system of the U.S. company Aspen Tech, historical data is stored in memory as record structures, and each record structure belongs to a specific class or family. Compared with the cache storage of the PI system, this approach needs more CPU time and lowers system throughput, but its customized data structures better satisfy application requests and more easily embed data-handling steps (such as time tags and overload alarms). However, this memory model is still not structured, so data persistence still requires intervention, which further reduces performance.
Summary of the invention
The present invention provides a memory access method for stream data based on an inverted list, so as to solve the problems of existing memory structures.
The technical solution adopted by the present invention to solve the above technical problems is a memory access method for stream data, comprising the following steps:
performing an initialization step, which establishes a mapping table, an inverted-list header structure and a data storage area in memory, wherein the mapping table records the mapping between each data source and a unique ID, the inverted-list header structure records the ID, attribute information and data-queue pointer of each data source, and the data storage area stores one or more data items of each data source in data queues;
receiving a data item;
finding the ID corresponding to the data source that produced the data item;
if the data source has no corresponding ID, mapping the data source to a new ID and establishing a header structure body for it;
if the data source has a corresponding ID, using that ID to locate its header structure body;
filtering and compressing the data item with the information recorded in the header structure body to determine whether to store it; if so, storing the data item in the data storage area, and if not, discarding it.
In one embodiment of the invention, the inverted-list header structure comprises multiple header structure bodies, each corresponding to one data source.
In one embodiment of the invention, the memory size occupied by each data item of a given data source is constant.
In one embodiment of the invention, the data-queue pointer records the head pointer of the data storage area and the current available pointer.
In one embodiment of the invention, the method further maintains a snapshot store in the inverted-list header structure, which records the latest data item of each data source.
In one embodiment of the invention, the method further maintains compression-related data in the inverted-list header structure, which records the temporary data items of the above compression process.
In one embodiment of the invention, at persistence time the method writes the data-item queues of the data storage area directly to the hard disk.
By adopting the above technical solution, the present invention structures the memory at system start-up, which benefits the retrieval of in-memory data while the system is running; at persistence time, the in-memory data can be written to the hard disk directly, without classification or optimization. Compared with the prior art, the invention can therefore reach higher throughput.
Accompanying drawing explanation
In order to make the above objects, features and advantages of the present invention more apparent, specific embodiments of the invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 illustrates the data processing flow of an industrial-control RTDB.
Fig. 2 is a flowchart of the memory access method of one embodiment of the invention.
Fig. 3 illustrates the mapping table of the memory structure according to an embodiment of the invention.
Fig. 4 illustrates the inverted-list header structure of the memory structure according to an embodiment of the invention.
Fig. 5 illustrates the data storage area of the memory structure according to an embodiment of the invention.
Embodiment
Broadly speaking, the present invention adopts a data storage technique in memory based on an inverted list. Historical data is organized in memory into the very structure it will have when persisted, which both makes it convenient to track the state of an individual data source and supports the storage of mass data. Although the in-memory inverted-list structure affects system throughput to some extent, structured in-memory data makes the hard-disk persistence process easy and simple: it needs little CPU intervention, and with the intervention of the DMA controller, writing contiguous memory to the hard disk is more efficient.
The memory structure of the data mainly consists of three parts: the mapping table between data sources and IDs, the inverted-list header structure vector, and the data storage area. The latter two parts form an inverted list, and the first part provides high-speed access to the content of the inverted list.
Fig. 3 illustrates the mapping table of the memory structure according to an embodiment of the invention. Referring to Fig. 3, the mapping table 300 maps data sources to IDs. In the table a data source is represented by its tag name, each tag name corresponds to one ID, and the entries are stored in hash buckets; a hash bucket may contain one or more data sources. The mapping table is updated when a new data source is added or deleted: the tag name 302 of an incoming data source is assigned a corresponding ID by the mapper. A data item takes the basic form (ID, value, timestamp), and the tag name can be found from the ID. In addition, the mapping table is persisted to the hard disk promptly whenever it changes.
The mapping table is a prerequisite for system operation. When the system starts, and during fault recovery, the mapping table between data sources and IDs must be established first; any change to this table must be mutually exclusive and must be persisted immediately after the operation completes. To access a data source quickly by ID and to facilitate the hard-disk storage of historical data, a fast-lookup pointer array 301 is designed, indexed by ID and holding pointers as array items. When a new data source is added to the system, the mapper allocates a system ID for it. In the real-time database this ID is the unique identifier of the data source; the name of the data source itself is saved in the mapping table and is not used during normal operation. For the efficiency of subsequent operations, system IDs are sequential positive integers.
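The mapping table described above can be sketched in Python as a hash map from tag names to IDs plus an ID-indexed array for the fast lookup 301; sequential positive-integer IDs make the array index the ID itself. Class and method names here are illustrative, not from the patent, and the tag names in the usage example are hypothetical:

```python
import itertools

class MappingTable:
    """Minimal sketch of the data-source/ID mapping table of Fig. 3."""

    def __init__(self):
        self._ids = itertools.count(1)   # sequential positive integers
        self._by_tag = {}                # tag name -> ID (the hash buckets)
        self._by_id = [None]             # index 0 unused; the ID is the index

    def map_source(self, tag_name):
        """Return the existing ID for tag_name, or allocate a new one."""
        if tag_name in self._by_tag:
            return self._by_tag[tag_name]
        new_id = next(self._ids)
        self._by_tag[tag_name] = new_id
        self._by_id.append(tag_name)     # fast-lookup array 301, indexed by ID
        # a real system would persist the table here, under mutual exclusion
        return new_id

    def tag_of(self, source_id):
        """Recover the tag name from an ID via the fast-lookup array."""
        return self._by_id[source_id]

# usage: repeated mapping of the same (hypothetical) tag reuses its ID
table = MappingTable()
table.map_source("FIC-101")   # -> 1
table.map_source("TIC-202")   # -> 2
```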
Fig. 4 illustrates the inverted-list header structure of the memory structure according to an embodiment of the invention. Referring to Fig. 4, the inverted-list header structure 400 comprises multiple header structure bodies 401, one established for each data source. The inverted-list header structure is the point database that stores the important information of each data source, and it is also the memory carrier of the auxiliary information required by data-item processing (compression and filtering); its vector subscript is the system ID.
As shown on the right side of Fig. 4, each header structure body 401 optionally comprises the ID, the point attribute set, the snapshot store, the compression-related data, the data-queue pointer, statistics-collection information, and so on. The ID can be used to index the header structure body. The point attribute set records the attributes of the point's data. The snapshot store records the latest data item of the data source. The compression-related data records temporary items of the compression performed before data enters the memory structure, for example the previous observed value, the previous stored value, the maximum slope Kmax, the minimum slope Kmin, the maximum-slope data item U_Kmax and the minimum-slope data item U_Kmin. When the slope comparison method is used to compress data, this information gives the compression filter fast access and thereby raises the speed of compression. The data-queue pointer points to the data queue of the data storage area (described below with reference to Fig. 5). The statistics-collection information records the collection state of the data queue behind the header structure body, which is further described later.
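The slope comparison method with Kmax/Kmin bookkeeping reads like swinging-door trending compression, though the patent does not spell out the algorithm. The following Python sketch is one plausible reading under that assumption; the class name, the deviation parameter and the reset logic are illustrative, and only the running Kmax/Kmin slopes follow the description above:

```python
class SlopeCompressor:
    """Swinging-door-style sketch of the slope comparison method."""

    def __init__(self, deviation):
        self.E = deviation            # allowed compression deviation (assumed)
        self.last_stored = None       # last archived (t, v): 'previous stored value'
        self.prev = None              # 'previous observed value'
        self.kmax = float("-inf")     # running maximum slope Kmax
        self.kmin = float("inf")      # running minimum slope Kmin

    def offer(self, t, v):
        """Return the data item to archive, or None if (t, v) is absorbed."""
        if self.last_stored is None:  # the first item is always stored
            self.last_stored = self.prev = (t, v)
            return (t, v)
        t0, v0 = self.last_stored
        dt = t - t0
        self.kmax = max(self.kmax, (v - v0 - self.E) / dt)
        self.kmin = min(self.kmin, (v - v0 + self.E) / dt)
        if self.kmax > self.kmin:     # corridor empty: archive the previous item
            stored = self.prev
            self.last_stored = stored
            t0, v0 = stored           # reopen the doors from the stored item
            dt = t - t0
            self.kmax = (v - v0 - self.E) / dt
            self.kmin = (v - v0 + self.E) / dt
            self.prev = (t, v)
            return stored
        self.prev = (t, v)
        return None
```

With this reading, a perfectly linear signal stores only its first point, while a step change forces the point just before the step to be archived.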
Within each header structure body 401, every particular data record sits at a specific position, so an access only needs the structure's start pointer plus a data offset to fetch a particular item. The structure also keeps the entry address of the historical-data storage area, so appending important historical data (data that must be stored into the historical database) for a particular source also needs to access this data head.
Fig. 5 illustrates the data storage area of the memory structure according to an embodiment of the invention. Referring to Fig. 5, the data storage area 500 is composed of data queues. A data queue is a block of physically contiguous memory that the system allocates for a given data source, and data items are written into it in the basic structure {ID, value, timestamp}. For a particular data source, each data item occupies a constant amount of memory to make searching convenient. Meanwhile, the header structure body of Fig. 4 records the head pointer of the data region and the current available pointer (the data-queue pointer and statistics-collection information), which makes it easy to append new data items to the region. A data queue becoming full triggers the system to allocate new memory space for the data source. Archiving of the in-memory data is triggered by specific events, for example the memory usage reaching a threshold or the archiving time of the in-memory data arriving. During archiving no extra processing is needed; the whole data storage area is simply written to the hard disk for persistence.
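A data queue as described above can be sketched as a contiguous buffer of fixed-size {ID, value, timestamp} records, with a head pointer and a current available pointer, and a persistence step that dumps the region as-is. The record layout and names below are assumptions for illustration:

```python
import struct

# fixed-size record: 32-bit ID, 64-bit value, 64-bit timestamp (assumed layout)
ITEM = struct.Struct("<I d d")

class DataQueue:
    """Sketch of one data source's queue in the data storage area of Fig. 5."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity * ITEM.size)  # contiguous memory block
        self.head = 0       # head pointer of the region
        self.avail = 0      # current available pointer (next free byte offset)

    def append(self, source_id, value, timestamp):
        """Store one item; return True when the queue is full (triggers archiving)."""
        ITEM.pack_into(self.buf, self.avail, source_id, value, timestamp)
        self.avail += ITEM.size
        return self.avail == len(self.buf)

    def item(self, n):
        """Random access: head pointer plus a fixed offset, since items are constant-size."""
        return ITEM.unpack_from(self.buf, self.head + n * ITEM.size)

    def flush(self, fileobj):
        """Persistence: write the whole region as-is, with no reorganization."""
        fileobj.write(memoryview(self.buf)[: self.avail])
        self.avail = 0
```

Because every item of a source has the same size, the n-th item is found by pure pointer arithmetic, and the flush is a single sequential write that a DMA controller can handle without CPU-side reordering.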
Fig. 2 is a flowchart of the memory access method of one embodiment of the invention. Referring to Fig. 2, an initialization step S1 is carried out when the system starts.
This step completes the configuration of most data points at system start-up: it establishes the mapping table of Fig. 3 and the inverted-list header structure and data storage area of Fig. 4 and Fig. 5, with the data storage area dynamically reserving memory space for subsequent data points. The vector structure is thus established and memory is allocated for each item before real-time data arrives.
When the real-time database processes input data, a data item that needs persistence only has to be stored, after processing, into the memory space configured for the corresponding data point. The processing usually comprises filtering, compression, smoothing and so on; the data records and information these actions use are also stored in the memory structure.
Processing starts at step S2, where a data item is received. In step S3 the mapper (see Fig. 3) is searched for the ID of the data source that produced the item. If no ID is found, the item comes from a new data source: the tag name of the new data source is mapped to an ID and saved into the mapping table; then in step S9 a header structure body is established for the item's data source, and in step S7 the item is stored into the data storage area indexed by the newly built header structure. If a mapped ID is found in step S3, step S5 locates the header structure body by the ID. Then step S6 performs the filter and compression operations with the information stored in the header structure body to determine whether to store the item; if so, the item is stored into the data storage area in step S7, otherwise it is discarded in step S8.
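The steps S2 through S9 above can be sketched end to end as follows. The container shapes, the sequential ID allocation and the simple dead-band filter are illustrative stand-ins for the patent's mapper, header structure bodies and filter/compress step; the tag names are hypothetical:

```python
def handle_item(tag_name, value, timestamp, mapping, headers, storage):
    """Sketch of the receive path S2-S9; returns True if the item was stored."""
    source_id = mapping.get(tag_name)
    if source_id is None:                 # S3 miss: a new data source
        source_id = len(mapping) + 1      # map to a new sequential positive ID
        mapping[tag_name] = source_id
        # S9: establish a header structure body (fields here are illustrative)
        headers[source_id] = {"last_value": None, "deadband": 0.5}
        storage[source_id] = []           # this source's data queue
    header = headers[source_id]           # S5: locate the header body by ID
    last = header["last_value"]           # S6: filter using header information
    if last is not None and abs(value - last) < header["deadband"]:
        return False                      # S8: discard the item
    header["last_value"] = value
    # S7: store the item as {ID, value, timestamp} in the data storage area
    storage[source_id].append((source_id, value, timestamp))
    return True
```

Small fluctuations within the dead band are discarded, and only significant changes reach the per-source queue, mirroring how the header structure body carries the state the filter needs.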
When the data storage area reaches a certain condition (the persistence time point arrives, or the memory consumption reaches a threshold), bulk persistence is carried out.
When storing into the historical database, only the memory blocks organized by the header structures and the data queues have to be written to the hard disk. The classification that formerly had to be done before writing, and most of the work of optimizing the hard-disk writes, are thereby moved to system initialization, when the system has no real-time requirement; the goal of improving system throughput is thus achieved.
Because different data sources are allocated different memory regions, the memory operations on the data can be carried out concurrently by multiple threads, without the intervention of synchronization locks.
In addition, the in-memory storage adopts an inverted-list structure: each data source obtains a unique ID at initialization, and this ID uniquely determines one inverted-list header structure. Such an underlying storage structure effectively supports distributing the system. In distributed operation, the servers need to synchronize only when a data source ID changes; at all other times the system needs no synchronization and each server can run independently. In a stably running real-time database system, data-source changes are rare events, so this characteristic provides a good architectural basis for system expansion.
Meanwhile, storage based on an inverted list can be distributed effectively. Existing real-time database systems are constrained by hardware computing power, and their throughput has a limit (the PI system, an outstanding product in the industry, processes about one million data items per second on a high-end server). If necessary, the present system can be built on a cheap cluster (for example based on an open-source distributed processing platform such as Hadoop) and expand its processing power without a theoretical limit.
Although the present invention is disclosed above with preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor modifications and refinements without departing from the spirit and scope of the invention, so the protection scope of the invention shall be defined by the appended claims.

Claims (7)

1. A memory access method for stream data, comprising the following steps:
performing an initialization step, which establishes a mapping table, an inverted-list header structure and a data storage area in memory, wherein the mapping table records the mapping between each data source and a unique ID, the inverted-list header structure records the ID, attribute information and data-queue pointer of each data source, and the data storage area stores one or more data items of each data source in data queues;
receiving a data item;
finding the ID corresponding to the data source that produced the data item;
if the data source has no corresponding ID, mapping the data source to a new ID and establishing an inverted-list header structure for it;
if the data source has a corresponding ID, using that ID to locate the inverted-list header structure;
filtering and compressing the data item with the information recorded in the inverted-list header structure to determine whether to store it; if so, storing the data item in the data storage area, and if not, discarding it.
2. The method of claim 1, wherein the inverted-list header structure comprises multiple header structure bodies, each corresponding to one data source.
3. The method of claim 1, wherein the memory size occupied by each data item of a given data source is constant.
4. The method of claim 1, wherein the data-queue pointer records the head pointer of the data storage area and the current available pointer.
5. The method of claim 1, further comprising maintaining a snapshot store in the inverted-list header structure, which records the latest data item of each data source.
6. The method of claim 1, further comprising maintaining compression-related data in the inverted-list header structure, which records the temporary data items of the compression process.
7. The method of claim 1, further comprising, at persistence time, writing the data-item queues of the data storage area directly to the hard disk.
CN201110124780.7A 2011-05-13 2011-05-13 Memory access method of flow data Active CN102779097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110124780.7A CN102779097B (en) 2011-05-13 2011-05-13 Memory access method of flow data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110124780.7A CN102779097B (en) 2011-05-13 2011-05-13 Memory access method of flow data

Publications (2)

Publication Number Publication Date
CN102779097A CN102779097A (en) 2012-11-14
CN102779097B true CN102779097B (en) 2015-01-21

Family

ID=47124015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110124780.7A Active CN102779097B (en) 2011-05-13 2011-05-13 Memory access method of flow data

Country Status (1)

Country Link
CN (1) CN102779097B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077405B (en) * 2014-07-08 2018-06-08 国家电网公司 Time sequential type data access method
CN105488047B (en) * 2014-09-16 2019-01-18 华为技术有限公司 Metadata reading/writing method and device
CN112181732A (en) * 2020-10-29 2021-01-05 第四范式(北京)技术有限公司 Recovery method and recovery system of parameter server node

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Temporal database system architecture for real-time applications; Wan Da et al.; Application Research of Computers; October 2008; Vol. 25, No. 10; pp. 2926-2928, 2945 *

Also Published As

Publication number Publication date
CN102779097A (en) 2012-11-14

Similar Documents

Publication Publication Date Title
CN102521405B (en) Massive structured data storage and query methods and systems supporting high-speed loading
CN102521406B (en) Distributed query method and system for complex task of querying massive structured data
CN102467570B (en) Connection query system and method for distributed data warehouse
CN102779138B (en) The hard disk access method of real time data
CN100407203C (en) Method for processing mass data
CN100565512C (en) Eliminate the system and method for redundant file in the document storage system
WO2020024799A1 (en) Method for aggregation optimization of time series data
CN103473276B (en) Ultra-large type date storage method, distributed data base system and its search method
CN105243155A (en) Big data extracting and exchanging system
WO2021223451A1 (en) Method, system and device for acquiring action data of automobile production and storage medium
CN103366015A (en) OLAP (on-line analytical processing) data storage and query method based on Hadoop
CN102073697A (en) Data processing method and data processing device
CN106126601A (en) A kind of social security distributed preprocess method of big data and system
CN109947363A (en) Data caching method of distributed storage system
CN110309233A (en) Method, apparatus, server and the storage medium of data storage
CN107220188A (en) A kind of automatic adaptation cushion block replacement method
CN106570145B (en) Distributed database result caching method based on hierarchical mapping
CN103177035A (en) Data query device and data query method in data base
CN107895046A (en) A kind of Heterogeneous Database Integration Platform
CN106649687A (en) Method and device for on-line analysis and processing of large data
CN102779097B (en) Memory access method of flow data
CN109165096B (en) Cache utilization system and method for web cluster
CN109918450A (en) Based on the distributed parallel database and storage method under analysis classes scene
CN102724279B (en) System for realizing log-saving and log-managing
Chai et al. Adaptive lower-level driven compaction to optimize LSM-tree key-value stores

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant