CN104516912A

CN104516912A - Dynamic data storage method and device

Info

Publication number: CN104516912A
Application number: CN201310459768.0A
Authority: CN
Inventors: 苏燕; 赵洪松; 关德军; 李振嘉; 段云峰; 李红燕; 张美鸥; 王依兴; 孙德志; 迟建德; 李宏昌; 王雅文
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Heilongjiang Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Heilongjiang Co Ltd
Priority date: 2013-09-29
Filing date: 2013-09-29
Publication date: 2015-04-15
Anticipated expiration: 2033-09-29
Also published as: CN104516912B

Abstract

An embodiment of the invention provides a dynamic data storage method. The method comprises: performing a global metadata definition on the data to be stored, defining the storage policy of each attribute column of the data to be stored; organizing the attribute columns into different data subsets according to a key-value pair model; and, according to the storage policies of the attribute columns, defining physical storage formats for the data subsets and storing the data subsets in those physical storage formats. An embodiment of the invention also provides a corresponding dynamic data storage device. With the method and device, dynamic data storage is achieved through a layered, configurable storage structure that can simultaneously meet the storage requirements of sparse data sets and dense data sets in massive data processing.

Description

Dynamic data storage method and device

Technical field

The present invention relates to the field of computer technology, and in particular to a dynamic data storage method and device.

Background

With the rapid development of the Internet, the Internet of Things, and the ICT industry, the associated business data is growing explosively; according to an IDC report, the global data volume will grow at a rate of 40+% over the next 10 years. For the Business Analysis Support System of China Mobile Communications Corporation (hereinafter the "business analysis system"), the introduction of emerging service data such as Internet data has sharply increased the difficulty of data storage and the pressure on system support. As the core system for business analysis, the support quality of the business analysis system directly affects management decisions and bears on the operation and development of the company.

With the explosive growth of the data volume of the business analysis system, how to partition and efficiently store the various kinds of business data, so as to build a high-quality, low-cost support system, is a problem that urgently needs to be solved. The data storage methods commonly used at present are the following two:

Row storage: the storage technology of data warehouses, typically provided by Oracle and used mainly for structured data such as product, bill, service, and detail-record business data. The main advantages of row storage are that data is stored directly by tuple, so writes are efficient; data integrity and reliability are high; and it is suitable for OLTP (On-Line Transaction Processing) scenarios.

Column storage: the storage technology of Internet data platforms, typically provided by Hadoop HBase and used mainly for structured and semi-structured data such as web data and WAP detail-record data. The main advantages of column storage are that no redundant columns are read, so the data hit rate is high; the data in each column (file) is of a homogeneous type without ambiguity, so parsing is efficient; higher compression ratios can be achieved, with a more pronounced effect on sparse data; schema changes are cheap, and in theory attribute columns can be extended without limit; and it is suitable for OLAP (On-Line Analytical Processing) scenarios.

Row storage and column storage are the mainstream storage schemes currently adopted by the business analysis system, but with the introduction of "emerging service data" from the Internet of Things, the Internet, and the like, the bottlenecks of these schemes are gradually becoming apparent, mainly in the following respects:

The storage scheme is single and has technical weaknesses; storage space is seriously wasted; the storage schemes for some business data are not reasonable enough and lack flexibility; mixed, efficient storage of sparse and dense data cannot be achieved; and the data quality of some services is poor, with "missing values" not handled accurately enough.

Summary of the invention

To solve the above problems of data storage in the prior art, embodiments of the present invention propose a dynamic data storage method and device.

One aspect of the present invention provides a dynamic data storage method, comprising:

Performing a global metadata definition on the data to be stored, defining the storage policy of each attribute column in the data to be stored;

Organizing the attribute columns into different data subsets according to a key-value pair model;

Defining a physical storage format for the data subsets according to the storage policies of the attribute columns, and storing the data subsets in the physical storage format.

Another aspect of the present invention provides a dynamic data storage device, comprising:

A definition unit, for performing a global metadata definition on the data to be stored, defining the storage policy of each attribute column in the data to be stored;

An organization unit, for organizing the attribute columns into different data subsets according to a key-value pair model;

A storage unit, for defining a physical storage format for the data subsets according to the storage policies of the attribute columns, and storing the data subsets in the physical storage format.

The dynamic data storage method and device disclosed in the embodiments of the present invention realize dynamic data storage through a layered, configurable storage structure. They can simultaneously meet the storage requirements of "sparse and dense data sets" in big data processing, solve the current "single storage scheme" problem of the business analysis system, provide a flexible storage scheme for mass data from the Internet of Things, the Internet, and the like, and effectively support data storage and analysis under "new business models";

The RCFile storage method (horizontal partitioning first, then vertical partitioning) has a complex internal structure and a high cost of schema change, and is better suited to "read-only" data warehouses; the present invention, by contrast, defines the storage structure in layers and separates keys from values, so the structure is simple and loosely coupled, and schema changes can be made quickly (for example, "adding or deleting a column" only requires adding or deleting one file in the configuration of the "tabular soft schema layer");

RCFile processes "all attribute columns of an entity" according to a single principle (horizontal partitioning first, then vertical partitioning), which is not flexible enough; the present invention can select row storage or column storage specifically according to the "data characteristics" of an entity's attribute columns, for example: column storage is selected for an entity's sparse data columns to save storage space, and row storage is selected for dense data columns to achieve efficient writes;

To address the massive amounts of missing data that are common in Internet of Things, Internet, and similar services, the present invention defines the types of missing values, providing an effective technical means of improving the "analysis quality" of big data;

The technical implementation of the invention is located in the data layer of the "information services domain" of the business analysis system, and data interaction with upper-layer business applications takes place through a "transparent access layer"; business coupling is low and the processing logic is simple and clear, fully meeting the mass data storage and access requirements of the business analysis data warehouse.

Brief description of the drawings

Fig. 1 is a schematic diagram of the dynamic data storage method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of the layers of the data storage structure provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of a structured key-value model provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of the "multi-version" data storage structure of a data object provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of the overall implementation logic of row-column hybrid storage provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of the overall implementation logic of row storage provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of the overall implementation logic of column storage provided by an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the dynamic data storage device provided by an embodiment of the present invention.

Detailed description

Based on the analysis of the prior art, there is a strong practical demand for a "hybrid storage structure" that supports both column storage, with good read performance, and row storage, with good write performance, and thereby supports OLTP and OLAP applications at the same time. Through a row-column hybrid storage layout, the present invention provides a physical storage format that can be mixed and matched: an entity as a whole can be stored in row or column format, and, within a single entity, a suitable row or column storage format can be selected for each field according to data characteristics such as sparseness or density. This realizes dynamic data storage and provides a dynamic data storage method for big data processing. The method suits the characteristics of traditional dense data while also accommodating the storage requirements of massive sparse data, and solves the problem of wasted storage space for massive sparse data in the business analysis system.

Application examples of the typical storage scenarios that the present invention mainly supports:

Row storage: to obtain better write performance, entities with higher data integrity and reliability requirements can be stored in row format; for example, the subscriber table, bill table, and subscription relationship table of the business analysis system;

Column storage: to obtain better statistical analysis performance, entities with large data volumes, frequent schema changes, and many missing values can be stored in column format; for example, the WAP gateway table of the business analysis system;

Row-column hybrid storage:

Within a single entity, a suitable row or column storage format can be selected for each field according to data characteristics such as sparseness or density, thereby realizing row-column hybrid data storage; for example, in the voice signaling table, dense data columns such as calling number, IMSI, and start time are "stored by row", while sparse data columns such as originating signaling point and CGI code are "stored by column".

The dynamic data storage method provided by the embodiments of the present invention is described in detail below with reference to the accompanying drawings.

Figure 1 is a schematic diagram of the dynamic data storage method provided by an embodiment of the present invention; the steps are described in detail below with reference to this figure:

Step 101: perform a global metadata definition on the data to be stored, defining the storage policy of each attribute column in the data to be stored;

Here, defining the storage policy of each attribute column in the data to be stored comprises:

Constructing a tabular soft schema and adding a schema definition for each attribute column, so that, from the perspective of a table, the key-value pair set is defined as an elastic container for data storage.

Defining the storage policy of each attribute column in the data to be stored further comprises:

If the data in the attribute column is dense data, row storage is adopted;

If the data in the attribute column is sparse data, column storage is adopted.
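The dense/sparse rule of Step 101 can be illustrated with a minimal sketch. The source does not say how "dense" and "sparse" are measured, so the missing-value ratio and the 0.5 threshold below are assumptions made purely for illustration, as are all function names.

```python
from typing import Dict, List, Optional

# Assumption: a column counts as "sparse" when more than half of its values
# are missing. The source gives no threshold; 0.5 is illustrative only.
SPARSE_THRESHOLD = 0.5


def missing_ratio(values: List[Optional[object]]) -> float:
    """Fraction of entries in a column that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def choose_storage_policy(values: List[Optional[object]]) -> str:
    """Return "column" for sparse data and "row" for dense data."""
    return "column" if missing_ratio(values) > SPARSE_THRESHOLD else "row"


def define_policies(table: Dict[str, List[Optional[object]]]) -> Dict[str, str]:
    """Map every attribute column name to its storage policy."""
    return {name: choose_storage_policy(col) for name, col in table.items()}


sample = {
    "Col1": [1, 2, 3],
    "Col2": ["LiPing", "WangLin", "ZhangLi"],
    "Col3": [None, None, "2010/2/1"],
}
policies = define_policies(sample)
print(policies)
```

In practice the policy would more likely be configured per column in the metadata layer than inferred from a sample; the sketch only shows the direction of the rule.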

Step 102: organize the attribute columns into different data subsets according to a key-value pair model;

Here, organizing the attribute columns into different data subsets according to the key-value pair model comprises:

Defining the key domain and the value domain of the key-value pair model separately, and saving the key domain and the value domain separately;

The key domain comprises at least one sub-key, and the schema definition of the attribute pair is kept in the key domain;

The value domain stores the data content corresponding to the key domain.

The sub-keys in the key domain include RK, CG, and CA; the combination of the RK, CG, and CA keys forms the query primary key, which uniquely corresponds to one data cell.

Step 103: define a physical storage format for the data subsets according to the storage policies of the attribute columns, and store the data subsets in the physical storage format.

Here, defining a physical storage format for the data subsets according to the storage policies of the attribute columns comprises:

Dividing the attribute columns into at least one partition and, for each partition:

If the partition is suited to row storage, mapping the partition to a separate file, forming a column group;

If the partition is suited to column storage, storing at least one column of the partition in one file.
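As a rough sketch of the partition-to-file mapping in Step 103: a row-storage partition goes to one file as a whole, while a column-storage partition is spread over files. The one-file-per-column choice follows the recommendation given later in the description; the function name `plan_files` and the sequential `DataFileN` numbering are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Partition = Tuple[List[str], str]  # (column names, "row" or "column")


def plan_files(partitions: List[Partition]) -> Dict[str, List[str]]:
    """Return {data file name: columns stored in that file}."""
    files: Dict[str, List[str]] = {}
    next_no = 1
    for columns, policy in partitions:
        if policy == "row":
            # The whole partition maps to one file, forming a column group.
            files[f"DataFile{next_no}"] = list(columns)
            next_no += 1
        elif policy == "column":
            # Sketch choice: one file per column (several per file is allowed).
            for col in columns:
                files[f"DataFile{next_no}"] = [col]
                next_no += 1
        else:
            raise ValueError(f"unknown storage policy: {policy!r}")
    return files


layout = plan_files([
    (["Col1", "Col2"], "row"),
    (["Col3", "Col4", "Col5"], "column"),
])
print(layout)
```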

The data content stored in the value domain also includes timestamps, and the data content is arranged in timestamp order.

The method further comprises: presetting the number of versions of the data content to be kept, and keeping that number of versions according to the timestamps of the data content; or,

Presetting the time period for which data content is kept, and keeping the versions within that time period according to the timestamps.

In the method, missing data in the attribute columns is defined as one of two types: missing and inapplicable, or missing and applicable.

The dynamic data storage method disclosed in the embodiments of the present invention realizes dynamic data storage through a layered, configurable storage structure. It can simultaneously meet the storage requirements of "sparse and dense data sets" in big data processing, solve the current "single storage scheme" problem of the business analysis system, provide a flexible storage scheme for mass data from the Internet of Things, the Internet, and the like, and effectively support data storage and analysis under "new business models";

The method provided by the embodiments of the present invention is described in detail below through concrete application examples.

The present invention realizes dynamic data storage through a layered, configurable storage structure, and can simultaneously meet the row and column hybrid storage requirements of sparse and dense data sets in big data processing. Figure 2 is a schematic diagram of the layers of the data storage structure provided by an embodiment of the present invention. The three basic steps of the data storage method provided by the embodiment are introduced with reference to this figure:

Step 1: tabular soft schema definition, which corresponds to the global metadata definition; logically it takes the form of a tabular soft schema that globally defines the storage policy, data constraints, and so on of each attribute column;

Step 2: key-value model definition based on Key-Value, which corresponds to the key-value data definition layer; the various mixed data sets are represented as sets of (key, value) pairs, each (key, value) pair corresponds to one attribute value of an entity, and the attribute columns are organized into different data subsets through the key-value model;

Step 3: selectable physical storage format definition, which corresponds to the physical storage layer of the business data; according to the metadata and key-value model definitions, a row-oriented or column-oriented physical storage format is defined for each data subset.

Each layer is described in detail below:

Layered, configurable storage structure

Tabular soft schema

The traditional "key-value model" offers fast lookup and high extensibility and provides a layered way of describing data structures, but its "key-value pairs" contain only attribute values and lack schema definitions. On the basis of the traditional (key-value) model, the present invention builds a tabular soft schema that incorporates schema definitions and, from the perspective of a table, defines the "key-value pair set" as an elastic container for data storage that supports schema definition and extension. It comprises the following main members; for an application example see Table 1 below:

Table (Table): the logical storage container of data; a table consists of a separate RowKey column and one or more column groups; the separate RowKey column stores the key of each tuple; a column group is defined as a set of columns, and different column groups do not overlap;

Column group (ColumnGroup): the present invention provides two kinds of column group, ColumnGroup_R and ColumnGroup_C; in a ColumnGroup_R the data is stored in row-oriented form, and in a ColumnGroup_C the data is stored in column-oriented form;

Column (Column): columns uniformly define the attribute domains of an entity; apart from the RowKey column, every column must belong to one specific column group;

Key-domain file (KeyFile): the physical storage container of key-domain data; different column groups of the same table can usually share a key-domain file;

Value-domain file (DataFile): the physical storage container of value-domain data; column groups and columns can be mapped to different data files according to their characteristics, for example: one data file can be defined for a ColumnGroup_R, and one or more data files corresponding to the individual attribute columns can be defined for a ColumnGroup_C;

Position type (PosType): defines how each attribute column is located, for example: fixed column position, delimiter, and so on;

Position information (PosValue): the detailed position of the attribute column according to its "position type", for example: if the position type is "delimiter", the 5th field can be defined as the "name" column; or if the position type is "fixed column position", the 30th to the 45th characters can be defined as the "address" column, and so on;

Data type (DataType): records the data type of each attribute column.

Table 1: Example of the "main schema information" defined by the tabular soft schema
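The members of the tabular soft schema listed above could be modeled roughly as follows. This is a sketch, not the patent's actual data layout: the class and field names, the "delimiter"/"fixed" spellings, the table name `Table_Demo`, and the reading of "30th to 45th character" as 1-based and inclusive are all assumptions. The two `PosValue` cases mirror the examples in the text (5th field as "name", characters 30 to 45 as "address").

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Column:
    name: str
    data_type: str                           # DataType
    pos_type: str                            # PosType: "delimiter" or "fixed"
    pos_value: Union[int, Tuple[int, int]]   # PosValue: field no. or (start, end)


@dataclass
class ColumnGroup:
    name: str
    orientation: str                         # "R" (row-oriented) or "C" (column-oriented)
    columns: List[Column]
    data_files: List[str]                    # DataFile(s) backing this group


@dataclass
class Table:
    name: str
    key_file: str                            # KeyFile shared by the column groups
    column_groups: List[ColumnGroup] = field(default_factory=list)

    def column_names(self) -> List[str]:
        return [c.name for g in self.column_groups for c in g.columns]


def extract(record: str, column: Column, delimiter: str = ",") -> str:
    """Locate one attribute value in a raw record using PosType/PosValue."""
    if column.pos_type == "delimiter":
        return record.split(delimiter)[column.pos_value - 1]  # 1-based field number
    if column.pos_type == "fixed":
        start, end = column.pos_value                          # 1-based, inclusive
        return record[start - 1:end]
    raise ValueError(f"unknown PosType: {column.pos_type!r}")


name_col = Column("name", "string", "delimiter", 5)
address_col = Column("address", "string", "fixed", (30, 45))
table = Table("Table_Demo", "KeyFile1", [
    ColumnGroup("ColumnGroup_R1", "R", [name_col, address_col], ["DataFile1"]),
])

delimited_record = "1001,2013-09-29,M,Harbin,LiPing"
fixed_record = " " * 29 + "No. 1 Main St".ljust(16)
print(extract(delimited_record, name_col))
print(extract(fixed_record, address_col).strip())
```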

Structured key-value model

The key-value model proposes a structured (key-value) model definition method in which the key domain is divided into multiple data cells and a user-defined data structure is added to the value domain. Figure 3 is a schematic diagram of a structured key-value model provided by an embodiment of the present invention; the model is introduced below with reference to this figure.

Key domain: a composite structure made up of different sub-keys, including data cells such as schema information; the key domain is stored in a separate physical file, which is equivalent to a data index file;

Value domain: stores the data content for the key domain; a "key-value pair" contains a contiguous linked list of multiple values, and each data copy has a timestamp to identify the different data versions; the value domain is stored in a separate physical file, which is equivalent to a data content file;

The present invention uses a (key-value) pair to represent one basic attribute value; an entity can comprise multiple attributes, and a data object is mapped to a set of (key-value) pairs. The key domain consists of the following components:

RK: the unique identifier of the data object (tuple); through RK, (key-value) pairs containing the same identifier point to the same object;

LT: used to lock (key-value) pairs to support transaction semantics; each application process is responsible for its own data writes; a tuple is marked for exclusive access through a semaphore, and the lock is released after the transaction completes;

CG: the attribute column array, comprising one or more attribute columns;

CA: the addressable array of the CG columns, which holds the addresses of one or more attribute columns; it can be regarded as a data cell pointer that points to a data cell in a value-domain file;

Note: an addressable address points to the data cell of a specific file, record, and column; the mask content is "fn_idxn_coln", with the mask units separated by "_"; each unit is defined as follows:

fn identifies the data file, for example: f1 corresponds to DataFile1, f2 corresponds to DataFile2, and so on;

idxn is the row offset within a file (one row can correspond to multiple records, distinguished by different timestamps; idxn hits the record with the largest timestamp, that is, the record sorted at the top), for example: f1_idx1 is the record at row offset 1 in the DataFile1 file;

coln is the n-th column of a record, for example: f1_idx1_col3 is the 3rd column of the 1st row record of the DataFile1 file; for the "Column Cata Format" content, see the PosType definition.

Tn: the timestamp of version n of the data;

Vn: the data content of version n.

In Figure 3, the key-domain model on the left is kept in the key-domain file, and the value-domain model on the right is kept in the value-domain file; for a table field defined in the CG domain of the key-domain model, the CA domain points to the data cell of that field in the value-domain file.
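The CA address mask "fn_idxn_coln" described above can be decoded with a few lines of code. The sketch below only parses the mask into its three units; the `CellAddress` type and the `parse_ca` function are hypothetical names, and the mapping of `fn` to `DataFileN` follows the f1/DataFile1 convention stated in the text.

```python
import re
from typing import NamedTuple


class CellAddress(NamedTuple):
    data_file: str    # from fn, e.g. f1 -> DataFile1
    row_offset: int   # from idxn
    column_no: int    # from coln


# Three units separated by "_": f<file no.>, idx<row offset>, col<column no.>
_MASK = re.compile(r"^f(\d+)_idx(\d+)_col(\d+)$")


def parse_ca(mask: str) -> CellAddress:
    """Split a CA mask such as "f1_idx1_col3" into its units."""
    match = _MASK.match(mask)
    if match is None:
        raise ValueError(f"not a valid CA mask: {mask!r}")
    file_no, idx, col = (int(g) for g in match.groups())
    return CellAddress(f"DataFile{file_no}", idx, col)


addr = parse_ca("f1_idx1_col3")
print(addr)
```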

The combination of the RK, CG, and CA keys forms the query primary key, which uniquely determines one data cell. Given the high concurrency of data processing, data cells must effectively avoid read/write conflicts; the present invention uses the LT key to assist with transaction consistency control. At the same time, each data cell holds multiple versions of the data object, indexed between versions by "timestamp"; the timestamp type is a 64-bit integer, accurate to the millisecond; the data of different versions uses a flat structure and is arranged in reverse timestamp order.

Selectable physical storage format

For the physical storage structure of an entity's (key-value) pairs, the present invention adopts horizontal segmentation to accommodate data updates, and divides each segment vertically into multiple partitions; for a given partition, row storage or column storage can be selected:

Row storage: suitable for dense data; the whole partition is mapped to a separate file, thereby forming a column group, which can contain one or more dense data columns;

Column storage: suitable for sparse data; according to the data characteristics of each attribute column, one column or several columns in the partition can be stored in one physical file; a one-to-one mapping between attribute columns and physical files is recommended.

When a data object has multiple versions of an attribute value, the present invention keeps appending (key-value) pairs to the file, with the data content arranged in reverse timestamp order. At the same time, to avoid the management burden (storage, indexing, and so on) caused by multiple data versions, the invention provides two version reclamation mechanisms: one keeps the last n versions of the data, and the other keeps the versions from a recent period (for example, the last 3 days); either can be configured specifically according to the data characteristics of each column group or column. Fig. 4 is a schematic diagram of the "multi-version" data storage structure of a data object provided by an embodiment of the present invention.
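The two version reclamation mechanisms (keep the last n versions, or keep the versions from a recent time window) can be sketched as follows. Versions are modeled as (timestamp in milliseconds, value) tuples kept newest first, matching the reverse timestamp ordering described above; the function names and the in-memory list representation are assumptions for illustration.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (timestamp in milliseconds, value)


def add_version(versions: List[Version], timestamp_ms: int, value: str) -> List[Version]:
    """Add a version and keep the list in reverse timestamp order (newest first)."""
    return sorted(versions + [(timestamp_ms, value)], key=lambda v: v[0], reverse=True)


def keep_last_n(versions: List[Version], n: int) -> List[Version]:
    """Mechanism 1: keep only the n most recent versions."""
    return sorted(versions, key=lambda v: v[0], reverse=True)[:n]


def keep_within(versions: List[Version], now_ms: int, window_ms: int) -> List[Version]:
    """Mechanism 2: keep only versions no older than window_ms."""
    newest_first = sorted(versions, key=lambda v: v[0], reverse=True)
    return [v for v in newest_first if now_ms - v[0] <= window_ms]


cell: List[Version] = []
cell = add_version(cell, 1000, "a")
cell = add_version(cell, 3000, "c")
cell = add_version(cell, 2000, "b")

last_two = keep_last_n(cell, 2)
recent = keep_within(cell, now_ms=3500, window_ms=1000)
print(cell, last_two, recent)
```

For the "last 3 days" example in the text, `window_ms` would be `3 * 24 * 60 * 60 * 1000`.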

The application of the layered, configurable storage structure is described in detail below.

Three examples are described in detail below, covering the application of the present invention in the typical storage scenarios of row-column hybrid storage, row storage, and column storage. The application entity tables are Table_Test1, Table_Test2, and Table_Test3 respectively; the three tables have the same structure but differ in data characteristics such as sparseness and density, so a different storage method is chosen for each.

Row-column hybrid storage

Entity content and use of row/column storage

As shown in Table 2 below, the data in columns Col1 and Col2 is complete; they are dense data columns, so row storage can be adopted, and they belong to column group ColumnGroup_R1. Columns Col3, Col4, and Col5 have many missing values; they are sparse data columns, so column storage is adopted, and they belong to column group ColumnGroup_C1.

Table 2

Table 3

Note: the key-domain data is saved in the KeyFile1 file. For the value-domain data, Col1 and Col2 are stored by row and are both saved in the DataFile1 file; Col3, Col4, and Col5 are stored by column and, to improve statistical performance, each column is stored separately, in the DataFile2, DataFile3, and DataFile4 files respectively.

Table 4: Key-value model definition (key-domain file KeyFile1)

The values in the CA column of the table above serve as the file index of the "value-domain content" of the five columns col1 to col5; the CA column values are interpreted as follows:

A) fn corresponds to a specific data file, for example: f1 corresponds to DataFile1, f2 corresponds to DataFile2, and so on;

B) idxn is the row offset within a file (one row can correspond to multiple records, distinguished by different timestamps; idxn hits the record with the largest timestamp, that is, the record sorted at the top), for example: f1_idx1 is the record at row offset 1 in the DataFile1 file;

C) coln is the n-th column of a record, for example: f1_idx1_col3 is the 3rd column of the 1st row record of the DataFile1 file; for the "Column Cata Format" content, see the PosType definition.

DataFile1 file content	Explanation
Row1,timestamp1,1,LiPing	Row offset is idx1; the delimiter is ","
Row2,timestamp2,2,WangLin	Row offset is idx2; the delimiter is ","
Row3,timestamp3,3,ZhangLi	Row offset is idx3; the delimiter is ","

Table 5: File format definition for storage by row (value-domain file DataFile1)

DataFile2 file content	Explanation
Row1,timestamp1,NULL	Row offset is idx1; the delimiter is ","
Row2,timestamp2,NULL	Row offset is idx2; the delimiter is ","
Row3,timestamp3,2010/2/1	Row offset is idx3; the delimiter is ","

Table 6: File format definition for storage by column (value-domain file DataFile2)

DataFile3 file content	Explanation
Row1,timestamp1,TRUE	Row offset is idx1; the delimiter is ","
Row2,timestamp2,NULL	Row offset is idx2; the delimiter is ","
Row3,timestamp3,VACANT	Row offset is idx3; the delimiter is ","

Table 7: File format definition for storage by column (value-domain file DataFile3)

DataFile4 file content	Explanation
Row1,timestamp1,201	Row offset is idx1; the delimiter is ","
Row2,timestamp2,NULL	Row offset is idx2; the delimiter is ","
Row3,timestamp3,NULL	Row offset is idx3; the delimiter is ","

Table 8: File format definition for storage by column (value-domain file DataFile4)

Figure 5 is a schematic diagram of the overall implementation logic of row-column hybrid storage provided by an embodiment of the present invention.
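To make the hybrid layout concrete, the sketch below regenerates the contents of the four value-domain files from a small in-memory table. The cell values are taken from Tables 5 to 8; `timestampN` is kept as the symbolic placeholder used there. The function name `build_value_files` and the dictionary-based layout description are assumptions, not part of the source.

```python
from typing import Dict, List

# Cell values as they appear in Tables 5 to 8 (NULL and VACANT are the
# two missing-value symbols defined by the invention).
rows: List[Dict[str, str]] = [
    {"Col1": "1", "Col2": "LiPing", "Col3": "NULL", "Col4": "TRUE", "Col5": "201"},
    {"Col1": "2", "Col2": "WangLin", "Col3": "NULL", "Col4": "NULL", "Col5": "NULL"},
    {"Col1": "3", "Col2": "ZhangLi", "Col3": "2010/2/1", "Col4": "VACANT", "Col5": "NULL"},
]

# ColumnGroup_R1 (row storage) -> one file; ColumnGroup_C1 (column storage)
# -> one file per column.
layout: Dict[str, List[str]] = {
    "DataFile1": ["Col1", "Col2"],
    "DataFile2": ["Col3"],
    "DataFile3": ["Col4"],
    "DataFile4": ["Col5"],
}


def build_value_files(rows: List[Dict[str, str]],
                      layout: Dict[str, List[str]],
                      delimiter: str = ",") -> Dict[str, List[str]]:
    """Return {file name: list of delimited records} for the given layout."""
    files: Dict[str, List[str]] = {name: [] for name in layout}
    for i, row in enumerate(rows, start=1):
        for name, columns in layout.items():
            fields = [f"Row{i}", f"timestamp{i}"] + [row[c] for c in columns]
            files[name].append(delimiter.join(fields))
    return files


files = build_value_files(rows, layout)
for name, records in files.items():
    print(name, records)
```

Changing `layout` to a single entry holding all five columns gives the pure row-storage case, and one entry per column gives the pure column-storage case.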

Row storage

Entity content and use of row/column storage

As shown in Table 9 below, the data in columns Col1, Col2, Col3, Col4, and Col5 is complete; they are all dense data columns, so row storage can be adopted for all of them, and they belong to column group ColumnGroup_R1.

Table 9

Table 10: Tabular soft schema definition

Note: the key-domain data is saved in the KeyFile1 file, and the value-domain data is all saved in the DataFile1 file.

Table 11: Key-value model definition (key-domain file KeyFile1)

In Table 11, the values in the CA column serve as the file index of the "value-domain content" of the five columns col1 to col5; the CA column values are interpreted as follows:

fn corresponds to a specific data file, for example: f1 corresponds to DataFile1;

Table 12: File format definition for storage by row (value-domain file DataFile1)

Figure 6 is a schematic diagram of the overall implementation logic of row storage provided by an embodiment of the present invention.

Column storage

Entity content and use of row/column storage

As shown in Table 13 below, columns Col1, Col2, Col3, Col4, and Col5 have many missing values; they are all sparse data columns, so column storage is adopted, and they belong to column group ColumnGroup_C1.

Table 13

Table 14: Tabular soft schema definition

Note: the key-domain data is saved in the KeyFile1 file, and the value-domain data is saved in the DataFile1 through DataFile5 files (see Tables 16 to 20).

Table 15: Key-value model definition (key-domain file KeyFile1)

DataFile1 file content	Explanation
Row1,timestamp1,1	Row offset is idx1; the delimiter is ","
Row2,timestamp2,2	Row offset is idx2; the delimiter is ","
Row3,timestamp3,3	Row offset is idx3; the delimiter is ","

Table 16: File format definition for storage by column (value-domain file DataFile1)

DataFile2 file content	Explanation
Row1,timestamp1,LiPing	Row offset is idx1; the delimiter is ","
Row2,timestamp2,VACANT	Row offset is idx2; the delimiter is ","
Row3,timestamp3,ZhangLi	Row offset is idx3; the delimiter is ","

Table 17: File format definition for storage by column (value-domain file DataFile2)

DataFile3 file content	Explanation
Row1,timestamp1,NULL	Row offset is idx1; the delimiter is ","
Row2,timestamp2,NULL	Row offset is idx2; the delimiter is ","
Row3,timestamp3,2010/2/1	Row offset is idx3; the delimiter is ","

Table 18: File format definition for storage by column (value-domain file DataFile3)

DataFile4 file content	Explanation
Row1,timestamp1,TRUE	Row offset is idx1; the delimiter is ","
Row2,timestamp2,VACANT	Row offset is idx2; the delimiter is ","
Row3,timestamp3,VACANT	Row offset is idx3; the delimiter is ","

Table 19: File format definition for storage by column (value-domain file DataFile4)

DataFile5 file content	Explanation
Row1,timestamp1,201	Row offset is idx1; the delimiter is ","
Row2,timestamp2,NULL	Row offset is idx2; the delimiter is ","
Row3,timestamp3,NULL	Row offset is idx3; the delimiter is ","

Table 20: File format definition for storage by column (value-domain file DataFile5)

Figure 7 is a schematic diagram of the overall implementation logic of column storage provided by an embodiment of the present invention.

For the processing mode of missing values

There is the problem of a large amount of missing values for the emerging service such as internet, Internet of Things data, the present invention also defines the classification of missing values pointedly, and introducing a kind of " quaternary logic " carrys out the operation of specification to missing values.

Type definition of missing values

In big data processing, besides the efficient storage of the "mixed data set", the handling of missing values in sparse data is also a key issue. A missing value is an attribute value that is absent from a data object (tuple). An attribute value can be missing in several ways, and each type of missing value reflects a different assumption. The present invention distinguishes two cases, missing and applicable, and missing and inapplicable; together the two definitions cover the forms that a "null value" can take:

Missing and inapplicable (Unknown): the value exists but cannot be known. For example, a transmission anomaly or a failed gateway data backfill leaves the "network element field" empty (under normal conditions it is not allowed to be empty); in this case the value is unavailable;

Missing and applicable (Nonexistent): the value genuinely does not exist. For example, a field is null and is allowed to be null; in this case the value can be used;

The present invention uses the NULL symbol to represent missing values of the Unknown type and the VACANT symbol to represent missing values of the Nonexistent type. The two symbols must satisfy the following constraints:

NULL and VACANT are two symbols, not two values, that is: NULL → Unknown, VACANT → Nonexistent;

A NULL of the Unknown type can be changed to another value by an update operation, but a VACANT of the Nonexistent type is uniquely determined and should not be modified.
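
The two constraints above can be sketched in Python as follows. This is only an illustration: the class `MissingSymbol`, the helper `update_cell` and the sample field names are hypothetical, and rejecting an update with an exception is one assumed way to enforce "should not be modified".

```python
class MissingSymbol:
    """A marker for a missing value; it is a symbol, not a data value."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind

    def __repr__(self):
        return f"{self.name}({self.kind})"


NULL = MissingSymbol("NULL", "Unknown")          # value exists but is not known
VACANT = MissingSymbol("VACANT", "Nonexistent")  # value genuinely does not exist


def update_cell(row, column, new_value):
    """Apply an update while respecting the constraint on VACANT."""
    if row[column] is VACANT:
        # Assumption: a forbidden modification is reported as an error.
        raise ValueError(f"{column} is VACANT and should not be modified")
    row[column] = new_value
    return row


row = {"network_element": NULL, "optional_field": VACANT}
update_cell(row, "network_element", "NE-01")  # allowed: NULL may be replaced
```

Identity checks (`is VACANT`) are used instead of equality so that the symbols can never be confused with ordinary stored values.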

Quaternary logic

The present invention defines missing values as two types, Unknown and Nonexistent. In addition to the symbolic representation, the truth value of a conditional expression in an SQL statement must be defined accordingly. The present invention uses the symbol Maybe to denote the result of comparing data with NULL (the actual result is True or False; whether NULL is compared with other data or with another NULL, the comparison always produces Maybe), and the symbol Neglect to denote the result of comparing data with VACANT. Any conditional statement therefore has four possible truth values: True, False, Maybe and Neglect. In other words, comparisons involving the two classes of missing values give rise to a quaternary logic (4VL). The basic values of a logical expression are t (true), f (false) and i (inapplicable or undefined), and each truth value is a nonempty subset of the set {t, f, i}.

1. True = {t}; 2. False = {f}; 3. Maybe = {t, f}; 4. Neglect = {i}

Table 21 defines the truth table of the quaternary logic.
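
The four truth values and the comparison rules stated above can be sketched in Python as follows. The sketch covers only the outcome of a single equality comparison, not the logical connectives of Table 21. The function name `compare_equal` and the sentinel classes are hypothetical, and returning Neglect when NULL is compared with VACANT is an assumption, since the text does not state that case.

```python
# Basic values of a logical expression.
T, F, I = "t", "f", "i"

# Each truth value is a nonempty subset of {t, f, i}.
TRUE = frozenset({T})
FALSE = frozenset({F})
MAYBE = frozenset({T, F})
NEGLECT = frozenset({I})


class _Symbol:
    """Marker object standing in for a missing-value symbol."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


NULL = _Symbol("NULL")      # Unknown
VACANT = _Symbol("VACANT")  # Nonexistent


def compare_equal(left, right):
    """Return the 4VL truth value of the condition `left = right`."""
    if left is VACANT or right is VACANT:
        # Assumption: VACANT takes precedence when both symbols appear.
        return NEGLECT
    if left is NULL or right is NULL:
        # Holds even when NULL is compared with another NULL.
        return MAYBE
    return TRUE if left == right else FALSE


results = {
    "201 = 201": compare_equal(201, 201),
    "201 = NULL": compare_equal(201, NULL),
    "NULL = NULL": compare_equal(NULL, NULL),
    "TRUE = VACANT": compare_equal(True, VACANT),
}
```

Representing Maybe as the set {t, f} makes explicit that the real answer is either true or false but is not yet known, whereas Neglect = {i} marks a comparison that has no applicable answer.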

The following describes a dynamic data storage device provided by an embodiment of the present invention, which is used to perform the dynamic data storage method provided in the foregoing embodiments.

Figure 8 is a structural schematic diagram of the dynamic data storage device provided by an embodiment of the present invention. The components of the device are described in detail below with reference to this figure:

Definition unit 801, configured to perform overall metadata definition on the data to be stored and to define the storage policy of each attribute column in the data to be stored;

The definition unit defining the storage policy of each attribute column in the data to be stored comprises:

Organization unit 802, configured to organize the attribute columns into different data subsets according to a key-value pair model;

The organization unit organizing the attribute columns into different data subsets according to the key-value pair model comprises:

Storage unit 803, configured to define a physical storage format for the data subsets according to the storage policies of the attribute columns, and to store the data subsets in the physical storage format.

The storage unit defining a physical storage format for the data subsets according to the storage policies of the attribute columns comprises:

In the foregoing dynamic data storage device, the definition unit is further configured to define the missing data in the attribute columns as two types: missing and inapplicable, and missing and applicable.
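
How the three units cooperate can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the class and method names, the policy labels "row" and "column", the three-part key (row key, subset name, attribute name), the output file names and the sample records are all hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    name: str
    subset: str  # hypothetical name of the data subset holding the column
    policy: str  # hypothetical policy label: "row" or "column"


class DefinitionUnit:
    """Unit 801: records a storage policy for each attribute column."""

    def define(self, columns):
        return {name: ColumnMeta(name, subset, policy)
                for name, (subset, policy) in columns.items()}


class OrganizationUnit:
    """Unit 802: groups attribute values into subsets of key-value pairs."""

    def organize(self, records, metadata):
        subsets = {}
        for row_key, attributes in records.items():
            for name, value in attributes.items():
                meta = metadata[name]
                key = (row_key, meta.subset, name)
                subsets.setdefault(meta.subset, {})[key] = value
        return subsets


class StorageUnit:
    """Unit 803: derives a physical layout from each column's policy."""

    def store(self, subsets, metadata):
        files = {}
        for subset, cells in subsets.items():
            rows = {}
            for (row_key, _, name), value in cells.items():
                if metadata[name].policy == "column":
                    # One file per attribute, one value per line.
                    target = f"{subset}.{name}.col"
                    files.setdefault(target, []).append(f"{row_key},{value}")
                else:
                    # Collect all row-policy attributes of the same row.
                    rows.setdefault(row_key, []).append(f"{name}={value}")
            if rows:
                files[f"{subset}.row"] = [
                    ",".join([row_key] + fields)
                    for row_key, fields in rows.items()
                ]
        return files


metadata = DefinitionUnit().define({
    "msisdn": ("base", "row"),
    "city": ("base", "row"),
    "url": ("web", "column"),
})
records = {
    "Row1": {"msisdn": "138", "city": "Harbin", "url": "example.com"},
    "Row2": {"msisdn": "139", "city": "Daqing", "url": "VACANT"},
}
subsets = OrganizationUnit().organize(records, metadata)
files = StorageUnit().store(subsets, metadata)
```

In this sketch the storage policy is pure configuration: changing the label of a column in the metadata changes its physical layout without touching the organization step, which is the sense in which the structure is layered and configurable.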

With the dynamic data storage device disclosed in the embodiments of the present invention, a dynamic data storage method is realized through a layered, configurable storage structure. It can meet the storage requirements of both sparse and dense data sets in big data processing, solves the current problem that the business analysis support system has only a single storage scheme, provides a flexible storage scheme for mass data such as Internet of Things and internet data, and can effectively support data storage and analysis under the "new business model".

One of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes ROM, RAM, magnetic disk, optical disc, network node, scheduler and any other medium that can store program code.

Finally, it should be noted that the above are only preferred embodiments of the present invention and are not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art can still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A dynamic data storage method, characterized by comprising:

2. The method according to claim 1, characterized in that defining the storage policy of each attribute column in the data to be stored comprises:

3. The method according to claim 2, characterized in that organizing the attribute columns into different data subsets according to the key-value pair model comprises:

4. The method according to claim 3, characterized in that the sub-keys of the key domain comprise RK, CG and CA, and the combination of the RK, CG and CA keys forms a query primary key that uniquely corresponds to one data cell.

5. The method according to any one of claims 1-4, characterized in that defining the storage policy of each attribute column in the data to be stored comprises:

6. The method according to claim 5, characterized in that defining a physical storage format for the data subsets according to the storage policies of the attribute columns comprises:

7. The method according to claim 3, characterized in that the data content stored in the value domain further comprises a timestamp, and the data content is arranged in timestamp order.

8. The method according to claim 7, characterized by further comprising:

Presetting the number of versions of the data content to be kept, and keeping the predetermined number of versions according to the timestamps of the data content; or,

9. The method according to any one of claims 1-8, characterized in that the missing data in the attribute columns is defined as two types: missing and inapplicable, and missing and applicable.

10. A dynamic data storage device, characterized by comprising:

11. The device according to claim 10, characterized in that the definition unit defining the storage policy of each attribute column in the data to be stored comprises:

12. The device according to claim 11, characterized in that the organization unit organizing the attribute columns into different data subsets according to the key-value pair model comprises:

13. The device according to any one of claims 10-12, characterized in that the definition unit defining the storage policy of each attribute column in the data to be stored comprises:

14. The device according to claim 13, characterized in that the storage unit defining a physical storage format for the data subsets according to the storage policies of the attribute columns comprises:

15. The device according to any one of claims 10-14, characterized in that the definition unit is further configured to define the missing data in the attribute columns as two types: missing and inapplicable, and missing and applicable.