
CN104123300B - Distributed data storage system and method

Info

Publication number: CN104123300B
Application number: CN201310150539.0A
Authority: CN
Inventors: 吴朱华; 潘志铭
Original assignee: SHANGHAI PEOPLEYUN INFORMATION TECHNOLOGY Co Ltd
Current assignee: SHANGHAI PEOPLEYUN INFORMATION TECHNOLOGY Co Ltd
Priority date: 2013-04-26
Filing date: 2013-04-26
Publication date: 2017-10-13
Anticipated expiration: 2033-04-26
Also published as: CN104123300A

Abstract

The present invention discloses a distributed data storage system and method. The system comprises a node cluster module, a data import module, and a storage module. The node cluster module connects the data nodes in a cluster to the corresponding management node. The data import module scans the input data in data blocks of a set size and loads them into memory, groups the data in memory according to the feature values of the data, and then sends the grouped data to the corresponding data nodes. The storage module retains a data fragment in memory after a data node receives a file fragment, and the data node writes a log to hard disk. The storage module then judges whether the size of the data in memory exceeds a set threshold; if so, the data is reorganized, compressed, and written to hard disk, and the corresponding log file used for in-memory data recovery is deleted. The invention enables a cluster accelerated by in-memory computing, and can improve the real-time loading and processing of large-scale data as well as the response time of the system.

Description

Distributed data storage system and method

Technical field

The invention belongs to the field of database storage technology and relates to a distributed storage system, in particular to a distributed data storage system. The invention also relates to a distributed data storage method.

Background art

At present, databases store data in one of three ways: 1. single-machine data storage; 2. master-slave backup storage; 3. storage on a distributed file system. However, each of these approaches has certain shortcomings.

Although single-machine data storage is easy to manage and use, its scalability has major shortcomings: it can hardly meet today's need to access massive data, and data security is also a problem. Master-slave backup storage solves only the security problem; the other problems remain. Database storage on a distributed file system addresses both data security and the access requirements of massive data, but it is not suitable for data access and processing that require low latency.

In view of this, there is an urgent need to design a new distributed storage system and method for databases that overcomes the above shortcomings of existing storage systems.

Summary of the invention

The technical problem to be solved by the invention is to provide a distributed storage system for databases that enables a cluster based on fast in-memory computing, improves the real-time loading and processing of large-scale data, and shortens the response time of the whole system.

In addition, the invention provides a distributed data storage method that enables a cluster based on fast in-memory computing, improves the real-time loading and processing of large-scale data, and shortens the response time of the whole system.

To solve the above technical problem, the invention adopts the following technical solution:

A distributed data storage system, the system comprising:

a registration module, configured to register the data nodes in the cluster with the management node through a client;

a data import module, configured to scan the input data in data blocks of a set size and load them into memory, group the data in memory according to the feature values of the data, and then send the grouped data to the corresponding data nodes; the data import module specifically comprises a data scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit; the data scanning unit is configured to scan the input data in data blocks of the set size and load them into memory, to split the data according to the data feature values, and to generate an integer value from each feature value as the identification code of the data; the grouping rule matching unit is configured to group the identification codes of the different data according to grouping rules; the data grouping unit is configured to group the set-size data blocks scanned into memory according to the feature values of the data; the data sending unit is configured to send the grouped data to the corresponding data nodes;

a storage module, configured to retain a data fragment in memory after a data node receives a file fragment, and to judge whether the data needs to be backed up to other data nodes and, if so, back it up through the backup module; the data node writes a log to hard disk for recovering the in-memory data; the storage module judges whether the size of the data in memory exceeds a set threshold and, if so, classifies the data according to metadata features, reorganizes the data, and then compresses it; the data is reorganized mainly by sorting it according to the similarity between data, based on the feature values of the data, so that the most similar data is stored contiguously in preparation for the subsequent compressed storage; after reorganization, because similar data is stored together, the data is compressed with the LZAM algorithm to obtain a higher compression ratio, then written to hard disk, and the corresponding log file used for in-memory data recovery is deleted;

a backup module, configured to back up the data according to a set number of backups after the data has been transferred to the corresponding data node, the backup copies being distributed to other data nodes;

a retrieval module, configured to retrieve the corresponding data after the management node receives a data retrieval request; the retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit; the management node locates, through the positioning unit, the data nodes involved in the data retrieval request; the management node determines, through the failure judging unit and using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes through the request distribution unit; after receiving the data retrieval request, a data node retrieves the corresponding data through the retrieval unit and returns the result to the client; the client merges the received results using the result merging unit.

A distributed data storage system, the system comprising:

a node cluster module, configured to connect the data nodes in the cluster to the corresponding management node;

a data import module, configured to scan the input data in data blocks of a set size and load them into memory, group the data in memory according to the feature values of the data, and then send the grouped data to the corresponding data nodes;

a storage module, configured to retain a data fragment in memory after a data node receives a file fragment; the data node writes a log to hard disk for recovering the in-memory data; the storage module judges whether the size of the data in memory exceeds a set threshold and, if so, reorganizes the data, compresses it, writes it to hard disk, and deletes the corresponding log file used for in-memory data recovery.

As a preferred embodiment of the invention, the data import module specifically comprises a data splitting unit, a file scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit;

the data splitting unit is configured to scan the input data in data blocks of a set size and load them into memory; the grouping rule matching unit is configured to set different rules for different data types to calculate the feature values of the data; the data grouping unit is configured to group the scanned set-size data blocks according to the features of the data; the data sending unit is configured to send the grouped data to the corresponding data nodes.

As a preferred embodiment of the invention, the system further comprises a backup module, configured to back up the data according to a set number of backups after the data has been transferred to the corresponding data node, the backup copies being distributed to other data nodes.

As a preferred embodiment of the invention, the system further comprises a retrieval module, configured to retrieve the corresponding data after the management node receives a data retrieval request;

the retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit;

the management node locates, through the positioning unit, the data nodes involved in the data retrieval request; the management node determines, through the failure judging unit and using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes through the request distribution unit; after receiving the data retrieval request, a data node retrieves the corresponding data through the retrieval unit and returns the result to the client; the client merges the received results using the result merging unit.

A distributed data storage method, the method comprising the following steps:

Node cluster step: the data nodes in the cluster are connected to the corresponding management node;

Data import step: the input data is scanned in data blocks of a set size and loaded into memory; the data in memory is grouped according to the feature values of the data; the grouped data is then sent to the corresponding data nodes;

Storing step: after a data node receives a file fragment, the data fragment is retained in memory, and the data node writes a log to hard disk for recovering the in-memory data; it is judged whether the size of the data in memory exceeds a set threshold and, if so, the data is reorganized, compressed, and written to hard disk, and the corresponding log file used for in-memory data recovery is deleted.

As a preferred embodiment of the invention, the data import step comprises:

a data scanning step: the input data is scanned in data blocks of a set size and loaded into memory;

a grouping rule matching step: different rules are set for different data types to calculate the feature values of the data;

a data grouping step: the scanned set-size data blocks are grouped according to the features of the data;

a data sending step: the grouped data is sent to the corresponding data nodes.

As a preferred embodiment of the invention, the method further comprises a backup step: after the data has been transferred to the corresponding data node, the data is backed up according to a set number of backups, and the backup copies are distributed to other data nodes.

As a preferred embodiment of the invention, the method further comprises a retrieval step, in which the corresponding data is retrieved after the management node receives a data retrieval request;

the retrieval step specifically comprises:

the management node locates the data nodes involved in the data retrieval request;

the management node determines, using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes;

after receiving the data retrieval request, a data node retrieves the corresponding data and returns the result to the client;

the client merges the received results.

The beneficial effects of the invention are as follows. The distributed data storage system and method proposed by the invention can realize a cluster based on in-memory computing, can achieve real-time transaction management of large-scale data, and can improve the response time of the system. On each data node, the in-memory data is backed up to disk, which ensures the safety of the data on a single machine. In addition, the system adopts a redundant design: every piece of data has redundant backups on different nodes, so the crash of any single node affects neither data integrity nor system availability.

Brief description of the drawings

Fig. 1 is a schematic diagram of the composition of the distributed data storage system of the invention.

Fig. 2 is a flow chart of data import in the distributed data storage method of the invention.

Fig. 3 is a schematic diagram of the composition of the data import module of the system of the invention.

Fig. 4 is a flow chart of data storage in the distributed data storage method of the invention.

Fig. 5 is a flow chart of data retrieval in the distributed data storage method of the invention.

Detailed description of the embodiments

Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.

Embodiment one

Referring to Fig. 1, the invention discloses a distributed data storage system comprising: a registration module 1 (also referred to as a "node cluster module"), a data import module 2, a storage module 3, a backup module, and a retrieval module 4.

The registration module 1 is configured to register the data nodes in the cluster with the management node through a client.

The data import module 2 is configured to scan the input data in data blocks of a set size and load them into memory, group the data in memory according to the feature values of the data, and then send the grouped data to the corresponding data nodes.

Specifically, referring to Fig. 3, in this embodiment the data import module comprises a data splitting unit, a file scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit.

The data splitting unit is configured to scan the input data in data blocks of a set size and load them into memory. The grouping rule matching unit is configured to set different rules for different data types to calculate the feature values of the data. The data grouping unit is configured to group the set-size data blocks scanned into memory according to the feature values of the data. The data sending unit is configured to send the grouped data to the corresponding data nodes.
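The rule matching and grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `RULES` table, the record fields (`type`, `host`, `customer_id`), the use of CRC32 to turn a feature into an integer, and the modulo mapping to data nodes are all hypothetical choices made for the example.

```python
import zlib
from collections import defaultdict

# Hypothetical per-type rules: each rule extracts the text from which
# the feature value of a record is derived.
RULES = {
    "log": lambda rec: rec["host"],
    "order": lambda rec: str(rec["customer_id"]),
}


def feature_value(record: dict) -> int:
    """Apply the rule matching the record's data type and return an integer."""
    rule = RULES[record["type"]]
    # CRC32 is an assumed stand-in for "generate an integer from the feature".
    return zlib.crc32(rule(record).encode("utf-8"))


def group_by_node(records, node_count: int) -> dict:
    """Group records so that equal feature values map to the same data node."""
    groups = defaultdict(list)
    for rec in records:
        groups[feature_value(rec) % node_count].append(rec)
    return dict(groups)
```

Records that share a feature (for example, the same `host`) always fall into the same group, which is what lets each group be sent to a single data node.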

The storage module 3 is configured to retain a data fragment in memory after a data node receives a file fragment, and to judge whether the data needs to be backed up to other data nodes; if so, the data is backed up through the backup module. The backup module is configured to back up the data according to a set number of backups after the data has been transferred to the corresponding data node, the backup copies being distributed to other data nodes. The data node writes a log to hard disk for recovering the in-memory data. The storage module judges whether the size of the data in memory exceeds a set threshold; if so, the data is reorganized and then compressed. The data is reorganized mainly by sorting it according to the similarity between data, based on the feature values of the data, so that the most similar data is stored contiguously in preparation for the subsequent compressed storage. After reorganization, because similar data is stored together, the data is compressed with the LZAM algorithm to obtain a higher compression ratio, then written to hard disk, and the corresponding log file used for in-memory data recovery is deleted.
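The life cycle of a fragment on a data node (log first, hold in memory, flush past a threshold, then drop the log) can be sketched as below. This is only an illustration under assumptions: the class name `DataNodeStore`, the file names, the JSON record format, and the `feature` field are hypothetical, and Python's `lzma` module is used on the assumption that "LZAM" in the text refers to LZMA.

```python
import json
import lzma
import os


class DataNodeStore:
    """In-memory fragment buffer protected by an on-disk recovery log."""

    def __init__(self, directory: str, threshold_bytes: int):
        self.directory = directory
        self.threshold_bytes = threshold_bytes  # hypothetical "set threshold"
        self.log_path = os.path.join(directory, "recovery.log")
        self.buffer = []          # records currently held in memory
        self.buffered_bytes = 0

    def receive(self, record: dict) -> None:
        line = json.dumps(record, sort_keys=True)
        # Append to the log first so the record survives a crash.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.buffer.append(record)
        self.buffered_bytes += len(line)
        if self.buffered_bytes > self.threshold_bytes:
            self.flush()

    def flush(self) -> str:
        # Reorganize: sort by feature value so similar records sit together.
        ordered = sorted(self.buffer, key=lambda r: r["feature"])
        payload = "\n".join(json.dumps(r, sort_keys=True) for r in ordered)
        index = sum(n.startswith("segment-") for n in os.listdir(self.directory))
        path = os.path.join(self.directory, f"segment-{index}.xz")
        with open(path, "wb") as out:
            out.write(lzma.compress(payload.encode("utf-8")))
        self.buffer.clear()
        self.buffered_bytes = 0
        if os.path.exists(self.log_path):
            os.remove(self.log_path)  # log no longer needed for recovery
        return path

    def recover(self) -> None:
        """Rebuild the in-memory buffer from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            lines = [line.rstrip("\n") for line in log if line.strip()]
        self.buffer = [json.loads(line) for line in lines]
        self.buffered_bytes = sum(len(line) for line in lines)
```

Writing the log before touching the buffer is the design choice that makes `recover` sufficient after a crash: anything in memory is guaranteed to be in the log until the compressed segment has been written.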

The retrieval module 4 is configured to retrieve the corresponding data after the management node receives a data retrieval request. The retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit.

Specifically, the management node locates, through the positioning unit, the data nodes involved in the data retrieval request. The management node determines, through the failure judging unit and using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes through the request distribution unit. After receiving the data retrieval request, a data node retrieves the corresponding data through the retrieval unit and returns the result to the client. The client merges the received results using the result merging unit.
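The text does not spell out how the Lease mechanism is realized, so the following is only a sketch of one common form of lease-based failure detection: each data node periodically renews a time-limited lease, and a node whose lease has expired is treated as failed. The names `LeaseTable` and `dispatch`, the lease duration, and the shape of the returned dictionaries are all assumptions for illustration.

```python
import time


class LeaseTable:
    """Tracks a time-limited lease per data node on the management node."""

    def __init__(self, lease_seconds: float, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self._clock = clock
        self._expiry = {}  # node id -> time at which its lease expires

    def renew(self, node_id: str) -> None:
        """Called when a data node reports in; extends its lease."""
        self._expiry[node_id] = self._clock() + self.lease_seconds

    def is_alive(self, node_id: str) -> bool:
        """A node is considered valid only while its lease is unexpired."""
        return self._clock() < self._expiry.get(node_id, float("-inf"))


def dispatch(request, involved_nodes, leases: LeaseTable) -> dict:
    """Fail fast if any involved node has failed; otherwise fan out."""
    failed = [n for n in involved_nodes if not leases.is_alive(n)]
    if failed:
        return {"ok": False, "error": f"failed nodes: {failed}"}
    return {"ok": True, "targets": list(involved_nodes)}
```

The injectable `clock` parameter exists only so the expiry logic can be exercised without waiting for real time to pass.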

The composition of the distributed data storage system of the invention has been described above. Along with the system, the invention also discloses a distributed data storage method. Referring to Fig. 2 and Fig. 4, the method comprises the following steps:

[Step S1] Node cluster step (i.e. registration step): the data nodes in the cluster are connected to the corresponding management node. The connection can be completed by registration; for example, a client sends registration information to register the data nodes in the cluster with the management node.

[Step S2] Data import step: the input data is scanned in data blocks of a set size and loaded into memory; the data in memory is grouped according to the feature values of the data; the grouped data is then sent to the corresponding data nodes. With reference to Fig. 3, the data import step specifically comprises:

Step S21, data scanning step: the input data is scanned in data blocks of a set size and loaded into memory;

Step S22, grouping rule matching step: different rules are set for different data types to calculate the feature values of the data;

Step S23, data grouping step: the scanned set-size data blocks are grouped according to the features of the data;

Step S24, data sending step: the grouped data is sent to the corresponding data nodes.
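Steps S21 to S24 can be read as a small pipeline that processes the input one block at a time. The sketch below assumes, purely for illustration, that the block size is counted in lines, that the input is comma-separated text whose first field serves as the feature, and that a caller-supplied `send(node, lines)` function stands in for the network transfer; none of these details come from the text.

```python
from collections import defaultdict
from itertools import islice


def scan_blocks(stream, block_lines: int):
    """S21: yield the input as lists of at most block_lines lines."""
    it = iter(stream)
    while True:
        block = list(islice(it, block_lines))
        if not block:
            return
        yield [line.rstrip("\n") for line in block]


def import_data(stream, block_lines: int, node_count: int, send) -> None:
    """Run S21-S24 for every block of the input."""
    for block in scan_blocks(stream, block_lines):
        groups = defaultdict(list)
        for line in block:
            key = line.split(",", 1)[0]                   # S22: assumed rule
            node = sum(key.encode("utf-8")) % node_count  # integer feature value
            groups[node].append(line)                     # S23: group in memory
        for node, lines in groups.items():
            send(node, lines)                             # S24: one send per group
```

Because only one block is held in memory at a time, the input can be larger than memory, which matches the block-wise scanning the steps describe.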

[Step S3] Storing step: as shown in Fig. 4, after a data node receives a file fragment, the data fragment is retained in memory, and it is judged whether the data needs to be backed up to other data nodes; if so, the data is backed up.

The backup step comprises: after the data has been transferred to the corresponding data node, the data is backed up according to the set number of backups, and the backup copies are distributed to other data nodes. The data node writes a log to hard disk for recovering the in-memory data.
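The text states only that backup copies go to other data nodes according to a set number of backups; it does not say how those nodes are chosen. As one possible policy, the sketch below walks the node list as a ring starting after the primary node. The function name `pick_backup_nodes` and the ring policy itself are assumptions.

```python
def pick_backup_nodes(primary: str, nodes: list, backup_count: int) -> list:
    """Choose up to backup_count nodes other than primary, in ring order."""
    # Never ask for more backups than there are other nodes.
    needed = min(backup_count, len(nodes) - 1)
    start = nodes.index(primary)
    return [nodes[(start + i) % len(nodes)] for i in range(1, needed + 1)]
```

For example, with nodes `["dn1", "dn2", "dn3", "dn4"]` and two backups, data whose primary is `"dn4"` would be copied to `"dn1"` and `"dn2"`.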

It is judged whether the size of the data in memory exceeds the set threshold; if so, the data is reorganized and then compressed. The data is reorganized mainly by sorting it according to the similarity between data, based on the feature values of the data, so that the most similar data is stored contiguously in preparation for the subsequent compressed storage. After reorganization, because similar data is stored together, the data is compressed with the LZAM algorithm to obtain a higher compression ratio, then written to hard disk, and the corresponding log file used for in-memory data recovery is deleted.
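The reorganize-then-compress idea can be sketched in a few lines. The example assumes that "LZAM" refers to LZMA and uses Python's standard `lzma` module; the record shape (a pair of integer feature value and text payload) and the tab-separated serialization are hypothetical.

```python
import lzma


def reorganize(records):
    """Sort by (feature value, payload) so similar records become adjacent."""
    return sorted(records, key=lambda r: (r[0], r[1]))


def compress_records(records) -> bytes:
    """Serialize one record per line and compress the result with LZMA."""
    text = "\n".join(f"{feature}\t{payload}" for feature, payload in records)
    return lzma.compress(text.encode("utf-8"))


def decompress_records(blob: bytes) -> list:
    """Inverse of compress_records."""
    text = lzma.decompress(blob).decode("utf-8")
    if not text:
        return []
    pairs = (line.split("\t", 1) for line in text.split("\n"))
    return [(int(feature), payload) for feature, payload in pairs]
```

Comparing `len(compress_records(records))` with `len(compress_records(reorganize(records)))` on real data is a simple way to check how much the sorting step actually helps for a given workload.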

[Step S4] Retrieval step: the corresponding data is retrieved after the management node receives a data retrieval request. Referring to Fig. 5, the retrieval step specifically comprises:

Step S40: the client sends the data retrieval request to the data management node;

Step S41: the management node locates the data nodes involved in the data retrieval request;

Step S42: the management node determines, using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes;

Step S43: after receiving the data retrieval request, a data node retrieves the corresponding data and returns the result to the client;

Step S44: the client merges the received results.
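Steps S43 and S44 follow a scatter-gather pattern: each involved data node searches its own data, and the client merges the partial results. The sketch below simulates this in a single process, with each data node represented by a list of records and a thread pool standing in for the network; the function name `retrieve`, the predicate-style query, and the sorted merge order are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve(query, node_shards: dict) -> list:
    """Search every shard in parallel, then merge the partial results."""

    def search(shard):
        # S43: a data node retrieves the matching records from its own data.
        return [record for record in shard if query(record)]

    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(search, node_shards.values()))

    # S44: the client merges the results returned by the data nodes.
    merged = [record for part in partials for record in part]
    return sorted(merged)
```

For instance, `retrieve(lambda r: r > 4, {"dn1": [1, 5, 9], "dn2": [2, 6]})` searches both shards and merges the matches into one list.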

In summary, the distributed data storage system and method proposed by the invention can realize a cluster based on in-memory computing, can achieve real-time transaction management of large-scale data, and can improve the response time of the system. On each data node, the in-memory data is backed up to disk, which ensures the safety of the data on a single machine. In addition, the system adopts a redundant design: every piece of data has redundant backups on different nodes, so the crash of any single node affects neither data integrity nor system availability.

The description and application of the invention herein are illustrative and are not intended to limit the scope of the invention to the above embodiments. Variations and changes to the embodiments disclosed herein are possible, and substitutes for and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. Those skilled in the art should understand that, without departing from the spirit or essential characteristics of the invention, the invention may be realized in other forms, structures, arrangements, and proportions, and with other components, materials, and parts. Other variations and changes may be made to the embodiments disclosed herein without departing from the scope and spirit of the invention.

Claims

1. A distributed data storage system, characterized in that the system comprises:

a data import module, configured to scan the input data in data blocks of a set size and load them into memory, group the data in memory according to the feature values of the data, and then send the grouped data to the corresponding data nodes; the data import module specifically comprises a data splitting unit, a data scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit; the data splitting unit is configured to scan the input data in data blocks of the set size and load them into memory; the grouping rule matching unit is configured to set different rules for different data types to calculate the feature values of the data; the data grouping unit is configured to group the set-size data blocks scanned into memory according to the feature values of the data; the data sending unit is configured to send the grouped data to the corresponding data nodes;

a storage module, configured to retain a data fragment in memory after a data node receives a file fragment, and to judge whether the data needs to be backed up to other data nodes and, if so, back it up through the backup module; the data node writes a log to hard disk for recovering the in-memory data; the storage module judges whether the size of the data in memory exceeds a set threshold and, if so, classifies the data according to metadata features, reorganizes the data, and then compresses it; the data is reorganized by sorting it according to the similarity between data, based on the feature values of the data, so that the most similar data is stored contiguously in preparation for the subsequent compressed storage; after reorganization, because similar data is stored together, the data is compressed with the LZAM algorithm to obtain a higher compression ratio, then written to hard disk, and the corresponding log file used for in-memory data recovery is deleted;

a backup module, configured to back up the data according to a set number of backups after the data has been transferred to the corresponding data node, the backup copies being distributed to other data nodes;

a retrieval module, configured to retrieve the corresponding data after the management node receives a data retrieval request; the retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit; the management node locates, through the positioning unit, the data nodes involved in the data retrieval request; the management node determines, through the failure judging unit and using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes through the request distribution unit; after receiving the data retrieval request, a data node retrieves the corresponding data through the retrieval unit and returns the result to the client; the client merges the received results using the result merging unit.

2. A distributed data storage system, characterized in that the system comprises:

a data import module, configured to scan the input data in data blocks of a set size and load them into memory, group the data in memory according to the feature values of the data, and then send the grouped data to the corresponding data nodes;

a storage module, configured to retain a data fragment in memory after a data node receives the data fragment; the data node writes a log to hard disk for recovering the in-memory data; the storage module judges whether the size of the data in memory exceeds a set threshold and, if so, reorganizes the data, compresses it, writes it to hard disk, and deletes the corresponding log file used for in-memory data recovery;

the system further comprises a retrieval module, configured to retrieve the corresponding data after the management node receives a data retrieval request;

the retrieval module specifically comprises a positioning unit, a failure judging unit, a request distribution unit, a retrieval unit, and a result merging unit;

the management node locates, through the positioning unit, the data nodes involved in the data retrieval request; the management node determines, through the failure judging unit and using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes through the request distribution unit; after receiving the data retrieval request, a data node retrieves the corresponding data through the retrieval unit and returns the result to the client; the client merges the received results using the result merging unit.

3. The distributed data storage system according to claim 2, characterized in that:

the data import module specifically comprises a data splitting unit, a file scanning unit, a grouping rule matching unit, a data grouping unit, and a data sending unit;

the data splitting unit is configured to scan the input data in data blocks of a set size and load them into memory; the grouping rule matching unit is configured to set different rules for different data types to calculate the feature values of the data; the data grouping unit is configured to group the scanned set-size data blocks according to the features of the data; the data sending unit is configured to send the grouped data to the corresponding data nodes.

4. The distributed data storage system according to claim 2, characterized in that:

the system further comprises a backup module, configured to back up the data according to a set number of backups after the data has been transferred to the corresponding data node, the backup copies being distributed to other data nodes.

5. A distributed data storage method, characterized in that the method comprises the following steps:

a data import step: the input data is scanned in data blocks of a set size and loaded into memory; the data in memory is grouped according to the feature values of the data; the grouped data is then sent to the corresponding data nodes;

a storing step: after a data node receives a file fragment, the data fragment is retained in memory, and the data node writes a log to hard disk for recovering the in-memory data; it is judged whether the size of the data in memory exceeds a set threshold and, if so, the data is reorganized, compressed, and written to hard disk, and the corresponding log file used for in-memory data recovery is deleted;

the method further comprises a retrieval step, in which the corresponding data is retrieved after the management node receives a data retrieval request;

the retrieval step specifically comprises:

the management node locates the data nodes involved in the data retrieval request;

the management node determines, using the Lease mechanism, whether those data nodes have failed; if a node has failed, request failure information is returned directly; if the nodes are valid, the management node distributes the request to the corresponding nodes;

the client merges the received results.

6. The distributed data storage method according to claim 5, characterized in that:

the data import step comprises:

a grouping rule matching step: different rules are set for different data types to calculate the feature values of the data;

a data sending step: the grouped data is sent to the corresponding data nodes.

7. The distributed data storage method according to claim 5, characterized in that:

the method further comprises a backup step: after the data has been transferred to the corresponding data node, the data is backed up according to a set number of backups, and the backup copies are distributed to other data nodes.