CN105956041A

CN105956041A - Data model processing method based on Spring Data for MongoDB cluster

Info

Publication number: CN105956041A
Application number: CN201610264378.1A
Authority: CN
Inventors: Wang Xiang (王祥); Zhang Haiying (张海英); Hu Bing (胡冰)
Original assignee: Jiangsu IoT Research and Development Center
Current assignee: Jiangsu IoT Research and Development Center
Priority date: 2016-04-26
Filing date: 2016-04-26
Publication date: 2016-09-21

Abstract

The invention discloses a data model processing method based on Spring Data for MongoDB clusters, and in particular a MapReduce data processing method for distributed clusters. A document-oriented MongoDB NoSQL (non-relational) database is used as the underlying data store, and data in other documents are referenced through client-side virtual join queries, which solves the problem of join queries over unstructured data. Data processing uses the Map/Reduce computing framework provided by Spring Data for MongoDB, so MapReduce tasks can process data concurrently on multiple nodes and improve system processing speed. On the storage side, a highly available and scalable data center is provided for the long-term operation of a mass data storage service: a highly available data cluster supporting horizontal and vertical scaling is designed with MongoDB, and a two-field compound shard key is used for optimization, so that CRUD (Create, Retrieve, Update, and Delete) operations can exploit locality to improve query efficiency and the system supports cloud-level elasticity.

Description

Data model processing method based on Spring Data for MongoDB cluster

Technical field

The present invention relates to a data model processing method based on Spring Data for MongoDB clusters, and in particular to a MapReduce data processing method for distributed clusters.

Background art

In the big-data era, data are characterized by diverse forms, large volume, lack of structure, and lack of unified storage management, and data volumes are growing rapidly. Storing, analyzing, understanding, and using such large-scale data efficiently and without error has become a critical problem. Relational databases fall short here. Once data reach a certain scale, SQL queries (especially multi-table joins) on tables with hundreds of millions of rows become very inefficient. Relational databases are hard to scale horizontally, so demands for scalability and high availability are difficult to meet. Although the relational model is currently the most widely used way to organize data, it is extremely difficult to use for storing unstructured data and processing hierarchical data, so it is unsuitable for such data objects. A single computing cluster cannot meet the processing demand either. Integrating heterogeneous resources into a distributed computing and storage model has therefore become a feasible way to store and process data resources.

NoSQL is a general term for non-relational data stores, which offer performance advantages in big-data access that relational databases cannot match. Traditional relational databases expose many hard-to-overcome problems in large-volume analysis and high-concurrency workloads, such as the demand for highly concurrent reads and writes, the demand for efficient storage of and access to massive data, and the demand for high scalability and high availability.

The data structures MongoDB supports, together with its ability to scale out as a cluster, make it well suited to storing and processing health-record data with the characteristics above. Spring Data for MongoDB provides a rich class library for statistical queries and data processing, and in particular supports results computed with distributed MapReduce. MapReduce is a distributed computing framework proposed by Google; drawing on ideas from functional programming, it performs distributed computation over large datasets efficiently.

Summary of the invention

It is an object of the invention to provide a data model processing method based on Spring Data for MongoDB clusters, namely a sharded MapReduce query and processing method for a MongoDB cluster. It addresses the problem that storing unstructured data and processing hierarchical data is extremely difficult.

To achieve this object, the present invention adopts the following technical solutions:

First, a document-oriented MongoDB NoSQL database is used as the underlying data store. MongoDB does not support join queries between documents, and by default a referenced object is stored in the same document as the main object. The client can reference data in other documents through virtual join queries, which solves this problem.

To reduce data processing time in the service layer, and taking into account that the data are stored in a distributed manner, the present invention uses the Map/Reduce computing framework for statistical queries. MapReduce tasks can process data concurrently on multiple nodes, which improves system processing speed.

To provide a highly available and scalable data center for the long-term operation of the mass data storage service, the present invention uses the automatic sharding and replication functions provided by MongoDB to design a highly available data cluster that supports horizontal and vertical scaling, allows storage servers to be added and removed dynamically, and supports cloud-level elasticity.

MongoDB shards data according to the shard key, and the choice of shard key strongly affects both how evenly data are distributed and how efficient queries are. The present invention therefore uses a two-field compound shard key. This ensures that inserted data are distributed evenly across the three shards and lets CRUD operations exploit locality to improve query efficiency, while keeping the partitioning granularity fine enough to avoid chunks that cannot be split after new machines are added later. At the same time, it ensures high availability of the system.

Brief description of the drawings

Fig. 1 is a sequence diagram of data statistics processing.

Fig. 2 shows the architecture design of the storage service cluster.

Detailed description of the invention

The invention is further described below with reference to the drawings and embodiments.

1. Spring Data for MongoDB infrastructure configuration

The main purpose of the Spring Data template, like that of every other Spring template, is to handle resource configuration and exception handling together. The resource here is the data storage resource, which is usually accessed over a remote TCP/IP connection. The following example shows how to configure the MongoDB template in XML:

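<!-- Connection factory; MongoTemplate references it by its bean id -->
<mongo:db-factory id="mongoDbFactory" host="localhost" port="27017" dbname="test" />

<bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
    <constructor-arg name="mongoDbFactory" ref="mongoDbFactory"/>
</bean>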

A connection factory must be defined first; MongoTemplate then references it. In this example, Spring Data relies on a lower-level database driver, the MongoDB Java driver.

Such low-level drivers generally have their own exception handling strategy. Spring's exception handling uses unchecked exceptions, so developers can decide for themselves whether to catch them. The MongoDB template catches the low-level exceptions and wraps them into unchecked exceptions, all of which are subclasses of Spring's DataAccessException.
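The exception-translation pattern described above can be sketched in plain Java without Spring on the classpath. In this minimal sketch, DataAccessFailure, ResourceUnavailable, LowLevelDriver, and TemplateSketch are hypothetical stand-ins for Spring's DataAccessException hierarchy, the MongoDB Java driver, and MongoTemplate respectively; they are not the real classes, only an illustration of wrapping a checked driver exception into an unchecked one.

```java
import java.io.IOException;

// Hypothetical stand-in for Spring's unchecked DataAccessException (not the real class).
class DataAccessFailure extends RuntimeException {
    DataAccessFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical subclass, playing the role of a more specific translated exception.
class ResourceUnavailable extends DataAccessFailure {
    ResourceUnavailable(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical low-level driver call that reports failures with a checked exception.
interface LowLevelDriver {
    String findById(String id) throws IOException;
}

// Template-style wrapper: the checked driver exception is caught and rethrown unchecked,
// so callers decide for themselves whether to catch it.
class TemplateSketch {
    private final LowLevelDriver driver;

    TemplateSketch(LowLevelDriver driver) {
        this.driver = driver;
    }

    String findById(String id) {
        try {
            return driver.findById(id);
        } catch (IOException e) {
            throw new ResourceUnavailable("data store unreachable while reading " + id, e);
        }
    }
}
```

A caller that wants to react to the failure can still catch DataAccessFailure; a caller that does not is free to let it propagate, which is the choice the unchecked design leaves to the developer.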

The template provides operations on the data store, such as methods for saving, updating, and deleting a single record or executing a query. All of these methods, however, apply only to the corresponding underlying data store.

2. Data layer design

A document-oriented MongoDB NoSQL database is used as the underlying data store; data are stored in BSON format at the bottom layer and viewed as JSON. Each MongoDB collection corresponds one-to-one to a JavaBean, and relations between collections are expressed as nested Java objects.
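The one-to-one mapping between a collection and a JavaBean, with related data held as nested objects, can be illustrated as follows. This is a sketch under assumptions: the HealthRecord and Measurement classes and their field names are hypothetical (the document does not specify its schema), and the hand-written toDocument() method only imitates what Spring Data's MappingMongoConverter does automatically, using nested Maps in place of BSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical embedded object: stored inside its parent document, not in its own collection.
class Measurement {
    private final String type;
    private final double value;

    Measurement(String type, double value) {
        this.type = type;
        this.value = value;
    }

    String getType() { return type; }
    double getValue() { return value; }
}

// Hypothetical JavaBean mapped one-to-one to a "healthRecord" collection.
class HealthRecord {
    private final String id;
    private final String patientName;
    private final List<Measurement> measurements;

    HealthRecord(String id, String patientName, List<Measurement> measurements) {
        this.id = id;
        this.patientName = patientName;
        this.measurements = measurements;
    }

    // Hand-written conversion to a nested, JSON/BSON-like document. In a Spring Data
    // application the mapping converter performs this step instead.
    Map<String, Object> toDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("patientName", patientName);
        List<Map<String, Object>> nested = new ArrayList<>();
        for (Measurement m : measurements) {
            Map<String, Object> sub = new LinkedHashMap<>();
            sub.put("type", m.getType());
            sub.put("value", m.getValue());
            nested.add(sub);
        }
        doc.put("measurements", nested); // the nested Java objects become an embedded array
        return doc;
    }
}
```

For new HealthRecord("r1", "Zhang", List.of(new Measurement("heartRate", 72.0))), the document prints as {_id=r1, patientName=Zhang, measurements=[{type=heartRate, value=72.0}]}: the measurements live inside the record's own document, which is the default storage described above.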

MongoDB does not support join queries between documents, and by default a referenced object is stored in the same document as the main object. This problem can also be solved by having the client reference data in other documents through virtual join queries.
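The client-side virtual join can be sketched as a two-step lookup: read the main documents, then fetch the referenced documents by id and attach them on the client. The VirtualJoin class, its parameters, and the choice to batch the second lookup into one query (in MongoDB this would typically be an $in filter on _id) are illustrative assumptions, not the method the patent specifies; plain Maps stand in for documents so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class VirtualJoin {
    private VirtualJoin() {}

    // records:    documents from the main collection, each holding an id reference in refField
    // asField:    name under which the referenced document is attached on the client side
    // fetchByIds: stands in for one batched query on the referenced collection
    static List<Map<String, Object>> join(List<Map<String, Object>> records,
                                          String refField,
                                          String asField,
                                          Function<Set<Object>, Map<Object, Map<String, Object>>> fetchByIds) {
        // 1. Collect every referenced id so the second lookup is a single batch, not N queries.
        Set<Object> ids = new LinkedHashSet<>();
        for (Map<String, Object> r : records) {
            Object id = r.get(refField);
            if (id != null) {
                ids.add(id);
            }
        }
        Map<Object, Map<String, Object>> referenced = fetchByIds.apply(ids);

        // 2. Stitch the referenced documents into copies of the originals (null if dangling).
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> r : records) {
            Map<String, Object> merged = new LinkedHashMap<>(r);
            Object id = r.get(refField);
            merged.put(asField, id == null ? null : referenced.get(id));
            joined.add(merged);
        }
        return joined;
    }
}
```

Batching the ids avoids issuing one query per referenced document, and a dangling reference simply yields null in the joined result instead of failing the whole query.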

The present invention uses the Map/Reduce computing framework. Computation is divided into a Map step and a Reduce step, as follows: the system creates a MapReduce master process, which assigns a large number of Map tasks to the data storage nodes of the cluster; each Map task processes the data on its node with the Map function and saves the intermediate results as key-value pairs; Reduce tasks collect the intermediate data generated by the Map tasks and return the final result. Because MapReduce tasks process data concurrently on multiple nodes, system processing speed improves.
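The Map, intermediate key-value, and Reduce steps above can be simulated in memory to show the data flow. In this minimal sketch, each inner list stands for one storage node's data, a thread pool plays the role of the nodes running Map tasks concurrently, and the calling thread acts as the master that collects the intermediate pairs and reduces them. MiniMapReduce and its signature are hypothetical; MongoDB's own reduce function receives a key and an array of values rather than two values at a time, so this is a simplification.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;

final class MiniMapReduce {
    private MiniMapReduce() {}

    // nodes:    one data partition per (simulated) storage node
    // mapFn:    called once per document; emits intermediate key-value pairs
    // reduceFn: folds two values for the same key (must be associative and commutative)
    static <T, K, V> Map<K, V> run(List<List<T>> nodes,
                                   BiConsumer<T, BiConsumer<K, V>> mapFn,
                                   BinaryOperator<V> reduceFn) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            // Master: dispatch one Map task per node; the tasks run concurrently.
            List<Future<List<Map.Entry<K, V>>>> mapTasks = new ArrayList<>();
            for (List<T> partition : nodes) {
                mapTasks.add(pool.submit(() -> {
                    List<Map.Entry<K, V>> intermediate = new ArrayList<>();
                    for (T doc : partition) {
                        mapFn.accept(doc, (k, v) -> intermediate.add(new AbstractMap.SimpleEntry<>(k, v)));
                    }
                    return intermediate;
                }));
            }
            // Reduce: collect each node's intermediate pairs and fold the values per key.
            Map<K, V> result = new HashMap<>();
            for (Future<List<Map.Entry<K, V>>> task : mapTasks) {
                for (Map.Entry<K, V> pair : task.get()) {
                    result.merge(pair.getKey(), pair.getValue(), reduceFn);
                }
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Counting records per measurement type over the three nodes [[bp, hr, bp], [hr, temp], [bp]] with run(nodes, (doc, emit) -> emit.accept(doc, 1), Integer::sum) yields bp=3, hr=2, temp=1. Note that MongoDB 5.0 deprecated map-reduce in favor of the aggregation pipeline.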

The data processed by the service layer are obtained from the underlying cluster. The sequence of data statistics distribution in this system is shown in Fig. 1.

3. Storage system cluster design

A MongoDB cluster consists of config server processes, shard nodes, and routing processes. As shown in Fig. 2, this system uses three shards, and each shard is a replica set of three independent servers. The config servers use the same replication strategy; they store the configuration information of the whole cluster and are its most important part. Because the routing server stores no data or configuration information and only caches information from the config servers, a single routing server is sufficient. MongoDB shards data according to the shard key, and the choice of shard key strongly affects both how evenly data are distributed and how efficient queries are. The present invention therefore uses a two-field compound shard key. This ensures that inserted data are distributed evenly across the three shards and lets CRUD operations exploit locality to improve query efficiency, while keeping the partitioning granularity fine enough to avoid chunks that cannot be split after new machines are added later. At the same time, it ensures high availability of the system.
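The effect of the shard key choice can be illustrated with a toy range partitioner. This sketch assumes a hypothetical compound key {userId, recordTime} (the document does not name its two fields) and splits the distinct sorted key values into three equal ranges; it is not MongoDB's actual chunk-splitting or balancing algorithm. It shows two points from the paragraph above: the compound key has enough distinct values to be divided evenly over three shards while keeping one user's neighboring records together, whereas a key on userId alone, with only two distinct values, cannot be split into three chunks at all.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical two-field compound shard key {userId: 1, recordTime: 1}.
final class CompoundKey implements Comparable<CompoundKey> {
    final String userId;
    final long recordTime;

    CompoundKey(String userId, long recordTime) {
        this.userId = userId;
        this.recordTime = recordTime;
    }

    // Order by the first field, then the second, like a ranged compound shard key.
    @Override
    public int compareTo(CompoundKey o) {
        int c = userId.compareTo(o.userId);
        return c != 0 ? c : Long.compare(recordTime, o.recordTime);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CompoundKey && compareTo((CompoundKey) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, recordTime);
    }
}

// Toy range router: shard i holds keys in [splitPoints[i-1], splitPoints[i]).
final class RangeRouter<K extends Comparable<K>> {
    private final List<K> splitPoints;

    private RangeRouter(List<K> splitPoints) {
        this.splitPoints = splitPoints;
    }

    // A chunk boundary can only fall between distinct key values, so a key with
    // fewer distinct values than shards cannot be spread over all shards.
    static <K extends Comparable<K>> RangeRouter<K> fromSample(Collection<K> sample, int shards) {
        List<K> distinct = new ArrayList<>(new TreeSet<>(sample));
        if (distinct.size() < shards) {
            throw new IllegalArgumentException(
                    distinct.size() + " distinct shard key values cannot fill " + shards + " shards");
        }
        List<K> points = new ArrayList<>();
        for (int i = 1; i < shards; i++) {
            points.add(distinct.get(i * distinct.size() / shards));
        }
        return new RangeRouter<>(points);
    }

    int shardFor(K key) {
        int shard = 0;
        while (shard < splitPoints.size() && key.compareTo(splitPoints.get(shard)) >= 0) {
            shard++;
        }
        return shard;
    }
}
```

With two users and 30 days of records each, the compound key places 20 records on each of the three shards, and a range query over one user's recent days touches a single shard.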

When new data are inserted, they are first stored on the server to which the shard key assigns them, and the other two copies are then synchronized to the other two servers. Only one of the three copies is active; it is called the primary node, handles client requests, and records all operations performed on it. The other two (secondary) nodes periodically poll the primary for these operations and apply them to their own copies of the data, and this mechanism keeps the primary and secondary data consistent. To reduce the load on the primary, queries are routed to the secondary nodes using MongoDB's read scaling mode. In addition, if the primary goes offline for any reason, the secondary nodes elect a new primary, which continues to provide service.
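The replication, secondary-read, and failover behavior described above can be sketched as a small in-memory model. ReplicaSetSketch and its methods are hypothetical; the model ignores heartbeats, election terms, member priorities, and rollback, and keeps only three ideas from the paragraph: writes go to the primary, secondaries pull and replay the primary's operations, and a new primary is elected only while a majority of members is still up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Greatly simplified replica set: ignores heartbeats, terms, priorities and rollback.
final class ReplicaSetSketch {
    static final class Member {
        final String name;
        final List<String> log = new ArrayList<>(); // operations this member has applied
        boolean up = true;

        Member(String name) {
            this.name = name;
        }
    }

    final List<Member> members = new ArrayList<>();
    Member primary;

    ReplicaSetSketch(String... names) {
        for (String n : names) {
            members.add(new Member(n));
        }
        primary = members.get(0);
    }

    // Only the primary accepts writes; it applies each one and records it in its log.
    void write(String op) {
        if (primary == null || !primary.up) {
            throw new IllegalStateException("no primary available");
        }
        primary.log.add(op);
    }

    // Each live secondary pulls the operations it has not applied yet and replays them.
    void pollOnce() {
        if (primary == null || !primary.up) {
            return;
        }
        for (Member m : members) {
            if (m == primary || !m.up) {
                continue;
            }
            for (int i = m.log.size(); i < primary.log.size(); i++) {
                m.log.add(primary.log.get(i));
            }
        }
    }

    // Secondary reads offload the primary; the result may lag behind the latest writes.
    List<String> readFromSecondary() {
        for (Member m : members) {
            if (m != primary && m.up) {
                return new ArrayList<>(m.log);
            }
        }
        throw new IllegalStateException("no secondary available");
    }

    // If the primary is down, a strict majority of members must still be up; the live
    // member with the most complete log becomes the new primary.
    void electIfNeeded() {
        if (primary != null && primary.up) {
            return;
        }
        List<Member> live = new ArrayList<>();
        for (Member m : members) {
            if (m.up) {
                live.add(m);
            }
        }
        primary = live.size() * 2 > members.size()
                ? live.stream().max(Comparator.comparingInt((Member m) -> m.log.size())).orElse(null)
                : null;
    }
}
```

With three members a majority is two, so each shard in the three-server design can lose one server and still elect a primary; losing a second server leaves the shard without a primary, and writes are refused.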

The terms used in the present invention are explained as follows:

Spring Data is an open-source framework that simplifies database access and supports cloud services.

MongoDB is a database based on distributed document storage, written in C++. It aims to provide a scalable, high-performance data storage solution for web applications.

MongoOperations is the Spring Data interface that defines operations on a MongoDB database, and MongoTemplate is the class that implements it.

A MongoDB collection is a set of documents, analogous to a table in a relational database.

A JavaBean is a reusable component written in the Java language.

CRUD is an acronym for the four basic data operations: Create, Retrieve, Update, and Delete.

A NoSQL database is a non-relational database.

JSON (JavaScript Object Notation) is a lightweight data interchange format.

BSON (short for Binary JSON) is a binary, JSON-like storage format.

REST is a set of architectural constraints and principles; an application or design that satisfies them is called RESTful.

mongod: the MongoDB server process that is started to provide the database service.

Claims

1. A data model processing method based on Spring Data for MongoDB clusters, characterized in that: a document-oriented MongoDB NoSQL database is used as the underlying data store, and the problem of join queries over unstructured data is solved by referencing data in other documents through client-side virtual join queries; the Map/Reduce computing framework provided by Spring Data for MongoDB is used, so that MapReduce tasks can process data concurrently on multiple nodes; and the automatic sharding and replication functions provided by MongoDB are used, with a two-field compound shard key selected for cluster optimization.

2. The data model processing method based on Spring Data for MongoDB clusters according to claim 1, characterized in that a document-oriented MongoDB NoSQL database is used as the underlying data store, each MongoDB collection (i.e., set or table) corresponds one-to-one to a JavaBean, and relations between collections are expressed as nested Java objects.

3. The data model processing method based on Spring Data for MongoDB clusters according to claim 1, characterized in that computation on the cluster is performed with the MapReduce model provided by MongoOperations or MongoTemplate, where MongoOperations is the Spring Data interface for MongoDB operations and MongoTemplate is its implementation.

4. The data model processing method based on Spring Data for MongoDB clusters according to claim 1, characterized in that a two-field compound shard key is used to optimize cluster management.