CN105956041A - Data model processing method based on Spring Data for MongoDB cluster - Google Patents

Info

Publication number
CN105956041A
Authority
CN
China
Prior art keywords
data
mongodb
cluster
spring
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610264378.1A
Other languages
Chinese (zh)
Inventor
王祥
张海英
胡冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu IoT Research and Development Center
Original Assignee
Jiangsu IoT Research and Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-04-26
Filing date
2016-04-26
Publication date
2016-09-21
Application filed by Jiangsu IoT Research and Development Center filed Critical Jiangsu IoT Research and Development Center
Priority to CN201610264378.1A priority Critical patent/CN105956041A/en
Publication of CN105956041A publication Critical patent/CN105956041A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data model processing method based on Spring Data for a MongoDB cluster, and in particular a MapReduce data processing method for a distributed cluster. A MongoDB document-oriented NoSQL (non-relational) database is used as the underlying data store, and data in other documents is referenced through a client-side virtual join query, which solves the problem of join queries over unstructured data. Data processing uses the Map/Reduce computing framework provided by Spring Data for MongoDB; a MapReduce task can process data concurrently on multiple nodes, which improves the system's processing speed. For storage, the method provides a highly available and scalable data center for the long-term operation of a mass data storage service. A highly available data cluster that supports horizontal and vertical scaling is designed with MongoDB, and a two-field compound key is used for optimization, so that CRUD (Create, Retrieve, Update and Delete) operations can exploit locality to improve query efficiency and the cluster supports cloud-level elasticity.

Description

Data model processing method based on Spring Data for MongoDB cluster
Technical field
The present invention relates to a data model processing method based on Spring Data for a MongoDB cluster, and in particular to a MapReduce data processing method for a distributed cluster.
Background art
Data in the big-data era is characterized by diverse formats, large volume, lack of structure, and the absence of unified storage management, and data volumes are growing rapidly. Storing, analyzing, understanding, and using such large-scale data efficiently and without error has become a critical problem. Relational databases fall short in several ways. Once the stored data reaches a certain scale, SQL queries (especially multi-table joins) over tables with hundreds of millions of rows become very inefficient. Relational databases are hard to scale horizontally, so requirements for scalability and high availability are difficult to meet. The relational model is currently the most widely used data organization model, but it is poorly suited to storing unstructured data and processing hierarchical data. A single computing cluster also cannot meet the processing demand. Integrating heterogeneous resources into a distributed computing and storage model is therefore a feasible approach to storing and processing data resources.
NoSQL is a broad term for non-relational data stores, which offer performance advantages over relational databases for big-data access. Traditional relational databases expose problems that are hard to overcome when handling large-scale data analysis and high concurrency, for example the demand for highly concurrent reads and writes, the demand for efficient storage of and access to massive data, and the demand for high scalability and high availability.
The data structures that MongoDB supports, together with its ability to scale out as a cluster, make it well suited to storing and processing data with the above characteristics, such as health record data. Spring Data for MongoDB provides a rich class library for statistical queries and data processing, and in particular supports MapReduce-style distributed computation. MapReduce is a distributed computing framework proposed by Google that borrows ideas from functional programming to process large-scale data sets efficiently in a distributed manner.
Summary of the invention
The object of the invention is to provide a data model processing method based on Spring Data for a MongoDB cluster, namely a MapReduce sharded-query and processing method for a MongoDB cluster, which solves the difficulty of storing unstructured data and processing hierarchical data.
To achieve this object, the invention adopts the following technical solution:
First, a MongoDB document-oriented NoSQL database is used as the underlying data store. MongoDB does not support join queries between documents; by default, a referenced object is stored in the same document as its parent object. Data in other documents can instead be referenced through a client-side virtual join query, which solves this problem.
To reduce data processing time in the service layer while taking the distributed storage of the data into account, the invention uses the Map/Reduce computing framework for statistical queries. A MapReduce task can process data concurrently on multiple nodes, which improves the system's processing speed.
To provide a highly available and scalable data center for the long-term operation of a mass data storage service, the invention uses the auto-sharding and replication features of MongoDB to design a highly available data cluster that supports horizontal and vertical scaling. Storage servers can be added and removed dynamically, which supports cloud-level elasticity.
MongoDB shards data according to a shard key, and the choice of shard key strongly affects both how evenly data is distributed and how efficient queries are. The invention therefore uses a two-field compound key for optimization. This ensures that inserted data is evenly distributed across the three shards, lets CRUD operations exploit locality to improve query efficiency, and guarantees a sufficiently fine partitioning granularity, so that data can still be split after new machines are added later. It also ensures the high availability of the system.
Brief description of the drawings
Fig. 1 is a sequence diagram of data statistics processing.
Fig. 2 shows the architecture design of the storage service cluster.
Detailed description of the invention
The invention is further described below with reference to the drawings and embodiments.
1. Spring Data for MongoDB infrastructure configuration
The main purpose of the Spring Data template, like that of every other Spring template, is resource allocation and exception handling. The resource here is a data storage resource, which is usually accessed over a remote TCP/IP connection. The following example shows how to configure the MongoDB template in XML:
<!-- Connection to the MongoDB server; the factory bean is registered as "mongoDbFactory" by default -->
<mongo:db-factory host="localhost" port="27017" dbname="test" />
<!-- MongoDB template, built on the connection factory above -->
<bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
    <constructor-arg name="mongoDbFactory" ref="mongoDbFactory"/>
</bean>
A connection factory must be defined first; MongoTemplate then references it. In this example, Spring Data uses a lower-level database driver, the MongoDB Java driver.
In general, such lower-level database drivers have their own exception handling strategy. Spring's exception handling uses unchecked exceptions, so developers can decide for themselves whether to catch them. The MongoDB template catches the underlying exceptions and wraps them in unchecked exceptions, all of which are subclasses of Spring's DataAccessException.
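The wrapping of driver exceptions into unchecked exceptions can be illustrated with a minimal, standard-library-only sketch. It does not use Spring itself: `DataAccessSketchException`, `DriverCall` and `TemplateSketch` are hypothetical names invented for this illustration, and `IOException` merely stands in for whatever a low-level driver might throw.

```java
import java.io.IOException;

class TemplateSketch {
    // Runs a driver call and rethrows any checked failure as an unchecked exception,
    // so callers may catch it or simply let it propagate.
    static <T> T execute(DriverCall<T> call) {
        try {
            return call.run();
        } catch (IOException e) {
            throw new DataAccessSketchException("data access failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(() -> "saved"));
        try {
            execute(() -> { throw new IOException("connection refused"); });
        } catch (DataAccessSketchException e) {
            System.out.println(e.getMessage());
        }
    }
}

// Hypothetical low-level driver call that reports failures with a checked exception.
interface DriverCall<T> {
    T run() throws IOException;
}

// Hypothetical stand-in for an unchecked data access exception hierarchy.
class DataAccessSketchException extends RuntimeException {
    DataAccessSketchException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

The caller in `main` chooses to catch the unchecked exception; code that does not care about the failure could omit the `try` block entirely, which is the freedom the paragraph above describes.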
The template provides operations on the data store, such as saving, updating, and deleting a single record or executing a query. All of these methods, however, apply only to the corresponding underlying data store.
2. Data layer design
A MongoDB document-oriented NoSQL database is used as the underlying data store. Data is stored in BSON format and viewed in JSON form. Each MongoDB collection corresponds one-to-one to a JavaBean, and relationships between collections are expressed by nesting Java objects.
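As a rough illustration of the bean-to-document correspondence, the sketch below nests one bean inside another and builds the JSON-like shape such a document might take. The `Patient` and `Visit` beans and the `toDocument` helper are hypothetical; in a real application the mapping would presumably be done by Spring Data rather than by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedBeanSketch {
    // Hypothetical nested bean: stored inside the parent document.
    static class Visit {
        private final String date;
        private final String diagnosis;
        Visit(String date, String diagnosis) { this.date = date; this.diagnosis = diagnosis; }
        String getDate() { return date; }
        String getDiagnosis() { return diagnosis; }
    }

    // Hypothetical top-level bean: one instance corresponds to one document in a collection.
    static class Patient {
        private final String id;
        private final List<Visit> visits = new ArrayList<>();
        Patient(String id) { this.id = id; }
        String getId() { return id; }
        List<Visit> getVisits() { return visits; }
    }

    // Builds the JSON-like shape of the document: nested beans become nested maps.
    static Map<String, Object> toDocument(Patient patient) {
        List<Map<String, Object>> visits = new ArrayList<>();
        for (Visit v : patient.getVisits()) {
            Map<String, Object> visit = new LinkedHashMap<>();
            visit.put("date", v.getDate());
            visit.put("diagnosis", v.getDiagnosis());
            visits.add(visit);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", patient.getId());
        doc.put("visits", visits);
        return doc;
    }

    public static void main(String[] args) {
        Patient patient = new Patient("p1");
        patient.getVisits().add(new Visit("2016-04-26", "flu"));
        // prints {_id=p1, visits=[{date=2016-04-26, diagnosis=flu}]}
        System.out.println(toDocument(patient));
    }
}
```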
MongoDB does not support join queries between documents; by default, a referenced object is stored in the same document as its parent object. Data in other documents can also be referenced through a client-side virtual join query, which solves this problem.
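The patent does not spell out how the client-side virtual join is implemented. One plausible reading, sketched here with in-memory maps standing in for two collections, is that the client fetches the main documents, looks up each referenced document by its identifier, and merges the two. The field names (`patientId`, `patient`) and the `join` helper are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClientSideJoinSketch {
    // Resolves the reference field of each main document against a second collection
    // and returns merged copies; the lookup happens in the client, not in the database.
    static List<Map<String, Object>> join(List<Map<String, Object>> mainDocs,
                                          String refField,
                                          Map<Object, Map<String, Object>> referencedById,
                                          String targetField) {
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> doc : mainDocs) {
            Map<String, Object> copy = new HashMap<>(doc);
            Map<String, Object> referenced = referencedById.get(doc.get(refField));
            if (referenced != null) {
                copy.put(targetField, referenced);
            }
            joined.add(copy);
        }
        return joined;
    }

    public static void main(String[] args) {
        // Second "collection", indexed by _id.
        Map<Object, Map<String, Object>> patients = new HashMap<>();
        patients.put("p1", Map.<String, Object>of("_id", "p1", "name", "Alice"));
        // Main "collection": each record refers to a patient by id.
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(Map.<String, Object>of("_id", "r1", "patientId", "p1", "pulse", 72));
        System.out.println(join(records, "patientId", patients, "patient"));
    }
}
```

In a real deployment the second lookup would be another query against the cluster, so batching the referenced identifiers into one query would likely matter for performance.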
The invention uses the Map/Reduce computing framework. The computation is divided into two steps, Map and Reduce, and proceeds as follows: the system creates a MapReduce master process, which assigns a large number of Map tasks to the data storage nodes of the cluster; each Map task processes the data on its node with the Map function and saves the intermediate results as key-value pairs; the Reduce task collects the intermediate data produced by the Map tasks and returns the final result. A MapReduce task can process data concurrently on multiple nodes, which improves the system's processing speed.
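The two steps described above can be sketched in plain Java. This is not the Spring Data or MongoDB MapReduce API; it only mimics the flow, with each inner list standing in for the data on one storage node and a parallel stream standing in for concurrent Map tasks. The counting job and the sample values are assumptions chosen for the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapReduceSketch {
    // Map step: one task per node turns that node's records into key/count pairs.
    static Map<String, Integer> mapNode(List<String> nodeRecords) {
        Map<String, Integer> intermediate = new TreeMap<>();
        for (String key : nodeRecords) {
            intermediate.merge(key, 1, Integer::sum);
        }
        return intermediate;
    }

    // Reduce step: collect the intermediate pairs from all map tasks and sum per key.
    static Map<String, Integer> reduce(List<Map<String, Integer>> intermediates) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> partial : intermediates) {
            partial.forEach((key, count) -> result.merge(key, count, Integer::sum));
        }
        return result;
    }

    // Runs the map tasks concurrently (one per node), then reduces their output.
    static Map<String, Integer> run(List<List<String>> nodes) {
        List<Map<String, Integer>> intermediates = nodes.parallelStream()
                .map(MapReduceSketch::mapNode)
                .collect(Collectors.toList());
        return reduce(intermediates);
    }

    public static void main(String[] args) {
        List<List<String>> nodes = List.of(
                List.of("fever", "cough"),
                List.of("fever"),
                List.of("cough", "fever"));
        System.out.println(run(nodes)); // prints {cough=2, fever=3}
    }
}
```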
The data processed by the service layer is obtained from the underlying cluster. The sequence diagram of data statistics processing in this system is shown in Fig. 1.
3. Storage system cluster design
A MongoDB cluster consists of config server processes, shard nodes, and router processes. As shown in Fig. 2, the system uses three shards, and each shard is a replica set made up of three independent servers. The config servers use the same backup strategy; they store the configuration information of the whole cluster and are its most important part. Because a routing server stores neither data nor configuration information and only caches information from the config servers, a single routing server is sufficient. MongoDB shards data according to a shard key, and the choice of shard key strongly affects both how evenly data is distributed and how efficient queries are. The invention therefore uses a two-field compound key for optimization. This ensures that inserted data is evenly distributed across the three shards, lets CRUD operations exploit locality to improve query efficiency, and guarantees a sufficiently fine partitioning granularity, so that data can still be split after new machines are added later. It also ensures the high availability of the system.
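The patent does not name the two fields of the compound key. The sketch below assumes a coarse field (`deviceId`) followed by a fine-grained field (`timestamp`), both hypothetical, and models range-based chunks ordered by the compound key. It shows the two properties claimed above: keys that share a prefix and are close in the second field land on the same shard (locality), and a chunk boundary can fall inside a single prefix (fine granularity). It is a simplified model, not MongoDB's actual chunk routing.

```java
import java.util.Map;
import java.util.TreeMap;

class CompoundShardKeySketch {
    // Hypothetical two-field shard key: a coarse field followed by a fine-grained field.
    static final class ShardKey implements Comparable<ShardKey> {
        final String deviceId;
        final long timestamp;

        ShardKey(String deviceId, long timestamp) {
            this.deviceId = deviceId;
            this.timestamp = timestamp;
        }

        // Orders by the first field, then by the second, like a compound index.
        @Override
        public int compareTo(ShardKey other) {
            int byDevice = deviceId.compareTo(other.deviceId);
            return byDevice != 0 ? byDevice : Long.compare(timestamp, other.timestamp);
        }
    }

    // Chunk lower bound -> shard name, kept in compound-key order.
    private final TreeMap<ShardKey, String> chunks = new TreeMap<>();

    void addChunk(ShardKey lowerBound, String shard) {
        chunks.put(lowerBound, shard);
    }

    // Routes a key to the chunk with the greatest lower bound not above the key.
    String shardFor(ShardKey key) {
        Map.Entry<ShardKey, String> chunk = chunks.floorEntry(key);
        if (chunk == null) {
            throw new IllegalArgumentException("no chunk covers this key");
        }
        return chunk.getValue();
    }

    public static void main(String[] args) {
        CompoundShardKeySketch router = new CompoundShardKeySketch();
        router.addChunk(new ShardKey("", Long.MIN_VALUE), "shard1");
        router.addChunk(new ShardKey("dev-b", Long.MIN_VALUE), "shard2");
        // The second field lets a chunk boundary fall inside a single deviceId.
        router.addChunk(new ShardKey("dev-b", 1000L), "shard3");
        System.out.println(router.shardFor(new ShardKey("dev-b", 500L)));  // prints shard2
        System.out.println(router.shardFor(new ShardKey("dev-b", 2000L))); // prints shard3
    }
}
```

With a single-field key on `deviceId` alone, all documents of one device would form an unsplittable unit, which is the situation the compound key is meant to avoid.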
When new data is inserted, it is first stored on the server selected by the shard key, and the other two copies are then synchronized to the other two servers. Only one of the three copies is active; it is called the primary node, handles client requests, and records all operations performed on it. The other two secondary nodes periodically poll the primary for these operations and apply them to their own copies of the data, which keeps the primary and secondary data consistent. To reduce the load on the primary, queries are directed to secondary nodes using MongoDB's read scaling. In addition, when the primary goes offline for some reason, the secondaries elect a new primary, which continues to provide service.
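The polling mechanism described above can be sketched as a toy model: the primary appends every write to an operation log, and a secondary remembers how many entries it has already applied and replays the rest on each poll. The `Member` class and its fields are hypothetical, and election of a new primary is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReplicaSetSketch {
    // A member keeps its own data copy; the primary also appends every write to a log.
    static class Member {
        final Map<String, String> data = new HashMap<>();
        final List<String[]> oplog = new ArrayList<>(); // each entry: {key, value}
        int applied = 0; // number of primary log entries this secondary has replayed

        // Primary side: apply the write locally and record it in the operation log.
        void write(String key, String value) {
            data.put(key, value);
            oplog.add(new String[] {key, value});
        }

        // Secondary side: fetch operations not yet seen from the primary and replay them.
        void pollFrom(Member primary) {
            while (applied < primary.oplog.size()) {
                String[] op = primary.oplog.get(applied++);
                data.put(op[0], op[1]);
            }
        }
    }

    public static void main(String[] args) {
        Member primary = new Member();
        Member secondary = new Member();
        primary.write("r1", "pulse=72");
        primary.write("r1", "pulse=75");
        secondary.pollFrom(primary);
        System.out.println(secondary.data.equals(primary.data)); // prints true
    }
}
```

Between polls a secondary may lag behind the primary, so reads served from secondaries can be slightly stale; that is the trade-off accepted when queries are moved off the primary.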
Some terms used in the invention are explained as follows:
Spring Data is an open-source framework that simplifies database access and supports cloud services.
MongoDB is a database based on distributed document storage, written in C++. It aims to provide a scalable, high-performance data storage solution for web applications.
MongoOperations and MongoTemplate are the two Spring Data entry points for accessing a MongoDB database: MongoOperations is the interface, and MongoTemplate is the class that implements it.
A MongoDB Collection is a set of documents, analogous to a table.
A JavaBean is a reusable component written in the Java language.
CRUD is an acronym formed from the initials of Create, Retrieve, Update, and Delete, the basic operations performed on data.
A NoSQL database is a non-relational database.
JSON (JavaScript Object Notation) is a lightweight data interchange format.
BSON is a JSON-like binary storage format, short for Binary JSON.
REST refers to a set of architectural constraints and principles. An application or design that satisfies these constraints and principles is RESTful.
mongod: the process that starts the MongoDB service.

Claims (4)

1. A data model processing method based on Spring Data for a MongoDB cluster, characterized in that: a MongoDB document-oriented NoSQL database is used as the underlying data store, and data in other documents is referenced through a client-side virtual join query to solve the problem of join queries over unstructured data; the Map/Reduce computing framework provided by Spring Data for MongoDB is used, in which a MapReduce task can process data concurrently on multiple nodes; and the auto-sharding and replication features provided by MongoDB are used, with a two-field compound key selected to optimize the cluster.
2. The data model processing method based on Spring Data for a MongoDB cluster according to claim 1, characterized in that a MongoDB document-oriented NoSQL database is used as the underlying data store, each MongoDB Collection (that is, a set or table) corresponds one-to-one to a JavaBean, and relationships between Collections are expressed by nesting Java objects.
3. The data model processing method based on Spring Data for a MongoDB cluster according to claim 1, characterized in that computation on the cluster is carried out with the MapReduce model provided by MongoOperations or MongoTemplate; MongoOperations and MongoTemplate are the two Spring Data access interfaces to the MongoDB database.
4. The data model processing method based on Spring Data for a MongoDB cluster according to claim 1, characterized in that a two-field compound key is used to optimize the management of the cluster.
CN201610264378.1A 2016-04-26 2016-04-26 Data model processing method based on Spring Data for MongoDB cluster Pending CN105956041A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610264378.1A CN105956041A (en) 2016-04-26 2016-04-26 Data model processing method based on Spring Data for MongoDB cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610264378.1A CN105956041A (en) 2016-04-26 2016-04-26 Data model processing method based on Spring Data for MongoDB cluster

Publications (1)

Publication Number Publication Date
CN105956041A true CN105956041A (en) 2016-09-21

Family

ID=56915382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610264378.1A Pending CN105956041A (en) 2016-04-26 2016-04-26 Data model processing method based on Spring Data for MongoDB cluster

Country Status (1)

Country Link
CN (1) CN105956041A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207919A (en) * 2013-04-26 2013-07-17 北京亿赞普网络技术有限公司 Method and device for quickly inquiring and calculating MangoDB cluster
CN103700010A (en) * 2013-12-30 2014-04-02 世纪禾光科技发展(北京)有限责任公司 Commodity trajectory system and correlation method
CN104731907A (en) * 2015-03-24 2015-06-24 浪潮集团有限公司 NOSQL-based data storage method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OSCHINA: "MongoDB 索引技巧 #3:太多字段要索引怎么办?使用通用索引", 《HTTPS://WWW.OSCHINA.NET/TRANSLATE/MONGODB-INDEXING-TIP-3-TOO-MANY-FIELDS-TO-INDEX-USE#COMMENTS》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491544A (en) * 2017-08-25 2017-12-19 上海德拓信息技术股份有限公司 A kind of data processing platform (DPP) for strengthening non-relational database analysis ability
CN107491544B (en) * 2017-08-25 2020-12-29 上海德拓信息技术股份有限公司 Data processing platform for enhancing analysis capability of non-relational database
CN108829805A (en) * 2018-06-06 2018-11-16 福建南威软件有限公司 A kind of fragment storage method based on MongoDB
CN109344198A (en) * 2018-09-19 2019-02-15 国网浙江省电力有限公司嘉兴供电公司 Log system and sharding method based on MongoDB distributed type assemblies framework
CN114327261A (en) * 2021-12-06 2022-04-12 神州融安数字科技(北京)有限公司 Data file storage method and data security agent

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160921
