CN103678603B - Multi-source heterogeneous data efficient converging and storing frame system - Google Patents

Publication number
CN103678603B
Authority
CN
China
Prior art keywords
data
module
source heterogeneous
subsystem
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310687009.XA
Other languages
Chinese (zh)
Other versions
CN103678603A (en)
Inventor
葛浩栋
陈曙东
刘文娣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai state core Internet of things Technology Co., Ltd.
Original Assignee
Jiangsu IoT Research and Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-12-13
Filing date
2013-12-13
Publication date
2017-01-25
Application filed by Jiangsu IoT Research and Development Center
Priority to CN201310687009.XA
Publication of CN103678603A
Application granted
Publication of CN103678603B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a framework system for the efficient convergence and storage of multi-source heterogeneous data. The system comprises a multi-source heterogeneous data cooperative management subsystem, a multi-source heterogeneous data high-speed cache subsystem, and a multi-source heterogeneous data convergence storage subsystem. The cooperative management subsystem comprises a data management module, a resource monitoring and management module, and a retrieval index module, and it controls and coordinates the whole storage framework. The high-speed cache subsystem comprises an unstructured file cache module, an in-memory database module, and a delayed write module, and it provides fast, efficient reads of heterogeneous data. The convergence storage subsystem comprises an unstructured data processing module, a structured data processing module, and a distributed file system, and it provides efficient convergence and storage of heterogeneous data. The system addresses the low storage efficiency and the lack of data convergence management found in the massive-data environment of the Internet of Things.

Description

Efficient convergence and access architecture system for multi-source heterogeneous data
Technical field
The present invention relates to a system architecture, and in particular to a basic framework for the efficient convergence and access of multi-source heterogeneous data in the Internet of Things. It belongs to the technical field of Internet of Things big data storage.
Background
With the rapid development of Internet of Things technology, the number and variety of terminals and basic data collection devices keep increasing, and these devices produce massive amounts of data at all times. The data are of many kinds and fall into structured data and unstructured data. In the massive-data environment of the Internet of Things, traditional distributed file storage systems have low storage efficiency and lack data convergence management. A new basic framework for storing multi-source heterogeneous Internet of Things data is urgently needed to achieve fast, efficient convergence of and access to massive heterogeneous data.
Summary of the invention
The object of the present invention is to provide an efficient convergence and access architecture system for multi-source heterogeneous data that meets the current demand for fast, efficient storage of such data. The technical solution adopted by the present invention is as follows:
An efficient convergence and access architecture system for multi-source heterogeneous data comprises:
a multi-source heterogeneous data cooperative management subsystem, a multi-source heterogeneous data high-speed cache subsystem, and a multi-source heterogeneous data convergence storage subsystem.
The cooperative management subsystem includes three modules: a data management module, a resource monitoring and management module, and a retrieval index module.
The high-speed cache subsystem includes three modules: an unstructured file cache module, an in-memory database module, and a delayed write module.
The convergence storage subsystem includes an unstructured data processing module, a structured data processing module, and a distributed file system. The unstructured data processing module includes a file splitting submodule, a file combining submodule, and a file verification submodule; the structured data processing module includes a file generation submodule and a file management submodule.
The cooperative management subsystem controls and coordinates the whole access architecture system. Its data management module is responsible for uploading, downloading, and modifying multi-source heterogeneous data and for providing API support to the application layer. Its resource monitoring and management module monitors resource usage in the high-speed cache subsystem and the convergence storage subsystem, and issues an early warning when the physical cache resources or physical storage resources of these two subsystems become abnormal or run short. Its retrieval index module provides data access indexes for the high-speed cache subsystem and the convergence storage subsystem.
The high-speed cache subsystem provides fast, efficient reads of heterogeneous data. Its unstructured file cache module uses the cache and a least recently used algorithm to speed up application-layer reads of unstructured data. Its in-memory database module uses the cache so that structured data can be operated on in memory. Its delayed write module writes files modified in the cache to the distributed file system after a delay determined by configured rules.
The convergence storage subsystem implements the efficient convergence and storage of heterogeneous data. Its unstructured data processing module uses the file splitting submodule to split a single large unstructured file into blocks and store them in the distributed file system, and uses the file combining submodule and the file verification submodule to recombine the split blocks held in the distributed file system. Its structured data processing module uses the file generation submodule and the file management submodule to convert structured data tables into XML files according to configured rules, and stores the converted XML files in the distributed file system.
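The table-to-XML conversion performed by the structured data processing module can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `rows_to_xml_by_period`, the `period_key` callback, and the XML layout (`table`, `row`, one element per column) are all assumptions, since the source only states that tables are converted to XML "according to configured rules".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def rows_to_xml_by_period(table_name, rows, period_key):
    """Group table rows by a period rule and build one XML document per period.

    rows is a list of dicts (column name -> value). period_key is a
    hypothetical callback that maps a row to its period label.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[period_key(row)].append(row)

    documents = {}
    for period, members in groups.items():
        root = ET.Element("table", name=table_name, period=period)
        for row in members:
            row_el = ET.SubElement(root, "row")
            for column, value in row.items():
                # Column names are assumed to be valid XML element names.
                ET.SubElement(row_el, column).text = str(value)
        documents[period] = ET.tostring(root, encoding="unicode")
    return documents


readings = [
    {"sensor": "t1", "ts": "2013-12-13T08:00", "value": 21.5},
    {"sensor": "t1", "ts": "2013-12-13T09:00", "value": 22.0},
    {"sensor": "t2", "ts": "2013-12-14T08:00", "value": 19.0},
]
# Rule assumed here: one XML file per calendar day.
xml_docs = rows_to_xml_by_period("readings", readings, lambda r: r["ts"][:10])
```

Each resulting string would then be stored as one file in the distributed file system; how files are named and managed there is left open by the source.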
Further, when the access architecture system performs efficient convergence and access of multi-source heterogeneous data:
After multi-source heterogeneous data enters the system from the application layer through the data management module of the cooperative management subsystem, it is read according to its data structure: unstructured data by the unstructured data processing module, and structured data by the structured data processing module, of the convergence storage subsystem. After the corresponding processing, the data is sent to the distributed file system.
When the application layer needs data, it sends an instruction to the data management module. This module calls the retrieval index module, which uses the file identification number or a keyword to search the unstructured file cache module and the in-memory database module of the high-speed cache subsystem. If the required data is not found, the data management module sends an instruction to the distributed file system and locates the required source data there. After data combination or XML file conversion, the data is transferred to the unstructured file cache module or the in-memory database module of the high-speed cache subsystem, respectively, and is then transmitted to the application layer by the data management module.
When the application layer needs the same data again, the data management module delivers it to the application layer directly from the unstructured file cache module or the in-memory database module.
When the application layer needs to rewrite data it has fetched, the data management module modifies the corresponding data in the unstructured file cache module or the in-memory database. For structured data, the rewrite is temporarily recorded in the delayed write module as a log; for unstructured data, the rewrite is temporarily kept in the delayed write module as files. The delayed write module lets the application layer write and update the required data repeatedly, and the unstructured file cache module or the in-memory database speeds up writes and reads of that data. After one time period has elapsed, the delayed write module sends the modified data to the distributed file system to complete the final data update.
Advantages of the present invention: the invention provides a sound and reasonable access architecture that achieves efficient convergence and storage of heterogeneous data. It effectively solves the problems of low storage efficiency and lack of data convergence management in the massive-data environment of the Internet of Things.
Brief description of the drawings
Fig. 1 is a structural block diagram of the present invention.
Detailed description
The invention is further described below with reference to the drawing and an embodiment.
As shown in Fig. 1, the framework of the efficient convergence and access architecture system for multi-source heterogeneous data includes three subsystems: the multi-source heterogeneous data cooperative management subsystem, the multi-source heterogeneous data high-speed cache subsystem, and the multi-source heterogeneous data convergence storage subsystem. The cooperative management subsystem includes three modules: a data management module, a resource monitoring and management module, and a retrieval index module. The high-speed cache subsystem includes three modules: an unstructured file cache module, an in-memory database module, and a delayed write module. The convergence storage subsystem includes an unstructured data processing module, a structured data processing module, and a distributed file system. The unstructured data processing module includes a file splitting submodule, a file combining submodule, and a file verification submodule; the structured data processing module includes a file generation submodule and a file management submodule.
The function of the multi-source heterogeneous data cooperative management subsystem is to control and coordinate the whole access architecture system. The data management module is the master control module; its main functions are multi-source heterogeneous data upload, data download, data modification, and API (application programming interface) support for the application layer. The data upload function uploads data submitted by the application layer to the convergence storage subsystem, where the data is split or converted according to its structural features and then enters the distributed file system. The data download function returns the data requested by the application layer to the application layer. The main function of the resource monitoring and management module is to monitor resource usage in the high-speed cache subsystem and the convergence storage subsystem, and to issue an early warning when the physical cache resources or physical storage resources (such as hard disk capacity) of these two subsystems become abnormal or run short. The main function of the retrieval index module is to provide data access indexes for the high-speed cache subsystem and the convergence storage subsystem, so that the application layer can operate on data conveniently.
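The early-warning behavior of the resource monitoring and management module could look roughly like the sketch below. The source does not say how "abnormal" or "in short supply" is decided, so the `ResourceSample` type, the `check_resources` function, and the 90% threshold are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    name: str      # hypothetical label, e.g. "cache-memory" or "disk"
    used: int      # amount in use (bytes in a real deployment)
    capacity: int  # total amount available


def check_resources(samples, warn_ratio=0.9):
    """Return one warning message per abnormal or nearly exhausted resource.

    warn_ratio is an assumed threshold; the source gives no concrete value.
    """
    warnings = []
    for s in samples:
        if s.capacity <= 0 or s.used < 0 or s.used > s.capacity:
            # Readings that cannot be valid are treated as "abnormal".
            warnings.append(f"{s.name}: abnormal reading")
        elif s.used / s.capacity >= warn_ratio:
            warnings.append(f"{s.name}: {s.used / s.capacity:.0%} used, running short")
    return warnings


alerts = check_resources([
    ResourceSample("cache-memory", used=7, capacity=8),
    ResourceSample("disk", used=95, capacity=100),
    ResourceSample("spare-disk", used=-1, capacity=100),
])
```

In a running system the samples would come from the cache subsystem and the storage subsystem on a schedule; here they are hard-coded so the check itself stays easy to follow.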
The main function of the multi-source heterogeneous data high-speed cache subsystem is to provide fast, efficient reads of heterogeneous data. The unstructured file cache module mainly uses the cache (physical memory) and the least recently used (LRU) algorithm to speed up application-layer reads of unstructured data. The in-memory database module mainly uses the cache (physical memory) so that structured data can be operated on in memory. The delayed write module mainly solves the problem of synchronizing data with the convergence storage subsystem after writes by multiple tenants: it writes files modified in the cache to the distributed file system after a delay determined by configured rules.
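A least recently used cache of the kind the unstructured file cache module relies on can be sketched in a few lines. The class name `LRUFileCache`, its methods, and the entry-count capacity are assumptions made for illustration; a real module would more likely bound the cache by bytes of physical memory.

```python
from collections import OrderedDict


class LRUFileCache:
    """Keep at most `capacity` files in memory, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest access first, newest last

    def get(self, file_id):
        if file_id not in self._entries:
            return None  # cache miss: the caller falls back to the file system
        self._entries.move_to_end(file_id)  # mark as most recently used
        return self._entries[file_id]

    def put(self, file_id, content):
        self._entries[file_id] = content
        self._entries.move_to_end(file_id)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used


cache = LRUFileCache(capacity=2)
cache.put("a", b"A")
cache.put("b", b"B")
cache.get("a")        # "a" is now more recent than "b"
cache.put("c", b"C")  # the cache is full, so "b" is evicted
```

`OrderedDict` keeps the access order, which makes both the lookup and the eviction constant-time operations.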
The main function of the multi-source heterogeneous data convergence storage subsystem is to implement the efficient convergence and storage of heterogeneous data. The unstructured data processing module uses the file splitting submodule to split a single large unstructured file into blocks and store them in the distributed file system, and uses the file combining submodule and the file verification submodule to recombine the split blocks held in the distributed file system. Most unstructured files are very large, which hinders efficient storage and access; splitting and combining make it possible to access unstructured data of any size efficiently. In addition, the structured data processing module uses the file generation submodule and the file management submodule to convert structured data tables into XML files according to configured rules (for example, by time period) and stores the converted XML files in the distributed file system, which completes the efficient convergence and storage of heterogeneous data.
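The split, combine, and verify steps for large unstructured files can be illustrated as below. The source does not specify a block size or a verification method; fixed-size blocks, a per-block SHA-256 checksum, and the names `split_file` and `combine_blocks` are assumptions chosen for this sketch.

```python
import hashlib


def split_file(data: bytes, block_size: int):
    """Split data into fixed-size blocks and record a checksum for each block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    manifest = {
        "size": len(data),
        "checksums": [hashlib.sha256(block).hexdigest() for block in blocks],
    }
    return blocks, manifest


def combine_blocks(blocks, manifest):
    """Verify every block against the manifest, then join the blocks in order."""
    if len(blocks) != len(manifest["checksums"]):
        raise ValueError("block count does not match the manifest")
    for index, (block, expected) in enumerate(zip(blocks, manifest["checksums"])):
        if hashlib.sha256(block).hexdigest() != expected:
            raise ValueError(f"block {index} failed verification")
    data = b"".join(blocks)
    if len(data) != manifest["size"]:
        raise ValueError("combined size does not match the manifest")
    return data


original = bytes(range(256)) * 10  # 2560 bytes of sample content
blocks, manifest = split_file(original, block_size=1024)
restored = combine_blocks(blocks, manifest)
```

Storing the manifest next to the blocks lets the combining step detect a missing or corrupted block before any data is returned to the cache.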
The efficient convergence and access procedure of the system is as follows. After multi-source heterogeneous data enters the system for the first time from the application layer through the data management module of the cooperative management subsystem, it is read according to its data structure: unstructured data by the unstructured data processing module, and structured data by the structured data processing module, of the convergence storage subsystem. After the corresponding processing, the data is sent to the distributed file system. The distributed file system can be deployed with Swift, which is currently popular and mature.
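The ingestion step, in which data is routed by its structure to the matching processing module, might be sketched like this. How the system actually distinguishes structured from unstructured input is not described in the source; treating raw bytes as unstructured and a list of row dictionaries as structured is an assumption, as are the names `classify`, `ingest`, and the placeholder processors.

```python
def classify(payload):
    """Decide which processing module should handle the payload (assumed rule)."""
    if isinstance(payload, (bytes, bytearray)):
        return "unstructured"
    if isinstance(payload, list) and all(isinstance(row, dict) for row in payload):
        return "structured"
    raise TypeError("unsupported payload type")


def ingest(name, payload, processors, dfs):
    """Run the matching processor and store its output objects in the file system.

    processors maps a data kind to a function returning {object_key: content}.
    dfs is a plain dict standing in for the distributed file system.
    """
    kind = classify(payload)
    for key, value in processors[kind](name, payload).items():
        dfs[key] = value
    return kind


# Placeholder processors: a real system would split files or emit XML here.
processors = {
    "unstructured": lambda name, data: {f"{name}.part0": bytes(data)},
    "structured": lambda name, rows: {f"{name}.xml": repr(rows)},
}
dfs = {}
kinds = [
    ingest("video", b"\x00\x01", processors, dfs),
    ingest("readings", [{"t": 1}], processors, dfs),
]
```

Keeping the processors in a lookup table means a new data kind can be supported without changing the ingestion function itself.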
When the application layer needs data, it sends an instruction to the data management module. This module calls the retrieval index module, which uses the file identification number or a keyword to search the unstructured file cache module and the in-memory database module of the high-speed cache subsystem. If the required data is not found, the data management module sends an instruction to the distributed file system and locates the required source data there. After data combination or XML file conversion, the data is transferred to the unstructured file cache module or the in-memory database module of the high-speed cache subsystem, respectively, and is then transmitted to the application layer by the data management module.
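The read path described here is a read-through cache: look in the cache first, fall back to the distributed file system on a miss, and populate the cache before returning. The sketch below shows only that control flow. The `DataManager` class, its attributes, and the use of plain dictionaries for the cache and the file system are assumptions; block combination and XML conversion are omitted.

```python
class DataManager:
    """Serve reads from the cache, falling back to the file system on a miss."""

    def __init__(self, dfs):
        self.dfs = dfs       # dict standing in for the distributed file system
        self.cache = {}      # stands in for the file cache and in-memory database
        self.dfs_reads = 0   # counts how often the slow path was taken

    def read(self, file_id):
        if file_id in self.cache:
            return self.cache[file_id]  # fast path: already cached
        self.dfs_reads += 1
        data = self.dfs[file_id]        # slow path; raises KeyError if unknown
        self.cache[file_id] = data      # populate the cache for later reads
        return data


manager = DataManager(dfs={"doc-1": b"sensor log"})
first = manager.read("doc-1")   # miss: fetched from the file system
second = manager.read("doc-1")  # hit: served from the cache
```

The counter makes the effect visible: two reads of the same identifier cause a single access to the file system.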
When the application layer needs the same data again, the data management module delivers it to the application layer directly from the high-speed unstructured file cache module or the in-memory database module.
When the application layer needs to rewrite data it has fetched, the data management module modifies the corresponding data in the unstructured file cache module or the in-memory database. For structured data, the rewrite is temporarily recorded in the delayed write module as a log; for unstructured data, the rewrite is temporarily kept in the delayed write module as files. The delayed write module lets the application layer write and update the required data repeatedly, and the unstructured file cache module or the in-memory database speeds up writes and reads of that data. After one time period has elapsed, the delayed write module sends the modified data to the distributed file system to complete the final data update.
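The delayed write behavior amounts to a write-back buffer that is flushed once per time period. The sketch below uses a single in-memory journal for all pending changes, which is a simplification: the source keeps a log for structured data and temporary files for unstructured data. The `DelayedWriter` class, its method names, and the caller-supplied clock value are assumptions.

```python
class DelayedWriter:
    """Buffer rewrites and push them to the file system once per flush interval."""

    def __init__(self, dfs, flush_interval):
        self.dfs = dfs                    # dict standing in for the file system
        self.flush_interval = flush_interval
        self.journal = []                 # pending (key, value) changes, in order
        self.last_flush = 0.0

    def record(self, key, value):
        """Remember a rewrite without touching the file system yet."""
        self.journal.append((key, value))

    def tick(self, now):
        """Flush pending changes if the interval has elapsed; return keys written."""
        if now - self.last_flush < self.flush_interval or not self.journal:
            return 0
        latest = dict(self.journal)  # later entries override earlier ones
        self.dfs.update(latest)
        self.journal.clear()
        self.last_flush = now
        return len(latest)


store = {"k": "v0"}
writer = DelayedWriter(store, flush_interval=60)
writer.record("k", "v1")
writer.record("k", "v2")
early = writer.tick(now=30)  # too soon: nothing is written
late = writer.tick(now=60)   # interval elapsed: only the final value is written
```

Repeated rewrites of the same item collapse into one write to the file system, which is the saving the delayed write module is meant to provide.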

Claims (1)

1. An efficient convergence and access architecture system for multi-source heterogeneous data, characterized in that it comprises:
a multi-source heterogeneous data cooperative management subsystem, a multi-source heterogeneous data high-speed cache subsystem, and a multi-source heterogeneous data convergence storage subsystem;
the multi-source heterogeneous data cooperative management subsystem includes three modules: a data management module, a resource monitoring and management module, and a retrieval index module;
the multi-source heterogeneous data high-speed cache subsystem includes three modules: an unstructured file cache module, an in-memory database module, and a delayed write module;
the multi-source heterogeneous data convergence storage subsystem includes an unstructured data processing module, a structured data processing module, and a distributed file system; the unstructured data processing module includes a file splitting submodule, a file combining submodule, and a file verification submodule; the structured data processing module includes a file generation submodule and a file management submodule;
the multi-source heterogeneous data cooperative management subsystem is used to control and coordinate the whole access architecture system; its data management module is responsible for multi-source heterogeneous data upload, data download, data modification, and API support for the application layer; its resource monitoring and management module is responsible for monitoring resource usage in the high-speed cache subsystem and the convergence storage subsystem, and issues an early warning when the physical cache resources or physical storage resources of these two subsystems become abnormal or run short; its retrieval index module is used to provide data access indexes for the high-speed cache subsystem and the convergence storage subsystem;
the multi-source heterogeneous data high-speed cache subsystem is used to provide fast, efficient reads of heterogeneous data; its unstructured file cache module uses the cache and a least recently used algorithm to speed up application-layer reads of unstructured data; its in-memory database module uses the cache so that structured data can be operated on in memory; its delayed write module writes files modified in the cache to the distributed file system after a delay determined by configured rules;
the multi-source heterogeneous data convergence storage subsystem is used to implement the efficient convergence and storage of heterogeneous data; its unstructured data processing module uses the file splitting submodule to split a single large unstructured file and store it in the distributed file system, and uses the file combining submodule and the file verification submodule to recombine the split data blocks held in the distributed file system; its structured data processing module uses the file generation submodule and the file management submodule to convert structured data tables into XML files according to configured rules, and stores the converted XML files in the distributed file system;
when the access architecture system performs efficient convergence and access of multi-source heterogeneous data:
after multi-source heterogeneous data enters the system for the first time from the application layer through the data management module of the cooperative management subsystem, it is read according to its data structure, unstructured data by the unstructured data processing module and structured data by the structured data processing module of the convergence storage subsystem, and after the corresponding data processing it is sent to the distributed file system;
when the application layer needs data, it sends an instruction to the data management module; this module calls the retrieval index module, which uses the file identification number or a keyword to search the unstructured file cache module and the in-memory database module of the high-speed cache subsystem; if the required data is not found, the data management module sends an instruction to the distributed file system and locates the required source data there; after data combination or XML file conversion, the data is transferred to the unstructured file cache module or the in-memory database module of the high-speed cache subsystem, respectively, and is then transmitted to the application layer by the data management module;
when the application layer needs the same data again, the data management module delivers it to the application layer directly from the unstructured file cache module or the in-memory database module;
when the application layer needs to rewrite data it has fetched, the data management module modifies the corresponding data in the unstructured file cache module or the in-memory database; for structured data, the rewrite is temporarily recorded in the delayed write module as a log; for unstructured data, the rewrite is temporarily kept in the delayed write module as files; the delayed write module lets the application layer write and update the required data repeatedly, and the unstructured file cache module or the in-memory database speeds up writes and reads of that data; after one time period has elapsed, the delayed write module sends the modified data to the distributed file system to complete the final data update.
CN201310687009.XA 2013-12-13 2013-12-13 Multi-source heterogeneous data efficient converging and storing frame system Active CN103678603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310687009.XA CN103678603B (en) 2013-12-13 2013-12-13 Multi-source heterogeneous data efficient converging and storing frame system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310687009.XA CN103678603B (en) 2013-12-13 2013-12-13 Multi-source heterogeneous data efficient converging and storing frame system

Publications (2)

Publication Number Publication Date
CN103678603A CN103678603A (en) 2014-03-26
CN103678603B true CN103678603B (en) 2017-01-25

Family

ID=50316148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310687009.XA Active CN103678603B (en) 2013-12-13 2013-12-13 Multi-source heterogeneous data efficient converging and storing frame system

Country Status (1)

Country Link
CN (1) CN103678603B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761240B (en) * 2013-12-12 2017-12-15 北京奇虎科技有限公司 Data bank access method and device
CN104021194A (en) * 2014-06-13 2014-09-03 浪潮(北京)电子信息产业有限公司 Mixed type processing system and method oriented to industry big data diversity application
CN105138543A (en) * 2015-07-09 2015-12-09 广州杰赛科技股份有限公司 Data storage method and system
CN105630903B (en) * 2015-12-21 2020-02-21 中国电子科技集团公司第十五研究所 Method and device for rapidly storing mass data
CN106126511B (en) * 2015-12-30 2019-11-05 宁夏巨能机器人***有限公司 A kind of data base management system and its data managing method for intelligent production line
CN105893610A (en) * 2016-04-26 2016-08-24 中国科学院信息工程研究所 Deficiency-source completion method of multi-source heterogeneous large data
CN106095796A (en) * 2016-05-30 2016-11-09 中国邮政储蓄银行股份有限公司 Distributed data storage method, Apparatus and system
CN106528448A (en) * 2016-10-11 2017-03-22 杭州数强网络科技有限公司 Distributed caching mechanism for multi-source heterogeneous electronic commerce big data
CN107194007A (en) * 2017-06-20 2017-09-22 哈尔滨工业大学 A kind of integrated management system of spacecraft isomery test data
CN113515494B (en) * 2020-04-09 2024-03-22 ***通信集团广东有限公司 Database processing method based on distributed file system and electronic equipment
CN112579626A (en) * 2020-09-28 2021-03-30 京信数据科技有限公司 Construction method and device of multi-source heterogeneous SQL query engine

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101083656A (en) * 2007-07-05 2007-12-05 上海交通大学 Data stream technique based multi-source heterogeneous data integrated system
CN101945126A (en) * 2010-09-09 2011-01-12 中国林业科学研究院资源信息研究所 Forest resource heterogeneous data distributed management system
CN102917038A (en) * 2012-10-10 2013-02-06 江苏物联网研究发展中心 Cloud computation based remote service system for medical internet of things

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9430612B2 (en) * 2009-02-04 2016-08-30 NaviNet, Inc. System and method for healthcare data management
CN102004743B (en) * 2009-09-02 2013-08-14 ***股份有限公司 System and method for copying data among heterogeneous databases

Also Published As

Publication number Publication date
CN103678603A (en) 2014-03-26

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20171026

Address after: 201600 room 108, building 4, No. 455, research Exhibition Road, Shanghai, Songjiang District, -85

Patentee after: Shanghai state core Internet of things Technology Co., Ltd.

Address before: 214135 Jiangsu New District of Wuxi City Linghu Road No. 200 China Sensor Network International Innovation Park building C

Patentee before: Jiangsu Internet of Things Research & Development Co., Ltd.
