CN103853613A - Method for reading data based on digital family content under distributed storage - Google Patents

Info

Publication number
CN103853613A
Authority
CN
China
Prior art keywords
data
content
client
read
datanode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210512519.9A
Other languages
Chinese (zh)
Inventor
罗笑南
刘海亮
杨艾琳
苏航
曾坤
林哲祺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute of Sun Yat Sen University
Original Assignee
Shenzhen Research Institute of Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute of Sun Yat Sen University filed Critical Shenzhen Research Institute of Sun Yat Sen University
Priority to CN201210512519.9A priority Critical patent/CN103853613A/en
Publication of CN103853613A publication Critical patent/CN103853613A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for reading digital home content data under distributed storage. The method comprises the following steps: a user opens the file to be read through a client; the distributed file system calls the master node through a remote procedure call (RPC) to determine where the file's blocks are located; the client calls the read() method to start reading data; the input stream object DFSInputStream, which initially holds the addresses of the data nodes storing the first few data blocks, first connects to the nearest DataNode; the client then calls read() repeatedly and reads data from the data node as a stream; when the end of a block is reached, DFSInputStream closes the connection to the current DataNode and finds the best DataNode for the next block; after reading is finished, the input stream is closed and the object is released. Implementing the method allows application services such as mass data storage and multimedia interaction to be provided to users more effectively.

Description

A method for reading digital home content data under distributed storage
Technical field
The present invention relates to the technical field of the digital home, and specifically to a method for reading digital home content data under distributed storage.
Background
In a digital home environment, interactive services centered on a home gateway, smart set-top box, smart TV box, or similar device are a key direction of digital home development. Home interactive services cover a wide range of information and content, such as audio-visual entertainment, games, and security. In an environment where content is this rich and this dispersed, content management becomes essential. A well-designed distributed content management system can make full use of the many home gateways in digital homes to provide stable, large-capacity storage, and so offer richer content services to upper-layer applications.
In the interactive application environment of the digital home, content comes in many types and from many sources. Sharing content between services requires analyzing the relationships among content items in depth and integrating the content, so that diversified content services can be provided and users can be better served in this heterogeneous environment. The task is not only to store multimedia but, more importantly, to let that information interact with users. A content management system must provide interactive multimedia services and manage the whole content value chain, including content acquisition, content sharing, content innovation, and content application, so as to mine the value behind the content and increase it. The goal of a content management system is to provide unified and effective content processing, management, and control functions, so that content is reusable and flexible and overall development cost is reduced.
Many enterprise content management systems and web content management systems already exist, as do content management systems customized for specific industries, such as publishing. In enterprise content management, IBM Content Management provides a reliable, easily upgraded, and powerful enterprise content management architecture, together with powerful, secure, and highly scalable services that let enterprise users access e-commerce content easily. However, these systems are server-based, whereas the present invention is based on home gateways.
Existing content management systems target enterprises or websites rather than the digital home environment. They run on robust hardware, and their users are specific, well-trained clients. They involve little heterogeneous integration and are not suited to ordinary hardware or to environments with poor network stability. In addition, because the target customers differ, so do the services provided: digital home users mostly need content services related to the household, the community, and daily life.
Summary of the invention
The object of the present invention is to provide the storage design and implementation of a content management system for the digital home environment. The storage system makes good use of the large number of smart home gateways in digital homes to provide distributed, highly fault-tolerant, portable storage, together with a method for reading and writing data.
An embodiment of the present invention provides a method for reading digital home content data under distributed storage, the method comprising:
A user opens the file to be read through a client;
The distributed file system calls the master node through a remote procedure call to determine where the file's blocks are located;
The client calls the read() method to start reading data;
The input stream object DFSInputStream, which initially holds the addresses of the data nodes storing the first few data blocks, first connects to the nearest DataNode; the client then calls read() repeatedly and reads data from the data node as a stream;
When the end of a block is reached, DFSInputStream closes the connection to the current DataNode and then looks up the best DataNode for the next block;
After reading is finished, the input stream is closed and the object is released.
The step in which the distributed file system calls the master node through a remote procedure call to determine where the file's blocks are located comprises:
For each data block, returning the addresses of the DataNodes that hold the block; the returned data nodes are sorted and selected according to their distance from the client, the network state, or similar measures, and the node most favorable to data transfer is chosen as the data source; the file system creates an input stream object DFSInputStream for this read operation.
The client reads digital home content data through a cache.
Data communication takes place between the client and the DataNodes, and control communication takes place between the client and the master node.
The distributed storage uses the Java content repository standard JSR-170 to provide the main calling interface for the upper service layer, and uses Hadoop improved and extended with related functions for the content management system.
By implementing the present invention, content is organized according to JSR-170, a cross-platform, standardized content repository specification. This takes full advantage of the cross-platform nature of the Java language and the independence of the standard itself, so the content storage module of the system is cross-platform, easy to port, easy to upgrade, and highly extensible. The system adopts the distributed computing framework Hadoop and the distributed storage framework HDFS that it uses, making full use of devices in the digital home, such as home gateways and smart boxes, as a computing cluster and a storage cluster. Hadoop tolerates hardware and network failures and provides highly redundant storage and computation. This suits the home network environment, truly joins all the devices of the digital home into one network, and allows application services such as mass data storage and multimedia interaction to be provided to users more effectively.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the distributed storage architecture in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the tree organization of the storage structure in an embodiment of the present invention;
Fig. 3 is a structural diagram of the storage layer implemented with Hadoop in an embodiment of the present invention;
Fig. 4 is a flow diagram of the write operation based on distributed storage in an embodiment of the present invention;
Fig. 5 is a flow diagram of the read operation based on distributed storage in an embodiment of the present invention;
Fig. 6 is a structural diagram of the client with a caching function in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
A complete content management system mainly covers the collection, storage, conversion, search, retrieval, aggregation, distribution, management, and control of content. The present invention mainly concerns the storage design of the content management system. Content storage management mainly means managing data in a distributed environment. Backup storage management means keeping multiple copies of the data and providing a management interface. The design implements distributed file storage (low cost, high throughput, and reliability), improves data access efficiency, and provides transaction management for file storage.
The storage layer of the present invention is implemented in two main parts. The first is the content storage structure, which uses an implementation of JSR-170 to provide the main calling interface for the upper service layer. The second is Hadoop, improved and extended with related functions for the content management system. The structure is shown in Fig. 1. JSR-170 is the Java Content Repository (JCR) standard, which organizes content in a tree model. In JSR-170, JCR defines a standard API for accessing a content repository; like Java, the standard is platform-independent and standardized. The content repository can be regarded as the core of the content management system: it is where multimedia data is stored, and what must be designed is how to organize that content and what structure to use. Beneath the content storage structure is Hadoop, a distributed computing framework. Its computational model is MapReduce, which was first proposed by Google; Hadoop is one implementation of it. Distributed architectures based on Hadoop are increasingly favored by the IT industry. Their main advantage is that they do not rely on large servers: computation runs on many unreliable commodity machines such as PCs, and hardware and network faults are tolerated.
Fig. 1 above includes the content storage structure. The present invention uses the tree storage model of JSR-170 to organize the various kinds of content data in the digital home. This module is described in detail below.
Content consists of metadata and entity files. Given how users access content, only the metadata is needed when browsing, and the entity file is involved only when content is downloaded or read, so metadata and entity files are stored separately. The storage layer is divided into multiple zones according to how content flows through the system: the content acquisition zone, content preliminary review zone, content production zone, content repository zone, content operation zone, and content offline zone. Each zone serves a different stage of the content life cycle. Within a zone, all content is organized into a tree, as shown in Fig. 2, with one tree per zone. A tree consists of nodes and properties. Every tree has one root node. Apart from the root, each node has exactly one parent, and may have any number of child nodes and any number of properties; in Fig. 2 the relation between a parent node and its child nodes is 1 to n. A property (which is in effect also a kind of node) is a leaf of the tree: it has exactly one parent node, has no children, and consists of a name and one or more values. All actual content in a zone is stored in properties. Nodes are used to build the "paths" inside the tree, much like directories in a traditional file system, while properties resemble files and are where the content really resides. Property values can be of type string (String), date (Date), long integer (Long), double-precision floating point (Double), Boolean, binary stream (stream), and so on.
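The tree organization described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class ContentNode and its methods are hypothetical names chosen for illustration and are not the JSR-170 API; property values are held as plain Java objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of one zone's content tree (not the JSR-170 API). */
class ContentNode {
    final String name;
    final ContentNode parent;  // null only for the root node
    final Map<String, ContentNode> children = new LinkedHashMap<>();
    // Property name -> one or more values (String, Long, Double, Boolean, ...).
    final Map<String, List<Object>> properties = new LinkedHashMap<>();

    ContentNode(String name, ContentNode parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Adds a child node; every non-root node has exactly one parent. */
    ContentNode addChild(String childName) {
        ContentNode child = new ContentNode(childName, this);
        children.put(childName, child);
        return child;
    }

    /** Stores a property (a leaf) with one or more values under this node. */
    void setProperty(String propertyName, Object... values) {
        properties.put(propertyName, new ArrayList<>(List.of(values)));
    }

    /** Builds the absolute path of this node, much like a directory path. */
    String path() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.path();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }

    /** Resolves a path such as "/movies/holiday" starting at this node; null if absent. */
    ContentNode resolve(String path) {
        ContentNode current = this;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;  // skip the empty segment before the leading slash
            }
            current = current.children.get(part);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        ContentNode root = new ContentNode("", null);
        ContentNode clip = root.addChild("movies").addChild("holiday");
        clip.setProperty("title", "Holiday 2012");
        clip.setProperty("durationSeconds", 5400L);
        System.out.println(clip.path() + " " + clip.properties);
    }
}
```

Nodes play the role of directories and properties the role of files, so adding, deleting, updating, and looking up content reduce to operations on these two maps.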
The interface that the storage layer offers to the upper application service layer mainly supports adding, deleting, updating, and looking up nodes under the root node of each zone, and adding, deleting, updating, and looking up the properties of a node.
The above explains that the present invention organizes content as a tree. The following explains how Hadoop is used to implement this model.
Fig. 3 shows the structure of the storage layer as implemented with Hadoop in the present invention. It comprises the client, the master node (NameNode plus the JSR-170 implementation), and the data nodes (DataNode). In the Hadoop system, the NameNode is mainly responsible for distributing and scheduling work and for monitoring the system. DataNodes are mainly used to store content; any node can serve as a DataNode, so every home gateway can be used as one, whereas the NameNode is generally placed on a reliable node. In the Hadoop master node (NameNode), files and directories are stored under a directory tree whose root is "/". To implement JSR-170, files and directories are extended with additional attribute information. A directory in the NameNode corresponds to a JSR-170 node, and a file corresponds to a JSR-170 binary property. The properties under a node must be persisted to disk. To improve read efficiency, the properties under one node are packed into a Bundle, and reads and writes operate on the whole Bundle as a unit.
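The Bundle idea, persisting all properties of one node as a single unit, could look roughly like the following. This is a minimal sketch, assuming string-valued properties and a simple length-prefixed layout; the class PropertyBundle and its format are hypothetical, and the source does not specify the actual on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bundle format: all properties of one node serialized together. */
class PropertyBundle {

    /** Packs every property of a node into one byte array that is written as a unit. */
    static byte[] pack(Map<String, String> properties) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(properties.size());          // number of entries
            for (Map.Entry<String, String> entry : properties.entrySet()) {
                out.writeUTF(entry.getKey());         // property name
                out.writeUTF(entry.getValue());       // property value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads a whole bundle back in one pass, restoring the original property order. */
    static Map<String, String> unpack(byte[] bundle) {
        Map<String, String> properties = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bundle))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                properties.put(key, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("title", "Holiday 2012");
        properties.put("owner", "gateway-livingroom");
        System.out.println(unpack(pack(properties)));
    }
}
```

Reading or writing one bundle replaces many small per-property accesses with a single larger one, which is the efficiency argument made in the text.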
A client's access to data involves access to both the master node and the data nodes. To improve the robustness of the system and reduce the load on the master node, the client accesses the master node only when it needs information about the nodes themselves, for example for the initial allocation in task scheduling or for handling failed nodes. For data services such as data transfer, it communicates directly with the DataNodes. The client also has a cache, so it need not request data from a remote node every time. In fact, if the client is a device such as a home gateway, it may itself also act as a DataNode. Devices such as handhelds, which cannot host the storage system themselves, can act as controllers while the home gateway in that household acts as the client; that is, the handheld device and the home gateway together form one client. This ensures that as many devices as possible in the digital home can share the content services.
The sections above explain that the storage of the present invention uses home gateways as its hardware base and a distributed storage architecture built on the Hadoop framework. Next, several key operations of the storage system are described in detail: the write operation mechanism, the read operation mechanism, and the cache design.
The write operation is shown schematically in Fig. 4. The detailed flow is as follows:
Step 1: The client requests file creation through the create interface of the distributed file system, in order to write data. Go to Step 2.
Step 2: The distributed file system sends an RPC request to the master node (NameNode) to create the file. At this point the NameNode allocates no storage blocks; the call amounts to registering the file with the master node. The NameNode performs a number of checks to make sure that the file does not already exist in the file system and that the client has permission to create it. If all checks pass, the NameNode records the new file; otherwise file creation fails and the client receives an IOException. If creation succeeds, the distributed file system returns an output stream object, FSDataOutputStream, to the client for writing data. This stream object handles the communication between the client and the data nodes of the distributed file system. Go to Step 3.
Step 3: The client starts writing data. The output stream object (DFSOutputStream in Hadoop) splits the data to be written into packets and places them in a data queue. After the file is created, the NameNode allocates a number of data nodes to receive the data stream of this write operation. Assuming three data nodes are allocated, they form a three-stage data pipeline. The stream object first writes the data to the first node in the pipeline. Go to Step 4.
Step 4: The first node forwards each packet to the second node, which writes it and forwards it to the third node. Go to Step 5.
Step 5: The output stream object FSDataOutputStream internally maintains a queue that tracks whether each packet has been written successfully, namely the ack queue. After a packet is sent, an entry for it is kept in the queue; when the acknowledgement (ack) for the packet returns, the corresponding entry is removed from the queue. Go to Step 6.
Step 6: When all data has been written, the client calls the stream's close() method. Before the NameNode is notified, a flush is performed to ensure that any data not yet transmitted is written to the data nodes. Go to Step 7.
Step 7: The write is complete and the master node is notified. The master node maintains the attribute information of the file, which is used by subsequent operations. This completes a full write operation.
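The data queue, the three-node pipeline, and the ack queue in the write steps above can be illustrated with a simplified, single-threaded simulation. This is a sketch under stated assumptions, not Hadoop code: WritePipelineSketch, Node, and the packet size are hypothetical, and acknowledgements are modeled synchronously, whereas a real pipeline sends packets and receives acks asynchronously over the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical single-threaded model of the write pipeline (not the Hadoop API). */
class WritePipelineSketch {

    /** One simulated data node that stores every packet it receives. */
    static class Node {
        final List<byte[]> stored = new ArrayList<>();
    }

    /** Step 3: splits the data into packets of at most packetSize bytes. */
    static Deque<byte[]> toPackets(byte[] data, int packetSize) {
        Deque<byte[]> dataQueue = new ArrayDeque<>();
        for (int offset = 0; offset < data.length; offset += packetSize) {
            int end = Math.min(data.length, offset + packetSize);
            dataQueue.add(Arrays.copyOfRange(data, offset, end));
        }
        return dataQueue;
    }

    /** Steps 3 to 5: sends packets down the pipeline; returns the count left unacknowledged. */
    static int write(byte[] data, int packetSize, List<Node> pipeline) {
        Deque<byte[]> dataQueue = toPackets(data, packetSize);
        Deque<byte[]> ackQueue = new ArrayDeque<>();
        while (!dataQueue.isEmpty()) {
            byte[] packet = dataQueue.poll();
            ackQueue.add(packet);            // keep an entry until the ack arrives
            for (Node node : pipeline) {     // node 1 forwards to node 2, then node 3
                node.stored.add(packet);
            }
            ackQueue.poll();                 // ack returned: drop the entry
        }
        return ackQueue.size();              // 0 means every packet was acknowledged
    }

    public static void main(String[] args) {
        List<Node> pipeline = List.of(new Node(), new Node(), new Node());
        int pending = write("digital home content".getBytes(), 8, pipeline);
        System.out.println("unacknowledged packets: " + pending);
    }
}
```

An empty ack queue at the end corresponds to the condition under which the client can safely close the stream in Step 6.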
The read operation is described in detail below and is shown in Fig. 5:
Step 1: The client opens the file to be read. Go to Step 2.
Step 2: The distributed file system calls the master node through RPC to determine where the file's blocks are located. For each data block, the NameNode returns the addresses of the DataNodes that hold the block. The returned data nodes are sorted and selected according to their distance from the client, the network state, or similar measures, and the node most favorable to data transfer is chosen as the data source. The file system creates an input stream object, DFSInputStream, for this read operation. Go to Step 3.
Step 3: The client calls the read() method to start reading data. Go to Step 4.
Step 4: The input stream object DFSInputStream, which initially holds the addresses of the data nodes storing the first few data blocks, first connects to the nearest DataNode. The client then calls read() repeatedly and reads data from the data node as a stream. Go to Step 5.
Step 5: When the end of a block is reached, DFSInputStream closes the connection to the current DataNode and then looks up the best DataNode for the next block. Go to Step 6.
Step 6: After reading is finished, the input stream is closed and the object is released. End.
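The read steps above can be sketched as a small simulation in which each block has several replicas and the client always reads from the closest one. The names ReadFlowSketch and Replica and the integer distance metric are assumptions made for illustration; the source only says that nodes are ranked by distance from the client, network state, or similar measures.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of the block-by-block read flow (not the Hadoop API). */
class ReadFlowSketch {

    /** One copy of a block on a data node, with an assumed distance from the client. */
    static class Replica {
        final String dataNode;
        final int distance;
        final byte[] blockData;

        Replica(String dataNode, int distance, byte[] blockData) {
            this.dataNode = dataNode;
            this.distance = distance;
            this.blockData = blockData;
        }
    }

    /** Steps 2 and 5: picks the replica that is closest to the client. */
    static Replica nearest(List<Replica> replicas) {
        return replicas.stream()
                .min(Comparator.comparingInt((Replica r) -> r.distance))
                .orElseThrow(() -> new IllegalStateException("block has no replica"));
    }

    /** Steps 3 to 6: reads each block from its best node and records the nodes contacted. */
    static byte[] readFile(List<List<Replica>> blockLocations, List<String> visited) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (List<Replica> replicas : blockLocations) {
            Replica source = nearest(replicas);          // connect to the best node
            visited.add(source.dataNode);
            out.write(source.blockData, 0, source.blockData.length);  // stream the block
            // End of block: the connection to this node would be closed here.
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] first = "hello ".getBytes(StandardCharsets.UTF_8);
        byte[] second = "world".getBytes(StandardCharsets.UTF_8);
        List<List<Replica>> locations = List.of(
                List.of(new Replica("gatewayA", 2, first), new Replica("gatewayB", 1, first)),
                List.of(new Replica("gatewayA", 1, second), new Replica("gatewayC", 3, second)));
        List<String> visited = new ArrayList<>();
        byte[] data = readFile(locations, visited);
        System.out.println(new String(data, StandardCharsets.UTF_8) + " via " + visited);
    }
}
```

The master node is consulted only for the block locations; the block bytes themselves come straight from the data nodes, which matches the split between control and data communication described in the text.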
User access exhibits locality: recently accessed data is likely to be read again soon. To improve access efficiency, data that has been read can be cached at the client so that it can be obtained directly the next time it is read.
Fig. 6 shows the client with the cache design, in which a cache module is added inside the client. The client provides data services to the application programs above it. Data communication takes place between the client and the DataNodes, and control communication takes place between the client and the master node.
It should be noted that many replacement algorithms are available for cache design. The present invention adopts the least-recently-used (LRU) algorithm, which evicts the data that has gone unaccessed for the longest time.
In summary, content is organized according to JSR-170, a cross-platform, standardized content repository specification. This takes full advantage of the cross-platform nature of the Java language and the independence of the standard itself, so the content storage module of the system is cross-platform, easy to port, easy to upgrade, and highly extensible. The system adopts the distributed computing framework Hadoop and the distributed storage framework HDFS that it uses, making full use of devices in the digital home, such as home gateways and smart boxes, as a computing cluster and a storage cluster. Hadoop tolerates hardware and network failures and provides highly redundant storage and computation. This suits the home network environment, truly joins all the devices of the digital home into one network, and allows application services such as mass data storage and multimedia interaction to be provided to users more effectively.
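The least-recently-used replacement policy chosen for the client cache could be sketched as follows. This is a minimal illustration, assuming a fixed entry-count capacity and the hypothetical class name LruBlockCache; it relies on the access-order mode of the standard java.util.LinkedHashMap rather than on anything specified in the source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side cache that evicts the least recently used entry. */
class LruBlockCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruBlockCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insertion; true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruBlockCache<String, String> cache = new LruBlockCache<>(2);
        cache.put("/movies/holiday", "block-1");
        cache.put("/music/song", "block-2");
        cache.get("/movies/holiday");            // touch: now the most recently used
        cache.put("/photos/album", "block-3");   // exceeds capacity, evicts the eldest
        System.out.println(cache.keySet());
    }
}
```

On a cache hit the client can serve the data locally; only on a miss does it need to contact a DataNode.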
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and so on.
The method for reading digital home content data under distributed storage provided by the embodiments of the present invention has been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and its core idea. Those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and scope of application. Accordingly, this description should not be construed as limiting the present invention.

Claims (5)

1. A method for reading digital home content data under distributed storage, characterized in that the method comprises:
A user opens the file to be read through a client;
The distributed file system calls the master node through a remote procedure call to determine where the file's blocks are located;
The client calls the read() method to start reading data;
The input stream object DFSInputStream, which initially holds the addresses of the data nodes storing the first few data blocks, first connects to the nearest DataNode; the client then calls read() repeatedly and reads data from the data node as a stream;
When the end of a block is reached, DFSInputStream closes the connection to the current DataNode and then looks up the best DataNode for the next block;
After reading is finished, the input stream is closed and the object is released.
2. The method for reading digital home content data under distributed storage according to claim 1, characterized in that the step in which the distributed file system calls the master node through a remote procedure call to determine where the file's blocks are located comprises:
For each data block, returning the addresses of the DataNodes that hold the block; the returned data nodes are sorted and selected according to their distance from the client, the network state, or similar measures, and the node most favorable to data transfer is chosen as the data source; the file system creates an input stream object DFSInputStream for this read operation.
3. The method for reading digital home content data under distributed storage according to claim 2, characterized in that the client reads the digital home content data through a cache.
4. The method for reading digital home content data under distributed storage according to claim 3, characterized in that data communication takes place between the client and the DataNodes, and control communication takes place between the client and the master node.
5. The method for reading digital home content data under distributed storage according to claim 4, characterized in that the distributed storage uses the Java content repository standard JSR-170 to provide the main calling interface for the upper service layer, and uses Hadoop improved and extended with related functions for the content management system.
CN201210512519.9A 2012-12-04 2012-12-04 Method for reading data based on digital family content under distributed storage Pending CN103853613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210512519.9A CN103853613A (en) 2012-12-04 2012-12-04 Method for reading data based on digital family content under distributed storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210512519.9A CN103853613A (en) 2012-12-04 2012-12-04 Method for reading data based on digital family content under distributed storage

Publications (1)

Publication Number Publication Date
CN103853613A true CN103853613A (en) 2014-06-11

Family

ID=50861300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210512519.9A Pending CN103853613A (en) 2012-12-04 2012-12-04 Method for reading data based on digital family content under distributed storage

Country Status (1)

Country Link
CN (1) CN103853613A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919713A (en) * 2017-03-13 2017-07-04 郑州云海信息技术有限公司 A kind of cluster file system and distributed file system multi-client document control method
CN108446392A (en) * 2018-03-23 2018-08-24 北京星震维度信息技术有限公司 The management system and method for CD server
US10447763B2 (en) 2016-12-08 2019-10-15 Nanning Fugui Precision Industrial Co., Ltd. Distributed storage method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120182891A1 (en) * 2011-01-19 2012-07-19 Youngseok Lee Packet analysis system and method using hadoop based parallel computation
CN102638566A (en) * 2012-02-28 2012-08-15 山东大学 BLOG system running method based on cloud storage

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曹风兵: ""基于Hadoop的云计算模型研究与应用"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
杨坤: ""基于Hadoop的云存储***客户端的设计与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140611