CN103853613A

CN103853613A - Method for reading data based on digital family content under distributed storage

Info

Publication number: CN103853613A
Application number: CN201210512519.9A
Authority: CN
Inventors: 罗笑南; 刘海亮; 杨艾琳; 苏航; 曾坤; 林哲祺
Original assignee: Shenzhen Research Institute of Sun Yat Sen University
Current assignee: Shenzhen Research Institute of Sun Yat Sen University
Priority date: 2012-12-04
Filing date: 2012-12-04
Publication date: 2014-06-11

Abstract

The invention discloses a method for reading digital home content data under distributed storage. The method comprises the following steps: a user opens the file to be read through a client; the distributed file system calls the master node through an RPC (Remote Procedure Call) to determine where the blocks of the file are located; the client calls the read() method to start reading data; the input stream object DFSInputStream, which initially holds the DataNode addresses of the first few data blocks, first connects to the nearest DataNode; the client then calls read() repeatedly and reads data from the DataNode in a streaming manner; when the end of a block is reached, DFSInputStream closes the connection to the current DataNode and then looks up the best DataNode for the next block; after reading finishes, the input stream is closed and the object is released. By implementing the method, application services such as mass data and multimedia interaction can be better provided to the user.

Description

A method for reading digital home content data under distributed storage

Technical field

The present invention relates to the technical field of the digital home, and specifically to a method for reading digital home content data under distributed storage.

Background art

In a digital home environment, interactive services centered on a home gateway, smart set-top box, smart TV box, or similar device are a key direction of digital home development. Home interactive services cover a wide range of information and content, such as audio-visual entertainment, games, and security. With content this rich and this dispersed, content management becomes essential. A well-designed distributed content management system can make full use of the many home gateways in digital homes to provide stable, large-capacity storage, and so offer richer content services to upper-layer applications.

In the interactive application environment of the digital home, content comes in many types and from many sources. Sharing content between services requires analyzing the relationships among content items in depth, integrating and processing the content, and providing diversified content services, so that users can be served better in this heterogeneous environment. Storing multimedia is only part of the problem; more importantly, the information must be able to interact with the user. A content management system should provide interactive multimedia services, manage the whole content value chain (content acquisition, content sharing, content innovation, and content application), mine the value behind the content, and so increase its value. The goal of a content management system is to provide unified, effective content processing and control functions that make content reusable and flexible and reduce overall development cost.

Many enterprise content management systems and Web content management systems already exist, along with systems customized for specific industries, for example content management systems for the publishing industry. In enterprise content management, IBM Content Management provides a reliable, easily upgraded, and powerful enterprise content management architecture, together with powerful, secure, and highly scalable services, so that enterprise users can easily access e-commerce content. These systems, however, are server-based, whereas the present invention is based on home gateways.

Existing content management systems target enterprises or websites rather than the digital home environment. They run on robust hardware, and their users are specific, well-trained clients. They deal little with heterogeneous integration and are unsuited to ordinary, or rather poor, hardware and to unstable networks. In addition, services differ with the customers they target: digital home users mostly need content services related to family or community life.

Summary of the invention

The object of the present invention is to provide the storage design and implementation of a content management system for the digital home environment. The storage system makes good use of the large number of smart home gateways in digital homes to provide distributed, highly fault-tolerant, portable storage, and the invention provides a method of reading and writing data on it.

An embodiment of the present invention provides a method for reading digital home content data under distributed storage, the method comprising:

A user opens the file to be read through a client;

The distributed file system calls the master node through a remote procedure call to determine where the blocks of the file are located;

The client calls the read() method to start reading data;

The input stream object DFSInputStream, which initially holds the DataNode addresses of the first few data blocks, first connects to the nearest DataNode; the client then calls read() repeatedly and reads data from the DataNode in a streaming manner;

When the end of a block is reached, DFSInputStream closes the connection to the current DataNode and then looks up the best DataNode for the next block;

After reading finishes, the input stream is closed and the object is released.

The step in which the distributed file system calls the master node through a remote procedure call to determine where the blocks of the file are located comprises:

For each data block, returning the addresses of the DataNodes that hold the block; the returned DataNodes are sorted and selected according to a measure such as their distance from the client or the network state, and the node most favorable to data transfer is chosen as the data source; the file system creates an input stream object DFSInputStream for this read operation.

The client reads the digital home content data through a cache.

Data communication takes place between the client and the DataNodes, and control communication takes place between the client and the master node.

The distributed storage uses the Java content repository standard JSR-170 to provide the main calling interface to the upper service layer, together with improvements and added functions that adapt Hadoop to the content management system.

By implementing the present invention, content is organized according to JSR-170, a cross-platform, standardized content repository specification. This takes full advantage of the cross-platform nature of the Java language and the independence of the standard itself, so that the content storage module of the system is cross-platform, easy to port, easy to upgrade, and readily extensible. The invention adopts the distributed computing framework Hadoop and the distributed storage framework HDFS that it uses, and makes full use of devices in the digital home, such as home gateways and smart boxes, as the compute cluster and storage cluster. Hadoop tolerates hardware and network failures and provides highly redundant storage and computation, which suits the home network environment. It also joins all the devices of the digital home into a true network, so that application services such as mass data and multimedia interaction can be better provided to the user.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the distributed storage architecture in an embodiment of the present invention;

Fig. 2 is a schematic diagram of the tree organization used in the storage organization in an embodiment of the present invention;

Fig. 3 is a structural diagram of the storage layer implemented with Hadoop in an embodiment of the present invention;

Fig. 4 is a flow diagram of the write operation based on distributed storage in an embodiment of the present invention;

Fig. 5 is a flow diagram of the read operation based on distributed storage in an embodiment of the present invention;

Fig. 6 is a structural diagram of the client with a caching function in an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

A complete content management system mainly comprises functions for content collection, storage, conversion, search, retrieval, aggregation, distribution, and management and control. The present invention mainly concerns the storage design of the content management system. Content storage management mainly manages data in the distributed environment; backup storage management makes multiple replicas of data and provides a management interface. The storage layer implements distributed file storage (low cost, high throughput, and reliability), improves data access efficiency, and provides transaction management for file storage.

The storage layer of the present invention is implemented in two main parts: the first is the content storage organization, for which an implementation of JSR-170 provides the main calling interface to the upper service layer; the second is the improvements and added functions that adapt Hadoop to the content management system. The structure is shown in Fig. 1. JSR-170 is the Java Content Repository (JCR) standard, which organizes content as a tree model. In JSR-170, JCR defines a standard API for accessing a content repository; like Java itself, the standard is independent and standardized. The content repository can be understood as the core of the content management system: it is where the multimedia data is kept, and what has to be designed is how this content is organized and what structure is used. Below the content storage organization is Hadoop, a distributed computing architecture. Its underlying model is MapReduce, first proposed by Google, of which Hadoop is one implementation. Distributed architectures based on Hadoop are attracting growing attention in the IT industry. Their main advantage is that they do not rely on large servers; computation runs on many unreliable commodity systems such as PCs, and hardware and network errors are tolerated.

Fig. 1 above mentions the content storage organization. The present invention uses the tree storage model of JSR-170 to organize the various kinds of content data in the digital home. This module is described in detail below.

Content consists of metadata and an entity file. Given how users access content, browsing requires only the metadata, and the entity file is involved only when content is downloaded or read, so the metadata and the entity file of a content item are stored separately. The storage layer is divided into several zones according to how content flows through the system: the content acquisition zone, content preliminary review zone, content production zone, content repository zone, content operation zone, and content offline zone. Each zone serves a different stage of the content life cycle. Within a zone, all content is organized into a tree structure, as shown in Fig. 2. Each zone holds one tree, and a tree consists of nodes and properties. Every tree has one root node. Apart from the root, each node has exactly one parent, and a node may have any number of child nodes and any number of properties; in Fig. 2, a parent node and its child nodes are in a 1-to-n relationship. A property (in effect also a kind of node) is a leaf of the tree: it has exactly one parent node, has no child nodes, and consists of a name and one or more values. Within a zone, all actual content is stored in properties. Nodes are used to build the "paths" inside the tree, much like directories in a traditional file system, while properties resemble files and are the elements that actually hold content. The value types of a property include string (String), date (Date), long integer (Long), double-precision floating point (Double), Boolean, and binary stream (stream).
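The node-and-property tree described above can be sketched as follows. This is a minimal illustration in Python, not the JSR-170 API: the class names `Node` and `Property`, the methods `add_child`, `set_property`, and `path`, and the sample names are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Union

# Value types listed in the description: string, date, long, double, Boolean, binary stream.
Value = Union[str, date, int, float, bool, bytes]


@dataclass
class Property:
    """Leaf of the tree: a name plus one or more values (holds the actual content)."""
    name: str
    values: List[Value]


@dataclass
class Node:
    """Interior element: builds the path inside the tree, like a directory."""
    name: str
    # The parent link is excluded from repr/eq to avoid walking the tree in circles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "Node"] = field(default_factory=dict)
    properties: Dict[str, Property] = field(default_factory=dict)

    def add_child(self, name: str) -> "Node":
        """Attach a new child; every non-root node has exactly one parent."""
        child = Node(name, parent=self)
        self.children[name] = child
        return child

    def set_property(self, name: str, *values: Value) -> None:
        """Create or overwrite a property; at least one value is required."""
        if not values:
            raise ValueError("a property needs one or more values")
        self.properties[name] = Property(name, list(values))

    def path(self) -> str:
        """Return the path from the root, e.g. '/video/movies'."""
        parts: List[str] = []
        node: Optional[Node] = self
        while node is not None and node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))


# Hypothetical usage: one tree for one zone, with a property holding content.
root = Node("")
movies = root.add_child("video").add_child("movies")
movies.set_property("title", "Holiday 2012")
```

Keeping content only in properties, as the description does, means a browse operation can walk `children` without touching any stored values.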

The interface that the storage layer offers to the upper application service layer mainly allows nodes to be added, deleted, modified, and queried under the root node of each zone, and allows the properties of a node to be added, deleted, modified, and queried.

The above explains that the present invention organizes content as a tree. The following explains how Hadoop is used to implement this model.

Fig. 3 shows the structure of the storage layer that the present invention implements with Hadoop. It comprises a client, a master node (NameNode plus the JSR-170 implementation), and data nodes (DataNode). In a Hadoop system, the NameNode is mainly responsible for task distribution and scheduling and for system monitoring. The DataNodes are mainly used to store content. Any node can serve as a DataNode, so every home gateway can be used as one, whereas a reliable node is generally chosen as the NameNode. In the Hadoop master node NameNode, files and directories are stored under a directory whose root is "/". To implement JSR-170, files and directories are extended with additional attribute information. A directory in the NameNode corresponds to a JSR-170 node, and a file corresponds to a JSR-170 binary property. The property information under a node must be persisted to disk. To make storage reads more efficient, the properties under one node are packed into a Bundle, and reads and writes operate on the whole Bundle as a unit.
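The Bundle idea can be illustrated with a small sketch: all properties under one node are serialized together and persisted or loaded in a single operation. The patent does not specify a serialization format or an API, so the `BundleStore` class, its methods, the JSON encoding, and the in-memory dictionary standing in for the disk are all assumptions.

```python
import json
from typing import Dict, List


class BundleStore:
    """Hypothetical store that keeps one serialized bundle per node path."""

    def __init__(self) -> None:
        self._disk: Dict[str, bytes] = {}  # stand-in for persistent storage
        self.reads = 0
        self.writes = 0

    def write_bundle(self, node_path: str, props: Dict[str, List[str]]) -> None:
        """Persist every property of the node in one write."""
        self._disk[node_path] = json.dumps(props, sort_keys=True).encode("utf-8")
        self.writes += 1

    def read_bundle(self, node_path: str) -> Dict[str, List[str]]:
        """Load every property of the node in one read."""
        self.reads += 1
        return json.loads(self._disk[node_path].decode("utf-8"))


# Hypothetical usage: three properties cost one write and one read.
store = BundleStore()
store.write_bundle(
    "/video/movies/holiday",
    {"title": ["Holiday 2012"], "format": ["mp4"], "tags": ["family", "travel"]},
)
loaded = store.read_bundle("/video/movies/holiday")
```

The trade-off is that changing a single property rewrites the whole bundle, which is acceptable when a node's properties are usually read together.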

A client's access to data involves access to both the master node and the DataNodes. To make the system more robust and to lighten the load on the master node, the client accesses the master node only for information about the nodes themselves, such as the initial allocation in task scheduling and the handling of failed nodes. All data services, such as data transfer, go directly between the client and the DataNodes. The client also has a cache, so it need not request data from a remote node every time. In fact, if the client is a device such as a home gateway, it may itself also serve as a DataNode. A device such as a handheld device, which cannot host the storage system, can act as a controller, with the home gateway of that home as the client; that is, the handheld device and the home gateway together form one client. This ensures that as many devices as possible in the digital home can share the content services.

The above explains that the storage of the present invention builds on home gateways as its hardware foundation and adopts a distributed storage architecture based on the Hadoop framework. The following describes several key operations of the storage system in detail: the write mechanism, the read mechanism, and the cache design.

The write operation is illustrated in Fig. 4. The detailed flow is as follows:

Step1: The client requests, through the create interface of the distributed file system, that a file be created for writing data; go to Step2;

Step2: The distributed file system sends an RPC request to the master node NameNode to create the file. At this point no storage blocks are allocated in the NameNode; the request amounts to registering the file with the master node. The NameNode performs various checks to make sure that the file does not already exist in the file system and that the client has permission to create it. If the checks pass, the NameNode records the information of the new file; otherwise file creation fails and the client receives an IOException. If creation succeeds, the distributed file system returns an output stream object FSDataOutputStream to the client for writing data. This stream object handles the communication between the client and the DataNodes of the distributed file system; go to Step3;

Step3: The client starts writing data. The output stream FSDataOutputStream splits the data to be written into packets and places them in a data queue. After the file is created, the master node NameNode allocates several DataNodes to receive the data stream of this write operation. Assuming three DataNodes receive it, the three form a three-stage data pipeline. The stream object first writes the data to the first node in the pipeline; go to Step4;

Step4: The first node then forwards each packet and writes it to the second node, and the second node forwards it and writes it to the third node; go to Step5;

Step5: The output stream object FSDataOutputStream internally maintains a queue that tracks whether each packet has been written successfully, namely the ack queue. After a packet is sent, an entry for it is kept in the queue; when the acknowledgment (ack) for the packet returns, the corresponding entry is removed from the queue; go to Step6;

Step6: When the client has finished writing data, it calls the close() method of the stream. Before the master node NameNode is notified, the flush operation is invoked to ensure that any data not yet transmitted is written to the DataNodes; go to Step7;

Step7: Writing is complete and the master node is notified; the master node maintains the attribute information of the file, which is used by subsequent operations. This concludes a complete write operation.
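The packet queue, three-stage pipeline, and ack queue of Step3 to Step5 can be sketched as a small in-memory simulation. It does not use the Hadoop API: the `DataNode` class, the `write_file` function, the 4-byte packet size, and the synchronous acknowledgment are assumptions chosen to keep the example short.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

PACKET_SIZE = 4  # bytes per packet; unrealistically small, for illustration only


class DataNode:
    """Hypothetical data node that stores packets and forwards them downstream."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stored = bytearray()
        self.downstream: Optional["DataNode"] = None

    def receive(self, seq: int, payload: bytes) -> int:
        """Store the packet, pass it down the pipeline, then acknowledge it."""
        self.stored.extend(payload)
        if self.downstream is not None:
            self.downstream.receive(seq, payload)
        return seq  # the ack travels back to the writer


def write_file(data: bytes, pipeline: List[DataNode]) -> int:
    """Split data into packets, push them through the pipeline, track acks."""
    # Chain the nodes: first -> second -> third.
    for upstream, downstream in zip(pipeline, pipeline[1:]):
        upstream.downstream = downstream

    data_queue: Deque[Tuple[int, bytes]] = deque(
        (seq, data[start:start + PACKET_SIZE])
        for seq, start in enumerate(range(0, len(data), PACKET_SIZE))
    )
    ack_queue: Deque[int] = deque()

    packets_sent = 0
    while data_queue:
        seq, payload = data_queue.popleft()
        ack_queue.append(seq)              # keep an entry until the ack returns
        acked_seq = pipeline[0].receive(seq, payload)
        ack_queue.remove(acked_seq)        # ack received: drop the entry
        packets_sent += 1

    if ack_queue:
        raise RuntimeError("unacknowledged packets remain")
    return packets_sent


# Hypothetical usage: three home gateways acting as the pipeline.
nodes = [DataNode("gw-1"), DataNode("gw-2"), DataNode("gw-3")]
sent = write_file(b"digital-home", nodes)
```

In a real deployment acknowledgments arrive asynchronously, so the ack queue can hold several entries at once; the synchronous call here keeps at most one.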

The read operation is described in detail below and is illustrated in Fig. 5. The flow is as follows:

Step1: The client opens the file to be read; go to Step2;

Step2: The distributed file system calls the master node through RPC to determine where the blocks of the file are located. For each data block, the NameNode returns the addresses of the DataNodes that hold the block. The returned DataNodes are sorted and selected according to a measure such as their distance from the client or the network state, and the node most favorable to data transfer is chosen as the data source. The file system creates an input stream object DFSInputStream for this read operation; go to Step3;

Step3: The client calls the read() method to start reading data; go to Step4;

Step4: The input stream object DFSInputStream, which initially holds the DataNode addresses of the first few data blocks, first connects to the nearest DataNode. The client then calls read() repeatedly and reads data from the DataNode in a streaming manner; go to Step5;

Step5: When the end of a block is reached, DFSInputStream closes the connection to the current DataNode and then looks up the best DataNode for the next block; go to Step6;

Step6: After reading finishes, the input stream is closed and the object is released; end.
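The read flow above can be sketched as a small in-memory simulation: block locations are obtained once, the replicas of each block are ordered by distance from the client, and each block is read from the closest one. This is not the Hadoop API; `NameNode.get_block_locations`, `read_file`, the gateway names, and the integer distances are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BlockLocation:
    block_id: int
    datanodes: List[str]  # names of the data nodes holding a replica


class NameNode:
    """Hypothetical master node: knows block locations, never serves file data."""

    def __init__(self, files: Dict[str, List[BlockLocation]],
                 distance: Dict[str, int]) -> None:
        self._files = files
        self._distance = distance  # assumed distance of each data node from the client

    def get_block_locations(self, path: str) -> List[BlockLocation]:
        """Return each block with its replicas sorted nearest-first."""
        return [
            BlockLocation(loc.block_id,
                          sorted(loc.datanodes, key=self._distance.__getitem__))
            for loc in self._files[path]
        ]


def read_file(path: str, namenode: NameNode,
              datanodes: Dict[str, Dict[int, bytes]]) -> Tuple[bytes, List[str]]:
    """Read block by block, each time from the best (nearest) data node."""
    chunks: List[bytes] = []
    contacted: List[str] = []
    for loc in namenode.get_block_locations(path):  # control communication
        best = loc.datanodes[0]
        contacted.append(best)
        chunks.append(datanodes[best][loc.block_id])  # data communication
    return b"".join(chunks), contacted


# Hypothetical cluster of three home gateways.
distance = {"gw-a": 3, "gw-b": 1, "gw-c": 2}
namenode = NameNode(
    {"/video/clip": [BlockLocation(0, ["gw-a", "gw-b"]),
                     BlockLocation(1, ["gw-a", "gw-c"])]},
    distance,
)
datanodes = {
    "gw-a": {0: b"hello ", 1: b"world"},
    "gw-b": {0: b"hello "},
    "gw-c": {1: b"world"},
}
data, contacted = read_file("/video/clip", namenode, datanodes)
```

The sketch also shows the split the description relies on: the master node is consulted only for locations, while file bytes come directly from the data nodes.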

User access exhibits locality: recently accessed data is likely to be read again soon. To improve access efficiency, data that has been read can be cached at the client, so that it can be obtained directly when it is read again.

Fig. 6 shows the client with a cache design, in which a cache module is added inside the client. The client provides data services to the applications above it. Data communication takes place between the client and the DataNodes, and control communication takes place between the client and the master node.
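A client-side cache of this kind can be sketched as follows, assuming the least-recently-used replacement policy that the description adopts. The `CachingClient` class, its `fetch` callback (standing in for a read from a data node), and the capacity of two entries are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, List


class CachingClient:
    """Hypothetical client that caches whole reads with LRU eviction."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 2) -> None:
        self._fetch = fetch          # called only on a cache miss
        self._capacity = capacity
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()

    def read(self, path: str) -> bytes:
        if path in self._cache:
            self._cache.move_to_end(path)    # mark as most recently used
            return self._cache[path]
        data = self._fetch(path)             # miss: go to the data node
        self._cache[path] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return data


# Hypothetical usage: record which paths actually reach the remote side.
remote_calls: List[str] = []


def fetch_from_datanode(path: str) -> bytes:
    remote_calls.append(path)
    return path.encode("utf-8")


client = CachingClient(fetch_from_datanode, capacity=2)
for p in ["a", "b", "a", "c", "b"]:
    client.read(p)
```

In this run the second read of "a" is served from the cache, while "b" is evicted when "c" arrives and must be fetched again.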

It should be noted that many replacement algorithms are available for the cache design. The present invention adopts the least-recently-used (LRU) algorithm, which evicts the data that has gone unaccessed for the longest time in the recent period.

In summary, content is organized according to JSR-170, a cross-platform, standardized content repository specification. This takes full advantage of the cross-platform nature of the Java language and the independence of the standard itself, so that the content storage module of the system is cross-platform, easy to port, easy to upgrade, and readily extensible. The invention adopts the distributed computing framework Hadoop and the distributed storage framework HDFS that it uses, and makes full use of devices in the digital home, such as home gateways and smart boxes, as the compute cluster and storage cluster. Hadoop tolerates hardware and network failures and provides highly redundant storage and computation, which suits the home network environment. It also joins all the devices of the digital home into a true network, so that application services such as mass data and multimedia interaction can be better provided to the user.

A person of ordinary skill in the art will appreciate that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium can include read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, and the like.

The method for reading digital home content data under distributed storage provided by the embodiments of the present invention has been described in detail above. Specific examples have been used to set out the principle and embodiments of the present invention, and the description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. A person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the present invention.

Claims

1. A method for reading digital home content data under distributed storage, characterized in that the method comprises:

A user opens the file to be read through a client;

The client calls the read() method to start reading data;

After reading finishes, the input stream is closed and the object is released.

2. the method for the digital home's content read data based under distributed storage as claimed in claim 1, is characterized in that, described distributed file system is called main controlled node by remote procedure call and determined that the position at blocks of files place comprises:

3. the method for the digital home's content read data based under distributed storage as claimed in claim 2, is characterized in that, in described client by the cache read word home content data of peeking.

4. the method for the digital home's content read data based under distributed storage as claimed in claim 3, is characterized in that, what between described client and back end DataNode, carry out is data communication, and between main controlled node, carry out be control communication.

5. the method for the digital home's content read data based under distributed storage as claimed in claim 4, is characterized in that, described distributed storage mode is for adopting JaVa content repository JSR-170 to provide main calling interface for top service layer; Improvement for Content Management System to Hadoop and the increase of correlation function.