CN101645919B

CN101645919B - Popularity-based duplicate rating calculation method and duplicate placement method

Info

Publication number: CN101645919B
Application number: CN200910081302A
Authority: CN
Inventors: 王劲林; 尤佳莉; 齐向东; 邓浩江; 王玲芳
Original assignee: Institute of Acoustics CAS
Current assignee: Zhengzhou Xinrand Network Technology Co ltd
Priority date: 2009-04-01
Filing date: 2009-04-01
Publication date: 2012-10-17
Anticipated expiration: 2029-04-01
Also published as: CN101645919A

Abstract

The invention provides a popularity-based duplicate rating calculation method and a duplicate placement method applied in distributed networks. The duplicate rating calculation method comprises the steps of: obtaining popularity information of a file f in the local region; calculating the access probability of the file; obtaining the access probability values of the most popular files among the files distributed in the current region; and calculating the duplicate rating number L of the file. The duplicate placement method builds on the result of the rating calculation: the home node of the current file is found according to the DHT routing algorithm; the file, its duplicate rating, and other related information are downloaded to the home node; and all nodes whose IDs match the home node's ID in the first L bits are found through the home node's routing table, and the file is replicated onto those nodes. By analyzing and calculating the popularity of data files, the methods obtain the duplicate rating number and the corresponding placement positions of each file in a structured P2P network and place data in the network rationally and effectively, thereby reducing user access delay and improving system performance.

Description

Popularity-based duplicate rating calculation method and duplicate placement method

Technical field

The present invention relates to the field of information network technology, and in particular to a popularity-based duplicate rating calculation method and duplicate placement method applied in distributed networks.

Background art

In a distributed system, no central server node stores all of the content; the data are dispersed across the nodes of the network. Many applications therefore store multiple copies of the data on different nodes, so that users can obtain the information they need from a location near them as quickly as possible. This approach can significantly reduce data transfer delay, relieve network congestion, and improve response speed and quality of service. In applications that have an origin server, it also reduces the load on that server and lowers network operating costs, a "win-win" for network operators and users.

For example, a structured P2P network, one form of such a distributed system, is a network in which all nodes are organized into a fixed topology according to certain rules. Each node has a fixed node degree, i.e., the number of neighbors it maintains, and each neighbor is carefully selected so that routing and lookup between nodes are guaranteed. A distinguishing feature of structured P2P networks is self-organization: under dynamic conditions such as nodes joining or failing, the topology is maintained effectively and the routing performance of the network is preserved. This form of organization is widely used in distributed network systems.

Fig. 2 is a schematic diagram of replica placement in a structured P2P network. As shown in Fig. 2, in a structured P2P network each node obtains a unique ID value through a hash algorithm; likewise, every application entity (e.g., an object) obtains, through the same hash algorithm, an objectID value in the same numerical space as the node IDs. Each object also has a unique placement node, called the object's home node. Locating an object means finding its home node; the lookup proceeds hop by hop, with the initiating node querying neighbor nodes as intermediaries, until the target is found or the lookup fails.
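
The ID assignment and home-node rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 truncated to its leading bits stands in for the unspecified "hash algorithm", and the home node is taken to be the node whose ID is numerically closest to the object ID on a circular ID space, as in Pastry. The names `hash_id` and `home_node` are hypothetical.

```python
import hashlib


def hash_id(name: str, bits: int = 128) -> int:
    """Map a node address or object name into a bits-wide ID space.

    Assumption: SHA-256 truncated to its leading `bits` bits (bits <= 256).
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def home_node(object_id: int, node_ids: list[int], bits: int = 128) -> int:
    """Return the node ID numerically closest to object_id on the circular ID space."""
    space = 1 << bits

    def ring_distance(node_id: int) -> int:
        forward = (node_id - object_id) % space
        return min(forward, space - forward)

    return min(node_ids, key=ring_distance)


# Nodes and objects share one ID space, so an object maps directly to a node.
node_ids = [hash_id(f"node-{i}") for i in range(8)]
print(hex(home_node(hash_id("movie-042.mp4"), node_ids)))
```

Using one hash for both nodes and objects is what lets any node compute, without a central index, which node is responsible for a given object.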

Structured P2P networks are also called distributed hash tables (DHTs), and the resource-location process above is the common method. In practice, different topologies have been proposed, each with a corresponding DHT algorithm; commonly used ones include CAN, Chord, Kademlia, Pastry, and Tapestry. In a DHT network, suppose hash values occupy an M-bit ID space. Comparing IDs from left to right, the set of nodes whose IDs match an object's ID in at least the first l bits is called a level, namely level l. If an object's level is l, a lookup can find the object within l routing hops, so copies of the object must be placed on all nodes in the region that level l covers. For example, with Pastry as the routing protocol, an object at level l corresponds to a wedge-shaped region of nodes, all of which must hold a copy of the object; if the network has N nodes in total and the radix is b, then N/b^l nodes need to hold copies. Computing the level value for each object is therefore the key to replica placement.
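
To make the level definition and the N/b^l count concrete, the sketch below treats IDs as fixed-length strings of base-b digits (b = 2 gives the bit-level matching used in this text; Pastry typically uses base-16 digits). The helper names are hypothetical, and N/b^l is an expected value that assumes node IDs are spread uniformly over the ID space.

```python
def shared_prefix_digits(a: int, b: int, base: int, num_digits: int) -> int:
    """Count the leading base-`base` digits that IDs a and b have in common."""
    shared = 0
    for position in range(num_digits - 1, -1, -1):  # most significant digit first
        if (a // base**position) % base != (b // base**position) % base:
            break
        shared += 1
    return shared


def expected_replica_count(total_nodes: int, base: int, level: int) -> float:
    """Expected number of nodes sharing `level` leading digits with an object: N / b**l."""
    return total_nodes / base**level


print(shared_prefix_digits(0x3A7F, 0x3A10, base=16, num_digits=4))  # 2
print(expected_replica_count(4096, base=16, level=2))               # 16.0
```

Each additional level divides the replica region by the radix, which is why a single integer level per object is enough to control how widely it is copied.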

Another form of distributed system is the content distribution/delivery network (CDN), a typical application that places multiple copies of files. A CDN adds a new layer of network architecture on top of the existing Internet and uses techniques such as distributed caching/replication, load balancing, traffic engineering, and client redirection to publish content at the network edge close to users. With the growth of multimedia applications, CDN technology is commonly used to deliver media content; it effectively improves user experience and is receiving increasing attention.

Most existing duplicate rating calculations compute the duplicate ratings of all files in the network by global optimization under given delay or bandwidth-usage constraints. Although the result can make system performance optimal, the optimization algorithm must operate on global information, which is extremely time- and resource-consuming. In addition, whenever a new file is distributed into the system, the duplicate ratings of all files must be re-optimized, which does not meet practical engineering requirements.

Summary of the invention

The present invention provides a popularity-based duplicate rating calculation method and duplicate placement method applied in distributed networks. By analyzing the popularity of data files (for example, differences in the number of user on-demand requests or downloads), the invention obtains the duplicate rating number of each file and its corresponding placement positions in a structured P2P network, so that data are placed in the network rationally and effectively, reducing user access delay and improving system performance.

To achieve the above object, the popularity-based duplicate rating calculation method of the present invention comprises the following steps:

a) Obtain the popularity information of file f in the local region;

b) Compute the access probability prob_f of file f;

c) Among the files distributed in the current region, obtain the access probability values of the N most popular files, i.e., the Top-N files (N may be set empirically and is typically a value between 1 and 100), and let the maximum probability value maxProb equal the mean of the Top-N probabilities;

d) Compute the duplicate rating L of file f by the following formula:

L = \begin{cases} M - \dfrac{\mathit{prob}}{\mathit{maxProb}} \times M, & \mathit{prob} \le \mathit{maxProb} \\ 0, & \mathit{prob} > \mathit{maxProb} \end{cases}

where M is the total number of levels, equal to the number of bits in the hash ID space of the distributed hash table (DHT) network.
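
The level formula translates directly into a small function. The text does not say how a fractional L becomes an integer level, so this sketch assumes rounding to the nearest integer; `duplicate_rating` is a hypothetical name.

```python
import math


def duplicate_rating(prob: float, max_prob: float, m: int) -> int:
    """Duplicate rating L of a file with access probability `prob`.

    L = M - (prob / maxProb) * M when prob <= maxProb, otherwise 0.
    Rounding a fractional L to the nearest integer is an assumption.
    """
    if max_prob <= 0:
        raise ValueError("max_prob must be positive")
    if prob > max_prob:
        return 0  # at least as popular as the Top-N average: lowest level
    return math.floor(m - (prob / max_prob) * m + 0.5)


print(duplicate_rating(0.06, 0.55, 128))  # 114
```

A lower L means fewer leading bits must match the home node's ID, so more nodes hold a copy; a file at L = 0 matches every node, while a file with probability close to 0 gets L close to M and is effectively kept only near its home node.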

In addition, in step b) of the above popularity-based duplicate rating calculation method, if prob(x) is a known file access probability function, the access probability of file f is calculated by the following formula:

prob_f = prob(rank_f)

where rank_f is the popularity rank of file f over the entire region.
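
The lookup prob_f = prob(rank_f) can be sketched as below, assuming popularity is available as a per-file score (for example, on-demand request counts) and that ties are broken by file name, which the text does not specify. The `1.0 / x` function in the usage line is a hypothetical placeholder, not the access probability function of this document.

```python
from typing import Callable


def access_probability(popularity: dict[str, float], file_name: str,
                       prob: Callable[[int], float]) -> float:
    """Return prob_f = prob(rank_f), where rank 1 is the most popular file."""
    # Sort by descending popularity; ties broken by name (an assumption).
    ordered = sorted(popularity, key=lambda name: (-popularity[name], name))
    rank_f = ordered.index(file_name) + 1
    return prob(rank_f)


popularity = {"a": 5, "b": 12, "c": 7}  # e.g. on-demand request counts
print(access_probability(popularity, "a", prob=lambda x: 1.0 / x))  # rank_f = 3
```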

In addition, the popularity-based duplicate placement method provided by the present invention builds on the above popularity-based duplicate rating calculation method and comprises the following steps:

a) Obtain the popularity information of file f in the local region;

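b) Compute the access probability prob_f of file f;

c) Among the files distributed in the current region, obtain the access probability values of the Top-N most popular files, and let the maximum probability value maxProb equal their mean;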

d) Compute the duplicate rating L of file f by the following formula:

L = \begin{cases} M - \dfrac{\mathit{prob}}{\mathit{maxProb}} \times M, & \mathit{prob} \le \mathit{maxProb} \\ 0, & \mathit{prob} > \mathit{maxProb} \end{cases}

where M is the total number of levels, equal to the number of bits in the hash ID space of the distributed hash table network;

e) After obtaining the above L value, find the home node of the current file according to the DHT routing algorithm;

f) The home node downloads the file and related information, such as the corresponding duplicate rating parameter L, from the data source to local storage;

g) After the home node has obtained the data, it checks the duplicate rating required for placing the file, finds through its routing table all nodes whose IDs match the home node's ID in the first L bits, and copies the file to these nodes, completing replica placement.
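
Steps e) and g) can be illustrated with a simplified, centralized sketch that sees the full list of node IDs instead of walking routing tables; step f), the download from the data source, is not modeled. The function names are hypothetical, and "numerically closest on a circular ID space" is an assumed home-node rule (as in Pastry).

```python
def prefix_bits_match(a: int, b: int, length: int, bits: int) -> bool:
    """True if the leading `length` bits of two bits-wide IDs are equal."""
    shift = bits - length
    return (a >> shift) == (b >> shift)


def place_replicas(file_id: int, level: int, node_ids: list[int], bits: int):
    """Return (home node, replica nodes) for a file whose duplicate rating is `level`."""
    space = 1 << bits
    # Step e): the home node is the node whose ID is closest to the file ID.
    home = min(node_ids,
               key=lambda n: min((n - file_id) % space, (file_id - n) % space))
    # Step g): every other node matching the home node's ID in the first `level` bits.
    replicas = [n for n in node_ids
                if n != home and prefix_bits_match(n, home, level, bits)]
    return home, replicas


nodes = [0b0001, 0b0110, 0b0111, 0b1100]
print(place_replicas(0b0101, level=2, node_ids=nodes, bits=4))  # (6, [7])
```

With level 0 every node matches and the file is copied everywhere; with level equal to the ID length only the home node keeps it.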

The popularity-based duplicate rating calculation method and duplicate placement method of the present invention, applied in a distributed network, have the following beneficial effects: based on file popularity information, the duplicate rating number at which data copies need to be stored in the distributed network can be computed conveniently and effectively, and the copies placed accordingly, so that different applications can provide different levels of service for different files. The method is computationally simple and has low resource usage, and it has broad application prospects in practical engineering.

Description of drawings

Fig. 1 is a schematic diagram of an example node composition of a content distribution network.

Fig. 2 is a schematic diagram of replica placement in a structured P2P network.

Fig. 3 is the flow chart of the popularity-based duplicate rating calculation method of the present invention.

Fig. 4 is the flow chart of the popularity-based duplicate placement method of the present invention.

Embodiment

The popularity-based duplicate rating calculation method and duplicate placement method of the present invention are further described below with reference to the drawings and specific embodiments.

Fig. 1 is a schematic diagram of an example node composition of a content distribution network. Here, the CDN is assumed to have a two-layer structure comprising a management layer and a replica placement layer. The management layer is mainly responsible for maintaining the index of all files and for the computation required for content distribution; the replica placement layer is responsible for storing the data copies. The management layer of the CDN consists of fully meshed servers, and the replica placement layer organizes all of its nodes with the Pastry routing protocol; each node is connected to at least one management node in the management layer. The management nodes store information such as the popularity of all files and are mainly responsible for distributing media files.

The popularity of a file is the degree to which users favor it, usually expressed as the frequency with which users access it. Based on differences in file popularity (for example, differences in the number of user on-demand requests or downloads), the present invention computes the duplicate rating number of each file in a structured P2P network and places the data in the network rationally and effectively, improving the service performance of related applications and the user experience.

Fig. 3 is the flow chart of the popularity-based duplicate rating calculation method of the present invention. The method comprises the following steps:

a) Obtain the popularity information of file f in the local region:

The lower a file's level number, the fewer hops are needed to find one of its copies, and the easier the file is for users to access. The level each file requires is therefore related to its popularity. Suppose the popularity of file f is pop; during distribution, the rank rank_f of this file's popularity over the entire region (ordered from highest to lowest popularity) must first be obtained.

b) Compute the access probability prob_f of file f:

Suppose the known file access probability function is prob(x), where x is a file's popularity rank over the entire region; then the access probability of file f is prob_f = prob(rank_f). The file access probability function can be estimated from historical access records; since this has been described in many related publications, it is not repeated here.

c) Nodes are organized differently in different applications. By storing the information on a management server, or by having a designated node collect it periodically, the access probability values of the N most popular files (the Top-N files) among the files distributed in the current region can be obtained (N may be set empirically and is typically a value between 1 and 100). Let the maximum probability value maxProb equal the mean of the Top-N probabilities;

d) Compute the duplicate rating L of file f by the following formula:

L = \begin{cases} M - \dfrac{\mathit{prob}}{\mathit{maxProb}} \times M, & \mathit{prob} \le \mathit{maxProb} \\ 0, & \mathit{prob} > \mathit{maxProb} \end{cases}
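
Steps b) and c) depend on an access probability function that the text leaves to the literature. One simple possibility, assumed here rather than taken from this document, is the empirical share of accesses per popularity rank computed from a history log; maxProb is then the mean over ranks 1 to N. The function names are hypothetical.

```python
from collections import Counter
from typing import Callable


def empirical_prob_function(access_log: list[str]) -> Callable[[int], float]:
    """Estimate prob(x): the share of logged accesses that went to the rank-x file."""
    counts = Counter(access_log)
    total = sum(counts.values())
    by_rank = sorted(counts.values(), reverse=True)  # by_rank[0] is rank 1

    def prob(rank: int) -> float:
        if 1 <= rank <= len(by_rank):
            return by_rank[rank - 1] / total
        return 0.0  # ranks beyond the logged files get probability 0

    return prob


def max_prob_top_n(prob: Callable[[int], float], n: int) -> float:
    """maxProb = mean access probability of the Top-N files (ranks 1..n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sum(prob(rank) for rank in range(1, n + 1)) / n


log = ["a", "b", "a", "c", "a", "b"]
prob = empirical_prob_function(log)
print(prob(1), max_prob_top_n(prob, 2))
```

The resulting prob(rank_f) and maxProb are the two inputs that the formula in step d) combines with M.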

In addition, Fig. 2 is a schematic diagram of replica placement in a structured P2P network, in which nodes filled with "*" are home nodes and nodes filled with "/" are replica placement nodes. Fig. 4 is the flow chart of the popularity-based duplicate placement method of the present invention. The method is based on the result of the above popularity-based duplicate rating calculation method and comprises the following steps:

a) Obtain the popularity information of file f in the local region;

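b) Compute the access probability prob_f of file f;

c) Among the files distributed in the current region, obtain the access probability values of the Top-N most popular files, and let the maximum probability value maxProb equal their mean;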

d) Compute the duplicate rating L of file f by the following formula:

L = \begin{cases} M - \dfrac{\mathit{prob}}{\mathit{maxProb}} \times M, & \mathit{prob} \le \mathit{maxProb} \\ 0, & \mathit{prob} > \mathit{maxProb} \end{cases}

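e) After obtaining the L value, find the home node of the current file according to the DHT routing algorithm;

f) The home node downloads the file and related information, such as the corresponding duplicate rating parameter L, from the data source or server to local storage;

g) Through its routing table, the home node finds all nodes whose IDs match its own ID in the first L bits and copies the file to these nodes.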

In step g), the home node sends query requests to the nodes in its routing table, looking for nodes that match its own ID in L bits. Each node that receives a query request likewise sends query requests to the nodes in its own routing table, until the TTL (time-to-live) hop limit is reached. Information on all qualifying nodes is returned to the home node to form the replica node set, and the file is then copied to each node in that set.
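
The TTL-bounded query in step g) can be simulated over an in-memory map of routing tables. This is a sketch under assumptions: routing tables are given as a plain dict, every node that receives the query forwards it (as the text describes), a visited set prevents duplicate queries, and `collect_replica_set` is a hypothetical name.

```python
from collections import deque


def collect_replica_set(home: int, routing_tables: dict[int, list[int]],
                        level: int, bits: int, ttl: int) -> set[int]:
    """Return nodes reached within `ttl` hops whose IDs share `level` leading bits with home."""
    shift = bits - level
    replica_set: set[int] = set()
    visited = {home}
    frontier = deque([(home, 0)])  # (node, hops taken so far)
    while frontier:
        node, hops = frontier.popleft()
        if hops >= ttl:
            continue  # TTL exhausted: this node does not forward the query
        for neighbour in routing_tables.get(node, []):
            if neighbour in visited:
                continue
            visited.add(neighbour)
            if (neighbour >> shift) == (home >> shift):
                replica_set.add(neighbour)  # matching node reports back to home
            frontier.append((neighbour, hops + 1))
    return replica_set


tables = {0b0100: [0b0101, 0b1000], 0b0101: [0b0110, 0b1100],
          0b1000: [0b0111], 0b0110: [0b0111]}
print(sorted(collect_replica_set(0b0100, tables, level=2, bits=4, ttl=2)))  # [5, 6, 7]
```

In this toy network node 0b0111 is reached only through the non-matching node 0b1000, which shows why non-matching nodes also forward the query; a larger TTL finds more of the replica region at the cost of more messages.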

Embodiment

This embodiment illustrates the method using a content distribution network (CDN) as an example. Fig. 1 is a schematic diagram of an example node composition of a content distribution network. As above, the CDN is assumed to have a two-layer structure: a management layer of fully meshed servers, which maintains the index and popularity information of all files, performs the computation required for content distribution, and mainly distributes media files; and a replica placement layer, which stores the data copies and whose nodes are organized with the Pastry routing protocol, each node being connected to at least one management node.

Suppose node IDs in the replica placement layer are 128 bits long, 1000 files in total are to be distributed in the network, file f ranks 50th in popularity among all files, and the file access probability function is

with x = 1, 2, ..., N, where x denotes the rank. The content management node reports that the access probabilities of the 5 most-accessed files are {1, 0.60, 0.45, 0.36, 0.31}; therefore maxProb = (1 + 0.60 + 0.45 + 0.36 + 0.31)/5 = 2.72/5 = 0.544. In addition, file f has prob = 0.06, so L = 128 - (prob/maxProb) × 128 = 128 - (0.06/0.544) × 128 ≈ 113.9, which rounds to L = 114.

Fig. 2 is a schematic diagram of replica placement in a structured P2P network, in which nodes filled with "*" are home nodes and nodes filled with "/" are replica placement nodes. If the duplicate rating of a data item is L_1, copies of the data must be placed on all nodes whose IDs match the home node's ID in the first L_1 bits. Therefore, after the L value of file f is obtained, the Pastry routing algorithm is used to find the node hn_f whose ID is closest to the ID of file f in the ID space, and hn_f serves as the home node of f. hn_f downloads the file and its duplicate rating information from the data source; at the same time, through its routing table, hn_f finds all nodes whose IDs match its own ID in the first 114 bits and places the data on those nodes.

Note: the remaining content of this document can be implemented by those of ordinary skill in the art and is not described in further detail here.

Claims

1. A popularity-based duplicate rating calculation method, characterized by comprising the following steps:

a) obtaining the popularity information of file f in the local region;

b) calculating the access probability prob_f of file f;

c) obtaining, among the files distributed in the current region, the access probability values of the N most popular files, and letting the maximum probability value maxProb equal the mean of the access probabilities of those N most popular files;

d) calculating the duplicate rating L of file f by the following formula:

L = \begin{cases} M - \dfrac{\mathit{prob}}{\mathit{maxProb}} \times M, & \mathit{prob} \le \mathit{maxProb} \\ 0, & \mathit{prob} > \mathit{maxProb} \end{cases}

where M is the total number of levels, i.e., the number of bits in the hash ID space of the distributed hash table network, and prob is the access probability value of the file.

2. The popularity-based duplicate rating calculation method according to claim 1, characterized in that, in step b), if prob(x) is a known file access probability function, the access probability of file f is calculated by the following formula:

prob_f = prob(rank_f)

where rank_f is the popularity rank of file f over the entire region.

3. A popularity-based duplicate placement method, comprising the following steps:

a) obtaining the popularity information of file f in the local region;

b) calculating the access probability prob_f of file f;

d) calculating the duplicate rating L of file f by the following formula:

L = \begin{cases} M - \dfrac{\mathit{prob}}{\mathit{maxProb}} \times M, & \mathit{prob} \le \mathit{maxProb} \\ 0, & \mathit{prob} > \mathit{maxProb} \end{cases}

where M is the total number of levels, equal to the number of bits in the hash ID space of the distributed hash table network, and prob is the access probability value of the file;

e) after obtaining the above L value, finding the home node of the current file according to the distributed hash table routing algorithm;

4. The popularity-based duplicate placement method according to claim 3, characterized in that, in step b), if prob(x) is a known file access probability function, the access probability of file f is calculated by the following formula:

prob_f = prob(rank_f)

where rank_f is the popularity rank of file f over the entire region.