CN101645919B - Popularity-based duplicate rating calculation method and duplicate placement method - Google Patents

Popularity-based duplicate rating calculation method and duplicate placement method

Info

Publication number
CN101645919B
Authority
CN
China
Prior art keywords
file
prob
popularity
access probability
duplicate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN200910081302A
Other languages
Chinese (zh)
Other versions
CN101645919A (en
Inventor
王劲林
尤佳莉
齐向东
邓浩江
王玲芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Xinrand Network Technology Co ltd
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN200910081302A priority Critical patent/CN101645919B/en
Publication of CN101645919A publication Critical patent/CN101645919A/en
Application granted granted Critical
Publication of CN101645919B publication Critical patent/CN101645919B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention provides a popularity-based duplicate rating calculation method and a duplicate placement method for use in a distributed network. The duplicate rating calculation method comprises the steps of: obtaining popularity information of a file f in the local region; calculating the access probability of the file; obtaining the access probability values of the most popular files among the files distributed in the current region; and calculating the duplicate rating number L of the file. The duplicate placement method builds on the result of the rating calculation: the home node (main node) of the current file is found according to the DHT routing algorithm; the home node downloads the file, its duplicate rating, and other related information to local storage; and all nodes whose IDs match the home node's ID in L bits are found through the home node's routing table, and the file is replicated on those nodes. By analyzing the popularity of a data file, the methods obtain its duplicate rating number and the corresponding placement positions in a structured P2P network and place data in the network reasonably and effectively, thereby reducing user access delay and improving system performance.

Description

A popularity-based duplicate rating calculation method and duplicate placement method
Technical field
The present invention relates to the field of information network technology, and in particular to a popularity-based duplicate rating calculation method and duplicate placement method for use in a distributed network.
Background
In a distributed system, no central service node stores all the content; the data is spread across the nodes of the network. In many applications, therefore, multiple replicas of the data are stored on different nodes so that users can obtain the information they need from a nearby location as quickly as possible. This approach can significantly reduce data transfer delay, relieve network congestion, and improve response speed and quality of service. In applications that have an origin server, it also reduces the load on that server and lowers network operating costs, a win-win for network operators and users.
For example, a structured P2P network, one form of such a distributed system, is a network in which all nodes are organized according to certain rules into a fixed topology. Each node has a fixed node degree, that is, the number of neighbors it keeps in contact with, and each neighbor is carefully selected so that routing between nodes is guaranteed and lookups can succeed. The distinguishing feature of a structured P2P network is self-organization: when nodes join or fail, the topology is maintained effectively and the routing performance of the network is preserved. This form of organization is widely used in distributed network systems.
Fig. 2 is a schematic diagram of replica placement in a structured P2P network. As shown in Fig. 2, each node in a structured P2P network obtains a unique ID through a hash algorithm. Likewise, every application entity (such as an object) obtains, through the same hash algorithm, an objectID in the same numerical space as the node IDs, and is assigned a unique placement node, which is called the home node of the object. Looking up an object means locating its home node: the initiating node queries its neighbor nodes as intermediaries and proceeds hop by hop until the target is found or the lookup fails.
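The idea that node IDs and object IDs come from the same hash function and share one numerical space can be sketched as follows. This is a minimal illustration only: the text does not name the hash algorithm, so SHA-256 truncated to 128 bits, the function name `to_id`, and the sample names are all assumptions.

```python
import hashlib

ID_BITS = 128  # assumed ID width; not fixed by the text at this point


def to_id(name: str, bits: int = ID_BITS) -> int:
    """Map a node address or an object name to an integer ID of `bits` bits.

    The hash function is an assumption (SHA-256); any uniform hash would do.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Keep only the leading `bits` bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


# Hypothetical names: a node and an object land in the same ID space.
node_id = to_id("node-10.0.0.7:4000")
object_id = to_id("movie-0050.mp4")
```

Because both kinds of ID live in the same space, an object can be assigned to a node simply by comparing IDs.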
A structured P2P network is also called a distributed hash table (DHT), and the resource location process above is the common approach. In practice, different topologies have been proposed, each with its own DHT algorithm; commonly used ones include CAN, Chord, Kademlia, Pastry, and Tapestry. In a DHT network, suppose the ID space of the hash values is M bits wide. Reading from left to right, all IDs that match in at least l bits form a grade, i.e. a level. If the level of an object is l, the object can be found within l routing hops; the object therefore needs to be backed up on all nodes beyond the coverage of those l hops. For example, with Pastry as the routing protocol, an object of level l corresponds to a wedge-shaped node region, and every node in that region must back up the object. If the network has N nodes in total and the radix is b, then N/b^l nodes need to hold a backup. How to calculate the level value of each object is therefore the key to replica placement.
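The N/b^l estimate can be checked with a few lines of arithmetic. The sketch below assumes node IDs are uniformly distributed and treats l as a count of base-b digits; the function name and the sample numbers are illustrative and do not come from the text.

```python
def expected_replicas(total_nodes: int, radix: int, level: int) -> float:
    """Expected number of nodes sharing a `level`-digit ID prefix: N / b**l.

    Assumes uniformly distributed IDs, so each additional matching digit
    shrinks the candidate region by a factor of `radix`.
    """
    if total_nodes < 0 or radix < 2 or level < 0:
        raise ValueError("need total_nodes >= 0, radix >= 2, level >= 0")
    return total_nodes / radix ** level


# Hypothetical network: 10000 nodes, hexadecimal digits (b = 16).
all_nodes = expected_replicas(10000, 16, 0)   # level 0: every node
two_digits = expected_replicas(10000, 16, 2)  # level 2: N / 256
```

A lower level therefore means a larger region and more replicas, which is why the level chosen for each file controls its replication cost.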
A content distribution network (Content Distribution/Delivery Network, CDN), another form of distributed system, is a typical application that places multiple replicas of files. It mainly adds a new layer of network architecture on top of the existing Internet and uses techniques such as distributed caching and replication, load balancing, traffic engineering, and client redirection to publish content to the network edge close to users. With the growth of multimedia applications, CDN technology is often used to deliver media content; it effectively improves the user experience and has attracted increasing attention.
Most existing duplicate rating calculations compute the duplicate ratings of all files in the network by global optimization under given delay or bandwidth constraints. Although the result can make system performance optimal, the optimization algorithm needs global information, which is extremely time-consuming and resource-intensive. Moreover, whenever a new file is distributed into the system, the duplicate ratings of all files must be re-optimized, which cannot meet practical engineering requirements.
Summary of the invention
The present invention provides a popularity-based duplicate rating calculation method and duplicate placement method for use in a distributed network. By analyzing the popularity of data files (for example, differences in how often users request or download them), the invention obtains each file's duplicate rating number and the corresponding placement positions in a structured P2P network, places data in the network reasonably and effectively, reduces user access delay, and improves system performance.
To achieve the above object, the popularity-based duplicate rating calculation method of the present invention comprises the following steps:
a) obtain the popularity information of file f in the local region;
b) calculate the access probability prob_f of file f;
c) obtain the access probability values of the N most popular files (the Top-N files) among the files distributed in the current region, where N can be set empirically and is typically chosen in the range 1 to 100, and let the maximum probability value maxProb equal the mean of the Top-N probabilities;
d) calculate the duplicate rating number L of file f by the following formula:
$$L = \begin{cases} M - (prob / maxProb) \times M, & prob \le maxProb \\ 0, & prob > maxProb \end{cases}$$
where M is the total number of levels, equal to the total number of bits in the ID space of the hash values in the distributed hash table network, and prob is the access probability prob_f of file f.
In addition, in step b) of the above popularity-based duplicate rating calculation method, if prob(x) denotes an existing file access probability function, the access probability of file f is calculated by the following formula:
prob_f = prob(rank_f)
where rank_f is the popularity rank of file f in the whole region.
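The piecewise formula of step d) translates directly into a small function. The sketch below is one possible reading: the text does not say how a fractional result is turned into a whole number of bits, so rounding to the nearest integer is an assumption, as are the function and parameter names.

```python
def duplicate_rating(prob: float, max_prob: float, total_levels: int) -> int:
    """Duplicate rating L = M - (prob / maxProb) * M, or 0 if prob > maxProb.

    `total_levels` is M, the number of bits in the ID space.
    Rounding to the nearest integer is an assumption, not stated in the text.
    """
    if max_prob <= 0:
        raise ValueError("max_prob must be positive")
    if prob > max_prob:
        return 0  # more popular than the Top-N average: replicate everywhere
    return round(total_levels - (prob / max_prob) * total_levels)


# Illustrative values with M = 128.
half = duplicate_rating(0.5, 1.0, 128)  # 64: match half of the ID bits
hot = duplicate_rating(2.0, 1.0, 128)   # 0: prob exceeds maxProb
cold = duplicate_rating(0.0, 1.0, 128)  # 128: never accessed, full match only
```

A popular file gets a small L, so many nodes match the required prefix and hold a replica; an unpopular file gets a large L and few replicas.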
In addition, the popularity-based duplicate placement method provided by the invention builds on the above popularity-based duplicate rating calculation method and comprises the following steps:
a) obtain the popularity information of file f in the local region;
b) calculate the access probability prob_f of file f;
c) obtain the access probability values of the N most popular files (the Top-N files) among the files distributed in the current region, where N can be set empirically and is typically chosen in the range 1 to 100, and let the maximum probability value maxProb equal the mean of the Top-N probabilities;
d) calculate the duplicate rating number L of file f by the following formula:
$$L = \begin{cases} M - (prob / maxProb) \times M, & prob \le maxProb \\ 0, & prob > maxProb \end{cases}$$
where M is the total number of levels, equal to the total number of bits in the ID space of the hash values in the distributed hash table network;
e) after the L value is obtained, find the home node of the current file according to the DHT routing algorithm;
f) the home node downloads the file, the corresponding duplicate rating parameter L, and other related information from the data source to local storage;
g) after obtaining the data, the home node checks the duplicate rating required for the file, finds through its routing table all nodes whose IDs match its own ID in L bits, and copies the file to these nodes, completing the replica placement.
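Step g) hinges on deciding which nodes "match the home node's ID in L bits". A minimal sketch of that test is given below, reading the match as "at least L leading bits agree", consistent with the level definition in the background section. The function names and the 8-bit sample IDs are illustrative, and the candidate list stands in for whatever the routing table yields.

```python
def shared_prefix_bits(a: int, b: int, id_bits: int) -> int:
    """Number of leading bits (counted from the left) on which two IDs agree."""
    diff = a ^ b
    return id_bits if diff == 0 else id_bits - diff.bit_length()


def replica_nodes(home_id, candidate_ids, level, id_bits):
    """Candidates whose IDs share at least `level` leading bits with the home node."""
    return [
        node_id
        for node_id in candidate_ids
        if node_id != home_id
        and shared_prefix_bits(home_id, node_id, id_bits) >= level
    ]


# Hypothetical 8-bit IDs for readability.
home = 0b10110000
candidates = [0b10110000, 0b10111111, 0b10100000, 0b00110000]
level3 = replica_nodes(home, candidates, 3, 8)  # prefix "101"
level4 = replica_nodes(home, candidates, 4, 8)  # prefix "1011"
```

Raising the level from 3 to 4 drops one candidate, which shows how a larger L narrows the replica set.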
The beneficial effects of the popularity-based duplicate rating calculation method and duplicate placement method of the present invention, as applied in a distributed network, are as follows: based on file popularity information, the duplicate rating number needed to store replicas of the data in the distributed network can be calculated conveniently and effectively, and the replicas placed in the network accordingly, so that different applications can offer different levels of service for different files. The method is computationally simple, uses few resources, and has broad prospects in practical engineering.
Description of drawings
Fig. 1 is a schematic diagram of an example node structure of a content distribution network.
Fig. 2 is a schematic diagram of replica placement in a structured P2P network.
Fig. 3 is a flowchart of the popularity-based duplicate rating calculation method of the present invention.
Fig. 4 is a flowchart of the popularity-based duplicate placement method of the present invention.
Detailed description
The popularity-based duplicate rating calculation method and duplicate placement method of the present invention are further described below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic diagram of an example node structure of a content distribution network. Here, the content distribution network is assumed to have a two-layer structure comprising a management layer and a replica placement layer. The management layer is mainly responsible for maintaining the index of all files and for the computation required for content distribution; the replica placement layer is responsible for backing up data replicas. The management layer of the CDN consists of fully interconnected servers, and the replica placement layer organizes all its nodes through the Pastry routing protocol; each node is connected to at least one management node in the management layer. The management nodes store information such as the popularity of all files and mainly distribute media files.
The popularity of a file is the degree to which the file is welcomed by users, usually expressed as the frequency with which users access it. Based on the differences in popularity between files (for example, differences in how often users request or download them), the present invention calculates each file's duplicate rating number in a structured P2P network and places data in the network reasonably and effectively, improving the service performance of the relevant applications and the user experience.
Fig. 3 is a flowchart of the popularity-based duplicate rating calculation method of the present invention. The method comprises the following steps:
a) Obtain the popularity information of file f in the local region:
The lower the grade (level) number of a file, the fewer hops are needed, which means its replicas are easier to find and more convenient for users to access. The level number required for each file is therefore related to its popularity. Suppose the popularity of file f is pop; when the file is distributed, the rank rank_f of its popularity in the whole region (ordered from high to low popularity) must first be obtained.
b) Calculate the access probability prob_f of file f:
Suppose the existing file access probability function is prob(x), where x is the popularity rank of a file in the whole region; the access probability of file f is then prob_f = prob(rank_f). The file access probability function can be estimated from historical access records; this is covered in much of the related literature and is not repeated here.
c) Nodes are organized differently in different applications. Through information stored on a management server, or through information collected periodically by a designated node, obtain the access probability values of the N most popular files (the Top-N files) among the files distributed in the current region, where N can be set empirically and is typically chosen in the range 1 to 100, and let the maximum probability value maxProb equal the mean of the Top-N probabilities;
d) Calculate the duplicate rating number L of file f by the following formula:
$$L = \begin{cases} M - (prob / maxProb) \times M, & prob \le maxProb \\ 0, & prob > maxProb \end{cases}$$
where M is the total number of levels, equal to the total number of bits in the ID space of the hash values in the distributed hash table network.
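Steps a) to c) can be sketched from raw access counts. The text leaves the estimator of prob(x) to the literature, so the sketch below simply uses relative access frequency, which is an assumption; the function names, the tie-break by file name, and the sample counts are likewise illustrative.

```python
def popularity_ranks(access_counts):
    """Rank files from most to least accessed; rank 1 is the most popular.

    Ties are broken by file name, an arbitrary choice made for determinism.
    """
    ordered = sorted(access_counts, key=lambda f: (-access_counts[f], f))
    return {name: position + 1 for position, name in enumerate(ordered)}


def access_probabilities(access_counts):
    """Estimate each file's access probability as its share of all accesses."""
    total = sum(access_counts.values())
    if total <= 0:
        raise ValueError("no accesses recorded")
    return {name: count / total for name, count in access_counts.items()}


def top_n_mean(probabilities, n):
    """maxProb: mean access probability of the n most popular files."""
    top = sorted(probabilities.values(), reverse=True)[:n]
    if not top:
        raise ValueError("need at least one file and n >= 1")
    return sum(top) / len(top)


# Hypothetical access log summary for four files.
counts = {"a.mp4": 50, "b.mp4": 30, "c.mp4": 15, "d.mp4": 5}
ranks = popularity_ranks(counts)
probs = access_probabilities(counts)
max_prob = top_n_mean(probs, 2)  # mean of the Top-2 probabilities
```

Averaging over the Top-N files rather than taking the single largest probability makes maxProb less sensitive to one exceptionally popular file; any file above that average is assigned L = 0 by step d).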
In addition, Fig. 2 is a schematic diagram of replica placement in a structured P2P network, in which the node filled with "*" is the home node and the nodes filled with "/" are replica placement nodes. Fig. 4 is a flowchart of the popularity-based duplicate placement method of the present invention. The duplicate placement method builds on the result of the above popularity-based duplicate rating calculation and comprises the following steps:
a) obtain the popularity information of file f in the local region;
b) calculate the access probability prob_f of file f;
c) obtain the access probability values of the N most popular files (the Top-N files) among the files distributed in the current region, where N can be set empirically and is typically chosen in the range 1 to 100, and let the maximum probability value maxProb equal the mean of the Top-N probabilities;
d) calculate the duplicate rating number L of file f by the following formula:
$$L = \begin{cases} M - (prob / maxProb) \times M, & prob \le maxProb \\ 0, & prob > maxProb \end{cases}$$
where M is the total number of levels, equal to the total number of bits in the ID space of the hash values in the distributed hash table network;
e) after the L value is obtained, find the home node of the current file according to the DHT routing algorithm;
f) the home node downloads the file, the corresponding duplicate rating parameter L, and other related information from the data source or server to local storage;
g) after obtaining the data, the home node checks the duplicate rating required for the file, finds through its routing table all nodes whose IDs match its own ID in L bits, and copies the file to these nodes, completing the replica placement.
In step g), the home node sends a query request to the nodes in its routing table, looking for nodes that match its own ID in L bits. Each node that receives the query request likewise sends it to the nodes in its own routing table, until the TTL hop limit is reached. The information of all qualifying nodes is returned to the home node to form the replica node set, and the file is then copied to every node in that set.
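The TTL-limited query of step g) behaves like a breadth-first traversal over routing tables. The sketch below simulates it in a single process under stated assumptions: routing tables are given as a plain dictionary, every queried node forwards the request whether or not it matches, and a match means at least L shared leading bits. None of these names or structures come from the text.

```python
from collections import deque


def collect_replica_set(home_id, routing_tables, level, id_bits, ttl):
    """Simulate the TTL-bounded query that builds the replica node set.

    routing_tables maps a node ID to the node IDs in its routing table.
    """
    def matches(node_id):
        diff = home_id ^ node_id
        shared = id_bits if diff == 0 else id_bits - diff.bit_length()
        return shared >= level

    replica_set = set()
    visited = {home_id}
    frontier = deque([(home_id, 0)])
    while frontier:
        node_id, hops = frontier.popleft()
        if hops == ttl:
            continue  # TTL reached: this node does not forward the query
        for neighbor in routing_tables.get(node_id, ()):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            if matches(neighbor):
                replica_set.add(neighbor)  # reported back to the home node
            frontier.append((neighbor, hops + 1))
    return replica_set


# Hypothetical 4-bit network; the home node is 0b1000 and L = 2 (prefix "10").
tables = {
    0b1000: [0b1001, 0b0001],
    0b1001: [0b1010],
    0b0001: [0b1011],
    0b1010: [0b1100],
}
within_two_hops = collect_replica_set(0b1000, tables, 2, 4, 2)
within_one_hop = collect_replica_set(0b1000, tables, 2, 4, 1)
```

The TTL trades completeness for message cost: a small TTL may miss qualifying nodes that are only reachable through longer paths.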
Example
This method is described below using a content distribution network (Content Distribution/Delivery Network, CDN) as an example. Fig. 1 is a schematic diagram of an example node structure of a content distribution network. Here, the content distribution network is assumed to have a two-layer structure comprising a management layer and a replica placement layer. The management layer is mainly responsible for maintaining the index of all files and for the computation required for content distribution; the replica placement layer is responsible for backing up data replicas. The management layer of the CDN consists of fully interconnected servers, and the replica placement layer organizes all its nodes through the Pastry routing protocol; each node is connected to at least one management node in the management layer. The management nodes store information such as the popularity of all files and mainly distribute media files.
Suppose node IDs in the replica placement layer are 128 bits long, a total of 1000 files are to be distributed in the network, the popularity of file f ranks 50th among all files, and the file access probability function is
[Formula image GDA0000116960220000061: the file access probability function prob(x)]
for x = 1, 2, ..., N, where x denotes the rank. From the content management node it can be learned that the probability values of the 5 files with the highest access probability are {1, 0.60, 0.45, 0.36, 0.31}; therefore maxProb = (1 + 0.60 + 0.45 + 0.36 + 0.31)/5 = 2.72/5 = 0.544. In addition, prob = 0.06 for file f, so L = 128 - (prob/maxProb) × 128 ≈ 113.9, which gives L = 114.
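The arithmetic of this example can be recomputed from the listed values alone. In the sketch below the five probabilities, prob = 0.06, and M = 128 are taken from the text; rounding the result to the nearest integer is an assumption, since the text does not state how the fractional value becomes 114.

```python
# Values stated in the example.
top5 = [1.0, 0.60, 0.45, 0.36, 0.31]  # access probabilities of the Top-5 files
prob_f = 0.06                         # access probability of file f (rank 50)
M = 128                               # bits in a node ID

# maxProb is the mean of the Top-N probabilities: 2.72 / 5 = 0.544.
max_prob = sum(top5) / len(top5)

# Duplicate rating before and after rounding (rounding rule is an assumption).
level_exact = M - (prob_f / max_prob) * M
level = round(level_exact)
```

The unrounded mean is 0.544 and the unrounded rating is about 113.9, so the result L = 114 holds whether maxProb is taken as 0.544 or rounded to 0.55.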
Fig. 2 is a schematic diagram of replica placement in a structured P2P network, in which the node filled with "*" is the home node and the nodes filled with "/" are replica placement nodes. Suppose the duplicate rating of the data is L_1; then every node that matches the home node in L_1 bits must hold a replica of the data. Accordingly, once the L value of file f has been obtained, the Pastry routing algorithm finds the node hn_f whose ID is nearest to the ID of file f in the ID space, and hn_f serves as the home node of f. hn_f downloads the file and its duplicate rating information from the data source to local storage; at the same time, all nodes whose IDs match the ID of hn_f in 114 bits are found through the routing table of hn_f, and the data is placed on these nodes.
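The choice of hn_f as the node nearest to the file's ID can be illustrated without simulating Pastry's hop-by-hop routing. The sketch below computes only the end result that routing is meant to reach, assuming a circular ID space and full knowledge of all node IDs; the function name, the tie-break toward the smaller ID, and the 4-bit sample IDs are illustrative.

```python
def closest_node(object_id, node_ids, id_bits):
    """Node whose ID is numerically closest to object_id on a circular ID space.

    This is a global computation for illustration; real routing reaches the
    same node hop by hop using only local routing tables.
    """
    space = 1 << id_bits

    def ring_distance(node_id):
        direct = abs(node_id - object_id) % space
        return min(direct, space - direct)

    # Ties are broken toward the smaller node ID, an arbitrary choice.
    return min(node_ids, key=lambda node_id: (ring_distance(node_id), node_id))


# Hypothetical 4-bit ID space with three nodes.
nodes = [1, 6, 14]
home_for_15 = closest_node(15, nodes, 4)  # 14 is one step away
home_for_0 = closest_node(0, nodes, 4)    # 1 is one step away; 14 is two
home_for_8 = closest_node(8, nodes, 4)    # 6 is two steps away
```

Once hn_f is known, the prefix match against its ID determines which other nodes receive a replica.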
It should be noted that the remaining content of this document can be implemented by those of ordinary skill in the art and is not described further here.

Claims (4)

1. A popularity-based duplicate rating calculation method, characterized in that it comprises the following steps:
a) obtain the popularity information of file f in the local region;
b) calculate the access probability prob_f of file f;
c) obtain the access probability values of the N most popular files among the files distributed in the current region, and let the maximum probability value maxProb equal the mean of the access probabilities of the N most popular files;
d) calculate the duplicate rating number L of file f by the following formula:
$$L = \begin{cases} M - (prob / maxProb) \times M, & prob \le maxProb \\ 0, & prob > maxProb \end{cases}$$
where M is the total number of levels, i.e. the total number of bits in the ID space of the hash values in the distributed hash table network, and prob is the access probability value of the file.
2. The popularity-based duplicate rating calculation method as claimed in claim 1, characterized in that, in said step b), if prob(x) denotes an existing file access probability function, the access probability of said file f is calculated by the following formula:
prob_f = prob(rank_f)
where rank_f is the popularity rank of file f in the whole region.
3. A popularity-based duplicate placement method, comprising the following steps:
a) obtain the popularity information of file f in the local region;
b) calculate the access probability prob_f of file f;
c) obtain the access probability values of the N most popular files among the files distributed in the current region, and let the maximum probability value maxProb equal the mean of the access probabilities of the N most popular files;
d) calculate the duplicate rating number L of file f by the following formula:
$$L = \begin{cases} M - (prob / maxProb) \times M, & prob \le maxProb \\ 0, & prob > maxProb \end{cases}$$
where M is the total number of levels, equal to the total number of bits in the ID space of the hash values in the distributed hash table network, and prob is the access probability value of the file;
e) after the L value is obtained, find the home node of the current file according to the distributed hash table routing algorithm;
f) the home node downloads the file, the corresponding duplicate rating parameter L, and other related information from the data source or server to local storage;
g) after obtaining the data, the home node checks the duplicate rating required for the file, finds through its routing table all nodes whose IDs match its own ID in L bits, and copies the file to these nodes, completing the replica placement.
4. The popularity-based duplicate placement method as claimed in claim 3, characterized in that, in said step b), if prob(x) denotes an existing file access probability function, the access probability of said file f is calculated by the following formula:
prob_f = prob(rank_f)
where rank_f is the popularity rank of file f in the whole region.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910081302A CN101645919B (en) 2009-04-01 2009-04-01 Popularity-based duplicate rating calculation method and duplicate placement method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910081302A CN101645919B (en) 2009-04-01 2009-04-01 Popularity-based duplicate rating calculation method and duplicate placement method

Publications (2)

Publication Number Publication Date
CN101645919A CN101645919A (en) 2010-02-10
CN101645919B true CN101645919B (en) 2012-10-17

Family

ID=41657639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910081302A Active CN101645919B (en) 2009-04-01 2009-04-01 Popularity-based duplicate rating calculation method and duplicate placement method

Country Status (1)

Country Link
CN (1) CN101645919B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075563B (en) * 2010-12-21 2013-03-13 武汉大学 Duplicate copy method for unstructured peer-to-peer (P2P) network
CN102868542B (en) * 2011-07-04 2018-02-16 中兴通讯股份有限公司 The control method and system of service quality in a kind of service delivery network
CN102984188B (en) * 2011-09-06 2015-06-17 中国科学院声学研究所 Content replica placement method and content replica placement system used in content delivery network (CDN)
CN102497394B (en) * 2011-11-28 2014-01-15 中国科学院研究生院 Duplicate file placement method in content distribution network based on optimized model
CN103458315B (en) * 2013-08-29 2016-05-11 北京大学深圳研究生院 A kind of P2P Streaming Media clone method based on popularity
CN104202407B (en) * 2014-09-10 2018-04-13 北京奇艺世纪科技有限公司 A kind of video file synchronous method and device
CN104853384B (en) * 2015-05-14 2018-08-24 南京邮电大学 A kind of content buffering method based on popularity in 5th Generation Mobile Communication System
CN106161170B (en) * 2016-07-12 2019-08-02 广东工业大学 A kind of asynchronous file selection and Replica placement method that interval executes
CN106934050A (en) * 2017-03-16 2017-07-07 郑州云海信息技术有限公司 The determination method and device of file copy amount in a kind of distributed memory system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1996996A (en) * 2006-12-19 2007-07-11 北京邮电大学 The method for stream media file buffer for the mobile stream media proxy server

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
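Song Xiaohua (宋晓华); Huang Heqing (黄河清); Cao Yuanda (曹元大). Streaming media file replication strategy based on statistical characteristics of user access (基于用户访问统计特性的流媒体文件复制策略). Journal of Nanjing University of Science and Technology (《南京理工大学学报》). 2007, Vol. 31 (No. 5), pp. 617-621. *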

Also Published As

Publication number Publication date
CN101645919A (en) 2010-02-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210802

Address after: Room 1601, 16th floor, East Tower, Ximei building, No. 6, Changchun Road, high tech Industrial Development Zone, Zhengzhou, Henan 450001

Patentee after: Zhengzhou xinrand Network Technology Co.,Ltd.

Address before: 100190 Institute of acoustics, Chinese Academy of Sciences, No. 21 West Fourth Ring Road, Haidian District, Beijing

Patentee before: INSTITUTE OF ACOUSTICS, CHINESE ACADEMY OF SCIENCES
