CN104253855B - Category popularity cache replacement method based on content classification for content-centric networks - Google Patents

Category popularity cache replacement method based on content classification for content-centric networks

Info

Publication number
CN104253855B
CN104253855B CN201410384637.5A CN201410384637A CN104253855B CN 104253855 B CN104253855 B CN 104253855B CN 201410384637 A CN201410384637 A CN 201410384637A CN 104253855 B CN104253855 B CN 104253855B
Authority
CN
China
Prior art keywords
content
popularity
cache
node
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410384637.5A
Other languages
Chinese (zh)
Other versions
CN104253855A (en)
Inventor
张国印
邢志静
武俊鹏
夏松竹
李庆显
唐滨
徐林枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN201410384637.5A
Publication of CN104253855A
Application granted
Publication of CN104253855B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a category popularity cache replacement method based on content classification for content-centric networks (CCN). The method includes: first judging whether the remaining cache space of the node can accommodate the new data content; if there is enough cache space, caching the new data content; otherwise, calculating the popularity of all content categories in the node according to the exponentially weighted moving average (EWMA) criterion and selecting the content category with the minimum popularity; removing from the node cache the content item that was requested the fewest times within a predefined time in the content category with the minimum popularity; extracting the character string features of the new data content name and classifying the content; and storing the newly arrived data content item in the corresponding content category in the node, and updating the category heat table and the log. By classifying content names, the invention can better manage the caches of the nodes in the CCN, so that the network searches for and replaces content starting from the content name during communication, balances the diversity of content in the node cache, and improves the efficiency of cache replacement.

Description

Category popularity cache replacement method based on content classification for content-centric networks
Technical Field
The invention relates to a category popularity cache replacement method based on content classification for content-centric networks (CCN).
Background
With the continuous development of the Internet, people's demand for content in the network keeps growing. The current TCP/IP-based network architecture shows increasingly prominent problems in network control, resource allocation, and other areas, and the focus of the Internet has shifted from communication between hosts to how to quickly obtain requested content from the network. For this reason, researchers at home and abroad have begun to study new next-generation network architectures and have launched a number of related research projects, which promote the development of the next-generation network and are of far-reaching significance. The invention mainly concerns one such future network architecture, the content-centric network (CCN). The CCN abandons the host-address-centered communication mode of the traditional network in favor of a network design centered on named content, and builds a new architecture and communication mechanism to suit the development of the future network. The document "research and analysis on CCN research progress of content-centric networking" reviews the related research on CCN, introduces the working mechanism of CCN, surveys the current research hotspots and challenges of CCN, analyzes the main comparative advantages and existing problems of CCN, and finally verifies the working mode of CCN on an experimental test bed.
The cache replacement policy is a key part of CCN research and affects the overall performance of the network. The cache replacement strategies commonly used in the CCN are the least recently used (LRU) strategy, the least frequently used (LFU) strategy, and their improved variants. The LRU cache replacement strategy mentioned in the document "Modeling data transfer in content-centric networking" is simple, easy to implement, and convenient to deploy, but it does not fully consider the dynamic nature of the CCN and is therefore poorly suited to it.
The invention provides a category popularity cache replacement strategy based on content name classification. Based on the naming scheme and name uniqueness of content in the CCN, it proposes a method that combines all-gram and R-value to extract features from content name character strings and classify them, so that the cache in each node is managed in units of categories. The popularity of each category in each node is calculated using the idea of an exponentially weighted moving average: the numbers of times each category is accessed within a specified time are given different weights according to their distance in time, so as to reflect the real-time popularity. During cache replacement, a content item in the content category with the lowest popularity in the node is replaced first, and the new content is then stored in the category to which it belongs according to the classification method.
Disclosure of Invention
The invention aims to provide a category popularity cache replacement method based on content classification for content-centric networks. The method realizes cache replacement by means of content classification and dynamic popularity calculation, fully considers the recent dynamics of network content, improves the distribution efficiency of network content, and reduces the waste of the limited cache in network nodes.
The purpose of the invention is realized by the following steps:
(1) When new data content arrives, judging whether the remaining cache space of the node can accommodate the new data content; if the cache space is sufficient to cache the new data content, directly executing step (4); if the cache space is insufficient to cache the data content, executing step (2) to perform cache replacement;
(2) Calculating the popularity of all content categories in the node according to an exponential weighted moving average calculation standard, and selecting the content category with the minimum popularity;
(3) Removing the content item with the least number of requests within a predefined time in the content category with the least popularity from the node cache;
(4) Extracting and classifying the character string characteristics of the new data content name;
(5) And storing the newly arrived data content items into corresponding content categories in the nodes, and updating the category heat table and the log.
In step (1), before judging whether the remaining cache space of the node can accommodate the new data content, the CS (Content Store) table of the node is checked to see whether the new data content is already cached.
The character string features of the new data content name are extracted, and the content is classified, according to a method combining all-gram and R-value: the n-gram model uses a sliding window of length n to intercept a series of substrings, with the window sliding one length unit each time, so that a content name sequence processed by the n-gram model is divided into consecutive substrings of length n.
The invention has the following beneficial effects:
The invention provides a category popularity cache replacement algorithm based on content classification, which avoids processing every content item individually when calculating popularity: only the popularity of each content category needs to be calculated. When cache replacement is required, a content item in the category with the lowest popularity in the node is evicted from the cache, and the newly arrived content data is then classified by name into an existing category in the node cache, which completes the cache replacement process. Because it takes content category popularity into account, the method allows content with high popularity to stay in a network node for a relatively long time. Unlike the conventional LRU method, which selects the least recently used content block for replacement, the method of the invention selects content in the category with the lowest popularity for replacement in steps 2 and 3. In addition, step 4 classifies the content according to the method combining all-gram and R-value. Classifying by content name allows the caches of the nodes in the CCN to be better managed, so that the network searches for and replaces content starting from the content name during communication, the diversity of content in the node cache is balanced, and the cache replacement efficiency is improved. Simulation results show that the proposed category popularity cache replacement strategy based on content name classification has certain performance advantages over other classical replacement strategies.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a schematic diagram of the network topology of the present invention;
FIG. 3 is a table of experimental simulation parameters for the present invention;
FIG. 4 is a schematic diagram of an example of the computational popularity of the present invention;
FIG. 5 is a schematic diagram of an example n-gram of the present invention;
FIG. 6 is a flow chart of the feature extraction method combining all-gram and R-value calculation of the present invention;
FIG. 7 is a schematic diagram of the average cache hit rate under different node cache spaces according to the present invention;
FIG. 8 is a graph illustrating average cache hit rates for different numbers of stub domains according to the present invention;
FIG. 9 is a graph illustrating the recovery capability of the cache hit rate of the present invention;
FIG. 10 is a schematic diagram of the average load of servers under different sizes of node caches according to the present invention;
FIG. 11 is a schematic diagram illustrating the average load of servers for different numbers of stub domains according to the present invention;
FIG. 12 is a graph illustrating the effect of the time sample size on cache hit rate and server load in accordance with the present invention.
Detailed Description
The invention is described in more detail below by way of example with reference to the accompanying drawings.
1. A content classification-based category popularity cache replacement method for a content-centric network is characterized in that:
Step 1: when new data content arrives, first judging whether the remaining cache space of the node can accommodate the new content; if the cache space is sufficient to cache the new data, proceeding directly to step 4; if there is not enough cache space to cache the data, proceeding to step 2 to perform cache replacement so that the new data can be cached.
Step 2: calculating the popularity of all content categories in the node according to an exponentially weighted moving average (EWMA), and selecting the content category with the minimum popularity;
Step 3: removing from the node cache the content item with the fewest requests within a predefined time in the content category with the minimum popularity;
Step 4: extracting the character string features of the new content name and classifying the content according to a method combining all-gram and R-value;
Step 5: storing the newly arrived content item in the corresponding content category in the node, and updating the category heat table and the log.
In step 1, before judging whether there is enough cache space to cache new data, the CS (Content Store) table of the node is checked to see whether the data is already cached. The CS table stores the content that passes through the node and is cached by the node.
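The five steps and the CS check can be sketched as a small class. This is a minimal illustration under stated assumptions, not the patented implementation: `CategoryCache`, its method names, and the use of an item count as the capacity unit are hypothetical, and the classifier and the category popularity score are passed in as callables because the text defines them elsewhere. Updating the category heat table and log in step 5 is left to the caller that supplies `category_popularity`.

```python
class CategoryCache:
    """Hypothetical node cache that evicts from the least popular category."""

    def __init__(self, capacity, classify, category_popularity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity                        # assumed unit: number of items
        self.classify = classify                        # content name -> category label
        self.category_popularity = category_popularity  # category label -> score
        self.store = {}                                 # category -> {name: data}
        self.requests = {}                              # name -> requests in the window

    def size(self):
        return sum(len(items) for items in self.store.values())

    def contains(self, name):
        return any(name in items for items in self.store.values())

    def insert(self, name, data):
        if self.contains(name):                 # CS lookup: already cached
            return
        if self.size() >= self.capacity:        # step 1: no room left
            # Step 2: pick the non-empty category with the lowest popularity.
            candidates = [c for c, items in self.store.items() if items]
            coldest = min(candidates, key=self.category_popularity)
            # Step 3: evict its least-requested item.
            victim = min(self.store[coldest],
                         key=lambda n: self.requests.get(n, 0))
            del self.store[coldest][victim]
            self.requests.pop(victim, None)
        category = self.classify(name)          # step 4: classify by name
        self.store.setdefault(category, {})[name] = data   # step 5: store
        self.requests.setdefault(name, 0)
```

Passing the classifier and the popularity score as callables keeps the replacement logic independent of how names are classified and how popularity is weighted.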
Step 2 relies on extracting the character string features of the new content name, classifying the content, and calculating the popularity of each content category.
The character string features of the new content name are extracted, and the content is classified, according to a method combining all-gram and R-value. The n-gram model uses a sliding window of length n to intercept a series of substrings, and the window slides by one length unit each time. When a content name sequence is processed by an n-gram model, it is divided into a number of consecutive substrings of length n.
In the classification process, the accuracy of classification depends heavily on the choice of n, and the n-gram algorithm has no fixed method for choosing it; sometimes the final value is chosen by trial and error based on human experience. If n is too small, the structure and order of the character string may be lost; if n is too large, the similarity between character strings may be reduced, leading to wrong classification results. The invention therefore proposes the all-gram idea: instead of dividing the name string with one fixed n, a series of n values is used, which generates n-gram substrings of different lengths. These substrings generally cover the important features and keywords contained in the original string. The feature vector space finally formed by all-gram segmentation can therefore be learned from to classify the training samples efficiently and quickly, improving the classification accuracy.
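The all-gram idea can be illustrated with a short sketch. The function name `all_grams` and the default range of n values (2 to 4) are assumptions made for illustration; the text does not state which series of n values is used.

```python
def all_grams(name, n_values=range(2, 5)):
    """Collect the n-grams of `name` for every n in `n_values` (range is assumed)."""
    features = set()
    for n in n_values:
        # Slide a window of length n over the name, one character at a time.
        for start in range(len(name) - n + 1):
            features.add(name[start:start + n])
    return features


features = all_grams("software")
```

With the assumed range, `features` contains short fragments such as "so" alongside longer ones such as "soft" and "ware", so no single choice of n has to carry the whole classification.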
The invention adopts the R-value feature selection method, which evaluates each feature according to its calculated R value, ranks the features, and selects the feature set that is easiest to classify with, thereby providing a sound criterion for classification. In this method an r factor is used to balance the word frequency, as shown in the following equation:
[equation not recoverable from the source text]
where $t$ is a feature, $C$ is the target class, and $\bar{C}$ is the non-target class. $r$ is an adjustable factor with a value between 0 and 1. $P(t \mid C)$ is the prior probability of $t$ in $C$, and $P(t \mid \bar{C})$ is the prior probability of $t$ in $\bar{C}$. They are calculated by the following two formulas:
$$P(t \mid C) = \frac{|C_t|}{|C|}$$
$$P(t \mid \bar{C}) = \frac{|\bar{C}_t|}{|\bar{C}|}$$
where $|C_t|$ and $|\bar{C}_t|$ are the numbers of documents in $C$ and $\bar{C}$, respectively, in which $t$ appears, and $|C|$ and $|\bar{C}|$ are the numbers of documents in $C$ and $\bar{C}$, respectively.
The value of the factor r is adjustable between 0 and 1. When r is small, the selected features t tend to have low frequency but high discrimination; when r is large, they tend to have high frequency but low discrimination.
The features of the content names in the CCN are obtained by combining all-gram with the r value calculation, so that the content in the cache can be classified; the specific flow is shown in FIG. 6.
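The two prior probabilities that feed the R value can be computed directly from their definitions, as the fraction of names in a class that contain the feature. The sketch below stops at those inputs, because the R-value formula itself is not reproduced in this text. The function name `prior` and the sample content names are hypothetical, and content names stand in for the "documents" of the definition.

```python
def prior(feature: str, names: list[str]) -> float:
    """Fraction of names in a class that contain `feature`, i.e. |C_t| / |C|."""
    if not names:
        return 0.0
    return sum(feature in name for name in names) / len(names)


# Hypothetical content names: a target class C and a non-target class.
target = ["/video/tiger.mpg", "/video/lion.mpg", "/video/intro.avi", "/video/clip.mpg"]
others = ["/music/song.mp3", "/docs/video-guide.pdf"]

p_in_target = prior("mpg", target)   # 3 of 4 target names contain "mpg"
p_in_others = prior("mpg", others)   # no non-target name contains "mpg"
```

A feature such as "mpg" that is frequent in the target class and absent elsewhere is the kind of feature a discrimination-oriented ranking would favor.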
Calculating the popularity of all content categories in the node according to the exponentially weighted moving average (EWMA) criterion: the calculation of content category popularity uses the EWMA as its measurement basis. The moving average is an important method in statistics, also called the average line for short. "Moving" means that the data being calculated changes during the calculation, that is, the data changes with time. The moving average is a method of analyzing data over a time series.
In the CCN, because of the dynamic nature of the network, the popularity of cached content in a node varies greatly with time. When calculating the popularity of a content category, it is therefore only possible to calculate its popularity value within a certain period of time, and the more recent the period, the better it reflects the category's current popularity. Using the idea of the exponentially weighted moving average, a trace log is created for each content category to record the number of times it is requested within a predefined period. The time is subdivided into small time periods; the number of requests in the time period closest to the current time is given a higher weight, and the numbers of requests in more distant time periods are given lower weights. Such an EWMA value determines the popularity of the content category to a certain extent and is used here as the calculation criterion. The calculation formula is as follows:
[equation not recoverable from the source text]
where $C_i[j]$ is the number of times category $i$ has been requested within the $j$-th time period, $t$ is a positive integer representing the total sampling time, and $\alpha$ is the weight, defined here as $2/(t+1)$.
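For orientation, the textbook recursive form of an exponentially weighted moving average is written out below using the symbols defined above. This is an assumption about the general shape of the calculation, not a reconstruction of the patent's own formula, which is not reproduced in this text; $E_i[j]$ and the initial condition $E_i[1] = C_i[1]$ are notation introduced here.

```latex
% Assumed textbook EWMA recursion for category i, with E_i[1] = C_i[1]:
E_i[j] = \alpha\, C_i[j] + (1 - \alpha)\, E_i[j-1], \qquad \alpha = \frac{2}{t + 1}

% Unrolled over t periods: the weight on a count decays geometrically with its age k.
E_i[t] = \alpha \sum_{k=0}^{t-2} (1 - \alpha)^{k}\, C_i[t-k] + (1 - \alpha)^{t-1}\, C_i[1]
```

The unrolled form makes the weighting described in the text explicit: the most recent period carries weight $\alpha$, and each step back in time multiplies the weight by $(1 - \alpha)$.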
Step 3: when new data arrives at the node and needs to be cached, if the remaining cache space is not enough to accommodate it, existing data in the CS table is replaced.
The invention classifies the cached content in the CCN node, calculates the popularity of each content category, removes from the node cache the content item with the fewest requests within the predefined time in the content category with the minimum popularity, and thereby frees enough cache space to accommodate the new data. When old data is removed to cache new data, the content item requested the fewest times in the content category with the minimum popularity is selected. Because the popularity of content in the network is taken into account, hot content can reside in the node cache for a long time, which improves the cache hit rate and the network performance.
Step 4 comprises the all-gram model and the R-value feature selection method.
The features of the content names in the CCN are obtained by combining all-gram with the r value calculation, so that the content in the cache can be classified. Classifying by content name allows the caches of the nodes in the CCN to be better managed, so that the network searches for and replaces content starting from the content name during communication, the diversity of content in the node cache is balanced, and the cache replacement efficiency is improved.
Step 5 involves the category heat table.
The category heat table records the number of hits and the popularity value of each content category. When requested content is present in the cache node, the request is considered a hit, and the hit count of the category to which the content belongs is incremented in the category heat table. Because the table is updated whenever content in the node is requested and hit, the popularity of a content category in any time period can be calculated, which reflects the constantly changing popularity of content in the network and adapts to the dynamic nature of the network.
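One possible shape for the category heat table is sketched below. The class `CategoryHeatTable`, its method names, and the fixed-length per-period layout are hypothetical; the text only states that the table records hit counts and popularity values per category. The popularity score is passed in as a callable so that any weighting scheme can be plugged in.

```python
from collections import defaultdict


class CategoryHeatTable:
    """Hypothetical per-node table of category hit counts, split into time periods."""

    def __init__(self, periods=7):
        self.periods = periods
        # category -> hit counts per period, oldest first, current period last
        self.hits = defaultdict(lambda: [0] * periods)
        self.popularity = {}            # category -> last computed popularity value

    def record_hit(self, category):
        """Count one cache hit for `category` in the current period."""
        self.hits[category][-1] += 1

    def roll(self):
        """Close the current period: drop the oldest count and open a new one."""
        for counts in self.hits.values():
            counts.pop(0)
            counts.append(0)

    def refresh(self, score):
        """Recompute every category's popularity with the supplied score function."""
        for category, counts in self.hits.items():
            self.popularity[category] = score(counts)
```

Calling `roll()` once per time period keeps only the most recent window of counts, which is what a time-weighted popularity score needs as input.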
The invention uses the NS-3-based ndnSIM network simulator to simulate the CCN. The performance of the proposed category popularity cache replacement policy based on content classification was evaluated by simulation and compared with the representative cache replacement policies LRU, LRU-K, LFU, and LFU-Aging. GT-ITM is used to generate the Transit-Stub network topology shown in FIG. 2. The topology in the figure contains multiple stub networks. A stub network only handles traffic whose source or destination lies within the subnetwork, only some of its hosts communicate with the outside, and it has only one border router. Each stub network is therefore equivalent to an interest group: when it requests data content of interest from the outside, that content is propagated within its stub domain, which changes the popularity of the data content.
Owing to the limits of the simulation environment, 10 content categories are configured in the simulation, each containing 50 content items. The interval between communications of each stub network with the outside is 30 seconds; that is, the next content request can be made 30 seconds after the stub network obtains the content it is interested in, which helps simulate the dynamics of the network, where content popularity changes all the time. By default the popularity of a content category is calculated over a time sample of 7 seconds: within the time sample, the number of accesses to the content category in each second is recorded, and the access counts are weighted according to their distance in time to calculate the popularity of the content category. The simulation parameters are shown in FIG. 3.
Each node in the CCN has caching capacity. In the simulation, the node cache size is defined relative to the total amount of content in the network, typically between 10% and 30% of it. For example, with a node cache size of 10% and a total of 1000 content items in the network, each node can cache at most 100 content items. In a real network, of course, the caching capacity of a node is very limited relative to the total amount of network content; because the simulated network is small, the node cache size is expressed as a proportion. The invention includes:
Step 1: when new data content arrives, first judging whether the remaining cache space of the node can accommodate the new content; if the cache space is sufficient to cache the new data, proceeding directly to step 4; if there is not enough cache space to cache the data, proceeding to step 2 to perform cache replacement so that the new data can be cached.
Step 2: calculating the popularity of all content categories in the node according to an exponentially weighted moving average (EWMA), and selecting the content category with the minimum popularity;
Step 3: removing from the node cache the content item with the fewest requests within a predefined time in the content category with the minimum popularity;
Step 4: extracting the character string features of the new content name and classifying the content according to a method combining all-gram and R-value;
Step 5: storing the newly arrived content item in the corresponding content category in the node, and updating the category heat table and the log.
The node i establishes a category heat table for recording the number of hits and the popularity value of each content category.
The node i is any cache node in the content-centric network.
A specific embodiment of the present invention is described in detail below with reference to FIG. 1. The category popularity cache replacement method based on content classification for content-centric networks comprises the following steps:
Step 1: when new data content arrives, first judging whether the remaining cache space of the node can accommodate the new content; if the cache space is sufficient to cache the new data, proceeding directly to step 4; if there is not enough cache space to cache the data, proceeding to step 2 to perform cache replacement so that the new data can be cached.
Step 2: calculating the popularity of all content categories in the nodes according to an Exponential Weighted Moving Average (EWMA), and selecting the content category with the minimum popularity;
The process of calculating the popularity of a content category with an exponentially weighted moving average is described below with an example. As shown in FIG. 4, assume that the content of a cache node is divided into 10 categories and that the predefined time is divided into 7 small time periods; the number below each time period in the figure is the number of times the content category was requested in that period. In this example, the total number of requests for content in the first, ninth, and tenth categories is the same over the predefined time, but the number of requests in each small time period varies greatly; in particular, the per-period request counts of the ninth and tenth categories contrast sharply. If only the average number of requests per category over the predefined time were considered, these categories would be regarded as equally popular. That is clearly unreasonable: in the CCN, popularity may change at any time, so an estimate averaged over a whole period is inaccurate. The time should instead be subdivided, and the finer the subdivision, the better the calculated popularity reflects the real network condition. The exponentially weighted moving average method calculates the popularity of the content categories dynamically: the number of requests for each content category in the most recent time period is given the highest weight, and the weights of the other values decrease with the distance of their time periods. The calculated results are shown in the right part of FIG. 4, with the popularity values ranked from high to low.
With popularity calculated in this way, new data arriving at a node can be stored in the cache efficiently: if the remaining cache space of the node is not enough to accommodate the new data, the cache replacement process is carried out, and the dynamic calculation and ranking of content category popularity make that process more efficient.
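The effect described in this example, equal totals but different popularity, can be reproduced with a short sketch. The request counts below are hypothetical and are not the values of FIG. 4, and the function assumes the textbook recursive EWMA with the weight 2/(t + 1) named in the text; the patent's exact formula may differ.

```python
def ewma_popularity(counts):
    """EWMA of per-period request counts, oldest first (assumed recursive form)."""
    if not counts:
        return 0.0
    alpha = 2 / (len(counts) + 1)      # weight from the text: 2 / (t + 1)
    value = counts[0]
    for count in counts[1:]:
        # Newer counts get weight alpha; older history decays by (1 - alpha).
        value = alpha * count + (1 - alpha) * value
    return value


# Hypothetical request counts over 7 periods; every category totals 28 requests.
log = {
    "rising": [1, 2, 3, 4, 5, 6, 7],
    "flat": [4, 4, 4, 4, 4, 4, 4],
    "falling": [7, 6, 5, 4, 3, 2, 1],
}
scores = {category: ewma_popularity(counts) for category, counts in log.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['rising', 'flat', 'falling']
```

A plain average would rate all three categories equally; under the weighted score, the category whose requests are falling off ranks last and would be the first candidate for replacement.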
Step 3: removing from the node cache the content item that was requested the fewest times within the predefined time in the content category with the minimum popularity;
Step 4: extracting the character string features of the new content name and classifying the content according to a method combining all-gram and R-value;
The n-gram model is essentially a Markov model of order n-1. An n-gram uses a sliding window of length n to intercept a series of substrings, and the window slides one length unit at a time. When a content name sequence is processed by an n-gram model, it is divided into a number of consecutive substrings of length n. The model assumes that, in a sentence of a certain length composed of words, the occurrence of a word depends only on the n-1 words before it and on no other words, so that the probability of the whole sentence is the product of the occurrence probabilities of its words. The mathematical model can be expressed as follows. Assume a sentence consists of $m$ words, $W = w_1, w_2, w_3, \ldots, w_m$. If the occurrence of word $w_i$ ($1 \le i \le m$) is taken to depend on its entire preceding context $w_1 w_2 w_3 \ldots w_{i-1}$, the probability of the sentence $W$ is:
$$p(W) = p(w_1, w_2, w_3, \ldots, w_m) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_1^2) \cdots p(w_m \mid w_1^{m-1})$$
In this formula, $w_1^{m-1}$ denotes $w_1, w_2, w_3, \ldots, w_{m-1}$, and $p(w_m \mid w_1^{m-1})$ is the probability that the word $w_m$ occurs given the preceding context $w_1, w_2, w_3, \ldots, w_{m-1}$. This probability can be calculated from the number of times the words appear together in the corpus. In practice, however, $m$ is often very large, which makes $p(w_m \mid w_1^{m-1})$ very complex to compute and requires more storage space. To overcome this problem, it is further assumed that the occurrence of the current word depends only on the preceding n-1 words, which gives:
$$p(W) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_1^2) \cdots p(w_n \mid w_1^{n-1}) \cdots p(w_m \mid w_{m-n+1}^{m-1})$$
In the above formula, $w_{m-n+1}^{m-1}$ denotes $w_{m-n+1} w_{m-n+2} \ldots w_{m-1}$.
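As a concrete instance of the truncated product, the case n = 2 is written out below, together with the usual count-based estimate of each factor. The text only says that the probabilities are calculated from co-occurrence counts in the corpus; the maximum-likelihood form and the count notation $N(\cdot)$ are assumptions introduced here.

```latex
% Bigram case (n = 2): each word depends only on the word before it.
p(W) \approx p(w_1) \prod_{i=2}^{m} p(w_i \mid w_{i-1})

% Assumed maximum-likelihood estimate from corpus counts N(.):
p(w_i \mid w_{i-1}) \approx \frac{N(w_{i-1}\, w_i)}{N(w_{i-1})}
```

Only counts of single words and adjacent pairs need to be stored in this case, which is the storage saving the truncation is meant to achieve.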
The n-gram is widely used with support vector machine classifiers: an n-gram algorithm divides the text content into sequences of text segments of a certain length, the segments are then filtered, and the high-frequency segments that meet the requirements are retained to form the feature vector table of the text. Character strings can also be treated as text for classification. The invention mainly targets English content names and assumes that all content in the CCN is named hierarchically in English. The correlation between letters in an English letter sequence is weak, which fits the assumptions of the n-gram model well. FIG. 5 takes as an example the component "myvideo" of the content name sina.com.cn/myvideo/tiger t.mpg/_v<timeverinfo>/seg2.
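The sliding-window segmentation can be sketched in a few lines. The function name `ngrams` and the choice n = 3 are illustrative assumptions; the value of n used in FIG. 5 is not stated in the text.

```python
def ngrams(name: str, n: int) -> list[str]:
    """Return every length-n substring of `name`, sliding one character at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [name[i:i + n] for i in range(len(name) - n + 1)]


print(ngrams("myvideo", 3))  # ['myv', 'yvi', 'vid', 'ide', 'deo']
```

A name of length L yields L - n + 1 substrings, and a name shorter than n yields none.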
In the classification process, the accuracy of classification depends heavily on the choice of n, and the n-gram algorithm has no fixed method for choosing it; sometimes the final value is chosen by trial and error based on human experience. If n is too small, the structure and order of the character string may be lost. For example, the word "software" can be associated with the meaning of software through the 5 substrings soft, oftw, ftwa, twar, and ware obtained by 4-gram. If n is too large, however, the similarity between character strings may be reduced, leading to wrong classification results. For example, the strings formed from the word "keyword" by 6-gram do not highlight the important features of the original word, so the segmentation is meaningless. The invention therefore proposes the all-gram idea: instead of dividing the name string with one fixed n, a series of n values is used, which generates n-gram substrings of different lengths. These substrings generally cover the important features and keywords contained in the original string. The feature vector space finally formed by all-gram segmentation can therefore be learned from to classify the training samples efficiently and quickly, improving the classification accuracy.
The method obtains features for content names in the CCN by combining all-gram with R-value calculation, and thereby classifies the content in the cache. The specific flow is shown in Fig. 6. First, a training sample set is prepared; features are then extracted from it with the all-gram method to obtain a feature set S of the content names. The features in S are scored and ranked by their R values, and the top-ranked features are selected to form a feature dictionary, giving the final feature set S1. Classification experiments are carried out with the feature set S1, and the content cached in the CCN is classified according to its content names.
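The extract, score, rank and select flow can be sketched as follows. The sketch is illustrative only: the R-value formula is not reproduced in this text, so the scoring function is passed in as a parameter, and the `stand_in` score used in the usage example is a hypothetical placeholder, not the patent's R value. The function names, the sample content names, and the range of n values are likewise assumptions.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple


def all_grams(name: str, n_min: int, n_max: int) -> Set[str]:
    """Set of all substrings of name whose length lies in [n_min, n_max]."""
    return {name[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(name) - n + 1)}


def rank_features(target: List[str], other: List[str],
                  score: Callable[[float, float], float],
                  n_min: int = 3, n_max: int = 5,
                  top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank all-gram features by score(P(t|C), P(t|not C)) and keep top_k."""
    # Document frequency: in how many names of each class a feature appears.
    df_target = Counter(g for name in target for g in all_grams(name, n_min, n_max))
    df_other = Counter(g for name in other for g in all_grams(name, n_min, n_max))
    scored = []
    for t in set(df_target) | set(df_other):
        p_c = df_target[t] / len(target)     # |C_t| / |C|
        p_not_c = df_other[t] / len(other)   # same ratio for the non-target class
        scored.append((t, score(p_c, p_not_c)))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical stand-in score with an adjustable factor r in [0, 1].
# This is NOT the patent's R-value formula.
r = 0.5


def stand_in(p_c: float, p_not_c: float) -> float:
    return p_c - r * p_not_c


video_names = ["myvideo", "videoclip", "hdvideo"]    # made-up target class
other_names = ["mymusic", "photoset", "document"]    # made-up non-target class
top = rank_features(video_names, other_names, stand_in)
print(top)
```

In this toy data the top-ranked features are all fragments of "video", because they occur in every target name and in no non-target name. The selected features would then form the feature dictionary (S1) used by the classifier.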
Step 5: Store the newly arrived content item in the corresponding content category in the node, and update the category heat table and the log.
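The replacement flow, from the capacity check through Step 5, can be sketched as a small Python class. Several points are assumptions made for illustration: the EWMA is written in the common form P_i[j] = α·C_i[j] + (1 − α)·P_i[j−1], which may differ from the exact formula in the patent; the `classify` callback is a trivial stand-in for the all-gram and R-value classifier; item request counts are kept over the whole run rather than a predefined time window; and all class, method, and content names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict


class CategoryPopularityCache:
    """Evicts from the least popular content category when space runs out."""

    def __init__(self, capacity: int, classify: Callable[[str], str],
                 alpha: float = 0.5) -> None:
        self.capacity = capacity      # total cache space, in size units
        self.classify = classify      # maps a content name to a category
        self.alpha = alpha            # EWMA weight (assumed form, see lead-in)
        self.used = 0
        self.items: Dict[str, Dict[str, int]] = defaultdict(dict)  # category -> name -> size
        self.item_requests: Dict[str, int] = defaultdict(int)      # requests per item
        self.period_requests: Dict[str, int] = defaultdict(int)    # C_i[j], current period
        self.popularity: Dict[str, float] = defaultdict(float)     # P_i per category

    def record_request(self, name: str) -> None:
        self.item_requests[name] += 1
        self.period_requests[self.classify(name)] += 1

    def end_period(self) -> None:
        """Close period j: P_i[j] = alpha * C_i[j] + (1 - alpha) * P_i[j-1]."""
        for cat in set(self.popularity) | set(self.period_requests) | set(self.items):
            self.popularity[cat] = (self.alpha * self.period_requests[cat]
                                    + (1 - self.alpha) * self.popularity[cat])
        self.period_requests.clear()

    def _evict_one(self) -> None:
        # Least popular non-empty category, then its least requested item.
        occupied = [c for c, members in self.items.items() if members]
        coldest = min(occupied, key=lambda c: self.popularity[c])
        victim = min(self.items[coldest], key=lambda n: self.item_requests[n])
        self.used -= self.items[coldest].pop(victim)
        self.item_requests.pop(victim, None)

    def insert(self, name: str, size: int) -> None:
        if size > self.capacity:
            raise ValueError("content larger than the whole cache")
        if any(name in members for members in self.items.values()):
            return                                 # already cached
        while self.used + size > self.capacity:    # not enough free space
            self._evict_one()
        self.items[self.classify(name)][name] = size   # classify, then store
        self.used += size


# Stand-in classifier: the first path component is treated as the category.
cache = CategoryPopularityCache(capacity=3, classify=lambda n: n.split("/")[0])
for name in ["video/a", "video/b", "music/x"]:
    cache.insert(name, 1)
for name in ["video/a", "video/a", "video/b", "music/x"]:
    cache.record_request(name)
cache.end_period()
cache.insert("video/c", 1)   # cache is full, so one item must be evicted
print(dict(cache.items))
```

In this usage example the "music" category has the lower popularity after the first period, so its only item is the one removed to make room for the new content.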
To verify the performance advantage of the category popularity cache replacement based on content classification proposed by the invention, the method is compared experimentally with the traditional cache replacement strategies LRU, LRU-K, LFU and LFU-Aging.
Fig. 7 shows the average cache hit rate for different node cache sizes. With the ratio of node cache capacity to the total amount of network content set to 10%, 20% and 30% respectively, 5 stub networks, and a time sample of 7 seconds, the category popularity policy based on content classification proposed by the invention consistently outperforms the other replacement policies at every node capacity setting. In contrast, LFU and LFU-Aging show poor hit rates.
Fig. 8 shows the average cache hit rate for different numbers of stub domains. With the cache size of each node set to 20% of the total amount of network content, the average cache hit rate of most replacement strategies drops significantly as the number of stub domains increases. This is particularly true of LRU-K and LFU-Aging, which outperform LRU and LFU only when there is a single stub domain. The classification-based popularity policy, however, changes little in performance throughout, which indicates that it adapts well to changes in the number of stub domains in the network.
The line graph in Fig. 9 illustrates how the cache hit rates of three cache replacement strategies recover after a brief network interruption. The red line in the figure marks the network interruption at the 150th second, at which the cache hit rates of all three strategies drop. After the network recovers, the classified popularity strategy quickly returns to its pre-interruption state with a very high cache hit rate. The other two strategies recover slowly and unstably, and LFU-Aging in particular shows a very poor cache hit rate compared with the other two.
Figs. 10 and 11 show the average server load under different node cache sizes and different numbers of stub domains, respectively. As in the preceding simulation results on average cache hit rate, the LFU and LFU-Aging policies perform worst at reducing the average load of origin servers. As shown in Fig. 10, when the number of stub domains is 5 and the node cache capacity is 10% of the total amount of network content, the classified popularity policy reduces the origin server load by about 39% compared with the LFU policy. When the number of stub domains is 9, the server load under the classified popularity policy is approximately 65% of that under the LFU policy. The cache replacement strategy based on category popularity of content classification can therefore substantially reduce server load and relieve pressure on the network.
Fig. 12 shows how the sample time used in calculating content category popularity affects the average cache hit rate of a node and the average load of the server. When the time sample is 7 seconds, the average cache hit rate of the node reaches its highest value and the average server load is very low. If the sample time is too large or too small, good simulation results cannot be obtained.

Claims (2)

1. A content classification-based category popularity cache replacement method for a content-centric network is characterized in that:
(1) When new data content arrives, first judging whether the remaining cache space of the node can hold the new data content; if the cache space is sufficient to cache the new data content, directly executing step (4); if the cache space is insufficient to cache the new data content, executing step (2) to perform cache replacement;
(2) Calculating the popularity of all content categories in the node using the exponentially weighted moving average (EWMA) as the calculation standard, and selecting the content category with the minimum popularity; the calculation formula is as follows:
wherein $C_i[j]$ is the number of times category i has been requested within the j-th time period, and $\alpha$ represents a weight;
extracting and classifying the character string characteristics of the new content name according to a method of combining the all-gram and the R-value;
the N-gram model is essentially an (N-1)-order Markov model; an n-gram uses a sliding window of length n to extract a series of substrings, the window sliding one length unit each time; when a content name sequence is processed by the n-gram model, it is divided into a number of consecutive substrings of length n; the invention adopts the R-value feature selection method, which evaluates each feature by its calculated R value, ranks the features, and selects a feature set that makes classification easier, thereby providing an ideal standard for classification; as shown in the following equation:
where t is a feature, C is the target class, and $\bar{C}$ is the non-target class; r is an adjustable factor with a value range of 0 to 1; $P(t \mid C)$ is the prior probability of t in C, and $P(t \mid \bar{C})$ is the prior probability of t in $\bar{C}$; the prior probabilities are calculated by the following two formulas:
$$P(t \mid C) = \frac{|C_t|}{|C|}, \qquad P(t \mid \bar{C}) = \frac{|\bar{C}_t|}{|\bar{C}|}$$
wherein $|C_t|$ and $|\bar{C}_t|$ are the numbers of documents in C and $\bar{C}$, respectively, in which t appears; $|C|$ and $|\bar{C}|$ are the numbers of documents in C and $\bar{C}$, respectively;
(3) Removing the content item with the least number of requests within a predefined time in the content category with the least popularity from the node cache;
(4) Extracting and classifying the character string characteristics of the new data content name;
(5) And storing the newly arrived data content items into corresponding content categories in the nodes, and updating the category heat table and the log.
2. The method for replacing the category popularity cache based on the content classification in the content-oriented center network according to claim 1, wherein: in the step (1), before determining whether the remaining cache space of the node can accommodate the new data content, the CS table of the node is checked to see whether the new data content is cached in the cache.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410384637.5A CN104253855B (en) 2014-08-07 2014-08-07 Classification popularity buffer replacing method based on classifying content in a kind of content oriented central site network

Publications (2)

Publication Number Publication Date
CN104253855A CN104253855A (en) 2014-12-31
CN104253855B true CN104253855B (en) 2018-04-24

Family

ID=52188380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410384637.5A Expired - Fee Related CN104253855B (en) 2014-08-07 2014-08-07 Classification popularity buffer replacing method based on classifying content in a kind of content oriented central site network

Country Status (1)

Country Link
CN (1) CN104253855B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105188088B (en) * 2015-07-17 2019-07-12 中国科学院信息工程研究所 Caching method and device based on content popularit and node replacement rate
CN106453451A (en) * 2015-08-08 2017-02-22 陈昶宇 Shared adaptive content data cache network (SADCN)
WO2017049488A1 (en) * 2015-09-23 2017-03-30 华为技术有限公司 Cache management method and apparatus
CN105577537A (en) * 2015-12-25 2016-05-11 中国科学院信息工程研究所 Multipath forwarding method and system of history record based information centric network
CN105657054B (en) * 2016-03-04 2018-10-12 重庆大学 A kind of content center network caching method based on K mean algorithms
CN106161252B (en) * 2016-06-14 2019-03-15 电子科技大学 A kind of load-balancing method applied to content center net
CN105939385B (en) * 2016-06-22 2019-05-10 湖南大学 Real time data replacement method based on request frequency in a kind of NDN caching
US20180062935A1 (en) * 2016-08-25 2018-03-01 Futurewei Technologies, Inc. Hybrid approach with classification for name resolution and producer selection in icn
CN106603646B (en) * 2016-12-07 2019-07-09 北京邮电大学 A kind of information centre's network-caching method based on user interest preference
US10469348B2 (en) * 2016-12-29 2019-11-05 Futurewei Technologies, Inc. Centrality-based caching in information-centric networks
CN106888262A (en) * 2017-02-28 2017-06-23 北京邮电大学 A kind of buffer replacing method and device
CN108076144B (en) * 2017-12-03 2020-09-11 北京邮电大学 Fair caching algorithm and device for content-centric network
CN108259929B (en) * 2017-12-22 2020-03-06 北京交通大学 Prediction and caching method for video active period mode
CN108156249B (en) * 2017-12-29 2021-01-12 南京邮电大学 Network cache updating method based on approximate Markov chain
CN111225267B (en) * 2018-11-26 2022-05-06 中国电信股份有限公司 Content cache scheduling method, device and system and content distribution network node
CN111104365A (en) * 2019-11-25 2020-05-05 深圳市网心科技有限公司 File deployment method, device, equipment and readable storage medium
CN112862060B (en) * 2019-11-28 2024-02-13 南京大学 Content caching method based on deep learning
CN111465057B (en) * 2020-03-30 2021-06-04 北京邮电大学 Edge caching method and device based on reinforcement learning and electronic equipment
CN113905354B (en) * 2021-11-11 2023-09-26 南京邮电大学 Vehicle-mounted network content transfer method and system based on regional content popularity
US12003803B2 (en) 2021-11-11 2024-06-04 Nanjing University Of Posts And Telecommunications Content delivery method and system through in-vehicle network based on regional content popularity
CN116112434B (en) * 2023-04-12 2023-06-09 深圳市网联天下科技有限公司 Router data intelligent caching method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103501315A (en) * 2013-09-06 2014-01-08 西安交通大学 Cache method based on relative content aggregation in content-oriented network
CN103905545A (en) * 2014-03-22 2014-07-02 哈尔滨工程大学 Reinforced LRU cache replacement method in content-centric network
CN103905538A (en) * 2014-03-22 2014-07-02 哈尔滨工程大学 Neighbor cooperation cache replacement method in content center network
CN103905539A (en) * 2014-03-22 2014-07-02 哈尔滨工程大学 Optimal cache storing method based on popularity of content in content center network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Similarity content search in content centric networks";Petors Daras;《18th ACM international conference on Multimedia》;20101025;775-778 *
"一种基于内容流行度的内容中心网络缓存概率置换策略";朱轶等;《电子与信息学报》;20130615;第35卷(第6期);1305-1310 *

Also Published As

Publication number Publication date
CN104253855A (en) 2014-12-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180424