CN104253855B - Category popularity cache replacement method based on content classification for content-centric networks

Info

Publication number: CN104253855B
Application number: CN201410384637.5A
Authority: CN
Inventors: 张国印; 邢志静; 武俊鹏; 夏松竹; 李庆显; 唐滨; 徐林枫
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2014-08-07
Filing date: 2014-08-07
Publication date: 2018-04-24
Anticipated expiration: 2034-08-07
Also published as: CN104253855A

Abstract

The present invention relates to a category popularity cache replacement method based on content classification for content-centric networks (CCN). The method comprises: first judging whether the remaining cache space of the node can accommodate the new data content; if the cache space is sufficient, caching the new data content; otherwise calculating the popularity of all content categories in the node using the exponentially weighted moving average (EWMA) as the criterion, and selecting the content category with the lowest popularity; removing from the node cache the content item that was requested the fewest times within a predefined time in that category; extracting the string features of the new data content name and classifying it; and storing the newly arrived data content item in the corresponding content category in the node and updating the category heat table and the log. By classifying content names, the invention manages the caches of CCN nodes better, lets the network search for and replace content starting from content names during communication, balances the diversity of content in node caches, and improves cache replacement efficiency.

Description

Content classification-based category popularity cache replacement method in content-oriented center network

Technical Field

The invention relates to a content-center-network-oriented category popularity cache replacement method based on content classification.

Background

With the continuous development of the Internet, the demand for content in the network keeps growing. The current TCP/IP-based network architecture shows increasingly prominent problems in network control, resource allocation and similar areas, and the focus of the Internet has shifted from communication between hosts to how to quickly obtain requested content. For this reason, researchers at home and abroad have begun to study new next-generation network architectures and have launched a number of related research projects to promote the development of the next-generation network. The invention concerns one such future network architecture, the content-centric network (CCN). CCN abandons the host-address-centered communication mode of the traditional network in favor of a design centered on named content, and builds a new architecture and communication mechanism to suit the development of future networks. The article "research and analysis on CCN research progress of content-centric networking" reviews related CCN research, introduces the working mechanism of CCN, surveys its current research hotspots and challenges, analyzes its main comparative advantages and existing problems, and finally verifies the working mode of CCN on an experimental test bed.

The cache replacement policy is a key part of CCN research and affects the overall performance of the network. The cache replacement strategies most often used in CCN are the least recently used (LRU) strategy, the least frequently used (LFU) strategy, and their improved variants. The LRU cache replacement strategy mentioned in the document "Modeling data transfer in content-centric networking" is simple, easy to implement and convenient to deploy, but it does not fully consider the dynamic characteristics of CCN and is therefore poorly adapted to it.

The invention provides a category popularity cache replacement strategy based on content name classification. Based on the naming scheme and name uniqueness of content in CCN, it combines all-gram and R-value to extract features from content name strings and classify them, so that the cache in each node is managed by category. The popularity of each category in each node is calculated using the idea of the exponentially weighted moving average: the number of times each category is accessed within a specified time is weighted according to its distance in time, so that the result reflects real-time popularity. During cache replacement, a content item in the least popular content category in the node is replaced first, and the new content is then stored in the category to which it belongs according to the classification method.

Disclosure of Invention

The invention aims to provide a category popularity cache replacement method based on content classification for content-centric networks. It performs cache replacement by means of content classification and dynamic popularity calculation, fully considers the recent dynamic characteristics of network content, improves the distribution efficiency of network content, and reduces the waste of the limited cache in network nodes.

The purpose of the invention is realized by the following steps:

(1) When new data content arrives, judging whether the remaining cache space of the node can accommodate the new data content; if the cache space is sufficient to cache the new data content, directly executing step (4); if the cache space is not sufficient to cache the new data content, executing step (2) to perform cache replacement;

(2) Calculating the popularity of all content categories in the node according to an exponential weighted moving average calculation standard, and selecting the content category with the minimum popularity;

(3) Removing the content item with the least number of requests within a predefined time in the content category with the least popularity from the node cache;

(4) Extracting and classifying the character string characteristics of the new data content name;

(5) And storing the newly arrived data content items into corresponding content categories in the nodes, and updating the category heat table and the log.

In the step (1), before judging whether the remaining cache space of the node can accommodate the new data content, the CS table of the node is checked to see whether the new data content is cached in the cache.

Extracting the character string characteristics of the new data content name according to a method of combining all-gram and R-value and classifying the content: the n-gram model intercepts a series of substrings by utilizing a sliding window with the length of n, the sliding window slides one length unit each time, and the content name sequence processed by the n-gram model is divided into continuous substrings with the lengths of n.

The invention has the following beneficial effects:

The invention provides a category popularity cache replacement algorithm based on content classification. It avoids processing every content item individually when calculating popularity: only the popularity of each content category needs to be calculated. When cache replacement is required, a content item in the least popular category in the node is evicted from the cache, and the newly arrived content data is then classified by name into an existing category in the node cache, which completes the replacement. Unlike the conventional LRU method, which selects the least recently used content block for replacement, the method selects content from the least popular category in steps 2 and 3, so that content with high popularity can stay in a network node for a relatively long time. Step 4 introduces the idea of classifying content by combining all-gram and R-value. Classifying content names allows the caches of CCN nodes to be managed better, lets the network search for and replace content starting from content names during communication, balances the diversity of content in node caches, and improves cache replacement efficiency. Simulation results show that the proposed category popularity cache replacement strategy based on content name classification has certain performance advantages over other classical replacement strategies.

Drawings

FIG. 1 is a general flow diagram of the present invention;

FIG. 2 is a schematic diagram of the network topology of the present invention;

FIG. 3 is a table of experimental simulation parameters for the present invention;

FIG. 4 is a schematic diagram of an example of the computational popularity of the present invention;

FIG. 5 is a schematic diagram of an example n-gram of the present invention;

FIG. 6 is a flow chart of the feature extraction method of the present invention that combines all-gram with R-value calculation;

FIG. 7 is a schematic diagram of the average cache hit rate under different node cache spaces according to the present invention;

FIG. 8 is a graph illustrating average cache hit rates for different numbers of stub domains according to the present invention;

FIG. 9 is a graph illustrating the recovery capability of the cache hit rate of the present invention;

FIG. 10 is a schematic diagram of the average load of servers under different sizes of node caches according to the present invention;

FIG. 11 is a schematic diagram illustrating the average load of servers under different numbers of stub domains according to the present invention;

FIG. 12 is a graph illustrating the effect of the chosen sample time size on cache hit rate and server load in accordance with the present invention.

Detailed Description

The invention is described in more detail below by way of example with reference to the accompanying drawings.

1. A content classification-based category popularity cache replacement method for a content-centric network is characterized in that:

Step 1: when new data content arrives, first judging whether the remaining cache space of the node can accommodate the new content; if the cache space is sufficient to cache the new data, going directly to step 4; if it is not, going to step 2 and performing cache replacement to make room for the new data.

Step 2: calculating the popularity of all content categories in the node according to an exponentially weighted moving average (EWMA), and selecting the content category with the minimum popularity;

Step 3: removing from the node cache the content item with the fewest requests within a predefined time in the content category with the minimum popularity;

Step 4: extracting the string features of the new content name and classifying it by combining all-gram and R-value;

Step 5: storing the newly arrived content item in the corresponding content category in the node, and updating the category heat table and the log.

In step 1, before judging whether there is enough cache space for the new data, the CS (Content Store) table of the node is checked to see whether the data is already cached. The CS table stores all content that has passed through the node and has been cached by the node.
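The five steps and the CS check can be sketched as follows. This is a minimal illustration and not the patent's implementation: the class name `CategoryCache`, the `classify` and `popularity` callables, and the use of an item count as the capacity measure are all assumptions made for the example, and updating the heat table and log in step 5 is left out.

```python
from collections import defaultdict


class CategoryCache:
    """Toy node cache organised by content category (illustrative only)."""

    def __init__(self, capacity, classify, popularity):
        self.capacity = capacity          # assumed: capacity counted in items
        self.classify = classify          # hypothetical: name -> category
        self.popularity = popularity      # hypothetical: category -> score
        self.items = {}                   # content name -> category
        self.requests = defaultdict(int)  # content name -> requests in window

    def insert(self, name):
        """Cache `name`; return the evicted name, or None if nothing was evicted."""
        if name in self.items:            # CS check: already cached
            return None
        evicted = None
        if len(self.items) >= self.capacity:                 # step 1
            categories = set(self.items.values())
            coldest = min(categories, key=self.popularity)   # step 2
            victims = [n for n, c in self.items.items() if c == coldest]
            evicted = min(victims, key=lambda n: self.requests[n])  # step 3
            del self.items[evicted]
            self.requests.pop(evicted, None)
        self.items[name] = self.classify(name)               # steps 4 and 5
        return evicted
```

With a popularity lookup such as `{"video": 5.0, "news": 1.0}.get`, inserting into a full cache evicts an item from the `news` category first, whatever its recency.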

Step 2 involves extracting the string features of the new content name, classifying the content, and calculating the popularity of the content categories.

Extracting the string features of the new content name and classifying the content by combining all-gram and R-value: an n-gram uses a sliding window of length n to extract a series of substrings, and the window slides by one length unit each time. When a content name sequence is processed by an n-gram model, it is divided into a number of consecutive substrings of length n.
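The sliding window described above can be sketched in a few lines. The function name `ngrams` is chosen for illustration, and treating one character as one length unit is an assumption.

```python
def ngrams(name, n):
    """Return all length-n substrings of `name`, sliding one character at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A name shorter than n yields no substrings.
    return [name[i:i + n] for i in range(len(name) - n + 1)]
```

For the name component `myvideo` and n = 3, the window produces `myv`, `yvi`, `vid`, `ide` and `deo`.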

In classification, accuracy often depends strongly on the choice of n, yet the n-gram algorithm has no fixed method for choosing it, and the final value is sometimes selected by trial and error based on human experience. If n is too small, the structure and order of the strings may be ignored; if n is too large, the similarity between strings may be reduced, leading to wrong classification results. The invention therefore proposes the all-gram idea. Instead of dividing the name string with a fixed n, a series of n values is used, which generates n-gram substrings of different lengths; these substrings generally cover the important features and keywords of the original string. The feature vector space finally formed by all-gram segmentation can therefore be used to classify training samples efficiently and quickly through learning, and improves classification accuracy.
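A minimal sketch of the all-gram idea follows. The text does not state which range of n values is used, so the `n_min` and `n_max` parameters (and their defaults) are assumptions for illustration.

```python
def all_grams(name, n_min=1, n_max=None):
    """Collect n-gram substrings of `name` for every n in [n_min, n_max]."""
    # Assumed default: use every window length up to the full name.
    n_max = len(name) if n_max is None else min(n_max, len(name))
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(name[i:i + n] for i in range(len(name) - n + 1))
    return grams
```

For example, `all_grams("abc")` yields `a`, `b`, `c`, `ab`, `bc` and `abc`, so short and long patterns both end up in the candidate feature set.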

The invention adopts the R-value feature selection method, which evaluates features according to their calculated R value, ranks them, and selects the feature set that is easiest to classify, thereby providing a good basis for classification. In this method a factor r is used to balance word frequency.

In the R-value formula, $t$ is a feature, $C$ is the target class, and $\bar{C}$ is the non-target class. $r$ is an adjustable factor with a value between 0 and 1. $P(t \mid C)$ is the prior probability of $t$ in $C$, and $P(t \mid \bar{C})$ is the prior probability of $t$ in $\bar{C}$. They are calculated as follows:

$$P(t \mid C) = \frac{|C_t|}{|C|}, \qquad P(t \mid \bar{C}) = \frac{|\bar{C}_t|}{|\bar{C}|}$$

where $|C_t|$ and $|\bar{C}_t|$ are the numbers of documents in $C$ and $\bar{C}$, respectively, in which $t$ appears, and $|C|$ and $|\bar{C}|$ are the numbers of documents in $C$ and $\bar{C}$, respectively.

The value of the factor r is adjustable between 0 and 1. When r is small, the selected features t tend to have low frequency but high discrimination; when r is large, they tend to have high frequency but low discrimination.
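The two prior probabilities are document frequencies and can be sketched as below. The sketch assumes that each content name plays the role of a "document" and that a feature "appears" when it occurs as a substring; the function name `doc_frequency_prior` is hypothetical. The R value itself is not computed here.

```python
def doc_frequency_prior(feature, docs):
    """Fraction of documents in `docs` that contain `feature` (0.0 if docs is empty)."""
    if not docs:
        return 0.0
    # Assumption: a feature "appears" in a document if it is a substring of it.
    return sum(feature in d for d in docs) / len(docs)
```

Calling the function once on the names of the target class and once on the names of all other classes gives the two priors that enter the R value.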

Features of content names in the CCN are obtained by combining all-gram with R-value calculation, so that the content in the cache can be classified; the specific flow is shown in FIG. 6.

The popularity of all content categories in the node is calculated using the exponentially weighted moving average (EWMA) as the measurement basis. The moving average is an important method in statistics. "Moving" means that the data being calculated changes during the calculation as time passes; a moving average is a method of analyzing data over a time series.

In the CCN, because of the dynamic characteristics of the network, the popularity of cached content in a node varies greatly over time. The popularity of a content category can therefore only be calculated over a certain period of time, and the more recent the period, the better it reflects the category's current popularity. Following the idea of the exponentially weighted moving average, a trace log is created for each content category to record the number of requests within a predefined period. The time is subdivided into small time periods; the number of requests in the period closest to the current time is given a higher weight, and the numbers in more distant periods are given lower weights. The EWMA value thus reflects the popularity of a content category and is used here as the calculation criterion.

In the calculation formula, $C_i[j]$ is the number of times category $i$ was requested within the $j$-th time period, $t$ is a positive integer representing the total sampling time, and $\alpha$ is the weight, defined here as $2/(t+1)$.
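A sketch of a category popularity calculation in the spirit of the description above. It assumes the common recursive EWMA form $S_j = \alpha\,C_i[j] + (1-\alpha)\,S_{j-1}$ with $S_1 = C_i[1]$, which may differ from the exact formula of the invention; only $\alpha = 2/(t+1)$ is taken from the text. The function name and the oldest-first ordering of the counts are illustrative choices.

```python
def ewma_popularity(counts):
    """EWMA of per-period request counts, ordered oldest first.

    Assumed recursion: S_1 = counts[0], S_j = alpha*counts[j] + (1-alpha)*S_{j-1},
    with alpha = 2 / (t + 1) and t the number of sampled periods.
    """
    t = len(counts)
    if t == 0:
        return 0.0
    alpha = 2 / (t + 1)
    score = float(counts[0])
    for c in counts[1:]:
        score = alpha * c + (1 - alpha) * score
    return score
```

Under this form, a burst of requests in the most recent period raises the score more than the same burst in the oldest period.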

In step 3, when new data reaches the node and needs to be cached, and the remaining cache space cannot accommodate it, existing data in the CS table is replaced.

The invention classifies the cached content in the CCN node, calculates the popularity of each content category, and removes from the node cache the content item with the fewest requests within the predefined time in the least popular content category, freeing enough cache space for the new data. Because the item removed to make room for new data is the least requested item in the least popular category, the popularity of content in the network is taken into account, so hot content can reside in the node cache for a long time, which improves the cache hit rate and network performance.

Step 4 comprises the all-gram model and the R-value feature selection method.

The method combines all-gram and r value calculation to obtain the characteristics of the content names in the CCN, thereby achieving the purpose of classifying the content in the cache. The cache of the nodes in the CCN can be better managed according to the classification of the content names, so that the network searches and replaces the content from the content names in the communication process, the diversity of the content in the node cache is balanced, and the cache replacement efficiency is improved.

Step 5 includes a category heat table.

The category heat table records the number of hits and the popularity value of each content category. When requested content is found in the cache node, the request is considered a hit, and the hit count of the content's category is increased in the category heat table. Because the table is updated whenever content in the node is requested and hit, the popularity of a content category can be calculated for any time period, which reflects the way content popularity in the network changes over time and adapts to the dynamic characteristics of the network.
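One possible layout for such a table is sketched below: a fixed-length list of per-period hit counts per category, with the newest period last. The class name, the rolling-window design and the method names are assumptions for illustration; the popularity value would be derived from these counts by the EWMA calculation.

```python
from collections import defaultdict


class CategoryHeatTable:
    """Per-category hit counts over a rolling window of time periods."""

    def __init__(self, periods):
        self.periods = periods
        # category -> hit counts per period, oldest first, newest last
        self.hits = defaultdict(lambda: [0] * periods)

    def record_hit(self, category):
        """Count one cache hit for `category` in the current period."""
        self.hits[category][-1] += 1

    def advance_period(self):
        """Start a new period: drop the oldest count, open an empty one."""
        for counts in self.hits.values():
            counts.pop(0)
            counts.append(0)

    def counts(self, category):
        """Return a copy of the per-period hit counts for `category`."""
        return list(self.hits[category])
```

A node would call `record_hit` on every hit and `advance_period` once per sampling interval.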

The invention uses the NS-3-based ndnSIM network simulator to simulate the CCN. The performance of the proposed category popularity cache replacement policy based on content classification was evaluated by simulation and compared with the representative cache replacement policies LRU, LRU-K, LFU, and LFU-Aging. GT-ITM is used to generate a Transit-Stub network topology, as shown in FIG. 2. The topology in the figure comprises a number of stub networks. A stub network only handles traffic whose source or destination lies inside it, only part of its hosts communicate with the outside, and it has only one border router. Each stub network is therefore equivalent to an interest group: when it requests external data content of interest, that content can spread within its stub domain, which changes the popularity of the content.

Because of the limits of the simulation environment, 10 content categories are configured, each containing 50 content items. The interval between communications of each stub network with the outside is 30 seconds; that is, a stub network can issue its next content request 30 seconds after obtaining the content it was interested in. This helps simulate the dynamics of the network, since content popularity keeps changing. By default the popularity of a content category is calculated once every 7 seconds, so the time sample is 7 seconds. Within the time sample, the number of accesses to the content category is recorded for each second, and the counts are weighted according to their distance in time to calculate the popularity of the category. The simulation parameters are shown in FIG. 3.

Each node in the CCN has caching capacity, and in the test simulation, the node caching size is defined according to the relative size of the node and the total content of the network. The node cache size is typically defined to be between 10% and 30% of the total content in the network. For example, when a node cache size of 10% is defined and there are a total of 1000 content items in the network, then a maximum of 100 content items can be cached by each node. Of course, in a real network, the caching capacity of the nodes is very limited relative to the total amount of network content. Since the simulated network size is small, the node cache size is expressed in a proportional form. The invention includes:

Step 3: removing from the node cache the content item with the fewest requests within a predefined time in the content category with the minimum popularity;

The node i establishes a category hotlist for recording the hit times and popularity values of each content category.

The node i is any cache node in the content-centric network.

A specific embodiment of the present invention will be described in detail with reference to fig. 1. The invention relates to a content-center-network-oriented category popularity cache replacement method based on content classification, which comprises the following steps:

The process of calculating the popularity of a content category with an exponentially weighted moving average is described below with an example. As shown in FIG. 4, assume that the content of a cache node is divided into 10 categories and the predefined time is divided into 7 small time periods; the number below each time period in the figure is the number of times the content category was requested in that period. In this example, the total number of requests for content in the first, ninth and tenth categories is the same over the predefined time, but the number of requests in each small time period varies greatly; in particular, the per-period request counts of the ninth and tenth categories contrast sharply. If only the average number of requests per category over the predefined time were considered, these categories would be judged equally popular. This is clearly unreasonable: in the CCN popularity may change at any time, so an estimate based on averaging over a period is inaccurate. The time should be subdivided, and the finer the subdivision, the better the calculated popularity reflects the real network condition. The popularity of content categories is therefore calculated dynamically with the exponentially weighted moving average method: the number of requests for each category in the most recent time period is given the highest weight, and the weights of the other values decrease with the distance of their time period. The result is shown in the right part of FIG. 4, where the calculated popularity values are ranked from high to low.

With popularity calculated in this way, new data can be stored in the cache effectively when it arrives at a node. If the remaining cache space of the node cannot accommodate the new data, cache replacement is performed, and the dynamic calculation and ranking of category popularity make the replacement process more efficient.
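The contrast between a plain average and a time-weighted score can be reproduced with a small experiment. The request counts below are invented for illustration and are not the values of FIG. 4; the recursive EWMA form is again an assumption, with $\alpha = 2/(t+1)$ taken from the description.

```python
def rank_by_popularity(log):
    """Rank categories from most to least popular.

    `log` maps a category to its per-period request counts, oldest first.
    Assumed recursion: S_j = alpha*C[j] + (1-alpha)*S_{j-1}, alpha = 2/(t+1).
    """
    def score(counts):
        alpha = 2 / (len(counts) + 1)
        s = float(counts[0])
        for c in counts[1:]:
            s = alpha * c + (1 - alpha) * s
        return s

    return sorted(log, key=lambda cat: score(log[cat]), reverse=True)


# Illustrative counts: all three categories total 21 requests over 7 periods.
example_log = {
    "c1": [3, 3, 3, 3, 3, 3, 3],   # steady demand
    "c9": [6, 5, 4, 3, 2, 1, 0],   # demand fading out
    "c10": [0, 1, 2, 3, 4, 5, 6],  # demand ramping up
}
ranking = rank_by_popularity(example_log)
```

Although the three totals are equal, the rising category ranks first and the fading one last, so the fading category is the one chosen for replacement.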

And step 3: caching the content item which is requested the least times within the predefined time in the content category with the minimum popularity;

The n-gram model is essentially an (n-1)-order Markov model. An n-gram uses a sliding window of length n to extract a series of substrings, and the window slides one length unit at a time. When a content name sequence is processed by an n-gram model, it is divided into a number of consecutive substrings of length n. The model assumes that, in a sentence of a given length composed of several words, the occurrence of the n-th word depends only on the preceding n-1 words and on no other words, so that the probability of the whole sentence is the product of the occurrence probabilities of its words. Mathematically, suppose a sentence consists of $m$ words, $W = w_1, w_2, w_3, \ldots, w_m$. If the occurrence of word $w_i$ ($1 \le i \le m$) depends on the entire preceding context $w_1 w_2 w_3 \ldots w_{i-1}$, the probability of the sentence $W$ is:

$$p(W) = p(w_1, w_2, w_3, \ldots, w_m) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_1^{2}) \cdots p(w_m \mid w_1^{m-1})$$

where $w_1^{m-1}$ denotes $w_1, w_2, w_3, \ldots, w_{m-1}$, and $p(w_m \mid w_1^{m-1})$ is the probability that word $w_m$ occurs given the preceding context $w_1, w_2, w_3, \ldots, w_{m-1}$. This probability can be estimated from the number of times the words occur together in the corpus. In practice, however, $m$ is often very large, which makes the calculation of $p(w_m \mid w_1^{m-1})$ very complex and memory-intensive. To overcome this problem, it can further be assumed that the occurrence of the current word depends only on the preceding n-1 words. The following equation is then obtained:

$$p(W) = p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_1^{2}) \cdots p(w_n \mid w_1^{n-1}) \cdots p(w_m \mid w_{m-n+1}^{m-1})$$

In the above formula, $w_{m-n+1}^{m-1}$ denotes $w_{m-n+1} w_{m-n+2} \ldots w_{m-1}$.
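The count-based estimate mentioned above can be sketched at character level, treating each letter of a name as a "word". This is an illustrative maximum-likelihood estimate, $p(w_i \mid \text{context}) \approx \text{count}(\text{context}\,w_i) / \text{count}(\text{context})$; the function name, the tiny corpus in the usage note, and the absence of smoothing are assumptions.

```python
from collections import Counter


def conditional_prob(corpus, context, nxt):
    """Estimate p(nxt | context) from n-gram counts, with n = len(context) + 1."""
    n = len(context) + 1
    full_counts = Counter()     # counts of each n-gram
    context_counts = Counter()  # counts of each (n-1)-gram that has a successor
    for text in corpus:
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            full_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    if context_counts[context] == 0:
        return 0.0  # unseen context; no smoothing in this sketch
    return full_counts[context + nxt] / context_counts[context]
```

For the corpus `["video", "vivid"]`, the context `vi` is followed by `d` in two of its three occurrences, so the estimate of p(d | vi) is 2/3.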

The n-gram is widely used with support vector machine classifiers: the text content is divided by the n-gram algorithm into sequences of text segments of a certain length, the segments are then filtered, and the high-frequency segments that meet the requirements are retained to form the feature vector table of the text. Strings can likewise be treated as text for classification. The invention mainly targets English content names and assumes that all content in the CCN is named hierarchically in English. The correlation between letters in an English letter sequence is weak, which fits the assumptions of the n-gram model well. The component "myvideo" of the content name sina.com.cn/myvideo/tiger t.mpg/_v<timeverinfo>/seg2 is taken as an example, as shown in FIG. 5.

In classification, accuracy often depends strongly on the choice of n, yet the n-gram algorithm has no fixed method for choosing it, and the final value is sometimes selected by trial and error based on human experience. If n is too small, the structure and order of the strings may be ignored. With a suitable value, a word such as "software" can still be associated with its meaning through the 5 substrings soft, oftw, ftwa, twar and ware obtained by a 4-gram. If n is too large, the similarity between strings may be reduced and wrong classification results may follow; for example, the strings formed from the word "keyword" by a 6-gram do not highlight the important features of the original word, so the segmentation is meaningless. The invention therefore proposes the all-gram idea. The name string is not divided with a fixed n but with a series of n values, which generates n-gram substrings of different lengths; these substrings generally cover the important features and keywords of the original string. The feature vector space finally formed by all-gram segmentation can therefore be used to classify training samples efficiently and quickly through learning, and improves classification accuracy.

Features of content names in the CCN are obtained by combining all-gram with R-value calculation, so that the content in the cache can be classified. The specific flow is shown in FIG. 6. First a training sample set is prepared, and the all-gram method is applied to it for feature extraction, giving the feature set S of the content names. The features in S are then scored by calculating their R values and ranked; the top-ranked features are selected to form a feature dictionary, which finally gives the feature set S1. Classification experiments are carried out with the feature set S1, and the content cached in the CCN is classified according to content name.
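The flow can be sketched end to end as follows. The ranking score used here, the difference between the two priors, is only a placeholder and is not the R value of the invention; the function name, the default range of n, and `top_k` are likewise assumptions for illustration. A different scoring function can be passed in through `score`.

```python
def select_features(target, other, n_min=2, n_max=4, top_k=5, score=None):
    """Extract all-gram features from `target` names and keep the top_k by score.

    `target` and `other` are lists of content names for the target class
    and the non-target class. `score(p_c, p_not_c)` ranks a feature from
    its two priors; the default is a placeholder, NOT the patent's R value.
    """
    def grams(name):
        return {name[i:i + n]
                for n in range(n_min, n_max + 1)
                for i in range(len(name) - n + 1)}

    def prior(feature, docs):
        return sum(feature in d for d in docs) / len(docs) if docs else 0.0

    if score is None:
        score = lambda p_c, p_not_c: p_c - p_not_c  # placeholder ranking score

    candidates = set().union(*(grams(name) for name in target))  # feature set S
    ranked = sorted(
        candidates,
        key=lambda t: (-score(prior(t, target), prior(t, other)), t),
    )
    return ranked[:top_k]                                        # feature set S1
```

With target names `myvideo` and `video1` and non-target names `news` and `newspaper`, the highest-ranked features are substrings shared by both target names, such as `de`, `deo` and `ide`.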

To verify the performance advantage of the proposed category popularity cache replacement based on content classification, it is compared experimentally with the traditional cache replacement strategies LRU, LRU-K, LFU and LFU-Aging.

FIG. 7 shows the average cache hit rate under different node cache sizes. With the ratio of node cache capacity to the total amount of network content set to 10%, 20% and 30%, 5 stub networks, and a time sample of 7 seconds, the proposed category popularity policy based on content classification consistently performs better than the other replacement policies at every node capacity setting. In contrast, LFU and LFU-Aging show poor hit rates.

FIG. 8 shows the average cache hit rate for different numbers of stub domains. With the cache size of each node set to 20% of the total amount of network content, the average cache hit rate of most replacement strategies drops significantly as the number of stub domains increases. This is particularly true of LRU-K and LFU-Aging, which perform better than LRU and LFU only when the number of stub domains is one. The classification-based popularity policy, however, changes little in performance throughout, which indicates that it adapts well to changes in the number of stub domains in the network.

The line chart of FIG. 9 shows how well three cache replacement strategies recover their cache hit rate after the network is briefly interrupted and then restored. The red line in the figure marks the network interruption at the 150th second, at which point the cache hit rates of all three strategies drop. After the network recovers, the classification-based popularity strategy quickly returns to its state before the interruption, with a very high cache hit rate. In contrast, the other two strategies recover slowly and unstably, and LFU-Aging in particular shows a very poor cache hit rate compared with the other two.

FIGS. 10 and 11 show the average server load measured with node caches of different sizes and with different numbers of stub domains, respectively. Consistent with the previous simulation results on average cache hit rate, the LFU and LFU-Aging policies perform worst at reducing the average load of the origin servers. As shown in FIG. 10, when the number of stub domains is 5 and the node cache capacity is 10% of the total amount of network content, the category popularity policy reduces the origin server load by about 39% compared with the LFU policy. When the number of stub domains is 9, the server load under the category popularity policy is approximately 65% of that under the LFU policy. Therefore, the cache replacement strategy based on the category popularity of content classification can effectively reduce the server load and relieve pressure on the network.

FIG. 12 is a graph illustrating the effect of the sample time chosen for calculating content category popularity on the average cache hit rate of a node and the average load of the server. As the figure shows, when the time sample is set to 7 seconds, the average cache hit rate of the node reaches its highest value and the average load of the server is very low. If the sample time is too large or too small, a good simulation result cannot be obtained.
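As an illustration of how the time sample enters the category popularity calculation, the following minimal Python sketch updates a popularity value per content category once per sample period and then selects a replacement victim. It assumes the conventional exponentially weighted moving average recurrence P_i[j] = α·C_i[j] + (1 − α)·P_i[j−1]; this recurrence, the weight 0.5, the request counts, and all function and variable names are assumptions made for illustration and are not taken from the formula of the invention.

```python
def update_popularity(prev, counts, alpha=0.5):
    """Return the popularity of each category after one sample period.

    prev:   popularity of each category after the previous period
    counts: requests per category during the period just ended (C_i[j])
    alpha:  assumed weight given to the newest period, 0 < alpha <= 1
    """
    categories = set(prev) | set(counts)
    return {
        c: alpha * counts.get(c, 0) + (1 - alpha) * prev.get(c, 0.0)
        for c in categories
    }


def choose_victim(popularity, item_requests):
    """Pick the least-requested item inside the least-popular category.

    item_requests: {category: {content_name: request count in the window}}
    Only categories that currently hold cached items are considered.
    """
    cached = [c for c in item_requests if item_requests[c]]
    coldest = min(cached, key=lambda c: popularity.get(c, 0.0))
    items = item_requests[coldest]
    return coldest, min(items, key=items.get)


# Two sample periods of hypothetical request counts.
pop = update_popularity({}, {"video": 8, "news": 2})
pop = update_popularity(pop, {"video": 4, "news": 6})

# Hypothetical cache contents with per-item request counts.
victim = choose_victim(
    pop,
    {"video": {"/v/a": 5, "/v/b": 1}, "news": {"/n/x": 3, "/n/y": 4}},
)
```

In this sketch the time sample is simply the interval between successive calls to `update_popularity`, so changing the sample time changes how many requests are aggregated into each count before it is weighted.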

Claims

(1) When new data content arrives, first judging whether the remaining cache space of the node can accommodate the new data content; if the cache space is sufficient to cache the new data content, directly executing step (4); if the cache space is insufficient to cache the new data content, executing step (2) to perform cache replacement;

(2) Calculating the popularity of all content categories in the node according to the exponentially weighted moving average (EWMA) calculation standard, and selecting the content category with the minimum popularity, wherein the calculation formula is as follows:

wherein $C_i[j]$ is the number of times category $i$ has been requested within the $j$-th time period, and $\alpha$ represents a weight;

extracting and classifying the character string features of the new content name according to a method combining the N-gram and the R-value;

the N-gram model is essentially an (N-1)-order Markov model; the N-gram uses a sliding window of length N to intercept a series of substrings, the sliding window advancing one length unit each time; when a content name sequence is processed by an N-gram model, the content name sequence is divided into a plurality of consecutive substrings of length N; the invention adopts the R-value feature selection method, which evaluates each feature according to its calculated R value, ranks the features, and selects a feature set that is easier to classify, thereby providing an ideal standard for classification; as shown in the following equation:

where $t$ is a feature, $C$ is the target classification, and $\bar{C}$ is the non-target classification; $r$ is an adjustable factor with a value range of 0 to 1; $P(t \mid C)$ is the prior probability of $t$ occurring in $C$, and $P(t \mid \bar{C})$ is the prior probability of $t$ occurring in $\bar{C}$; the prior probabilities are calculated as shown by the following two formulas:

wherein $|C_t|$ and $|\bar{C}_t|$ are the numbers of documents in $C$ and $\bar{C}$, respectively, in which $t$ appears; $|C|$ and $|\bar{C}|$ are the numbers of documents in $C$ and $\bar{C}$, respectively;

2. The category popularity cache replacement method based on content classification in a content-oriented center network according to claim 1, wherein: in step (1), before determining whether the remaining cache space of the node can accommodate the new data content, the CS table of the node is checked to see whether the new data content is already cached.
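By way of illustration only, the name-feature extraction recited in claim 1 can be sketched in Python as follows. The fragment cuts a content name into character substrings with a sliding window of length n that advances one unit at a time, and then estimates document-frequency priors for a feature t. The ratios |C_t|/|C| and |C̄_t|/|C̄| used here are an assumed form inferred from the symbol definitions in claim 1; the R-value ranking itself is not shown, and all function names and sample content names are hypothetical.

```python
def char_ngrams(name, n):
    """Slide a window of length n over the name, one character per step."""
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def prior(feature, names, n):
    """Fraction of the given names whose n-grams contain the feature.

    Assumed form |C_t| / |C|: names containing t over all names in the class.
    """
    if not names:
        return 0.0
    hits = sum(1 for name in names if feature in char_ngrams(name, n))
    return hits / len(names)


# Hypothetical content names for the target class and the non-target class.
target = ["/video/a.mp4", "/video/b.mp4", "/video/c.avi", "/video/d.mp4"]
other = ["/news/x.html", "/news/y.mp4"]

grams = char_ngrams("/video", 3)     # substrings of length 3
p_target = prior("mp4", target, 3)   # assumed estimate of P(t | C)
p_other = prior("mp4", other, 3)     # assumed estimate of P(t | non-target)
```

A feature whose two priors differ strongly separates the target class from the rest more clearly, which is the kind of information a feature ranking would draw on.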