CN113127267B - Strong-consistency multi-copy data access response method in distributed storage environment - Google Patents

Strong-consistency multi-copy data access response method in distributed storage environment

Info

Publication number
CN113127267B
Authority
CN
China
Prior art keywords
node
data
credit
nodes
copy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110488820.XA
Other languages
Chinese (zh)
Other versions
CN113127267A (en)
Inventor
孙胜耀
李华英
杨颖辉
王仙吉
张少辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Normal University
Original Assignee
Zhengzhou Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Normal University filed Critical Zhengzhou Normal University
Priority to CN202110488820.XA priority Critical patent/CN113127267B/en
Publication of CN113127267A publication Critical patent/CN113127267A/en
Application granted granted Critical
Publication of CN113127267B publication Critical patent/CN113127267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1456 Hardware arrangements for backup
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1461 Backup scheduling policy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/82 Solving problems relating to consistency

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A strong-consistency multi-copy data access response method in a distributed storage environment. According to the service capability and activity of the nodes in a dynamic distributed storage environment and the user's expectation for the accessed data, the method selects several replica nodes with strong capability and high activity to respond. This replaces the traditional response mode in which all replicas participate in the service without distinction, and it reduces both the access waiting time for multi-replica data and the communication overhead of the system. First, according to the node characteristics that influence data access, the influence of each characteristic on data access capability is evaluated, and the combined influence is marked by issuing credit currency, so that the credit currency value represents the service capability of the node. Then, according to the services the node provides and obtains per unit time, the activity of the node is measured and marked by its circulating credit currency value. Finally, the replicas that need to participate in the response are determined from the node activity, the node credit currency value, and the user's expectation of data freshness.

Description

Strong-consistency multi-copy data access response method in distributed storage environment
Technical Field
The invention relates to a distributed storage technology, in particular to a strong-consistency multi-copy data access response method in a dynamic distributed storage environment.
Background
Distributed computing and distributed storage are widely used in real-world scenarios today. These applications typically use multi-replica techniques to improve system availability and data access efficiency. System performance is closely related to the number of replicas: in general, the more replicas there are, the higher the availability, scalability, and access efficiency of the system. Although more replicas can effectively improve performance, they also add maintenance overhead. For example, when one replica is updated, the other replicas of the same data must also complete the update in time; otherwise data inconsistency degrades the availability of the system. Clearly, the more replicas there are, the higher the cost of maintaining consistency.
To deal with multi-replica consistency, a large number of consistency maintenance schemes have been proposed. These schemes generally fall into three types: push mode, polling mode, and hybrid mode. In a push-based policy, update messages are pushed to the other replicas in some form, such as a "heartbeat"; in a polling-based policy, a replica node actively polls a service node to obtain updates; and the hybrid mode combines pushing and polling to complete replica consistency updates. Although these policies can effectively guarantee the consistency of the accessed data, in a dynamic distributed storage environment they cannot fully guarantee its freshness, because nodes and data change frequently.
To deal with strong consistency of accessed data in a dynamic distributed environment, distributed systems adopt particular strategies for how replicas respond. For example, a distributed database typically manages all participants of an atomic transaction with two-phase commit (2PC) to commit or abort the transaction: when data is accessed, the data access service is provided if and only if all replica nodes agree to commit the access request; otherwise the access transaction is aborted. Although 2PC effectively ensures the correctness of the data read, a response is given only after every replica node has confirmed, which undoubtedly increases the communication overhead of the system. To better balance system performance, distributed storage systems in practical use today (such as Taobao's OceanBase database and Microsoft's SQL Server database) adopt Paxos-based solutions to meet the requirement of strong consistency for data access. Such a scheme can agree to serve a request by committing only the value accepted by a majority of replicas, without waiting for every replica to respond. When Paxos responds to a data access, the replica nodes first go through multiple rounds of negotiation and then take the value adopted by the majority of nodes as the response data. This mode lets all replica nodes participate in the data response service as peers and effectively reduces communication overhead. However, it does not consider whether a subset of the nodes could negotiate in place of all nodes; provided the availability of cloud storage is not affected, letting a subset of nodes provide the service can further reduce the communication overhead of the system.
In fact, no distributed storage system, and dynamic distributed storage in particular, can guarantee 100% consistency of the accessed data. Whenever users access data, they expect the data not to be erroneous; that is, a user accesses data with an expectation, e.g., 99.99999999%, that it is the correct data. Therefore, to respond effectively to users' strong-consistency expectations in a distributed application environment, it is necessary to design a strong-consistency multi-replica data access response method that reduces the communication overhead of the system while still meeting the user's expectation for the accessed data.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a strong-consistency multi-copy data access response method in a distributed storage environment. It designs an on-demand data response scheme in which, in a dynamic distributed storage environment, only some of the replica nodes selectively participate in the response service according to the user's expectation of the freshness of the accessed data, thereby reducing the access waiting time for multi-replica data and the communication overhead of the system.
The main technical concept of the technical solution of the invention is as follows:
According to the service capability and activity of the nodes in the dynamic distributed storage environment and the user's expectation for the accessed data, several replica nodes with strong capability and high activity are selected to respond. This replaces the traditional response mode in which all replicas participate in the service without distinction, and it reduces both the access waiting time for multi-replica data and the communication overhead of the system.
The invention is divided into three main parts: characterizing the service capability of a node with credit currency, measuring the activity of a node by its currency transaction amount, and determining the replicas that need to participate in a response. First, according to the node characteristics that influence data access, the influence of each characteristic on data access capability is evaluated with a model similar to the multivariate discriminant z-score model, and the combined influence is marked by issuing credit currency. Then, according to the services the node provides and obtains per unit time, the activity of the node is marked by its circulating credit currency value. Finally, the replicas that need to participate in the response are determined from the node activity, the node credit currency, and the user's expectation of data freshness.
The idea of characterizing the service capability of a node with credit currency is as follows:
First, the factors through which nodes in a dynamic distributed environment influence data access are evaluated objectively with a method similar to the multivariate discriminant z-score. Weights distinguish how strongly the different factors influence data access. The "Zeta" value of the node is then obtained with the multivariate discriminant z-score method and marked in the form of issued currency.
The idea of measuring node activity by currency transaction amount is as follows:
If a node successfully provides a data access service to other nodes, it collects credit currency from them; if a node successfully requests a data access service from other nodes, it pays credit currency to them. The invention calls the sum of the two the circulating credit funds (liquidity) of the node and uses it to indicate node activity.
The idea of determining the replicas that participate in a response is as follows:
First, the invention acquires all replica nodes of the data to be accessed and, from the credit activity of each node, obtains the relationship between credit activity and the probability that a replica is the latest replica. It then obtains the number of replicas that need to participate in the data response from the user's expectation of the freshness of the accessed data and a probability formula. The probability that the replica on a node obtains updates in time is closely related, and directly proportional, to the node's data access capability and activity. To reflect this proportionality, the product of a node's credit currency and its activity is called its credit activity. When the responding replica nodes are selected, the set of replica nodes is sorted by credit activity, and the replicas on the nodes with the highest credit activity are then selected from the sorted set, up to the number of replicas required, to participate in the data response.
The beneficial effects of the invention are as follows:
1. The invention provides a participation-based response method for multi-replica data access in a dynamic distributed storage environment, which reduces the access waiting time for multi-replica data and the communication overhead of the system while still meeting the user's expectation for the accessed data. In a dynamic distributed storage environment, an on-demand data response scheme is adopted according to the user's expectation of the freshness of the accessed data, and only some of the replica nodes selectively participate in the response service. This avoids the waiting time incurred in a strong-consistency environment when all replicas must participate in every access request, and it reduces the communication overhead of the system.
2. The invention represents the service capability of a node in the form of credit currency. The service capability of the node is evaluated from its multiple factors with the multivariate discriminant z-score method, and the credit currency value gives an intuitive expression of the node's willingness to serve.
3. The invention uses liquidity to characterize how actively a node participates in services. Depending on whether the node provides or requests services, the amount of circulating credit currency gives an intuitive measure of the node's participation and of the probability that the replica on the node is the latest replica.
4. The invention obtains the number of replicas to be accessed and the nodes that participate in the response from the relationship between credit activity and the probability that the replica on a node is the latest replica. Based on this relationship, nodes with high service capability and high activity are selected to participate in the response service, which reduces the communication overhead of the system.
Drawings
FIG. 1 is a general flow chart of a multi-copy data access response method according to the present invention;
FIG. 2 is a flow chart for characterizing the service capability of a node using credit currency;
FIG. 3 is a flow chart for measuring node activity using the currency transaction amount;
FIG. 4 is a flow chart for determining the replicas that need to participate in a response.
Detailed Description
The technical solution of the present invention is further described in detail below by means of specific embodiments and with reference to the accompanying drawings.
Example 1
Referring to FIG. 1, the strong-consistency multi-copy data access response method in the distributed storage environment of the invention selects several replica nodes with strong capability and high activity to respond, according to the service capability and activity of the nodes in the dynamic distributed storage environment and the user's expectation for the accessed data. This replaces the traditional response mode in which all replicas participate in the service without distinction, and it reduces both the access waiting time for multi-replica data and the communication overhead of the system. The method includes the following steps:
first, according to the node characteristics that influence data access, the influence of each characteristic on data access capability is evaluated, and the combined influence is marked by issuing credit currency, so that the credit currency value represents the service capability of the node;
then, according to the services the node provides and obtains per unit time, the activity of the node is measured and marked by its circulating credit currency value (currency transaction amount);
finally, the replicas that need to participate in the response are determined from the node activity, the node credit currency value, and the user's expectation of data freshness.
Example 2
The strong-consistency multi-copy data access response method in the distributed storage environment of this embodiment differs from embodiment 1 in the following. Referring to FIG. 2, the factors through which nodes in the dynamic distributed environment influence data access are evaluated objectively with a model similar to the multivariate discriminant z-score model; weights distinguish how strongly the different factors influence data access; the "Zeta" value of the node is then obtained with the multivariate discriminant z-score method and marked in the form of issued currency, so that the credit currency value represents the service capability of the node.
Example 3
The strong-consistency multi-copy data access response method in the distributed storage environment of this embodiment differs from embodiments 1 and 2 in the following. Referring to the flow chart in FIG. 3, node activity is measured by the currency transaction amount: if a node successfully provides a data access service to other nodes, it collects credit currency from them; if a node successfully requests a data access service from other nodes, it pays credit currency to them. The sum of the two is called the circulating credit funds (liquidity) of the node and is used to indicate node activity.
Example 4
The strong-consistency multi-copy data access response method in the distributed storage environment of this embodiment differs from embodiment 3 in the steps by which the replicas participating in the response are determined from the node activity, the node credit currency value, and the user's expectation of data freshness.
Referring to FIG. 4, the process of determining the replicas that need to participate in the response is as follows:
first, all replica nodes of the data to be accessed are acquired, and the relationship between credit activity and the probability that a replica is the latest replica is obtained from the credit activity of each node;
then, the number of replicas that need to participate in the data response is obtained from the user's expectation of the freshness of the accessed data and a probability formula.
Example 5
The strong-consistency multi-copy data access response method in the distributed storage environment of this embodiment differs from embodiment 4 in the following. The probability that the replica on a node obtains updates in time is closely related, and directly proportional, to the node's data access capability and activity. To reflect this proportionality, the product of a node's credit currency and its activity is called its credit activity. When the responding replica nodes are selected, the set of replica nodes is sorted by credit activity, and the replicas on the nodes with the highest credit activity are then selected from the sorted set, up to the number of replicas required, to participate in the data response.
Example 6
Referring to FIGS. 1-4, the strong-consistency multi-copy data access response method in the distributed storage environment of the invention selects several replica nodes with strong capability and high activity to respond, according to the service capability and activity of the nodes in the dynamic distributed storage environment and the user's expectation for the accessed data. This replaces the traditional response mode in which all replicas participate in the service without distinction, and it reduces both the access waiting time for multi-replica data and the communication overhead of the system.
Referring to FIG. 1, the invention is divided into three main parts: characterizing the service capability of a node with credit currency, measuring the activity of a node by its currency transaction amount, and determining the replicas that need to participate in a response. First, according to the node characteristics that influence data access, the influence of each characteristic on data access capability is evaluated with a model similar to the multivariate discriminant z-score model, and the combined influence is marked by issuing credit currency. Then, according to the services the node provides and obtains per unit time, the activity of the node is marked by its circulating credit currency value. Finally, the replicas that need to participate in the response are determined from the node activity, the node credit currency, and the user's expectation of data freshness.
Some parameters involved in the present invention are:
The nodes in the dynamic distributed storage system are denoted {Pn_1, Pn_2, Pn_3, …, Pn_N}.
1. Time period U_T: a user-defined time constant that represents one periodic time unit, such as 1 minute.
2. Node available computing power Cu_i: the computing resources available to node Pn_i in unit time T_i, expressed as:
Figure BDA0003048596880000051
where Cu_{i,used} denotes the computing resources that Pn_i has already used, and Cu_{i,all} denotes the total available computing resources of Pn_i. The larger Cu_i is, the more capable the node is of handling data accesses.
3. Node available storage capacity Ns_i: the storage resources available to node Pn_i in unit time T_i, expressed as:
Figure BDA0003048596880000061
where Ns_{i,used} denotes the storage resources that Pn_i has already used, and Ns_{i,all} denotes the total available storage resources of Pn_i. The larger Ns_i is, the greater the data storage capacity of the node and the more data can be stored on it.
4. Node available bandwidth Nw_i: the bandwidth resources available to node Pn_i in unit time T_i, expressed as:
Figure BDA0003048596880000062
where Nw_{i,used} denotes the bandwidth resources of Pn_i that are already occupied, and Nw_{i,all} denotes the total available bandwidth resources of Pn_i. The larger Nw_i is, the stronger the node's communication capability and the shorter the latency in processing data accesses.
5. Node load Lq_i: the load on node Pn_i in unit time T_i, expressed as:
Figure BDA0003048596880000063
where V_{i,f} denotes the volume of requests forwarded through Pn_i per unit time; f_{j,c} denotes the number of requests per unit time for data item f_j on Pn_i; M denotes the number of data items on Pn_i;
Figure BDA0003048596880000064
denotes the number of requests for all data on Pn_i per unit time; η (η ≥ 1) is a weight expressing that the node's load is affected more strongly by the accesses received by its local replicas; and V_{i,max} denotes the maximum number of requests that Pn_i can respond to normally per unit time. The larger Lq_i is, the less able the node is to handle data accesses in a timely manner.
6. Active neighbor ratio An_i: the activity coefficient of the nodes connected to node Pn_i in unit time T_i, expressed as:
Figure BDA0003048596880000065
where N_conn denotes the number of neighbor nodes of node Pn_i in unit time T_i (i.e., the number of nodes directly connected to it), and N_av denotes the average number of directly connected nodes per node in the P2P network environment. The larger An_i is, the more actively the node takes part in data access services.
7. Average data request delay rate Da_i: the average delay of node Pn_i when it provides data access services, expressed as:
Figure BDA0003048596880000071
where t_j denotes the access delay of request task j, K denotes the number of request tasks, and Da_av denotes the average delay of all nodes in the P2P network. Clearly, the smaller Da_i is, the more promptly the data on the node can be served.
8. Node continuous service duration St_i: the length of time that node Pn_i has been serving continuously in the P2P network, expressed as:
St_i = (n + 1) × U_T, n ∈ N
where n is a natural number that increases by 1 each time a period U_T elapses. When Pn_i rejoins the P2P system after leaving, St_i = 0. The larger St_i is, the longer the node has been providing data access services continuously.
9. User-requested data freshness expectation Fe_f: the user's expectation, when requesting data f, that the data returned is the latest data. The value is a probability, e.g., Fe_f = 99.9999%.
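The continuous service duration St_i of item 8 can be illustrated with a small counter. The following is only a minimal sketch: the class name `ServiceClock`, its fields, and the `just_rejoined` flag are hypothetical, and treating the reset St_i = 0 as holding until the next period boundary is an interpretation rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class ServiceClock:
    """Tracks St_i = (n + 1) * U_T for one node (illustrative only)."""

    period: float                # U_T in seconds, user-defined
    n: int = 0                   # periods elapsed since the node joined
    just_rejoined: bool = False  # assumption: models the reset St_i = 0

    def tick(self) -> None:
        """Call once each time a period U_T elapses."""
        if self.just_rejoined:
            # First period boundary after rejoining: counting restarts at n = 0
            self.just_rejoined = False
            self.n = 0
        else:
            self.n += 1

    def rejoin(self) -> None:
        """The node left the P2P system and has joined again."""
        self.just_rejoined = True

    def duration(self) -> float:
        """Current continuous service duration St_i in seconds."""
        if self.just_rejoined:
            return 0.0
        return (self.n + 1) * self.period


clock = ServiceClock(period=60.0)  # U_T = 1 minute
clock.tick()
clock.tick()
print(clock.duration())
clock.rejoin()
print(clock.duration())
```

A node that stays online accumulates one U_T per period, while a node that leaves and returns starts again from zero, so long-lived nodes score higher on this attribute.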
The idea of characterizing the service capability of a node with credit currency is as follows:
First, the factors through which nodes in a dynamic distributed environment influence data access are evaluated objectively with a method similar to the multivariate discriminant z-score. Weights distinguish how strongly the different factors influence data access. The "Zeta" value of the node is then obtained with the multivariate discriminant z-score method and marked in the form of issued currency.
Referring to FIG. 2, the steps for characterizing the service capability of a node with credit currency are as follows:
1. Each node adaptively collects the attributes that influence data access.
The attributes examined by the invention are: node available computing power Cu_i, node available storage capacity Ns_i, node available bandwidth Nw_i, node load Lq_i, active neighbor ratio An_i, average data request delay rate Da_i, and node continuous service duration St_i.
2. The attributes are evaluated objectively according to their influence on the data access service.
The invention scores the attributes with a method similar to the multivariate discriminant z-score. The scoring rules are as follows:
Figure BDA0003048596880000081
Note: each score value is only a scoring example set by the invention. In practical applications, other scoring methods (such as expert valuation) can be adopted according to the actual situation.
3. Weights are assigned to the examined attributes.
Following the z-score method, different weights are applied to the above factors to distinguish how strongly each attribute influences data access. The weights are assigned as follows:
Figure BDA0003048596880000091
4. Obtain the "Zeta_i" value of each node according to the multivariate discriminant z-score method. The calculation formula is as follows:
Figure BDA0003048596880000092
5. Express the service capability of the node with credit currency. Let Cm_i denote the credit currency of node Pn_i, which is issued according to the following rule:
Cm_i = β × Zeta_i, (β > 0) (3)
where β is a user-defined constant that can be adjusted according to the actual situation; the invention sets β = 1. According to formula (3), the larger the Cm_i of node Pn_i is, the stronger the node's willingness to serve, and vice versa.
Note: each node evaluates itself in a decentralized, adaptive manner in each period U_T, according to the attribute values that determine Zeta_i, and then issues credit currency to itself according to the evaluation result. Issued currency does not accumulate, i.e., the currency of period U_T is not carried over to period U_{T+1}. At each issuance, the currency of the previous period is cleared before new currency is issued.
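The issuance rule of formula (3) and the no-carry-over note can be sketched as follows. This is a minimal illustration under stated assumptions: the Zeta_i value is taken as an input because the scoring table, the weights, and the Zeta formula are not reproduced in this text, and the names `issue_credit`, `NodeCredit`, and `start_period` are hypothetical.

```python
from dataclasses import dataclass

BETA = 1.0  # the text sets beta = 1; any beta > 0 is allowed


def issue_credit(zeta: float, beta: float = BETA) -> float:
    """Formula (3): Cm_i = beta * Zeta_i, with beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return beta * zeta


@dataclass
class NodeCredit:
    """Credit currency Cm_i held by one node for the current period."""

    cm: float = 0.0

    def start_period(self, zeta: float, beta: float = BETA) -> float:
        # Clear the previous period's currency first, then issue again,
        # so nothing carries over from period U_T to period U_{T+1}.
        self.cm = 0.0
        self.cm = issue_credit(zeta, beta)
        return self.cm


node = NodeCredit()
node.start_period(zeta=7.5)  # hypothetical Zeta_i for one period
node.start_period(zeta=3.0)  # next period: replaces, does not add
print(node.cm)
```

Because the balance is replaced rather than incremented, Cm_i always reflects the node's most recent self-evaluation only.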
The idea of measuring node activity by currency transaction amount is as follows:
If a node successfully provides a data access service to other nodes, it collects credit currency from them; if a node successfully requests a data access service from other nodes, it pays credit currency to them. The invention calls the sum of the two the circulating credit funds (liquidity) of the node and uses it to indicate node activity.
Referring to FIG. 3, the steps for measuring node activity by currency transaction amount are as follows:
6. Earn credit currency.
When node Pn_i provides a data access service to other nodes (note: this includes not only access to replica data held on the node but also data request services forwarded by the node), it collects δ_i units of credit currency from the node requesting the service. Let ECm_i denote the credit currency earned; each time Pn_i earns credit currency, the formula is:
ECm_i = ECm_i + δ_i, δ_i ≥ 1 (4)
where δ_i is a constant, e.g., δ_i = 1; the node collects currency from the other party according to the actual service situation.
7. Pay credit currency.
When node Pn_i requests a data access service from other nodes (note: this includes not only the node's own access to some replica data but also data requests that the node needs to have forwarded), it pays ε_i units of credit currency to the node providing the service. Let CCm_i denote the credit currency paid; each time Pn_i pays credit currency, the formula is:
CCm_i = CCm_i + ε_i, ε_i ≥ 1 (5)
wherein epsilon i Is a constant, e.g. epsilon i =1, paying epsilon to the other party depending on the actual service conditions i And (4) the currency.
8. Calculate the node liquidity (flowing fund amount).
The invention calls the sum of the currency earned and the currency paid the node's flowing credit currency. With $Wf_i$ denoting the liquidity of node $Pn_i$, equations (4) and (5) give the liquidity equation:
$$Wf_i = |ECm_i| + |CCm_i| \tag{6}$$
According to equation (6), the larger $Wf_i$ of node $Pn_i$ is, the more services the node has participated in, and the higher the probability that the copy on the node is the latest copy.
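The bookkeeping in steps 6 to 8 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the class `CreditLedger` and the helper `record_service` are hypothetical names, and the sketch assumes that the amount the provider earns in one transaction equals the amount the requester pays.

```python
from dataclasses import dataclass


@dataclass
class CreditLedger:
    """Per-node counters for equations (4)-(6); all names are illustrative."""
    earned: float = 0.0  # ECm_i: credit currency earned by serving other nodes
    paid: float = 0.0    # CCm_i: credit currency paid for services received

    def earn(self, delta: float = 1.0) -> None:
        # Equation (4): ECm_i = ECm_i + delta_i, with delta_i >= 1
        if delta < 1:
            raise ValueError("delta must be >= 1")
        self.earned += delta

    def pay(self, epsilon: float = 1.0) -> None:
        # Equation (5): CCm_i = CCm_i + epsilon_i, with epsilon_i >= 1
        if epsilon < 1:
            raise ValueError("epsilon must be >= 1")
        self.paid += epsilon

    def liquidity(self) -> float:
        # Equation (6): Wf_i = |ECm_i| + |CCm_i|
        return abs(self.earned) + abs(self.paid)


def record_service(provider: CreditLedger, requester: CreditLedger,
                   amount: float = 1.0) -> None:
    """Record one successful data access.

    ASSUMPTION: the provider's delta equals the requester's epsilon.
    """
    provider.earn(amount)
    requester.pay(amount)


a, b = CreditLedger(), CreditLedger()
record_service(a, b)  # a serves b
record_service(a, b)  # a serves b again
record_service(b, a)  # b serves a
print(a.liquidity(), b.liquidity())
```

Both serving and requesting raise a node's liquidity, which matches the text's use of the sum of the two amounts as the activity indicator.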
9. Calculate the node credit activity.
With $NCA_i$ denoting the credit activity of node $Pn_i$, it is calculated by formula (7):
[Formula (7): equation image BDA0003048596880000101, not reproduced in this text]
Note: because distributed storage is dynamic, in this step each node adaptively computes its own liquidity and its own credit activity in each period.
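The formula for $NCA_i$ is an equation image that is not reproduced in this text. The description does state that the product of a node's credit currency and the node's activity is called the credit activity, so the sketch below assumes $NCA_i = Cm_i \times Wf_i$, evaluated once per statistics period. The function names, the per-period transaction lists, and the assumption that liquidity is counted afresh in each period are illustrative and not taken from the patent.

```python
def credit_activity(cm: float, wf: float) -> float:
    """ASSUMED form of formula (7): NCA_i = Cm_i * Wf_i."""
    return cm * wf


def activity_per_period(cm: float,
                        transactions_by_period: list[list[float]]) -> list[float]:
    """Compute a node's credit activity for each statistics period.

    transactions_by_period[p] holds the credit amounts the node earned or
    paid during period p (hypothetical input layout).
    """
    result = []
    for amounts in transactions_by_period:
        wf = sum(abs(a) for a in amounts)  # liquidity of this period, as in (6)
        result.append(credit_activity(cm, wf))
    return result


# A node with credit currency 2.0, three transactions in the first period
# and one in the second.
print(activity_per_period(2.0, [[1, 1, 1], [1]]))
```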
The idea behind determining which copies participate in the response is as follows:
First, the invention obtains all replica nodes of the data to be accessed and, from the credit activity of each node, derives the relationship between credit activity and the probability that a replica is the latest replica; it then obtains the number of replicas that need to participate in the data response from the user's expected freshness of the accessed data and a probability formula. The probability that the copy on a node receives updates in time is closely related, and directly proportional, to the node's data access capability and activity. To reflect this proportionality, the product of the node's credit currency and the node's activity is called the credit activity. When the responding replica nodes are selected, the replica node set is sorted by node credit activity; replicas on the nodes with the highest credit activity are then selected from the sorted set, up to the number of replicas required, to participate in the data response.
Referring to Fig. 4, the steps for determining the copies that participate in the response are:
10. Determine the set of all replicas of the requested data f. Assuming the number of copies of f is n, and using $Re_f$ to denote the replica set, then:
$$Re_f = \{Re_{f,i} \mid Re_{f,1}, Re_{f,2}, Re_{f,3}, \ldots, Re_{f,n}\}$$
11. Obtain the set of nodes on which $Re_f$ is stored. Assuming the copies are stored on different nodes, and using $Pn_f$ to denote the node set, then:
$$Pn_f = \{Pn_{f,i} \mid Pn_{f,1}, Pn_{f,2}, Pn_{f,3}, \ldots, Pn_{f,n}\}$$
12. Obtain the credit activity of the nodes in the set $Pn_f$. Using $NCA_f$ to denote the set of node credit activities of the node set, then:
$$NCA_f = \{NCA_{f,i} \mid NCA_{f,1}, NCA_{f,2}, NCA_{f,3}, \ldots, NCA_{f,n}\}$$
13. Calculate the probability that each copy of the data f is the latest copy.
According to equations (3) and (6), the larger $NCA_i$ of node $Pn_i$ is, the stronger the node's willingness to serve and the more frequently it participates in services. A strong willingness to serve means the node can provide more data access service to other nodes; high activity means the copy on the node is more likely to be the latest copy. That is, the credit activity of a node is directly proportional to the probability that the copy on the node is up to date. With $P_i$ denoting the freshness probability of the replica data, the relationship between the probability that a replica is the latest replica and the credit activity can be expressed as:
$$f(P_i) = \alpha \times NCA_{f,i}, \quad \alpha > 0 \tag{8}$$
where $\alpha$ is a constant, indicating that the freshness probability and the node credit activity differ by a constant factor. In the invention, $\alpha = 1$; that is, the probability that a replica is the latest replica can be replaced by the node's share of the credit activity within the replica node set. The probability that each copy of the data f is the latest copy can then be expressed by equation (9):
[Formula (9): equation image BDA0003048596880000121, not reproduced in this text]
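Formula (9) is an equation image that is not reproduced in this text. Based on the statement that the probability is replaced by the node's share of credit activity within the replica node set, one plausible reading is $P_i = NCA_{f,i} / \sum_{j=1}^{n} NCA_{f,j}$. The Python sketch below implements that assumed reading; the function name is hypothetical.

```python
def freshness_probabilities(nca: list[float]) -> list[float]:
    """ASSUMED reading of formula (9): P_i = NCA_{f,i} / sum_j NCA_{f,j}.

    nca holds the credit activity of every node storing a replica of f.
    """
    total = sum(nca)
    if total <= 0:
        raise ValueError("total credit activity must be positive")
    return [value / total for value in nca]


# Three replica nodes with credit activities 2, 1 and 1.
print(freshness_probabilities([2.0, 1.0, 1.0]))
```

Under this reading the probabilities over all replica nodes of f sum to 1, so a node's value reflects its activity relative to the other replica holders rather than an absolute measure.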
14. Calculate how many copies are needed to meet the user's expectation of data freshness.
Treating the event that the copy on a node is the latest copy as independent across nodes, the freshness probability that can be obtained with k' copies is expressed by equation (10):
[Formula (10): equation image BDA0003048596880000122, not reproduced in this text]
That is, when access to data f is requested, having k' copies respond yields a data freshness of $Fe'_f$. Therefore, if the user's expected freshness of the accessed data is $Fe_f$, then according to equation (10) the number of copies that need to respond can be obtained from equation (11):
[Formula (11): equation image BDA0003048596880000123, not reproduced in this text]
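Formulas (10) and (11) are equation images that are not reproduced in this text, so the following Python sketch rests on two explicit assumptions. First, under the stated independence, the freshness achieved by k' copies is taken to be the probability that at least one of them is the latest, $1 - \prod_{i=1}^{k'} (1 - P_i)$. Second, the required number of copies is taken to be the smallest k' whose achieved freshness reaches the user's expectation. The function names are hypothetical.

```python
def achieved_freshness(probs: list[float]) -> float:
    """ASSUMED form of (10): probability that at least one copy is the latest."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p  # probability that none of the copies so far is latest
    return 1.0 - miss


def copies_needed(probs: list[float], target: float) -> int:
    """ASSUMED form of (11): smallest k' whose freshness reaches the target.

    Copies are considered in descending order of probability. If the target
    cannot be reached, all copies are used.
    """
    ranked = sorted(probs, reverse=True)
    miss = 1.0
    for k, p in enumerate(ranked, start=1):
        miss *= 1.0 - p
        if 1.0 - miss >= target:
            return k
    return len(ranked)


probs = [0.5, 0.25, 0.25]
print(copies_needed(probs, 0.6))
```

Considering the most probable copies first keeps the count small, which is consistent with the later step that lets the nodes with the highest credit activity respond.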
15. Determine the copies that need to respond. Because a copy held on a node with high credit activity is more likely to be the latest copy, when data access is requested the invention first sorts the node set $Pn_f$ in descending order of $NCA_f$ to obtain
[Sorted node set: equation image BDA0003048596880000124, not reproduced in this text]
and the copies on the first k nodes of the sorted node set
[Equation image BDA0003048596880000125, not reproduced in this text]
respond.
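The selection in step 15 amounts to a descending sort followed by taking the first k entries. A minimal Python sketch, assuming the credit activities are available as a mapping from a node identifier to its value (the identifiers and the function name are hypothetical):

```python
def select_responders(nca_by_node: dict[str, float], k: int) -> list[str]:
    """Return the k nodes with the highest credit activity, highest first."""
    ranked = sorted(nca_by_node,
                    key=lambda node: nca_by_node[node],
                    reverse=True)
    return ranked[:k]


nca_f = {"Pn_f1": 1.0, "Pn_f2": 3.0, "Pn_f3": 2.0}
print(select_responders(nca_f, 2))
```

Python's sort is stable, so nodes with equal credit activity keep their original relative order; the patent text does not specify a tie-breaking rule.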

Claims (7)

1. A strong-consistency multi-copy data access response method in a distributed storage environment, characterized in that: according to the service capability and activity of the nodes in a dynamic distributed storage environment and the user's expectation for the accessed data, several replica nodes with strong capability and high activity are selected to respond, replacing the traditional response mode in which all replicas participate in the service without distinction, so as to reduce the waiting time of multi-copy data access and the communication overhead of the system; the method comprises the following steps:
firstly, according to the characteristics of the nodes influencing data access, the service capability of the nodes is represented by credit currency values, the influence of the characteristics on the data access capability is evaluated, and the comprehensive influence is marked by adopting a credit currency issuing mode;
then, according to the conditions that the node provides service and obtains service in unit time, measuring and marking the activity of the node by adopting a mobile credit monetary value;
finally, according to the node activity, the credit monetary value of the node and the expectation of the user to the data freshness, determining the copy needing to participate in the response, and the steps are as follows:
firstly, acquiring all replica nodes of data to be accessed, and acquiring the relationship between the credit activity and the probability that a replica is the latest replica according to the credit activity of each node;
then, the number of copies needing to participate in data response is obtained according to the expectation of the user on the freshness of the access data and a probability formula.
2. A strongly consistent multi-copy data access response method in a distributed storage environment as claimed in claim 1, wherein: according to some factors influencing data access by nodes in a dynamic distributed environment, objective evaluation is carried out on the factors by utilizing a multivariate discriminant z-score model; the influence degrees of different factors on data access are distinguished by weight; the "Zeta" value of the node is then obtained using the multivariate discriminant z-score method and is labeled in the form of the dispensed currency.
3. A strongly consistent multi-copy data access response method in a distributed storage environment according to claim 1 or 2, wherein: node activity is measured by currency transaction volume: if a node successfully provides a data access service to other nodes, it collects credit currency from those nodes; if a node successfully requests a data access service from other nodes, it pays credit currency to those nodes; the sum of the two is called the liquidity credit monetary value and is used to indicate node activity.
4. A strongly-consistent multi-copy data access response method in a distributed storage environment as claimed in claim 3, wherein: because the copy on the node can acquire the updated probability in time and the access capability and the activity of the node to the data access are closely related and are in a direct proportion relationship, in order to embody the direct proportion relationship, the product of the credit currency of the node and the activity of the node is called the credit activity; when the responded replica nodes are selected, the replica node sets are sequenced according to the credit activity of the nodes; and then selecting the copy on the node with high credit activity from the sorted set according to the number of the copies needing to be accessed to participate in data response.
5. A strongly consistent multi-copy data access response method in a distributed storage environment according to claim 1, 2 or 4, wherein: the service capability of the node is characterized by credit currency as follows:
1) The node adaptively acquires the attributes that influence data access;
2) The attributes are evaluated objectively according to their influence on the data access service, and scored by the multivariate discriminant z-score method;
3) Weights are assigned to the attributes under consideration:
according to the factors by which nodes in the dynamic distributed environment influence data access and the z-score method, different weights are assigned to the influencing factors to distinguish how strongly each attribute affects data access; the weight distribution is given by the following formula:
[Formula (1): equation image FDA0004003147950000021, not reproduced in this text]
4) Obtain the "$Zeta_i$" value of each node by the multivariate discriminant z-score method; the calculation formula is as follows:
[Formula (2): equation image FDA0004003147950000022, not reproduced in this text]
5) Represent the service capability of the node by credit currency:
with $Cm_i$ denoting the credit currency of node $Pn_i$, then:
$$Cm_i = \beta \times Zeta_i \quad (\beta > 0) \tag{3}$$
where $\beta$ is a user-defined constant that can be adjusted to the actual situation; in the invention $\beta = 1$; according to equation (3), the larger $Cm_i$ of node $Pn_i$ is, the stronger the node's willingness to serve, and vice versa;
in equation (1), the attributes are: node available computing power $Cu_i$, node available storage capacity $Ns_i$, node available bandwidth $Nw_i$, node load $Lq_i$, active neighbor ratio $An_i$, average data request latency $Da_i$, and node continuous service duration $St_i$.
6. A strongly-consistent multi-copy data access response method in a distributed storage environment as claimed in claim 5, wherein: the steps of measuring the node activity by using the currency transaction amount are as follows:
1) Earning credit money:
when node $Pn_i$ provides a data access service to other nodes, it collects $\delta_i$ units of credit currency from the node requesting the service; with $ECm_i$ denoting the credit currency earned, each time $Pn_i$ earns credit currency the value is updated as follows:
$$ECm_i = ECm_i + \delta_i, \quad \delta_i \ge 1 \tag{4}$$
where $\delta_i$ is a constant, set to $\delta_i = 1$; the amount collected from the other party depends on the actual service provided;
2) Paying credit currency:
when node $Pn_i$ requests data access from other nodes, it pays $\varepsilon_i$ units of credit currency to the node providing the service; with $CCm_i$ denoting the credit currency paid, each time $Pn_i$ pays credit currency the value is updated as follows:
$$CCm_i = CCm_i + \varepsilon_i, \quad \varepsilon_i \ge 1 \tag{5}$$
where $\varepsilon_i$ is a constant, set to $\varepsilon_i = 1$; the node pays $\varepsilon_i$ units of currency to the other party depending on the actual service received;
3) Calculating the node flow fund amount:
the sum of the currency earned and the currency paid is called the node's flowing credit currency; with $Wf_i$ denoting the liquidity of node $Pn_i$, equations (4) and (5) give the liquidity equation:
$$Wf_i = |ECm_i| + |CCm_i| \tag{6}$$
according to equation (6), the larger $Wf_i$ of node $Pn_i$ is, the more services the node has participated in, and the higher the probability that the copy on the node is the latest copy;
4) Each node adaptively computes its own liquidity and its own credit activity in each period; with $NCA_i$ denoting the credit activity of node $Pn_i$, it is calculated by the following formula:
[Formula (7): equation image FDA0004003147950000031, not reproduced in this text]
7. A strongly-consistent multi-copy data access response method in a distributed storage environment as claimed in claim 6, wherein: the steps of determining the copies that participate in the response are as follows:
1) Determine the set of all replicas of the requested data f:
let the number of copies of f be n, and use $Re_f$ to denote the replica set, then:
$$Re_f = \{Re_{f,i} \mid Re_{f,1}, Re_{f,2}, Re_{f,3}, \ldots, Re_{f,n}\};$$
2) Obtain the set of nodes on which $Re_f$ is stored:
assuming that these copies are stored on different nodes, and using $Pn_f$ to denote the node set, then:
$$Pn_f = \{Pn_{f,i} \mid Pn_{f,1}, Pn_{f,2}, Pn_{f,3}, \ldots, Pn_{f,n}\};$$
3) Obtain the credit activity of the nodes in the set $Pn_f$:
using $NCA_f$ to denote the set of node credit activities of the node set, then:
$$NCA_f = \{NCA_{f,i} \mid NCA_{f,1}, NCA_{f,2}, NCA_{f,3}, \ldots, NCA_{f,n}\};$$
4) Calculate the probability that each copy of the data f is the latest copy:
according to equations (3) and (6), the larger $NCA_i$ of node $Pn_i$ is, the stronger the node's willingness to serve and the more frequently it participates in services; with $P_i$ denoting the freshness probability of the replica data, the relationship between the probability that a replica is the latest replica and the credit activity is expressed as:
$$f(P_i) = \alpha \times NCA_{f,i}, \quad \alpha > 0 \tag{8}$$
where $\alpha$ is a constant, indicating that the freshness probability and the node credit activity differ by a constant factor; let $\alpha = 1$; that is, the probability that a replica is the latest replica is replaced by the node's share of the credit activity within the replica node set; the probability that each copy of the data f is the latest copy can then be expressed by equation (9):
[Formula (9): equation image FDA0004003147950000041, not reproduced in this text]
5) Calculate how many copies are needed to meet the user's expectation of data freshness:
treating the event that the copy on a node is the latest copy as independent across nodes, the freshness probability that can be obtained with k' copies is expressed by equation (10):
[Formula (10): equation image FDA0004003147950000042, not reproduced in this text]
that is, when access to data f is requested, having k' copies respond yields a data freshness of $Fe'_f$; if the user's expected freshness of the accessed data is $Fe_f$, then according to equation (10) the number of copies that need to respond can be obtained from equation (11):
[Formula (11): equation image FDA0004003147950000043, not reproduced in this text]
6) Determine the copies that need to respond:
because a copy held on a node with high credit activity is more likely to be the latest copy, when data access is requested the node set $Pn_f$ is first sorted in descending order of $NCA_f$ to obtain
[Sorted node set: equation image FDA0004003147950000044, not reproduced in this text]
and the copies on the first k nodes of the sorted node set
[Equation image FDA0004003147950000045, not reproduced in this text]
respond.
CN202110488820.XA 2021-04-30 2021-04-30 Strong-consistency multi-copy data access response method in distributed storage environment Active CN113127267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110488820.XA CN113127267B (en) 2021-04-30 2021-04-30 Strong-consistency multi-copy data access response method in distributed storage environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110488820.XA CN113127267B (en) 2021-04-30 2021-04-30 Strong-consistency multi-copy data access response method in distributed storage environment

Publications (2)

Publication Number Publication Date
CN113127267A CN113127267A (en) 2021-07-16
CN113127267B true CN113127267B (en) 2023-02-17

Family

ID=76781516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110488820.XA Active CN113127267B (en) 2021-04-30 2021-04-30 Strong-consistency multi-copy data access response method in distributed storage environment

Country Status (1)

Country Link
CN (1) CN113127267B (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant