WO2021238305A1 - A general distributed graph processing method and system based on reinforcement learning - Google Patents


Info

Publication number
WO2021238305A1
Authority
WO
WIPO (PCT)
Prior art keywords
vertex
data processing
processing center
probability
graph
Prior art date
Application number
PCT/CN2021/076484
Other languages
English (en)
French (fr)
Inventor
周池
罗鹃云
毛睿
Original Assignee
深圳大学
Priority date
Filing date
Publication date
Application filed by 深圳大学 (Shenzhen University)
Publication of WO2021238305A1 (published in Chinese)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/901 - Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 - Graphs; Linked lists
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of large-scale graph segmentation processing, and in particular to a general distributed graph processing method and system based on reinforcement learning.
  • The traditional mainstream large-scale graph processing systems, such as Pregel and PowerGraph, all use heuristic partitioning algorithms.
  • Pregel's default partitioning method takes the hash value of the vertex id modulo the number of partitions, with the optimization goal of enhancing partition locality and reducing network traffic between computing nodes.
  • PowerGraph uses a greedy vertex-cut method: for a newly added edge, if one of its endpoints already exists on a certain machine, the edge is allocated to that machine, thereby minimizing the number of cross-machine edges and reducing the amount of communication. Such heuristic graph partitioning algorithms easily fall into local optima, leaving some better regions of the solution space unsearched.
  • Pham et al. proposed a graph partitioning method that allocates operations (nodes) of a TensorFlow computation graph to available devices to minimize computation time, using a reinforcement learning model with a seq2seq strategy to allocate the operations. This method is only suitable for cases where the number of graph nodes is small, so that the strategy space does not become too large.
  • Nazi et al. proposed GAP, an algorithm that uses deep learning to solve the graph partitioning problem. GAP is an unsupervised learning method that treats balanced graph partitioning as a vertex classification problem. However, if the optimization goal involves heterogeneity of network prices and bandwidths, computing the node embeddings becomes very complicated.
  • The application scenarios of these existing machine learning models for graph partitioning are relatively narrow. When the graph scale grows and the optimization goal becomes more complex, these methods can no longer solve the graph partitioning problem well.
  • The technical problem to be solved by this application is to overcome the defects of prior-art graph-cut models, namely that they easily fall into local optima, serve only a single use scenario, and produce poor partitioning results, by providing a general distributed graph processing method and system based on reinforcement learning.
  • An embodiment of the present application provides a general distributed graph processing method based on reinforcement learning, which includes the following steps: a distributed data processing center is defined based on graph theory to form a distributed graph; a preset graph-cut model and a preset graph processing model are used, and the distributed graph is cut based on preset constraints;
  • the learning automaton selects the data processing center with the highest probability for its vertex according to the preset action selection method;
  • the learning automaton compares the selected highest-probability data processing center with the data processing center where the vertex is currently located; if they are inconsistent, it migrates the vertex to the data processing center corresponding to the action, otherwise it performs no operation;
  • each learning automaton calculates the score of its vertex at each data processing center, and the score is determined according to the preset constraint conditions;
  • each learning automaton propagates the data processing center number corresponding to the maximum score to the learning automata of its vertex's neighbors, and generates a corresponding weight vector;
  • the learning automaton calculates, for its vertex, the reinforcement signals corresponding to all data processing centers based on the weight vector;
  • the learning automaton updates the probability value of its vertex at each data processing center according to the weight vector and the reinforcement signal, guiding the next action selection for iteration;
  • the preset graph cutting model is a hybrid-cut graph cutting model
  • the preset graph processing model is a GAS graph processing model
  • the GAS graph processing model is used to iteratively perform vertex calculation
  • the constraint condition is that the capital budget cost and the data transmission time are minimized.
  • the data transmission time is expressed as the sum of the data transmission time of the collection phase and the application phase, and the calculation formula of the data transmission time T(i) of the i-th iteration is:
  • a v (i) represents the amount of data sent from the master vertex v to each replica in the application phase of the i-th iteration
  • U r /D r represents the upload/download bandwidth of DCr
  • R v represents a collection of data processing centers DC containing copies of v
  • the communication cost between the data processing centers DC is the sum of the cost of uploading data in the collection phase and the application phase.
  • the unit cost of uploading data from DC r to the network is P r , and the capital budget cost is expressed as:
  • B is the capital budget for using network resources.
  • initializing the probability of each vertex in each data processing center, and the step of selecting the data processing center with the highest probability for the vertex according to a preset action selection method by the learning automaton includes:
  • the probability P(v i ) of vertex v at data processing center DC i is initialized to 1/M, where M is the number of distributed DCs;
  • the cumulative probability of the vertex for each data processing center DC is obtained from the vertex's probability distribution; Q(v i ) represents the cumulative probability of vertex v at data processing center DC i, where Q(v i ) is the sum of P(v k ) for k ≤ i;
  • initializing the probability of each vertex in each data processing center, and the step of selecting the data processing center with the highest probability for the vertex according to a preset action selection method by the learning automaton includes:
  • given a trial-and-error parameter ε, a floating-point number r ∈ [0,1] is randomly generated; if r ≤ ε, the learning automaton randomly selects a DC for its vertex; if r > ε, the learning automaton selects for its vertex the data processing center DC with the largest P(v i ) value.
  • each learning automaton calculates the score when its vertex is in each data processing center, which is calculated by the following formula:
  • B represents the capital budget for using network resources
  • T b represents the overall data transmission time of the system before the score is calculated
  • C b represents the overall data transmission cost of the system before the score is calculated.
  • they indicate, respectively, the overall data transmission time of the system when the vertex is at DC i and the overall data transmission cost of the system when the vertex is calculated at DC i.
  • tw and cw represent the time weight and the capital cost weight, respectively. When C b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases, and tw increases uniformly from 0 to 1; when C b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases, and cw increases uniformly from 0 to 1.
  • Each learning automaton propagates the data processing center number corresponding to the maximum score to the learning automaton of its vertex neighbors, and generates a corresponding weight vector.
  • the step in which the learning automaton calculates, for its vertex, the reinforcement signals corresponding to all data processing centers based on the weight vector includes:
  • the reference standard for calculating the weight vector is calculated by the following formula:
  • the learning automaton calculates the corresponding reinforcement signal according to the weight vector, the calculation formula is as follows:
  • the regularization weight is divided into two parts, the reward regularization weight and the penalty regularization weight; the regularization weight of vertex v for DC i is calculated by the following formula:
  • Neg() is the negation function; the other symbols represent, respectively, the reinforcement signal of vertex v for data processing center DC i, the weight vector of vertex v for DC i, and the weight vector of vertex v for DC k;
  • the probability of vertex v is updated according to the regularization weights, with the update proceeding over the data processing centers DC in increasing order of the reward regularization weight.
  • in the update formula, given vertex v and the DC i whose reward regularization weight is the smallest of all reward regularization weights, that weight is used first to perform the probability update for all DCs; the update formula is as follows:
  • the update order of the DCs is in increasing order of the penalty regularization weight; assuming that, given vertex v, DC i, and DC k, one penalty regularization weight is the largest of all and another is the smallest of all, the smallest is used first to perform the probability update for all DCs; the update formula is as follows:
  • represents the penalty weight
  • the embodiments of the present application provide a general distributed graph processing system based on reinforcement learning, including:
  • The distributed graph definition and constraint condition setting module is used to define a distributed data processing center based on graph theory to form a distributed graph, use the preset graph-cut model and preset graph processing model, and cut the distributed graph based on preset constraint conditions;
  • the action selection module is used to assign a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex at each data processing center; based on the initialized probability, the learning automaton selects the data processing center with the highest probability for its vertex according to the preset action selection method;
  • the vertex migration module: the learning automaton compares the data processing center selected with the highest probability for the vertex with the data processing center where the vertex is currently located; if they are inconsistent, the vertex is migrated to the data processing center corresponding to the action, otherwise no operation is performed;
  • the score calculation module: each learning automaton calculates the score of its vertex at each data processing center, and the score is determined according to the preset constraint condition;
  • the reinforcement signal calculation module: each learning automaton propagates the data processing center number corresponding to the maximum score to the learning automata of its vertex's neighbors and generates a corresponding weight vector; the learning automaton then calculates, for its vertex, the reinforcement signals corresponding to all data processing centers based on the weight vector;
  • the probability update module: the learning automaton updates the probability value of its vertex at each data processing center according to the weight vector and the reinforcement signal, guiding the next action selection for iteration;
  • the segmentation result acquisition module is used to generate a segmentation result of the distributed graph that meets the preset constraint conditions once the preset number of iterations is reached or the constraint conditions converge.
  • an embodiment of the present application provides a computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to execute the reinforcement-learning-based general distributed graph processing method of the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer device, including a memory and a processor communicatively connected to each other; the memory stores computer instructions, and the processor executes the computer instructions to perform the reinforcement-learning-based general distributed graph processing method of the first aspect of the embodiments of the present application.
  • The general distributed graph processing method and system based on reinforcement learning provided in this application define distributed data processing centers based on graph theory to form a distributed graph, use a preset graph-cut model and a preset graph processing model, and cut the distributed graph by reinforcement learning under preset constraint conditions, assigning a learning automaton to each vertex and finding the most suitable data processing center for each vertex through training.
  • the placement of each vertex across all data processing centers obeys a certain probability distribution.
  • each iteration of the system includes five steps: action selection, vertex migration, score calculation, reinforcement signal calculation, and probability update; the iteration ends when the maximum number of iterations is reached or the constraint conditions converge.
  • the distributed graph processing model formed by the general distributed graph processing method provided in this application is a distributed graph model with good adaptability. For different optimization goals, only different score calculation schemes and different weight vectors need to be designed.
  • FIG. 1 is a flowchart of a specific example of a general distributed graph processing method based on reinforcement learning in an embodiment of the application;
  • FIG. 3 is a functional block diagram of a specific example of a general distributed graph processing system based on reinforcement learning in an embodiment of the application;
  • Fig. 4 is a composition diagram of a specific example of a computer device provided by an embodiment of the application.
  • The embodiment of the application provides a general distributed graph processing method based on reinforcement learning that can be applied to different optimization goals, for example performance and cost optimization, load balancing, and performance optimization of a geographically distributed graph processing system. As shown in Figure 1, it includes the following steps:
  • Step S10 Define a distributed data processing center based on graph theory to form a distributed graph, use a preset graph cutting model and a preset graph processing model, and cut the distributed graph based on preset constraint conditions.
  • The embodiment of the application takes the geo-distributed graph partitioning process as an example. It is assumed that vertex data is not backed up across data processing centers (hereinafter DCs), and a machine can only perform a graph processing task for one vertex at a time; the computing resources of each DC are not limited, and data communication between DCs is the performance bottleneck of geo-distributed graph processing; it is assumed that connections between DCs are free of network congestion, so the network bottleneck comes only from the uplink and downlink bandwidth between each DC and the WAN; charges apply only to uploading data from a DC to the WAN.
  • V is the set of vertices
  • E is the set of edges
  • DC is the set of geographically distributed data processing centers
  • The embodiment of this application uses a hybrid-cut graph cutting model, which follows these rules: given a threshold θ, a vertex v is called a high-degree vertex if its in-degree is greater than or equal to θ, and a low-degree vertex otherwise. If vertex v is low-degree, all of its incoming edges are assigned to the DC where v is located. If vertex v is high-degree, each incoming edge is assigned to the DC where the opposite vertex of that edge is located.
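The hybrid-cut rule above can be sketched in a few lines. This is an illustrative sketch only: the names `hybrid_cut`, `placement`, and `theta` are assumptions, not identifiers from the patent.

```python
# Hypothetical sketch of the hybrid-cut rule described above.
# `placement[v]` maps each vertex to its assigned DC; `theta` is the
# in-degree threshold separating low-degree from high-degree vertices.

def hybrid_cut(edges, placement, theta):
    """Assign each directed edge (u, v) to a DC under hybrid-cut."""
    in_degree = {}
    for u, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1

    edge_to_dc = {}
    for u, v in edges:
        if in_degree.get(v, 0) < theta:
            # Low-degree target: keep all in-edges with the target vertex.
            edge_to_dc[(u, v)] = placement[v]
        else:
            # High-degree target: place the in-edge with the source vertex.
            edge_to_dc[(u, v)] = placement[u]
    return edge_to_dc
```

For example, with θ = 2, an edge into a vertex with in-degree 2 follows its source vertex's DC, while an edge into a vertex with in-degree 1 stays with the target vertex's DC.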
  • the embodiment of this application uses a GAS graph processing model, which iteratively performs user-defined vertex calculations.
  • GAS: Gather-Apply-Scatter
  • Gather phase: each active vertex collects its neighbors' data, and a sum function (Sum) aggregates the received data into a gathered sum.
  • Apply phase: each active vertex uses the gathered sum to update its own data.
  • Scatter phase: each active vertex activates the neighbors to be executed in the next iteration.
  • A global barrier ensures that all vertices complete their calculations before the next step starts.
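The three phases and the barrier can be sketched as one synchronous superstep. This is a minimal illustration of the GAS model in general, not the patent's implementation; `gather`, `apply_fn`, and `scatter` stand in for the user-defined vertex functions.

```python
# A minimal, hypothetical sketch of one GAS (Gather-Apply-Scatter)
# superstep; `gather`, `apply_fn`, and `scatter` stand in for the
# user-defined vertex functions the model iterates.

def gas_superstep(graph, data, active, gather, apply_fn, scatter):
    """Run one synchronous GAS iteration and return the next active set."""
    # Gather: each active vertex aggregates its neighbors' data with Sum.
    gathered = {v: sum(gather(data[u]) for u in graph[v]) for v in active}
    # Global barrier: all gathers finish before any apply (implicit here,
    # because the phases run sequentially over the whole active set).
    # Apply: each active vertex updates its data with the gathered sum.
    new_data = dict(data)
    for v in active:
        new_data[v] = apply_fn(data[v], gathered[v])
    # Scatter: activate neighbors for the next iteration.
    next_active = {u for v in active for u in scatter(v, graph[v])}
    return new_data, next_active
```

A PageRank-style algorithm, for instance, would gather weighted ranks, apply the damping update, and scatter to out-neighbors whose rank changed.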
  • the transmission time in the i-th iteration can be expressed as the sum of the data transmission time in the gather phase and the apply phase.
  • the calculation formula for the transmission time of the i-th iteration is:
  • a v (i) represents the amount of data sent from the master vertex v to each replica in the application phase of the i-th iteration
  • U r /D r represents the upload/download bandwidth of DCr
  • R v represents a collection of data processing centers DC containing copies of v
  • the communication cost between DCs is the sum of the cost of uploading data in the gather phase and the apply phase.
  • the unit cost of uploading data from DC r to the Internet is defined as P r .
  • the total communication cost can be expressed as:
  • the geo-distributed graph partitioning problem is expressed as a constrained optimization problem, that is, the constraint conditions are:
  • the geo-distributed graph partitioning problem to be solved is the optimization problem under the constraint conditions described in formulas (3) and (4).
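The images of formulas (1) and (2) are not reproduced in this text. The sketch below is one plausible reading of the surrounding prose, not the patent's formulas: each phase's time is bounded by the slowest DC's upload or download, and only uploads from a DC to the WAN are charged. All names (`phase_time`, `up`, `down`, `U`, `D`, `P`) are assumptions for illustration.

```python
# Hypothetical reading of the transmission time and cost described in
# the text (the original formulas (1)-(2) are not reproduced here).
# up[r]/down[r]: bytes uploaded/downloaded by DC r in one phase;
# U[r]/D[r]: upload/download bandwidth of DC r; P[r]: unit upload price.

def phase_time(up, down, U, D):
    """Phase time: the slowest DC's upload or download finishes last."""
    return max(max(up[r] / U[r], down[r] / D[r]) for r in up)

def phase_cost(up, P):
    """Cost: only uploads from a DC to the WAN are charged."""
    return sum(up[r] * P[r] for r in up)

def iteration_time_and_cost(gather, apply_, U, D, P):
    """Sum the gather-phase and apply-phase time and cost."""
    t = phase_time(*gather, U, D) + phase_time(*apply_, U, D)
    c = phase_cost(gather[0], P) + phase_cost(apply_[0], P)
    return t, c
```

The optimization problem then asks for a placement whose summed iteration time is minimal subject to the summed cost staying within the budget B.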
  • Step S11: Assign a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex at each data processing center. Based on the initialized probability, the learning automaton selects for its vertex the data processing center with the highest probability according to the preset action selection method.
  • P(v i ) represents the probability of vertex v at DC i, which is initialized to 1/M, where M is the number of distributed DCs
  • Q(v i ) represents the cumulative probability of vertex v for DC i, calculated as follows:
  • the LA uses a roulette algorithm to select the appropriate action (DC) for its vertices.
  • LA first obtains the cumulative probability of the vertex for each DC according to the probability distribution of the vertex, and then randomly generates a floating-point number r ⁇ [0,1]. If r is less than or equal to Q(v 0 ), DC0 will be selected; if r is between Q(v k-1 ) and Q(v k ) (k ⁇ 1), DCk will be selected. In this way, an action with a higher probability has a greater chance of being selected, but an action with a lower probability may also be selected.
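The roulette selection just described can be sketched directly; the function name and the injected `rng` parameter are illustrative, not from the patent.

```python
import random

# Sketch of the roulette-wheel action selection described above;
# `probs` is the vertex's probability vector over the M DCs.

def roulette_select(probs, rng=random):
    """Pick a DC index with chance proportional to its probability."""
    r = rng.random()          # r in [0, 1)
    cumulative = 0.0
    for dc, p in enumerate(probs):
        cumulative += p       # Q(v_dc): cumulative probability
        if r <= cumulative:
            return dc
    return len(probs) - 1     # guard against floating-point round-off
```

With `probs = [0.5, 0.3, 0.2]`, a draw of r ≤ 0.5 selects DC0, a draw in (0.5, 0.8] selects DC1, and a larger draw selects DC2, so higher-probability actions are chosen more often but lower-probability ones can still be explored.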
  • Step S12 The learning automaton will select the data processing center with the highest probability for the vertex and compare it with the data processing center where the vertex is currently located. If it is inconsistent, the vertex will be migrated to the data processing center corresponding to the action, otherwise no operation will be performed.
  • the action obtained in step S11 is compared with the DC where the vertex is currently located, and if it is inconsistent, the vertex is migrated to the DC corresponding to the action, otherwise, no operation is performed.
  • Step S13 Each learning automaton calculates the score when its vertex is in each data processing center, and the score is determined according to the preset constraint condition.
  • each learning automaton calculates, for its vertex, the score at each DC.
  • L v is defined as the DC where vertex v is currently located
  • T b is the overall data transmission time of the system before the score is calculated.
  • it is calculated according to formula (1) and represents the data transmission time of the entire system when the calculation vertex is at DC i
  • C b represents the data transmission cost of the entire system before the score is calculated, which is computed according to formula (2)
  • the calculation method is: move vertex v to DC i, then calculate according to formula (1) and formula (2), and finally move vertex v back to L v.
  • the calculation method is as follows:
  • B represents capital budget
  • tw and cw represent time weight and cost weight respectively.
  • When C b ≥ B, cw decreases uniformly from 1 to 0 as the number of iterations increases and tw increases uniformly from 0 to 1; the purpose is to prioritize optimizing the overall communication cost of the graph processing system and to explore more graph partition states that can reduce the system cost. When C b < B, tw decreases uniformly from 1 to 0 as the number of iterations increases and cw increases uniformly from 0 to 1; the purpose is to prioritize optimizing the overall data transmission time of the graph processing system while slowing the growth of cost, so as to achieve a better overall optimization effect.
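The tw/cw schedule above can be sketched as a linear interpolation over the iteration count. The function and parameter names are assumptions; only the direction of each ramp comes from the text.

```python
# Hypothetical sketch of the tw/cw weight schedule described above:
# the weights shift linearly between time and cost over the iterations,
# with the direction chosen by comparing the current cost C_b to the
# budget B.

def time_cost_weights(iteration, max_iterations, c_b, budget):
    """Return (tw, cw) for the given iteration."""
    frac = iteration / max_iterations        # grows uniformly 0 -> 1
    if c_b >= budget:
        cw = 1.0 - frac                      # cost weight 1 -> 0
        tw = frac                            # time weight 0 -> 1
    else:
        tw = 1.0 - frac                      # time weight 1 -> 0
        cw = frac                            # cost weight 0 -> 1
    return tw, cw
```

Early iterations thus concentrate on whichever objective is currently out of line (cost when over budget, time otherwise), with emphasis shifting to the other objective as iterations progress.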
  • Step 14 Each learning automaton propagates the data processing center number corresponding to the maximum score to the learning automaton of its vertex neighbors, and generates a corresponding weight vector.
  • the learning automaton calculates, for its vertex, the reinforcement signals corresponding to all data processing centers based on the weight vector.
  • each LA communicates with other LAs to generate, for its vertex, reinforcement signals for all DCs. Before calculating the reinforcement signals, the weight vectors of the vertex for all DCs must be calculated. After each LA has calculated the scores of all DCs, it propagates the DC number corresponding to the maximum score to the LAs of its vertex's neighbors, and those LAs immediately generate the corresponding weight vectors.
  • the definition ⁇ v represents the DC corresponding to the maximum score of vertex v
  • Nbr(v) represents the set of neighbor vertices of vertex v
  • the vertex v is moved to ⁇ v, and then to move the vertex u [rho] v data transmission time of the whole system;
  • the vertex v is moved to ⁇ v, then vertex u capital cost of the whole system to move the [rho] v;
  • Means that when the label is received from vertex u [rho] v v propagation of its neighbors, the reference standard for calculating the weight vector, is calculated as follows:
  • After calculating the weight vectors of the vertex for all DCs, the LA calculates the corresponding reinforcement signal according to the weight vector, using the following formula:
  • Step 15 The learning automaton updates the probability value of its vertex in each data processing center according to the weight vector and the reinforcement signal, and guides the next action selection to iterate.
  • the LA will use the weight vector and the reinforcement signal obtained in step 14 to update the probability value of its vertex at each DC, so as to guide the next action selection.
  • the regularization weight needs to be calculated first, which is divided into two parts: the reward and the penalty regularization weight.
  • Neg() is the inverse function.
  • the calculation method is as follows:
  • the probability of the vertex v can be updated.
  • The LA first updates the probabilities of the vertices whose reinforcement signal indicates a reward. The update order over the DCs is in increasing order of the reward regularization weight. Given vertex v and the DC i whose reward regularization weight is the smallest of all reward regularization weights, that weight is used first to perform the probability update for all DCs; the update formula is as follows:
  • formula (11) increases the probability of DC i and decreases the probabilities of the other DCs. The LA then finds progressively larger reward regularization weights in turn and uses them to update the probabilities of all DCs.
  • the beneficial effect of this implementation is that it ultimately gives the DC with the largest reward regularization weight the greatest probability.
  • Next, the LA updates the probabilities of the vertices whose reinforcement signal indicates a penalty.
  • the update order over the DCs is in increasing order of the penalty regularization weight. Assuming that, given vertex v, DC i, and DC k, one penalty regularization weight is the largest of all and another is the smallest of all, the smallest is used first to perform the probability update for all DCs; the update formula is as follows:
  • represents the penalty weight
  • one term indicates the probability of vertex v for DC j in the n-th iteration.
  • formula (12) lowers the probability of DC k and increases the probabilities of the other DCs. The LA then finds progressively larger penalty regularization weights and their corresponding DC k in turn, and uses them to perform the probability update for all DCs.
  • the beneficial effect of this implementation is that it can ultimately give the most heavily penalized DC the smallest probability.
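The images of update formulas (11) and (12) are not reproduced in this text. As a stand-in, the sketch below uses the classic linear reward/penalty update of learning automata, which matches the described behavior (a reward raises one DC's probability and shrinks the others; a penalty lowers one DC's probability and spreads the freed mass); it is an assumption, not the patent's exact formula.

```python
# Stand-in for the patent's update formulas (11)-(12), which are not
# reproduced here: the classic linear reward/penalty update of learning
# automata. Both updates keep the probability vector summing to 1.

def reward_update(probs, i, weight):
    """Raise probs[i] by weight, shrink the others proportionally."""
    return [
        p + weight * (1.0 - p) if j == i else p * (1.0 - weight)
        for j, p in enumerate(probs)
    ]

def penalty_update(probs, k, weight):
    """Lower probs[k], spreading the freed mass over the other DCs."""
    m = len(probs)
    return [
        p * (1.0 - weight) if j == k
        else p + weight * probs[k] / (m - 1)
        for j, p in enumerate(probs)
    ]
```

Repeated rewards for the same DC drive its probability toward 1, while repeated penalties drive a DC's probability toward 0, which is the convergence behavior the text describes.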
  • Step 16 Until the preset number of iterations is reached or the constraint condition converges, a segmentation result of the distributed graph meeting the preset constraint condition is generated.
  • The action selection in iteration N+1 uses the updated probabilities of iteration N as a reference, and vertex migration, score calculation, reinforcement signal calculation, probability update, and the next action selection continue until the iteration ends, generating a geo-distributed graph partitioning result that satisfies the capital budget and has a very small data transmission time.
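The five per-iteration steps can be wired together as follows. This is an outline only; the helper functions are placeholders for the operations described in steps S11 to S15, and all names are assumptions.

```python
# Hypothetical outline of the iterative loop described above, wiring
# together the five per-iteration steps; select_action, migrate,
# compute_scores, compute_signals, update_probs, and converged are
# placeholders for the operations each learning automaton performs.

def partition(vertices, max_iters, select_action, migrate,
              compute_scores, compute_signals, update_probs, converged):
    """Iterate until the iteration cap is hit or constraints converge."""
    for it in range(max_iters):
        for v in vertices:
            dc = select_action(v)            # 1. action selection
            migrate(v, dc)                   # 2. vertex migration
        scores = compute_scores(vertices)    # 3. score calculation
        signals = compute_signals(scores)    # 4. reinforcement signals
        update_probs(vertices, signals)      # 5. probability update
        if converged(it):
            break
    return vertices
```

Each iteration's probability update feeds the next iteration's action selection, which is exactly the N to N+1 hand-off the text describes.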
  • real graph datasets are used for evaluation on real clouds and a cloud simulator.
  • five real graphs are used: Gnutella (GN), WikiVote (WV), GoogleWeb (GW), LiveJournal (LJ), and Twitter (TW).
  • Real cloud experiments are carried out on Amazon EC2 and Windows Azure cloud platforms.
  • the GAS-based PowerGraph system is used to execute graph processing algorithms, including classic graph algorithms such as PageRank, SSSP, and subgraph.
  • the distributed graph processing method provided by the embodiment of the application is integrated into PowerGraph, and the graph is partitioned during loading.
  • evaluation on real geo-distributed DCs and on real graphs in simulation shows that, compared with Geo-Cut, the state-of-the-art performance and cost optimization algorithm for geo-distributed graph processing systems, the distributed graph processing method provided by the embodiments of this application can reduce the data transmission time between DCs by up to 72% and the capital cost by up to 63%, while keeping the load relatively balanced.
  • The embodiments provided in this application can be applied to multiple scenarios. For example, Facebook receives terabytes of text, image, and video data from users all over the world every day and built four geographically distributed DCs to maintain and manage these data. If the load capacity and system response time of these DCs are considered, the method provided in the embodiment of the present application can be used to partition and optimize the graph, which keeps the DCs working stably and gives users a good experience. If network heterogeneity, cost budget, and system performance in a geo-distributed environment are considered, the method can likewise be used to partition and optimize the graph, achieving a good improvement in both transmission time and cost budget.
  • the embodiment of the present application only takes the performance and cost optimization of the geo-distributed graph processing system as an example to explain the working principle of the distributed graph processing method.
  • the processing model formed by the distributed graph processing method proposed in this embodiment is a general model. It can solve not only the performance and cost optimization problems of geo-distributed graph processing systems, but also problems such as load balancing and performance optimization; for different optimization goals, it is only necessary to design different score calculation schemes and different weight vector calculation schemes.
  • the embodiment of the present application provides a general distributed graph processing system based on reinforcement learning, as shown in FIG. 3, including:
  • the distributed graph definition and constraint condition setting module 10 is used to define the distributed data processing center based on graph theory to form a distributed graph, use the preset graph cutting model and preset graph processing model, and perform the distributed graph based on preset constraint conditions. Cutting. This module executes the method described in step S10 in embodiment 1, which will not be repeated here.
  • the action selection module 11 is used to allocate a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex at each data processing center; based on the initialized probability, the learning automaton selects for its vertex the data processing center with the highest probability according to a preset action selection method. This module executes the method described in step S11 in embodiment 1, which will not be repeated here.
  • the vertex migration module 12: the learning automaton selects the data processing center with the highest probability for the vertex and compares it with the data processing center where the vertex is currently located; if they are inconsistent, the vertex is migrated to the data processing center corresponding to the action; otherwise, no operation is performed.
  • This module executes the method described in step S12 in embodiment 1, which will not be repeated here.
  • the score calculation module 13: each learning automaton calculates the score of its vertex at each data processing center, and the score is determined according to the preset constraint condition. This module executes the method described in step S13 in embodiment 1, which will not be repeated here.
  • the reinforcement signal calculation module 14: each learning automaton propagates the data processing center number corresponding to the maximum score to the learning automata to which its vertex's neighbors belong and generates a corresponding weight vector; the learning automaton then calculates, for its vertex, the reinforcement signals corresponding to all data processing centers based on the weight vector. This module executes the method described in step S14 in embodiment 1, which will not be repeated here.
  • The probability update module 15: the learning automaton is used to update, according to the weight vector and the reinforcement signals, the probability of its vertex for each data processing center, guiding the next action selection for iteration. This module executes the method described in step S15 in embodiment 1, which will not be repeated here.
  • The partitioning result acquisition module 16 is configured to generate, once the preset number of iterations is reached or the constraint conditions converge, a partitioning result of the distributed graph that meets the preset constraint conditions. This module executes the method described in step S16 in embodiment 1, which will not be repeated here.
  • The reinforcement-learning-based general distributed graph processing system defines distributed data processing centers based on graph theory to form a distributed graph, and cuts the distributed graph by reinforcement learning under preset constraint conditions using the preset graph cutting model and the preset graph processing model. A learning automaton is assigned to each vertex and is trained to find the most suitable data processing center for the vertex; the possibility of each vertex residing in each data processing center follows a probability distribution. Each iteration of the system comprises five steps: action selection, vertex migration, score calculation, reinforcement signal calculation, and probability update. When the maximum number of iterations is reached or the constraint conditions have converged, the iteration is judged finished.
  • the distributed graph processing model formed by the general distributed graph processing method provided in this application is a general distributed graph model. For different optimization goals, only different score calculation schemes and different weight vectors need to be designed.
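The five-step iteration just described can be sketched end to end on a toy instance. This is a minimal sketch, assuming M data centers, a single vertex, and a fixed external score function; the linear reward update and all names are illustrative, not the patent's exact formulas:

```python
import random

random.seed(0)
M = 3  # number of data processing centers (toy value)

def select_dc(probs):
    """Step 1 -- roulette-wheel action selection over DC probabilities."""
    r, acc = random.random(), 0.0
    for dc, p in enumerate(probs):
        acc += p
        if r <= acc:
            return dc
    return M - 1

def one_iteration(probs, current_dc, score, alpha=0.1):
    """One simplified iteration for a single vertex.

    Steps 3-5 (score, reinforcement signal, probability update) are
    collapsed into rewarding the best-scoring DC with a linear update;
    the real method uses neighbor-propagated weight vectors.
    """
    action = select_dc(probs)            # step 1: action selection
    if action != current_dc:             # step 2: vertex migration
        current_dc = action
    best = max(range(M), key=score)      # step 3: score calculation
    # steps 4-5: reward `best`, scale the other DCs down (sum stays 1)
    probs = [p + alpha * (1 - p) if dc == best else (1 - alpha) * p
             for dc, p in enumerate(probs)]
    return probs, current_dc

probs, dc = [1.0 / M] * M, 0
for _ in range(200):
    probs, dc = one_iteration(probs, dc, score=lambda i: [0.2, 0.9, 0.1][i])
print([round(p, 3) for p in probs])
```

After enough iterations the probability mass concentrates on the DC with the best score, mirroring how training steers each vertex toward its most suitable data processing center.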
  • An embodiment of the present application provides a computer device. As shown in FIG. 4, the device may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or in other ways; FIG. 4 takes a bus connection as an example.
  • the processor 51 may be a central processing unit (Central Processing Unit, CPU).
  • The processor 51 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination of the above types of chips.
  • As a non-transitory computer-readable storage medium, the memory 52 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the corresponding program instructions/modules in the embodiments of the present application.
  • The processor 51 executes the various functional applications and data processing of the processor by running the non-transitory software programs, instructions, and modules stored in the memory 52, that is, implements the reinforcement-learning-based general distributed graph processing method in the above method embodiment.
  • the memory 52 may include a program storage area and a data storage area.
  • the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created by the processor 51 and the like.
  • the memory 52 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • The memory 52 may optionally include memories remotely located with respect to the processor 51, and these remote memories may be connected to the processor 51 through a network. Examples of the aforementioned network include, but are not limited to, the Internet, an enterprise intranet, an enterprise internal network, a mobile communication network, and combinations thereof.
  • One or more modules are stored in the memory 52, and when executed by the processor 51, the general distributed graph processing method based on reinforcement learning in Embodiment 1 is executed.
  • Those skilled in the art can understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing relevant hardware; the program can be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, HDD), or a solid-state drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the foregoing types of memories.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Machine Translation (AREA)
  • Complex Calculations (AREA)

Abstract

This application discloses a reinforcement-learning-based general distributed graph processing method and system. Distributed data processing centers are defined based on graph theory to form a distributed graph, and the distributed graph is cut by reinforcement learning under preset constraint conditions using a preset graph-cut model and a preset graph processing model. Each vertex is assigned a learning automaton that is trained to find the most suitable data processing center for the vertex; the possibility of each vertex residing in each data processing center follows a probability distribution. Each iteration of the whole system comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal calculation and probability update; when the maximum number of iterations is reached or the constraint conditions have converged, the iteration is judged finished. The distributed graph processing model formed by the general distributed graph processing method provided in this application is a general distributed graph model; for different optimization objectives, only different score-calculation schemes and different weight vectors need to be designed.

Description

A General Distributed Graph Processing Method and System Based on Reinforcement Learning

Technical Field

This application relates to the field of large-scale graph partitioning, and in particular to a reinforcement-learning-based general distributed graph processing method and system.
Background Art

To process large-scale graphs efficiently, a graph usually needs to be partitioned so that the resulting subgraphs can be processed in parallel. The following classic models currently exist for large-scale graph partitioning:

Heuristic models. Traditional mainstream large-scale graph processing systems such as Pregel and PowerGraph adopt heuristic partitioning algorithms. Pregel's default partitioning takes the hash value of the vertex id modulo the number of partitions, aiming to strengthen partition locality and reduce network traffic between compute nodes. PowerGraph defaults to a greedy vertex-cut: for a newly added edge, if one of its endpoints already resides on some machine, the edge is assigned to that machine, thereby minimizing the number of cross-machine edges and reducing communication. Such heuristic partitioning algorithms easily fall into local optima, and some better regions of the solution space are never searched.

Machine learning models. *** the balanced graph partitioning problem is treated as a vertex classification problem. However, when the optimization objective involves network prices and bandwidth heterogeneity, computing the embeddings of the nodes becomes very complicated. These existing machine-learning models for graph partitioning suit rather narrow scenarios; when the graph grows larger and the optimization objective becomes more complex, these methods can no longer solve the partitioning problem well.
Summary of the Invention

Therefore, the technical problem to be solved by this application is to overcome the defects of the prior-art graph-cut models, namely poor partitioning quality such as easily falling into local optima and suiting only a single usage scenario, and thereby to provide a reinforcement-learning-based general distributed graph processing method and system.
To achieve the above object, this application provides the following technical solutions:

In a first aspect, an embodiment of this application provides a reinforcement-learning-based general distributed graph processing method, comprising the following steps: defining distributed data processing centers based on graph theory to form a distributed graph, and cutting the distributed graph under preset constraint conditions using a preset graph-cut model and a preset graph processing model;

assigning a learning automaton to each vertex of the distributed graph and initializing the probability of each vertex for each data processing center, the learning automaton selecting for its vertex, based on the initialized probabilities, the data processing center with the highest probability according to a preset action-selection method;

the learning automaton comparing the data processing center selected for the vertex with the data processing center where its vertex currently resides, and, if they differ, migrating the vertex to the data processing center corresponding to the action, otherwise doing nothing;

each learning automaton calculating the score of its vertex for every data processing center, the score being determined according to the preset constraint conditions;

each learning automaton propagating the data-processing-center number corresponding to the maximum score to the learning automata to which its vertex's neighbors belong, generating corresponding weight vectors, and calculating from the weight vector the reinforcement signals corresponding to all data processing centers for its vertex;

the learning automaton updating, according to the weight vector and the reinforcement signals, the probability of its vertex for each data processing center, to guide the next action selection for iteration;

until a preset number of iterations is reached or the constraint conditions converge, generating a cut result of the distributed graph that satisfies the preset constraint conditions.
In an embodiment, the preset graph-cut model is the hybrid-cut model and the preset graph processing model is the GAS model, with which vertex computation is executed iteratively; the constraint conditions are minimal monetary budget cost and minimal data transmission time.
In an embodiment, the data transmission time is expressed as the sum of the data transmission times of the gather phase and the apply phase; the data transmission time T(i) of the i-th iteration is computed as:

Figure PCTCN2021076484-appb-000001

where

Figure PCTCN2021076484-appb-000002
Figure PCTCN2021076484-appb-000003

in which: when [Figure PCTCN2021076484-appb-000004] is 1, vertex v in data processing center DCr is a master, and when [Figure PCTCN2021076484-appb-000005] is 0, vertex v in DCr is not a master; when [Figure PCTCN2021076484-appb-000006] is 1, vertex v in DCr is high-degree, and when [Figure PCTCN2021076484-appb-000007] is 0, vertex v in DCr is low-degree; [Figure PCTCN2021076484-appb-000008] denotes the amount of data transferred from the replicas in DC_r to the master vertex v during the gather phase of the i-th iteration; a_v(i) denotes the amount of data sent from the master vertex v to each replica during the apply phase of the i-th iteration; U_r/D_r denote the upload/download bandwidth of DCr; and R_v denotes the set of data processing centers DC containing replicas of v.

The communication cost between data processing centers DC is the sum of the costs of uploading data in the gather phase and the apply phase; with P_r the unit cost of uploading data from DC_r to the network, the monetary budget cost is expressed as:

Figure PCTCN2021076484-appb-000009

The constraint conditions are:

min T(i)          (3)
C_comm(i) ≤ B          (4)

where B is the monetary budget for using network resources.
In an embodiment, the step of initializing the probability of each vertex for each data processing center and the learning automaton selecting for its vertex, according to a preset action-selection method, the data processing center with the highest probability comprises:

initializing the probability P(v_i) of vertex v for data processing center DC_i as P(v_i) = 1/M, where M is the number of distributed DCs;

obtaining from the vertex's probability distribution the vertex's cumulative probability for each data processing center DC, where Q(v_i) denotes the cumulative probability of vertex v for DC_i:

Q(v_i) = Σ_{j=0}^{i} P(v_j);

randomly generating a floating-point number r ∈ [0, 1]; if r is less than or equal to Q(v_0), DC_0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), data processing center DC_k is selected.
In an embodiment, the step of initializing the probability of each vertex for each data processing center and the learning automaton selecting for its vertex, according to a preset action-selection method, the data processing center with the highest probability comprises:

presetting a trial-and-error parameter τ and randomly generating a floating-point number r ∈ [0, 1]; if r ≤ τ, the learning automaton randomly selects a DC for its vertex; if r > τ, the learning automaton selects for its vertex the data processing center DC with the largest P(v_i).
In an embodiment, each learning automaton computes the score of its vertex for every data processing center by the following formulas:

Figure PCTCN2021076484-appb-000012
Figure PCTCN2021076484-appb-000013

where [Figure PCTCN2021076484-appb-000014] denotes the score of vertex v at DCi, B denotes the monetary budget for using network resources, T_b denotes the overall data transmission time of the system before the score is computed, C_b denotes the overall data transmission cost of the system before the score is computed, [Figure PCTCN2021076484-appb-000015] denotes the overall data transmission time of the system when the vertex is at DCi, [Figure PCTCN2021076484-appb-000016] denotes the overall data transmission cost of the system when the vertex is at DCi, and tw and cw denote the time weight and the monetary cost weight, respectively. When C_b ≥ B, cw decreases uniformly from 1 to 0 and tw increases uniformly from 0 to 1 as the number of iterations increases; when C_b < B, tw decreases uniformly from 1 to 0 and cw increases uniformly from 0 to 1 as the number of iterations increases.
The step in which each learning automaton propagates the data-processing-center number of the maximum score to the learning automata of its vertex's neighbors, generating corresponding weight vectors, and each learning automaton computes from the weight vector the reinforcement signals of all data processing centers for its vertex, comprises:

computing the reference standard of the weight vector by the following formula:

Figure PCTCN2021076484-appb-000017

where [Figure PCTCN2021076484-appb-000018] denotes the reference standard by which vertex u computes its weight vector upon receiving the label ρ_v propagated by its neighbor v; ρ_v denotes the DC corresponding to the maximum score of vertex v; Nbr(v) denotes the set of neighbor vertices of vertex v; [Figure PCTCN2021076484-appb-000019] is the overall data transmission time of the system after moving vertex v to ρ_v and then moving vertex u to ρ_v; [Figure PCTCN2021076484-appb-000020] denotes the overall data transmission time of the system after moving vertex v to ρ_v; [Figure PCTCN2021076484-appb-000021] denotes the overall monetary cost of the system after moving vertex v to ρ_v; and [Figure PCTCN2021076484-appb-000022] is the overall monetary cost of the system after moving vertex v to ρ_v and then moving vertex u to ρ_v.

After vertex u has computed the reference standard, its weight vector is updated as follows:

Figure PCTCN2021076484-appb-000023

where [Figure PCTCN2021076484-appb-000024] denotes the weight vector of vertex u for DC ρ_v, initialized to 0.

After the vertex's weight vectors for all data processing centers have been computed, the learning automaton computes the corresponding reinforcement signals from the weight vectors:

Figure PCTCN2021076484-appb-000025
Figure PCTCN2021076484-appb-000026

where [Figure PCTCN2021076484-appb-000027] denotes the reinforcement signal of vertex u for data processing center DCi, taking the value 0 or 1, denoting a reward or a penalty signal respectively, and [Figure PCTCN2021076484-appb-000028] denotes the weight vector of vertex u for data processing center DC_i, initialized to 0.
In an embodiment, before the probability of the vertex for each data processing center is updated, regularized weights must be obtained, divided into reward and penalty regularized weights, where:

[Figure PCTCN2021076484-appb-000029] denotes the reward regularized weight of vertex v for DCi, computed by:

Figure PCTCN2021076484-appb-000030

in which Neg() is the negation function, [Figure PCTCN2021076484-appb-000031] denotes the reinforcement signal of vertex v for data processing center DCi, [Figure PCTCN2021076484-appb-000032] denotes the weight vector of vertex v for DC_i, and [Figure PCTCN2021076484-appb-000033] denotes the weight vector of vertex v for DCk;

[Figure PCTCN2021076484-appb-000034] denotes the penalty regularized weight of vertex v for DCi, computed by:

Figure PCTCN2021076484-appb-000035

in which [Figure PCTCN2021076484-appb-000036] denotes the reinforcement signal of vertex v for data processing center DCi, [Figure PCTCN2021076484-appb-000037] denotes the weight vector of vertex v for DC_i, and [Figure PCTCN2021076484-appb-000038] denotes the weight vector of vertex v for DCk.
In an embodiment, the probability of vertex v is updated according to the regularized weights, in ascending order of the reward regularized weights for the data processing centers DC. Given vertex v and DC_i, if [Figure PCTCN2021076484-appb-000039] is the smallest among all reward regularized weights, [Figure PCTCN2021076484-appb-000040] is used first to update the probabilities of all DCs:

Figure PCTCN2021076484-appb-000041

where [Figure PCTCN2021076484-appb-000042] denotes the probability of vertex v for DC_i in the n-th iteration, α denotes the reward weight, n is the iteration number, and j and i index data processing centers.

The learning automaton then successively finds larger [Figure PCTCN2021076484-appb-000043] and uses each of them to update the probabilities of all DCs. Next, the learning automaton updates the probabilities for those DCs whose reinforcement signal is [Figure PCTCN2021076484-appb-000044], in ascending order of the penalty regularized weights for the DCs. Suppose, given vertex v and DC_i, DC_k, that [Figure PCTCN2021076484-appb-000045] is the largest among all penalty regularized weights and [Figure PCTCN2021076484-appb-000046] is the smallest; then [Figure PCTCN2021076484-appb-000047] is used first to update the probabilities of all DCs:

Figure PCTCN2021076484-appb-000048

where β denotes the penalty weight, [Figure PCTCN2021076484-appb-000049] denotes the probability of vertex v for DC_j in the n-th iteration, n is the iteration number, and j and i index data processing centers.

The learning automaton then successively finds larger [Figure PCTCN2021076484-appb-000050] and the corresponding DC_k, and uses [Figure PCTCN2021076484-appb-000051] to update the probabilities of all DCs. If the preset number of iterations is reached or the constraint conditions have converged, the iteration ends; otherwise iteration N+1 begins, in which action selection takes the probabilities updated in iteration N as its reference.
In a second aspect, an embodiment of this application provides a reinforcement-learning-based general distributed graph processing system, comprising:

a distributed graph definition and constraint setting module, configured to define distributed data processing centers based on graph theory to form a distributed graph and to cut the distributed graph under preset constraint conditions using a preset graph-cut model and a preset graph processing model;

an action selection module, configured to assign a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex for each data processing center, the learning automaton selecting for its vertex, based on the initialized probabilities, the data processing center with the highest probability according to a preset action-selection method;

a vertex migration module, in which the learning automaton compares the data processing center selected for the vertex with the data processing center where its vertex currently resides and, if they differ, migrates the vertex to the data processing center corresponding to the action, otherwise does nothing;

a score calculation module, in which each learning automaton calculates the score of its vertex for every data processing center, the score being determined according to the preset constraint conditions;

a reinforcement signal calculation module, in which each learning automaton propagates the data-processing-center number of the maximum score to the learning automata of its vertex's neighbors, generating corresponding weight vectors, and calculates from the weight vector the reinforcement signals of all data processing centers for its vertex;

a probability update module, in which the learning automaton updates, according to the weight vector and the reinforcement signals, the probability of its vertex for each data processing center, guiding the next action selection for iteration;

a partitioning result acquisition module, configured to generate, once a preset number of iterations is reached or the constraint conditions converge, a cut result of the distributed graph that satisfies the preset constraint conditions.

In a third aspect, an embodiment of this application provides a computer-readable storage medium storing computer instructions for causing a computer to execute the reinforcement-learning-based general distributed graph processing method of the first aspect of the embodiments of this application.

In a fourth aspect, an embodiment of this application provides a computer device comprising a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions so as to perform the reinforcement-learning-based general distributed graph processing method of the first aspect of the embodiments of this application.
The technical solution of this application has the following advantages:

The reinforcement-learning-based general distributed graph processing method and system provided by this application define distributed data processing centers based on graph theory to form a distributed graph, and cut the distributed graph by reinforcement learning under preset constraint conditions using a preset graph-cut model and a preset graph processing model. Each vertex is assigned a learning automaton that is trained to find the most suitable data processing center for the vertex, and the possibility of each vertex residing in each data processing center follows a probability distribution. Each iteration of the whole system comprises the five steps of action selection, vertex migration, score calculation, reinforcement-signal calculation and probability update; when the maximum number of iterations is reached or the constraint conditions converge, the iteration is judged finished. The distributed graph processing model formed by the general distributed graph processing method provided in this application is a distributed graph model with good adaptability; for different optimization objectives, only different score-calculation schemes and different weight vectors need to be designed.
Brief Description of the Drawings

To explain the specific embodiments of this application or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.

FIG. 1 is a flowchart of a specific example of the reinforcement-learning-based general distributed graph processing method in an embodiment of this application;

FIG. 2 is a flowchart of the iteration of the reinforcement-learning-based graph partitioning process provided by an embodiment of this application;

FIG. 3 is a schematic block diagram of a specific example of the reinforcement-learning-based general distributed graph processing system in an embodiment of this application;

FIG. 4 is a composition diagram of a specific example of the computer device provided by an embodiment of this application.

Detailed Description

The technical solutions of this application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some rather than all of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.

In addition, the technical features involved in the different embodiments of this application described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1

An embodiment of this application provides a reinforcement-learning-based general distributed graph processing method that can be applied to different optimization objectives, for example the performance and cost optimization of geo-distributed graph processing systems, load balancing, and performance optimization. As shown in FIG. 1, it comprises the following steps:

Step S10: define distributed data processing centers based on graph theory to form a distributed graph, and cut the distributed graph under preset constraint conditions using a preset graph-cut model and a preset graph processing model.

This embodiment takes the geo-distributed graph partitioning process as an example. Assume that vertex data is not backed up across data processing centers (hereafter DC) and that one machine executes the graph processing task of only one vertex at a time; the computing resources of each DC are unrestricted, while data communication between DCs is the performance bottleneck of geo-distributed graph processing; there is no network congestion between DCs, the network bottleneck coming only from the uplink and downlink bandwidth between each DC and the WAN; and only data uploaded from a DC to the WAN is charged for. Cost and performance may stand in conflict: when an uplink has a large bandwidth, more data can be transmitted on it to reduce the transmission time, but the price of this link may be relatively high, raising the cost; performance and cost therefore need to be optimized simultaneously as the optimization objectives of graph partitioning.
First define a graph G(V, E), where V is the set of vertices and E is the set of edges, and consider M geo-distributed data processing centers (hereafter DC). Each vertex v has an initial location L_v (L_v ∈ (0, 1, …, M-1)); [Figure PCTCN2021076484-appb-000052] indicates that vertex v is a master vertex, and [Figure PCTCN2021076484-appb-000053] indicates that it is not a master vertex; R_v is the set of DCs containing replicas of vertex v; U_r is the uplink bandwidth and D_r is the downlink bandwidth.
This embodiment uses the hybrid-cut graph-cut model, which follows this rule: given a threshold theta, a vertex v whose in-degree is greater than or equal to theta is called a high-degree vertex; otherwise it is called a low-degree vertex. If vertex v is low-degree, all of its in-edges are assigned to the DC where v resides; if vertex v is high-degree, each of its in-edges is assigned to the DC of the edge's other endpoint.
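The hybrid-cut rule just stated can be sketched as an edge-placement function. This is a minimal sketch; the dictionary-based graph representation, function name, and toy data are illustrative assumptions, not part of the patent:

```python
def hybrid_cut(edges, location, in_degree, theta):
    """Assign each directed edge (src, dst) to a DC under hybrid-cut.

    If the destination's in-degree >= theta (high-degree vertex), the
    in-edge goes to the DC of the edge's source endpoint; otherwise
    (low-degree vertex) the edge stays in the destination's own DC.
    """
    placement = {}
    for src, dst in edges:
        if in_degree[dst] >= theta:
            placement[(src, dst)] = location[src]   # high-degree rule
        else:
            placement[(src, dst)] = location[dst]   # low-degree rule
    return placement

edges = [(0, 1), (2, 1), (3, 1), (0, 4)]
location = {0: "DC0", 1: "DC1", 2: "DC2", 3: "DC0", 4: "DC2"}
in_degree = {0: 0, 1: 3, 2: 0, 3: 0, 4: 1}
placement = hybrid_cut(edges, location, in_degree, theta=2)
print(placement)
```

With theta = 2, vertex 1 is high-degree, so its in-edges follow their source vertices' DCs, which caps the number of replicas a high-degree vertex forces into other DCs.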
This embodiment uses the GAS graph processing model, which iteratively executes user-defined vertex computation. Each GAS iteration has three computation phases: Gather, Apply and Scatter. In the gather phase, every active vertex collects its neighbors' data, and a sum function (Sum) is defined to aggregate the received data into a gathered sum. In the apply phase, every active vertex updates its data with the gathered sum. In the scatter phase, every active vertex activates the neighbors that will execute in the next iteration. A global barrier is defined to ensure that all vertices finish their computation before the next step begins.
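A toy, single-machine illustration of one Gather-Apply-Scatter superstep, using a PageRank-style update; everything below (function name, data layout, damping value) is an illustrative assumption rather than the distributed PowerGraph implementation:

```python
def gas_superstep(vertices, in_nbrs, out_nbrs, value, damping=0.85):
    """One GAS superstep of a PageRank-like vertex program."""
    new_value, active = {}, set()
    for v in vertices:
        # Gather: aggregate in-neighbor contributions with a sum function.
        gathered = sum(value[u] / max(len(out_nbrs[u]), 1) for u in in_nbrs[v])
        # Apply: update the vertex value with the gathered sum.
        new_value[v] = (1 - damping) + damping * gathered
        # Scatter: activate out-neighbors whose input changed noticeably.
        if abs(new_value[v] - value[v]) > 1e-6:
            active.update(out_nbrs[v])
    return new_value, active

vertices = [0, 1, 2]
out_nbrs = {0: [1], 1: [2], 2: [0]}          # a 3-cycle
in_nbrs = {0: [2], 1: [0], 2: [1]}
value = {v: 1.0 for v in vertices}
for _ in range(10):
    value, active = gas_superstep(vertices, in_nbrs, out_nbrs, value)
print(value, active)
```

On the symmetric 3-cycle the values are already at the fixed point, so no vertex changes and the active set empties, which is exactly the convergence condition the scatter phase implements.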
The transmission time in the i-th iteration can be expressed as the sum of the data transmission times of the gather phase and the apply phase, computed as:

Figure PCTCN2021076484-appb-000054

where

Figure PCTCN2021076484-appb-000055
Figure PCTCN2021076484-appb-000056

in which: when [Figure PCTCN2021076484-appb-000057] is 1, vertex v in data processing center DCr is a master, and when [Figure PCTCN2021076484-appb-000058] is 0, vertex v in DCr is not a master; when [Figure PCTCN2021076484-appb-000059] is 1, vertex v in DCr is high-degree, and when [Figure PCTCN2021076484-appb-000060] is 0, vertex v in DCr is low-degree; [Figure PCTCN2021076484-appb-000061] denotes the amount of data transferred from the replicas in DC_r to the master vertex v during the gather phase of the i-th iteration; a_v(i) denotes the amount of data sent from the master vertex v to each replica during the apply phase of the i-th iteration; U_r/D_r denote the upload/download bandwidth of DCr; and R_v denotes the set of data processing centers DC containing replicas of v.
The communication cost between DCs is the sum of the costs of uploading data in the gather phase and the apply phase. Defining P_r as the unit cost of uploading data from DC_r to the Internet, the total communication cost can be expressed as:

Figure PCTCN2021076484-appb-000062

The geo-distributed graph partitioning problem is formulated as a constrained optimization problem with the constraints:

min T(i)          (3)
C_comm(i) ≤ B          (4)

That is, the geo-distributed graph partitioning problem to be solved is the optimization problem under the constraints described by formulas (3) and (4).

After the meaning of each element of the geo-distributed graph has been defined, every vertex is assigned a learning automaton (hereafter LA), which is trained to find the DC that suits its vertex best; the possibility of each vertex residing in each DC follows a probability distribution. Each iteration mainly comprises five steps: action selection, vertex migration, score calculation, reinforcement-signal generation, and probability update. The whole workflow when optimizing the performance and cost of a geo-distributed graph processing system is shown in FIG. 2; the main function of each step and the connections between the steps are described below.
Step S11: assign a learning automaton to each vertex of the distributed graph and initialize the probability of each vertex for each data processing center; based on the initialized probabilities, the learning automaton selects for its vertex the data processing center with the highest probability according to a preset action-selection method.

In this embodiment, define P(v_i) as the probability of vertex v for DC_i, initialized as P(v_i) = 1/M, where M is the number of distributed DCs, and define Q(v_i) as the cumulative probability of vertex v for DC_i, computed as:

Q(v_i) = Σ_{j=0}^{i} P(v_j)
In one embodiment, the LA uses a roulette-wheel algorithm to select a suitable action (DC) for its vertex. The LA first derives from the vertex's probability distribution the cumulative probability for each DC, then randomly generates a floating-point number r ∈ [0, 1]. If r is less than or equal to Q(v_0), DC0 is selected; if r lies between Q(v_{k-1}) and Q(v_k) (k ≥ 1), DCk is selected. In this way, actions with larger probability have a larger chance of being selected, but actions with small probability may also be selected. When the LA selects a good action (one with large probability), the partitioning result is more likely to move toward the optimization objective; when the LA selects a bad action (one with small probability), this is a trial-and-error process, and a choice that currently looks bad may explore a better region of the state space.
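The roulette-wheel selection just described, as a standalone sketch (the optional fixed draw `r` is an illustrative convenience for testing, not part of the method):

```python
import random

def roulette_select(probs, r=None):
    """Select a DC index k such that Q(v_{k-1}) < r <= Q(v_k)."""
    if r is None:
        r = random.random()
    q = 0.0
    for k, p in enumerate(probs):
        q += p               # running cumulative probability Q(v_k)
        if r <= q:
            return k
    return len(probs) - 1    # guard against floating-point round-off

print(roulette_select([0.2, 0.5, 0.3], r=0.15))
print(roulette_select([0.2, 0.5, 0.3], r=0.65))
```

Larger-probability DCs are chosen more often, yet any DC with nonzero probability can still be drawn, which gives the trial-and-error exploration described above.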
In another embodiment, action selection can take another form: define the trial-and-error parameter τ = 0.1 and randomly generate a floating-point number r ∈ [0, 1]. If r ≤ τ, the LA randomly selects a DC for its vertex; if r > τ, the LA selects for its vertex the DC with the largest P(v_i).
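The τ-based alternative can be sketched the same way (again with an optional fixed draw for testing; names are illustrative):

```python
import random

def select_action(probs, tau=0.1, r=None):
    """Trial-and-error selection with exploration rate tau.

    With probability tau a random DC is chosen (exploration);
    otherwise the DC with the largest probability is chosen.
    """
    if r is None:
        r = random.random()
    if r <= tau:
        return random.randrange(len(probs))            # explore
    return max(range(len(probs)), key=lambda i: probs[i])  # exploit

print(select_action([0.1, 0.7, 0.2], r=0.5))
```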
Step S12: the learning automaton compares the data processing center with the highest selection probability for the vertex with the data processing center where its vertex currently resides; if they differ, the vertex is migrated to the data processing center corresponding to the action, otherwise nothing is done.

In this embodiment, the LA compares the action obtained in step S11 with the DC where its vertex currently resides; if they differ, the vertex is migrated to the DC corresponding to the action, otherwise nothing is done.
Step S13: each learning automaton computes the score of its vertex for every data processing center, the score being determined according to the preset constraint conditions.

In this embodiment, each LA computes for its vertex a score for every DC. First define L_v as the DC where vertex v currently resides; T_b as the overall data transmission time of the system before the score is computed, obtained by formula (1); [Figure PCTCN2021076484-appb-000065] as the overall data transmission time of the system when the vertex is at DC_i; C_b as the overall data transmission cost of the system before the score is computed, obtained by formula (2); and [Figure PCTCN2021076484-appb-000066] as the overall data transmission cost of the system when the vertex is at DC_i. [Figure PCTCN2021076484-appb-000067] and [Figure PCTCN2021076484-appb-000068] are computed by moving vertex v to DC_i, evaluating formulas (1) and (2) respectively, and finally moving vertex v back to L_v.
[Figure PCTCN2021076484-appb-000069] denotes the score of vertex v at DC_i, computed as follows:

Figure PCTCN2021076484-appb-000070
Figure PCTCN2021076484-appb-000071

In formula (5), B denotes the monetary budget, and tw and cw denote the time weight and the cost weight respectively. When C_b ≥ B, cw decreases uniformly from 1 to 0 and tw increases uniformly from 0 to 1 as the number of iterations increases, the aim being to optimize the overall communication cost of the graph processing system first and to explore more graph-partition states that can lower the system cost. When C_b < B, tw decreases uniformly from 1 to 0 and cw increases uniformly from 0 to 1, the aim being to optimize the overall data transmission time of the system first and to slow down the optimization of the transmission time, thereby achieving a better optimization effect.
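The linear tw/cw schedule described above can be sketched as follows; interpreting "uniformly" as linear interpolation over the preset iteration budget is an assumption, since the exact step size is not given in the text:

```python
def weight_schedule(n, max_iters, over_budget):
    """Return (tw, cw) for iteration n of max_iters.

    over_budget=True  (C_b >= B): cw falls 1 -> 0, tw rises 0 -> 1,
                                  prioritizing cost optimization first.
    over_budget=False (C_b <  B): tw falls 1 -> 0, cw rises 0 -> 1,
                                  prioritizing transmission time first.
    """
    frac = n / max_iters
    if over_budget:
        tw, cw = frac, 1.0 - frac
    else:
        tw, cw = 1.0 - frac, frac
    return tw, cw

print(weight_schedule(0, 100, over_budget=True))
print(weight_schedule(50, 100, over_budget=False))
```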
Step S14: each learning automaton propagates the data-processing-center number of the maximum score to the learning automata of its vertex's neighbors, which generate corresponding weight vectors; based on the weight vector, the learning automaton computes the reinforcement signals of all data processing centers for its vertex.

In practice, each LA communicates with the other LAs to generate for its vertex reinforcement signals for all DCs; before the reinforcement signals are computed, the vertex's weight vectors for all DCs must be computed. After each LA has computed the scores of all DCs, it propagates the DC number of the maximum score to the LAs of its vertex's neighbors, and those LAs immediately generate the corresponding weight-vector entries.

In this embodiment, define ρ_v as the DC corresponding to the maximum score of vertex v and Nbr(v) as the set of neighbor vertices of vertex v; [Figure PCTCN2021076484-appb-000072] as the overall data transmission time of the system after moving vertex v to ρ_v and then moving vertex u to ρ_v; [Figure PCTCN2021076484-appb-000073] as the overall data transmission time of the system after moving vertex v to ρ_v; [Figure PCTCN2021076484-appb-000074] as the overall monetary cost of the system after moving vertex v to ρ_v; and [Figure PCTCN2021076484-appb-000075] as the overall monetary cost of the system after moving vertex v to ρ_v and then moving vertex u to ρ_v. [Figure PCTCN2021076484-appb-000076] denotes the reference standard by which vertex u computes its weight vector upon receiving the label ρ_v propagated by its neighbor v, computed as:

Figure PCTCN2021076484-appb-000077

It should be noted that the values of tw, cw and sign(B - C_b) are the same as in formula (5) of step S13, because they belong to the same iteration. After vertex u has computed the reference standard, its weight vector is updated as follows:

Figure PCTCN2021076484-appb-000078

where [Figure PCTCN2021076484-appb-000079] denotes the weight vector of vertex u for DC ρ_v, initialized to 0.

After the vertex's weight vectors for all DCs have been computed, the LA computes the corresponding reinforcement signals from the weight vectors:

Figure PCTCN2021076484-appb-000080
Figure PCTCN2021076484-appb-000081

where [Figure PCTCN2021076484-appb-000082] denotes the reinforcement signal of vertex u for data processing center DCi, taking the value 0 or 1, denoting a reward or a penalty signal respectively, and [Figure PCTCN2021076484-appb-000083] denotes the weight vector of vertex u for data processing center DC_i, initialized to 0.
Step S15: based on the weight vector and the reinforcement signals, the learning automaton updates the probability of its vertex for each data processing center, guiding the next action selection.

In this embodiment, the LA uses the weight vectors and reinforcement signals obtained in step S14 to update its vertex's probability for each DC, thereby guiding the next action selection. Before that, the regularized weights must be computed, divided into reward and penalty regularized weights.

This embodiment defines [Figure PCTCN2021076484-appb-000084] as the reward regularized weight of vertex v for DC_i and [Figure PCTCN2021076484-appb-000085] as the penalty regularized weight of vertex v for DC_i, where [Figure PCTCN2021076484-appb-000086] is computed as:

Figure PCTCN2021076484-appb-000087

in which Neg() is the negation function, and [Figure PCTCN2021076484-appb-000088] is computed as:

Figure PCTCN2021076484-appb-000089

where [Figure PCTCN2021076484-appb-000090] denotes the reinforcement signal of vertex v for data processing center DCi, [Figure PCTCN2021076484-appb-000091] denotes the weight vector of vertex v for DC_i, and [Figure PCTCN2021076484-appb-000092] denotes the weight vector of vertex v for DCk.

Once the regularized weights have been obtained, the probability of vertex v can be updated. Define [Figure PCTCN2021076484-appb-000093] as the probability of vertex v for DC_i in the n-th iteration. The LA first updates the probabilities for those DCs whose reinforcement signal is [Figure PCTCN2021076484-appb-000094], in ascending order of the reward regularized weights for the DCs. Suppose, given vertex v and DC_i, that [Figure PCTCN2021076484-appb-000095] is the smallest among all reward regularized weights; then [Figure PCTCN2021076484-appb-000096] is used first to update the probabilities of all DCs:

Figure PCTCN2021076484-appb-000097

where α denotes the reward weight; formula (11) raises the probability of DC_i and lowers the probabilities of the other DCs. The LA then successively finds larger [Figure PCTCN2021076484-appb-000098] and uses each of them to update the probabilities of all DCs. The benefit of this implementation is that the DC with the largest [Figure PCTCN2021076484-appb-000099] ends up with the largest probability.
Next, the LA updates the probabilities for those DCs whose reinforcement signal is [Figure PCTCN2021076484-appb-000100], in ascending order of the penalty regularized weights for the DCs. Suppose, given vertex v and DC_i, DC_k, that [Figure PCTCN2021076484-appb-000101] is the largest among all penalty regularized weights and [Figure PCTCN2021076484-appb-000102] is the smallest; then [Figure PCTCN2021076484-appb-000103] is used first to update the probabilities of all DCs:

Figure PCTCN2021076484-appb-000104

where β denotes the penalty weight and [Figure PCTCN2021076484-appb-000105] denotes the probability of vertex v for DC_j in the n-th iteration; formula (12) lowers the probability of DC_k and raises the probabilities of the other DCs. The LA then successively finds larger [Figure PCTCN2021076484-appb-000106] and the corresponding DC_k, and uses [Figure PCTCN2021076484-appb-000107] to update the probabilities of all DCs. The benefit of this implementation is that the DC with the smallest [Figure PCTCN2021076484-appb-000108] ends up with the smallest probability.
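The exact update formulas (11) and (12) are rendered as images in the original. The sketch below is a generic linear reward-penalty update of the kind the surrounding text describes (rewarding DC_i raises its probability while scaling the others down; penalizing DC_k lowers its probability while redistributing mass to the others), offered as an illustration, not the patent's precise formulas, which also involve the regularized weights:

```python
def reward_update(probs, i, alpha):
    """Reward DC i: raise P_i, scale the other DCs down; sum stays 1."""
    return [p + alpha * (1 - p) if k == i else (1 - alpha) * p
            for k, p in enumerate(probs)]

def penalty_update(probs, k, beta):
    """Penalize DC k: lower P_k, redistribute mass to the other DCs."""
    m = len(probs)
    return [(1 - beta) * p if j == k else beta / (m - 1) + (1 - beta) * p
            for j, p in enumerate(probs)]

p = [1 / 3] * 3
p = reward_update(p, 0, alpha=0.2)   # DC_0 received a reward signal
p = penalty_update(p, 2, beta=0.2)   # DC_2 received a penalty signal
print([round(x, 4) for x in p])
```

Both updates preserve a valid probability distribution, so repeated application drives the rewarded DC toward the largest probability and the penalized DC toward the smallest, matching the behavior the text attributes to formulas (11) and (12).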
Step S16: until the preset number of iterations is reached or the constraint conditions converge, generate a cut result of the distributed graph that satisfies the preset constraint conditions.

In this embodiment, if the maximum number of iterations is reached or the constraint conditions have converged, the iteration is judged finished. Otherwise iteration N+1 begins: action selection in iteration N+1 takes the probabilities updated in iteration N as its reference, and vertex migration, score calculation, reinforcement-signal calculation, probability update and the next iteration continue until the iteration ends, generating a geo-distributed graph partitioning result that satisfies the monetary budget with extremely small data transmission time.

To verify the effectiveness and efficiency of the graph processing method provided in this embodiment, it was evaluated with real graph datasets on real clouds and on a cloud simulator. Five real graphs were used: Gnutella (GN), WikiVote (WV), GoogleWeb (GW), LiveJournal (LJ) and Twitter (TW). Real-cloud experiments were run on the Amazon EC2 and Windows Azure cloud platforms, using the GAS-based PowerGraph system to execute graph processing algorithms, including classic graph algorithms such as pagerank, sssp and subgraph. The graph processing method provided in this embodiment was integrated into PowerGraph, partitioning the graph at load time. Evaluation on real graphs in real geo-distributed DCs and in simulation shows that, compared with Geo-Cut, the state-of-the-art performance and cost optimization algorithm for geo-distributed graph processing systems, the method provided in this embodiment can reduce inter-DC data transmission time by up to 72% and monetary cost by up to 63%, with well-balanced load.

The embodiments provided in this application can be applied to many scenarios. For example, Facebook receives terabytes of text, image and video data every day from users around the world and has built four geo-distributed DCs to maintain and manage these data. When the load capacity of these DCs and the system response time are considered, the method provided in the embodiments of this application can be used to optimize the graph partitioning so that the DCs work stably while giving users a good experience. When network heterogeneity, cost budget and system performance in a geo-distributed environment are considered, the method can likewise be used to optimize the graph partitioning, yielding good performance improvements in both transmission time and cost budget.

It should be noted that the embodiments of this application only take the performance and cost optimization problem of the geo-distributed graph partitioning process as an example to explain how the graph processing method works. In fact, the processing model formed by the distributed graph processing method proposed in this embodiment is a general model: it can solve not only the performance and cost optimization problem of the above geo-distributed graph processing system, but also problems such as load balancing and performance optimization; for different optimization objectives, only different score-calculation schemes and different weight-vector calculation schemes need to be designed.
实施例2
本申请实施例提供一种基于强化学习的通用分布式图处理***,如图3所示,包括:
分布式图定义及约束条件设置模块10,用于基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割。此模块执行实施例1中的步骤S10所描述的方法,在此不再赘述。
动作选择模块11,用于为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心。此模块执行实施例1中的步骤S11所描述的方法,在此不再赘述。
顶点迁移模块12,学习自动机用于将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则,不做任何操作。此模块执行实施例1中的步骤S12所描述的方法,在此不再赘述。
分数计算模块13,每个学习自动机用于计算其顶点在每一个数据处理中心时的分数, 所述分数根据所述预设约束条件确定。此模块执行实施例1中的步骤S13所描述的方法,在此不再赘述。
强化信号计算模块14,每个学习自动机用于将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;此模块执行实施例1中的步骤S14所描述的方法,在此不再赘述。
概率更新模块15,学习自动机用于根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;此模块执行实施例1中的步骤S15所描述的方法,在此不再赘述。
分割结果获取模块16,用于直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。此模块执行实施例1中的步骤S16所描述的方法,在此不再赘述。
本申请实施例提供的基于强化学习的通用分布式图处理***,基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件利用强化学习的方式对分布式图切割,给每一个顶点分配一个学习自动机,通过训练为顶点找到最适合的数据处理中心,每个顶点在所有数据处理中心的可能性服从一定的概率分布,整个***在每个迭代过程中包含动作选择、顶点迁移、分数计算、强化信号计算、概率更新五个步骤,达到最大迭代次数或者约束条件已经收敛,判断迭代结束。本申请提供通用分布式图处理方法形成的分布式图处理模型是一个通用的分布式图模型,对于不同的优化目标只需要设计不同的分数计算方案以及不同的权重向量。
实施例3
本申请实施例提供一种计算机设备,如图4所示,该设备可以包括处理器51和存储器52,其中处理器51和存储器52可以通过总线或者其他方式连接,图4以通过总线连接为例。
处理器51可以为中央处理器(Central Processing Unit,CPU)。处理器51还可以为其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等芯片,或者上述各类芯片的组合。
存储器52作为一种非暂态计算机可读存储介质,可用于存储非暂态软件程序、非暂态计算机可执行程序以及模块,如本申请实施例中的对应的程序指令/模块。处理器51通过运行存储在存储器52中的非暂态软件程序、指令以及模块,从而执行处理器的各种功能应用以及数据处理,即实现上述方法实施例中的基于强化学习的通用分布式图处理方法。
存储器52可以包括存储程序区和存储数据区,其中,存储程序区可存储操作***、至少一个功能所需要的应用程序;存储数据区可存储处理器51所创建的数据等。此外,存储器52可以包括高速随机存取存储器,还可以包括非暂态存储器,例如至少一个磁盘存储器件、闪存器件、或其他非暂态固态存储器件。在一些实施例中,存储器52可 选包括相对于处理器51远程设置的存储器,这些远程存储器可以通过网络连接至处理器51。上述网络的实例包括但不限于互联网、企业内部网、企业内网、移动通信网及其组合。
一个或者多个模块存储在存储器52中,当被处理器51执行时,执行实施例1中的基于强化学习的通用分布式图处理方法。
上述计算机设备具体细节可以对应参阅实施例1中对应的相关描述和效果进行理解,此处不再赘述。
本领域技术人员可以理解,实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)、随机存储记忆体(Random Access Memory,RAM)、快闪存储器(Flash Memory)、硬盘(Hard Disk Drive,缩写:HDD)或固态硬盘(Solid-State Drive,SSD)等;存储介质还可以包括上述种类的存储器的组合。
显然,上述实施例仅仅是为清楚地说明所作的举例,而并非对实施方式的限定。对于所属领域的普通技术人员来说,在上述说明的基础上还可以做出其它不同形式的变化或变动。这里无需也无法对所有的实施方式予以穷举。而由此所引申出的显而易见的变化或变动仍处于本申请的保护范围之中。

Claims (12)

  1. 一种基于强化学习的通用分布式图处理方法,其特征在于,包括如下步骤:
    基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割;
    为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心;
    学习自动机将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则不做任何操作;
    每个学习自动机计算其顶点在每一个数据处理中心时的分数,所述分数根据所述预设约束条件确定;
    每个学习自动机将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;
    学习自动机根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;
    直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。
  2. 根据权利要求1所述的基于强化学习的通用分布式图处理方法,其特征在于,所述预设图切割模型为hybrid-cut图切割模型,所述预设图处理模型为GAS图处理模型,利用GAS图处理模型迭代执行顶点计算,所述约束条件为资金预算成本及数据传输时间最小。
  3. 根据权利要求2所述的基于强化学习的通用分布式图处理方法,其特征在于,所述数据传输时间表示为收集阶段和应用阶段的数据传输时间之和,第i次迭代的数据传输时间T(i)的计算公式为:
    Figure PCTCN2021076484-appb-100001
    其中,
    Figure PCTCN2021076484-appb-100002
    Figure PCTCN2021076484-appb-100003
    其中，
    Figure PCTCN2021076484-appb-100004
    为1时，表示数据处理中心DCr中的顶点v是master，
    Figure PCTCN2021076484-appb-100005
    为0时，表示DCr中的顶点v不是master；
    Figure PCTCN2021076484-appb-100006
    为1时,表示DCr中的顶点v是high-degree,
    Figure PCTCN2021076484-appb-100007
    为0时,DCr中的顶点v是low-degree;
    Figure PCTCN2021076484-appb-100008
    表示在第i次迭代中的收集r阶段从DC r的副本中向master顶点v传送数据量的大小;
    a v(i)表示在第i次迭代中的应用阶段中从master顶点v向每一个副本发送数据量的大小;
    U r/D r表示DCr的上传/下载带宽;
    R v表示包含v的副本的数据处理中心DC的集合;
    数据处理中心DC之间的通信成本为在收集阶段和应用阶段的上传数据的成本之和,从DC r将数据上传至网络的单元成本为P r,所述资金预算成本表示为:
    Figure PCTCN2021076484-appb-100009
    约束条件为:
    min T(i)   (3)
    C comm(i)≤B   (4)
    其中,B为使用网络资源的资金预算。
  4. 根据权利要求3所述的基于强化学习的通用分布式图处理方法,其特征在于,初始化各顶点在各数据处理中心的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心的步骤,包括:
    初始化顶点v在数据处理中心DC i的概率P(v i)为
    Figure PCTCN2021076484-appb-100010
    M为分布式DC的数量;
    根据顶点的概率分布获取顶点对于各数据处理中心DC的累积概率,Q(v i)表示顶点v在数据处理中心DC i的累积概率,其中,
    Figure PCTCN2021076484-appb-100011
    随机生成一个浮点数r∈[0,1],如果r小于等于Q(v 0),则DC 0将被选中;如果r介于Q(v k-1)与Q(v k)(k≥1)之间时,则数据处理中心DCk被选中。
  5. 根据权利要求3所述的基于强化学习的通用分布式图处理方法,其特征在于,初 始化各顶点在各数据处理中心的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心的步骤,包括:
    预设一试错参数τ,随机生成一个浮点数r∈[0,1],如果r≤τ,则学习自动机为其顶点随机选择一个DC;如果r>τ,则学习自动机为其顶点选择P(v i)值最大的数据处理中心DC。
  6. 根据权利要求4或5所述的基于强化学习的通用分布式图处理方法,其特征在于,每个学习自动机计算其顶点在每一个数据处理中心时的分数,通过以下公式计算:
    Figure PCTCN2021076484-appb-100012
    Figure PCTCN2021076484-appb-100013
    其中,
    Figure PCTCN2021076484-appb-100014
    表示顶点v在DCi时的分数,B表示使用网络资源的资金预算,T b表示计算分数之前***整体的数据传输时间,C b表示计算分数之前***整体的数据传输成本,
    Figure PCTCN2021076484-appb-100015
    表示计算顶点在DCi时***整体的数据传输时间,
    Figure PCTCN2021076484-appb-100016
    表示计算顶点在DCi时***整体的数据传输成本,tw与cw分别表示时间权重以及资金成本权重;在C b≥B时,cw随着迭代次数的增加从1均匀减少至0,tw随着迭代次数的增加从0均匀增加至1;当C b<B时,tw随着迭代次数的增加从1均匀减少至0,cw随着迭代次数的增加从0均匀增加至1。
  7. 根据权利要求6所述的基于强化学习的通用分布式图处理方法,其特征在于,每个学习自动机将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号的步骤,包括:
    计算权重向量的参考标准,通过如下公式计算:
    Figure PCTCN2021076484-appb-100017
    其中,
    Figure PCTCN2021076484-appb-100018
    表示当顶点u收到其邻居v传播的标签ρ v时,其计算权重向量的参考标准,ρ v表示顶点v最大分数对应的DC,Nbr(v)表示顶点v的邻居顶点集合;
    Figure PCTCN2021076484-appb-100019
    为将顶点v移动至ρ v,再将顶点u移动至ρ v后***整体的数据传输时间;
    Figure PCTCN2021076484-appb-100020
    表示将顶点v移动至ρ v后***整体的数据传输时间;
    Figure PCTCN2021076484-appb-100021
    表示将顶点v移动至ρ v后***整体的资金成本;
    Figure PCTCN2021076484-appb-100022
    为将顶点v移动至ρ v,再将顶点u移动至ρ v后***整体的资金成本;
    顶点u在计算完参考标准后,其权重向量更新公式如下:
    Figure PCTCN2021076484-appb-100023
    Figure PCTCN2021076484-appb-100024
    表示顶点u对于DCρ v的权重向量,初始化为0;
    在计算完顶点对于所有数据处理中心的权重向量后,学习自动机根据权重向量计算出相应的强化信号,计算公式如下:
    Figure PCTCN2021076484-appb-100025
    Figure PCTCN2021076484-appb-100026
    Figure PCTCN2021076484-appb-100027
    表示顶点u对于数据处理中心DCi的强化信号,取值为0或者1,分别表示奖励、惩罚信号,
    Figure PCTCN2021076484-appb-100028
    表示顶u对于数据处理中心DC i的权重向量,初始化为0。
  8. 根据权利要求7所述的基于强化学习的通用分布式图处理方法,其特征在于,在更新顶点在每一个数据处理中心的概率值的概率值之前,需要获取正则化权重,分为奖励和惩罚正则化权重两部分,其中:
    Figure PCTCN2021076484-appb-100029
    表示顶点v对于DCi的奖励正则化权重,通过以下公式计算:
    Figure PCTCN2021076484-appb-100030
    其中,Neg()为取反函数,
    Figure PCTCN2021076484-appb-100031
    表示顶点v对于数据处理中心DCi的强化信号,
    Figure PCTCN2021076484-appb-100032
    表示顶点v对于DC i的权重向量,
    Figure PCTCN2021076484-appb-100033
    表示顶点v对于DCk的权重向量;
    Figure PCTCN2021076484-appb-100034
    表示顶点v对于DCi的惩罚正则化权重,通过以下公式计算:
    Figure PCTCN2021076484-appb-100035
    其中,
    Figure PCTCN2021076484-appb-100036
    表示顶点v对于数据处理中心DCi的强化信号,
    Figure PCTCN2021076484-appb-100037
    表示顶点v对于DC i的权重向量,
    Figure PCTCN2021076484-appb-100038
    表示顶点v对于DCk的权重向量。
  9. 根据权利要求8所述的基于强化学习的通用分布式图处理方法,其特征在于,根据正则化权重对顶点v的概率进行更新,更新顺序按照对于数据处理中心DC的奖励正则化权重从小到大进行,给定顶点v以及DC i
    Figure PCTCN2021076484-appb-100039
    在所有奖励正则化权重中最小,优先使用
    Figure PCTCN2021076484-appb-100040
    对所有DC进行概率更新,更新公式如下:
    Figure PCTCN2021076484-appb-100041
    其中,
    Figure PCTCN2021076484-appb-100042
    表示顶点v在第n次迭代中对于DC i的概率,α表示奖励权重,n为迭代次数,j和i均为顶点;
    接着学习自动机依次找到更大的
    Figure PCTCN2021076484-appb-100043
    再使用它对所有的DC进行概率更新;学习自动机更新顶点对于其强化信号为
    Figure PCTCN2021076484-appb-100044
    的DC,更新顺序按照对于DC的惩罚正则化权重从小到大进行,假设给定顶点v以及DC i、DC k,
    Figure PCTCN2021076484-appb-100045
    在所有惩罚正则化权重中最大,
    Figure PCTCN2021076484-appb-100046
    在所有惩罚正则化权重中最小,优先使用
    Figure PCTCN2021076484-appb-100047
    对所有DC进行概率更新,更新公式如下:
    Figure PCTCN2021076484-appb-100048
    其中β表惩罚权重,
    Figure PCTCN2021076484-appb-100049
    表示顶点v在第n次迭代中对于DC j的概率,n为迭代次数,j和i均为顶点;
    接着学习自动机会依次找到更大的
    Figure PCTCN2021076484-appb-100050
    以及对应的DC k,再使用
    Figure PCTCN2021076484-appb-100051
    对所有的DC进行概率更新;如果达到预设迭代次数或者约束条件已经收敛,则迭代结束;否则,进入N+1次迭代,第N+1次迭代中的动作选择会以第N次迭代更新后的概率为参考。
  10. 一种基于强化学习的通用分布式图处理***,其特征在于,包括:
    分布式图定义及约束条件设置模块,用于基于图论定义分布式数据处理中心形成分布式图,利用预设图切割模型及预设图处理模型,基于预设约束条件对分布式图进行切割;
    动作选择模块,用于为分布式图的每个顶点分配一个学习自动机,初始化各顶点在各数据处理中心的概率,基于初始化的概率,所述学习自动机按预设动作选择方法为顶点选择概率最大的数据处理中心;
    顶点迁移模块,学习自动机用于将为顶点选择概率最大的数据处理中心,与其顶点当前所在的数据处理中心作比较,如果不一致,则将顶点迁移至动作对应的数据处理中心中,否则,不做任何操作;
    分数计算模块,每个学习自动机用于计算其顶点在每一个数据处理中心时的分数,所述分数根据所述预设约束条件确定;
    强化信号计算模块,每个学习自动机用于将最大分数对应的数据处理中心号传播给其顶点的邻居所属的学习自动机,生成相应的权重向量,学习自动机根据所述权重向量为其顶点计算出所有数据处理中心对应的强化信号;
    概率更新模块,学习自动机用于根据所述权重向量以及强化信号,更新其顶点在每一个数据处理中心的概率值,指导下一次的动作选择进行迭代;
    分割结果获取模块,用于直至达到预设迭代次数或者所述约束条件收敛,生成满足预设约束条件的分布式图的分割结果。
  11. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机指令,所述计算机指令用于使所述计算机执行如权利要求1-9任一项所述的基于强化学习的通用分布式图处理方法。
  12. 一种计算机设备,其特征在于,包括:存储器和处理器,所述存储器和所述处理器之间互相通信连接,所述存储器存储有计算机指令,所述处理器通过执行所述计算机指令,从而执行如权利要求1-9任一项所述的基于强化学习的通用分布式图处理方法。
PCT/CN2021/076484 2020-05-27 2021-02-10 一种基于强化学习的通用分布式图处理方法及*** WO2021238305A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010462112.4 2020-05-27
CN202010462112.4A CN111539534B (zh) 2020-05-27 2020-05-27 一种基于强化学习的通用分布式图处理方法及***

Publications (1)

Publication Number Publication Date
WO2021238305A1 true WO2021238305A1 (zh) 2021-12-02

Family

ID=71980779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076484 WO2021238305A1 (zh) 2020-05-27 2021-02-10 一种基于强化学习的通用分布式图处理方法及***

Country Status (2)

Country Link
CN (1) CN111539534B (zh)
WO (1) WO2021238305A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113726342A (zh) * 2021-09-08 2021-11-30 中国海洋大学 面向大规模图迭代计算的分段差值压缩与惰性解压方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539534B (zh) * 2020-05-27 2023-03-21 深圳大学 一种基于强化学习的通用分布式图处理方法及***
CN113835899B (zh) * 2021-11-25 2022-02-22 支付宝(杭州)信息技术有限公司 针对分布式图学习的数据融合方法及装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2884453A1 (en) * 2013-12-12 2015-06-17 Telefonica Digital España, S.L.U. A computer implemented method, a system and computer program product for partitioning a graph representative of a communication network
US9208257B2 (en) * 2013-03-15 2015-12-08 Oracle International Corporation Partitioning a graph by iteratively excluding edges
CN105590321A (zh) * 2015-12-24 2016-05-18 华中科技大学 一种基于块的子图构建及分布式图处理方法
CN106970779A (zh) * 2017-03-30 2017-07-21 重庆大学 一种面向内存计算的流式平衡图划分方法
CN107222565A (zh) * 2017-07-06 2017-09-29 太原理工大学 一种网络图分割方法及***
CN109033191A (zh) * 2018-06-28 2018-12-18 山东科技大学 一种面向大规模幂律分布图的分割方法
CN111539534A (zh) * 2020-05-27 2020-08-14 深圳大学 一种基于强化学习的通用分布式图处理方法及***

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106953801B (zh) * 2017-01-24 2020-05-05 上海交通大学 基于层级结构学习自动机的随机最短路径实现方法
CN109889393B (zh) * 2019-03-11 2022-07-08 深圳大学 一种地理分布式图处理方法和***

Also Published As

Publication number Publication date
CN111539534A (zh) 2020-08-14
CN111539534B (zh) 2023-03-21

Similar Documents

Publication Publication Date Title
WO2021238305A1 (zh) Reinforcement learning-based general distributed graph processing method and system
US11018979B2 (en) System and method for network slicing for service-oriented networks
CN110968426B (zh) Online learning-based model optimization method for edge-cloud collaborative k-means clustering
CN108667657B (zh) SDN-oriented virtual network mapping method based on local feature information
CN107710696A (zh) Region-guided, change-tolerant fast shortest path algorithm and graph preprocessing framework
CN115066694A (zh) Computation graph optimization
US20190268234A1 (en) Capacity engineering in distributed computing systems
WO2021248937A1 (zh) Differential privacy-based geo-distributed graph computing method and system
CN111813506A (zh) Particle swarm algorithm-based resource-aware computation offloading method, apparatus, and medium
CN112100450A (zh) Graph computing data partitioning method, terminal device, and storage medium
CN114595049A (zh) Cloud-edge collaborative task scheduling method and apparatus
CN113821318A (zh) Cross-domain subtask combination collaborative computing method and system for the Internet of Things
CN113835899A (zh) Data fusion method and apparatus for distributed graph learning
CN111510334B (zh) Particle swarm algorithm-based VNF online scheduling method
Xu et al. A graph partitioning algorithm for parallel agent-based road traffic simulation
Garg et al. Heuristic and reinforcement learning algorithms for dynamic service placement on mobile edge cloud
CN109889393B (zh) Geo-distributed graph processing method and system
CN115587222B (zh) Distributed graph computing method, system, and device
CN116562364A (zh) Knowledge distillation-based collaborative inference method, apparatus, and device for deep learning models
Chen et al. Deep reinforcement learning based container cluster placement strategy in edge computing environment
CN115965070B (zh) Computation graph processing method, apparatus, device, storage medium, and program product
CN117707795B (zh) Edge-device collaborative inference method and system based on graph-based model partitioning
Zhang Multihop Transmission‐Oriented Dynamic Workflow Scheduling in Vehicular Cloud
CN117763214A (zh) Local community dynamic detection method, apparatus, electronic device, and storage medium
CN118228762A (zh) Joint graph substitution and parallelization optimization method and apparatus for deep neural network inference

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21813316; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.03.2023)
122 Ep: PCT application non-entry in European phase
    Ref document number: 21813316; Country of ref document: EP; Kind code of ref document: A1