WO2014183531A1 - Method and device for allocating remote memory - Google Patents

Method and device for allocating remote memory Download PDF

Info

Publication number
WO2014183531A1
WO2014183531A1 · PCT/CN2014/075674
Authority
WO
WIPO (PCT)
Prior art keywords
memory
node
requester
nodes
distribution table
Prior art date
Application number
PCT/CN2014/075674
Other languages
French (fr)
Chinese (zh)
Inventor
张立新
侯锐
张柳航
张科
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2014183531A1 publication Critical patent/WO2014183531A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/502Proximity

Definitions

  • the present invention relates to the field of cloud computing, and in particular, to a method and apparatus for allocating remote memory.
  • cloud computing relies on data centers that vary in size, typically consisting of dozens, hundreds, or even tens of thousands of computer server nodes, most of which are commercial standard servers with their own independent processors, private memory areas, and exclusive hard disk storage. A single server node clearly cannot meet the memory requirements of cloud computing, so once each server node in a cluster can use the memory of other, remote nodes, how to allocate that remote memory reasonably and efficiently across a large number of nodes becomes a significant problem.
  • an object of embodiments of the present invention is to provide a method and apparatus for allocating remote memory to solve the problem that the remote memory cannot be efficiently allocated in a cluster.
  • an embodiment of the present invention provides a method for allocating remote memory, which is used for a server node cluster, and the method includes:
  • establishing a node distribution table, the node distribution table including the memory size each node can contribute and the connection relationship between the nodes;
  • when a node, as a requester, requests remote memory, searching the node distribution table, centered on the requester and proceeding from near to far by distance, for nodes that can serve as contributors, and allocating remote memory to the requester, where the distance includes the number of routing hops from other nodes to the requester.
  • finding nodes that can serve as contributors from near to far according to distance and allocating remote memory to the requester includes:
  • step a: judging, in order from small to large contributable memory, whether one or more of the peripheral nodes within the current search range can, separately or jointly, provide the memory required by the requester and serve as contributors; if such nodes exist, the memory the contributors can contribute is allocated to the requester; if not, i is incremented by 1 and step a is performed again.
  • judging, in order from small to large, whether one or more of the peripheral nodes can, separately or jointly, provide the memory required by the requester and serve as contributors includes:
  • if the requested memory capacity is less than or equal to the memory capacity one of the peripheral nodes can contribute, the comparison is stopped and that node is used as the contributor;
  • if the requested memory capacity is greater than the memory capacity any single peripheral node can contribute, selecting two or more of the peripheral nodes to jointly contribute memory satisfying the requested capacity, the selected nodes being the contributors, where the selection strategy is to keep the number of selected nodes to a minimum.
  • the method further includes:
  • before step a, it is judged whether i has reached a preset threshold, and if so, execution stops and application-failure information is returned.
  • the method further includes:
  • the memory size the contributor has contributed to the requester is notified, and the memory size the contributor can contribute is changed in the node distribution table.
  • the method further includes:
  • a status request signal is periodically sent to each node in the server node cluster;
  • if a node returns a normal heartbeat signal, the node is kept in the node distribution table; otherwise the node is deleted from the node distribution table.
  • the method further includes: receiving a message indicating that the requester's request to use the allocated remote memory received no response, and re-performing the allocation of remote memory.
  • an embodiment of the present invention further provides an apparatus for allocating a remote memory, which is used for a server node cluster, and the apparatus includes:
  • a node distribution table establishing unit configured to establish a node distribution table, where the node distribution table includes a memory size that each node can contribute and a connection relationship between the nodes;
  • a memory request judging unit configured to determine whether a node requests the remote memory as a requester, and if so, triggers the memory allocation unit;
  • a memory allocation unit configured to search the node distribution table, centered on the requester and from near to far by distance, for nodes that can serve as contributors and to allocate remote memory to the requester, where the distance includes the number of routing hops from other nodes to the requester.
  • the memory allocation unit comprises:
  • a node sorting subunit configured to sort the peripheral nodes by contributable memory from small to large;
  • a memory selection subunit configured to determine, in that order from small to large, whether one or more of the peripheral nodes can, separately or jointly, provide the memory required by the requester and serve as contributors; if so, the memory the contributors can contribute is allocated to the requester, and if not, the flow control subunit is triggered;
  • a flow control subunit configured to increment i by 1 and then trigger the memory sum judgment subunit.
  • where the memory selection subunit, when determining in order from small to large whether one or more of the peripheral nodes can, separately or jointly, provide the memory required by the requester and serve as contributors, is specifically configured to:
  • stop the comparison if the requested memory capacity is less than or equal to the memory capacity one of the peripheral nodes can contribute, and use that node as the contributor; and
  • if the requested memory capacity is greater than the memory capacity any single peripheral node can contribute, select two or more of the peripheral nodes to jointly contribute memory satisfying the requested capacity, the selected nodes being the contributors, where the selection strategy is to keep the number of selected nodes to a minimum.
  • the memory allocation unit further includes:
  • the end judging subunit is configured to judge whether the i reaches the preset threshold before triggering the memory sum judgment subunit, and if so, stop executing and return the application failure information.
  • the device further includes:
  • a memory change response unit configured to notify a memory size that the contributor has contributed to the requester, and change a memory size that the contributor can contribute in the node distribution table.
  • the device further includes:
  • a node maintenance unit configured to periodically send a status request signal to each node in the server node cluster and, if a node returns a normal heartbeat signal, keep the node in the node distribution table; otherwise, delete the node from the node distribution table.
  • the device further includes:
  • An exception handling unit configured to receive a message indicating that the requester's request to use the allocated remote memory received no response, and to re-trigger the memory allocation unit.
  • In the embodiments of the present invention, a node distribution table reflecting the physical location of each server node in the cluster and the memory size each can contribute is first created and maintained by the resource management system, and a physical-location-sensitive allocation strategy is then applied when allocating remote memory: taking the distance between requester and provider into account, remote memory is assigned to the requester on a shortest-path-first basis, so the requester not only obtains the required memory capacity but, thanks to the shortest path, also uses the obtained memory more efficiently.
  • FIG. 1 is a schematic diagram of system components based on a cloud control chip
  • FIG. 2 is a schematic diagram of the composition and architecture of a cloud control chip
  • Figures 3a-3b are schematic diagrams of implementing on-demand resource allocation
  • Figure 5 is a schematic diagram of a topology structure of a node distribution table
  • Figure 6 is a detailed flow chart for allocating remote memory in a near and far manner
  • FIGS. 7 to 11 are schematic diagrams showing changes in a node distribution table corresponding to each step in the second embodiment of the present invention.
  • Figure 12 is a schematic illustration of a third embodiment of the present invention.
  • Remote Direct Memory Access (RDMA) can be used when remote memory is required.
  • RDMA allows computers to access the memory of other computers directly, without time-consuming involvement of the processor: one computer can transfer data over the network directly into another computer's memory, quickly moving data from one system into remote system memory without any impact on the operating system.
  • Figure 1 shows the flow of RDMA data streams. Compared with a traditional buffer copy, or a buffer copy with a DMA engine, RDMA frees up bus bandwidth and CPU cycles by eliminating external memory copies and context switches, improving application performance while reducing the demand for bandwidth and processor overhead and significantly reducing latency.
  • however, RDMA is a point-to-point protocol: not only must a dedicated network card (such as a high-end Ethernet card or an InfiniBand card) be installed on each server, making the hardware cost of RDMA high, but more importantly, RDMA technology cannot allocate remote memory resources reasonably and efficiently across the data center; that is, nodes cannot dynamically use the memory of other remote nodes in the cluster.
  • in order to allow remote memory to be borrowed between server nodes, that is, to dynamically allocate resources, especially memory resources, in the data center, in the present invention every node includes a cloud control chip and is connected to the other nodes in the system through the same interface; the chip can be integrated on one board with different processors and other components.
  • the cloud control chip provides PCIe and an independently designed interface for connecting the processor chip.
  • the self-designed interface is optimized for direct communication between the processor chip and the cloud control chip to connect the autonomously controllable processor chip.
  • the PCIe interface can be connected to any processor with a PCIe port; its communication efficiency is limited by PCIe, but it is compatible with most CPU chips on the market as well as other compute-acceleration chips such as GPUs and FPGAs.
  • FIGS. 1a and 1b are schematic diagrams of system components based on the cloud control chip; FIG. 1a shows an ARM compute node, and FIG. 1b shows a large-capacity memory node.
  • the remote memory usage method and mechanism proposed by the present invention are implemented by a cloud control chip and a resource management system.
  • the cloud control chip is mainly composed of an on-chip switching network, an integrated memory control module, an I/O device virtualization controller, a hardware-implemented communication protocol stack, and a PCIe interface and a service processing core.
  • FIG. 2 shows a cloud control chip. Composition and architecture.
  • the use of remote memory is performed through a software API (Application Programming Interface): first, an application is initiated to the resource management system and resources of multiple nodes are dynamically combined into a virtual server through the interconnection network; after the combination succeeds, these physical resources are used exclusively by the virtual server; after use, the resources are returned to the resource management system.
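The three-phase lifecycle described above (apply to the resource management system, use the combined resources exclusively, return them after use) can be sketched as follows. This is an illustrative Python sketch; `ResourceManager`, `apply`, and `release` are hypothetical names, not the patent's actual API.

```python
class ResourceManager:
    """Hypothetical sketch of the apply / use / return lifecycle."""

    def __init__(self, free_memory_gb):
        self.free = free_memory_gb   # total contributable memory (GB)
        self.leases = {}             # requester -> granted capacity (GB)

    def apply(self, requester, capacity_gb):
        # Phase 1: the requester applies for remote memory.
        if capacity_gb > self.free:
            raise MemoryError("application failed: not enough contributable memory")
        self.free -= capacity_gb
        self.leases[requester] = capacity_gb
        return capacity_gb           # used exclusively by the virtual server

    def release(self, requester):
        # Phase 3: after use, the resources are returned to the manager.
        self.free += self.leases.pop(requester)

rm = ResourceManager(free_memory_gb=16)
granted = rm.apply("node-6", 8)      # phase 1: apply
# phase 2: the granted 8 GB is used exclusively here
rm.release("node-6")                 # phase 3: return
```

The key property of the scheme is that granted memory is exclusive for the lease's lifetime and is returned to the common pool afterwards.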
  • remote node access is guaranteed by adding corresponding hardware management and address translation mechanisms, as well as high-speed I/O communication stacks.
  • the resource management system of the data center can be centralized in one server node or managed in a distributed manner; it is responsible for collecting, managing, and allocating the resources of each node, including the memory resources contributed by each node, and for managing the cloud control chip in each node.
  • RTLB (Remote Translation Lookaside Buffer): a translation table from a local physical address to a remote node number and a remote node physical address.
  • nodes that can provide or contribute memory capacity to other server nodes are called contributing nodes or contributors; the memory space the contributing nodes provide to other nodes in the data center is called contributed memory; the nodes that apply for and use this contributed memory are called requesting nodes or requesters.
  • FIG. 3a shows a schematic diagram of implementing resource on-demand allocation.
  • FIG. 3a shows four independent traditional computer nodes; after processing by the above technology of the present invention, the various resources are separately formed into a computing cloud, a memory cloud, a storage cloud, and an interconnection cloud, as shown in FIG. 3b, eliminating the concept and boundaries of traditional computer nodes.
  • through resource scheduling and on-demand allocation, multiple virtual servers are formed.
  • the resources within the dotted-line frame on the left side of Figure 3b form the first virtual server;
  • the second virtual server, formed by the dotted-line frame on the right side, performs effective resource scheduling and sharing with the first.
  • Embodiment 1: The above briefly describes how the present invention enables remote memory to be used between nodes, that is, how dynamic allocation of memory within a cluster becomes a reality. How to allocate remote memory reasonably and efficiently, then, is the further problem the present invention solves, described in detail below.
  • FIG. 4 is a flowchart of a method of an embodiment of the present invention, a method for allocating remote memory used in a server node cluster, and the method includes:
  • the node distribution table includes a memory size that each node can contribute and a connection relationship between the nodes, and the connection relationship between the nodes can also be said to be a topology structure between the nodes.
  • there may be only one such node distribution table, stored in a global administrator (e.g., a resource management system).
  • There are many ways to record the topology structure: for example, the connection between every two nodes can be recorded directly; or, for each node, all nodes directly or indirectly connected to it can be recorded; it is even possible to record only the nodes directly connected to each node (indirect connections between any two nodes can then be derived).
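As an illustration of the last option above (recording only directly connected nodes, alongside the contributable memory size), one possible in-memory shape of the node distribution table is sketched below. Node names, capacities, and helper names are invented for the example, not taken from the patent.

```python
# Hypothetical node distribution table: for each node, the memory it can
# contribute (GB) and the set of directly connected nodes. Indirect
# connections can be derived by following the direct links.
node_table = {
    "A": (4, {"B", "C"}),
    "B": (0, {"A", "D"}),
    "C": (8, {"A", "D"}),
    "D": (2, {"B", "C"}),
}

def contributable(node):
    """Memory size (GB) this node can contribute."""
    return node_table[node][0]

def neighbours(node):
    """Nodes directly connected to this node (one link away)."""
    return node_table[node][1]
```

Storing only direct links keeps the table small; any indirect connection is recoverable by traversing the neighbour sets.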
  • the resource management system is responsible for allocating remote memory; it may be centralized in one server node or managed in a distributed manner, and it collects, manages, and allocates the resources of each node, including the memory resources each node contributes, and initializes, sets, and updates the RTLB in the cloud control chip on each node.
  • the basic idea in the allocation is to consider the distance between the requester and the provider, that is, the shortest path first principle.
  • the global resource management system monitors each node's parameters (including physical location, memory usage, load status, health status, etc.) in real time, and then selects the appropriate memory provider based on the physical location of the requester.
  • the resource management system preferentially selects providers within a certain distance (such as nodes within a certain number of hops in the network).
  • when a node, as a requester, requests remote memory, the node distribution table is searched, centered on the requester and from near to far by distance, for nodes that can serve as contributors, and remote memory is allocated to the requester, where the distance includes the number of routing hops from other nodes to the requester.
  • the distance described in this embodiment mainly means the number of hops on the route from another node to the requester, which may also be regarded as the number of communication forwardings.
  • a hop ordinarily refers to the step from a host (or router) to the next router; since the present invention uses a cloud control chip with an embedded routing function, a hop here can also refer to the step from one server node to the next. From the perspective of the topology, the distance between two nodes is the minimum number of edges on a path connecting them; if only one edge is needed, the two nodes are directly connected.
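Under that definition, the hop distance between two nodes is the length of the shortest path in the topology, which a breadth-first search computes directly. A minimal sketch (the adjacency map is an invented example, not from the patent):

```python
from collections import deque

def hop_distance(adjacency, src, dst):
    """Minimum number of edges (hops) between two nodes, via BFS."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in adjacency[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # unreachable

adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
# A and C are connected only through B, so their distance is 2 hops.
```

Directly connected nodes are exactly those at distance 1.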
  • preferably, finding nodes that can serve as contributors according to distance and allocating remote memory to the requester may include:
  • step a: judging, in order from small to large contributable memory, whether one or more of the peripheral nodes within the current search range can, separately or jointly, provide the memory required by the requester and serve as contributors; if such nodes exist, the memory the contributors can contribute is allocated to the requester; if not, i is incremented by 1 and step a is performed again.
  • preferably, judging in order from small to large whether one or more of the peripheral nodes can, separately or jointly, provide the memory required by the requester includes:
  • if the requested memory capacity is less than or equal to the memory capacity one of the peripheral nodes can contribute, the comparison is stopped and that node is used as the contributor;
  • if the requested memory capacity is greater than the memory capacity any single peripheral node can contribute, two or more of the peripheral nodes are selected to jointly contribute memory satisfying the requested capacity, and the selected nodes are taken as the contributors, where the selection strategy is to keep the number of selected nodes to a minimum.
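The two rules above (take the smallest single node that suffices, otherwise combine nodes while keeping their number minimal) can be sketched as a small selection routine. This is an illustrative reading of the strategy, not the patented implementation; taking nodes largest-first is one simple way to keep the number of joint contributors small.

```python
def pick_contributors(candidates, requested_gb):
    """candidates: list of (node, contributable_gb) pairs, any order.
    Returns the chosen contributors, or None if all together still fall short."""
    ordered = sorted(candidates, key=lambda nc: nc[1])  # small to large
    for node, cap in ordered:           # smallest single node that suffices
        if cap >= requested_gb:
            return [node]
    chosen, total = [], 0               # otherwise combine, largest first
    for node, cap in reversed(ordered):
        chosen.append(node)
        total += cap
        if total >= requested_gb:
            return chosen
    return None
```

Picking the smallest sufficient single node avoids wasting large contributable regions on small requests, matching the small-to-large sorting in the text.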
  • the method may further include:
  • before step a, it is judged whether i has reached a preset threshold, and if so, execution stops and application-failure information is returned.
  • specifically, the requesting node initiates a remote memory request containing the requested memory capacity. The resource management system first searches for the peripheral nodes one hop away from the requesting node, collects information on the memory those nodes can contribute, and sorts it by capacity from small to large, thereby avoiding waste of memory.
  • the requested memory capacity is then compared with the contributable capacities from small to large: if the requested capacity is less than the contributable capacity of some node, the comparison stops and that node's contributed memory region is assigned to the requesting node; if the requested capacity is greater than the contributable capacity of any single node, two or more of the one-hop peripheral nodes are selected to jointly contribute memory satisfying the requested capacity, with a selection strategy that keeps the number of contributing nodes as small as possible.
  • if the one-hop nodes cannot satisfy the request, the resource management system searches for peripheral nodes two hops from the requesting node, counts the contributable memory of the one-hop and two-hop nodes together, and sorts it from small to large. If the total contributable capacity within one and two hops is greater than or equal to the requested capacity, the requested capacity is compared with the contributable capacities from small to large and the contributing nodes are selected, specifically in the same way as within one hop above.
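The near-to-far flow above (try the 1-hop ring; if its total falls short, widen to 2 hops and re-run the same small-to-large comparison over the combined candidates) can be sketched as follows. All names are illustrative assumptions; `table` maps each node to its contributable memory in GB and its direct neighbours.

```python
def allocate_remote_memory(table, requester, requested_gb, max_hops=3):
    """Near-to-far allocation sketch; table: node -> (contributable_gb, neighbours)."""
    # Group nodes by hop distance from the requester (ring-by-ring expansion).
    ring, seen, by_hops = {requester}, {requester}, {}
    for hops in range(1, max_hops + 1):
        ring = {n for node in ring for n in table[node][1]} - seen
        if not ring:
            break
        seen |= ring
        by_hops[hops] = ring

    candidates = []                       # (node, contributable_gb) found so far
    for hops in sorted(by_hops):
        candidates += [(n, table[n][0]) for n in by_hops[hops] if table[n][0] > 0]
        if sum(cap for _, cap in candidates) < requested_gb:
            continue                      # total still short: widen the ring (i = i + 1)
        candidates.sort(key=lambda nc: nc[1])   # small to large, as in the text
        for node, cap in candidates:      # smallest single node that suffices
            if cap >= requested_gb:
                return [node]
        chosen, got = [], 0               # otherwise combine, largest first
        for node, cap in reversed(candidates):
            chosen.append(node)
            got += cap
            if got >= requested_gb:
                return chosen
    return None                           # search exhausted: application fails

tbl = {
    "R": (0, {"A", "B"}),
    "A": (4, {"R"}),
    "B": (2, {"R", "C"}),
    "C": (8, {"B"}),
}
```

With `tbl`, a 3G request is served by the one-hop node A alone, while a 10G request must widen to two hops and combine C and A.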
  • Figure 6 is a detailed flow chart of allocating remote memory from near to far as described above.
  • the method may further include:
  • the memory capacity the contributor has contributed to the requester is notified, and the memory capacity the contributor can contribute is changed in the node distribution table.
  • the method may further include:
  • if a node does not return a normal heartbeat, the resource management system can consider that the contributing node has stopped contributing and delete the corresponding entry from the table; in addition, if some node is still using that node's memory, the resource management system needs to handle that node accordingly.
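A minimal sketch of this heartbeat-driven table maintenance; the `probe` callback and node names are invented for illustration, and `probe` stands in for the periodic status request described above.

```python
def refresh_node_table(node_table, probe):
    """Remove from the table every node that fails to answer the status request.
    probe(node) -> True if the node returned a normal heartbeat signal."""
    for node in list(node_table):
        if not probe(node):
            del node_table[node]   # node is considered to have stopped contributing
    return node_table

table = {"A": 4, "B": 8, "C": 0}   # node -> contributable memory (GB)
alive = {"A", "C"}                 # only these answer the heartbeat
refresh_node_table(table, probe=lambda n: n in alive)
```

Iterating over `list(node_table)` is what allows deleting entries safely while walking the table.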
  • the method may further include:
  • a means of information gathering may further be added: the cloud control chip collects the link bandwidth utilization and memory utilization of the current node and transmits this information to the service processor core of each chip, and the service processor core then cooperates with the resource management and scheduling system to dynamically schedule resources such as bandwidth and memory.
  • It can be seen that in this embodiment the resource management system first creates and maintains a node distribution table reflecting the physical location of each server node in the cluster and the memory size each can contribute, and then applies a physical-location-sensitive allocation strategy when allocating remote memory: taking the distance between requester and provider into account, remote memory is assigned to the requester on a shortest-path-first basis, so the requester not only obtains the required memory capacity but, thanks to the shortest path, also uses the obtained memory more efficiently.
  • FIGS. 7 to 11 are schematic diagrams showing the changes in the node distribution table corresponding to each step of the second embodiment of the present invention:
  • Figure 7: at some point, Core6 needs an extra 8G of memory, but its own memory cannot meet the demand, so it makes a memory allocation request to the resource management system.
  • the resource management system classifies the nodes by their distance to the requester, dividing them into a 1-hop range, a 2-hop range, and so on.
  • "empty 4G” indicates that there is 4G memory space on this node
  • "4G” indicates that 4G memory is being used on this node.
  • Figure 8: the resource management system first requests physical memory for the requesting node from nodes at a near distance. A single node need not satisfy the request all at once and can provide partial help, in units of 1G; for example, a request is first made to Core2.
  • Figure 9: Core2 can only provide 4G. After provisioning, Core2 and Core6 must modify the memory configuration tables in their memory controllers, and the resource management system continues with a request to Core5.
  • Figure 10: Core5 provides the remaining 4G to the requester, and the memory configuration tables in the Core5 and Core6 memory controllers are changed accordingly. At this point the allocation task ends; as long as the resource management system processes only one allocation task at a time, consistency can be guaranteed.
  • Figure 11: when Core6 no longer needs to occupy other nodes' physical memory, it applies to the resource management system to release it.
  • the resource management system releases the memory of the corresponding nodes through the memory configuration options in the Core6 memory manager.
  • Core2 and Core5 release the memory previously allocated to Core6 and modify the memory configuration tables in their memory controllers.
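The whole Core6 walk-through (8G requested, 4G from Core2, the remaining 4G from Core5, then release) can be replayed as a small simulation. Capacities and node names follow the figures; `borrow` and `release_all` are illustrative helper names, not the patent's API.

```python
free = {"Core2": 4, "Core5": 4, "Core6": 0}   # contributable memory, in GB
borrowed = {}                                  # requester -> {contributor: GB}

def borrow(requester, contributor, amount_gb):
    granted = min(amount_gb, free[contributor])   # partial help is allowed
    free[contributor] -= granted
    borrowed.setdefault(requester, {})[contributor] = granted
    return granted

def release_all(requester):
    # The contributors reclaim the memory previously allocated to the requester.
    for contributor, amount in borrowed.pop(requester).items():
        free[contributor] += amount

need = 8
need -= borrow("Core6", "Core2", need)   # Core2 can only provide 4G
need -= borrow("Core6", "Core5", need)   # Core5 provides the remaining 4G
release_all("Core6")                     # Core6 no longer needs the memory
```

After release, the contributable capacities of Core2 and Core5 are restored to their original values, mirroring the table changes in Figures 9 through 11.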
  • It can be seen that in this embodiment, too, the resource management system first creates and maintains a node distribution table reflecting the physical location of each server node in the cluster and the memory size each can contribute, and then applies a physical-location-sensitive allocation strategy when allocating remote memory: taking the distance between requester and provider into account, remote memory is assigned to the requester on a shortest-path-first basis, so the requester not only obtains the required memory capacity but, thanks to the shortest path, also uses the obtained memory more efficiently.
  • FIG. 12 is a schematic diagram of a device according to a third embodiment of the present invention.
  • this embodiment provides a device 1200 for allocating remote memory, and the device 1200 includes:
  • the node distribution table establishing unit 1201 is configured to establish a node distribution table, where the node distribution table includes a memory size that each node can contribute and a connection relationship between the nodes;
  • the memory request judging unit 1202 is configured to determine whether a node requests the remote memory as a requester, and if so, triggers the memory allocating unit 1203;
  • the memory allocating unit 1203 is configured to search the node distribution table, centered on the requester and from near to far by distance, for nodes that can serve as contributors and to allocate remote memory to the requester, where the distance includes the number of routing hops from other nodes to the requester.
  • the memory allocating unit 1203 may specifically include:
  • a node sorting subunit configured to sort the peripheral nodes by contributable memory from small to large;
  • a memory selection subunit configured to determine, in that order from small to large, whether one or more of the peripheral nodes can, separately or jointly, provide the memory required by the requester and serve as contributors; if so, the memory the contributors can contribute is allocated to the requester, and if not, the flow control subunit is triggered;
  • a flow control subunit configured to increment i by 1 and then trigger the memory sum judgment subunit.
  • where the memory selection subunit, when determining in order from small to large whether one or more of the peripheral nodes can, separately or jointly, provide the memory required by the requester and serve as contributors, is specifically configured to:
  • stop the comparison if the requested memory capacity is less than or equal to the memory capacity one of the peripheral nodes can contribute, and use that node as the contributor; and
  • if the requested memory capacity is greater than the memory capacity any single peripheral node can contribute, select two or more of the peripheral nodes to jointly contribute memory satisfying the requested capacity, the selected nodes being the contributors, where the selection strategy is to keep the number of selected nodes to a minimum.
  • the memory allocating unit 1203 may further include:
  • the end judging subunit is configured to judge whether the i reaches the preset threshold before triggering the memory sum judgment subunit, and if yes, stop executing and return the application failure information.
  • the device 1200 may further include:
  • the memory change response unit 1204 is configured to notify the size of the memory that the contributor has contributed to the requester, and change the memory size that the contributor can contribute in the node distribution table.
  • the device 1200 may further include:
  • the node maintenance unit 1205 is configured to periodically send a status request signal to each node in the server node cluster and, if a node returns a normal heartbeat signal, keep the node in the node distribution table; otherwise, delete the node from the node distribution table.
  • the device 1200 may further include:
  • the exception handling unit 1206 is configured to receive a message indicating that the requester's request to use the allocated remote memory received no response, and to re-trigger the memory allocation unit.
  • since the device embodiment basically corresponds to the method embodiment, reference may be made to the description of the method embodiment for the relevant parts.
  • the device embodiments described above are merely illustrative; the units may be located in one place or distributed over multiple network elements, and some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment's solution. Those of ordinary skill in the art can understand and implement them without creative effort.
  • In summary, the resource management system first creates and maintains a node distribution table reflecting the physical location of each server node in the cluster and the memory size each can contribute, and then applies a physical-location-sensitive allocation strategy when allocating remote memory: taking the distance between requester and provider into account, remote memory is assigned to the requester on a shortest-path-first basis, so the requester not only obtains the required memory capacity but, thanks to the shortest path, also uses the obtained memory more efficiently.
  • the invention may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.

Abstract

The embodiment of the present invention discloses a method and a device for allocating remote memory, used for a server node cluster. The method comprises: establishing a node distribution table comprising the contributable memory size of each node and the connection relationships between the nodes; when a node, as a requester, requests remote memory, searching the node distribution table, centered on the requester and proceeding from near to far by distance, for nodes that can serve as contributors, and allocating remote memory to the requester, the distance comprising the routing hop count from other nodes to the requester. The embodiment first creates and maintains a node distribution table reflecting the physical location of each server node and the memory size it can contribute, then allocates remote memory to the requester on a shortest-path-first basis. Not only can the requester obtain the needed memory capacity, but the obtained memory is also more efficient in use owing to the advantage of the shortest path.

Description

Method and device for allocating remote memory. This application claims priority to Chinese Patent Application No. 201310186194.4, filed with the Chinese Patent Office on May 17, 2013 and entitled "Method and Device for Allocating Remote Memory", the entire contents of which are incorporated herein by reference.
Technical Field

The present invention relates to the field of cloud computing, and in particular, to a method and device for allocating remote memory.
Background

One of the motivations behind cloud computing is helping enterprises process massive amounts of data, and more and more data requires cloud computing products for real-time analysis. Processing massive data requires machines with large memory capacity, so cloud computing workloads have a strong demand for large memory. On the other hand, the data centers on which cloud computing relies consist, depending on scale, of dozens, hundreds, or even tens of thousands of computer server nodes, most of which are commodity standard servers from the market, each with its own independent processor, private memory region, and exclusive hard disk storage. A single server node clearly cannot satisfy the memory needs of cloud computing, and once each server node within a cluster can use the memory of other remote nodes, how to allocate remote memory reasonably and efficiently across the many nodes is no small challenge.
Summary

In view of this, an object of the embodiments of the present invention is to provide a method and device for allocating remote memory, to solve the problem that remote memory cannot be allocated efficiently within a cluster.

In one aspect, an embodiment of the present invention provides a method for allocating remote memory, used for a server node cluster, the method comprising:

establishing a node distribution table, the node distribution table comprising the contributable memory size of each node and the connection relationships between the nodes;

determining whether any node, as a requester, requests allocation of remote memory;

if so, searching the node distribution table, centered on the requester and proceeding from near to far by distance, for nodes that can serve as contributors, and allocating remote memory to the requester, the distance comprising the routing hop count from other nodes to the requester.
Preferably, searching from near to far by distance for nodes that can serve as contributors and allocating remote memory to the requester comprises:

a. determining whether the sum of the contributable memory of the peripheral nodes within i hops of the requester is greater than or equal to the memory requested by the requester; if so, proceeding to step b; if not, incrementing i by 1 and performing step a again, where i is a natural number and initially i = 1;

b. sorting the peripheral nodes by contributable memory from smallest to largest;

c. determining, in the smallest-to-largest order, whether one or more of the peripheral nodes can individually or jointly provide the memory required by the requester so as to serve as contributors; if so, allocating the memory the contributors can contribute to the requester; if not, incrementing i by 1 and performing step a.
Preferably, determining in the smallest-to-largest order whether one or more of the peripheral nodes can individually or jointly provide the memory required by the requester so as to serve as contributors comprises:

comparing, one by one in the smallest-to-largest order, the contributable memory of each peripheral node with the memory requested by the requester;

if the requested memory is less than or equal to the memory that one of the peripheral nodes can contribute, stopping the comparison and taking that node as the contributor;

if the requested memory is greater than the memory that any single peripheral node can contribute, selecting two or more of the peripheral nodes to contribute memory jointly so as to satisfy the requested memory capacity, and taking the selected two or more nodes as the contributors, where the selection strategy is to keep the number of selected nodes to a minimum.
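As an illustration only, the single-versus-joint contributor selection described above can be sketched in Python. This is a sketch under assumptions, not the patented implementation; in particular, the greedy largest-first combination is one simple heuristic for keeping the number of selected nodes small, and all names here are hypothetical:

```python
def select_contributors(peers, requested):
    """peers: list of (node, contributable) pairs, sorted ascending by contributable.
    Returns a list of (node, amount) pairs covering `requested`, or None."""
    # Single node: the smallest node whose contributable memory covers the request.
    for node, mem in peers:
        if mem >= requested:
            return [(node, requested)]
    # Joint contribution: take the largest nodes first so that as few
    # nodes as possible are used (the selection strategy in the text).
    chosen, remaining = [], requested
    for node, mem in reversed(peers):
        take = min(mem, remaining)
        chosen.append((node, take))
        remaining -= take
        if remaining == 0:
            return chosen
    return None  # total contributable memory is insufficient
```

Because each step of the greedy combination takes the largest remaining contributor, no smaller set of these peers could cover the request, which matches the minimum-node-count strategy.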
Preferably, the method further comprises:

before step a, determining whether i has reached a preset threshold, and if so, stopping execution and returning an allocation-failure message.
Preferably, the method further comprises:

after allocating remote memory to the requester, notifying the contributor of the memory size it has contributed to the requester, and updating the contributor's contributable memory size in the node distribution table.
Preferably, the method further comprises:

after establishing the node distribution table, periodically sending a status request signal to each node in the server node cluster;

if a node returns a normal heartbeat signal, maintaining that node's presence in the node distribution table; otherwise, deleting the node from the node distribution table.
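A minimal sketch of this heartbeat-based maintenance of the node distribution table; the `probe` callable, standing for sending a status request and waiting for the heartbeat reply, and the table layout are assumptions for illustration:

```python
def refresh_table(table, probe):
    """table: dict mapping node -> contributable memory size.
    probe(node) -> True if the node returns a normal heartbeat.
    Nodes that fail to answer are removed from the distribution table."""
    for node in list(table):      # copy the keys: we mutate while iterating
        if not probe(node):
            del table[node]
    return table
```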
Preferably, the method further comprises:

after allocating remote memory to the requester, receiving a message from the requester indicating that its request to access the allocated remote memory received no response;

finding a new contributor for the requester and allocating remote memory again according to the method.
In another aspect, an embodiment of the present invention further provides a device for allocating remote memory, used for a server node cluster, the device comprising:

a node distribution table establishing unit, configured to establish a node distribution table, the node distribution table comprising the contributable memory size of each node and the connection relationships between the nodes;

a memory request judging unit, configured to determine whether any node, as a requester, requests allocation of remote memory, and if so, to trigger a memory allocation unit;

a memory allocation unit, configured to search the node distribution table, centered on the requester and proceeding from near to far by distance, for nodes that can serve as contributors and to allocate remote memory to the requester, the distance comprising the routing hop count from other nodes to the requester.
Preferably, the memory allocation unit comprises:

a memory sum judging subunit, configured to determine whether the sum of the contributable memory of the peripheral nodes within i hops of the requester is greater than or equal to the memory requested by the requester; if so, to trigger a node sorting subunit; if not, to trigger a flow control subunit, where i is a natural number and initially i = 1;

a node sorting subunit, configured to sort the peripheral nodes by contributable memory from smallest to largest; a memory selection subunit, configured to determine, in the smallest-to-largest order, whether one or more of the peripheral nodes can individually or jointly provide the memory required by the requester so as to serve as contributors; if so, to allocate the memory the contributors can contribute to the requester; if not, to trigger the flow control subunit;

a flow control subunit, configured to increment i by 1 and then trigger the memory sum judging subunit.
Preferably, when determining in the smallest-to-largest order whether one or more of the peripheral nodes can individually or jointly provide the memory required by the requester so as to serve as contributors, the memory selection subunit is specifically configured to:

compare, one by one in the smallest-to-largest order, the contributable memory of each peripheral node with the memory requested by the requester;

if the requested memory is less than or equal to the memory that one of the peripheral nodes can contribute, stop the comparison and take that node as the contributor;

if the requested memory is greater than the memory that any single peripheral node can contribute, select two or more of the peripheral nodes to contribute memory jointly so as to satisfy the requested memory capacity, and take the selected two or more nodes as the contributors, where the selection strategy is to keep the number of selected nodes to a minimum.
Preferably, the memory allocation unit further comprises:

an end judging subunit, configured to determine, before the memory sum judging subunit is triggered, whether i has reached a preset threshold, and if so, to stop execution and return an allocation-failure message.
Preferably, the device further comprises:

a memory change response unit, configured to notify the contributor of the memory size it has contributed to the requester and to update the contributor's contributable memory size in the node distribution table.
Preferably, the device further comprises:

a node maintenance unit, configured to periodically send a status request signal to each node in the server node cluster; if a node returns a normal heartbeat signal, to maintain that node's presence in the node distribution table; otherwise, to delete the node from the node distribution table.
Preferably, the device further comprises:

an exception handling unit, configured to receive a message from the requester indicating that its request to access the allocated remote memory received no response, and to trigger the memory allocation unit again.
In the embodiments of the present invention, the resource management system first creates and maintains a node distribution table that reflects the physical location and contributable memory size of each server node in the cluster, and then applies a physical-location-aware allocation policy when allocating remote memory: considering the distance between the requester and the provider, remote memory is allocated to the requester on a shortest-path-first basis, so that the requester not only obtains the required memory capacity but also uses the obtained memory more efficiently owing to the shortest-path advantage.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
Figure 1 is a schematic diagram of system components based on the cloud control chip;

Figure 2 is a schematic diagram of the composition and architecture of the cloud control chip;

Figures 3a-3b are schematic diagrams of on-demand resource allocation;

Figure 4 is a flowchart of the method of Embodiment 1 of the present invention;

Figure 5 is a schematic diagram of the topology of a node distribution table;

Figure 6 is a detailed flowchart of allocating remote memory from near to far;

Figures 7-11 are schematic diagrams of the changes in the node distribution table as each step of Embodiment 2 of the present invention is executed;

Figure 12 is a schematic diagram of the device of Embodiment 3 of the present invention.
Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention. The method of the present invention is built on the basis that server nodes within a cluster can borrow remote memory from one another, so a brief explanation of how remote memory borrowing between server nodes is achieved is given first:
In the prior art, remote direct memory access (RDMA) technology can be used when remote memory is needed. RDMA allows a computer to access the memory of other computers directly, without time-consuming transfers through the processor; it enables one computer to transfer data over the network directly into the memory of another computer, quickly moving data from one system into remote system memory without any impact on the operating system. Figure 1 illustrates the transfer process of an RDMA data stream. Compared with the traditional buffer copy technique and buffer copy with a DMA engine, RDMA eliminates external memory copies and context switches, freeing bus space and CPU cycles to improve application and system performance, thereby reducing the demand for bandwidth and processor overhead and significantly lowering latency.

However, RDMA is a point-to-point protocol. Not only does it require a dedicated network card (such as a high-end Ethernet card or an InfiniBand card) to be installed on every server, making the hardware cost of RDMA high, but, more importantly, RDMA cannot allocate remote memory resources in a data center reasonably and efficiently; that is, it cannot let each node in a cluster dynamically use the memory of other remote nodes.
In the present invention, to enable server nodes to borrow remote memory from one another, that is, to dynamically allocate resources, especially memory resources, in a data center, every node contains a cloud control chip and uses the same interface to connect to the other nodes in the system. Different processors and other components can be integrated on a single board. The cloud control chip provides PCIe and a custom-designed interface for connecting processor chips. The custom interface is optimized for direct communication between the processor chip and the cloud control chip and is used to connect autonomously controllable processor chips. The PCIe interface can connect any processor chip with a PCIe port; its communication efficiency is limited by PCIe, but it accommodates the vast majority of CPU chips on the market as well as GPUs, FPGAs, and other compute acceleration chips. For example, a general-purpose system can connect mainstream x86 server chips, and a system supporting high-performance computing can connect GPU chips. A system requiring large memory can use "memory nodes" as needed. A memory node has no processor chip, only one or several cloud control chips. See Figures 1a and 1b, which are schematic diagrams of system components based on the cloud control chip: Figure 1a shows an ARM compute node, and Figure 1b shows a large-capacity memory node.

The remote memory usage method and mechanism proposed by the present invention rely on the cloud control chip and a resource management system. They manage the computing, memory, interconnect, and other resources in a data center server system, forming a compute cloud, a memory cloud, and an I/O cloud. The cloud control chip mainly consists of an on-chip switching network, an integrated memory control module, an I/O device virtualization controller, a hardware-implemented communication protocol stack, a PCIe interface, and a service processing core. Figure 2 shows the composition and architecture of the cloud control chip.
In the present invention, remote memory is used through a software API (Application Programming Interface): first, an application is submitted to the resource management system, and the resources of multiple nodes are dynamically combined through the interconnection network to form a virtual server; once the combination succeeds, these physical resources are used exclusively by the virtual server; after use, the resources must be returned to the resource management system. In the hardware implementation, remote node access is guaranteed by adding the corresponding hardware management and address translation mechanisms, as well as a high-speed I/O communication stack.
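The apply-use-return lifecycle described above might be expressed, purely as a hypothetical sketch, as a context manager; the patent does not define a concrete API, so `allocate` and `release` are invented names:

```python
from contextlib import contextmanager

@contextmanager
def remote_memory(manager, size):
    """Apply for `size` remote memory from the resource management system,
    use it exclusively, and return it when done (hypothetical API sketch)."""
    grant = manager.allocate(size)   # apply to the resource management system
    try:
        yield grant                  # exclusive use by the virtual server
    finally:
        manager.release(grant)       # return the resources after use
```

The `finally` clause mirrors the requirement that resources be returned to the resource management system even if the user's work fails.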
In this process, the resource management system of the data center may reside centrally on a server node or be managed in a distributed manner. It is responsible for collecting, managing, and allocating the resources of each node, including the memory resources contributed by each node, and for initializing, setting, and updating the RTLB (Remote Translation Lookaside Buffer: a translation table from local physical addresses to remote node numbers and remote node physical addresses) in the cloud control chip on each node. In the present invention, a node that can provide or contribute memory capacity to other server nodes is called a contributing node or contributor; the memory space that a contributing node provides for use by other nodes across the data center is called contributed memory; and a node that applies for and uses this contributed memory is called a requesting node or requester.
Figures 3a and 3b are schematic diagrams of on-demand resource allocation. Figure 3a shows four independent traditional computer nodes; after the processing described above, the various resources are consolidated into a compute cloud, a memory cloud, a storage cloud, and an interconnect cloud, as shown in Figure 3b. The concept and boundaries of traditional computer nodes thus no longer exist. Through resource scheduling and on-demand allocation, multiple virtual servers are formed. For example, the resources inside the dashed box on the left of Figure 3b form a first virtual server, while a second virtual server, formed by the dashed box on the right, performs effective resource scheduling and sharing with the first.
The above briefly describes how the present invention enables nodes to use one another's remote memory, that is, how dynamic allocation of memory across a cluster becomes a reality. The next, further question, how to allocate remote memory reasonably and efficiently, is the problem the present invention focuses on solving. A detailed description follows:

Embodiment 1
Referring to Figure 4, a flowchart of the method of Embodiment 1 of the present invention, the method is a method for allocating remote memory, used for a server node cluster, and comprises:

S401. Establish a node distribution table, the node distribution table comprising the contributable memory size of each node and the connection relationships between the nodes.
In this embodiment, the node distribution table includes the contributable memory size of each node and the connection relationships between the nodes, where the connection relationships can also be regarded as the topology among the nodes. In some embodiments of the present invention, there may be only one such node distribution table, kept by a global manager (such as the resource management system). The topology can be recorded in many ways: for example, the pairwise connections between all nodes can be recorded directly; or, starting from each node, all nodes directly or indirectly connected to that node can be recorded; or even only the nodes directly connected to each node can be recorded (from which all pairwise connections can be derived). In this embodiment, the resource management system is responsible for allocating remote memory. The resource management system resides centrally on a server node or is managed in a distributed manner, and is responsible for collecting, managing, and allocating the resources of each node, including the memory resources contributed by each node, and for initializing, setting, and updating the RTLB in the cloud control chip on each node. The basic idea in allocation is to consider the distance between the requester and the provider, that is, the shortest-path-first principle. The global resource management system monitors each node's parameters in real time (including physical location, memory usage, load, and health), and then selects a suitable memory provider according to the requester's physical location. The resource management system selects providers within a certain distance range (such as nodes within a certain number of hops in the network). Figure 5 shows the node distribution table represented as a topology, where Core1, Core2, and so on represent the processor cores, i.e., CPUs, of the individual server nodes.
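One illustrative way to represent such a node distribution table, recording only each node's contributable memory and its directly connected neighbors (from which, as noted above, all pairwise connections can be derived), and to compute hop distances from it by breadth-first search. The structure and names are assumptions, not the patented layout:

```python
from collections import deque

# Node distribution table: contributable memory plus direct connections.
table = {
    "n1": {"mem": 4, "links": ["n2"]},
    "n2": {"mem": 0, "links": ["n1", "n3"]},
    "n3": {"mem": 8, "links": ["n2"]},
}

def hops(table, src, dst):
    """Minimum routing hop count from src to dst (BFS over direct links)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in table[node]["links"]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None  # unreachable
```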
S402. Determine whether any node, as a requester, requests allocation of remote memory.

S403. If so, search the node distribution table, centered on the requester and proceeding from near to far by distance, for nodes that can serve as contributors, and allocate remote memory to the requester, the distance comprising the routing hop count from other nodes to the requester.
The distance in this embodiment mainly comprises the routing hop count from other nodes to the requester, which can also be regarded as the number of communication forwards. In a computer network, one hop is the step from one host (or router) to the next router. Since the present invention uses a cloud control chip with an embedded routing function, a hop in the present invention may also refer to the step from one server node to the next server node. From the perspective of the topology, the distance between two nodes is the minimum number of edges through which the two nodes are connected; if only one edge is needed, the two nodes are directly connected. In some embodiments of the present invention, preferably, searching from near to far by distance for nodes that can serve as contributors and allocating remote memory to the requester may specifically comprise:
a. determining whether the sum of the contributable memory of the peripheral nodes within i hops of the requester is greater than or equal to the memory requested by the requester; if so, proceeding to step b; if not, incrementing i by 1 and performing step a again, where i is a natural number and initially i = 1;

b. sorting the peripheral nodes by contributable memory from smallest to largest;

c. determining, in the smallest-to-largest order, whether one or more of the peripheral nodes can individually or jointly provide the memory required by the requester so as to serve as contributors; if so, allocating the memory the contributors can contribute to the requester; if not, incrementing i by 1 and performing step a.
In some embodiments of the present invention, preferably, determining in step c, in the smallest-to-largest order, whether one or more of the peripheral nodes can individually or jointly provide the memory required by the requester so as to serve as contributors may specifically comprise:

comparing, one by one in the smallest-to-largest order, the contributable memory capacity of each peripheral node with the memory capacity requested by the requester;

if the requested memory capacity is less than or equal to the memory capacity that one of the peripheral nodes can contribute, stopping the comparison and taking that node as the contributor;

if the requested memory capacity is greater than the memory capacity that any single peripheral node can contribute, selecting two or more of the peripheral nodes to contribute memory jointly so as to satisfy the requested memory capacity, and taking the selected two or more nodes as the contributors, where the selection strategy is to keep the number of selected nodes to a minimum.
In addition, in some embodiments of the present invention, the method may further comprise:

before step a, determining whether i has reached a preset threshold, and if so, stopping execution and returning an allocation-failure message.
As can be seen from the above description: first, the requesting node initiates a remote memory request containing the requested memory capacity. The resource management system then searches the neighboring nodes one hop away from the requesting node, collects the information about the memory those neighbors can contribute, and sorts them by capacity from small to large, which avoids wasting memory. If the total contributable memory of the one-hop neighbors is greater than or equal to the requested capacity, the requested capacity is compared with the contributable capacities one by one, from small to large: if the requested capacity is less than the contributable capacity of some node, the comparison stops and that node's contributable memory region is marked for the requesting node; if the requested capacity is greater than the contributable capacity of any single node, two or more of the one-hop neighbors are selected to contribute memory jointly so as to satisfy the requested capacity. The selection strategy should keep the number of contributing nodes as small as possible. If the one-hop neighbors have no contributable memory, or their total contributable capacity is insufficient to satisfy the request, the resource management system searches the nodes two hops away from the requesting node, collects the contributable-memory information of the one-hop and two-hop nodes, and again sorts them by capacity from small to large. If the total contributable memory within one and two hops is greater than or equal to the requested capacity, the requested capacity is compared with the contributable capacities from small to large and the contributing nodes are selected, following the same approach as within one hop. If the nodes within two hops still cannot satisfy the request, the search extends to nodes three hops, four hops, or even farther away, and so on, until the request of the requesting node is satisfied or the threshold is reached. Figure 6 is a detailed flowchart of allocating remote memory in this near-to-far manner.
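The near-to-far flow just described can be sketched in Python. This is an illustrative reading of the procedure, not the patent's exact implementation: the node distribution table is assumed to be an adjacency dict plus a free-capacity dict, and the largest-first pick for joint contribution is one possible heuristic for keeping the contributor count small.

```python
from collections import deque

def allocate_remote_memory(graph, free_mem, requester, need, max_hops=8):
    """Nearest-first allocation sketch.

    graph: {node: [neighbor, ...]} (connection relationships in the table);
    free_mem: {node: contributable capacity}; returns {contributor: amount}
    or None when the hop threshold is reached without satisfying the request.
    """
    # Group all nodes by hop distance from the requester (plain BFS).
    dist, frontier, hops = {requester: 0}, deque([requester]), {}
    while frontier:
        n = frontier.popleft()
        for nb in graph.get(n, []):
            if nb not in dist:
                dist[nb] = dist[n] + 1
                hops.setdefault(dist[nb], []).append(nb)
                frontier.append(nb)

    candidates = []                     # every node within i hops, accumulated
    for i in range(1, max_hops + 1):
        candidates += hops.get(i, [])
        pool = sorted((c for c in candidates if free_mem.get(c, 0) > 0),
                      key=lambda c: free_mem[c])   # small-to-large, as described
        if sum(free_mem[c] for c in pool) < need:
            continue                    # total insufficient: widen the radius
        # Single contributor: smallest node whose capacity covers the request.
        for c in pool:
            if free_mem[c] >= need:
                return {c: need}
        # Joint contribution: take largest nodes first to keep the count small.
        grant, remaining = {}, need
        for c in reversed(pool):
            take = min(free_mem[c], remaining)
            grant[c] = take
            remaining -= take
            if remaining == 0:
                return grant
    return None                         # threshold reached: application fails
```

With one-hop neighbors contributing 4 each and a two-hop node contributing 8, an 8-unit request is served jointly within one hop, a 3-unit request by a single node, and an oversized request fails once the threshold is exhausted.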
In addition, in some embodiments of the present invention, the method may further include:
after the remote memory is allocated to the requester, notifying the contributor of the memory capacity it has contributed to the requester, and updating, in the node distribution table, the memory capacity the contributor can still contribute.
In some embodiments of the present invention, the method may further include, after the node distribution table is established according to the contributable memory size and physical location of each node:
1) periodically sending a status request signal to each node in the service node cluster;
2) if a node returns a normal heartbeat signal, maintaining its presence in the node distribution table; otherwise, deleting the node from the node distribution table. In other words, if a contributing node does not return a response signal (for example, because the node has crashed), the resource management system may consider that this contributing node has stopped contributing, and the corresponding entry in the table needs to be deleted. In addition, if some node is currently using that node's memory, the resource management system also needs to notify the using node.
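Steps 1)–2) above amount to a periodic sweep over the table. A minimal sketch, assuming the table is a dict and `send_status_request` is a caller-supplied probe (both names are illustrative):

```python
def refresh_node_table(table, send_status_request):
    """Periodic heartbeat check over the node distribution table.

    table: {node: contributable capacity}; send_status_request(node) returns
    True when the node answers with a normal heartbeat, False otherwise.
    Non-responding contributors are presumed down and their entries removed;
    the caller would additionally notify any node still using their memory.
    """
    for node in list(table):            # list(): safe deletion while iterating
        if not send_status_request(node):
            del table[node]
    return table
```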
In some embodiments of the present invention, the method may further include, after allocating remote memory to the requester:
i) receiving a message from the requester indicating that its request to access the allocated remote memory received no response;
ii) finding a new contributor for the requester and allocating remote memory according to the method described above. In other words, before the resource management system discovers that a contributor has stopped contributing (for example, because a single point of failure has occurred), the requester may already have found that its remote memory request to that contributor received no response; in that case the requester stops sending requests to this contributor, notifies the resource management system, and asks it to allocate, from the other contributing nodes nearby, the closest memory that satisfies the capacity requirement. Steps 1)-2) and i)-ii) above may be used separately or in combination, forming the single-point-of-failure handling mechanism of this embodiment.
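The requester-side half of this mechanism can be sketched as follows. The names `fetch` and `reallocate` are hypothetical stand-ins for, respectively, accessing a contributor's memory and asking the resource management system for a replacement contributor nearby:

```python
def fetch_with_failover(contributors, fetch, reallocate):
    """Single-point-of-failure handling, seen from the requester.

    fetch(node) raises TimeoutError when the contributor does not respond;
    reallocate(node) asks the resource manager for the closest replacement
    contributor that satisfies the capacity requirement.
    """
    served = []
    for node in contributors:
        try:
            served.append(fetch(node))
        except TimeoutError:
            # Stop using the silent contributor, report it to the resource
            # management system, and retry with a freshly allocated one.
            replacement = reallocate(node)
            served.append(fetch(replacement))
    return served
```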
In addition, in some embodiments of the present invention an information-collection mechanism may be added: a cloud control chip collects the link bandwidth utilization and memory utilization on the current node and passes this information to the service processor core in each chip; the service processor core then cooperates with the resource management and scheduling system to dynamically schedule resources such as bandwidth and memory.
In this embodiment, the resource management system first creates and maintains a node distribution table that reflects the physical location of each server node in the cluster and the memory size it can contribute. When allocating remote memory, it then applies an allocation policy that is sensitive to physical location: it takes the distance between the requester and the providers into account and allocates remote memory to the requester on a shortest-path-first basis. The requester thus not only obtains the required memory capacity, but the memory it obtains is also more efficient to use thanks to the shortest-path advantage. Embodiment 2
The present invention is further described below on the basis of the above embodiment and with reference to a specific scenario. Figures 7 to 11 are schematic diagrams of how the node distribution table changes as each step is performed in Embodiment 2 of the present invention:
Figure 7: At some moment, Core6 needs an additional 8 GB of memory, but its own memory can no longer meet the demand, so it sends a memory allocation request to the resource management system. The resource management system classifies the nodes according to their distance from the requester, for example into a 1-hop range, a 2-hop range, and so on. In the figure, "4G free" means the node has 4 GB of memory available, and "4G used" means 4 GB of memory on the node is in use.
Figure 8: The resource management system first requests physical memory for the requesting node from the closer class of nodes. It does not require any single node to satisfy the request at once; a node may provide partial help, in units of 1 GB. For example, a request is first sent to Core2.
Figure 9: Core2 can only provide 4 GB. After the grant, both Core2 and Core6 must modify the memory configuration tables in their memory controllers, and the resource management system continues by sending a request to Core5. Figure 10: Core5 provides the remaining 4 GB to the requester, and the memory configuration tables in the Core5 and Core6 memory controllers are changed accordingly. At this point one allocation task is finished; as long as the resource management system processes only one allocation task at a time, consistency is guaranteed.
Figure 11: When Core6 no longer needs to occupy other nodes' physical memory, it applies to the resource management system. Through the memory configuration options in Core6's memory manager, the resource management system releases the memory of the corresponding nodes: Core2 and Core5 release the memory previously allocated to Core6 and modify the memory configuration tables in their memory controllers.
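The Figure 7 to Figure 11 scenario boils down to symmetric updates of the memory configuration tables on both sides of each grant. A hypothetical sketch (the Core2/Core5/Core6 names come from the figures; representing each table as a nested dict is an assumption for illustration):

```python
def grant(tables, contributor, requester, amount):
    """Record a grant in both nodes' memory configuration tables."""
    tables.setdefault(contributor, {})[requester] = amount
    tables.setdefault(requester, {})[contributor] = amount

def release_all(tables, requester):
    """Requester returns all borrowed memory; both sides drop the entries."""
    for contributor in list(tables.get(requester, {})):
        tables.get(contributor, {}).pop(requester, None)
    tables[requester] = {}
```

Applying the scenario: Core2 grants 4 GB, Core5 grants the remaining 4 GB, and the later release by Core6 clears the entries on all three nodes, mirroring the consistency rule that one allocation task is handled at a time.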
In this embodiment, the resource management system first creates and maintains a node distribution table that reflects the physical location of each server node in the cluster and the memory size it can contribute. When allocating remote memory, it then applies an allocation policy that is sensitive to physical location: it takes the distance between the requester and the providers into account and allocates remote memory to the requester on a shortest-path-first basis. The requester thus not only obtains the required memory capacity, but the memory it obtains is also more efficient to use thanks to the shortest-path advantage. Embodiment 3
Figure 12 is a schematic diagram of the apparatus of Embodiment 3 of the present invention. Corresponding to the two method embodiments above, this embodiment provides an apparatus 1200 for allocating remote memory, used in a server node cluster, the apparatus 1200 including:
a node distribution table establishing unit 1201, configured to establish a node distribution table, where the node distribution table includes the memory size each node can contribute and the connection relationships between the nodes;
a memory request judging unit 1202, configured to judge whether any node, as a requester, requests allocation of remote memory, and if so, to trigger a memory allocation unit 1203;
a memory allocation unit 1203, configured to search the node distribution table, centered on the requester and from near to far by distance, for nodes that can act as contributors, and to allocate remote memory to the requester, where the distance includes the number of hops on the route from another node to the requester.
Preferably, the memory allocation unit 1203 may specifically include:
a memory sum judging subunit, configured to judge whether the total memory contributable by the peripheral nodes within i hops of the requester is greater than or equal to the memory requested by the requester, and if so, to trigger a node sorting subunit, otherwise to trigger a flow control subunit, where i is a natural number and initially i = 1;
a node sorting subunit, configured to sort the peripheral nodes by contributable memory from smallest to largest; a memory selection subunit, configured to judge, in that order from smallest to largest, whether there are one or more peripheral nodes that can, individually or jointly, provide the memory the requester needs so as to act as contributors, and if so, to allocate the memory the contributors can contribute to the requester, otherwise to trigger the flow control subunit;
a flow control subunit, configured to increase i by 1 and then trigger the memory sum judging subunit.
Preferably, when the memory selection subunit judges, in the order from smallest to largest, whether there are one or more peripheral nodes that can individually or jointly provide the memory the requester needs so as to act as contributors, it is specifically configured to:
compare, one by one in the order from smallest to largest, the contributable memory of the peripheral nodes with the memory requested by the requester;
if the requested memory is less than or equal to the memory contributable by one of the peripheral nodes, stop the comparison and take that node as the contributor;
if the requested memory is greater than the memory contributable by any single peripheral node, select two or more of the peripheral nodes to contribute memory jointly so as to satisfy the requested memory capacity, and take the selected nodes as the contributors, where the selection strategy is to keep the number of selected nodes as small as possible.
Preferably, the memory allocation unit 1203 may further include:
an end judging subunit, configured to judge, before the memory sum judging subunit is triggered, whether i has reached a preset threshold, and if so, to stop execution and return an application-failure message.
Preferably, the apparatus 1200 may further include:
a memory change response unit 1204, configured to notify the contributor of the memory size it has contributed to the requester and to update, in the node distribution table, the memory size the contributor can contribute.
Preferably, the apparatus 1200 may further include:
a node maintenance unit 1205, configured to periodically send a status request signal to each node in the service node cluster, and if a node returns a normal heartbeat signal, to maintain its presence in the node distribution table, otherwise to delete the node from the node distribution table.
Preferably, the apparatus 1200 may further include:
an exception handling unit 1206, configured to receive a message from the requester indicating that its request to access the allocated remote memory received no response, and to re-trigger the memory allocation unit. Since the apparatus embodiment basically corresponds to the method embodiments, the relevant parts may refer to the description of the method embodiments. The apparatus embodiment described above is merely illustrative; the units may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
In this embodiment, the resource management system first creates and maintains a node distribution table that reflects the physical location of each server node in the cluster and the memory size it can contribute. When allocating remote memory, it then applies an allocation policy that is sensitive to physical location: it takes the distance between the requester and the providers into account and allocates remote memory to the requester on a shortest-path-first basis. The requester thus not only obtains the required memory capacity, but the memory it obtains is also more efficient to use thanks to the shortest-path advantage. The invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Those of ordinary skill in the art will understand that all or some of the steps in the above method embodiments may be implemented by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium, such as a ROM, a RAM, a magnetic disk, or an optical disc.
It should also be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above is merely a preferred embodiment of the present invention and is not intended to limit its scope of protection. The description of the embodiments is only intended to help understand the method of the present invention and its core idea; meanwhile, those of ordinary skill in the art, following the idea of the present invention, may make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for allocating remote memory, used in a server node cluster, wherein the method comprises:
establishing a node distribution table, wherein the node distribution table comprises the memory size each node can contribute and the connection relationships between the nodes;
judging whether any node, as a requester, requests allocation of remote memory; and
if so, searching the node distribution table, centered on the requester and from near to far by distance, for nodes that can act as contributors, and allocating remote memory to the requester, wherein the distance comprises the number of hops on the route from another node to the requester.
2. The method according to claim 1, wherein searching from near to far by distance for nodes that can act as contributors and allocating remote memory to the requester comprises:
a. judging whether the total memory contributable by the peripheral nodes within i hops of the requester is greater than or equal to the memory requested by the requester; if so, continuing with step b; if not, increasing i by 1 and performing step a, wherein i is a natural number and initially i = 1;
b. sorting the peripheral nodes by contributable memory from smallest to largest;
c. judging, in the order from smallest to largest, whether there are one or more peripheral nodes that can, individually or jointly, provide the memory the requester needs so as to act as contributors; if so, allocating the memory the contributors can contribute to the requester; if not, increasing i by 1 and performing step a.
3. The method according to claim 2, wherein judging, in the order from smallest to largest, whether there are one or more peripheral nodes that can individually or jointly provide the memory the requester needs so as to act as contributors comprises:
comparing, one by one in the order from smallest to largest, the contributable memory of the peripheral nodes with the memory requested by the requester;
if the requested memory is less than or equal to the memory contributable by one of the peripheral nodes, stopping the comparison and taking that node as the contributor; and
if the requested memory is greater than the memory contributable by any single peripheral node, selecting two or more of the peripheral nodes to contribute memory jointly so as to satisfy the requested memory capacity, and taking the selected nodes as the contributors, wherein the selection strategy is to keep the number of selected nodes as small as possible.
4. The method according to claim 2, wherein the method further comprises: before step a, judging whether i has reached a preset threshold, and if so, stopping execution and returning an application-failure message.
5. The method according to claim 1, wherein the method further comprises: after the remote memory is allocated to the requester, notifying the contributor of the memory size it has contributed to the requester, and updating, in the node distribution table, the memory size the contributor can contribute.
6. The method according to claim 1, wherein the method further comprises: after the node distribution table is established, periodically sending a status request signal to each node in the service node cluster; and
if a node returns a normal heartbeat signal, maintaining its presence in the node distribution table; otherwise, deleting the node from the node distribution table.
7. The method according to claim 1, wherein the method further comprises: after the remote memory is allocated to the requester, receiving a message from the requester indicating that its request to access the allocated remote memory received no response; and
finding a new contributor for the requester and allocating remote memory according to the method.
8. An apparatus for allocating remote memory, used in a server node cluster, wherein the apparatus comprises:
a node distribution table establishing unit, configured to establish a node distribution table, wherein the node distribution table comprises the memory size each node can contribute and the connection relationships between the nodes;
a memory request judging unit, configured to judge whether any node, as a requester, requests allocation of remote memory, and if so, to trigger a memory allocation unit; and
a memory allocation unit, configured to search the node distribution table, centered on the requester and from near to far by distance, for nodes that can act as contributors, and to allocate remote memory to the requester, wherein the distance comprises the number of hops on the route from another node to the requester.
9. The apparatus according to claim 8, wherein the memory allocation unit comprises: a memory sum judging subunit, configured to judge whether the total memory contributable by the peripheral nodes within i hops of the requester is greater than or equal to the memory requested by the requester, and if so, to trigger a node sorting subunit, otherwise to trigger a flow control subunit, wherein i is a natural number and initially i = 1;
a node sorting subunit, configured to sort the peripheral nodes by contributable memory from smallest to largest; a memory selection subunit, configured to judge, in the order from smallest to largest, whether there are one or more peripheral nodes that can, individually or jointly, provide the memory the requester needs so as to act as contributors, and if so, to allocate the memory the contributors can contribute to the requester, otherwise to trigger the flow control subunit; and
a flow control subunit, configured to increase i by 1 and then trigger the memory sum judging subunit.
10. The apparatus according to claim 9, wherein when the memory selection subunit judges, in the order from smallest to largest, whether there are one or more peripheral nodes that can individually or jointly provide the memory the requester needs so as to act as contributors, it is specifically configured to:
compare, one by one in the order from smallest to largest, the contributable memory of the peripheral nodes with the memory requested by the requester;
if the requested memory is less than or equal to the memory contributable by one of the peripheral nodes, stop the comparison and take that node as the contributor; and
if the requested memory is greater than the memory contributable by any single peripheral node, select two or more of the peripheral nodes to contribute memory jointly so as to satisfy the requested memory capacity, and take the selected nodes as the contributors, wherein the selection strategy is to keep the number of selected nodes as small as possible.
11. The apparatus according to claim 9, wherein the memory allocation unit further comprises: an end judging subunit, configured to judge, before the memory sum judging subunit is triggered, whether i has reached a preset threshold, and if so, to stop execution and return an application-failure message.
12. The apparatus according to claim 8, wherein the apparatus further comprises: a memory change response unit, configured to notify the contributor of the memory size it has contributed to the requester and to update, in the node distribution table, the memory size the contributor can contribute.
13. The apparatus according to claim 8, wherein the apparatus further comprises: a node maintenance unit, configured to periodically send a status request signal to each node in the service node cluster, and if a node returns a normal heartbeat signal, to maintain its presence in the node distribution table, otherwise to delete the node from the node distribution table.
14. The apparatus according to claim 8, wherein the apparatus further comprises: an exception handling unit, configured to receive a message from the requester indicating that its request to access the allocated remote memory received no response, and to re-trigger the memory allocation unit.
PCT/CN2014/075674 2013-05-17 2014-04-18 Method and device for allocating remote memory WO2014183531A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310186194.4 2013-05-17
CN201310186194.4A CN104166597B (en) 2013-05-17 2013-05-17 A kind of method and device for distributing long-distance inner

Publications (1)

Publication Number Publication Date
WO2014183531A1 true WO2014183531A1 (en) 2014-11-20

Family

ID=51897678

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/075674 WO2014183531A1 (en) 2013-05-17 2014-04-18 Method and device for allocating remote memory

Country Status (2)

Country Link
CN (1) CN104166597B (en)
WO (1) WO2014183531A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016101288A1 (en) * 2014-12-27 2016-06-30 华为技术有限公司 Remote direct memory accessmethod, device and system
CN105808448A (en) * 2014-12-30 2016-07-27 中兴通讯股份有限公司 Memory management method and system
CN104572569A (en) * 2015-01-21 2015-04-29 江苏微锐超算科技有限公司 ARM (Algorithmic Remote Manipulation) and FPGA (Field Programmable Gate Array)-based high performance computing node and computing method
CN107003904A (en) * 2015-04-28 2017-08-01 华为技术有限公司 A kind of EMS memory management process, equipment and system
CN106471482B (en) * 2015-06-19 2019-05-03 华为技术有限公司 A kind of optical-switch control method and device
CN105739965B (en) * 2016-01-18 2019-03-05 深圳先进技术研究院 A kind of assemble method of the ARM mobile phone cluster based on RDMA
CN109388490B (en) * 2017-08-07 2020-11-17 华为技术有限公司 Memory allocation method and server
CN107908474A (en) * 2017-10-27 2018-04-13 郑州云海信息技术有限公司 A kind of Memory Allocation application method and system based on rdma protocol
CN111007987A (en) * 2019-11-08 2020-04-14 苏州浪潮智能科技有限公司 Memory management method, system, terminal and storage medium for raid io
CN111913907A (en) * 2020-08-13 2020-11-10 上海钜成锐讯科技有限公司 FPGA clustering method, FPGA chip and FPGA clustering system
CN115495246B (en) * 2022-09-30 2023-04-18 上海交通大学 Hybrid remote memory scheduling method under a disaggregated memory architecture
CN116436978B (en) * 2023-06-13 2023-08-29 苏州浪潮智能科技有限公司 Cloud computing-oriented memory allocation method, memory acquisition method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001331457A (en) * 2000-05-19 2001-11-30 Ricoh Co Ltd Distributed shared memory system
US20050238035A1 (en) * 2004-04-27 2005-10-27 Hewlett-Packard System and method for remote direct memory access over a network switch fabric
CN101158927A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 Memory sharing system, device and method
CN101277252A (en) * 2007-03-30 2008-10-01 迈普(四川)通信技术有限公司 Method for traversing multi-branch Trie tree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001331547A (en) * 2000-05-22 2001-11-30 Ge Toshiba Silicones Co Ltd Method, system, server, client, server operation method, and client operation method for collaborative building-joint design, and computer-readable recording medium storing a corresponding program

Also Published As

Publication number Publication date
CN104166597B (en) 2018-07-03
CN104166597A (en) 2014-11-26

Similar Documents

Publication Publication Date Title
WO2014183531A1 (en) Method and device for allocating remote memory
US20200241927A1 (en) Storage transactions with predictable latency
EP3754511B1 (en) Multi-protocol support for transactions
TWI543073B (en) Method and system for work scheduling in a multi-chip system
US9244880B2 (en) Automatic construction of deadlock free interconnects
US9769077B2 (en) QoS in a system with end-to-end flow control and QoS aware buffer allocation
US9672167B2 (en) Resource management for peripheral component interconnect-express domains
US9086919B2 (en) Fabric independent PCIe cluster manager
TWI519958B (en) Method and apparatus for memory allocation in a multi-node system
CN111813330B (en) System and method for dispatching input-output
TWI547870B (en) Method and system for ordering i/o access in a multi-node environment
TWI541649B (en) System and method of inter-chip interconnect protocol for a multi-chip system
TW201543218A (en) Chip device and method for multi-core network processor interconnect with multi-node connection
US20210326221A1 (en) Network interface device management of service execution failover
US20140289728A1 (en) Apparatus, system, method, and storage medium
US10380041B2 (en) Fabric independent PCIe cluster manager
WO2014101502A1 (en) Memory access processing method based on memory chip interconnection, memory chip, and system
US10038767B2 (en) Technologies for fabric supported sequencers in distributed architectures
CN112416538A (en) Multilayer architecture and management method of distributed resource management framework
KR102663318B1 (en) System and method for intelligent path selection and load balancing
WO2024087663A1 (en) Job scheduling method and apparatus, and chip
US20240160487A1 (en) Flexible gpu resource scheduling method in large-scale container operation environment
KR20240064613A (en) System and method for intelligent path selection and load balancing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14797034

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14797034

Country of ref document: EP

Kind code of ref document: A1