WO2023109554A1 - Data processing method, system, node and storage medium for distributed system - Google Patents

Data processing method, system, node and storage medium for distributed system

Info

Publication number
WO2023109554A1
WO2023109554A1 (PCT/CN2022/136682)
Authority
WO
WIPO (PCT)
Prior art keywords
target
storage node
node
target client
distributed system
Prior art date
Application number
PCT/CN2022/136682
Other languages
English (en)
French (fr)
Inventor
王志超
屠要峰
徐进
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023109554A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0613Improving I/O performance in relation to throughput
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present application relate to the technical field of communications, and in particular, to a data processing method, system, management node, and storage medium of a distributed system.
  • Apache Hadoop has become the de facto standard for big data in industrial production. Because traditional distributed systems bind computing and storage resources together, they suffer from low resource utilization, limited data throughput, and difficulty in elastic scaling.
  • A cloud platform, with its flexible application creation and deployment, portability across cloud platforms and operating systems, resource isolation, and high resource utilization, can solve the above problems faced by traditional distributed systems.
  • The main purpose of the embodiments of the present application is to propose a data processing method, system, management node and storage medium for a distributed system, which avoid the loss of data locality when the distributed system is deployed on a cloud platform and help reduce the consumption of network bandwidth.
  • An embodiment of the present application provides a data processing method for a distributed system, where the distributed system is deployed on a cloud platform and the method is applied to a management node. The method includes: receiving an operation instruction for reading or writing data sent by a target client in the distributed system, wherein the distributed system includes a plurality of hosts and each host is deployed with a client and a storage node; determining the host machine of the target client; querying a first target storage node existing on the host machine of the target client; and returning information of the first target storage node to the target client, so that the target client reads and writes data on the first target storage node.
  • An embodiment of the present application further provides a data processing system of a distributed system, where the distributed system is deployed on a cloud platform. The data processing system includes a management node and the distributed system, and the distributed system includes a plurality of host machines, each of which is deployed with a client and a storage node. The management node is configured to receive an operation instruction for reading or writing data sent by a target client in the distributed system and to determine the host machine of the target client; the management node is further configured to query the first target storage node existing on the host machine of the target client and return information of the first target storage node to the target client; the target client is configured to read and write data on the first target storage node.
  • An embodiment of the present application further provides a management node, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above data processing method of the distributed system.
  • the embodiment of the present application further provides a computer-readable storage medium storing a computer program, and implementing the above-mentioned data processing method of the distributed system when the computer program is executed by a processor.
  • FIG. 1 is a schematic diagram of a client reading data on a storage node under a cloud platform mentioned in the embodiment of the present application;
  • FIG. 2 is a schematic diagram of a distributed system deployed in a cloud environment mentioned in the embodiment of the present application;
  • Fig. 3 is a schematic flowchart of a data processing method of a distributed system mentioned in the embodiment of the present application;
  • Fig. 4 is the network topology diagram of the datanode mentioned in the embodiment of the present application.
  • Fig. 5 is an implementation flowchart of another distributed system data processing method mentioned in the embodiment of the present application.
  • Fig. 6 is the implementation flowchart of writing data in the distributed storage system mentioned in the embodiment of the present application.
  • FIG. 7 is a schematic diagram of the architecture of writing data in the distributed storage system mentioned in the embodiment of the present application.
  • Fig. 8 is a rack topology network diagram mentioned in the embodiment of the present application.
  • FIG. 9 is a schematic diagram of the processing flow for calculation instructions mentioned in the embodiment of the present application.
  • Fig. 10 is an implementation flowchart of executing computing tasks in the distributed resource management and scheduling system mentioned in the embodiment of the present application;
  • Fig. 11 is a schematic diagram of the architecture of executing computing tasks in the distributed resource management scheduling system mentioned in the embodiment of the present application;
  • Fig. 12 is a schematic structural diagram of the management node mentioned in the embodiment of the present application.
  • Data locality refers to the computing-and-storage-coupled architecture that Google creatively proposed to solve the network bandwidth bottleneck of the time: the computing code is moved to where the data resides instead of transferring the data to the computing nodes. Hadoop later adopted the same architecture. Data locality is a very important feature of distributed frameworks and is used to guarantee overall performance.
  • the distributed framework can also be understood as a distributed system or a distributed cluster.
  • Figure 1 is an example of a client reading data on a storage node under a cloud platform. The distributed system has three storage nodes, Node1, Node2 and Node3, and three clients, Client1, Client2 and Client3, each encapsulated by a container instance Pod on one of three physical hosts; a physical host can also be understood as a host machine.
  • For a traditional distributed framework deployed directly on a cluster of physical hosts, Client1 would preferentially select Node1 on the same physical host when reading data. In a cloud environment, however, Client1 cannot recognize that Node1 is deployed on the same physical machine, so it selects a storage node randomly, for example Node2; this greatly increases the consumption of network bandwidth and loses the performance advantage of the native data locality of the distributed framework.
  • a data processing method for distributed systems is provided, which is applied to management nodes.
  • The distributed system is deployed on a cloud platform and includes multiple hosts, that is, physical hosts. A client and a storage node are deployed in each host. A schematic diagram of the distributed system can refer to Figure 1.
  • The number of physical hosts in the distributed system can be greater than or equal to three.
  • In a cloud environment each node can be implemented by a container, so a physical host can be understood as the host machine that carries the containers. Each physical host can run multiple containers, for example a storage container that implements data storage and a computing container that implements computing functions; the client is the initiator that sends read/write operation instructions and computing operation instructions.
  • Refer to Figure 2 for a schematic diagram of a distributed system deployed in a cloud environment.
  • The cloud management platform in Figure 2 can be referred to as the cloud platform for short. The cloud platform can be a Platform as a Service (PaaS), the container orchestration system Kubernetes, or another cloud platform. The cloud platform is used to manage and schedule the distributed services; the distributed services can be Hadoop, ZDH or CDH, and cloud deployment scripts or tools are provided for the services.
  • the distributed service may be a distributed read-write service or a distributed computing service.
  • the aforementioned Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of the distribution.
  • In one embodiment, a management node is deployed in one or two of the physical hosts. A schematic flowchart of the data processing method of the distributed system applied to the management node can refer to FIG. 3, including:
  • Step 101 Receive an operation instruction for reading and writing data sent by a target client in the distributed system.
  • Step 102 Determine the host computer of the target client.
  • Step 103 Query the first target storage node existing on the host machine of the target client.
  • Step 104 Return the information of the first target storage node to the target client for the target client to read and write data on the first target storage node.
  • In the data processing method of the distributed system deployed on the cloud platform provided by the embodiments of the present application, the management node receives an operation instruction for reading or writing data sent by a target client in the distributed system, where the distributed system includes multiple host machines and a client and a storage node are deployed in each host machine; determines the host machine of the target client; queries the first target storage node existing on the host machine of the target client; and returns the information of the first target storage node to the target client so that the target client reads and writes data on the first target storage node. This ensures that, during reading and writing, the distributed system preferentially selects the storage node on the host machine of the target client, that is, the local node is selected first, making full use of local storage resources, avoiding the loss of data locality when the distributed system is deployed on the cloud platform, reducing the consumption of network bandwidth and improving storage performance.
  • In this way the target client can be mapped to the host machine where it actually resides, that is, the physical host, ensuring that during reading and writing the distributed system preferentially selects a storage node on the same physical host as the target client.
  • the target client is the client that sends the operation command in the distributed system
  • the management node receives the data read and write operation command sent by the target client, and the operation command can be a read operation command or a write operation command.
  • For a Hadoop Distributed File System (HDFS), the management node may be a namenode, and the namenode may receive a read operation instruction or a write operation instruction sent by the target client.
  • the management node determines the host computer of the target client, that is, the physical host where the target client resides.
  • In one embodiment, step 102 is implemented as follows: obtain the Universally Unique Identifier (UUID) of the target client, where the IP of the host machine of the target client is embedded in the UUID; the management node parses the UUID of the target client and thereby identifies the host machine of the target client.
  • A UUID is a software construction standard and is also part of the Open Software Foundation's work in the field of distributed computing environments. Its purpose is to let every element in a distributed system carry unique identification information without having that information assigned by a central controller.
  • When the client performs its local configuration, it can add the IP of its host machine to the UUID in the local configuration, so that when reading and writing data the storage node with the same IP address as the client's host machine can be determined. The host machine is the physical host; when the IP address of the physical host is the same as the rack name of a storage node, that storage node is located in the physical host. In this embodiment, the storage node having the same IP address as the host machine of the client can be understood as the storage node having the same IP address as the physical host where the target client resides.
  • In another embodiment, step 102 is implemented as follows: the management node obtains the configuration information of the target client, the configuration information includes the IP address of the host machine where the target client resides, and by parsing this configuration information the management node obtains that IP address and thus determines the host machine of the target client.
  • the management node may query the storage nodes in the distributed system for the first target storage node existing on the host machine of the target client.
  • The storage node present on the host machine of the target client is the first target storage node.
  • The management node returns the information of the first target storage node to the target client so that the target client can read and write data on the first target storage node. It can be understood that if the management node receives a write operation instruction, the target client writes data to the first target storage node, and if the management node receives a read operation instruction, the target client reads data from the first target storage node.
  • In one embodiment, the management node receives a write operation instruction sent by the target client, determines, among the storage nodes of the distributed system, the first storage node that executes the write operation instruction, and controls the target client to write the data into the first storage node, where the first storage node is a storage node in the physical host where the target client resides. That is, in the process of writing data, the data is preferentially written to the storage node located on the same physical host as the target client, and then written in sequence to storage nodes in other physical hosts. Finally, after the data is written, the target client returns a message that the data writing is complete to the namenode, and the namenode updates the metadata information and records the operation log.
  • For example, the write operation instruction is an upload-file command submitted by the target client, hdfs dfs -put file1 /dst, which uploads the local file file1 to the distributed file system HDFS. In this embodiment, the file is first written to the storage node (datanode) in the physical host where the target client resides, and then written to the other datanodes.
  • In one embodiment, the management node receives a read operation instruction sent by the target client, determines, among the storage nodes of the distributed system, the second storage node that stores the data to be read, and controls the target client to read the data from the second storage node. That is, in the process of reading data, local reading is preferred, where local reading means reading from the storage node in the physical host where the target client resides.
  • In one embodiment, each storage node corresponds to a rack name, and the rack name is the IP of the host machine where the Pod of the storage node is located. Step 103, querying the first target storage node existing on the host machine of the target client, can then be implemented as follows: obtain the rack name corresponding to each storage node in the distributed system; query the first target storage node existing on the host machine of the target client according to the rack name corresponding to each storage node and the IP of the host machine of the target client. For example, the rack name corresponding to each storage node is compared with the IP of the host machine of the target client, and the storage node whose rack name is the same as that IP is taken as the first target storage node.
  • the namenode can determine the rack name of the rack to which the datanode of each storage node belongs, and the rack name of each storage node is the IP of the physical host where the Pod of the storage node is located.
  • Each physical host is regarded as a rack, and all nodes on the same physical host are mounted on the same rack. When each storage node starts, it can report its own rack name to the management node.
  • In one embodiment, step 103, querying the first target storage node existing on the host machine of the target client, may be implemented by: acquiring rack information, where the rack information includes the information of the storage nodes deployed in each host machine; and querying, according to the rack information, the first target storage node existing on the host machine of the target client.
  • the rack information includes information about the storage nodes deployed in each host machine, that is, the information about the storage nodes deployed on the host machine of the target client can be accurately known through the rack information. Therefore, according to the rack information, the storage nodes deployed in the host machine of the target client among the storage nodes can be accurately determined, that is, the above-mentioned first target storage node can be accurately determined.
  • In one embodiment, acquiring the rack information includes: acquiring the affiliation reported by each storage node in the distributed system, where the affiliation is the relationship between each storage node and its host machine; constructing a network topology of the storage nodes according to the affiliation; and acquiring the rack information according to the network topology.
  • The above process can be understood as rack awareness. Based on rack awareness, the management node namenode can obtain the datanode network topology shown in Figure 4, in which D1 and R1 are switches and the bottom layer consists of datanodes.
  • the parent of H1 is R1, and the parent of R1 is D1.
  • the namenode may determine the above-mentioned first target storage node through the ownership relationship between each storage node and the host machine of each storage node.
  • In one embodiment, after acquiring the rack information according to the network topology, the method further includes: obtaining, through a preset interface, the information monitored by a listener started by each storage node, where each storage node starts the listener in its corresponding process when the storage node starts; and updating the rack information when it is determined from the monitored information that the rack information has changed.
  • The preset interface may be a REST interface. After the rack information is obtained according to the network topology, the information monitored by the listener started by each storage node is obtained through the REST interface; when each storage node starts, the listener is started in the container instance Pod where the process corresponding to that storage node is located; and if it is determined from the monitored information that the location information of a Pod has been updated, the rack information is updated according to the updated location information of the Pod.
  • a container instance Pod is the smallest unit in a cloud environment, and any required role can be installed on a Pod.
  • a Pod can be a storage node, a computing node, or a client.
  • Kubernetes provides a REST interface for Pods that can be used to monitor Pod location changes. Using this REST interface to check whether a Pod's location has changed makes it possible to obtain the latest rack information and improves the accuracy of the selected first target storage node.
  • the implementation flowchart of the data processing method of the distributed system can refer to FIG. 5, including:
  • Step 201 Receive an operation instruction for reading and writing data sent by a target client in the distributed system.
  • Step 202 Determine the distance between the target client and each storage node in the distributed system.
  • Step 203 According to the distances, sort the storage nodes in ascending order to obtain a sorted list; the first target storage node ranks first in the sorted list.
  • Step 204 Return the sorted list to the target client, so that the target client can read and write data on each storage node according to the sorted list.
  • each storage node is sorted to obtain a sorted list, so that when reading and writing data, the first target storage node can be read and written first according to the sorted list, and then Writing to or reading from the storage nodes next to the first target storage node in sequence is beneficial to reduce the network overhead during storage and increase the overall throughput of the system.
  • For a write operation instruction, after the data is written to the first target storage node, the data is written to the storage nodes after the first target storage node according to the sorted list.
  • For a read operation instruction, the local first target storage node is read first; if the required data cannot be read from the first target storage node, the storage nodes ranked after it in the sorted list are read in turn, and so on, until the required data is read.
  • the management node may obtain the positional relationship between the target client and each physical host in the distributed system, and obtain the distance between the target client and the storage nodes in each physical host according to the positional relationship.
  • Regarding step 203, it can be understood that since the first target storage node and the target client are located on the same physical host, the distance between the first target storage node and the target client is the shortest.
  • Regarding step 204, after the management node controls the target client to write the data to the first target storage node, it can control the target client to write the data, in the order given by the sorted list, to each storage node after the first target storage node. Specifically, after the target client transmits the data to the first target storage node (referred to as storage node 1), storage node 1 writes the received data into its local repository and at the same time transmits it to storage node 2, which follows storage node 1 in the list; storage node 2 writes the data from storage node 1 into its local repository and at the same time transmits the received data to storage node 3, which follows storage node 2; and so on, until all storage nodes have written the data.
  • In one embodiment, taking the distributed storage system HDFS as an example, the target client submits an upload-file instruction (i.e., a write operation instruction), hdfs dfs -put file1 /dst, which uploads the local file file1 to the distributed file system HDFS.
  • In this embodiment, the data to be written, that is, the local file file1, is first written to the datanode located on the same physical host as the target client, and then written to the other datanodes, so as to achieve localized storage.
  • Step 301 The target client sends a file upload instruction to the management node.
  • Figure 7 is a schematic diagram of the architecture of writing data in a distributed storage system.
  • the management node namenode checks the file to be uploaded, determines whether it can be uploaded, and returns the check result to the target client.
  • Step 302 After the target client obtains the check result that the file is allowed to be uploaded, it reads the configuration information of the target client and sends the configuration information to the management node.
  • Step 303 The management node queries the storage node information according to the configuration information of the target client, and checks whether the position of the Pod has changed through the rest interface provided by the cloud platform, and obtains the latest rack information.
  • Step 304 the management node parses the UUID of the target client, obtains the host machine IP, and queries the storage nodes existing on the host machine through rack information.
  • FIG. 8 is a rack topology network diagram in some embodiments.
  • The management node parses the UUID and obtains the information of host 1, that is, physical host 1, and thereby obtains the storage node H2 on the same host; it similarly obtains the storage nodes H5 and H8 on the other two hosts. H2, H5 and H8 are then sorted according to their distance from the target client to obtain a sorted list, and the sorted list is returned to the target client.
  • Step 305 The target client writes the files into each storage node according to the order in the sorted list.
  • The target client can split the file into several data blocks for transmission. Before starting to transmit a data block, the block is cached locally; when the cache size exceeds the size of one data block, the target client establishes a connection with the first storage node in the sorted list (H2) and starts streaming the data. H2 receives the data piece by piece, writes it to its local repository, and at the same time forwards it to the second storage node (H5); H5 likewise receives the data piece by piece, writes it to its local repository, and forwards it to the third storage node (H8), and so on.
  • Step 306 When it is confirmed that the transmission of a data block is completed, the management node updates the metadata information and records the operation log.
  • A data block is transmitted from storage node 1 to storage node 2 and then to storage node 3. After writing the data block, storage node 3 returns a write-complete message to storage node 2, storage node 2 then returns the write-complete message to storage node 1, and storage node 1 returns it to the management node namenode.
  • After this level-by-level calling and returning, once the data block has been transmitted, the target client can send a message to the namenode that the data-block transfer is complete, and only then does the namenode update the metadata information and record the operation log. The remaining data blocks are transmitted in the same way until the whole file has been uploaded.
  • In one embodiment, the cloud platform can provide a real-time monitoring service for the racks. Each storage node and computing node starts a listener in the Pod where its process resides when it starts, so that the current state of the rack can be obtained and the host-machine information queried through the REST interface. When a storage node starts, it builds the network topology of the nodes: each physical host is regarded as a rack, and all nodes on the same physical host are mounted on the same rack. The big-data component client adds the host machine IP to the UUID in its local configuration so that, when reading and writing data, the host IPs of the client and the server can be compared for consistency. When a Pod crashes and is then successfully restarted, it may have migrated to another host machine, so every time a Pod restarts the cloud platform is notified to update the rack information. Once the rack information is determined, the management node sorts the storage nodes by node distance, ensuring that storage nodes on the same host machine are preferentially selected for reading and writing data.
  • computing nodes are also deployed in each physical host, and the data processing method of the distributed system further includes: receiving a computing instruction from a target client, determining the target computing node among computing nodes in the distributed system, and The target computing node is controlled to execute the computing instruction; wherein, the target computing node is located on the same physical host as the storage node where the data required for executing the computing instruction is stored. That is, computing nodes that execute computing instructions are allocated as close as possible to the data required for computing, which helps reduce network overhead during computing.
  • the management node may be a resource management node.
  • For a schematic diagram of the processing flow for a computing instruction, refer to FIG. 9, including:
  • Step 401 Receive a computing instruction from a target client.
  • Step 402 Query a second target storage node that stores data required for executing a computing instruction.
  • Step 403 Determine the host machine of the second target storage node.
  • Step 404 Query the target computing node existing on the host machine of the second target storage node.
  • Step 405 Return the information of the target computing node to the target client, so that the target client can perform data calculation on the target computing node.
  • the computing nodes that execute computing instructions are allocated as close as possible to the data required for computing, which helps reduce network overhead during computing.
  • In step 402, the resource management node can determine, according to the computing instruction, the data required to execute the computing instruction, and then determine the storage node in the distributed system that stores that data; the queried storage node can be called the second target storage node.
  • In step 403, the resource management node can select, among the physical hosts in the distributed system, the physical host where the storage node storing the data needed to execute the computing instruction resides, that is, determine the host machine of the second target storage node. For example, the resource management node can obtain the rack name corresponding to the second target storage node and determine the host machine IP of the second target storage node from that rack name, thereby determining the host machine of the second target storage node. The rack name corresponding to the second target storage node is the same as the IP of its host machine.
  • In step 404, the resource management node can obtain from the Kubernetes api server the affiliation between each physical host and each computing node, that is, which physical host each computing node belongs to. The target computing node existing on the host machine of the second target storage node can therefore be queried according to this affiliation: the resource management node determines, among the computing nodes of the distributed system, the computing node that belongs to the host machine of the second target storage node, and takes that computing node as the target computing node.
  • In one embodiment, computing nodes also have corresponding rack names, and the rack name of a computing node is the IP of the host machine where the Pod of the computing node is located. The resource management node can obtain the rack name of each computing node and query the target computing node existing on the host machine of the second target storage node according to the rack names of the computing nodes and the IP of the host machine of the second target storage node.
  • step 405 the resource management node returns the information of the target computing node to the target client for the target client to perform data calculation on the target computing node, so that the target computing node can read the The data required for calculation is beneficial to reduce the network overhead during calculation.
  • In one embodiment, the management node is a resource management node, and each computing node in the distributed system registers its own IP address with the resource management node when it starts, so that the resource management node can identify each computing node. This facilitates the management of the computing nodes and helps accurately select the target computing node that has the same IP address as the storage node storing the data needed to execute the computing instruction.
  • In one embodiment, taking the distributed resource management and scheduling system YARN (Yet Another Resource Negotiator) as an example, the target client submits a yarn computing task (i.e., the above computing instruction), and the data required to execute the computing instruction (hereinafter referred to as the source file) is stored on the distributed file system HDFS.
  • According to the IP address of the storage node where the source file is located, the resource management node assigns the computing instruction to a computing node with the same IP address as that storage node, so as to achieve localized computing.
  • To better achieve localized computing, the computing node in each physical host can register its own IP address with the resource management node (resourceManager) when it starts; meanwhile the resourceManager obtains the affiliation between physical hosts and computing nodes from the Kubernetes interface service api server, and combines this affiliation to obtain the computing nodes that can perform local computing.
  • Step 501 The target client submits an application program to the resource management node, including the ApplicationMaster program of the application manager, the command to start the ApplicationMaster, the user program, etc.
  • the submitted application program can be understood as a submitted computing task.
  • ApplicationMaster is one of the core components in YARN to coordinate the processes executed by applications in the cluster.
  • Figure 11 is an implementation architecture diagram for executing computing tasks in a distributed resource management and scheduling system.
  • Step 502 The resource management node allocates the first container Container for the application program, and controls the corresponding computing node NodeManager in YARN to start the ApplicationMaster of the application program in the Container.
  • the corresponding NodeManager is one of the core components in YARN.
  • Step 503 ApplicationMaster applies to the resource management node for a computing node to perform computing tasks.
  • Step 504 The resource management node determines the physical host where the source file is located according to the IP address of the storage node where the source file is located, and determines the docker ip of the target computing node under the physical host where the source file is located, and returns it to the application manager ApplicationMaster.
  • the resourceManager obtains the affiliation relationship between the physical host and the computing node from the api server of Kubernetes, and combines the attribution relationship to obtain the docker ip of the target computing node.
  • Step 505 After the ApplicationMaster receives the docker ip of the target computing node, it initiates a computing task to the target computing node.
  • the target computing nodes may include: computing nodes deployed in the host machine 1-3.
  • Step 506 The target computing node executes the computing task after receiving the computing task.
  • Step 507 After the application program finishes running, the ApplicationMaster logs off from the resource management node and closes itself.
  • Likewise, when the distributed computing system selects a computing node, it cannot choose the nearest node for computation. This embodiment obtains the positional relationship between the target client and each physical host to ensure that, during storage, data is preferentially written to the storage node on the same machine, and that, during computation, the computing node is allocated as close as possible to the data, which reduces the network overhead of storage and computation and increases the overall throughput of the system.
  • the data processing method of the distributed system provided in this embodiment is universal, and is generally applicable to the deployment of most big data distributed products in a cloud environment.
  • Without the data processing method of the distributed system of this embodiment, the hit rate of local data in a cloud environment is 20% to 30% lower than in a physical machine environment (this value varies slightly for different cluster environments); after adopting the data processing method of the distributed system of this embodiment, the local data hit rate in the cloud environment is basically equal to that in the physical machine environment, which reasonably solves the problem of data locality in the cloud environment.
  • The step division of the above methods is only for clarity of description. In implementation, steps may be combined into one step or split into multiple steps; as long as the same logical relationship is included, they fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithm or process without changing its core design also falls within the protection scope of this patent.
  • the embodiment of the present application also provides a data processing system of a distributed system.
  • the distributed system is deployed on a cloud platform.
  • The data processing system includes a management node and the distributed system. The distributed system includes multiple host machines, and a client and a storage node are deployed in each host machine. The management node is configured to receive the operation instruction for reading or writing data sent by the target client in the distributed system and to determine the host machine of the target client; the management node is further configured to query the first target storage node existing on the host machine of the target client and return the information of the first target storage node to the target client; the target client is configured to read and write data on the first target storage node.
  • this embodiment is a system embodiment corresponding to the above method embodiment, and this embodiment can be implemented in cooperation with the above method embodiment.
  • the relevant technical details and technical effects mentioned in the foregoing method embodiments are still valid in this implementation manner, and will not be repeated here in order to reduce repetition.
  • the relevant technical details mentioned in this embodiment can also be applied to the above method embodiments.
  • The embodiment of the present application also provides a management node, as shown in FIG. 12, including: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601, wherein the memory stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the above data processing method of the distributed system.
  • the memory 602 and the processor 601 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 601 and various circuits of the memory 602 together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor 601 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the data and transmits the data to the processor 601 .
  • Processor 601 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interface, voltage regulation, power management, and other control functions. And the memory 602 may be used to store data used by the processor 601 when performing operations.
  • the embodiment of the present application also provides a computer-readable storage medium storing a computer program.
  • the above method embodiments are implemented when the computer program is executed by the processor.
  • The storage medium includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method for a distributed system, a system, a management node and a storage medium, relating to the field of communication technology. The data processing method includes: receiving an operation instruction for reading or writing data sent by a target client in the distributed system (101), wherein the distributed system includes a plurality of host machines and a client and a storage node are deployed in each host machine; determining the host machine of the target client (102); querying a first target storage node existing on the host machine of the target client (103); and returning information of the first target storage node to the target client so that the target client reads and writes data on the first target storage node (104).

Description

Data processing method, system, node and storage medium for a distributed system
Related Application
This application claims priority to the Chinese patent application No. 202111531957.5 filed on December 14, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of communication technology, and in particular to a data processing method, system, management node and storage medium for a distributed system.
Background
With the rapid development of the mobile Internet, the Internet of Things, 5G, intelligent terminals and artificial intelligence, data is expanding rapidly and big data is used more and more widely in actual production. All kinds of big data frameworks are becoming increasingly important; Apache Hadoop, for example, has become the de facto standard for big data in industrial production. Because traditional distributed systems bind computing and storage resources together, they suffer from problems such as low resource utilization, limited data throughput and difficulty in elastic scaling. A cloud platform, with its flexible application creation and deployment, portability of distribution across cloud platforms and operating systems, resource isolation and high resource utilization, can solve the above problems faced by traditional distributed systems.
However, deploying a distributed system on a cloud platform also brings new problems, the most important of which is the loss of data locality, which increases the consumption of network bandwidth.
Summary
The main purpose of the embodiments of the present application is to propose a data processing method, system, management node and storage medium for a distributed system, which avoid the loss of data locality when the distributed system is deployed on a cloud platform and help reduce the consumption of network bandwidth.
To at least achieve the above purpose, an embodiment of the present application provides a data processing method for a distributed system, where the distributed system is deployed on a cloud platform and the method is applied to a management node. The method includes: receiving an operation instruction for reading or writing data sent by a target client in the distributed system, wherein the distributed system includes a plurality of host machines and a client and a storage node are deployed in each host machine; determining the host machine of the target client; querying a first target storage node existing on the host machine of the target client; and returning information of the first target storage node to the target client so that the target client reads and writes data on the first target storage node.
To achieve the above purpose, an embodiment of the present application further provides a data processing system of a distributed system, where the distributed system is deployed on a cloud platform. The data processing system includes a management node and the distributed system; the distributed system includes a plurality of host machines, and a client and a storage node are deployed in each host machine. The management node is configured to receive the operation instruction for reading or writing data sent by the target client in the distributed system and to determine the host machine of the target client; the management node is further configured to query the first target storage node existing on the host machine of the target client and return information of the first target storage node to the target client; the target client is configured to read and write data on the first target storage node.
To achieve the above purpose, an embodiment of the present application further provides a management node, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above data processing method of the distributed system.
To at least achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the above data processing method of the distributed system.
Brief Description of the Drawings
Figure 1 is a schematic diagram of a client reading data on a storage node under a cloud platform as mentioned in the embodiments of the present application;
Figure 2 is a schematic diagram of a distributed system deployed in a cloud environment as mentioned in the embodiments of the present application;
Figure 3 is a schematic flowchart of a data processing method of a distributed system as mentioned in the embodiments of the present application;
Figure 4 is the network topology diagram of datanodes as mentioned in the embodiments of the present application;
Figure 5 is an implementation flowchart of another data processing method of a distributed system as mentioned in the embodiments of the present application;
Figure 6 is an implementation flowchart of writing data in a distributed storage system as mentioned in the embodiments of the present application;
Figure 7 is a schematic diagram of the architecture for writing data in a distributed storage system as mentioned in the embodiments of the present application;
Figure 8 is a rack topology network diagram as mentioned in the embodiments of the present application;
Figure 9 is a schematic diagram of the processing flow for a computing instruction as mentioned in the embodiments of the present application;
Figure 10 is an implementation flowchart of executing a computing task in a distributed resource management and scheduling system as mentioned in the embodiments of the present application;
Figure 11 is a schematic diagram of the architecture for executing a computing task in a distributed resource management and scheduling system as mentioned in the embodiments of the present application;
Figure 12 is a schematic structural diagram of the management node as mentioned in the embodiments of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art can understand that many technical details are given in the embodiments of the present application to help the reader better understand the present application. Even without these technical details and the various changes and modifications based on the following embodiments, the technical solutions claimed in the present application can still be implemented. The division into the following embodiments is for convenience of description and should not constitute any limitation on the specific implementation of the present application; the embodiments can be combined with and refer to each other provided they do not contradict one another.
To facilitate understanding of the embodiments of the present application, the related technologies involved in the present application are briefly described first:
Data locality refers to the computing-and-storage-coupled architecture that Google creatively proposed to solve the network bandwidth bottleneck of the time, in which the computing code is moved to where the data resides instead of transferring the data to the computing nodes. Hadoop later adopted this architecture completely. Data locality is a very important feature of distributed frameworks and can be used to guarantee overall performance. In this embodiment, a distributed framework can also be understood as a distributed system or a distributed cluster.
Referring to Figure 1, Figure 1 is an example of a client reading data on a storage node under a cloud platform. The distributed system has three storage nodes, Node1, Node2 and Node3, and three clients, Client1, Client2 and Client3, each encapsulated by a container instance Pod on one of three physical hosts; a physical host can also be understood as a host machine. For a traditional distributed framework deployed directly on a cluster of physical hosts, Client1 would preferentially select Node1 on the same physical host when reading data; in a cloud environment, however, Client1 cannot recognize that Node1 is deployed on the same physical machine, so it selects a storage node randomly, for example Node2. This greatly increases the consumption of network bandwidth and loses the performance advantage of the native data locality of the distributed framework.
With the progress of network technology, network performance has changed dramatically, growing a hundredfold from the previous mainstream of 1 Gb to 100 Gb, while the performance of hard disk drives (HDD) has remained largely unchanged over the same period, although single-disk capacity has increased greatly. The resource bottleneck of big data has shifted from network bandwidth to disk I/O. Since network bandwidth is no longer the performance bottleneck, the degradation caused by losing data locality is limited; however, data that used to be accessed locally is now transferred over the network, which increases bandwidth consumption, so performance still drops significantly, and if network bandwidth becomes the bottleneck, computing and storage resources will be under-utilized.
In the embodiments of the present application, to solve the problem that a distributed system deployed on a cloud platform loses data locality and thus suffers performance degradation, a data processing method for a distributed system is provided and applied to a management node. The distributed system is deployed on a cloud platform and includes multiple host machines, that is, physical hosts, and a client and a storage node are deployed in each host machine; a schematic diagram of the distributed system can refer to Figure 1.
The number of physical hosts in the distributed system can be greater than or equal to three. In a cloud environment each node can be implemented by a container, so a physical host can be understood as the host machine that carries the containers. Each physical host can run multiple containers, for example a storage container that implements data storage and a computing container that implements computing functions; the client is the initiator that sends read/write operation instructions and computing operation instructions. A schematic diagram of a distributed system deployed in a cloud environment can refer to Figure 2.
The cloud management platform in Figure 2 can be referred to as the cloud platform for short. The cloud platform can be a Platform as a Service (PaaS), the container orchestration system Kubernetes, or another cloud platform. The cloud platform is used to manage and schedule distributed services; the distributed services can be Hadoop, ZDH or CDH, and cloud deployment scripts or tools are provided for the services. A distributed service can be a distributed read/write service or a distributed computing service. The above-mentioned Hadoop is a distributed system infrastructure developed by the Apache Foundation; users can develop distributed programs without knowing the underlying details of the distribution.
In one embodiment, a management node is deployed in one or two of the multiple physical hosts. A schematic flowchart of the data processing method of the distributed system applied to the management node can refer to Figure 3, including:
Step 101: Receive an operation instruction for reading or writing data sent by a target client in the distributed system.
Step 102: Determine the host machine of the target client.
Step 103: Query the first target storage node existing on the host machine of the target client.
Step 104: Return the information of the first target storage node to the target client so that the target client reads and writes data on the first target storage node.
In the data processing method of the distributed system deployed on the cloud platform provided by the embodiments of the present application, the management node receives an operation instruction for reading or writing data sent by a target client in the distributed system, where the distributed system includes multiple host machines and a client and a storage node are deployed in each host machine; determines the host machine of the target client; queries the first target storage node existing on the host machine of the target client; and returns the information of the first target storage node to the target client so that the target client reads and writes data on the first target storage node. This ensures that, during reading and writing, the distributed system preferentially selects a storage node on the host machine of the target client, that is, the local node is selected first, making full use of local storage resources, avoiding the loss of data locality when the distributed system is deployed on the cloud platform, reducing the consumption of network bandwidth and improving storage performance.
In this embodiment, the target client can be mapped to the host machine where it actually resides, that is, the physical host, ensuring that during reading and writing the distributed system preferentially selects a storage node on the same physical host as the target client, that is, the local node is selected first, making full use of local storage resources, avoiding the loss of data locality when the distributed system is deployed on the cloud platform, reducing the consumption of network bandwidth and improving storage performance.
In step 101, the target client is the client in the distributed system that sends the operation instruction. The management node receives the data read/write operation instruction sent by the target client; the operation instruction can be a read operation instruction or a write operation instruction.
In one embodiment, for a distributed storage system (Hadoop Distributed File System, HDFS), the management node may be a namenode, and the namenode may receive a read operation instruction or a write operation instruction sent by the target client.
In step 102, the management node determines the host machine of the target client, that is, the physical host where the target client resides.
In one embodiment, step 102 is implemented as follows: obtain the Universally Unique Identifier (UUID) of the target client, where the IP of the host machine of the target client is added to the UUID; the management node parses the UUID of the target client and determines the host machine of the target client. When the target client is configured, the IP of the host machine where it resides can be added to the UUID. A UUID is a software construction standard and is also part of the Open Software Foundation's work in the field of distributed computing environments; its purpose is to let every element in a distributed system carry unique identification information without having that information assigned by a central controller.
When the client performs its local configuration, the IP of the host machine can be added to the UUID in the local configuration, so that when reading and writing data the storage node with the same IP address as the client's host machine can be determined. The host machine is the physical host; the physical host and the storage node have the same IP address, and when the IP address of the physical host is the same as the rack name of the storage node, the storage node is located in that physical host. In this embodiment, the storage node having the same IP address as the client's host machine can be understood as the storage node having the same IP address as the physical host where the target client resides.
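The paragraph above describes embedding the host machine's IP in the client's UUID so that the management node can recover it. The Java sketch below is a minimal, hypothetical illustration of that idea; the HOST_IP_SEPARATOR convention, the buildClientId/parseHostIp helpers and the way the host IP is read from an environment variable are assumptions made for illustration, not the patent's actual identifier format.

```java
import java.util.UUID;

// Minimal sketch: tag a client identifier with the IP of its host machine so a
// management node (e.g. an HDFS namenode) can map the client back to its physical host.
public final class ClientIdentity {
    // Hypothetical separator between the random UUID part and the embedded host IP.
    private static final String HOST_IP_SEPARATOR = "@";

    // Client side: build an identifier such as "550e8400-...-446655440000@10.0.3.17".
    // In a containerized deployment the host IP could be injected into an environment
    // variable (for example via the Kubernetes downward API); that is an assumption here.
    public static String buildClientId() {
        String hostIp = System.getenv().getOrDefault("HOST_IP", "127.0.0.1");
        return UUID.randomUUID() + HOST_IP_SEPARATOR + hostIp;
    }

    // Management-node side: parse the identifier and recover the host machine IP.
    public static String parseHostIp(String clientId) {
        int idx = clientId.lastIndexOf(HOST_IP_SEPARATOR);
        if (idx < 0 || idx == clientId.length() - 1) {
            throw new IllegalArgumentException("No host IP embedded in client id: " + clientId);
        }
        return clientId.substring(idx + 1);
    }

    public static void main(String[] args) {
        String id = buildClientId();
        System.out.println("client id = " + id);
        System.out.println("host ip   = " + parseHostIp(id));
    }
}
```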
In one embodiment, step 102 is implemented as follows: the management node obtains the configuration information of the target client, the configuration information includes the IP address of the host machine where the target client resides, and by parsing the configuration information the management node can obtain that IP address, thereby determining the host machine of the target client.
In step 103, the management node can query, among the storage nodes in the distributed system, the first target storage node existing on the host machine of the target client. The storage node present on the host machine of the target client is the first target storage node.
In step 104, the management node returns the information of the first target storage node to the target client so that the target client reads and writes data on the first target storage node. It can be understood that if the management node receives a write operation instruction, the target client writes data to the first target storage node, and if the management node receives a read operation instruction, the target client reads data from the first target storage node.
In one embodiment, the management node receives a write operation instruction sent by the target client, determines, among the storage nodes of the distributed system, the first storage node that executes the write operation instruction, and controls the target client to write the data into the first storage node, where the first storage node is a storage node in the physical host where the target client resides. That is, in the process of writing data, the data is preferentially written to the storage node located on the same physical host as the target client, and then written in sequence to storage nodes in other physical hosts. Finally, after the data is written, the target client returns a message that the data writing is complete to the namenode, and the namenode updates the metadata information and records the operation log.
For example, if the write operation instruction is an upload-file command submitted by the target client, hdfs dfs -put file1 /dst, which uploads the local file file1 to the distributed file system HDFS, then in this embodiment the file is first written to the storage node (datanode) in the physical host where the target client resides, and then written to the other datanodes.
In one embodiment, the management node receives a read operation instruction sent by the target client, can determine, among the storage nodes of the distributed system, the second storage node that stores the data to be read, and controls the target client to read the data to be read from the second storage node. That is, in the process of reading data, local reading is preferred, where local reading means reading from the storage node in the physical host where the target client resides.
In one embodiment, each storage node corresponds to a rack name, and the rack name is the IP of the host machine where the Pod of the storage node is located. Step 103, querying the first target storage node existing on the host machine of the target client, can then be implemented as follows: obtain the rack name corresponding to each storage node in the distributed system; query the first target storage node existing on the host machine of the target client according to the rack name corresponding to each storage node and the IP of the host machine of the target client. For example, the rack name corresponding to each storage node is compared with the IP of the host machine of the target client, and the storage node whose rack name is the same as that IP is taken as the first target storage node. In this embodiment, the namenode can determine the rack name of the rack to which each datanode belongs, and the rack name of each storage node is the IP of the physical host where the Pod of that storage node is located. Each physical host is regarded as a rack, and all nodes on the same physical host are mounted on the same rack. When each storage node starts, it can report its own rack name to the management node.
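To illustrate the rack-name comparison just described, the following Java sketch selects the local datanode by matching each storage node's rack name (the IP of its host machine) against the target client's host IP. The StorageNode record and the list of candidates are hypothetical stand-ins for the management node's internal bookkeeping, not the actual HDFS namenode data structures.

```java
import java.util.List;
import java.util.Optional;

// Sketch: pick the "first target storage node", i.e. the storage node whose rack name
// (= IP of the physical host carrying its Pod) equals the target client's host IP.
public final class LocalNodeSelector {

    // Hypothetical view of a storage node as seen by the management node.
    public record StorageNode(String nodeId, String rackName) {}

    public static Optional<StorageNode> selectFirstTarget(List<StorageNode> storageNodes,
                                                          String clientHostIp) {
        return storageNodes.stream()
                .filter(node -> node.rackName().equals(clientHostIp))
                .findFirst();
    }

    public static void main(String[] args) {
        List<StorageNode> nodes = List.of(
                new StorageNode("H2", "10.0.3.17"),   // same host as the client
                new StorageNode("H5", "10.0.3.18"),
                new StorageNode("H8", "10.0.3.19"));
        String clientHostIp = "10.0.3.17";            // parsed from the client's UUID
        System.out.println(selectFirstTarget(nodes, clientHostIp)
                .map(StorageNode::nodeId)
                .orElse("no local datanode; fall back to a remote node"));
    }
}
```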
In one embodiment, step 103, querying the first target storage node existing on the host machine of the target client, may be implemented as follows: acquire rack information, where the rack information includes the information of the storage nodes deployed in each host machine; and query, according to the rack information, the first target storage node existing on the host machine of the target client.
In this embodiment, because the rack information includes the information of the storage nodes deployed in each host machine, the storage nodes deployed on the host machine of the target client can be accurately known from the rack information. Therefore, according to the rack information, the storage node deployed in the host machine of the target client can be accurately determined, that is, the above first target storage node can be accurately determined.
In one embodiment, acquiring the rack information includes: acquiring the affiliation reported by each storage node in the distributed system, where the affiliation is the relationship between each storage node and its host machine; constructing a network topology of the storage nodes according to the affiliation; and acquiring the rack information according to the network topology. The above process can be understood as rack awareness. Based on rack awareness, the management node namenode can obtain the datanode network topology shown in Figure 4, in which D1 and R1 are switches and the bottom layer consists of datanodes; the parent of H1 is R1, and the parent of R1 is D1. The namenode can determine the above first target storage node through the affiliation between each storage node and its host machine.
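As a concrete illustration of building rack information from the affiliations that storage nodes report, the sketch below maintains a simple host-IP-to-datanodes map. It is a simplified stand-in for the tree-shaped topology of Figure 4 (switch, rack, datanode); the reportAffiliation method name and the map-based representation are assumptions made for brevity rather than the namenode's real topology structure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: a management node collects "storage node -> host machine" affiliations reported
// at datanode startup and turns them into rack information (one rack per physical host).
public final class RackInfoBuilder {

    // rack name (= host machine IP) -> datanodes mounted on that rack
    private final Map<String, List<String>> rackToDatanodes = new LinkedHashMap<>();

    // Called when a storage node starts and reports which host machine it belongs to.
    public synchronized void reportAffiliation(String datanodeId, String hostIp) {
        rackToDatanodes.computeIfAbsent(hostIp, ip -> new ArrayList<>()).add(datanodeId);
    }

    // Rack information: which storage nodes are deployed on a given host machine.
    public synchronized List<String> datanodesOnHost(String hostIp) {
        return List.copyOf(rackToDatanodes.getOrDefault(hostIp, List.of()));
    }

    public static void main(String[] args) {
        RackInfoBuilder builder = new RackInfoBuilder();
        builder.reportAffiliation("H2", "10.0.3.17");
        builder.reportAffiliation("H5", "10.0.3.18");
        builder.reportAffiliation("H8", "10.0.3.19");
        System.out.println("datanodes on 10.0.3.17: " + builder.datanodesOnHost("10.0.3.17"));
    }
}
```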
In one embodiment, after acquiring the rack information according to the network topology, the method further includes: obtaining, through a preset interface, the information monitored by a listener started by each storage node, where each storage node starts the listener in its corresponding process when the storage node starts; and updating the rack information when it is determined from the monitored information that the rack information has changed.
In one embodiment, the preset interface may be a REST interface. After the rack information is acquired according to the network topology, the information monitored by the listener started by each storage node is obtained through the REST interface; when each storage node starts, the listener is started in the container instance Pod where the process corresponding to that storage node is located; and if it is determined from the monitored information that the location information of a Pod has been updated, the rack information is updated according to the updated location information of the Pod. A container instance Pod is the smallest unit in a cloud environment, and any required role can be installed on a Pod; for example, a Pod can be a storage node, a computing node or a client.
Because the relationship between a Pod and its physical host in the container orchestration system Kubernetes is not permanently fixed, a Pod may keep the same Pod IP after being destroyed and recreated yet land on a different physical host. Therefore, in this embodiment, Pod changes can be monitored in real time and the overall rack information updated. Kubernetes provides a REST interface for Pods that can be used to monitor Pod location changes; using this REST interface to check whether a Pod's location has changed makes it possible to obtain the latest rack information and improves the accuracy of the selected first target storage node.
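The embodiment relies on the Kubernetes REST interface to detect when a Pod has moved to another host. The Java sketch below polls the standard GET /api/v1/namespaces/{namespace}/pods/{name} endpoint and extracts the status.hostIP field; the API-server URL, the bearer-token handling and the naive string-based JSON extraction are simplifications for illustration only, and a real deployment would use an official Kubernetes client library with proper TLS and authentication.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: query the Kubernetes API server for a Pod and read status.hostIP, so the
// management node can tell whether the Pod has migrated to a different physical host.
public final class PodHostIpChecker {

    private static final Pattern HOST_IP = Pattern.compile("\"hostIP\"\\s*:\\s*\"([^\"]+)\"");
    private final HttpClient http = HttpClient.newHttpClient();
    private final String apiServer;   // e.g. "https://kubernetes.default.svc" (assumed)
    private final String token;       // service-account bearer token (assumed available)

    public PodHostIpChecker(String apiServer, String token) {
        this.apiServer = apiServer;
        this.token = token;
    }

    public String currentHostIp(String namespace, String podName) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(apiServer + "/api/v1/namespaces/" + namespace + "/pods/" + podName))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
        String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
        Matcher m = HOST_IP.matcher(body);   // crude extraction; a real client would parse the JSON
        if (!m.find()) {
            throw new IllegalStateException("No hostIP in Pod status for " + podName);
        }
        return m.group(1);
    }

    // If the host IP differs from the recorded rack name, the rack information must be updated.
    public boolean hasMigrated(String recordedRackName, String namespace, String podName) throws Exception {
        return !recordedRackName.equals(currentHostIp(namespace, podName));
    }
}
```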
In one embodiment, an implementation flowchart of the data processing method of the distributed system can refer to Figure 5, including:
Step 201: Receive an operation instruction for reading or writing data sent by a target client in the distributed system.
Step 202: Determine the distance between the target client and each storage node in the distributed system.
Step 203: Sort the storage nodes in ascending order of distance to obtain a sorted list, where the first target storage node ranks first in the sorted list.
Step 204: Return the sorted list to the target client so that the target client reads and writes data on the storage nodes according to the sorted list.
In this embodiment, the storage nodes are sorted according to their distances from the target client to obtain a sorted list, so that when reading and writing data the first target storage node can be read or written first according to the sorted list, and the storage nodes after the first target storage node can then be written to or read from in order of proximity, which helps reduce the network overhead of storage and increases the overall throughput of the system. For a write operation instruction, after the data is written to the first target storage node, the data is written to the storage nodes after the first target storage node according to the sorted list. For a read operation instruction, the local first target storage node is read first; if the required data cannot be read from the first target storage node, the storage nodes ranked after it in the sorted list are read in turn, and so on, until the required data is read.
In step 202, the management node can obtain the positional relationship between the target client and each physical host in the distributed system, and obtain the distance between the target client and the storage nodes in each physical host according to the positional relationship.
Regarding step 203, it can be understood that since the first target storage node and the target client are located on the same physical host, the distance between the first target storage node and the target client is the shortest.
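The sorted list of steps 202-204 can be pictured with the small Java sketch below, which orders candidate storage nodes by a distance value and puts the node on the client's own host first. The integer distance metric (0 for the same host, 1 otherwise) is an assumption chosen for illustration; the patent only requires that nodes be sorted from nearest to farthest.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: build the sorted list returned to the target client, nearest storage node first.
public final class NodeSorter {

    public record Candidate(String nodeId, String rackName) {}

    // Hypothetical distance metric: 0 if the node sits on the client's host, 1 otherwise.
    static int distance(Candidate node, String clientHostIp) {
        return node.rackName().equals(clientHostIp) ? 0 : 1;
    }

    public static List<Candidate> sortedList(List<Candidate> candidates, String clientHostIp) {
        return candidates.stream()
                .sorted(Comparator.comparingInt((Candidate node) -> distance(node, clientHostIp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> nodes = List.of(
                new Candidate("H5", "10.0.3.18"),
                new Candidate("H2", "10.0.3.17"),   // local node, should rank first
                new Candidate("H8", "10.0.3.19"));
        sortedList(nodes, "10.0.3.17").forEach(node -> System.out.println(node.nodeId()));
    }
}
```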
In step 204, after controlling the target client to write the data to the first target storage node, the management node can control the target client to write the data, in the order given by the sorted list, to each storage node after the first target storage node. Specifically, after the target client transmits the data to the first target storage node (referred to as storage node 1), storage node 1 can write the received data into its local repository and at the same time transmit it to storage node 2, which follows storage node 1; storage node 2 writes the data from storage node 1 into its local repository and at the same time transmits the received data to storage node 3, which follows storage node 2; and so on, until all storage nodes have written the data.
In one embodiment, taking the distributed storage system HDFS as an example, the target client submits an upload-file instruction (i.e., a write operation instruction), hdfs dfs -put file1 /dst, which uploads the local file file1 to the distributed file system HDFS. In this embodiment, the data to be written, that is, the local file file1, is first written to the datanode located on the same physical host as the target client, and then written to the other datanodes, so as to achieve localized storage. An implementation flowchart of writing data in the distributed storage system is shown in Figure 6, including:
Step 301: The target client sends an upload-file instruction to the management node.
Referring to Figure 7, which is a schematic diagram of the architecture for writing data in the distributed storage system, the management node namenode checks the file to be uploaded, determines whether it can be uploaded, and returns the check result to the target client.
Step 302: After obtaining a check result that the file is allowed to be uploaded, the target client reads its configuration information and sends the configuration information to the management node.
Step 303: The management node queries the storage node information according to the configuration information of the target client, checks whether the position of any Pod has changed through the REST interface provided by the cloud platform, and obtains the latest rack information.
Step 304: The management node parses the UUID of the target client, obtains the host machine IP, and queries, through the rack information, the storage nodes existing on that host machine.
Referring to Figure 8, which is a rack topology network diagram in some embodiments, the management node parses the UUID and obtains the information of host 1, that is, physical host 1, and thereby obtains the storage node H2 on the same host; it similarly obtains the storage nodes H5 and H8 on the other two hosts. H2, H5 and H8 are then sorted according to their distance from the target client to obtain a sorted list, and the sorted list is returned to the target client.
Step 305: The target client writes the file to the storage nodes according to the order in the sorted list.
The target client can split the file into several data blocks for transmission. Before starting to transmit a data block, the block is cached locally; when the cache size exceeds the size of one data block, the target client establishes a connection with the first storage node in the sorted list (H2) and starts streaming the data. H2 receives the data piece by piece, writes it to its local repository, and at the same time forwards it to the second storage node (H5); H5 likewise receives the data piece by piece, writes it to its local repository, and forwards it to the third storage node (H8), and so on.
Step 306: When the transmission of a data block is confirmed to be complete, the management node updates the metadata information and records the operation log.
Referring to Figure 7, a data block is transmitted from storage node 1 to storage node 2 and then to storage node 3. After writing the data block, storage node 3 returns a write-complete message to storage node 2, storage node 2 then returns the write-complete message to storage node 1, and storage node 1 returns it to the management node namenode. After this level-by-level calling and returning, once the data block has been transmitted, the target client can send a message to the namenode that the data-block transfer is complete, and only then does the namenode update the metadata information and record the operation log.
After the first data block has been transmitted, the remaining data blocks are transmitted in the same way until the whole file has been uploaded.
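The block-by-block pipeline of steps 305-306 can be summarized with the following Java sketch, in which each storage node writes a chunk locally and forwards it to the next node in the sorted list, and acknowledgements travel back along the chain. The in-memory PipelineNode abstraction replaces the real network streaming and is purely illustrative.

```java
// Sketch of the replication pipeline: client -> node 1 -> node 2 -> node 3, with each
// node writing the chunk to its local repository before forwarding it downstream,
// and the write-complete acknowledgement propagating back upstream.
public final class WritePipeline {

    public static final class PipelineNode {
        private final String nodeId;
        private final PipelineNode downstream;   // null for the last node in the sorted list

        public PipelineNode(String nodeId, PipelineNode downstream) {
            this.nodeId = nodeId;
            this.downstream = downstream;
        }

        // Returns true once this node and every downstream node have written the chunk.
        public boolean writeChunk(byte[] chunk) {
            System.out.println(nodeId + " wrote " + chunk.length + " bytes to its local repository");
            boolean downstreamAck = downstream == null || downstream.writeChunk(chunk);
            System.out.println(nodeId + " acknowledges the write");
            return downstreamAck;
        }
    }

    public static void main(String[] args) {
        // Build the pipeline in the order of the sorted list: H2 (local) -> H5 -> H8.
        PipelineNode h8 = new PipelineNode("H8", null);
        PipelineNode h5 = new PipelineNode("H5", h8);
        PipelineNode h2 = new PipelineNode("H2", h5);

        byte[] chunk = new byte[4096];           // one small piece of a data block
        boolean ack = h2.writeChunk(chunk);
        System.out.println("client received final ack: " + ack);
        // In the real flow the client then notifies the namenode, which updates metadata.
    }
}
```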
In one embodiment, the cloud platform can provide a real-time rack monitoring service. Each storage node and compute node, when starting, starts a listener in the Pod where its process resides, so that the current rack state can be obtained and host information queried through the REST interface. When a storage node starts, it builds the node network topology; each physical host serves as one rack, and all nodes on the same physical host are mounted on the same rack. The big-data component client adds the host IP into the UUID in its local configuration, so that during data reads and writes the host IPs of the client and the server can be compared for consistency. When a Pod crashes and is successfully restarted, it may be migrated to another host, so every Pod restart notifies the cloud platform to update the rack information. Once the rack information is determined, the management node sorts the storage nodes by node distance, ensuring that storage nodes on the same host are preferentially selected for data reads and writes.
In one embodiment, a compute node is also deployed on each physical host, and the data processing method of the distributed system further includes: receiving a compute instruction from the target client, determining a target compute node among the compute nodes in the distributed system, and controlling the target compute node to execute the compute instruction, where the target compute node is located on the same physical host as the storage node that stores the data required to execute the compute instruction. That is, the compute node that executes the compute instruction is allocated as close as possible to the data required for the computation, which helps reduce the network overhead of computation.
In one embodiment, the management node may be a resource management node. A schematic flow of processing a compute instruction, shown in Figure 9, includes:
Step 401: receiving a compute instruction from the target client.
Step 402: querying the second target storage node that stores the data required to execute the compute instruction.
Step 403: determining the host of the second target storage node.
Step 404: querying the target compute node present on the host of the second target storage node.
Step 405: returning the information of the target compute node to the target client, so that the target client performs data computation on the target compute node.
In this embodiment, the compute node that executes the compute instruction is allocated as close as possible to the data required for the computation, which helps reduce the network overhead of computation.
In step 402, the resource management node may determine, according to the compute instruction, the data required to execute it, and then determine, in the distributed system, the storage node that stores this data; the storage node found in this way may be called the second target storage node.
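A minimal sketch of this lookup, assuming a block-location map is already available from the storage layer; the map and its shape are illustrative assumptions:

def second_target_storage_nodes(required_blocks: list[str],
                                block_locations: dict[str, list[str]]) -> set[str]:
    # block_locations maps block id -> datanodes holding a replica of that block.
    nodes: set[str] = set()
    for block in required_blocks:
        nodes.update(block_locations.get(block, []))
    return nodes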
In step 403, the resource management node may select, among the physical hosts in the distributed system, the physical host on which the storage node storing the data required to execute the compute instruction resides, i.e., determine the host of the second target storage node. For example, the resource management node may obtain the rack name corresponding to the second target storage node and, from this rack name, determine the IP of the host of the second target storage node and thus the host itself; the rack name corresponding to the second target storage node is identical to the IP of its host.
In step 404, the resource management node may obtain, from the Kubernetes api server, the affiliation relationships between the physical hosts and the compute nodes, i.e., which physical host each compute node belongs to. The target compute node present on the host of the second target storage node can then be queried according to these affiliation relationships. That is, the resource management node may, according to the host of the second target storage node and the affiliation relationships, determine among the compute nodes of the distributed system the compute node belonging to that host, and take it as the target compute node.
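A minimal sketch of this query, assuming the official Kubernetes Python client, that the compute-node Pods carry a label such as app=nodemanager, and a namespace named "hadoop"; all of these names are illustrative assumptions:

from kubernetes import client, config

def compute_nodes_on_host(target_host_ip: str, namespace: str = "hadoop") -> list[str]:
    # Ask the api server which NodeManager Pods run on the given physical host
    # and return their Pod IPs (the addresses that would be handed back to the client).
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector="app=nodemanager")
    return [p.status.pod_ip for p in pods.items if p.status.host_ip == target_host_ip]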
In one embodiment, each compute node also has a corresponding rack name, which is the IP of the host where the compute node's Pod resides. The resource management node may obtain the rack name of each compute node and, according to the rack names of the compute nodes and the IP of the host of the second target storage node, query the target compute node present on that host.
In step 405, the resource management node returns the information of the target compute node to the target client, so that the target client performs data computation on the target compute node. In this way, the target compute node can read the data it needs from its own machine when performing the computation, which helps reduce the network overhead of computation.
In one embodiment, the management node is a resource management node, and each compute node in the distributed system registers its own IP address with the resource management node when it starts, so that the resource management node can identify the compute nodes. This facilitates their management and helps accurately select the target compute node having the same IP address as the storage node that stores the data required to execute the compute instruction.
In one embodiment, taking the distributed resource management and scheduling system YARN (Yet Another Resource Negotiator) as an example, the target client submits a yarn compute task (i.e., the compute instruction described above), and the data required to execute the compute instruction (hereinafter referred to as the source file) is stored on the distributed file system HDFS. The resource management node assigns the compute instruction, according to the IP address of the storage node where the source file resides, to a compute node with the same IP address as that storage node for execution, so as to achieve compute localization.
To better achieve compute localization, the compute node on each physical host may register its own IP address with the resource management node (resourceManager) when it starts; at the same time, the resourceManager obtains the affiliation relationships between physical hosts and compute nodes from the Kubernetes interface service (api server), and uses these relationships to determine the compute nodes capable of local computation. An implementation flow of executing a compute task in the distributed resource management and scheduling system, shown in Figure 10, includes:
Step 501: the target client submits an application to the resource management node, including the application manager (ApplicationMaster) program, the command to start the ApplicationMaster, the user program, and so on; the submitted application can be understood as the submitted compute task. The ApplicationMaster is the process that coordinates the execution of an application in the cluster and is one of the core components of YARN.
Reference may also be made to Figure 11, which is an implementation architecture diagram of executing a compute task in the distributed resource management and scheduling system.
Step 502: the resource management node allocates the first container (Container) for the application and controls the corresponding YARN compute node (NodeManager) to start the application's ApplicationMaster in the Container. The NodeManager is one of the core components of YARN.
Step 503: the ApplicationMaster requests a compute node from the resource management node to execute the compute task.
Step 504: the resource management node determines, according to the IP address of the storage node where the source file resides, the physical host where the source file is located, determines the docker ip of the target compute node on that physical host, and returns it to the ApplicationMaster. Specifically, the resourceManager obtains the affiliation relationships between physical hosts and compute nodes from the Kubernetes api server and uses them to obtain the docker ip of the target compute node.
Step 505: after receiving the docker ip of the target compute node, the ApplicationMaster initiates the compute task on the target compute node. As can be seen from Figure 11, the target compute nodes may include the compute nodes deployed on hosts 1 to 3.
Step 506: the target compute node executes the compute task after receiving it.
Step 507: after the application finishes running, the ApplicationMaster deregisters with the resource management node and shuts itself down.
This embodiment can solve the problem that deploying a distributed system on a cloud platform degrades native storage and read/write performance. When the hadoop service is deployed in a cloud environment, unlike an ordinary deployment, the roles of the storage and compute components (i.e., the storage nodes and compute nodes) are distributed across different Pods with different IP addresses and regard one another as different nodes. As a result, when the distributed storage system selects storage nodes for reads and writes, it cannot preferentially select a storage node with the same IP as the client, and when the distributed compute system selects compute nodes, it cannot select a nearby compute node for the computation. By obtaining the positional relationships between the target client and the physical hosts, this embodiment ensures that data is preferentially written to the local storage node during storage and that compute nodes are allocated as close to the data as possible during computation, thereby reducing the network overhead of storage and computation and increasing the overall throughput of the system. The data processing method of the distributed system provided in this embodiment is general-purpose and broadly applicable to most big-data distributed products deployed in cloud environments.
Without the data processing method of the distributed system of this embodiment, the local-data hit rate in a cloud environment is 20% to 30% lower than in a physical-machine environment (this value varies slightly across different cluster environments). With the data processing method of the distributed system of this embodiment, the local-data hit rates in the cloud environment and the physical-machine environment are essentially on par, reasonably solving the data-locality problem in cloud environments.
It should be noted that the above examples in the embodiments of this application are illustrations provided for ease of understanding and do not limit the technical solutions of this application.
The division of the steps in the above methods is only for clarity of description. In implementation, steps may be combined into one step or a step may be split into multiple steps; as long as the same logical relationship is included, they fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, an algorithm or a flow without changing the core design of the algorithm and flow also falls within the protection scope of this patent.
An embodiment of this application further provides a data processing system of a distributed system, where the distributed system is deployed on a cloud platform. The data processing system includes a management node and the distributed system; the distributed system includes multiple hosts, and a client and a storage node are deployed on each host. The management node is configured to receive a data read/write operation instruction sent by a target client in the distributed system and determine the host of the target client; the management node is further configured to query the first target storage node present on the target client's host and return the information of the first target storage node to the target client; and the target client is configured to perform data reads and writes on the first target storage node.
It is easy to see that this embodiment is a system embodiment corresponding to the above method embodiments and can be implemented in cooperation with them. The related technical details and technical effects mentioned in the above method embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the related technical details mentioned in this embodiment can also be applied to the above method embodiments.
An embodiment of this application further provides a management node, as shown in Figure 12, including: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601, where the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 to enable the at least one processor 601 to perform the above data processing method of the distributed system.
The memory 602 and the processor 601 are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of one or more processors 601 and the memory 602 together. The bus may also connect various other circuits such as peripherals, voltage regulators and power management circuits, all of which are well known in the art and therefore not further described herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, for example multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 601 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 601.
The processor 601 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfacing, voltage regulation, power management and other control functions. The memory 602 may be used to store data used by the processor 601 when performing operations.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method embodiments are implemented.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by instructing the relevant hardware through a program. The program is stored in a storage medium and includes several instructions to cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
A person of ordinary skill in the art can understand that the above embodiments are specific embodiments for implementing this application, and that various changes in form and detail may be made to them in practical applications without departing from the spirit and scope of this application.

Claims (12)

  1. A data processing method for a distributed system, wherein the distributed system is deployed on a cloud platform and the method is applied to a management node, the method comprising:
    receiving a data read/write operation instruction sent by a target client in the distributed system; wherein the distributed system comprises a plurality of hosts, and a client and a storage node are deployed on each of the hosts;
    determining a host of the target client;
    querying a first target storage node present on the host of the target client; and
    returning information of the first target storage node to the target client, so that the target client performs data reads and writes on the first target storage node.
  2. The data processing method for a distributed system according to claim 1, wherein each of the hosts serves as one rack, and the querying a first target storage node present on the host of the target client comprises:
    obtaining rack information; wherein the rack information comprises information of the storage nodes deployed on each of the hosts; and
    querying, according to the rack information, the first target storage node present on the host of the target client.
  3. The data processing method for a distributed system according to claim 2, wherein the obtaining rack information comprises:
    obtaining affiliation relationships reported by the storage nodes in the distributed system; wherein the affiliation relationships are the affiliation relationships between the storage nodes and the hosts of the storage nodes;
    constructing a network topology of the storage nodes according to the affiliation relationships; and
    obtaining the rack information according to the network topology.
  4. The data processing method for a distributed system according to claim 3, wherein after the obtaining the rack information according to the network topology, the method further comprises:
    obtaining, through a preset interface, information monitored by listeners started by the storage nodes; wherein each of the storage nodes, when starting, starts a listener within the process corresponding to that storage node; and
    updating the rack information when it is determined, according to the monitored information, that the rack information has been updated.
  5. The data processing method for a distributed system according to claim 2, wherein the determining a host of the target client comprises:
    obtaining a universally unique identifier (UUID) of the target client; wherein an IP of the host of the target client is added to the UUID; and
    parsing the UUID of the target client to determine the host of the target client.
  6. The data processing method for a distributed system according to claim 1, wherein each of the storage nodes has a corresponding rack name, and the rack name is the IP of the host where the Pod of the storage node resides;
    the querying a first target storage node present on the host of the target client comprises:
    obtaining the rack names corresponding to the storage nodes in the distributed system; and
    querying, according to the rack names corresponding to the storage nodes and the IP of the host of the target client, the first target storage node present on the host of the target client.
  7. The data processing method for a distributed system according to claim 1, wherein after the receiving a data read/write operation instruction sent by a target client in the distributed system, the method further comprises:
    determining distances between the target client and the storage nodes in the distributed system; and
    sorting the storage nodes in ascending order of the distances to obtain a sorted list; wherein the node ranked first in the sorted list is the first target storage node; and
    the returning information of the first target storage node to the target client, so that the target client performs data reads and writes on the first target storage node, comprises:
    returning the sorted list to the target client, so that the target client performs data reads and writes on the storage nodes according to the sorted list.
  8. The data processing method for a distributed system according to claim 1, wherein a compute node is further deployed on each of the hosts, and the method further comprises:
    receiving a compute instruction of the target client;
    querying a second target storage node that stores data required for executing the compute instruction;
    determining a host of the second target storage node;
    querying a target compute node present on the host of the second target storage node; and
    returning information of the target compute node to the target client, so that the target client performs data computation on the target compute node.
  9. The data processing method for a distributed system according to claim 8, wherein the management node is a resource management node, and each compute node in the distributed system registers its own IP address with the resource management node when starting.
  10. A data processing system of a distributed system, wherein the distributed system is deployed on a cloud platform, the data processing system comprises a management node and the distributed system, the distributed system comprises a plurality of hosts, and a client and a storage node are deployed on each of the hosts;
    the management node is configured to receive a data read/write operation instruction sent by a target client in the distributed system, and to determine a host of the target client;
    the management node is further configured to query a first target storage node present on the host of the target client, and to return information of the first target storage node to the target client; and
    the target client is configured to perform data reads and writes on the first target storage node.
  11. A management node, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the data processing method for a distributed system according to any one of claims 1 to 9.
  12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data processing method for a distributed system according to any one of claims 1 to 9.
PCT/CN2022/136682 2021-12-14 2022-12-05 Data processing method for distributed system, and system, node and storage medium WO2023109554A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111531957.5 2021-12-14
CN202111531957.5A CN116301561A (zh) 2021-12-14 Data processing method for distributed system, and system, node and storage medium

Publications (1)

Publication Number Publication Date
WO2023109554A1 true WO2023109554A1 (zh) 2023-06-22

Family

ID=86774800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136682 WO2023109554A1 (zh) Data processing method for distributed system, and system, node and storage medium

Country Status (2)

Country Link
CN (1) CN116301561A (zh)
WO (1) WO2023109554A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455577A (zh) * 2013-08-23 2013-12-18 中国科学院计算机网络信息中心 Multi-replica nearby storage and reading method and system for cloud host image files
US20150052214A1 (en) * 2011-12-28 2015-02-19 Beijing Qihoo Technology Company Limited Distributed system and data operation method thereof
CN104580437A (zh) * 2014-12-30 2015-04-29 创新科存储技术(深圳)有限公司 Cloud storage client and efficient data access method thereof
CN106302607A (zh) * 2015-06-05 2017-01-04 腾讯科技(深圳)有限公司 Block storage system and method applied to cloud computing
CN109254958A (zh) * 2018-10-18 2019-01-22 上海云轴信息科技有限公司 Distributed data read/write method, device and system
CN110198346A (zh) * 2019-05-06 2019-09-03 北京三快在线科技有限公司 Data reading method and apparatus, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN116301561A (zh) 2023-06-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906304

Country of ref document: EP

Kind code of ref document: A1