WO2015090245A1 - File transmission method and apparatus, and distributed cluster file *** - Google Patents

Info

Publication number
WO2015090245A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
file
transmission path
push
cluster
Prior art date
Application number
PCT/CN2015/072980
Other languages
English (en)
French (fr)
Inventor
张磊
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to US15/105,657, patent US9917884B2 (en)
Publication of WO2015090245A1 (zh)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L12/403 Bus networks with centralised control, e.g. polling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route

Definitions

  • The present invention relates to the field of information technology, and in particular, to a file transmission method and apparatus, and a distributed cluster file system.
  • A distributed cluster file system may include a database, an interface machine, a dispatcher, and a plurality of destination nodes, where the database records and displays file information, the interface machine stores the task data to be delivered, the dispatcher schedules and manages tasks within the distributed cluster file system, and the destination nodes store files.
  • One of the important functions of a distributed cluster file system is to quickly transfer files to the destination nodes.
  • The dispatcher detects whether file information to be transmitted is saved in the database. When it detects that such file information is saved in the database, the dispatcher schedules file transfer between the interface machine and the multiple destination nodes according to the file information to be transmitted.
  • In the existing transmission process, multiple destination nodes request files from the interface machine. Because the interface machine consists of one or a few devices with limited load capacity, it is under heavy pressure when transmitting files to multiple destination nodes and easily reaches the disk input/output (IO) bottleneck of the device. This not only has a large impact on other programs running on the machine but also affects transmission across the entire distributed cluster file system. Moreover, if the database that maintains the information of the files to be transmitted is unstable, then when the database host fails, the dispatcher cannot schedule file transmission, leaving the entire system paralyzed.
  • The embodiments of the present invention provide a file transmission method and apparatus, and a distributed cluster file system.
  • The technical solutions are as follows:
  • An embodiment of the present invention provides a distributed cluster file system, including:
  • a distributed coordination node cluster, including a plurality of coordination nodes that share information, the distributed coordination node cluster being configured to create file information to be transmitted;
  • a file storage node cluster, including a plurality of storage nodes, the file storage node cluster being configured to store a file to be transmitted;
  • a push node cluster, including a main push node and at least one slave push node, the main push node being configured to determine, according to the file information to be transmitted and heartbeat information of the at least one slave push node, that the at least one slave push node has saved the file to be transmitted, where the heartbeat information carries at least the file information saved by the slave push node, and the file to be transmitted is pulled from the file storage node cluster by the at least one slave push node;
  • the main push node is further configured to generate a file transmission path, where the file transmission path includes at least one destination node cluster;
  • the main push node is further configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path;
  • each destination node cluster includes multiple destination nodes.
  • Another embodiment of the present invention provides a file transmission method, where the method includes:
  • the main push node obtains the file information to be transmitted from the distributed coordination node cluster;
  • the main push node determines, according to the file information to be transmitted and heartbeat information of at least one slave push node, that the at least one slave push node has saved the file to be transmitted, where the heartbeat information carries at least the file information saved by the slave push node, and the file to be transmitted is pulled from the file storage node cluster by the at least one slave push node;
  • the main push node generates a file transmission path, where the file transmission path includes at least one destination node cluster;
  • each destination node cluster includes multiple destination nodes.
  • Another embodiment of the present invention provides a file transmission method, where the method includes:
  • the destination node receives the data packet and the file transmission path of the file to be transmitted;
  • the destination node saves the data packet of the file to be transmitted in a memory
  • the destination node sends the data packet of the file to be transmitted to the next destination node according to the file transmission path.
  • Another embodiment of the present invention provides a file transfer apparatus, where the apparatus includes:
  • a file information obtaining module, configured to obtain file information to be transmitted from the distributed coordination node cluster;
  • a to-be-transmitted file determining module, configured to determine, according to the file information to be transmitted created by the distributed coordination node cluster and heartbeat information of at least one slave push node, that the at least one slave push node has saved the file to be transmitted, where the heartbeat information carries at least the file information saved by the slave push node, and the file to be transmitted is pulled from the file storage node cluster by the at least one slave push node;
  • a path generation module, configured to generate a file transmission path, where the file transmission path includes at least one destination node cluster;
  • each destination node cluster includes a plurality of destination nodes.
  • Another embodiment of the present invention provides a file transfer apparatus, where the apparatus includes:
  • a receiving module configured to receive a data packet and a file transmission path of the file to be transmitted
  • a saving module configured to save the data packet of the file to be transferred in a memory
  • a sending module configured to send the data packet of the file to be transmitted to the next destination node according to the file transmission path.
  • The distributed coordination node cluster maintains the file information to be transmitted, and the push node cluster carries out file transmission path generation and file transmission according to that information: the main push node is responsible for generating the file transmission path, and the plurality of slave push nodes obtain the file to be transmitted from the file storage node cluster and then send it over multiple paths to the plurality of destination node clusters.
  • FIG. 1 is a schematic structural diagram of a distributed cluster file system according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a file transmission method according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for electing a main push node according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of file transmission according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a file transmission method according to another embodiment of the present invention.
  • FIG. 6 is a flowchart of a file transmission method according to another embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a file transmission apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a distributed cluster file system according to an embodiment of the present invention.
  • the system includes a distributed coordination node cluster 101, a file storage node cluster 102, a push node cluster 103, and at least one destination node cluster 104.
  • the distributed coordination node cluster 101 includes a plurality of coordination nodes, and the plurality of coordination nodes share information, and the distributed coordination node cluster 101 is used to create file information to be transmitted.
  • the file storage node cluster 102 includes a plurality of storage nodes, and the file storage node cluster 102 is configured to store files to be transferred.
  • The push node cluster 103 includes a main push node 103a and at least one slave push node 103b. The main push node 103a is configured to determine, according to the file information to be transmitted and heartbeat information of the at least one slave push node 103b, that the at least one slave push node 103b has saved the file to be transmitted, where the heartbeat information carries at least the file information saved by the slave push node 103b, and the file to be transmitted is pulled from the file storage node cluster 102 by the at least one slave push node 103b.
  • the primary push node 103a is also operative to generate a file transfer path that includes at least one destination node cluster 104.
  • the main push node 103a is further configured to send the file transmission path to the at least one slave push node 103b, such that the at least one slave push node 103b sends the file to be transmitted to the at least one destination node cluster 104 according to the file transmission path.
  • Each of the at least one destination node cluster 104 includes a plurality of destination nodes.
  • The system further includes a database 105 for displaying status information such as the transmission progress of the file to be transmitted.
  • The main push node 103a can determine the current file transmission path in one of the following ways or in any combination of them.
  • The main push node 103a is configured to acquire the generated file transmission path and, when the generated file transmission path includes a first destination node cluster, delete the first destination node cluster from the generated file transmission path and use the resulting path as the current file transmission path, where the first destination node cluster is a destination node cluster whose heartbeat information the main push node 103a has not received within a preset period.
  • The main push node 103a is configured to acquire the generated file transmission path and, when the generated file transmission path includes a second destination node cluster, delete the second destination node cluster from the generated file transmission path and use the resulting path as the current file transmission path, where the second destination node cluster is a destination node cluster that is currently performing a transmission task.
  • The main push node 103a is configured to obtain the current file transmission path based on the heartbeat information of the destination nodes, where the file transmission path includes a third destination node cluster, and the third destination node cluster is a destination node cluster in an idle state.
  • The main push node 103a is configured to obtain the current file transmission path according to a data request for the file to be transmitted, where the at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.
  • the heartbeat information of the at least one slave push node 103b further carries the node state information of the slave push node.
  • the main push node 103a is configured to acquire a slave push node in an idle state from the at least one slave push node 103b according to the node state information of the at least one slave push node 103b;
  • the main push node 103a transmits the file transfer path to the slave push node in the idle state.
  • the main push node 103a is elected by a plurality of push nodes according to the temporary node serial number, and the temporary node serial number is allocated by the distributed coordination system.
  • The at least one slave push node 103b is configured to detect the file information to be transmitted created by the distributed coordination node cluster 101 and, when it determines according to that information that the slave push node does not hold the file to be transmitted, pull the file to be transmitted from the file storage node cluster 102.
  • the system further includes a database 105.
  • the main push node 103a is further configured to receive progress information of file transmissions sent by the plurality of destination nodes in the file transmission path, and write the progress information of the file transmission into the database 105.
  • the destination node is configured to: when receiving the data packet of the file to be transmitted, send the received data packet to the next destination node in the file transmission path.
  • The destination node is configured to, when receiving a data packet of the file to be transmitted, send the received data packet to any other destination node that belongs to the same destination node cluster as the destination node.
  • the method process includes the following steps:
  • Step 201 The push client pushes the file to be transferred to the file storage node cluster.
  • Specifically, in step 201, when the push client obtains a file, it can push the file to the file storage node cluster for storage as needed. In that case, the push client determines the file to be a file to be transmitted and pushes it to the file storage node cluster.
  • The file storage node cluster may be, for example, HDFS (Hadoop Distributed File System), in which case the file can be stored in HDFS.
  • Step 202 The push client sends the file information to be transmitted to the distributed coordination node cluster.
  • the push client can send the file information to the distributed coordination node cluster when sending the file to the file storage node cluster.
  • Specifically, in step 202, the push client sends the storage address of the file to be transmitted in the file storage node cluster, the size of the file to be transmitted, the identifier of the file to be transmitted, and the time at which the file to be transmitted was received to the distributed coordination node cluster; when the distributed coordination node cluster receives this information, it creates the file information to be transmitted.
  • Step 203 Each slave push node in the push node cluster sends its heartbeat information to the main push node.
  • The heartbeat information of a slave push node may include the saved file information, node state information, and the like.
  • The node state information may include information about the tasks the slave push node has completed successfully, the tasks being executed, the failed tasks, whether the node is in an idle state, and the like.
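  • As an illustration only, the heartbeat contents described above could be modelled as follows. This is a minimal sketch: the class name `Heartbeat` and all field names are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Heartbeat:
    """Hypothetical heartbeat sent by a slave push node to the main push node."""
    node_id: str                                               # assumed node identifier
    saved_file_ids: List[str] = field(default_factory=list)    # saved file information
    succeeded_tasks: List[str] = field(default_factory=list)   # tasks completed successfully
    running_tasks: List[str] = field(default_factory=list)     # tasks being executed
    failed_tasks: List[str] = field(default_factory=list)      # tasks that failed

    @property
    def idle(self) -> bool:
        # Assumption: a node counts as idle when it is executing no task.
        return not self.running_tasks


hb = Heartbeat(node_id="slave-1", saved_file_ids=["file-42"])
```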
  • Push nodes are generally used to push files to destination nodes. Therefore, to avoid conflicts between push nodes, a main push node may be elected in the push node cluster; the main push node collects the heartbeat information of each slave push node to obtain the file information saved by each slave push node, its current node status, and the like.
  • the main push node may also obtain heartbeat information of each destination node cluster or destination node to learn the node status of each destination node cluster. Further, in order to keep the data in the file system consistent, the main push node can also schedule and manage file transfers in the file system.
  • the process of pushing the node cluster to elect the main push node may include: the distributed coordination system assigns a temporary node serial number to each push node in the push node cluster, and the plurality of push nodes perform election according to the temporary node serial number to determine the main push node.
  • the distributed coordination node cluster may allocate a unique temporary node serial number to the multiple push nodes.
  • Each of the plurality of push nodes registers with the distributed coordination node cluster, so that each push node acts as a temporary node and is assigned a temporary node serial number by the distributed coordination node cluster.
  • When a push node obtains its own temporary node serial number, it compares that number with the temporary node serial numbers of the other push nodes to determine whether its own number is the smallest.
  • When the push node determines that its own temporary node serial number is the smallest among those of the plurality of push nodes, the push node is determined to be the main push node.
  • The distributed coordination node cluster sends notification information to each push node, the election is performed again according to the temporary node serial numbers, and the main push node is re-determined, where the notification information is used to notify the push nodes to perform the election based on the temporary node serial numbers.
  • FIG. 3 is a flowchart of a method for electing a main push node according to an embodiment of the present invention. Referring to Figure 3, the method includes the following steps:
  • Step 301 The distributed coordination node cluster treats each of the plurality of push nodes as a temporary node and assigns a serial number to each temporary node.
  • Step 302 Each of the plurality of push nodes determines whether there is a push node whose temporary node serial number is smaller than its own. If there is, step 303 is performed; otherwise, step 304 is performed.
  • Step 303 The push node acts as a slave push node and monitors, in real time, the service state of the push node whose temporary node serial number is smaller than its own.
  • step 302 is performed.
  • Step 304 The push node acts as the main push node.
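  • The election rule in steps 301 to 304 (the push node with the smallest temporary node serial number becomes the main push node) can be sketched as below. This is a simplified, single-process illustration; the function names and the dictionary of serial numbers are assumptions, and the real assignment of serial numbers is done by the distributed coordination node cluster.

```python
from typing import Dict, Optional


def elect_main_push_node(serials: Dict[str, int]) -> Optional[str]:
    """Return the push node holding the smallest temporary node serial number."""
    if not serials:
        return None
    return min(serials, key=lambda node: serials[node])


def role_of(node: str, serials: Dict[str, int]) -> str:
    """Steps 302-304 from one node's point of view."""
    own = serials[node]
    # Step 302: is there another push node with a smaller serial number?
    has_smaller = any(s < own for n, s in serials.items() if n != node)
    # Step 303 (slave) or step 304 (main).
    return "slave" if has_smaller else "main"


# Hypothetical serial numbers, as if assigned at registration time.
serials = {"push-a": 3, "push-b": 1, "push-c": 2}
main = elect_main_push_node(serials)

# If the current main push node disappears, re-running the election
# over the remaining nodes picks the next-smallest serial number.
remaining = {n: s for n, s in serials.items() if n != main}
new_main = elect_main_push_node(remaining)
```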
  • Step 204 The main push node obtains file information to be transmitted in the distributed coordination node cluster.
  • The main push node checks the file information saved in the distributed coordination node cluster to determine whether it includes file information to be transmitted.
  • When it does, the main push node acquires the file information to be transmitted.
  • the file information to be transmitted includes: a storage address of the file to be transmitted in the file storage node cluster, a size of the file to be transmitted, a file identifier to be transmitted, and a time when the file to be transmitted is received.
  • the file identifier to be transmitted may be the name of the file to be transmitted, or may be a hash (HASH) value of the file to be transmitted, such as a message digest algorithm (MD5), or may be a user.
  • the embodiment of the present invention does not limit how to represent the file identifier to be transmitted.
  • Alternatively, a destination node may request the file to be transmitted from the main push node.
  • In that case, the main push node obtains the file information saved in the distributed coordination node cluster according to the identifier of the file to be transmitted carried in the request sent by the destination node.
  • In either case, the main push node obtains the file information to be transmitted from the distributed coordination node cluster.
  • Step 205 The main push node determines, according to the file information to be transmitted and heartbeat information of at least one slave push node, that the at least one slave push node has saved the file to be transmitted.
  • The heartbeat information carries at least the file information saved by the slave push node, and the file to be transmitted is pulled from the file storage node cluster by the at least one slave push node.
  • The main push node checks, according to the file information to be transmitted, whether the heartbeat information uploaded by a slave push node includes the file information to be transmitted.
  • When the heartbeat information uploaded by the slave push node includes the file information to be transmitted, the main push node determines that the file to be transmitted is saved in that slave push node.
  • For example, the main push node compares the MD5 value in the obtained file information to be transmitted with the MD5 values included in the saved file information sent by the slave push node, and determines whether the saved file information of the slave push node includes the MD5 value of the file to be transmitted.
  • When it does, the main push node determines that the slave push node has saved the file to be transmitted.
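  • The MD5 comparison in step 205 can be sketched as follows. This is an illustrative sketch, assuming each heartbeat has been reduced to the set of MD5 values of the files a slave push node has saved; the function names and the dictionary layout are hypothetical.

```python
import hashlib
from typing import Dict, List, Set


def md5_of(data: bytes) -> str:
    """MD5 hex digest used here as the identifier of a file to be transmitted."""
    return hashlib.md5(data).hexdigest()


def slaves_holding_file(file_md5: str, saved_by_slave: Dict[str, Set[str]]) -> List[str]:
    """Return the slave push nodes whose saved file information contains file_md5."""
    return sorted(node for node, saved in saved_by_slave.items() if file_md5 in saved)


target = md5_of(b"example payload")
# Assumed view of the latest heartbeats: slave id -> MD5 values of saved files.
saved_by_slave = {
    "slave-1": {target},
    "slave-2": set(),
    "slave-3": {target, md5_of(b"another file")},
}
holders = slaves_holding_file(target, saved_by_slave)
```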
  • The slave push node also monitors the distributed coordination node cluster to learn whether there is new file information to be transmitted.
  • When the slave push node determines, by comparing the file information it has saved with the file information in the distributed coordination node cluster, that there is a new file to be transmitted, it obtains the file to be transmitted from the file storage node cluster according to the file information to be transmitted.
  • Step 206 The main push node generates a file transmission path.
  • The main push node can obtain the file transmission path in any of the following ways:
  • The main push node acquires the generated file transmission path and, when the generated file transmission path includes a first destination node cluster, deletes the first destination node cluster from the generated file transmission path and uses the resulting path as the current file transmission path, where the first destination node cluster is a destination node cluster whose heartbeat information the main push node has not received within a preset period.
  • In this way, the main push node can transmit the file directly over an already generated file transmission path. Specifically, the main push node acquires the generated file transmission path and, based on the heartbeat information sent by each destination node cluster in that path, determines whether heartbeat information has been received from all destination node clusters in the path within the preset period. If so, the main push node uses the generated path as the current file transmission path. If heartbeat information has not been received from some destination node cluster within the preset period, the main push node deletes that destination node cluster from the generated path and uses the path with that cluster removed as the current file transmission path.
  • The main push node acquires the generated file transmission path and, when the generated file transmission path includes a second destination node cluster, deletes the second destination node cluster from the generated file transmission path and uses the resulting path as the current file transmission path, where the second destination node cluster is a destination node cluster that is currently performing a transmission task.
  • That is, the main push node may delete a destination node cluster that is currently performing a task from the generated file transmission path.
  • The main push node obtains the current file transmission path according to the heartbeat information of the destination nodes, where the file transmission path includes a third destination node cluster, and the third destination node cluster is a destination node cluster in an idle state.
  • Specifically, the main push node determines which destination node clusters are idle according to the node state information in the heartbeat information received from the destination node clusters, and generates a file transmission path from the idle destination node clusters.
  • The main push node acquires the current file transmission path according to a data request for the file to be transmitted, where the at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.
  • That is, when a destination node requests the file to be transmitted from the main push node, the main push node may generate a file transmission path that includes that destination node.
  • The above ways of generating a file transmission path may be combined. For example, if at least two of the above four conditions apply to a file transmission path, destination nodes or destination node clusters may be deleted or added according to the actual situation, so that file transmission can be performed flexibly.
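  • One way the path generation rules above could be combined is sketched below. It is an assumption-laden illustration, not the patented implementation: `ClusterStatus`, `current_transmission_path`, and the use of plain timestamps for heartbeats are all hypothetical, and appending requesting clusters at the end of the path is a design choice made only for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ClusterStatus:
    last_heartbeat: float  # time (seconds) the last heartbeat was received
    busy: bool             # True if the cluster is performing a transmission task


def current_transmission_path(
    generated_path: List[str],
    status: Dict[str, ClusterStatus],
    now: float,
    preset_period: float,
    requesting: Iterable[str] = (),
) -> List[str]:
    """Derive the current path from a previously generated one."""
    path = []
    for name in generated_path:
        st = status.get(name)
        # "First" cluster: no heartbeat received within the preset period.
        if st is None or now - st.last_heartbeat > preset_period:
            continue
        # "Second" cluster: currently performing a transmission task.
        if st.busy:
            continue
        path.append(name)
    # Clusters that sent a data request are included in the path.
    for name in requesting:
        if name not in path:
            path.append(name)
    return path


status = {
    "cluster-1": ClusterStatus(last_heartbeat=95.0, busy=False),
    "cluster-2": ClusterStatus(last_heartbeat=10.0, busy=False),  # stale heartbeat
    "cluster-3": ClusterStatus(last_heartbeat=99.0, busy=True),   # busy
}
path = current_transmission_path(
    ["cluster-1", "cluster-2", "cluster-3"],
    status,
    now=100.0,
    preset_period=30.0,
    requesting=["cluster-4"],
)
```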
  • The above description takes as an example the case in which the at least one slave push node has saved the file to be transmitted.
  • In practice, only one or some of the at least one slave push node may have saved the file to be transmitted. In that case, a file transmission path is generated only for the slave push nodes that have saved the file to be transmitted. Because the physical addresses of the slave push nodes differ, the starting point of the file transmission path differs for different slave push nodes.
  • Different file transmission paths may be set for different slave push nodes, that is, different slave push nodes are used to send the file to be transmitted to different destination node clusters.
  • The main push node acquires the slave push nodes that are in an idle state from the at least one slave push node according to the node state information of the at least one slave push node.
  • Each slave push node sends node state information to the main push node once every preset period.
  • When the main push node receives the node state information, it determines from that information whether the slave push node is in an idle state.
  • In this way, the node state information is used to identify the slave push nodes that are currently idle.
  • Step 208 The main push node sends the file transmission path to the slave push node in the idle state.
  • File transmission paths correspond one-to-one with slave push nodes.
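  • The selection of idle slave push nodes and the one-to-one pairing of paths with slave push nodes might look like the following sketch. The function names and the boolean idle flags are assumptions for illustration; pairing by sorted node name is an arbitrary choice made here to keep the example deterministic.

```python
from typing import Dict, List


def idle_slaves(idle_by_node: Dict[str, bool]) -> List[str]:
    """Slave push nodes whose latest node state information reports idle."""
    return sorted(node for node, idle in idle_by_node.items() if idle)


def assign_paths(slaves: List[str], paths: List[List[str]]) -> Dict[str, List[str]]:
    """Pair each idle slave push node with exactly one file transmission path.

    zip() stops at the shorter input, so surplus slaves or paths stay unassigned.
    """
    return dict(zip(slaves, paths))


states = {"slave-1": True, "slave-2": False, "slave-3": True}
paths = [["cluster-1"], ["cluster-2"]]
assignment = assign_paths(idle_slaves(states), paths)
```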
  • Step 209 When a slave push node receives the file transmission path, it sends the data packets of the file to be transmitted in sequence to the destination node according to the file transmission path.
  • The slave push node splits the obtained file to be transmitted into a plurality of data packets and, according to the file transmission path, sends the data packets in sequence to the first destination node in the file transmission path.
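  • Splitting a file into data packets, as in step 209, can be illustrated as follows. The fixed packet size and the function name are assumptions; the patent text does not specify how packets are sized or framed.

```python
from typing import List


def split_into_packets(data: bytes, packet_size: int) -> List[bytes]:
    """Split file contents into consecutive packets of at most packet_size bytes."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [data[i:i + packet_size] for i in range(0, len(data), packet_size)]


packets = split_into_packets(b"abcdefg", 3)
# Sending the packets in order lets the receiver rebuild the file by concatenation.
rebuilt = b"".join(packets)
```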
  • Step 210 For any destination node in the file transmission path, when the destination node receives a data packet of the file to be transmitted, it saves the data packet in memory and sends the data packet to the next destination node after it in the file transmission path.
  • Specifically, when the slave push node sends a data packet of the file to be transmitted to the first destination node in the file transmission path, that destination node saves the data packet in memory and sends it to the second destination node in the path. When the second destination node receives the data packet, it saves the data packet and sends it to the third destination node, and so on, until a destination node receives the data packet and determines that it is the last node on the file transmission path, at which point the transmission ends.
  • FIG. 4 shows a schematic diagram of file transmission provided by an embodiment of the present invention.
  • Referring to FIG. 4, the primary push node sends a file transmission path to any slave push node that holds the file to be transmitted and is in an idle state, and the slave push node sends the data packets of the file to be transmitted to destination node A in the destination node cluster according to the file transmission path.
  • When destination node A receives a data packet, it saves the data packet in memory and sends it to destination node B, the next destination node after A in the file transmission path; destination node B saves the data packet in memory and sends it to destination node C, the last destination node in the file transmission path.
  • In this way, the file to be transmitted is saved on every destination node in the file transmission path.
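The relay along A, B, and C described for FIG. 4 can be imitated in a few lines. This is a single-process sketch under simplifying assumptions: network sends are replaced by direct method calls, and the class and function names (`DestinationNode`, `push_file`) are hypothetical.

```python
class DestinationNode:
    """Hypothetical destination node that keeps received packets in memory."""

    def __init__(self, name):
        self.name = name
        self.memory = []  # packets held in memory, as in step 210

    def receive(self, packet, path):
        """Save the packet, then forward it to the next node on the path, if any."""
        self.memory.append(packet)
        idx = path.index(self)
        if idx + 1 < len(path):  # the last node on the path stops the relay
            path[idx + 1].receive(packet, path)


def push_file(packets, path):
    """Slave push node side: each packet is sent to the first destination node only."""
    for packet in packets:
        path[0].receive(packet, path)


a, b, c = DestinationNode("A"), DestinationNode("B"), DestinationNode("C")
push_file([b"p0", b"p1", b"p2"], [a, b, c])
```

After the call, all three nodes hold the same packets in the same order, while the slave push node has sent each packet only once.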
  • It should be noted that, in a distributed file transmission system deployed according to the above scheme, the push node cluster and the destination node clusters are deployed in the same equipment room, every 10 destination nodes form one destination node cluster, and each push node issues tasks in units of one destination node cluster.
  • That is, the file transmission path of each push node includes only one destination node cluster. Taking the transmission of a 200 MB data file as an example, it takes only 20 s from the moment the push client issues the task until the file is completely delivered to the destination node cluster.
  • Step 211 The destination node in the file transmission path sends the progress information of the file transmission to the main push node.
  • In order to determine whether the file transfer succeeded and which destination nodes successfully received the file to be transmitted, each destination node needs to report its progress in receiving the file to the primary push node. Specifically, when a preset period elapses, the destination node sends progress information on the data packets it has received to the primary push node, so that the primary push node can determine the progress of the file to be transmitted.
  • Step 212 When the main push node receives the progress information of the file transfer, the progress information of the file transfer is written into the database.
  • Step 213 When the database obtains the progress information of the file transmission, the progress information of the file transmission is displayed.
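Steps 211 to 213 can be sketched with an in-memory SQLite database. The source does not name a database product or a schema; the table layout and the function names below are assumptions made only for illustration.

```python
import sqlite3


def open_progress_db(path=":memory:"):
    """Create the (hypothetical) progress table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        "file_id TEXT, node_id TEXT, received INTEGER, total INTEGER, "
        "PRIMARY KEY (file_id, node_id))"
    )
    return conn


def record_progress(conn, file_id, node_id, received, total):
    """Primary push node side: keep only the latest report per destination node."""
    conn.execute(
        "INSERT OR REPLACE INTO progress VALUES (?, ?, ?, ?)",
        (file_id, node_id, received, total),
    )
    conn.commit()


def show_progress(conn, file_id):
    """Return (node_id, percent received) rows for display."""
    rows = conn.execute(
        "SELECT node_id, received, total FROM progress "
        "WHERE file_id = ? ORDER BY node_id",
        (file_id,),
    )
    return [(node, 100.0 * received / total) for node, received, total in rows]
```

Keying the table on the file and node pair means a later periodic report simply overwrites the earlier one for the same destination node.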
  • It should be noted that the primary push node may also write the heartbeat information of the slave push nodes and the heartbeat information of the destination nodes into the database.
  • When the database obtains this heartbeat information, it displays it, so as to show the user the file information saved on the slave push nodes, the service state of the nodes, the file information received by the destination nodes, the information on files that failed to be received, the information on tasks being executed, and so on.
  • In the method provided by this embodiment, the distributed coordination node cluster maintains the information on the file to be transmitted, and the push node cluster generates the file transmission path and carries out the transmission according to that information.
  • The primary push node is responsible for generating the file transmission path, while a plurality of slave push nodes obtain the file to be transmitted from the file storage node cluster and each perform multi-path transmission, so that the file is sent to a plurality of destination node clusters.
  • FIG. 5 is a flowchart of a file transmission method according to another embodiment of the present invention, where the method includes the following steps:
  • Step 501 The primary push node obtains the information on the file to be transmitted from the distributed coordination node cluster.
  • Step 502 The primary push node determines, according to the information on the file to be transmitted and the heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted, where the heartbeat information carries the information on the files saved by the at least one slave push node.
  • The file to be transmitted is pulled by the at least one slave push node from the file storage node cluster.
  • Step 503 The main push node generates a file transmission path, where the file transmission path includes at least one destination node cluster.
  • Step 504 The primary push node sends the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path.
  • Each of the destination node clusters includes multiple destination nodes.
  • In the method provided by this embodiment, the distributed coordination node cluster maintains the information on the file to be transmitted, and the push node cluster generates the file transmission path and carries out the transmission according to that information.
  • The primary push node is responsible for generating the file transmission path, while a plurality of slave push nodes obtain the file to be transmitted from the file storage node cluster and each perform multi-path transmission, so that the file is sent to a plurality of destination node clusters.
  • the primary push node may generate a file transmission path by using one of the following ways or any combination:
  • In one embodiment, the primary push node obtains an already generated file transmission path and, when that path includes a first destination node cluster, deletes the first destination node cluster from it and uses the resulting path as the current file transmission path; the first destination node cluster is a destination node cluster whose heartbeat information the primary push node has not received within a preset period.
  • In another embodiment, the primary push node obtains an already generated file transmission path and, when that path includes a second destination node cluster, deletes the second destination node cluster from it and uses the resulting path as the current file transmission path; the second destination node cluster is a destination node cluster that is currently executing a transmission task.
  • In another embodiment, the primary push node obtains the current file transmission path according to the heartbeat information of the destination nodes, where the file transmission path includes a third destination node cluster, and the third destination node cluster is a destination node cluster in an idle state.
  • In another embodiment, the primary push node obtains the current file transmission path according to a data request for the file to be transmitted, where the at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.
  • the heartbeat information of the slave push node further carries the node state information of the slave push node.
  • Sending, by the primary push node, the file transmission path to the at least one slave push node includes:
  • the main push node acquires a slave push node in an idle state from the at least one slave push node according to the node state information of the at least one slave push node;
  • the primary push node sends the file transfer path to the slave push node in the idle state.
  • the primary push node is elected by multiple push nodes according to the temporary node serial number, and the temporary node serial number is allocated by the distributed coordination system.
  • the to-be-transmitted file is obtained by the slave push node from the file storage node cluster when it determines that the file to be transmitted is not saved.
  • After the primary push node sends the file transmission path to the at least one slave push node, the method further includes:
  • the main push node receives progress information of file transmissions sent by the plurality of destination nodes in the file transmission path; the main push node writes the progress information of the file transfer into the database.
  • FIG. 6 is a flowchart of a file transmission method according to another embodiment of the present invention, where the method includes the following steps:
  • Step 601 The destination node receives a data packet and a file transmission path of the file to be transmitted.
  • Step 602 The destination node saves the data packet of the file to be transmitted in a memory.
  • Step 603 The destination node sends the data packet of the file to be transmitted to the next destination node according to the file transmission path.
  • the destination node receives and saves the data packet of the file to be transmitted, and sends the data packet of the file to be transmitted to other destination nodes according to the file transmission path.
  • In this way, while the slave push node is sending data packets, every destination node in the file transmission path is passing them on at almost the same moment, so the load on the slave push node and on each destination node is low and few system resources are occupied.
  • The programs running on those machines are not affected, and the speed of data transfer is greatly increased.
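The benefit of every node forwarding at almost the same moment can be illustrated with an idealized timing model. The model is not taken from the source: it assumes every hop takes the same time per packet and that a node can forward one packet while receiving the next. The function names are hypothetical.

```python
def chain_time(packets: int, nodes: int, hop_time: float) -> float:
    """Time until the last packet reaches the last node of a relay chain.

    The first packet needs `nodes` hops; each further packet arrives one
    hop-time later because the hops overlap (pipelining).
    """
    return (packets + nodes - 1) * hop_time


def one_by_one_time(packets: int, nodes: int, hop_time: float) -> float:
    """Time if a single sender delivers the whole file to each node in turn."""
    return packets * nodes * hop_time
```

Under these assumptions, 200 packets over a chain of 10 nodes need 209 hop-times, against 2000 hop-times when one sender serves the 10 nodes one after another.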
  • FIG. 7 is a schematic structural diagram of a file transmission apparatus according to an embodiment of the present invention.
  • the apparatus includes: a file information obtaining module 701, a file to be transmitted determining module 702, a path generating module 703, and a path sending module 704.
  • the file information obtaining module 701 is configured to obtain file information to be transmitted in the cluster of the distributed coordination node; the file information obtaining module 701 is connected to the file determining module 702 to be transmitted.
  • The to-be-transmitted file determining module 702 is configured to determine, according to the information on the file to be transmitted created by the distributed coordination node cluster and the heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted.
  • The heartbeat information carries the information on the files saved by the at least one slave push node, and the file to be transmitted is pulled by the at least one slave push node from the file storage node cluster.
  • The to-be-transmitted file determining module 702 is connected to the path generation module 703. The path generation module 703 is configured to generate a file transmission path, where the file transmission path includes at least one destination node cluster.
  • The path generation module 703 is connected to the path sending module 704.
  • the path sending module 704 is configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path.
  • Each of the destination node clusters includes multiple destination nodes.
  • In one embodiment, the path generation module 703 is configured to obtain an already generated file transmission path and, when that path includes a first destination node cluster, delete the first destination node cluster from it and use the resulting path as the current file transmission path; the first destination node cluster is a destination node cluster whose heartbeat information has not been received within a preset period.
  • In another embodiment, the path generation module 703 is configured to obtain an already generated file transmission path and, when that path includes a second destination node cluster, delete the second destination node cluster from it and use the resulting path as the current file transmission path; the second destination node cluster is a destination node cluster that is currently executing a transmission task.
  • In another embodiment, the path generation module 703 is configured to obtain the current file transmission path according to the heartbeat information of the destination nodes, where the file transmission path includes a third destination node cluster, and the third destination node cluster is a destination node cluster in an idle state.
  • In another embodiment, the path generation module 703 is configured to obtain the current file transmission path according to a data request for the file to be transmitted, where the at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.
  • The path sending module 704 is configured to, when the heartbeat information of the at least one slave push node further carries the node status information of the slave push node, select a slave push node in an idle state from the at least one slave push node according to that node status information, and send the file transmission path to the slave push node in the idle state.
  • the primary push node is elected by multiple push nodes according to the temporary node serial number, and the temporary node serial number is allocated by the distributed coordination system.
  • the to-be-transmitted file is obtained by the slave push node from the file storage node cluster when it determines that the file to be transmitted is not saved.
  • the device further includes:
  • the writing module is configured to receive progress information of file transmissions sent by the plurality of destination nodes in the file transmission path, and write the progress information of the file transmission into the database.
  • In summary, in the apparatus provided by this embodiment, the distributed coordination node cluster maintains the information on the file to be transmitted, and the push node cluster generates the file transmission path and carries out the transmission according to that information.
  • The primary push node is responsible for generating the file transmission path, while a plurality of slave push nodes obtain the file to be transmitted from the file storage node cluster and each perform multi-path transmission, so that the file is sent to a plurality of destination node clusters.
  • FIG. 8 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention.
  • the apparatus includes: a receiving module 801, a saving module 802, and a transmitting module 803.
  • the receiving module 801 is configured to receive a data packet and a file transmission path of the file to be transmitted, and the receiving module 801 is connected to the saving module 802.
  • the saving module 802 is configured to save the data packet of the file to be transmitted in the memory; the saving module 802 is connected to the sending module 803.
  • the sending module 803 is configured to send the data packet of the file to be transmitted to the next destination node according to the file transmission path.
  • In summary, the apparatus provided by this embodiment receives and saves the data packets of the file to be transmitted and, according to the file transmission path, sends them to other destination nodes.
  • In this way, while the slave push node is sending data packets, every destination node in the file transmission path is passing them on at almost the same moment, so the load on the slave push node and on each destination node is very low, few system resources are occupied, the programs running on those machines are not affected, and the speed of data transfer is greatly increased.
  • FIG. 9 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention.
  • the apparatus includes a nonvolatile memory 901, a CPU (Central Processing Unit) 902, a forwarding chip 903, a memory 904, and other hardware 905.
  • The memory 904 is configured to store instruction code; the operations completed when the instruction code is executed are mainly the functions of the file information obtaining module, the to-be-transmitted file determining module, the path generation module, and the path sending module in the apparatus shown in FIG. 7.
  • The CPU 902 is configured to communicate with the forwarding chip 903 to send and receive various data packets, and to communicate with the memory 904 to read and execute the instruction code stored in the memory 904, thereby completing the functions of the file information obtaining module and the other modules in the apparatus.
  • the data in the memory 901 includes: file information to be transmitted and heartbeat information of at least one push node.
  • The file information obtaining module is configured to obtain the information on the file to be transmitted from the distributed coordination node cluster;
  • the to-be-transmitted file determining module is configured to determine, according to the information on the file to be transmitted created by the distributed coordination node cluster and the heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted, where the heartbeat information carries at least the information on the files saved by the slave push node, and the file to be transmitted is pulled by the at least one slave push node from the file storage node cluster;
  • the path generation module is configured to generate a file transmission path, where the file transmission path includes at least one destination node cluster;
  • the path sending module is configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path, where each destination node cluster includes a plurality of destination nodes.
  • the forwarding chip 903 is configured to be connected to other nodes through ports on the chip, and is responsible for receiving and processing the various data packets described above.
  • the non-volatile memory 901 is configured to store various data, including: file information to be transmitted and heartbeat information of at least one push node, to complete the function of the storage module in the foregoing apparatus.
  • It should be noted that the file transmission apparatus provided in the foregoing embodiments is illustrated, when transmitting files, only by the above division into functional modules. In actual applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the node may be divided into different functional modules to perform all or part of the functions described above.
  • the embodiment of the file transmission device and the file transmission method provided by the foregoing embodiments are in the same concept, and the specific implementation process is described in detail in the method embodiment, and details are not described herein again.
  • A person skilled in the art may understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the related hardware, and the program may be stored in a computer-readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.


Abstract

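A file transmission method includes: obtaining information on a file to be transmitted from a distributed coordination node cluster (501); determining, according to the information on the file to be transmitted and heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted, where the heartbeat information carries information on the files saved by the at least one slave push node, and the file to be transmitted is pulled by the at least one slave push node from a file storage node cluster (502); generating a file transmission path, where the file transmission path includes at least one destination node cluster (503); and sending the file transmission path to the at least one slave push node (504).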

Description

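File transmission method, apparatus, and distributed cluster file system
This application claims priority to Chinese Patent Application No. 201310695160.8, filed with the Chinese Patent Office on December 17, 2013 and entitled "File transmission method, apparatus, and distributed cluster file system", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of information technology, and in particular to a file transmission method, an apparatus, and a distributed cluster file system.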
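Background of the Invention
As data volumes keep growing, file systems have gradually evolved into distributed cluster file systems. Such a distributed cluster file system may include a database, an interface machine, a scheduler, and a plurality of destination nodes, where the database is used to record and display file information, the interface machine is used to hold files and other task data to be delivered, the scheduler is used to schedule and manage tasks within the distributed cluster file system, and the destination nodes are used to store files.
For a distributed cluster file system, one of its important functions is to transmit files to the destination nodes quickly.
Generally, when a distributed cluster file system is used for file transmission, the scheduler checks whether information on a file to be transmitted is stored in the database; when it detects such information in the database, the scheduler schedules the file transmission between the interface machine and the plurality of destination nodes according to that information.
In the existing transmission process, a plurality of destination nodes request files from the interface machine, while the interface machine is one device or a small number of devices with limited load capacity. Therefore, when files are transmitted to a plurality of destination nodes, the interface machine is under heavy pressure and easily reaches the bottleneck of the device's disk input/output (IO). This not only has a considerable impact on other programs running on the same machine, but also affects transmission across the whole distributed cluster file system. Moreover, if the database used to maintain the information on files to be transmitted is not very stable, then when the database host fails, the scheduler cannot schedule file transmission, leaving the entire system paralyzed.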
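Summary of the Invention
Embodiments of the present invention provide a file transmission method, an apparatus, and a distributed cluster file system. The technical solutions are as follows:
An embodiment of the present invention provides a distributed cluster file system, including:
a distributed coordination node cluster, where the distributed coordination node cluster includes a plurality of coordination nodes that share information with one another, and the distributed coordination node cluster is configured to create information on a file to be transmitted;
a file storage node cluster, where the file storage node cluster includes a plurality of storage nodes and is configured to store the file to be transmitted;
a push node cluster, where the push node cluster includes a primary push node and at least one slave push node, the primary push node is configured to determine, according to the information on the file to be transmitted and heartbeat information of the at least one slave push node, that the at least one slave push node holds the file to be transmitted, the heartbeat information carries at least information on the files saved by the slave push node, and the file to be transmitted is pulled by the at least one slave push node from the file storage node cluster;
the primary push node is further configured to generate a file transmission path, where the file transmission path includes at least one destination node cluster;
the primary push node is further configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path; and
the at least one destination node cluster, where each destination node cluster includes a plurality of destination nodes.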
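Another embodiment of the present invention provides a file transmission method, the method including:
obtaining, by a primary push node, information on a file to be transmitted from a distributed coordination node cluster;
determining, by the primary push node, according to the information on the file to be transmitted and heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted, where the heartbeat information carries at least information on the files saved by the slave push node, and the file to be transmitted is pulled by the at least one slave push node from a file storage node cluster;
generating, by the primary push node, a file transmission path, where the file transmission path includes at least one destination node cluster; and
sending, by the primary push node, the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path;
where each destination node cluster includes a plurality of destination nodes.
Another embodiment of the present invention provides a file transmission method, the method including:
receiving, by a destination node, a data packet of a file to be transmitted and a file transmission path;
saving, by the destination node, the data packet of the file to be transmitted in memory; and
sending, by the destination node, the data packet of the file to be transmitted to the next destination node according to the file transmission path.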
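Another embodiment of the present invention provides a file transmission apparatus, the apparatus including:
a file information obtaining module, configured to obtain information on a file to be transmitted from a distributed coordination node cluster;
a to-be-transmitted file determining module, configured to determine, according to the information on the file to be transmitted created by the distributed coordination node cluster and heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted, where the heartbeat information carries at least information on the files saved by the slave push node, and the file to be transmitted is pulled by the at least one slave push node from a file storage node cluster;
a path generation module, configured to generate a file transmission path, where the file transmission path includes at least one destination node cluster; and
a path sending module, configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path, where each destination node cluster includes a plurality of destination nodes.
Another embodiment of the present invention provides a file transmission apparatus, the apparatus including:
a receiving module, configured to receive a data packet of a file to be transmitted and a file transmission path;
a saving module, configured to save the data packet of the file to be transmitted in memory; and
a sending module, configured to send the data packet of the file to be transmitted to the next destination node according to the file transmission path.
In the embodiments of the present invention, the distributed coordination node cluster maintains the information on the file to be transmitted, the push node cluster generates the file transmission path and carries out the transmission according to that information, the primary push node is responsible for generating the file transmission path, and a plurality of slave push nodes obtain the file to be transmitted from the file storage node cluster and each perform multi-path transmission, so that the file is sent to a plurality of destination node clusters. The embodiments of the present invention avoid the single-point bottleneck caused by many nodes accessing a single node at the same time for file transmission, and even if any one node in the distributed coordination node cluster fails, the normal operation of the whole file system is not affected.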
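Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a distributed cluster file system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a file transmission method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a primary push node election method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of file transmission according to an embodiment of the present invention;
FIG. 5 is a flowchart of a file transmission method according to another embodiment of the present invention;
FIG. 6 is a flowchart of a file transmission method according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a file transmission apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention.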
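Mode for Carrying Out the Invention
To make the technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic structural diagram of a distributed cluster file system according to an embodiment of the present invention. Referring to FIG. 1, the system includes a distributed coordination node cluster 101, a file storage node cluster 102, a push node cluster 103, and at least one destination node cluster 104.
The distributed coordination node cluster 101 includes a plurality of coordination nodes that share information with one another, and the distributed coordination node cluster 101 is configured to create information on a file to be transmitted.
The file storage node cluster 102 includes a plurality of storage nodes and is configured to store the file to be transmitted.
The push node cluster 103 includes a primary push node 103a and at least one slave push node 103b. The primary push node 103a is configured to determine, according to the information on the file to be transmitted and the heartbeat information of the at least one slave push node 103b, that the at least one slave push node 103b holds the file to be transmitted, where the heartbeat information carries information on the files saved by the at least one slave push node 103b, and the file to be transmitted is pulled by the at least one slave push node 103b from the file storage node cluster 102.
The primary push node 103a is further configured to generate a file transmission path, where the file transmission path includes at least one destination node cluster 104.
The primary push node 103a is further configured to send the file transmission path to the at least one slave push node 103b, so that the at least one slave push node 103b sends the file to be transmitted to the at least one destination node cluster 104 according to the file transmission path.
Each destination node cluster in the at least one destination node cluster 104 includes a plurality of destination nodes.
Further, the system includes a database 105, which is configured to display status information such as the progress of the file to be transmitted.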
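The primary push node 103a may determine the current file transmission path in one of the following ways or any combination of them.
In an embodiment of the present invention, the primary push node 103a is configured to obtain an already generated file transmission path and, when that path includes a first destination node cluster, delete the first destination node cluster from it and use the resulting path as the current file transmission path, where the first destination node cluster is a destination node cluster whose heartbeat information the primary push node 103a has not received within a preset period.
In another embodiment of the present invention, the primary push node 103a is configured to obtain an already generated file transmission path and, when that path includes a second destination node cluster, delete the second destination node cluster from it and use the resulting path as the current file transmission path, where the second destination node cluster is a destination node cluster that is currently executing a transmission task.
In another embodiment of the present invention, the primary push node 103a is configured to obtain the current file transmission path according to the heartbeat information of the destination nodes, where the file transmission path includes a third destination node cluster, and the third destination node cluster is a destination node cluster in an idle state.
In another embodiment of the present invention, the primary push node 103a is configured to obtain the current file transmission path according to a data request for the file to be transmitted, where the at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.
In this embodiment of the present invention, the heartbeat information of the at least one slave push node 103b further carries node status information of the slave push node.
The primary push node 103a is configured to select, according to the node status information of the at least one slave push node 103b, a slave push node in an idle state from the at least one slave push node 103b;
the primary push node 103a sends the file transmission path to the slave push node in the idle state.
In this embodiment of the present invention, the primary push node 103a is elected by a plurality of push nodes according to temporary node sequence numbers, which are assigned by the distributed coordination system.
In this embodiment of the present invention, the at least one slave push node 103b is configured to monitor the information on the file to be transmitted created by the distributed coordination node cluster 101 and, when it determines from that information that the slave push node does not hold the file to be transmitted, pull the file to be transmitted from the file storage node cluster 102.
In this embodiment of the present invention, the system further includes a database 105.
The primary push node 103a is further configured to receive the file transmission progress information sent by a plurality of destination nodes in the file transmission path, and write the progress information into the database 105.
In an embodiment of the present invention, the destination node is configured to, upon receiving a data packet of the file to be transmitted, send the received data packet to the next destination node in the file transmission path.
In another embodiment of the present invention, the destination node is configured to, upon receiving a data packet of the file to be transmitted, send the received data packet to any one of the other destination nodes that belong to the same destination node cluster as the destination node.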
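A file transmission method provided by an embodiment of the present invention is described in detail below with reference to the system architecture shown in FIG. 1. Referring to FIG. 2, the method includes the following steps:
Step 201: The push client pushes the file to be transmitted to the file storage node cluster.
Specifically, in step 201, when the push client obtains a file, it may push the file to the file storage node cluster for storage as needed. The push client then determines the file to be the file to be transmitted and pushes it to the file storage node cluster. Where the file storage node cluster is, for example, HDFS (Hadoop Distributed File System), the file may be stored in HDFS.
Step 202: The push client sends the information on the file to be transmitted to the distributed coordination node cluster.
So that it is clear which files are stored in the file storage node cluster, the push client may send the file information to the distributed coordination node cluster when it sends the file to the file storage node cluster. Specifically, step 202 may be: the push client sends the storage address of the file to be transmitted in the file storage node cluster, the size of the file, the identifier of the file, the time at which the file was received, and so on to the distributed coordination node cluster; when the distributed coordination node cluster receives this information, it creates the information on the file to be transmitted.
Step 203: A slave push node in the push node cluster sends its heartbeat information to the primary push node.
The heartbeat information of a slave push node may include saved file information, node status information, and so on. The node status information may include the tasks the slave push node has completed successfully, the tasks being executed, the failed tasks, and whether the node is in an idle state.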
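The heartbeat contents listed above can be pictured as a small record. This is only an illustrative layout: the field names are hypothetical, and deriving the idle flag from the absence of running tasks is an assumption, since the source only says that the status information indicates whether the node is idle.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Heartbeat:
    """Hypothetical heartbeat sent by a slave push node to the primary push node."""
    node_id: str
    saved_file_ids: List[str] = field(default_factory=list)   # saved file information
    succeeded_tasks: List[str] = field(default_factory=list)
    running_tasks: List[str] = field(default_factory=list)
    failed_tasks: List[str] = field(default_factory=list)

    @property
    def idle(self) -> bool:
        # Assumption: a node with no running task is treated as idle.
        return not self.running_tasks
```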
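Because there may be many push nodes, and push nodes are generally used to push files to destination nodes, a primary push node may be elected in the push node cluster in order to avoid conflicts between push nodes. The primary push node collects the heartbeat information of each slave push node to learn which files each slave push node holds, its current node status, and so on. The primary push node may also obtain the heartbeat information of each destination node cluster or destination node to learn the node status of each destination node cluster. Further, to keep the data in the file system consistent, the primary push node may also schedule and manage file transmission in the file system.
The process by which the push node cluster elects the primary push node may include: the distributed coordination system assigns a temporary node sequence number to each push node in the push node cluster, and the push nodes hold an election according to the temporary node sequence numbers to determine the primary push node.
The distributed coordination node cluster may assign each of the push nodes a unique temporary node sequence number. When the push nodes start, each of them registers with the distributed coordination node cluster, so that each push node becomes a temporary node and is assigned a temporary node sequence number by the distributed coordination node cluster. For any one of the push nodes, when the push node obtains its own temporary node sequence number, it compares that number with the temporary node sequence numbers of the other push nodes to determine whether its own number is the smallest. When the push node determines that its own temporary node sequence number is the smallest among those of the push nodes, the push node is determined to be the primary push node.
In this embodiment of the present invention, after the push nodes hold an election according to the temporary node sequence numbers and determine the primary push node, all push nodes other than the primary push node act as slave push nodes. Each slave push node monitors in real time the other push nodes whose temporary node sequence numbers are smaller than its own. When it is detected that such a push node has failed or no longer exists, the distributed coordination node cluster sends notification information to each push node. On receiving the notification information, each push node holds the election again according to the temporary node sequence numbers and determines the primary push node anew. The notification information is used to notify the push nodes to hold an election according to the temporary node sequence numbers.
FIG. 3 is a flowchart of a primary push node election method according to an embodiment of the present invention. Referring to FIG. 3, the method includes the following steps:
Step 301: The distributed coordination node cluster treats each of the push nodes as a temporary node and assigns a sequence number to the temporary node.
Step 302: For any one of the push nodes, the push node determines whether there is a push node whose temporary node sequence number is smaller than its own. If there is, step 303 is performed; otherwise, step 304 is performed.
Step 303: The push node acts as a slave push node and monitors in real time the service state of the push nodes whose temporary node sequence numbers are smaller than its own; when the service state of a monitored push node is faulty, step 302 is performed.
Step 304: The push node acts as the primary push node.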
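The election rule of FIG. 3 (the smallest temporary node sequence number wins, with re-election on failure) can be sketched as follows. The coordination system itself is not modeled: the sketch assumes the sequence numbers are already known as a plain mapping, and both function names are hypothetical.

```python
def elect_primary(sequence_numbers: dict) -> str:
    """Return the push node whose temporary node sequence number is smallest."""
    return min(sequence_numbers, key=sequence_numbers.get)


def on_node_failure(sequence_numbers: dict, failed_node: str):
    """Drop the failed node and hold the election again among the survivors."""
    survivors = {n: s for n, s in sequence_numbers.items() if n != failed_node}
    return survivors, elect_primary(survivors)
```

With sequence numbers `{"p1": 3, "p2": 1, "p3": 2}`, node `p2` is elected; if `p2` fails, the re-election picks `p3`.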
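Step 204: The primary push node obtains the information on the file to be transmitted from the distributed coordination node cluster.
Specifically, the primary push node checks the file information stored in the distributed coordination node cluster to determine whether the cluster contains information on a file to be transmitted. When it does, the primary push node obtains that information.
The information on the file to be transmitted includes: the storage address of the file in the file storage node cluster, the size of the file, the identifier of the file, the time at which the file was received, and so on. The file identifier may be the name of the file to be transmitted, a hash value of the file such as an MD5 (Message Digest Algorithm 5) value, or a key value assigned to the file by the user; the embodiments of the present invention do not limit how the file identifier is represented.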
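A record with the four listed fields could look like the sketch below, here using the MD5 value as the file identifier, which is one of the options the text mentions. The class name, the field names, and the example address are hypothetical.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingFileInfo:
    """Hypothetical record for the information on a file to be transmitted."""
    storage_address: str  # location in the file storage node cluster
    size: int             # size in bytes
    file_id: str          # here: MD5 hex digest of the file content
    received_at: float    # time the file was received (seconds since the epoch)


def make_file_info(storage_address: str, content: bytes) -> PendingFileInfo:
    """Build the record for a file whose content is available as bytes."""
    digest = hashlib.md5(content).hexdigest()
    return PendingFileInfo(storage_address, len(content), digest, time.time())


# The address below is a made-up example, not a path from the source.
info = make_file_info("hdfs://example-cluster/pending/data.bin", b"hello")
```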
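Of course, the primary push node may also obtain the information on the file to be transmitted from the distributed coordination node cluster when a destination node requests the file from the primary push node. The primary push node obtains the file information stored in the distributed coordination node cluster according to the file identifier carried in the request sent by the destination node. When the file information stored in the distributed coordination node cluster contains the file information corresponding to that identifier, the primary push node obtains the information on the file to be transmitted from the distributed coordination node cluster.
Step 205: The primary push node determines, according to the information on the file to be transmitted and the heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted. The heartbeat information carries information on the files saved by the at least one slave push node, and the file to be transmitted is pulled by the at least one slave push node from the file storage node cluster.
Specifically, for a slave push node, the primary push node checks, according to the information on the file to be transmitted, whether the heartbeat information uploaded by the slave push node contains that information. When the primary push node determines that the heartbeat information uploaded by the slave push node includes the information on the file to be transmitted, it determines that the slave push node holds the file to be transmitted.
Taking an MD5 value as the file identifier as an example, the primary push node compares the MD5 value in the obtained information on the file to be transmitted with the MD5 values contained in the saved file information sent by the slave push node, to determine whether that saved file information contains the MD5 value of the file to be transmitted. When the saved file information sent by the slave push node contains the MD5 value in the information on the file to be transmitted, the primary push node determines that the slave push node holds the file to be transmitted.
It should be noted that the slave push nodes also monitor the distributed coordination node cluster to learn whether there is new information on a file to be transmitted. When a slave push node determines, from its own file information and the file information on the distributed coordination node cluster, that the cluster now has new information on a file to be transmitted, it obtains the file to be transmitted from the file storage node cluster according to that information.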
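The MD5 comparison in step 205, and the complementary check that tells a slave push node it still has to pull the file, amount to a membership test. In this sketch the heartbeats are assumed to be reduced to a mapping from node id to the set of MD5 values the node reports; the function names are hypothetical.

```python
def slaves_holding_file(file_md5, heartbeats):
    """Return the slave push nodes whose saved file information contains file_md5."""
    return [node for node, saved in heartbeats.items() if file_md5 in saved]


def slaves_needing_pull(file_md5, heartbeats):
    """Return the slave push nodes that would still pull the file from storage."""
    return [node for node, saved in heartbeats.items() if file_md5 not in saved]
```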
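Step 206: The primary push node generates a file transmission path.
Specifically, the primary push node may obtain the file transmission path in any of the following ways:
(I) The primary push node obtains an already generated file transmission path and, when that path includes a first destination node cluster, deletes the first destination node cluster from it and uses the resulting path as the current file transmission path, where the first destination node cluster is a destination node cluster whose heartbeat information the primary push node has not received within a preset period.
To increase the speed of file transmission, the primary push node may transmit the file to be transmitted directly over an already generated file transmission path. Specifically, the primary push node obtains the already generated file transmission path and, based on the heartbeat information sent within the preset period by each destination node cluster in the path, determines whether heartbeat information has been received from all destination node clusters on the path within the preset period. If so, the primary push node takes that file transmission path as the current file transmission path. If heartbeat information has not been received from some destination node cluster on the path within the preset period, the primary push node deletes that destination node cluster from the already generated path and uses the path with that cluster removed as the current file transmission path.
(II) The primary push node obtains an already generated file transmission path and, when that path includes a second destination node cluster, deletes the second destination node cluster from it and uses the resulting path as the current file transmission path, where the second destination node cluster is a destination node cluster that is currently executing a transmission task.
Because an already generated file transmission path may contain destination node clusters that are executing tasks, the primary push node may delete such clusters from the path in order to avoid repeated or unsuccessful transmission.
(III) The primary push node obtains the current file transmission path according to the heartbeat information of the destination nodes, where the file transmission path includes a third destination node cluster, and the third destination node cluster is a destination node cluster in an idle state.
Specifically, the primary push node determines the destination node clusters in an idle state according to the node status information in the received heartbeat information of the destination node clusters, and generates the file transmission path from those idle clusters.
(IV) The primary push node obtains the current file transmission path according to a data request for the file to be transmitted, where the at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.
Specifically, when any destination node in a destination node cluster requests the file to be transmitted from the primary push node, the primary push node may generate, according to the requesting destination node, a file transmission path that includes that destination node cluster.
It should be noted that the above ways of generating a file transmission path may be combined. For example, if at least two of the four situations above apply to a file transmission path, a destination node or destination node cluster may be deleted or added according to the actual situation, so that file transmission can be carried out flexibly.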
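One possible combination of ways (I) to (IV) is sketched below. The source leaves the exact combination open ("according to the actual situation"), so the order of operations here is an assumption, as are the function name and its parameters.

```python
def build_current_path(previous_path, heartbeat_seen, busy, idle=(), requesters=()):
    """Derive the current file transmission path as a list of cluster ids.

    previous_path  -- cluster ids on an already generated path
    heartbeat_seen -- cluster ids whose heartbeat arrived within the preset period
    busy           -- cluster ids currently executing a transmission task
    idle           -- idle cluster ids to add, way (III)
    requesters     -- cluster ids that requested the file, way (IV)
    """
    # Ways (I) and (II): drop clusters without a heartbeat or with a running task.
    path = [c for c in previous_path if c in heartbeat_seen and c not in busy]
    # Ways (III) and (IV): append idle and requesting clusters not yet on the path.
    for cluster in list(idle) + list(requesters):
        if cluster not in path:
            path.append(cluster)
    return path
```

For example, with a previous path `["c1", "c2", "c3"]`, no heartbeat from `c2`, and `c3` busy, adding idle cluster `c4` and requester `c5` yields `["c1", "c4", "c5"]`.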
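In this embodiment of the present invention, the description assumes that every one of the at least one slave push node holds the file to be transmitted. In a real scenario, possibly only one or some of the slave push nodes have saved the file to be transmitted; in that case, file transmission paths are generated only for the slave push nodes that have saved the file. Because the physical addresses of the slave push nodes differ, the starting point of the file transmission path differs for different slave push nodes. To avoid transmission conflicts among multiple slave push nodes and to improve transmission efficiency, different file transmission paths may be set for different slave push nodes; that is, different slave push nodes are used to send the file to be transmitted to different destination node clusters.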
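Giving each slave push node its own set of destination node clusters can be done in many ways; the source does not prescribe one. The sketch below uses a simple round-robin split as an assumed policy, with a hypothetical function name, and expects at least one slave push node.

```python
def assign_paths(slaves, clusters):
    """Round-robin split: every destination cluster is served by exactly one slave."""
    paths = {slave: [] for slave in slaves}
    for i, cluster in enumerate(clusters):
        paths[slaves[i % len(slaves)]].append(cluster)
    return paths
```

Because no cluster appears on two paths, two slave push nodes never send the same file to the same destination node cluster.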
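Step 207: The primary push node selects, according to the node status information of the at least one slave push node, a slave push node in an idle state from the at least one slave push node.
Specifically, each slave push node sends node status information to the primary push node once every preset period. When the primary push node receives the node status information, it determines from that information whether the slave push node is in an idle state.
When a slave push node is executing a task such as file transmission, continuing to use it as the pushing device may slow down file transmission, because the file to be transmitted would have to queue at that slave push node before its transmission could start. Therefore, to save time and improve transmission efficiency, the node status information can be used to determine which slave push nodes are currently in an idle state.
Step 208: The primary push node sends the file transmission path to the slave push node in the idle state.
In this embodiment of the present invention, file transmission paths correspond one-to-one with slave push nodes.
Step 209: When the slave push node receives the file transmission path, it sends the data packets of the file to be transmitted to the destination nodes in sequence according to the file transmission path.
During transmission, the slave push node splits the obtained file to be transmitted into a plurality of data packets and, following the file transmission path, sends them in sequence to the first destination node in the path.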
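Splitting a file into numbered data packets can be sketched as a generator. The packet size is not given in the source; the 64 KiB default below is an arbitrary assumption, and the function name is hypothetical.

```python
def split_into_packets(data: bytes, packet_size: int = 64 * 1024):
    """Yield (sequence number, payload) pairs covering the whole file in order."""
    for offset in range(0, len(data), packet_size):
        yield offset // packet_size, data[offset:offset + packet_size]


packets = list(split_into_packets(b"abcdefghij", packet_size=4))
```

Carrying a sequence number with each payload lets a destination node reassemble the file and report how much of it has arrived.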
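Step 210: For any destination node in the file transmission path, when the destination node receives a data packet of the file to be transmitted, it saves the data packet in memory and sends it to the next destination node in the file transmission path.
Specifically, when the slave push node sends a data packet of the file to be transmitted to the first destination node in the file transmission path, that destination node saves the data packet in memory and sends it to the second destination node in the path. When the second destination node receives the data packet, it saves the data packet and sends it to the third destination node, and so on, until a destination node receives the data packet and determines that it is the last node on the file transmission path. Through this process, while the slave push node is sending data packets, every destination node in the file transmission path is passing them on at almost the same moment. In this way, the load on the slave push node and on each destination node is very low, few system resources are occupied, and the programs running on those machines are not affected; moreover, the nodes complete the data transfer within a very short time of one another, usually on the order of milliseconds.
To make the transmission process clearer, FIG. 4 shows a schematic diagram of file transmission according to an embodiment of the present invention. Referring to FIG. 4, the primary push node sends a file transmission path to any slave push node that holds the file to be transmitted and is in an idle state, and the slave push node sends the data packets of the file to be transmitted to destination node A in the destination node cluster according to the file transmission path. When destination node A receives a data packet, it saves the data packet in memory and sends it to destination node B, the next destination node after A in the file transmission path; destination node B saves the data packet in memory and sends it to destination node C, the last destination node in the file transmission path. In this way, the file to be transmitted is saved on every destination node in the file transmission path.
It should be noted that, in a distributed file transmission system deployed according to the above scheme, the push node cluster and the destination node clusters are deployed in the same equipment room, every 10 destination nodes form one destination node cluster, and each push node issues tasks in units of one destination node cluster; that is, the file transmission path of each push node includes only one destination node cluster. Taking the transmission of a 200 MB data file as an example, it takes only 20 s from the moment the push client issues the task until the file is completely delivered to the destination node cluster.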
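Step 211: The destination nodes in the file transmission path send file transmission progress information to the primary push node.
In order to determine whether the file transfer succeeded and which destination nodes successfully received the file to be transmitted, each destination node needs to report its progress in receiving the file to the primary push node. Specifically, when a preset period elapses, the destination node sends progress information on the data packets it has received to the primary push node, so that the primary push node can determine the progress of the file to be transmitted.
Step 212: When the primary push node receives the file transmission progress information, it writes the progress information into the database.
Step 213: When the database obtains the file transmission progress information, it displays the progress information.
It should be noted that the primary push node may also write the heartbeat information of the slave push nodes and the heartbeat information of the destination nodes into the database. When the database obtains this heartbeat information, it displays it, so as to show the user the file information saved on the slave push nodes, the service state of the nodes, the file information received by the destination nodes, the information on files that failed to be received, the information on tasks being executed, and so on.
In the method provided by this embodiment of the present invention, the distributed coordination node cluster maintains the information on the file to be transmitted, the push node cluster generates the file transmission path and carries out the transmission according to that information, the primary push node is responsible for generating the file transmission path, and a plurality of slave push nodes obtain the file to be transmitted from the file storage node cluster and each perform multi-path transmission, so that the file is sent to a plurality of destination node clusters. The method avoids the single-point bottleneck caused by many nodes accessing a single node at the same time for file transmission, and even if any one node in the distributed coordination node cluster fails, the normal operation of the whole file system is not affected. Because the transmission between destination nodes is chained, the load on the slave push node and on each destination node is very low, few system resources are occupied, the programs running on those machines are not affected, and the nodes complete the data transfer within a very short time of one another.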
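FIG. 5 is a flowchart of a file transmission method according to another embodiment of the present invention. The method includes the following steps:
Step 501: The primary push node obtains the information on the file to be transmitted from the distributed coordination node cluster.
Step 502: The primary push node determines, according to the information on the file to be transmitted and the heartbeat information of at least one slave push node, that the at least one slave push node holds the file to be transmitted, where the heartbeat information carries information on the files saved by the at least one slave push node, and the file to be transmitted is pulled by the at least one slave push node from the file storage node cluster.
Step 503: The primary push node generates a file transmission path, where the file transmission path includes at least one destination node cluster.
Step 504: The primary push node sends the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path.
Each destination node cluster includes a plurality of destination nodes.
In the method provided by this embodiment of the present invention, the distributed coordination node cluster maintains the information on the file to be transmitted, the push node cluster generates the file transmission path and carries out the transmission according to that information, the primary push node is responsible for generating the file transmission path, and a plurality of slave push nodes obtain the file to be transmitted from the file storage node cluster and each perform multi-path transmission, so that the file is sent to a plurality of destination node clusters. The method avoids the single-point bottleneck caused by many nodes accessing a single node at the same time for file transmission, and even if any one node in the distributed coordination node cluster fails, the normal operation of the whole file system is not affected.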
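In this embodiment of the present invention, the primary push node may generate the file transmission path in one of the following ways or any combination of them:
In an embodiment of the present invention, the primary push node obtains an already generated file transmission path and, when that path includes a first destination node cluster, deletes the first destination node cluster from it and uses the resulting path as the current file transmission path, where the first destination node cluster is a destination node cluster whose heartbeat information the primary push node has not received within a preset period.
In another embodiment of the present invention, the primary push node obtains an already generated file transmission path and, when that path includes a second destination node cluster, deletes the second destination node cluster from it and uses the resulting path as the current file transmission path, where the second destination node cluster is a destination node cluster that is currently executing a transmission task.
In another embodiment of the present invention, the primary push node obtains the current file transmission path according to the heartbeat information of the destination nodes, where the file transmission path includes a third destination node cluster, and the third destination node cluster is a destination node cluster in an idle state.
In another embodiment of the present invention, the primary push node obtains the current file transmission path according to a data request for the file to be transmitted, where the at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.
In this embodiment of the present invention, the heartbeat information of the slave push node further carries node status information of the slave push node. The sending, by the primary push node, of the file transmission path to the at least one slave push node includes:
selecting, by the primary push node, according to the node status information of the at least one slave push node, a slave push node in an idle state from the at least one slave push node; and
sending, by the primary push node, the file transmission path to the slave push node in the idle state.
In this embodiment of the present invention, the primary push node is elected by a plurality of push nodes according to temporary node sequence numbers, which are assigned by the distributed coordination system.
In this embodiment of the present invention, the file to be transmitted is pulled from the file storage node cluster by the slave push node when it determines that it has not saved the file to be transmitted.
In this embodiment of the present invention, after the primary push node sends the file transmission path to the at least one slave push node, the method further includes:
receiving, by the primary push node, file transmission progress information sent by a plurality of destination nodes in the file transmission path; and writing, by the primary push node, the progress information into the database.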
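FIG. 6 is a flowchart of a file transmission method according to another embodiment of the present invention. The method includes the following steps:
Step 601: The destination node receives a data packet of the file to be transmitted and a file transmission path.
Step 602: The destination node saves the data packet of the file to be transmitted in memory.
Step 603: The destination node sends the data packet of the file to be transmitted to the next destination node according to the file transmission path.
In the method provided by this embodiment of the present invention, the destination node receives and saves the data packets of the file to be transmitted and, according to the file transmission path, sends them to other destination nodes. In this way, while the slave push node is sending data packets, every destination node in the file transmission path is passing them on at almost the same moment, so the load on the slave push node and on each destination node is very low, few system resources are occupied, the programs running on those machines are not affected, and the speed of data transfer is greatly increased.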
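FIG. 7 is a schematic structural diagram of a file transmission apparatus according to an embodiment of the present invention. Referring to FIG. 7, the apparatus includes a file information obtaining module 701, a to-be-transmitted file determining module 702, a path generating module 703, and a path sending module 704. The file information obtaining module 701 is configured to obtain the information about the file to be transmitted from the distributed coordination node cluster, and is connected to the to-be-transmitted file determining module 702. The to-be-transmitted file determining module 702 is configured to determine, according to the information about the file to be transmitted created by the distributed coordination node cluster and the heartbeat information of at least one slave push node, that the file to be transmitted is stored on the at least one slave push node. The heartbeat information carries information about the files stored on the at least one slave push node, and the file to be transmitted was pulled by the at least one slave push node from the file storage node cluster. The to-be-transmitted file determining module 702 is connected to the path generating module 703. The path generating module 703 is configured to generate a file transmission path that includes at least one destination node cluster, and is connected to the path sending module 704. The path sending module 704 is configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path. Each destination node cluster includes multiple destination nodes.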
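In one embodiment of the present invention, the path generating module 703 is configured to obtain an already generated file transmission path and, when the generated path includes a first destination node cluster, delete the first destination node cluster from the generated path and use the resulting path as the current file transmission path. The first destination node cluster is a destination node cluster whose heartbeat information has not been received within a preset period.
In another embodiment of the present invention, the path generating module 703 is configured to obtain an already generated file transmission path and, when the generated path includes a second destination node cluster, delete the second destination node cluster from the generated path and use the resulting path as the current file transmission path. The second destination node cluster is a destination node cluster that is currently executing a transmission task.
In another embodiment of the present invention, the path generating module 703 is configured to obtain the current file transmission path according to the heartbeat information of the destination nodes. The file transmission path includes a third destination node cluster, which is a destination node cluster in an idle state.
In another embodiment of the present invention, the path generating module 703 is configured to obtain the current file transmission path according to a data request for the file to be transmitted. The at least one destination node cluster in the file transmission path includes the destination node cluster that sent the data request.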
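In this embodiment of the present invention, the path sending module 704 is configured to, when the heartbeat information of the at least one slave push node also carries node state information of the slave push node, select one slave push node that is in an idle state from among the at least one slave push node according to that node state information, and send the file transmission path to the slave push node that is in the idle state.
In this embodiment of the present invention, the master push node is elected from among multiple push nodes according to ephemeral node sequence numbers, which are allocated by the distributed coordination system.
In this embodiment of the present invention, the file to be transmitted is pulled from the file storage node cluster by the slave push node when the slave push node determines that it does not already store the file.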
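In this embodiment of the present invention, the apparatus further includes:
A writing module, configured to receive file transmission progress information sent by multiple destination nodes in the file transmission path and write the file transmission progress information into a database.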
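In summary, in the apparatus provided by this embodiment of the present invention, the distributed coordination node cluster maintains the information about the file to be transmitted, and the push node cluster generates the file transmission path and carries out the transmission according to that information. The master push node is responsible for generating the file transmission path, while multiple slave push nodes obtain the file to be transmitted from the file storage node cluster and each transmit it over multiple paths, thereby sending it to multiple destination node clusters. The apparatus avoids the single-point bottleneck that arises when multiple nodes access a single node at the same time for file transmission, and even if any one node in the distributed coordination node cluster fails, the normal operation of the whole file system is not affected.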
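FIG. 8 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention. Referring to FIG. 8, the apparatus includes a receiving module 801, a storing module 802, and a sending module 803. The receiving module 801 is configured to receive a data packet of the file to be transmitted and the file transmission path, and is connected to the storing module 802. The storing module 802 is configured to store the data packet of the file to be transmitted in memory, and is connected to the sending module 803. The sending module 803 is configured to send the data packet of the file to be transmitted to the next destination node according to the file transmission path.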
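In summary, the apparatus provided by this embodiment of the present invention receives and stores the data packets of the file to be transmitted and, according to the file transmission path, sends them on to other destination nodes. As a result, while the slave push node is sending a data packet, every destination node on the file transmission path is relaying that packet at almost the same moment. The load on the slave push node and on each destination node therefore stays low, few system resources are consumed, the programs running on those nodes are not affected, and data delivery is greatly accelerated.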
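FIG. 9 is a schematic structural diagram of a file transmission apparatus according to another embodiment of the present invention. Referring to FIG. 9, the apparatus includes a non-volatile memory 901, a CPU (central processing unit) 902, a forwarding chip 903, a memory 904, and other hardware 905.
The memory 904 is configured to store instruction code. When executed, the instruction code mainly performs the functions of the file information obtaining module, the to-be-transmitted file determining module, the path generating module, and the path sending module of the apparatus shown in FIG. 7.
The CPU 902 is configured to communicate with the forwarding chip 903 to send and receive data packets; to communicate with the memory 904 to read and execute the instruction code stored there, thereby performing the functions of the file information obtaining module, the to-be-transmitted file determining module, the path generating module, the path sending module, and other modules of the above apparatus, and to process data packets passed up from the forwarding chip 903; and to communicate with the non-volatile memory 901 to read and write the data stored there, including the information about the file to be transmitted and the heartbeat information of at least one push node.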
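The file information obtaining module is configured to obtain the information about the file to be transmitted from the distributed coordination node cluster;
The to-be-transmitted file determining module is configured to determine, according to the information about the file to be transmitted created by the distributed coordination node cluster and the heartbeat information of at least one slave push node, that the file to be transmitted is stored on the at least one slave push node, where the heartbeat information carries at least information about the files stored on the slave push node, and the file to be transmitted was pulled by the at least one slave push node from the file storage node cluster;
The path generating module is configured to generate a file transmission path, which includes at least one destination node cluster;
The path sending module is configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path. Each destination node cluster includes multiple destination nodes.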
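The forwarding chip 903 is connected to other nodes through its ports and is responsible for sending and receiving the data packets described above.
The non-volatile memory 901 is configured to store data, including the information about the file to be transmitted and the heartbeat information of at least one push node, and performs the function of the storage module of the above apparatus.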
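It should be noted that the division into functional modules described above is only an example of how the file transmission apparatus of the foregoing embodiments performs file transmission. In practice, the functions may be assigned to different functional modules as required; that is, the internal structure of a node may be divided into different functional modules to perform all or some of the functions described above. In addition, the file transmission apparatus and the file transmission method of the foregoing embodiments are based on the same concept; for the specific implementation, refer to the method embodiments, which are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.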

Claims (21)

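  1. A distributed cluster file system, characterized by comprising:
    a distributed coordination node cluster, comprising multiple coordination nodes that share information with one another, the distributed coordination node cluster being configured to create information about a file to be transmitted;
    a file storage node cluster, comprising multiple storage nodes and configured to store the file to be transmitted;
    a push node cluster, comprising a master push node and at least one slave push node, the master push node being configured to determine, according to the information about the file to be transmitted and heartbeat information of the at least one slave push node, that the file to be transmitted is stored on the at least one slave push node, the heartbeat information carrying information about the files stored on the at least one slave push node, and the file to be transmitted having been pulled by the at least one slave push node from the file storage node cluster;
    the master push node being further configured to generate a file transmission path, the file transmission path comprising at least one destination node cluster;
    the master push node being further configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path; and
    the at least one destination node cluster, each destination node cluster comprising multiple destination nodes.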
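  2. The system according to claim 1, characterized in that:
    the master push node is configured to obtain an already generated file transmission path and, when the generated file transmission path includes a first destination node cluster, delete the first destination node cluster from the generated file transmission path and use the resulting path as the current file transmission path, the first destination node cluster being a destination node cluster whose heartbeat information the master push node has not received within a preset period; and/or,
    the master push node is configured to obtain an already generated file transmission path and, when the generated file transmission path includes a second destination node cluster, delete the second destination node cluster from the generated file transmission path and use the resulting path as the current file transmission path, the second destination node cluster being a destination node cluster that is currently executing a transmission task; and/or,
    the master push node is configured to obtain the current file transmission path according to heartbeat information of the destination nodes, the file transmission path including a third destination node cluster, the third destination node cluster being a destination node cluster in an idle state; and/or,
    the master push node is configured to obtain the current file transmission path according to a data request for the file to be transmitted, the at least one destination node cluster in the file transmission path including the destination node cluster that sent the data request.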
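  3. The system according to claim 1, characterized in that the heartbeat information of the slave push node further carries node state information of the slave push node,
    the master push node is configured to select, according to the node state information of the at least one slave push node, one slave push node that is in an idle state from among the at least one slave push node; and
    the master push node sends the file transmission path to the slave push node that is in the idle state.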
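  4. The system according to claim 1, characterized in that the master push node is elected from among multiple push nodes according to ephemeral node sequence numbers, the ephemeral node sequence numbers being allocated by a distributed coordination system.
  5. The system according to claim 1, characterized in that the slave push node is configured to detect the information about the file to be transmitted created by the distributed coordination node cluster and, when it determines according to that information that the slave push node does not hold the file to be transmitted, pull the file to be transmitted from the file storage node cluster.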
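  6. The system according to claim 1, characterized in that the system further comprises a database,
    the master push node being further configured to receive file transmission progress information sent by multiple destination nodes in the file transmission path and write the file transmission progress information into the database.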
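  7. The system according to claim 1, characterized in that the destination node is configured to, upon receiving a data packet of the file to be transmitted, send the received data packet to the next destination node in the file transmission path; or,
    the destination node is configured to, upon receiving a data packet of the file to be transmitted, send the received data packet to any one of the other destination nodes that belong to the same destination node cluster as the destination node.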
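  8. A file transmission method, characterized in that the method comprises:
    obtaining, by a master push node, information about a file to be transmitted from a distributed coordination node cluster;
    determining, by the master push node according to the information about the file to be transmitted and heartbeat information of at least one slave push node, that the file to be transmitted is stored on the at least one slave push node, the heartbeat information carrying information about the files stored on the at least one slave push node, and the file to be transmitted having been pulled by the at least one slave push node from a file storage node cluster;
    generating, by the master push node, a file transmission path, the file transmission path comprising at least one destination node cluster; and
    sending, by the master push node, the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path;
    wherein each destination node cluster comprises multiple destination nodes.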
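  9. The method according to claim 8, characterized in that the generating, by the master push node, of the file transmission path comprises:
    obtaining, by the master push node, an already generated file transmission path and, when the generated file transmission path includes a first destination node cluster, deleting the first destination node cluster from the generated file transmission path and using the resulting path as the current file transmission path, the first destination node cluster being a destination node cluster whose heartbeat information the master push node has not received within a preset period; and/or,
    obtaining, by the master push node, an already generated file transmission path and, when the generated file transmission path includes a second destination node cluster, deleting the second destination node cluster from the generated file transmission path and using the resulting path as the current file transmission path, the second destination node cluster being a destination node cluster that is currently executing a transmission task; and/or,
    obtaining, by the master push node, the current file transmission path according to heartbeat information of the destination nodes, the file transmission path including a third destination node cluster, the third destination node cluster being a destination node cluster in an idle state; and/or,
    obtaining, by the master push node, the current file transmission path according to a data request for the file to be transmitted, the at least one destination node cluster in the file transmission path including the destination node cluster that sent the data request.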
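  10. The method according to claim 8, characterized in that the heartbeat information of the slave push node further carries node state information of the slave push node, and the sending, by the master push node, of the file transmission path to the at least one slave push node comprises:
    selecting, by the master push node according to the node state information of the at least one slave push node, one slave push node that is in an idle state from among the at least one slave push node; and
    sending, by the master push node, the file transmission path to the slave push node that is in the idle state.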
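  11. The method according to claim 8, characterized in that the master push node is elected from among multiple push nodes according to ephemeral node sequence numbers, the ephemeral node sequence numbers being allocated by a distributed coordination system.
  12. The method according to claim 8, characterized in that the file to be transmitted is pulled from the file storage node cluster by the slave push node when the slave push node determines that it does not already store the file to be transmitted.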
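  13. The method according to claim 8, characterized in that, after the master push node sends the file transmission path to the at least one slave push node, the method further comprises:
    receiving, by the master push node, file transmission progress information sent by multiple destination nodes in the file transmission path, and writing, by the master push node, the file transmission progress information into a database.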
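  14. A file transmission method, characterized in that the method comprises:
    receiving, by a destination node, a data packet of a file to be transmitted and a file transmission path;
    storing, by the destination node, the data packet of the file to be transmitted in memory; and
    sending, by the destination node according to the file transmission path, the data packet of the file to be transmitted to a next destination node.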
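  15. A file transmission apparatus, characterized in that the apparatus comprises:
    a file information obtaining module, configured to obtain information about a file to be transmitted from a distributed coordination node cluster;
    a to-be-transmitted file determining module, configured to determine, according to the information about the file to be transmitted created by the distributed coordination node cluster and heartbeat information of at least one slave push node, that the file to be transmitted is stored on the at least one slave push node, the heartbeat information carrying information about the files stored on the at least one slave push node, and the file to be transmitted having been pulled by the at least one slave push node from a file storage node cluster;
    a path generating module, configured to generate a file transmission path, the file transmission path comprising at least one destination node cluster; and
    a path sending module, configured to send the file transmission path to the at least one slave push node, so that the at least one slave push node sends the file to be transmitted to the at least one destination node cluster according to the file transmission path; wherein each destination node cluster comprises multiple destination nodes.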
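  16. The apparatus according to claim 15, characterized in that the path generating module is configured to obtain an already generated file transmission path and, when the generated file transmission path includes a first destination node cluster, delete the first destination node cluster from the generated file transmission path and use the resulting path as the current file transmission path, the first destination node cluster being a destination node cluster whose heartbeat information has not been received within a preset period; and/or,
    the path generating module is configured to obtain an already generated file transmission path and, when the generated file transmission path includes a second destination node cluster, delete the second destination node cluster from the generated file transmission path and use the resulting path as the current file transmission path, the second destination node cluster being a destination node cluster that is currently executing a transmission task; and/or,
    the path generating module is configured to obtain the current file transmission path according to heartbeat information of the destination nodes, the file transmission path including a third destination node cluster, the third destination node cluster being a destination node cluster in an idle state; and/or,
    the path generating module is configured to obtain the current file transmission path according to a data request for the file to be transmitted, the at least one destination node cluster in the file transmission path including the destination node cluster that sent the data request.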
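  17. The apparatus according to claim 15, characterized in that the path sending module is configured to, when the heartbeat information of the at least one slave push node further carries node state information of the slave push node, select one slave push node that is in an idle state from among the at least one slave push node according to the node state information of the at least one slave push node, and send the file transmission path to the slave push node that is in the idle state.
  18. The apparatus according to claim 15, characterized in that the master push node is elected from among multiple push nodes according to ephemeral node sequence numbers, the ephemeral node sequence numbers being allocated by a distributed coordination system.
  19. The apparatus according to claim 15, characterized in that the file to be transmitted is pulled from the file storage node cluster by the slave push node when the slave push node determines that it does not already store the file to be transmitted.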
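  20. The apparatus according to claim 15, characterized in that the apparatus further comprises:
    a writing module, configured to receive file transmission progress information sent by multiple destination nodes in the file transmission path and write the file transmission progress information into a database.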
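  21. A file transmission apparatus, characterized in that the apparatus comprises:
    a receiving module, configured to receive a data packet of a file to be transmitted and a file transmission path;
    a storing module, configured to store the data packet of the file to be transmitted in memory; and
    a sending module, configured to send, according to the file transmission path, the data packet of the file to be transmitted to a next destination node.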
PCT/CN2015/072980 2013-12-17 2015-02-13 File transmission method, apparatus, and distributed cluster file system WO2015090245A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/105,657 US9917884B2 (en) 2013-12-17 2015-02-13 File transmission method, apparatus, and distributed cluster file system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310695160.8 2013-12-17
CN201310695160.8A CN104092719B (zh) 2013-12-17 2013-12-17 File transmission method, apparatus, and distributed cluster file system

Publications (1)

Publication Number Publication Date
WO2015090245A1 true WO2015090245A1 (zh) 2015-06-25

Family

ID=51640399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/072980 WO2015090245A1 (zh) 2013-12-17 2015-02-13 File transmission method, apparatus, and distributed cluster file system

Country Status (3)

Country Link
US (1) US9917884B2 (zh)
CN (1) CN104092719B (zh)
WO (1) WO2015090245A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106445951A (zh) * 2015-08-07 2017-02-22 中兴通讯股份有限公司 一种文件传输方法和装置

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105656976B (zh) * 2014-12-01 2019-01-04 腾讯科技(深圳)有限公司 集群***的信息推送方法及装置
CN104967619B (zh) 2015-06-17 2018-09-04 深圳市腾讯计算机***有限公司 文件推送方法、装置和***
CN107851105B (zh) 2015-07-02 2022-02-22 谷歌有限责任公司 具有副本位置选择的分布式存储***
CN107181637B (zh) * 2016-03-11 2021-01-29 华为技术有限公司 一种心跳信息发送方法、装置及心跳发送节点
CN109248440B (zh) * 2018-07-20 2019-10-29 苏州玩友时代科技股份有限公司 一种实现游戏实时动态加载配置的方法及***
CN110198346B (zh) * 2019-05-06 2020-10-27 北京三快在线科技有限公司 数据读取方法、装置、电子设备及可读存储介质
CN111600957A (zh) * 2020-05-20 2020-08-28 中国工商银行股份有限公司 文件传输方法、装置、***和电子设备
CN111818145B (zh) * 2020-06-29 2021-03-23 苏州好玩友网络科技有限公司 一种文件传输方法、装置、***、设备及存储介质
CN112398905B (zh) * 2020-09-28 2022-05-31 联想(北京)有限公司 一种节点及信息同步方法
CN112328560B (zh) * 2020-11-25 2024-06-18 北京无线电测量研究所 一种文件调度方法和***
CN115102946B (zh) * 2022-06-16 2023-10-24 平安银行股份有限公司 一种基于文件传输的配置方法及***
CN117560368B (zh) * 2024-01-09 2024-04-12 北京华云安信息技术有限公司 基于多级节点网络的文件传输方法和***

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6185623B1 (en) * 1997-11-07 2001-02-06 International Business Machines Corporation Method and system for trivial file transfer protocol (TFTP) subnet broadcast
CN1777110A (zh) * 2005-11-25 2006-05-24 杭州华为三康技术有限公司 一种集群设备批量传输文件的方法及文件传输设备
CN101355490A (zh) * 2007-07-25 2009-01-28 华为技术有限公司 消息路由方法、***和节点设备
CN101902388A (zh) * 2009-05-26 2010-12-01 北京风格九州文化传播有限公司 可扩充的多级排序资源快速发现技术

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5776339B2 (ja) * 2011-06-03 2015-09-09 富士通株式会社 ファイル配布方法、ファイル配布システム、マスタサーバ、及びファイル配布プログラム
CN102394922A (zh) * 2011-10-27 2012-03-28 上海文广互动电视有限公司 分布式集群文件***及文件访问方法
KR101901266B1 (ko) * 2012-05-30 2018-09-20 삼성에스디에스 주식회사 파일 스토리지 클러스터간 병렬 파일 전송 시스템 및 방법
KR20140032542A (ko) * 2012-08-30 2014-03-17 삼성전자주식회사 무선 네트워크에서 푸시 서비스의 hearthbeat 주기 결정 방법 및 장치
CN103414761B (zh) * 2013-07-23 2017-02-08 北京工业大学 一种基于Hadoop架构的移动终端云资源调度方法

Also Published As

Publication number Publication date
US20180027048A1 (en) 2018-01-25
CN104092719A (zh) 2014-10-08
US9917884B2 (en) 2018-03-13
CN104092719B (zh) 2015-10-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15729736

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 15105657

Country of ref document: US

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.10.16)

122 Ep: pct application non-entry in european phase

Ref document number: 15729736

Country of ref document: EP

Kind code of ref document: A1