CN107592368B - Distributed data synchronous routing method, storage medium, device and system - Google Patents

Distributed data synchronous routing method, storage medium, device and system

Info

Publication number
CN107592368B
Authority
CN
China
Prior art keywords
storage node
storage
same
data transmission
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710936634.1A
Other languages
Chinese (zh)
Other versions
CN107592368A (en)
Inventor
张梦涵
陈少杰
张文明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201710936634.1A
Publication of CN107592368A
Application granted
Publication of CN107592368B
Active (legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Multi Processors (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a distributed data synchronous routing method, storage medium, device and system, relating to the field of distributed storage. The method comprises the following steps: defining data transmission distances, in order from shortest to longest, as within the same memory, within the same cabinet and within the same machine room, where at least one storage node is built on each memory; creating a first set and a second set, adding the storage node connected to a client to the first set, and adding all other storage nodes to be synchronized to the second set; and selecting from the second set the storage node whose data transmission distance to the newly added storage node in the first set is shortest, adding it to the first set and connecting it to a newly added storage node in the first set, then selecting storage nodes from the second set again according to the same rule, adding them to the first set and connecting them, and so on until the second set is empty. The invention can effectively reduce data synchronization time.

Description

Distributed data synchronous routing method, storage medium, device and system
Technical Field
The invention relates to the field of distributed storage, and in particular to a distributed data synchronous routing method, storage medium, device and system.
Background
In the field of distributed storage, a distributed storage system stores multiple replicas of the same data. Specifically, after a central node elects a plurality of storage nodes, a client writes a data replica to one of the storage nodes, and that storage node then pushes the replica to the other nodes to complete the storage of the data.
However, different storage nodes may be located on different servers and connected to different switches, and different servers may be located in different machine rooms, so a suitable routing scheme needs to be selected to improve the efficiency of replica synchronization among the storage nodes.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a distributed data synchronous routing method that can effectively reduce data synchronization time.
To achieve the above purpose, the technical solution adopted by the invention is as follows:
defining data transmission distances, in order from shortest to longest, as: within the same memory, within the same cabinet, and within the same machine room, where at least one storage node is built on each memory;
creating a first set and a second set, adding the storage node connected to a client to the first set, and adding all other storage nodes to be synchronized to the second set;
and selecting from the second set the storage node whose data transmission distance to the newly added storage node in the first set is shortest, adding it to the first set and connecting it to a newly added storage node in the first set, then selecting a storage node from the second set again according to the same rule, adding it to the first set and connecting it, and so on until the second set is empty.
On the basis of the above technical solution,
one machine room comprises at least one cabinet, and one cabinet comprises at least one memory;
the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same.
On the basis of the above technical solution,
the client is used for generating data to be stored;
the client is connected to a memory, and specifically to one storage node in that memory.
On the basis of the above technical solution,
at least one storage node is selected from the second set each time;
and when a plurality of storage nodes are selected from the second set, the selected storage nodes are all connected to one and the same newly added storage node in the first set.
On the basis of the above technical solution, when the first set contains a plurality of newly added storage nodes, the storage nodes selected from the second set are connected to the newly added storage node that has the minimum depth and the maximum number of connections.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method described above.
The invention also provides a distributed data synchronous routing device, which comprises a storage device and a processor, wherein the storage device is stored with a computer program running on the processor, and the processor executes the computer program to realize the method.
The invention also provides a distributed data synchronous routing system, which comprises:
a preset module, used for defining data transmission distances, in order from shortest to longest, as: within the same memory, within the same cabinet, and within the same machine room, where at least one storage node is built on each memory;
a creation module, used for creating a first set and a second set, adding the storage node connected to the client to the first set, and adding all other storage nodes to be synchronized to the second set;
and an execution module, used for selecting from the second set the storage node whose data transmission distance to the newly added storage node in the first set is shortest, adding it to the first set and connecting it to a newly added storage node in the first set, then selecting a storage node from the second set again according to the same rule, adding it to the first set and connecting it, and so on until the second set is empty.
On the basis of the above technical solution,
one machine room comprises at least one cabinet, and one cabinet comprises at least one memory;
the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same.
On the basis of the above technical solution,
the client is used for generating data to be stored; the client is connected to a memory, and specifically to one storage node in that memory.
Compared with the prior art, the invention has the following advantages: data transmission distances are defined between storage nodes at different positions, and a tree-structured connection mode is adopted on that basis; using two sets, storage nodes in one set are selected for connection to storage nodes in the other set according to data transmission distance, so that the sum of the data transmission distances among all storage nodes is minimized and data synchronization among the storage nodes is accelerated. Because the concept of data transmission distance between storage nodes is abstracted, the routing line depends only on the distance between storage nodes; if other attributes are later added to the storage nodes, only the definition of the data transmission distance needs to change while the route selection process stays the same, so the method applies to general scenarios.
Drawings
Fig. 1 is a flowchart of a method for synchronous routing of distributed data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an example of an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a distributed data synchronous routing device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to fig. 1, an embodiment of the present invention provides a distributed data synchronous routing method, which is used for selecting a routing line among storage nodes, transmitting data among the storage nodes through the selected routing line, and further backing up the data. The distributed data synchronous routing method of the embodiment of the invention comprises the following steps:
S1: Define data transmission distances, in order from shortest to longest, as: within the same memory, within the same cabinet, and within the same machine room. That is, the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same, which matches how the lines are actually laid out. During synchronization between storage nodes, data has to be transmitted from one storage node to another. At least one storage node is built on each memory. A distributed storage system comprises a plurality of memories, cabinets and machine rooms, where one machine room comprises at least one cabinet, one cabinet comprises at least one memory, and one memory comprises at least one storage node.
In one implementation, to make the storage nodes easy to distinguish, four tags are added to each storage node: a machine room ID, a cabinet ID, a memory ID and a service ID. The machine room ID is the number of the machine room where the storage node is located, and different machine rooms are connected to one another. The cabinet ID is the number of the cabinet where the storage node is located, and different cabinets in the same machine room are connected through a switch. The memory ID is the number of the memory where the storage node is located. The service ID is the number of the storage node on its memory, and different storage nodes on the same memory have different numbers. The machine room, cabinet and memory where a storage node is located can therefore be read directly from its tag information.
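The four tags can be pictured as a small record, with the hierarchy level shared by two nodes read off by comparing the IDs from the outermost level inward. The following is a minimal sketch only: the class name `NodeTag`, the field names and the function `shared_level` are hypothetical and do not come from the patent, and integer IDs are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTag:
    """Hypothetical record for the four tags of one storage node."""
    room_id: int      # machine room where the node is located
    cabinet_id: int   # cabinet within that machine room
    memory_id: int    # memory within that cabinet
    service_id: int   # number of the node on its memory


def shared_level(a: NodeTag, b: NodeTag) -> str:
    """Name the innermost level of the hierarchy that two nodes share."""
    if a.room_id != b.room_id:
        return "different machine rooms"
    if a.cabinet_id != b.cabinet_id:
        return "same machine room"
    if a.memory_id != b.memory_id:
        return "same cabinet"
    return "same memory"
```

Comparing from the machine room inward means cabinet and memory numbers only need to be unique within their parent, which is one possible numbering convention rather than something the text specifies.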
S2: Create a first set and a second set, add the storage node connected to the client to the first set, and add all other storage nodes awaiting data synchronization to the second set. The client is used for generating data to be stored; the client is connected to a memory, and specifically to one storage node in that memory. In the distributed storage system, during synchronous backup of data, the client that generates the data to be stored sends the data to one storage node of the memory connected to it, and that storage node then synchronizes the data to the other storage nodes to complete the backup.
S3: Select from the second set the storage node whose data transmission distance to the newly added storage node in the first set is shortest, add it to the first set, and connect it to a newly added storage node in the first set; then select a storage node from the second set again according to the same rule, add it to the first set and connect it, and so on until the second set is empty. The transmission connections between the storage nodes then form a tree structure in which the sum of the data transmission distances among all storage nodes is minimized. When a plurality of storage nodes are selected from the second set, the selected storage nodes are all connected to one and the same newly added storage node in the first set. When the first set contains a plurality of newly added storage nodes, the one with the minimum depth and the maximum number of connections is chosen as the connection point; when several newly added storage nodes meet this requirement, any one of them is selected.
Specifically, in the first selection round, the storage node connected to the client is the newly added storage node in the first set. The storage node in the second set whose data transmission distance to that node is shortest is selected, added to the first set and connected to it, and the storage node just added then takes over as the newly added storage node in the first set. For example, if the memory holding the newly added storage node in the first set also contains two other storage nodes, those two storage nodes are the ones selected from the second set in the first round. If that memory contains no other storage node but other memories in the same cabinet do contain storage nodes, the storage nodes in those other memories are the ones selected in the first round. In the second selection round, the storage node in the second set whose data transmission distance to the newly added storage node in the first set is shortest is again selected, added to the first set and connected to the newly added storage node, and so on, round after round, until every storage node in the second set has been selected.
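The selection rounds of S3 can be sketched in Python as follows. This is one possible reading of the text, not the patented implementation: the function name `build_sync_tree`, the child-to-parent return value and the tie-breaking by list order are assumptions made for illustration, and the distance function is supplied by the caller.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Node = Hashable


def build_sync_tree(
    root: Node,
    others: Sequence[Node],
    distance: Callable[[Node, Node], int],
) -> Dict[Node, Node]:
    """Return a child -> parent map describing the synchronization tree."""
    parent: Dict[Node, Node] = {}
    depth: Dict[Node, int] = {root: 0}
    links: Dict[Node, int] = {root: 0}   # tree connections per storage node
    first: List[Node] = [root]           # first set: nodes already routed
    second: List[Node] = list(others)    # second set: nodes still to be routed
    newly_added: List[Node] = [root]     # nodes added to the first set last round

    while second:
        # Shortest distance from any remaining node to any newly added node.
        best = min(distance(s, n) for s in second for n in newly_added)
        # Newly added nodes that have a remaining node at that distance.
        candidates = [
            n for n in newly_added
            if any(distance(s, n) == best for s in second)
        ]
        # Minimum depth first, then maximum connection count; on a full tie
        # min() keeps the first candidate (an arbitrary choice).
        hub = min(candidates, key=lambda n: (depth[n], -links[n]))
        # All selected nodes attach to the same newly added node.
        chosen = [s for s in second if distance(s, hub) == best]
        for s in chosen:
            parent[s] = hub
            depth[s] = depth[hub] + 1
            links[s] = 1
            links[hub] += 1
        second = [s for s in second if s not in chosen]
        first.extend(chosen)
        newly_added = chosen             # the new nodes replace the old ones
    return parent
```

Each round selects at least one node, because the chosen connection point always has a remaining node at the shortest distance, so the loop ends once the second set is empty.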
The following describes the distributed data synchronous routing method according to an embodiment of the present invention with reference to a specific example.
For convenience of description, the data transmission distance between storage nodes in the same memory is defined as 10, between memories in the same cabinet as 100, between cabinets in the same machine room as 1000, and between different machine rooms as 10000.
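Written as a single piecewise function, the example distances read as follows. The symbol d(u, v) for the data transmission distance between storage nodes u and v is notation introduced here for illustration and does not appear in the patent.

```latex
d(u, v) =
\begin{cases}
10    & \text{if } u \text{ and } v \text{ are in the same memory} \\
100   & \text{if } u \text{ and } v \text{ are in different memories of the same cabinet} \\
1000  & \text{if } u \text{ and } v \text{ are in different cabinets of the same machine room} \\
10000 & \text{if } u \text{ and } v \text{ are in different machine rooms}
\end{cases}
```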
Client C is located on a memory that holds storage node B1. The cabinet where client C is located holds a second memory containing storage nodes B2, B3 and B4, and another machine room holds a memory containing storage node B5. The distances among storage nodes B2, B3 and B4 are therefore all 10, and the distance between storage node B1 and each of storage nodes B2, B3 and B4 is 100. Because the distance between machine rooms, 10000, is much greater than 100, the distance between storage node B5 and each of storage nodes B1, B2, B3 and B4 is 10000.
Therefore, as shown in Fig. 2, client C is connected to storage node B1. Because storage nodes B2, B3 and B4 are all at the same distance from storage node B1, they are all connected to storage node B1. Storage node B5 is then at the same distance from storage nodes B2, B3 and B4, so it may be connected to any one of them, for example storage node B4.
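The arithmetic of this example can be checked with a few lines of Python. The location strings in `LOCATION` are hypothetical placeholders chosen only to reproduce the layout described above, and the edge list `FIG2_EDGES` encodes the connections named in the text. The link from client C to B1 is left out because the example defines distances only between storage nodes.

```python
from itertools import combinations

# Hypothetical (machine room, cabinet, memory) locations for the example nodes.
LOCATION = {
    "B1": ("room-1", "cabinet-1", "memory-1"),
    "B2": ("room-1", "cabinet-1", "memory-2"),
    "B3": ("room-1", "cabinet-1", "memory-2"),
    "B4": ("room-1", "cabinet-1", "memory-2"),
    "B5": ("room-2", "cabinet-9", "memory-9"),
}


def distance(u: str, v: str) -> int:
    """Example distances from the text: 10, 100, 1000 or 10000."""
    room_u, cabinet_u, memory_u = LOCATION[u]
    room_v, cabinet_v, memory_v = LOCATION[v]
    if room_u != room_v:
        return 10000
    if cabinet_u != cabinet_v:
        return 1000
    if memory_u != memory_v:
        return 100
    return 10


# Distance for every unordered pair of storage nodes.
PAIRWISE = {
    (u, v): distance(u, v) for u, v in combinations(sorted(LOCATION), 2)
}

# Connections between storage nodes as described for Fig. 2.
FIG2_EDGES = [("B1", "B2"), ("B1", "B3"), ("B1", "B4"), ("B4", "B5")]
TOTAL = sum(distance(u, v) for u, v in FIG2_EDGES)

if __name__ == "__main__":
    for pair, d in PAIRWISE.items():
        print(pair, d)
    print("sum of distances along the Fig. 2 connections:", TOTAL)
```

With these values the four connections contribute 100 + 100 + 100 + 10000, so `TOTAL` is 10300.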
The principle of the distributed data synchronous routing method of the embodiment of the invention is as follows. First, taking into account how the lines between devices in the machine room are laid out, data transmission distances are defined between storage nodes at different positions: the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same. On that basis a tree-structured connection mode is adopted: using two sets, storage nodes in one set are selected for connection to storage nodes in the other set according to data transmission distance, so that the sum of the data transmission distances among all storage nodes is minimized and data synchronization among the storage nodes is accelerated. Because the concept of data transmission distance between storage nodes is abstracted, the routing line is calculated from the distance between storage nodes alone. At the same time, the method improves on Prim's algorithm to suit different data replica pushing scenarios, and it hides node attributes well: if other attributes are later added to the storage nodes, only the definition of the data transmission distance needs to change while the route selection process stays the same, so the method applies to general scenarios.
In addition, corresponding to the above distributed data synchronous routing method, the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the distributed data synchronous routing method according to the above embodiments are implemented. The computer-readable storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk, or an optical disk.
Referring to fig. 3, corresponding to the distributed data synchronous routing method, the present invention further provides a distributed data synchronous routing device, which includes a storage device and a processor, where the storage device stores a computer program running on the processor, and the processor implements the distributed data synchronous routing method according to the foregoing embodiments when executing the computer program.
The invention also provides a distributed data synchronous routing system based on the distributed data synchronous routing method, which comprises a preset module, a creation module and an execution module.
The preset module is used for defining data transmission distances, in order from shortest to longest, as: within the same memory, within the same cabinet, and within the same machine room, where at least one storage node is built on each memory. The creation module is used for creating a first set and a second set, adding the storage node connected to the client to the first set, and adding all other storage nodes to be synchronized to the second set. The execution module is used for selecting from the second set the storage node whose transmission distance to the newly added storage node in the first set is shortest, adding it to the first set and connecting it to a newly added storage node in the first set, then selecting a storage node from the second set again according to the same rule, adding it to the first set and connecting it, and so on until the second set is empty.
One machine room comprises at least one cabinet, one cabinet comprises at least one memory, and one memory comprises at least one storage node; the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same. The client is used for generating data to be stored; the client is connected to a memory, and specifically to one storage node in that memory.
The principle of the distributed data synchronous routing system of the embodiment of the invention is as follows. Taking into account how the lines between devices in the machine room are laid out, the preset module defines data transmission distances between storage nodes at different positions: the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same. On that basis a tree-structured connection mode is adopted: using two sets, the execution module selects storage nodes in one set for connection to storage nodes in the other set according to data transmission distance, so that the sum of the data transmission distances among all storage nodes is minimized and data synchronization among the storage nodes is accelerated. Because the concept of data transmission distance between storage nodes is abstracted, the routing line is calculated from the distance between storage nodes alone. The system improves on Prim's algorithm to suit different data replica pushing scenarios, and it hides node attributes well: if other attributes are later added to the storage nodes, only the definition of the data transmission distance needs to change while the route selection process stays the same, so the system applies to general scenarios.
The present invention is not limited to the above-described embodiments, and it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements are also considered to be within the scope of the present invention. Those not described in detail in this specification are within the skill of the art.

Claims (10)

1. A method for selecting a distributed data synchronous route is characterized by comprising the following steps:
defining data transmission distances, in order from shortest to longest, as: within the same memory, within the same cabinet, and within the same machine room, where at least one storage node is built on each memory;
creating a first set and a second set, adding the storage node connected to a client to the first set, and adding all other storage nodes to be synchronized to the second set;
and selecting from the second set the storage node whose data transmission distance to the newly added storage node in the first set is shortest, adding it to the first set and connecting it to a newly added storage node in the first set, then selecting a storage node from the second set again according to the same rule, adding it to the first set and connecting it, and so on until the second set is empty.
2. The method of claim 1, wherein:
one machine room comprises at least one cabinet, and one cabinet comprises at least one memory;
the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same.
3. The method of claim 2, wherein:
the client is used for generating data to be stored;
the client is connected with a memory, and the client is connected with one storage node in the memory.
4. The method of claim 1, wherein:
at least one storage node is selected from the second set each time;
and when a plurality of storage nodes are selected from the second set, the selected storage nodes are all connected with only one newly added storage node in the first set.
5. The method of claim 1, wherein: when the first set contains a plurality of newly added storage nodes, the storage nodes selected from the second set are connected to the newly added storage node that has the minimum depth and the maximum number of connections.
6. A computer-readable storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, implements the method of any of claims 1 to 5.
7. A distributed data synchronous routing apparatus, comprising a storage device and a processor, the storage device having stored thereon a computer program running on the processor, characterized in that: the processor, when executing the computer program, implements the method of any of claims 1 to 5.
8. A distributed data synchronous routing system, comprising:
a preset module, used for defining data transmission distances, in order from shortest to longest, as: within the same memory, within the same cabinet, and within the same machine room, where at least one storage node is built on each memory;
a creation module, used for creating a first set and a second set, adding the storage node connected to the client to the first set, and adding all other storage nodes to be synchronized to the second set;
and an execution module, used for selecting from the second set the storage node whose transmission distance to the newly added storage node in the first set is shortest, adding it to the first set and connecting it to a newly added storage node in the first set, then selecting a storage node from the second set again according to the same rule, adding it to the first set and connecting it, and so on until the second set is empty.
9. The distributed data synchronous routing system of claim 8, wherein:
one machine room comprises at least one cabinet, and one cabinet comprises at least one memory;
the data transmission distances between storage nodes in the same memory are the same, the data transmission distances between memories in the same cabinet are the same, and the data transmission distances between cabinets in the same machine room are the same.
10. The distributed data synchronous routing system of claim 9, wherein: the client is used for generating data to be stored; the client is connected with a memory, and the client is connected with one storage node in the memory.
CN201710936634.1A 2017-10-10 2017-10-10 Distributed data synchronous routing method, storage medium, device and system Active CN107592368B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710936634.1A CN107592368B (en) 2017-10-10 2017-10-10 Distributed data synchronous routing method, storage medium, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710936634.1A CN107592368B (en) 2017-10-10 2017-10-10 Distributed data synchronous routing method, storage medium, device and system

Publications (2)

Publication Number Publication Date
CN107592368A CN107592368A (en) 2018-01-16
CN107592368B true CN107592368B (en) 2020-06-16

Family

ID=61052258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710936634.1A Active CN107592368B (en) 2017-10-10 2017-10-10 Distributed data synchronous routing method, storage medium, device and system

Country Status (1)

Country Link
CN (1) CN107592368B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491478A (en) * 2018-03-09 2018-09-04 深圳市瑞驰信息技术有限公司 A kind of data distribution method and system of follow-on distributed memory system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012042491A1 (en) * 2010-09-29 2012-04-05 Universita' Degli Studi Di Udine Method for direct access to information stored in nodes of a packet switching network
CN103634401A (en) * 2013-12-03 2014-03-12 北京京东尚科信息技术有限公司 Data copy storage method and terminal unit, and server unit
CN105933233A (en) * 2016-04-20 2016-09-07 乐视控股(北京)有限公司 Topology structure generation method and system of CDN network
CN106789632A (en) * 2017-02-25 2017-05-31 郑州云海信息技术有限公司 A kind of method of the node-routing of large-scale distributed storage system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
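Research on Technologies for Wide-Area Distributed Storage Based on a P2P Architecture; Yang Lei (杨磊); China Doctoral Dissertations Full-text Database, Information Science and Technology; 2014-08-15; full text *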

Also Published As

Publication number Publication date
CN107592368A (en) 2018-01-16

Similar Documents

Publication Publication Date Title
CN109634932B (en) Intelligent contract storage method and storage system
CN103503414B (en) A kind of group system calculating storage and merge
CN102801784B (en) A kind of distributed data storage method and equipment
CN103095687B (en) metadata processing method and device
JP6542909B2 (en) File operation method and apparatus
US10628050B2 (en) Data processing method and apparatus
CN104079614B (en) The method and system obtained in order for distributed post ordering system message
CN103986694B (en) Control method of multi-replication consistency in distributed computer data storing system
CN103714097A (en) Method and device for accessing database
EP3364310A1 (en) Data processing method and device
CN107657027B (en) Data storage method and device
US11398981B2 (en) Path creation method and device for network on chip and electronic apparatus
CN102202087A (en) Method for identifying storage equipment and system thereof
CN104301233A (en) Route access method, route access system and user terminal
CN103229480A (en) Data processing method, device and client in distributed storage system
CN106873902B (en) File storage system, data scheduling method and data node
CN104348793A (en) Storage server system and storage method for data information
CN107592368B (en) Distributed data synchronous routing method, storage medium, device and system
CN111104250B (en) Method, apparatus and computer readable medium for data processing
CN111200525A (en) Network shooting range scene re-engraving method and system, electronic equipment and storage medium
CN102724301B (en) Cloud database system and method and equipment for reading and writing cloud data
CN105162833B (en) Client management system and method applied to non-disk workstation
CN108255434A (en) Label management method, managing device and computer readable storage medium
CN109379223A (en) A kind of method and apparatus for realizing network interface card automated setting
CN104038566A (en) Virtual switching device address learning method, apparatus and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant