CN106100961A - InfiniBand-based direct-connect architecture computing cluster system and construction method - Google Patents

InfiniBand-based direct-connect architecture computing cluster system and construction method

Info

Publication number
CN106100961A
Authority
CN
China
Prior art keywords
module
computing unit
routing table
task
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610580213.5A
Other languages
Chinese (zh)
Inventor
林铭杰
叶政晟
张彦彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou High Energy Computer Technology Co Ltd
Original Assignee
Guangzhou High Energy Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou High Energy Computer Technology Co Ltd
Priority to CN201610580213.5A
Publication of CN106100961A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46 Interconnection of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/52 Queue scheduling by attributing bandwidth to queues
    • H04L47/522 Dynamic queue service slot or variable bandwidth allocation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/52 Queue scheduling by attributing bandwidth to queues
    • H04L47/527 Quantum based scheduling, e.g. credit or deficit based scheduling or token bank

Abstract

The invention provides an InfiniBand-based direct-connect architecture computer cluster system comprising a main control unit, a topology construction module, and a computing resource pool. The computing resource pool includes at least two computing units, each of which includes an InfiniBand adaptation module and a route construction module. The computing units are interconnected through an InfiniBand network, so they can communicate without passing through a switch and without loss of computing performance. Network latency is low, the operation and maintenance cost of the cluster system is reduced, and its reliability is improved. The system also scales well: the number of computing units can be increased or reduced as the computational workload requires.

Description

InfiniBand-based direct-connect architecture computing cluster system and construction method
Technical field
The present invention relates to high-performance computer cluster systems, and in particular to an InfiniBand-based direct-connect architecture computing cluster system and a method for constructing it.
Background art
A computer cluster is a computer system in which a group of loosely integrated computers is connected through software and/or hardware and cooperates closely to carry out computing work. In a sense, the group can be regarded as a single computer. An individual computer in a cluster is usually called a node, and nodes are usually connected by a local area network.
A high-performance computing cluster (HPCC) is a type of computer cluster that improves computing capability by distributing computing tasks across the cluster's compute nodes. It is used mainly in scientific and engineering computing. Such clusters typically run parallel applications, for example parallel programs developed against the MPI standard. These applications let multiple compute nodes execute a computing task in parallel, and the nodes usually exchange data and pass messages frequently. High-performance computing clusters are therefore usually equipped with a dedicated computing network for these exchanges, and the performance of that network largely determines the computational efficiency of parallel programs.
At present, most computing cluster systems use a fat-tree topology in which nodes are connected through switches (an indirect, switch-based network) and exchange data over copper or optical cable. When the cluster performs a cross-node operation, data travels over the cable to a switch using TCP/IP, and the switch forwards it to the correct node to complete the communication. As the number of compute nodes grows, however, inter-node network traffic inevitably grows sharply. To shorten point-to-point communication time and reduce latency, the system needs correspondingly more switches, which makes the overall network environment more complex and raises the cost of building, operating, and managing the system.
Besides the scheme above, another kind of computing cluster system uses a fully direct-connected topology, in which all compute nodes can communicate without any switch. This structure is generally suitable only for small systems: a cluster with N compute nodes needs N*(N-1) network interfaces, so for large cluster systems it is hard to build, scales poorly, and is inconvenient to manage.
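To make the scaling problem concrete, the N*(N-1) figure quoted above can be tabulated for a few cluster sizes. This is only an arithmetic illustration; the function name full_mesh_interfaces is hypothetical and does not come from the patent.

```python
def full_mesh_interfaces(n: int) -> int:
    """Network interfaces needed when each of n nodes links directly to every other node."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    # Every node needs one interface per peer, i.e. n - 1 interfaces on each of n nodes.
    return n * (n - 1)


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        print(f"{n:>4} nodes -> {full_mesh_interfaces(n):>6} interfaces")
```

The count grows quadratically with the number of nodes, which is why the fully direct-connected structure is described as practical only for small systems.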
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art by providing an InfiniBand-based direct-connect architecture computing cluster system and a construction method. In this system, all computing units communicate without a switch. The system is easy to build, scales well, and is suitable for large-scale computing clusters, and because it uses InfiniBand communication technology it meets the cluster's bandwidth and latency requirements.
To achieve this object, the invention adopts the following technical solutions:
In one aspect, the invention provides an InfiniBand-based direct-connect architecture computer cluster system comprising a topology construction module and a computing resource pool; the computing resource pool is connected to the topology construction module;
wherein the computing resource pool includes at least two computing units, and the computing units are interconnected through an InfiniBand network;
each computing unit includes an InfiniBand adaptation module and a route construction module;
the topology construction module obtains the total number of computing units and the number of neighbors of each computing unit, determines the maximum number of neighbors, calculates the network dimension from the maximum number of neighbors, generates at least one network topology graph from the total number of computing units and the network dimension, and sends all the network topology graphs to the computing resource pool;
the InfiniBand adaptation module provides data transmission services based on the InfiniBand protocol, so that the computing units can exchange data with one another;
the route construction module obtains all the network topology graphs, calculates from each graph every possible communication path between its own computing unit and the other computing units, and generates a full-path routing table. The route construction module also determines which routing paths in the full-path routing table are actually alive, that is, which can actually carry traffic, and generates a communication routing table from them. The communication routing table is grouped by the destination IP address of each routing path, and the paths in each group are sorted in ascending order of hop count.
In an embodiment of the invention, the InfiniBand-based direct-connect architecture computer cluster system further includes a main control unit connected to any one of the computing units. The main control unit obtains a task, splits it, and sends the pieces to the computing unit it is connected to, which then distributes them to the other computing units. The main control unit also initializes the computing units.
In an embodiment of the invention, the main control unit includes a task acquisition module, a task allocation module, and an initialization module. The task acquisition module obtains tasks. The task allocation module divides a task into several subtasks, assigns a computing unit to each subtask, and sends the subtasks to the computing resource pool. The initialization module assigns IP addresses to the computing units and initializes the topology construction module and the route construction modules.
In an embodiment of the invention, the main control unit further includes a status reading module and a feedback module. The status reading module reads the working status of the computing units and sends it to the feedback module, which reports the received status to the user.
In an embodiment of the invention, the main control unit further includes a resource allocation module and a resource adjustment module;
the resource allocation module sets a resource acquisition privilege for each obtained task and allocates its initial resources; the resource adjustment module adjusts the resources each task can occupy according to the task's resource acquisition privilege.
In an embodiment of the invention, the topology construction module obtains the total number of computing units and the maximum number of neighbors by traversing the IP addresses of the computing units.
In an embodiment of the invention, the topology construction module is arranged in the main control unit.
In an embodiment of the invention, optionally, the main control unit also obtains a total number of computing units and a maximum number of neighbors entered by the user and sends them to the topology construction module, which generates the network topology graph from the received values.
In an embodiment of the invention, the main control unit may be any one of the computing units.
In another embodiment of the invention, the system provided by the first aspect further includes a total route construction module, which is connected to the computing resource pool and to the topology construction module;
the total route construction module obtains the IP addresses of all computing units and all network topology graphs, generates from the graphs every possible communication path between all computing units, generates at least one full-path routing table according to the IP address of the source computing unit, and sends each full-path routing table to the corresponding computing unit. The route construction module in each computing unit determines from the received full-path routing table which routing paths are actually alive, that is, which can actually carry traffic, and generates a communication routing table from them. The communication routing table is grouped by the destination IP address of each routing path, and the paths in each group are sorted in ascending order of hop count.
In an embodiment of the invention, the computing unit further includes a processor, memory, a local storage device, and an expansion device interface.
In another aspect, the invention also provides a method for generating a network topology graph, comprising the following steps:
Obtain the total number N of nodes in the network and the number of neighbor nodes of each node, and take the maximum number of neighbor nodes M;
Calculate the network dimension K, where K is the base-2 logarithm of M, rounded up;
Construct at least one K-dimensional network topology graph in which every node is connected to $2^K$ neighbor nodes and the largest per-dimension node count is no greater than N-M+2.
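The first two steps above can be sketched as follows. This is a minimal illustration, assuming the discovered neighbor relationships are already available as a dictionary keyed by IP address; the names topology_parameters and RING_NEIGHBORS, and the IP addresses, are hypothetical.

```python
def topology_parameters(neighbor_table: dict[str, set[str]]) -> tuple[int, int, int]:
    """Return (N, M, K) from a table mapping each node to its set of neighbors.

    N is the total node count, M the largest neighbor count, and
    K the base-2 logarithm of M rounded up.
    """
    if not neighbor_table:
        raise ValueError("neighbor table is empty")
    n = len(neighbor_table)
    m = max(len(peers) for peers in neighbor_table.values())
    if m < 1:
        raise ValueError("no node has a neighbor")
    # (m - 1).bit_length() equals ceil(log2(m)) for m >= 1 and avoids float rounding.
    k = (m - 1).bit_length()
    return n, m, k


# Hypothetical example: four units wired as a ring, so every unit has two neighbors.
RING_NEIGHBORS = {
    "10.0.0.1": {"10.0.0.2", "10.0.0.4"},
    "10.0.0.2": {"10.0.0.1", "10.0.0.3"},
    "10.0.0.3": {"10.0.0.2", "10.0.0.4"},
    "10.0.0.4": {"10.0.0.3", "10.0.0.1"},
}

if __name__ == "__main__":
    print(topology_parameters(RING_NEIGHBORS))
```

Integer arithmetic is used for the logarithm so that exact powers of two, such as M = 8, are not pushed up to the next dimension by floating-point error.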
In an embodiment of the invention, the coordinates of the K-dimensional topology satisfy:
$0 \le x_i \le 2N_i - 1$
$x_j \bmod 2 = x_{j+1} \bmod 2$
Each node $x_i$ is connected to $2^K$ neighbor nodes $y_i$, whose coordinates satisfy:
$y_i = (x_i + 1) \bmod 2N_i$ or $y_i = (x_i - 1 + 2N_i) \bmod 2N_i$
where mod denotes the modulo operation, the coordinate $x_i$ is the position of a node in the i-th dimension, $N_i$ is the number of nodes in the i-th dimension, $K = \log_2 M$ rounded up, and $\max_{1 \le i \le K} N_i \le N - M + 2$.
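The neighbor rule above can be sketched in a few lines. The sketch assumes one particular reading of the formulas: the choice between +1 and -1 is made independently in each of the K dimensions, which is what yields $2^K$ neighbors. The function name neighbors and the example dimensions are hypothetical.

```python
from itertools import product


def neighbors(x: tuple[int, ...], dims: tuple[int, ...]) -> set[tuple[int, ...]]:
    """Neighbor coordinates of node x in a topology with per-dimension counts dims.

    dims holds N_1..N_K; coordinate i must lie in 0..2*N_i - 1 and all
    coordinates of a node must share the same parity.
    """
    if len(x) != len(dims):
        raise ValueError("coordinate and dimension lengths differ")
    if any(not 0 <= c <= 2 * n - 1 for c, n in zip(x, dims)):
        raise ValueError("coordinate out of range")
    if len({c % 2 for c in x}) > 1:
        raise ValueError("coordinates must all have the same parity")
    result = set()
    # Assumption: every dimension independently steps by +1 or -1.
    for steps in product((1, -1), repeat=len(x)):
        result.add(tuple((c + s + 2 * n) % (2 * n) for c, s, n in zip(x, steps, dims)))
    return result


if __name__ == "__main__":
    # K = 2 with N_1 = 2 and N_2 = 3.
    print(sorted(neighbors((0, 0), (2, 3))))
```

Because every coordinate changes by one, a neighbor of an all-even node is all-odd and vice versa, so the parity condition is preserved. With $N_i \ge 2$ the two choices per dimension are distinct, giving exactly $2^K$ neighbors.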
In another aspect, the invention also provides a method for generating a routing table, comprising the following steps:
Select a source node and a destination node, and obtain all network topology graphs;
From all the network topology graphs obtained, calculate every path from the source node to the destination node and generate a full-path routing table;
Confirm which paths in the full-path routing table are alive, and generate a communication routing table;
Group the routing paths in the communication routing table by the destination IP address of each path, and sort the paths in each group in ascending order of hop count.
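The steps above can be sketched as follows. This is an illustration under stated assumptions: "all paths" is taken to mean loop-free paths, the topology is given as an adjacency dictionary keyed by IP address, and the liveness check is passed in as a callable that stands in for whatever confirmation mechanism the system uses. The names all_simple_paths, build_communication_table, is_alive, and RING are hypothetical.

```python
from collections import defaultdict


def all_simple_paths(graph, src, dst):
    """Enumerate every loop-free path from src to dst with an iterative depth-first search."""
    paths = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in sorted(graph[node]):
            if nxt not in path:  # never revisit a node, so paths stay loop-free
                stack.append((nxt, path + [nxt]))
    return paths


def build_communication_table(graph, src, is_alive):
    """Group the live paths from src by destination and sort each group by hop count."""
    table = defaultdict(list)
    for dst in graph:
        if dst == src:
            continue
        for path in all_simple_paths(graph, src, dst):  # full-path routing table entries
            if is_alive(path):                          # keep only paths that really work
                table[dst].append(path)
    for dst in table:
        # A shorter node list means fewer relay nodes, i.e. fewer hops.
        table[dst].sort(key=len)
    return dict(table)


# Hypothetical four-unit ring used as the topology graph.
RING = {
    "10.0.0.1": {"10.0.0.2", "10.0.0.4"},
    "10.0.0.2": {"10.0.0.1", "10.0.0.3"},
    "10.0.0.3": {"10.0.0.2", "10.0.0.4"},
    "10.0.0.4": {"10.0.0.3", "10.0.0.1"},
}

if __name__ == "__main__":
    for dst, paths in build_communication_table(RING, "10.0.0.1", lambda p: True).items():
        print(dst, paths)
```

Exhaustive path enumeration grows quickly with network size, so a real system would likely bound the path length or the number of paths kept per destination; the patent text does not say how this is handled.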
Beneficial effects of the invention: in the InfiniBand-based direct-connect architecture computing cluster system provided by the invention, all computing units can communicate without a switch and without loss of computing performance, and network latency is low. This reduces the operation and maintenance cost of the cluster system and improves its reliability. The system also scales well: the number of computing units can be increased or reduced as the computational workload requires.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system structure in one embodiment of the invention;
Fig. 2 is a schematic diagram of the system structure in another embodiment of the invention;
Fig. 3 is a schematic diagram of the system structure in a further embodiment of the invention;
Fig. 4 is a flowchart of the network topology graph generation method provided by the invention;
Fig. 5 is a flowchart of the routing table generation method provided by the invention.
Detailed description
The invention is further described below with reference to the drawings and specific embodiments. The illustrative embodiments and their descriptions serve only to explain the invention and do not limit it.
In a first embodiment of the invention, as shown in Fig. 1, an InfiniBand-based direct-connect architecture computer cluster system includes a main control unit 100, a topology construction module 200, and a computing resource pool.
The computing resource pool includes at least two computing units 300, and all computing units 300 are interconnected through an InfiniBand network.
The topology construction module 200 obtains the total number of computing units 300 and the number of neighbors of each computing unit, determines the maximum number of neighbors, calculates the network dimension from it, generates at least one network topology graph from the total number of computing units 300 and the network dimension, and sends all the network topology graphs to the computing resource pool.
Each computing unit 300 includes an InfiniBand adaptation module 310 and a route construction module 320. The InfiniBand adaptation module 310 provides data transmission services based on the InfiniBand protocol so that the computing units 300 can exchange data.
The route construction module 320 obtains all the network topology graphs, calculates from each graph every possible routing path from its own computing unit 300 to the other computing units 300, and generates a full-path routing table. It also determines which paths in the full-path routing table are actually alive, that is, which can actually carry traffic, and generates a communication routing table from them. The communication routing table is grouped by the destination IP address of each routing path, and the paths in each group are sorted in ascending order of hop count.
The system also includes the main control unit 100, which is connected to any one of the computing units 300. The main control unit 100 obtains a task, splits it, and sends the pieces to the connected computing unit 300, which distributes them to the other computing units 300. The main control unit 100 also initializes the computing units 300.
In a second embodiment of the invention, as shown in Fig. 2, the topology construction module 200 is arranged in the main control unit 100, and the main control unit 100 further includes a task acquisition module 110, a task allocation module 120, an initialization module 130, a status reading module 140, and a feedback module 150.
The task acquisition module 110 obtains the task issued by the user. The task allocation module 120 divides the obtained task into at least one subtask, assigns each subtask to the computing unit 300 that will execute it, and sends all subtasks to the connected computing unit 300. That computing unit 300 takes out its own subtask and forwards the remaining subtasks to the other computing units 300.
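The split-and-forward flow described above can be sketched as follows. The patent does not specify how a task is divided, so the round-robin policy here is purely an assumption, as are the names split_task and receive and the unit identifiers.

```python
def split_task(items: list, unit_ids: list[str]) -> dict[str, list]:
    """Divide a task's work items among computing units (round-robin, an assumed policy)."""
    if not unit_ids:
        raise ValueError("no computing units available")
    subtasks = {unit: [] for unit in unit_ids}
    for index, item in enumerate(items):
        subtasks[unit_ids[index % len(unit_ids)]].append(item)
    return subtasks


def receive(unit_id: str, subtasks: dict[str, list]) -> tuple[list, dict[str, list]]:
    """Model the connected unit: keep its own subtask and return the rest for forwarding."""
    own = subtasks.get(unit_id, [])
    to_forward = {unit: work for unit, work in subtasks.items() if unit != unit_id}
    return own, to_forward


if __name__ == "__main__":
    assigned = split_task(list(range(7)), ["unit-0", "unit-1", "unit-2"])
    own, rest = receive("unit-0", assigned)
    print("kept by unit-0:", own)
    print("forwarded:", rest)
```

Only the computing unit connected to the main control unit receives the full set of subtasks; every other unit sees just what is forwarded to it.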
The initialization module 130 assigns IP addresses to the computing units 300 and sends initialization instructions to the topology construction module 200 and the route construction module 320. Specifically, initialization consists of the topology construction module 200 building the network topology graph and the route construction module 320 building the routing table.
The status reading module 140 reads the working status of each computing unit 300, such as memory usage, CPU usage, and remaining disk space. The feedback module 150 reports this status to the user so that the user can check the working condition of the computing resource pool.
In implementations of the first or second embodiment, the computing unit 300 may also include a processor, memory, a local storage device, an expansion device interface, and so on, according to user requirements. On first run, the main control unit 100 sends an initialization instruction, assigns IP addresses to all computing units 300, orders the topology construction module 200 to build the network topology graph, and orders the route construction module 320 to build the routing table.
In implementations of the first or second embodiment, the topology construction module 200 sends communication packets to the computing unit 300 it is connected to and traverses the IP addresses of all computing units 300. From the traversal result it obtains the total number N of computing units 300 and the number of neighboring units of each computing unit 300, and takes the maximum number of neighboring units M. It takes the base-2 logarithm of M and rounds up to obtain the network dimension K, generates at least one network topology graph from N and K, and sends all the network topology graphs to the computing resource pool. Distance in the network is measured in hops: each relay node a message passes through counts as one hop, and two computing units 300 are neighbors of each other when the distance between them is zero hops.
Specifically, after obtaining the total number N of computing units 300 and the maximum number of neighboring units M, the topology construction module 200 builds a Cartesian coordinate system in which the coordinate $x_i$ is the position of a node in the i-th dimension and $N_i$ is the number of nodes in the i-th dimension, where $K = \log_2 M$ rounded up and $\max_{1 \le i \le K} N_i \le N - M + 2$.
The coordinates $x_i$ satisfy:
$0 \le x_i \le 2N_i - 1$
$x_j \bmod 2 = x_{j+1} \bmod 2$
Each node $x_i$ is connected to $2^K$ neighbor nodes $y_i$, whose coordinates satisfy:
$y_i = (x_i + 1) \bmod 2N_i$ or $y_i = (x_i - 1 + 2N_i) \bmod 2N_i$
From these formulas, the topology construction module 200 can build at least one K-dimensional network topology ($N_1 \times N_2 \times \dots \times N_K$) in which every node is connected to $2^K$ neighbor nodes, the largest per-dimension node count is no greater than N-M+2, and the total number of nodes is no less than N.
The route construction module 320 obtains all the network topology graphs, calculates from each graph every possible routing path from its own computing unit, as the source unit, to the other computing units, and writes the paths into the full-path routing table. It then sends communication confirmation packets along the routing paths recorded in the full-path routing table to find the paths that are actually alive, that is, which can actually carry traffic, and generates a communication routing table from them. The communication routing table is grouped by the destination IP address of each routing path, and the paths in each group are sorted in ascending order of hop count.
When it needs to communicate, a computing unit 300 selects routing paths for the destination IP address from top to bottom. If the selected path has failed and cannot carry traffic, it selects the next routing path, which ensures that the computing units can still exchange data.
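The top-to-bottom path selection described above can be sketched as follows. The transmission itself is abstracted behind a callable, try_send, which is a hypothetical stand-in rather than a real InfiniBand API; the function name send_with_failover and the node labels are likewise illustrative.

```python
def send_with_failover(table, dest_ip, payload, try_send):
    """Try the paths for dest_ip in table order and return the first one that succeeds.

    table maps a destination IP to a list of paths already sorted by hop count.
    try_send(path, payload) is a caller-supplied function returning True on success.
    """
    for path in table.get(dest_ip, []):
        if try_send(path, payload):
            return path
    raise ConnectionError(f"no working route to {dest_ip}")


if __name__ == "__main__":
    table = {"10.0.0.3": [["a", "b", "c"], ["a", "d", "c"]]}

    def relay_b_is_down(path, payload):
        # Simulate a failure of relay node "b".
        return "b" not in path

    print(send_with_failover(table, "10.0.0.3", b"hello", relay_b_is_down))
```

Because each group is sorted by hop count, the fallback path is always the shortest remaining alternative.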
In an embodiment of the invention, optionally, the user enters the total number N of computing units 300 and the maximum number of neighbor nodes M through the main control unit 100, which sends them to the topology construction module 200.
In an embodiment of the invention, the main control unit further includes a resource allocation module and a resource adjustment module.
The resource allocation module sets a resource acquisition privilege for each obtained task and allocates its initial resources, which consist of minimum resources and elastic resources.
The resource adjustment module adjusts the elastic resources each task can actually occupy according to the task's resource acquisition privilege.
In one specific application scenario, the system runs a school's teaching management system. It includes 16 computing units, each with 2 processing cores, 4 GB of memory, and a 250 GB solid-state drive, so the computing resource pool has 32 processing cores, 64 GB of memory, and 4 TB of storage. The tasks are mainly student status management, course selection management, class schedule management, school personnel management, and examination management. Through the resource allocation module of the main control unit 100, the user divides resources among the tasks and sets their privileges: for example, the course selection, examination, and class schedule management tasks are each allocated 8 processing cores, 16 GB of memory, and 500 GB of storage and given the general resource acquisition privilege, while the school personnel management and student status management tasks are each allocated 4 processing cores, 8 GB of memory, and 1250 GB of storage and given the highest resource acquisition privilege. The minimum resource configuration of every task is set to 2 processing cores, 4 GB of memory, and 250 GB of storage, and the remainder is the elastic resource configuration.
When the demand of a task rises suddenly, the resource adjustment module adjusts the elastic resource configuration of each task according to that task's resource acquisition privilege. For example, at the start of a term the demand of the course selection management task surges. The resource adjustment module determines that this task has the general resource acquisition privilege, so it reassigns to it the elastic resources of the examination management and class schedule management tasks, which have the same general privilege, and leaves untouched the resources of the school personnel management and student status management tasks, which have the highest privilege. When new students enroll, the demand of the student status management task rises. The resource adjustment module determines that this task has the highest resource acquisition privilege, so it first reassigns to it the elastic resources of the course selection, examination, and class schedule management tasks, which have the general privilege. If that still does not meet the task's demand, it then reassigns the elastic resources of the school personnel management task.
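The adjustment rule in this scenario can be sketched as follows, tracking processing cores only. The sketch assumes a task may borrow elastic resources only from tasks whose privilege is no higher than its own, taking from lower-privilege tasks first; this is one reading of the two examples above. All names (Task, rebalance, example_tasks, the task labels) are hypothetical.

```python
from dataclasses import dataclass

GENERAL, HIGHEST = 1, 2  # resource acquisition privileges, higher value wins


@dataclass
class Task:
    name: str
    privilege: int
    allocated: int  # processing cores currently held
    minimum: int    # cores that can never be taken away

    @property
    def elastic(self) -> int:
        return self.allocated - self.minimum


def rebalance(tasks: list[Task], requester_name: str, extra_needed: int) -> int:
    """Move elastic cores to the requester and return how many were granted."""
    requester = next(t for t in tasks if t.name == requester_name)
    # Donors: other tasks with equal or lower privilege, lowest privilege first.
    donors = sorted(
        (t for t in tasks if t is not requester and t.privilege <= requester.privilege),
        key=lambda t: t.privilege,
    )
    granted = 0
    for donor in donors:
        if granted == extra_needed:
            break
        take = min(donor.elastic, extra_needed - granted)
        donor.allocated -= take
        requester.allocated += take
        granted += take
    return granted


def example_tasks() -> list[Task]:
    """Core allocation from the teaching-management scenario (32 cores in total)."""
    return [
        Task("course_selection", GENERAL, 8, 2),
        Task("examination", GENERAL, 8, 2),
        Task("class_schedule", GENERAL, 8, 2),
        Task("personnel", HIGHEST, 4, 2),
        Task("student_status", HIGHEST, 4, 2),
    ]


if __name__ == "__main__":
    tasks = example_tasks()
    print("granted:", rebalance(tasks, "course_selection", 10))
    print({t.name: t.allocated for t in tasks})
```

Memory and storage would follow the same rule with their own minimum and allocated values.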
In an embodiment of the invention, the main control unit 100 may be arranged in one of the computing units 300.
In a third embodiment of the invention, as shown in Fig. 3, the system of the first or second embodiment further includes a total route construction module 400, which is connected to the topology construction module 200 and to the computing resource pool. The topology construction module 200 sends all generated network topology graphs to the total route construction module 400. The total route construction module 400 sends traversal packets to the computing units 300 to obtain the IP addresses of all computing units 300, calculates from each network topology graph the possible routing paths between the computing units 300, and generates at least one full-path routing table according to the IP address of the source computing unit. It sends all the full-path routing tables to the computing resource pool, where each computing unit 300 keeps the full-path routing table whose source address is its own IP address and forwards the remaining tables to the other computing units 300. The route construction module 320 sends communication confirmation packets along the routing paths recorded in the full-path routing table it received to find the paths that are actually alive, that is, which can actually carry traffic, and generates a communication routing table from them. The communication routing table is grouped by the destination IP address of each routing path, and the paths in each group are sorted in ascending order of hop count. When it needs to communicate, a computing unit 300 selects routing paths for the destination IP address from top to bottom. If the selected path has failed and cannot carry traffic, it selects the next routing path, which ensures that the computing units can still exchange data.
In the embodiment of third embodiment of the invention, according to user's request, described computing unit 300 may also include place Reason device, internal memory, local memory device, expansion equipment interface etc..When running for the first time, main control unit 100 sends initialization directive, Distribute IP address for all computing units 300, and order topology constructing module 200 builds network topological diagram, order always route structure Modeling block 400 builds complete trails routing table, order route construction module 320 builds communication routing table.
As shown in Figure 4, present invention also offers a kind of network topology map generalization method, comprise the steps:
S110: obtain node total number N in network and the neighbor node number of each node, takes maximum neighbor node number M;
S120: calculate network dimension K, the logarithm that wherein K is is end M with 2, and round up;
S130: building at least one K and tie up topological network, the most each node is all with 2KIndividual neighbor node is connected, and maximum Dimension nodes be not more than N-M+2.
In an embodiment of the invention, the above steps are carried out using the system provided by the invention, specifically by the topology construction module 200.
Specifically, the topology construction module 200 sends communication packets to the computing units 300 connected to it and traverses the IP addresses of all computing units 300. From the traversal result it obtains the total number N of computing units 300 and the number of neighboring units of each computing unit 300, and takes the maximum number of neighboring units M. Two computing units 300 are neighbors of each other when the distance between them is one hop. Taking the base-2 logarithm of M and rounding up gives the network dimension K. At least one network topology map is then generated from the total N and the network dimension K, and all network topology maps are sent to the computing resource pool.
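The derivation of N, M and K from a traversal result can be sketched as follows. The adjacency dictionary stands in for whatever the traversal packets actually return, and the names `max_neighbor_count`, `network_dimension` and the unit labels are hypothetical, introduced only for this example.

```python
from typing import Dict, Iterable


def max_neighbor_count(adjacency: Dict[str, Iterable[str]]) -> int:
    """M: the largest number of one-hop neighbors of any unit."""
    return max(len(set(neighbors)) for neighbors in adjacency.values())


def network_dimension(max_neighbors: int) -> int:
    """K: the base-2 logarithm of M, rounded up."""
    if max_neighbors < 1:
        raise ValueError("max_neighbors must be at least 1")
    # For every integer m >= 1, (m - 1).bit_length() equals ceil(log2(m)),
    # which avoids floating-point rounding issues.
    return (max_neighbors - 1).bit_length()


# Hypothetical traversal result: unit -> units reachable in one hop.
adjacency = {
    "u0": ["u1", "u2", "u3"],
    "u1": ["u0"],
    "u2": ["u0"],
    "u3": ["u0"],
}

N = len(adjacency)                 # total number of computing units
M = max_neighbor_count(adjacency)  # maximum number of neighbors
K = network_dimension(M)           # network dimension
print(N, M, K)
```

With three neighbors at most, the logarithm is about 1.58, so rounding up gives a two-dimensional network.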
Specifically, the topology construction module 200 builds a K-dimensional network topology $(N_1 \times N_2 \times \cdots \times N_K)$ as follows. Construct a Cartesian coordinate system in which the coordinate $x_i$ denotes the position of any node in the $i$-th dimension and $N_i$ denotes the number of nodes in the $i$-th dimension, where $K = \lceil \log_2 M \rceil$ and $\max_{1 \le i \le K} N_i \le N - M + 2$.
The coordinates $x_i$ satisfy:
$0 \le x_i \le 2N_i - 1$
$x_j \bmod 2 = x_{j+1} \bmod 2$
Each node $x_i$ is connected to $2^K$ neighbor nodes $y_i$, whose coordinates satisfy:
$y_i = (x_i + 1) \bmod 2N_i$ or $y_i = (x_i - 1 + 2N_i) \bmod 2N_i$
In each K-dimensional network topology $(N_1 \times N_2 \times \cdots \times N_K)$ built according to the above formulas, every node is connected to $2^K$ neighbor nodes, the largest number of nodes along any dimension is no more than $N - M + 2$, and the total number of nodes is not less than N.
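The coordinate and neighbor rules above can be sketched in code. The sketch assumes that the neighbor count is $2^K$, that the choice between $+1$ and $-1$ is made independently in each dimension, and that every $N_i \ge 2$ so the $2^K$ neighbors are distinct. The function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def is_valid_node(x: Sequence[int], sizes: Sequence[int]) -> bool:
    """Check 0 <= x_i <= 2*N_i - 1 and that all coordinates share one parity."""
    in_range = all(0 <= xi <= 2 * n - 1 for xi, n in zip(x, sizes))
    same_parity = len({xi % 2 for xi in x}) <= 1
    return in_range and same_parity


def neighbors(x: Sequence[int], sizes: Sequence[int]) -> List[Coord]:
    """Return the 2**K neighbors obtained by stepping +1 or -1 in every dimension."""
    result = []
    for steps in product((1, -1), repeat=len(x)):
        # Python's % is non-negative for a positive modulus, so
        # (xi - 1) % (2 * n) equals (xi - 1 + 2 * n) mod (2 * n).
        result.append(tuple((xi + s) % (2 * n) for xi, s, n in zip(x, steps, sizes)))
    return result


# Hypothetical 2-dimensional topology with N_1 = N_2 = 2.
print(neighbors((0, 0), (2, 2)))
```

Stepping every coordinate by one flips the parity of all coordinates at once, so each neighbor again satisfies the equal-parity condition.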
In an embodiment of the invention, the total number N of computing units 300 and the maximum number of neighbor nodes M are entered by the user through the main control unit 100, which sends N and M to the topology construction module 200.
As shown in Figure 5, the invention also provides a method for generating a routing table, comprising the following steps:
S210: select a start node and a destination node, and obtain all network topology maps;
S220: from all the network topology maps obtained, calculate all paths from the start node to the destination node and generate a full-path routing table;
S230: confirm which paths in the full-path routing table survive, and generate a communication routing table;
S240: group the routing paths in the communication routing table by the destination IP address of each path, and sort the routing paths within each group in ascending order of hop count.
In an embodiment of the invention, the above steps are carried out using the system of the first or second embodiment, specifically by the route construction module 320 of each computing unit 300.
Specifically, the route construction module 320 obtains at least one network topology map and, from each network topology map, calculates all possible routing paths from the IP address of its own computing unit, taken as the starting address, to each of the other computing units, and writes them into the full-path routing table. The route construction module 320 then sends communication confirmation packets along the paths recorded in the full-path routing table to identify the surviving routing paths, i.e. the paths over which communication is actually possible, and generates a communication routing table from them. The communication routing table is grouped by the destination IP address of each routing path, and the routing paths within each group are sorted in ascending order of hop count.
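Steps S210 to S240 can be sketched for a single network topology map as follows. This is an illustration under stated assumptions: the topology is given as an adjacency dictionary, "all possible paths" is read as all loop-free paths, the confirmation packets are replaced by an `is_alive` callback, and every function name is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Path = List[str]


def all_simple_paths(adj: Dict[str, List[str]], src: str, dst: str) -> List[Path]:
    """Enumerate every loop-free path from src to dst with an explicit stack."""
    paths, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path:  # never revisit a node on the same path
                stack.append((nxt, path + [nxt]))
    return paths


def full_path_table(adj: Dict[str, List[str]], src: str) -> List[Path]:
    """S210/S220: all paths from src to every other node."""
    return [p for dst in adj if dst != src for p in all_simple_paths(adj, src, dst)]


def communication_table(
    full_table: List[Path], is_alive: Callable[[Path], bool]
) -> Dict[str, List[Path]]:
    """S230/S240: keep surviving paths, group by destination, sort by hop count."""
    grouped = defaultdict(list)
    for path in full_table:
        if is_alive(path):
            grouped[path[-1]].append(path)
    for paths in grouped.values():
        paths.sort(key=lambda p: len(p) - 1)  # hop count = nodes on path - 1
    return dict(grouped)


# Hypothetical three-node topology in which every node links to the other two.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
table = communication_table(full_path_table(adj, "a"), lambda p: True)
print(table)
```

Enumerating every loop-free path grows very quickly with network size, so a practical system would probably bound the path length or the number of paths kept per destination. That limit is a design consideration added here, not something the description states.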
In another embodiment of the invention, the above steps are carried out using the system of the third embodiment, jointly by the total route construction module 400 and the route construction module 320: steps S210 and S220 are completed in the total route construction module 400, and steps S230 and S240 are completed by the route construction module 320.
Specifically, the total route construction module 400 obtains all network topology maps, sends traversal communication packets to the computing units 300 to obtain the IP addresses of all computing units 300, calculates the possible routing paths between the computing units 300 from each network topology map, and generates at least one full-path routing table according to the IP address of the starting computing unit. The total route construction module 400 sends all full-path routing tables to the computing resource pool. The route construction module 320 of each computing unit 300 keeps the full-path routing table whose starting address is its local IP address and forwards the remaining full-path routing tables to the other computing units 300. The route construction module 320 then sends communication confirmation packets along the paths recorded in the full-path routing table to identify the surviving routing paths, i.e. the paths over which communication is actually possible, and generates a communication routing table from them. The communication routing table is grouped by the destination IP address of each routing path, and the routing paths within each group are sorted in ascending order of hop count.
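The step in which each computing unit keeps its own full-path routing table and forwards the rest can be sketched as follows. The bundle format (a dictionary keyed by starting IP address) and the name `take_local_table` are assumptions for illustration. The description does not specify how the tables are packaged or forwarded.

```python
from typing import Dict, List, Tuple

Path = List[str]
FullPathTable = List[Path]


def take_local_table(
    tables: Dict[str, FullPathTable], local_ip: str
) -> Tuple[FullPathTable, Dict[str, FullPathTable]]:
    """Split a bundle of tables into the local one and those to be forwarded."""
    own = tables.get(local_ip, [])
    to_forward = {ip: t for ip, t in tables.items() if ip != local_ip}
    return own, to_forward


# Hypothetical bundle produced centrally: one table per starting IP address.
bundle = {
    "10.0.0.1": [["10.0.0.1", "10.0.0.2"]],
    "10.0.0.2": [["10.0.0.2", "10.0.0.1"]],
}
own, rest = take_local_table(bundle, "10.0.0.1")
print(own, sorted(rest))
```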
Obviously, the above embodiments are merely examples given to express the technical solution of the invention more clearly and do not limit its implementations. Those skilled in the art may make other changes or variations of different forms on the basis of the above description; provided they do not depart from the inventive concept, all of these fall within the scope of protection of the invention. The scope of protection of this patent shall therefore be determined by the claims.

Claims (10)

1. A direct-connect architecture computer cluster based on InfiniBand, characterized in that it comprises a topology construction module and a computing resource pool; the computing resource pool is connected to the main control unit and to the topology construction module respectively;
wherein the computing resource pool comprises at least 2 computing units, and the computing units are interconnected through an InfiniBand network;
the computing unit comprises an InfiniBand adapter module and a route construction module;
the topology construction module is configured to obtain the total number of computing units and the number of neighbors of each computing unit, to determine the maximum number of neighbors, to calculate the network dimension from the maximum number of neighbors, to generate at least one network topology map from the total number of computing units and the network dimension, and to send all the network topology maps to the computing resource pool;
the InfiniBand adapter module is configured to provide data transmission services based on the InfiniBand protocol, so as to realize data communication between the computing units;
the route construction module is configured to obtain all the network topology maps, to calculate from each network topology map all possible communication paths between its own computing unit and the other computing units, and to generate a full-path routing table; the route construction module is further configured to determine the surviving routing paths in the full-path routing table and to generate a communication routing table from the surviving routing paths, the communication routing table being grouped by the destination IP address of each routing path, with the routing paths in each group sorted in ascending order of hop count.
2. The direct-connect architecture computer cluster based on InfiniBand according to claim 1, characterized in that it further comprises a main control unit, the main control unit being connected to any one of the computing units;
wherein the main control unit is configured to obtain a task, to split the task and send it to the connected computing unit, which then distributes it to the other computing units; the main control unit is further configured to initialize the computing units.
3. The direct-connect architecture computer cluster based on InfiniBand according to claim 2, characterized in that the main control unit comprises a task acquisition module, a task allocation module and an initialization module;
wherein the task acquisition module is configured to obtain a task; the task allocation module is configured to divide the task into several subtasks, to assign computing units to the subtasks, and to send the subtasks to the computing resource pool; the initialization module is configured to assign IP addresses to the computing units and to initialize the topology construction module and the route construction module.
4. The direct-connect architecture computer cluster based on InfiniBand according to claim 1, characterized in that the main control unit further comprises a state reading module and a feedback module; the state reading module is configured to read the working state of the computing units and send it to the feedback module, and the feedback module is configured to report the received working state of the computing units to the user.
5. The direct-connect architecture computer cluster based on InfiniBand according to claim 1, characterized in that, in an embodiment of the invention, the main control unit further comprises a resource allocation module and a resource adjustment module;
the resource allocation module is configured to set resource acquisition permissions for, and allocate initial resources to, each task obtained; the resource adjustment module is configured to adjust the resources each task may occupy according to the resource acquisition permission of that task.
6. The direct-connect architecture computer cluster based on InfiniBand according to claim 1, characterized in that the main control unit may be arranged in any one of the computing units.
7. A direct-connect architecture computer cluster based on InfiniBand, characterized in that it comprises the direct-connect architecture computer cluster based on InfiniBand according to any one of claims 1-6 and further comprises a total route construction module; the total route construction module is connected to the computing resource pool and is also connected to the topology construction module;
the total route construction module is configured to obtain the IP addresses of all the computing units; it is further configured to obtain all network topology maps, to calculate the communication paths between the computing units from the network topology maps, and to generate at least one full-path routing table according to the IP address of the starting computing unit; it is further configured to send the full-path routing tables to the computing resource pool.
8. A method for generating a network topology map, characterized in that it comprises:
obtaining the total number of nodes N in the network and the number of neighbor nodes of each node, and taking the maximum number of neighbor nodes M;
calculating the network dimension K, where K is the base-2 logarithm of M, rounded up;
building at least one K-dimensional network topology map in which every node is connected to $2^K$ neighbor nodes and the largest number of nodes along any dimension is no more than $N - M + 2$.
9. The method for generating a network topology map according to claim 8, characterized in that each node $x_i$ of the K-dimensional network topology map is connected to $2^K$ neighbor nodes $y_i$;
wherein the coordinates $x_i$ satisfy:
$0 \le x_i \le 2N_i - 1$
and the coordinates $y_i$ satisfy:
$y_i = (x_i + 1) \bmod 2N_i$ or $y_i = (x_i - 1 + 2N_i) \bmod 2N_i$
where mod denotes the modulo operation, the coordinate $x_i$ denotes the position of any node in the $i$-th dimension, $N_i$ denotes the number of nodes in the $i$-th dimension, $K = \lceil \log_2 M \rceil$, and $\max_{1 \le i \le K} N_i \le N - M + 2$.
10. A method for generating a routing table, characterized in that it comprises:
selecting a start node and a destination node, and obtaining all network topology maps;
calculating, from all the network topology maps obtained, all paths from the start node to the destination node, and generating a full-path routing table;
confirming which paths in the full-path routing table survive, and generating a communication routing table;
grouping the routing paths in the communication routing table by the destination IP address of each path, and sorting the routing paths within each group in ascending order of hop count.
CN201610580213.5A 2016-07-21 2016-07-21 A kind of Direct Connect Architecture computing cluster system based on infinite bandwidth and construction method Pending CN106100961A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610580213.5A CN106100961A (en) 2016-07-21 2016-07-21 A kind of Direct Connect Architecture computing cluster system based on infinite bandwidth and construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610580213.5A CN106100961A (en) 2016-07-21 2016-07-21 A kind of Direct Connect Architecture computing cluster system based on infinite bandwidth and construction method

Publications (1)

Publication Number Publication Date
CN106100961A true CN106100961A (en) 2016-11-09

Family

ID=57449158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610580213.5A Pending CN106100961A (en) 2016-07-21 2016-07-21 A kind of Direct Connect Architecture computing cluster system based on infinite bandwidth and construction method

Country Status (1)

Country Link
CN (1) CN106100961A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018119830A1 (en) * 2016-12-29 2018-07-05 中国科学院计算技术研究所 Method and system for constructing task processing path

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1921437A (en) * 2006-08-04 2007-02-28 上海红神信息技术有限公司 Inside and outside connecting network topology framework and parallel computing system for self-consistent expanding the same
CN101309201A (en) * 2007-05-14 2008-11-19 华为技术有限公司 Route processing method, routing processor and router
CN101727512A (en) * 2008-10-17 2010-06-09 中国科学院过程工程研究所 General algorithm based on variation multiscale method and parallel calculation system
CN102790698A (en) * 2012-08-14 2012-11-21 南京邮电大学 Large-scale computing cluster task scheduling method based on energy-saving tree
CN103152397A (en) * 2013-02-06 2013-06-12 浪潮电子信息产业股份有限公司 Method for designing multi-control storage system
CN104243320A (en) * 2014-09-10 2014-12-24 珠海市君天电子科技有限公司 Method and device for optimizing network access paths
CN104283789A (en) * 2014-09-19 2015-01-14 深圳市腾讯计算机***有限公司 Routing convergence method and system
CN206100022U (en) * 2016-07-21 2017-04-12 广州高能计算机科技有限公司 A kind of Direct Connect Architecture computing cluster system based on infinite bandwidth




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161109
