CN113791368A - Method and device for automatically checking misplugging of interconnection cables of server and GPU (Graphics Processing Unit) box - Google Patents
- Publication number
- CN113791368A (application number CN202111065477.4A)
- Authority
- CN
- China
- Legal status: Pending (the legal status is an assumption by Google Patents, not a legal conclusion)
Classifications
- G01R31/67: Testing the correctness of wire connections in electric apparatus or circuits (under G01R31/66: Testing of connections, e.g. of plugs or non-disconnectable joints)
- G01R31/54: Testing for continuity
- G01R31/68: Testing of releasable connections, e.g. of terminals mounted on a printed circuit board (under G01R31/66)
- Common hierarchy: G (Physics) > G01 (Measuring; testing) > G01R (Measuring electric variables; measuring magnetic variables) > G01R31/00 (Arrangements for testing electric properties; arrangements for locating electric faults; arrangements for electrical testing characterised by what is being tested not provided for elsewhere) > G01R31/50 (Testing of electric apparatus, lines, cables or components for short-circuits, continuity, leakage current or incorrect line connections)
Abstract
The invention provides a method and a device for automatically checking misplugging of the cables interconnecting a server and a GPU box. The method comprises: arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect the server and the GPU box, and connecting the two ends of each in-place signal line to a first CPLD chip of the GPU box and a second CPLD chip of the server, respectively; the first CPLD chip sends a signal of a different frequency to the second CPLD chip over each in-place signal line; and a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line received by the second CPLD chip, judges whether each actual signal frequency equals the corresponding preset signal frequency, thereby determines whether any interconnection cable is misplugged, and locates the serial number of each misplugged cable. The method and device enable debugging and error localization after the server and the GPU box are assembled, and can effectively identify loose, wrong, and missed cable connections.
Description
Technical Field
The invention belongs to the technical field of error correction in server production and assembly, and particularly relates to a method and a device for automatically checking misplugging of the interconnection cables between a server and a GPU box.
Background
With the rapid development of AI computing, high-performance computing and artificial intelligence applications, heterogeneous computing equipment has come into wide use. To achieve powerful heterogeneous computing power and scalable performance, CPU computing and GPU computing are generally split into two separate devices, a server and a GPU box, which are connected by PCIe high-speed cables.
In practice, each PCIe port is x16, i.e., it contains 16 lanes. Since each high-speed cable carries 4 lanes, each PCIe port requires 4 high-speed cables. The GPU box has 4 PCIe x16 ports, so 16 high-speed cables are required to connect it to the server.
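The cable count above follows from simple lane arithmetic. A minimal Python sketch of that calculation; the channel labels "A1" to "D4" are a hypothetical naming (port letter plus cable index, with ports A to D as in Example 4) assumed for illustration only.

```python
# Lane arithmetic from the text; variable names are illustrative.
LANES_PER_PORT = 16   # each PCIe port is x16
LANES_PER_CABLE = 4   # each high-speed cable carries 4 lanes
PORTS = 4             # the GPU box exposes 4 PCIe x16 ports

cables_per_port = LANES_PER_PORT // LANES_PER_CABLE  # 16 / 4 = 4
total_cables = PORTS * cables_per_port               # 4 * 4 = 16

# Hypothetical channel labels: port letter + cable index, e.g. "A1".."D4".
channels = [f"{p}{i}" for p in "ABCD"[:PORTS] for i in range(1, cables_per_port + 1)]

print(cables_per_port, total_cables)  # 4 16
print(channels[:4], len(channels))    # ['A1', 'A2', 'A3', 'A4'] 16
```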
During production and assembly, wrong or missed cable insertions made by hand cannot be detected automatically. Inspection usually relies on manual visual checks to identify and correct problems, and cables plugged in the wrong order occur frequently. Wrong insertion within the same port or across ports can reduce the bandwidth of PCIe products, prevent communication, or make the GPU serial numbers seen by the system disagree with the physical serial numbers. Such problems are often misjudged as GPU board functional failures, so dedicated inspectors must screen and clarify them by checking every cable of every device, which is a heavy workload that consumes considerable manpower. Because the approach depends entirely on manual screening, a moment of carelessness risks missed or false detections; and when many cables are plugged incorrectly, adjusting the cable order takes great effort, sometimes requiring all cables to be removed and reconnected.
Therefore, it is very necessary to provide a method and an apparatus for automatically checking the mis-insertion of the interconnection cable between the server and the GPU box to overcome the above-mentioned drawbacks in the prior art.
Disclosure of Invention
The invention provides a method and a device for automatically checking misplugging of the interconnection cables between a server and a GPU box, aiming to solve the following technical problems of the prior art: wrong or missed cable insertions cannot be detected automatically when the server and the GPU box are assembled; relying on manual visual inspection consumes a great deal of manpower and carries a high risk of missed or false detections; and a wrong cable order is inconvenient to correct.
In a first aspect, the present invention provides a method for automatically checking the misplugging of a cable interconnecting a server and a GPU box, comprising the following steps:
S1, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect a server and a GPU box, connecting the first end of each in-place signal line to a first CPLD chip of the GPU box, and connecting the second end to a second CPLD chip of the server;
S2, the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines;
and S3, a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to the GPIO pins of the second CPLD chip, judges whether each actual signal frequency is the same as the corresponding preset signal frequency, thereby determines whether any interconnection cable is misplugged, and locates the serial numbers of the misplugged cables.
Further, the step S1 specifically includes the following steps:
S11, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect the server and the GPU box;
S12, connecting the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
S13, connecting the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
S14, connecting a BMC in the server to the second CPLD chip through an I2C signal line, and connecting a third GPIO pin of the BMC to an LED alarm indicator. The server and the GPU box monitor the cables connected to each pair of PCIe ports through their respective CPLD chips.
Further, the step S2 specifically includes the following steps:
S21, the first CPLD chip controls the level state of each first GPIO pin to generate square wave signals of different frequencies;
and S22, the first CPLD chip stores the square wave frequency value generated by each first GPIO pin, together with the GPU box physical channel corresponding to that pin, in a GPU-box-end physical channel signal frequency table. The physical channels are thus distinguished by the different frequencies of the square waves they carry.
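Steps S21 and S22 amount to assigning each GPU box channel a unique frequency and toggling the corresponding pin at that rate. The Python sketch below models this under stated assumptions: a 10 us time base (borrowed from the server-side timer in Example 4), a 50% duty cycle, and hypothetical frequency values, since the actual values in FIG. 5 are not reproduced in the text. The names `build_frequency_table` and `half_period_ticks` are illustrative, not from the patent.

```python
TICK_US = 10  # assumed CPLD timer tick in microseconds (the 10 us from Example 4)

def build_frequency_table(channels, base_hz=10, step_hz=10):
    """Assign each GPU box physical channel a distinct square-wave frequency in Hz.
    base_hz and step_hz are hypothetical; the real values are in FIG. 5."""
    return {ch: base_hz + i * step_hz for i, ch in enumerate(channels)}

def half_period_ticks(freq_hz, tick_us=TICK_US):
    """Ticks the first GPIO pin stays high (and then low) for a 50% duty square wave."""
    half_period_us = 1_000_000 / (2 * freq_hz)
    return round(half_period_us / tick_us)

channels = [f"{p}{i}" for p in "ABCD" for i in range(1, 5)]  # hypothetical channel codes
table = build_frequency_table(channels)  # the GPU-box-end frequency table (S22)
print(table["A1"], table["D4"])  # 10 160
print(half_period_ticks(20))     # 2500, i.e. 25 ms high and 25 ms low
```

The method itself only requires the frequencies to be distinct; presumably they should also be spaced widely enough that the rounded server-side measurement cannot confuse neighbouring channels.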
Further, the step S3 specifically includes the following steps:
S31, each second GPIO pin of the second CPLD chip independently acquires the frequency value of the in-place signal line connected to it;
S32, the second CPLD chip stores the frequency value acquired by each second GPIO pin in an internal register, each internal register corresponding to a server-end physical channel;
S33, the BMC periodically reads the internal registers through an I2C interface to obtain all frequency values;
and S34, according to the correspondence between GPU box physical channels and frequency values in the GPU-box-end physical channel signal frequency table, the BMC resolves the GPU box physical channel code corresponding to each frequency value and determines the cable connection relation between the server physical channels and the GPU box physical channels. By periodically acquiring the frequency values received at the server side and comparing them with the frequency values sent from the GPU box side, the BMC locates the serial numbers of the PCIe ports that are misplugged between the server side and the GPU box side.
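The lookup in S34 is essentially an inversion of the GPU-box-end frequency table. A minimal Python sketch, assuming the register contents have already been read into a dictionary keyed by server-end channel (the I2C transport and register layout are not specified in the patent) and using hypothetical channel names and frequencies; `decode_connections` is an illustrative name.

```python
def decode_connections(server_regs, gpu_freq_table):
    """Map each server-end channel to the GPU box channel whose frequency it received.

    server_regs: {server_channel: frequency in Hz read from the CPLD register}
    gpu_freq_table: {gpu_box_channel: frequency in Hz sent on that channel}
    Returns {server_channel: gpu_box_channel, or None if the frequency is unknown}.
    """
    freq_to_gpu = {freq: ch for ch, freq in gpu_freq_table.items()}  # invert table
    return {srv: freq_to_gpu.get(freq) for srv, freq in server_regs.items()}

gpu_table = {"A1": 10, "A2": 20, "A3": 30, "A4": 40}    # hypothetical frequencies
server_regs = {"A1": 10, "A2": 30, "A3": 20, "A4": 40}  # cables A2/A3 swapped
print(decode_connections(server_regs, gpu_table))
# {'A1': 'A1', 'A2': 'A3', 'A3': 'A2', 'A4': 'A4'}
```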
Further, the step S31 specifically includes the following steps:
S311, the second CPLD chip sets a sampling period according to an internal timer;
S312, according to the sampling period, the second CPLD chip uses the level-change edges on each second GPIO pin to start and stop counters, counting the high-level duration T_H and the low-level duration T_L of each second GPIO pin;
and S313, the second CPLD chip calculates the signal frequency value acquired by each second GPIO pin as F = 1/(T_H + T_L). That is, the second CPLD chip derives each frequency from the square wave's high-level and low-level durations, whose sum is one period.
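Steps S311 to S313 can be sketched in Python as follows. The sketch assumes the pin level is sampled once per 10 us tick (the timer resolution given in Example 4) and that a list of 0/1 samples stands in for the CPLD's hardware counters; `measure_frequency` is a hypothetical name, and a real CPLD would count ticks between edges directly rather than store a sample buffer.

```python
def measure_frequency(samples, tick_us=10):
    """samples: sequence of 0/1 pin levels, one per timer tick.
    Returns the frequency in Hz rounded to an integer, or None if no complete
    high phase and low phase is observed (e.g. no cable plugged in)."""
    # Indices where the level changes; each edge stops one counter and starts the next.
    edges = [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]
    t_h = t_l = None
    # The span between two consecutive edges is one complete high or low phase.
    for start, stop in zip(edges, edges[1:]):
        duration_us = (stop - start) * tick_us
        if samples[start] == 1:
            t_h = duration_us
        else:
            t_l = duration_us
    if t_h is None or t_l is None:
        return None
    return round(1_000_000 / (t_h + t_l))  # F = 1/(T_H + T_L), durations in us

# 20 Hz test signal: 2500 ticks high, 2500 ticks low (50 ms period), three periods.
wave = ([1] * 2500 + [0] * 2500) * 3
print(measure_frequency(wave))          # 20
print(measure_frequency([0] * 10000))   # None
```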
Further, the step S34 specifically includes the following steps:
S341, the BMC acquires the GPU-box-end physical channel signal frequency table;
S342, the BMC locates each frequency value in the internal registers and its server-end physical channel;
S343, the BMC looks up the GPU box physical channel code corresponding to each located frequency value in the GPU-box-end physical channel signal frequency table, generating the cable connection relation between the server-end physical channels and the GPU box physical channels;
S344, when the BMC cannot acquire the frequency value of a server-end physical channel, it judges that no cable is plugged into that channel, lights the LED alarm indicator, reports the error through the management network, and records a corresponding error log;
and S345, when the BMC acquires the frequency values of all server-end physical channels but some frequency value is not found in the GPU-box-end physical channel signal frequency table, or a duplicate frequency value occurs, the BMC judges that a frequency error has occurred, lights the LED alarm indicator, actively reports the error through the management network, and records a corresponding error log.
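The decision rules of S341 to S345 can be expressed as a single classification pass. This Python sketch is a simplification under stated assumptions: a missing reading is represented as `None`, all channels are checked in one pass rather than gating S345 on every channel having a value, and the alarm actions (lighting the LED via the BMC's third GPIO pin, reporting over the management network, logging) are left to the caller. `check_cables` is a hypothetical name.

```python
from collections import Counter

def check_cables(server_regs, gpu_freq_table):
    """Classify BMC readings per S344/S345 (simplified sketch).

    server_regs: {server_channel: measured frequency in Hz, or None if absent}
    gpu_freq_table: {gpu_box_channel: frequency in Hz sent on that channel}
    Returns (connections, errors): resolved server -> GPU box channel map
    and a list of human-readable error strings.
    """
    freq_to_gpu = {f: ch for ch, f in gpu_freq_table.items()}
    # Count how often each measured frequency appears, to catch duplicates.
    counts = Counter(f for f in server_regs.values() if f is not None)
    connections, errors = {}, []
    for srv, freq in sorted(server_regs.items()):
        if freq is None:                   # S344: no frequency -> no cable
            errors.append(f"{srv}: no cable plugged in")
        elif freq not in freq_to_gpu:      # S345: frequency not in the table
            errors.append(f"{srv}: unknown frequency {freq} Hz")
        elif counts[freq] > 1:             # S345: same frequency on several channels
            errors.append(f"{srv}: duplicate frequency {freq} Hz")
        else:
            connections[srv] = freq_to_gpu[freq]
    return connections, errors

gpu_table = {"A1": 10, "A2": 20, "A3": 30, "A4": 40}  # hypothetical
conns, errs = check_cables({"A1": 10, "A2": None, "A3": 30, "A4": 30}, gpu_table)
print(conns)  # {'A1': 'A1'}
print(errs)   # ['A2: no cable plugged in', 'A3: duplicate frequency 30 Hz', 'A4: duplicate frequency 30 Hz']
# A non-empty errs list is where the BMC would light the LED, report, and log.
```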
In a second aspect, the present invention provides an apparatus for automatically checking the misplugging of interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module, used for arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect a server and a GPU box, connecting the first end of each in-place signal line to a first CPLD chip of the GPU box, and connecting the second end to a second CPLD chip of the server;
a frequency signal transmitting module, used for the first CPLD chip to generate signals of different frequencies on its GPIO pins and send them to the second CPLD chip over the connected in-place signal lines;
and an interconnection cable fault checking module, used for a BMC arranged in the server to analyze the actual signal frequency of each in-place signal line connected to the GPIO pins of the second CPLD chip, judge whether each actual signal frequency is the same as the corresponding preset signal frequency, thereby determine whether any interconnection cable is misplugged, and locate the serial numbers of the misplugged cables.
Further, the in-place signal line setting module includes:
an in-place signal line setting unit, used for arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect the server and the GPU box;
an in-place signal line and GPU box connection setting unit, used for connecting the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
an in-place signal line and server connection setting unit, used for connecting the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
and a BMC and alarm setting unit, used for connecting the BMC in the server to the second CPLD chip through an I2C signal line and connecting a third GPIO pin of the BMC to an LED alarm indicator.
Further, the frequency signal transmitting module includes:
a frequency square wave generating unit, used for the first CPLD chip to control the level state of each first GPIO pin so as to generate square wave signals of different frequencies;
and a frequency square wave GPU box end storage unit, used for the first CPLD chip to store the square wave frequency value generated by each first GPIO pin, together with the corresponding GPU box physical channel, in a GPU-box-end physical channel signal frequency table.
Further, the interconnection cable fault checking module includes:
a server-side signal frequency acquisition unit, used for each second GPIO pin of the second CPLD chip to independently acquire the frequency value of the in-place signal line connected to it;
a server-end frequency signal storage unit, used for the second CPLD chip to store the frequency value acquired by each second GPIO pin in an internal register, each internal register corresponding to a server-end physical channel;
a BMC frequency timing acquisition unit, used for the BMC to periodically read the internal registers through the I2C interface to obtain all frequency values;
and a cable connection relation determining unit, used for the BMC to resolve the GPU box physical channel code corresponding to each frequency value according to the correspondence between GPU box physical channels and frequency values in the GPU-box-end physical channel signal frequency table, and to determine the cable connection relation between the server physical channels and the GPU box physical channels.
The beneficial effects of the invention are as follows.
The method and device provided by the invention enable debugging and error localization after the server and the GPU box are assembled, and can effectively identify loose, wrong, and missed cable connections. When the complete machine is tested after assembly, cable-related errors can be identified at the initial power-on stage and the connection relation of the cables currently in use is output, which makes it easy for an operator to correct errors and avoids PCIe communication problems caused by misplugged cables. The invention also provides active alarming for cable plugging errors: when a cable error is detected, the server actively reports the alarm details and records a system log, which facilitates troubleshooting and maintenance by machine-room staff.
In addition, the invention is based on a reliable design principle, has a simple structure, and has very broad application prospects.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, those skilled in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a first flowchart illustrating a method for automatically checking for mis-insertion of a cable interconnecting a server and a GPU box according to the present invention.
FIG. 2 is a second flowchart illustrating a method for automatically checking the misplugging of the interconnection cable between the server and the GPU box according to the present invention.
FIG. 3 is a schematic diagram of an apparatus for automatically checking the misplugging of the interconnection cables between the server and the GPU box according to the present invention.
FIG. 4 is a schematic diagram of the interconnection cable between the server and the GPU box according to the present invention.
FIG. 5 is a schematic diagram of the signal frequency table corresponding to the GPU box physical channels in Example 4 of the present invention.
FIG. 6 is a schematic diagram of the cable connection table between the server and the GPU box in Example 4 of the present invention.
In the figures: 1 - in-place signal line setting module; 1.1 - in-place signal line setting unit; 1.2 - in-place signal line and GPU box connection setting unit; 1.3 - in-place signal line and server connection setting unit; 1.4 - BMC and alarm setting unit; 2 - frequency signal transmitting module; 2.1 - frequency square wave generating unit; 2.2 - frequency square wave GPU box end storage unit; 3 - interconnection cable fault checking module; 3.1 - server-side signal frequency acquisition unit; 3.2 - server-side frequency signal storage unit; 3.3 - BMC frequency timing acquisition unit; 3.4 - cable connection relation determining unit.
Detailed Description
To enable those skilled in the art to better understand the technical solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.
Example 1:
As shown in FIG. 1, the present invention provides a method for automatically checking misplugging of the interconnection cables between a server and a GPU box, comprising the following steps:
S1, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect a server and a GPU box, connecting the first end of each in-place signal line to a first CPLD chip of the GPU box, and connecting the second end to a second CPLD chip of the server;
S2, the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines;
and S3, a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to the GPIO pins of the second CPLD chip, judges whether each actual signal frequency is the same as the corresponding preset signal frequency, thereby determines whether any interconnection cable is misplugged, and locates the serial numbers of the misplugged cables.
Example 2:
As shown in FIG. 2, the present invention provides a method for automatically checking misplugging of the interconnection cables between a server and a GPU box, comprising the following steps:
S1, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect a server and a GPU box, connecting the first end of each in-place signal line to a first CPLD chip of the GPU box, and connecting the second end to a second CPLD chip of the server; specifically comprising the following steps:
S11, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect the server and the GPU box;
S12, connecting the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
S13, connecting the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
S14, connecting a BMC in the server to the second CPLD chip through an I2C signal line, and connecting a third GPIO pin of the BMC to an LED alarm indicator;
S2, the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines; specifically comprising the following steps:
S21, the first CPLD chip controls the level state of each first GPIO pin to generate square wave signals of different frequencies;
S22, the first CPLD chip stores the square wave frequency value generated by each first GPIO pin, together with the GPU box physical channel corresponding to that pin, in a GPU-box-end physical channel signal frequency table;
S3, the BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to the GPIO pins of the second CPLD chip, judges whether each actual signal frequency is the same as the corresponding preset signal frequency, thereby determines whether any interconnection cable is misplugged, and locates the serial numbers of the misplugged cables; specifically comprising the following steps:
S31, each second GPIO pin of the second CPLD chip independently acquires the frequency value of the in-place signal line connected to it;
S32, the second CPLD chip stores the frequency value acquired by each second GPIO pin in an internal register, each internal register corresponding to a server-end physical channel;
S33, the BMC periodically reads the internal registers through an I2C interface to obtain all frequency values;
and S34, according to the correspondence between GPU box physical channels and frequency values in the GPU-box-end physical channel signal frequency table, the BMC resolves the GPU box physical channel code corresponding to each frequency value and determines the cable connection relation between the server physical channels and the GPU box physical channels.
Example 3:
As shown in FIG. 2, the present invention provides a method for automatically checking misplugging of the interconnection cables between a server and a GPU box, comprising the following steps:
S1, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect a server and a GPU box, connecting the first end of each in-place signal line to a first CPLD chip of the GPU box, and connecting the second end to a second CPLD chip of the server; specifically comprising the following steps:
S11, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that connect the server and the GPU box;
S12, connecting the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
S13, connecting the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
S14, connecting a BMC in the server to the second CPLD chip through an I2C signal line, and connecting a third GPIO pin of the BMC to an LED alarm indicator;
S2, the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines; specifically comprising the following steps:
S21, the first CPLD chip controls the level state of each first GPIO pin to generate square wave signals of different frequencies;
S22, the first CPLD chip stores the square wave frequency value generated by each first GPIO pin, together with the GPU box physical channel corresponding to that pin, in a GPU-box-end physical channel signal frequency table;
S3, the BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to the GPIO pins of the second CPLD chip, judges whether each actual signal frequency is the same as the corresponding preset signal frequency, thereby determines whether any interconnection cable is misplugged, and locates the serial numbers of the misplugged cables; specifically comprising the following steps:
S31, each second GPIO pin of the second CPLD chip independently acquires the frequency value of the in-place signal line connected to it; specifically comprising the following steps:
S311, the second CPLD chip sets a sampling period according to an internal timer;
S312, according to the sampling period, the second CPLD chip uses the level-change edges on each second GPIO pin to start and stop counters, counting the high-level duration T_H and the low-level duration T_L of each second GPIO pin;
S313, the second CPLD chip calculates the signal frequency value acquired by each second GPIO pin as F = 1/(T_H + T_L);
S32, the second CPLD chip stores the frequency value acquired by each second GPIO pin in an internal register, each internal register corresponding to a server-end physical channel;
S33, the BMC periodically reads the internal registers through an I2C interface to obtain all frequency values;
S34, according to the correspondence between GPU box physical channels and frequency values in the GPU-box-end physical channel signal frequency table, the BMC resolves the GPU box physical channel code corresponding to each frequency value and determines the cable connection relation between the server physical channels and the GPU box physical channels; specifically comprising the following steps:
S341, the BMC acquires the GPU-box-end physical channel signal frequency table;
S342, the BMC locates each frequency value in the internal registers and its server-end physical channel;
S343, the BMC looks up the GPU box physical channel code corresponding to each located frequency value in the GPU-box-end physical channel signal frequency table, generating the cable connection relation between the server-end physical channels and the GPU box physical channels;
S344, when the BMC cannot acquire the frequency value of a server-end physical channel, it judges that no cable is plugged into that channel, lights the LED alarm indicator, reports the error through the management network, and records a corresponding error log;
and S345, when the BMC acquires the frequency values of all server-end physical channels but some frequency value is not found in the GPU-box-end physical channel signal frequency table, or a duplicate frequency value occurs, the BMC judges that a frequency error has occurred, lights the LED alarm indicator, actively reports the error through the management network, and records a corresponding error log.
Example 4:
Applying Example 3 above to the application scenario shown in FIG. 4, the server and the GPU box communicate over 4 PCIe x16 buses, namely ports A, B, C and D. Each port communicates over 4 cable bundles, and one Cable_Present signal in each cable bundle is used as the in-place signal line.
At the server end, the 16 second GPIO pins of the second CPLD chip are connected to the in-place signal lines in the 16 cable bundles, respectively. The BMC communicates with the second CPLD chip through the I2C interface and can read the input state of each second GPIO pin. A third GPIO pin of the BMC is connected to an LED alarm indicator, which the BMC can turn on and off.
At the GPU box end, the 16 first GPIO pins of the first CPLD chip are connected to the 16 Cable_Present lines serving as in-place signal lines, one pin per line. The first CPLD chip controls the level of each first GPIO pin and can therefore generate square wave signals of different frequencies.
In the GPU box, the first CPLD chip generates a square wave of a different frequency on each first GPIO pin; the 16 channel codes correspond to 16 frequencies as shown in fig. 5.
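Fig. 5's actual frequency assignment is not reproduced here, so the sketch below assumes a hypothetical 10 µs tick for the first CPLD chip's toggle counter and lets the caller supply any channel-to-frequency plan. It models how a counter that flips a first GPIO pin every half period yields a square wave, and checks that each planned frequency survives tick quantization and integer rounding without colliding with another channel. The function names are illustrative only.

```python
from __future__ import annotations

TICK_US = 10  # hypothetical toggle-counter tick of the first CPLD chip (not stated in the patent)


def half_period_ticks(freq_hz: int, tick_us: int = TICK_US) -> int:
    """Ticks between output toggles for a 50% duty square wave at freq_hz."""
    return max(1, round(1_000_000 / (2 * freq_hz) / tick_us))


def pin_levels(freq_hz: int, n_ticks: int, tick_us: int = TICK_US) -> list[int]:
    """Simulate one first GPIO pin: a counter flips the level every half period."""
    half = half_period_ticks(freq_hz, tick_us)
    level, count, levels = 0, 0, []
    for _ in range(n_ticks):
        levels.append(level)
        count += 1
        if count == half:
            level ^= 1
            count = 0
    return levels


def check_plan(plan: dict[int, int], tick_us: int = TICK_US) -> None:
    """Raise if a channel's realized frequency, rounded half up, differs from its target
    or collides with another channel's (the server could not tell them apart)."""
    seen: dict[int, int] = {}
    for channel, target in plan.items():
        realized = 1_000_000 / (2 * half_period_ticks(target, tick_us) * tick_us)
        measured = int(realized + 0.5)
        if measured != target:
            raise ValueError(f"channel {channel}: {target} Hz is realized as {realized:.2f} Hz")
        if measured in seen:
            raise ValueError(f"channels {seen[measured]} and {channel} share {measured} Hz")
        seen[measured] = channel
```

For example, a plan of 10, 20, ..., 160 Hz for channels 0 to 15 passes `check_plan` at a 10 µs tick, whereas a 30 kHz target does not, because the nearest achievable half period of 2 ticks gives 25 kHz.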
At the server end, each second GPIO pin samples its signal frequency independently. A 10 µs timer module is implemented in the second CPLD chip; level-change edges on a pin start and stop a counter, from which the high-level duration T_H and the low-level duration T_L are obtained, and the signal frequency is then calculated as F = 1/(T_H + T_L). For example, when T_H = 25000 µs (25 ms) and T_L = 25000 µs (25 ms), F = 1 ÷ (25 + 25) × 1000 = 20 Hz. The calculated frequency value is rounded to an integer.
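The measurement in the second CPLD chip can be modelled in software as below. This is a sketch, assuming one sample per 10 µs timer tick and a half-up rounding rule (the patent says only that the value is rounded); `measure_frequency` and `frequency_from_durations` are hypothetical names, and real CPLD logic would use edge-triggered counters rather than a list of samples.

```python
from __future__ import annotations

TICK_US = 10  # resolution of the 10 us timer module in the second CPLD chip (Example 4)


def frequency_from_durations(t_high_us: float, t_low_us: float) -> int:
    """F = 1 / (T_H + T_L), converted to Hz and rounded half up (assumed rounding rule)."""
    return int(1_000_000 / (t_high_us + t_low_us) + 0.5)


def measure_frequency(levels: list[int], tick_us: int = TICK_US) -> int | None:
    """Estimate a pin's frequency from one sample per timer tick.

    A rising edge starts the count, the next falling edge ends T_H, and the following
    rising edge ends T_L. Returns None when no complete high/low cycle is present,
    e.g. a flat line because no cable is plugged in."""
    edges = [i for i in range(1, len(levels)) if levels[i] != levels[i - 1]]
    # Edges alternate, so dropping a leading falling edge leaves a rising edge first.
    if edges and levels[edges[0]] == 0:
        edges = edges[1:]
    if len(edges) < 3:
        return None
    rise, fall, next_rise = edges[:3]
    return frequency_from_durations((fall - rise) * tick_us, (next_rise - fall) * tick_us)
```

With T_H = T_L = 2500 ticks (25000 µs each), `measure_frequency` returns 20, matching the worked example above.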
The second CPLD chip stores the frequency values collected by the 16 second GPIO pins in internal registers, one for each of the 16 server end physical channels. The BMC periodically reads the second CPLD chip over the I2C interface, once every 2 seconds, to obtain all frequency values. Then, using the correspondence between GPU box physical channels and frequency values in fig. 5, it looks up the corresponding GPU box physical channel codes and thereby determines the cable connection relationship between the current server end physical channels and the GPU box physical channels.
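A BMC-side polling loop along these lines might look like the sketch below. The register layout and the convention that a zero register means "no signal" are assumptions, since the patent does not specify them; `read_reg` is a hypothetical stand-in for whatever I2C read the BMC firmware actually performs, and the sleep function is injectable so the loop can be exercised without hardware.

```python
from __future__ import annotations

import time
from typing import Callable

NUM_CHANNELS = 16      # 16 server end physical channels in Example 4
POLL_INTERVAL_S = 2.0  # Example 4: the BMC reads the second CPLD chip every 2 seconds


def read_all_frequencies(read_reg: Callable[[int], int]) -> dict[int, int | None]:
    """Read one frequency register per server channel.

    read_reg(ch) is a hypothetical accessor for the register of server channel ch.
    A value of 0 is assumed to mean that no signal was measured on that channel."""
    freqs: dict[int, int | None] = {}
    for ch in range(NUM_CHANNELS):
        value = read_reg(ch)
        freqs[ch] = value if value != 0 else None
    return freqs


def poll(read_reg: Callable[[int], int],
         on_sample: Callable[[dict[int, int | None]], None],
         cycles: int,
         interval_s: float = POLL_INTERVAL_S,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Read all channels `cycles` times, waiting interval_s seconds between reads."""
    for i in range(cycles):
        on_sample(read_all_frequencies(read_reg))
        if i + 1 < cycles:
            sleep(interval_s)
```

On a Linux-based BMC, `read_reg` could for instance wrap `SMBus.read_byte_data` from the smbus2 package, with the CPLD address and register offsets taken from the board design; a real service would also loop indefinitely rather than for a fixed number of cycles.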
Once the connections of all channels have been determined, the server BMC outputs a server-to-GPU-box cable connection table, as shown in fig. 6, to guide the operator in correcting the cable order.
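The table of fig. 6 is not reproduced here, but a comparable operator-facing listing can be sketched as follows. It assumes, hypothetically, that channels are numbered port-major (A1 to A4, then B1 to B4, and so on) and that each server channel is meant to connect to the GPU box channel with the same number; neither convention is stated in the patent, and `connection_table` is an illustrative name.

```python
from __future__ import annotations

PORTS = "ABCD"        # the four PCIe x16 ports of Example 4
CABLES_PER_PORT = 4   # four cable bundles per port


def channel_label(ch: int) -> str:
    """Hypothetical port-major numbering: 0 -> 'A1', 5 -> 'B2', 15 -> 'D4'."""
    port, cable = divmod(ch, CABLES_PER_PORT)
    return f"{PORTS[port]}{cable + 1}"


def connection_table(connections: dict[int, int]) -> list[str]:
    """Render 'server channel -> GPU box channel' rows for the operator.

    Assumes the intended wiring is server channel i to GPU box channel i."""
    rows = []
    for server_ch, gpu_ch in sorted(connections.items()):
        if server_ch == gpu_ch:
            status = "OK"
        else:
            status = f"MISPLUGGED (expected {channel_label(server_ch)})"
        rows.append(f"server {channel_label(server_ch)} -> GPU box {channel_label(gpu_ch)}: {status}")
    return rows
```

For a swapped pair, `connection_table({0: 0, 1: 2, 2: 1})` marks A1 as OK and flags both A2 and A3, telling the operator which two cables to exchange.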
When the server end BMC cannot obtain the frequency value of a physical channel, it considers that no cable is plugged into that channel, lights the alarm LED, actively reports the error through the management network, and records a corresponding error log.
When the server end BMC obtains the frequency values of all physical channels, but a frequency value is not found in the correspondence table or the same frequency value appears more than once, the BMC considers that a frequency error has occurred, lights the alarm LED, actively reports the frequency error through the management network, and records a corresponding error log.
The invention is broadly applicable and requires no change to the existing cables. In common cable designs, the in-place signal line Cable_Present already serves as a cable presence detection pin, whose high/low level indicates whether the cable is fully seated. The invention reuses this Cable_Present pin as-is, without modification.
Example 5:
As shown in fig. 3, the present invention provides an apparatus for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module 1, used for arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, with the first end of each in-place signal line connected to the first CPLD chip of the GPU box and the second end connected to the second CPLD chip of the server;
a frequency signal sending module 2, used for the first CPLD chip to generate signals of different frequencies through its GPIO pins and send them to the second CPLD chip over the connected in-place signal lines;
and an interconnection cable fault checking module 3, used for a BMC arranged in the server to analyze the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determine whether each actual signal frequency matches the corresponding preset signal frequency, and thereby determine whether any interconnection cable is mis-inserted or missing and locate the number of each such cable.
Example 6:
As shown in fig. 3, the present invention provides an apparatus for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module 1, used for arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, with the first end of each in-place signal line connected to the first CPLD chip of the GPU box and the second end connected to the second CPLD chip of the server; the in-place signal line setting module 1 includes:
an in-place signal line setting unit 1.1, used for arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
an in-place signal line and GPU box connection setting unit 1.2, used for connecting the first end of each in-place signal line to one first GPIO pin of the first CPLD chip of the GPU box;
an in-place signal line and server connection setting unit 1.3, used for connecting the second end of each in-place signal line to one second GPIO pin of the second CPLD chip of the server;
a BMC and alarm setting unit 1.4, used for connecting the BMC in the server to the second CPLD chip through an I2C signal line and connecting a third GPIO pin of the BMC to an LED alarm indicator;
a frequency signal sending module 2, used for the first CPLD chip to generate signals of different frequencies through its GPIO pins and send them to the second CPLD chip over the connected in-place signal lines; the frequency signal sending module 2 includes:
a frequency square wave generating unit 2.1, used for the first CPLD chip to control the level of each first GPIO pin so as to generate square wave signals of different frequencies;
a GPU box end square wave frequency storage unit 2.2, used for the first CPLD chip to store, in a GPU box end physical channel signal frequency table, the frequency of the square wave generated by each first GPIO pin together with the GPU box physical channel corresponding to that pin;
an interconnection cable fault checking module 3, used for the BMC arranged in the server to analyze the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determine whether each actual signal frequency matches the corresponding preset signal frequency, and thereby determine whether any interconnection cable is mis-inserted or missing and locate the number of each such cable; the interconnection cable fault checking module 3 includes:
a server end signal frequency acquisition unit 3.1, used for each second GPIO pin of the second CPLD chip to independently acquire the frequency of the in-place signal line connected to it;
a server end frequency signal storage unit 3.2, used for the second CPLD chip to store the frequency value acquired by each second GPIO pin in an internal register corresponding to the server end physical channel;
a BMC periodic frequency acquisition unit 3.3, used for the BMC to periodically read the internal registers through the I2C interface to obtain all frequency values;
and a cable connection relationship determining unit 3.4, used for the BMC to look up, according to the correspondence between GPU box physical channels and frequency values in the GPU box end physical channel signal frequency table, the GPU box physical channel code corresponding to each frequency value, and to determine the cable connection relationship between the server physical channels and the GPU box physical channels.
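To see how modules 1 to 3 fit together, the sketch below simulates the whole device in Python. It is a behavioural model only, under hypothetical assumptions: GPU box channel k sends 10 × (k + 1) Hz (fig. 5's real values are not given here), module 1 is represented by a `wiring` dict from server channel to the GPU box channel at the far end of the cable (or `None` when nothing is plugged in), and square waves are represented directly by their (T_H, T_L) durations instead of sampled levels. All names are illustrative.

```python
from __future__ import annotations

NUM_CHANNELS = 16
# Hypothetical frequency plan: GPU box channel k sends 10 * (k + 1) Hz.
FREQ_TABLE = {k: 10 * (k + 1) for k in range(NUM_CHANNELS)}


def gpu_box_send(gpu_ch: int) -> tuple[float, float]:
    """Module 2: (T_H, T_L) in microseconds of the 50% duty square wave on gpu_ch."""
    half_us = 1_000_000 / (2 * FREQ_TABLE[gpu_ch])
    return half_us, half_us


def server_measure(signal: tuple[float, float] | None) -> int | None:
    """Units 3.1/3.2: F = 1/(T_H + T_L), rounded half up; None if nothing is plugged in."""
    if signal is None:
        return None
    t_high_us, t_low_us = signal
    return int(1_000_000 / (t_high_us + t_low_us) + 0.5)


def bmc_decode(freqs: dict[int, int | None]) -> dict[int, int | None]:
    """Unit 3.4: look each measured frequency up in the GPU box table (None = unresolved)."""
    freq_to_gpu = {f: ch for ch, f in FREQ_TABLE.items()}
    return {s: freq_to_gpu.get(f) if f is not None else None for s, f in freqs.items()}


def run_check(wiring: dict[int, int | None]) -> dict[int, int | None]:
    """Module 1 is modelled by `wiring`: server channel -> GPU box channel at the cable's other end."""
    freqs = {s: server_measure(gpu_box_send(g) if g is not None else None)
             for s, g in wiring.items()}
    return bmc_decode(freqs)
```

Swapping the cables on server channels 0 and 1 makes `run_check` report GPU box channels 1 and 0 for them, which is exactly the mis-insertion the BMC is meant to locate; a channel wired to `None` comes back as `None`, the S344 case.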
Although the present invention has been described in detail with reference to the drawings in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from the spirit and scope of the present invention, and such modifications or substitutions, including any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope of the present invention, fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box, characterized by comprising the following steps:
S1: arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, connecting the first end of each in-place signal line to a first CPLD chip of the GPU box, and connecting the second end to a second CPLD chip of the server;
S2: the first CPLD chip generating signals of different frequencies through its GPIO pins and sending them to the second CPLD chip over the connected in-place signal lines;
S3: a BMC arranged in the server analyzing the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determining whether each actual signal frequency matches the corresponding preset signal frequency, thereby determining whether any interconnection cable is mis-inserted or missing, and locating the number of each such cable.
2. The method for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 1, wherein step S1 comprises the following steps:
S11: arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
S12: connecting the first end of each in-place signal line to one first GPIO pin of the first CPLD chip of the GPU box;
S13: connecting the second end of each in-place signal line to one second GPIO pin of the second CPLD chip of the server;
S14: connecting the BMC in the server to the second CPLD chip through an I2C signal line, and connecting a third GPIO pin of the BMC to an LED alarm indicator.
3. The method for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 2, wherein step S2 comprises the following steps:
S21: the first CPLD chip controlling the level of each first GPIO pin so as to generate square wave signals of different frequencies;
S22: the first CPLD chip storing, in a GPU box end physical channel signal frequency table, the frequency of the square wave generated by each first GPIO pin together with the GPU box physical channel corresponding to that pin.
4. The method for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 3, wherein step S3 comprises the following steps:
S31: each second GPIO pin of the second CPLD chip independently acquiring the frequency of the in-place signal line connected to it;
S32: the second CPLD chip storing the frequency value acquired by each second GPIO pin in an internal register corresponding to the server end physical channel;
S33: the BMC periodically reading the internal registers through an I2C interface to obtain all frequency values;
S34: the BMC looking up, according to the correspondence between GPU box physical channels and frequency values in the GPU box end physical channel signal frequency table, the GPU box physical channel code corresponding to each frequency value, and determining the cable connection relationship between the server physical channels and the GPU box physical channels.
5. The method for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 4, wherein step S31 comprises the following steps:
S311: the second CPLD chip setting a sampling period according to an internal timer;
S312: the second CPLD chip, according to the sampling period, using the level-change edges of each second GPIO pin to start and stop a counter, thereby counting the high-level duration T_H and the low-level duration T_L of that pin;
S313: the second CPLD chip calculating the signal frequency value acquired by each second GPIO pin as F = 1/(T_H + T_L).
6. The method for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 5, wherein step S34 comprises the following steps:
S341: the BMC obtaining the GPU box end physical channel signal frequency table;
S342: the BMC locating each frequency value in the internal register together with its server end physical channel;
S343: the BMC looking up the GPU box physical channel code corresponding to each located frequency value in the GPU box end physical channel signal frequency table, generating the cable connection relationship between the server end physical channels and the GPU box physical channels;
S344: when the BMC cannot obtain the frequency value of a server end physical channel, determining that no cable is plugged into that channel, lighting the LED alarm indicator, reporting the error through the management network, and recording a corresponding error log;
S345: when the BMC has obtained frequency values for all server end physical channels, but a frequency value is not found in the GPU box end physical channel signal frequency table or the same frequency value appears more than once, the BMC determining that a frequency error has occurred, lighting the LED alarm indicator, actively reporting the frequency error through the management network, and recording a corresponding error log.
7. An apparatus for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module (1), used for arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, with the first end of each in-place signal line connected to a first CPLD chip of the GPU box and the second end connected to a second CPLD chip of the server;
a frequency signal sending module (2), used for the first CPLD chip to generate signals of different frequencies through its GPIO pins and send them to the second CPLD chip over the connected in-place signal lines;
and an interconnection cable fault checking module (3), used for a BMC arranged in the server to analyze the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determine whether each actual signal frequency matches the corresponding preset signal frequency, and thereby determine whether any interconnection cable is mis-inserted or missing and locate the number of each such cable.
8. The apparatus for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 7, wherein the in-place signal line setting module (1) comprises:
an in-place signal line setting unit (1.1), used for arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
an in-place signal line and GPU box connection setting unit (1.2), used for connecting the first end of each in-place signal line to one first GPIO pin of the first CPLD chip of the GPU box;
an in-place signal line and server connection setting unit (1.3), used for connecting the second end of each in-place signal line to one second GPIO pin of the second CPLD chip of the server;
and a BMC and alarm setting unit (1.4), used for connecting the BMC in the server to the second CPLD chip through an I2C signal line and connecting a third GPIO pin of the BMC to an LED alarm indicator.
9. The apparatus for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 7, wherein the frequency signal sending module (2) comprises:
a frequency square wave generating unit (2.1), used for the first CPLD chip to control the level of each first GPIO pin so as to generate square wave signals of different frequencies;
and a GPU box end square wave frequency storage unit (2.2), used for the first CPLD chip to store, in a GPU box end physical channel signal frequency table, the frequency of the square wave generated by each first GPIO pin together with the GPU box physical channel corresponding to that pin.
10. The apparatus for automatically checking for mis-insertion of the interconnection cables between a server and a GPU box according to claim 7, wherein the interconnection cable fault checking module (3) comprises:
a server end signal frequency acquisition unit (3.1), used for each second GPIO pin of the second CPLD chip to independently acquire the frequency of the in-place signal line connected to it;
a server end frequency signal storage unit (3.2), used for the second CPLD chip to store the frequency value acquired by each second GPIO pin in an internal register corresponding to the server end physical channel;
a BMC periodic frequency acquisition unit (3.3), used for the BMC to periodically read the internal registers through the I2C interface to obtain all frequency values;
and a cable connection relationship determining unit (3.4), used for the BMC to look up, according to the correspondence between GPU box physical channels and frequency values in the GPU box end physical channel signal frequency table, the GPU box physical channel code corresponding to each frequency value, and to determine the cable connection relationship between the server physical channels and the GPU box physical channels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111065477.4A CN113791368A (en) | 2021-09-10 | 2021-09-10 | Method and device for automatically checking misplugging of interconnection cables of server and GPU (graphics processing Unit) box |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111065477.4A CN113791368A (en) | 2021-09-10 | 2021-09-10 | Method and device for automatically checking misplugging of interconnection cables of server and GPU (graphics processing Unit) box |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113791368A true CN113791368A (en) | 2021-12-14 |
Family
ID=79183260
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111065477.4A Pending CN113791368A (en) | 2021-09-10 | 2021-09-10 | Method and device for automatically checking misplugging of interconnection cables of server and GPU (graphics processing Unit) box |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113791368A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH1048279A (en) * | 1996-08-01 | 1998-02-20 | Nec Corp | Method for indicating erroneous connection of cable and unit for detecting/indicating erroneous connection of cable |
CN1776440A (en) * | 2005-12-13 | 2006-05-24 | 北京潞电电气设备厂 | Wiring identifying device and method |
CN101290338A (en) * | 2007-04-20 | 2008-10-22 | 株式会社东芝 | Test system and test method for control cable |
CN101865947A (en) * | 2010-06-24 | 2010-10-20 | 四川长虹电器股份有限公司 | AC input detection circuit and detection method of power supply of flat television |
CN109726055A (en) * | 2017-10-31 | 2019-05-07 | 杭州华为数字技术有限公司 | Detect the method and computer equipment of PCIe chip exception |
CN110988580A (en) * | 2019-11-25 | 2020-04-10 | 国网四川省电力公司广安供电公司 | Secondary cable alignment system and alignment method thereof |
CN111176913A (en) * | 2019-12-16 | 2020-05-19 | 苏州浪潮智能科技有限公司 | Circuit and method for detecting Cable Port in server |
CN112596983A (en) * | 2020-12-30 | 2021-04-02 | 苏州浪潮智能科技有限公司 | Monitoring method for connector in server |
- 2021-09-10: application CN202111065477.4A filed in CN, published as CN113791368A (active, pending)
Non-Patent Citations (1)
Title |
---|
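杨世兴 (Yang Shixing) et al.: 《检测监控***原理与实用设计》 (Principles and Practical Design of Detection and Monitoring ***), 中国电力出版社 (China Electric Power Press), pp. 67-69 * |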
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2531867B1 (en) | Cable test method | |
CN103731663B (en) | The testing method of a kind of intelligent television and device | |
US11782809B2 (en) | Test and measurement system for analyzing devices under test | |
CN211378027U (en) | Automatic monitoring and diagnosing system for optical fiber link of intelligent substation | |
CN113315572B (en) | Detection method and device for optical module physical link, optical module and optical transmission system | |
CN107450013B (en) | Circuit board functional integrity test platform and test method | |
CN116962471A (en) | Medical equipment management system based on Internet of things | |
CN110855353B (en) | Error code tester and test system suitable for various types of optical modules | |
CN115858221A (en) | Management method and device of storage equipment, storage medium and electronic equipment | |
CN103913728A (en) | Portable radar general-purpose tester and testing method | |
CN108419263B (en) | Indoor distributed communication system monitoring device and monitoring method | |
CN111966033B (en) | Detection system for connection state of high-density connector | |
CN113791368A (en) | Method and device for automatically checking misplugging of interconnection cables of server and GPU (graphics processing Unit) box | |
CN111654405A (en) | Method, device, equipment and storage medium for fault node of communication link | |
CN109634849B (en) | Visual signal interaction intelligent substation reconstruction and extension virtual testing device and method | |
CN109753396A (en) | A kind of cable self checking method, system and the server of storage system | |
CN116449134A (en) | Method and system for acquiring fault information of photovoltaic inverter | |
CN115480975A (en) | Wiring checking method and device | |
CN108549042A (en) | A kind of NVME LED detecting systems and detection method | |
CN111679841B (en) | Burning system, method and device of camera module | |
CN103605034A (en) | Device for detecting wiring of equipment cabinet | |
CN111307280A (en) | Converter valve base electronic equipment optical power online monitoring system and monitoring method | |
CN109981394B (en) | Communication method and device based on enhanced CAN bus protocol analyzer | |
CN110737586A (en) | computer software test system with short test period | |
CN219392545U (en) | Clock board and clock compatibility verification system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||