CN113791368A - Method and device for automatically checking misplugging of interconnection cables between a server and a GPU (graphics processing unit) box - Google Patents


Info

Publication number
CN113791368A
Authority
CN
China
Prior art keywords
server
gpu box
cpld chip
frequency
physical channel
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111065477.4A
Other languages
Chinese (zh)
Inventor
田东顺
殷奎龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111065477.4A
Publication of CN113791368A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R31/00 Arrangements for testing electric properties; Arrangements for locating electric faults; Arrangements for electrical testing characterised by what is being tested not provided for elsewhere
    • G01R31/50 Testing of electric apparatus, lines, cables or components for short-circuits, continuity, leakage current or incorrect line connections
    • G01R31/54 Testing for continuity
    • G01R31/66 Testing of connections, e.g. of plugs or non-disconnectable joints
    • G01R31/67 Testing the correctness of wire connections in electric apparatus or circuits
    • G01R31/68 Testing of releasable connections, e.g. of terminals mounted on a printed circuit board

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for automatically checking misplugging of the cables interconnecting a server and a GPU box. The method comprises: arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, and connecting the two ends of each in-place signal line to a first CPLD chip of the GPU box and a second CPLD chip of the server, respectively; the first CPLD chip sends signals of different frequencies to the second CPLD chip through the in-place signal lines; and a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line received by the second CPLD chip and checks whether each actual frequency equals the corresponding preset frequency, thereby determining whether any interconnection cable is misplugged and locating the serial number of each misplugged cable. The method and the device enable debugging and error localization after the server and the GPU box are assembled, and can effectively identify loose, wrong and missing cable connections.

Description

Method and device for automatically checking misplugging of interconnection cables between a server and a GPU (graphics processing unit) box
Technical Field
The invention belongs to the technical field of error correction in server production and assembly, and in particular relates to a method and a device for automatically checking misplugging of the interconnection cables between a server and a GPU box.
Background
The rapid development of AI computing, high-performance computing and artificial-intelligence applications has driven the wide use of heterogeneous computing equipment. To achieve strong heterogeneous computing power and scalable performance, CPU computing and GPU computing are generally split into two separate devices, a server and a GPU box, which are connected by PCIe high-speed cables.
In practice, each PCIe port is x16, i.e., it contains 16 lanes. Since each high-speed cable carries 4 lanes, each PCIe port requires 4 high-speed cables. The GPU box has 4 PCIe x16 ports, so 16 high-speed cables are required to connect it to the server.
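The lane arithmetic above can be checked with a short Python sketch. The port names A to D are taken from Example 4 of this document; cable labels such as "A1" are a hypothetical naming scheme introduced only for illustration.

```python
# Lane/cable arithmetic for the server <-> GPU box link described above.
LANES_PER_PORT = 16               # each PCIe port is x16
LANES_PER_CABLE = 4               # each high-speed cable carries 4 lanes
PORTS = ("A", "B", "C", "D")      # the GPU box's four x16 ports (Example 4)


def cables_per_port(lanes_per_port: int = LANES_PER_PORT,
                    lanes_per_cable: int = LANES_PER_CABLE) -> int:
    """Number of cables needed to carry every lane of one port."""
    if lanes_per_port % lanes_per_cable:
        raise ValueError("port width must be a multiple of the cable width")
    return lanes_per_port // lanes_per_cable


def cable_ids(ports: tuple[str, ...] = PORTS) -> list[str]:
    """Hypothetical cable labels: port letter plus cable index, e.g. 'A1'."""
    n = cables_per_port()
    return [f"{port}{i}" for port in ports for i in range(1, n + 1)]


if __name__ == "__main__":
    ids = cable_ids()
    print(cables_per_port(), len(ids))   # 4 16
    print(ids)
```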
During production and assembly, misplugged or missing cables cannot be detected automatically when the cables are connected by hand. Detection usually relies on manual visual inspection, and cables are frequently plugged in the wrong order. Plugging cables in the wrong order within one port, or across ports, can reduce PCIe bandwidth, prevent communication, or make the GPU numbering seen by the system disagree with the physical numbering. Such faults are often misdiagnosed as GPU card failures, so dedicated inspectors must screen each cable of each device, which is labor-intensive and consumes considerable human resources. Because the process depends entirely on manual screening, any oversight risks missed or false detections; and when many cables are misplugged, correcting the cable order takes great effort, sometimes requiring all cables to be removed and reconnected.
Therefore, it is very necessary to provide a method and an apparatus for automatically checking the mis-insertion of the interconnection cable between the server and the GPU box to overcome the above-mentioned drawbacks in the prior art.
Disclosure of Invention
The invention provides a method and a device for automatically checking misplugging of interconnection cables between a server and a GPU box, to solve the following technical problems in the prior art: misplugged and missing cables cannot be detected automatically when the server and the GPU box are assembled; manual visual inspection consumes a large amount of manpower and carries a high risk of missed and false detections; and a wrong cable order is inconvenient to correct.
In a first aspect, the present invention provides a method for automatically checking the misplugging of a cable interconnecting a server and a GPU box, comprising the following steps:
S1: arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, connect the first end of each in-place signal line to a first CPLD chip of the GPU box, and connect the second end to a second CPLD chip of the server;
S2: the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip through the connected in-place signal lines;
S3: a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip and checks whether each actual frequency equals the corresponding preset frequency, thereby determining whether any interconnection cable is misplugged and locating the serial number of each misplugged cable.
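As a rough illustration of steps S1 to S3, the following Python sketch simulates the check in software. It assumes (our assumption, not the patent's) that a correctly assembled system connects server channel n to GPU box channel n, and it uses made-up preset frequencies in place of the real table; a missing cable is modeled as a channel that receives no signal.

```python
from typing import Optional

# Hypothetical preset frequencies (Hz), one per GPU-box channel; the real
# values are those of the patent's frequency table, not reproduced here.
PRESET_HZ = {ch: 10 + 5 * ch for ch in range(16)}


def measure(cabling: dict[int, Optional[int]]) -> dict[int, Optional[int]]:
    """S2: each server channel sees the frequency driven by the GPU-box
    channel its cable actually lands on (None = no cable, so no signal)."""
    return {srv: (PRESET_HZ[gpu] if gpu is not None else None)
            for srv, gpu in cabling.items()}


def misplugged(actual: dict[int, Optional[int]]) -> list[int]:
    """S3: server channels whose measured frequency differs from the preset
    frequency of the same-numbered GPU-box channel."""
    return sorted(srv for srv, hz in actual.items() if hz != PRESET_HZ[srv])


if __name__ == "__main__":
    cabling = {ch: ch for ch in range(16)}   # correctly assembled
    cabling[2], cabling[3] = 3, 2            # cables 2 and 3 swapped
    cabling[9] = None                        # cable 9 missing
    print(misplugged(measure(cabling)))      # [2, 3, 9]
```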
Further, step S1 specifically comprises the following steps:
S11: arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
S12: connect the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
S13: connect the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
S14: connect the BMC in the server to the second CPLD chip through an I2C signal line, and connect a third GPIO pin of the BMC to an LED alarm indicator. In this way, the server and the GPU box each monitor the cables connected to every pair of PCIe ports through their respective CPLD chips.
Further, step S2 specifically comprises the following steps:
S21: the first CPLD chip controls the level state of each first GPIO pin to generate square-wave signals of different frequencies;
S22: the first CPLD chip stores, in a GPU-box-side physical channel signal frequency table, the square-wave frequency generated by each first GPIO pin together with the GPU box physical channel corresponding to that pin. The physical channels are thus distinguished by the frequencies of the square waves they carry.
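Steps S21 and S22 can be pictured as a clock divider per GPIO pin plus a lookup table. The Python sketch below simulates this; the 1 kHz tick rate, the frequencies and the function names are hypothetical choices rather than values from the patent, and real CPLD logic would be written in an HDL rather than Python.

```python
CLK_HZ = 1000  # hypothetical divider tick rate (ticks per second)


def build_frequency_table(freqs_hz: list[int]) -> dict[int, int]:
    """S22: GPU-box-side table mapping physical channel -> frequency (Hz)."""
    if len(set(freqs_hz)) != len(freqs_hz):
        raise ValueError("each physical channel needs a distinct frequency")
    return dict(enumerate(freqs_hz))


def square_wave(freq_hz: int, ticks: int) -> list[int]:
    """S21: level of one first GPIO pin at each tick. A counter toggles the
    pin every half period, like a clock divider in CPLD logic."""
    half_period = CLK_HZ // (2 * freq_hz)
    if half_period == 0:
        raise ValueError("frequency too high for the tick rate")
    level, count, levels = 0, 0, []
    for _ in range(ticks):
        levels.append(level)
        count += 1
        if count == half_period:
            level ^= 1
            count = 0
    return levels


if __name__ == "__main__":
    table = build_frequency_table([10, 20, 25, 50])   # hypothetical values
    wave = square_wave(table[1], CLK_HZ)              # one second of channel 1
    rising = sum(1 for a, b in zip(wave, wave[1:]) if (a, b) == (0, 1))
    print(table, rising)   # rising edges per second equal the frequency
```

Because the half period is a whole number of ticks, frequencies that do not divide the tick rate evenly are only approximated; choosing frequencies that divide it exactly avoids this.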
Further, step S3 specifically comprises the following steps:
S31: each second GPIO pin of the second CPLD chip independently measures the frequency of the in-place signal line connected to it;
S32: the second CPLD chip stores the frequency measured by each second GPIO pin in an internal register; each register corresponds to one server-side physical channel;
S33: the BMC periodically reads the internal registers through the I2C interface to obtain all frequency values;
S34: using the correspondence between GPU box physical channels and frequency values in the GPU-box-side physical channel signal frequency table, the BMC resolves the GPU box physical channel code corresponding to each frequency value and determines the cable connection relationship between the server physical channels and the GPU box physical channels. Because the BMC periodically compares the frequencies received at the server side with the frequencies sent from the GPU box side, it can locate the serial numbers of the PCIe ports connected between the server side and the GPU box side.
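For steps S32 and S33, the BMC needs some agreed layout for the CPLD's internal registers. The patent does not specify one, so the Python sketch below assumes a hypothetical layout of sixteen little-endian 16-bit registers starting at offset 0, with 0 meaning "no signal". The `read_block` callable is a placeholder for whatever I2C block-read routine the BMC firmware actually provides.

```python
import struct
from typing import Callable, Optional

NUM_CHANNELS = 16
FREQ_BASE_REG = 0x00  # hypothetical offset of channel 0's frequency register


def decode_frequencies(raw: bytes) -> list[Optional[int]]:
    """S32/S33: turn the raw register block into per-server-channel
    frequencies; a value of 0 (no edges seen) becomes None."""
    values = struct.unpack(f"<{NUM_CHANNELS}H", raw)
    return [v if v != 0 else None for v in values]


def poll_once(read_block: Callable[[int, int], bytes]) -> list[Optional[int]]:
    """One periodic BMC read; read_block(reg, length) is a hypothetical hook."""
    raw = read_block(FREQ_BASE_REG, 2 * NUM_CHANNELS)
    return decode_frequencies(raw)


if __name__ == "__main__":
    # Fake register image: 15 channels at 20 Hz, channel 15 silent.
    regs = struct.pack(f"<{NUM_CHANNELS}H", *([20] * 15 + [0]))
    print(poll_once(lambda reg, n: regs[reg:reg + n]))
```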
Further, step S31 specifically comprises the following steps:
S311: the second CPLD chip sets a sampling period using an internal timer;
S312: in each sampling period, the second CPLD chip uses the level-change edges on each second GPIO pin to start and stop counters, thereby measuring the high-level duration T_H and the low-level duration T_L of that pin;
S313: the second CPLD chip calculates the signal frequency measured on each second GPIO pin as F = 1/(T_H + T_L). Since each signal is a square wave, its period is the sum of the high-level and low-level durations, and the frequency is the reciprocal of that period.
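The edge-timing method of S311 to S313 can be sketched in Python as follows, operating on a list of pin levels sampled once per timer tick. This is an illustration under our own assumptions (function name, input format), not the CPLD implementation; it skips the first, possibly partial, level run because the sampling window may begin mid-phase.

```python
from typing import Optional


def measure_frequency(samples: list[int], tick_us: float) -> Optional[float]:
    """S312/S313 on a window of pin levels sampled once per timer tick.

    Level-change edges start and stop the counters. The run before the first
    edge is ignored; the two complete runs after it are one high phase and
    one low phase (in either order), so their sum is T_H + T_L and
    F = 1 / (T_H + T_L). Returns None if no complete period is seen.
    """
    edges = [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]
    if len(edges) < 3:
        return None
    run_a_us = (edges[1] - edges[0]) * tick_us
    run_b_us = (edges[2] - edges[1]) * tick_us
    return 1e6 / (run_a_us + run_b_us)   # period in µs -> frequency in Hz


if __name__ == "__main__":
    # 20 Hz square wave sampled every 10 µs: 2500 ticks low, 2500 ticks high.
    wave = ([0] * 2500 + [1] * 2500) * 3
    print(measure_frequency(wave, 10))   # 20.0
```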
Further, step S34 specifically comprises the following steps:
S341: the BMC obtains the GPU-box-side physical channel signal frequency table;
S342: the BMC locates each frequency value in the internal registers and its server-side physical channel;
S343: the BMC looks up the GPU box physical channel code corresponding to each located frequency value in the GPU-box-side physical channel signal frequency table, generating the cable connection relationship between the server-side physical channels and the GPU box physical channels;
S344: when the BMC cannot obtain a frequency value for a server-side physical channel, it determines that no cable is plugged into that channel, lights the LED alarm indicator, reports the error through the management network, and records a corresponding error log;
S345: when the BMC obtains frequency values for all server-side physical channels but some frequency value is not found in the GPU-box-side physical channel signal frequency table, or the same frequency value appears more than once, the BMC determines that a frequency error has occurred, lights the LED alarm indicator, actively reports the error through the management network, and records a corresponding error log.
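The decision rules of S341 to S345 can be summarized in a short Python sketch. The data shapes (dictionaries keyed by channel number, `None` for a channel with no measurable frequency) and the error strings are hypothetical; lighting the LED, reporting over the management network and logging are left out and would be triggered whenever the returned error list is non-empty.

```python
from typing import Optional


def check_cabling(measured: dict[int, Optional[int]],
                  gpu_table: dict[int, int]) -> tuple[dict[int, int], list[str]]:
    """Apply S341-S345. `measured` maps server channel -> frequency in Hz
    (None if nothing was measured); `gpu_table` maps GPU-box channel ->
    preset frequency. Returns (connections, errors)."""
    freq_to_gpu = {f: ch for ch, f in gpu_table.items()}  # invert S341 table

    # S344: a channel with no frequency has no cable plugged in.
    errors = [f"server channel {srv}: no cable"
              for srv, f in sorted(measured.items()) if f is None]
    if errors:
        return {}, errors

    # S345: every frequency must appear in the table, and only once.
    first_seen: dict[int, int] = {}
    for srv, f in sorted(measured.items()):
        if f not in freq_to_gpu:
            errors.append(f"server channel {srv}: unknown frequency {f} Hz")
        elif f in first_seen:
            errors.append(f"server channels {first_seen[f]} and {srv}: "
                          f"duplicate frequency {f} Hz")
        else:
            first_seen[f] = srv
    if errors:
        return {}, errors

    # S342/S343: resolve each server channel to its GPU-box channel.
    return {srv: freq_to_gpu[f] for srv, f in measured.items()}, []


if __name__ == "__main__":
    table = {ch: 10 + 5 * ch for ch in range(4)}   # hypothetical presets
    print(check_cabling({0: 10, 1: 20, 2: 15, 3: 25}, table))
```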
In a second aspect, the present invention provides an apparatus for automatically checking the misplugging of interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module, configured to arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, with the first end of each in-place signal line connected to a first CPLD chip of the GPU box and the second end connected to a second CPLD chip of the server;
a frequency signal sending module, configured for the first CPLD chip to generate signals of different frequencies on its GPIO pins and send them to the second CPLD chip through the connected in-place signal lines;
and an interconnection cable fault checking module, configured for a BMC arranged in the server to analyze the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, check whether each actual frequency equals the corresponding preset frequency, and thereby determine whether any interconnection cable is misplugged and locate the serial number of each misplugged cable.
Further, the in-place signal line setting module includes:
an in-place signal line setting unit, configured to arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
an in-place signal line and GPU box connection setting unit, configured to connect the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
an in-place signal line and server connection setting unit, configured to connect the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
and a BMC and alarm setting unit, configured to connect the BMC in the server to the second CPLD chip through an I2C signal line and to connect a third GPIO pin of the BMC to an LED alarm indicator.
Further, the frequency signal sending module includes:
a frequency square wave generating unit, configured for the first CPLD chip to control the level state of each first GPIO pin so as to generate square-wave signals of different frequencies;
and a frequency square wave GPU box end storage unit, configured for the first CPLD chip to store, in a GPU-box-side physical channel signal frequency table, the square-wave frequency generated by each first GPIO pin together with the corresponding GPU box physical channel.
Further, the interconnection cable fault checking module includes:
a server-side signal frequency acquisition unit, configured for each second GPIO pin of the second CPLD chip to independently measure the frequency of the in-place signal line connected to it;
a server-side frequency signal storage unit, configured for the second CPLD chip to store the frequency values measured by the second GPIO pins in internal registers, each corresponding to a server-side physical channel;
a BMC frequency timing acquisition unit, configured for the BMC to periodically read the internal registers through the I2C interface to obtain all frequency values;
and a cable connection relation determining unit, configured for the BMC to resolve the GPU box physical channel code corresponding to each frequency value from the correspondence between GPU box physical channels and frequency values in the GPU-box-side physical channel signal frequency table, and to determine the cable connection relationship between the server physical channels and the GPU box physical channels.
The beneficial effects of the invention are as follows.
The method and the device provided by the invention enable debugging and error localization after the server and the GPU box are assembled, and can effectively identify loose, wrong and missing cable connections. When the complete machine is tested after assembly, cable errors can be identified in the initial power-on stage and the current cable connection relationship is output, so that an operator can easily correct the errors; at the same time, PCIe communication problems caused by misplugged cables are avoided. The invention also provides active alarming for cable plugging errors: when a cable error is detected, the server actively reports the alarm details and records a system log, which facilitates inspection and maintenance by machine-room staff.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a first flowchart illustrating a method for automatically checking for mis-insertion of a cable interconnecting a server and a GPU box according to the present invention.
FIG. 2 is a second flowchart illustrating a method for automatically checking the misplugging of the interconnection cable between the server and the GPU box according to the present invention.
FIG. 3 is a schematic diagram of an apparatus for automatically checking the misplugging of the interconnection cables between the server and the GPU box according to the present invention.
FIG. 4 is a schematic diagram of the interconnection cable between the server and the GPU box according to the present invention.
Fig. 5 is a schematic diagram of a signal frequency table corresponding to a physical channel of a GPU box in embodiment 4 of the present invention.
Fig. 6 is a schematic diagram of a cable connection table between a server and a GPU box in embodiment 4 of the present invention.
In the figures: 1 - in-place signal line setting module; 1.1 - in-place signal line setting unit; 1.2 - in-place signal line and GPU box connection setting unit; 1.3 - in-place signal line and server connection setting unit; 1.4 - BMC and alarm setting unit; 2 - frequency signal sending module; 2.1 - frequency square wave generating unit; 2.2 - frequency square wave GPU box end storage unit; 3 - interconnection cable fault checking module; 3.1 - server-side signal frequency acquisition unit; 3.2 - server-side frequency signal storage unit; 3.3 - BMC frequency timing acquisition unit; 3.4 - cable connection relation determining unit.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
As shown in FIG. 1, the present invention provides a method for automatically checking misplugging of interconnection cables between a server and a GPU box, comprising the following steps:
S1: arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, connect the first end of each in-place signal line to a first CPLD chip of the GPU box, and connect the second end to a second CPLD chip of the server;
S2: the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip through the connected in-place signal lines;
S3: a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip and checks whether each actual frequency equals the corresponding preset frequency, thereby determining whether any interconnection cable is misplugged and locating the serial number of each misplugged cable.
Example 2:
As shown in FIG. 2, the present invention provides a method for automatically checking misplugging of interconnection cables between a server and a GPU box, comprising the following steps:
S1: arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, connect the first end of each in-place signal line to a first CPLD chip of the GPU box, and connect the second end to a second CPLD chip of the server; the specific steps are:
S11: arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
S12: connect the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
S13: connect the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
S14: connect the BMC in the server to the second CPLD chip through an I2C signal line, and connect a third GPIO pin of the BMC to an LED alarm indicator;
S2: the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip through the connected in-place signal lines; the specific steps are:
S21: the first CPLD chip controls the level state of each first GPIO pin to generate square-wave signals of different frequencies;
S22: the first CPLD chip stores, in a GPU-box-side physical channel signal frequency table, the square-wave frequency generated by each first GPIO pin together with the corresponding GPU box physical channel;
S3: a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip and checks whether each actual frequency equals the corresponding preset frequency, thereby determining whether any interconnection cable is misplugged and locating the serial number of each misplugged cable; the specific steps are:
S31: each second GPIO pin of the second CPLD chip independently measures the frequency of the in-place signal line connected to it;
S32: the second CPLD chip stores the frequency measured by each second GPIO pin in an internal register; each register corresponds to one server-side physical channel;
S33: the BMC periodically reads the internal registers through the I2C interface to obtain all frequency values;
S34: using the correspondence between GPU box physical channels and frequency values in the GPU-box-side physical channel signal frequency table, the BMC resolves the GPU box physical channel code corresponding to each frequency value and determines the cable connection relationship between the server physical channels and the GPU box physical channels.
Example 3:
As shown in FIG. 2, the present invention provides a method for automatically checking misplugging of interconnection cables between a server and a GPU box, comprising the following steps:
S1: arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box, connect the first end of each in-place signal line to a first CPLD chip of the GPU box, and connect the second end to a second CPLD chip of the server; the specific steps are:
S11: arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
S12: connect the first end of each in-place signal line to a first GPIO pin of the first CPLD chip of the GPU box;
S13: connect the second end of each in-place signal line to a second GPIO pin of the second CPLD chip of the server;
S14: connect the BMC in the server to the second CPLD chip through an I2C signal line, and connect a third GPIO pin of the BMC to an LED alarm indicator;
S2: the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip through the connected in-place signal lines; the specific steps are:
S21: the first CPLD chip controls the level state of each first GPIO pin to generate square-wave signals of different frequencies;
S22: the first CPLD chip stores, in a GPU-box-side physical channel signal frequency table, the square-wave frequency generated by each first GPIO pin together with the corresponding GPU box physical channel;
S3: a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip and checks whether each actual frequency equals the corresponding preset frequency, thereby determining whether any interconnection cable is misplugged and locating the serial number of each misplugged cable; the specific steps are:
S31: each second GPIO pin of the second CPLD chip independently measures the frequency of the in-place signal line connected to it; the specific steps are:
S311: the second CPLD chip sets a sampling period using an internal timer;
S312: in each sampling period, the second CPLD chip uses the level-change edges on each second GPIO pin to start and stop counters, thereby measuring the high-level duration T_H and the low-level duration T_L of that pin;
S313: the second CPLD chip calculates the signal frequency measured on each second GPIO pin as F = 1/(T_H + T_L);
S32: the second CPLD chip stores the frequency measured by each second GPIO pin in an internal register; each register corresponds to one server-side physical channel;
S33: the BMC periodically reads the internal registers through the I2C interface to obtain all frequency values;
S34: using the correspondence between GPU box physical channels and frequency values in the GPU-box-side physical channel signal frequency table, the BMC resolves the GPU box physical channel code corresponding to each frequency value and determines the cable connection relationship between the server physical channels and the GPU box physical channels; the specific steps are:
S341: the BMC obtains the GPU-box-side physical channel signal frequency table;
S342: the BMC locates each frequency value in the internal registers and its server-side physical channel;
S343: the BMC looks up the GPU box physical channel code corresponding to each located frequency value in the GPU-box-side physical channel signal frequency table, generating the cable connection relationship between the server-side physical channels and the GPU box physical channels;
S344: when the BMC cannot obtain a frequency value for a server-side physical channel, it determines that no cable is plugged into that channel, lights the LED alarm indicator, reports the error through the management network, and records a corresponding error log;
S345: when the BMC obtains frequency values for all server-side physical channels but some frequency value is not found in the GPU-box-side physical channel signal frequency table, or the same frequency value appears more than once, the BMC determines that a frequency error has occurred, lights the LED alarm indicator, actively reports the error through the management network, and records a corresponding error log.
Example 4:
Applying Embodiment 3 to the application scenario shown in FIG. 4, the server and the GPU box communicate over four PCIe x16 buses, namely ports A, B, C and D. Each port uses 4 cable bundles, and one Cable_Present signal in each bundle serves as the in-place signal line.
At the server end, the 16 second GPIO pins of the second CPLD chip are connected to the in-place signal lines in the 16 cable bundles. The BMC communicates with the second CPLD chip through the I2C interface and can read the input state of each second GPIO pin. A third GPIO pin of the BMC is connected to the LED alarm indicator and can turn it on and off.
At the GPU box end, the 16 first GPIO pins of the first CPLD chip are connected to the 16 Cable_Present lines serving as in-place signal lines. The first CPLD chip controls the level state of each first GPIO pin and can generate square-wave signals of different frequencies.
In the GPU box, the first CPLD chip generates square wave signals with different frequencies at each first GPIO pin, and 16 channel codes correspond to 16 frequencies as shown in fig. 5.
At the server end, each second GPIO pin collects its signal frequency independently. A 10 µs timer module is implemented in the second CPLD chip: a counter is started and stopped on the level-change edges of each pin to measure the high-level duration T_H and the low-level duration T_L, and the signal frequency is then computed as F = 1/(T_H + T_L). For example, when T_H = 25000 µs and T_L = 25000 µs, the period is 25 ms + 25 ms = 50 ms, so F = 1 ÷ (25 + 25) × 1000 = 20 Hz. The computed frequency value is rounded to an integer.
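A minimal Python sketch of this measurement, assuming the per-pin logic can be modeled as a list of pin levels sampled on each 10 µs timer tick. The helper names are hypothetical. Rounding is half-up, which is one reading of "rounded to an integer"; Python's built-in `round` would instead round halves to even.

```python
from typing import List, Optional, Tuple

TICK_US = 10  # 10 µs timer module, per Example 4


def measure_high_low(samples: List[int]) -> Optional[Tuple[int, int]]:
    """Return (T_H, T_L) in timer ticks for the first full period after a rising edge.

    samples holds the pin level read on each timer tick. Returns None when no
    complete period is seen, e.g. for a pin stuck high or low.
    """
    # Indices where the level changes; edges alternate rising/falling.
    edges = [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]
    for k, i in enumerate(edges):
        if samples[i] == 1 and k + 2 < len(edges):
            rise, fall, next_rise = edges[k], edges[k + 1], edges[k + 2]
            return fall - rise, next_rise - fall
    return None


def frequency_hz(t_high: int, t_low: int, tick_us: int = TICK_US) -> int:
    """F = 1 / (T_H + T_L), rounded half-up to an integer number of Hz."""
    period_us = (t_high + t_low) * tick_us
    if period_us <= 0:
        raise ValueError("period must be positive")
    return int(1_000_000 / period_us + 0.5)
```

For the worked example, 25000 µs is 2500 ticks of 10 µs, and `frequency_hz(2500, 2500)` gives 20. A pin that never toggles yields `None`, which corresponds to the "no frequency value" case handled in S344.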
The second CPLD chip stores the frequency values collected by the 16 second GPIO pins in internal registers, one per server-end physical channel. The BMC reads the second CPLD chip over the I2C interface every 2 seconds to obtain all frequency values. Then, using the correspondence between GPU box physical channels and frequency values in fig. 5, it resolves the corresponding GPU box physical channel codes and thereby determines the cable connection relation between each server-end physical channel and its GPU box physical channel.
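The periodic BMC read can be sketched in Python as below. The register layout (one 16-bit big-endian value per channel, 0 meaning "no frequency") and the `read_block(offset, length)` callback, which stands in for the BMC's I2C block read of the second CPLD chip, are hypothetical; the patent specifies neither.

```python
import time
from typing import Callable, List, Optional

NUM_CHANNELS = 16
POLL_INTERVAL_S = 2.0  # Example 4: the BMC reads the CPLD every 2 seconds

# Hypothetical register layout: channel n's frequency (Hz) is a 16-bit
# big-endian value at byte offset 2*n; 0 means "no frequency acquired".
REG_BYTES = 2


def decode_registers(raw: bytes) -> List[Optional[int]]:
    """Turn one block read of the CPLD registers into per-channel values."""
    if len(raw) != NUM_CHANNELS * REG_BYTES:
        raise ValueError(f"expected {NUM_CHANNELS * REG_BYTES} bytes, got {len(raw)}")
    values: List[Optional[int]] = []
    for n in range(NUM_CHANNELS):
        value = int.from_bytes(raw[n * REG_BYTES:(n + 1) * REG_BYTES], "big")
        values.append(value if value else None)
    return values


def poll(
    read_block: Callable[[int, int], bytes],
    handle: Callable[[List[Optional[int]]], None],
    cycles: int,
    interval_s: float = POLL_INTERVAL_S,
) -> None:
    """Read all channel registers every interval_s seconds, `cycles` times.

    read_block(offset, length) is a hypothetical stand-in for the BMC's
    I2C block read of the second CPLD chip.
    """
    for _ in range(cycles):
        handle(decode_registers(read_block(0, NUM_CHANNELS * REG_BYTES)))
        time.sleep(interval_s)
```

Passing the I2C access in as a callback keeps the decoding testable without hardware; `handle` is where the frequency-to-channel lookup against the fig. 5 table would run.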
Once the connections of all channel cables have been determined, the server BMC outputs a server/GPU box cable connection table, as shown in fig. 6, to guide the operator in re-ordering the cables.
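A minimal sketch of how such a table could also carry re-ordering guidance. The row format, the "move to server ..." action text and the default intended pairing (server channel Xn mates with GPU box channel Xn) are hypothetical; fig. 6 defines the actual table.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, str]


def connection_table(
    connections: Dict[str, str],
    intended: Optional[Dict[str, str]] = None,
) -> List[Row]:
    """Build (server channel, GPU box channel, action) rows.

    connections: measured server channel -> GPU box channel.
    intended:    required server channel -> GPU box channel pairing; defaults
                 (hypothetically) to pairing equal channel codes.
    """
    if intended is None:
        intended = {ch: ch for ch in connections}
    server_for_gpu = {gpu: srv for srv, gpu in intended.items()}
    rows: List[Row] = []
    for srv in sorted(connections):
        gpu = connections[srv]
        if intended.get(srv) == gpu:
            action = "OK"
        else:
            # The cable carrying this GPU box channel belongs on another server port.
            action = f"move to server {server_for_gpu.get(gpu, '?')}"
        rows.append((srv, gpu, action))
    return rows


def format_table(rows: List[Row]) -> str:
    """Render rows as plain text, one line per server channel."""
    header = ("Server channel", "GPU box channel", "Action")
    return "\n".join(" | ".join(row) for row in [header, *rows])
```

For two swapped cables, e.g. measured connections A2 to B1 and B1 to A2, both rows report which server port each cable should move to, while correctly seated channels show "OK".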
When the server-end BMC cannot obtain the frequency value of a physical channel, that channel is considered to have no cable plugged in; the BMC lights the alarm LED, actively reports the error through the management network, and records a corresponding error log.
When the server-end BMC obtains the frequency values of all physical channels but a frequency value is not found in the correspondence table, or a frequency value is repeated, the BMC concludes that a frequency error has occurred, lights the alarm LED, actively reports the frequency error through the management network, and records a corresponding error log.
The invention is broadly applicable and requires no change to the original cables. In common cable designs, an in-place signal line, Cable_Present, already serves as the cable presence-check pin, with its high/low level indicating whether the cable is fully seated. The invention reuses this Cable_Present pin unchanged.
Example 5:
As shown in fig. 3, the present invention provides an apparatus for automatically checking for misplugging of the interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module 1, configured to arrange an in-place signal line in each high-speed cable of each pair of PCIe ports that mate the server with the GPU box, the first end of each in-place signal line being connected to the first CPLD chip of the GPU box and the second end to the second CPLD chip of the server;
a frequency signal sending module 2, through which the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines;
and an interconnection cable fault checking module 3, through which the BMC in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determines whether each actual signal frequency equals the corresponding preset signal frequency, and thereby determines whether any interconnection cable is misplugged or missing and locates the number of each misplugged or missing cable.
Example 6:
As shown in fig. 3, the present invention provides an apparatus for automatically checking for misplugging of the interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module 1, configured to arrange an in-place signal line in each high-speed cable of each pair of PCIe ports that mate the server with the GPU box, the first end of each in-place signal line being connected to the first CPLD chip of the GPU box and the second end to the second CPLD chip of the server; the in-place signal line setting module 1 includes:
an in-place signal line setting unit 1.1, configured to arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
an in-place signal line and GPU box connection setting unit 1.2, configured to connect the first end of each in-place signal line to one first GPIO pin of the first CPLD chip of the GPU box;
an in-place signal line and server connection setting unit 1.3, configured to connect the second end of each in-place signal line to one second GPIO pin of the second CPLD chip of the server;
a BMC and alarm setting unit 1.4, configured to connect the BMC in the server to the second CPLD chip through an I2C signal line and to connect a third GPIO pin of the BMC to an LED alarm indicator;
a frequency signal sending module 2, through which the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines; the frequency signal sending module 2 includes:
a frequency square wave generating unit 2.1, through which the first CPLD chip controls the level state of each first GPIO pin to generate square wave signals of different frequencies;
a frequency square wave GPU box end storage unit 2.2, through which the first CPLD chip stores, in a GPU-box-end physical channel signal frequency table, the square wave frequency generated on each first GPIO pin together with the GPU box physical channel corresponding to that pin;
an interconnection cable fault checking module 3, through which the BMC in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determines whether each actual signal frequency equals the corresponding preset signal frequency, and thereby determines whether any interconnection cable is misplugged or missing and locates the number of each misplugged or missing cable; the interconnection cable fault checking module 3 includes:
a server-end signal frequency acquisition unit 3.1, through which each second GPIO pin of the second CPLD chip independently acquires the frequency of the in-place signal line connected to it;
a server-end frequency signal storage unit 3.2, through which the second CPLD chip stores the frequency value acquired by each second GPIO pin in an internal register corresponding to a server-end physical channel;
a BMC frequency timing acquisition unit 3.3, through which the BMC periodically reads the internal registers over the I2C interface to obtain all frequency values;
and a cable connection relation determining unit 3.4, through which the BMC resolves the GPU box physical channel code corresponding to each frequency value, according to the correspondence between GPU box physical channels and frequency values in the GPU-box-end physical channel signal frequency table, and determines the cable connection relation between the server physical channels and the GPU box physical channels.
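To show how the modules and units of Example 6 fit together, here is a toy Python model. It is a sketch under stated assumptions, not the apparatus itself: the class and method names are hypothetical, the hardware and cabling are simulated with dictionaries, and the frequencies are placeholders rather than the values of fig. 5.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GpuBoxCpld:
    """Module 2 (units 2.1, 2.2): one square-wave frequency per GPU box channel."""
    freq_table: Dict[str, int]  # GPU box channel -> Hz (placeholder values)

    def drive(self) -> Dict[str, int]:
        return dict(self.freq_table)


@dataclass
class ServerCpld:
    """Units 3.1, 3.2: one frequency register per server-end channel."""
    registers: Dict[str, Optional[int]] = field(default_factory=dict)

    def sample(self, wiring: Dict[str, Optional[str]], driven: Dict[str, int]) -> None:
        # wiring models module 1: server channel -> GPU box channel at the other
        # end of its in-place signal line, or None when no cable is plugged in.
        self.registers = {
            srv: (driven[gpu] if gpu is not None else None)
            for srv, gpu in wiring.items()
        }


@dataclass
class Bmc:
    """Units 3.3, 3.4: read the registers and resolve the connection relation."""
    freq_table: Dict[str, int]

    def resolve(self, registers: Dict[str, Optional[int]]) -> Dict[str, Optional[str]]:
        # Unknown frequencies also resolve to None here; the separate error
        # handling of S344/S345 is not modeled in this toy.
        by_freq = {freq: gpu for gpu, freq in self.freq_table.items()}
        return {
            srv: (by_freq.get(freq) if freq is not None else None)
            for srv, freq in registers.items()
        }
```

Usage: with `table = {"A1": 10, "A2": 20}`, sampling the crossed wiring `{"A1": "A2", "A2": "A1"}` fills the server registers with `{"A1": 20, "A2": 10}`, and `Bmc(table).resolve(...)` recovers the crossed connection relation.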
Although the present invention has been described in detail with reference to the drawings and the preferred embodiments, the invention is not limited thereto. Those skilled in the art may make various equivalent modifications or substitutions to the embodiments without departing from the spirit and scope of the invention; such modifications or substitutions, as well as any changes or substitutions readily conceivable by a person skilled in the art within the disclosed technical scope, fall within the scope of the invention. Therefore, the protection scope of the invention shall be defined by the claims.

Claims (10)

1. A method for automatically checking for misplugging of the interconnection cables between a server and a GPU box, characterized by comprising the following steps:
S1, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports that mate the server with the GPU box, connecting the first end of each in-place signal line to a first CPLD chip of the GPU box, and connecting its second end to a second CPLD chip of the server;
S2, the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines;
and S3, a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determines whether each actual signal frequency equals the corresponding preset signal frequency, thereby determines whether any interconnection cable is misplugged or missing, and locates the number of each misplugged or missing cable.
2. The method for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 1, wherein step S1 comprises the following steps:
S11, arranging an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
S12, connecting the first end of each in-place signal line to one first GPIO pin of the first CPLD chip of the GPU box;
S13, connecting the second end of each in-place signal line to one second GPIO pin of the second CPLD chip of the server;
S14, connecting the BMC in the server to the second CPLD chip through an I2C signal line, and connecting a third GPIO pin of the BMC to an LED alarm indicator.
3. The method for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 2, wherein step S2 comprises the following steps:
S21, the first CPLD chip controls the level state of each first GPIO pin to generate square wave signals of different frequencies;
and S22, the first CPLD chip stores, in a GPU-box-end physical channel signal frequency table, the square wave frequency generated on each first GPIO pin together with the GPU box physical channel corresponding to that pin.
4. The method for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 3, wherein step S3 comprises the following steps:
S31, each second GPIO pin of the second CPLD chip independently acquires the frequency value of the in-place signal line connected to it;
S32, the second CPLD chip stores the frequency value acquired by each second GPIO pin in an internal register corresponding to a server-end physical channel;
S33, the BMC periodically reads the internal registers over an I2C interface to obtain all frequency values;
and S34, the BMC resolves the GPU box physical channel code corresponding to each frequency value according to the correspondence between GPU box physical channels and frequency values in the GPU-box-end physical channel signal frequency table, and determines the cable connection relation between the server physical channels and the GPU box physical channels.
5. The method for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 4, wherein step S31 comprises the following steps:
S311, the second CPLD chip sets a sampling period based on an internal timer;
S312, at the sampling period, the second CPLD chip starts and stops counters on the level-change edges of each second GPIO pin, thereby measuring the high-level duration T_H and the low-level duration T_L of that pin;
and S313, the second CPLD chip calculates the signal frequency value acquired by each second GPIO pin as F = 1/(T_H + T_L).
6. The method for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 5, wherein step S34 comprises the following steps:
S341, the BMC obtains the GPU-box-end physical channel signal frequency table;
S342, the BMC locates each frequency value in the internal registers and its server-end physical channel;
S343, the BMC looks up, in the GPU-box-end physical channel signal frequency table, the GPU box physical channel code corresponding to each located frequency value, and generates the cable connection relation between the server-end physical channels and the GPU box physical channels;
S344, when the BMC cannot acquire the frequency value of a server-end physical channel, it determines that no cable is plugged into that channel, lights the LED alarm indicator, reports the error through the management network, and records a corresponding error log;
and S345, when the BMC has acquired the frequency values of all server-end physical channels but some frequency value is not found in the GPU-box-end physical channel signal frequency table, or the same frequency value appears more than once, the BMC determines that a frequency error has occurred, lights the LED alarm indicator, actively reports the error through the management network, and records a corresponding error log.
7. An apparatus for automatically checking for misplugging of the interconnection cables between a server and a GPU box, comprising:
an in-place signal line setting module (1), configured to arrange an in-place signal line in each high-speed cable of each pair of PCIe ports that mate the server with the GPU box, the first end of each in-place signal line being connected to a first CPLD chip of the GPU box and the second end to a second CPLD chip of the server;
a frequency signal sending module (2), through which the first CPLD chip generates signals of different frequencies on its GPIO pins and sends them to the second CPLD chip over the connected in-place signal lines;
and an interconnection cable fault checking module (3), through which a BMC arranged in the server analyzes the actual signal frequency of each in-place signal line connected to a GPIO pin of the second CPLD chip, determines whether each actual signal frequency equals the corresponding preset signal frequency, and thereby determines whether any interconnection cable is misplugged or missing and locates the number of each misplugged or missing cable.
8. The apparatus for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 7, wherein the in-place signal line setting module (1) comprises:
an in-place signal line setting unit (1.1), configured to arrange an in-place signal line in each high-speed cable of each pair of PCIe ports connecting the server and the GPU box;
an in-place signal line and GPU box connection setting unit (1.2), configured to connect the first end of each in-place signal line to one first GPIO pin of the first CPLD chip of the GPU box;
an in-place signal line and server connection setting unit (1.3), configured to connect the second end of each in-place signal line to one second GPIO pin of the second CPLD chip of the server;
and a BMC and alarm setting unit (1.4), configured to connect the BMC in the server to the second CPLD chip through an I2C signal line and to connect a third GPIO pin of the BMC to an LED alarm indicator.
9. The apparatus for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 7, wherein the frequency signal sending module (2) comprises:
a frequency square wave generating unit (2.1), through which the first CPLD chip controls the level state of each first GPIO pin to generate square wave signals of different frequencies;
and a frequency square wave GPU box end storage unit (2.2), through which the first CPLD chip stores, in a GPU-box-end physical channel signal frequency table, the square wave frequency generated on each first GPIO pin together with the GPU box physical channel corresponding to that pin.
10. The apparatus for automatically checking for misplugging of the interconnection cables between a server and a GPU box according to claim 7, wherein the interconnection cable fault checking module (3) comprises:
a server-end signal frequency acquisition unit (3.1), through which each second GPIO pin of the second CPLD chip independently acquires the frequency value of the in-place signal line connected to it;
a server-end frequency signal storage unit (3.2), through which the second CPLD chip stores the frequency value acquired by each second GPIO pin in an internal register corresponding to a server-end physical channel;
a BMC frequency timing acquisition unit (3.3), through which the BMC periodically reads the internal registers over an I2C interface to obtain all frequency values;
and a cable connection relation determining unit (3.4), through which the BMC resolves the GPU box physical channel code corresponding to each frequency value, according to the correspondence between GPU box physical channels and frequency values in the GPU-box-end physical channel signal frequency table, and determines the cable connection relation between the server physical channels and the GPU box physical channels.
CN202111065477.4A, priority date 2021-09-10, filing date 2021-09-10: Method and device for automatically checking misplugging of interconnection cables of server and GPU (graphics processing Unit) box. Legal status: Pending (CN113791368A).

Priority Applications (1)

Application Number Priority Date Filing Date
CN202111065477.4A 2021-09-10 2021-09-10

Publications (1)

Publication Number Publication Date
CN113791368A 2021-12-14

Family ID: 79183260 (family application: CN202111065477.4A; country status: CN)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1048279A (en) * 1996-08-01 1998-02-20 Nec Corp Method for indicating erroneous connection of cable and unit for detecting/indicating erroneous connection of cable
CN1776440A (en) * 2005-12-13 2006-05-24 北京潞电电气设备厂 Wiring identifying device and method
CN101290338A (en) * 2007-04-20 2008-10-22 株式会社东芝 Test system and test method for control cable
CN101865947A (en) * 2010-06-24 2010-10-20 四川长虹电器股份有限公司 AC input detection circuit and detection method of power supply of flat television
CN109726055A (en) * 2017-10-31 2019-05-07 杭州华为数字技术有限公司 Detect the method and computer equipment of PCIe chip exception
CN110988580A (en) * 2019-11-25 2020-04-10 国网四川省电力公司广安供电公司 Secondary cable alignment system and alignment method thereof
CN111176913A (en) * 2019-12-16 2020-05-19 苏州浪潮智能科技有限公司 Circuit and method for detecting Cable Port in server
CN112596983A (en) * 2020-12-30 2021-04-02 苏州浪潮智能科技有限公司 Monitoring method for connector in server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
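Yang Shixing et al. (杨世兴 等): 《检测监控系统原理与实用设计》 (Principles and Practical Design of Detection and Monitoring Systems), China Electric Power Press (中国电力出版社), pp. 67-69 *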

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination