TWI704463B - Server system and management method thereto - Google Patents

Server system and management method thereto

Info

Publication number
TWI704463B
Authority
TW
Taiwan
Prior art keywords
management controller
nodes
computing
rack management
switch
Prior art date
Application number
TW108111294A
Other languages
Chinese (zh)
Other versions
TW202036318A (en)
Inventor
褚方傑
詹鵬
Original Assignee
英業達股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 英業達股份有限公司
Priority to TW108111294A
Application granted
Publication of TWI704463B
Publication of TW202036318A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The disclosure provides a server system comprising the following elements. A plurality of computing nodes and a plurality of storage nodes begin operating after being activated. A switch is electrically connected to the computing nodes through a plurality of first ports and to the storage nodes through a plurality of second ports. A rack management controller is electrically connected to the computing nodes, the storage nodes, and the switch. When the rack management controller receives a hardware requirement, it controls the switch, according to that requirement, to connect at least a part of the computing nodes to at least a part of the storage nodes.

Description

Server system and management method

The invention relates to a server system and a management method, and in particular to a server system and a management method based on a rack management controller.

With the advent of the big-data era, more and more industries rely on servers to process large amounts of data, because servers offer strong computing power and large storage capacity and can serve internal or external networks over the Internet.

In general, the physical characteristics of a server's computing nodes and storage nodes (for example, the temperature, voltage, and power supply of each motherboard component) are monitored by a baseboard management controller (BMC), which passes the collected data to a rack management controller (RMC). In some server architectures, the rack management controller instead monitors the state of each computing node and storage node directly through a switch, which simplifies the server architecture and saves the cost of maintaining baseboard management controllers.

In such architectures, however, constraints such as component layout and switch specifications often leave only a single port connecting the switch to the rack management controller and to each node. When a particular node (for example, a computing node) or port fails, the server can no longer use the corresponding node (for example, a storage node connected to that computing node), which disrupts the work in progress.

A server system and management method that address these problems are therefore still needed.

The invention provides a server system and a management method in which the switch connects the rack management controller and the nodes through multiple ports. When a particular port or node fails, the rack management controller can control the nodes needed for operation through other ports. This gives a more effective way to manage the server system and improves on the problems of the prior art.

The invention provides a server system comprising the following elements. A plurality of computing nodes and a plurality of storage nodes begin operating after being activated. A switch is electrically connected to each of the computing nodes through a plurality of first ports and to each of the storage nodes through a plurality of second ports. A rack management controller is electrically connected to the computing nodes, the storage nodes, and the switch; on receiving a hardware requirement, it controls the switch, according to that requirement, to connect at least a part of the computing nodes to at least a part of the storage nodes.

The invention also provides a management method for a server system, comprising: activating a plurality of computing nodes and a plurality of storage nodes with a rack management controller; and, when the rack management controller receives a hardware requirement, controlling a switch with the rack management controller, according to that requirement, to connect at least a part of the computing nodes to at least a part of the storage nodes.

The invention provides a server system and a management method in which the switch connects the rack management controller and the nodes through multiple ports. When a particular port or node fails, the rack management controller can control the nodes needed for operation through other ports. The server system therefore offers a more effective way to manage servers and improves on the problems of the prior art.

The foregoing description of the disclosure and the following description of the embodiments serve to demonstrate and explain the spirit and principles of the invention and to provide further explanation of the scope of the claims.

The detailed features and advantages of the invention are described in the embodiments below in enough detail for anyone skilled in the relevant art to understand the technical content of the invention and put it into practice. From the content of this specification, the claims, and the drawings, anyone skilled in the relevant art can readily understand the objects and advantages of the invention. The following embodiments explain aspects of the invention in further detail but do not limit its scope in any respect.

Please refer to FIG. 1, a block diagram of a server system according to an embodiment of the invention. The server system includes a plurality of computing nodes 11, a plurality of storage nodes 12, a switch 13, and a rack management controller (RMC) 14.

Continuing with FIG. 1, the computing nodes 11 and the storage nodes 12 begin operating after being activated. Activation can occur when the system automatically sends a command to a node or when a user enters a command for the node manually. After receiving the command, the computing nodes 11 and storage nodes 12 carry out the corresponding operation (for example, a computing node 11 accesses data on a storage node 12 and computes on it). Specifically, a computing node 11 may be a central processing unit (CPU) or another component with computing capability, and a storage node 12 may be error-correcting code memory (ECC memory), registered memory (REG memory), or another component with storage capability; the invention is not limited to these.

Continuing with FIG. 1, the switch 13 is connected to the rack management controller 14 and to each node through multiple ports. Specifically, the switch 13 is electrically connected to each computing node 11 through a plurality of first ports 15 and to each storage node 12 through a plurality of second ports 16. The first ports 15 and second ports 16 may be ports whose hardware specification supports the inter-integrated circuit (I2C) bus. Depending on the server configuration, they may instead be ports that support other communication buses; this embodiment is not limited in this respect. In this embodiment, the switch 13 may be implemented with a SAS switch chip, and in one implementation of this embodiment the SAS switch chip is a PM8056. The specification and model of the switch 13 may nevertheless vary with the server configuration, and the invention is not limited in this respect.
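The port arrangement described above can be pictured as a small data model. The following is only a minimal sketch, assuming a switch that records which first port (computing side) is routed to which second port (storage side); the class name `Switch`, its fields, and the node identifiers are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy model of switch 13; every name here is a hypothetical illustration."""
    first_ports: dict   # first-port index -> computing node id
    second_ports: dict  # second-port index -> storage node id
    links: set = field(default_factory=set)  # (first_port, second_port) pairs

    def connect(self, first_port: int, second_port: int) -> None:
        """Route one computing-node port to one storage-node port."""
        if first_port not in self.first_ports:
            raise KeyError(f"unknown first port {first_port}")
        if second_port not in self.second_ports:
            raise KeyError(f"unknown second port {second_port}")
        self.links.add((first_port, second_port))

    def storage_reachable_from(self, compute_id: str) -> list:
        """List the storage nodes currently linked to a computing node."""
        return sorted(
            self.second_ports[s]
            for f, s in self.links
            if self.first_ports[f] == compute_id
        )


switch = Switch(first_ports={0: "compute-0", 1: "compute-1"},
                second_ports={0: "storage-0", 1: "storage-1"})
switch.connect(0, 1)
print(switch.storage_reachable_from("compute-0"))
```

Keeping the two port groups in separate tables mirrors the distinction the text draws between first ports 15 and second ports 16.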

Continuing with FIG. 1, the rack management controller 14 is electrically connected to the computing nodes 11, the storage nodes 12, and the switch 13. The rack management controller 14 receives a hardware requirement when the server is running its boot procedure or when it receives an external command. On receiving the hardware requirement, the rack management controller 14 controls the switch 13 to connect at least a part of the computing nodes 11 to at least a part of the storage nodes 12. For example, the hardware requirement may include the amount of data computation and the data needed for the job. The rack management controller 14 can determine the number of computing nodes 11 required from the amount of computation, and determine which storage nodes 12 to select from the data. Once it has selected the computing nodes 11 and storage nodes 12, the rack management controller 14 controls the switch 13 to connect the selected computing nodes 11 to the selected storage nodes 12, so that those computing nodes 11 can access the data on those storage nodes 12 and perform the computation.
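The selection step described above can be sketched in a few lines. This is an illustrative sketch only: the patent does not specify a selection algorithm, so the greedy strategy, the function name `select_nodes`, the capacity units, and the dataset labels are all assumptions.

```python
def select_nodes(required_compute, required_datasets, compute_capacity, storage_contents):
    """Pick computing nodes by capacity and storage nodes by the data they hold.

    Hypothetical inputs: compute_capacity maps node id -> capacity units,
    storage_contents maps node id -> set of dataset names stored there.
    """
    chosen, total = [], 0
    # Greedy assumption: take the largest computing nodes first until the demand is met.
    for node_id, capacity in sorted(compute_capacity.items(),
                                    key=lambda kv: kv[1], reverse=True):
        if total >= required_compute:
            break
        chosen.append(node_id)
        total += capacity
    if total < required_compute:
        raise RuntimeError("not enough computing capacity in the rack")
    # Select every storage node that holds at least one of the requested datasets.
    storage = sorted(
        node_id for node_id, datasets in storage_contents.items()
        if datasets & required_datasets
    )
    return chosen, storage


compute_ids, storage_ids = select_nodes(
    required_compute=10,
    required_datasets={"orders"},
    compute_capacity={"compute-0": 4, "compute-1": 8, "compute-2": 2},
    storage_contents={"storage-0": {"orders"}, "storage-1": {"logs"}},
)
print(compute_ids, storage_ids)
```

The returned pair is what the controller would then hand to the switch to set up the connections.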

In practice, the computing nodes 11 may include a complex programmable logic device (CPLD), a real-time clock (RTC), a temperature sensor, a field-replaceable unit (FRU), or other components that work with the computing node 11. Notably, in the disclosed server system the rack management controller 14 can collect information about each computing node 11 (for example, temperature, voltage, and CPLD firmware version) directly through the switch 13, so no separate baseboard management controller (BMC) is needed. The disclosed server system therefore both simplifies the server architecture and lowers the cost of maintaining the server.
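To illustrate the BMC-free monitoring described above, the sketch below polls each computing node through a reader function that stands in for the switch path. The function names, the field names, and the sample readings are all made up for illustration; the patent does not define a software interface.

```python
def collect_node_info(node_ids, read_via_switch):
    """Poll each computing node through the switch; record failures instead of aborting."""
    report = {}
    for node_id in node_ids:
        try:
            report[node_id] = read_via_switch(node_id)
        except OSError as exc:
            # A dead port or node should not stop the RMC from polling the others.
            report[node_id] = {"error": str(exc)}
    return report


def fake_read(node_id):
    """Stand-in for a real bus read; values are invented sample data."""
    if node_id == "compute-1":
        raise OSError("port not responding")
    return {"temperature_c": 41.5, "voltage_v": 12.0, "cpld_firmware": "1.0.3"}


report = collect_node_info(["compute-0", "compute-1"], fake_read)
print(report)
```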

On the other hand, when a computing node 11 is equipped with a complex programmable logic device, the operation of the computing node 11 is monitored by the rack management controller 14. In the conventional architecture, in which a baseboard management controller monitors each computing node, the complex programmable logic device is controlled by the baseboard management controller, so its firmware cannot be upgraded in-band. Under the architecture of the invention, the firmware of the complex programmable logic device supports both out-of-band and in-band upgrades. For an out-of-band upgrade, the firmware is sent to the switch 13 over the high-speed serial attached SCSI (SAS) topology network, and the upgrade then begins. For an in-band upgrade, the firmware is sent to the switch 13 through a port of the rack management controller 14, and the upgrade then begins; this port may be one whose hardware specification supports the I2C bus. In one implementation of this embodiment, the serial SCSI link is SAS 3.0; other versions may be used to suit different transfer-rate requirements, and this embodiment is not limited in this respect. As the above shows, the disclosed server system also makes firmware upgrades of the complex programmable logic device more convenient and lets users choose an upgrade method more flexibly.
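The choice between the two upgrade routes can be sketched as a simple fallback rule. This is a hypothetical illustration: the patent says both routes exist but does not say how one is chosen, so the preference order, the `UpgradePath` enum, and the function name are assumptions.

```python
from enum import Enum


class UpgradePath(Enum):
    # Labels follow the patent's own usage of "out-of-band" and "in-band".
    OUT_OF_BAND = "SAS topology network"
    IN_BAND = "RMC port (I2C-capable)"


def choose_upgrade_path(sas_link_up, rmc_port_up, preferred=UpgradePath.OUT_OF_BAND):
    """Return the preferred upgrade route if it is usable, otherwise the other one."""
    available = []
    if sas_link_up:
        available.append(UpgradePath.OUT_OF_BAND)
    if rmc_port_up:
        available.append(UpgradePath.IN_BAND)
    if not available:
        raise RuntimeError("no path to the switch is available")
    return preferred if preferred in available else available[0]


print(choose_upgrade_path(sas_link_up=False, rmc_port_up=True))
```

Having two routes means a firmware update can still proceed when one of them is down, which is the flexibility the paragraph above points to.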

It should be added that serial SCSI is a computer bus technology whose main function is to transfer data for computer peripherals (such as hard disks and CD-ROM drives). SAS is a serial SCSI specification that supports 2.5-inch hard disks and uses direct point-to-point serial transmission. The SAS 3.0 mentioned in the previous paragraph is third-generation SAS, which provides a transfer rate of 12.0 Gbps (12000 Mbps) per drive.
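The 12.0 Gbps figure is a line rate. As a rough worked example, the sketch below converts it to usable payload bandwidth, assuming the 8b/10b line encoding used by SAS 3.0; that encoding detail is not stated in the patent and is included here only as background.

```python
def payload_megabytes_per_second(line_rate_mbps, data_bits=8, coded_bits=10):
    """Convert a serial line rate to payload throughput in MB/s.

    Assumes 8b/10b encoding by default: every 8 data bits travel as 10 line bits.
    """
    payload_mbps = line_rate_mbps * data_bits // coded_bits  # megabits of payload per second
    return payload_mbps // 8                                  # 8 bits per byte


# 12000 Mbps is the per-drive SAS 3.0 rate quoted in the text.
print(payload_megabytes_per_second(12000))
```

Under that assumption, 12000 Mbps on the wire carries 9600 Mbps of payload, or about 1200 MB/s per link before protocol overhead.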

Please refer to FIG. 2, a flowchart of a management method for a server system according to an embodiment of the invention. In step S0, the rack management controller activates the computing nodes and the storage nodes. Specifically, when the server is powered on, the rack management controller activates the computing nodes and storage nodes so that they enter a standby state, ready for subsequent operation. Once the server has finished booting and generates a computation-related command, the method proceeds to step S1: according to the hardware requirement, the rack management controller controls the switch to connect at least a part of the computing nodes to at least a part of the storage nodes. Specifically, when the server generates a computation-related command, the rack management controller receives the hardware requirement associated with that command and, based on it, selects from all the nodes the computing nodes and storage nodes needed for the operation. Having made the selection, the rack management controller controls the switch to connect the selected computing nodes to the selected storage nodes so that the computation can be carried out.
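Steps S0 and S1 can be read as a two-state lifecycle for each node: standby after power-on, then active once the switch has linked it. The sketch below is a minimal illustration under that reading; the class, method, and state names are hypothetical, and a plain list stands in for the switch's connection table.

```python
from enum import Enum


class NodeState(Enum):
    OFF = "off"
    STANDBY = "standby"
    ACTIVE = "active"


class RackManagementController:
    """Hypothetical sketch of steps S0 and S1; not the patent's implementation."""

    def __init__(self, compute_ids, storage_ids):
        self.state = {n: NodeState.OFF for n in [*compute_ids, *storage_ids]}
        self.links = []  # (computing node, storage node) pairs set up on the switch

    def activate_all(self):
        """Step S0: on power-on, put every node into standby."""
        for node_id in self.state:
            self.state[node_id] = NodeState.STANDBY

    def handle_requirement(self, compute_ids, storage_ids):
        """Step S1: link the selected computing nodes to the selected storage nodes."""
        selected = [*compute_ids, *storage_ids]
        for node_id in selected:
            if self.state[node_id] is not NodeState.STANDBY:
                raise RuntimeError(f"{node_id} is not in standby")
        for c in compute_ids:
            for s in storage_ids:
                self.links.append((c, s))
        for node_id in selected:
            self.state[node_id] = NodeState.ACTIVE


rmc = RackManagementController(["compute-0", "compute-1"], ["storage-0"])
rmc.activate_all()
rmc.handle_requirement(["compute-0"], ["storage-0"])
print(rmc.links)
```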

Please refer to FIG. 3, a detailed flowchart of step S1 of the management method according to an embodiment of the invention. When the rack management controller receives the hardware requirement, the method proceeds to step S11: the rack management controller controls the switch to connect one of the computing nodes to one of the storage nodes. Here, the computing node and the storage node are the ones selected by the rack management controller. Once the connection is made, the method proceeds to step S12: the rack management controller determines whether the connected computing node can provide the amount of data computation required by the hardware requirement. If it can, the method proceeds to step S13: the computing node performs the computation on the data of the storage node, and the rack management controller monitors the operation of the computing node.

If the rack management controller determines that the connected computing node cannot provide the amount of data computation required by the hardware requirement, the method proceeds to step S14: the rack management controller controls the switch to connect another of the computing nodes to one of the storage nodes. Specifically, when the currently selected computing node cannot carry the required amount of computation, the rack management controller selects again, from the computing nodes not yet selected and according to the current amount of computation, so as to provide enough computing performance for the load. Thus, when the server's computation load rises suddenly (for example, when a holiday increases online shopping volume or an online game event increases network traffic), the rack management controller can respond immediately to the change and allocate computing nodes flexibly through the switch. Likewise, when a running computing node or a port in use fails suddenly, the rack management controller can immediately select another working computing node or port through the switch, so that the current computation can continue.
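The loop formed by steps S11 to S14 can be sketched as follows. This is a simplified illustration: it tries computing nodes one at a time and skips unhealthy ones, which is an assumption about how "another computing node" is chosen; it does not model combining several nodes. The function name and all parameters are hypothetical.

```python
def connect_with_fallback(demand, storage_id, candidates, capacity, healthy, connect):
    """Try computing nodes in order until one can carry the requested load."""
    for node_id in candidates:
        if not healthy.get(node_id, False):
            continue                      # skip a failed node or port
        connect(node_id, storage_id)      # S11 on the first pass, S14 afterwards
        if capacity[node_id] >= demand:   # S12: can this node carry the load?
            return node_id                # S13: this node runs the computation
    raise RuntimeError("no computing node can carry the requested load")


attempts = []
chosen = connect_with_fallback(
    demand=5,
    storage_id="storage-0",
    candidates=["compute-0", "compute-1", "compute-2"],
    capacity={"compute-0": 2, "compute-1": 4, "compute-2": 8},
    healthy={"compute-0": True, "compute-1": False, "compute-2": True},
    connect=lambda c, s: attempts.append((c, s)),
)
print(chosen, attempts)
```

In this run the first node is too small and the second is marked as failed, so the controller ends up on the third, which covers both the load-change and the failure cases the text describes.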

In summary, the invention provides a server system and a management method in which the switch connects the rack management controller and the nodes through multiple ports. When a particular port or node fails, the rack management controller can control the nodes needed for operation through other ports. In addition, the rack management controller can adjust the number of computing nodes and storage nodes in operation in real time as the amount of data computation changes. The server system therefore offers a more flexible way to manage servers and effectively improves on the problems of the prior art.

Although the invention is disclosed above through the foregoing embodiments, they are not intended to limit it. Changes and refinements made without departing from the spirit and scope of the invention fall within its scope of patent protection. For the scope of protection defined by the invention, please refer to the appended claims.

11: computing node 12: storage node 13: switch 14: rack management controller 15: first port 16: second port

FIG. 1 is a block diagram of a server system according to an embodiment of the invention. FIG. 2 is a flowchart of a management method for a server system according to an embodiment of the invention. FIG. 3 is a detailed flowchart of a management method for a server system according to an embodiment of the invention.


Claims (10)

1. A server system, comprising: a plurality of computing nodes and a plurality of storage nodes, which begin operating after being activated; a switch, electrically connected to each of the computing nodes through a plurality of first ports and to each of the storage nodes through a plurality of second ports; and a rack management controller, electrically connected to the computing nodes, the storage nodes, and the switch, which, on receiving a hardware requirement, controls the switch according to the hardware requirement to connect at least a part of the computing nodes to at least a part of the storage nodes.

2. The server system of claim 1, wherein the rack management controller, according to the hardware requirement, controls the switch to connect one of the computing nodes to one of the storage nodes and determines whether the connected computing node can provide the amount of data computation required by the hardware requirement; when the rack management controller determines that the connected computing node can provide the required amount of data computation, the computing node performs computation based on the data of the storage node; and when the rack management controller determines that the connected computing node cannot provide the required amount of data computation, the rack management controller controls the switch to connect another of the computing nodes to one of the storage nodes.

3. The server system of claim 1, wherein the hardware specification of the first ports and the second ports supports an inter-integrated circuit (I2C) bus.

4. The server system of claim 1, wherein the computing nodes include a complex programmable logic device.

5. The server system of claim 1, wherein the computing nodes include a real-time clock.

6. The server system of claim 1, wherein the computing nodes include a temperature sensor.

7. The server system of claim 1, wherein the computing nodes include a field-replaceable unit.

8. A management method for a server system, comprising: activating a plurality of computing nodes and a plurality of storage nodes with a rack management controller; and, when the rack management controller receives a hardware requirement, controlling a switch with the rack management controller, according to the hardware requirement, to connect at least a part of the computing nodes to at least a part of the storage nodes.

9. The management method of claim 8, wherein controlling the switch with the rack management controller, according to the hardware requirement, to connect at least a part of the computing nodes to at least a part of the storage nodes when the rack management controller receives the hardware requirement comprises: controlling the switch with the rack management controller to connect one of the computing nodes to one of the storage nodes; determining with the rack management controller whether the connected computing node can provide the amount of data computation required by the hardware requirement; when the rack management controller determines that the connected computing node can provide the required amount of data computation, performing computation with the computing node based on the data of the storage node; and when the rack management controller determines that the connected computing node cannot provide the required amount of data computation, controlling the switch with the rack management controller to connect another of the computing nodes to one of the storage nodes.

10. The management method of claim 8, wherein the switch is connected to the computing nodes and the storage nodes through an inter-integrated circuit (I2C) bus.

TW108111294A 2019-03-29 2019-03-29 Server system and management method thereto TWI704463B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW108111294A TWI704463B (en) 2019-03-29 2019-03-29 Server system and management method thereto

Publications (2)

Publication Number Publication Date
TWI704463B true TWI704463B (en) 2020-09-11
TW202036318A TW202036318A (en) 2020-10-01

Family

ID=73644194

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108111294A TWI704463B (en) 2019-03-29 2019-03-29 Server system and management method thereto

Country Status (1)

Country Link
TW (1) TWI704463B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102132255A (en) * 2008-05-29 2011-07-20 思杰***有限公司 Systems and methods for load balancing via a plurality of virtual servers upon failover using metrics from a backup virtual server
US20130010787A1 (en) * 2011-07-08 2013-01-10 Quanta Computer Inc. Rack server system
TW201327144A (en) * 2011-12-21 2013-07-01 Inventec Corp Method for managing cloud server system
CN107239346A (en) * 2017-06-09 2017-10-10 郑州云海信息技术有限公司 A kind of whole machine cabinet computing resource tank node and computing resource pond framework
TW201800952A (en) * 2016-06-16 2018-01-01 廣達電腦股份有限公司 System and method for chassis management
TW201905727A (en) * 2017-06-19 2019-02-01 廣達電腦股份有限公司 Method and system for configuring multi-chassis link and storage medium thereof

Similar Documents

Publication Publication Date Title
US10402207B2 (en) Virtual chassis management controller
US9208047B2 (en) Device hardware agent
US8948000B2 (en) Switch fabric management
US9804937B2 (en) Backup backplane management control in a server rack system
US8880937B2 (en) Reducing impact of a repair action in a switch fabric
US8745437B2 (en) Reducing impact of repair actions following a switch failure in a switch fabric
US9329653B2 (en) Server systems having segregated power circuits for high availability applications
JP2016536735A (en) Hard disk and management method
US20200314172A1 (en) Server system and management method thereto
US10852792B2 (en) System and method for recovery of sideband interfaces for controllers
US10853204B2 (en) System and method to detect and recover from inoperable device management bus
TWI704463B (en) Server system and management method thereto
US20240103836A1 (en) Systems and methods for topology aware firmware updates in high-availability systems
TWI611290B (en) Method for monitoring server racks
US20240103971A1 (en) Systems and methods for error recovery in rebootless firmware updates
KR102495712B1 (en) Methods for switching storage systems and operating modes of storage systems
TWI525449B (en) Server control method and chassis controller
US10409940B1 (en) System and method to proxy networking statistics for FPGA cards
US20190197003A1 (en) Systems and methods for managing serial attached small computer system interface (sas) traffic with storage monitoring
US20240103720A1 (en) SYSTEMS AND METHODS FOR SUPPORTING NVMe SSD REBOOTLESS FIRMWARE UPDATES
US20240095020A1 (en) Systems and methods for use of a firmware update proxy
US20240103825A1 (en) Systems and methods for score-based firmware updates
US20240103835A1 (en) Systems and methods for topology aware firmware updates
US20240103829A1 (en) Systems and methods for firmware update using multiple remote access controllers
US20240103846A1 (en) Systems and methods for coordinated firmware update using multiple remote access controllers