TW201816616A - Method for allocating stream computing task and control server - Google Patents

Info

Publication number
TW201816616A
Authority
TW
Taiwan
Prior art keywords
stream computing
server cluster
cluster
center server
computing center
Application number
TW106127334A
Other languages
Chinese (zh)
Other versions
TWI755417B (en)
Inventor
張釗
名浩 李
胡四海
陳友林
汪光煉
Original Assignee
香港商阿里巴巴集團服務有限公司
Application filed by 香港商阿里巴巴集團服務有限公司
Publication of TW201816616A
Application granted
Publication of TWI755417B

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
            • H04L41/06 Management of faults, events, alarms or notifications
              • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
                • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
          • H04L43/00 Arrangements for monitoring or testing data switching networks
            • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
            • H04L43/16 Threshold monitoring
          • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
            • H04L65/40 Support for services or applications
          • H04L67/00 Network arrangements or protocols for supporting network services or applications
            • H04L67/01 Protocols
              • H04L67/10 Protocols in which an application is distributed across nodes in the network
                • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
                  • H04L67/1004 Server selection for load balancing
                    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
                  • H04L67/1034 Reaction to server failures by a load balancer
                • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
                • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Hardware Redundancy (AREA)

Abstract

The present application provides a method for allocating a stream computing task and a control server. The method is applied to a control server connected to stream computing center server clusters and stream computing unit server clusters. The method comprises: allocating a stream computing task to a target stream computing center server cluster or a target stream computing unit server cluster; determining whether an abnormality has occurred in the target stream computing center server cluster or the target stream computing unit server cluster; and, if so, allocating the unfinished tasks of the stream computing task to a candidate stream computing center server cluster. In embodiments of the present application, when an abnormality occurs in a stream computing center server cluster or a stream computing unit server cluster, execution of the unfinished tasks can continue on an unaffected stream computing center server cluster, so that stream computing tasks are executed smoothly.

Description

Method for allocating stream computing tasks and control server

The present invention relates to the technical field of stream computing. It concerns a method for allocating stream computing tasks and a control server. It also concerns a method for executing stream computing tasks and a stream computing center server cluster. Finally, it concerns a stream computing system and a multi-site active-active system.

In stream computing, neither the arrival time nor the arrival order of data can be determined, and it is impossible to store all of the data. The servers involved therefore do not store the streaming data. Instead, they compute on the data in memory, in real time, as it arrives.

With the rapid growth of stream computing in the Internet big-data era, requirements on the timeliness, quality, service stability, and availability of streaming data have become increasingly demanding. These requirements also challenge traditional distributed web service systems.

A stream computing system performs a huge volume of real-time computation and reads a huge volume of data. Distributing stream computing tasks across multiple locations therefore raises many difficulties. Examples include merging de-duplicated statistics from different sites in real time, keeping data consistent across sites, and the fact that the geographic origin of data sources cannot be controlled. It is therefore necessary to achieve multi-region coordination and real-time disaster recovery for stream computing.

In the prior art, stream task allocation usually relies on remote cold standby. An idle server is deployed in another region, so that when the service in one region becomes unavailable, stream computing tasks can be temporarily restored onto that idle server. However, the idle server sits unused most of the time, which wastes a large amount of system resources.

Another approach deploys servers in a single machine room, or in several machine rooms in the same region, and stores the data of all machine rooms in one storage system. With this approach, the stream computing system becomes unavailable in any of the following cases:

- the network in that region becomes unavailable (for example, an optical cable is accidentally cut by construction machinery);
- the storage system in that region becomes unavailable;
- the machine resources in that region have reached their expansion limit.

In each case, the smooth allocation and subsequent execution of stream computing tasks cannot be guaranteed.

On this basis, the present invention provides a method for allocating stream computing tasks and a method for executing stream computing tasks. A single control server allocates all stream computing tasks uniformly. Stream computing center server clusters and stream computing unit server clusters deployed in multiple locations execute different stream computing tasks. Each stream computing center server cluster reserves preset computing resources. Data is synchronized among the central storage clusters, and the data in the unit storage cluster of each stream computing unit server cluster is also synchronized to every central storage cluster.

Consequently, when an abnormality occurs in a stream computing unit server cluster or a stream computing center server cluster, the unfinished part of the running stream computing task can be reallocated to a stream computing center server cluster elsewhere. The task is then quickly resumed and executed normally at another site. No idle server needs to be provisioned, which also saves system resources.

The present invention also provides a control server, a stream computing center server cluster, and a stream computing system, so that the above methods can be implemented and applied in practice.

To solve the above problems, the present invention discloses a method for allocating stream computing tasks. The method is applied to a control server connected to stream computing center server clusters and stream computing unit server clusters. Each stream computing center server cluster reserves a preset proportion of computing resources. The method includes:

- in response to receiving a stream computing task, allocating it to a target stream computing center server cluster or a target stream computing unit server cluster;
- while the target cluster is executing the task, determining whether an abnormality has occurred in the target stream computing center server cluster or target stream computing unit server cluster;
- if so, allocating the unfinished tasks of the stream computing task to a candidate stream computing center server cluster.
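The allocate / detect / reallocate cycle above can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation. The `Cluster` and `ControlServer` classes, the cluster names, and the "fewest assigned tasks" candidate policy are all hypothetical. Abnormality detection is modeled as an externally set `healthy` flag, and any task still assigned to an abnormal cluster is treated as unfinished.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of a stream computing center or unit server cluster."""
    name: str
    is_center: bool                   # only center clusters reserve spare capacity
    healthy: bool = True              # set by whatever abnormality detector is used
    tasks: set = field(default_factory=set)


class ControlServer:
    def __init__(self, clusters):
        self.clusters = {c.name: c for c in clusters}
        self.assignment = {}          # task id -> name of the executing cluster

    def allocate(self, task_id, target):
        # Assign a newly received task to the chosen target cluster.
        self.clusters[target].tasks.add(task_id)
        self.assignment[task_id] = target

    def complete(self, task_id):
        # A finished task is no longer tracked, so it is never reallocated.
        cluster = self.assignment.pop(task_id)
        self.clusters[cluster].tasks.discard(task_id)

    def check_and_failover(self):
        """Move unfinished tasks off every abnormal cluster; return the moves."""
        moved = {}
        for cluster in list(self.clusters.values()):
            if cluster.healthy or not cluster.tasks:
                continue
            for task_id in sorted(cluster.tasks):
                candidates = [c for c in self.clusters.values()
                              if c.is_center and c.healthy]
                if not candidates:
                    raise RuntimeError("no healthy center cluster available")
                # Stand-in policy (an assumption): fewest tasks, ties broken by name.
                target = min(candidates, key=lambda c: (len(c.tasks), c.name))
                target.tasks.add(task_id)
                self.assignment[task_id] = target.name
                moved[task_id] = target.name
            cluster.tasks.clear()
        return moved


ctrl = ControlServer([
    Cluster("center-hz", is_center=True),
    Cluster("center-sh", is_center=True),
    Cluster("unit-bj", is_center=False),
])
ctrl.allocate("t1", "unit-bj")
ctrl.allocate("t2", "unit-bj")
ctrl.allocate("t3", "center-hz")
ctrl.clusters["unit-bj"].healthy = False   # e.g. flagged by a missed heartbeat
moved = ctrl.check_and_failover()
print(moved)  # {'t1': 'center-sh', 't2': 'center-hz'}
```

Only center clusters are considered as failover targets. This follows the text, which reallocates unfinished work to a candidate stream computing center server cluster because those clusters hold reserved capacity.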

The method further includes: the control server periodically sends heartbeat messages to the stream computing center server clusters and to the stream computing unit server clusters. The heartbeat messages detect whether the control server can communicate with the stream computing center server clusters, and whether it can communicate with the stream computing unit server clusters. Correspondingly, determining whether an abnormality has occurred in the target stream computing center server cluster or target stream computing unit server cluster specifically means: determining whether the target cluster has failed to return a heartbeat response within a preset feedback time.
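The control-server side of this heartbeat check reduces to recording when each cluster last answered and flagging any cluster whose silence exceeds the preset feedback time. The sketch below assumes that reading. The `HeartbeatMonitor` class and its parameters are hypothetical, and the periodic sending loop and network transport are left out. Timestamps are passed in explicitly so the behavior is deterministic; a real deployment would typically use a monotonic clock such as `time.monotonic()`.

```python
class HeartbeatMonitor:
    """Hypothetical tracker of heartbeat responses on the control server.

    A cluster counts as abnormal when it has not answered a heartbeat within
    feedback_timeout seconds (the "preset feedback time" in the text).
    """

    def __init__(self, clusters, feedback_timeout, now):
        self.feedback_timeout = feedback_timeout
        # Treat every cluster as having answered at start-up time.
        self.last_response = {c: now for c in clusters}

    def on_response(self, cluster, now):
        # Called whenever a heartbeat response arrives from a cluster.
        self.last_response[cluster] = now

    def abnormal(self, now):
        """Return the clusters that missed the feedback window, sorted by name."""
        return sorted(c for c, t in self.last_response.items()
                      if now - t > self.feedback_timeout)


monitor = HeartbeatMonitor(["center-hz", "unit-bj"], feedback_timeout=3.0, now=0.0)
monitor.on_response("center-hz", now=2.0)
monitor.on_response("unit-bj", now=1.0)
print(monitor.abnormal(now=4.0))  # []
print(monitor.abnormal(now=4.5))  # ['unit-bj']
```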

Allocating the unfinished tasks of the stream computing task to a candidate stream computing center server cluster includes two steps. First, the control server obtains the load of the stream computing center server clusters in real time. Second, based on that load, the control server allocates the unfinished tasks to the stream computing center server cluster with the smallest current load.
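Selecting the least-loaded candidate is a simple minimum over the loads that the control server collects. In this sketch, the `pick_least_loaded` helper, the load metric (a fraction of capacity in use), and the cluster names are assumptions; the text does not say how load is measured. Clusters that are currently abnormal are excluded, and ties are broken by name so the choice is deterministic.

```python
def pick_least_loaded(loads, healthy):
    """Return the healthy center cluster with the smallest current load.

    loads: mapping cluster name -> current load, e.g. fraction of capacity in
    use (a hypothetical metric refreshed in real time by the control server).
    healthy: names of center clusters that are currently reachable.
    """
    candidates = {name: load for name, load in loads.items() if name in healthy}
    if not candidates:
        raise LookupError("no healthy stream computing center cluster")
    return min(candidates, key=lambda name: (candidates[name], name))


loads = {"center-hz": 0.62, "center-sh": 0.35, "center-sz": 0.20}
healthy = {"center-hz", "center-sh"}      # center-sz is itself abnormal
print(pick_least_loaded(loads, healthy))  # center-sh
```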

Each stream computing center server cluster has a central storage cluster. The central storage clusters of the different stream computing center server clusters synchronize intermediate state data and intermediate result data with one another. Each stream computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of every stream computing center server cluster.

The method further includes: the control server stores the execution status and configuration information of each stream computing task in a control database.

- The execution status indicates which part of each stream computing task has already been executed on the corresponding stream computing center server cluster or stream computing unit server cluster.
- The configuration information indicates the correspondence between each stream computing task and the stream computing center server cluster or stream computing unit server cluster that executes it.

Correspondingly, allocating the unfinished tasks to the stream computing center server cluster with the smallest current load includes: the control server computes the unfinished tasks of the stream computing task from the execution status and configuration information stored in the control database; and the control server allocates those unfinished tasks to the stream computing center server cluster with the smallest current load.
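The control database described above can be sketched with SQLite, using one table for configuration information (task to cluster) and one for execution status. The schema is an assumption made for illustration: it splits each task into numbered parts, such as input partitions, and marks each part done or not. The table and column names and the `unfinished_parts` / `reassign` helpers are hypothetical.

```python
import sqlite3


def open_control_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- configuration information: which cluster executes which task
        CREATE TABLE config (task_id TEXT PRIMARY KEY, cluster TEXT NOT NULL);
        -- execution status: which parts of each task are already executed
        CREATE TABLE status (task_id TEXT, part_no INTEGER, done INTEGER NOT NULL,
                             PRIMARY KEY (task_id, part_no));
    """)
    return db


def unfinished_parts(db, failed_cluster):
    """Return {task_id: [part_no, ...]} still to run for tasks on failed_cluster."""
    rows = db.execute("""
        SELECT s.task_id, s.part_no FROM status AS s
        JOIN config AS c ON c.task_id = s.task_id
        WHERE c.cluster = ? AND s.done = 0
        ORDER BY s.task_id, s.part_no
    """, (failed_cluster,)).fetchall()
    result = {}
    for task_id, part_no in rows:
        result.setdefault(task_id, []).append(part_no)
    return result


def reassign(db, task_ids, new_cluster):
    # Record the new owner so later failovers start from the right mapping.
    db.executemany("UPDATE config SET cluster = ? WHERE task_id = ?",
                   [(new_cluster, t) for t in task_ids])
    db.commit()


db = open_control_db()
db.executemany("INSERT INTO config VALUES (?, ?)",
               [("t1", "unit-bj"), ("t2", "center-hz")])
db.executemany("INSERT INTO status VALUES (?, ?, ?)",
               [("t1", 0, 1), ("t1", 1, 0), ("t1", 2, 0), ("t2", 0, 0)])
todo = unfinished_parts(db, "unit-bj")
print(todo)  # {'t1': [1, 2]}
reassign(db, todo, "center-sh")  # e.g. the least-loaded center cluster
```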

The present invention also provides a method for executing a stream computing task. The method is applied to any current stream computing center server cluster that reserves preset computing resources in a stream computing system. The stream computing system includes stream computing center server clusters, stream computing unit server clusters, and a control server.

Each stream computing center server cluster has a central storage cluster. The central storage clusters synchronize intermediate state data and intermediate result data with one another. The unit storage cluster of each stream computing unit server cluster synchronizes its intermediate state data and intermediate result data to every central storage cluster.

When an abnormality occurs in another stream computing center server cluster or in a stream computing unit server cluster of the system, the control server reallocates the unfinished tasks of the affected stream computing task. In response, the method includes:

- the current stream computing center server cluster obtains, from the central storage cluster, the intermediate state data and intermediate result data needed to execute the unfinished tasks;
- the current stream computing center server cluster executes the unfinished tasks using the preset computing resources, the intermediate state data, and the intermediate result data.
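The takeover step can be sketched for one concrete kind of task. The background section mentions de-duplicated statistics, so the example uses a distinct count. The intermediate state is the set of ids seen so far, the intermediate result is the running distinct count, and an input offset marks where the failed cluster stopped. The `Checkpoint` layout, the dict standing in for the central storage cluster, and `resume_distinct_count` are hypothetical choices, not the patent's data format.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Intermediate data for one task as replicated to the central storage clusters."""
    offset: int                              # index of the next input record to process
    seen: set = field(default_factory=set)   # intermediate state: ids seen so far
    distinct: int = 0                        # intermediate result: distinct count so far


def resume_distinct_count(central_storage, task_id, records):
    """Continue an unfinished distinct-count task from its stored checkpoint."""
    cp = central_storage[task_id]
    seen, distinct = set(cp.seen), cp.distinct  # work on copies of the stored data
    for offset in range(cp.offset, len(records)):
        if records[offset] not in seen:
            seen.add(records[offset])
            distinct += 1
        # Write progress back after every record; a real system would batch this.
        central_storage[task_id] = Checkpoint(offset + 1, set(seen), distinct)
    return distinct


records = ["a", "b", "a", "c", "b", "d"]
# The failed cluster had processed records[0:3] before its checkpoint was synced.
storage = {"t1": Checkpoint(offset=3, seen={"a", "b"}, distinct=2)}
result = resume_distinct_count(storage, "t1", records)
print(result)  # 4, the same as len(set(records))
```

Because the checkpoint carries both the state and the partial result, the cluster taking over produces the same answer as an uninterrupted run, without re-reading the records the failed cluster had already processed.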

The method further includes: in response to the heartbeat messages that the control server sends periodically, the current stream computing center server cluster periodically returns heartbeat responses to the control server. The heartbeat messages detect whether the control server and the current stream computing center server cluster can communicate.

The method further includes: the current stream computing center server cluster checks whether the number of consecutive failures to return a heartbeat response to the control server exceeds a preset threshold. If so, the current stream computing center server cluster stops executing the unfinished tasks.
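The center-cluster side of the heartbeat exchange, including the consecutive-failure rule, might look like the sketch below. The `HeartbeatResponder` class is hypothetical, and so is the `send_response` callable that reports whether a response reached the control server. "Exceeds" is read as strictly greater than the threshold. The source does not explain why the cluster stops; a common motivation for this kind of self-fencing is to avoid two clusters running the same task after the control server has reallocated it, but that is an assumption here.

```python
class HeartbeatResponder:
    """Hypothetical center-cluster side of the heartbeat protocol."""

    def __init__(self, send_response, max_failures):
        self.send_response = send_response   # callable -> True if the response was delivered
        self.max_failures = max_failures     # the preset consecutive-failure threshold
        self.consecutive_failures = 0
        self.running = True                  # whether the taken-over task keeps executing

    def on_heartbeat(self):
        """Answer one heartbeat; return whether the task is still running."""
        if self.send_response():
            self.consecutive_failures = 0    # any success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures > self.max_failures:
                self.running = False         # stop executing the unfinished task
        return self.running


outcomes = iter([True, False, False, False, False])
responder = HeartbeatResponder(lambda: next(outcomes), max_failures=3)
states = [responder.on_heartbeat() for _ in range(5)]
print(states)  # [True, True, True, True, False]
```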

The present invention also provides a control server. It is connected to stream computing center server clusters and stream computing unit server clusters, and each stream computing center server cluster reserves a preset proportion of computing resources. The control server includes:

- a first allocation unit, configured to allocate a received stream computing task to a target stream computing center server cluster or a target stream computing unit server cluster;
- a determination unit, configured to determine, while the target cluster is executing the stream computing task, whether an abnormality has occurred in the target stream computing center server cluster or target stream computing unit server cluster;
- a second allocation unit, configured to allocate the unfinished tasks of the stream computing task to a candidate stream computing center server cluster when the determination unit's result is yes.

The control server further includes a sending unit, configured to periodically send heartbeat messages to the stream computing center server clusters and to the stream computing unit server clusters. The heartbeat messages detect whether the control server can communicate with the stream computing center server clusters and with the stream computing unit server clusters. Correspondingly, the determination unit is specifically configured to determine whether the target stream computing center server cluster or target stream computing unit server cluster has failed to return a heartbeat response within a preset feedback time.

The second allocation unit includes:

- a load acquisition subunit, configured to obtain the load of the stream computing center server clusters and stream computing unit server clusters in real time;
- a first allocation subunit, configured to allocate, based on the load of each stream computing center server cluster, the unfinished tasks of the stream computing task to the stream computing center server cluster with the smallest current load.

Each stream computing center server cluster has a central storage cluster. The central storage clusters of the different stream computing center server clusters synchronize intermediate state data and intermediate result data with one another. Each stream computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of every stream computing center server cluster.

The control server further includes a storage unit, configured to store the execution status and configuration information of each stream computing task in a control database.

- The execution status indicates which part of each stream computing task has already been executed on the corresponding stream computing center server cluster or stream computing unit server cluster.
- The configuration information indicates the correspondence between each stream computing task and the stream computing center server cluster or stream computing unit server cluster that executes it.

The first allocation subunit includes:

- a computation subunit, configured to compute the unfinished tasks of the stream computing task from the execution status and configuration information stored in the control database;
- a second allocation subunit, configured to allocate the unfinished tasks to the stream computing center server cluster with the smallest current load.

The present invention also provides a stream computing center server cluster that reserves preset computing resources. The cluster is connected to a control server, which is also connected to stream computing unit server clusters.

The stream computing center server cluster has a central storage cluster. The central storage clusters synchronize intermediate state data and intermediate result data with one another. The unit storage clusters of the stream computing unit server clusters synchronize their intermediate state data and intermediate result data to the central storage clusters.

The stream computing center server cluster includes:

- a data acquisition unit: when an abnormality occurs in another stream computing center server cluster or in a stream computing unit server cluster of the system, the control server reallocates the unfinished tasks of the affected stream computing task. In response, this unit obtains from the central storage cluster the intermediate state data and intermediate result data required to execute those tasks;
- a task execution unit, configured to execute the unfinished tasks using the preset computing resources, the intermediate state data, and the intermediate result data.

The stream computing center server cluster further includes a feedback unit, configured to periodically return heartbeat responses to the control server in response to the heartbeat messages the control server sends periodically. The heartbeat messages detect whether the control server and the current stream computing center server cluster can communicate.

The stream computing center server cluster further includes:

- a detection unit, configured to detect whether the number of consecutive failures to send a heartbeat response to the control server exceeds a preset threshold;
- a stopping unit, configured to stop executing the unfinished tasks when the detection unit's result is yes.

The present invention also provides a stream computing system, including:

- stream computing center server clusters, stream computing unit server clusters, and a control server;
- a central storage cluster corresponding to each stream computing center server cluster;
- a control database corresponding to the control server;
- a unit storage cluster corresponding to each stream computing unit server cluster.

The present invention also provides a multi-site active-active system, including a first stream computing center server cluster, multiple stream computing unit server clusters, and a control server.

- The first stream computing center server cluster is the stream computing center server cluster described above, and the control server is the control server described above.
- The stream computing unit server clusters are deployed in multiple second geographic locations, respectively.
- The first stream computing center server cluster is deployed in a first geographic location, and the second geographic locations differ from the first geographic location.

The multi-site active-active system may further include a second stream computing center server cluster. It is deployed in a first geographic location different from that of the first stream computing center server cluster.
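The placement rules in this system can be expressed as a deployment check. The sketch assumes three readings of the text:

- center clusters must sit in distinct first locations;
- unit clusters must sit in distinct second locations;
- no unit cluster may share a location with a center cluster.

The `topology_problems` helper and the location labels are hypothetical.

```python
def topology_problems(centers, units):
    """Check a deployment against the geographic constraints described above.

    centers / units map cluster name -> location label (hypothetical labels).
    Returns human-readable violations; an empty list means the layout fits.
    """
    problems = []
    center_sites = list(centers.values())
    if len(set(center_sites)) != len(center_sites):
        problems.append("center clusters must be in different first locations")
    unit_sites = list(units.values())
    if len(set(unit_sites)) != len(unit_sites):
        problems.append("unit clusters must be in different second locations")
    for name, site in sorted(units.items()):
        if site in center_sites:
            problems.append(f"unit cluster {name} shares location {site} with a center cluster")
    return problems


ok = topology_problems({"center-1": "hangzhou", "center-2": "shanghai"},
                       {"unit-1": "beijing", "unit-2": "shenzhen"})
bad = topology_problems({"center-1": "hangzhou"}, {"unit-1": "hangzhou"})
print(ok)   # []
print(bad)  # ['unit cluster unit-1 shares location hangzhou with a center cluster']
```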

The present invention also provides a multi-site active-active system, including:

- a first stream computing center server, configured at least to provide computing resources externally, which includes a first central storage unit;
- a second stream computing center server, configured at least to provide computing resources externally, which includes a second central storage unit.

The first and second stream computing center servers balance load according to a unified load balancing policy, and the first and second central storage units act as hot backups for each other.

Consider a first stream computing task running on the first stream computing center server. If that server fails and can no longer provide computing resources, the task stops running there. It then continues running on the second stream computing center server, using the intermediate state data and intermediate result data held in the second server's second central storage unit.
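The mutual hot-backup arrangement can be sketched with two servers whose storage units mirror each other's writes. When the first server fails, the second resumes the task from its own replica instead of starting over. Everything here is hypothetical: the `CenterServer` class, the `run_sum` driver, and the use of a running sum as the task. The unified load balancing policy is not modeled, and the failure is simulated at a fixed record index.

```python
class CenterServer:
    """Hypothetical stream computing center server with its own central storage unit."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.storage = {}   # task id -> (next offset, intermediate result)
        self.peer = None    # the server whose storage unit backs ours up

    def save(self, task, offset, partial):
        # Mutual hot backup: write locally, then mirror to the peer if it is up.
        self.storage[task] = (offset, partial)
        if self.peer is not None and self.peer.up:
            self.peer.storage[task] = (offset, partial)


def run_sum(primary, backup, task, records, fail_at=None):
    """Run a running-sum task on primary; if it fails at fail_at, resume on backup."""
    server, offset, total = primary, 0, 0
    while offset < len(records):
        if server is primary and offset == fail_at:
            primary.up = False                                  # first server stops serving
            server = backup
            offset, total = backup.storage.get(task, (0, 0))    # state from backup's own replica
            continue
        total += records[offset]
        offset += 1
        server.save(task, offset, total)
    return server.name, total


first, second = CenterServer("center-1"), CenterServer("center-2")
first.peer, second.peer = second, first
name, total = run_sum(first, second, "t1", [1, 2, 3, 4, 5], fail_at=2)
print(name, total)  # center-2 15
```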

Compared with the prior art, the present invention has the following advantages:

In the embodiments of the present invention, a single control server uniformly allocates the tasks executed by the stream computing center server clusters and stream computing unit server clusters deployed in multiple locations. This achieves unified scheduling and allocation of stream computing tasks. Because data is synchronized in real time among the central storage clusters, clusters in different locations can compute different parts of the same stream computing task, or different stream computing tasks, at the same time.

With the embodiments of the present invention, when a stream computing center server cluster or stream computing unit server cluster in one location becomes abnormal, its running stream computing tasks can be quickly resumed on a stream computing center server cluster in another location. System resources are therefore not left idle during normal operation, and stream computing tasks remain active across multiple sites. Even when a local abnormality occurs, tasks can be quickly restored elsewhere, which gives the stream computing service high availability.

Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.

101‧‧‧Control server

102‧‧‧Stream computing center server cluster

103‧‧‧Stream computing unit server cluster

104‧‧‧Central storage cluster

105‧‧‧Unit storage cluster

201~205‧‧‧Steps

301~304‧‧‧Steps

401~406‧‧‧Steps

501‧‧‧First allocation unit

502‧‧‧Judgment unit

503‧‧‧Second allocation unit

601‧‧‧Data acquisition unit

602‧‧‧Task execution unit

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a scenario architecture diagram of the present invention in practical application;
FIG. 2 is a flowchart of an embodiment of the method for allocating stream computing tasks according to the present invention;
FIG. 3 is a flowchart of an embodiment of the method for executing stream computing tasks according to the present invention;
FIG. 4 is a method flowchart of a specific example of the present invention;
FIG. 5 is a structural block diagram of an embodiment of the control server of the present invention; and
FIG. 6 is a structural block diagram of an embodiment of the stream computing center server cluster of the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

To help those skilled in the art understand the technical terms used in the present invention, these terms are explained below.

A server cluster is a group of one or more servers that jointly provide the same service. To a client, it appears to be a single server. A server cluster can use multiple computers for parallel computation to achieve high computing speed. It can also use multiple computers as backups, so that the cluster as a whole keeps running normally even if any single computer fails.

A stream computing center server cluster is a server cluster used to execute stream computing tasks. Such a cluster must reserve a preset amount of computing resources. It stores the intermediate result data and intermediate state data generated while executing stream computing tasks in a central storage cluster.

A stream computing unit server cluster is likewise a server cluster used to execute stream computing tasks. It stores the intermediate result data and intermediate state data generated during execution in a unit storage cluster. Unlike a center cluster, it need not reserve preset computing resources.

A storage cluster aggregates the storage space of one or more storage devices into a storage pool, which provides server clusters with a unified access interface and a unified management interface. Through the unified access interface, a server cluster can transparently access and use the disks on all of the storage devices. A storage cluster can therefore make full use of the storage devices' performance and disk capacity.

A central storage cluster is a storage cluster that provides storage space for a stream computing center server cluster. A unit storage cluster is a storage cluster that provides storage space for a stream computing unit server cluster.

FIG. 1 is a scenario architecture diagram of the stream computing task allocation method of the present invention in practical application. The stream computing system shown in FIG. 1 may be configured with one control server 101, m stream computing center server clusters 102, and n stream computing unit server clusters 103, where m and n are each integers greater than 1. Preferably, two stream computing center server clusters 102 are configured. The control server 101 can allocate stream computing tasks to every stream computing center server cluster 102 and every stream computing unit server cluster 103. Each stream computing center server cluster 102 may reserve part of its computing resources, whereas the stream computing unit server clusters 103 need not reserve any. Suppose a stream computing center server cluster 102 or a stream computing unit server cluster 103 in the system becomes abnormal. The control server 101 can detect the abnormality and reassign the tasks that the abnormal cluster has not finished to another normal candidate stream computing center server cluster 102 for execution.
The stream computing unit server clusters 103 do not reserve computing resources. For this reason, when reassigning unfinished tasks, the control server 101 selects only normal stream computing center server clusters 102 as candidates and never selects a stream computing unit server cluster 103.

In FIG. 1, a stream computing task may be switched between different stream computing center server clusters 102, or from a stream computing unit server cluster 103 to a stream computing center server cluster 102. To ensure that execution continues consistently after such a switch, the central storage clusters 104 connected to the stream computing center server clusters 102 synchronize intermediate state data and intermediate result data with one another in real time. The unit storage cluster 105 connected to each stream computing unit server cluster 103 synchronizes its intermediate state data and intermediate result data to every central storage cluster 104. The unit storage clusters 105 do not need to synchronize with one another; synchronizing to the central storage clusters 104 is sufficient. This avoids the resources that synchronizing this data among the unit storage clusters 105 would otherwise consume. The control server 101 is also connected to a control database, which can store two kinds of information: the configuration information recorded when the control server 101 allocates tasks, and the execution status produced while tasks are executed.
The execution status can indicate the portion of each stream computing task that has already been completed on the corresponding stream computing center server cluster or stream computing unit server cluster. The configuration information can indicate which stream computing center server cluster, or which stream computing unit server cluster, is executing each stream computing task.
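The synchronization topology described above can be modeled as a set of directed replication edges. The sketch below is a minimal illustration of that model. The `replication_edges` helper and the storage-cluster names are hypothetical. It shows only which store pushes intermediate data to which; it does not show how replication is performed.

```python
from itertools import permutations


def replication_edges(center_stores, unit_stores):
    """Return (source, destination) pairs for intermediate-data synchronization.

    Hypothetical model of the FIG. 1 topology:
    - every central storage cluster syncs with every other one, in both directions;
    - every unit storage cluster syncs only to the central storage clusters,
      never to another unit storage cluster.
    """
    edges = set()
    # Central storage clusters replicate to one another in real time.
    for src, dst in permutations(center_stores, 2):
        edges.add((src, dst))
    # Unit storage clusters push their data to all central storage clusters.
    for unit in unit_stores:
        for center in center_stores:
            edges.add((unit, center))
    return edges


centers = ["center-hangzhou", "center-nanjing"]
units = ["unit-suzhou", "unit-xiamen", "unit-shenzhen"]
edges = replication_edges(centers, units)
print(len(edges))  # 8: two center<->center edges plus 3 * 2 unit->center edges
```

Skipping unit-to-unit links means that m center stores and n unit stores need m(m-1) + nm edges, rather than the (m+n)(m+n-1) edges of a full mesh.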

The stream computing center server clusters 102 may be deployed at the same first geographic location or, preferably, at different first geographic locations. A first geographic location may be a city, such as a municipality directly under the central government, a provincial capital, a prefecture-level city, or a county-level city; examples include Beijing, Hangzhou, and Nanjing. For instance, both stream computing center server clusters may be deployed in Hangzhou. Alternatively, one may be deployed in Hangzhou and the other at a different location such as Nanjing or Shanghai. The stream computing unit server clusters 103 may likewise be deployed at different second geographic locations, again including municipalities, provincial capitals, prefecture-level cities, and county-level cities, such as Suzhou, Xiamen, and Shenzhen. Here, a first geographic location is where a stream computing center server cluster 102 is deployed, and a second geographic location is where a stream computing unit server cluster 103 is deployed. In practice, wherever the clusters are deployed, the control server 101 allocates stream computing tasks to all of them.

With the application scenario introduced, refer to FIG. 2. It shows the flow of an embodiment of a method for allocating stream computing tasks in the application scenario of FIG. 1. This embodiment is applied to the control server in FIG. 1 and may include the following steps:

Step 201: The control server periodically sends heartbeat messages to each stream computing center server cluster and each stream computing unit server cluster.

In this embodiment, the control server is connected to every stream computing center server cluster and every stream computing unit server cluster. A heartbeat feedback mechanism is established between the control server and each of these clusters. The control server periodically sends a heartbeat message to each cluster to detect whether the control server and that cluster can communicate normally. Whether a cluster returns a heartbeat response shows whether it can communicate normally. If it cannot, this usually means the cluster has become abnormal and can no longer execute tasks normally.

Specifically, if the control server receives the heartbeat response from a stream computing center server cluster or stream computing unit server cluster, that cluster is considered able to communicate normally with the control server, and no abnormality has occurred. Otherwise, the cluster is considered unable to communicate normally with the control server, and an abnormality has occurred. Heartbeat messages may be sent at a fixed heartbeat interval, for example 1 second. Those skilled in the art can set this interval as needed.

Step 202: In response to receiving a stream computing task, the control server allocates the task to a target stream computing center server cluster or a target stream computing unit server cluster.

In practical applications, the control server can be operated by a system administrator. The control server can provide a human-computer interaction interface through which the administrator enters task instructions. According to those instructions, the control server sends the stream computing task to the stream computing center server cluster or stream computing unit server cluster designated by the administrator, which is the target stream computing center server cluster or target stream computing unit server cluster. Other methods can also be used to determine the target cluster. For example, the control server can select a stream computing center server cluster as the target stream computing center server cluster by polling (round-robin) or at random, or select a stream computing unit server cluster as the target stream computing unit server cluster in the same way.
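When no administrator designates a target, the control server may pick one by polling. The minimal round-robin sketch below uses a hypothetical `RoundRobinSelector` class and hypothetical cluster names. Random selection, also mentioned above, would replace the `next(...)` call with `random.choice(...)`.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hypothetical target selector that cycles through clusters in a fixed order."""

    def __init__(self, clusters):
        if not clusters:
            raise ValueError("at least one cluster is required")
        self._ring = cycle(clusters)

    def next_target(self, designated=None):
        # An administrator-designated target takes precedence over polling
        # and does not advance the round-robin position.
        if designated is not None:
            return designated
        return next(self._ring)


selector = RoundRobinSelector(["center-A", "center-B"])
picks = [selector.next_target() for _ in range(3)]
print(picks)  # ['center-A', 'center-B', 'center-A']
print(selector.next_target(designated="unit-C"))  # unit-C
```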

Optionally, step 203 may also be performed between step 202 and step 204:

Step 203: The control server stores the execution status and configuration information of each stream computing task in the control database.

In this embodiment, after allocating a stream computing task, the control server may optionally store the configuration information of each task in the control database connected to it. This information records which stream computing center server cluster, or which stream computing unit server cluster, is executing each task. The control server may also store in the control database the execution status of each task on its cluster. The execution status indicates the portion of the task that has already been completed on the corresponding cluster.

Step 204: While the target stream computing center server cluster or target stream computing unit server cluster is executing the stream computing task, determine whether the target cluster has become abnormal. If so, proceed to step 205; if not, continue performing this step.

After allocating the stream computing task, the control server checks in real time, while the target cluster executes the task, whether its connection to that cluster is normal. A normal connection means the target stream computing center server cluster or target stream computing unit server cluster has not become abnormal. An abnormal connection means the target cluster may have become abnormal. For example, the connection is abnormal if the control server receives no heartbeat response from the target cluster within a preset feedback time.

If the target stream computing unit server cluster contains only one stream computing unit server, an abnormality in that server requires proceeding to step 205. If the cluster contains multiple stream computing unit servers, the connection between the control server and the cluster is lost only when all of its servers are abnormal. Only then does this step determine that the whole stream computing unit server cluster is abnormal, for example when the machine room housing the cluster suffers a power outage or a fire. It is also possible that only some servers in the target cluster become abnormal, for example because one of them crashes. In that case, the unfinished part of the tasks running on the abnormal server is switched to other normal servers in the same cluster. The cluster's tasks can then still complete, and the cluster as a whole remains in normal operation.
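The distinction between a whole-cluster failure and a partial failure can be sketched as follows. The text does not say how the orphaned work is spread within the cluster. The round-robin redistribution, the function names, and the server and task identifiers below are all hypothetical.

```python
def cluster_is_abnormal(server_alive):
    """A cluster counts as abnormal only when none of its servers is alive.

    `server_alive` maps a (hypothetical) server name to True/False.
    """
    return not any(server_alive.values())


def failover_within_cluster(assignments, server_alive):
    """Move the unfinished work of dead servers onto live servers of the same cluster.

    `assignments` maps server name -> list of unfinished subtask ids.
    Work from dead servers is spread round-robin over the live servers
    (an assumed policy). Returns a new mapping containing only live servers.
    """
    live = [s for s, ok in server_alive.items() if ok]
    if not live:
        raise RuntimeError("whole cluster is down; escalate to step 205")
    result = {s: list(assignments.get(s, [])) for s in live}
    orphaned = [t for s, ok in server_alive.items() if not ok
                for t in assignments.get(s, [])]
    for i, task in enumerate(orphaned):
        result[live[i % len(live)]].append(task)
    return result


alive = {"s1": True, "s2": False, "s3": True}
work = {"s1": ["t1"], "s2": ["t2", "t3"], "s3": ["t4"]}
print(cluster_is_abnormal(alive))  # False
print(failover_within_cluster(work, alive))  # {'s1': ['t1', 't2'], 's3': ['t4', 't3']}
```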

The control server can also judge whether the target cluster is abnormal by checking whether a heartbeat response arrives within a preset feedback time after the heartbeat message sent in step 201. For example, if no heartbeat response is received from the target stream computing center server cluster or target stream computing unit server cluster for one continuous minute, the target cluster is determined to be abnormal and the method proceeds to step 205. If a heartbeat response is received within that minute, the target cluster is determined to be normal, and step 204 continues for real-time judgment.
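The timeout-based judgment above can be sketched on the control-server side as shown below. The sketch assumes a hypothetical `HeartbeatMonitor` class, a 60-second feedback timeout matching the one-minute example, and an injectable clock so that the behavior can be shown deterministically. Sending the heartbeat messages themselves is not shown.

```python
import time


class HeartbeatMonitor:
    """Hypothetical control-server-side heartbeat bookkeeping.

    A cluster is judged abnormal when no heartbeat response has arrived within
    `feedback_timeout` seconds (60 s here, matching the one-minute example).
    """

    def __init__(self, clusters, feedback_timeout=60.0, clock=time.monotonic):
        self._clock = clock
        self.feedback_timeout = feedback_timeout
        now = clock()
        # Treat every cluster as having just responded when monitoring starts.
        self._last_response = {c: now for c in clusters}

    def record_response(self, cluster):
        """Call when a heartbeat response arrives from `cluster`."""
        self._last_response[cluster] = self._clock()

    def abnormal_clusters(self):
        """Clusters whose last response is older than the feedback timeout."""
        now = self._clock()
        return sorted(c for c, t in self._last_response.items()
                      if now - t > self.feedback_timeout)


fake_now = [0.0]
monitor = HeartbeatMonitor(["center-A", "unit-B"], clock=lambda: fake_now[0])
fake_now[0] = 30.0
monitor.record_response("center-A")
fake_now[0] = 61.0
print(monitor.abnormal_clusters())  # ['unit-B']
```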

When a stream computing center server cluster or stream computing unit server cluster becomes abnormal, the control server can notify the system administrator, for example by raising an alarm. If the administrator confirms that the cluster is indeed abnormal, for example because of a network disconnection or a power failure, repair operations can be carried out. Once the cluster has been repaired, it can again be treated as a normal cluster and allocated stream computing tasks.

Step 205: Allocate the unfinished tasks of the stream computing task to a candidate stream computing center server cluster.

In this step, the unfinished tasks may be the remaining tasks of the stream computing task, that is, everything except the tasks already executed by the target stream computing center server cluster or target stream computing unit server cluster.

Specifically, to ensure that the unfinished tasks are executed quickly, they can be allocated to the stream computing center server cluster with the smallest current load. Accordingly, step 205 may include:

Step A1: The control server obtains the load of each of the stream computing center server clusters in real time.

In step A1, the control server can obtain the load of each stream computing center server cluster and each stream computing unit server cluster in real time. The load may be expressed by hardware parameter values such as CPU utilization, memory read speed, and disk input/output (I/O) performance. These values determine the load of each cluster. When a task later needs to be reassigned, it can then be allocated to a more lightly loaded cluster.
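The text does not say how the hardware parameters are combined into a single load figure. The sketch below assumes each metric is normalized to [0, 1] and combined with hypothetical weights. Mapping memory read speed to a "pressure" value is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    """Hypothetical per-cluster metrics, each normalized to [0, 1]."""
    cpu_utilization: float      # fraction of CPU capacity in use
    memory_pressure: float      # assumed: 1 - (observed read speed / nominal read speed)
    disk_io_utilization: float  # fraction of disk I/O capacity in use


# Assumed weights; the source does not specify how metrics are combined.
WEIGHTS = {"cpu_utilization": 0.5, "memory_pressure": 0.25, "disk_io_utilization": 0.25}


def load_score(sample: LoadSample) -> float:
    """Weighted sum of the normalized metrics; higher means more heavily loaded."""
    return sum(getattr(sample, name) * w for name, w in WEIGHTS.items())


busy = LoadSample(cpu_utilization=0.8, memory_pressure=0.4, disk_io_utilization=0.2)
print(round(load_score(busy), 2))
```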

In practical applications, stream computing unit server clusters do not need to reserve computing resources, but stream computing center server clusters do. If there are N stream computing center server clusters, where N is an integer greater than 1, the reserved computing resources may be N × 10%. This helps ensure that, when another stream computing center server cluster or a stream computing unit server cluster becomes abnormal, a normal stream computing center server cluster has enough computing resources to execute the tasks that the control server reassigns to it. The computing resources may be hardware resources such as CPU, memory, and disk. For example, a stream computing center server cluster may always keep 20% of its computing resources idle while executing the tasks allocated by the control server, which corresponds to N = 2. This idle 20% can then execute the unfinished tasks of other stream computing center server clusters or stream computing unit server clusters.
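The reservation rule above is simple arithmetic, shown here as a small sketch. The `can_absorb` check and the core counts are hypothetical illustrations, not part of the described method.

```python
def reserved_fraction(num_center_clusters: int, per_cluster_share: float = 0.10) -> float:
    """Fraction of a center cluster's resources kept idle: N x 10%, as in the text."""
    if num_center_clusters <= 1:
        raise ValueError("the text assumes N > 1 center clusters")
    return num_center_clusters * per_cluster_share


def can_absorb(idle_capacity: float, orphaned_demand: float) -> bool:
    """Hypothetical check: does a cluster's idle capacity cover the reassigned work?"""
    return orphaned_demand <= idle_capacity


frac = reserved_fraction(2)
print(f"{frac:.0%}")  # 20%
idle_cores = 100 * frac  # e.g. a hypothetical 100-core center cluster
print(can_absorb(idle_cores, orphaned_demand=15))  # True
```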

Step A2: The control server allocates the unfinished tasks of the stream computing task to the stream computing center server cluster with the smallest current load.

Using the loads obtained in step A1, the control server identifies the stream computing center server cluster with the smallest current load and allocates the unfinished tasks to it.

Specifically, based on the execution status and configuration information stored in step 203, step A2 may include:

Step A21: The control server determines the unfinished tasks of the stream computing task from the execution status and configuration information stored in the control database.

Suppose a target stream computing center server cluster or target stream computing unit server cluster becomes abnormal. The control server uses the configuration information to find which stream computing tasks that cluster was executing. It then uses the execution status to find which parts of each task are already complete, and from these derives the unfinished tasks.
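Step A21 can be sketched with an in-memory stand-in for the control database. The `ControlDatabase` class and the task and cluster names are hypothetical. The sketch also assumes that each stream task is divided into identifiable parts, which the text implies but does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ControlDatabase:
    """Hypothetical in-memory stand-in for the control database of step 203.

    configuration: stream task id -> cluster currently executing it
    execution_status: stream task id -> ids of parts already completed
    task_parts: stream task id -> all part ids of that task (assumed known)
    """
    configuration: dict = field(default_factory=dict)
    execution_status: dict = field(default_factory=dict)
    task_parts: dict = field(default_factory=dict)

    def assign(self, task_id, cluster, parts):
        self.configuration[task_id] = cluster
        self.task_parts[task_id] = list(parts)
        self.execution_status.setdefault(task_id, set())

    def mark_done(self, task_id, part):
        self.execution_status[task_id].add(part)

    def unfinished_work(self, failed_cluster):
        """Step A21: remaining parts of every task the failed cluster was running."""
        return {
            task_id: [p for p in self.task_parts[task_id]
                      if p not in self.execution_status[task_id]]
            for task_id, cluster in self.configuration.items()
            if cluster == failed_cluster
        }


db = ControlDatabase()
db.assign("clicks-agg", "unit-suzhou", ["p0", "p1", "p2"])
db.assign("orders-agg", "center-hangzhou", ["p0", "p1"])
db.mark_done("clicks-agg", "p0")
print(db.unfinished_work("unit-suzhou"))  # {'clicks-agg': ['p1', 'p2']}
```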

Step A22: The control server allocates the unfinished tasks to the stream computing center server cluster with the smallest current load.

The control server then reassigns the unfinished tasks to the stream computing center server cluster with the smallest current load for execution.
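Combining step A22 with the earlier rule that only normal center clusters are candidates gives the selection sketch below. The function name, the load values, and the cluster names are hypothetical. Note that the most lightly loaded cluster overall is a unit cluster, so it is skipped.

```python
def pick_takeover_cluster(loads, kinds, abnormal):
    """Step A22: choose the least-loaded normal stream computing *center* cluster.

    loads: cluster -> current load score (lower is lighter)
    kinds: cluster -> "center" or "unit"; unit clusters reserve no capacity,
           so they are never candidates
    abnormal: set of clusters currently judged abnormal
    """
    candidates = [c for c in loads
                  if kinds.get(c) == "center" and c not in abnormal]
    if not candidates:
        raise RuntimeError("no normal stream computing center cluster available")
    return min(candidates, key=lambda c: loads[c])


loads = {"center-hangzhou": 0.62, "center-nanjing": 0.41, "unit-suzhou": 0.10}
kinds = {"center-hangzhou": "center", "center-nanjing": "center", "unit-suzhou": "unit"}
print(pick_takeover_cluster(loads, kinds, abnormal={"unit-xiamen"}))  # center-nanjing
```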

After the unfinished tasks are reassigned in step 205, the method can return to step 202, where the control server continues to allocate newly received stream computing tasks.

In this embodiment, a single control server uniformly allocates the stream computing tasks executed by the stream computing center server clusters and stream computing unit server clusters deployed at multiple sites, achieving unified scheduling and allocation of stream computing tasks. The central storage clusters synchronize data in real time. The clusters deployed at multiple sites can therefore simultaneously compute different parts of the same stream computing task, or different stream computing tasks. When a stream computing center server cluster or stream computing unit server cluster becomes abnormal, the stream computing tasks it was executing can be quickly resumed on a stream computing center server cluster at another site. System resources therefore do not sit idle in normal operation. Under abnormal conditions, stream computing tasks can be quickly recovered on a remote stream computing center server cluster, which gives the stream computing service high availability.

FIG. 3 is a flowchart of an embodiment of a method for executing a stream computing task according to the present invention. The method is applied to any current stream computing center server cluster shown in FIG. 1. The stream computing system may include multiple stream computing center server clusters, multiple stream computing unit server clusters, and a control server. Each stream computing center server cluster has a central storage cluster. The central storage clusters synchronize intermediate state data and intermediate result data with one another. Each stream computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of every stream computing center server cluster. Specifically, this embodiment may include:

Step 301: When another stream computing center server cluster or stream computing unit server cluster in the stream computing system becomes abnormal, the control server reassigns the unfinished tasks of its stream computing task. In response, the current stream computing center server cluster obtains, from its connected central storage cluster, the intermediate state data and intermediate result data needed to execute those unfinished tasks.

In this embodiment, if the control server detects that another stream computing center server cluster or stream computing unit server cluster has become abnormal, it reallocates the task that the abnormal cluster was executing to a stream computing center server cluster according to the embodiment shown in FIG. 2. In this case, the current stream computing center server cluster obtains, from the connected central storage cluster, the intermediate state data and intermediate result data required to execute the unfinished task. The intermediate state data may be the task status produced by the abnormal cluster while it was executing the stream computing task before the abnormality occurred, for example, which parts of the stream computing task have already been executed. The intermediate result data may be the result data produced by the part of the task that has already been completed. On this basis, the current stream computing center server cluster does not need to re-execute the parts of the stream computing task that have already been performed; it only executes the remaining part according to the intermediate state data and intermediate result data.

Step 302: The current stream computing center server cluster executes the unfinished task using the intermediate state data and intermediate result data.

The current stream computing center server cluster then executes the reallocated unfinished task with reference to the intermediate state data and intermediate result data.
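Steps 301 and 302 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the task is a running sum over an ordered list of records, that the intermediate state data is the number of records already processed, and that the intermediate result data is the partial sum. `Checkpoint` and `resume_task` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical intermediate state data: how many input records the
    # failed cluster had already processed (records are assumed to be ordered).
    processed_count: int
    # Hypothetical intermediate result data: the partial result produced
    # by the part of the task that was already completed.
    partial_result: int


def resume_task(records: List[int], checkpoint: Checkpoint) -> Checkpoint:
    """Finish a reallocated task without re-executing completed work."""
    total = checkpoint.partial_result
    # Skip the prefix that the failed cluster already covered.
    for value in records[checkpoint.processed_count:]:
        total += value
    return Checkpoint(processed_count=len(records), partial_result=total)


if __name__ == "__main__":
    # The failed cluster had already summed the first two records (5 + 3 = 8).
    print(resume_task([5, 3, 2, 7], Checkpoint(processed_count=2, partial_result=8)))
    # prints Checkpoint(processed_count=4, partial_result=17)
```

For this to work in practice, the state and the result would need to be persisted together (atomically). Otherwise a resumed task could skip or double-count records.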

After step 302, the method may further include:

Step 303: In response to heartbeat messages sent periodically by the control server, the current stream computing center server cluster periodically returns heartbeat responses to the control server.

When a heartbeat mechanism is established between the control server and the stream computing center server cluster, the control server periodically sends heartbeat messages to the current stream computing center server cluster to detect whether the two can communicate. The current stream computing center server cluster then periodically returns heartbeat responses to the control server.

After step 303, the method may further include:

Step 304: The current stream computing center server cluster checks whether the number of consecutive failures to return a heartbeat response to the control server exceeds a preset threshold. If so, the current stream computing center server cluster stops executing the stream computing task.

The current stream computing center server cluster can also check in real time whether its heartbeat mechanism with the control server is working. For example, it checks whether the number of consecutive failed heartbeat responses to the control server exceeds a preset threshold, such as 10 consecutive failures. If so, the current stream computing center server cluster is considered abnormal and stops executing the stream computing task. If not, the cluster is operating normally and returns to step 303, continuing to send periodic heartbeat responses to the control server.
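The cluster-side logic of steps 303 and 304 can be sketched as follows. The sketch assumes that the cluster calls `respond_once()` once per heartbeat period and that a hypothetical `send_response` callback reports whether the response reached the control server. It also assumes that reaching 10 consecutive failures, the example value above, triggers the stop. The text does not say whether the threshold comparison is strict, so `>=` is an assumption.

```python
from typing import Callable


class HeartbeatResponder:
    """Cluster-side heartbeat handling for steps 303 and 304 (illustrative)."""

    def __init__(self, send_response: Callable[[], bool], max_failures: int = 10) -> None:
        # send_response is a hypothetical transport hook: it returns True if the
        # heartbeat response reached the control server and False otherwise.
        self.send_response = send_response
        self.max_failures = max_failures
        self.consecutive_failures = 0
        self.running = True

    def respond_once(self) -> bool:
        """Run one heartbeat period; return True while the task may keep running."""
        if not self.running:
            return False
        if self.send_response():
            # A successful response resets the streak: only consecutive failures count.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                # Step 304: treat this cluster as abnormal and stop the task.
                self.running = False
        return self.running
```

The same counter applies to a stream computing unit server cluster, as in step 405 of the example below.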

As can be seen, in this embodiment of the present invention, a single control server uniformly allocates the tasks executed by the stream computing center server clusters and stream computing unit server clusters deployed in multiple locations, providing unified scheduling and distribution of stream computing tasks. Because the central storage clusters synchronize data with one another in real time, the clusters deployed in multiple locations can simultaneously compute different parts of the same stream computing task, or different stream computing tasks. When a stream computing center server cluster or stream computing unit server cluster becomes abnormal, the stream computing tasks it was executing can be quickly resumed on a stream computing center server cluster in another location. This keeps system resources from sitting idle during normal operation, while ensuring that stream computing tasks can be recovered quickly under abnormal conditions, achieving high availability of the stream computing service.

To help those skilled in the art understand the implementation of the present invention more clearly, a specific example is described in detail below. The example may include the following steps:

Step 401: The control server sends heartbeat messages to stream computing center server clusters 1 and 2 and to stream computing unit server clusters 1 and 2.

In this example, assume there are two stream computing center server clusters (stream computing center server cluster 1 and stream computing center server cluster 2) and two stream computing unit server clusters (stream computing unit server cluster 1 and stream computing unit server cluster 2). The control server exchanges heartbeat messages with each stream computing center server cluster and each stream computing unit server cluster at a heartbeat interval of 1 second. Stream computing center server clusters 1 and 2 may be deployed at different sites in Hangzhou, or in different cities. Stream computing unit server cluster 1 is deployed in Hangzhou, and stream computing unit server cluster 2 is deployed in Nanjing.

Step 402: Stream computing center server clusters 1 and 2 and stream computing unit server clusters 1 and 2 each return a heartbeat response to the control server.

Step 403: The control server assigns the stream computing task to stream computing unit server cluster 1 for execution.

A system administrator submits a stream computing task to the control server, for example, counting the transaction volume in Hangzhou on August 15, 2016, and specifies that it be executed by stream computing unit server cluster 1, which is deployed in Hangzhou. Following the administrator's instruction, the control server assigns the transaction-volume counting task to stream computing unit server cluster 1 and triggers it to start counting. In this example, stream computing center server cluster 1 has its own central storage cluster 1, and stream computing center server cluster 2 has its own central storage cluster 2. Stream computing unit server cluster 1 has its own unit storage cluster 1, and stream computing unit server cluster 2 has its own unit storage cluster 2. In practice, unit storage clusters 1 and 2 do not need to synchronize intermediate state data and intermediate result data with each other. Each only needs to synchronize its own intermediate state data and intermediate result data to central storage clusters 1 and 2, and central storage clusters 1 and 2 also synchronize intermediate state data and intermediate result data with each other.
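The synchronization topology above can be modeled in a few lines. Unit storage clusters fan out to every central storage cluster, central storage clusters replicate to each other, and unit storage clusters never replicate to each other. This is a toy model under stated assumptions: writes are forwarded synchronously and in memory, whereas a real deployment would replicate asynchronously over the network. `StorageCluster` and `build_topology` are hypothetical names.

```python
from typing import Dict, List, Set


class StorageCluster:
    """Toy storage cluster that forwards every write to its replication targets."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, object] = {}
        self.targets: List["StorageCluster"] = []

    def write(self, key: str, value: object) -> None:
        self._apply(key, value, visited=set())

    def _apply(self, key: str, value: object, visited: Set[str]) -> None:
        # The visited set stops the center-1 <-> center-2 link from looping forever.
        if self.name in visited:
            return
        visited.add(self.name)
        self.data[key] = value
        for target in self.targets:
            target._apply(key, value, visited)


def build_topology() -> Dict[str, StorageCluster]:
    clusters = {n: StorageCluster(n) for n in ("center-1", "center-2", "unit-1", "unit-2")}
    # Central storage clusters replicate to each other.
    clusters["center-1"].targets = [clusters["center-2"]]
    clusters["center-2"].targets = [clusters["center-1"]]
    # Each unit storage cluster replicates to every central storage cluster,
    # but unit storage clusters do not replicate to each other.
    for unit in ("unit-1", "unit-2"):
        clusters[unit].targets = [clusters["center-1"], clusters["center-2"]]
    return clusters
```

A write to `unit-1` lands in `unit-1`, `center-1`, and `center-2`, but not in `unit-2`. This is why any center cluster can later resume a task that started on a unit cluster.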

Specifically, while counting the transaction volume, stream computing unit server cluster 1 obtains the required source data from a data source, for example, order information whose IP addresses are located in Hangzhou, and counts the transaction volume from that source data. The local data sources at each location may all be synchronized to a central data source associated with the stream computing center server clusters. Both the stream computing center server clusters and the stream computing unit server clusters at each location may then pull source data from this central data source.
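A minimal sketch of the source-data path follows. Local sources are merged into one central source, and the transaction volume is counted from it. Several details are assumptions not given in the text: the city is already resolved from each order's IP address, duplicates across local sources are removed by a hypothetical `order_id`, and "transaction volume" is taken to mean a count of orders rather than a monetary sum.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Order:
    order_id: str     # hypothetical unique key used for de-duplication
    ip_city: str      # assumed: city already resolved from the order's IP address
    order_date: date


def sync_to_central(local_sources: Iterable[Iterable[Order]]) -> List[Order]:
    """Merge local data sources into one central source, de-duplicating by order_id."""
    central: Dict[str, Order] = {}
    for source in local_sources:
        for order in source:
            central[order.order_id] = order
    return list(central.values())


def transaction_volume(orders: Iterable[Order], city: str, day: date) -> int:
    """Count the orders from one city on one day (volume taken as a count)."""
    return sum(1 for o in orders if o.ip_city == city and o.order_date == day)
```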

Step 404: While stream computing unit server cluster 1 executes the stream computing task, unit storage cluster 1, which is connected to stream computing unit server cluster 1, synchronizes the intermediate state data and intermediate result data generated during execution to central storage cluster 1 and central storage cluster 2. Meanwhile, the control server stores the execution status and configuration information of the stream computing task in the control database.

While stream computing unit server cluster 1 executes the task, the intermediate state data and intermediate result data it generates are stored in unit storage cluster 1 in real time. Unit storage cluster 1 then synchronizes them to central storage cluster 1 and central storage cluster 2 in real time. At the same time, the control server obtains the execution status of the task in real time. It stores two items in the control database: the execution status, and the configuration information recording that the stream computing task is assigned to stream computing unit server cluster 1. For example, the execution status may indicate that, at a given moment, stream computing unit server cluster 1 has obtained 10,000 source data records in total, of which 4,000 have been counted and the remaining 6,000 have not. The execution status may of course be represented in other ways.
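One possible representation of the execution status and configuration information is sketched below, using the numbers from the example (10,000 records obtained, 4,000 counted). The record layout, the in-memory `ControlDatabase`, and the task identifier are hypothetical. The text notes that the execution status may be represented in other ways.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExecutionStatus:
    # Hypothetical representation: how many source records the task has
    # obtained in total and how many of them have been counted so far.
    total_records: int
    processed_records: int

    @property
    def remaining_records(self) -> int:
        return self.total_records - self.processed_records

    @property
    def progress(self) -> float:
        return self.processed_records / self.total_records if self.total_records else 1.0


class ControlDatabase:
    """In-memory stand-in for the control database (illustrative only)."""

    def __init__(self) -> None:
        self.status: Dict[str, ExecutionStatus] = {}
        # Configuration information: task id -> cluster executing the task.
        self.assignment: Dict[str, str] = {}

    def save(self, task_id: str, cluster: str, status: ExecutionStatus) -> None:
        self.assignment[task_id] = cluster
        self.status[task_id] = status
```

With `ExecutionStatus(10000, 4000)`, `remaining_records` is 6000 and `progress` is 0.4, matching the 40% figure used in step 407.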

Step 405: Stream computing unit server cluster 1 checks whether the number of consecutive failures to return a heartbeat response to the control server exceeds the preset threshold. If so, stream computing unit server cluster 1 stops executing the stream computing task. If not, it continues to perform step 405.

While executing the task, stream computing unit server cluster 1 also checks in real time whether its heartbeat responses to the control server fail, and counts consecutive failures. If the number of consecutive failures exceeds the preset threshold, for example 10, the connection between stream computing unit server cluster 1 and the control server can no longer communicate normally. This may be caused by an abnormality such as a network outage or power failure at stream computing unit server cluster 1. In that case, stream computing unit server cluster 1 exits the transaction-volume counting process.

Step 406: The control server determines whether stream computing unit server cluster 1 returns a heartbeat response within a preset feedback time. If not, the method proceeds to step 407. If so, the control server continues to perform step 406.

The control server also checks in real time whether stream computing unit server cluster 1 returns a heartbeat response within the preset feedback time, for example 1 minute. If no heartbeat response is received from stream computing unit server cluster 1 within this time, the cluster can no longer execute the task normally. Otherwise, the control server continues monitoring heartbeat responses by repeating this step.
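The control-server side of step 406 can be sketched as a timeout check on the time of each cluster's last heartbeat response. The 60-second default follows the example value. The class name is hypothetical, and `now` is passed in explicitly so the logic is testable; a real service would read a clock such as `time.monotonic()`.

```python
from typing import Dict, List


class FailureDetector:
    """Control-server side of step 406: flag clusters whose heartbeat is overdue."""

    def __init__(self, feedback_timeout_s: float = 60.0) -> None:
        self.feedback_timeout_s = feedback_timeout_s
        self.last_response: Dict[str, float] = {}

    def record_response(self, cluster: str, now: float) -> None:
        self.last_response[cluster] = now

    def overdue(self, now: float) -> List[str]:
        # A cluster is treated as abnormal once no response has arrived
        # within the preset feedback time.
        return sorted(
            c for c, t in self.last_response.items() if now - t > self.feedback_timeout_s
        )
```

The text does not spell out one consequence of the example values. With a 1-second heartbeat and a limit of 10 consecutive failures, a unit cluster that is still alive but cut off from the control server stops itself after roughly 10 seconds. That is well before the 1-minute feedback time after which the control server reallocates the task, which narrows the window in which two clusters could run the same task.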

Step 407: The control server obtains the load of each stream computing center server cluster in real time and determines the unfinished part of the stream computing task according to the execution status and configuration information.

The control server obtains the loads of stream computing center server clusters 1 and 2 in real time. It determines, for example, that stream computing center server cluster 1 is at 40% CPU utilization and stream computing center server cluster 2 is at 60% CPU utilization, so stream computing center server cluster 1 has the lighter load. At the same time, based on the execution status and configuration information stored in the control database, the control server determines that 40% of the transaction-volume counting task has been executed and that 6,000 source data records remain uncounted.
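Steps 407 and 408 amount to two computations: find the least-loaded stream computing center server cluster, and compute the unfinished range from the execution status. The sketch below assumes CPU utilization is the load metric, as in the example. It also assumes the unfinished task is the contiguous range of records after those already processed. The function name and the optional `failed` exclusion set are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def plan_reallocation(
    total_records: int,
    processed_records: int,
    center_cpu_load: Dict[str, float],
    failed: Iterable[str] = (),
) -> Tuple[str, range]:
    """Pick the least-loaded healthy center cluster and the records it must still process."""
    excluded = set(failed)
    candidates = {c: load for c, load in center_cpu_load.items() if c not in excluded}
    if not candidates:
        raise RuntimeError("no healthy stream computing center server cluster available")
    target = min(candidates, key=candidates.__getitem__)
    # The unfinished task is everything after the already-processed prefix.
    return target, range(processed_records, total_records)
```

With the example values, `plan_reallocation(10000, 4000, {"center-1": 0.40, "center-2": 0.60})` selects `center-1` and a range of 6,000 records starting at index 4000.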

Step 408: The control server allocates the unfinished task to the stream computing center server cluster with the lowest current load for execution.

Step 409: Stream computing center server cluster 1 continues executing the unfinished task based on the intermediate state data and intermediate result data synchronized to central storage cluster 1.

The control server therefore allocates the remaining 60% of the task to stream computing center server cluster 1. The intermediate state data and intermediate result data in central storage cluster 1 are synchronized in real time from the unit storage clusters. Stream computing center server cluster 1 can therefore obtain the intermediate state data and intermediate result data of the transaction-volume counting task directly from central storage cluster 1. It then executes the remaining 60% of the task based on that data, without repeating the 40% that has already been executed.
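The whole example (steps 403 to 409) can be condensed into a toy, single-process simulation. Unit cluster 1 counts part of the records, replicating a checkpoint to every central store after each record, and then fails. The least-loaded center cluster resumes from its own store. Everything here is an illustrative assumption: the in-memory stores, the per-record checkpointing, and the running sum standing in for the transaction-volume count.

```python
from typing import Dict, List, Tuple


def simulate_failover(
    records: List[int], fail_after: int, center_cpu_load: Dict[str, float]
) -> Tuple[str, int, List[str]]:
    """Toy run of steps 403-409: unit cluster 1 fails, a center cluster resumes."""
    # One dict per central storage cluster, keyed by cluster name.
    central_stores: Dict[str, Dict[str, Tuple[int, int]]] = {c: {} for c in center_cpu_load}
    executed_by: List[str] = []

    # Steps 403-404: unit cluster 1 processes records and, after each one,
    # replicates (processed_count, partial_sum) to every central store.
    count, partial = 0, 0
    for value in records[:fail_after]:
        count, partial = count + 1, partial + value
        executed_by.append("unit-1")
        for store in central_stores.values():
            store["checkpoint"] = (count, partial)

    # Steps 405-408: unit cluster 1 stops; the control server picks the
    # least-loaded center cluster for the unfinished task.
    target = min(center_cpu_load, key=center_cpu_load.__getitem__)

    # Step 409: the target resumes from its own central store's checkpoint.
    count, partial = central_stores[target].get("checkpoint", (0, 0))
    for value in records[count:]:
        count, partial = count + 1, partial + value
        executed_by.append(target)
    return target, partial, executed_by
```

With 10,000 records and a failure after 4,000, every record is processed exactly once: 4,000 by `unit-1` and 6,000 by `center-1`. The final sum equals the sum over all records.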

For simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

Corresponding to the method provided by the above embodiment of the method for allocating stream computing tasks, and referring to FIG. 5, the present invention further provides an embodiment of a control server. The control server is connected to multiple stream computing center server clusters and multiple stream computing unit server clusters, and each stream computing center server cluster reserves a preset proportion of its computing resources. In this embodiment, the control server may include:

A first allocation unit 501, configured to allocate the stream computing task, in response to receiving it, to a target stream computing center server cluster or a target stream computing unit server cluster.

A judging unit 502, configured to determine, while the target stream computing center server cluster or target stream computing unit server cluster is executing the stream computing task, whether that cluster has become abnormal.

A second allocation unit 503, configured to allocate the unfinished task in the stream computing task to a candidate stream computing center server cluster when the judging unit 502 determines that an abnormality has occurred. The unfinished task is the remainder of the stream computing task, excluding the part already executed by the target stream computing center server cluster or target stream computing unit server cluster.

The second allocation unit 503 may specifically include: a load acquisition subunit, configured to obtain in real time the loads of the multiple stream computing center server clusters and the multiple stream computing unit server clusters; and a first allocation subunit, configured to allocate the unfinished task in the stream computing task to the stream computing center server cluster with the lowest current load, according to the load of each stream computing center server cluster.

The control server may further include a sending unit, configured to periodically send heartbeat messages to the stream computing center server clusters and to the stream computing unit server clusters. The heartbeat messages are used to detect whether the control server can communicate with each stream computing center server cluster and with each stream computing unit server cluster. Correspondingly, the judging unit 502 is specifically configured to determine whether the target stream computing center server cluster or target stream computing unit server cluster has failed to return a heartbeat response within a preset feedback time.

Each stream computing center server cluster has a central storage cluster. The central storage clusters of the stream computing center server clusters synchronize intermediate state data and intermediate result data with one another, and each stream computing unit server cluster synchronizes its intermediate state data and intermediate result data to every central storage cluster. The control server may further include a storage unit, configured to store the execution status and configuration information of each stream computing task in a control database. The execution status indicates which part of each stream computing task has been executed on the corresponding stream computing center server cluster or stream computing unit server cluster. The configuration information indicates the correspondence between each stream computing task and the stream computing center server cluster or stream computing unit server cluster that executes it. Correspondingly, the first allocation subunit may specifically include: a computing subunit, configured to compute the unfinished task in the stream computing task according to the execution status and configuration information stored in the control database; and a second allocation subunit, configured to allocate the unfinished task to the stream computing center server cluster with the lowest current load.

The control server of this embodiment can uniformly allocate the tasks executed by the stream computing center server clusters and stream computing unit server clusters deployed in multiple locations, providing unified scheduling and distribution of stream computing tasks. Because the central storage clusters synchronize data with one another in real time, the clusters deployed in multiple locations can simultaneously compute different parts of the same stream computing task, or different stream computing tasks. When a stream computing center server cluster or stream computing unit server cluster becomes abnormal, the stream computing tasks it was executing can be quickly resumed on a stream computing center server cluster in another location. This keeps system resources from sitting idle during normal operation, while ensuring that stream computing tasks can be recovered quickly under abnormal conditions, achieving high availability of the stream computing service.

Corresponding to the method provided by the above embodiment of the method for executing a stream computing task, and referring to FIG. 6, the present invention further provides an embodiment of a stream computing center server cluster. In this embodiment, the stream computing system contains multiple stream computing center server clusters, each of which reserves preset computing resources. Each stream computing center server cluster is connected to a control server, which is also connected to multiple stream computing unit server clusters. Each stream computing center server cluster has a central storage cluster. The central storage clusters of the stream computing center server clusters synchronize intermediate state data and intermediate result data with one another. The unit storage cluster of each stream computing unit server cluster synchronizes intermediate state data and intermediate result data to the central storage cluster of every stream computing center server cluster. The stream computing center server cluster may include:

A data acquisition unit 601, configured to respond when the control server reallocates the unfinished part of a stream computing task after another stream computing center server cluster or stream computing unit server cluster in the stream computing system becomes abnormal. In response, it obtains from the central storage cluster the intermediate state data and intermediate result data required to execute the unfinished task.

A task execution unit 602, configured to execute the unfinished task using the preset computing resources, the intermediate state data, and the intermediate result data.

The stream computing center server cluster may further include a feedback unit, configured to periodically return heartbeat responses to the control server in response to the heartbeat messages periodically sent by the control server. The heartbeat messages are used to detect whether the control server and the current stream computing center server cluster can communicate.

The stream computing center server cluster may further include: a detection unit, configured to detect whether the number of consecutive failures to send a heartbeat response to the control server exceeds a preset threshold; and a stopping unit, configured to stop executing the unfinished task when the detection unit's result is yes.

The stream computing center server cluster of this embodiment can receive and execute stream computing tasks uniformly allocated by the control server. Because the central storage clusters synchronize data with one another in real time, the stream computing center server clusters or stream computing unit server clusters deployed in multiple locations can simultaneously compute different parts of the same stream computing task, or different stream computing tasks. When a stream computing center server cluster or stream computing unit server cluster becomes abnormal, the stream computing tasks it was executing can be quickly resumed on a stream computing center server cluster in another location. This keeps system resources from sitting idle during normal operation, while ensuring that stream computing tasks can be recovered quickly under abnormal conditions, achieving high availability of the stream computing service.

An embodiment of the present invention further provides a system for allocating and executing stream computing tasks. The system may include the control server shown in FIG. 5, multiple stream computing center server clusters shown in FIG. 6, and multiple stream computing unit server clusters. Each stream computing center server cluster has its own central storage cluster, each stream computing unit server cluster has its own unit storage cluster, and the control server has its own control database. A structural block diagram of the system is shown in FIG. 1. For details of the system not covered here, refer to the descriptions of the foregoing embodiments, which are not repeated here.

An embodiment of the present invention further provides a multi-site active-active system. The system includes a first stream computing center server cluster, a second stream computing center server cluster, multiple stream computing unit server clusters, and a control server. The first and second stream computing center server clusters are the stream computing center server cluster shown in FIG. 6, and the control server may be as shown in FIG. 5. The multiple stream computing unit server clusters are deployed at multiple second geographic locations, respectively. The first and second stream computing center server clusters are deployed at the same or different first geographic locations.

In this embodiment, the stream computing center server clusters and the stream computing unit server clusters are deployed at the first and second geographic locations, respectively. Therefore, when a stream computing unit server cluster becomes abnormal, the stream computing task it was executing can be resumed on the first or second stream computing center server cluster at another location. That center cluster continues executing the unfinished part of the task, achieving multi-site active-active operation. In addition, the first and second stream computing center server clusters may be deployed at different first geographic locations. In that case, when one of them becomes abnormal, the stream computing task it was executing can likewise be resumed on the other stream computing center server cluster at a different location. The unfinished part continues there, which also achieves multi-site active-active operation.

The present invention further provides a remote multi-active system, which may specifically include: a first stream computing center server, configured at least to provide computing resources externally and including a first central storage unit; and a second stream computing center server, configured at least to provide computing resources externally and including a second central storage unit. The first and second stream computing center servers perform load balancing under a unified load-balancing policy, and the first and second central storage units serve as hot backups for each other. For a first stream computing task running on the first stream computing center server, when that server fails and can no longer provide computing resources, the task stops running on it and continues running on the second stream computing center server, based on the intermediate state data and intermediate result data held in the second central storage unit.
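The hot-backup behavior above can be illustrated with a toy task. This is a minimal sketch, not the patent's implementation: the names `CenterServer`, `replicate`, and `run_sum_task` are hypothetical, replication is assumed to be synchronous, and the "task" is a running sum in which `offset` stands in for intermediate state data and `partial_sum` for intermediate result data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CenterServer:
    """Hypothetical center server with its own central storage unit."""
    name: str
    storage: Dict[str, dict] = field(default_factory=dict)  # task_id -> checkpoint
    up: bool = True


def replicate(servers: List[CenterServer], task_id: str, checkpoint: dict) -> None:
    """Mutual hot backup, assumed synchronous: write the checkpoint to every live storage unit."""
    for server in servers:
        if server.up:
            server.storage[task_id] = dict(checkpoint)


def run_sum_task(server: CenterServer, peers: List[CenterServer], task_id: str,
                 inputs: List[int], fail_after: Optional[int] = None) -> Optional[int]:
    """Toy stream task: keep a running sum of `inputs`, checkpointing after each record.

    The task starts from whatever checkpoint `server` already holds, so the
    same function covers both the first run and the resumed run.
    `fail_after` simulates the server failing after N records.
    """
    ckpt = server.storage.get(task_id, {"offset": 0, "partial_sum": 0})
    offset, partial = ckpt["offset"], ckpt["partial_sum"]
    processed = 0
    while offset < len(inputs):
        if fail_after is not None and processed == fail_after:
            server.up = False      # server fails; the task stops running here
            return None
        partial += inputs[offset]
        offset += 1
        processed += 1
        replicate(peers, task_id, {"offset": offset, "partial_sum": partial})
    return partial


if __name__ == "__main__":
    first, second = CenterServer("center-1"), CenterServer("center-2")
    pair = [first, second]
    data = [5, 10, 20, 40]
    assert run_sum_task(first, pair, "t1", data, fail_after=2) is None
    print(second.storage["t1"])                    # {'offset': 2, 'partial_sum': 15}
    print(run_sum_task(second, pair, "t1", data))  # 75
```

Because every record's checkpoint lands in both storage units before the next record is processed, the second server resumes exactly where the first stopped; an asynchronous scheme would instead risk reprocessing or losing the records after the last replicated checkpoint.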

It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for the parts they share. Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments.

Finally, it should also be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.

The stream computing task allocation method and control server, the stream computing task execution method and stream computing center server cluster, the stream computing system, and the remote multi-active system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention; the description of the embodiments is intended only to help understand the method of the invention and its core idea. A person of ordinary skill in the art may, based on the idea of the invention, make changes to both the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (13)

1. A computing task allocation method, characterized in that the method is applied to a control server connected to a stream computing center server cluster and a stream computing unit server cluster, the stream computing center server cluster reserving a preset proportion of computing resources; the method comprises: in response to receiving a stream computing task, allocating the stream computing task to a target stream computing center server cluster or a target stream computing unit server cluster; and, while the target stream computing center server cluster or target stream computing unit server cluster is executing the stream computing task, determining whether an abnormal situation occurs in that target cluster, and if so, allocating the unfinished part of the stream computing task to a candidate stream computing center server cluster.
2. The method according to claim 1, further comprising: the control server periodically sending heartbeat messages to the stream computing center server cluster and to the stream computing unit server cluster, respectively, the heartbeat messages being used to detect whether the control server can communicate with the stream computing center server cluster and whether the control server can communicate with the stream computing unit server cluster; correspondingly, determining whether an abnormal situation occurs in the target stream computing center server cluster or target stream computing unit server cluster specifically comprises: determining whether the target stream computing center server cluster or target stream computing unit server cluster has failed to return a heartbeat response within a preset feedback time.

3. The method according to claim 1, wherein allocating the unfinished part of the stream computing task to a candidate stream computing center server cluster comprises: the control server obtaining the load of the stream computing center server clusters in real time; and the control server, according to the load, allocating the unfinished part of the stream computing task to the stream computing center server cluster with the lowest current load.
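Claims 1 to 3 above describe a control loop: assign a task, detect a missing heartbeat response within the preset feedback time, and move the work to the least-loaded center cluster. The sketch below is a hypothetical illustration, not the claimed implementation: `ClusterInfo`, `ControlServer`, the load metric, and the explicit `now` timestamps (used instead of a real clock so the behavior is deterministic) are all assumptions. For brevity it moves the whole task identifier; the claims move only the unfinished part, which claim 4 computes from the control database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClusterInfo:
    """Hypothetical view the control server keeps of one cluster."""
    name: str
    is_center: bool
    load: float = 0.0       # assumed load metric, e.g. fraction of capacity in use
    last_ack: float = 0.0   # time of the last heartbeat response


class ControlServer:
    """Sketch of claims 1-3: assign, detect via heartbeats, reassign to the least-loaded center."""

    def __init__(self, clusters: List[ClusterInfo], feedback_timeout: float):
        self.clusters: Dict[str, ClusterInfo] = {c.name: c for c in clusters}
        self.feedback_timeout = feedback_timeout   # the "preset feedback time"
        self.assignment: Dict[str, str] = {}       # task_id -> cluster name

    def assign(self, task_id: str, cluster_name: str) -> None:
        self.assignment[task_id] = cluster_name

    def on_heartbeat_response(self, cluster_name: str, now: float) -> None:
        self.clusters[cluster_name].last_ack = now

    def is_abnormal(self, cluster_name: str, now: float) -> bool:
        # Abnormal if no heartbeat response arrived within the preset feedback time.
        return now - self.clusters[cluster_name].last_ack > self.feedback_timeout

    def least_loaded_center(self, now: float) -> Optional[ClusterInfo]:
        healthy = [c for c in self.clusters.values()
                   if c.is_center and not self.is_abnormal(c.name, now)]
        return min(healthy, key=lambda c: c.load, default=None)

    def check_and_reassign(self, now: float) -> Dict[str, str]:
        """Move every task on an abnormal cluster to the least-loaded healthy center."""
        moved: Dict[str, str] = {}
        for task_id, name in list(self.assignment.items()):
            if self.is_abnormal(name, now):
                target = self.least_loaded_center(now)
                if target is not None:
                    self.assignment[task_id] = target.name
                    moved[task_id] = target.name
        return moved


if __name__ == "__main__":
    cs = ControlServer([ClusterInfo("unit-1", False),
                        ClusterInfo("center-1", True, load=0.7),
                        ClusterInfo("center-2", True, load=0.3)], feedback_timeout=3.0)
    cs.assign("job-42", "unit-1")
    for name in ("center-1", "center-2"):
        cs.on_heartbeat_response(name, now=4.0)
    print(cs.check_and_reassign(now=5.0))  # {'job-42': 'center-2'}
```

Filtering out abnormal center clusters before taking the minimum ensures that a failed center cluster is never chosen as its own replacement.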
4. The method according to claim 3, wherein each stream computing center server cluster has a central storage cluster; the central storage clusters of the stream computing center server clusters synchronize intermediate state data and intermediate result data with one another, and each stream computing unit server cluster synchronizes its intermediate state data and intermediate result data to the central storage cluster of each stream computing center server cluster; the method further comprises: the control server storing the execution state and configuration information of each stream computing task in a control database, the execution state indicating the part of each stream computing task that has already been executed on the corresponding stream computing center server cluster or stream computing unit server cluster, and the configuration information indicating the correspondence between each stream computing task and the stream computing center server cluster, or the stream computing unit server cluster, executing it; correspondingly, allocating the unfinished part of the stream computing task to the stream computing center server cluster with the lowest current load comprises: the control server computing the unfinished part of the stream computing task according to the execution state and configuration information stored in the control database; and the control server allocating the unfinished part to the stream computing center server cluster with the lowest current load.

5. A stream computing task execution method, characterized in that the method is applied to a current stream computing center server cluster, being any stream computing center server cluster in a stream computing system that reserves preset computing resources, the stream computing system comprising stream computing center server clusters, stream computing unit server clusters, and a control server; each stream computing center server cluster has a central storage cluster, the central storage clusters synchronize intermediate state data and intermediate result data with one another, and the unit storage cluster of each stream computing unit server cluster synchronizes intermediate state data and intermediate result data to the central storage clusters; the method comprises: in response to the control server reassigning the unfinished part of a stream computing task when another stream computing center server cluster or a stream computing unit server cluster in the stream computing system becomes abnormal, the current stream computing center server cluster obtaining, from the central storage cluster, the intermediate state data and intermediate result data required to execute the unfinished part; and the current stream computing center server cluster executing the unfinished part using the preset computing resources, the intermediate state data, and the intermediate result data.
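Claim 4's control database pairs an execution state (how much of each task is done) with configuration information (which cluster runs each task). The following minimal sketch shows how the unfinished part could be derived from the two; it is an illustration under assumptions, not the patent's schema. The `ControlDatabase` class, the per-partition offset representation of "already executed", and the `latest` map of available input offsets are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Offsets = Dict[int, int]  # partition -> offset


@dataclass
class ControlDatabase:
    """Hypothetical layout of the control database in claim 4.

    execution_state: task_id -> {partition: offset already processed}
    configuration:   task_id -> name of the cluster executing the task
    """
    execution_state: Dict[str, Offsets] = field(default_factory=dict)
    configuration: Dict[str, str] = field(default_factory=dict)

    def record_progress(self, task_id: str, partition: int, offset: int) -> None:
        self.execution_state.setdefault(task_id, {})[partition] = offset

    def unfinished_work(self, failed_cluster: str,
                        latest: Dict[str, Offsets]) -> Dict[str, Dict[int, Tuple[int, int]]]:
        """For each task configured on failed_cluster, return the half-open
        offset range [processed, latest) of every partition not yet caught up."""
        pending: Dict[str, Dict[int, Tuple[int, int]]] = {}
        for task_id, cluster in self.configuration.items():
            if cluster != failed_cluster:
                continue
            done = self.execution_state.get(task_id, {})
            pending[task_id] = {
                p: (done.get(p, 0), hi)
                for p, hi in latest.get(task_id, {}).items()
                if done.get(p, 0) < hi
            }
        return pending


if __name__ == "__main__":
    db = ControlDatabase(configuration={"job-1": "unit-1", "job-2": "center-1"})
    db.record_progress("job-1", 0, 120)
    db.record_progress("job-1", 1, 300)
    latest = {"job-1": {0: 200, 1: 300, 2: 50}}
    print(db.unfinished_work("unit-1", latest))  # {'job-1': {0: (120, 200), 2: (0, 50)}}
```

Partition 1 is omitted because its processed offset already equals the latest offset; the configuration lookup ensures that only tasks on the failed cluster are reassigned.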
6. The method according to claim 5, further comprising: in response to the control server periodically sending heartbeat messages, the current stream computing center server cluster periodically returning heartbeat responses to the control server, the heartbeat messages being used to detect whether the control server and the current stream computing center server cluster can communicate.

7. The method according to claim 6, further comprising: the current stream computing center server cluster detecting whether the number of consecutive failures to return a heartbeat response to the control server exceeds a preset count threshold, and if so, the current stream computing center server cluster stopping execution of the unfinished part.
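Claim 7 places a guard on the cluster side: it counts consecutive failed heartbeat responses and stops the reassigned work once the count exceeds a preset threshold. Below is a minimal sketch; the class name `HeartbeatGuard` and its method are hypothetical. One plausible motivation, which the claim does not state, is to avoid two clusters running the same unfinished part after the control server has lost contact and may already have reassigned it.

```python
class HeartbeatGuard:
    """Hypothetical cluster-side counter for claim 7.

    Each heartbeat round reports whether the response reached the control
    server. A successful delivery resets the count; once the number of
    consecutive failures exceeds max_failures, the task is stopped and stays
    stopped (restarting would require a new assignment from the control server).
    """

    def __init__(self, max_failures: int):
        self.max_failures = max_failures   # the "preset count threshold"
        self.consecutive_failures = 0
        self.task_running = True

    def on_response_result(self, delivered: bool) -> bool:
        """Record one heartbeat round and return whether the task may keep running."""
        if delivered:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures > self.max_failures:
                self.task_running = False  # stop executing the unfinished part
        return self.task_running


if __name__ == "__main__":
    guard = HeartbeatGuard(max_failures=3)
    for ok in [False, False, True, False, False, False, False]:
        print(guard.on_response_result(ok))  # True six times, then False
```

The reset on success means that only an unbroken run of failures triggers the stop, so isolated packet loss does not interrupt the task.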
8. A control server, characterized in that the control server is connected to a stream computing center server cluster and a stream computing unit server cluster, the stream computing center server cluster reserving a preset proportion of computing resources; the control server comprises: a first allocation unit, configured to allocate, in response to receiving a stream computing task, the stream computing task to a target stream computing center server cluster or a target stream computing unit server cluster; a judging unit, configured to determine, while the target stream computing center server cluster or target stream computing unit server cluster is executing the stream computing task, whether an abnormal situation occurs in that target cluster; and a second allocation unit, configured to allocate the unfinished part of the stream computing task to a candidate stream computing center server cluster when the result of the judging unit is yes.

9. A stream computing center server cluster, characterized in that the stream computing center server cluster reserves preset computing resources and is connected to a control server, which is also connected to a stream computing unit server cluster; the stream computing center server cluster has a central storage cluster, and the central storage clusters synchronize intermediate state data and intermediate result data with one another; the stream computing unit server cluster has a unit storage cluster, which synchronizes intermediate state data and intermediate result data to the central storage cluster; the stream computing center server cluster comprises: a data acquisition unit, configured to obtain from the central storage cluster, in response to the control server reassigning the unfinished part of a stream computing task when another stream computing center server cluster or a stream computing unit server cluster in the stream computing system becomes abnormal, the intermediate state data and intermediate result data required to execute the unfinished part; and a task execution unit, configured to execute the unfinished part using the preset computing resources, the intermediate state data, and the intermediate result data.

10. A stream computing system, characterized in that the stream computing system comprises: the stream computing center server cluster according to claim 9 and a stream computing unit server cluster; the control server according to claim 8; and a central storage cluster corresponding to the stream computing center server cluster, a control database corresponding to the control server, and a unit storage cluster corresponding to the stream computing unit server cluster.
11. A remote multi-active system, characterized in that the remote multi-active system comprises a first stream computing center server cluster, a plurality of stream computing unit server clusters, and a control server; wherein the first stream computing center server cluster is the stream computing center server cluster according to claim 9, and the control server is the control server according to claim 8; the stream computing unit server clusters are deployed at a plurality of second geographic locations, one cluster per location; and the first stream computing center server cluster is deployed at a first geographic location.

12. The system according to claim 11, wherein the remote multi-active system further comprises a second stream computing center server cluster, which is deployed at a first geographic location different from that of the first stream computing center server cluster.
13. A remote multi-active system, characterized by comprising: a first stream computing center server, configured at least to provide computing resources externally and including a first central storage unit; and a second stream computing center server, configured at least to provide computing resources externally and including a second central storage unit; wherein the first and second stream computing center servers perform load balancing under a unified load-balancing policy, and the first and second central storage units serve as hot backups for each other; and wherein, for a first stream computing task running on the first stream computing center server, when the first stream computing center server fails and cannot provide computing resources externally, the task stops running on the first stream computing center server and continues running on the second stream computing center server based on the intermediate state data and intermediate result data in the second central storage unit.

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201610908946.7A CN107959705B (en) 2016-10-18 2016-10-18 Distribution method of streaming computing task and control server
??201610908946.7 2016-10-18
CN201610908946.7 2016-10-18

Publications (2)

Publication Number Publication Date
TW201816616A (en) 2018-05-01
TWI755417B (en) 2022-02-21

Family ID=61954266

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106127334A TWI755417B (en) 2016-10-18 2017-08-11 Computing task allocation method, execution method of stream computing task, control server, stream computing center server cluster, stream computing system and remote multi-active system

Country Status (3)

Country Link
CN (1) CN107959705B (en)
TW (1) TWI755417B (en)
WO (1) WO2018072618A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108737270B (en) * 2018-05-07 2021-01-26 北京京东尚科信息技术有限公司 Resource management method and device for server cluster
CN109358983A (en) * 2018-09-04 2019-02-19 深圳市宝德计算机***有限公司 Server data processing method, device and storage medium
CN111090502B (en) * 2018-10-24 2024-05-17 阿里巴巴集团控股有限公司 Stream data task scheduling method and device
CN109656782A (en) * 2018-12-24 2019-04-19 成都四方伟业软件股份有限公司 Visual scheduling monitoring method, device and server
CN112148439B (en) * 2019-06-28 2024-03-08 浙江宇视科技有限公司 Task processing method, device, equipment and storage medium
CN111092931B (en) * 2019-11-15 2021-08-06 中国科学院计算技术研究所 Method and system for rapidly distributing streaming data of online super real-time simulation of power system
CN111124812A (en) * 2019-12-02 2020-05-08 深圳市智微智能软件开发有限公司 Server monitoring method and system
CN112732491B (en) * 2021-01-22 2024-03-12 中国人民财产保险股份有限公司 Data processing system and business data processing method based on data processing system
CN113190364A (en) * 2021-04-30 2021-07-30 平安壹钱包电子商务有限公司 Remote call management method and device, computer equipment and readable storage medium
CN113283803B (en) * 2021-06-17 2024-04-23 金蝶软件(中国)有限公司 Method for making material demand plan, related device and storage medium
CN113391902B (en) * 2021-06-22 2023-03-31 未鲲(上海)科技服务有限公司 Task scheduling method and device and storage medium
CN113472662B (en) * 2021-07-09 2022-10-04 武汉绿色网络信息服务有限责任公司 Path redistribution method and network service system
WO2023077451A1 (en) * 2021-11-05 2023-05-11 中国科学院计算技术研究所 Stream data processing method and system based on column-oriented database
CN114884946B (en) * 2022-04-28 2024-01-16 抖动科技(深圳)有限公司 Remote multi-activity implementation method based on artificial intelligence and related equipment
CN115242648B (en) * 2022-07-19 2024-05-28 北京百度网讯科技有限公司 Expansion and contraction capacity discrimination model training method and operator expansion and contraction capacity method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6779016B1 (en) * 1999-08-23 2004-08-17 Terraspring, Inc. Extensible computing system
CN103197936B (en) * 2005-10-07 2016-04-06 茨特里克斯***公司 For the method selected between the manner of execution of the predetermined quantity of application program
TWI476610B (en) * 2008-04-29 2015-03-11 Maxiscale Inc Peer-to-peer redundant file server system and methods
CN101483673B (en) * 2009-02-20 2013-02-13 杭州华三通信技术有限公司 Implementation method and system for heat backup at different sites
CN102158387A (en) * 2010-02-12 2011-08-17 华东电网有限公司 Protection fault information processing system based on dynamic load balance and mutual hot backup
CN103973725B (en) * 2013-01-28 2018-08-24 阿里巴巴集团控股有限公司 A kind of distributed cooperative algorithm and synergist
CN103703830B (en) * 2013-05-31 2017-11-17 华为技术有限公司 A kind of physical resource adjustment, device and controller
CN103763378A (en) * 2014-01-24 2014-04-30 中国联合网络通信集团有限公司 Task processing method and system and nodes based on distributive type calculation system
US9785480B2 (en) * 2015-02-12 2017-10-10 Netapp, Inc. Load balancing and fault tolerant service in a distributed data system
CN104683488B (en) * 2015-03-31 2018-03-30 百度在线网络技术(北京)有限公司 Streaming computing system and its dispatching method and device

Also Published As

Publication number Publication date
WO2018072618A1 (en) 2018-04-26
TWI755417B (en) 2022-02-21
CN107959705B (en) 2021-08-20
CN107959705A (en) 2018-04-24
