CN112527486A - Scheduling optimization method and device - Google Patents

Publication number
CN112527486A
Authority
CN
China
Legal status
Granted
Application number
CN202011496875.7A
Other languages
Chinese (zh)
Other versions
CN112527486B (en)
Inventor
赖新明
邓应强
***
张�浩
舒南飞
林文辉
Current Assignee
Aisino Corp
Original Assignee
Aisino Corp
Application filed by Aisino Corp
Priority to CN202011496875.7A
Publication of CN112527486A
Application granted
Publication of CN112527486B
Status: Active

Classifications

    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F11/3476 Data logging
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5061 Partitioning or combining of resources
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances

Abstract

The application relates to the technical field of cloud computing and provides a scheduling optimization method and device for solving the problem that a node cannot create a Pod normally. The method comprises: screening each node in a cluster according to the CPU and memory resources required by a Pod to be scheduled; determining a composite value for each screened node according to its priority and its evaluation value; and selecting a node whose composite value exceeds a preset threshold as the node to which the Pod is scheduled. In the embodiments of the application, in addition to satisfying the static resource requirements of the Pod to be scheduled, the probability that the Pod is successfully created on the node is also taken into account, which avoids cases where a node cannot create the Pod normally because of a Pod sandbox creation failure, a Pod synchronization failure, or other errors.

Description

Scheduling optimization method and device
Technical Field
The application relates to the technical field of cloud computing, and provides a scheduling optimization method and device.
Background
Cloud computing platforms mainly use virtualization technology to supply resources on demand, but traditional virtualization technology consumes substantial resources and has a relatively complex system architecture. Container technology, as a new generation of virtualization technology, has low resource overhead and a simple, lightweight architecture, so it has become the development direction of the next generation of cloud computing platforms.
The Kubernetes platform is a lightweight resource scheduling platform based on container technology. The container group (Pod) is the minimum unit of resource scheduling in Kubernetes, and the platform schedules a Pod to the optimal node according to static resource attributes of the operating environment, such as the Central Processing Unit (CPU) load of a node or the remaining memory of a worker node. However, a node selected in this way may still be unable to create the Pod normally because of a Pod sandbox creation failure, a Pod synchronization failure, or other errors.
In view of this, the present application provides a scheduling optimization method and apparatus.
Disclosure of Invention
The embodiments of the application provide a scheduling optimization method and device for solving the problem that a node cannot create a Pod normally.
In a first aspect, an embodiment of the present application provides a scheduling optimization method, including:
screening each node in the cluster according to the CPU and memory resources required by the container group (Pod) to be scheduled;
determining a composite value for each screened node according to its priority and its evaluation value, wherein the evaluation value is generated based on logs of Pod creation failures on each node in the cluster within a preset time period and represents the probability that a Pod is successfully created on the node;
and selecting a node whose composite value exceeds a preset threshold as the node to which the Pod to be scheduled is scheduled.
Optionally, the evaluation value is generated by:
collecting logs of Pod creation failures on each node in the cluster within a preset time period;
aggregating the collected logs to obtain a plurality of log sets, wherein the logs in each log set are failure logs generated by the same error type when Pods of the same application service are created on the same node;
determining the Pod creation failure rate of each node in the cluster according to the obtained log sets;
and calculating the evaluation value of each node in the cluster based on its Pod creation failure rate.
Optionally, after the logs of Pod creation failures on each node in the cluster within the preset time period are collected and before the logs are aggregated into a plurality of log sets, the method further includes:
determining the error type of each log based on its log error information;
wherein aggregating the logs to obtain a plurality of log sets includes:
aggregating the logs according to their resource identifier, node identifier, error type and log type to obtain a plurality of log sets;
wherein the resource identifier of a log is used to determine the application service running on the Pod being created.
Optionally, determining the Pod creation failure rate of each node in the cluster according to the obtained log sets includes:
for any node, if the logs corresponding to the node fall in only one log set, calculating a first ratio of the total number of Pods contained in that log set to the total number of Pods occupied on the node, and determining the corresponding Pod creation failure rate based on the first ratio and a preset weight parameter; or
for any node, if the logs corresponding to the node fall in a plurality of log sets, calculating, for each of those log sets, a second ratio of the total number of Pods contained in the log set to the total number of Pods occupied on the node, and determining the corresponding Pod creation failure rate by a weighted summation of the second ratios.
Optionally, calculating the evaluation value of each node in the cluster based on its Pod creation failure rate includes:
taking the logarithm of each Pod creation failure rate to obtain an initial evaluation value of each node in the cluster;
and taking the negative of each initial evaluation value to obtain the evaluation value of each node in the cluster.
In a second aspect, an embodiment of the present application further provides a scheduling optimization apparatus, including:
the primary screening unit, configured to screen each node in the cluster according to the CPU and memory resources required by the container group (Pod) to be scheduled;
the scheduling optimization unit, configured to determine a composite value for each screened node according to its priority and its evaluation value, wherein the evaluation value is generated based on logs of Pod creation failures on each node in the cluster within a preset time period and represents the probability that a Pod is successfully created on the node;
and the determining unit, configured to select a node whose composite value exceeds a preset threshold as the node to which the Pod to be scheduled is scheduled.
Optionally, the apparatus further includes a generating unit, which generates the evaluation value by:
collecting logs of Pod creation failures on each node in the cluster within a preset time period;
aggregating the collected logs to obtain a plurality of log sets, wherein the logs in each log set are failure logs generated by the same error type when Pods of the same application service are created on the same node;
determining the Pod creation failure rate of each node in the cluster according to the obtained log sets;
and calculating the evaluation value of each node in the cluster based on its Pod creation failure rate.
Optionally, the generating unit is further configured to:
determine the error type of each log based on its log error information;
wherein aggregating the logs to obtain a plurality of log sets includes:
aggregating the logs according to their resource identifier, node identifier, error type and log type to obtain a plurality of log sets;
wherein the resource identifier of a log is used to determine the application service running on the Pod being created.
Optionally, the generating unit is configured to:
for any node, if the logs corresponding to the node fall in only one log set, calculate a first ratio of the total number of Pods contained in that log set to the total number of Pods occupied on the node, and determine the corresponding Pod creation failure rate based on the first ratio and a preset weight parameter; or
for any node, if the logs corresponding to the node fall in a plurality of log sets, calculate, for each of those log sets, a second ratio of the total number of Pods contained in the log set to the total number of Pods occupied on the node, and determine the corresponding Pod creation failure rate by a weighted summation of the second ratios.
Optionally, the generating unit is configured to:
take the logarithm of each Pod creation failure rate to obtain an initial evaluation value of each node in the cluster;
and take the negative of each initial evaluation value to obtain the evaluation value of each node in the cluster.
In a third aspect, an embodiment of the present application further provides an electronic device including a processor and a memory, where the memory stores program code that, when executed by the processor, causes the processor to perform the steps of any of the scheduling optimization methods described above.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium including program code which, when run on an electronic device, causes the electronic device to perform the steps of any of the scheduling optimization methods described above.
The beneficial effects of the present application are as follows:
In the scheduling optimization method and device provided by the embodiments of the application, each node in a cluster is screened according to the CPU and memory resources required by a Pod to be scheduled; a composite value is determined for each screened node according to its priority and its evaluation value, where the evaluation value is generated from logs of Pod creation failures on each node in the cluster within a preset time period and represents the probability that a Pod is successfully created on the node; finally, a node whose composite value exceeds the preset threshold is selected as the node to which the Pod is scheduled. In the embodiments, the bypass scheduler is called to obtain the evaluation values of the screened nodes, and the priority and evaluation value of each node are weighted to obtain its composite value. Thus, in addition to satisfying the static resource requirements of the Pod to be scheduled, the probability that the Pod is successfully created on the node is also considered, which avoids cases where a node cannot create the Pod normally because of a Pod sandbox creation failure, a Pod synchronization failure, or other errors.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1a is a schematic diagram of an architecture of a scheduling optimization system;
FIG. 1b is a schematic flow chart of a scheduling optimization method;
FIG. 2a is a schematic diagram of an architecture of an evaluation value generation system;
FIG. 2b is a schematic flow chart of generating an evaluation value;
FIG. 3 is a schematic structural diagram of a scheduling optimization apparatus in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
The Kubernetes platform is a lightweight resource scheduling platform based on container technology and is formed by a plurality of nodes capable of running Kubernetes functional components; each node may be a physical server or a virtual server (i.e., a cloud server). The Pod is the minimum unit of resource scheduling in Kubernetes, and the platform schedules a Pod to the optimal node according to static resource attributes of the operating environment, such as the CPU load of a node or the remaining memory of a worker node. However, a node selected in this way may still be unable to create the Pod normally because of a Pod sandbox creation failure, a Pod synchronization failure, or other errors. In view of this, an embodiment of the present application provides a scheduling optimization method.
Referring to the architecture shown in FIG. 1a, the embodiment of the present application includes a default scheduler and a bypass scheduler, and the bypass scheduler consists of three submodules: a Web Server, a Scorer and a Data Manager. The default scheduler preliminarily screens all nodes in the Kubernetes cluster and then calls the bypass scheduler through an HTTP interface provided by the Web Server. The Scorer generates the evaluation value of each node that meets the requirements and determines the node's composite value from its priority and evaluation value, and the default scheduler selects a suitable node for the Pod to be scheduled based on the composite values. The Data Manager maintains the composite value of each node in the Kubernetes cluster; when it receives evaluation data from the Scorer, it updates the composite value of the corresponding node, keeping the new value and discarding the old one.
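The Data Manager's update rule (keep the newest composite value per node and discard the old one) can be sketched as a small in-memory store. This is a minimal sketch under assumptions: the class and method names are hypothetical, node names are plain strings, and persistence, concurrency and the Web Server's HTTP layer are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataManager:
    """Hypothetical in-memory store holding one composite value per node."""

    composite_values: dict[str, float] = field(default_factory=dict)

    def update(self, node: str, new_value: float) -> None:
        # Keep the newly received value and discard the previous one.
        self.composite_values[node] = new_value

    def get(self, node: str) -> Optional[float]:
        # None means the Scorer has not reported on this node yet.
        return self.composite_values.get(node)
```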
Referring to the flowchart illustrated in fig. 1b, a scheduling optimization method proposed by the embodiment of the present application is described.
S101: Screen each node in the cluster according to the CPU and memory resources required by the Pod to be scheduled.
As described above, a Kubernetes cluster is formed by the nodes capable of running Kubernetes functional components. For example, if there are 8 nodes in total and every one of them can run the components, the 8 nodes form a Kubernetes cluster; if only 5 of the 8 nodes can run them, those 5 nodes form the Kubernetes cluster.
Nodes in a Kubernetes cluster fall into two categories: control nodes and load nodes. All nodes in the cluster are traversed according to the CPU and memory requirements of the Pod to be scheduled, and nodes that do not meet the requirements are filtered out, so that every remaining node can provide enough CPU and memory for the Pod. If no node in the cluster meets the requirements, the Pod remains in the Pending state until some node does.
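The primary screening in S101 amounts to a filter over the cluster's nodes. The sketch below assumes each node reports its unallocated CPU (in millicores) and memory (in MiB); these field names are illustrative, and a real default scheduler applies many more predicates than these two.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    free_cpu_millicores: int  # unallocated CPU on the node
    free_memory_mib: int      # unallocated memory on the node


@dataclass
class PodRequest:
    cpu_millicores: int
    memory_mib: int


def screen_nodes(nodes: List[Node], pod: PodRequest) -> List[Node]:
    """Primary screening: keep nodes that can satisfy the Pod's CPU and memory request.

    An empty result means the Pod stays Pending until some node has room.
    """
    return [
        n for n in nodes
        if n.free_cpu_millicores >= pod.cpu_millicores
        and n.free_memory_mib >= pod.memory_mib
    ]
```

For nodes `a` (2000m, 4096 MiB), `b` (500m, 8192 MiB) and `c` (4000m, 1024 MiB) and a request of 1000m and 2048 MiB, only `a` survives: `b` fails on CPU and `c` on memory.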
S102: Determine a composite value for each screened node according to its priority and its evaluation value, where the evaluation value is generated from logs of Pod creation failures on each node in the cluster within a preset time period and represents the probability that a Pod is successfully created on the node.
In the conventional scheduling process, if step S101 yields several qualifying nodes, they are ranked by priority and the node with the highest priority is selected to deploy the Pod. However, a node selected on the basis of static resource attributes may still fail to create the Pod normally because of a Pod sandbox creation failure, a Pod synchronization failure, or other errors. The selected nodes are therefore screened a second time using their evaluation values: the priority and the evaluation value of each node are combined by a weighted sum into its composite value, which raises the probability that the Pod is created successfully.
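The secondary screening combines two per-node numbers into one. A minimal sketch, assuming equal weights of 0.5 (hypothetical; the text only states that a weighted sum is used) and that priority and evaluation value are already on comparable scales:

```python
from typing import Dict, List, Tuple


def composite_value(priority: float, evaluation: float,
                    w_priority: float = 0.5, w_evaluation: float = 0.5) -> float:
    """Weighted sum of a node's priority and its evaluation value.

    The default weights are placeholders; the text does not specify them.
    """
    return w_priority * priority + w_evaluation * evaluation


def rank_nodes(scores: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Map node -> (priority, evaluation value) to (node, composite value) pairs, best first."""
    ranked = [(node, composite_value(p, e)) for node, (p, e) in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With `{"a": (10.0, 0.1), "b": (6.0, 5.0)}`, node `b` ranks first even though its priority is lower, because its record of Pod creation failures is better; that is the effect the secondary screening is meant to have.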
Next, the evaluation value generation system is described with reference to the architecture shown in FIG. 2a. The system includes a log collection module, a log aggregation module and a bypass scheduler. As its name suggests, the log collection module collects the logs of Pod creation failures on each node in the Kubernetes cluster within a preset time period. The log aggregation module is divided into log processing submodules and a log aggregation submodule. In the embodiments of the application, logs are divided into event-level, application-level and system-level logs; to avoid mixing different types, a separate log processing submodule is created for each type, and each submodule processes only its own type of log. The log aggregation submodule aggregates the processed logs into a plurality of log sets, where the logs in each log set are failure logs generated by the same error type when Pods of the same application service are created on the same node. The bypass scheduler generates the evaluation value of each node from its Pod creation failure rate and then generates the composite value from the node's priority and evaluation value.
The process of generating the evaluation value is described below with reference to the flowchart shown in FIG. 2b.
S201: Collect logs of Pod creation failures on each node in the cluster within a preset time period.
A log collection service runs on each node of the Kubernetes cluster. On a control node, it collects three types of logs: event-level, application-level and system-level logs; on a load node, it collects only application-level and system-level logs.
Event-level logs contain the state data of the containers in a Pod and can be obtained through the Kubernetes management Application Programming Interface (API). Application-level logs contain the output of the programs running in a container and can be obtained through the Docker management API. System-level logs are mainly generated by the running Docker and Kubelet services and include service state data and systemd output logs; the service state data can be obtained with systemctl, and the systemd output logs with the journalctl tool.
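A node-local collector could gather the three log types with standard command-line tools instead of calling the management APIs directly. The sketch below only builds the command lines; running them (for example with `subprocess.run`) and parsing their output are left out. Using kubectl, `docker logs`, systemctl and journalctl as stand-ins for the APIs, and the unit names `docker` and `kubelet`, are assumptions about a typical installation. `kubectl get events` is not limited to the time window, so its output would still need filtering by timestamp.

```python
from typing import List, Sequence

# Assumed systemd unit names of the Docker and Kubelet services.
SYSTEM_UNITS = ("docker", "kubelet")


def collection_commands(node_role: str, window_minutes: int,
                        containers: Sequence[str] = ()) -> List[List[str]]:
    """Build (but do not run) the commands a collector might use on one node."""
    if node_role not in ("control", "load"):
        raise ValueError(f"unknown node role: {node_role}")
    commands: List[List[str]] = []
    if node_role == "control":
        # Event-level logs (control nodes only): container state via the Kubernetes API.
        commands.append(["kubectl", "get", "events", "--all-namespaces", "-o", "json"])
    for container in containers:
        # Application-level logs: program output inside each container.
        commands.append(["docker", "logs", "--since", f"{window_minutes}m", container])
    for unit in SYSTEM_UNITS:
        # System-level logs: service state and systemd output of Docker and Kubelet.
        commands.append(["systemctl", "status", "--no-pager", unit])
        commands.append(["journalctl", "-u", unit, "--since",
                         f"{window_minutes} minutes ago", "--no-pager", "-o", "json"])
    return commands
```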
After collecting all log data on its node within the preset time period, each log collection service keeps only the logs of Pod creation failures. To save bandwidth, each node compresses and packs the retained logs in batches and sends them to the log processing submodules for further processing. Each log contains at least a resource identifier, a node identifier, a log type, a timestamp of when the log was collected, and log error information.
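The node-side filtering and batching can be sketched as follows. The record fields mirror the minimum contents listed above; the `"<service>/<pod>"` resource identifier format, the `pod_creation_failed` flag (the text does not say how a collector decides that a log records a Pod creation failure) and gzip-compressed JSON as the packing format are all assumptions.

```python
import gzip
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class LogRecord:
    resource_id: str           # "<service>/<pod>", an assumed format
    node_id: str
    log_type: str              # "event", "application" or "system"
    timestamp: float           # when the log was collected
    error_message: str         # log error information
    pod_creation_failed: bool  # hypothetical flag set by the collector


def pack_failure_logs(records: Iterable[LogRecord], batch_size: int = 100) -> List[bytes]:
    """Keep only Pod-creation-failure logs and gzip them into JSON batches."""
    kept = [asdict(r) for r in records if r.pod_creation_failed]
    return [
        gzip.compress(json.dumps(kept[i:i + batch_size]).encode("utf-8"))
        for i in range(0, len(kept), batch_size)
    ]
```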
S202: Aggregate the collected logs to obtain a plurality of log sets, where the logs in each log set are failure logs generated by the same error type when Pods of the same application service are created on the same node.
Each node in the Kubernetes cluster sends its packed logs to all three log processing submodules at the same time. A log processing submodule X reads a packed batch X' and decompresses it to obtain all of the logs in X'. It then reads each log Y: if the log type of Y is not the type that submodule X handles, X filters Y out without processing it. If the log type matches, X takes the resource identifier, node identifier and Pod name carried by Y as the locating identifier of Y, where the resource identifier determines the application service running on the Pod being created. X also determines the error type of Y from its log error information and attaches <locating identifier, error type> to Y as its label. Finally, X compresses and packs the processed logs and sends them to the log aggregation submodule for further processing.
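One log processing submodule can be sketched as a function over a compressed batch. The keyword rules in `classify_error` are hypothetical (the text does not say how error information maps to error types), and the batch is assumed to be gzip-compressed JSON whose resource identifiers have the form `"<service>/<pod>"`.

```python
import gzip
import json


def classify_error(message: str) -> str:
    """Hypothetical keyword rules mapping log error information to an error type."""
    text = message.lower()
    if "sandbox" in text:
        return "sandbox_creation_failure"
    if "sync" in text:
        return "pod_sync_failure"
    return "other_error"


def process_batch(batch: bytes, handled_type: str) -> bytes:
    """One log processing submodule: keep its own log type and label each log."""
    logs = json.loads(gzip.decompress(batch).decode("utf-8"))
    labelled = []
    for log in logs:
        if log["log_type"] != handled_type:
            continue  # left to the submodule responsible for that type
        pod_name = log["resource_id"].split("/", 1)[1]  # assumes "<service>/<pod>"
        locating_id = [log["resource_id"], log["node_id"], pod_name]
        log["label"] = [locating_id, classify_error(log["error_message"])]
        labelled.append(log)
    return gzip.compress(json.dumps(labelled).encode("utf-8"))
```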
After receiving the batches processed by the three log processing submodules, the log aggregation submodule decompresses them and aggregates the logs according to their resource identifier, node identifier, error type and log type to obtain a plurality of log sets. The locating identifier of each log set is <resource identifier, node identifier, error type, log type>, and the logs in each log set are failure logs generated by the same error type when Pods of the same application service are created on the same node.
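The aggregation step is a group-by on four fields. In this sketch the logs are plain dictionaries whose `error_type` has already been filled in by the processing submodules. Grouping on the service-name part of the resource identifier (assumed format `"<service>/<pod>"`) rather than on the full identifier is an interpretation, chosen so that one log set can span several Pods of the same service, which the Pod counting in S203 requires.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LogSetKey = Tuple[str, str, str, str]


def aggregate(logs: List[dict]) -> Dict[LogSetKey, List[dict]]:
    """Group labelled logs into log sets keyed by
    (application service, node identifier, error type, log type)."""
    log_sets: Dict[LogSetKey, List[dict]] = defaultdict(list)
    for log in logs:
        service = log["resource_id"].split("/", 1)[0]  # assumes "<service>/<pod>"
        key = (service, log["node_id"], log["error_type"], log["log_type"])
        log_sets[key].append(log)
    return dict(log_sets)
```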
S203: Determine the Pod creation failure rate of each node in the cluster according to the obtained log sets.
For any node, if the logs corresponding to the node fall in only one log set, a first ratio of the total number of Pods contained in that log set to the total number of Pods occupied on the node is calculated, and the corresponding Pod creation failure rate is determined based on the first ratio and a preset weight parameter; or
for any node, if the logs corresponding to the node fall in a plurality of log sets, a second ratio of the total number of Pods contained in each of those log sets to the total number of Pods occupied on the node is calculated, and the corresponding Pod creation failure rate is determined by a weighted summation of the second ratios.
The resource identifier determines not only the application service running on the Pod being created but also which Pod that service runs on; that is, it contains at least two pieces of identification information, the application service name and the Pod name. The Scorer can therefore determine the total number of Pods contained in a log set from the resource identifiers of its logs. If a log set contains two logs that were collected at different times but have identical locating identifiers, they count as one Pod, not two.
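The counting rule can be sketched by collecting locating identifiers into a set, so that repeated collections of the same Pod's log collapse to one entry. The dictionary shape and the `"<service>/<pod>"` resource identifier format are assumptions carried over for illustration.

```python
from typing import Iterable


def count_pods(log_set: Iterable[dict]) -> int:
    """Count distinct Pods in one log set.

    Logs with the same locating identifier (resource identifier, node identifier,
    Pod name) count once, however many times they were collected.
    """
    locating_ids = {
        (log["resource_id"], log["node_id"], log["resource_id"].split("/", 1)[1])
        for log in log_set
    }
    return len(locating_ids)
```

Two logs for `shop/shop-1` collected at different times plus one log for `shop/shop-2` give a count of 2.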
S204: Calculate the evaluation value of each node in the cluster based on its Pod creation failure rate.
The Scorer logarithmically acquires the failure rate of each Pod creation to obtain an initial evaluation value of each node in the cluster; and then, taking the negative number of each initial evaluation value to obtain the evaluation value of each node in the cluster. Wherein the formula for calculating the evaluation value of one node is
[Formula image: evaluation value of a node (not reproduced in the text)]
where the quantity on the left-hand side is the comprehensive value of the node. In this formula:
n denotes the total number of logs contained in each log set under the node;
N is the total number of Pods occupied on the node;
a, b, and c are the weight parameters for event-level, application-level, and system-level logs, respectively; and
X_i, Y_i, and Z_i are, respectively, the total numbers of Pods contained in the event-level, application-level, and system-level logs.
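The original formula survives only as an image. Reading steps S203 and S204 together with the variable definitions above, one plausible form is the negative logarithm of the weighted failure rate:

E = -ln( sum over i = 1..n of (a*X_i + b*Y_i + c*Z_i) / N )

This is an assumption, not the formula from the figure. The logarithm base is also not stated, so the natural logarithm is used here. The Python sketch below computes E under that reading; the function and argument names are hypothetical.

```python
import math


def evaluation_value(counts, total_pods, a, b, c):
    """Assumed reading: E = -ln(sum_i (a*X_i + b*Y_i + c*Z_i) / N).

    counts: list of (X_i, Y_i, Z_i) distinct-Pod counts from the event-,
            application- and system-level logs of log set i.
    total_pods: N, the total number of Pods occupied on the node.
    """
    failure_rate = sum(a * x + b * y + c * z for x, y, z in counts) / total_pods
    if failure_rate <= 0:
        # No recorded failures: -ln(0) diverges, so return +inf (an assumption).
        return math.inf
    return -math.log(failure_rate)


e_a = evaluation_value([(2, 0, 0), (0, 0, 1)], 10, a=0.5, b=0.3, c=0.2)
e_b = evaluation_value([(0, 0, 1)], 4, a=0.5, b=0.3, c=0.2)
```

-ln is strictly decreasing, so a lower failure rate gives a higher evaluation value. This matches the statement that the evaluation value reflects the probability of successfully creating a Pod on the node. In the example, `e_b` (failure rate 0.05) is greater than `e_a` (failure rate 0.12).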
S103: Select a node whose comprehensive value exceeds the preset threshold as the target node for the Pod to be scheduled.
In the embodiments of the present application, any node whose comprehensive value exceeds the preset threshold may be selected as the target node for the Pod to be scheduled. In a more preferred embodiment, the node with the highest comprehensive value is selected.
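The preferred selection rule keeps only the nodes above the threshold and then takes the maximum. The following Python sketch takes the comprehensive values as given inputs, because this passage does not specify how a node's priority and evaluation value are combined. Returning `None` when no node qualifies is an assumption; the text does not say what happens in that case.

```python
def select_node(composite_values, threshold):
    """Pick the node with the highest composite value among those whose
    composite value exceeds the threshold; None if no node qualifies."""
    candidates = {node: value for node, value in composite_values.items()
                  if value > threshold}
    if not candidates:
        return None  # behavior when nothing qualifies is an assumption
    return max(candidates, key=candidates.get)


scores = {"node-a": 0.7, "node-b": 0.9, "node-c": 0.4}
chosen = select_node(scores, threshold=0.5)
```

With a threshold of 0.5, `node-a` and `node-b` qualify, and `node-b` is chosen because it has the higher value.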
Referring to the schematic structural diagram shown in Fig. 3, the scheduling optimization apparatus may include a primary screening unit 301, a scheduling optimization unit 302, a determining unit 303, and a generating unit 304, where:
a primary screening unit 301 is configured to screen the nodes in the cluster according to the central processing unit (CPU) and memory resource demands of the container group (Pod) to be scheduled;
a scheduling optimization unit 302 is configured to determine a comprehensive value for each screened node according to its priority and its evaluation value, where the evaluation values are generated from the logs of failed Pod creations on each node in the cluster within a preset time period and represent the probability of successfully creating a Pod on the node; and
a determining unit 303 is configured to select a node from the nodes whose comprehensive value exceeds a preset threshold as the target node for the Pod to be scheduled.
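The primary screening step is a resource-fit filter. It is similar in spirit to the filtering that the Kubernetes default scheduler performs before scoring. The following Python sketch is a minimal illustration. It assumes each node exposes its free CPU (in millicores) and free memory (in MiB). `NodeResources` and `prescreen` are hypothetical names, not part of any real scheduler API.

```python
from dataclasses import dataclass


@dataclass
class NodeResources:
    """Hypothetical view of a node's free capacity."""
    name: str
    free_cpu_millicores: int
    free_memory_mib: int


def prescreen(nodes, cpu_request_millicores, memory_request_mib):
    """Keep only nodes that can satisfy the Pod's CPU and memory requests."""
    return [
        node for node in nodes
        if node.free_cpu_millicores >= cpu_request_millicores
        and node.free_memory_mib >= memory_request_mib
    ]


nodes = [
    NodeResources("node-a", 2000, 4096),
    NodeResources("node-b", 500, 8192),   # not enough CPU
    NodeResources("node-c", 1500, 1024),  # not enough memory
]
screened = prescreen(nodes, cpu_request_millicores=1000, memory_request_mib=2048)
```

Only the nodes that pass this filter go on to receive a comprehensive value.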
Optionally, the apparatus further includes a generating unit 304, and the evaluation values are generated as follows:
collecting the logs of failed Pod creations on each node in the cluster within a preset time period;
aggregating the collected logs to obtain a plurality of log sets, where the logs in each log set are failure logs produced by the same error type when Pods are created on the same node for the same application service;
determining the Pod creation failure rate of each node in the cluster from the obtained log sets; and
calculating the evaluation value of each node in the cluster based on its Pod creation failure rate.
Optionally, the generating unit 304 is further configured to:
determine the error type of each log based on its log error information;
where aggregating the logs to obtain a plurality of log sets includes:
aggregating the logs according to their resource identifiers, node identifiers, error types, and log types to obtain the plurality of log sets;
where the resource identifier of a log is used to determine the application service running on the created Pod.
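This passage does not say how an error type is derived from the log error information. One simple option is substring matching against a pattern table. The following Python sketch takes that approach. Strings such as `ImagePullBackOff`, `ErrImagePull`, `OOMKilled`, `CrashLoopBackOff` and `FailedMount` are reasons that Kubernetes commonly reports. However, the pattern table, the category names and `classify_error` are hypothetical and only illustrate the idea.

```python
# Hypothetical mapping from substrings of the log error information to
# error types; the categories and patterns are illustrative only.
ERROR_PATTERNS = [
    ("ImagePullBackOff", "image_pull"),
    ("ErrImagePull", "image_pull"),
    ("OOMKilled", "out_of_memory"),
    ("CrashLoopBackOff", "crash_loop"),
    ("FailedMount", "volume_mount"),
]


def classify_error(error_info, default="unknown"):
    """Return the first error type whose pattern occurs in error_info."""
    for pattern, error_type in ERROR_PATTERNS:
        if pattern in error_info:
            return error_type
    return default


kind = classify_error('Back-off pulling image "app:1.2": ImagePullBackOff')
```

Ordering the table matters when a message matches several patterns, because the first match wins. A production classifier would likely need a richer rule set or structured event reasons.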
Optionally, the generating unit 304 is configured to:
for any node whose logs fall into only one log set, calculate a first ratio between the total number of Pods contained in that log set and the total number of Pods occupied on the node, and determine the node's Pod creation failure rate from the first ratio and a preset weight parameter; or
for any node whose logs fall into several log sets, calculate, for each of those log sets, a second ratio between the total number of Pods contained in the log set and the total number of Pods occupied on the node, then perform a weighted summation over the second ratios to determine the node's Pod creation failure rate.
Optionally, the generating unit 304 is configured to:
take the logarithm of each node's Pod creation failure rate to obtain the initial evaluation value of each node in the cluster; and
take the negative of each initial evaluation value to obtain the evaluation value of each node in the cluster.
In some possible implementations, the embodiments of the present application further provide an electronic device. Referring to Fig. 4, the device may include at least one processor 401 and at least one memory 402. The memory 402 stores program code that, when executed by the processor 401, causes the processor 401 to perform the steps of the scheduling optimization method according to the various exemplary embodiments of the present application described above in this specification. For example, the processor 401 may perform the steps shown in Fig. 1b.
In some possible embodiments, aspects of the scheduling optimization method provided in the present application may also be implemented as a program product that includes program code. When the program product runs on an electronic device, the program code causes the electronic device to perform the steps of the scheduling optimization method according to the various exemplary embodiments of the present application described above in this specification. For example, a computer device may perform the steps shown in Fig. 1b.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for scheduling optimization in the embodiments of the present application may use a portable compact disc read-only memory (CD-ROM) containing program code and may run on a computing device. However, the program product of the present application is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal carrying readable program code, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out the operations of the present application may be written in any combination of one or more programming languages. These include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user computing device, partly on the user computing device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN). Alternatively, it may be connected to an external computing device, for example through the Internet using an Internet service provider.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. A method for scheduling optimization, comprising:
screening the nodes in a cluster according to the central processing unit (CPU) and memory resource demands of a container group (Pod) to be scheduled;
determining a comprehensive value of each screened node according to the priority and the evaluation value of the screened node, wherein the evaluation values are generated based on logs of failed Pod creations on each node in the cluster within a preset time period, and the evaluation values represent the probability of successfully creating a Pod on the nodes; and
selecting a node whose comprehensive value exceeds a preset threshold as the target node for the Pod to be scheduled.
2. The method of claim 1, wherein the evaluation value is generated by:
collecting logs of failed Pod creations on each node in the cluster within a preset time period;
aggregating the collected logs to obtain a plurality of log sets, wherein the logs in each log set are failure logs produced by the same error type when Pods are created on the same node for the same application service;
determining the Pod creation failure rate of each node in the cluster from the obtained log sets; and
calculating the evaluation value of each node in the cluster based on its Pod creation failure rate.
3. The method of claim 2, wherein, after collecting the logs of failed Pod creations on the nodes in the cluster within the preset time period and before aggregating the logs to obtain the plurality of log sets, the method further comprises:
determining the error type of each log based on its log error information;
wherein aggregating the logs to obtain the plurality of log sets comprises:
aggregating the logs according to their resource identifiers, node identifiers, error types, and log types to obtain the plurality of log sets;
wherein the resource identifier of a log is used to determine the application service running on the created Pod.
4. The method of claim 2, wherein determining the Pod creation failure rate of each node in the cluster from the log sets comprises:
for any node whose logs fall into only one log set, calculating a first ratio between the total number of Pods contained in that log set and the total number of Pods occupied on the node, and determining the node's Pod creation failure rate from the first ratio and a preset weight parameter; or
for any node whose logs fall into a plurality of log sets, calculating, for each of the plurality of log sets, a second ratio between the total number of Pods contained in the log set and the total number of Pods occupied on the node, and performing a weighted summation over the second ratios to determine the node's Pod creation failure rate.
5. The method of claim 2, wherein calculating the evaluation value of each node in the cluster based on its Pod creation failure rate comprises:
taking the logarithm of each node's Pod creation failure rate to obtain the initial evaluation value of each node in the cluster; and
taking the negative of each initial evaluation value to obtain the evaluation value of each node in the cluster.
6. A scheduling optimization apparatus, comprising:
a primary screening unit, configured to screen the nodes in a cluster according to the central processing unit (CPU) and memory resource demands of a container group (Pod) to be scheduled;
a scheduling optimization unit, configured to determine a comprehensive value of each screened node according to the priority and the evaluation value of the screened node, wherein the evaluation values are generated based on logs of failed Pod creations on each node in the cluster within a preset time period, and the evaluation values represent the probability of successfully creating a Pod on the nodes; and
a determining unit, configured to select a node whose comprehensive value exceeds a preset threshold as the target node for the Pod to be scheduled.
7. The apparatus of claim 6, further comprising a generation unit, wherein the evaluation value is generated by:
collecting logs of failed Pod creations on each node in the cluster within a preset time period;
aggregating the collected logs to obtain a plurality of log sets, wherein the logs in each log set are failure logs produced by the same error type when Pods are created on the same node for the same application service;
determining the Pod creation failure rate of each node in the cluster from the obtained log sets; and
calculating the evaluation value of each node in the cluster based on its Pod creation failure rate.
8. The apparatus of claim 7, wherein the generating unit is further configured to:
determine the error type of each log based on its log error information;
wherein aggregating the logs to obtain the plurality of log sets comprises:
aggregating the logs according to their resource identifiers, node identifiers, error types, and log types to obtain the plurality of log sets;
wherein the resource identifier of a log is used to determine the application service running on the created Pod.
9. The apparatus of claim 7, wherein the generating unit is configured to:
for any node whose logs fall into only one log set, calculate a first ratio between the total number of Pods contained in that log set and the total number of Pods occupied on the node, and determine the node's Pod creation failure rate from the first ratio and a preset weight parameter; or
for any node whose logs fall into a plurality of log sets, calculate, for each of the plurality of log sets, a second ratio between the total number of Pods contained in the log set and the total number of Pods occupied on the node, and perform a weighted summation over the second ratios to determine the node's Pod creation failure rate.
10. The apparatus of claim 7, wherein the generating unit is configured to:
take the logarithm of each node's Pod creation failure rate to obtain the initial evaluation value of each node in the cluster; and
take the negative of each initial evaluation value to obtain the evaluation value of each node in the cluster.
11. An electronic device, comprising a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 5.
12. A computer-readable storage medium, comprising program code that, when run on an electronic device, causes the electronic device to perform the steps of the method according to any one of claims 1 to 5.
CN202011496875.7A 2020-12-17 Scheduling optimization method and device Active CN112527486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011496875.7A CN112527486B (en) 2020-12-17 Scheduling optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011496875.7A CN112527486B (en) 2020-12-17 Scheduling optimization method and device

Publications (2)

Publication Number Publication Date
CN112527486A true CN112527486A (en) 2021-03-19
CN112527486B CN112527486B (en) 2024-07-26

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150378775A1 (en) * 2014-06-26 2015-12-31 Amazon Technologies, Inc. Log-based transaction constraint management
CN107590008A (en) * 2017-08-02 2018-01-16 中国科学院计算技术研究所 Method and system for judging distributed cluster reliability by weighted entropy
CN109361750A (en) * 2018-10-24 2019-02-19 上海精数信息科技有限公司 Resource allocation methods, device, electronic equipment, storage medium
CN111143059A (en) * 2019-12-17 2020-05-12 天津大学 Improved Kubernetes resource scheduling method
CN111190875A (en) * 2019-12-27 2020-05-22 航天信息股份有限公司 Log aggregation method and device based on container platform
CN111221631A (en) * 2018-11-23 2020-06-02 ***通信集团有限公司 Task scheduling method, device and storage medium
CN111857990A (en) * 2020-06-23 2020-10-30 苏州浪潮智能科技有限公司 Method and system for enhancing YARN long type service scheduling
CN111966500A (en) * 2020-09-07 2020-11-20 网易(杭州)网络有限公司 Resource scheduling method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297568A (en) * 2021-06-04 2021-08-24 国网汇通金财(北京)信息科技有限公司 Sandbox-based data processing method and system
CN113297568B (en) * 2021-06-04 2024-04-30 国网汇通金财(北京)信息科技有限公司 Data processing method and system based on sandboxes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant