CN111431748B - Method, system and device for automatically operating and maintaining cluster - Google Patents

Info

Publication number
CN111431748B
CN111431748B
Authority
CN
China
Prior art keywords
cluster
server
target file
parameters
file
Prior art date
Legal status
Active
Application number
CN202010202542.2A
Other languages
Chinese (zh)
Other versions
CN111431748A (en)
Inventor
乔彦辉
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010202542.2A
Publication of CN111431748A
Application granted
Publication of CN111431748B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0893: Assignment of logical groups to network elements
    • H04L 41/0876: Aspects of the degree of configuration automation
    • H04L 41/14: Network analysis or design
    • H04L 41/142: Network analysis or design using statistical or mathematical methods
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The embodiments of this specification disclose a method for automatically operating and maintaining a cluster. The method comprises the following steps: characteristic parameters of at least one target file are respectively obtained; a cluster is determined based on the characteristic parameters, the cluster including at least one server for supporting the target file; the at least one target file is distributed to the at least one server; operating parameters of the cluster are obtained while the target file runs; and the cluster is dynamically adjusted based on the operating parameters, the dynamic adjustment including at least expanding or contracting the cluster. The method disclosed in the embodiments of this specification addresses cluster sizing when a target file is first deployed and performs full-life-cycle operation and maintenance of the target file, thereby improving the automated operation and maintenance capability of the cluster.

Description

Method, system and device for automatically operating and maintaining cluster
Technical Field
The present disclosure relates to the field of computers, and in particular, to a method, a system, and an apparatus for performing automatic operation and maintenance on a cluster.
Background
With the development of internet technology, online services have become increasingly popular. Some online services, such as user-preference prediction, require a cluster of multiple servers to run one or more files (e.g., prediction model files) simultaneously. In practice, a cluster may for various reasons fail to meet service requirements (e.g., the number or configuration of its servers cannot keep the files running smoothly); in this case, new servers must be added to the existing cluster, i.e., the cluster is expanded. It may also happen that cluster access pressure is low (e.g., some servers are idle), in which case it is desirable to reduce the number of serving servers in the cluster, i.e., to shrink the cluster. At present, service deployment, capacity expansion, and capacity reduction in a cluster are generally completed manually, which is cumbersome and makes real-time, rapid, automated operation and maintenance of the cluster difficult to achieve.
Therefore, it is desirable to provide a method for quickly and reliably automating operation and maintenance of a cluster.
Disclosure of Invention
One aspect of the embodiments of this specification provides a method for automatically operating and maintaining a cluster, the cluster being used to support the running of one or more files. The method may comprise the following steps: characteristic parameters of at least one target file are respectively obtained; a cluster is determined based on the characteristic parameters, the cluster including at least one server for supporting the target file; the at least one target file is distributed to the at least one server; operating parameters of the cluster are obtained while the target file runs; and the cluster is dynamically adjusted based on the operating parameters, the dynamic adjustment including at least expanding or contracting the cluster.
Another aspect of the embodiments of this specification provides a system for automatically operating and maintaining a cluster. The system includes: a first obtaining module, configured to respectively obtain characteristic parameters of at least one target file; a first determining module, configured to determine a cluster based on the characteristic parameters, the cluster including at least one server for supporting the target file; an allocation module, configured to allocate the at least one target file to the at least one server; a second obtaining module, configured to obtain operating parameters of the cluster while the target file runs; and a first adjusting module, configured to dynamically adjust the cluster based on the operating parameters, the dynamic adjustment including at least expanding or contracting the cluster.
One aspect of the embodiments of this specification provides a service automation operation and maintenance method. The method includes: obtaining characteristic parameters related to a service to be deployed, the service including at least an online inference service, and the characteristic parameters including at least characteristic parameters of a file used to support the service; determining a cluster based on the characteristic parameters and deploying the service on the cluster, the cluster including at least one server for deploying the service; obtaining operating parameters of the cluster during execution of the deployed service; and dynamically adjusting the cluster on which the service is deployed based on the operating parameters of the cluster, the dynamic adjustment including at least expanding or contracting the servers in the cluster.
Another aspect of the embodiments of this specification provides a service automation operation and maintenance system. The system includes: a third obtaining module, configured to obtain characteristic parameters related to a service to be deployed, the service including at least an online inference service, and the characteristic parameters including at least characteristic parameters of a file used to support the service; a second determining module, configured to determine a cluster based on the characteristic parameters and deploy the service on the cluster, the cluster including at least one server for deploying the service; a fourth obtaining module, configured to obtain operating parameters of the cluster during execution of the deployed service; and a second adjusting module, configured to dynamically adjust, based on the operating parameters of the cluster, the cluster on which the service is deployed, the dynamic adjustment including at least expanding or contracting the servers in the cluster.
Another aspect of the embodiments of the present specification provides an apparatus for automatically operating and maintaining a cluster, which includes a processor, where the processor is configured to execute a method for automatically operating and maintaining a cluster or a service automation operation and maintenance method.
Another aspect of an embodiment of the present specification provides a service automation operation and maintenance device, including a processor, configured to execute a service automation operation and maintenance method.
Drawings
The present description is further explained through exemplary embodiments, which are described in detail with reference to the accompanying drawings. These embodiments are not limiting; in these embodiments, like numerals indicate like structures, wherein:
FIG. 1 is an exemplary flow diagram of a method for automatic operation and maintenance of a cluster, according to some embodiments herein;
FIG. 2 is an exemplary flow diagram illustrating determining a cluster according to some embodiments of the present description;
FIG. 3 is an exemplary flow diagram illustrating dynamic adjustment of a cluster according to some embodiments of the present description;
FIG. 4 is an exemplary flow diagram illustrating dynamic adjustment of a cluster according to some embodiments of the present description;
FIG. 5 is a block diagram of a system for automated operations and maintenance of a cluster in accordance with certain embodiments of the present description;
FIG. 6 is an exemplary flow diagram of a service automation operation and maintenance method according to some embodiments described herein;
FIG. 7 is a block diagram of a service automation operation and maintenance system in accordance with certain embodiments of the present description;
FIG. 8 is a schematic diagram of exemplary cluster determination according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations need not be performed in the exact order shown. Rather, various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or one or more steps may be removed from them.
The embodiments of this specification may be applied to the operation and maintenance of a cluster, including the deployment and release of online services, the expansion and contraction of the cluster, and the like. A cluster is a group of mutually independent computers interconnected by a high-speed network. When running files for some purpose (for example, providing an online service), the cluster operates like a single server and can be managed in a single-system mode. Automated operation and maintenance turns routine static operation and maintenance tasks, such as file allocation and recovery, cluster expansion and contraction, and server restarts, into automated, autonomous decision making, thereby improving the quality and efficiency of operation and maintenance and reducing its cost.
In the related art, automated operation and maintenance of a cluster focuses on what happens after file allocation (for example, after a service goes online), that is, on expanding and contracting the cluster during operation once file allocation is complete. The initial allocation of files is not considered; for example, how large a cluster needs to be when files are allocated is ignored or only partially considered. Because evaluating the required cluster size involves a large workload and considerable difficulty, the cluster size is generally specified by staff. As a result, after files are allocated, the cluster may fail to meet the files' operating requirements and need to be expanded, or cluster access pressure may be low and the cluster may need to be contracted.
The embodiments of this specification provide a method for automatically operating and maintaining a cluster. By predicting the required cluster size at file-allocation time, the method avoids the subsequent expansion or contraction caused by incomplete consideration during initial file allocation; by obtaining the operating parameters of the cluster while files run, it dynamically adjusts the cluster. This improves the quality and efficiency of operation and maintenance and effectively reduces its cost. The embodiments of this specification also provide a full-life-cycle automated operation and maintenance method for a service, which automatically maintains the cluster supporting the service during its initial deployment, online operation, and offline phases. The technical solutions disclosed in this specification are explained in detail through the description of the drawings below.
Fig. 1 is an exemplary flow diagram of a method for automatically operating and maintaining a cluster according to some embodiments of the present disclosure. In some embodiments, flow 100 may be performed by a processing device. For example, the process 100 may be stored in a storage device (such as an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 100. For another example, the process 100 may be implemented by the cluster operation and maintenance system 500 on a processing device. As shown in fig. 1, the process 100 may include the following steps:
step 102, respectively obtaining characteristic parameters of at least one target file. Step 102 may be performed by the first obtaining module 510.
In some embodiments, a cluster may refer to a collection of one or more servers (or containers) that support the running of one or more files, e.g., on a single server or on a combination of servers in the cluster. In this specification, "container" is used in its general computing sense, and the terms container and server may be used interchangeably. A target file may refer to a model file, system program, application, process, or routine that needs to be distributed to the servers of the cluster and run there. Different target files require different server configurations (or parameters) at runtime. For example, a larger target file (e.g., 10G) requires a different amount of server memory and CPU than a smaller one (e.g., 100M), and reasonably distributing different target files to matched servers helps maximize resource utilization. In some embodiments, the target file may be a model file, for example, various models for prediction or classification that operate individually or in concert to predict or classify the objects corresponding to the input data, such as predicting user preferences. A characteristic parameter of the target file is a parameter or datum that describes or characterizes it, such as file size, file type, or supporting framework. When the target file is a model file, the characteristic parameters may include the size of the model file, the algorithm type, the machine learning framework, and the like. For example, the model file size may be 10 MB, 100 MB, or 1 GB; the algorithm type may be logistic regression (LR), support vector machine (SVM), deep neural network (DNN), convolutional neural network (CNN), etc.; and the machine learning framework may be TensorFlow, scikit-learn, etc.
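As an illustration, the characteristic parameters of a model-type target file described above can be represented as a simple record; the field names here are assumptions for illustration, not identifiers from this specification.

```python
# Hypothetical container for the characteristic parameters of a model file:
# size, algorithm type, and machine learning framework, as described above.
from dataclasses import dataclass

@dataclass
class ModelFileFeatures:
    size_mb: float   # model file size expressed in megabytes, e.g. 10, 100, 1024
    algorithm: str   # e.g. "LR", "SVM", "DNN", "CNN"
    framework: str   # e.g. "tensorflow", "scikit-learn"

features = ModelFileFeatures(size_mb=100, algorithm="LR", framework="tensorflow")
```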
In some embodiments, the target files may be one or more, and the first obtaining module 510 may obtain the characteristic parameters of the target files respectively by reading stored data, calling related interfaces, reading from the target files, or other means.
Step 104, determining a cluster based on the characteristic parameters, wherein the cluster comprises at least one server for supporting the target file. Step 104 may be performed by the first determination module 520.
In some embodiments, determining the cluster may refer to determining the number of servers (or containers) used to support the target file and the configuration of each server (or container), such as CPU and memory size. When running target files such as model files, one server supports only one model file, or model files of a single type (i.e., sharing the same machine learning framework), because the machine learning frameworks differ. Meanwhile, to use resources reasonably, model files of different sizes may be distributed to servers with different configurations; for example, a 10M model file requires a different server configuration than a 1G model file. Reasonable allocation avoids wasting resources.
In some embodiments, the first determination module 520 may determine the cluster from the characteristic parameters using a rule. Determining clusters with rules is a matching process: given a pre-established mapping between server configurations and file-size ranges, the first determination module 520 determines the configuration of the server supporting the target file by finding the range into which the target file's size falls. As an example, assume the rule is: files of 1K to 10M are allocated to a 100M server, files of 10M to 100M to a 1G server, files of 100M to 1G to a 5G server, and files of 1G to 10G to a 10G server. Under this rule, three target files of 1M, 50M, and 2G are allocated to servers of 100M, 1G, and 10G, respectively.
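The size-range rule above can be sketched as follows; the byte thresholds mirror the example rule, while the table and function names are illustrative assumptions.

```python
# Rule-based matching: a target file's size is mapped to a server memory
# configuration by finding which pre-established range it falls into.
SIZE_RULES = [
    (1 * 1024, 10 * 1024**2, "100M"),       # 1K  - 10M  -> 100M server
    (10 * 1024**2, 100 * 1024**2, "1G"),    # 10M - 100M -> 1G server
    (100 * 1024**2, 1 * 1024**3, "5G"),     # 100M - 1G  -> 5G server
    (1 * 1024**3, 10 * 1024**3, "10G"),     # 1G  - 10G  -> 10G server
]

def match_server_config(file_size_bytes: int) -> str:
    """Return the server memory configuration whose range contains the file size."""
    for low, high, config in SIZE_RULES:
        if low <= file_size_bytes < high:
            return config
    raise ValueError("file size outside all configured ranges")
```

With this sketch, the three example files of 1M, 50M, and 2G match the 100M, 1G, and 10G configurations, respectively.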
In some embodiments, the first determination module 520 may use an estimation model to predict, from the characteristic parameters of the target file, the number and/or configuration of servers needed to deploy it in the cluster. The first determining module 520 may input the characteristic parameters of the target file into the estimation model and directly obtain the server adapted to it. For example, taking a model file whose size is 100 MB, whose algorithm type is LR, and whose machine learning framework is TensorFlow, the first determining module 520 may input these characteristic parameters into the estimation model and directly obtain a 1G server adapted to the model file. The estimation model can be trained on existing file-server matching data. For example, for each historical model file, even if a server was initially allocated at random, the optimal adapted server can be reached through intervention and adjustment during subsequent operation. This yields matching data, i.e., which server best adapts a model file of a given size, algorithm type, and machine learning framework. The estimation model can be trained on these data and, once trained, used to predict the cluster.
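As a hedged sketch of the estimation-model idea, the stand-in below predicts the adapted server by looking up the closest record in historical file-server matching data; a production system would instead train a machine learning model on such data, and all records and names here are invented for illustration.

```python
# Historical matching data: (model size in MB, algorithm, framework) -> server.
HISTORY = [
    ((10, "LR", "scikit-learn"), "100M"),
    ((100, "LR", "tensorflow"), "1G"),
    ((500, "DNN", "tensorflow"), "5G"),
    ((2048, "CNN", "tensorflow"), "10G"),
]

def estimate_server(size_mb: float, algorithm: str, framework: str) -> str:
    """Return the server of the closest historical record; exact matches on
    algorithm and framework are preferred over size proximity."""
    def distance(record):
        (r_size, r_algo, r_fw), _server = record
        mismatches = (r_algo != algorithm) + (r_fw != framework)
        return (mismatches, abs(r_size - size_mb))
    return min(HISTORY, key=distance)[1]

# A 100 MB LR model on TensorFlow maps to a 1G server, as in the example above.
print(estimate_server(100, "LR", "tensorflow"))  # -> 1G
```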
For more description on determining clusters based on feature parameters, reference may be made to the description elsewhere in this specification, for example, fig. 2 and the related description thereof, which are not repeated here.
Step 106, distributing the at least one target file to the at least one server. Step 106 may be performed by the assignment module 530.
In some embodiments, the distribution module 530 may package the target file and send it to a server in the cluster adapted to it for deployment, thereby completing the distribution. It will be appreciated that one server may be assigned a single target file or several target files; for example, two or more model files based on the same machine learning framework may be distributed to one server. Conversely, the same target file may be supported by two or more servers.
And 108, acquiring the operation parameters of the cluster in the operation process of the target file. Step 108 may be performed by the second obtaining module 540.
In some embodiments, the operating parameters of the cluster may refer to the real-time performance parameters of its servers while they run the target file, including CPU utilization, memory utilization, disk utilization, the number of server connections, the server's queries per second (QPS), the number of scheduling requests for the target file, and the like.
In some embodiments, the second obtaining module 540 may obtain the operating parameters of the cluster's servers during the running of the target file by monitoring the cluster in real time. For example, by monitoring the operating parameters, the second obtaining module 540 may learn that the current cluster's CPU utilization is 20%, its memory utilization is 50%, the number of scheduling requests for the target file is 10000, and so on.
And step 110, dynamically adjusting the cluster based on the operation parameters, wherein the dynamically adjusting at least comprises expanding or contracting the cluster. Step 110 may be performed by the first adjustment module 550.
In some embodiments, the operating parameters of the cluster reflect its running state in real time. From them it can be determined, in real time, whether the cluster's current servers can meet the target file's operating requirements and whether the load on the servers supporting the target file exceeds the set operating load, so that the cluster can be expanded or contracted according to the result. Expansion means increasing the number of servers supporting the target file in the cluster and/or increasing the memory allocated to running it; contraction means the opposite. For example, expansion may change from one server to two servers supporting the target file, or from allocating 1G of memory to allocating 2G of memory to run it.
In some embodiments, dynamically adjusting a cluster may refer to automatically increasing or decreasing its number of servers based on its operating parameters. For example, the first adjusting module 550 may determine whether the servers supporting the target file meet its operating requirements according to any one or any combination of the cluster's CPU utilization, memory utilization, disk utilization, number of server connections, and number of scheduling requests for the target file; if not, the cluster is expanded. For instance, if the current number of scheduling requests for the target file is 5000 but the servers currently supporting it can satisfy only 4000, the number of servers must be increased to serve the remaining requests. Judging the cluster's running state in real time from its operating parameters enables automatic, rapid expansion and contraction as that state changes, improving the efficiency of cluster operation and maintenance.
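The scale-out arithmetic in the example above (5000 scheduling requests against a per-server capacity of 4000) can be sketched as a simple capacity check; the function names and capacity figure are illustrative assumptions.

```python
import math

def servers_needed(request_count: int, per_server_capacity: int) -> int:
    """Minimum number of servers required to serve all scheduling requests."""
    return math.ceil(request_count / per_server_capacity)

def scale_delta(request_count: int, per_server_capacity: int,
                current_servers: int) -> int:
    """Positive -> expand by this many servers; negative -> shrink; 0 -> no change."""
    return servers_needed(request_count, per_server_capacity) - current_servers

# 5000 requests, one server handling 4000 each: one extra server is needed.
print(scale_delta(5000, 4000, 1))  # -> 1
```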
Referring to fig. 8, fig. 8 is a schematic diagram illustrating determining clusters and dynamically adjusting clusters based on feature parameters of a target file according to some embodiments of the present description. As shown in FIG. 8, the first determining module 520 may determine clusters 820 (including the server to which each target file corresponds and its size) based on the characteristic parameters 810 of the target files. And the first adjustment module 550 may dynamically adjust the cluster 820 based on the operating parameters of the cluster 820 to obtain an adjusted cluster 830 (e.g., a change in server size).
In some embodiments, dynamically adjusting the cluster may further include performing reclamation processing on the target file. Further description of dynamic adjustment of clusters based on operational parameters may be found elsewhere in this specification, e.g., fig. 3 and 4 and their associated description.
It will be appreciated that running the target file serves a purpose; for example, if the target file is a classification model file, it is run to classify data. If a file has been allocated to a server but receives no calls for a period of time (i.e., the target file does not need to run), the server sits idle and resources are wasted. Accordingly, the first adjustment module 550 may decide, according to the number of scheduling requests for the target file, whether to recycle it and release the allocated server. For example, if the number of scheduling requests for the target file within a period of time is less than a predetermined number, the first adjustment module 550 may recycle the target file.
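A minimal sketch of the recycling check described above, assuming a sliding observation window and a request-count threshold; the class and parameter names are invented for illustration.

```python
from collections import deque

class ReclaimMonitor:
    """Tracks scheduling requests for one target file and flags it for
    recycling when too few requests arrive within the observation window."""

    def __init__(self, min_requests: int, window_seconds: float):
        self.min_requests = min_requests
        self.window = window_seconds
        self.timestamps = deque()

    def record_request(self, now: float) -> None:
        self.timestamps.append(now)

    def should_reclaim(self, now: float) -> bool:
        # Drop requests outside the observation window, then compare the count.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) < self.min_requests
```

A file that falls below the threshold would be recycled and its server released back to the pool.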
It should be noted that the above description of the flow 100 is for illustration only and does not limit the scope of this specification. Various modifications and alterations to the process 100 will occur to those skilled in the art in light of this description and remain within its scope; for example, other steps, such as a preprocessing step or a storing step, may be added between the steps.
FIG. 2 is an exemplary flow diagram illustrating determining a cluster according to some embodiments of the present description. In some embodiments, flow 200 may be performed by a processing device. For example, the process 200 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 200. As shown in fig. 2, the process 200 may include the following operations.
At step 202, an estimation model is obtained. Step 202 may be performed by the first acquisition module 510.
In some embodiments, the estimation model may be a trained machine learning model for estimating clusters required for deploying the target file according to the characteristic parameters, for example, estimating the memory size of a single server for supporting the target file.
As an example, the estimation model may be trained in the following manner:
An existing file is deployed in the cluster. After deployment, scheduling requests can be simulated to stress-test the deployed file, yielding the amount of memory provided by the server, the central processing unit occupancy, the achievable query rate per second, the characteristic parameters of the file, and the like. The model is then trained with the obtained data as training sample data to obtain the estimation model. For example, taking a file that provides a model prediction service, the stress test yields the characteristic parameters of the file (including the size of the model file, the algorithm type, and the machine learning framework), the memory size of a single server actually required to deploy the file, the operating parameters of the cluster while the file runs (for example, central processing unit utilization, memory utilization, the achievable query rate per second, and the like), and the query rate per second a single server can support after the file is deployed. The estimation model is obtained by training on this known data, where part of the known training data (for example, the size of the model file, the algorithm type, and the machine learning framework) serves as the input of the machine learning model, and another part (for example, the memory size of a single server and the supported query rate per second) serves as the output of the machine learning model. The trained machine learning model may then be used to estimate the memory size of a single server supporting the target file.
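As a minimal sketch of this idea (not the specification's actual model), the trained estimator can be stood in for by a nearest-neighbor lookup over stress-test records; all field names and sample values below are illustrative assumptions:

```python
# Hypothetical stress-test records: characteristic parameters of a
# deployed file mapped to its measured per-server capacity.
TRAINING_DATA = [
    # (file_size_mb, algorithm, framework): (memory_mb, qps_per_server)
    ((10, "LR", "tensorflow"), (500, 100)),
    ((20, "LR", "tensorflow"), (900, 80)),
    ((10, "GBDT", "pytorch"), (600, 90)),
]

def estimate(file_size_mb, algorithm, framework):
    """Stand-in for the trained estimation model: return the per-server
    memory size and query rate per second of the closest known sample."""
    def distance(features):
        size, algo, fw = features
        # Weight categorical mismatches heavily relative to size.
        return (abs(size - file_size_mb)
                + (algo != algorithm) * 50
                + (fw != framework) * 50)
    _, target = min(TRAINING_DATA, key=lambda rec: distance(rec[0]))
    return target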
In some embodiments, the estimation model may be stored in a storage device (e.g., an on-board storage unit of the processing device or an external storage device), and the first obtaining module 510 may read the estimation model by communicating with the storage device. The first obtaining module 510 may also obtain the estimation model by calling an associated interface or other means.
Step 204, inputting the characteristic parameters into the estimation model, and determining the cluster. Step 204 may be performed by the first determination module 550.
In some embodiments, after the characteristic parameters of the target file are input to the estimation model, the estimation model may output a result for determining the cluster, and then a desired cluster may be determined according to the output result of the estimation model.
In some embodiments, the input of the estimation model is the characteristic parameters of the target file, and its output is the memory size of a single server required to deploy the target file and the query rate per second that a single server can achieve. The cluster may then be determined based on the memory size, the query rate per second, and the expected query rate per second, where the expected query rate per second is the query rate the user expects the deployed target file to support at runtime.
Specifically, dividing the expected query rate per second by the query rate per second that a single server can achieve gives the number of servers needed, which determines the cluster. For example, suppose the expected query rate per second is 1000 and, among the characteristic parameters of the target file, the model size is 10 MB, the algorithm type is LR, and the machine learning framework is tensorflow. After these characteristic parameters are input into the estimation model, the model outputs a memory size of 500 MB and an achievable query rate per second of 100 for a single server. The determined cluster is then 1000/100 = 10 servers, each with a memory size of 500 MB; that is, 10 servers with 500 MB of memory are required.
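The sizing arithmetic in this example can be sketched as follows (the function name and its parameters are assumptions for illustration, not part of the specification):

```python
import math

def determine_cluster(expected_qps, per_server_qps, per_server_memory_mb):
    """Derive the cluster from the estimation model's output: the number
    of servers is the expected query rate per second divided by the rate
    a single server can achieve, rounded up; each server is configured
    with the estimated memory size."""
    if per_server_qps <= 0:
        raise ValueError("per-server QPS must be positive")
    num_servers = math.ceil(expected_qps / per_server_qps)
    return num_servers, per_server_memory_mb

# Example from the text: expected QPS 1000, 100 QPS per server, 500 MB
# per server -> 10 servers, each with 500 MB of memory.
```

Rounding up ensures the cluster always meets or exceeds the expected query rate, at the cost of slight over-provisioning.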
It should be noted that the above description of the flow 200 is for illustration and description only and does not limit the scope of application of the present specification. Various modifications and changes to flow 200 will be apparent to those skilled in the art in light of this description; however, such modifications and variations remain within the scope of the present specification. For example, other steps, such as a preprocessing step and a storing step, may be added between the steps described above.
FIG. 3 is an exemplary flow diagram illustrating dynamic adjustment of a cluster according to some embodiments of the present description. In some embodiments, flow 300 may be performed by a processing device. For example, the process 300 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 300. In some embodiments, the flow 300 may be performed by a first adjustment module 550 located on a processing device. As shown in fig. 3, the process 300 may include the following operations.
Step 302, a first decision rule is obtained.
In some embodiments, the operating parameters of the cluster include server resource utilization, which may refer to central processing unit utilization, memory utilization, disk utilization, and the like. The first determination rule may specify, for the servers in the cluster that support the target file, a normal operating range, a lower preset threshold, and an upper preset threshold for each resource utilization. For example, the normal operating range of the server's central processing unit utilization may be 20%-40%, with a lower preset threshold of 20% and an upper preset threshold of 40%. As another example, the normal operating range of the server's memory utilization may be 10%-60%, with a lower preset threshold of 30% and an upper preset threshold of 60%. It should be understood that the above operating ranges and preset thresholds are merely examples and are not intended to limit the scope.
In some embodiments, whether to obtain the first determination rule may be decided by monitoring an operating parameter of the cluster. For example, when the cluster's query rate per second is observed to increase or decrease beyond a certain ratio (say, an increase of 20 percent), the first determination rule may be obtained to further determine whether the cluster requires dynamic adjustment.
In some embodiments, the first decision rule may be obtained by reading stored data, invoking an associated interface, or otherwise.
Step 304, determining whether the server resource utilization rate matches the first determination rule.
In some embodiments, matching the server resource utilization against the first determination rule may mean determining whether any of the server's resource utilizations is above the upper preset threshold in the first determination rule, or whether all of them are below the lower preset threshold, and if so, dynamically adjusting the cluster. For example, suppose monitoring shows that the current server's central processing unit utilization is 50% and its memory utilization is 50%. Under the first determination rule, the normal operating range of the central processing unit is 20%-40% and that of memory utilization is 30%-60%; after the determination, the central processing unit utilization is above the upper preset threshold while the memory utilization is within the normal range, so the server resource utilization matches the rule of the first determination rule for exceeding the upper preset threshold. For the server resource utilization to match the lower preset threshold of the first determination rule, every utilization it includes (central processing unit utilization, memory utilization, disk utilization, and so on) must be below its corresponding lower preset threshold. For example, suppose the central processing unit utilization is 30% and the memory utilization is 20%. Under the first determination rule, the normal operating range of the central processing unit is 20%-40% and that of memory utilization is 30%-60%; after the determination, the central processing unit utilization is within the normal range while the memory utilization is below the lower preset threshold, so the current server resource utilization does not match the rule of the first determination rule for falling below the lower preset threshold.
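This matching logic can be sketched as follows, using the example thresholds above (the rule table and function name are illustrative assumptions):

```python
# Normal operating ranges per metric: (lower threshold, upper threshold),
# mirroring the example in the text.
FIRST_RULE = {
    "cpu": (0.20, 0.40),
    "memory": (0.30, 0.60),
}

def match_first_rule(utilization):
    """Return 'expand' if ANY metric exceeds its upper threshold,
    'shrink' if ALL metrics fall below their lower thresholds,
    and 'normal' otherwise."""
    over = any(utilization[m] > hi for m, (lo, hi) in FIRST_RULE.items())
    under = all(utilization[m] < lo for m, (lo, hi) in FIRST_RULE.items())
    if over:
        return "expand"
    if under:
        return "shrink"
    return "normal"
```

Note the asymmetry: one overloaded resource is enough to trigger expansion, but contraction requires every resource to be underused, matching the text's examples.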
Step 306, if so, dynamically adjusting the cluster according to a preset adjustment rule.
In some embodiments, the preset adjustment rule may be to increase or decrease the number of servers by a preset ratio. For example, if any of the server resource utilizations matches the rule for exceeding the upper preset threshold in the first determination rule, the number of servers may be increased by a preset ratio relative to the number currently supporting the target file. If the server resource utilization matches the normal operating range in the first determination rule, the cluster may be left unadjusted. If the server resource utilization matches the rule for falling below the lower preset threshold in the first determination rule, the number of servers may be decreased by a preset ratio relative to the number currently supporting the target file.
Specifically, the dynamic adjustment of the cluster deploying the service according to the preset adjustment rule may be performed in the following manner.
The number of the at least one server and the system parameters of each server may be obtained. The number of servers is the number of servers in the cluster that support the target file, e.g., 5, 10, or 20. The system parameters of a server may be its central processing unit utilization (e.g., 10%, 20%, or 30%), its memory utilization (e.g., 10%, 20%, or 30%), and its accesses per second (e.g., 1000, 10000, or 20000). Based on the number and the system parameters, the servers may be expanded or contracted by a preset ratio. Expanding or contracting at least includes increasing or decreasing the number of servers supporting the target file and increasing or decreasing the memory usage of those servers. For example, suppose the number of servers is 10, each with 16G of memory, and the servers' central processing unit utilization is 30%, exceeding the upper preset threshold. The number of servers to add may then be computed as 10 × (1 + 30%) × 40% = 5.2, where 40% is the expansion ratio for the number of servers; rounding up gives 6 servers to add. As another example, suppose the upper preset threshold of a server's memory utilization is 60% of its current memory size; when the memory utilization exceeds 60%, capacity may be expanded by multiplying the target file's real memory usage on the server by a ratio of 2, thereby increasing the memory usage of the servers supporting the target file. It should be understood that the above examples are merely illustrative and are not intended to limit how a server may be expanded or contracted.
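The worked expansion example can be sketched as follows; the formula current × (1 + utilization) × ratio is read off the 10 × (1 + 30%) × 40% = 5.2 example in the text, and the function names are assumptions:

```python
import math

def servers_to_add(current, cpu_utilization, expand_ratio=0.40):
    """Scale-out sketch following the worked example: the increment is
    current * (1 + utilization) * expand_ratio, rounded up so the
    cluster never under-provisions."""
    return math.ceil(current * (1 + cpu_utilization) * expand_ratio)

def expanded_memory(real_usage_mb, factor=2.0):
    """Memory expansion sketch: multiply the target file's real memory
    usage on the server by a fixed factor (2 in the text's example)."""
    return real_usage_mb * factor

# 10 servers at 30% CPU: ceil(10 * 1.3 * 0.4) = ceil(5.2) = 6 to add.
```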
It should be noted that the above description of the process 300 is for illustration and description only and does not limit the scope of the present disclosure. Various modifications and changes to flow 300 will be apparent to those skilled in the art in light of this description; however, such modifications and variations remain within the scope of the present description. For example, other steps, such as a preprocessing step and a storing step, may be added between the steps described above.
FIG. 4 is an exemplary flow diagram illustrating dynamic adjustment of a cluster according to some embodiments of the present description. In some embodiments, flow 400 may be performed by a processing device. For example, the process 400 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 400. In some embodiments, the flow 400 may be performed by a first adjustment module 550 located on a processing device. As shown in fig. 4, the flow 400 may include the following operations.
Step 402, a second decision rule is obtained.
In some embodiments, dynamically adjusting the cluster may further include performing reclamation processing on the at least one target file based on an operating parameter of the cluster. When the target file is no longer called for a period of time, it is reclaimed. Automatic operation and maintenance of the cluster can thus cover the whole process from the target file's first going online to its final offline reclamation, realizing automatic operation and maintenance over the target file's full life cycle, effectively avoiding waste of server resources, and improving the automatic operation and maintenance capability.
In some embodiments, the operating parameters of the cluster may further include a file call amount. The file calling amount is the number of times that the target file is called in continuous time. For example, the target file is called 10 times, 100 times, 10000 times, etc. in total for three consecutive days. In some embodiments, the target file may be recycled based on the file call amount.
In some embodiments, the second determination rule may be that if the file is not called within a preset time, the file is reclaimed. The preset time may be a continuous period, for example, 3, 5, or 10 consecutive days. For instance, the second determination rule may be that the target file is subjected to reclamation processing if it has not been called for 10 consecutive days, or for 15 consecutive days.
Step 404, determining whether the file call amount matches the second determination rule.
In some embodiments, the file call amount of the target file may be obtained by monitoring the cluster. The obtained file call amount may be compared with the second determination rule to determine whether they match. For example, if the obtained file call amount shows that the target file has not been called for 10 consecutive days, and the second determination rule is that the target file is reclaimed after 10 consecutive days without calls, the file call amount matches the second determination rule.
Step 406, if so, performing reclamation processing on the target file.
In some embodiments, reclamation processing may take the target file deployed on the cluster offline and release the servers in the cluster that support it, so that after the target file goes offline those servers can be used to support other files, avoiding resource waste.
In some embodiments, when the file call amount is determined to match the second determination rule, the target file may also be subjected to early-warning processing. For example, if the rule matched for reclaiming the file is that the file call amount has been zero for 10 consecutive days, the second determination rule may further provide that when the file call amount has been zero for 3 consecutive days, the file is subjected to early-warning processing. In that case, once the file call amount has been zero for 3 consecutive days, the match with the second determination rule is established and an early warning can be sent to operation and maintenance personnel, for example by voice, pop-up window, or mail. Further, if the file call amount remains zero after the warning and reaches 10 consecutive days of zero calls, the further determination again matches the second determination rule and the target file can be reclaimed. It should be understood that the above examples are merely illustrative and are not intended to limit the disclosed embodiments.
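The staged warn-then-reclaim decision can be sketched as follows; the day thresholds mirror the example above, and the function name is an assumption:

```python
def reclaim_action(idle_days, warn_after=3, reclaim_after=10):
    """Second-rule sketch: after `warn_after` consecutive days with zero
    calls, raise an early warning (voice, pop-up, or mail); after
    `reclaim_after` days, take the target file offline and release its
    servers; otherwise keep the deployment as-is."""
    if idle_days >= reclaim_after:
        return "reclaim"
    if idle_days >= warn_after:
        return "warn"
    return "keep"
```

The warning stage gives operation and maintenance personnel a window to intervene before the file is actually taken offline.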
It should be noted that the above description of the flow 400 is for illustration and description only and does not limit the scope of application of the present specification. Various modifications and changes to flow 400 will be apparent to those skilled in the art in light of this description; however, such modifications and variations remain within the scope of the present description. For example, other steps, such as a preprocessing step and a storing step, may be added between the steps described above.
Fig. 5 is a block diagram of a system for automatically maintaining a cluster according to some embodiments of the present disclosure. As shown in fig. 5, the system may include a first obtaining module 510, a first determining module 520, an assigning module 530, a second obtaining module 540, and a first adjusting module 550.
The first obtaining module 510 may obtain the feature parameters of at least one target file respectively.
In some embodiments, the cluster may refer to a collection of one or more servers (or containers) that can support the running of one or more files, e.g., with a server in the cluster operating alone or multiple servers operating in combination. The target file refers to a model file, system program, application program, process, routine, or the like that needs to be distributed to the servers of the cluster and run. A characteristic parameter of the target file may be a parameter or data describing or characterizing the target file, such as file size, file type, and supporting framework. When the target file is a model file, the characteristic parameters may include the size of the model file, the algorithm type, the machine learning framework, and the like. The first obtaining module 510 may obtain the characteristic parameters of the target file by reading stored data, calling a related interface, reading from the target file, or other methods.
In some embodiments, the first obtaining module 510 may further obtain an estimation model, which may be a trained machine learning model, for estimating clusters required for deploying the target file according to the feature parameters. The estimated models may be stored in a storage device (e.g., an on-board storage unit of the processing device or an external storage device), and the first obtaining module 510 may read the estimated models by communicating with the storage device. The first obtaining module 510 may also obtain the estimation model by calling an associated interface or other means.
The first determination module 520 may determine a cluster based on the characteristic parameters, the cluster including at least one server for supporting the target file.
In some embodiments, determining the cluster may refer to determining the number of servers (or containers) for supporting the target file and the configuration of each server (or container), such as CPU size and memory size. The first determination module 520 may determine the cluster based on the characteristic parameters using rules, where determining the cluster with rules may be a matching process: with preset ranges of server configurations matched to file sizes, the first determination module 520 determines the configuration of the server supporting the target file by determining which range the size of the target file falls within. The first determination module 520 may also utilize an estimation model to predict, based on the characteristic parameters of the target file, the number and/or configuration of servers needed to deploy the target file in the cluster, and may input the characteristic parameters of the target file into the estimation model to directly obtain the servers adapted to the target file.
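The rule-based matching process can be sketched as follows; the size ranges and server configurations are illustrative assumptions, not values from the specification:

```python
# Hypothetical preset rules: a file whose size falls within a range is
# matched to that range's server configuration.
SIZE_RULES = [
    # (max_file_size_mb, (cpu_cores, memory_mb))
    (50, (2, 512)),
    (200, (4, 2048)),
    (1000, (8, 8192)),
]

def server_config_for(file_size_mb):
    """Pick the first preset configuration whose range covers the file
    size; rules are ordered from smallest range upward."""
    for max_size, config in SIZE_RULES:
        if file_size_mb <= max_size:
            return config
    raise ValueError("no preset configuration covers this file size")
```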
The distribution module 530 may distribute the at least one target file to the at least one server.
In some embodiments, the distribution module 530 may package and send the target file to a server in the cluster adapted to the target file for deployment, completing the distribution of the target file. It will be appreciated that the same server may be allocated only one target file or may be allocated one or more files. For example, two or more model files built on the same machine learning framework may be distributed to one server. The same target file may also be supported by two or more servers.
The second obtaining module 540 may obtain the operation parameters of the cluster in the operation process of the target file.
In some embodiments, the operation parameter of the cluster may refer to a real-time performance parameter of a server in the cluster when the target file is operated, and includes a central processing unit utilization rate, a memory utilization rate, a disk utilization rate, a connection number of the server, a query rate per second of the server, a scheduling request number of the target file, and the like. The second obtaining module 540 may obtain the operation parameters of the cluster server during the operation process of the target file by monitoring the operation of the cluster in real time. For example, the second obtaining module 540 may know that the utilization rate of the central processing unit of the current cluster is 20 percent, the utilization rate of the memory is 50 percent, the number of scheduling requests of the target file is 10000, and the like by monitoring the operation parameters of the cluster.
The first adjusting module 550 may perform dynamic adjustment on the cluster based on the operation parameter, where the dynamic adjustment at least includes performing capacity expansion or capacity reduction on the cluster.
In some embodiments, dynamically adjusting a cluster may refer to automatically increasing or decreasing the number of servers of the cluster based on its operating parameters. For example, the first adjusting module 550 may determine whether the servers supporting the target file meet its operating requirements according to any one or any combination of the cluster's central processing unit utilization, memory utilization, disk utilization, server connection count, and number of scheduling requests for the target file; if not, the cluster is expanded. For instance, if the current number of scheduling requests for the target file is 5000 and the servers currently supporting it can only satisfy 4000, the number of servers needs to be increased to satisfy the remaining requests. By judging the cluster's running state in real time from its operating parameters, the cluster can be automatically and rapidly expanded or contracted as its running state changes, improving operation and maintenance efficiency. The first adjusting module 550 may also determine, according to the number of scheduling requests for the target file, whether to reclaim the target file and release the allocated servers. For example, if the number of scheduling requests for the target file is less than a predetermined number within a period of time, the first adjusting module 550 may reclaim the target file.
For other descriptions of the modules, reference may be made to the flowchart section of this specification, for example, the relevant description of fig. 1 to 4.
FIG. 6 is an exemplary flow diagram of a service automation operation and maintenance method according to some embodiments described herein. In some embodiments, flow 600 may be performed by a processing device. For example, the process 600 may be stored in a storage device (e.g., an onboard storage unit of a processing device or an external storage device) in the form of a program or instructions that, when executed, may implement the process 600. As another example, the process 600 may be implemented by the service operation system 700 on a processing device. As shown in fig. 6, the process 600 may include the following steps:
step 602, obtaining characteristic parameters related to the service to be deployed. Step 602 may be performed by a third acquisition module 710.
In some embodiments, the service may refer to functions provided to front-end users through files running in the background in an online scenario, for example, user preference recommendation, or to some purpose of the service itself, for example, risk determination. The service at least comprises an online reasoning service, which is used to reason about or predict user behavior and provide personalized predicted content to the user; the characteristic parameters at least comprise characteristic parameters of the files supporting the service.
In some embodiments, the characteristic parameter may refer to a characteristic parameter of a file supporting the service, including a file size, a file type, a support frame, and the like. Specifically, taking the online inference service as an example, the file supporting the online inference service may be a model file, and the characteristic parameters of the model file may include a model file size, an algorithm type, a machine learning framework, and the like.
In some embodiments, the third obtaining module 710 may obtain the feature parameters by reading stored data, calling a related interface, reading from a service to be deployed, or other means.
Step 604, determining a cluster based on the characteristic parameters, and deploying the service to be deployed by using the cluster; wherein the cluster comprises at least one server for deploying the service to be deployed. Step 604 may be performed by the second determination module 720.
In some embodiments, the characteristic parameter may reflect server resources required to deploy the service, and thus the cluster may be determined based on the characteristic parameter. Reference may be made to other parts of the description regarding determining clusters based on feature parameters, for example, fig. 1 and 2 and their associated description.
Deploying the service to be deployed with the cluster may be installing the service to be deployed to at least one server of the cluster.
Step 606, obtaining the operation parameters of the cluster in the execution process of the deployed service. Step 606 may be performed by a fourth acquisition module 730.
In some embodiments, the operating parameter of the cluster may be a real-time performance parameter of the servers in the cluster. The fourth obtaining module 730 may obtain real-time operation parameters of the cluster during the operation process of the target file by monitoring the operation of the cluster in real time. For more description of the operating parameters of the cluster, reference may be made to fig. 1 and its associated description of this specification.
Step 608, dynamically adjusting the cluster deploying the service based on the operation parameter of the cluster, where the dynamic adjustment at least includes performing capacity expansion or capacity reduction on the servers in the cluster. Step 608 may be performed by the second adjustment module 740.
In some embodiments, the operation parameters of the cluster may reflect the operation state of the cluster in real time, and whether the cluster can meet the operation requirement of the service and whether the load of the server deploying the service exceeds the set operation load may be determined in real time according to the operation parameters of the cluster, so as to perform capacity expansion or capacity reduction on the cluster according to the determination result.
For more description of how to dynamically adjust the cluster based on the operating parameters of the cluster, and how to expand or reduce the capacity of the server, reference may be made to other parts of this specification, such as fig. 1, 3, and 4, and their associated descriptions.
It should be noted that the above description of the flow 600 is for illustration and description only and does not limit the scope of application of the present disclosure. Various modifications and changes to flow 600 will be apparent to those skilled in the art in light of this description; however, such modifications and variations remain within the scope of the present description. For example, other steps, such as a preprocessing step and a storing step, may be added between the steps described above.
FIG. 7 is a block diagram of a service automation operation and maintenance system in accordance with some embodiments of the present description. As shown in fig. 7, the system may include a third obtaining module 710, a second determining module 720, a fourth obtaining module 730, and a second adjusting module 740.
The third obtaining module 710 may obtain a feature parameter related to the service to be deployed.
In some embodiments, the service may refer to functions provided to front-end users through files running in the background in an online scenario, for example, user preference recommendation, or to some purpose of the service itself, for example, risk determination. The service at least comprises an online reasoning service, which is used to reason about or predict user behavior and provide personalized predicted content to the user; the characteristic parameters at least comprise characteristic parameters of the files supporting the service. The characteristic parameters may refer to characteristic parameters of a file supporting the service, including file size, file type, supporting framework, and the like. Specifically, taking the online reasoning service as an example, the file supporting it may be a model file, whose characteristic parameters may include model file size, algorithm type, machine learning framework, and the like. The third obtaining module 710 may obtain the characteristic parameters by reading stored data, calling a related interface, reading from the service to be deployed, or other means.
The second determining module 720 may determine a cluster based on the characteristic parameters, and deploy the service to be deployed by using the cluster.
In some embodiments, the second determining module 720 determines the configuration of the server supporting the target file by determining the range within which the size of the target file falls. The second determining module 720 may utilize an estimation model to predict, based on the characteristic parameters of the target file, the number and/or configuration of servers needed to deploy the target file in the cluster; it may input the characteristic parameters of the target file into the estimation model to directly obtain the server adapted to the target file. In some embodiments, the second determining module 720 may package and send the target file supporting the service to a server in the cluster adapted to the target file for deployment, so as to complete the deployment of the service.
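The mapping from file characteristics to cluster size can be sketched as a simple rule. In the described system this role is played by a trained estimation model; the memory tiers, throughput constant, and inverse relation between model size and per-server query rate below are invented purely for illustration:

```python
import math

def estimate_cluster(file_size_mb: float, expected_qps: float) -> dict:
    """Hypothetical stand-in for the estimation model: pick a per-server
    memory size from the range the target file's size falls into, assume
    larger models achieve a lower per-server query rate per second, then
    size the cluster so total capacity covers the expected query rate."""
    # smallest memory tier (GB) leaving the model at most half of memory
    memory_gb = next(t for t in (4, 8, 16, 32, 64)
                     if t * 1024 * 0.5 >= file_size_mb)
    # invented inverse size/throughput relation, floored at 100 QPS
    qps_per_server = max(100, int(20000 / max(file_size_mb, 1.0)))
    num_servers = math.ceil(expected_qps / qps_per_server)
    return {"num_servers": num_servers, "memory_gb": memory_gb,
            "qps_per_server": qps_per_server}

# e.g. a 512 MB model file with an expected 5000 queries per second
plan = estimate_cluster(512.0, 5000)
```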
The fourth obtaining module 730 obtains the operating parameters of the cluster during the execution of the deployed service.
In some embodiments, an operation parameter of the cluster may refer to a real-time performance parameter of a server in the cluster while the server runs the target file, including the central processing unit utilization rate, memory utilization rate, and disk utilization rate of the server, the number of connections of the server, the queries per second handled by the server, the number of scheduling requests for the target file, and the like. The fourth obtaining module 730 may obtain the real-time operation parameters of the cluster servers during the operation of the target file by monitoring the cluster in real time. For example, the fourth obtaining module 730 may learn from these operation parameters that the central processing unit utilization rate of the current cluster is 20 percent, the memory utilization rate is 50 percent, the number of scheduling requests for the target file is 10000, and so on.
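The aggregation step can be sketched as follows. The metric names and the choice of aggregates (averaging utilization, summing scheduling requests) are assumptions made for illustration, not details specified by the description:

```python
def cluster_operating_parameters(server_metrics: list) -> dict:
    """Aggregate hypothetical per-server metrics into cluster-level operating
    parameters: average CPU and memory utilization, and the total number of
    scheduling requests for the target file."""
    n = len(server_metrics)
    return {
        "cpu_pct": sum(m["cpu_pct"] for m in server_metrics) / n,
        "mem_pct": sum(m["mem_pct"] for m in server_metrics) / n,
        "scheduling_requests": sum(m["scheduling_requests"]
                                   for m in server_metrics),
    }

# Two servers whose aggregates reproduce the example in the text:
# 20% CPU, 50% memory, 10000 scheduling requests in total.
params = cluster_operating_parameters([
    {"cpu_pct": 18, "mem_pct": 45, "scheduling_requests": 5500},
    {"cpu_pct": 22, "mem_pct": 55, "scheduling_requests": 4500},
])
```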
The second adjusting module 740 dynamically adjusts the cluster deploying the service based on the operating parameter of the cluster, where the dynamic adjustment at least includes capacity expansion or capacity reduction of the servers in the cluster.
In some embodiments, dynamically adjusting a cluster may refer to automatically increasing or decreasing the number of servers in the cluster based on the operating parameters of the cluster. For example, the second adjusting module 740 may determine whether the servers implementing the service meet the operation requirement according to any one or any combination of the central processing unit utilization rate, the memory utilization rate, the disk utilization rate, the number of server connections, and the number of scheduling requests for the service. When the requirement is not met, the module expands the cluster: for example, if the current service receives 5000 scheduling requests but the servers implementing the service in the current cluster can only satisfy 4000, the number of servers needs to be increased to handle the remaining requests. Judging the running state of the cluster in real time from its operating parameters enables automatic and rapid expansion and contraction of the cluster as the running state changes, which improves the operation and maintenance efficiency of the cluster. The second adjusting module 740 may also determine, according to the number of scheduling requests for the service, whether the service should be taken offline to release the allocated servers. For example, if the number of scheduling requests for the service over a period of time is less than a predetermined number, the second adjusting module 740 may take the service offline.
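A minimal sizing rule matching the worked example above can be sketched as follows. The per-server query rate, the offline threshold, and the exact arithmetic are illustrative assumptions:

```python
import math

OFFLINE_THRESHOLD = 10  # hypothetical: below this demand, recycle the service

def scale_decision(qps_per_server: int, scheduling_requests: int) -> int:
    """Return the target number of servers for the service.

    Mirrors the worked example: 5000 scheduling requests against servers
    that can each satisfy 1000 means five servers are needed; sustained
    near-zero demand releases all servers and takes the service offline."""
    if scheduling_requests < OFFLINE_THRESHOLD:
        return 0  # offline: release the allocated servers
    return max(1, math.ceil(scheduling_requests / qps_per_server))

# Example from the text: 4 servers x 1000 = 4000 capacity, 5000 requests
# arrive, so the target size becomes 5 (expand by one server).
target = scale_decision(qps_per_server=1000, scheduling_requests=5000)
```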
For further description of modules, reference may be made to the flow chart portion of this specification, e.g., the associated description of fig. 1-4 and 6.
It should be understood that the systems and their modules shown in fig. 5 and 7 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, but also by software executed by various types of processors, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the system for automatic operation and maintenance of the cluster and its modules is only for convenience of description and does not limit the present description to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given an understanding of the principle of the system, modules may be combined arbitrarily or connected to other modules as sub-systems without departing from that principle. For example, the first obtaining module 510, the first determining module 520, the allocating module 530, the second obtaining module 540, and the first adjusting module 550 disclosed in fig. 5 may be different modules in one system, or one module may implement the functions of two or more of these modules. For instance, the second obtaining module 540 and the first adjusting module 550 may be two separate modules, or a single module having both obtaining and adjusting functions. Likewise, the modules may share one storage module, or each module may have its own storage module. Such variations are within the scope of the present disclosure.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: the number of servers and the memory size needed to support the target file are predicted by a pre-trained estimation model based on the characteristic parameters of the target file before it is deployed, so that the servers can adequately support the target file once deployed, avoiding over- or under-provisioning of servers at first deployment; while the deployed target file runs, the cluster is dynamically adjusted by monitoring its operating parameters, which improves operation and maintenance capability; and, while the target file runs, early warning or offline processing is applied to it based on the monitored scheduling traffic, and whether the target file can be taken offline and recycled is judged automatically, which solves the problem of later-stage resource recovery and thereby realizes automatic operation and maintenance over the full life cycle of the target file. It is to be noted that different embodiments may produce different advantages; in different embodiments, the advantages produced may be any one or a combination of the above, or any other advantage that may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is presented by way of example only and is not limiting. Various modifications, improvements, and adaptations of the present description may occur to those skilled in the art, though not expressly stated herein. Such alterations, modifications, and improvements are suggested by this specification and are within the spirit and scope of the exemplary embodiments of this specification.
Also, the description uses specific words to describe embodiments of the specification. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, certain features, structures, or characteristics may be combined as suitable in one or more embodiments of the specification.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, or VB.NET, a conventional procedural programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, or ABAP, a dynamic programming language such as Python, Ruby, or Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While certain presently contemplated useful embodiments of the invention have been discussed in the foregoing disclosure by way of various examples, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein described. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, claimed embodiments may have fewer than all of the features of a single embodiment disclosed above.
Where numerals describing quantities of components, attributes, and the like are used in some embodiments, it is to be understood that such numerals are in some instances qualified by the modifier "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the stated number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiment. In some embodiments, the numerical parameters should take into account the specified significant digits and employ a general rounding approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments are approximations, such numerical values are set forth as precisely as practicable in the specific examples.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, and documents, cited in this specification, the entire contents are hereby incorporated by reference into this specification, except for any application history document that is inconsistent with or conflicts with the contents of this specification, and except for any document (whether now or later attached to this specification) that would limit the broadest scope of the claims of this specification. It is to be understood that if the descriptions, definitions, and/or uses of terms in the material accompanying this specification are inconsistent with or contrary to those in this specification, the descriptions, definitions, and/or uses of terms in this specification shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments described herein. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (18)

1. A method for automatic operation and maintenance of a cluster for supporting one or more file runs, wherein the method comprises:
respectively acquiring characteristic parameters of at least one target file;
determining a cluster based on the characteristic parameters, the cluster comprising at least one server for supporting the target file; wherein determining clusters based on the feature parameters comprises:
determining the memory size of a single server and the query rate per second which can be reached by the single server based on the characteristic parameters and the estimation model;
determining the cluster based on the memory size, the query rate per second, and an expected query rate per second;
distributing the at least one target file to the at least one server;
acquiring the operation parameters of the cluster in the operation process of the target file;
and dynamically adjusting the cluster based on the operation parameters, wherein the dynamic adjustment at least comprises the expansion or contraction of the cluster.
2. The method of claim 1, wherein the object file comprises at least a model file, and the feature parameters comprise at least:
model file size, algorithm type, and machine learning framework.
3. The method of claim 1, wherein the operational parameters include server resource utilization; the dynamically adjusting the cluster based on the operating parameters of the cluster includes:
acquiring a first judgment rule, and judging whether the utilization rate of the server resource is matched with the first judgment rule;
and if so, dynamically adjusting the cluster according to a preset adjustment rule.
4. The method of claim 3, wherein the server resource utilization comprises central processor utilization and memory utilization.
5. The method of claim 3, wherein the dynamically adjusting the cluster according to a preset adjustment rule comprises:
acquiring the number of the at least one server and the system parameters of each server;
based on the number and the system parameters, carrying out capacity expansion/capacity reduction on the servers according to a preset proportion, wherein the capacity expansion/capacity reduction at least comprises increasing/decreasing the number of the servers supporting the target file and increasing/decreasing the memory usage of the servers supporting the target file.
6. The method of claim 1, wherein the method further comprises:
and recycling at least one target file based on the operation parameters of the cluster.
7. The method of claim 6, wherein the operational parameters further include a file call amount; the recycling processing of at least one target file based on the operation parameters of the cluster comprises the following steps:
acquiring a second judgment rule, and judging whether the file calling amount is matched with the second judgment rule;
and if so, recycling the target file.
8. A service automation operation and maintenance method comprises the following steps:
acquiring characteristic parameters related to a service to be deployed, wherein the service at least comprises an online reasoning service, and the characteristic parameters at least comprise characteristic parameters of a file for supporting the service;
determining a cluster based on the characteristic parameters, and deploying the service to be deployed by utilizing the cluster; wherein the cluster comprises at least one server for deploying the service to be deployed; determining clusters based on the feature parameters, including:
determining the memory size of a single server and the query rate per second which can be reached by the single server based on the characteristic parameters and the estimation model;
determining the cluster based on the memory size, the query rate per second, and an expected query rate per second;
acquiring the operation parameters of the cluster in the executing process of the deployed service;
and dynamically adjusting the cluster for deploying the service based on the operating parameters of the cluster, wherein the dynamic adjustment at least comprises the capacity expansion or capacity reduction of the servers in the cluster.
9. A system for automated operation and maintenance of a cluster for supporting one or more file runs, wherein the system comprises:
the first acquisition module is used for respectively acquiring the characteristic parameters of at least one target file;
a first determination module to determine a cluster based on the characteristic parameters, the cluster including at least one server to support the target file; wherein determining clusters based on the feature parameters comprises:
determining the memory size of a single server and the query rate per second which can be reached by the single server based on the characteristic parameters and the estimation model;
determining the cluster based on the memory size, the query rate per second, and an expected query rate per second;
an allocation module for allocating the at least one target file to the at least one server;
the second acquisition module is used for acquiring the operation parameters of the cluster in the operation process of the target file;
and the first adjusting module is used for dynamically adjusting the cluster based on the operating parameters, wherein the dynamic adjustment at least comprises the expansion or contraction of the cluster.
10. The system of claim 9, wherein the object file comprises at least a model file, and the feature parameters comprise at least:
model file size, algorithm type, and machine learning framework.
11. The system of claim 9, wherein the operational parameters include server resource utilization; in order to dynamically adjust the cluster based on the operating parameters of the cluster, the first adjustment module is further configured to:
acquiring a first judgment rule, and judging whether the utilization rate of the server resource is matched with the first judgment rule;
and if so, dynamically adjusting the cluster according to a preset adjustment rule.
12. The system of claim 11, wherein the server resource utilization includes central processor utilization and memory utilization.
13. The system of claim 11, wherein, to dynamically adjust the cluster according to a preset adjustment rule:
the second obtaining module is further configured to obtain the number of the at least one server and a system parameter of each server;
the first adjusting module is further configured to perform capacity expansion/capacity reduction on the servers according to a preset ratio based on the number and the system parameter, where the capacity expansion/capacity reduction at least includes increasing/decreasing the number of servers supporting the target file and increasing/decreasing the memory usage amount of the servers supporting the target file.
14. The system of claim 9, wherein the system further comprises:
and the recovery module is used for recovering at least one target file based on the operation parameters of the cluster.
15. The system of claim 14, wherein the operational parameters further include a file call amount; in order to perform a recycling process on at least one target file based on the operating parameters of the cluster, the recycling module is further configured to:
acquiring a second judgment rule, and judging whether the file calling amount is matched with the second judgment rule;
and if so, recycling the target file.
16. A service automation operation and maintenance system, comprising:
a third obtaining module, configured to obtain feature parameters related to a service to be deployed, where the service at least includes an online inference service, and the feature parameters at least include feature parameters of a file used to support the service;
the second determining module is used for determining a cluster based on the characteristic parameters and deploying the service to be deployed by utilizing the cluster; wherein the cluster comprises at least one server for deploying the service to be deployed; determining clusters based on the feature parameters, including:
determining the memory size of a single server and the query rate per second which can be reached by the single server based on the characteristic parameters and the estimation model;
determining the cluster based on the memory size, the query rate per second, and an expected query rate per second;
a fourth obtaining module, configured to obtain an operation parameter of the cluster in an execution process of the deployed service;
and a second adjusting module, configured to dynamically adjust the cluster that deploys the service based on an operating parameter of the cluster, where the dynamic adjustment at least includes performing capacity expansion or capacity reduction on the servers in the cluster.
17. An apparatus for automated operation and maintenance of a cluster, comprising at least one storage medium and at least one processor, the at least one storage medium configured to store computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of any of claims 1-7.
18. A service automation operation and maintenance device comprising at least one storage medium and at least one processor, the at least one storage medium for storing computer instructions; the at least one processor is configured to execute the computer instructions to implement the method of claim 8.
CN202010202542.2A 2020-03-20 2020-03-20 Method, system and device for automatically operating and maintaining cluster Active CN111431748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010202542.2A CN111431748B (en) 2020-03-20 2020-03-20 Method, system and device for automatically operating and maintaining cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010202542.2A CN111431748B (en) 2020-03-20 2020-03-20 Method, system and device for automatically operating and maintaining cluster

Publications (2)

Publication Number Publication Date
CN111431748A CN111431748A (en) 2020-07-17
CN111431748B true CN111431748B (en) 2022-09-30

Family

ID=71553537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010202542.2A Active CN111431748B (en) 2020-03-20 2020-03-20 Method, system and device for automatically operating and maintaining cluster

Country Status (1)

Country Link
CN (1) CN111431748B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112506612A (en) * 2020-12-10 2021-03-16 北京浪潮数据技术有限公司 Cluster inspection method, device and equipment and readable storage medium
CN112738211A (en) * 2020-12-27 2021-04-30 西安和行无限信息科技有限公司 Automatic operation and maintenance method, system and device based on cluster
CN113031976B (en) * 2021-03-26 2023-09-29 山东英信计算机技术有限公司 Cluster capacity management method, device and medium based on Ambari
CN113537634A (en) * 2021-08-10 2021-10-22 泰康保险集团股份有限公司 User behavior prediction method and device, electronic equipment and storage medium
CN115242648B (en) * 2022-07-19 2024-05-28 北京百度网讯科技有限公司 Expansion and contraction capacity discrimination model training method and operator expansion and contraction capacity method
CN115834388B (en) * 2022-10-21 2023-11-14 支付宝(杭州)信息技术有限公司 System control method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106470219A (en) * 2015-08-17 2017-03-01 阿里巴巴集团控股有限公司 The dilatation of computer cluster and capacity reduction method and equipment
CN107659595A (en) * 2016-07-25 2018-02-02 阿里巴巴集团控股有限公司 A kind of method and apparatus for the ability for assessing distributed type assemblies processing specified services
CN109615081A (en) * 2018-09-26 2019-04-12 阿里巴巴集团控股有限公司 A kind of Model forecast system and method
CN109934361A (en) * 2019-02-25 2019-06-25 江苏电力信息技术有限公司 A kind of automation operation platform model based on container and big data
CN110046062A (en) * 2019-03-07 2019-07-23 佳都新太科技股份有限公司 Distributed data processing method and system
CN110555550A (en) * 2019-08-22 2019-12-10 阿里巴巴集团控股有限公司 Online prediction service deployment method, device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10348562B2 (en) * 2016-01-11 2019-07-09 Netapp, Inc. Methods and systems for selecting compatible resources in networked storage environments

Also Published As

Publication number Publication date
CN111431748A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN111431748B (en) Method, system and device for automatically operating and maintaining cluster
CN109947567B (en) Multi-agent reinforcement learning scheduling method and system and electronic equipment
US10761897B2 (en) Predictive model-based intelligent system for automatically scaling and managing provisioned computing resources
CN103823714B (en) Virtualization-based method and device for adjusting QoS (quality of service) of node memory of NUMA (non uniform memory access architecture)
CN104317658A (en) MapReduce based load self-adaptive task scheduling method
US20150113539A1 (en) Method for executing processes on a worker machine of a distributed computing system and a distributed computing system
CN110262897B (en) Hadoop calculation task initial allocation method based on load prediction
CN110532154B (en) Application system capacity expansion method, device and equipment
CN105607952B (en) Method and device for scheduling virtualized resources
CN116185645B (en) Cluster resource intelligent scheduling method, system and storage medium based on neural network
CN110442428B (en) Coordination method of Docker container
CN105955809A (en) Thread scheduling method and system
CN111381970A (en) Cluster task resource allocation method and device, computer device and storage medium
CN113515382A (en) Cloud resource allocation method and device, electronic equipment and program product
CN103442087B (en) A kind of Web service system visit capacity based on response time trend analysis controls apparatus and method
CN110728372B (en) Cluster design method and cluster system for dynamic loading of artificial intelligent model
CN106874129A (en) A kind of operating system process scheduling order determines method and control method
CN103685541B (en) IaaS cloud system operating rate device for controlling dynamically, system and method
CN115827232A (en) Method, device, system and equipment for determining configuration for service model
WO2022253417A1 (en) A computer software module arrangement, a circuitry arrangement, an arrangement and a method for improved autonomous adaptation of software monitoring of realtime systems
CN113127180A (en) Memory optimization method, intelligent terminal and storage medium
Chen et al. Conlar: Learning to allocate resources to docker containers under time-varying workloads
CN117973635B (en) Decision prediction method, electronic device, and computer-readable storage medium
CN115309551A (en) Model service-based model instance quantity control method and device
CN117453399A (en) Dynamic resource management and stability regulation and control system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant