CN111240749B - Suspending control method, device, equipment and storage medium of instance in cluster system


Publication number
CN111240749B
Authority
CN
China
Prior art keywords
instance
threads
control module
request signals
thread control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811436447.8A
Other languages
Chinese (zh)
Other versions
CN111240749A (en)
Inventor
吁玲
王璇
竺士杰
任赣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Group Zhejiang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Zhejiang Co Ltd
Priority to CN201811436447.8A
Publication of CN111240749A
Application granted
Publication of CN111240749B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4418 Suspend and resume; Hibernate and awake
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the invention provides a method and a device for controlling suspension of an instance in a cluster system, wherein the method comprises the following steps: in an instance to be subjected to high availability test in a cluster system, deploying a thread control module for suspending the threads of the instance; and sending a plurality of simulation request signals matched with the number of threads of the instance to the thread control module so that the thread control module carries out suspending operation on all threads of the instance according to the simulation request signals. The embodiment of the invention ensures the test accuracy of the high-availability test when the instance is suspended.

Description

Suspending control method, device, equipment and storage medium of instance in cluster system
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a suspension control method and device for an instance in a cluster system.
Background
With the rapid development of information technology, enterprises place ever higher demands on service availability. Service unavailability caused by system downtime or other reasons directly affects an enterprise's operating income, image and customer satisfaction, and may even cause legal disputes. Clusters have become the mainstream architecture in the server industry. Although the reliability of individual hardware components keeps improving, cluster systems still fail frequently because of the environmental complexity and the uncertainty of human factors that come with growing cluster size, which makes it especially necessary to address the high availability of the system from the software side. Availability means the percentage of time during which the system operates normally without downtime. Typically, high availability tests are performed before the system goes live and for a period of time after each code update.
Applications are typically deployed on middleware instances, and high availability testing of middleware applications falls into two types: instance stop (down) and instance suspension (hang). Instance down is the state in which the instance has stopped. Instance hang is the state in which the middleware instance port is still listening and the process can still be detected, but the instance in fact has no capacity to respond to service requests, for example when the process is unresponsive because of a memory overflow or because its thread pool is full.
A typical high availability architecture for existing middleware includes several middleware servers and a load balancer, with different service requests being distributed by the load balancer to the middleware servers for processing. However, while the system is running, an instance often enters the hang state. When an instance hangs, the load balancer needs a mechanism to identify the abnormal instance and forward service requests to normal instances so that the service remains available; it is therefore especially necessary to perform high availability testing for instance hang.
The existing high availability test method for middleware instance hang is as follows: an ultra-large batch of port call requests is sent (via Telnet commands) to the instance to be hung, in the hope of hanging it. However, practical testing shows that because this method sends signals at a very large scale, the number of commands sent per unit time is extremely large. If commands are sent too frequently, a large amount of host resources is consumed and the host's maximum number of TCP/IP connections is exhausted, so that the host behaves abnormally, the performance of other instances on the host is affected, and the high availability test itself is compromised. In addition, the method does not call any internal method of the instance, so the instance cannot be hung completely and the outcome is random: practical checks show that the instance may not end up hung at all, and the result is unpredictable.
In summary, the prior art has the problem of low accuracy of the test results when performing high availability tests.
Disclosure of Invention
The embodiment of the invention provides a suspension control method and device for an instance in a cluster system, which are used for solving the problem of lower accuracy in the prior art when the instance is subjected to high-availability test.
In order to solve the above problem, in a first aspect, an embodiment of the present invention provides a suspension control method for an instance in a cluster system, where the method includes:
in an instance to be subjected to high availability test in a cluster system, deploying a thread control module for suspending the threads of the instance;
and sending a plurality of simulation request signals matched with the number of threads of the instance to the thread control module so that the thread control module carries out suspending operation on all threads of the instance according to the simulation request signals.
In a second aspect, an embodiment of the present invention provides a suspension control apparatus for an instance in a cluster system, where the apparatus includes:
the deployment module is used for deploying, in an instance to be subjected to high availability test in the cluster system, a thread control module for suspending the threads of the instance;
and the signal sending module is used for sending a plurality of simulation request signals matched with the number of threads of the instance to the thread control module so that the thread control module can carry out suspending operation on all threads of the instance according to the simulation request signals.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements steps of a suspension control method for an instance in the cluster system when the computer program is executed.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a suspension control method of an instance in a clustered system as described.
According to the method and the device for controlling the suspension of an instance in a cluster system provided by the embodiments of the invention, a thread control module is deployed in the middleware instance to be subjected to high availability test, and the thread control module is then triggered to take control of all threads of the instance by sending a plurality of simulation request signals matched with the number of threads of the instance. All threads are thereby suspended and can no longer respond to requests, so the instance is completely suspended and the suspension result is deterministic. Because the number of simulation request signals is controlled, the method also avoids the abnormal host performance, and the resulting impact on other instances of the host, that occurs when an ultra-large batch of port call requests is sent to the instance to be suspended. The test accuracy of the high availability test under instance suspension is thus ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart illustrating steps of a method for suspension control of an instance in a clustered system in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a high availability architecture for instances in a cluster system in accordance with an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a suspension control apparatus for an instance in a cluster system in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 shows a flowchart of the steps of a suspension control method for an instance in a cluster system according to an embodiment of the present invention. The method includes the following steps:
Step 101: in an instance to be tested for high availability in a cluster system, a thread control module for suspending threads of the instance is deployed.
In this step, specifically, when a high availability test under instance suspension needs to be performed on the cluster system, a thread control module for suspending the threads of the instance may be deployed in the instance to be subjected to the high availability test in the cluster system.
The method can be applied to an architecture with high availability of an instance in a cluster system, as shown in fig. 2, and the architecture includes a signal transmitter 21, where the signal transmitter 21 can be used to perform the step, that is, to deploy a thread control module 22 in an instance to be tested with high availability in the cluster system, so as to perform a suspension operation on a thread of the instance.
Specifically, the signal transmitter 21 may include an automated deployment module 211 and a signal transmission module 212, where after the architecture receives a task of performing a high availability test, the automated deployment module 211 of the signal transmitter 21 may be started to transmit and automatically deploy a deployment package of the thread control module 22 to an instance to be subjected to the high availability test. The thread control module 22 may be configured to perform a suspension operation on a thread of an instance, thereby implementing suspension control of the instance.
Step 102: and sending a plurality of simulation request signals matched with the number of threads of the instance to the thread control module so that the thread control module can carry out suspending operation on all threads of the instance according to the simulation request signals.
In this step, specifically, after the thread control module is deployed in the instance, a plurality of simulation request signals matched with the number of threads of the instance may be sent to the thread control module, so that the thread control module carries out a suspension operation on all threads of the instance according to the plurality of simulation request signals, thereby completely suspending the instance.
Because the number of simulation request signals sent is matched with the number of threads of the instance, the suspension result of the instance is accurate and the high availability test result can be effectively controlled. This avoids the prior-art problem that the instance cannot be suspended completely. It also avoids the prior-art problem that, when an ultra-large batch of port call requests is sent to the instance to be suspended, the many signals sent per unit time consume a large amount of host resources, affect the performance of other instances on the host, and make the high availability test result inaccurate. The availability and stability of the cluster system are thereby improved.
Specifically, the simulation request signal may be a request signal that simulates page access.
In addition, the thread control module 22 may include a signal receiving unit 221 and a thread suspending unit 222. The signal receiving unit 221 is configured to receive the plurality of simulation request signals sent by the signal transmitter 21. It may be implemented with a page component program from WEB programming that exposes an interface for the signal transmitter 21 to access; the signal transmitter 21 is in turn configured with a program for accessing this interface and uses concurrent program statements to access it in batches, that is, to send a plurality of simulation request signals in batches. The thread suspending unit 222 may use multithreaded programming to schedule thread suspension: each time the signal transmitter 21 sends one unit of simulation request signal to the thread control module 22, the thread control module 22 suspends one thread. When a plurality of simulation request signals matched with the number of threads of the instance is sent in a batch, the thread suspending unit 222 calls the thread control program to suspend all threads included in the instance. At that point, the load balancing module 23 can balance simulation request signals subsequently sent to the suspended instance onto other instances, so as to test the high availability of the system.
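For illustration only, the interaction between the signal transmitter and the thread control module could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the class ThreadControlModule, its methods, and the function send_simulation_requests are hypothetical names, and a ThreadPoolExecutor stands in for the worker thread pool of the middleware instance.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


class ThreadControlModule:
    """Hypothetical stand-in for the thread control module 22."""

    def __init__(self) -> None:
        self._release = threading.Event()  # set by release_all() to wake parked threads
        self._lock = threading.Lock()
        self._suspended = 0                # worker threads currently parked

    def handle_request(self) -> None:
        """Each simulation request parks the worker thread that serves it."""
        with self._lock:
            self._suspended += 1
        try:
            self._release.wait()           # block here until the threads are released
        finally:
            with self._lock:
                self._suspended -= 1

    def suspended_count(self) -> int:
        with self._lock:
            return self._suspended

    def release_all(self) -> None:
        """Switch every parked thread back to the non-suspended state."""
        self._release.set()


def send_simulation_requests(
    pool: ThreadPoolExecutor, module: ThreadControlModule, count: int
) -> List[Future]:
    """Signal transmitter side: submit `count` simulation requests concurrently."""
    return [pool.submit(module.handle_request) for _ in range(count)]
```

With a pool of N workers, sending N requests leaves no worker free, so any further request stays queued until release_all() is called. This mirrors the idea that the number of simulation request signals is matched with the number of threads of the instance.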
In this way, in this embodiment, a thread control module for suspending threads of the instance is deployed in the instance to be tested for high availability in the cluster system, and a plurality of simulation request signals matched with the number of threads of the instance is sent to the thread control module, so that the thread control module suspends all threads of the instance according to the plurality of simulation request signals. Because the number of simulation request signals sent is matched with the number of threads of the instance, the suspension result of the instance is accurate and the high availability test result can be effectively controlled. This avoids the prior-art problem that the instance cannot be suspended completely, as well as the prior-art problem that sending an ultra-large batch of port call requests to the instance to be suspended consumes a large amount of host resources per unit time, affects the performance of other instances on the host, and makes the high availability test result inaccurate. The availability and stability of the cluster system are thereby improved.
Further, when a plurality of simulation request signals matched with the number of threads of the instance is sent to the thread control module, a target amount of simulation request signals required to suspend all threads included in the instance may first be obtained; the target amount of simulation request signals is then sent to the thread control module, so that the thread control module carries out a suspension operation on all threads in the instance according to the target amount of simulation request signals.
To obtain the target amount of simulation request signals required to suspend all threads included in the instance, the transmission amount of the simulation request signal in each time slice within a preset period and the number of suspended threads in the instance can be used as training samples. The corresponding relation between the number of suspended threads in the instance and the transmission amount of the simulation request signal is learned from these samples, and the target amount of simulation request signals required to suspend all threads included in the instance is then determined from the learned relation.
Specifically, an instance (including one running inside a Docker container) typically sets a maximum thread count and a minimum thread count. If the maximum number of threads of an instance is not set, that is, if the thread pool can expand elastically without limit, then when service concurrency is high the total number of executing threads rises quickly and easily causes the instance to run out of memory. In addition, if the total number of threads is not set, then when the traffic volume suddenly increases the thread pool may be unable to adjust elastically in time, and a batch of service requests fails. The number of threads of the instance in this embodiment is therefore a preset value, for example 500.
In addition, training and learning are performed by taking the transmission amount of the simulation request signal in each time slice within a preset period and the number of threads actually suspended in the instance as training samples. Because the logistic regression (Logistic) algorithm converges easily and quickly reaches a global optimal solution, it can be used to train on the training samples. Assuming that the number of suspended threads in the instance and the transmission amount of the simulation request signal follow a Bernoulli distribution, the cost function can be iterated by gradient descent, and the algorithm stops when the gradient converges. The corresponding relation between the number of suspended threads and the transmission amount of the simulation request signal is then obtained, and from it the target amount of simulation request signals required to suspend all threads included in the instance, that is, the target amount matched with the number of threads of the instance, can be determined.
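The training step described above could, for example, be sketched as follows. This is one possible reading rather than the procedure of the embodiment: it assumes that the suspended fraction of threads is modelled as sigmoid(w * x / scale + b), where x is the transmission amount, that the cross-entropy cost is minimised by plain batch gradient descent, and that "all threads suspended" is approximated by a predicted fraction of 0.99. The function names, the learning rate, the tolerance and the 0.99 threshold are all illustrative assumptions.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(
    samples: List[Tuple[int, int]],
    total_threads: int,
    lr: float = 1.0,
    tol: float = 1e-6,
    max_iter: int = 200_000,
) -> Tuple[float, float, float]:
    """Fit p(x) = sigmoid(w * x / scale + b) by batch gradient descent.

    samples holds one (requests_sent, threads_suspended) pair per time slice.
    The label of each sample is the suspended fraction of the thread pool.
    """
    scale = float(max(x for x, _ in samples)) or 1.0  # normalise inputs to [0, 1]
    data = [(x / scale, k / total_threads) for x, k in samples]
    w, b = 0.0, 0.0
    for _ in range(max_iter):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y   # derivative of cross-entropy w.r.t. the logit
            gw += err * x
            gb += err
        gw /= len(data)
        gb /= len(data)
        w -= lr * gw
        b -= lr * gb
        if math.hypot(gw, gb) < tol:       # stop once the gradient has converged
            break
    return w, b, scale


def target_amount(w: float, b: float, scale: float, fraction: float = 0.99) -> int:
    """Smallest request count whose predicted suspended fraction reaches `fraction`."""
    if w <= 0:
        raise ValueError("model does not predict more suspensions with more requests")
    logit = math.log(fraction / (1.0 - fraction))
    return math.ceil(scale * (logit - b) / w)
```

A caller would pass the per-time-slice samples and the preset thread count (for example 500) to fit_logistic and then feed the returned parameters to target_amount to obtain the number of simulation request signals to send.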
In this way, the corresponding relation between the number of suspended threads and the transmission amount of the simulation request signal is obtained through training and learning, and the target amount of simulation request signals matched with the number of threads of the instance is derived from it. This ensures the accuracy of the target amount, so that the instance can be completely suspended when the target amount of simulation request signals is sent. It avoids both the case in which the instance cannot be completely suspended because too few simulation request signals are sent and the case in which the performance of the host where the instance is located is affected because too many are sent.
Further, after the plurality of simulation request signals matched with the number of threads of the instance is sent to the thread control module, whether all threads included in the instance are in a suspended state may also be detected; when all threads included in the instance are detected to be in a suspended state, whether the cluster system is in a high availability state is detected.
After detecting whether the cluster system is in the high availability state, a release operation may further be performed on the suspended threads in the instance, so that they are switched to the non-suspended state.
Specifically, when detecting whether all threads included in an instance are in a suspended state, a dump log for collecting the thread state on a host computer where the instance is located can be obtained, and whether the threads are suspended is judged according to all the thread state information recorded by the dump log.
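As an illustration only, the dump-log check described above could be sketched as follows. The embodiment does not specify the dump format, so the one-line-per-thread layout "<name> state=<STATE>" and the state name SUSPENDED are assumptions made for this sketch, as are the function names.

```python
from typing import Dict

SUSPENDED_STATE = "SUSPENDED"  # assumed state label; a real dump may use other names


def parse_thread_states(dump_text: str) -> Dict[str, str]:
    """Return {thread_name: state} from lines shaped like '<name> state=<STATE>'."""
    states: Dict[str, str] = {}
    for line in dump_text.splitlines():
        name, sep, state = line.strip().rpartition(" state=")
        if sep:                            # skip lines that carry no state field
            states[name] = state.upper()
    return states


def count_suspended(dump_text: str) -> int:
    """Number of threads that the dump reports as suspended."""
    states = parse_thread_states(dump_text)
    return sum(1 for s in states.values() if s == SUSPENDED_STATE)


def all_threads_suspended(dump_text: str, thread_count: int) -> bool:
    """True when the dump lists `thread_count` threads and every one is suspended."""
    states = parse_thread_states(dump_text)
    return len(states) == thread_count and count_suspended(dump_text) == thread_count
```

Checking the total against the preset thread count guards against a truncated dump being mistaken for a fully suspended instance.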
It should be noted that, if some of the threads included in the instance are detected not to be suspended, the number of simulation request signals sent to the thread control module may be increased until all threads in the instance are in the suspended state.
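The top-up behaviour described above could be sketched as the following loop. It is a hedged illustration: the callables send and count_suspended, the choice to top up by exactly the number of threads still running, and the max_rounds safety limit are assumptions of this sketch and are not taken from the embodiment.

```python
from typing import Callable


def suspend_until_complete(
    send: Callable[[int], None],
    count_suspended: Callable[[], int],
    thread_count: int,
    initial_amount: int,
    max_rounds: int = 10,
) -> int:
    """Send simulation requests until every thread is suspended.

    Returns the total number of simulation request signals that were sent.
    """
    sent = 0
    amount = initial_amount
    for _ in range(max_rounds):
        send(amount)
        sent += amount
        remaining = thread_count - count_suspended()
        if remaining <= 0:
            return sent
        amount = remaining                 # top up only by the threads still running
    raise RuntimeError("instance was not fully suspended within max_rounds")
```

Bounding the number of rounds keeps the test from flooding the host with requests, which is the failure mode of the prior-art approach that the embodiment sets out to avoid.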
Specifically, the architecture of the cluster system where the instances are highly available may further include a thread management module 24, where the thread management module 24 includes a thread detection unit 241 and a thread release unit 242. In this embodiment, the thread detecting unit 241 may detect whether all the threads included in the instance are in the suspended state, and perform a release operation on the threads in the suspended state in the instance by the thread releasing unit 242, so that the threads in the suspended state are switched to the non-suspended state.
In addition, the high availability test mainly tests the failover mechanism of the instances; that is, the high availability of the cluster system is verified as long as the tested instances are guaranteed to switch over to one another when a failure occurs. Therefore, when detecting whether the cluster system is in a high availability state, it can be detected whether the service success rate of the cluster system while all threads included in the instance are suspended is the same as its service success rate while the instance is not suspended; if so, the cluster system is in the high availability state. If the service success rate while all threads included in the instance are suspended is lower than the service success rate while the instance is not suspended, the log of the instance needs to be obtained to determine whether simulation request signals are still being received or whether errors are reported; if errors are reported, the high availability mechanism of the cluster system is defective.
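The success-rate comparison described above could be expressed as the short sketch below. The function names and the optional tolerance parameter are illustrative assumptions; with the default tolerance of 0.0 the check passes only when the success rate under suspension is not lower than the baseline.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of service requests that succeeded."""
    if not outcomes:
        raise ValueError("no service requests were observed")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def is_highly_available(
    baseline: Sequence[bool],
    during_suspension: Sequence[bool],
    tolerance: float = 0.0,
) -> bool:
    """Compare the success rate while the instance is suspended with the baseline."""
    return success_rate(during_suspension) >= success_rate(baseline) - tolerance
```

If the check fails, the next step described in the text is to inspect the log of the suspended instance for continued request reception or error reports.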
In this way, a thread control module for suspending the threads of the instance is deployed in the instance to be tested for high availability in the cluster system, and a plurality of simulation request signals matched with the number of threads of the instance is sent to the thread control module, so that the thread control module suspends all threads of the instance according to the plurality of simulation request signals. Because the number of simulation request signals sent is matched with the number of threads of the instance, the suspension result of the instance is accurate and the high availability test result can be effectively controlled. This avoids the prior-art problem that the instance cannot be suspended completely, as well as the prior-art problem that sending an ultra-large batch of port call requests to suspend the instance consumes a large amount of host resources and makes the high availability test result inaccurate.
FIG. 3 shows a block diagram of a suspension control apparatus for an instance in a cluster system according to an embodiment of the present invention. The apparatus includes:
a deployment module 301, configured to deploy, in an instance to be tested for high availability in the cluster system, a thread control module for suspending threads of the instance;
and a signal sending module 302, configured to send a plurality of simulation request signals matched with the number of threads of the instance to the thread control module, so that the thread control module carries out a suspension operation on all threads of the instance according to the plurality of simulation request signals.
Optionally, the signal sending module 302 includes:
an obtaining unit, configured to obtain a target amount of simulation request signals required to suspend all threads included in the instance;
and a sending unit, configured to send the target amount of simulation request signals to the thread control module, so that the thread control module carries out a suspension operation on all threads in the instance according to the target amount of simulation request signals.
Optionally, the obtaining unit is configured to use the transmission amount of the simulation request signal in each time slice within a preset period and the number of suspended threads in the instance as training samples, to learn from them the corresponding relation between the number of suspended threads in the instance and the transmission amount of the simulation request signal, and to determine, according to the learned relation, the target amount of simulation request signals required to suspend all threads included in the instance.
Optionally, the apparatus further comprises:
the thread management module is used for detecting whether all threads included in the instance are in a suspended state; when all threads included in the instance are detected to be in a suspended state, it is detected whether the clustered system is in a high availability state.
Optionally, the thread management module is further configured to perform a release operation on the thread in the suspended state in the instance, so that the thread in the suspended state is switched to the non-suspended state.
According to the suspension control device for the instance in the cluster system, the thread control module for suspending the threads of the instance is deployed in the instance to be subjected to the high availability test in the cluster system, and a plurality of simulation request signals matched with the number of the threads of the instance are sent to the thread control module, so that the thread control module suspends all the threads of the instance according to the simulation request signals, and the accuracy of the suspending result of the instance and the effective control of the high availability test result are realized based on the fact that the sending number of the simulation request signals is matched with the number of the threads of the instance.
In addition, FIG. 4 shows a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. The electronic device may include a processor 410, a communication interface (Communications Interface) 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke a computer program stored in the memory 430 and executable on the processor 410 to perform the methods provided by the above embodiments, including, for example: in an instance to be subjected to high availability test in a cluster system, deploying a thread control module for suspending the threads of the instance; and sending a plurality of simulation request signals matched with the number of threads of the instance to the thread control module so that the thread control module carries out suspending operation on all threads of the instance according to the simulation request signals.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Embodiments of the present invention also provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the methods provided by the above embodiments, for example, comprising: in an instance to be subjected to high availability test in a cluster system, deploying a thread control module for suspending the threads of the instance; and sending a plurality of simulation request signals matched with the number of threads of the instance to the thread control module so that the thread control module carries out suspending operation on all threads of the instance according to the simulation request signals.
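The overall test flow (suspend every thread of the target instance, confirm the suspension, check the cluster, then release the threads) can be sketched in Python as follows. This is a simplified state model rather than the embodiment itself: the `Instance` class, the one-signal-per-thread mapping, and the availability criterion in `cluster_is_available` (at least one instance can still serve requests) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical model of one instance and its suspended-thread count."""
    name: str
    thread_count: int
    suspended: int = 0

    def receive_signals(self, count):
        # Assumption: one simulation request signal suspends one thread.
        self.suspended = min(self.thread_count, self.suspended + count)

    def all_suspended(self):
        return self.suspended == self.thread_count

    def release(self):
        # Switch every suspended thread back to the non-suspended state.
        self.suspended = 0


def cluster_is_available(instances):
    # Assumed criterion: the cluster is available if some instance still works.
    return any(not inst.all_suspended() for inst in instances)


def run_ha_test(instances, target):
    """Suspend `target`, verify the suspension, check the cluster, release."""
    target.receive_signals(target.thread_count)
    if not target.all_suspended():
        return None                      # suspension incomplete, test not valid
    result = cluster_is_available(instances)
    target.release()
    return result
```

Under these assumptions, a two-instance cluster passes the check when one instance is fully suspended, while a single-instance cluster does not.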
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product, which may be stored in a computer readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for controlling suspension of an instance in a cluster system, the method comprising:
deploying, in an instance of the cluster system that is to undergo a high availability test, a thread control module for suspending the threads of the instance;
sending, to the thread control module, a plurality of simulation request signals matching the number of threads of the instance, so that the thread control module suspends all threads of the instance according to the simulation request signals;
wherein the sending, to the thread control module, of a plurality of simulation request signals matching the number of threads of the instance comprises:
obtaining a target quantity of simulation request signals required to suspend all threads included in the instance;
and sending the target quantity of simulation request signals to the thread control module, so that the thread control module suspends all threads in the instance according to the target quantity of simulation request signals.
2. The method of claim 1, wherein the obtaining of a target quantity of simulation request signals required to suspend all threads included in the instance comprises:
taking, as training samples, the number of simulation request signals sent and the number of threads suspended in the instance in each time slice within a preset period, and training on these samples to obtain the correspondence between the number of suspended threads in the instance and the number of simulation request signals sent;
and determining, according to the trained correspondence between the number of suspended threads and the number of simulation request signals sent, the target quantity of simulation request signals required to suspend all threads included in the instance.
3. The method of claim 1, wherein after the sending, to the thread control module, of a plurality of simulation request signals matching the number of threads of the instance, the method further comprises:
detecting whether all threads included in the instance are in a suspended state;
and when all threads included in the instance are detected to be in a suspended state, detecting whether the cluster system is in a high availability state.
4. The method of claim 3, wherein after the detecting of whether the cluster system is in a high availability state, the method further comprises:
releasing the threads in the suspended state in the instance, so as to switch them from the suspended state to a non-suspended state.
5. A suspension control device for an instance in a cluster system, the device comprising:
a deployment module, configured to deploy, in an instance of the cluster system that is to be tested, a thread control module for suspending the threads of the instance;
a signal sending module, configured to send, to the thread control module, a plurality of simulation request signals matching the number of threads of the instance, so that the thread control module suspends all threads of the instance according to the simulation request signals;
wherein the signal sending module comprises:
an obtaining unit, configured to obtain a target quantity of simulation request signals required to suspend all threads included in the instance;
and a sending unit, configured to send the target quantity of simulation request signals to the thread control module, so that the thread control module suspends all threads in the instance according to the target quantity of simulation request signals.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the suspension control method of an instance in a cluster system as claimed in any one of claims 1 to 4.
7. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the suspension control method of an instance in a cluster system according to any one of claims 1 to 4.
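The training step recited in claim 2 can be sketched in Python as follows. The claims do not specify a model, so this sketch assumes a simple linear relation fitted by ordinary least squares; `fit_line`, `target_signal_count` and the sample values in the usage note are hypothetical.

```python
import math


def fit_line(samples):
    """Least-squares fit of: suspended ~= slope * sent + intercept.

    `samples` holds one (signals_sent, threads_suspended) pair per time slice.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def target_signal_count(samples, thread_count):
    """Smallest signal count whose predicted suspensions cover every thread."""
    slope, intercept = fit_line(samples)
    if slope <= 0:
        raise ValueError("samples show no positive relation")
    return math.ceil((thread_count - intercept) / slope)
```

For example, if the observed time slices were `[(2, 1), (4, 2), (6, 3), (8, 4)]`, that is, every second signal suspended a thread, then `target_signal_count(samples, 10)` would return 20 for an instance with 10 threads.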

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811436447.8A CN111240749B (en) 2018-11-28 2018-11-28 Suspending control method, device, equipment and storage medium of instance in cluster system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811436447.8A CN111240749B (en) 2018-11-28 2018-11-28 Suspending control method, device, equipment and storage medium of instance in cluster system

Publications (2)

Publication Number Publication Date
CN111240749A CN111240749A (en) 2020-06-05
CN111240749B true CN111240749B (en) 2023-07-21

Family

ID=70874025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811436447.8A Active CN111240749B (en) 2018-11-28 2018-11-28 Suspending control method, device, equipment and storage medium of instance in cluster system

Country Status (1)

Country Link
CN (1) CN111240749B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07219789A (en) * 1994-01-27 1995-08-18 Internatl Business Mach Corp <Ibm> Method for processing of external event in plurality of thread systems
CN101442437A (en) * 2008-10-31 2009-05-27 金蝶软件(中国)有限公司 Method, system and equipment for implementing high availability
CN102207894A (en) * 2011-05-25 2011-10-05 盛乐信息技术(上海)有限公司 Keyboard filter and method for waking up no-response operation system
CN102411513A (en) * 2011-08-10 2012-04-11 复旦大学 Garbage collection method for mixed mode execution engine
CN103455423A (en) * 2013-09-03 2013-12-18 浪潮(北京)电子信息产业有限公司 Software automatic testing device and system based on cluster framework
CN103744724A (en) * 2014-02-19 2014-04-23 互联网域名***北京市工程研究中心有限公司 Timed task clustering method and device thereof
CN103957246A (en) * 2014-04-22 2014-07-30 广州杰赛科技股份有限公司 Dynamic load balancing method and system based on tenant sensing
CN104205043A (en) * 2012-02-22 2014-12-10 惠普发展公司,有限责任合伙企业 Hiding logical processors from an operating system on a computer
CN105183591A (en) * 2015-09-07 2015-12-23 浪潮(北京)电子信息产业有限公司 High-availability cluster implementation method and system
CN206164554U (en) * 2016-08-31 2017-05-10 广州唯品会信息科技有限公司 Business information processing system
CN107577525A (en) * 2017-08-22 2018-01-12 努比亚技术有限公司 A kind of method, apparatus and computer-readable recording medium for creating concurrent thread
CN107810488A (en) * 2017-08-11 2018-03-16 深圳前海达闼云端智能科技有限公司 A kind of method of state management of virtual machine, device and intelligent terminal
CN107832146A (en) * 2017-10-27 2018-03-23 北京计算机技术及应用研究所 Thread pool task processing method in highly available cluster system
CN108833131A (en) * 2018-04-25 2018-11-16 北京百度网讯科技有限公司 System, method, equipment and the computer storage medium of distributed data base cloud service

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030187991A1 (en) * 2002-03-08 2003-10-02 Agile Software Corporation System and method for facilitating communication between network browsers and process instances
US7711721B2 (en) * 2004-09-01 2010-05-04 International Business Machines Corporation Apparatus, system, and method for suspending a request during file server serialization reinitialization
US8429657B2 (en) * 2008-04-28 2013-04-23 Oracle International Corporation Global avoidance of hang states via priority inheritance in multi-node computing system
US7996722B2 (en) * 2009-01-02 2011-08-09 International Business Machines Corporation Method for debugging a hang condition in a process without affecting the process state
US8327336B2 (en) * 2009-03-18 2012-12-04 International Business Machines Corporation Enhanced thread stepping
US20140282564A1 (en) * 2013-03-15 2014-09-18 Eli Almog Thread-suspending execution barrier
US10101981B2 (en) * 2015-05-08 2018-10-16 Citrix Systems, Inc. Auto discovery and configuration of services in a load balancing appliance
US10574741B2 (en) * 2016-04-18 2020-02-25 Nokia Technologies Oy Multi-level load balancing
US10191808B2 (en) * 2016-08-04 2019-01-29 Qualcomm Incorporated Systems and methods for storing, maintaining, and accessing objects in storage system clusters

Also Published As

Publication number Publication date
CN111240749A (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN110798375B (en) Monitoring method, system and terminal equipment for enhancing high availability of container cluster
CN109586952B (en) Server capacity expansion method and device
US10855791B2 (en) Clustered storage system path quiescence analysis
JP6788178B2 (en) Setting support program, setting support method and setting support device
US8448025B2 (en) Fault analysis apparatus, fault analysis method, and recording medium
US20140250334A1 (en) Detection apparatus and detection method
CN110995851B (en) Message processing method, device, storage medium and equipment
US10523550B2 (en) Scout functions
CN111159029B (en) Automated testing method, apparatus, electronic device and computer readable storage medium
CN110990289B (en) Method and device for automatically submitting bug, electronic equipment and storage medium
CN110618853B (en) Detection method, device and equipment for zombie container
US10860411B2 (en) Automatically detecting time-of-fault bugs in cloud systems
CN111240749B (en) Suspending control method, device, equipment and storage medium of instance in cluster system
CN114118991A (en) Third-party system monitoring system, method, device, equipment and storage medium
CN111078480B (en) Exception recovery method and server
CN109726124B (en) Test system, test method, management device, test device and computing equipment
CN115712521A (en) Cluster node fault processing method, system and medium
CN112860509A (en) Dial testing alarm method and device
CN110321261B (en) Monitoring system and monitoring method
CN109672573B (en) Configuration file deployment method, configuration file determination method, server and storage medium
US9176806B2 (en) Computer and memory inspection method
CN112068935A (en) Method, device and equipment for monitoring deployment of kubernets program
CN112286797B (en) Service monitoring method and device, electronic equipment and storage medium
CN114920020B (en) Feeding control method of hopper equipment, hopper equipment and readable storage medium
US20240159812A1 (en) Method for monitoring in a distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant