CN112749042A - Application running method and device - Google Patents
- Publication number
- CN112749042A (application number CN201911052617.7A)
- Authority
- CN
- China
- Prior art keywords
- peripheral component
- computing process
- killing
- disk
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/1438—Restarting or rejuvenating
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Retry When Errors Occur (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an application running method and device, relating to the technical field of computers. A specific implementation of the method comprises: receiving a peripheral component restart event, and writing the information registered with the peripheral component to disk through a mounted file; and, upon determining based on a preset time threshold that the peripheral component restart event has failed, killing the computing process corresponding to the peripheral component. The method and device can thereby solve the prior-art problem of an abnormal peripheral component causing the entire application to fail.
Description
Technical Field
The invention relates to the technical field of computers, in particular to an application running method and device.
Background
Spark is a new-generation distributed in-memory computing framework and a top-level Apache open-source project. It is well suited to data mining and machine learning algorithms and greatly improves development efficiency.
Spark on Kubernetes is a container-based scheduling scheme: relying on the strong container management and orchestration features of Kubernetes, Spark can be co-deployed with other container services. Because the technology is relatively new, the scheme still faces a series of problems. Among them, the external shuffle service is a peripheral component of Spark responsible for storing Spark intermediate data, and how it is recovered is a critical problem that determines the stability and performance of Spark on Kubernetes.
Kubernetes is a container cluster management system that provides containerized applications with a complete set of functions such as deployment and operation, resource scheduling, load balancing, service discovery, and dynamic scaling, making large-scale container cluster management more convenient.
In the process of implementing the invention, the inventor found at least the following problems in the prior art:
If the external shuffle service is restarted, the shuffle file location information of the executors registered with it (executors are Spark's computing processes) is lost, while the executors on that machine do not exit and continue to execute tasks. Because a task cannot find the shuffle file location information of the previous stage in the external shuffle service, a FetchFailed exception occurs, and the task is rescheduled according to Spark's own framework rules. However, because the executor keeps working normally, the shuffle metadata maintained by the driver (Spark's scheduling management process) never changes: the driver's shuffle metadata is refreshed only when an executor exits and an executor-lost event occurs. Subsequent tasks therefore still look up the file location information in the restarted external shuffle service, and because that information has been lost, they keep failing. After a certain number of retries, the entire application fails.
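The failure loop described above can be reproduced with a toy model. This is a minimal Python sketch, not Spark code: `InMemoryShuffleService`, `run_reduce_stage`, and `max_attempts` are hypothetical names, and `max_attempts` merely stands in for Spark's own retry limits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InMemoryShuffleService:
    """Hypothetical stand-in for an external shuffle service that keeps
    executor registrations only in memory."""
    registrations: Dict[str, str] = field(default_factory=dict)

    def restart(self) -> None:
        # In-memory state does not survive a restart.
        self.registrations.clear()

    def fetch(self, executor_id: str) -> str:
        if executor_id not in self.registrations:
            # Stands in for a FetchFailed exception on the reducer side.
            raise LookupError(f"no registration for executor {executor_id}")
        return self.registrations[executor_id]


def run_reduce_stage(service: InMemoryShuffleService,
                     map_output_executors: List[str],
                     max_attempts: int) -> Optional[int]:
    """Retry a reduce stage; return the successful attempt number, or None
    when every attempt fails and the application would be aborted."""
    for attempt in range(1, max_attempts + 1):
        try:
            for executor_id in map_output_executors:
                service.fetch(executor_id)
            return attempt
        except LookupError:
            # The executor never exited, so the driver's metadata (the list
            # passed in) is never refreshed and each retry repeats the same
            # lookup against the same lost registration.
            continue
    return None
```

Before `restart()` the stage succeeds on its first attempt; after it, every attempt fails and the function returns `None`, mirroring the application failure.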
Disclosure of Invention
In view of this, embodiments of the present invention provide an application running method and apparatus that can solve the prior-art problem of an abnormal peripheral component causing the entire application to fail.
To achieve the above object, according to one aspect of the embodiments of the present invention, an application running method is provided, comprising: receiving a peripheral component restart event, and writing the information registered with the peripheral component to disk through a mounted file; and, upon determining based on a preset time threshold that the peripheral component restart event has failed, killing the computing process corresponding to the peripheral component.
Optionally, killing the computing process corresponding to the peripheral component includes:
adding a probe check to the computing process to monitor the port of the peripheral component;
and, upon determining according to a preset check-count threshold that the port is abnormal, killing the corresponding computing process.
Optionally, after the computing process corresponding to the peripheral component is killed, the method includes:
rescheduling and starting the computing process, receiving a computing-process-lost message, and deleting all shuffle meta-information corresponding to the computing process.
Optionally, after the information registered with the peripheral component is written to disk, the method includes:
triggering a peripheral component restart event, and loading all information from the disk to recover the peripheral component's data, wherein the peripheral component is deployed as a DaemonSet in the container cluster management system.
In addition, according to another aspect of the embodiments of the present invention, an application running apparatus is provided, comprising: a receiving module, configured to receive a peripheral component restart event and write the information registered with the peripheral component to disk through a mounted file; and a processing module, configured to determine, based on a preset time threshold, that the peripheral component restart event has failed, and to kill the computing process corresponding to the peripheral component.
Optionally, the processing module killing the computing process corresponding to the peripheral component includes:
adding a probe check to the computing process to monitor the port of the peripheral component;
and, upon determining according to a preset check-count threshold that the port is abnormal, killing the corresponding computing process.
Optionally, after killing the computing process corresponding to the peripheral component, the processing module is further configured for:
rescheduling and starting the computing process, receiving a computing-process-lost message, and deleting all shuffle meta-information corresponding to the computing process.
Optionally, after writing the information registered with the peripheral component to disk, the receiving module is further configured for:
triggering a peripheral component restart event, and loading all information from the disk to recover the peripheral component's data, wherein the peripheral component is deployed as a DaemonSet in the container cluster management system.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the application running method of any of the embodiments described above.
According to another aspect of the embodiments of the present invention, a computer-readable medium is further provided, on which a computer program is stored; when executed by a processor, the program implements the application running method of any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: a peripheral component restart event is received and the information registered with the peripheral component is written to disk through a mounted file; and, when it is determined based on a preset time threshold that the peripheral component restart event has failed, the computing process corresponding to the peripheral component is killed. As a result, after the peripheral component (the external shuffle service) restarts, the application can continue running normally until it finishes.
Further effects of the above optional implementations are described below in connection with the embodiments.
Drawings
The drawings are provided for a better understanding of the invention and do not constitute an undue limitation on it. In the drawings:
FIG. 1 is a schematic diagram of the main flow of an application running method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of an application running method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of the main flow of an application running method according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of the main flow of an application running method according to a fourth embodiment of the present invention;
FIG. 5 is a schematic diagram of the main flow of an application running method according to a fifth embodiment of the present invention;
FIG. 6 is a schematic diagram of the main modules of an application running apparatus according to an embodiment of the present invention;
FIG. 7 is a diagram of an exemplary system architecture to which embodiments of the present invention may be applied;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of an application running method according to the first embodiment of the present invention. The application running method may include:
Step S101: receiving a peripheral component restart event, and writing the information registered with the peripheral component to disk through a mounted file.
Preferably, after the information registered with the peripheral component is written to disk, a peripheral component restart event may be triggered and all information on the disk loaded, so as to recover the peripheral component's data. The peripheral component is deployed as a DaemonSet in the container cluster management system; a DaemonSet ensures that a copy of a Pod runs on every (or every selected) node.
Step S102: determining, based on a preset time threshold, that the peripheral component restart event has failed, and killing the computing process corresponding to the peripheral component.
Preferably, killing the computing process corresponding to the peripheral component may be implemented as follows: a probe check is added to the computing process to monitor the port of the peripheral component; then, when the port is determined to be abnormal according to a preset check-count threshold, the corresponding computing process is killed.
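The probe-and-kill logic of step S102 can be sketched as below. It assumes a plain TCP connect as the probe and a caller-supplied `kill` callback; `tcp_port_open`, `probe_until_failure`, and `watch_and_kill` are hypothetical helpers, and in the Kubernetes embodiment this role is played by the kubelet rather than by user code.

```python
import socket
import time
from typing import Callable


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Probe: True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def probe_until_failure(check: Callable[[], bool], failure_threshold: int,
                        max_checks: int, period_seconds: float = 0.0) -> bool:
    """Run check() up to max_checks times, period_seconds apart.

    Returns True as soon as check() has failed failure_threshold times in a
    row (the port is judged abnormal); returns False if that never happens.
    max_checks only bounds the loop so the sketch terminates.
    """
    consecutive_failures = 0
    for _ in range(max_checks):
        if check():
            consecutive_failures = 0  # any success resets the count
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return True
        time.sleep(period_seconds)
    return False


def watch_and_kill(check: Callable[[], bool], failure_threshold: int,
                   max_checks: int, kill: Callable[[], None],
                   period_seconds: float = 0.0) -> bool:
    """Kill the computing process once the port is judged abnormal."""
    if probe_until_failure(check, failure_threshold, max_checks, period_seconds):
        kill()
        return True
    return False
```

Counting only consecutive failures means a single dropped connection does not kill a healthy process; the threshold trades detection speed against false positives.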
It should be noted that, after the computing process corresponding to the peripheral component is killed, the computing process may be rescheduled and started, a computing-process-lost message received, and all shuffle meta-information corresponding to the computing process deleted. Here, shuffle is the process that carries data from the output of Map tasks to the input of Reduce tasks, comprising data preparation in the Map phase and data copying in the Reduce phase.
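To illustrate the metadata cleanup, here is a hypothetical driver-side table mapping each shuffle's map outputs to the executor that holds them. It is a simplified sketch (Spark's own bookkeeping is more involved), and the class and method names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ShuffleMetadata:
    """Hypothetical driver-side table: shuffle_id -> {map_id: executor_id}."""

    def __init__(self) -> None:
        self._outputs: Dict[int, Dict[int, str]] = defaultdict(dict)

    def register_map_output(self, shuffle_id: int, map_id: int,
                            executor_id: str) -> None:
        self._outputs[shuffle_id][map_id] = executor_id

    def on_executor_lost(self, executor_id: str) -> List[Tuple[int, int]]:
        """Delete every map-output entry held by the lost executor and return
        the (shuffle_id, map_id) pairs whose outputs must be recomputed."""
        removed = []
        for shuffle_id, outputs in self._outputs.items():
            # Collect first, then delete, so the dict is not mutated mid-iteration.
            for map_id in [m for m, e in outputs.items() if e == executor_id]:
                del outputs[map_id]
                removed.append((shuffle_id, map_id))
        return removed

    def locations(self, shuffle_id: int) -> Dict[int, str]:
        return dict(self._outputs[shuffle_id])
```

Once the stale entries are gone, rescheduled tasks no longer ask the restarted service for files it cannot locate; the missing map outputs are regenerated instead.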
The invention therefore provides an application running method with which, after the peripheral component restarts, an application can continue running normally until it completes; this holds for all applications on the machine, since multiple applications may run on the same machine. Moreover, restarting the peripheral component has little influence on application completion time, recovery happens automatically at the application level, and operating performance can be significantly improved.
That is, the invention achieves high availability at the application level: the whole application can run to completion without the user noticing the restart, and when several applications run on the same machine, each of them can run normally.
FIG. 2 is a schematic diagram of the main flow of an application running method according to the second embodiment of the present invention. The application running method may include:
Step S201: receiving a peripheral component restart event, and writing the information registered with the peripheral component to disk through a mounted file.
Step S202: deploying the peripheral component as a DaemonSet in the container cluster management system.
Step S203: triggering a peripheral component restart event.
Step S204: loading all information from the disk to recover the peripheral component's data.
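Steps S201 to S204 amount to making the registration table durable. A minimal sketch, assuming registrations are stored as JSON in a file on a mounted directory; the class name, file name, and layout are hypothetical, not Spark's format.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


class RegistrationStore:
    """Hypothetical durable table: app_id -> executor_id -> local dirs."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.registrations: Dict[str, Dict[str, List[str]]] = {}

    def register(self, app_id: str, executor_id: str, local_dirs: List[str]) -> None:
        self.registrations.setdefault(app_id, {})[executor_id] = list(local_dirs)
        self._flush()  # persist on every registration

    def get(self, app_id: str, executor_id: str) -> Optional[List[str]]:
        return self.registrations.get(app_id, {}).get(executor_id)

    def _flush(self) -> None:
        # Write a temporary file in the same directory, then rename it over
        # the real file; on POSIX the rename is atomic, so a crash mid-write
        # cannot leave a half-written registration file.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.registrations, f)
        os.replace(tmp_path, self.path)

    @classmethod
    def load(cls, path: str) -> "RegistrationStore":
        """Rebuild the table after a restart from whatever is on disk."""
        store = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                store.registrations = json.load(f)
        return store
```

On restart, the service would call `RegistrationStore.load(...)` on the same mounted location, which is what lets executors registered before the restart keep being served.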
FIG. 3 is a schematic diagram of the main flow of an application running method according to the third embodiment of the present invention. The application running method may include:
Step S301: determining, based on a preset time threshold, that the peripheral component restart event has failed.
Step S302: adding a probe check to the computing process to monitor the port of the peripheral component.
Step S303: determining, according to a preset check-count threshold, that the port is abnormal, and killing the corresponding computing process.
Step S304: rescheduling and starting the computing process.
Step S305: receiving the computing-process-lost message, and deleting all shuffle meta-information corresponding to the computing process.
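Step S301 only says that the restart is judged to have failed against a preset time threshold. One plausible reading, sketched under that assumption, is to poll a readiness check until a deadline; `restart_failed` and its parameters are hypothetical, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def restart_failed(is_ready: Callable[[], bool], time_threshold: float,
                   poll_interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the restarted component is not ready within
    time_threshold seconds, i.e. the restart event is judged to have failed."""
    deadline = clock() + time_threshold
    while clock() < deadline:
        if is_ready():
            return False  # restarted in time; no need to kill anything
        sleep(poll_interval)
    return not is_ready()  # one last check at the deadline
```

A monotonic clock is used so that wall-clock adjustments on the node cannot stretch or shrink the threshold.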
Based on the above application running method, the invention is further explained below, taking the external shuffle service in Spark on Kubernetes as an example.
When an executor starts, it needs to register its local shuffle storage path (for example, spark.local.dir) with the external shuffle service. Functionally, the executor performs two types of tasks: mapper tasks and reducer tasks. A mapper task stores shuffle data on the local disk through a hostPath volume provided by Kubernetes, under the local storage path registered above. A reducer task reads shuffle data from the external shuffle service of the specified machine according to the shuffle meta-information provided by the driver, and the external shuffle service finds the storage path from the information it maintained at registration. The external shuffle service can accept shuffle requests from multiple different applications, and each application finds its own shuffle data by its app ID. This solution presents two problems: 1. If the external shuffle service is restarted, the registration information is lost, causing the application to fail. 2. If the external shuffle service exits unexpectedly and cannot be restarted, or takes a long time to restart, the entire application also fails, because the executors do not exit.
A hostPath volume maps a file or directory from the node's file system into the Pod.
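The registration and lookup just described can be modelled as a table keyed by application and executor. In this hypothetical sketch the directory selection and file-name format are simplified illustrations, not Spark's actual on-disk layout; it also shows why problem 1 arises when the table lives only in memory.

```python
import posixpath
from typing import Dict, List, Tuple


class ExternalShuffleRegistry:
    """Hypothetical in-memory registry: (app_id, executor_id) -> local dirs."""

    def __init__(self) -> None:
        self._executors: Dict[Tuple[str, str], List[str]] = {}

    def register_executor(self, app_id: str, executor_id: str,
                          local_dirs: List[str]) -> None:
        self._executors[(app_id, executor_id)] = list(local_dirs)

    def resolve(self, app_id: str, executor_id: str,
                shuffle_id: int, map_id: int) -> str:
        """Return the path of a map output file. Including app_id in the key
        keeps applications that share this service apart."""
        local_dirs = self._executors.get((app_id, executor_id))
        if local_dirs is None:
            # Problem 1: after a restart the table is empty, so every lookup
            # fails and the reducer reports a fetch failure.
            raise KeyError(f"executor {executor_id} of {app_id} not registered")
        # Simplified directory choice for illustration only.
        base = local_dirs[map_id % len(local_dirs)]
        return posixpath.join(base, f"shuffle_{shuffle_id}_{map_id}_0.data")
```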
FIG. 4 is a schematic diagram of the main flow of an application running method according to the fourth embodiment of the present invention. If the external shuffle service is restarted, the application running method may include:
Step S401: receiving an external shuffle service restart event, and writing the registered information to disk through a file mounted via hostPath.
Step S402: deploying the external shuffle service as a DaemonSet in Kubernetes.
Step S403: triggering an external shuffle service restart event.
Step S404: loading all information from the disk to recover the external shuffle service's data.
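Steps S401 and S402 could be realized with a DaemonSet whose Pods keep their registration file on a hostPath volume. The sketch below builds such a manifest as a Python dict; the image name, paths, and labels are placeholders, port 7337 is Spark's default `spark.shuffle.service.port`, and mounting the executors' shuffle directory is an assumption about how the service reaches the data.

```python
import json


def shuffle_service_daemonset(image: str, state_host_dir: str,
                              shuffle_host_dir: str, port: int = 7337) -> dict:
    """Return a DaemonSet manifest (as a dict) for an external shuffle service.

    state_host_dir holds the persisted registration file; shuffle_host_dir is
    the node directory where executors write shuffle data.
    """
    labels = {"app": "spark-external-shuffle"}  # placeholder label
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "spark-external-shuffle"},
        "spec": {
            # A DaemonSet's selector must match its Pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "shuffle-service",
                        "image": image,
                        "ports": [{"containerPort": port, "hostPort": port}],
                        "volumeMounts": [
                            {"name": "state", "mountPath": "/var/lib/shuffle-state"},
                            {"name": "shuffle-data", "mountPath": shuffle_host_dir},
                        ],
                    }],
                    "volumes": [
                        # hostPath keeps the registration file on the node, so a
                        # restarted Pod on the same node can reload it.
                        {"name": "state",
                         "hostPath": {"path": state_host_dir, "type": "DirectoryOrCreate"}},
                        {"name": "shuffle-data",
                         "hostPath": {"path": shuffle_host_dir, "type": "DirectoryOrCreate"}},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = shuffle_service_daemonset(
        image="registry.example.com/spark-shuffle:latest",  # placeholder
        state_host_dir="/var/lib/spark-shuffle-state",
        shuffle_host_dir="/var/lib/spark-local",
    )
    # kubectl accepts JSON manifests, e.g. `kubectl apply -f daemonset.json`.
    print(json.dumps(manifest, indent=2))
```

The shuffle directory is mounted at the same path as on the host so that paths executors register remain valid inside the service container.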
FIG. 5 is a schematic diagram of the main flow of an application running method according to the fifth embodiment of the present invention. If the external shuffle service exits unexpectedly and cannot be restarted, or takes a long time to restart, the application running method may include:
Step S501: determining, based on a preset time threshold, that the external shuffle service restart event has failed.
Step S502: adding a TCP liveness probe check to the executor to monitor the external shuffle service port.
Step S503: determining that the port is abnormal according to a preset check-count threshold (for example, the parameter failureThreshold), whereupon the kubelet kills the executor.
Step S504: rescheduling and starting the executor through the Spark on Kubernetes framework.
Step S505: receiving the executor-lost message, and deleting all shuffle meta-information corresponding to the executor.
After step S505, the rescheduled tasks can complete the application.
In addition, to recover as quickly as possible when the external shuffle service is restarted, the following parameters are also set:
initialDelaySeconds: the number of seconds to wait after the container starts before the probe is executed for the first time.
periodSeconds: how often, in seconds, the probe is performed. The default is 10 seconds and the minimum is 1 second.
timeoutSeconds: the number of seconds after which the probe times out. The default is 1 second and the minimum is 1 second.
successThreshold: the minimum number of consecutive successes for the probe to be considered successful after having failed. The default is 1; it must be 1 for liveness probes, and the minimum value is 1.
failureThreshold: the minimum number of consecutive failures for the probe to be considered failed after having succeeded. The default is 3 and the minimum value is 1.
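The parameters above map directly onto a Kubernetes `livenessProbe`. The sketch below builds one for the executor container and gives a rough detection-time estimate. It assumes the shuffle service port is reachable at the address the kubelet probes (a `tcpSocket` probe targets the Pod IP by default, so this holds, for example, when the Pod uses the node's network); the helper names and the estimate are illustrative.

```python
def shuffle_port_liveness_probe(port: int = 7337,
                                initial_delay_seconds: int = 0,
                                period_seconds: int = 10,
                                timeout_seconds: int = 1,
                                failure_threshold: int = 3) -> dict:
    """Build a livenessProbe that TCP-connects to the shuffle service port."""
    minimums = {
        "initialDelaySeconds": (initial_delay_seconds, 0),
        "periodSeconds": (period_seconds, 1),
        "timeoutSeconds": (timeout_seconds, 1),
        "failureThreshold": (failure_threshold, 1),
    }
    for name, (value, minimum) in minimums.items():
        if value < minimum:
            raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return {
        "tcpSocket": {"port": port},
        "initialDelaySeconds": initial_delay_seconds,
        "periodSeconds": period_seconds,
        "timeoutSeconds": timeout_seconds,
        "successThreshold": 1,  # Kubernetes requires 1 for liveness probes
        "failureThreshold": failure_threshold,
    }


def rough_detection_seconds(probe: dict) -> int:
    """Rough worst case from losing the port (just after a successful probe)
    to the kubelet declaring the container unhealthy."""
    return probe["failureThreshold"] * probe["periodSeconds"] + probe["timeoutSeconds"]
```

With the defaults the estimate is about 31 seconds; lowering periodSeconds and failureThreshold shortens detection at the cost of more probe traffic and a higher chance of killing an executor during a brief hiccup.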
As the embodiments above show, for the scenario in which the external shuffle service restarts quickly, the invention persists the executors' registration information so that it can be recovered quickly and the application can proceed smoothly. For the scenario in which the external shuffle service starts slowly, or cannot start at all, after exiting unexpectedly, configuring a Kubernetes probe effectively restores the operation of the whole application. Furthermore, setting the series of parameters above keeps the completion time of the whole application from being greatly delayed.
FIG. 6 is a schematic diagram of the main modules of an application running apparatus according to the first embodiment of the present invention. As shown in FIG. 6, the application running apparatus 600 includes a receiving module 601 and a processing module 602. The receiving module 601 receives a peripheral component restart event and writes the information registered with the peripheral component to disk through a mounted file. The processing module 602 determines, based on a preset time threshold, that the peripheral component restart event has failed, and kills the computing process corresponding to the peripheral component.
Preferably, the processing module 602 killing the computing process corresponding to the peripheral component includes:
adding a probe check to the computing process to monitor the port of the peripheral component; and, upon determining according to a preset check-count threshold that the port is abnormal, killing the corresponding computing process.
As another embodiment, after the receiving module 601 writes the information registered with the peripheral component to disk, the following is included:
triggering a peripheral component restart event, and loading all information from the disk to recover the peripheral component's data, wherein the peripheral component is deployed as a DaemonSet in the container cluster management system.
It should be noted that the application running method and the application running apparatus of the present invention correspond to each other in their specific implementation, so the repeated content is not described again.
FIG. 7 shows an exemplary system architecture 700 to which the application running method or application running apparatus according to an embodiment of the present invention can be applied.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 701, 702, 703 to interact with the server 705 over the network 704 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 701, 702, 703, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server providing various services, for example a back-end management server (by way of example only) that supports shopping websites browsed by users with the terminal devices 701, 702, 703. The back-end management server may analyze and otherwise process received data such as product information query requests, and feed the processing results (for example, targeted push information or product information, by way of example only) back to the terminal devices.
It should be noted that the application running method provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the application running apparatus is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as comprising a receiving module and a processing module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into that apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receive a peripheral component restart event and write the information registered by the peripheral component to a disk through a mounted file; and, upon determining based on a preset time threshold that the peripheral component restart event has failed, kill the computing process corresponding to the peripheral component.
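The two steps carried by the program can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the registration information is modeled as a JSON dictionary keyed by a hypothetical computing-process ID, the mounted file is an ordinary path (in a container cluster it would sit on a host-mounted volume so that it survives a restart of the peripheral component), and `is_ready` and `kill` are hypothetical callbacks standing in for the real readiness check and process-termination mechanism.

```python
import json
import os
import tempfile
import time
from typing import Callable, Dict


class RegistrationStore:
    """Persists the peripheral component's registration info to a file on a mounted path (hypothetical layout)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, registrations: Dict[str, dict]) -> None:
        # Write to a temporary file in the same directory, then atomically
        # replace the target so a crash never leaves a half-written file.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(registrations, f)
        os.replace(tmp_path, self.path)

    def load(self) -> Dict[str, dict]:
        # Called by the restarted component to recover all registrations.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


def wait_until_ready(is_ready: Callable[[], bool], timeout_s: float,
                     poll_s: float = 0.01) -> bool:
    """Poll is_ready until it returns True or the preset time threshold elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_s)
    return False


def handle_restart_event(store: RegistrationStore,
                         registrations: Dict[str, dict],
                         is_ready: Callable[[], bool],
                         kill: Callable[[str], None],
                         timeout_s: float) -> bool:
    """Persist registrations, then kill dependent processes if the restart times out.

    Returns True if the peripheral component came back within timeout_s.
    """
    store.save(registrations)
    if wait_until_ready(is_ready, timeout_s):
        return True
    # Restart considered failed: kill every computing process registered with it.
    for process_id in sorted(registrations):
        kill(process_id)
    return False
```

After a successful restart, the peripheral component would call `store.load()` to recover its registrations, which corresponds to the recovery step of claim 4. Writing to a temporary file and then calling `os.replace` is a design choice of this sketch that keeps the on-disk copy intact if the writer crashes mid-write.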
The technical solution of the embodiments of the invention solves the prior-art problem in which an abnormality in a peripheral component causes the entire application to fail.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. An application running method, comprising:
receiving a peripheral component restart event, and writing the information registered by the peripheral component to a disk through a mounted file; and
determining, based on a preset time threshold, that the peripheral component restart event has failed, and killing the computing process corresponding to the peripheral component.
2. The method of claim 1, wherein killing the computing process corresponding to the peripheral component comprises:
adding a probe check to the computing process to monitor a port of the peripheral component; and
determining, according to a preset check frequency threshold, that the port is abnormal, and killing the corresponding computing process.
3. The method according to claim 1 or 2, wherein after killing the computing process corresponding to the peripheral component, the method comprises:
rescheduling and starting the computing process, receiving a loss message for the computing process, and deleting all shuffle meta-information corresponding to the computing process.
4. The method of claim 1, wherein after writing the information registered by the peripheral component to the disk, the method comprises:
triggering a peripheral component restart event, and loading all the information from the disk to recover the data of the peripheral component; wherein the peripheral component is deployed as a DaemonSet in the container cluster management system.
5. An application execution apparatus, comprising:
a receiving module configured to receive a peripheral component restart event and to write the information registered by the peripheral component to a disk through a mounted file; and
a processing module configured to determine, based on a preset time threshold, that the peripheral component restart event has failed, and to kill the computing process corresponding to the peripheral component.
6. The apparatus of claim 5, wherein the processing module kills the computing process corresponding to the peripheral component by:
adding a probe check to the computing process to monitor a port of the peripheral component; and
determining, according to a preset check frequency threshold, that the port is abnormal, and killing the corresponding computing process.
7. The apparatus according to claim 5 or 6, wherein, after killing the computing process corresponding to the peripheral component, the processing module is further configured for:
rescheduling and starting the computing process, receiving a loss message for the computing process, and deleting all shuffle meta-information corresponding to the computing process.
8. The apparatus of claim 5, wherein, after writing the information registered by the peripheral component to the disk, the receiving module is further configured for:
triggering a peripheral component restart event, and loading all the information from the disk to recover the data of the peripheral component; wherein the peripheral component is deployed as a DaemonSet in the container cluster management system.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
10. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-4.
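Claims 2 and 3 describe a probe-based variant of the kill step and the cleanup that follows it. The Python sketch below illustrates one possible reading and is an assumption throughout: the "check frequency threshold" is interpreted as a number of consecutive failed port checks, and `PortProbe`, `ShuffleMetadataRegistry`, `handle_lost_process`, and the `reschedule` callback are hypothetical names, not part of any real framework.

```python
import socket
from typing import Callable, Dict, Set


def tcp_port_open(host: str, port: int, timeout_s: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


class PortProbe:
    """Probe attached to a computing process that watches the peripheral component's port.

    Assumption: the port is deemed abnormal after `failure_threshold`
    consecutive failed checks; any successful check resets the counter.
    """

    def __init__(self, check: Callable[[], bool], failure_threshold: int,
                 kill: Callable[[], None]) -> None:
        self.check = check
        self.failure_threshold = failure_threshold
        self.kill = kill
        self.consecutive_failures = 0
        self.killed = False

    def tick(self) -> None:
        """Run one check; kill the computing process once the threshold is reached."""
        if self.killed:
            return
        if self.check():
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.kill()
            self.killed = True


class ShuffleMetadataRegistry:
    """Tracks which shuffle outputs (by block ID) each computing process produced."""

    def __init__(self) -> None:
        self._blocks: Dict[str, Set[str]] = {}

    def register(self, process_id: str, block_id: str) -> None:
        self._blocks.setdefault(process_id, set()).add(block_id)

    def blocks(self, process_id: str) -> Set[str]:
        return set(self._blocks.get(process_id, set()))

    def remove_process(self, process_id: str) -> Set[str]:
        """Delete and return all shuffle metadata for a lost computing process."""
        return self._blocks.pop(process_id, set())


def handle_lost_process(registry: ShuffleMetadataRegistry, process_id: str,
                        reschedule: Callable[[str], None]) -> Set[str]:
    """After a kill: start a replacement process, then drop the lost process's shuffle metadata."""
    reschedule(process_id)
    return registry.remove_process(process_id)
```

In use, the check passed to `PortProbe` could be `lambda: tcp_port_open("127.0.0.1", port)` for the peripheral component's port. Resetting the counter on success keeps a single transient failure from killing a healthy process; container orchestrators offer comparable behavior, for example Kubernetes `tcpSocket` liveness probes with a `failureThreshold`. Deleting the lost process's shuffle metadata is intended to stop consumers from being directed to outputs held by a process that no longer exists.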
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911052617.7A CN112749042B (en) | 2019-10-31 | 2019-10-31 | Application running method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112749042A (en) | 2021-05-04 |
CN112749042B (en) | 2024-03-01 |
Family ID: 75644594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911052617.7A Active CN112749042B (en) | 2019-10-31 | 2019-10-31 | Application running method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112749042B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030120700A1 (en) * | 2001-09-11 | 2003-06-26 | Sun Microsystems, Inc. | Task grouping in a distributed processing framework system and methods for implementing the same |
CN101727629A (en) * | 2008-10-10 | 2010-06-09 | 北京资和信担保有限公司 | Self-organization distribution business system |
CN103023805A (en) * | 2012-11-22 | 2013-04-03 | 北京航空航天大学 | MapReduce system |
CN103316472A (en) * | 2013-05-17 | 2013-09-25 | 南京睿悦信息技术有限公司 | Android device gas platform system based on Bluetooth handle and implementation method of Android device gas platform system |
CN105306964A (en) * | 2015-10-23 | 2016-02-03 | 北京理工大学 | Quick recovery system and quick recovery method for video stream transcoding fault |
US20170139816A1 (en) * | 2015-11-17 | 2017-05-18 | Alexey Sapozhnikov | Computerized method and end-to-end "pilot as a service" system for controlling start-up/enterprise interactions |
CN107832344A (en) * | 2017-10-16 | 2018-03-23 | 广州大学 | A kind of food security Internet public opinion analysis method based on storm stream calculation frameworks |
Non-Patent Citations (1)
Title |
---|
石俊; 徐小伟; 蔡富强; 刘晓洁; 陈恩: "Linux高可用性系统的改进方案" (An improved scheme for a Linux high-availability system), 计算机安全 (Computer Security), no. 08 * |
Similar Documents
Publication | Title |
---|---|
US12019652B2 (en) | Method and device for synchronizing node data |
CN107729176B (en) | Disaster recovery method and disaster recovery system for configuration file management system |
CN111897633A (en) | Task processing method and device |
CN109245908B (en) | Method and device for switching master cluster and slave cluster |
CN109783151B (en) | Method and device for rule change |
CN111338834B (en) | Data storage method and device |
CN111666134A (en) | Method and system for scheduling distributed tasks |
CN114064438A (en) | Database fault processing method and device |
CN107818027B (en) | Method and device for switching main name node and standby name node and distributed system |
CN107526838B (en) | Method and device for database cluster capacity expansion |
CN111767126A (en) | System and method for distributed batch processing |
CN117435569A (en) | Dynamic capacity expansion method, device, equipment, medium and program product for cache system |
CN113541987A (en) | Method and device for updating configuration data |
CN114070889B (en) | Configuration method, traffic forwarding device, storage medium, and program product |
CN112749042B (en) | Application running method and device |
CN113760469A (en) | Distributed computing method and device |
CN110445628B (en) | NGINX-based server and deployment and monitoring methods and devices thereof |
CN113742376A (en) | Data synchronization method, first server and data synchronization system |
CN112463514A (en) | Monitoring method and device for distributed cache cluster |
CN111767113A (en) | Method and device for realizing container eviction |
CN114356214B (en) | Method and system for providing local storage volume for kubernetes system |
CN117349035B (en) | Workload scheduling method, device, equipment and storage medium |
US11630584B2 (en) | Storage management system and method |
CN112799863B (en) | Method and device for outputting information |
CN115203334A (en) | Data processing method and device, electronic equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |