CN116719663B - Data processing method, device, equipment and readable storage medium - Google Patents

Data processing method, device, equipment and readable storage medium

Info

Publication number
CN116719663B
Authority
CN
China
Prior art keywords
request
kernel
access request
record file
version
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310980930.7A
Other languages
Chinese (zh)
Other versions
CN116719663A (en)
Inventor
葛凯凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310980930.7A
Publication of CN116719663A
Application granted
Publication of CN116719663B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, apparatus, device, and readable storage medium. The method includes: when a target kernel detection time is reached, reading a request record file for a kernel, where the request record file records access requests that access a block storage component in a distributed storage system through the kernel; performing request detection processing on the access requests in the request record file to determine their request waiting state; and when the request waiting state of the access requests in the request record file is a persistent abnormal waiting state, determining that the kernel is abnormal and unloading the kernel. The method and apparatus can identify abnormal kernels in a targeted manner and improve the timeliness of handling abnormal kernels.

Description

Data processing method, device, equipment and readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and readable storage medium.
Background
Ceph is an open-source distributed storage system that can provide object, file, and block storage services simultaneously. Ceph provides block storage through its block storage component, RBD (RADOS Block Device), which a server can use much like a physical hard disk. RBD can be accessed in two ways: user-mode access through librbd, and kernel-mode access through a kernel component (KRBD). Because kernel-mode access runs inside the kernel, it performs better, so business applications with relatively high performance requirements often access RBD in kernel mode through the kernel component.
However, with kernel-mode access through the kernel component, when the Ceph storage system fails it cannot process the related requests in time. The requests sent by the kernel component then never receive a response and remain blocked, and the application processes that issued them stay unresponsive as well: a large number of application processes can neither proceed nor be terminated, and remain stuck. To resolve kernel blocking and application process blocking, the related technology can only force a hard restart of the physical server hosting the business application, which stops every kernel and every application process running on that server. Although a forced hard restart clears the stuck application processes all at once, it usually requires manual operation and may therefore lag. Moreover, the physical server may also host application processes of other services, and the blanket hard restart terminates those normally running processes too. A way is therefore needed to resolve stuck application processes without affecting other application processes that are running normally.
Disclosure of Invention
The embodiments of the present application provide a data processing method, apparatus, device, and readable storage medium, which can identify abnormal kernels in a targeted manner and improve the timeliness of handling them.
In one aspect, an embodiment of the present application provides a data processing method, including:
when a target kernel detection time is reached, reading a request record file for a kernel, where the request record file records access requests that access a block storage component in a distributed storage system through the kernel;
performing request detection processing on the access requests in the request record file to determine the request waiting state of those access requests;
and when the request waiting state of the access requests in the request record file is a persistent abnormal waiting state, determining that the kernel is abnormal and unloading the kernel.
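The three steps above can be sketched as a single detection round for one kernel. This is a minimal illustration under stated assumptions, not the patented implementation: the names `AccessRequest`, `WaitState`, and `run_detection` are hypothetical, and the file reader, detector, and unloader are injected as callables so that the control flow stands on its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class WaitState(Enum):
    REASONABLE = "reasonable"
    PERSISTENT_ABNORMAL = "persistent_abnormal"


@dataclass(frozen=True)
class AccessRequest:
    request_id: int   # hypothetical request identifier
    send_ts: float    # send timestamp, in seconds


def run_detection(
    read_record_file: Callable[[], List[AccessRequest]],
    detect: Callable[[List[AccessRequest], float], WaitState],
    unload_kernel: Callable[[], None],
    detection_time: float,
) -> bool:
    """Run one detection round for a single kernel.

    Returns True if the kernel was judged abnormal and unloaded.
    """
    # Step 1: read the request record file at the target kernel detection time.
    requests = read_record_file()
    # Step 2: request detection processing yields the request waiting state.
    state = detect(requests, detection_time)
    # Step 3: a persistent abnormal waiting state means the kernel is abnormal.
    if state is WaitState.PERSISTENT_ABNORMAL:
        unload_kernel()
        return True
    return False
```

A scheduler would presumably call `run_detection` once per detection interval for each kernel; the source does not specify the interval.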
An aspect of an embodiment of the present application provides a data processing apparatus, including:
a file reading module, configured to read a request record file for a kernel when a target kernel detection time is reached, where the request record file records access requests that access a block storage component in a distributed storage system through the kernel;
a request detection module, configured to perform request detection processing on the access requests in the request record file and determine their request waiting state;
an anomaly determination module, configured to determine that the kernel is abnormal when the request waiting state of the access requests in the request record file is a persistent abnormal waiting state;
and a kernel unloading module, configured to unload the kernel.
In one embodiment, the request detection module performs the request detection processing and determines the request waiting state of the access requests in the request record file as follows:
obtaining the access request with the smallest send timestamp in the request record file, and taking it as the target earliest access request for the target kernel detection time;
taking the kernel detection time immediately preceding the target kernel detection time as the historical kernel detection time;
acquiring the historical earliest access request for the historical kernel detection time, and determining the request waiting state of the access requests in the request record file according to the target earliest access request and the historical earliest access request.
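Selecting the target earliest access request amounts to taking the minimum over send timestamps, and remembering it for the next round gives the historical earliest access request. The sketch below assumes the request record file has already been parsed into hypothetical `AccessRequest` entries with a request identifier and a send timestamp in seconds. The actual file format is not given in the source, and `DetectionSnapshot` and `take_snapshot` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AccessRequest:
    request_id: int   # hypothetical request identifier
    send_ts: float    # send timestamp, in seconds


@dataclass
class DetectionSnapshot:
    """What one detection round remembers for the next round (hypothetical)."""
    detection_time: float
    earliest: Optional[AccessRequest]


def earliest_request(records: List[AccessRequest]) -> Optional[AccessRequest]:
    """Return the request with the smallest send timestamp, or None if the file is empty."""
    if not records:
        return None
    return min(records, key=lambda r: r.send_ts)


def take_snapshot(records: List[AccessRequest], detection_time: float) -> DetectionSnapshot:
    # The snapshot taken now becomes the "historical" one at the next detection time.
    return DetectionSnapshot(detection_time, earliest_request(records))
```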
In one embodiment, the request detection module determines the request waiting state from the target earliest access request and the historical earliest access request as follows:
comparing the target earliest access request with the historical earliest access request;
if the target earliest access request is the same as the historical earliest access request, determining the request waiting state of the access requests in the request record file based on the time period between the target kernel detection time and the historical kernel detection time;
if the target earliest access request differs from the historical earliest access request, which means the previously earliest request has left the request record file since the last detection, determining that the request waiting state of the access requests in the request record file is a reasonable waiting state.
In one embodiment, the request detection module determines the request waiting state based on the time period between the target kernel detection time and the historical kernel detection time as follows:
determining the time period between the target kernel detection time and the historical kernel detection time;
taking that time period as the incremental blocking duration of the kernel at the target kernel detection time;
acquiring the historical request blocking duration of the kernel at the historical kernel detection time, and adding the incremental blocking duration to it to obtain the target request blocking duration of the kernel at the target kernel detection time;
and when the target request blocking duration exceeds a duration threshold, determining that the request waiting state of the access requests in the request record file is a persistent abnormal waiting state.
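The comparison and accumulation steps above can be combined into a small per-kernel state update. This sketch is an assumption rather than the source's code. It treats "the same request" as "the same request identifier" and resets the accumulated blocking duration to zero when the earliest request changes or the file is empty. It also reports a blocking duration that has not yet exceeded the threshold as a reasonable waiting state. The source does not specify these cases, and `KernelHistory` and `detect` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WaitState(Enum):
    REASONABLE = "reasonable"
    PERSISTENT_ABNORMAL = "persistent_abnormal"


@dataclass
class KernelHistory:
    detection_time: float               # historical kernel detection time
    earliest_request_id: Optional[int]  # historical earliest access request
    blocking_duration: float = 0.0      # historical request blocking duration


def detect(history: KernelHistory, now: float,
           earliest_id: Optional[int], threshold: float) -> WaitState:
    """Update the kernel's history at detection time `now` and return its waiting state."""
    if earliest_id is not None and earliest_id == history.earliest_request_id:
        # Same earliest request: it has been waiting for the whole interval.
        increment = now - history.detection_time          # incremental blocking duration
        blocking = history.blocking_duration + increment  # target request blocking duration
    else:
        # Earliest request changed (or no requests): assume progress was made.
        blocking = 0.0
    # The current round becomes the historical round for the next detection.
    history.detection_time = now
    history.earliest_request_id = earliest_id
    history.blocking_duration = blocking
    if blocking > threshold:
        return WaitState.PERSISTENT_ABNORMAL
    return WaitState.REASONABLE
```

On the first round there is no history yet. One option is to initialize `KernelHistory` with the first detection time and the first earliest request identifier.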
In one embodiment, the kernel unloading module unloads the kernel as follows:
obtaining the kernel version of the kernel;
determining the kernel unloading rule of the kernel according to the version interval to which the kernel version belongs;
and unloading the kernel according to the kernel unloading rule.
In one embodiment, the kernel unloading module determines the kernel unloading rule according to the version interval to which the kernel version belongs as follows:
when the kernel version belongs to a first interval, determining the low-version unloading rule in a configured unloading rule set as the kernel unloading rule of the kernel;
when the kernel version belongs to a second interval, determining the high-version unloading rule in the configured unloading rule set as the kernel unloading rule of the kernel, where the first interval is lower than the second interval.
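The version-interval lookup can be sketched as a comparison against a single boundary. The boundary value `(5, 0)` and the major.minor parsing are placeholders. The source does not say where the first interval ends and the second begins, so treat `HIGH_VERSION_START`, `parse_version`, and `select_unloading_rule` as hypothetical.

```python
from typing import Tuple

Version = Tuple[int, int]

# Hypothetical boundary between the first (low) and second (high) version interval.
HIGH_VERSION_START: Version = (5, 0)


def parse_version(kernel_version: str) -> Version:
    """Keep the numeric major.minor prefix, e.g. '4.14.105-1.x86_64' -> (4, 14)."""
    numeric = kernel_version.split("-")[0].split(".")
    major = int(numeric[0])
    minor = int(numeric[1]) if len(numeric) > 1 else 0
    return (major, minor)


def select_unloading_rule(kernel_version: str,
                          boundary: Version = HIGH_VERSION_START) -> str:
    """Return 'high_version' for the second interval, 'low_version' for the first."""
    if parse_version(kernel_version) >= boundary:
        return "high_version"
    return "low_version"
```

Tuple comparison keeps the interval test lexicographic, so (4, 14) sorts below (5, 0) as expected.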
In one embodiment, the kernel unloading rule is the low-version unloading rule, and the request record file contains an access request S_i, where i is a positive integer.
The kernel unloading module unloads the kernel according to the kernel unloading rule as follows:
performing request setting processing on access request S_i according to the low-version unloading rule, and determining the request state of S_i after this processing as a request success state;
when the request state of every access request in the request record file is determined to be the request success state, clearing the access requests contained in the request record file;
and when it is determined that the request record file contains no access request, unloading the kernel through a kernel unloading instruction.
In one embodiment, the kernel unloading module performs request setting processing on access request S_i according to the low-version unloading rule as follows:
obtaining the request identifier of access request S_i according to the low-version unloading rule;
acquiring the first request termination logic code corresponding to the low-version unloading rule, and generating a first request termination command for S_i from the first request termination logic code and the request identifier of S_i;
according to the first request termination command, setting the request execution state of S_i in the request record file to an execution completion state and setting the completion code of S_i to a first completion code, where the first completion code indicates that S_i has been executed normally.
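The low-version path can be sketched as follows, with the request record modeled as an in-memory dictionary keyed by request identifier. Everything here is hypothetical:

- the `RecordEntry` and `TerminationCommand` types and the state strings;
- the choice of 0 as the first completion code;
- the `unload` callable, which stands in for the kernel unloading instruction.

The "first request termination logic code" is represented simply as the fixed (state, code) pair that the command carries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

FIRST_COMPLETION_CODE = 0  # assumption: 0 means "executed normally"


@dataclass
class RecordEntry:
    """Hypothetical in-memory view of one access request S_i in the record."""
    execution_state: str = "waiting"
    completion_code: Optional[int] = None


@dataclass(frozen=True)
class TerminationCommand:
    request_id: int
    execution_state: str
    completion_code: int


def first_termination_command(request_id: int) -> TerminationCommand:
    # The first request termination logic code is modeled as the fixed
    # (execution completed, first completion code) pair.
    return TerminationCommand(request_id, "execution_completed", FIRST_COMPLETION_CODE)


def unload_low_version(record: Dict[int, RecordEntry],
                       unload: Callable[[], None]) -> None:
    # Request setting processing: end every S_i as successfully completed.
    for request_id in list(record):
        cmd = first_termination_command(request_id)
        entry = record[cmd.request_id]
        entry.execution_state = cmd.execution_state
        entry.completion_code = cmd.completion_code
    # Clear the record only once every request is in the success state.
    if all(e.completion_code == FIRST_COMPLETION_CODE for e in record.values()):
        record.clear()
    # Issue the kernel unloading instruction only when no request remains.
    if not record:
        unload()
```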
In one embodiment, the kernel unloading rule is the high-version unloading rule, and the request record file contains an access request S_i, where i is a positive integer.
The kernel unloading module unloads the kernel according to the kernel unloading rule as follows:
performing request setting processing on access request S_i according to the high-version unloading rule, and determining the request state of S_i after this processing as a request failure state;
when the request state of every access request in the request record file is the request failure state, clearing the access requests contained in the request record file;
and when it is determined that the request record file contains no access request, unloading the kernel through a kernel unloading instruction.
In one embodiment, the kernel unloading module performs request setting processing on access request S_i according to the high-version unloading rule as follows:
obtaining the request identifier of access request S_i according to the high-version unloading rule;
acquiring the second request termination logic code corresponding to the high-version unloading rule, and generating a second request termination command for S_i from the second request termination logic code and the request identifier of S_i;
according to the second request termination command, setting the request execution state of S_i in the request record file to an execution failure state and setting the completion code of S_i to a second completion code, where the second completion code indicates that S_i has not been executed normally.
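The high-version path differs only in the state and completion code written to each request. The sketch assumes the second completion code is `-errno.EIO`; the source only says that this code indicates the request did not complete normally. As in the low-version case, the types, state strings, and `unload` callable are hypothetical.

```python
import errno
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Assumption: the second completion code is -EIO ("did not complete normally").
SECOND_COMPLETION_CODE = -errno.EIO


@dataclass
class RecordEntry:
    """Hypothetical in-memory view of one access request S_i in the record."""
    execution_state: str = "waiting"
    completion_code: Optional[int] = None


@dataclass(frozen=True)
class TerminationCommand:
    request_id: int
    execution_state: str
    completion_code: int


def second_termination_command(request_id: int) -> TerminationCommand:
    # Second request termination logic code modeled as (execution failed, -EIO).
    return TerminationCommand(request_id, "execution_failed", SECOND_COMPLETION_CODE)


def unload_high_version(record: Dict[int, RecordEntry],
                        unload: Callable[[], None]) -> None:
    # Request setting processing: end every S_i as failed.
    for request_id in list(record):
        cmd = second_termination_command(request_id)
        entry = record[cmd.request_id]
        entry.execution_state = cmd.execution_state
        entry.completion_code = cmd.completion_code
    # Every S_i is now in the request failure state, so the record can be cleared.
    if all(e.execution_state == "execution_failed" for e in record.values()):
        record.clear()
    # Issue the kernel unloading instruction only when no request remains.
    if not record:
        unload()
```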
In one embodiment, the kernel is deployed in an application server.
After the anomaly determination module determines that the kernel is abnormal, the following modules of the data processing apparatus come into play:
a set determination module, configured to take each kernel deployed in the application server as a deployment kernel, obtaining a deployment kernel set;
an abnormal kernel determination module, configured to determine, as abnormal kernels, the deployment kernels in the deployment kernel set that are abnormal at the target kernel detection time;
an operation attribute determination module, configured to determine the system operation attribute of the distributed storage system based on a first number, namely the number of abnormal kernels in the deployment kernel set;
and an information pushing module, configured to push fault warning information for the distributed storage system to the system maintenance object of the distributed storage system when the system operation attribute is an abnormal operation attribute, so that the system maintenance object performs system maintenance on the distributed storage system based on the fault warning information.
In one embodiment, the operation attribute determination module determines the system operation attribute of the distributed storage system based on the first number of abnormal kernels in the deployment kernel set as follows:
counting the first number of abnormal kernels and a second number of deployment kernels in the deployment kernel set;
determining the ratio of the first number to the second number;
if the ratio is greater than a ratio threshold, determining that the system operation attribute of the distributed storage system is an abnormal operation attribute;
if the ratio is smaller than the ratio threshold, determining that the system operation attribute of the distributed storage system is a normal operation attribute.
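The alerting logic in the two embodiments above reduces to a ratio test over the deployment kernel set. In this sketch, the kernel names, the `push_alert` callable, and the alert text are hypothetical. A ratio exactly equal to the threshold is treated as normal because the source leaves that case open. An empty set is also treated as normal, which is likewise an assumption.

```python
from typing import Callable, Dict


def system_operation_attribute(abnormal_by_kernel: Dict[str, bool],
                               ratio_threshold: float) -> str:
    """Classify the storage system from per-kernel abnormality flags."""
    second_number = len(abnormal_by_kernel)                          # deployment kernels
    first_number = sum(1 for bad in abnormal_by_kernel.values() if bad)  # abnormal kernels
    if second_number == 0:
        return "normal"  # assumption: no deployed kernels, nothing to judge
    ratio = first_number / second_number
    # ratio == threshold is unspecified in the source; treated as normal here.
    return "abnormal" if ratio > ratio_threshold else "normal"


def check_and_alert(abnormal_by_kernel: Dict[str, bool], ratio_threshold: float,
                    push_alert: Callable[[str], None]) -> str:
    attribute = system_operation_attribute(abnormal_by_kernel, ratio_threshold)
    if attribute == "abnormal":
        abnormal = sorted(k for k, bad in abnormal_by_kernel.items() if bad)
        # Fault warning information for the system maintenance object.
        push_alert(f"possible distributed storage fault; abnormal kernels: {abnormal}")
    return attribute
```

Using a ratio rather than a raw count presumably keeps the test meaningful across servers with different numbers of deployed kernels.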
In one aspect, a computer device is provided, including: a processor and a memory;
the memory stores a computer program that, when executed by the processor, causes the processor to perform the methods of embodiments of the present application.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, perform a method in an embodiment of the present application.
In one aspect of the present application, a computer program product is provided that includes a computer program stored in a computer readable storage medium. A processor of a computer device reads the computer program from a computer-readable storage medium, and the processor executes the computer program to cause the computer device to perform the method provided in an aspect of the embodiments of the present application.
In the embodiments of the present application, when a kernel uses the block storage service of a distributed storage system, a method is provided for unloading the kernel when it becomes abnormal. This allows the application processes associated with that kernel to be terminated promptly and in a targeted manner while reducing the impact on other, normally running application processes. Specifically, for a given kernel, the request record file for the kernel, which records the access requests that access a block storage component in the distributed storage system through the kernel, can be read periodically. For example, when a target kernel detection time is reached, the request record file of the kernel is read, and request detection processing is performed on the access requests in it to determine their request waiting state. When that state is a persistent abnormal waiting state, the distributed storage system has not processed the kernel's access requests in time and those requests are still waiting, so the kernel can be determined to be abnormal and unloaded in a targeted manner. By periodically and automatically detecting whether a kernel is abnormal, abnormal kernels can be identified and unloaded promptly and in a targeted manner without a forced restart of the physical server. The application processes associated with an abnormal kernel are therefore terminated in time, while the normally running application processes on the same physical server are protected.
In summary, the present application can identify abnormal kernels in a targeted manner and improve the timeliness of handling them.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a network architecture diagram of a data processing system provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of terminating an application process by a hard restart method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an architecture for automatically detecting a kernel and performing an anomaly alarm according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an architecture of an automatic detection service according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of offloading processing to a kernel according to an embodiment of the present application;
FIG. 7 is a diagram of an overall logic architecture for offloading kernels according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an architecture for offloading low-version kernels according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an architecture for offloading high-version kernels according to an embodiment of the present application;
FIG. 10 is a schematic flow chart of a maintenance process for a distributed storage system based on a kernel with an exception according to an embodiment of the present application;
FIG. 11 is a system flow diagram of kernel offloading provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Embodiments of the present application relate to Ceph and related technologies, and for ease of understanding, the following will briefly describe the concepts of Ceph and related technologies:
Ceph: Ceph is an open-source distributed storage system that can provide object storage, file system storage, and block storage services at the same time. Ceph offers high scalability, high performance, and high reliability. It makes full use of the computing power of storage nodes: when storing each piece of data, it computes where that data should be placed, keeping the data distribution as balanced as possible.
RBD: RBD, short for RADOS Block Device, is the block storage component of Ceph, through which Ceph provides block storage services.
KRBD: KRBD can be understood as a kernel component. Ceph block storage can be used in KRBD mode, that is, the block storage services of Ceph are accessed through KRBD.
It will be appreciated that Ceph is a distributed storage system, that distributed block storage services can be provided through RBD, and that RBD can provide kernel-mode access through KRBD. That is, different business applications can access Ceph (for example, to read data from Ceph or to store business data in Ceph) by installing and deploying KRBD.
For ease of understanding, refer to FIG. 1, which is a network architecture diagram of a data processing system according to an embodiment of the present application. As shown in FIG. 1, the network architecture may include a service server 1000 and a terminal device cluster. The cluster may include one or more terminal devices, and their number is not limited here. The terminal devices may include terminal device 100a, terminal device 100b, terminal device 100c, …, and terminal device 100n. Each of these may establish a network connection with the service server 1000 and exchange data with it over that connection. In addition, any terminal device in the terminal device cluster 100 may be an intelligent device running an operating system, and the embodiments of the present application do not limit the operating system of the terminal device.
The terminal device in the data processing system shown in FIG. 1 may be, but is not limited to, a smartphone, tablet computer, notebook computer, palmtop computer, desktop computer, mobile internet device (MID), point of sale (POS) terminal, smart speaker, smart television, smart watch, smart vehicle terminal, virtual reality (VR) device, augmented reality (AR) device, and the like. The terminal device is often equipped with a display device, such as a display, a display screen, or a touch screen; the touch screen may be a touch panel or the like.
The service server in the data processing system shown in fig. 1 may be a single physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like. The terminal device and the service server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In one possible implementation, a terminal device (e.g., terminal device 100 a) has a client (e.g., a client may also be referred to as a business application) running therein, such as a video client, a browser client, a game client, an educational client, a web client, etc., which will not be illustrated one by one. In the embodiment of the present application, a game client is described as an example. The use object (may refer to an object using a game client, such as a user using the game client) may generate a large amount of game data when the game client is executed, and the service server 1000 may refer to a server corresponding to the game client, where the service server 1000 may obtain the game data, the service server 1000 may store the game data into a Ceph system (which may be referred to as a distributed storage system in this application), specifically, a Kernel (KRBD) may be deployed in the service server 1000, so that a relevant application process of the game client may run in the kernel, and the service server 1000 may access the distributed storage system through the kernel, for example, the service server 1000 may initiate a write request for relevant game data to the distributed storage system through the kernel to store the game data into the distributed storage system; of course, when a user object has a query request for some game data in the game client, the service server 1000 may also initiate a read request for the game data to the distributed storage system through the kernel, so as to read the game data from the distributed storage system and display the game data on the user object for viewing.
Based on the above, the present application may use a distributed storage system, such as a Ceph system, to store service data (e.g., game data) of a related service application, where a service server corresponding to the service application may access the distributed storage system through a kernel. However, in the manner that kernel access is performed through the kernel component, when the Ceph storage system fails, the Ceph storage system cannot timely process requests sent by the kernel component, so that application processes associated with the requests can be in a state of not being responded all the time, and therefore a large number of application processes in the service server cannot be performed smoothly and cannot be terminated, and only a stuck state can be maintained. In the related art, in order to solve the problems of kernel blocking and application process blocking, a hard restarting mode is adopted to perform forced restarting on a server, so that all kernels running on the server can be terminated forcefully, and all application processes are terminated. For ease of understanding, please refer to fig. 2, fig. 2 is a schematic diagram of terminating an application process by a hard restart method according to an embodiment of the present application.
As shown in fig. 2, the process of accessing the distributed storage system (Ceph storage system) through the kernel for the business application may at least include the following steps: 1) The kernel mode KRBD exposes a block storage component (such as/dev/rbd 0 in particular) to a user mode through mounting; 2) Formatting the file system (e.g., ext 4) for the block storage component and hooking the file system to the block storage component; 3) The service application sends a file system access request (such as a data read request or a data write request), the file system access request is converted into a block request bio to reach the block storage component, and then the kernel KRBD converts the block request bio into a block request of the block storage component in the Ceph storage system, and sends the block request to the Ceph storage system through a network. Based on the above, when the Ceph storage system fails, the request sent by the kernel cannot be processed in the Ceph storage system, so that the blocking cannot be completed, and the kernel continuously waits for the request to be completed, so that the application processes of the service applications corresponding to the requests are in a state of being unable to respond to the external event, and the related technology can only terminate all the application processes by hard restarting the service server where the service application is located. That is, by restarting the service server hard, all kernels deployed in the service server are unloaded, thereby terminating running all application processes on each kernel. However, hard restarting the service server requires operation by a professional, and may have operational hysteresis; in addition, because the service server may further have other service application processes, the forced hard restart manner may also cause other application processes to be terminated, which affects other normally running application processes.
To unload a stalled kernel promptly and selectively, and thereby reduce the impact on other normally running application processes, the present application provides a method for unloading individual kernels. Specifically, each kernel may be checked periodically to detect whether it is stalled (for example, stuck), and a stalled kernel may be unloaded according to its kernel version.
Specifically, when a detection time (which may be referred to as the target kernel detection time) is reached, the service server 1000 may read the request record file of a kernel, which records the access requests sent through that kernel to a block storage component (RBD) in the distributed storage system (Ceph storage system). The service server 1000 may then perform request detection on the access requests in the request record file to determine their request waiting state. The request waiting state may be either a persistent abnormal waiting state or a reasonable waiting state. When the access requests in the request record file are in the persistent abnormal waiting state, it may be determined that they have been waiting continuously for the distributed storage system to process them; in this case the distributed storage system may have failed, the kernel may be determined to be stalled, and the kernel may be unloaded. The request detection may be based on the request sending time of each access request in the request record file, and the unloading may be performed according to the version of the kernel; the specific implementation is described in the embodiment corresponding to fig. 3 below.
It should be understood that periodically and automatically checking each kernel for abnormalities allows an abnormal kernel to be identified and unloaded selectively and in time, without a forced restart of the physical server. The application processes running normally on the physical server are thus protected and continue to run, while the application processes associated with the abnormal kernel are terminated promptly, reducing the impact on normally running processes.
It is understood that the methods provided by the embodiments of the present application may be performed by a computer device, including but not limited to the terminal device or service server mentioned in fig. 1.
In the specific embodiments of the present application, data related to user information and user data (such as game data generated by a user in a game client) are obtained only with the user's explicit authorization (i.e., with the user's consent). That is, when the above embodiments are applied to specific products or technologies, the methods and related functions provided herein are performed with the user's permission or consent (and may be actively enabled by the user), and the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
For ease of understanding, the data processing method provided in the embodiments of the present application is described in detail below with reference to the accompanying drawings. Referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the present application. The method may be performed by a terminal device (e.g., any terminal device in the terminal device cluster shown in fig. 1, such as the terminal device 100a), by a server (e.g., the service server 1000 in the embodiment corresponding to fig. 1), or jointly by the terminal device and the server. This embodiment takes execution by a server as an example. As shown in fig. 3, the data processing method may include at least the following steps S101 to S103:
Step S101: when the target kernel detection time is reached, read the request record file of the kernel, where the request record file records the access requests sent through the kernel to the block storage component in the distributed storage system.
In the present application, the distributed storage system may be a Ceph storage system, an open-source distributed storage system that can provide object, file, and block storage services at the same time. Ceph provides block storage through the block storage component (RADOS Block Device, RBD), which behaves much like a physical hard disk attached to a server. The block storage component can be accessed in two ways: in user mode through librbd, or in kernel mode through the kernel component (KRBD). The distributed storage system in the present application is accessed through the kernel (KRBD), and accessing the distributed storage system here means accessing a block storage component in it; it is through this access that service data of a service application is stored in, or read from, the distributed storage system.
It should be appreciated that the access requests a service application sends to the distributed storage system are forwarded by the kernel; that is, the service application accesses the block storage component in the distributed storage system through the kernel. The access requests sent by the kernel to the block storage component may be recorded in a file, referred to as the request record file of the kernel. For ease of understanding, please refer to table 1, which is an exemplary request record file provided in the embodiments of the present application:
TABLE 1
Each row in table 1 represents one access request; for example, "500,AAA,AAA,rbd_data.123456.AAAA AAAA" is one access request. Each access request contains a request number, and the magnitude of the request number indicates the order in which the access requests were sent. For example, the access requests "500,AAA,AAA,rbd_data.123456.AAAA AAAA", "501,AAA,AAA,rbd_data.123456.AAAA AAAA", and "502,AAA,AAA,rbd_data.123456.AAAA AAAA" have request numbers 500, 501, and 502 respectively. Because 500 < 501 < 502, request 500 was sent before request 501, and request 501 was sent before request 502. Each access request also contains the component number of the RBD block storage component it operates on; as shown in table 1, every access request contains the component number "123456", indicating that they all operate on the same block storage component.
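The row layout described for table 1 can be parsed with a short sketch like the one below. It assumes, hypothetically, that the first comma-separated field is the request number and the fourth field is an object name of the form `rbd_data.<component number>.<suffix>`; the function names `parse_record_line` and `earliest_per_component` are illustrative only. Because a smaller request number means an earlier send, the earliest request per block storage component is simply the minimum number seen for that component.

```python
from typing import Dict, Iterable, NamedTuple


class AccessRequest(NamedTuple):
    request_number: int  # smaller number means the request was sent earlier
    component_id: str    # number of the RBD block storage component operated on


def parse_record_line(line: str) -> AccessRequest:
    """Parse a table 1 style row such as '500,AAA,AAA,rbd_data.123456.AAAA AAAA'.

    Assumed (hypothetical) layout: field 1 is the request number and field 4 is
    an object name of the form rbd_data.<component>.<suffix>.
    """
    fields = line.strip().split(",")
    if len(fields) < 4:
        raise ValueError(f"unexpected record line: {line!r}")
    name_parts = fields[3].split(".")
    if len(name_parts) < 3 or name_parts[0] != "rbd_data":
        raise ValueError(f"unexpected object name: {fields[3]!r}")
    return AccessRequest(int(fields[0]), name_parts[1])


def earliest_per_component(lines: Iterable[str]) -> Dict[str, int]:
    """Return the smallest (earliest-sent) request number for each component."""
    earliest: Dict[str, int] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        req = parse_record_line(line)
        current = earliest.get(req.component_id)
        if current is None or req.request_number < current:
            earliest[req.component_id] = req.request_number
    return earliest
```

For the three rows quoted above, given in any order, `earliest_per_component` returns `{"123456": 500}`.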
It should be understood that an automatic detection component may be deployed in the server. The automatic detection component may periodically (e.g., every 5 s, every 3 s, or every ten minutes) read the request record file of each kernel and detect, based on the access requests contained in that file, whether the kernel is abnormally stalled. The target kernel detection time may be any time at which the kernel is checked. For example, if the kernel is checked every 5 s starting from time a, the target kernel detection time may be time a, time a+5 s, time a+10 s, and so on. Each time a target kernel detection time is reached, the request record file of the kernel is read, and the access requests recorded in it are used to determine whether the kernel is abnormal.
Step S102: perform request detection on the access requests in the request record file, and determine the request waiting state of the access requests in the request record file.
In the present application, the request waiting state of the access requests in the request record file is determined by performing request detection on them, and the detection may be based on the sending time of each access request. Specifically, the request waiting state may be determined as follows: obtain the access request with the smallest sending timestamp in the request record file and determine it as the target earliest access request corresponding to the target kernel detection time; determine the kernel detection time immediately preceding the target kernel detection time as the historical kernel detection time; obtain the historical earliest access request corresponding to the historical kernel detection time; and determine the request waiting state of the access requests in the request record file according to the target earliest access request and the historical earliest access request.
It can be understood that every access request recorded in the request record file, i.e., every request sent by the kernel to the distributed storage system, has a sending time, so the access request with the smallest sending timestamp is the one sent earliest; it is the earliest access request at that kernel detection time. At each kernel detection time, the request waiting state of the access requests in the request record file is determined from the earliest access requests at that kernel detection time and at the previous one. Thus, at the target kernel detection time, the access request with the smallest sending timestamp in the request record file is called the target earliest access request; the kernel detection time preceding the target kernel detection time is called the historical kernel detection time, and the access request with the smallest sending timestamp in the request record file at the historical kernel detection time is called the historical earliest access request. The request waiting state at the target kernel detection time is determined jointly from the target earliest access request and the historical earliest access request.
Specifically, the request waiting state of the access requests in the request record file may be determined from the target earliest access request and the historical earliest access request as follows: compare the target earliest access request with the historical earliest access request. If they are the same, determine the request waiting state based on the time period between the target kernel detection time and the historical kernel detection time; if they are different, directly determine the request waiting state to be the reasonable waiting state.
It can be understood that the access requests sent by the kernel to the distributed storage system in sequence may include data read requests or data write requests (data storage requests), and the distributed storage system processes them in order. Each time the distributed storage system completes an access request, the kernel receives a completion feedback message, and the processed access request is marked and cleared; correspondingly, it is removed from the request record file. Therefore, if the earliest access requests in the request record file differ at two consecutive kernel detection times (for example, the historical kernel detection time and the target kernel detection time), this shows that the distributed storage system is continuously processing access requests and has completed the historical earliest access request. The access requests in the request record file are then in a normal waiting state, each simply waiting for the distributed storage system to finish the requests sent before it, so the request waiting state can be directly determined to be the reasonable waiting state.
If the earliest access request in the request record file is the same at two consecutive kernel detection times, this shows that the distributed storage system has not finished processing the historical earliest access request for some time, at least not during the period between the two kernel detection times. In this case, the request waiting duration of the historical earliest access request may be determined based on the time period between the two kernel detection times. This waiting duration can be understood as the request blocking duration of the distributed storage system at the target kernel detection time. If the request blocking duration is greater than a duration threshold (a preset value such as 50 s, 30 s, or 10 min), the request waiting state of the access requests in the request record file may be determined to be the persistent abnormal waiting state.
Specifically, the request waiting state may be determined from the time period between the target kernel detection time and the historical kernel detection time as follows: determine the time period between the two detection times and take it as the incremental blocking duration of the kernel at the target kernel detection time; obtain the historical request blocking duration of the kernel at the historical kernel detection time; and add the incremental blocking duration to the historical request blocking duration to obtain the target request blocking duration of the kernel at the target kernel detection time. When the target request blocking duration is greater than the duration threshold, the request waiting state of the access requests in the request record file is determined to be the persistent abnormal waiting state; when it is less than the duration threshold, the request waiting state is determined to be the reasonable waiting state.
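The detection step described in the paragraphs above (take the minimum request number as the earliest request, compare it with the historical earliest request, accumulate the blocking duration, and compare against the threshold) can be sketched as a single function. This is a minimal illustration, not the patented implementation: `detect_wait_state` and `DetectionResult` are hypothetical names, resetting the blocking duration to 0 when the earliest request changes is an assumption, and treating a duration exactly equal to the threshold as reasonable is also an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

REASONABLE = "reasonable waiting"
PERSISTENT_ABNORMAL = "persistent abnormal waiting"


@dataclass
class DetectionResult:
    state: str
    earliest: Optional[int]   # target earliest request number at this detection time
    blocking_seconds: float   # target request blocking duration


def detect_wait_state(request_numbers: Iterable[int],
                      t_target: float,
                      t_history: Optional[float],
                      history_earliest: Optional[int],
                      history_blocking: float,
                      threshold_seconds: float) -> DetectionResult:
    numbers = list(request_numbers)
    if not numbers:
        # No outstanding request, so nothing is waiting (assumption).
        return DetectionResult(REASONABLE, None, 0.0)
    target_earliest = min(numbers)  # smallest number = earliest sent
    if t_history is None or target_earliest != history_earliest:
        # First detection time, or the historical earliest request was processed;
        # restarting the blocking count at 0 here is an assumption.
        return DetectionResult(REASONABLE, target_earliest, 0.0)
    increment = t_target - t_history           # incremental blocking duration
    blocking = history_blocking + increment    # target request blocking duration
    state = PERSISTENT_ABNORMAL if blocking > threshold_seconds else REASONABLE
    return DetectionResult(state, target_earliest, blocking)
```

Feeding each result's `earliest` and `blocking_seconds` back in as `history_earliest` and `history_blocking` at the next detection time reproduces the accumulation described above.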
It can be understood that when the target kernel detection time is the first kernel detection time (e.g., kernel detection time 1), there is no historical kernel detection time, so it is impossible to compare the earliest access requests at two consecutive kernel detection times. The target kernel detection time in the present application can therefore be understood as the second kernel detection time (e.g., kernel detection time 2) or any subsequent kernel detection time (e.g., kernel detection time 3, 4, 5, ...). When the target kernel detection time is the second kernel detection time, the historical kernel detection time is the first kernel detection time. If the earliest access requests at these two times are the same, the time period between the first and second kernel detection times (referred to as time period 12) is determined as the incremental blocking duration. Because the historical kernel detection time is the first kernel detection time, for which no request blocking duration exists, the historical request blocking duration at the first kernel detection time is taken as 0. The target request blocking duration at the second kernel detection time is therefore equal to time period 12, i.e., the time period between the first and second kernel detection times.
Correspondingly, when the target kernel detection time is the third kernel detection time (kernel detection time 3), its historical kernel detection time is the second kernel detection time (kernel detection time 2). The earliest access request at kernel detection time 2 is then compared with that at kernel detection time 3. If they are the same, the time period 23 between kernel detection time 2 and kernel detection time 3 is determined as the incremental blocking duration, the request blocking duration at kernel detection time 2 (time period 12) is taken as the historical request blocking duration, and the two are added. The request blocking duration at kernel detection time 3 (the target request blocking duration) is thus time period 12 + time period 23, i.e., time period 13.
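The accumulation in the two paragraphs above telescopes. Written out with symbols introduced here only for illustration ($t_k$ for the $k$-th kernel detection time, $D_k$ for the request blocking duration at $t_k$, $T$ for the duration threshold), and assuming the earliest access request stays the same from $t_1$ onward:

```latex
D_1 = 0, \qquad D_k = D_{k-1} + (t_k - t_{k-1}) \quad \text{(same earliest request at } t_{k-1} \text{ and } t_k\text{)}

D_2 = 0 + (t_2 - t_1) \quad (\text{time period } 12)

D_3 = (t_2 - t_1) + (t_3 - t_2) = t_3 - t_1 \quad (\text{time period } 13)

\text{persistent abnormal waiting at } t_k \iff D_k > T
```

As an illustrative pairing of the example values mentioned in the text, a 5 s detection interval gives $D_k = 5(k-1)$ s; with $T = 30$ s, $D_7 = 30$ s does not exceed the threshold, while $D_8 = 35$ s does, so the kernel would be flagged at the eighth detection time.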
In this manner, the request blocking duration at every kernel detection time can be determined. If the request blocking duration at some kernel detection time is greater than the duration threshold, the request waiting state of the access requests in the request record file is determined to be the persistent abnormal waiting state, the kernel is considered abnormal at that detection time, and the kernel should be unloaded without further detection. In other words, once a kernel is found to be abnormal at some kernel detection time, it no longer needs to be checked.
Step S103: when the request waiting state of the access requests in the request record file is the persistent abnormal waiting state, determine that the kernel is abnormal, and unload the kernel.
In the present application, when the request waiting state of the access requests in the request record file is determined to be the persistent abnormal waiting state, it can be concluded that the distributed storage system is not processing the access requests in time: the requests remain in an abnormal waiting state and are blocked, which means the kernel is stalled. The kernel is therefore determined to be abnormal and may be unloaded. It should be noted that the present application proposes different unloading policies for different kernel versions; for the specific implementation of unloading a kernel, refer to the descriptions in the following embodiments.
Optionally, a monitoring and alarm service may be preset in the present application to send, after an abnormal kernel is detected, abnormality alarm information about that kernel to the kernel maintenance object, so that the kernel maintenance object can perform further maintenance on the kernel. For ease of understanding, please refer to fig. 4, which is a schematic diagram of an architecture for automatically detecting kernels and raising abnormality alarms according to an embodiment of the present application. As shown in fig. 4, the architecture may include at least a service server, a monitoring service, and a distributed storage system. Each part of the architecture is described below:
Service server: as described in the foregoing embodiments, the service server may be the server corresponding to a service application, and the process by which the service application accesses the distributed storage system (Ceph storage system) through the kernel may include at least the following steps: 1) the kernel-mode KRBD exposes a block storage component (for example, /dev/rbd0) to user mode by mapping it; 2) a file system (e.g., ext4) is created on the block storage component and mounted; 3) the service application issues a file system access request (e.g., a read request or a write request), which is converted into a block request (bio) that reaches the block storage component; the KRBD kernel then converts the bio into a request against the corresponding block storage component in the Ceph storage system and sends it to the Ceph storage system over the network. An automatic detection component may be deployed in the service server. Through a communication component that the kernel exposes to user mode (such as debugfs, a simple virtual file system for kernel debugging), it lists the request record file of each kernel and finds the earliest access request in each request record file. Based on the earliest access request at each kernel detection time, the request waiting state of the access requests sent by the kernel can be determined; if it is the persistent abnormal waiting state, the access requests are stuck and the kernel has a stall abnormality.
To illustrate how the automatic detection component detects kernel abnormalities, please refer to fig. 5, which is a schematic diagram of an architecture of an automatic detection service according to an embodiment of the present application. As shown in fig. 5, the automatic detection service may include a request detection component, which enumerates the queue of access requests sent by a kernel to the distributed storage system through the communication component (e.g., the debug file system) that the kernel (KRBD) exposes to user mode, thereby forming the request record file of the kernel. It should be understood that the communication component (e.g., debugfs) allows user mode to communicate with the kernel. Specifically, the kernel sends access requests (e.g., data read requests and data write requests) to the distributed storage system through a request sending component (osdc), so all access requests sent by the kernel can be obtained from the osdc. The kernel writes all osdc requests (access requests) to the osdc file under debugfs, and the request detection component lists all access request queues associated with the kernel from the access requests recorded in the osdc file, thereby obtaining the request record file. From the request record file, the request detection component identifies the earliest access request sent by each kernel.
As noted above, each access request includes a request number that reflects its sending time: the smaller the request number, the earlier the request was sent. When identifying the earliest access request in a request record file, the request detection component can therefore simply compare the request numbers of the access requests to find the earliest number; the access request indicated by the earliest number is the earliest access request.
It should be noted that accessing the distributed storage system through a kernel means accessing a block storage component in the distributed storage system through that kernel, and each kernel may correspond to one block storage component. By performing request statistics on each request record file, an earliest number (i.e., an earliest access request) can be obtained for each mapped block storage component. For example, as shown in fig. 5, request statistics yield earliest number 0 for block storage component 0, earliest number 1 for block storage component 1, and earliest number 2 for block storage component 2; in each case the access request indicated by that number is the current earliest access request.
It should be understood that each kernel has its own series of kernel detection times. The request detection component identifies the earliest access request of each kernel at each kernel detection time, and a persistent storage component (store component) persists these earliest access requests. When the next kernel detection time is reached, the request detection component can determine, from the earliest access requests at the previous and current kernel detection times, whether the kernel is abnormally stalled. If a kernel is abnormal, the request detection component reports it to the monitoring service through a reporting component, and the monitoring service generates abnormality alarm information for the abnormal kernel and sends it to the kernel maintenance object.
Monitoring service: after detecting an abnormal kernel, the automatic detection component in the service server reports it to the statistics component of the monitoring service, which forwards it to the alarm component. The alarm component generates abnormality alarm information for the abnormal kernel and sends it to the kernel maintenance object, so that the kernel maintenance object learns that the kernel is abnormal and may need maintenance.
Distributed storage system: processes the access requests sent by the kernels (e.g., writing service data to the distributed storage system or reading related service data from it).
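For the service server's request detection component, reading the osdc files through debugfs might look like the sketch below. The glob `/sys/kernel/debug/ceph/*/osdc` is the usual debugfs location for the Ceph kernel client (it needs root and a mounted debugfs), but the osdc line format differs between kernel versions, so the parsing rules here are assumptions to verify on the running kernel: each in-flight request line is assumed to start with its numeric request id, and parsing stops at a `LINGER REQUESTS` or `BACKOFFS` section header, which some kernels print after the ordinary requests. Depending on mapping options, several RBD devices can share one Ceph client and therefore one osdc file. The function names are hypothetical.

```python
import glob
import os
from typing import Dict, List

# Usual debugfs location of the Ceph kernel client's osdc file.
OSDC_GLOB = "/sys/kernel/debug/ceph/*/osdc"

# Section headers after which lines are not ordinary in-flight requests (assumed).
_STOP_HEADERS = ("LINGER REQUESTS", "BACKOFFS")


def parse_osdc_request_numbers(text: str) -> List[int]:
    """Collect request numbers, assuming each request line starts with its id."""
    numbers: List[int] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(_STOP_HEADERS):
            break  # stop before linger/backoff sections
        fields = stripped.split()
        if fields and fields[0].isdigit():
            numbers.append(int(fields[0]))
    return numbers


def read_request_record_files(pattern: str = OSDC_GLOB) -> Dict[str, List[int]]:
    """Map each Ceph client directory name to the request numbers in its osdc file."""
    records: Dict[str, List[int]] = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            client_dir = os.path.basename(os.path.dirname(path))
            records[client_dir] = parse_osdc_request_numbers(f.read())
    return records
```

Taking `min()` of each returned list gives the earliest number for that client, which is the value compared across detection times.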
It should be noted that the present application may unload an abnormal kernel. Before unloading, the abnormality alarm information may be sent to the kernel maintenance object, and the kernel is unloaded after the kernel maintenance object issues an unloading instruction for it. Alternatively, an abnormal kernel may be unloaded automatically once detected, so that no manual participation is required, the whole process is automated, and the delay in unloading the kernel is reduced.
In summary, the automatic detection service in the present application may read the access requests from osdc through debugfs at a fixed interval (for example, every 5 s, every 3 s, or every 5 min) to list the request record file of each kernel. By analyzing the request record file entry by entry, it obtains the earliest access request of each kernel at each time point and compares it with the persisted earliest access request from the previous detection time point to determine whether the distributed storage system is processing access requests normally. If the earliest access request is the same at several consecutive time points, it has remained unprocessed throughout that period; the distributed storage may then be determined to have failed, the kernel is stalled, and the kernel needs to be unloaded.
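The persistent storage component that keeps each kernel's earliest access request between detection times could be as simple as a JSON file. The sketch below is one possible layout, not the one used in the application: the record fields (`earliest`, `time`, `blocking`, `abnormal`) and all function names are hypothetical. It writes atomically through a temporary file and `os.replace`, so a crash mid-write leaves the previous state intact, and `needs_detection` skips kernels already marked abnormal, matching the rule that an abnormal kernel need not be checked again.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional

State = Dict[str, Dict[str, Any]]


def load_state(path: str) -> State:
    """Load the persisted per-kernel detection state; empty if none exists yet."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_state(path: str, state: State) -> None:
    """Persist state atomically: write a temp file, fsync, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def record_detection(state: State, kernel_id: str, earliest: Optional[int],
                     detected_at: float, blocking_seconds: float,
                     abnormal: bool) -> None:
    """Store the earliest request and blocking duration seen at this detection time."""
    state[kernel_id] = {"earliest": earliest, "time": detected_at,
                        "blocking": blocking_seconds, "abnormal": abnormal}


def needs_detection(state: State, kernel_id: str) -> bool:
    """Kernels already found abnormal are not checked again."""
    return not state.get(kernel_id, {}).get("abnormal", False)
```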
In the embodiments of the present application, when a kernel uses the block storage service of the distributed storage system, a method for unloading the kernel when it becomes abnormal is provided, so that the application processes associated with that kernel can be terminated selectively and in time, reducing the impact on other normally running application processes. Specifically, the request record file of a kernel (which records the access requests sent through the kernel to a block storage component in the distributed storage system) may be read periodically; for example, it may be read when the target kernel detection time is reached. Request detection is then performed on the access requests in the request record file to determine their request waiting state. When that state is the persistent abnormal waiting state, it can be determined that the distributed storage system has not processed the kernel's access requests in time and that those requests keep waiting; the kernel is therefore abnormal and can be unloaded selectively. Periodically and automatically checking kernels for abnormalities allows an abnormal kernel to be identified and unloaded selectively and in time without a forced restart of the physical server, so that the application processes associated with the abnormal kernel are terminated promptly while the normally running application processes on the physical server are protected.
Further, please refer to fig. 6, which is a schematic flow chart of unloading a kernel according to an embodiment of the present application. This flow may correspond to the kernel unloading flow in the embodiment corresponding to fig. 3 described above. As shown in fig. 6, the flow may include at least the following steps S601 to S603:
Step S601, obtain the kernel version corresponding to the kernel.
Specifically, different kernels correspond to different kernel versions: some kernels have a higher version (e.g., 4.x or 6.x) and some have a lower version (e.g., 3.x). The present application may apply different unloading strategies to a kernel depending on its kernel version.
Step S602, determine the kernel unloading rule of the kernel according to the version interval to which the kernel version belongs.
Specifically, as described in the foregoing embodiments, the kernel may send access requests to the distributed storage system through osdc. If the distributed storage system fails, it cannot process these access requests normally and cannot return completion-response notifications to the kernel. An access request for which the kernel has not received a completion-response notification keeps waiting for the distributed storage system to process it until such a notification arrives. Therefore, when the kernel is unloaded, the unprocessed access requests in osdc must first be cleaned up and released, and the kernel can be unloaded only after all of them have been released. For ease of understanding, please refer to fig. 7, which is a diagram of the overall logical architecture for unloading a kernel according to an embodiment of the present application. As shown in fig. 7, the overall architecture may include at least a command writing component, a communication component, and a kernel. The functions implemented by each component are described below:
A command writing component: a component, exposed to user mode by the communication component (debug), into which computer commands (such as an echo command) can be written. It may specifically be the remove_single_major file, through which user mode can write a request termination command. The request termination command may include the number of the kernel to be unloaded (the kernel number; since unloading the kernel is equivalent to unloading the block storage component RBD corresponding to the kernel, the kernel number may be replaced by the component number of the RBD) and a termination instruction for access requests (such as an abort_request instruction).
A communication component: for communication between the user mode and the kernel mode.
A kernel: after receiving the request termination command, the kernel may find the corresponding osdc component by the kernel number (or by the component number of the block storage component), traverse the access requests contained in the osdc, and terminate each one (deleting it from the queue, marking it as processing complete, and setting a corresponding completion code for it). For example, as shown in fig. 7, the request queue of the osdc contains an access request 701; when the kernel is unloaded, the access request 701 needs to be terminated, that is, marked as processing complete and assigned a corresponding completion code. After all access requests in the request queue have been marked as processing complete, the kernel (or the kernel-mode block storage component) may be unloaded through a normal unload instruction (e.g., the rbd unmap command).
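The user-mode/kernel-mode split in fig. 7 can be simulated in a few lines of Python. This is a sketch under stated assumptions: the "<kernel number> <instruction>" command layout, the helper names and the in-memory queues standing in for osdc are all hypothetical, and the real kernel-side handling would live in the RBD/libceph code, not in Python.

```python
import errno
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    request_id: int
    done: bool = False
    completion_code: Optional[int] = None


def write_termination_command(path, kernel_number, instruction):
    """User-mode side: equivalent of `echo "<kernel number> <instruction>" > path`.

    The "<kernel number> <instruction>" layout is a hypothetical format.
    """
    with open(path, "w") as f:
        f.write(f"{kernel_number} {instruction}\n")


def handle_termination_command(path, osdc_by_kernel, completion_code):
    """Kernel-mode side (simulated): terminate every request queued in the matching osdc."""
    with open(path) as f:
        kernel_number, instruction = f.read().split()
    queue = osdc_by_kernel[int(kernel_number)]   # look up osdc by kernel number
    terminated = []
    while queue:
        req = queue.pop(0)                 # delete the request from the queue
        req.done = True                    # mark it as processing complete
        req.completion_code = completion_code
        terminated.append(req)
    # An empty queue is the precondition for the normal unload (e.g. `rbd unmap`).
    return instruction, terminated


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    queues = {0: [Request(701)]}
    write_termination_command(path, 0, "abort_request")
    print(handle_termination_command(path, queues, -errno.EIO))
    os.remove(path)
```

Which completion code is passed in is exactly what the version-dependent unloading rules below decide.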
It will be appreciated that, in the present application, the completion code of an access request may be set to a code indicating execution failure, for example "ERROR IO". This completion code tells the upper file system that, although the access request has been marked as processing complete, it was not executed successfully. However, when the kernel version is low (e.g., 3.x), some file systems cannot correctly handle an "ERROR IO" completion returned for metadata I/O, which may corrupt the file system on the block storage component and render the entire file system unusable. To mitigate this problem, different kernel unloading rules are configured for different kernel versions: when a kernel is unloaded, its kernel unloading rule is determined according to the version interval to which its kernel version belongs, and the kernel is then unloaded according to that rule.
Specifically, determining the kernel unloading rule according to the version interval may be implemented as follows: when the kernel version belongs to a first interval, the low-version unloading rule in the configured unloading rule set is determined as the kernel unloading rule of the kernel; when the kernel version belongs to a second interval, the high-version unloading rule in the configured unloading rule set is determined as the kernel unloading rule of the kernel; the first interval is lower than the second interval. That is, the first interval may be a low-version interval and the second interval a high-version interval; for example, versions lower than 3.5x may form the first interval and versions higher than 3.5x the second interval. The present application may configure a low-version unloading rule for the low-version interval and a high-version unloading rule for the high-version interval, and these configured unloading rules together form the configured unloading rule set.
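The interval lookup can be sketched as a tuple comparison on the (major, minor) version. The boundary value `(3, 5)` reflects the "3.5x" split in the text; how a kernel of exactly 3.5 is classified is not stated, so sending it to the high interval is an assumption, as are the rule names.

```python
# Assumed boundary between the first (low) and second (high) version interval.
# The text places it around "3.5x"; treating exactly 3.5 as "high" is an assumption.
VERSION_BOUNDARY = (3, 5)


def parse_version(release):
    """Turn a kernel release string such as '3.10.0-1160.el7.x86_64' into (3, 10)."""
    major, minor = release.split("-")[0].split(".")[:2]
    return int(major), int(minor)


def select_unload_rule(release, boundary=VERSION_BOUNDARY):
    """Pick the configured unloading rule for the version interval of `release`."""
    if parse_version(release) < boundary:
        return "low_version_rule"    # first interval
    return "high_version_rule"       # second interval
```

On a Linux host, `select_unload_rule(platform.release())` would classify the running kernel. Comparing tuples rather than strings keeps 3.10 above 3.5.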
Step S603, unload the kernel according to the kernel unloading rule.
Specifically, the kernel unloading rules here include the low-version unloading rule and the high-version unloading rule. Take as an example the case where the request record file contains an access request S_i (i is a positive integer). When the kernel unloading rule is the low-version unloading rule, unloading the kernel may be implemented as follows: request setting processing is first performed on the access request S_i according to the low-version unloading rule, and the request state of S_i after the request setting processing is determined as the request success state; when the request state of every access request in the request record file has been determined as the request success state, the access requests contained in the request record file may be cleared; then, once the request record file contains no access request, the kernel is unloaded through a kernel unload instruction.
Performing request setting processing on the access request S_i according to the low-version unloading rule may be implemented as follows: the request identifier of S_i is obtained according to the low-version unloading rule; a first request termination logic code corresponding to the low-version unloading rule is obtained, and a first request termination command for S_i is generated from the first request termination logic code and the request identifier of S_i; according to the first request termination command, the request execution state of S_i in the request record file is set to the execution completion state, and the completion code of S_i is set to a first completion code, where the first completion code indicates that S_i has been executed normally.
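The low-version request setting processing can be sketched as two steps: build a first request termination command from the logic code and the request identifier, then apply it to the matching entry in the request record file. The command layout and helper names are hypothetical; the completion code `0` follows the example value given in the text.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_LOGIC_CODE = "abort_request"   # first request termination logic code (name per the text)
FIRST_COMPLETION_CODE = 0            # "executed normally"


@dataclass
class AccessRequest:
    request_id: int
    exec_state: str = "pending"
    completion_code: Optional[int] = None


def first_termination_command(logic_code, request_id):
    """Hypothetical command layout: '<logic code> <request identifier>'."""
    return f"{logic_code} {request_id}"


def apply_first_command(command, record_file):
    """Low-version request setting: mark S_i completed with the first completion code."""
    logic_code, request_id = command.split()
    if logic_code != FIRST_LOGIC_CODE:
        raise ValueError(f"unexpected logic code: {logic_code}")
    for req in record_file:
        if req.request_id == int(request_id):
            req.exec_state = "completed"            # execution completion state
            req.completion_code = FIRST_COMPLETION_CODE
            return req
    raise KeyError(request_id)
```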
Based on the above, the upper file system uses the completion code of an access request to tell whether the request was executed normally. When the kernel is abnormal, the completion code could be set to a code indicating execution failure (for example, "ERROR IO"), which tells the file system that the access request was marked as processing complete but was not executed successfully. However, when the kernel version is low (e.g., 3.x), some file systems cannot correctly handle such a failure on metadata I/O, which may corrupt the file system on the block storage component and make the entire file system unusable. Therefore, when the kernel version is in the first interval, the corresponding request termination logic code (such as the abort_request instruction code) is obtained according to the low-version unloading rule. Then, based on the request identifier of each access request to be terminated, the corresponding request termination command (the first request termination command) is generated and issued. Based on the first request termination command, the request execution state of each access request is set, one by one, to the execution completion state (i.e., marked as processing complete), and its completion code is set to the first completion code, which indicates that the access request has been executed normally (for example, the completion code may be "0"). Through this completion code, the upper file system is told that the access request has been marked as processing complete and was executed successfully.
It should be appreciated that, although each access request is marked as processing complete and its completion code is set to the first completion code, the access request has not actually been processed: the data or metadata operated on by the file system layer has not been persisted to the remote distributed storage system, so the next time the data is read from the distributed storage system it still cannot be read normally, or stale data is returned. In other words, the service data indicated by these access requests is not stored persistently in the distributed storage system. Nevertheless, this loss is smaller than corrupting the entire file system, so for kernels with a lower kernel version the completion code of each access request is set to the first completion code during unloading.
For ease of understanding, please refer to fig. 8, fig. 8 is a schematic diagram of an architecture for offloading a low-version kernel according to an embodiment of the present application. As shown in fig. 8, the architecture may include at least a command writing component, a communication component, and a kernel. For ease of understanding, the functions implemented by the various components will be described below:
A command writing component: a component, exposed to user mode by the communication component (debug), into which computer commands (such as an echo command) can be written. It may specifically be the remove_single_major file, through which user mode can write a request termination command. The request termination command may include the number of the kernel to be unloaded (the kernel number; since unloading the kernel is equivalent to unloading the block storage component RBD corresponding to the kernel, the kernel number may be replaced by the component number of the RBD) and a termination instruction for access requests (such as an abort_request instruction). Here the request termination command is the first request termination command, which may be generated from the first request termination logic code (which may be written and preconfigured manually).
A communication component: used for communication between user mode and kernel mode.
A kernel: after receiving the first request termination command, the kernel may find the corresponding osdc component by the kernel number (or by the component number of the block storage component), traverse the access requests contained in the osdc, and terminate each one (deleting it from the queue, marking it as processing complete, and setting its completion code to the first completion code). For example, as shown in fig. 8, the request queue of the osdc contains an access request 701; when the kernel is unloaded, the access request 701 needs to be terminated, that is, marked as processing complete with its completion code set to the first completion code. The kernel may return the first completion code to the file system layer through the block component layer (the layer where the block storage component resides), and from the first completion code the file system layer determines that the access request was executed successfully.
When the kernel unloading rule is the high-version unloading rule, again taking as an example the case where the request record file contains an access request S_i (i is a positive integer), unloading the kernel may be implemented as follows: request setting processing is performed on the access request S_i according to the high-version unloading rule, and the request state of S_i after the request setting processing is determined as the request failure state; when the request state of every access request in the request record file has been determined as the request failure state, the access requests contained in the request record file may be cleared; then, once it is determined that the request record file contains no access request, the kernel is unloaded through a kernel unload instruction.
Performing request setting processing on the access request S_i according to the high-version unloading rule may be implemented as follows: the request identifier of S_i is obtained according to the high-version unloading rule; a second request termination logic code corresponding to the high-version unloading rule is obtained, and a second request termination command for S_i is generated from the second request termination logic code and the request identifier of S_i; according to the second request termination command, the request execution state of S_i in the request record file is set to the execution failure state, and the completion code of S_i is set to a second completion code, where the second completion code indicates that S_i was not executed normally.
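Both rules share the same step S603 skeleton and differ only in the execution state and completion code they assign, so a table-driven sketch covers them together. Mapping "ERROR IO" to `-errno.EIO` follows the Linux convention of negative errno values and is an assumption here, as are the rule names and the `unmap` callback that stands in for the kernel unload instruction.

```python
import errno
from dataclasses import dataclass
from typing import Callable, List, Optional

# (execution state, completion code) applied by each rule. Mapping "ERROR IO"
# to -EIO follows the Linux negative-errno convention and is an assumption here.
RULES = {
    "low_version_rule": ("completed", 0),
    "high_version_rule": ("failed", -errno.EIO),
}


@dataclass
class AccessRequest:
    request_id: int
    exec_state: str = "pending"
    completion_code: Optional[int] = None


def unload_kernel(record_file: List[AccessRequest], rule: str,
                  unmap: Callable[[], None]) -> bool:
    state, code = RULES[rule]
    for req in record_file:                 # request setting processing for each S_i
        req.exec_state, req.completion_code = state, code
    # Clear the record file only once every request is in the rule's target state.
    if all(req.exec_state == state for req in record_file):
        record_file.clear()
    if record_file:
        return False
    unmap()                                 # e.g. issue `rbd unmap` for the device
    return True
```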
Based on the above, the upper file system uses the completion code of an access request to tell whether the request was executed normally. When the kernel is abnormal, the completion code may be set to a code indicating execution failure, for example "ERROR IO", which tells the file system that the access request was marked as processing complete but was not executed successfully; this execution-failure code serves as the second completion code of the present application. When the kernel version is in the second interval, the corresponding request termination logic code (such as the abort_request_and_fake instruction code) is obtained according to the high-version unloading rule. Then, based on the request identifier of each access request to be terminated, the corresponding request termination command (the second request termination command) is generated and issued. Based on the second request termination command, the request execution state of each access request is set, one by one, to the execution failure state (the request is still marked as processing complete), and its completion code is set to the second completion code, which indicates that the access request was not executed normally but failed. Through this completion code, the upper file system is told that the access request has been marked as processing complete and that its execution failed.
For ease of understanding, please refer to fig. 9, fig. 9 is a schematic diagram of an architecture for offloading a high-version kernel according to an embodiment of the present application. As shown in fig. 9, the architecture may include at least a command writing component, a communication component, and a kernel. For ease of understanding, the functions implemented by the various components will be described below:
A command writing component: a component, exposed to user mode by the communication component (debug), into which computer commands (such as an echo command) can be written. It may specifically be the remove_single_major file, through which user mode can write a request termination command. The request termination command may include the number of the kernel to be unloaded (the kernel number; since unloading the kernel is equivalent to unloading the block storage component RBD corresponding to the kernel, the kernel number may be replaced by the component number of the RBD) and a termination instruction for access requests (such as an abort_request_and_fake instruction). Here the request termination command is the second request termination command, which may be generated from the second request termination logic code (which may be written and preconfigured manually, such as abort_request_and_fake).
A communication component: used for communication between user mode and kernel mode.
A kernel: after receiving the second request termination command, the kernel may find the corresponding osdc component by the kernel number (or by the component number of the block storage component), traverse the access requests contained in the osdc, and terminate each one (deleting it from the queue, marking it as processing complete, and setting its completion code to the second completion code). For example, as shown in fig. 9, the request queue of the osdc contains an access request 701; when the kernel is unloaded, the access request 701 needs to be terminated, that is, marked as processing complete with its completion code set to the second completion code. The kernel may return the second completion code to the file system layer through the block component layer (the layer where the block storage component resides), and from the second completion code the file system layer determines that the access request failed. It will be appreciated that, after determining that an access request has failed, the file system layer may freeze the subsequent access requests to be issued to the kernel (i.e., freeze the subsequent request queue), which prevents more access requests from getting stuck in the kernel.
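The freezing behaviour of the file system layer can be pictured with a toy model. This is purely illustrative: real file systems react to I/O errors in their own ways, and the class, its methods and the "negative code means failure" check are hypothetical.

```python
import errno
from collections import deque


class FileSystemLayer:
    """Toy model (hypothetical) of an upper file system reacting to completion codes."""

    def __init__(self):
        self.pending = deque()   # requests waiting to be issued to the kernel
        self.frozen = False

    def submit(self, request):
        self.pending.append(request)

    def on_completion(self, request_id, completion_code):
        """Return True if the request succeeded; freeze the queue on a failure code."""
        if completion_code < 0:   # second completion code: execution failed
            self.frozen = True    # stop issuing further requests to the stuck kernel
        return completion_code == 0

    def issue_next(self):
        if self.frozen or not self.pending:
            return None
        return self.pending.popleft()
```

After a failure code arrives, queued requests stay in `pending` instead of being sent to the stuck kernel.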
In the embodiments of the present application, by periodically and automatically detecting whether a kernel is abnormal, abnormal kernels can be identified and unloaded promptly and in a targeted manner without forcibly restarting the physical server, so that the application processes indicated by an abnormal kernel are terminated in time while the normally running application processes on the physical server are protected. In addition, the present application configures different kernel unloading rules for kernels of different versions, so that the file system loss caused by unloading a kernel can be reduced according to its kernel version.
It will be appreciated that, based on the foregoing, kernels may be deployed in a service server, through which the service server accesses the distributed storage system to store service data of a service application in, or read it from, the distributed storage system. Multiple kernels may be deployed in one service server, and each of them can be detected periodically so that abnormal kernels are found and unloaded. A kernel usually becomes stuck because of a failure in the distributed storage system: the failure prevents the access requests sent by the kernel from being processed in time, so they keep waiting and the kernel stalls. When the distributed storage system fails, most of the kernels deployed in the service server get stuck. Based on this, the present application determines whether the distributed storage system needs system maintenance by counting the number of abnormal kernels. If only a few kernels in the service server are abnormal, any failure of the distributed storage system can be considered to have limited impact; it is then sufficient to unload those kernels in a targeted manner without maintaining the distributed storage system, although the abnormal kernels themselves may need maintenance. If many kernels in the service server are abnormal, the distributed storage system can be considered to have a fault with a large impact; in addition to unloading the abnormal kernels in a targeted manner, the distributed storage system must then be maintained so that it operates normally again and the access requests sent by the kernels can be processed in time.
That is, each kernel in the present application is deployed in a service server (also called an application server) corresponding to a service application. After determining that a kernel is abnormal, the present application may decide whether to perform system maintenance on the distributed storage system based on the number of abnormal kernels in the application server. For ease of understanding, please refer to fig. 10, which is a schematic flow chart of performing maintenance on a distributed storage system based on abnormal kernels according to an embodiment of the present application. This flow may correspond to the flow after a kernel is determined to be abnormal in the embodiment corresponding to fig. 3 described above. As shown in fig. 10, the flow may include at least the following steps S201 to S204:
Step S201, determine each kernel deployed in the application server as a deployed kernel to obtain a deployed kernel set.
Specifically, each kernel deployed in the application server may be referred to as a deployed kernel, and the deployed kernels together form a set referred to as the deployed kernel set.
Step S202, determine each deployed kernel in the deployed kernel set that is abnormal at the target kernel detection time as an abnormal kernel.
Specifically, in the deployed kernel set, each deployed kernel that is abnormal at the target kernel detection time may be determined as an abnormal kernel.
Step S203, determine the system operation attribute of the distributed storage system based on the first number of abnormal kernels contained in the deployed kernel set.
Specifically, determining the system operation attribute of the distributed storage system based on the first number of abnormal kernels may be implemented as follows: count the first number of abnormal kernels and the second number of deployed kernels contained in the deployed kernel set; determine the ratio of the first number to the second number; if the ratio is greater than a ratio threshold, determine that the system operation attribute of the distributed storage system is an abnormal operation attribute; if the ratio is smaller than the ratio threshold, determine that the system operation attribute is a normal operation attribute.
It can be understood that if the proportion of abnormal kernels in the deployed kernel set exceeds a certain threshold (e.g., 2/3), it can be determined that the distributed storage system has a fault and its system operation attribute is an abnormal operation attribute; if the proportion does not exceed the threshold, it can be determined that the distributed storage system has no fault and its system operation attribute is a normal operation attribute.
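Step S203 reduces to a ratio test. The sketch below uses the 2/3 example threshold from the text and `fractions.Fraction` so that a ratio exactly equal to the threshold is compared without floating-point rounding; treating that equality case as "normal" follows the "does not exceed" wording above. The function name and string results are illustrative.

```python
from fractions import Fraction


def system_operation_attribute(deployed, abnormal, threshold=Fraction(2, 3)):
    """Return "abnormal" if the share of abnormal kernels exceeds `threshold`."""
    deployed = set(deployed)                 # deployed kernel set
    abnormal = set(abnormal) & deployed      # abnormal kernels among them
    if not deployed:
        return "normal"
    ratio = Fraction(len(abnormal), len(deployed))   # first number / second number
    return "abnormal" if ratio > threshold else "normal"
```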
Step S204, when the system operation attribute of the distributed storage system is an abnormal operation attribute, push fault warning information for the distributed storage system to a system maintenance object of the distributed storage system, so that the system maintenance object performs system maintenance on the distributed storage system based on the fault warning information.
Specifically, when the system operation attribute of the distributed storage system is an abnormal operation attribute, fault warning information for the distributed storage system may be generated and pushed to its system maintenance object, which can then investigate the distributed storage system based on the fault warning information and perform system maintenance.
Optionally, a distributed storage system may be divided into multiple sub-portions that serve different service applications. When the number of abnormal kernels corresponding to a particular service application is large, the corresponding sub-portion of the distributed storage system may be considered faulty, and fault warning information for that sub-portion (which may be referred to as a partial subsystem) may be generated and pushed to the system maintenance object. The system maintenance object can then investigate only the partial subsystem instead of the entire distributed storage system, which improves investigation efficiency as well as the efficiency and timeliness of system maintenance.
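The per-application variant can be sketched by grouping abnormal kernels by the service application they serve. The kernel-to-application mapping and the `threshold` parameter (the text only says "greater") are hypothetical inputs.

```python
from collections import defaultdict


def subsystems_to_alert(abnormal_kernels, app_of_kernel, threshold):
    """Return the service applications whose abnormal-kernel count exceeds `threshold`."""
    counts = defaultdict(int)
    for kernel in abnormal_kernels:
        counts[app_of_kernel[kernel]] += 1
    return sorted(app for app, n in counts.items() if n > threshold)
```

Each returned application identifies a partial subsystem for which fault warning information would be pushed.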
In the embodiments of the present application, by periodically and automatically detecting whether a kernel is abnormal, abnormal kernels can be identified and unloaded promptly and in a targeted manner without forcibly restarting the physical server, so that the application processes indicated by an abnormal kernel are terminated in time while the normally running application processes on the physical server are protected. In addition, the present application can determine whether the distributed storage system is faulty based on the number of kernels detected as abnormal, which improves the timeliness of maintaining the distributed storage system and, in turn, its operational stability.
Further, referring to fig. 11, fig. 11 is a schematic flow diagram of a system for kernel offloading according to an embodiment of the present application. As shown in fig. 11, the system flow may at least include the following steps S21 to S30:
step S21, the record file is read.
Specifically, when the detection time of the target kernel arrives, a request record file can be obtained through debug list, each access request about the kernel can be obtained through reading the request record file, and each access request can be traversed one by one to obtain the sending time, so that the earliest access request is determined based on the sending time of each access request.
Step S22, determining whether it is the last access request of the file.
Specifically, since the request record file needs to be read to obtain the earliest access request, all access requests need to be traversed, and then for each access request traversed, it needs to be determined whether it is the last access request. If the last access request in the traversal, a subsequent step S24 may be performed; if not the last access request, a subsequent step S23 may be performed.
Step S23, analyzing the sending time and the corresponding block storage component.
Specifically, the sending time of the access request may be analyzed, and which block storage component the access request expects to access may be obtained.
Step S24, determining whether the transmission time is smaller than the transmission time of the current earliest access request.
Specifically, for each access request, it may be compared with the transmission time of the current earliest access request, and if it is smaller than the transmission time of the current earliest access request, the subsequent step S25 may be executed; if so, a subsequent step S26 may be performed.
Step S25, update the earliest access request.
Specifically, if the transmission time of the access request obtained by the current traversal is smaller than the transmission time of the current earliest access request, the access request may be updated to the current earliest access request.
In step S26, it is counted whether the earliest access request is identical to the last stored content.
Specifically, the steps can be sequentially compared until the last access request is traversed, so that the earliest access request in the request record file can be determined, and after the earliest access request in the request record file is determined, the earliest access request can be compared with the earliest access request of the last time (namely, the last kernel detection moment) to determine whether the two access requests are identical, if so, the subsequent step S27 can be executed; if not, a subsequent step S28 may be performed.
Step S27, accumulating the request blocking duration.
Specifically, when the two earliest access requests are the same, the request blocking duration of the kernel may be accumulated. For example, the historical request blocking duration at the historical kernel detection time may be added to the incremental blocking duration at the target kernel detection time to obtain the target request blocking duration at the target kernel detection time.
Step S28, determining whether the accumulated request blocking duration is greater than a threshold.
Specifically, it may be determined whether the accumulated request blocking duration (that is, the target request blocking duration at the target kernel detection time) is greater than the duration threshold. If it is greater, step S29 may be performed; if it is smaller, step S30 may be performed.
Step S29, determining that the kernel is abnormal.
Specifically, if the accumulated request blocking duration is greater than the duration threshold, it may be determined that the kernel is abnormal, and the kernel may be unloaded.
Step S30, determining that the kernel is not abnormal.
Specifically, if the accumulated request blocking duration is less than the duration threshold, it may be determined that the kernel is not abnormal.
For the specific implementation of step S21 to step S30, reference may be made to the descriptions of the foregoing embodiments; neither the details nor the resulting beneficial effects are repeated here.
Further, referring to fig. 12, fig. 12 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running in a computer device, for example, application software, and may be used to perform the method shown in fig. 3. As shown in fig. 12, the data processing apparatus 1 may include: a file reading module 11, a request detection module 12, an anomaly determining module 13, and a kernel unloading module 14.
The file reading module 11 is configured to read a request record file about a kernel when the target kernel detection time is reached; the request record file records access requests for accessing a block storage component in the distributed storage system through the kernel;
the request detection module 12 is configured to perform request detection processing on the access requests in the request record file and determine the request waiting state of the access requests in the request record file;
the anomaly determining module 13 is configured to determine that the kernel is abnormal when the request waiting state of the access requests in the request record file is a persistent abnormal waiting state;
the kernel unloading module 14 is configured to unload the kernel.
For the specific implementation of the file reading module 11, the request detection module 12, the anomaly determining module 13 and the kernel unloading module 14, reference may be made to the descriptions of step S101 to step S103 in the embodiment corresponding to fig. 3; details are not repeated here.
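Read together, modules 11 to 14 form one detection cycle. The Python sketch below shows one possible wiring, assuming each module can be modelled as an injected callable; the class name DataProcessingApparatus, the method on_target_detection_time and the state label PERSISTENT_ABNORMAL_WAIT are hypothetical, and the sketch deliberately leaves the record-file format and the actual unloading mechanism open.

```python
from typing import Callable, Sequence

# Hypothetical label for the "persistent abnormal waiting state".
PERSISTENT_ABNORMAL_WAIT = "persistent_abnormal"


class DataProcessingApparatus:
    """Hypothetical wiring of modules 11-14 for one detection cycle."""

    def __init__(self,
                 read_record_file: Callable[[], Sequence[object]],
                 detect_wait_state: Callable[[Sequence[object]], str],
                 unload_kernel: Callable[[], None]) -> None:
        self.read_record_file = read_record_file    # file reading module 11
        self.detect_wait_state = detect_wait_state  # request detection module 12
        self.unload_kernel = unload_kernel          # kernel unloading module 14

    def on_target_detection_time(self) -> bool:
        """Run one cycle at a target kernel detection time.

        Returns True if the kernel was judged abnormal (and unloaded).
        """
        requests = self.read_record_file()
        wait_state = self.detect_wait_state(requests)
        # Anomaly determining module 13: only a persistent abnormal wait
        # state marks the kernel as abnormal.
        abnormal = wait_state == PERSISTENT_ABNORMAL_WAIT
        if abnormal:
            self.unload_kernel()
        return abnormal
```

Injecting the modules keeps the decision logic testable without touching a real kernel; a scheduler would call on_target_detection_time once per kernel detection time.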
In one embodiment, the specific manner in which the request detection module 12 performs request detection processing on the access requests in the request record file and determines their request waiting state includes:
obtaining the access request with the minimum sending timestamp in the request record file, and determining it as the target earliest access request corresponding to the target kernel detection time;
determining the kernel detection time immediately preceding the target kernel detection time as the historical kernel detection time;
acquiring the historical earliest access request corresponding to the historical kernel detection time, and determining the request waiting state of the access requests in the request record file according to the target earliest access request corresponding to the target kernel detection time and the historical earliest access request corresponding to the historical kernel detection time.
In one embodiment, the specific manner in which the request detection module 12 determines the request waiting state of the access requests in the request record file according to the target earliest access request corresponding to the target kernel detection time and the historical earliest access request corresponding to the historical kernel detection time includes:
comparing the target earliest access request with the historical earliest access request;
if the target earliest access request is the same as the historical earliest access request, determining the request waiting state of the access requests in the request record file based on the time period between the target kernel detection time and the historical kernel detection time;
if the target earliest access request is different from the historical earliest access request, determining that the request waiting state of the access requests in the request record file is a reasonable waiting state.
In one embodiment, the specific manner in which the request detection module 12 determines the request waiting state of the access requests in the request record file based on the time period between the target kernel detection time and the historical kernel detection time includes:
determining the time period between the target kernel detection time and the historical kernel detection time;
determining this time period as the incremental blocking duration of the kernel at the target kernel detection time;
acquiring the historical request blocking duration of the kernel at the historical kernel detection time, and adding the incremental blocking duration and the historical request blocking duration to obtain the target request blocking duration of the kernel at the target kernel detection time;
when the target request blocking duration is greater than the duration threshold, determining that the request waiting state of the access requests in the request record file is a persistent abnormal waiting state.
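The accumulation performed by the request detection module 12 can be kept as a small piece of state carried from one kernel detection time to the next. The Python sketch below is one hedged reading of it: the BlockingTracker class and its field names are hypothetical, detection times are assumed to be plain numbers of seconds, resetting the accumulated duration to zero when the earliest access request changes is an assumption (the source only states that a changed earliest request means a reasonable waiting state), and reporting a below-threshold duration as a reasonable wait is likewise assumed.

```python
from dataclasses import dataclass
from typing import Hashable, Optional

# Hypothetical labels for the two request waiting states.
REASONABLE_WAIT = "reasonable"
PERSISTENT_ABNORMAL_WAIT = "persistent_abnormal"


@dataclass
class BlockingTracker:
    duration_threshold: float                  # seconds; a deployment choice
    last_detection_time: Optional[float] = None
    last_earliest: Optional[Hashable] = None   # earliest request last time
    blocked_for: float = 0.0                   # accumulated blocking duration

    def detect(self, now: float, earliest: Optional[Hashable]) -> str:
        """Update the state at a target kernel detection time `now`."""
        if (earliest is not None and earliest == self.last_earliest
                and self.last_detection_time is not None):
            # Same earliest request as at the historical detection time:
            # the elapsed time is the incremental blocking duration.
            self.blocked_for += now - self.last_detection_time
        else:
            # First detection, empty file, or a new earliest request.
            # Resetting here is an assumption, not stated in the source.
            self.blocked_for = 0.0
        self.last_detection_time = now
        self.last_earliest = earliest
        if self.blocked_for > self.duration_threshold:
            return PERSISTENT_ABNORMAL_WAIT
        return REASONABLE_WAIT
```

For instance, with a 25-second threshold and detections every 10 seconds, an unchanged earliest request would be reported as a persistent abnormal wait at the fourth detection, once 30 seconds have accumulated.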
In one embodiment, the specific manner in which the kernel unloading module 14 unloads the kernel includes:
obtaining the kernel version corresponding to the kernel;
determining the kernel unloading rule of the kernel according to the version interval to which the kernel version belongs;
unloading the kernel according to the kernel unloading rule.
In one embodiment, the specific manner in which the kernel unloading module 14 determines the kernel unloading rule of the kernel according to the version interval to which the kernel version belongs includes:
when the version interval to which the kernel version belongs is the first interval, determining the low-version unloading rule in the configured unloading rule set as the kernel unloading rule of the kernel;
when the version interval to which the kernel version belongs is the second interval, determining the high-version unloading rule in the configured unloading rule set as the kernel unloading rule of the kernel; the first interval is lower than the second interval.
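Selecting the unloading rule reduces to comparing the kernel version against an interval boundary. The source does not say where the first interval ends, so the boundary below (VERSION_BOUNDARY = (4, 19)) is purely hypothetical, as are the rule labels and the assumption that the kernel version is a Linux-style release string whose leading dotted numbers identify the version.

```python
import re
from typing import Tuple

LOW_VERSION_RULE = "low_version_unloading_rule"
HIGH_VERSION_RULE = "high_version_unloading_rule"

# Hypothetical boundary between the first (lower) and second (higher)
# version intervals; the source does not specify it.
VERSION_BOUNDARY: Tuple[int, ...] = (4, 19)


def parse_version(kernel_version: str) -> Tuple[int, ...]:
    """Turn a release string such as '5.4.0-150-generic' into (5, 4, 0)."""
    match = re.match(r"(\d+(?:\.\d+)*)", kernel_version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {kernel_version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def select_unloading_rule(kernel_version: str) -> str:
    version = parse_version(kernel_version)
    # Tuples compare element by element, so (3, 10, 0) < (4, 19) <= (5, 4, 0).
    if version < VERSION_BOUNDARY:
        return LOW_VERSION_RULE   # first interval
    return HIGH_VERSION_RULE      # second interval
```

Comparing integer tuples rather than raw strings avoids the classic mistake of ordering "4.9" after "4.19".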
In one embodiment, the kernel unloading rule is the low-version unloading rule, and the request record file contains an access request S_i, where i is a positive integer;
the specific manner in which the kernel unloading module 14 unloads the kernel according to the kernel unloading rule includes:
performing request setting processing on the access request S_i according to the low-version unloading rule, and determining the request state of the access request S_i after the request setting processing as a request success state;
when it is determined that the request state of each access request in the request record file is the request success state, clearing the access requests contained in the request record file;
when it is determined that the request record file no longer contains any access request, unloading the kernel through a kernel unloading instruction.
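The set, clear and unload sequence above can be sketched as follows. This is a minimal Python illustration, assuming an in-memory dictionary (request identifier to request state) stands in for the request record file and that the kernel unloading instruction is supplied as a callable; RequestRecordFile, unload_with_rule and the state labels are hypothetical names. Passing the failure state instead of the success state gives the analogous high-version sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical request-state labels.
REQUEST_SUCCESS = "success"   # used by the low-version unloading rule
REQUEST_FAILURE = "failure"   # used by the high-version unloading rule


@dataclass
class RequestRecordFile:
    """Hypothetical in-memory stand-in: request identifier -> request state."""
    states: Dict[int, str] = field(default_factory=dict)


def unload_with_rule(record: RequestRecordFile, target_state: str,
                     unload_kernel: Callable[[], None]) -> bool:
    """Set every request to target_state, clear the file, then unload.

    Returns True if the kernel unloading instruction was issued.
    """
    # 1. Request setting processing: move each access request S_i into the
    #    state prescribed by the rule (success or failure).
    for request_id in record.states:
        record.states[request_id] = target_state
    # 2. Clear the record only once every request reports the target state
    #    (always true here, but kept to mirror the check in the text).
    if all(state == target_state for state in record.states.values()):
        record.states.clear()
    # 3. Issue the kernel unloading instruction only when no request is left.
    if not record.states:
        unload_kernel()
        return True
    return False
```

Ordering the unload strictly after the record is empty means no in-flight request is left without a final state when the kernel disappears.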
In one embodiment, the specific manner in which the kernel unloading module 14 performs request setting processing on the access request S_i according to the low-version unloading rule includes:
obtaining the request identifier of the access request S_i according to the low-version unloading rule;
acquiring first request termination logic code corresponding to the low-version unloading rule, and generating a first request termination command for the access request S_i according to the first request termination logic code and the request identifier of the access request S_i;
according to the first request termination command, setting the request execution state of the access request S_i in the request record file to an execution completion state, and setting the completion code of the access request S_i to a first completion code; the first completion code is used for indicating that the access request S_i has completed normally.
In one embodiment, the kernel unloading rule is the high-version unloading rule, and the request record file contains an access request S_i, where i is a positive integer;
the specific manner in which the kernel unloading module 14 unloads the kernel according to the kernel unloading rule includes:
performing request setting processing on the access request S_i according to the high-version unloading rule, and determining the request state of the access request S_i after the request setting processing as a request failure state;
when it is determined that the request state of each access request in the request record file is the request failure state, clearing the access requests contained in the request record file;
when it is determined that the request record file no longer contains any access request, unloading the kernel through a kernel unloading instruction.
In one embodiment, the specific manner in which the kernel unloading module 14 performs request setting processing on the access request S_i according to the high-version unloading rule includes:
obtaining the request identifier of the access request S_i according to the high-version unloading rule;
acquiring second request termination logic code corresponding to the high-version unloading rule, and generating a second request termination command for the access request S_i according to the second request termination logic code and the request identifier of the access request S_i;
according to the second request termination command, setting the request execution state of the access request S_i in the request record file to an execution failure state, and setting the completion code of the access request S_i to a second completion code; the second completion code is used for indicating that the access request S_i did not complete normally.
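The two request setting procedures differ only in the termination logic code they use and in the execution state and completion code they write. The Python sketch below models the termination logic code as a lookup table, which is an assumption; the concrete completion code values (0 and -5), the state labels and all function names are hypothetical, since the source only states that the first completion code means the request completed normally and the second means it did not.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

LOW_VERSION_RULE = "low_version_unloading_rule"
HIGH_VERSION_RULE = "high_version_unloading_rule"

# Hypothetical completion codes: the source only says the first one means
# "completed normally" and the second one "did not complete normally".
FIRST_COMPLETION_CODE = 0
SECOND_COMPLETION_CODE = -5

# Hypothetical stand-in for the per-rule request termination logic code:
# rule -> (execution state to set, completion code to set).
TERMINATION_LOGIC: Dict[str, Tuple[str, int]] = {
    LOW_VERSION_RULE: ("execution_completed", FIRST_COMPLETION_CODE),
    HIGH_VERSION_RULE: ("execution_failed", SECOND_COMPLETION_CODE),
}


@dataclass
class RecordedRequest:
    """One access request S_i as seen in the request record file."""
    request_id: int
    execution_state: str = "in_flight"
    completion_code: Optional[int] = None


@dataclass(frozen=True)
class TerminationCommand:
    request_id: int
    execution_state: str
    completion_code: int


def build_termination_command(rule: str, request_id: int) -> TerminationCommand:
    """Combine the rule's termination logic with the request identifier."""
    execution_state, completion_code = TERMINATION_LOGIC[rule]
    return TerminationCommand(request_id, execution_state, completion_code)


def apply_command(request: RecordedRequest, command: TerminationCommand) -> None:
    """Write the command's execution state and completion code into S_i."""
    if command.request_id != request.request_id:
        raise ValueError("termination command targets a different request")
    request.execution_state = command.execution_state
    request.completion_code = command.completion_code
```

Keeping the per-rule differences in one table means adding a further version interval would only require a new entry, not a new code path.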
In one embodiment, the kernel is deployed in an application server;
the data processing apparatus 1 further includes a set determining module 15, an abnormal kernel determining module 16, an operation attribute determining module 17, and an information pushing module 18, which operate after the anomaly determining module 13 determines that the kernel is abnormal;
the set determining module 15 is configured to determine each kernel deployed in the application server as a deployment kernel to obtain a deployment kernel set;
the abnormal kernel determining module 16 is configured to determine, as an abnormal kernel, each deployment kernel in the deployment kernel set that is abnormal at the target kernel detection time;
an operation attribute determining module 17, configured to determine a system operation attribute of the distributed storage system based on a first number of abnormal kernels included in the deployment kernel set;
the information pushing module 18 is configured to push, when the system operation attribute of the distributed storage system is an abnormal operation attribute, fault warning information for the distributed storage system to a system maintenance object of the distributed storage system, so that the system maintenance object performs system maintenance processing on the distributed storage system based on the fault warning information.
For the specific implementation of the set determining module 15, the abnormal kernel determining module 16, the operation attribute determining module 17 and the information pushing module 18, reference may be made to the descriptions of step S201 to step S204 in the embodiment corresponding to fig. 10; details are not repeated here.
In one embodiment, the specific manner in which the operation attribute determining module 17 determines the system operation attribute of the distributed storage system based on the first number of abnormal kernels contained in the deployment kernel set includes:
counting the first number of abnormal kernels contained in the deployment kernel set and the second number of deployment kernels contained in the deployment kernel set;
determining a number ratio between the first number and the second number;
if the number ratio is greater than the ratio threshold, determining that the system operation attribute of the distributed storage system is an abnormal operation attribute;
if the number ratio is smaller than the ratio threshold, determining that the system operation attribute of the distributed storage system is a normal operation attribute.
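The ratio test in the operation attribute determining module 17 can be sketched as below. The mapping from kernel name to an abnormal flag is a hypothetical input format, and two edge cases the source leaves open are resolved by assumption: a ratio exactly equal to the threshold is treated as normal, and an empty deployment kernel set is reported as normal rather than dividing by zero.

```python
from typing import Mapping

# Hypothetical labels for the two system operation attributes.
ABNORMAL_OPERATION = "abnormal"
NORMAL_OPERATION = "normal"


def system_operation_attribute(kernel_abnormal: Mapping[str, bool],
                               ratio_threshold: float) -> str:
    """kernel_abnormal maps each deployment kernel to whether it was found
    abnormal at the target kernel detection time."""
    second_number = len(kernel_abnormal)   # number of deployment kernels
    if second_number == 0:
        return NORMAL_OPERATION            # assumption: nothing deployed
    first_number = sum(1 for bad in kernel_abnormal.values() if bad)
    ratio = first_number / second_number
    # A ratio equal to the threshold counts as normal (assumption).
    return ABNORMAL_OPERATION if ratio > ratio_threshold else NORMAL_OPERATION
```

For example, two abnormal kernels out of four give a ratio of 0.5, which marks the system abnormal under a threshold of 0.3 but not under a threshold of 0.5.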
In the embodiments of the present application, whether the kernel is abnormal is detected automatically and periodically, so an abnormal kernel can be identified and unloaded in a targeted and timely manner without forcibly restarting the physical server; the application processes associated with the abnormal kernel can be terminated in time while the application processes running normally on the physical server are protected. In addition, different kernel unloading rules are configured for kernels of different versions, so that the file system loss caused by kernel unloading can be reduced according to the kernel version.
Further, referring to fig. 13, fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 13, the computer device 8000 may include a processor 8001, a network interface 8004 and a memory 8005, and may further include a user interface 8003 and at least one communication bus 8002. The communication bus 8002 is used to implement connection and communication between these components. The user interface 8003 may include a display and a keyboard, and may optionally further include standard wired and wireless interfaces. The network interface 8004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 8005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory, and may optionally also be at least one storage device located remotely from the processor 8001. As shown in fig. 13, the memory 8005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module and a device control application program.
In the computer device 8000 shown in fig. 13, the network interface 8004 may provide network communication functions, the user interface 8003 is mainly used to provide an input interface for the user, and the processor 8001 may be used to invoke the device control application program stored in the memory 8005 to implement:
when the target kernel detection time is reached, reading a request record file about the kernel; the request record file records access requests for accessing a block storage component in the distributed storage system through the kernel;
performing request detection processing on the access requests in the request record file, and determining the request waiting state of the access requests in the request record file;
when the request waiting state of the access requests in the request record file is a persistent abnormal waiting state, determining that the kernel is abnormal, and unloading the kernel.
It should be understood that the computer device 8000 described in the embodiments of the present application may perform the data processing method described in the embodiments corresponding to fig. 3 to fig. 10, and may also implement the data processing apparatus 1 described in the embodiment corresponding to fig. 12; details are not repeated here, and neither is the description of the beneficial effects of the same method.
Furthermore, it should be noted that the embodiments of the present application further provide a computer-readable storage medium storing the computer program executed by the aforementioned computer device 8000 for data processing. The computer program includes program instructions; when a processor executes the program instructions, the data processing method described in the embodiments corresponding to fig. 3 to fig. 10 can be performed, so details are not repeated here, and neither is the description of the beneficial effects of the same method. For technical details not disclosed in the computer-readable storage medium embodiments of the present application, refer to the description of the method embodiments of the present application.
The computer-readable storage medium may be the data processing apparatus provided in any one of the foregoing embodiments or an internal storage unit of the computer device, for example, a hard disk or memory of the computer device. It may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. It is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
In one aspect of the present application, a computer program product is provided that includes a computer program stored in a computer readable storage medium. A processor of a computer device reads the computer program from a computer-readable storage medium, and the processor executes the computer program to cause the computer device to perform the method provided in an aspect of the embodiments of the present application.
The terms "first", "second" and the like in the description, claims and drawings of the embodiments of the present application are used to distinguish different objects, not to describe a particular order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article or device that comprises a list of steps or units is not limited to the listed steps or modules, but may optionally include other steps or modules that are not listed or that are inherent to such process, method, apparatus, article or device.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The methods and related devices provided in the embodiments of the present application are described with reference to the method flowcharts and/or structural diagrams provided in the embodiments of the present application. Each flow and/or block of the method flowcharts and/or structural diagrams, and combinations of flows and/or blocks therein, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, so that a series of operational steps are performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.
The foregoing disclosure is merely illustrative of preferred embodiments of the present application and is not intended to limit the scope of its claims; therefore, equivalent variations made according to the claims of the present application still fall within the scope of the present application.

Claims (15)

1. A method of data processing, comprising:
when the target kernel detection time is reached, reading a request record file about a kernel; the request record file records an access request for accessing a block storage component in a distributed storage system through the kernel;
performing request detection processing on the access request in the request record file, and determining a request waiting state of the access request in the request record file;
when the request waiting state of the access request in the request record file is a persistent abnormal waiting state, determining that the kernel is abnormal, and unloading the kernel; the kernel is unloaded according to a kernel unloading rule, and kernel unloading rules comprise a low-version unloading rule and a high-version unloading rule; when the kernel unloading rule is the low-version unloading rule, the kernel is unloaded after the request state of the access request in the request record file is set to a request success state according to the low-version unloading rule and request clearing processing is performed; when the kernel unloading rule is the high-version unloading rule, the kernel is unloaded after the request state of the access request in the request record file is set to a request failure state according to the high-version unloading rule and request clearing processing is performed.
2. The method of claim 1, wherein the performing a request detection process on the access request in the request record file, and determining a request waiting state of the access request in the request record file, comprises:
acquiring an access request with a minimum sending time stamp in the request record file, and determining the access request with the minimum sending time stamp in the request record file as a target earliest access request corresponding to the target kernel detection moment;
determining the last kernel detection time of the target kernel detection time as the historical kernel detection time of the target kernel detection time;
and acquiring the historical earliest access request corresponding to the historical kernel detection time, and determining a request waiting state of the access request in the request record file according to the target earliest access request corresponding to the target kernel detection time and the historical earliest access request corresponding to the historical kernel detection time.
3. The method according to claim 2, wherein the determining the request waiting state of the access request in the request record file according to the target earliest access request corresponding to the target kernel detection time and the historical earliest access request corresponding to the historical kernel detection time includes:
comparing the target earliest access request with the historical earliest access request;
if the target earliest access request is the same as the historical earliest access request, determining a request waiting state of the access request in the request record file based on a time period between the target kernel detection time and the historical kernel detection time;
and if the target earliest access request is determined to be different from the historical earliest access request, determining that the request waiting state of the access request in the request record file is a reasonable waiting state.
4. The method of claim 3, wherein the determining a request waiting state of the access request in the request record file based on a time period between the target kernel detection time and the historical kernel detection time comprises:
determining a time period between the target kernel detection time and the historical kernel detection time;
determining the time period between the target kernel detection time and the historical kernel detection time as an incremental blocking duration of the kernel at the target kernel detection time;
acquiring a historical request blocking duration of the kernel at the historical kernel detection time, and adding the incremental blocking duration and the historical request blocking duration to obtain a target request blocking duration of the kernel at the target kernel detection time;
and when the target request blocking duration is greater than a duration threshold, determining that the request waiting state of the access request in the request record file is a persistent abnormal waiting state.
5. The method of claim 1, wherein the unloading the kernel comprises:
obtaining a kernel version corresponding to the kernel;
determining a kernel unloading rule of the kernel according to a version interval to which the kernel version belongs;
and unloading the kernel according to the kernel unloading rule.
6. The method of claim 5, wherein the determining a kernel unloading rule of the kernel according to a version interval to which the kernel version belongs comprises:
when the version interval to which the kernel version belongs is a first interval, determining a low-version unloading rule in a configured unloading rule set as the kernel unloading rule of the kernel;
when the version interval to which the kernel version belongs is a second interval, determining a high-version unloading rule in the configured unloading rule set as the kernel unloading rule of the kernel; the first interval is lower than the second interval.
7. The method of claim 5, wherein the kernel unloading rule is the low-version unloading rule; the request record file contains an access request S_i, where i is a positive integer;
the unloading the kernel according to the kernel unloading rule comprises:
performing request setting processing on the access request S_i according to the low-version unloading rule, and determining the request state of the access request S_i after the request setting processing as a request success state;
when it is determined that the request state of each access request in the request record file is the request success state, clearing the access requests contained in the request record file;
and when the request record file does not contain any access request, unloading the kernel through a kernel unloading instruction.
8. The method of claim 7, wherein the performing request setting processing on the access request S_i according to the low-version unloading rule comprises:
acquiring a request identifier of the access request S_i according to the low-version unloading rule;
acquiring first request termination logic code corresponding to the low-version unloading rule, and generating a first request termination command for the access request S_i according to the first request termination logic code and the request identifier of the access request S_i;
according to the first request termination command, setting the request execution state of the access request S_i in the request record file to an execution completion state, and setting the completion code of the access request S_i to a first completion code; the first completion code is used for indicating that the access request S_i has completed normally.
9. The method of claim 5, wherein the kernel unloading rule is the high-version unloading rule; the request record file contains an access request S_i, where i is a positive integer;
the unloading the kernel according to the kernel unloading rule comprises:
performing request setting processing on the access request S_i according to the high-version unloading rule, and determining the request state of the access request S_i after the request setting processing as a request failure state;
when it is determined that the request state of each access request in the request record file is the request failure state, clearing the access requests contained in the request record file;
and when the request record file does not contain any access request, unloading the kernel through a kernel unloading instruction.
10. The method of claim 9, wherein the performing request setting processing on the access request S_i according to the high-version unloading rule comprises:
acquiring a request identifier of the access request S_i according to the high-version unloading rule;
acquiring second request termination logic code corresponding to the high-version unloading rule, and generating a second request termination command for the access request S_i according to the second request termination logic code and the request identifier of the access request S_i;
according to the second request termination command, setting the request execution state of the access request S_i in the request record file to an execution failure state, and setting the completion code of the access request S_i to a second completion code; the second completion code is used for indicating that the access request S_i did not complete normally.
11. The method of claim 1, wherein the kernel is deployed in an application server;
after determining that the kernel has an exception, the method further comprises:
determining each kernel deployed on the application server as a deployment kernel to obtain a deployment kernel set;
determining, as an abnormal kernel, each deployment kernel in the deployment kernel set that is abnormal at the detection moment of the target kernel;
determining a system operation attribute of the distributed storage system based on a first number of abnormal kernels contained in the deployment kernel set;
when the system operation attribute of the distributed storage system is an abnormal operation attribute, pushing fault warning information for the distributed storage system to a system maintenance object of the distributed storage system, so that the system maintenance object performs system maintenance on the distributed storage system based on the fault warning information.
12. The method of claim 11, wherein determining the system operation attribute of the distributed storage system based on the first number of abnormal kernels contained in the deployment kernel set comprises:
counting a first number of abnormal kernels contained in the deployment kernel set and a second number of deployment kernels contained in the deployment kernel set;
determining a number ratio between the first number and the second number;
if the number ratio is greater than the ratio threshold, determining that the system operation attribute of the distributed storage system is an abnormal operation attribute;
and if the number ratio is smaller than the ratio threshold, determining that the system operation attribute of the distributed storage system is a normal operation attribute.
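Claims 11 and 12 reduce to a ratio test: divide the number of abnormal kernels (the first number) by the number of deployment kernels (the second number), then compare the result with a threshold. The sketch below assumes that per-kernel abnormality flags are already available as a dictionary and that `push_alert` is a hypothetical callback to the system maintenance object. The claims leave the case where the ratio equals the threshold unspecified, and this sketch treats it as normal, which is an assumption.

```python
from typing import Callable, Dict


def system_operation_attribute(kernel_abnormal: Dict[str, bool],
                               ratio_threshold: float) -> str:
    """Classify the storage system from per-kernel abnormality flags."""
    second_number = len(kernel_abnormal)  # deployment kernels in the set
    if second_number == 0:
        raise ValueError("deployment kernel set is empty")
    first_number = sum(1 for abnormal in kernel_abnormal.values() if abnormal)
    ratio = first_number / second_number
    # Ratio equal to the threshold is unspecified in the claims; treated as normal here.
    return "abnormal" if ratio > ratio_threshold else "normal"


def check_and_alert(kernel_abnormal: Dict[str, bool], ratio_threshold: float,
                    push_alert: Callable[[str], None]) -> str:
    """Push fault warning information only for an abnormal operation attribute."""
    attribute = system_operation_attribute(kernel_abnormal, ratio_threshold)
    if attribute == "abnormal":
        abnormal = sorted(k for k, a in kernel_abnormal.items() if a)
        push_alert(f"distributed storage system abnormal: "
                   f"{len(abnormal)} abnormal kernel(s): {', '.join(abnormal)}")
    return attribute
```

Worked example: with 2 abnormal kernels out of 4 deployment kernels, the number ratio is 2/4 = 0.5. A threshold of 0.3 therefore yields an abnormal operation attribute and triggers one alert. A threshold of 0.5 or higher yields a normal attribute.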
13. A data processing apparatus, comprising:
a file reading module, configured to read a request record file of the kernel when the detection moment of the target kernel is reached, the request record file recording access requests that access a block storage component in a distributed storage system through the kernel;
a request detection module, configured to perform request detection processing on the access request in the request record file and determine the request waiting state of the access request in the request record file;
an exception determining module, configured to determine that the kernel is abnormal when the request waiting state of the access request in the request record file is a continuous abnormal waiting state;
a kernel unloading module, configured to unload the kernel according to a kernel unloading rule, the kernel unloading rule being either a low-version unloading rule or a high-version unloading rule; when the kernel unloading rule is the low-version unloading rule, the kernel is unloaded after the request state of the access request in the request record file has been set to a request success state according to the low-version unloading rule and request clearing processing has been performed; when the kernel unloading rule is the high-version unloading rule, the kernel is unloaded after the request state of the access request in the request record file has been set to a request failure state according to the high-version unloading rule and request clearing processing has been performed.
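The file reading, request detection and exception determining modules of claim 13 can be sketched as a small detector that runs once per detection moment. This section does not define a "continuous abnormal waiting state", so the sketch assumes one hypothetical criterion: a request counts as abnormally waiting once it has appeared as pending at `consecutive_limit` consecutive detection moments. The text format of the record file (one pending request identifier per line) is also an assumption. Unloading is out of scope here.

```python
from typing import Dict, Set


class KernelDetector:
    """Hypothetical sketch of the detection modules described in claim 13."""

    def __init__(self, consecutive_limit: int = 3) -> None:
        self.consecutive_limit = consecutive_limit
        self.wait_counts: Dict[str, int] = {}  # request id -> consecutive sightings

    @staticmethod
    def read_request_record_file(text: str) -> Set[str]:
        # Assumed format: one pending request identifier per non-empty line.
        return {line.strip() for line in text.splitlines() if line.strip()}

    def detect(self, record_text: str) -> bool:
        """Run one detection moment; return True if the kernel is judged abnormal."""
        pending = self.read_request_record_file(record_text)
        # Keep counters only for requests still pending; finished requests drop out.
        self.wait_counts = {rid: self.wait_counts.get(rid, 0) + 1 for rid in pending}
        return any(n >= self.consecutive_limit for n in self.wait_counts.values())
```

With `consecutive_limit=3`, a request that is still listed at three detection moments in a row marks the kernel as abnormal. A request that disappears from the file in between resets its own count.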
14. A computer device, comprising: a processor, a memory, and a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is configured to provide a network communication function, the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any of claims 1-12.
15. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded by a processor and to perform the method of any of claims 1-12.
CN202310980930.7A 2023-08-07 2023-08-07 Data processing method, device, equipment and readable storage medium Active CN116719663B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310980930.7A CN116719663B (en) 2023-08-07 2023-08-07 Data processing method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310980930.7A CN116719663B (en) 2023-08-07 2023-08-07 Data processing method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN116719663A CN116719663A (en) 2023-09-08
CN116719663B true CN116719663B (en) 2024-01-30

Family

ID=87871883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310980930.7A Active CN116719663B (en) 2023-08-07 2023-08-07 Data processing method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116719663B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109791684A (en) * 2016-10-31 2019-05-21 英特尔公司 Kernel execution is transferred to graphics device
CN111124731A (en) * 2019-12-20 2020-05-08 浪潮电子信息产业股份有限公司 File system abnormity monitoring method, device, equipment and medium
CN111913667A (en) * 2020-08-06 2020-11-10 平安科技(深圳)有限公司 OSD blocking detection method, system, terminal and storage medium based on Ceph
CN115964192A (en) * 2022-12-13 2023-04-14 斑马网络技术有限公司 Request processing method and device, electronic equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7617292B2 (en) * 2001-06-05 2009-11-10 Silicon Graphics International Multi-class heterogeneous clients in a clustered filesystem
WO2020132308A2 (en) * 2018-12-19 2020-06-25 Apple Inc. Configuration management, performance management, and fault management to support edge computing

Also Published As

Publication number Publication date
CN116719663A (en) 2023-09-08

Similar Documents

Publication Publication Date Title
US10771306B2 (en) Log monitoring system
CN110661659A (en) Alarm method, device and system and electronic equipment
US8516499B2 (en) Assistance in performing action responsive to detected event
WO2023115999A1 (en) Device state monitoring method, apparatus, and device, and computer-readable storage medium
US20220050765A1 (en) Method for processing logs in a computer system for events identified as abnormal and revealing solutions, electronic device, and cloud server
US10936386B2 (en) Method, device and computer program product for monitoring access request
CN110851471A (en) Distributed log data processing method, device and system
CN111740868A (en) Alarm data processing method and device and storage medium
CN112671602B (en) Data processing method, device, system, equipment and storage medium of edge node
CN116719663B (en) Data processing method, device, equipment and readable storage medium
CN116719657A (en) Firmware fault log generation method, device, server and readable medium
CN116069591A (en) Interface performance monitoring method, device, equipment and storage medium
US20230066698A1 (en) Compute instance warmup operations
CN113656378A (en) Server management method, device and medium
CN114416560A (en) Program crash analysis aggregation method and system
CN114138615A (en) Service alarm processing method, device, equipment and storage medium
US10735246B2 (en) Monitoring an object to prevent an occurrence of an issue
CN113421109A (en) Service checking method, device, electronic equipment and storage medium
CN113656247A (en) Service monitoring method and device, electronic equipment and readable storage medium
CN111542048A (en) Method and device for restarting acquisition function of code detection equipment, server and storage medium
CN111026612A (en) Application program operation monitoring method and device, storage medium and electronic equipment
CN113778800B (en) Error information processing method, device, system, equipment and storage medium
CN111381994B (en) Mirror image bad layer repairing method, device, equipment and medium
CN108111611B (en) Client detection method and device and electronic equipment
CN114418488A (en) Inventory information processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40092632

Country of ref document: HK

GR01 Patent grant